PyDev of the Week: Jyotika Singh

The PyDev of the Week this week is Jyotika Singh (@JyotikaSingh_). Jyotika is the maintainer of pyAudioProcessing and a speaker at multiple conferences. You can check out what Jyotika is up to by going to her GitHub profile.

Let’s spend a few minutes getting to know Jyotika better!

Can you tell us a little about yourself (hobbies, education, etc): 

I work as the Director of Data Science at Placemakr and volunteer as a mentor at Data Science Nigeria and Women Impact Tech. I actively participate in conferences and webinars to share my knowledge and experiences with the Python and Data Science community, and with students aspiring to a career in software development and data science. As part of my research, I have been granted multiple patents in data science, algorithms, and marketing optimization techniques.

I graduated with a Master of Science degree from the University of California, Los Angeles (UCLA), specializing in Signals and Systems.

When not engaged in technology and coding, I enjoy sketching, painting, playing musical instruments, and, at times, spending my evenings at the beach.

Why did you start using Python? 

The first time I used Python was for a data-scraping assignment during my MS in 2015. While the course didn’t require Python, I chose to take on the new language because it was easy to write and had ML libraries available, which I was just beginning to look into. As my MS progressed, I kept learning more Python and got hooked on its ease of use, functionality, and compatibility with other tools.

What other programming languages do you know and which is your favorite? 

I have primarily worked with MATLAB, C, Java, R, Golang, and Python. Python has hands-down been my favorite for a while now.

What projects are you working on now? 

For the next few months, I’m working on a few different projects:

  1. Pricing recommendation and optimization models
  2. Content-based recommendation systems
  3. Consumer review analysis and classification
  4. A reinforcement learning-based model for ROI optimization

Furthermore, I’m in the process of writing a book on industrial applications and implementations of Natural Language Processing.

How did you decide to write a book about NLP with Python? 

There is a known gap between what data science graduates master and what a data scientist needs in industry. Most academic projects start with data that is already available, in a state that requires little thought about where it comes from or what data is actually needed. The most commonly available learning resources are far removed from real-world applications.

For people in software development, technology, or product management, or those who are new to data science or Natural Language Processing, figuring out what to solve and how to proceed can be a challenging problem. Keeping this in mind, I decided to write a book that explains NLP applications across 15 industry verticals. It digs into practical implementations of many popular applications and contains actual code examples. The book will guide readers in building applications with Python and highlight the reality of data problems and solutions in the real world.

Which Python libraries are your favorite (core or 3rd party)? 

os, collections, pandas, numpy, sklearn, pytorch, tensorflow, keras, spacy, nltk, matplotlib, and my own pyAudioProcessing.

How did the pyAudioProcessing library come about? 

I was working on a unique audio classification problem. Given the popularity of ML tooling in Python, I was looking to build my audio models in Python. I noticed a large gap between the research and development happening in MATLAB versus the state of 3rd party tooling in Python around audio.

Different types of non-numeric data have different feature formation techniques that work well for representing the data numerically in a meaningful way. An example for text data would be TF-IDF. Similarly, audio data has its own distinct set of feature formation techniques that represent the information numerically. I took it upon myself to mathematically construct audio features from raw audio in Python and, recognizing the need, decided to open-source my work. This gave rise to pyAudioProcessing. Today, you can extract features such as GFCC, MFCC, spectral features, and chroma features using pyAudioProcessing. Through its integration with other 3rd party libraries, it can also produce audio visualizations, convert audio formats, build audio classification models using sklearn models, and apply off-the-shelf audio classification models to some common tasks.
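To make the idea of constructing an audio feature from raw samples a little more concrete, here is a minimal sketch of one common spectral feature, the spectral centroid (the magnitude-weighted mean frequency of a frame), written in plain Python. This is not pyAudioProcessing's implementation or API: the function names `magnitude_spectrum` and `spectral_centroid` are hypothetical, and the sketch uses a naive O(N²) DFT on a single unwindowed frame for readability, where practical code would use an FFT over many windowed frames.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), for illustration only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the frame with a complex exponential at bin k
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (in Hz) of one audio frame."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    # Center frequency of each DFT bin in Hz
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, so report 0 Hz
    return sum(f * m for f, m in zip(freqs, mags)) / total


if __name__ == "__main__":
    sr = 8000
    n = 256
    # A 500 Hz sine lands exactly on DFT bin 16 (16 * 8000 / 256 = 500)
    tone = [math.sin(2 * math.pi * 500 * t / sr) for t in range(n)]
    print(round(spectral_centroid(tone, sr), 1))
```

Because the 500 Hz test tone falls exactly on one DFT bin, nearly all of its spectral energy sits in that bin and the centroid comes out very close to 500 Hz. In a real feature pipeline, a value like this would typically be computed per frame and then aggregated (for example, averaged) before being fed to a classifier.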

What challenges do you face as a maintainer of a Python package? 

Given a packed schedule with my full-time job, research, conferences, and mentorship volunteering, the most challenging part is finding time for continuous development and keeping the library up to date with new research, needs, features, and compatibility with the latest Python releases. Contributors are always welcome!

Is there anything else you’d like to say? 

I would like to thank all the people who open-source their work so that the community can leverage it for personal and professional projects. Also, a huge shout-out to people who volunteer their time to share their knowledge and findings via events organized by the Python and data communities.

I’m on Twitter at jyotikasingh_. Follow me to catch my latest talks, work, and findings.

Thanks for doing the interview, Jyotika!