PyDev of the Week: Stefanie Molin

This week we welcome Stefanie Molin (@StefanieMolin) as our PyDev of the Week! Stefanie is the author of Hands-On Data Analysis with Pandas. You can learn more about Stefanie by visiting her website or checking out her GitHub profile.

Let’s take a few moments to get to know Stefanie better!

Can you tell us a little about yourself (hobbies, education, etc.)?

I am a software engineer and data scientist at Bloomberg in New York City. I work at the intersection of information security and data science. A lot of my time revolves around data wrangling and visualization, building tools for gathering data and providing context, and knowledge sharing. I am also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition.

I graduated from Columbia University’s Fu Foundation School of Engineering and Applied Science with a bachelor’s degree in operations research, and I am currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In my free time, I enjoy traveling the world, reading, inventing new recipes, and learning new languages spoken among both people and computers.

Why did you start using Python?

In a previous job, I wanted to improve a model for an alerting system my team had built, but we didn’t have any labeled data. I decided to build a web app to make it possible for users to provide feedback on alerts they received, as well as ones that others received. I was coding in R at that point, so building a Shiny app was an option; however, a teammate suggested I use Python since it would be easier to set up on our server. I spent the next three weeks building out an initial version as a Flask app. Python was easy to learn and more compatible with our infrastructure – it helped to have other coders on the team using the same language.

What other programming languages do you know and which is your favorite?

Other than Python, I currently use JavaScript (vanilla, D3, and React), as well as some Bash and Arduino (C++). I also have prior experience with R and Java, but those are a bit rusty now. Python is definitely my favorite language for most projects, but I like React for front-end web development. Once you get over the (steep) learning curve, you can do a lot very quickly, and using a framework makes it much easier to create an aesthetically pleasing web app.

What projects are you working on now?

I’m in the homestretch of my master’s program in computer science. Other than that, I’ve been working on a pandas workshop and another on data visualization in Python – both of which I’ve been presenting at conferences like ODSC and PyCon.

I’ve also recently dipped my toes into open source contributions. I fixed multiple data visualization bugs for the pandas 1.5.0 release, and in the Seaborn 0.11.2 release I added functionality for drawing reference lines on JointGrid and FacetGrid plots (which seems to be a crowd favorite). I love to open my data visualization in Python workshop by telling everyone that after the first section they will know enough to have made that Seaborn contribution. While building the workshop, I was also able to fix a bug on the “Anatomy of a figure” page in Matplotlib’s documentation just before the release candidate at the time became the final release.

Which Python libraries are your favorite (core or 3rd party)?

From the standard library, I’m a huge fan of itertools (so much so that it’s almost a signature of my code), calendar, and collections – the standard library is packed with useful functionality waiting to be discovered.

As for third-party libraries, my favorites are pandas (of course), Matplotlib (people love to hate it, but you can do so much if you take the time to understand the API; the first section of my data visualization in Python workshop can help with that), and Flask.

How did you decide to be a book author?

I took a couple of Java courses during my undergraduate studies, but it wasn’t until I was in the workforce that I discovered my interest in the intersection of data science and software engineering. I did a lot of self-study to move my career in that direction. Part of this involved reading several data science and programming books. At the time, I felt that writing a book myself would be a good way to package up all that knowledge, kind of like a thesis.

Personally, when I’m new to something, I like to see meaningful examples, not just made-up data. This helps me understand why you would do something, as well as recognize it when I see a use case for it; this was my value proposition for the book I wanted to write.

I never actually settled on a subject for the book myself. One day, my publisher (Packt) reached out to me looking for an author for a book on data analysis using pandas. The timing was right, so I went for it – and about a year later, “Hands-On Data Analysis with Pandas” was published.

What are the top 3 things you learned while writing a book?

While writing the first few drafts, I quickly learned that an outline of the book with a list of 3-5 main concepts per chapter wasn’t enough to overcome writer’s block, or sometimes even to get started. It was worth the extra time to break each concept down another couple of levels until I had a detailed outline for each chapter, along with personal deadlines for each part. This approach helped keep me on track for the chapter deadlines I had set with my publisher. Breaking big problems into much smaller pieces is how I code, and I learned that the same approach lends itself well to large projects in general.

I was fortunate enough to have a small group of friends and colleagues who provided me with constructive criticism throughout the process. It was a bit of a trial by fire, but I learned how to accept such feedback without taking it personally and to use it to improve the final product. After that initial round of review, I ended up rewriting more than half of the content in some of the chapters. It was hard to throw out so much content that I had labored over, but taking their feedback into account really made for a better product.

I can be very hard on myself, and that translates into perfectionism, so I also had to learn that it’s really more of an optimization problem (of time, quality, being happy with the final product, etc.). I was never going to be able to make sure the draft was 100% free of typos (after all, the book is nearly 800 pages!), but I could control the overall quality of the content and whether it was something that I could be proud of.

Is there anything else you’d like to say?

Push yourself; do things that scare you (within reason) – you only grow by going outside of your comfort zone, and you will be surprised at what you are capable of achieving.

Thanks so much for doing the interview, Stefanie!