PyDev of the Week: Peter Baumgartner

This week we welcome Peter Baumgartner (@pmbaumgartner) as our PyDev of the Week! Peter is a fellow Python blogger who writes about Python and data science. Peter also has a collection of interesting Jupyter Notebooks that you can learn from. You can see what projects Peter is working on over on GitHub.

Let’s take a few moments to get to know Peter better!

Can you tell us a little about yourself (hobbies, education, etc):

I’m currently a Machine Learning Engineer at Explosion. Prior to this, I worked at a non-profit research institute called RTI International, and when I started in data science using Python I worked for Deloitte. I earned my Master’s in Analytics at the Institute for Advanced Analytics at NC State University. Prior to that I was a high school math teacher.

My primary hobbies are running and creating art with a pen plotter. Every weekend I try to run or volunteer at a local timed 5k put on by Parkrun, which is a really cool organization I encourage runners of any ability to check out. A pen plotter is basically a robot you program to draw. I picked one up last year and post most of my work to Twitter.

These days I don’t have too much time for hobbies as most of my free time is spent helping raise my now 1-year-old son, Clark. He’s a really fun, adventurous kid who is enjoying exploring the world now that he’s mobile.

Why did you start using Python?

After I finished my Master’s degree, I was doing some contract work for a local marketing firm. My Master’s program taught us everything in the SAS programming language, but SAS is expensive and, in my opinion, a painful language to program in, so for this contract work we decided to use Python. It was a real trial by fire, as I had to learn Python and provide useful analysis to the client at the same time. In the end, it worked out because it made me realize programming could actually be fun and not always a struggle. Ever since then I’ve been primarily a Python user.

What other programming languages do you know and which is your favorite?

The first language I learned was Visual Basic—I took a computer science class in high school that really opened my eyes to cool things you could do in programming. In college I also took a Computer Science course that used C++, but I have absolutely no recollection of that knowledge. I also learned a bit of SAS and R during my Master’s program.

Since I’ve been programming professionally, the language I’ve learned the most is Julia, which is my favorite non-Python language. What I like about it is that it was easy to learn coming from Python, which I think is very important. I previously attempted to learn Rust, but the syntax and concepts were different enough that it was too difficult for me at the time. Julia exposed me to the fact that there are different ways to think about and solve problems, and I could conceptually take what I had learned and apply it to how I developed Python programs. It also forced me to increase my knowledge of some computer science fundamentals that I had never learned.

In general, I’d encourage everyone to learn a second programming language, but probably one syntactically close to their primary language. It was really helpful for me to learn another language after about 5 years of programming in Python, just to be exposed to alternative ways a programming language could work.

What projects are you working on now?

At Explosion, we just launched our consulting offering called spaCy Tailored Pipelines, which is leading to some very interesting applied natural language processing projects. In addition to that, I spend a lot of time reviewing how people are using our products and improving them by updating documentation, adding examples, or creating new open-source libraries to complement our tools. For example, a recent component I’ve developed simply counts the tokens that people see when they pass text through a spaCy pipeline. I started working on that because I noticed a lot of people were requesting this feature and they were confused about how it should work within spaCy. Another example would be a component that parses text from HTML. Often people will have data from scraped web pages and want to do natural language processing on it, but if they just take the raw text from the HTML they ignore the structure of the document, which might have some negative downstream impacts.
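To make the HTML point concrete, here is a minimal sketch using only the standard library's html.parser. It is not Peter's component or any spaCy API; the names BlockTextExtractor, html_to_blocks, BLOCK_TAGS, and SKIP_TAGS are hypothetical and chosen for illustration. It assumes that a small set of block-level tags marks the boundaries between units of text, and it skips script and style content.

```python
from html.parser import HTMLParser

# Assumption: these tags mark boundaries between units of text.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "br", "tr"}
# Content inside these tags is not prose and is dropped.
SKIP_TAGS = {"script", "style"}


class BlockTextExtractor(HTMLParser):
    """Collect text as a list of blocks instead of one flat string."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = []
        self._skip_depth = 0

    def _flush(self):
        # Join the buffered fragments, normalize whitespace, keep non-empty text.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()


def html_to_blocks(html):
    """Return the visible text of an HTML string, one entry per block."""
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


sample = (
    "<h1>Title</h1><p>First <b>bold</b> text.</p>"
    "<script>var x = 1;</script><ul><li>One</li><li>Two</li></ul>"
)
blocks = html_to_blocks(sample)
```

Stripping the tags naively from the same sample would fuse the heading, paragraph, and list items into one run of text. Keeping them as separate blocks means each one could be passed to an NLP pipeline as its own text.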

Which Python libraries are your favorite (core or 3rd party)?

I think this is the hardest question to answer because there are so many good ones.

Core:

  • collections – I use Counter and defaultdict all the time.
  • itertools – chain is awesome, groupby is great for data, and I’ve used combinations for work and plotter art.
  • pathlib – So much of my work for applied projects deals with paths that I’d be totally lost without pathlib.
  • tempfile – Sometimes I work with libraries that have APIs that require persisting to disk when I’d rather pass a buffer. tempfile makes it super easy to work with these in a clean way.
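For readers who have not used these modules, here is a small sketch of the tools named in the list above. The sample data (sentences, points) and the file name are made up for illustration, and the tasks are assumptions rather than examples from Peter's own work.

```python
import tempfile
from collections import Counter, defaultdict
from itertools import chain, combinations, groupby
from pathlib import Path

# Hypothetical sample data: two tokenized sentences.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]

# chain.from_iterable flattens the nested lists; Counter tallies the tokens.
token_counts = Counter(chain.from_iterable(sentences))

# defaultdict(list) groups tokens by length without checking for missing keys.
by_length = defaultdict(list)
for token in token_counts:
    by_length[len(token)].append(token)

# groupby only merges adjacent items, so sort by the same key first.
words = sorted(token_counts, key=lambda w: w[0])
by_initial = {k: list(g) for k, g in groupby(words, key=lambda w: w[0])}

# combinations yields every unordered pair, e.g. segments between points.
points = [(0, 0), (1, 0), (0, 1)]
segments = list(combinations(points, 2))

# TemporaryDirectory provides a scratch folder that is removed on exit;
# pathlib handles joining, writing, and reading the file inside it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokens.txt"
    path.write_text("\n".join(token_counts))
    round_trip = path.read_text().splitlines()
```

The temporary directory pattern is useful when a library insists on a file path: write the data there, hand over the path, and let the context manager clean up.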

3rd party: There are ones that I use almost every day: pandas, spaCy, umap-learn, altair, tqdm, pytest, black, and numpy, which are all amazing.

Then there are some libraries that I love and use in specific circumstances, like typer, rich, and questionary for CLI tools; poetry for packaging; streamlit for making simple apps; numba for faster array operations; sentence-transformers for NLP when sentences are involved; loguru for logging; and shapely and vsketch for anything with my plotter.

How did you decide to write a Python blog?

I’ve had a blog for a long time, but until recently I was publishing on it less than I was happy with. Recently I’ve been trying to reframe my writing process by recognizing that blog posts don’t have to be perfect. I read a lot about good writing, and read a lot of good technical writing, and often that puts so many constraints in my head when writing something that I never end up finishing anything. The practices for good writing would be useful if I were writing something more formal, like a book, but for my personal blog I give myself permission to not think about that stuff too much.

Where do you get your ideas from when it comes to writing articles?

Almost all of my articles document things that I’ve recently learned. I try to think of writing a blog post as steps in the Feynman technique of learning. I used to be a teacher, so I also try to be cognizant of the Curse of Knowledge and write things down as I’m learning them, rather than after I’m done learning, and then reorganize those original thoughts into the way I think about the topic once I’ve learned it.

Is there anything else you’d like to say?

Support the developers and organizations that help make the Python ecosystem great!


Thanks for doing the interview, Peter!