PyDev of the Week: Ted Petrou

This week we welcome Ted Petrou (@TedPetrou) as our PyDev of the Week! Ted is the author of the Pandas Cookbook and also teaches Pandas in several courses on Udemy. Let’s take some time to get to know Ted better!

Can you tell us a little about yourself (hobbies, education, etc.)?

I graduated with a master's degree in statistics from Rice University in Houston, Texas, in 2006. During my degree, I never heard the phrase "machine learning" uttered even once, and it was several years before the field of data science became popular. I entered the program with just six other students, pursuing a Ph.D. Although statistics was a highly viable career at the time, it wasn't nearly as popular as it is today.

After limping out of the program with a master's degree, I looked into actuarial science, became a professional poker player, taught high school math, and built reports with SQL and Excel VBA as a financial analyst before becoming a data scientist at Schlumberger. During my stint as a data scientist, I started the meetup group Houston Data Science, where I gave tutorials on various Python data science topics. Once I had accumulated enough material, I started my company, Dunder Data, to teach data science full time.

Why did you start using Python?

I began using Python in 2013, while I was teaching high school math, when I took an introductory course offered by Rice University on coursera.org. I had done quite a bit of programming before that but had never heard of Python. It was a great course where we built a new game each week.

What other programming languages do you know and which is your favorite?

I began programming on a TI-82 calculator about 22 years ago. It had a minimal built-in language that my friends and I used to build games. I remember making choose-your-own-adventure games using the menu command. I took classes in C and Java in college and worked with R as a graduate student. A while later, I learned enough HTML and JavaScript to build basic websites. I also know SQL quite well and have done some work in Excel VBA.

My favorite language is Python, but I have no emotional attachment to it. I'd actually prefer to use a statically typed language, but I don't have much of a choice, since the demand for Python keeps increasing.

What projects are you working on now?

Outside of building material for my courses, I have two major projects currently in development: Dexplo and Dexplot. Dexplo is a data analysis library similar to pandas, with the goals of simpler syntax, better performance, one obvious way to perform common tasks, and more functionality. Dexplot is similar to seaborn and has similar goals.

Which Python libraries are your favorite (core or 3rd party)?

I really enjoy using scikit-learn. The uniformity of the estimator object makes it easy to use. The recent introduction of the ColumnTransformer and the upgrade to the OneHotEncoder have really improved the library.
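As a rough illustration of the pieces Ted mentions, here is a minimal sketch that combines `ColumnTransformer` with `OneHotEncoder` to one-hot encode a text column while scaling a numeric one. It assumes scikit-learn 0.20 or later (when both features arrived) plus pandas; the toy DataFrame and its `city`/`sqft` columns are invented purely for this example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one categorical (string) column and one numeric column
df = pd.DataFrame({
    "city": ["Houston", "Austin", "Houston", "Dallas"],
    "sqft": [1500.0, 2000.0, 1200.0, 1800.0],
})

# Apply a different transformer to each group of columns, then concatenate
# the results side by side in the order the transformers are listed
preprocess = ColumnTransformer(
    transformers=[
        # OneHotEncoder accepts string categories directly
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
        ("scale", StandardScaler(), ["sqft"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Same fit/transform interface as every other scikit-learn estimator
X = preprocess.fit_transform(df)
print(X.shape)  # (4, 4): three one-hot city columns plus one scaled sqft column
```

Because `ColumnTransformer` is itself an estimator, it can be dropped into a `Pipeline` in front of a model, which is the kind of uniformity the answer above praises.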

I see you are an author. How did that come about?

I contacted both O'Reilly and Packt Publishing with book ideas. O'Reilly offered me an online course, and Packt needed an author for Pandas Cookbook. I really wanted to write a book and already had lots of material from teaching my data science bootcamp, so I went with Packt.

Do you have any advice for other aspiring authors?

It really helps to teach the material before you publish it. The feedback you get from students is invaluable for making improvements. You can see firsthand what works and what needs to be changed.

What’s the origin story for your company, Dunder Data?

During my time as a data scientist at Schlumberger, I sat through several weeks of poorly taught corporate training. That experience motivated me to start creating tutorials. I started the Houston Data Science meetup group, which helped lay the foundation for Dunder Data. Many people ask whether "Dunder" is related to Dunder Mifflin, the paper company from the popular TV show The Office. The connection is coincidental: "dunder" (short for "double underscore") refers to Python's "magic" or "special" methods. The idea is that Dunder Data translates to "Magical Data".
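For readers unfamiliar with the term, "dunder" methods are the double-underscore methods such as `__len__` and `__repr__` that Python calls implicitly when you use built-in syntax. The class below is a hypothetical, minimal container written only to show that mechanism; it is not part of any library.

```python
class Dataset:
    """Hypothetical minimal container used only to illustrate dunder methods."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        # Called implicitly by the built-in len()
        return len(self.rows)

    def __getitem__(self, index):
        # Called implicitly for square-bracket indexing, e.g. data[0]
        return self.rows[index]

    def __repr__(self):
        # Called implicitly by repr() and by the interactive interpreter
        return f"Dataset({len(self)} rows)"


data = Dataset([1, 2, 3])
print(len(data))   # 3
print(data[0])     # 1
print(repr(data))  # Dataset(3 rows)
```

None of these methods is ever called by name; the "magic" is that ordinary syntax like `len(data)` or `data[0]` dispatches to them.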

Thanks for doing the interview, Ted!