This week we welcome Elizabeth Sander as our PyDev of the Week! Elizabeth is a data scientist at Civis Analytics. She has her own website where you can learn a lot of interesting background information about her. If you are more interested in her open source projects, then her GitHub profile may be what you really want to check out. Let’s take a few moments to get to know Elizabeth better!
Can you tell us a little about yourself (hobbies, education, etc.)?
My background is actually in computational biology, basically looking at how food webs are structured, and how to predict network structure from other kinds of data. I was bitten by the software bug in grad school. I tried to turn all of my research into software packages, or at least a series of scripts that could reproduce my work. This is surprisingly uncommon in academia, at least in ecology! I finished my Ph.D. in Ecology and Evolution last year, and now I’m an R&D data scientist at Civis Analytics. Now I get to work on software to help other data scientists do their job, which is a fun balance of programming and analysis.
I have lots of hobbies outside of software, maybe a few too many. I do a lot of circus arts, especially trapeze and contact staff, and I have a lot of crafty hobbies like sewing and knitting. I’m also a big gamer, whether it’s board games, role playing games, or video games.
Why did you start using Python?
I started using Python in grad school for basic scripting and prototyping. I needed to run simulations and optimizations in compiled languages, but Python was perfect for building out a quick prototype without having to worry about details like memory management. I also had a Python script I used all of the time, to automate submitting my jobs to our department’s computing cluster. I accidentally took over the cluster a few times that way, so I had to write another one to automate deleting all of my jobs from the cluster (belated apologies to Ian, our poor sysadmin!).
What other programming languages do you know and which is your favorite?
I like to dabble in lots of different languages, but the ones I’m most comfortable in are R, Python, and C. My favorite depends on the task. I like learning the core abstractions of a language, to understand how it “thinks” about problems, and see what problems it “cares” most about. So my favorite language is whichever one cares most about the problem I’m working on! R cares about giving you easy and immediate access to tabular data. So when I want to explore, manipulate, and plot data, R is my favorite. I think C cares about giving you full access to a computer’s capabilities, at your own risk! Python wants to be really flexible and be pretty good at everything, so it’s great as an all-purpose language, and it’s great for prototyping. It also really cares about being beautiful to read, so I find myself turning to it for all kinds of problems.
What projects are you working on now?
I help maintain a big project for Civis Analytics called CivisML, which essentially wraps scikit-learn models and runs them in parallel in the cloud. It’s a fun project because it’s really about taking away all of the frustrating parts of data analysis for the user, which is all of the devops and boilerplate. I love working on projects where I can see them making someone’s day better.
I don’t have any software side projects going on right now, but I like to contribute to open source every fall for Hacktoberfest! I mostly worked on documentation last Hacktoberfest, which can be a great way to help out even if you aren’t super familiar with the details of a codebase.
Which Python libraries are your favorite (core or 3rd party)?
I love scikit-learn because it has an incredible API. It really matches the way I think about models, and it’s so easy to use, extend, and build around. It’s just a joy to work with. I love scikit-learn so much, I’m giving a whole talk about it at PyCon this year!
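For readers who haven't used it, here is a minimal sketch of the convention that makes the scikit-learn API easy to extend. It is written in plain Python and does not import scikit-learn; the class `MeanRegressor` and its `shrink` parameter are hypothetical names made up for illustration. It only mimics the usual pattern: hyperparameters are stored in `__init__`, `fit` learns from data and returns `self`, learned attributes end in an underscore, and `predict` uses them.

```python
from statistics import mean


class MeanRegressor:
    """Toy estimator (hypothetical) that predicts the mean of the training targets."""

    def __init__(self, shrink=1.0):
        # Hyperparameters are stored unchanged in __init__.
        self.shrink = shrink

    def fit(self, X, y):
        # Learned state gets a trailing underscore; returning self allows chaining.
        self.mean_ = self.shrink * mean(y)
        return self

    def predict(self, X):
        # One prediction per input row.
        return [self.mean_ for _ in X]


# Made-up training data for the example.
X_train = [[1.0], [2.0], [3.0]]
y_train = [2.0, 4.0, 6.0]

predictions = MeanRegressor().fit(X_train, y_train).predict([[10.0], [20.0]])
print(predictions)
```

Because every estimator exposes the same `fit` and `predict` methods, code that works with one model tends to work with another, which is a large part of why the library is pleasant to build around.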
It’s not a library, but I also have to mention dictionaries. They’re so core to Python and they’re really versatile and convenient for all kinds of data. My first language was R, which doesn’t have a true native dictionary or hashmap structure. It took me a while to understand why I’d use a dict instead of a list, but once it clicked, it really changed how I think about structuring data.
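As a small illustration of the dict-versus-list point, here is a sketch using made-up food web data (the species names and diets are hypothetical, not taken from any real dataset). It shows the same lookup done with two parallel lists and with a dictionary.

```python
# Hypothetical data: each species and what it eats.
species = ["hawk", "snake", "mouse"]
prey = [["snake", "mouse"], ["mouse"], ["seeds"]]

# With parallel lists, a lookup by name has to find the position first.
snake_prey_from_lists = prey[species.index("snake")]

# With a dict, the name itself is the key.
diet = dict(zip(species, prey))
snake_prey_from_dict = diet["snake"]

# .get returns a default instead of raising KeyError for a missing key.
owl_prey = diet.get("owl", [])

print(snake_prey_from_lists, snake_prey_from_dict, owl_prey)
```

Both lookups return the same value, but the dict version says what it means, and it does not require scanning the list to find the right position.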
Is there anything else you’d like to say?
If you write Python and you haven’t been to PyCon, you’re missing out! It’s a ton of fun, it’s a great way to learn what’s going on in the community, and people are super friendly. They’re also pretty generous with scholarships for folks who need them, especially if it’s your first time attending. Plus, there are a lot of fun impromptu events, so you can find your tribe even if you don’t know anyone. When I went, I ended up at a board game night, a juggling lesson, and an acroyoga event, and met lots of lovely people. I’ll be there again this year, so if you see me, say hi!
Thanks for doing the interview!