This week we welcome Tyler Reddy (@Tyler_Reddy) as our PyDev of the Week! Tyler is a core developer of SciPy and NumPy. He has also worked on the MDAnalysis library, which is for molecular dynamics simulation analysis. If you’re interested in seeing some of his contributions, you can check out his GitHub profile. Let’s spend some time getting to know Tyler better!
Can you tell us a little about yourself (hobbies, education, etc):
I grew up in Dartmouth, Nova Scotia, Canada and stayed there until my late twenties. My Bachelor’s and PhD degrees were both in biochemistry, focused on structural biology. I did travel a lot for chess, winning a few notable tournaments in my early teen years and achieving a master rating in Canada by my late teens. Dartmouth is also known as the “City of Lakes,” and I grew up paddling on the nearby Lake Banook. In the cold Canadian winter the lake would freeze over and training would switch to a routine including distance running—this is where my biggest “hobby” really took off. I still run about 11 miles daily in the early morning.
I did an almost six-year postdoc in Oxford, United Kingdom. I had started to realize during my PhD that my skill set was better suited to computational work than work on the lab bench. Formally, I was still a biologist while at Oxford, but it was becoming clear that my contributions were starting to look a lot more like applied computer science, and computational geometry in particular. I was recruited to Los Alamos National Laboratory to work on viruses (the kind that make a person, not a computer, sick), but ultimately my job here has evolved into that of an applied computer scientist, and nothing beats distance running in beautiful Santa Fe, NM.
Why did you start using Python?
I think it started during my PhD with Jan Rainey in Canada. He was pretty good about letting me explore ways to use programming to make research processes more efficient, even when I might have been better off in the short term by “just doing the science.” Eventually my curiosity grew to the point where I just read one of the editions of Mark Lutz’s “Learning Python” from cover to cover. I very rarely used the terminal to test things out while reading the book—I just kept going through chapters feverishly—I suppose Python is pretty readable! I still prefer reading books to random experimenting when approaching new problems/languages, though I don’t always have the time/luxury to do so. I remember reading Peter Seibel’s “Coders at Work,” and making a list of all the books the famous programmers interviewed there were talking about.
What other programming languages do you know and which is your favorite?
During my second postdoc at Los Alamos I read Stephen Kochan’s “Programming in C.” For that book I basically did every single exercise in the terminal as I read it—I found that far more necessary with C than with Python to get the ideas to stick. I had made an earlier attempt at reading the classic “The C Programming Language” by K&R and found it rather hard to learn from! I thought I was doing something wrong, since it was described as a classic in “Coders at Work,” I think. I’ll probably never go back to that book now, but I certainly get a lot of mileage out of my C knowledge these days.
I did a sabbatical at UC Berkeley with Stéfan van der Walt and the NumPy core team, working on open source full time for a year. NumPy is written in C under the hood, so it was essential I could at least read the source. A lot of the algorithm implementations in SciPy that I review or write are written in the hybrid Cython (C/Python) language to speed up the inner loops, etc.
I’ve also written a fair bit of Tcl, and I write a lot of CMake code these days at work.
Python easily wins out as my favorite language, but C isn’t too far behind. I have to agree with the high-profile authors in “Coders at Work” who described C as “beautiful” (or similar) and C++ as, well, something else. Indeed, the NumPy team wrote a custom type-templating language in C, processed by Python, instead of using C++. That said, Bjarne Stroustrup did visit UC Berkeley while I was there, and it sounds like C++ may be taking a few more ideas from the Python world in the future!
What projects are you working on now?
I’m the release manager for SciPy, which has been my main long-term open source project focus in recent years. I’ve been trying really hard to improve the computational geometry algorithms available in SciPy—both in terms of adding new ones from the recent mathematics literature and improving the ones we already have.
A lot of my time goes into code review now, though. I don’t mind—that’s kind of how it works—if I’m going to expect the other core devs and the community to review my code and help me get over the finish line, I should be ready to do the same for them. Indeed, as funding is now starting to show up a bit more for some OSS projects, we’re quickly realizing that just dumping a bunch of new code on the core team/community will quickly cause a problem—review bandwidth is really important.
I’ve had a few rejected proposals for funding for computational geometry work in scipy.spatial, but I will keep trying! We recently wrote a paper for SciPy, which was a lot of work with such a big group/history/body of code, but probably worth it in the end.
I also try to stay involved in NumPy code review, especially for infrastructure-related changes (wheels, CI testing, etc.) and datetime code, which I have an interest in.
My open source journey started with the MDAnalysis library for molecular dynamics simulation analysis. I try to help out there too, but just keeping up with the emails/notifications for 3+ OSS projects is extremely hard in mostly free time. I try to track notifications and stay somewhat involved in what is going on with OpenBLAS and asv as well, though it feels like I’m failing to keep up most of the time!
Which Python libraries are your favorite (core or 3rd party)?
I think hypothesis is probably underrated—some libraries are hesitant to incorporate it into their testing frameworks, but I think the property-based testing has real potential to catch scenarios humans would have a hard time anticipating, or at least that would take a long time to properly plan for. I find that hypothesis almost always adds a few useful test cases I hadn’t thought of that will require special error handling, for example.
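To make the idea concrete, here is a minimal sketch of a property-based test with hypothesis (the property tested here—that sorting is length-preserving, ordered, and idempotent—is just an illustrative example, not from the interview):

```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorted_properties(xs):
    out = sorted(xs)
    # Sorting must preserve length and be idempotent.
    assert len(out) == len(xs)
    assert sorted(out) == out
    # Every neighbouring pair must be ordered.
    assert all(a <= b for a, b in zip(out, out[1:]))

# Calling the decorated function runs it against many generated inputs,
# including edge cases (empty lists, duplicates, extreme integers) that a
# hand-written test suite might miss.
test_sorted_properties()
```

This is exactly the appeal Tyler describes: hypothesis generates the awkward inputs for you, rather than you having to anticipate them.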
Coverage.py is pretty important for showing line coverage, but I wish the broader CI testing ecosystem had more robust/diverse options for displaying coverage data and aggregating results from Python and compiled-language source code. A number of the larger projects I work on have issues with the reliability of Codecov. The Azure Pipelines service has an initial coverage offering—we’ll see if that really takes off. It would be neat if we could soon mouse over a line of tested code and see the name of the test that covers it. I think I saw somewhere that this will perhaps soon be possible.
How did you get involved with SciPy?
My first substantial contribution was the implementation of spherical Voronoi diagram calculation in scipy.spatial.SphericalVoronoi. I was working on physics simulations of spherical influenza viruses at the time, and wanted a reliable way to determine the amount of surface area that molecules were occupying. I was fortunate that my postdoc supervisor at the time, Mark Sansom at Oxford, allowed me to explore my interest in computational geometry algorithms like that. I gave a talk at what I believe was the second annual PyData London conference about the algorithm implementation, which was still incomplete at the time, and received some really helpful feedback from two expert computational geometers—one was an academic, the other was loosely associated with the CGAL team.
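For readers curious about the API, here is a minimal usage sketch of scipy.spatial.SphericalVoronoi (the octahedron point set and unit radius are illustrative choices, not from the interview):

```python
import numpy as np
from scipy.spatial import SphericalVoronoi

# Six generator points of a regular octahedron on the unit sphere.
points = np.array([[1, 0, 0], [-1, 0, 0],
                   [0, 1, 0], [0, -1, 0],
                   [0, 0, 1], [0, 0, -1]], dtype=float)

sv = SphericalVoronoi(points, radius=1.0, center=np.zeros(3))
sv.sort_vertices_of_regions()  # order region vertices counterclockwise
areas = sv.calculate_areas()

# The Voronoi cells tile the sphere, so the areas sum to 4*pi*r**2.
print(np.isclose(areas.sum(), 4 * np.pi))
```

This surface-area bookkeeping is precisely the kind of question (how much of the sphere does each molecule's cell occupy?) that motivated the contribution.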
I really enjoyed the process of working with the SciPy team—I remember the first person to ever review my code there was CJ Carey, a computer scientist who is now working at Google. I was pretty intimidated, but they were quite welcoming and I was probably a little too excited when Ralf Gommers, the chair of the steering council, invited me to join the core team. I’ve been hooked ever since!
What are the pros and cons of using SciPy?
You can usually depend on SciPy to have a pretty stable API over time—we generally take changes in behavior quite seriously. A break in backwards compatibility would normally require a long deprecation cycle. The quality/robustness expected for algorithms implemented in SciPy is generally quite high and the library is well tested, so it is usually best to use SciPy if an algorithm is already available in it. The documentation is of reasonably high quality and constantly improving, and many common questions are answered on sites like Stack Overflow.
If you want to play with experimental algorithms or advocate for a rapid change in behavior, SciPy may not be your first choice. Early adoption of immature technologies is unlikely. Stability and reliability are important at the base of the Python scientific computing ecosystem.
How will SciPy / NumPy be changing in the future?
A few things that stand out off the top of my head: improving support for using different backends to perform calculations with NumPy and SciPy (for example, using GPUs or distributed infrastructure), and making it easier to use custom dtypes. You might want to speed up code with Cython or Numba or Pythran and some thought may be required for NumPy and SciPy to remain well-suited for each of those.
I think I’m starting to see indications that binary wheels will eventually become available for PowerPC and ARM architectures, but my impression was that there were still some challenges there.
I think you’ll probably see better published papers/citation targets for these two projects in the future as well. With all the efforts underway to get grants to fund these projects I think we’ll continue to see periods where there will be funded developers driving things forward more quickly, as has happened with the grant for NumPy at BIDS (UC Berkeley).
Thanks for doing the interview, Tyler!