PyDev of the Week: Henry Schreiner III

This week we welcome Henry Schreiner (@HenrySchreiner3) as our PyDev of the Week! Henry is an open-source maintainer/core contributor to multiple projects related to scikit, PyPA, and more. You can get a feel for what Henry is up to by checking out his GitHub profile. Henry also created a “Level Up Your Python” course that is free online.

Let’s spend some time getting to know Henry better!

Can you tell us a little about yourself (hobbies, education, etc):

I’m a Computational Physicist and Research Software Engineer (RSE) working at Princeton University. I work with IRIS-HEP, a multi-university institute for sustainable software in High Energy Physics (HEP). We are building software supporting reproducible analyses of HEP data. I work in several areas, focusing on the underlying software stack and packaging for analysis systems; when I’m not doing that, I work on histogram-related tools. I’m also involved in innovative algorithm development and in the teaching and outreach efforts.

In my spare time, I do a lot of OSS contributions, so most of my other hobbies are things I can do with my family. I’ve started introducing my 5-year-old to roller coasters. I also like computer modeling, animation, and special effects, but don’t have much time for them anymore.

Why did you start using Python?

About the time I started college, I started using Blender; it was my introduction to the open source community and quickly became my favorite program. It had a built-in Python interpreter, and that made me really want to learn Python. During a Research Experience for Undergraduates (REU) at Northwestern University in 2008, I got the chance to work on several clusters, and submitting jobs was a pain. I used Python to monitor and submit jobs; this let me multithread and do a lot more than the old bash scripts could, and I could pick up the empty nodes before anyone else could by hand. By the time I left, everyone wanted my scripts.

Working on my Ph.D. in High Energy Physics at UT Austin, I rewrote a large Matlab analysis codebase and then slowly moved it over to Python. The new code could run on any machine, including in the jungles of Belize, where I didn’t have access to a Matlab license. Near the end of my graduate work, I was offered the role of release manager on Plumbum, my first time being a maintainer on a continuing open source project. After starting at CERN as a postdoc, I found a growing community of Python analysts, so I’ve been involved in Python ever since, often sitting on the C++/Python boundary. I helped start Scikit-HEP, a collection of Python packages for HEP.


What other programming languages do you know and which is your favorite?

I started with C++; I wrote CLI11, a popular command line parser for C++, and I’m a pybind11 maintainer. I love the changes made to the language every three years, but I do get frustrated by how hard it is to use the newer standards with Python: we are backsliding in C++ standard support in the toolchain rather than moving forward, due to the loss of the CentOS LTS releases that manylinux was based on.

I know a little C, but I’m lousy at it and intend to stay that way; I enjoy object-oriented programming a little too much. I’m also heavily involved in CMake, which is technically a language too. I’m very fond of Ruby, which I use for Jekyll and Homebrew; it’s like “Python without the training wheels”, and I love the things it lets you do. Being a great chef is easier with sharp knives, even though they are dangerous. I’ve also written a lot of Matlab, but haven’t used it in years. I know some Lua, largely for LuaLaTeX but also for some research work; a tiny language designed to be embedded in applications is a really cool idea, much like the way Blender uses Python.

Due to the community, scope, and support, Python is my favorite. If I were to pick a language to learn next, I’d be torn between Rust and Haskell, but at this point I’d probably go with Rust. It’s turning into a great language for writing Python extensions.

What projects are you working on now?

For work, I focus on boost-histogram, hist, vector, awkward-array, particle, DecayLanguage, Scikit-HEP/cookie, and other packages in Scikit-HEP. We have 30-40 packages at this point, and I help with at least the packaging on many of them. I also work on training materials, like Modern CMake, Level Up Your Python, several minicourses, and the Scikit-HEP developer pages. As a mix of work and free time, I work on cibuildwheel, pybind11, build, scikit-build, and GooFit. In my free time, I work on CLI11 and plumbum. I also blog occasionally on iscinumpy.dev, and I contribute to various other OSS projects.

Which Python libraries are your favorite (core or 3rd party)?

Many of my favorite projects I ended up becoming a maintainer on, so I’ll just focus on ones I am not officially part of.

Pipx is a fantastic tool that now lives alongside pip in the Python Packaging Authority (PyPA). A lot of time is spent teaching new Python users to work with virtual environments, and version conflicts are becoming more common (due to overuse of preemptive capping, a pet peeve of mine), but pipx skips all that for applications: you can just use pipx instead of pip, and the version conflicts and slow dependency solves on pip updates simply go away. I really like pipx run, which will download and run an application in one step, even on CI; GitHub Actions and Azure provide it as a supported package manager, even without actions/setup-python, which is perfect for easy composite shell actions (like cibuildwheel’s)! pipx run even caches the environment and reuses it if it’s less than a week old, so I no longer have to think about what’s installed or what’s out of date locally; I just use pipx run to access all of PyPI anywhere (that I have pipx, which is everywhere). (I’m a Homebrew macOS user, so pipx install, or any install, doesn’t work well with Homebrew’s automatic Python upgrades, but pipx run works beautifully.)
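
The reuse rule described above (keep using a cached environment while it is younger than about a week) boils down to an age check on the cached directory. Here is a hypothetical sketch of that rule, not pipx’s actual code: the function name `cache_is_fresh` and the seven-day default are assumptions taken from the description above, and the threshold pipx really uses may differ between versions.

```python
from __future__ import annotations

import tempfile
import time
from pathlib import Path

# Assumption: one week, matching the description in the interview.
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def cache_is_fresh(
    env_dir: Path, max_age: float = ONE_WEEK_SECONDS, now: float | None = None
) -> bool:
    """Return True if a cached environment exists and is younger than max_age.

    Hypothetical sketch of a cache-reuse rule, not pipx's implementation.
    """
    if not env_dir.is_dir():
        # Nothing cached yet, so a new environment would have to be built.
        return False
    current = time.time() if now is None else now
    age = current - env_dir.stat().st_mtime
    return age < max_age


# Usage: a directory created just now counts as fresh...
with tempfile.TemporaryDirectory() as d:
    env = Path(d)
    created = env.stat().st_mtime
    fresh_now = cache_is_fresh(env)
    # ...but the same directory eight days later would be rebuilt.
    stale_later = not cache_is_fresh(env, now=created + 8 * 24 * 60 * 60)
```

Passing `now` explicitly keeps the check easy to test without waiting a week; a real tool would also need to handle the rebuild itself, which this sketch leaves out.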

I used to dislike tox: it had a weird configuration language, bad defaults, and ugly output, and it didn’t tell users how to run the commands themselves if they wanted to set things up by hand. While tox 4 is likely better, I’ve really loved Nox. It (intentionally) looks like pytest, it doesn’t hide or assume anything, and it works for much more than packaging; it’s almost like a script runner with venv (and conda/mamba) support, with pretty printouts.
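
Because nox sessions are plain decorated functions, much like pytest tests, a noxfile tends to stay short and readable. The following is a minimal sketch of a `noxfile.py`: the session names, the Python versions, and the `.[test]` extra are assumptions for illustration, while `nox.session`, `session.install`, `session.run`, and `session.posargs` are part of nox’s documented API.

```python
# noxfile.py: a minimal sketch; session names, versions, and extras are assumptions.
import nox


@nox.session
def lint(session: nox.Session) -> None:
    """Run the project's pre-commit hooks on every file."""
    session.install("pre-commit")
    session.run("pre-commit", "run", "--all-files")


@nox.session(python=["3.9", "3.10", "3.11"])
def tests(session: nox.Session) -> None:
    """Install the package with a hypothetical 'test' extra, then run pytest."""
    session.install(".[test]")
    # Anything after `--` on the nox command line is forwarded to pytest.
    session.run("pytest", *session.posargs)
```

With this file in the project root, `nox -s lint` runs the hooks in a fresh virtual environment, and `nox -s tests -- -k histogram` would run the test session once per listed Python version, passing `-k histogram` through to pytest. Nothing is hidden: each command nox runs is visible in the file and in its output, so a contributor can also run the same commands by hand.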

Getting away from the common theme of packaging above, I also love pretty-printing and color, so I’ll have to call out the Textualize libraries, Rich / Textual; they are beautiful.

For the standard library, I love contextlib; context managers are fantastic and a bit underused, and the module has some really nice newer additions too.
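
As a small illustration of why contextlib is handy, here is a standard-library-only sketch showing three of its tools: `@contextmanager` for writing a context manager as a generator, `suppress` for ignoring an expected exception, and `ExitStack` for managing a number of context managers only known at runtime. The `timed` helper is a hypothetical example, not something from the interview.

```python
from __future__ import annotations

import contextlib
import os
import tempfile
import time
from collections.abc import Iterator


@contextlib.contextmanager
def timed(label: str, results: dict[str, float]) -> Iterator[None]:
    """Hypothetical helper: store how long the with-block took under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs whether or not the body raised, like try/finally.
        results[label] = time.perf_counter() - start


timings: dict[str, float] = {}
with timed("sum", timings):
    total = sum(range(1_000))

with tempfile.TemporaryDirectory() as tmp:
    # suppress() replaces a try/except/pass for an exception you expect.
    with contextlib.suppress(FileNotFoundError):
        os.remove(os.path.join(tmp, "missing.txt"))

    # ExitStack enters a variable number of context managers and exits them all.
    paths = [os.path.join(tmp, f"part{i}.txt") for i in range(3)]
    with contextlib.ExitStack() as stack:
        files = [stack.enter_context(open(p, "w")) for p in paths]
        for f in files:
            f.write("hello\n")
    # Leaving the ExitStack closed every file it entered.
    all_closed = all(f.closed for f in files)
```

The generator form keeps setup and teardown next to each other in one function, which is much of why context managers read so cleanly.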

How did you end up working on so many Python packages?

I got involved with Scikit-HEP at the beginning, and there we quickly collected older packages that were in need of maintenance. Working on a large number of packages at the same time helps you appreciate using common, shared tools for the job, rather than writing your own. It also forces you to appreciate packaging. Many of the packages I work on are used heavily by the code I started with.

Besides, show anyone that you can help them with packaging and they will usually take you on in a heartbeat. 🙂

Of the Python packages you have worked on or created, which is your favorite and why?

Each package is for a different use, so it’s hard to choose a favorite; I have a reason to like and be involved in all of them. Probably my favorite project was the one most different from what I normally do: the Princeton Open Ventilation Monitor project. In early 2020, a group of physicists, engineers, and scientists got together and developed a device to monitor airflow in ventilator systems, initially working with our local hospitals. I developed the backend software, the graphical interface, and the on-device interface, while Jim Pivarski (of Awkward-Array) developed the breath analysis code. It was an incredibly intense month for all of us, but in the end we had a great device and a really powerful multi-device software system (which is now all open source, with open access designs). It was really fun to work on something that was not a library; I got to design for Python 3.7 instead of 2.7+ (3.6+ today), and I worked with things I wouldn’t normally get to, like PyQt, line displays and rotary controls, and lots of threading. This is also where I properly learned to use static typing and mypy, which was critical in writing code for hardware that wasn’t even built yet.
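
The point about static typing for hardware that did not exist yet can be illustrated with a small sketch. Everything here is hypothetical: `FlowSensor`, `read_flow`, `FakeFlowSensor`, and `average_flow` are made up for this example and are not taken from the ventilator project. A `typing.Protocol` describes what the future hardware driver must provide, so mypy can check that the fake used during development has the same shape the real driver will need.

```python
from __future__ import annotations

from typing import Protocol


class FlowSensor(Protocol):
    """Hypothetical interface for a sensor that has not been built yet."""

    def read_flow(self) -> float:
        """Return one flow reading (units are an assumption of this sketch)."""
        ...


class FakeFlowSensor:
    """Stand-in used for development and tests until real hardware exists."""

    def __init__(self, readings: list[float]) -> None:
        self._readings = list(readings)

    def read_flow(self) -> float:
        # Hand back canned readings in order.
        return self._readings.pop(0)


def average_flow(sensor: FlowSensor, samples: int) -> float:
    """Average several readings; mypy checks any sensor passed in matches FlowSensor."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return sum(sensor.read_flow() for _ in range(samples)) / samples


print(average_flow(FakeFlowSensor([1.0, 2.0, 3.0]), 3))  # 2.0
```

`FakeFlowSensor` never inherits from `FlowSensor`; the protocol is structural, so when the real driver arrives it only has to provide a matching `read_flow` method, and mypy will flag it if the signature drifts.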

I have other exciting things planned that might take that “favorite” title. I’m hoping to get the chance to rewrite scikit-build. I’m also planning on using Rich, Textual, and plotext to make a HEP data browser in the terminal, which would also be an “app”.

Is there anything else you’d like to say?

Don’t undervalue consistency, readability, and static analysis, which make code easier to read and maintain with less effort, and often help keep bugs down. Reading code that is not yours is an incredibly important skill, as is packaging, so you can use code others wrote without rewriting it yourself. Tools like pre-commit, mypy, and nox really help make code more accessible. A choice that seems to help one specific case is almost never worth the loss in consistency, and consistency is what helps others easily digest your code and either use it or even contribute to it. Providing a noxfile can really help “fly-by” contributors!

It’s okay to abandon a project (azure-wheel-helpers, in my case) when you find a library (cibuildwheel) that is better than yours, and instead contribute to that. By helping them, you can help a larger audience, and avoid duplicating work.

I’d highly recommend reading scikit-hep.org/developer (with an accompanying cookiecutter!) if you are developing code, even if you are not developing in HEP or even scientific fields. I also contribute to packaging.python.org, but on the Scikit-HEP developer pages I’m a lot more free to be opinionated and recommend specific workflows and tools.

Thanks for doing the interview, Henry!