PyDev of the Week: Pamphile Roy

This week we welcome Pamphile Roy (@PamphileRoy) as our PyDev of the Week! Pamphile is one of the core developers of SciPy. If you’d like to see what else Pamphile is working on, you can visit his GitHub profile.

Let’s spend some time getting to know Pamphile better!

Can you tell us a little about yourself (hobbies, education, etc):

Hey, I am Pamphile. I am French, more precisely from Tahiti, French Polynesia. I went to France to study aerospace engineering and ended up doing a PhD. I specialized in a sub-field of statistics: uncertainty quantification and sensitivity analysis.

I moved a few years ago to Austria. I could not have invented that one. I met my Austrian wife in Australia.

I have a few hobbies. I dive (yes, French Polynesia is the end game for that!), like to hike (and climb outdoors), travel (a lot, I did a world trip and discover at least one new country every year), and do photography. Lastly, I did not choose aerospace by chance. It’s actually one of my passions and I am a private pilot (it’s getting harder and harder to take the time to maintain my license, but I still enjoy flying very much, especially aerobatic flights).

Why did you start using Python?

I did an internship at Airbus to finalize my engineering degree. I was in the simulation department, which builds the ground simulation model used in Airbus’ simulators. Besides the awesome aeronautical experience, I had the chance to have a passionate mentor (hi Florian) who was quite into Python. He taught me Python and also showed me that it was not just a programming language: that there was a community, and what open source meant. That was my first encounter with NumPy and SciPy! Well, almost: at the time, only NumPy was allowed on the systems, and I was actually re-implementing some optimization methods from SciPy to use for my project.

What other programming languages do you know and which is your favourite?

During my studies, I did a lot of Matlab (loved it, so I hated Python at first before I understood NumPy), a bit of C, too much VBA, and some R (this one I am still forced to use from time to time…)

Since I started working on SciPy, I have been playing a bit with Cython (I added a few functions for QMC) and Pythran. I am not a huge fan of either due to the verbosity, but they do the job well.

I don’t really feel the need right now to do more than Python. I really like the language, the community, what it offers. Over the years, I tried a few things and I always come back to it. But that could change as I am trying to get into Rust. For now I am interested in using it as an accelerator for hot code, but who knows.

Besides that, I can find my way around web things. As I worked for a few years as a backend engineer, I did some JS, HTML, CSS, the usual things I would say. I am almost tempted to add YAML to the list. Some configurations are so complex now (for better or worse).

Last but not least, I had a LaTeX period during my academic time. A love and hate relationship.

What projects are you working on now?

I am working almost full-time on SciPy as a maintainer! I am fortunate to be working at Quansight and as such, I get to work on open source. We are mostly focused on the Scientific Python stack.

On SciPy I do a lot of things: general maintenance, infrastructure work on the documentation, onboarding of newcomers, support, PR reviews, and implementation of new features. SciPy is an atypical project as its modules are quite different from each other. I am mostly interested in the stats module and never ever touch anything in linalg, for instance.

Matt Haberland and I recently got some funding from the CZI to work on the stats module to support the biomedical community.

We are adding interesting things like survival analysis tools, sensitivity indices, etc.

On a different note, I also try to stay academically active. I have a paper in preparation around sensitivity analysis and I participate in a lot of discussions about this topic and Quasi-Monte Carlo methods. As I am working on Scientific Python tooling, I find it very important to connect with the people that actually use what we do. It helps me to have different perspectives, get feedback and also get expert support on what I am actually trying to implement. When I added the QMC module in SciPy, we had a very, very long email chain with a lot of experts in the field. This is why I am extremely confident in the quality of what we released.

Which Python libraries are your favourite (core or 3rd party)?

I am putting aside the Scientific Python core stack: NumPy, SciPy, pandas, Matplotlib, etc. I am too biased and these are fundamental. The answer varies constantly. But I tend to like Flask (I prefer it over FastAPI for production; this is more of a statement, as I don’t like the way FastAPI is maintained), Pydantic, seaborn, locust, httpx/aiohttp/respx, shapely, SALib, pingouin. I try to give a star on GitHub to projects I like. (When applying for grants, it does help us, and sadly SciPy does not have that many stars!)

Also, not a library, but I am only using conda/mamba to manage my setup. On a new project, the first thing I always do is spin up a conda environment before doing anything else. I only use pip if the package I want is missing in conda. But even then, I am using pip within a conda env. And you know what, it just works. The only issue I have is with libraries distributed with upper pins in their requirements, looking at you TensorFlow (they prevented many people from using NumPy 1.20 for far too long for no good reason) 😉

How did you get involved with the SciPy project?

While doing my PhD, I released my first open-source project. As a young PhD, I had great ambitions for my code and wanted it to be useful. I quickly understood that it would be hard for me to “compete” with established libraries or just make people aware of my code. So I sent a few emails to projects asking if they would be interested in Quasi-Monte Carlo methods. I was lucky enough that someone else, Max Balandat, wanted to do the same with SciPy. He had already done that with PyTorch, so we had a “weigh in”.

The PR that included this submodule (scipy.stats.qmc) ended up being one of the largest PRs SciPy has seen, with more than 600 comments! This was tiring. I wanted to drop the ball at least 10 times along the way and thought it would never finish. During the pandemic, we were on a world trip and got stuck in Tahiti. There I got the time to move the PR over the finish line.
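For readers who have not tried the submodule, here is a minimal sketch of what scipy.stats.qmc offers, assuming a recent SciPy install. The dimension, the number of points, and the bounds are arbitrary values picked for illustration and do not come from the interview.

```python
from scipy.stats import qmc

# Unscrambled Sobol' sequence in 2 dimensions.
# Without scrambling the sequence is deterministic, so no seed is needed.
sampler = qmc.Sobol(d=2, scramble=False)

# Draw 2**4 = 16 points in the unit square [0, 1)^2.
sample = sampler.random_base2(m=4)

# Discrepancy measures how evenly the points cover the space
# (lower means more uniform coverage).
disc = qmc.discrepancy(sample)

# Rescale the unit-square points to hypothetical parameter bounds.
scaled = qmc.scale(sample, l_bounds=[-1.0, 0.0], u_bounds=[1.0, 10.0])

print(sample.shape, disc)
```

Sobol' sequences are balanced for sample sizes that are powers of two, which is why the sketch uses random_base2 rather than asking for an arbitrary number of points.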

Even though the whole experience was painful, it felt amazing to know that I had contributed some code that could maybe one day help the scientific community. This is what motivated me to stay in the loop. I almost instantly started to reply to issues, review PRs and be active in SciPy’s community. And not long after, the other maintainers invited me to become a maintainer.

What are some of the challenges that you’ve overcome during your work on SciPy?

To me, the big challenge working on SciPy is to not lose your motivation. SciPy is a very intimidating project. We are trying to change that by doing more and more outreach and community-related events, but there are intrinsic aspects of the project that cannot be changed.

One is that SciPy is a mature project, trusted by thousands of libraries. This means that everything we do must be done for a reason to ensure reliability. Every time we add to the public API, we commit ourselves to maintaining the addition for a long time. And when we decide to remove something, we have long deprecation cycles and are very careful about not breaking people’s code all the time.

All of that adds churn when you contribute to the project, and this can be very frustrating. My last large addition was Sobol’ indices. It took a year-long discussion to make it happen. These are painful experiences, but they make it even more rewarding to get to the finish line.

What we do is not always as visible as adding a whole new module or function, but we have something like 500 PRs/issues that go into every release. After more than 20 years of existence, that’s quite impressive to me.

Is there anything else you’d like to say?

If you share code, please carefully consider the license and copyright. We see this way too often on SciPy. Some great code is getting written in a GPL library (almost everything in Matlab and R for instance) and we cannot use it because the license is incompatible with BSD/MIT. When we do ask the authors about the possibility of relicensing, very often the answer is: oh sure, I just followed some sort of template and did not think about that… We actually have a case right now with some code a researcher published under GPL, and since he passed away, the situation is complex…

On that note, I am really on the fence with all the new AI tools like Copilot. To me there is a profound ethical issue with the way they collected and use data. I am really not ok with such practices. I’ve worked at an AI company where we did care about such things, and it’s still possible to build models of quality. Yes, it takes more investment to build your own dataset or source a proper dataset which complies with legal, moral or ethical rules. But there should be no questions here. It’s also quite paradoxical that we all say we are against data collection by big corporations and at the same time completely overlook this.

Thanks for doing the interview, Pamphile!