PyDev of the Week: Joe Kaufeld

This week we welcome Joe Kaufeld as our PyDev of the Week! Joe has been a convention organizer for more than a decade and cofounded TranscribersOfReddit. Joe is active in the Python community and has been working with the language for many years.

Let’s spend some time getting to know Joe better!

Can you tell us a little about yourself (hobbies, education, etc)?

General intro time! I’m a self-taught developer based in Indianapolis, Indiana. When I’m not thinking about code, you can usually find me covered in sawdust or under a pile of cats.

I have a BS in Information Systems and Operations Management from Ball State University, which prepared me for an illustrious career as a consultant. After graduation, I became a consultant and quickly realized that I hated it! I transitioned to an IT helpdesk role and worked my way up to my current job: senior developer. Along the way, I launched a few side projects including an international nonprofit and a very active reference library for 3D printing plastics. (More on both of those later!)

I live in a relatively sleepy corner of the city with my wonderful partner and three cats.

Why did you start using Python?

During my undergrad, I stumbled across a neglected room in the College of Business that hadn’t been touched since 2003. It had been dedicated to a student project to create a Beowulf cluster utilizing discarded university equipment. After I tracked down the professor who was in charge of the room, I asked if I could use it and he basically threw the door key at me and said “have fun!” I replaced all of the equipment with new-ish (circa 2012) hardware that was being removed from other departments on campus, threw out all the old hardware, and built a 115-node cluster on CentOS 5.

Given that I had complete freedom over how to actually make software run on the monstrosity, I spent a lot of time evaluating different options and languages. Eventually, I landed on Python 2.7.0 and Parallel Python, an HTTP-based multiprocessing library. I used that to develop a system called ClusterBuster, a surprisingly fast Windows NTLM password cracker.

The absolute ease of development was a game changer for me; my previous experience had been QBASIC and FORTRAN, and compared to these, Python was a breath of fresh air that I didn’t know I needed.

What other programming languages do you know and which is your favorite?

My first was QBASIC: when I was 11, my dad sat me down in front of a Windows 95 machine, opened the familiar blue screen of QBASIC, hit F1 to bring up the help menu, and left me to my own devices with an Epson dot-matrix printer. I used this knowledge to convince (bribe) my math teacher to let me trade a QBASIC program that did the day’s math homework for the worksheets themselves. If I could explain to the computer how to solve it, then I clearly understood what the homework was supposed to cover. That excuse probably wouldn’t fly today!

I picked up some FORTRAN in high school because I was bored, but honestly never enjoyed it. The Myspace era introduced me to HTML and CSS, and a proper loathing for JS followed shortly after. I still do a fair amount of JS, but my bitter complaining during and after hasn’t changed. Once I was introduced to Python, everything really clicked into place – I found a language that just made sense to me and how my brain works.

Since then, I’ve toyed with other languages – C++, Rust, Ruby, Crystal, and a brief fling with Java come to mind – but my programming love remains Python.

What projects are you working on now?

My day job is as a Python developer building APIs with Django, but around that, I run FilamentColors.xyz, a color matching tool for 3D printer filaments. That’s something that’s more or less under constant development, as I’m always wanting to show off the collection in the best light and make it as usable as I can. We have about 880 filaments cataloged and published and another roughly 800 currently in progress, as I had to pause to work on a refresh of the UI.

Besides FilamentColors, I also serve as the president and cofounder of the Grafeas Group, a 501(c)(3) that focuses on increasing accessibility on the internet. (You may have seen us featured on WIRED!) With ~5,800 volunteers across the globe, we convert images, video, and audio into the one format everyone can access: text. The nonprofit oversees a community on Reddit called TranscribersOfReddit, a “job board” of sorts that anyone can join. We provide templates and guidance designed in close collaboration with r/Blind, the primary community on Reddit for blind and visually-impaired folks. You may have seen our work sprinkled across Reddit in the form of a comment that starts with “Image Transcription:” and ends with “I’m a human volunteer content transcriber and you could be too!” with a complete description of the linked content written out in between. To date, we’ve done a little over 267,500 transcriptions all across Reddit and plan on doing many more!

Which Python libraries are your favorite (core or 3rd party)?

I have literally made “knowing a lot about Django” into a career, so I am immensely thankful to the core Django team and all the contributors who have helped make it what it is today. Besides that, I really enjoy working with (in absolutely no particular order):

poetry (dependency manager that just works): https://python-poetry.org/
httpx (the last http client you need): https://github.com/encode/httpx
shiv (python zipapps but with venv support): https://github.com/linkedin/shiv
PRAW (Reddit API wrapper): https://praw.readthedocs.io/en/stable/
django-htmx (native HTMX support in views for Django): https://github.com/adamchainz/django-htmx
black (the uncompromising code formatter): https://github.com/psf/black

There are so many amazing packages and libraries out there. Out of everything I work with regularly, Django is definitely the clear favorite, though it’s technically a framework and not a library. So, in the spirit of answering the question as it’s worded, I really have to call out Poetry and Black; when starting a new project, the first commands I run are:

poetry init
poetry add black --group dev

How did TranscribersOfReddit come about?

Back in 2014, I had a really crappy phone. It was tiny, had approximately zero reasonable features, and (most importantly) couldn’t zoom in on images very well. I was (and continue to be) an avid reader of r/DnDGreentext, a community on Reddit that is all about roleplaying groups doing silly things, often in game-breaking ways. Occasionally there are images posted that are massive screenshots stitched together from game stories from different places around the web, and whenever I found one of those, I’d save it on mobile and type it up on my computer on my lunch break. I used it as typing practice at the time and I figured, “hey, maybe there’s someone else like me who can’t read these for some reason or another. I’ll just put the work out there so if someone needs it, it’s there.”

Time passed and I eventually started getting pings from other readers essentially asking when I would have a transcription up for reading, since it turns out that even manipulating these giant screenshots on desktop doesn’t even work that well sometimes.

More time passed and then something strange happened: someone started racing me to transcribe these screenshots whenever they popped up. I thought, “well that’s weird, I don’t know why someone else would want to do this too,” and so I just kind of ignored them for a while. We eventually chatted and found out that he was a bored college student while I was a bored help desk tech. After talking about some of the things we’d learned, we decided to set up a subreddit and some basic scripting (Python and PRAW to the rescue!) to create a kind of job board so that we knew what the other person was working on and didn’t duplicate work.

As I built the initial bot, I kept thinking, “I wonder if there are other people who would do this too?” We decided that we would announce the opening of our little system, and we were so excited that we didn’t check the date: we opened officially on April 1st, 2017.

That didn’t stop people from joining; at the end of the first day, we had 30 people signed up and one very serious message from a lead moderator of r/Blind that essentially boiled down to “do you understand what a lifechanging tool this could be?” We immediately started working closely together to ensure we could mesh these two visions and the modern TranscribersOfReddit was born!

What are the top three things you’ve learned creating TranscribersOfReddit?

I’ve made a lot of mistakes over my career (as we all have) and I am lucky enough that I’ve made some of my more serious gaffes on TranscribersOfReddit (ToR) and not at my day job! These are the situations and things that immediately come to mind:

Design projects to match the requirements AND the team
- When we launched ToR, we really didn’t expect anyone to join us. As such, I took a lot of shortcuts while writing the bots that power the subreddit. As it quickly became evident that we needed to be able to handle a lot more traffic, I rebuilt things using a more microservice-y design because I simply didn’t know how much traffic we would need to handle and that’s what I was familiar with using at the Fortune 500 company where I was a developer. Turns out that microservices are basically a death sentence for an all-volunteer team, and development progress essentially stagnated until we “de-microserviced” our systems into a core Django monolith with the bots acting as services around it. We went from maybe one deployment a month (because it required several people and careful timing, and relied on shoddy documentation) to a fully automated process that anyone on the team can run multiple times per day without thinking twice.
You cannot grow a community in a bubble
- If it were still just the two of us, I know for a fact that ToR would not have grown the way that it has. With our wonderful industry contacts and the amazing team that helps keep everything running – depending on the time of the year, between 17 and 23 people – not an idea comes up that doesn’t immediately get hit with “Well, what about X or Y?” This is pretty much guaranteed to be something the rest of us haven’t considered, and being able to have these discussions means that the feature or idea will be stronger and more useful. If we had never gotten that message from r/Blind, we would absolutely not be where we are today.
It takes a village
- I am guilty (as many of us are) of trying to shoulder every aspect of a project myself, and the one thing that has been beaten into my head time after time with ToR is that I simply can’t do it alone. There’s just too much. The Grafeas board and our wonderful team of ToR mods are indispensable. Without the team, there is no ToR. Without the volunteers, there is no ToR. Without the readers, there is no ToR. It takes so many people and visions to make this all work and I am so immensely thankful to every single one of them.

Is there anything else you’d like to say?

I have so much to thank the wider Python community for, because the things I build are built on the shoulders of giants. I love the ecosystem of Python and how varied it is – and how open everything is – and it’s just an absolute joy. To Mike: thanks so much for offering me the opportunity to be a part of this series!

Thanks for doing the interview, Joe!