PyDev of the Week: Saul Pwanson

This week we welcome Saul Pwanson (@saulfp) as our PyDev of the Week! Saul is the creator of VisiData, an interactive multitool for tabular data. If you’d like to see what Saul has been up to, then you should check out his website or his GitHub profile. You can also support Saul’s open source endeavors on Patreon. Let’s take a few moments to get to know Saul better!

Can you tell us a little about yourself (hobbies, education, etc.)?

I grew up in Chicagoland in the 80s, was on BBSes in the early 90s, and IRC in college and thereafter. I’ve been once to the Recurse Center in New York, twice to Holland, and six times to Bruno’s in Gerlach, NV. I like crossword puzzles, board games, and point-and-click adventures. One day I’d like to finish my “board simulation” of the awe-inspiring mechanics inside mitochondria.

Why did you start using Python?

It was for a job at a startup back in 2004. It’s really great as a scripting language, and the standard library makes most common things easy by itself, with the rest of the ecosystem providing not just one but usually about 4 different ways of doing any task, often including one that works really well. I tip my hat to all the unsung developers of Python libraries who make interfaces to other systems that *just work*. VisiData supports so many data formats simply because the richness of the Python ecosystem makes it easy.

What other programming languages do you know and which is your favorite?

I did a lot of x86 assembly as a teenager in my BBS days, and started using both C and C++ in college. I still use C on a daily basis doing embedded development for my day job. I haven’t used C++ for about 10 years, which means I’m way out of date on it now.

My favorite language, though, is an older language called Forth, which is a brilliant little system and gets you the most bang for your buck in highly constrained environments. (We’re talking kilobytes and megahertz, orders of magnitude fewer resources than most software could even dream of fitting its runtime into.) The essence of Forth is incredibly elegant, with the implementation setting things up “just so” so that everything falls into place naturally by design, with very little actual code.

Programming in Forth has encouraged me to think in very clean ways about my own code in other languages. Often if you’re looking at the VisiData source code, a particular bit of code may seem devastatingly simple and turn out to be subtly and amazingly powerful, but it wasn’t by chance. The rest of the system often has to be designed “just so” that little bit of code can be elegant. I know many modern software engineers might consider that a waste of time, but spending that effort on the core design often leads to other surprising capabilities that then just magically work.

What projects are you working on now?

  • VisiData 2.0, with an API we can get behind, to encourage a rich ecosystem of plugins and loaders.
    Some people have already started [writing plugins](https://github.com/jsvine/visidata-plugins), and I dream that someday there will be a VisiData loader for every format and service that has tabular data.
  • “Where in the Data is Carmen Sanmateo?” a data-diving game a la Noah Veltman’s Command-line Mystery or the Knightlab [SQL murders](https://mystery.knightlab.com/). You know, like a detective game for data nerds.

Which Python libraries are your favorite (core or 3rd party)?

From a quick grep on the VisiData source code, it seems that collections, functools, and itertools are used the most. As for 3rd party utils, I always have to mention python-dateutil. It makes date parsing so easy: no matter the format, it just figures it out. My only wish is that it allowed access to the deduced format, so you could reformat other dates the same way.
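To make that wish concrete, here is a minimal sketch of what "access to the deduced format" could look like. It uses only the standard library's datetime module, not python-dateutil, and the `CANDIDATE_FORMATS` list and `deduce_format` helper are hypothetical names invented for this illustration. It simply tries a handful of known formats, which is far cruder than what a real parser does.

```python
from datetime import datetime

# Hypothetical, deliberately tiny list of formats to try, in order.
# A real tool would need many more and a way to resolve ambiguity.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"]


def deduce_format(text):
    """Return the first candidate format that parses `text`, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit; try the next one
        return fmt
    return None


# Deduce the format from one sample date...
fmt = deduce_format("25/12/2019")

# ...then reuse it to write a different date the same way.
other = datetime(2020, 1, 31)
reformatted = other.strftime(fmt) if fmt else None
```

The point of the sketch is only the shape of the idea: once the format string is available as a value, any other date can be rendered to match it.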

What is VisiData and how did it come about?

VisiData is a playground for tabular data in the terminal. It provides a spreadsheet-like interface for many formats, including even its own internals. I first made a version back in 2011 when I was at F5 Networks. It was surprisingly flexible and unreasonably effective to use for lots of tasks, and after I left that job, I found myself missing it (for example to view and explore HDF5 files which I worked with at my next job). But I couldn’t use that version because F5 owned it, so I decided to remake it completely, and release it as open-source. Then, I could use it for my own projects and at other jobs.

But to do it “right” is a lot of thankless work: making it reliable and seamless in all kinds of situations, and hashing out all the little details and edge cases so that it feels like a tool that doesn’t just get the job done, but is so smooth that it is *fun* to use. It’s quite a bit of work, and I would never take the time to do it just for myself. But when other people use it and appreciate it, that makes the effort worthwhile, both in a global optimization sense and in a personal emotional satisfaction sense.

Are there any new challenges or features you expect to add to VisiData?

I really want split-pane: two separate but related VisiData windows in the same terminal, for things like internal menus (e.g. of aggregators and jointypes), or directory/file browsing (a la Norton/Midnight Commander), and a number of other interesting use cases. But it’s been complicated from a design perspective. I’m hoping that making a public statement like this will spur my subconscious to find an elegant solution, as happened in my Podcast.__init__ interview.

Thanks for doing the interview, Saul!