This week we welcome Christopher Clarke (@realchrisdev) as our PyDev of the Week. Christopher writes a Python blog that is worth your time perusing. You might also find his GitHub profile interesting. Let’s spend some time getting to know him better!
Can you tell us a little about yourself (hobbies, education, etc):
I have a Masters in Computer Science from the University of Kent at Canterbury (UK) and a Masters in Economics from the University of the West Indies (Trinidad and Tobago). Indeed, before I became a developer I worked as an Economist at the Central Bank of Trinidad and Tobago for about seven (7) years. I’m a big fan of track and field and the NBA, and I enjoy cooking, hiking and reading books on philosophy, ethics and religion (although I admit I’ve been stuck on Wittgenstein’s “Philosophical Investigations” for the last six months). I live in Trinidad and Tobago and I’m married to Arva. We have no kids, but I have two cats (Git & Chad) who adopted me after I rescued them from a trash compactor.
Why did you start using Python?
In my days as an Economist at the Central Bank I worked mainly on the development of data sets and economic statistics, along with modelling and forecasting. My data development work exposed me to various databases (Oracle, dBase IV, FoxPro, FAME) and the 4GLs associated with them. I also taught myself VB/VBA and built a number of office automation and statistical data processing and reporting systems which, to my surprise (and horror), are still in use today. I eventually lost enthusiasm for economics and developed a real passion for programming and software development. In time I was recruited by an Irish-based consultancy company as a FAME consultant. I was mainly based in London and worked for a number of investment banks and data vendors. During this stint I picked up a bit of C and Java as well as scripting languages like Perl and Awk.
In 2000 an opportunity arose to further my formal education in computer science. At that time I was all about Java and was determined to use all the “extra” time I would have in school to migrate some of the systems I had previously developed to Java. However, one of my instructors (Leonid Timochouk) kept talking about how OCaml and Python were better suited to the kinds of systems I wanted to develop. I just did not “grok” OCaml but immediately fell in love with Python. Of course, the official language of the university program was Java, so I did not get to do as much Python as I wanted. In any case, the job market at the time, especially at the “City” firms where I had consulted, was heavily slanted towards Java/C/C++.
Fortunately, when I returned to Trinidad on a break I was contacted by a colleague from my central banking days who had assumed the leadership of a research institute based at the University of the West Indies in Trinidad. They were very interested in the ideas of Open Data and Open Source and wanted to build a statistical “data bank” to support economic and social research in the Caribbean. As there was nothing in place, I would be free to use the most suitable technologies for the job. Remember, this was back in the early 2000s, so I was pleasantly surprised that people from outside the world of software were even familiar with OSS and related concepts. I took the job and got them to agree to use Python as our main implementation language. A key aspect of the project was the statistical database component. We set about developing a time series database API using Numpy and Scipy with a PostgreSQL backend. Predictably, this turned out to be a lot more complex than we had anticipated, as tools like Pandas and ORMs like SQLAlchemy did not yet exist. We had to build everything from scratch.
There was Numpy and Scipy, but a number of critical technical decisions had to be made really early in the process, like whether to use Numpy masked arrays (MA) to handle missing data or to use standard Numpy arrays with sentinel values to represent missing values. Like many, we chose to go with masked arrays, while, interestingly, Pandas uses standard arrays with NaN as the missing-value sentinel. In hindsight, that was clearly the better approach given the state of Numpy at the time. Anyway, we were eventually able to make some substantial progress, particularly after discovering a paper presented at the SciPy Conference by Reggie Dugard that described a similar system (but without the PostgreSQL backend) that he had developed at his job. I contacted him via email and he did not hesitate to provide us with a lot of guidance in the area. He even got his company to give us special access to the source code he developed for the project. This showed me that the real strength of Python was its community of great folks willing to share ideas with strangers from all over the world. We also used Plone as our CMS and tried, and ultimately failed, to build the web UI using Zope. Unfortunately, our project eventually ran into funding difficulties, so we did not get to complete our mission, but all in all it was a great introduction to the world of Python development.
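To make the masked-array versus sentinel trade-off concrete, here is a minimal sketch (not code from the project described) contrasting the two ways of representing missing data in NumPy. A masked array keeps a separate boolean mask alongside the values, while the sentinel approach stores NaN directly in a plain float array, which is what Pandas does for float data. The function names and the sample series are hypothetical.

```python
import numpy as np
import numpy.ma as ma


def masked_mean(values):
    """Average the observed values using a NumPy masked array.

    masked_invalid() hides NaN/inf entries behind a boolean mask, so
    reductions such as mean() skip them automatically.
    """
    arr = ma.masked_invalid(np.asarray(values, dtype=float))
    return float(arr.mean())


def sentinel_mean(values):
    """Average the observed values treating NaN as the missing-value sentinel.

    The array stays a plain ndarray; callers must remember to use the
    NaN-aware functions (np.nanmean, np.nansum, ...) to skip the gaps.
    """
    arr = np.asarray(values, dtype=float)
    return float(np.nanmean(arr))


# Hypothetical quarterly series with one missing observation.
series = [101.2, 102.5, np.nan, 104.1]

print(masked_mean(series))    # mean of the three observed values
print(sentinel_mean(series))  # same result via the NaN sentinel
print(np.mean(series))        # nan: a NaN-unaware reduction propagates the gap
```

The masked array makes missing data explicit at the cost of a separate type with its own API; the sentinel approach keeps ordinary arrays but relies on every caller using NaN-aware functions, and NaN only exists for floating-point dtypes.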
What other programming languages do you know and which is your favourite?
I’ve used R and much prefer it to any of the econometrics and statistical packages that I used when I was an economist. It also has better built-in visualisation tools than the Python alternatives, but I think Python is a much stronger language for building a data analytics system.
I’ve played around with Objective C and I’m very curious about Swift.
We’ve built back-end APIs for a couple of iOS/Android applications, and every time we do this I wish that there was a better Pythonic answer for building mobile applications. Of course there is Kivy, but it seems more oriented to gaming. I have great hopes for Toga, but it is probably too early to say how this project will pan out.
What projects are you working on now?
I run a small consultancy business; we mainly do web development, almost exclusively using Django. More recently we’ve adopted Wagtail CMS for our CMS-based projects. For statistical/financial applications we typically use Django, Pandas and one or more of the scikits (scikit-learn, statsmodels, etc.). Like most consultancy companies, we also have a “secret” plan to gradually move away from client work and focus on our own apps and products. Hopefully, I will be able to say more about this closer to the end of the year.
I also have a few open source projects that I maintain; django-pandas is the most popular. It provides a bridge between the Django ORM and Pandas, allowing the user to quickly build Pandas DataFrames out of Django QuerySets. Another is wagtail-cookiecutter-foundation, which provides a way to quickly spin up full-fledged Wagtail CMS sites that use the Zurb Foundation framework. These sites come with a comprehensive set of pages and apps, Ansible provisioning and deployment, front-end dependency management with Bower and so on.
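The core idea behind a QuerySet-to-DataFrame bridge can be sketched without Django: a QuerySet’s `.values()` call yields one dictionary per row, and Pandas can build a DataFrame directly from such records. The sketch below uses a hard-coded list as a stand-in for that output; the function name, field names and rows are hypothetical. django-pandas itself wraps this step for real QuerySets (its documented entry point is `read_frame` in `django_pandas.io`), so treat this as an illustration of the concept rather than the library’s implementation.

```python
import pandas as pd


def records_to_frame(records, index_col=None):
    """Build a DataFrame from dict rows such as those yielded by QuerySet.values().

    Hypothetical helper: if index_col is given, that field becomes the index.
    """
    # list() forces a lazy iterable (e.g. a QuerySet) to be evaluated once.
    df = pd.DataFrame.from_records(list(records))
    if index_col is not None:
        df = df.set_index(index_col)
    return df


# Stand-in for something like Model.objects.values("id", "name", "value").
rows = [
    {"id": 1, "name": "alpha", "value": 10.0},
    {"id": 2, "name": "beta", "value": 20.0},
    {"id": 3, "name": "gamma", "value": 30.0},
]

frame = records_to_frame(rows, index_col="id")
print(frame)
print(frame["value"].mean())  # 20.0
```

Once the rows are in a DataFrame, the usual Pandas tools (grouping, resampling, pivoting) apply without any further ORM-specific code.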
I was a big contributor to and advocate for Gratipay (formerly Gittip), but we were forced to pivot due to legal problems arising from aspects of our business model that had the potential to raise flags with financial regulators. Unfortunately, while we’ve pivoted to be about “Teams and Open Works”, I feel that things on the Gratipay front are still very much up in the air.
Which Python libraries are your favorite (core or 3rd party)?
Django continues to be my favorite web framework. We’ve heard the usual complaints: it’s too heavy, the ORM is a mess, the templating system is too slow, it does not scale and so on. However, it continues to meet 90 per cent of our needs out of the box. In any case, the strong third-party ecosystem and community mean that you can always find the tools, apps and information to support many different styles of application. We’re big fans of Wagtail CMS; we especially like the fact that there are no “plugins”, just standard Python modules and classes that inherit from an abstract model.
Pandas and IPython Notebook have really put Python on the map as a serious competitor to R in the field of data analysis. It is true that we could always use Numpy and Scipy to accomplish the same tasks, but it takes quite a lot of code (compared to something like R) to do anything significant. Pandas and IPython Notebook allow your typical tech-savvy data analyst, journalist, economist, etc., to feel that they are simply getting on with the task at hand rather than learning how to become “hard-core” programmers. Indeed, I’ve had some success in providing these kinds of users with a set of “template notebooks” that include lots of helper functions and built-in documentation. They can just copy and modify these notebooks to analyse a different dataset, add more indicators to the analysis and so on. Most of these users are on Windows, and I’m also really impressed by how Conda allows for easy installation of the PyData stack and other Python packages on Windows.
Other really indispensable packages include Scrapy, Ansible, venv/virtualenv/virtualenvwrapper, requests, statsmodels and so many others.
Where do you see Python going as a programming language?
I think the transition to Python 3 is going to happen a lot faster than people think. Many of the popular frameworks and applications (Django, Flask, the PyData stack, etc.) already support Python 3. Many of the popular books out there (Python Cookbook, Test-Driven Development with Python, Two Scoops of Django, etc.) assume that the reader will be using Python 3. As usual, the sticking point is deployment. Recently a colleague developed a REST API for a mobile application using Django/GeoDjango and Python 3. Everything was fine until we tried to deploy to a DigitalOcean Droplet. Unfortunately, we found that venv is currently completely broken on Ubuntu 14.04, and even our favourite DevOps tool, Ansible, is really dragging its feet about supporting Python 3.
What is your take on the current market for Python programmers?
All I have to offer on this is anecdotal evidence. When I first attended PyCon in 2006 there were only a few sponsors and only one or two companies interested in recruiting developers. In contrast, at PyCon 2015 the place was absolutely crawling with recruiters and firms trying to recruit Python developers. Even in the Caribbean, where Python is not so popular, from what I can see the few Python developers that I know in the region keep themselves pretty busy.
Is there anything else you’d like to say?
Python is a fun language and it has a strong community characterised by respect for others. I feel that some of the main organisations in the community, like the PSF and DSF, are doing a good job encouraging diversity and participation among traditionally underrepresented groups. I think that more has to be done by way of outreach to developers outside the traditional computing countries, such as those in the Caribbean, parts of Latin America, the Middle East and even areas of India outside Bangalore.