PyDev of the Week: Jens Winkelmann

This week we welcome Jens Winkelmann (@WinmanJ) as our PyDev of the Week! Jens is a former PhD researcher in the Foams and Complex Systems Group at Trinity College Dublin (TCD) and is now working as a Data Scientist at talpasolutions. You can find out more about what Jens does on his web page. Jens is also a conference speaker.

Let’s spend a few moments getting to know Jens better!

Can you tell us a little about yourself (hobbies, education, etc):

I was born and raised in the beautiful city of Essen, Germany, where I also currently live and work again after a couple of years abroad.

I obtained a B.Sc. and an M.Sc., both in Physics from TU Dortmund (Germany), in 2013 and 2015, respectively. At the end of 2015, I moved to Dublin, Ireland, to pursue a PhD in Physics in the Foams and Complex Systems research group of Trinity College Dublin, from which I graduated last year.

In December 2019 I returned to Essen, where I now work as a Data Scientist at talpasolutions GmbH. Talpasolutions is the leading driver of the Industrial Internet of Things in heavy industry. We build digital products that offer actionable insights for machine manufacturers as well as operators based on collected machine sensor data.

In my free time I enjoy climbing, both rope climbing and bouldering. It is a great sport because it combines mental focus with a physical workout and can be as individual or communal as you like.

Why did you start using Python?

I started using Python for the data analysis and plotting parts of the Physics labs during my undergrad at TU Dortmund. Some friends from my study group who were more familiar with programming languages introduced me to it. They quickly convinced me that, compared to Excel, it would reduce my stress level in the Physics labs tremendously in the long run.

At first, I used it for typical tasks in Physics labs where you analyse and then plot experimental data using NumPy and Matplotlib. Over time the data analysis became more and more complex. I also used it for my Bachelor's, Master's and later PhD theses, where I analysed and visualised large amounts of data created by computer simulations. It was only then that I fully appreciated what a powerful tool Python can be.

What other programming languages do you know and which is your favorite?

I also learned C/C++ in an introductory coding lecture as well as in a Computational Physics lecture. I implemented a hydrodynamic simulation in C/C++ for my Bachelor's as well as my Master's thesis. Computational speed is quite essential here and everything needed to be programmed from scratch, so Python was unfortunately not an option for this.

I also got a bit into functional programming through a lecture about Haskell during my Master's studies. But the only thing that stuck with me is the functools module in Python, which provides some functional programming tools.
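As a small illustration of the kind of tools functools offers, here is a minimal sketch using partial and reduce. The scale function and the example values are made up for this illustration and are not taken from Jens's own code.

```python
from functools import partial, reduce


def scale(factor, value):
    """Multiply a value by a fixed factor."""
    return factor * value


# partial() fixes the first argument and returns a new one-argument function.
to_millimetres = partial(scale, 1000)

# reduce() folds a sequence into a single value, much like a left fold in Haskell.
lengths_m = [0.5, 1.25, 2.0]
total_mm = reduce(lambda acc, x: acc + to_millimetres(x), lengths_m, 0.0)

print(total_mm)  # 3750.0
```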

Python is by far my favourite programming language at the moment. Since it is so straightforward, it allows me to fully focus on the problem that I’d like to solve rather than getting distracted by unnecessary boilerplate code. This and Python’s large ecosystem, ranging from NumPy to TensorFlow and Keras, make it a powerful tool in the repertoire of a Data Scientist.

What projects are you working on now?

Most of my current projects are related to my work as a Data Scientist at talpasolutions, where I analyse data from the world’s largest machines that are being used in the mining industry. Our data science solutions increase overall equipment efficiency and operational productivity, predict possible maintenance downtimes, and also have an ecological impact: for example, we help our customers to reduce their diesel consumption and thus save CO2 emissions.

There are two particular projects or use cases that I’m currently involved in:

  • Activity detection
  • Predictive maintenance

Our activity detection algorithms are comparable to object detection in image recognition. The sensors of heavy machinery such as a truck or excavator can be used to classify its current activity state. A truck, for instance, may be loading, dumping, idle, driving loaded, or driving unloaded. Based on sensor signals such as payload, speed, and dump angle, our algorithms infer its activity state. Activity detection algorithms are crucial because they form the basis for digital monitoring of the mine’s productivity and for further analytical tools in our software. Based on these algorithms, we provide actionable insights to our users that optimise their mine operations, e.g.: What is the average loading time of a truck? What are the largest efficiency losses in the mine operation?
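To make the idea concrete, here is a minimal rule-based sketch that maps payload, speed, and dump angle to the five activity states named above. It is only an illustration: the thresholds, the SensorReading class, and the classify_activity function are hypothetical, and the interview does not describe how talpasolutions' actual algorithms work.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    payload_t: float       # payload on the truck bed, in tonnes
    speed_kmh: float       # ground speed, in km/h
    dump_angle_deg: float  # tilt of the dump body, in degrees


# Hypothetical thresholds, chosen only to keep the example readable.
MOVING_KMH = 2.0
LOADED_T = 20.0
DUMPING_DEG = 10.0
LOADING_STEP_T = 1.0


def classify_activity(reading, previous_payload_t):
    """Map one sensor reading to one of the five activity states."""
    if reading.dump_angle_deg > DUMPING_DEG:
        return "dumping"
    if reading.speed_kmh > MOVING_KMH:
        if reading.payload_t > LOADED_T:
            return "driving loaded"
        return "driving unloaded"
    # Stationary with a rising payload: the truck is being filled.
    if reading.payload_t - previous_payload_t > LOADING_STEP_T:
        return "loading"
    return "idle"


readings = [
    SensorReading(0.0, 0.0, 0.0),
    SensorReading(15.0, 0.0, 0.0),
    SensorReading(40.0, 25.0, 0.0),
    SensorReading(40.0, 0.0, 45.0),
    SensorReading(0.0, 30.0, 0.0),
]

states = []
previous_payload = 0.0
for reading in readings:
    states.append(classify_activity(reading, previous_payload))
    previous_payload = reading.payload_t

print(states)
# ['idle', 'loading', 'driving loaded', 'dumping', 'driving unloaded']
```

Once every reading has a state, questions such as the average loading time reduce to measuring how long consecutive "loading" states last.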

The goal behind predictive maintenance is to reduce the mine operator’s maintenance costs, which occur due to either unplanned downtimes or component failures. Our algorithms achieve this goal by predicting unplanned downtimes based on the machine’s historical data. The analytical results are then displayed in our software solution to inform the right person at the right time. With unplanned downtime quickly costing more than $1000 per truck per hour, the importance of this issue is indisputable. One of our example strategies involves live-casting sensor data using anomaly detection. For this strategy, we employ a neural network to detect possible anomalous behaviours in sensor signals such as the suspension pressure.
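The interview mentions a neural network for this task but gives no details, so the sketch below swaps in a much simpler stand-in: a rolling z-score that flags readings far away from the recent history. The find_anomalies function, its window and threshold parameters, and the pressure values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(signal, window=5, threshold=3.0):
    """Return the indices of values that deviate strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat windows to avoid dividing by zero.
        if sigma > 0 and abs(signal[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up suspension pressure readings with a single spike at index 6.
pressure = [100.0, 101.0, 99.0, 100.5, 99.5, 100.2, 140.0, 100.1, 99.8]

print(find_anomalies(pressure))  # [6]
```

A learned model can capture far richer signal patterns than this, but the principle is the same: compare each new reading with what the recent past suggests it should be.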

If this got you excited about my Data Science work, feel free to watch my talk at the pyjamas conference (an online conference dedicated to Python) on YouTube.

Another project, unrelated to my job as a Data Scientist, is writing an academic book titled Columnar Structures of Spheres: Fundamentals and Applications together with Professor Ho-Kei Chan from the Harbin Institute of Technology in China. The book covers the topic of my PhD thesis: so-called ordered columnar structures, which we investigated using computer simulations in Python. Such structures occur when identical spheres are packed densely inside a cylindrical confinement (for more details check out this Wikipedia article). We simulated such structures by employing optimisation algorithms in Python, which helped us to discover a novel experimental foam structure, a so-called line-slip structure.
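The interview does not say which optimisation algorithms were used, so the following is only a toy sketch of the general idea: treat the spheres as soft, define an energy that penalises overlaps between spheres and with the cylinder, and lower that energy step by step. The overlap_energy and hill_climb functions, the simple random-search strategy, and all parameter values are assumptions for illustration, not the method from the thesis or the book.

```python
import math
import random


def overlap_energy(positions, sphere_d=1.0, cylinder_d=2.0, length=3.0):
    """Sum of squared overlaps between spheres and with the cylinder wall and end caps."""
    radius = sphere_d / 2
    energy = 0.0
    for i, (x, y, z) in enumerate(positions):
        # The cylinder axis is the z-axis; the end caps sit at z = 0 and z = length.
        wall = math.hypot(x, y) + radius - cylinder_d / 2
        caps = max(radius - z, z + radius - length)
        for overlap in (wall, caps):
            if overlap > 0:
                energy += overlap ** 2
        # Pairwise overlaps with the remaining spheres.
        for other in positions[i + 1:]:
            overlap = sphere_d - math.dist((x, y, z), other)
            if overlap > 0:
                energy += overlap ** 2
    return energy


def hill_climb(positions, steps=2000, step_size=0.05, seed=0):
    """Randomly nudge one sphere at a time and keep only moves that lower the energy."""
    rng = random.Random(seed)
    best = [tuple(p) for p in positions]
    best_energy = overlap_energy(best)
    for _ in range(steps):
        i = rng.randrange(len(best))
        trial = list(best)
        trial[i] = tuple(c + rng.uniform(-step_size, step_size) for c in best[i])
        trial_energy = overlap_energy(trial)
        if trial_energy < best_energy:
            best, best_energy = trial, trial_energy
    return best, best_energy


# Four heavily overlapping spheres stacked along the cylinder axis.
start = [(0.0, 0.0, 0.5 + 0.3 * k) for k in range(4)]
initial_energy = overlap_energy(start)
packed, final_energy = hill_climb(start)

print(initial_energy, final_energy)
```

Because only energy-lowering moves are accepted, the final energy can never exceed the initial one. Production-grade simulations would typically rely on a proper numerical optimiser rather than random search.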

The full range of their applications is still being explored, but so far they have been found in foam structures (like beer foam), botany, and nanoscience. My personal favourite application is that of a photonic metamaterial. Such materials are characterised by having a negative refractive index, which allows them to be used for super lenses or cloaking. Some of our structures are potential candidates for such materials.

Because of Covid-19, we actually made good progress on the writing lately. The book is now planned to be published in summer 2021 by Jenny Stanford Publishing.

Which Python libraries are your favorite (core or 3rd party)?

The Python ecosystem provides an amazing variety of well-developed libraries for Data Scientists. They all serve different purposes. Some that I use most often are:

  • Pandas (for data wrangling and manipulation)
  • NumPy (for numerical data structures and methods)
  • SciPy (for everything scientific, e.g. linear algebra, optimisation algorithms, or statistics)
  • Scikit-learn (for standard machine learning models)
  • Matplotlib (for data visualisation)
  • Plotly (for interactive data visualisation)

I especially like Matplotlib because of how versatile it is in creating graphs and data visualisations. But of course, Plotly shouldn’t go unmentioned here either. Matplotlib falls a bit short when plotting large amounts of data in an interactive graph. This is where Plotly really shines.

What drew you to data science?

In retrospect, it seems like Data Science is the natural path after studying Physics. But winding the clocks back to when I was starting my Physics undergrad degree, I didn’t even know what Data Science was.

During my PhD in Dublin, I came across the Python Ireland community and participated in a few of the monthly meet-ups as well as the Python Conference in 2016. The talks and discussions with people at these meet-ups made me curious about Data Science. What I really liked about Data Science was the fact that it provided a way to do Science outside of Academia. On top of this, my Python skills turned out to be quite useful for Data Science as well.

So after I finished my PhD in Dublin, I decided to apply for a couple of positions in Germany and Ireland, including my current position at talpasolutions in my hometown Essen.

Talpasolutions stood out to me from all the other companies that I applied to because its mission has meaning to me. By developing digital products for the mining industry, we improve the working conditions of heavy industry workers and we make the industry more environmentally friendly by reducing its carbon footprint.

Additionally, the mining industry has a long and famous history in Essen. Even though the last mines have been closed for years, it feels like we at talpasolutions are carrying on the spirit of this era. Since Essen is my hometown, I really enjoy working here. In many other Data Science positions, I would be starving for meaning because what lots of those companies do is make people click ads or make rich people richer.

Can people without math backgrounds get into data science? Why or why not?

I think a solid foundation of math skills, especially statistics, is essential for Data Science. It is important to understand the math behind the models that you employ as a Data Scientist. The math background helps you to optimise your model and to avoid over- or underfitting.

But you don’t need to be a math genius because the Data Science work in most companies consists only of applying already developed (machine learning) models to their data and optimising them. Data Scientists at FAANG companies or research facilities are mainly the ones developing completely new algorithms. In that case, of course, your math skills had better be in good shape.

Similar to Computer Science, Data Science spans a broad spectrum, and it will continue to broaden in the future. I’d say there are some Data Science fields that require more mathematical skills and some that require less. We at talpasolutions deal entirely with numerical data from the engineering world, which requires a certain degree of mathematical understanding from all our developers.

Is there anything else you’d like to say?

As final words, I’d like to say thank you for giving me the opportunity to answer your questions here. I hope my answers got your blog audience intrigued and more eager than ever to learn more about Data Science. I would also like to thank my friend Sanyo for proofreading my answers and making sure that they make crystal-clear sense.

Thanks for doing the interview, Jens!