ML CONFERENCE Blog

“There are always surprises when you work with data because data is not very clean, naturally”

Oct 29, 2017

Our first ML Conference will debut in December in Berlin. Until then, we’d like to give you a taste of what’s to come. We talked with Markus Ehrenmüller-Jensen, Business Intelligence Architect at Runtastic, about how the company integrates machine learning into its daily business, the benefits, the battle scars and everything in between. You’ll also get a sneak peek at his talk.

JAXenter: For your data analysis, you use Microsoft Data Platform as well as the coding language R. Did you use R before you started working with Data Platform or did you build up the knowledge from scratch?

Markus Ehrenmüller-Jensen: We are a team of data people who love to work with data, to discover insights on our own, to help others gain insights and to derive actions from them. Some of us, like myself, approach this from a “data engineering” perspective.

The language R first appeared on my radar when Revolution Analytics was acquired by Microsoft in 2015 and, in subsequent steps, integrated into (almost) all tools of their Data Platform. Even though R is a mature language, it was totally new to me personally almost three years ago, and I had to learn it from scratch by reading dozens of books and applying it in my daily work where possible. On the other hand, my colleagues in the data science department, which consists mostly of statisticians, already had both the experience and the knowledge when it came to R and were really happy about how easily we could include the scripts they came up with in the data warehouse process.

JAXenter: You turn data into information by giving it both meaning and context. What were your first steps? What knowledge did you want to gain from the data in the first place?

Markus Ehrenmüller-Jensen: Runtastic’s journey into data analytics was a very typical one. In the early days, direct queries to the production databases were used to build up reports. Later on, analytics based on spreadsheets were added to combine data from different sources.

Soon, the spreadsheets turned into a painful experience, as maintaining them involved a lot of manual work. While looking for a better visualization tool, it turned out that it would be better to start building a centralized data warehouse from scratch in order to deliver a common base for analytics.

That’s when I came into play and was hired by Runtastic in 2014. Having, on the one hand, reports and analytics already in place and, on the other hand, colleagues who deeply understand the data needed for those reports and analytics made it easy to start with quick wins. Insights valuable for the company goals, like the number of monthly active users, registered users, or premium users, were the starting points.

In the meantime, we have expanded our analytics to learn from our users’ behavior, such as their usage of different features. Enriching all those analytics with machine learning made everything even more valuable for our colleagues, who use it to derive priorities for their actions, so we can improve our apps and enable Runtastic’s user base to achieve a better and healthier lifestyle.

JAXenter: Did you encounter problems – maybe issues that you did not anticipate?

Markus Ehrenmüller-Jensen: There are always surprises when you work with data 🙂 That is because of the simple fact that data is not very clean, naturally. You will soon discover quality issues, which can be traced back to bugs in applications, to a lack of the right context, or to assumptions that turn out to be wrong later on.

Analyzing the habits of different age groups had its share of surprises, as one age group was over-represented. It turned out that this was not due to the true age of our users, but to default values being used for the age. Sometimes it is as simple as that; sometimes you need a deep understanding of the business to come up with a meaningful hypothesis, which you can then test against the data, either to gain new insights or to discover data issues.
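A check of the kind that would expose such default ages can be sketched in a few lines. The Python snippet below is only an illustration and not Runtastic's code (the interview shows none, and the team works in R): it flags any single value that makes up a suspiciously large share of a column. The function name `suspicious_values`, the 20% threshold and the sample ages are all assumptions made up for this example.

```python
from collections import Counter


def suspicious_values(values, max_share=0.2):
    """Return {value: share} for every value exceeding max_share of the data."""
    if not values:
        return {}
    counts = Counter(values)
    total = len(values)
    # Keep only values whose relative frequency is above the threshold.
    return {v: c / total for v, c in counts.items() if c / total > max_share}


# Made-up ages: 30 shows up far too often, as a hypothetical form default would.
ages = [23, 30, 41, 30, 30, 35, 30, 52, 30, 28]
print(suspicious_values(ages))  # {30: 0.5}
```

A flagged value is only a hint. Whether it is a default or a genuine peak still has to be decided with knowledge of the application that produced the data.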

JAXenter: You will be speaking at the upcoming ML Conference. Is Machine Learning deeply included in Runtastic’s Big Data solution?

Markus Ehrenmüller-Jensen: Runtastic only recently started to adopt Machine Learning methods, both to improve the customer’s experience with the apps and to help our business users work and make data-driven decisions. Nevertheless, we already have a clustering and classification algorithm in production that assigns users to different groups based on their activity level.

This helps us conduct targeted campaigns for re-engagement and identify different user activity patterns (e.g. seasonality). To help our business users better plan and evaluate the performance of our products, we implemented and deployed prediction models that forecast the daily values of our KPIs. We use these forecasts to detect trends, evaluate the performance of campaigns and define future goals.
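The interview does not say which forecasting models are used, so the following Python sketch is a stand-in rather than the actual approach: a seasonal-naive baseline that predicts each future day as the mean of the same weekday over the last few weeks, followed by a comparison with observed values of the kind one might use to judge a campaign. The function name `seasonal_naive_forecast`, its parameters and all numbers are hypothetical.

```python
def seasonal_naive_forecast(history, horizon, season=7, weeks=2):
    """Forecast each future day as the mean of the same weekday in the last `weeks` seasons."""
    if len(history) < season * weeks:
        raise ValueError("need at least season * weeks observations")
    n = len(history)
    forecast = []
    for step in range(horizon):
        # Same position within the season, looked up in each of the previous seasons.
        past = [history[n - season * w + (step % season)] for w in range(1, weeks + 1)]
        forecast.append(sum(past) / len(past))
    return forecast


# Two made-up weeks of a daily KPI (for example daily active users, in thousands).
history = [100, 110, 120, 130, 140, 90, 80,
           110, 120, 130, 140, 150, 100, 90]

forecast = seasonal_naive_forecast(history, horizon=3)
print(forecast)  # [105.0, 115.0, 125.0]

# Hypothetical observed values during a campaign; the difference is the uplift over baseline.
actual = [120, 118, 160]
uplift = [a - f for a, f in zip(actual, forecast)]
print(uplift)  # [15.0, 3.0, 35.0]
```

A baseline like this mainly serves as a reference point: a production model would be expected to beat it, and deviations from the forecast are where trend or campaign effects become visible.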

JAXenter: What should participants expect to learn from your talk – and what not?

Markus Ehrenmüller-Jensen: You will learn how the integration of R into Microsoft’s Data Platform helped Runtastic to improve data quality on the one hand and gain new insights on the other. I will talk about the different services Microsoft offers, and which of them we use in production and why we didn’t add others into our current data warehouse architecture.

You will be able to learn from the best practices we came up with from an architectural perspective, as well as about some helpful R packages we use. Unfortunately, I will not be able to share all of the data products we are currently working on, as some of them will not be launched until next year.

Thank you!

 

Melanie Feldmann studied Technology Journalism at the Bonn-Rhein-Sieg University of Applied Sciences and has worked at S&S Media since October 2015.

 

 

 

 
