Productionizing machine learning models: Lessons learned in the Hadoop ecosystem

Deploying machine learning models can be challenging, especially on distributed systems. Although Python is the dominant language among data scientists, it can create friction when integrating with JVM-based tools such as Spark or when managing application dependencies on clusters of heterogeneous machines. Many data scientists developing on such systems struggle with the subtleties of these challenges.

This presentation will share lessons learned from working on large-scale Hadoop clusters and examine the most promising approaches to alleviating common issues. In particular, we will discuss our experience with using containerization to tackle the dependency management challenge from a data scientist’s point of view.
