11:30 - 12:30
Deploying machine learning models can be challenging, especially on distributed systems. Although Python is the dominant language among data scientists, it can create friction when integrating with JVM-based tools such as Spark or when managing application dependencies on clusters of heterogeneous machines. Many data scientists developing on such systems struggle with these subtleties.
This presentation will share lessons learned from working on large-scale Hadoop clusters and examine the most promising approaches to alleviating common issues. In particular, we will discuss our experience using containerization to tackle the dependency management challenge from a data scientist’s point of view.