Parameter Servers Suck, All Hail Horovod: Distributed Learning Technology Overview

Thursday, December 6 2018
09:00 - 10:00
Salon 4+5

Modern deep learning architectures are becoming ever more computationally demanding, which is slowing down hyperparameter tuning and experimentation. GPUs are getting stronger and cheaper, but vertical scaling is too slow to keep up with professional demand; we need to go horizontal: multi-GPU and multi-machine.

But what is distributed learning? Should it be used? How is it used? Data parallelism, model parallelism, federated learning, what, what, WHAT?

In this talk, I’ll present the bottlenecks that various distributed learning approaches solve, so you’ll know when to start looking at distributed learning if you run into them. I’ll also highlight the differences between distributed learning technologies, e.g., TensorFlow Parameter Servers and Horovod.
