The Conference for Machine Learning Innovation

Applying machine learning online at scale

Session
Info

Applying machine learning in online applications requires solving the problem of model serving: evaluating the machine-learned model over one or more data points in real time while the user is waiting for a response. Solutions such as TensorFlow Serving address this problem when the model needs to be evaluated over only one data point per user request, but they are not sufficient for problems where many data points must be evaluated to make a decision, such as search and recommendation. This talk will show that this is a bandwidth-constrained problem and outline an architectural solution in which computation is pushed down to the data shards and run in parallel. It will demonstrate how this solution can be put to use with Vespa.ai, an open-source engine, to achieve scalable serving of TensorFlow and ONNX models, and show benchmarks comparing its performance and scalability to TensorFlow Serving. Model serving with Vespa is used today for some of the world’s largest recommender systems, such as serving personalized content on all Yahoo content pages and personalized ads in the world’s third-largest ad network. These systems evaluate models over millions of data points per request for hundreds of thousands of requests per second.
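To make the idea of pushing computation down to the data shards concrete, here is a minimal sketch in plain Python. It is not Vespa's API or implementation. All names (`score`, `shard_top_k`, `search`) are hypothetical, the dot product stands in for a real TensorFlow or ONNX model, and threads stand in for separate content nodes. Each shard scores its own documents and returns only its best `k` hits, so the coordinator receives a handful of (score, id) pairs instead of every feature vector.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def score(user, doc):
    """Stand-in model (assumption): dot product of user and document features."""
    return sum(u * d for u, d in zip(user, doc))


def shard_top_k(shard, user, k):
    """Runs where the shard's data lives: score locally, return only k hits."""
    scored = ((score(user, vec), doc_id) for doc_id, vec in shard.items())
    return heapq.nlargest(k, scored)


def search(shards, user, k):
    """Scatter the request to all shards in parallel, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: shard_top_k(s, user, k), shards))
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    # Second value: how many (score, id) pairs crossed the "network".
    return merged, sum(len(part) for part in partials)


# Synthetic data for illustration only.
rng = random.Random(0)
dim, num_docs, num_shards, k = 8, 10_000, 4, 10
docs = {i: [rng.random() for _ in range(dim)] for i in range(num_docs)}
shards = [
    {i: v for i, v in docs.items() if i % num_shards == s}
    for s in range(num_shards)
]
user = [rng.random() for _ in range(dim)]

top, hits_transferred = search(shards, user, k)
print(f"{hits_transferred} hits transferred for {num_docs} documents scored")
```

Because every document in the global top `k` is also in the top `k` of its own shard, merging the per-shard results gives the same ranking as scoring everything centrally, while the data sent per request grows with `k` times the number of shards rather than with the number of documents.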

This session originates from the archives of Singapore and Munich. Take me to the current program of Singapore, Berlin, or Munich.
