The Conference for Machine Learning Innovation

Preparing Text Input for Machine Learning

Session
Info
Wednesday, June 20, 2018
09:00 - 10:00
Room: Asam 2

Deep down, ML is a pure numbers game. With very few exceptions, the actual input to an ML model is a collection of float values. This is straightforward for numerical, spreadsheet-like input, for images, where pixels are just numerical color values, and for audio samples. But how do ML algorithms work on words and letters? Since proper preprocessing is often the most crucial part of a successful ML project, it is important to understand how to handle textual input properly. We will look at the two most important jobs when handling text in ML: preprocessing/normalization and vector representations of text. We will first navigate the minefield of correct Unicode normalization of our input. Then, once we have tamed our strings, we will see how to convert normalized and sanitized strings into various vector representations, from simple one-hot encodings to embeddings produced by algorithms like Word2Vec.
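To make the two jobs concrete, here is a minimal sketch in Python of Unicode normalization followed by one-hot encoding. It is an illustration under stated assumptions, not the material presented in the session: the helper names (`normalize_text`, `build_vocabulary`, `one_hot`) are hypothetical, and the choice of NFKC normalization plus case folding is only one of several reasonable options.

```python
import unicodedata


def normalize_text(text):
    """Assumed pipeline: NFKC normalization, case folding, whitespace collapse."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def build_vocabulary(tokens):
    """Map each distinct token to an index (sorted for reproducibility)."""
    return {token: index for index, token in enumerate(sorted(set(tokens)))}


def one_hot(token, vocabulary):
    """Return a float vector with a single 1.0 at the token's index."""
    vector = [0.0] * len(vocabulary)
    vector[vocabulary[token]] = 1.0
    return vector


# The same word spelled three ways: "e" + combining accent, precomposed
# uppercase, and a word containing the "fi" ligature (U+FB01).
raw = "Cafe\u0301  CAF\u00c9 \ufb01ne"

tokens = normalize_text(raw).split()
vocabulary = build_vocabulary(tokens)
vectors = [one_hot(token, vocabulary) for token in tokens]
```

Without the normalization step, the first two spellings would end up as different vocabulary entries even though they look identical to a reader. One-hot vectors grow with the vocabulary size and carry no notion of similarity between words, which is the gap that learned embeddings such as Word2Vec are meant to fill.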

This session originates from the archive of Munich.


Behind the Tracks