Dealing with data issues in natural language processing: All that glitters

Session
It is often said that in machine learning, data is the new oil or gold. Unfortunately, this means not only that data is extremely valuable, but also that high-quality data is rare and can take considerable effort to refine. Machine learning practitioners not only have to collect enough labelled data to train a model, they also have to make sure this data does not contain any unwanted biases that may compromise the results.

In this talk, I’ll discuss how recent advances in natural language processing allow NLP professionals to address these two challenges. First, thanks to breakthroughs in transfer learning, it is now possible to train a high-quality model with far less labelled data than before. Second, now that recent failures have drawn people’s attention to ethical AI, more companies and researchers are developing ways to remove problematic biases from language data and models. I will give an overview of existing approaches and illustrate their effectiveness with example projects that my company, NLP Town, has worked on.
