Thanks to Transformer architectures like BERT and GPT-2, Natural Language Processing (NLP) has recently had its “ImageNet moment”. As with computer vision benchmarks eight years ago, results on various NLP tasks have improved significantly in a very short time. But how do you train a neural network to deal with language, with its varying input length and non-numerical data? How do you compare the flat output tensor of a neural network with an expected syntax tree to compute a loss? In this talk, we will look at ways to feed language into a neural network and at how to interpret its output for common NLP tasks such as classification, sentiment analysis, Part-of-Speech (POS) tagging, Named Entity Recognition (NER), and coreference resolution. We will also see how the output of a network needs to be interpreted to generate completely new text. The talk will be accompanied by real-world examples based on Transformer architectures.
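
As a small taste of the first question (varying input length and non-numerical data), here is a minimal sketch of one common approach: map each token to an integer id, pad every sentence in a batch to the same length, and keep a mask that marks which positions hold real tokens. The whitespace tokenizer, the toy vocabulary, and the helper names `build_vocab` and `encode_batch` are assumptions made for illustration; models like BERT and GPT-2 use learned subword vocabularies instead.

```python
# Illustrative special tokens; the names are assumptions, not tied to any library.
PAD, UNK = "[PAD]", "[UNK]"


def build_vocab(sentences):
    """Assign an integer id to every distinct lowercase, whitespace-separated token."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_batch(sentences, vocab):
    """Turn sentences into equal-length id lists plus a mask of real positions."""
    ids = [
        [vocab.get(token, vocab[UNK]) for token in sentence.lower().split()]
        for sentence in sentences
    ]
    max_len = max(len(seq) for seq in ids)
    # Pad shorter sequences on the right so the batch forms a rectangle.
    input_ids = [seq + [vocab[PAD]] * (max_len - len(seq)) for seq in ids]
    # 1 marks a real token, 0 marks padding the network should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in ids]
    return input_ids, attention_mask


sentences = ["the cat sat", "the dog sat down quietly"]
vocab = build_vocab(sentences)
input_ids, attention_mask = encode_batch(sentences, vocab)
```

The two nested lists have the same rectangular shape, so they can be converted directly into tensors; the mask lets the network and the loss skip the padded positions.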