How is it that humans can learn from just a few examples and machine learning algorithms can’t? ML algorithms have a lot of trouble understanding common sense and need insane amounts of very costly labelled data. Luckily this is about to change. Self-supervised learning allows ML algorithms to train on low quality unlabelled data. On top of that ML algorithms trained using self-supervised learning seem to pick up common sense and are able to beat human performance in language tasks. Why is that? And how does it work?