Transformers have become the go-to architecture for Natural Language Processing and are also starting to gain traction in the computer vision community. However, despite all their successes and widespread adoption, they have one major drawback: the computation and memory requirements of their self-attention layers grow quadratically with the length of the input sequence. Hence, training transformer models from scratch is a very resource-intensive task.
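To make the quadratic growth concrete, here is a minimal pure-Python sketch of the attention weights in a single self-attention head. It assumes standard scaled dot-product attention. The function names `attention_weights` and `num_weight_entries` and the toy token vectors are made up for illustration and are not taken from any particular library.

```python
import math


def attention_weights(queries, keys):
    """Return the n x n matrix of softmax-normalized attention weights.

    Assumes plain scaled dot-product attention; `queries` and `keys` are
    lists of n vectors with d entries each (hypothetical toy inputs).
    """
    d = len(queries[0])
    weights = []
    for q in queries:
        # One score per (query, key) pair: n scores for each of the n queries.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def num_weight_entries(n):
    """Count the entries of the weight matrix for n toy tokens."""
    tokens = [[float(i), 1.0] for i in range(n)]  # illustrative 2-d "embeddings"
    return sum(len(row) for row in attention_weights(tokens, tokens))


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, num_weight_entries(n))
```

Doubling the number of tokens quadruples the number of entries that have to be computed and stored, which is the cost that efficient transformer layers try to avoid.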
In this session we want to take a look at the current state of research into efficient transformer layers, i.e. reformulations of the vanilla transformer layer whose computation and/or memory requirements are O(n log n) or even O(n), where n is the length of the input sequence. If your knowledge of transformers or complexity theory is a bit rusty, do not worry: the session will start with a short refresher on both topics so you can make the most of it.