The Levenshtein Transformer

Although introduced only in 2017, the Transformer architecture (Vaswani et al.) has become the de facto standard for sequence-to-sequence problems such as translation and abstractive text summarization. Despite this success, its relatively slow inference and general lack of flexibility, both consequences of the autoregressive nature of the generation process, can quickly become a problem. The Levenshtein Transformer (Gu et al., 2019) proposes to overcome these flaws by modelling not the output sequence directly, but two operations on top of a given input instead: insertions and deletions. This effectively turns generation into a non-autoregressive process that can refine a given sequence over and over again in a dynamic and flexible way. In this talk, we will give a short introduction to language modelling and common implementations such as recurrent models and Transformers before we turn to the actual Levenshtein Transformer. Finally, we will have a short discussion of possible applications outside the scope of the original paper.
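To give a flavour of the insertion/deletion idea before the talk, here is a minimal, purely illustrative sketch of the iterative refinement loop. The learned deletion and insertion classifiers of the actual model are replaced by hypothetical toy policies (`delete_step`, `insert_step`, and the `vocab_fix` lookup table are inventions for this sketch, not part of the paper): each round deletes tokens, then inserts new ones, until the sequence reaches a fixed point.

```python
def delete_step(tokens):
    # Hypothetical deletion policy: drop immediately repeated tokens.
    # (The real model predicts a delete/keep decision per token.)
    out = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out

def insert_step(tokens, vocab_fix):
    # Hypothetical insertion policy: insert a token between adjacent
    # pairs listed in a lookup table. (The real model predicts how many
    # placeholders to insert, then fills them with tokens.)
    out = []
    for a, b in zip(tokens, tokens[1:]):
        out.append(a)
        missing = vocab_fix.get((a, b))
        if missing:
            out.append(missing)
    out.extend(tokens[-1:])
    return out

def refine(tokens, vocab_fix, max_iters=10):
    # Alternate deletion and insertion until nothing changes
    # (or an iteration cap is hit) -- the non-autoregressive loop.
    for _ in range(max_iters):
        new = insert_step(delete_step(tokens), vocab_fix)
        if new == tokens:
            break
        tokens = new
    return tokens

draft = ["the", "the", "cat", "sat", "mat"]
fix = {("sat", "mat"): "on"}
print(refine(draft, fix))  # -> ['the', 'cat', 'sat', 'on', 'mat']
```

Because each round conditions on the whole current sequence rather than generating left-to-right, refinement can touch any position, which is the flexibility the talk will discuss.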

