Transformer Architecture

This post provides a primer on the Transformer model architecture. The Transformer is particularly well suited to sequence modelling tasks such as language modelling, in which the elements of a sequence are correlated with one another across time steps.