Building Transformer Models with Attention

Implementation from Scratch in TensorFlow Keras

I'm following this book to teach myself the transformer architecture in depth.

Some excellent resources I've come across along the way:

Illustrated Guide to Transformers Neural Network: A step by step explanation - by Michael Phi (@LearnedVector)
Let's build GPT: from scratch, in code, spelled out. - by the legendary Andrej Karpathy (@karpathy)
Transformers from Scratch - by Peter Bloem (@pbloem)
Lil'Log > The Transformer Family Version 2.0 - by Lilian Weng (@lilianweng)
The Illustrated Transformer - by Jay Alammar (@jalammar)
Transformer Architecture: The Positional Encoding - by Amirhossein Kazemnejad (@kazemnejad)
Dive into Deep Learning > Attention Mechanisms and Transformers
Harvard NLP > The Annotated Transformer
Towards Data Science > Transformers Explained Visually: Part 1, Part 2, Part 3 and Part 4 - by Ketan Doshi
Lecture 12 of the "Deep Learning at the Vrije Universiteit Amsterdam" (DLVU) Series - by Peter Bloem (@pbloem)
Natural Language Processing in Action Using Transformers in TensorFlow 2.0 - by Aurélien Géron (@ageron)
TensorFlow Tutorials > Neural machine translation with a Transformer and Keras