This is a simple implementation of a GPT-like decoder-only transformer model. The model is built up step by step, starting from a simple bigram language model and ending with a full transformer. The code is written in PyTorch and is meant to be as simple as possible; efficiency is not a concern.
The code is largely based on Andrej Karpathy's nanoGPT lecture. All credit goes to him.
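For reference, a minimal sketch of the bigram starting point might look like the following. This is only illustrative of the first step described above; the class and variable names are placeholders and not necessarily the ones used in this repo.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    """Predicts the next token from the current token alone via an embedding lookup."""

    def __init__(self, vocab_size):
        super().__init__()
        # Each token directly reads off the logits for the next token from a lookup table.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        # idx: (B, T) tensor of token indices
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        if targets is None:
            return logits, None
        B, T, C = logits.shape
        loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    def generate(self, idx, max_new_tokens):
        # Sample autoregressively, appending each new token to the running context.
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)  # only the last time step matters
            idx_next = torch.multinomial(probs, num_samples=1)  # (B, 1)
            idx = torch.cat((idx, idx_next), dim=1)
        return idx
```

The later steps replace this single lookup table with token and position embeddings, self-attention blocks, and a language-model head, arriving at the full transformer.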