- Fix masking (see the sketch below)
- Fix tokenizer
- Positional encodings
- Loss per token
- Assert every shape
- Fix loss calculation, use logits not softmax (see the loss sketch below)
- fix learned weight dims
- Tokenizer (train on Shakespeare)
- Qualitatively evaluate generations
- LayerNorm
- Batching
- KV cache
*Loss graph for LAMBADA next-word prediction*
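For the masking item above, here's a minimal sketch of causal (look-ahead) masking in single-head scaled dot-product attention. The function and variable names are illustrative, not from any particular codebase, and dropout is omitted; the point is that the mask is applied to the scores with -inf *before* the softmax.

```python
import math
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    B, T, d_k = q.shape
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (B, T, T)

    # Causal mask: position i may only attend to positions <= i.
    # Setting future positions to -inf makes their softmax weight exactly 0.
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))

    weights = F.softmax(scores, dim=-1)                  # (B, T, T), rows sum to 1
    return weights @ v                                   # (B, T, d_k)
```

Masking before the softmax matters: zeroing attention weights after the softmax instead leaves rows that no longer sum to 1.

And for the loss items, a sketch of per-token cross-entropy computed directly on logits, assuming a next-token objective with (batch, seq_len, vocab) logits. PyTorch's `F.cross_entropy` applies log-softmax internally, so it should be given raw logits, not softmax outputs.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, tokens):
    # logits: (batch, seq_len, vocab_size) -- raw scores, no softmax applied
    # tokens: (batch, seq_len) integer token ids
    # Predict token t+1 from position t, so drop the last logit and the first token.
    logits = logits[:, :-1, :]                 # (B, T-1, V)
    targets = tokens[:, 1:]                    # (B, T-1)

    # cross_entropy expects (N, V) logits and (N,) targets; it applies
    # log-softmax internally, which is why raw logits go in here.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),   # (B*(T-1), V)
        targets.reshape(-1),                   # (B*(T-1),)
    )
```

By default this averages over all B*(T-1) positions, which is the "loss per token" from the checklist.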
My goal was to end up with a deeper understanding of the Transformer architecture by writing it "from scratch". My primary sources were the Attention Is All You Need paper (Vaswani et al.) and the Torch documentation. I was strict about LLM use: I let them check my reasoning and give high-level feedback on code, but not actually generate code. A lot of time was spent scribbling with pen and paper and bashing my head against the wall. Overall I'm pretty satisfied with the approach, as I feel more comfortable with Torch than I would have otherwise. Of course I didn't do everything from scratch: towards the end I referred to Jay Alammar's visual guide to check that my intuition was on track, and I used the built-in LayerNorm, a BPE tokenizer, etc. Building from scratch is an infinite rabbit hole, but I'm pretty happy with the compromise I struck in that fractal exercise.
Here are some first-principles questions to answer (source):
- What is different architecturally from the Transformer, vs a normal RNN, like an LSTM? (Specifically, how are recurrence and time managed?)
- Attention is defined as, Attention(Q,K,V) = softmax(QK^T/sqrt(d_k))V. What are the dimensions for Q, K, and V? Why do we use this setup? What other combinations could we do with (Q,K) that also output weights?
- Are the dense layers different at each multi-head attention block? Why or why not?
- Why do we have so many skip connections, especially connecting the input of an attention function to the output? Intuitively, what if we didn't?
Solutions: https://chatgpt.com/share/671590d9-3268-800d-b9f6-f9802dc366d0
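To make the Q/K/V dimensions in the second question concrete, here's a shape-annotated sketch of multi-head attention as I understand it from the paper: each head works in d_k = d_model / h dimensions, the attention weights are (seq_len, seq_len) per head, and the heads are concatenated back to d_model. Names and the assert style are mine (this also doubles as the "assert every shape" habit from the checklist); masking and dropout are left out to keep the shapes front and center.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_model, self.n_heads = d_model, n_heads
        self.d_k = d_model // n_heads
        # Learned projections (d_model -> d_model), later split across heads.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape                                  # (B, T, d_model)

        def split_heads(t):
            # (B, T, d_model) -> (B, n_heads, T, d_k)
            return t.view(B, T, self.n_heads, self.d_k).transpose(1, 2)

        q = split_heads(self.w_q(x))
        k = split_heads(self.w_k(x))
        v = split_heads(self.w_v(x))
        assert q.shape == (B, self.n_heads, T, self.d_k)

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        assert scores.shape == (B, self.n_heads, T, T)     # one weight per query/key pair

        out = F.softmax(scores, dim=-1) @ v                # (B, n_heads, T, d_k)
        out = out.transpose(1, 2).reshape(B, T, self.d_model)
        return self.w_o(out)                               # (B, T, d_model)
```

For example, MultiHeadAttention(d_model=512, n_heads=8) matches the base model in the paper, giving d_k = 64 per head.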