Hey Andrej, thank you for this awesome video. I watched all of it!
I have been developing a transformer model from scratch and would like to know if you could advise me.
It's a decoder-only, autoregressive model, and I want to pretrain it. I feel I'm ready.
The architecture: 12 layers, 12 attention heads, model dimension 768, learning rate 5e-5, sequence length 512, batch size 40.
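For concreteness, here is the setup above written as a nanoGPT-style config dataclass (the field names are my own choice, not from any particular codebase):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # hypothetical field names; values are the ones described above
    n_layer: int = 12        # transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # model (embedding) dimension
    block_size: int = 512    # maximum sequence length
    batch_size: int = 40
    learning_rate: float = 5e-5

cfg = GPTConfig()
# the embedding dimension must split evenly across heads
assert cfg.n_embd % cfg.n_head == 0
head_dim = cfg.n_embd // cfg.n_head  # 64
```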
My concern is that the data is quite small, even though I am only aiming to pretrain the most basic model that can generate coherent and meaningful text.
My corpus is about 150 million tokens.
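As a rough back-of-envelope check of my own (not from the video): a 12-layer, 768-dim decoder has on the order of 85M non-embedding parameters, and the common "~20 tokens per parameter" compute-optimal heuristic would suggest roughly 1.7B training tokens, so 150M tokens is about an order of magnitude below that:

```python
d, layers = 768, 12

# per block: ~4*d^2 for attention (Q, K, V, output projections)
# plus ~8*d^2 for the 4x-expansion MLP; biases/LayerNorm ignored
params_per_layer = 12 * d * d
nonembed_params = layers * params_per_layer   # 84934656, i.e. ~85M

# ~20 tokens per parameter heuristic
suggested_tokens = 20 * nonembed_params       # ~1.7B
ratio = suggested_tokens / 150e6              # ~11.3x my corpus size
```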
If anything needs clarifying, please let me know.
I'm doing it from scratch because I am just curious: I would like to oversee the whole process from zero. It is an immersive learning experience.
Thank you.