nanogpt-speedrun

Reproducing GPT-2 (124M) as fast as possible on an RTX 4090.

karpathy:

The 124M model is the smallest model in the GPT-2 series released by OpenAI in 2019, and is actually quite accessible today, even for the GPU poor... You can train the model with a single GPU too, it would just take proportionally longer (e.g. ~4-24 hours depending on the GPU).
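
For reference, "124M" corresponds to the standard GPT-2 "small" hyperparameters. The sketch below is illustrative only (it is not this repo's config object); the vocabulary size, context length, depth, and width are the published GPT-2 values.

# Illustrative sketch of the GPT-2 "small" (124M) hyperparameters,
# not necessarily the exact config used in this repo.
from dataclasses import dataclass

@dataclass
class GPT2SmallConfig:
    vocab_size: int = 50257   # GPT-2 BPE vocabulary
    block_size: int = 1024    # context length
    n_layer: int = 12         # transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding / model width

# Token and position embeddings plus 12 such blocks come to roughly 124M parameters.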

This repo is heavily influenced by https://github.com/KellerJordan/modded-nanogpt. The initial baseline here was taken directly from the initial commit of that repo, with minor modifications.

See also: karpathy/llm.c#481 and https://github.com/tysam-code/hlb-gpt

setup

uv sync --all-extras
uv run python src/data/cached_fineweb10B.py
./run.sh
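
The second command downloads and caches FineWeb-10B token shards locally. A minimal sketch of reading one shard is shown below, assuming the shards follow the llm.c-style .bin layout used by modded-nanogpt (a 256-int32 header followed by uint16 GPT-2 token IDs); the shard filename is hypothetical, so check src/data/cached_fineweb10B.py for the actual paths and format.

# Sketch: read one cached FineWeb token shard (assumed llm.c-style .bin layout).
import numpy as np

def load_tokens(path: str) -> np.ndarray:
    with open(path, "rb") as f:
        header = np.frombuffer(f.read(256 * 4), dtype=np.int32)  # magic, version, token count, ...
        num_tokens = int(header[2])
        tokens = np.frombuffer(f.read(num_tokens * 2), dtype=np.uint16)  # GPT-2 token IDs
    return tokens

# Hypothetical shard path, for illustration only:
# tokens = load_tokens("data/fineweb10B/fineweb_train_000001.bin")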
