Reproducing GPT-2 (124M) as fast as possible on an RTX 4090.
karpathy:
The 124M model is the smallest model in the GPT-2 series released by OpenAI in 2019, and is actually quite accessible today, even for the GPU poor... You can train the model with a single GPU too, it would just take proportionally longer (e.g. ~4-24 hours depending on the GPU).
This repo is heavily influenced by https://github.com/KellerJordan/modded-nanogpt. The baseline here was taken directly from the initial commit of that repo, with minor modifications.
See also: karpathy/llm.c#481 and https://github.com/tysam-code/hlb-gpt
To reproduce the training run:
uv sync --all-extras
uv run python src/data/cached_fineweb10B.py
./run.sh
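The second command presumably downloads the pre-tokenized FineWeb-10B training shards, as in modded-nanogpt. run.sh itself is not shown above; in modded-nanogpt-derived setups it is typically a thin wrapper around torchrun. A minimal sketch of what it might look like, assuming a single-GPU launch and a training entry point at src/train_gpt2.py (both are assumptions, not taken from this repo):

    #!/bin/bash
    # Hypothetical sketch of run.sh, not the repo's actual script.
    # Launch the training script on one GPU via torchrun; adjust
    # --nproc_per_node and the script path to match the real layout.
    torchrun --standalone --nproc_per_node=1 src/train_gpt2.py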