Add template variable to ppl validation file manifest (#110)
undfined authored Nov 21, 2024
1 parent ca44cf4 commit 7655a3b
Showing 2 changed files with 12 additions and 11 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -22,6 +22,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- (Optimization) Mark model input sizes as dynamic for `torch.compile()` to avoid recompile during evals or variable-sequence / batch size training. This doesn't seem to hurt throughput.
- Made HTTPS and GCS IO functions more robust.
+- Fixed a bug where we were always getting dolma2 tokenized validation data when generating config with DataMix.v3_small_ppl_validation.

## [v1.6.3](https://github.com/allenai/OLMo-core/releases/tag/v1.6.3) - 2024-11-15

22 changes: 11 additions & 11 deletions src/olmo_core/data/mixes/v3-small-ppl-validation.txt
@@ -1,11 +1,11 @@
-c4_en-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
-dolma_books-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
-dolma_common-crawl-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
-dolma_pes2o-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
-dolma_reddit-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
-dolma_stack-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
-dolma_wiki-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
-ice-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
-m2d2_s2orc-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
-pile-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
-wikitext_103-validation,eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
+c4_en-validation,eval-data/perplexity/v3_small_{TOKENIZER}/c4_en/val/part-0-00000.npy
+dolma_books-validation,eval-data/perplexity/v3_small_{TOKENIZER}/dolma_books/val/part-0-00000.npy
+dolma_common-crawl-validation,eval-data/perplexity/v3_small_{TOKENIZER}/dolma_common-crawl/val/part-0-00000.npy
+dolma_pes2o-validation,eval-data/perplexity/v3_small_{TOKENIZER}/dolma_pes2o/val/part-0-00000.npy
+dolma_reddit-validation,eval-data/perplexity/v3_small_{TOKENIZER}/dolma_reddit/val/part-0-00000.npy
+dolma_stack-validation,eval-data/perplexity/v3_small_{TOKENIZER}/dolma_stack/val/part-0-00000.npy
+dolma_wiki-validation,eval-data/perplexity/v3_small_{TOKENIZER}/dolma_wiki/val/part-0-00000.npy
+ice-validation,eval-data/perplexity/v3_small_{TOKENIZER}/ice/val/part-0-00000.npy
+m2d2_s2orc-validation,eval-data/perplexity/v3_small_{TOKENIZER}/m2d2_s2orc/val/part-0-00000.npy
+pile-validation,eval-data/perplexity/v3_small_{TOKENIZER}/pile/val/part-0-00000.npy
+wikitext_103-validation,eval-data/perplexity/v3_small_{TOKENIZER}/wikitext_103/val/part-0-00000.npy
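
For readers following the change: the manifest now carries a `{TOKENIZER}` placeholder instead of a hard-coded `dolma2-tokenizer` path segment, so the tokenizer can be chosen when the paths are resolved. The sketch below illustrates one way such a `label,path` manifest could be expanded. It is not the OLMo-core implementation; `resolve_manifest` and its parameters are hypothetical names introduced here, and only the two manifest lines and the `dolma2-tokenizer` identifier come from the diff itself.

```python
from typing import List, Tuple

# Two lines copied from the updated manifest in this commit.
MANIFEST = """\
c4_en-validation,eval-data/perplexity/v3_small_{TOKENIZER}/c4_en/val/part-0-00000.npy
pile-validation,eval-data/perplexity/v3_small_{TOKENIZER}/pile/val/part-0-00000.npy
"""


def resolve_manifest(text: str, tokenizer_id: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: parse `label,path` lines and fill in {TOKENIZER}."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only, so the path is kept whole.
        label, path = line.split(",", 1)
        entries.append((label, path.replace("{TOKENIZER}", tokenizer_id)))
    return entries


# Substituting the identifier from the old paths reproduces the pre-change entries.
entries = resolve_manifest(MANIFEST, "dolma2-tokenizer")
for label, path in entries:
    print(label, path)
```

Passing a different identifier yields paths under a different `v3_small_<tokenizer>` directory, which is the behaviour the changelog entry describes as previously missing.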
