Pull requests: huggingface/nanotron

Pull requests list

Optimize memory when loading checkpoint
#246 opened Nov 21, 2024 by NouamaneTazi
Load random states from checkpoint
#238 opened Nov 2, 2024 by gritukan
Fix resuming PP > 1
#230 opened Sep 6, 2024 by TJ-Solergibert
lighteval support after checkpoint, UX refactor
#222 opened Aug 24, 2024 by eliebak
Refactor pre tokenization tool
#219 opened Aug 21, 2024 by eliebak
Created interconnect benchmark before the training
#200 opened Jun 22, 2024 by RamenBuddha
Move MoE Implementation into src/, add Load Balancing Losses
#192 opened Jun 6, 2024 by haeggee
1 task done
[Feature] Monitor model states during training
#183 opened May 25, 2024 by xrsrke
Fix overflow in nanosets with big datasets
#182 opened May 23, 2024 by jquesnelle
Ring attention
#181 opened May 23, 2024 by zzhhjjj
Llama3 conversion scripts 🦙
#174 opened May 20, 2024 by TJ-Solergibert
9 tasks done
[Feature] Mixture of Depths
#171 opened May 15, 2024 by xrsrke Draft
[Feature] Infini Attention
#169 opened May 14, 2024 by xrsrke
Core attention
#168 opened May 13, 2024 by zzhhjjj
llama tests
#157 opened Apr 30, 2024 by zzhhjjj
Fix TestContext warning
#156 opened Apr 29, 2024 by AleHD
Checkpoint 1.3 backwards compatibility
#152 opened Apr 25, 2024 by AleHD
3 tasks done
Use CUDA Events for measuring elapsed time
#143 opened Apr 20, 2024 by staghado
Haojun/inference
#142 opened Apr 19, 2024 by zzhhjjj