Release v0.5.4

hijkzzz released this 17 Dec 02:20

47f7cd8

What's Changed

Fixed typos: advatanges -> advantages by @songxxzp in #570
Only decode the queries once for multiple remote rm by @zhuzilin in #572
Overlap vLLM init and actor/reward model loading by @zhuzilin in #575
Correct the order of multiplication in grad acc by @zhuzilin in #577
Support ring-attention during sft phase by @UbeCc in #576
Add better error message for empty datasets by @frrad in #581
Fix nan for sft-ring when labels are all IGNORE_INDEX by @UbeCc in #583
Explicitly ignore attention_mask for packing_samples by @xiaoxigua999 in #588
Set default grad_accum_dtype to None by @xiaoxigua999
Update global batch size in eval model to be compatible with ring-attn-size by @ShomyLiu in #590

New Contributors

@songxxzp made their first contribution in #570
@UbeCc made their first contribution in #576
@frrad made their first contribution in #581
@ShomyLiu made their first contribution in #590

Full Changelog: v0.5.3...v0.5.4

Contributors

frrad, ShomyLiu, and 4 other contributors
