Release v0.5.4
What's Changed
- Fixed typos: advatanges -> advantages by @songxxzp in #570
- Only decode the queries once for multiple remote rm by @zhuzilin in #572
- Overlap vLLM init with actor/reward model loading by @zhuzilin in #575
- Correct the order of multiplication in gradient accumulation by @zhuzilin in #577
- Support ring-attention during the SFT phase by @UbeCc in #576
- Add better error message for empty datasets by @frrad in #581
- Fix nan for sft-ring when labels are all IGNORE_INDEX by @UbeCc in #583
- Explicitly ignore attention_mask for packing_samples by @xiaoxigua999 in #588
- Set default grad_accum_dtype to None by @xiaoxigua999
- Update global batch size in eval model to be compatible with ring-attn-size by @ShomyLiu in #590
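
For context on the sft-ring NaN fix listed above: a mean loss over supervised tokens divides by the number of labels that are not IGNORE_INDEX, and that count is zero when every label is ignored. The sketch below is a pure-Python illustration of that failure mode and one possible guard. It is not the code from #583. `masked_mean_loss` and its `guard` parameter are hypothetical names, and `IGNORE_INDEX = -100` is an assumption based on the common PyTorch convention.

```python
import math

IGNORE_INDEX = -100  # assumed value, following the usual PyTorch convention


def masked_mean_loss(token_losses, labels, guard=True):
    """Average per-token losses over positions whose label is not IGNORE_INDEX.

    Hypothetical helper for illustration only.
    """
    # Keep only the losses at supervised (non-ignored) positions.
    valid = [
        loss
        for loss, label in zip(token_losses, labels)
        if label != IGNORE_INDEX
    ]
    if not valid:
        # No supervised tokens: an unguarded mean would be 0/0, i.e. NaN.
        return 0.0 if guard else math.nan
    return sum(valid) / len(valid)
```

Returning zero for a batch with no supervised tokens is only one option. Skipping such batches entirely would be another.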
New Contributors
- @songxxzp made their first contribution in #570
- @UbeCc made their first contribution in #576
- @frrad made their first contribution in #581
- @ShomyLiu made their first contribution in #590
Full Changelog: v0.5.3...v0.5.4