Releases · OpenRLHF/OpenRLHF
Release v0.4.4
What's Changed
- Add context parallel to reward model by @zhuzilin in #444
- Fix lm_head.weight in save_model by @zmzhang2000 in #445
- Fix output of packing data of RewardModel and CriticModel by @zhuzilin in #447
- Fix a bug in CriticModel by @zhuzilin in #448
- Add TensorBoard for local use by @catqaq in #451 (see the sketch below)
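For local runs without wandb, TensorBoard logging boils down to the standard `torch.utils.tensorboard` writer. A minimal sketch, assuming that setup; the log path and tag are placeholders, not OpenRLHF's:

```python
from torch.utils.tensorboard import SummaryWriter

# Placeholder log directory; point TensorBoard at its parent to view.
writer = SummaryWriter(log_dir="./logs/ppo_run")

for step in range(100):
    loss = 1.0 / (step + 1)  # stand-in for a real training loss
    writer.add_scalar("train/loss", loss, global_step=step)

writer.close()
# View with: tensorboard --logdir ./logs
```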
New Contributors
- @zmzhang2000 made their first contribution in #445
Full Changelog: v0.4.3...v0.4.4
Release v0.4.3
What's Changed
- only import bitsandbytes when necessary by @zhuzilin in #438
- Add context parallel to DPO by @zhuzilin in #439 (see the sketch after this list)
- Update `patch_for_block_diag_attn` by @xiaoxigua999
- Add an example for ring DPO by @xiaoxigua999
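On why context parallelism fits DPO: the sequence log-probability DPO compares is a plain sum of per-token log-probs, so once each rank holds the logits for its slice of the sequence (ring attention supplies the cross-slice attention), the partial sums just add up via an all-reduce. A toy single-process sketch of that reduction, with illustrative names only:

```python
import torch

torch.manual_seed(0)
seq_len, vocab, world_size = 8, 16, 4
logits = torch.randn(seq_len, vocab)          # per-token next-token logits
labels = torch.randint(0, vocab, (seq_len,))  # target token ids

# Full-sequence log-prob computed on one device.
logps = torch.log_softmax(logits, dim=-1)
full = logps[torch.arange(seq_len), labels].sum()

# The same value computed as if each context-parallel rank held a
# contiguous slice; per-rank partial sums simply add up
# (dist.all_reduce in a real run).
chunk = seq_len // world_size
partials = []
for rank in range(world_size):
    sl = slice(rank * chunk, (rank + 1) * chunk)
    p = torch.log_softmax(logits[sl], dim=-1)
    partials.append(p[torch.arange(chunk), labels[sl]].sum())

assert torch.allclose(full, torch.stack(partials).sum())
```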
Full Changelog: v0.4.2...v0.4.3
Release v0.4.2
What's Changed
- Added `makedirs` before writing in `batch_inference` by @tongyx361 in #417
- Added a `load_from_disk` feature to utils.py by @tongyx361 in #425 (see the sketch after this list)
- Fixed a logging-steps bug by @visionxyz and @xiaoxigua999
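A minimal sketch of what a `load_from_disk` path looks like with the Hugging Face datasets library, folding in the `makedirs` fix from #417; the function names and fallback logic are illustrative, not the code from these PRs:

```python
import os
from datasets import Dataset, load_dataset, load_from_disk

def load_any(path_or_name: str):
    """Prefer a dataset saved via save_to_disk; otherwise use load_dataset."""
    if os.path.isdir(path_or_name):
        return load_from_disk(path_or_name)  # Arrow directory on disk
    return load_dataset(path_or_name)

def write_outputs(ds: Dataset, output_path: str):
    # As in #417: create parent directories before writing.
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    ds.to_json(output_path)
```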
Full Changelog: v0.4.1...v0.4.2
Release v0.4.1
What's Changed
- Rename wandb args in scripts by @coding-famer in #396
- Speed up data processing by using multiprocessing in `Dataset.map` by @Ricardokevins and @xiaoxigua999 in #412 (see the sketch after this list)
- Update link to code in readme by @coding-famer in #414
- Fixed `input_template` for Iterative DPO and Rejection Sampling @xiaoxigua999
- Fixed `SFTDataset` for continued pretraining @xiaoxigua999
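The multiprocessing speed-up in #412 corresponds to the `num_proc` argument of `Dataset.map`. A hedged sketch with placeholder model, dataset, and column names:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

ds = load_dataset("imdb", split="train")  # placeholder dataset
# num_proc spawns worker processes, so preprocessing scales with CPU cores.
ds = ds.map(tokenize, batched=True, num_proc=8, remove_columns=["text"])
```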
New Contributors
- @coding-famer made their first contribution in #396
- @Ricardokevins made their first contribution in #412
Full Changelog: v0.4.0...v0.4.1
Release v0.4.0
Changes
- Added support for checkpointing, including optimizer, model, scheduler, and DataLoader states. @xiaoxigua999
- Added support for the Remote Reward Model. @catqaq @xiaoxigua999
- Set `add_special_tokens=False` in the tokenizer. @xiaoxigua999 @ZhaofengWu (see the sketch after this list)
- Added the learning rate to the logs. @xiaoxigua999
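Why `add_special_tokens=False` matters, as a hedged sketch: when the input template already spells out BOS/EOS, letting the tokenizer insert its own duplicates them. This is standard Hugging Face tokenizer behavior; the model name is a placeholder for any tokenizer that auto-inserts BOS:

```python
from transformers import AutoTokenizer

# Placeholder: any tokenizer that prepends BOS by default.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "<s>[INST] Hello [/INST]"  # template already carries BOS

dup = tokenizer(text)["input_ids"]                           # BOS added again
ok = tokenizer(text, add_special_tokens=False)["input_ids"]  # template stays in control

print(dup[:2], ok[:2])  # the first shows a duplicated BOS id
```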
Release v0.3.8
Changes
- Default to `torch.cuda.device_count()` for `tp_size` in `batch_inference` @tongyx361
- Improved the `tqdm` description @tongyx361
- Fixed loading datasets from local text files @tongyx361
- Added support for Llama 3.1 @xiaoxigua999
- Added `--packing_samples` support for all HF models (SFT/DPO/RM training) @xiaoxigua999
- Added `--nll_loss_coef` (an auxiliary NLL loss on the chosen response) support for DPO @xiaoxigua999 (see the sketch after this list)
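A hedged sketch of how an auxiliary NLL term on the chosen response sits next to the standard DPO loss; names, reduction, and normalization are illustrative, not OpenRLHF's implementation (which may, for instance, normalize the NLL per token):

```python
import torch
import torch.nn.functional as F

def dpo_loss_with_nll(policy_chosen_logps: torch.Tensor,
                      policy_rejected_logps: torch.Tensor,
                      ref_chosen_logps: torch.Tensor,
                      ref_rejected_logps: torch.Tensor,
                      beta: float = 0.1,
                      nll_loss_coef: float = 0.0) -> torch.Tensor:
    """Inputs are per-example sequence log-probs; returns a scalar loss."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    dpo = -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
    # Auxiliary SFT-style term: push up the chosen response's likelihood.
    nll = -policy_chosen_logps.mean()
    return dpo + nll_loss_coef * nll
```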
Release v0.3.7
Changes
- Added support for `--packing_samples` in DPO/RM training (@xiaoxigua999) (see the sketch after this list)
- Updated `reward_dataset` to correctly handle `prompt_key` (@Nickydusk)
- Updated the Transformers and DeepSpeed versions (@openllmai0)
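Mechanically, `--packing_samples` concatenates short examples into one row, restarts positions at each boundary, and masks attention block-diagonally so samples cannot see each other (the job of `patch_for_block_diag_attn` in v0.4.3 above). A hedged sketch with illustrative names:

```python
import torch

def pack_samples(samples: list[list[int]]):
    """Pack token-id lists into one row; return ids, positions, and a
    causal block-diagonal attention mask."""
    input_ids, position_ids, seq_ids = [], [], []
    for i, ids in enumerate(samples):
        input_ids += ids
        position_ids += list(range(len(ids)))  # positions restart per sample
        seq_ids += [i] * len(ids)              # sample id of each token
    seq = torch.tensor(seq_ids)
    idx = torch.arange(len(seq_ids))
    # Attend only within the same sample, and only to earlier tokens.
    mask = (seq[:, None] == seq[None, :]) & (idx[:, None] >= idx[None, :])
    return torch.tensor(input_ids), torch.tensor(position_ids), mask

ids, pos, mask = pack_samples([[5, 6, 7], [8, 9]])
print(pos.tolist())  # [0, 1, 2, 0, 1]
```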
Release v0.3.6
Changes
- Refactored `parser.parse_args()` and added `--train_split` and `--test_split` @openllmai0
- Added support for running with `openrlhf.cli.train_ppo` as the module name (i.e. `python -m openrlhf.cli.train_ppo`) @openllmai0
- Fixed the PyPI workflows (you can now use `pip install openrlhf`) @hijkzzz
Release v0.3.5
Changes
- Fixed Qwen2 + FlashAttention2 @openllmai0
- Fixed Right Padding in DPO and KTO @openllmai0
- Fixed the default `input_key` for Iterative DPO @openllmai0
- Use the `cosine_with_min_lr` learning-rate scheduler @openllmai0 (see the sketch after this list)
- New OpenRLHF logo @hijkzzz
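For reference, a hedged sketch of requesting that scheduler through `transformers.get_scheduler`, assuming a transformers release that includes the `cosine_with_min_lr` type; the optimizer, step counts, and floor value are placeholders:

```python
import torch
from transformers import get_scheduler

model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

# Cosine decay with warmup that floors at min_lr instead of reaching zero.
scheduler = get_scheduler(
    "cosine_with_min_lr",
    optimizer=optimizer,
    num_warmup_steps=100,
    num_training_steps=10_000,
    scheduler_specific_kwargs={"min_lr": 5e-7},
)
```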
Release v0.3.4
Changes
- Refactored the KTO Trainer @openllmai0
- Fixed issues with KTO/DPO datasets @openllmai0
- Added SFT Packing feature @openllmai0
- Supported vLLM 0.5.1 (via Gloo) @openllmai0 (see the sketch below)
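A hedged sketch of the generic Gloo process-group setup such vLLM support typically leans on (Gloo runs over CPU tensors, avoiding NCCL between trainer and inference processes); this is plain torch.distributed usage, not OpenRLHF's actual wiring:

```python
import torch
import torch.distributed as dist

def init_gloo(rank: int, world_size: int,
              host: str = "127.0.0.1", port: int = 29500) -> None:
    """One call per process; Gloo needs no GPUs or NCCL."""
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://{host}:{port}",
        rank=rank,
        world_size=world_size,
    )

def sync_weights(weight: torch.Tensor) -> None:
    # Broadcast updated weights from the trainer (rank 0) to the others.
    dist.broadcast(weight, src=0)
```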