Releases · OpenRLHF/OpenRLHF
Release v0.4.4
What's Changed
- Add context parallel to reward model by @zhuzilin in #444
- Fix lm_head.weight in save_model by @zmzhang2000 in #445
- Fix output of packing data of RewardModel and CriticModel by @zhuzilin in #447
- Fix a bug in CriticModel by @zhuzilin in #448
- Add TensorBoard for local use by @catqaq in #451 (see the sketch below)
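For local runs without wandb, TensorBoard logging boils down to the standard `torch.utils.tensorboard` writer. A minimal sketch, assuming that setup; the log path and tag are placeholders, not OpenRLHF's:

```python
from torch.utils.tensorboard import SummaryWriter

# Placeholder log directory; point TensorBoard at its parent to view.
writer = SummaryWriter(log_dir="./logs/ppo_run")

for step in range(100):
    loss = 1.0 / (step + 1)  # stand-in for a real training loss
    writer.add_scalar("train/loss", loss, global_step=step)

writer.close()
# View with: tensorboard --logdir ./logs
```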
New Contributors
- @zmzhang2000 made their first contribution in #445
Full Changelog: v0.4.3...v0.4.4
Release v0.4.3
What's Changed
- only import bitsandbytes when necessary by @zhuzilin in #438
- Add context parallel to DPO by @zhuzilin in #439 (see the sketch after this list)
- Update `patch_for_block_diag_attn` by @xiaoxigua999
- Add an example for ring DPO by @xiaoxigua999
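On why context parallelism fits DPO: the sequence log-probability DPO compares is a plain sum of per-token log-probs, so once each rank holds the logits for its slice of the sequence (ring attention supplies the cross-slice attention), the partial sums just add up via an all-reduce. A toy single-process sketch of that reduction, with illustrative names only:

```python
import torch

torch.manual_seed(0)
seq_len, vocab, world_size = 8, 16, 4
logits = torch.randn(seq_len, vocab)          # per-token next-token logits
labels = torch.randint(0, vocab, (seq_len,))  # target token ids

# Full-sequence log-prob computed on one device.
logps = torch.log_softmax(logits, dim=-1)
full = logps[torch.arange(seq_len), labels].sum()

# The same value computed as if each context-parallel rank held a
# contiguous slice; per-rank partial sums simply add up
# (dist.all_reduce in a real run).
chunk = seq_len // world_size
partials = []
for rank in range(world_size):
    sl = slice(rank * chunk, (rank + 1) * chunk)
    p = torch.log_softmax(logits[sl], dim=-1)
    partials.append(p[torch.arange(chunk), labels[sl]].sum())

assert torch.allclose(full, torch.stack(partials).sum())
```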
Full Changelog: v0.4.2...v0.4.3
Release v0.4.2
What's Changed
- Added `makedirs` before writing in `batch_inference` by @tongyx361 in #417
- Added a `load_from_disk` feature to utils.py by @tongyx361 in #425 (see the sketch after this list)
- Fixed a logging-steps bug by @visionxyz and @xiaoxigua999
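A minimal sketch of what a `load_from_disk` path looks like with the Hugging Face datasets library, folding in the `makedirs` fix from #417; the function names and fallback logic are illustrative, not the code from these PRs:

```python
import os
from datasets import Dataset, load_dataset, load_from_disk

def load_any(path_or_name: str):
    """Prefer a dataset saved via save_to_disk; otherwise use load_dataset."""
    if os.path.isdir(path_or_name):
        return load_from_disk(path_or_name)  # Arrow directory on disk
    return load_dataset(path_or_name)

def write_outputs(ds: Dataset, output_path: str):
    # As in #417: create parent directories before writing.
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    ds.to_json(output_path)
```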
Full Changelog: v0.4.1...v0.4.2
Release v0.4.1
What's Changed
- Rename wandb args in scripts by @coding-famer in #396
- Speed up data processing by using multiprocessing in `Dataset.map` by @Ricardokevins and @xiaoxigua999 in #412 (see the sketch after this list)
- Update link to code in readme by @coding-famer in #414
- Fixed `input_template` for Iterative DPO and Rejection Sampling @xiaoxigua999
- Fixed `SFTDataset` for continued pretraining @xiaoxigua999
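The multiprocessing speed-up in #412 corresponds to the `num_proc` argument of `Dataset.map`. A hedged sketch with placeholder model, dataset, and column names:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

ds = load_dataset("imdb", split="train")  # placeholder dataset
# num_proc spawns worker processes, so preprocessing scales with CPU cores.
ds = ds.map(tokenize, batched=True, num_proc=8, remove_columns=["text"])
```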
New Contributors
- @coding-famer made their first contribution in #396
- @Ricardokevins made their first contribution in #412
Full Changelog: v0.4.0...v0.4.1
Release v0.4.0
Changes
- Added support for checkpointing, including optimizer, model, scheduler, and DataLoader states. @xiaoxigua999
- Added support for the Remote Reward Model. @catqaq @xiaoxigua999
- Set `add_special_tokens=False` in the tokenizer. @xiaoxigua999 @ZhaofengWu (see the sketch after this list)
- Added the learning rate to the logs. @xiaoxigua999
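Why `add_special_tokens=False` matters, as a hedged sketch: when the input template already spells out BOS/EOS, letting the tokenizer insert its own duplicates them. This is standard Hugging Face tokenizer behavior; the model name is a placeholder for any tokenizer that auto-inserts BOS:

```python
from transformers import AutoTokenizer

# Placeholder: any tokenizer that prepends BOS by default.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "<s>[INST] Hello [/INST]"  # template already carries BOS

dup = tokenizer(text)["input_ids"]                           # BOS added again
ok = tokenizer(text, add_special_tokens=False)["input_ids"]  # template stays in control

print(dup[:2], ok[:2])  # the first shows a duplicated BOS id
```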
Release v0.3.8
Changes
- Default to `torch.cuda.device_count()` for `tp_size` in `batch_inference` @tongyx361
- Improved the `tqdm` description @tongyx361
- Fixed loading datasets from local text files @tongyx361
- Added support for Llama 3.1 @xiaoxigua999
- Added `--packing_samples` support for all HF models (SFT/DPO/RM training) @xiaoxigua999
- Added `--nll_loss_coef` (an auxiliary NLL loss on the chosen response) support for DPO @xiaoxigua999 (see the sketch after this list)
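A hedged sketch of how an auxiliary NLL term on the chosen response sits next to the standard DPO loss; names, reduction, and normalization are illustrative, not OpenRLHF's implementation (which may, for instance, normalize the NLL per token):

```python
import torch
import torch.nn.functional as F

def dpo_loss_with_nll(policy_chosen_logps: torch.Tensor,
                      policy_rejected_logps: torch.Tensor,
                      ref_chosen_logps: torch.Tensor,
                      ref_rejected_logps: torch.Tensor,
                      beta: float = 0.1,
                      nll_loss_coef: float = 0.0) -> torch.Tensor:
    """Inputs are per-example sequence log-probs; returns a scalar loss."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    dpo = -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
    # Auxiliary SFT-style term: push up the chosen response's likelihood.
    nll = -policy_chosen_logps.mean()
    return dpo + nll_loss_coef * nll
```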
Release v0.3.7
Changes
- Added support for `--packing_samples` in DPO/RM training (@xiaoxigua999) (see the sketch after this list)
- Updated `reward_dataset` to correctly handle `prompt_key` (@Nickydusk)
- Updated the Transformers and DeepSpeed versions (@openllmai0)
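Mechanically, `--packing_samples` concatenates short examples into one row, restarts positions at each boundary, and masks attention block-diagonally so samples cannot see each other (the job of `patch_for_block_diag_attn` in v0.4.3 above). A hedged sketch with illustrative names:

```python
import torch

def pack_samples(samples: list[list[int]]):
    """Pack token-id lists into one row; return ids, positions, and a
    causal block-diagonal attention mask."""
    input_ids, position_ids, seq_ids = [], [], []
    for i, ids in enumerate(samples):
        input_ids += ids
        position_ids += list(range(len(ids)))  # positions restart per sample
        seq_ids += [i] * len(ids)              # sample id of each token
    seq = torch.tensor(seq_ids)
    idx = torch.arange(len(seq_ids))
    # Attend only within the same sample, and only to earlier tokens.
    mask = (seq[:, None] == seq[None, :]) & (idx[:, None] >= idx[None, :])
    return torch.tensor(input_ids), torch.tensor(position_ids), mask

ids, pos, mask = pack_samples([[5, 6, 7], [8, 9]])
print(pos.tolist())  # [0, 1, 2, 0, 1]
```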
Release v0.3.6
Changes
- Refactored `parser.parse_args()` and added `--train_split` and `--test_split` @openllmai0
- Added support for running with `openrlhf.cli.train_ppo` as the module name (i.e. `python -m openrlhf.cli.train_ppo`) @openllmai0
- Fixed the PyPI workflows (you can now use `pip install openrlhf`) @hijkzzz
Release v0.3.5
Changes
- Fixed Qwen2 + FlashAttention2 @openllmai0
- Fixed Right Padding in DPO and KTO @openllmai0
- Fixed the default `input_key` for Iterative DPO @openllmai0
- Use the `cosine_with_min_lr` learning-rate scheduler @openllmai0 (see the sketch after this list)
- New OpenRLHF logo @hijkzzz
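For reference, a hedged sketch of requesting that scheduler through `transformers.get_scheduler`, assuming a transformers release that includes the `cosine_with_min_lr` type; the optimizer, step counts, and floor value are placeholders:

```python
import torch
from transformers import get_scheduler

model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

# Cosine decay with warmup that floors at min_lr instead of reaching zero.
scheduler = get_scheduler(
    "cosine_with_min_lr",
    optimizer=optimizer,
    num_warmup_steps=100,
    num_training_steps=10_000,
    scheduler_specific_kwargs={"min_lr": 5e-7},
)
```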
Release v0.3.4
Changes
- Refactored the KTO Trainer @openllmai0
- Fixed issues with KTO/DPO datasets @openllmai0
- Added SFT Packing feature @openllmai0
- Supported vLLM 0.5.1 (via Gloo) @openllmai0 (see the sketch below)
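A hedged sketch of the generic Gloo process-group setup such vLLM support typically leans on (Gloo runs over CPU tensors, avoiding NCCL between trainer and inference processes); this is plain torch.distributed usage, not OpenRLHF's actual wiring:

```python
import torch
import torch.distributed as dist

def init_gloo(rank: int, world_size: int,
              host: str = "127.0.0.1", port: int = 29500) -> None:
    """One call per process; Gloo needs no GPUs or NCCL."""
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://{host}:{port}",
        rank=rank,
        world_size=world_size,
    )

def sync_weights(weight: torch.Tensor) -> None:
    # Broadcast updated weights from the trainer (rank 0) to the others.
    dist.broadcast(weight, src=0)
```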