Add DPO support for DeepSpeed-Chat #828

stceum · 2023-12-08T15:45:14Z

Given the advantages of DPO (Direct Preference Optimization), which is described as "stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing significant hyperparameter tuning", we add DPO support to DeepSpeed-Chat.
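For readers new to DPO, the per-pair objective from the DPO paper fits in a few lines. This is a minimal, framework-free sketch, not the code added in this PR. The function names `log_sigmoid` and `dpo_loss` are hypothetical. The inputs are assumed to be summed response log-probabilities under the trainable policy and a frozen reference model, and `beta = 0.1` is only an illustrative default. The sketch shows why no separate reward model is fitted: the "reward" is implicit in the policy/reference log-ratio.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # Two branches so that exp() never receives a large positive argument.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> tuple[float, float, float]:
    """DPO loss for one (chosen, rejected) pair (hypothetical helper).

    Each *_logp is the summed log-probability of the response tokens given
    the prompt, under the trainable policy or the frozen reference model.
    Returns (loss, chosen_reward, rejected_reward).
    """
    # Implicit rewards: beta * log(pi / pi_ref); no reward model is trained.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Bradley-Terry style objective on the reward margin.
    loss = -log_sigmoid(chosen_reward - rejected_reward)
    return loss, chosen_reward, rejected_reward


if __name__ == "__main__":
    # Hypothetical log-probs: the policy already favors the chosen response a
    # bit more than the reference does, so the loss is below log(2).
    loss, r_c, r_r = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
    print(f"loss={loss:.4f} chosen_reward={r_c:.2f} rejected_reward={r_r:.2f}")
```

In an actual training loop these values would be tensors over a batch and the loss would be averaged; plain floats are used here only to keep the sketch dependency-free.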

stceum · 2024-01-27T06:59:02Z

Accidentally closed the PR. Sorry :(

boqiny · 2024-12-18T06:43:56Z

Hi, thanks for sharing this. Any updates on this PR? Are we planning to merge this feature in?

stceum · 2024-12-18T09:57:02Z

Yes, of course! The code for step2 DPO seems to be running as expected. I have done my best to make the code easy to understand and consistent in style with the other steps. Reviewing the PR again just now, I noticed it is missing a README. I will add one ASAP.

I really appreciate your reminder :)

tjruwase · 2024-12-18T11:43:50Z

@stceum, thanks for this great contribution. Apologies for the delayed review. I have approved. Please let me know when it is good to merge.

stceum · 2025-01-02T09:18:10Z

Hi, I've committed the README.md and the example training log. I believe it's ready on my end. Please let me know if you have any questions or find any issues with the PR. I apologize for any inconvenience during the holiday season. Happy New Year! 🎉🎉🎉

stceum requested review from tjruwase, ShadenSmith, conglongli, awan-10, eltonzheng, minjiaz, duli2012, mrwyattii, arashb and xiaoxiawu-microsoft as code owners December 8, 2023 15:45

nuochenpku approved these changes Jan 6, 2024

stceum force-pushed the dpo_support branch from 267bd47 to 7574df0 January 27, 2024 06:30

stceum closed this Jan 27, 2024

stceum deleted the dpo_support branch January 27, 2024 06:41

stceum added 5 commits January 27, 2024 14:54

Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.

ae1c11c

Add training scripts for step2 DPO in DeepSpeed-Chat.

7efa35a

Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.

27a8782

Update training scripts of step2 DPO in DeepSpeed-Chat.

b563ff9

Follow upstream fixes.

b5f0068
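Commit ae1c11c adds label_smoothing to the step2 DPO loss, but the thread does not spell out the formula. One common formulation, often called conservative DPO and used for example in Hugging Face TRL's sigmoid DPO loss, assumes each preference label is flipped with probability ε and mixes the loss for both label directions. The sketch below assumes that formulation; whether step2 DPO in this PR uses exactly the same expression is not confirmed here, and `smoothed_dpo_loss` is a hypothetical name.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def smoothed_dpo_loss(margin: float, beta: float = 0.1,
                      label_smoothing: float = 0.0) -> float:
    """Label-smoothed ("conservative") DPO loss for one preference pair.

    margin = (log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l)),
    i.e. the policy/reference log-ratio of the chosen response minus that of
    the rejected one. label_smoothing is the assumed probability that the
    preference label is flipped; 0.0 recovers the standard DPO loss.
    """
    # At 0.5 the objective no longer prefers the chosen response, and above
    # 0.5 it would prefer the rejected one, so this sketch rejects such values.
    if not 0.0 <= label_smoothing < 0.5:
        raise ValueError("label_smoothing should be in [0, 0.5)")
    z = beta * margin
    return (-(1.0 - label_smoothing) * log_sigmoid(z)
            - label_smoothing * log_sigmoid(-z))


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        # A very large margin: the plain DPO loss approaches 0, the smoothed
        # loss does not.
        print(eps, smoothed_dpo_loss(margin=200.0, beta=0.1, label_smoothing=eps))
```

With label_smoothing > 0 the loss stops shrinking toward zero as the margin grows, which discourages pushing the chosen/rejected gap arbitrarily far on preference labels that may be noisy.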

stceum reopened this Jan 27, 2024

stceum force-pushed the dpo_support branch from 7574df0 to b5f0068 January 27, 2024 07:03

Merge branch 'master' into dpo_support

f146c26

tjruwase approved these changes Dec 18, 2024

Merge branch 'master' into dpo_support

bfd895d

Merge branch 'master' into dpo_support

40c6e83

tjruwase removed the request for review from arashb December 28, 2024 14:48

tjruwase removed request for ShadenSmith, duli2012, conglongli, awan-10, mrwyattii, eltonzheng, minjiaz and xiaoxiawu-microsoft December 28, 2024 14:48

stceum added 2 commits January 2, 2025 01:16

Update README.md for Step2 DPO finetuning.

ec55e20

Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat.

42f4b6e
