
About the results of vanilla DPO #28

Open
lucasliunju opened this issue Nov 18, 2024 · 1 comment

Comments

@lucasliunju

Hi, thanks for your great work.

I would like to run vanilla DPO (offline DPO) as a baseline to compare its performance with online DPO. May I ask whether I can use this codebase to run that experiment, and what the running command would be? Thank you very much in advance.

@WeiXiongUST
Contributor

Hi, if you want to generate the data yourself, all you need to do is change line 80 of https://github.com/RLHFlow/Online-RLHF/blob/main/run_loop2.sh so that you only run for 1 iteration.
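For illustration only, the idea looks something like the sketch below; the actual loop and variable names in run_loop2.sh may differ, so treat this as an assumption rather than the script's real contents. Limiting the loop to a single round is what turns the iterative (online) recipe into vanilla offline DPO:

```shell
#!/bin/bash
# Hypothetical sketch of the iteration loop in run_loop2.sh.
# Setting the bound to 1 runs a single (offline) DPO round.
num_iters=1   # e.g. was 3 for online / iterative DPO

for i in $(seq 1 "$num_iters"); do
    echo "Running DPO iteration $i"
    # ... generation, reward labeling, and DPO training steps go here ...
done
```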

If you already have the dataset, you can run it with:

conda activate rlhflow
accelerate launch --config_file ./configs/zero2.yaml dpo_iteration/run_dpo.py ./configs/training.yaml

But you may want to update the prepare_data function in run_dpo.py to use your own data.
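As a hedged sketch of what such a prepare_data replacement might look like: the record fields below (question, preferred_answer, dispreferred_answer) and the output columns (prompt, chosen, rejected) are assumptions for illustration, not this repo's actual schema, so adapt them to whatever run_dpo.py expects:

```python
# Illustrative sketch only: converts raw preference records into the
# prompt/chosen/rejected triples that DPO-style trainers typically expect.
# All field names here are assumptions, not the repo's actual schema.

def prepare_data(records):
    """Map raw preference records to DPO-style triples."""
    formatted = []
    for rec in records:
        formatted.append({
            "prompt": rec["question"],
            "chosen": rec["preferred_answer"],
            "rejected": rec["dispreferred_answer"],
        })
    return formatted

# Example usage with a tiny in-memory dataset:
raw = [{
    "question": "What is 2 + 2?",
    "preferred_answer": "4",
    "dispreferred_answer": "5",
}]
print(prepare_data(raw)[0]["chosen"])  # -> 4
```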
