I would like to run vanilla DPO (offline DPO) as a baseline to compare its performance with online DPO. May I ask whether I can use this codebase to run the experiment, and what the running command is? Thank you very much in advance.
If you already have the dataset, you can run it with:

```
conda activate rlhflow
accelerate launch --config_file ./configs/zero2.yaml dpo_iteration/run_dpo.py ./configs/training.yaml
```

But you may want to update the `prepare_data` function in `run_dpo.py` to use your own data.
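As a rough illustration of the kind of change meant here, a minimal sketch of a `prepare_data`-style helper is shown below. Note this is an assumption, not the actual code in `run_dpo.py`: the field names (`prompt`/`chosen`/`rejected`) follow the common DPO preference-pair convention, and the input record layout (`responses` with `best`/`worst` indices) is purely hypothetical — adapt it to whatever format your own dataset uses.

```python
# Hypothetical sketch of a prepare_data replacement for offline DPO.
# The real prepare_data in run_dpo.py may expect different inputs;
# only the output convention (prompt/chosen/rejected columns) is the
# usual DPO preference-pair format.

def prepare_data(records):
    """Convert raw preference records into prompt/chosen/rejected
    triples for DPO training.

    Each record is assumed (hypothetically) to look like:
        {"prompt": str, "responses": [str, ...],
         "best": int, "worst": int}
    """
    prompts, chosen, rejected = [], [], []
    for rec in records:
        prompts.append(rec["prompt"])
        # The preferred completion becomes "chosen",
        # the dispreferred one becomes "rejected".
        chosen.append(rec["responses"][rec["best"]])
        rejected.append(rec["responses"][rec["worst"]])
    return {"prompt": prompts, "chosen": chosen, "rejected": rejected}


# Toy usage example with one preference record:
raw = [
    {"prompt": "2+2?", "responses": ["4", "5"], "best": 0, "worst": 1},
]
data = prepare_data(raw)
print(data)
# → {'prompt': ['2+2?'], 'chosen': ['4'], 'rejected': ['5']}
```

The point is just that offline DPO only needs static (prompt, chosen, rejected) triples, so once your dataset is mapped into that shape, the same training entry point can be run on it.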