Ablation between DPO and Step-DPO #20

tqzhong · 2024-09-29T11:34:54Z

Hi, thanks for your great work! I have a question regarding the ablation experiment of DPO vs Step-DPO. What does the 5K training dataset look like? Is it sampled from your publicly available 10K Step-DPO dataset? And for Step-DPO you used the chosen and rejected fields from that dataset, while for DPO you used full_chosen and full_rejected. Is that correct?

BobbyGuo-UTokyo · 2024-11-09T15:59:08Z

I also think this is great work! I have the same question about the data used. Are full_chosen/full_rejected used only for the DPO ablation study? And does Step-DPO only use the text up to the chosen/rejected step? Thank you!
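To make the setup being asked about concrete, here is a minimal sketch of the two configurations as described in the questions above. It is only an illustration of the hypothesis, not something the authors have confirmed: the helper names `sample_subset` and `to_preference_pair`, the `prompt` field, the seed, and the toy record are all assumptions; only the field names chosen, rejected, full_chosen and full_rejected come from this thread.

```python
import random


def sample_subset(records, k, seed=0):
    """Draw k records without replacement (hypothetical 5K-from-10K sampling)."""
    rng = random.Random(seed)
    return rng.sample(records, k)


def to_preference_pair(record, mode):
    """Pick which fields form the preference pair for a given ablation arm.

    mode="step_dpo": use the step-level fields (chosen / rejected).
    mode="dpo":      use the full-solution fields (full_chosen / full_rejected).
    """
    if mode == "step_dpo":
        chosen_key, rejected_key = "chosen", "rejected"
    elif mode == "dpo":
        chosen_key, rejected_key = "full_chosen", "full_rejected"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "prompt": record["prompt"],  # assumed field name
        "chosen": record[chosen_key],
        "rejected": record[rejected_key],
    }


# Toy record for illustration only; not taken from the real dataset.
example = {
    "prompt": "What is 2 + 3 * 4?",
    "chosen": "Step 2: 2 + 12 = 14.",
    "rejected": "Step 2: 5 * 4 = 20.",
    "full_chosen": "Step 1: 3 * 4 = 12. Step 2: 2 + 12 = 14.",
    "full_rejected": "Step 1: 2 + 3 = 5. Step 2: 5 * 4 = 20.",
}

step_pair = to_preference_pair(example, "step_dpo")
dpo_pair = to_preference_pair(example, "dpo")
```

If this reading is right, both arms would share the same sampled prompts and differ only in which pair of fields is compared.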
