Tech Report: RLHF Workflow: From Reward Modeling to Online RLHF
Code for Reward Modeling: https://github.com/RLHFlow/RLHF-Reward-Modeling
Code for Online RLHF: https://github.com/RLHFlow/Online-RLHF
Recipes for training reward models for RLHF.
Directional Preference Alignment