- Authors: Lichang Chen*, Chen Zhu*, et al.
- Paper (ICML 2024): https://arxiv.org/abs/2402.07319
- Model: Gemma-9B-ODIN
- Code Repository: https://github.com/RLHFlow/RLHF-Reward-Modeling/
There is a simple use case in `serving_two_head.py`, where head 1 is the length head and head 2 is the quality head.
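Below is a minimal sketch of how such a two-head reward model could be queried. This is only an illustration, not the contents of `serving_two_head.py`; the checkpoint path is a placeholder, and the head ordering follows the description above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder path: point this at the ODIN checkpoint produced by the training
# script below (e.g. ./gemma_9b_700K_v2).
checkpoint = "path/to/your/odin_checkpoint"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# The reward model is a sequence classifier with two output logits (two heads).
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2, torch_dtype=torch.bfloat16
)
model.eval()

chat = [
    {"role": "user", "content": "Explain what a reward model is."},
    {"role": "assistant", "content": "A reward model scores how well a response matches human preferences."},
]
input_ids = tokenizer.apply_chat_template(chat, tokenize=True, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits[0]  # shape: (2,)

length_reward = logits[0].item()   # head 1: length reward
quality_reward = logits[1].item()  # head 2: quality reward (typically the one used downstream)
print(f"length head: {length_reward:.3f}  quality head: {quality_reward:.3f}")
```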
As described in the paper, ODIN is trained with three different losses (a rough sketch of how they combine is given after this list):
- Length loss: controlled by `args.correlation_with_length` in `gemma_two_head.py`
- Orthogonal loss: controlled by `args.otho_reg` in `gemma_two_head.py`
- Ranking loss: the vanilla loss used to train Bradley-Terry (BT) models
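The exact formulation lives in `gemma_two_head.py` and the paper; the sketch below only illustrates the idea, and the function name, coefficient names, correlation measure, and head layout are simplifying assumptions rather than the repo's implementation.

```python
import torch
import torch.nn.functional as F

def pearson_corr(x, y, eps=1e-8):
    """Pearson correlation between two 1-D tensors."""
    x = x - x.mean()
    y = y - y.mean()
    return (x * y).sum() / (x.norm() * y.norm() + eps)

def odin_loss(len_rewards, qual_rewards, lengths, head_w_len, head_w_qual,
              chosen_idx, rejected_idx, corr_coef=1.0, ortho_coef=1.0):
    # 1) Ranking loss: vanilla Bradley-Terry loss on the summed reward of both heads.
    total = len_rewards + qual_rewards
    ranking_loss = -F.logsigmoid(total[chosen_idx] - total[rejected_idx]).mean()

    # 2) Length loss (cf. args.correlation_with_length): encourage the length head
    #    to absorb the length correlation and discourage the quality head from it.
    lengths = lengths.float()
    length_loss = -pearson_corr(len_rewards, lengths) + pearson_corr(qual_rewards, lengths).abs()

    # 3) Orthogonality loss (cf. args.otho_reg): keep the two heads' weight vectors
    #    orthogonal so they capture disentangled signals.
    ortho_loss = F.cosine_similarity(head_w_len, head_w_qual, dim=0) ** 2

    return ranking_loss + corr_coef * length_loss + ortho_coef * ortho_loss

# Toy example: 4 responses forming 2 preference pairs, 16-dim head weights.
if __name__ == "__main__":
    torch.manual_seed(0)
    len_r, qual_r = torch.randn(4), torch.randn(4)
    lengths = torch.tensor([120, 80, 200, 50])
    loss = odin_loss(len_r, qual_r, lengths,
                     head_w_len=torch.randn(16), head_w_qual=torch.randn(16),
                     chosen_idx=torch.tensor([0, 2]), rejected_idx=torch.tensor([1, 3]))
    print(loss.item())
```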
Here is the script for training ODIN:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
accelerate launch ./bradley-terry-rm/gemma_two_head.py --model_name google/gemma-2-9b-it \
    --max_length 2048 --train_set_path hendrydong/preference_700K --output_path ./gemma_9b_700K_v2 \
    --deepspeed ./deepspeed_configs/deepspeed_2.json --per_device_train_batch_size 16
```
If you find this work useful for your research, please consider citing:
```bibtex
@article{chen2024odin,
  title={Odin: Disentangled reward mitigates hacking in rlhf},
  author={Chen, Lichang and Zhu, Chen and Soselia, Davit and Chen, Jiuhai and Zhou, Tianyi and Goldstein, Tom and Huang, Heng and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2402.07319},
  year={2024}
}
```