diff --git a/model/README.md b/model/README.md
index 59d1cefb9a..c03aaf83c3 100644
--- a/model/README.md
+++ b/model/README.md
@@ -98,8 +98,8 @@ export SFT_MODEL=$MODEL_PATH/sft_model/$(ls -t $MODEL_PATH/sft_model/ | head -n
 5. Train the reward model
 
 ```bash
-cd ../reward/instructor
-python trainer.py configs/deberta-v3-base.yml --output_dir $MODEL_PATH/reward_model
+cd model_training
+python trainer_rm.py --configs defaults_rm oasst-rm-1-pythia-1b
 ```
 
 6. Get RM trained model
@@ -117,7 +117,7 @@ export REWARD_MODEL=$MODEL_PATH/reward_model/$(ls -t $MODEL_PATH/reward_model/ |
 
 7. Train the RL agent
 
 ```bash
-cd ../../model_training
+cd model_training
 python trainer_rl.py --configs defaults_rlhf --cache_dir $DATA_PATH --rank_model $REWARD_MODEL --sft_model $SFT_MODEL --output_dir $MODEL_PATH/rl_model
 ```
diff --git a/model/model_training/README.md b/model/model_training/README.md
index 1f5dbcf9ed..75fdc0c9c1 100644
--- a/model/model_training/README.md
+++ b/model/model_training/README.md
@@ -57,11 +57,16 @@ Currently only these languages are supported via prompt translation:
 ar,de,fr,en,it,nl,tr,ru,ms,ko,ja,zh
 ```
 
+We provide many more datasets for training; a list of these can be found
+[here](https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/__init__.py).
+
 ## Dataset sub-sampling
 
 We can subsample the **training** data by passing either the `fraction` or
-`size` argument in the `configs/config.yml` file. Don't forget the additional
-colon ":" after the dataset name when doing this.
+`size` argument in the `configs/config.yml` file (for RM training use
+`configs/config_rm.yml` and for RL training `configs/config_rl.yml`,
+respectively). Don't forget the additional colon ":" after the dataset name
+when doing this.
 
 Example:
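
The `fraction`/`size` sub-sampling described in the second file's hunk might look like the following sketch of a `configs/config.yml` entry. The dataset names and values here are illustrative assumptions, not taken from the repository; only the `fraction`/`size` keys and the trailing colon requirement come from the text above.

```yaml
# Hypothetical config excerpt -- dataset names are examples only.
datasets:
  - some_dataset:      # note the trailing ":" needed to attach sub-arguments
      fraction: 0.5    # keep roughly 50% of the training examples
  - another_dataset:
      size: 1000       # or cap the dataset at a fixed number of examples
```

The same pattern would apply in `configs/config_rm.yml` and `configs/config_rl.yml` for RM and RL training, per the change above.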