RLHF example

This example uses RLHF (Reinforcement Learning from Human Feedback) to train a language model to summarize Reddit posts.

Note:

This example is not included in the benchmarked results of the current release (v0.3). We intend to include it in the benchmarking of future releases, to verify that it runs with the release code and that its results remain consistent. For now, be aware that this additional check has not been performed for this specific example.

Getting started

Make sure you have PyTorch>=2.0 installed. You can find installation instructions on the PyTorch website.
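If you are unsure which version your environment provides, a quick check from the shell (this assumes python resolves to the interpreter you will use for training):

python -c "import torch; print(torch.__version__)"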

From this directory, you can install the extra requirements for running these examples with:

pip install -r requirements.txt

Training the models

Training the transformer

Once the data has been prepared, you can train the GPT model.

python train.py

The default configuration can be found in config/train.yaml, and any option can be overridden with command-line arguments. For example, to run the training script with a different batch size:

python train.py --batch_size=128

NOTE: Apple Silicon MacBook users should make sure to use --device=mps and prepend all commands with PYTORCH_ENABLE_MPS_FALLBACK=1 to enable CPU fallback.
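Putting the two together, an Apple Silicon run with the batch-size override from above would look like this (only flags already mentioned in this README are used):

PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py --device=mps --batch_size=128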

Training the reward model

Once you have completed supervised fine-tuning, copy the desired model checkpoint to ./out or update the config to point model.name_or_path at the relevant checkpoint in the timestamped working directory created by Hydra. You can then train the reward model with:

python train_reward.py
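A sketch of the copy-based option mentioned above (the placeholder path is hypothetical; substitute the supervised fine-tuning checkpoint from your Hydra working directory):

cp -r <path-to-sft-checkpoint> ./out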

Training the final model with RLHF

Once again, make sure you have either updated the configuration to point reward_model.name_or_path at the relevant timestamped working directory, or copied the checkpoint to ./out_reward. You can then train the final model by running:

python train_rlhf.py
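The copy-based option mirrors the previous step (again, the placeholder path is hypothetical; substitute the reward model checkpoint from its Hydra working directory):

cp -r <path-to-reward-checkpoint> ./out_reward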