# 🔥 Aligning Large Language Models with Representation Editing: A Control Perspective
RE-Control aligns LLMs by injecting external control signals into the hidden states of a pre-trained LLM at test time.
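The core mechanism is simple to state: a value model predicts the reward-to-go from a hidden state, and at decoding time the hidden state is nudged along the value gradient before the next token is produced. Below is a minimal, illustrative sketch of such an edit (assuming a `value_model` that maps hidden states to scalars; this is not the repo's actual code):

```python
import torch

def edit_hidden_state(hidden, value_model, step_size=0.5, steps=2):
    """Gradient-ascent sketch: perturb `hidden` toward higher predicted value."""
    h = hidden.detach().float().requires_grad_(True)
    with torch.enable_grad():  # generate() usually runs under no_grad
        for _ in range(steps):
            score = value_model(h).sum()  # predicted reward-to-go
            (grad,) = torch.autograd.grad(score, h)
            h = (h + step_size * grad).detach().requires_grad_(True)
    return h.detach().to(hidden.dtype)
```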
This project uses two Python environments. For every program except `metrics.py`, use the environment built from `llm.txt`; for `metrics.py`, use the one built from `metric.txt`.
Clone the project and create the environment with conda:
```
conda create -n recontrol python==3.10
conda activate recontrol
pip install -r llm.txt
```
Note: you may need to adjust the torch (CUDA) version according to your GPU.
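For example, a CUDA 12.1 build of PyTorch can be installed with:

```
pip install torch --index-url https://download.pytorch.org/whl/cu121
```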
First, we need to collect the hidden-state activations from the LLM:
```
python get_activations_only.py --model_name llama3_8B --dataset_name shp
```
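Under the hood, this amounts to a forward pass with hidden states exposed. A minimal sketch of what this step does conceptually (the model name, prompt, and output path here are illustrative, not the script's actual defaults):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto")

prompt = "How do I learn to play guitar?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (batch, seq_len, hidden) per layer;
# keep the final layer's activations.
last_layer = out.hidden_states[-1]
torch.save(last_layer.cpu(), "activations.pt")
```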
Then, we need to label the activations with a reward model:
```
python reward_label.py --model_name llama3_8B --dataset_name shp --reward_model openbmb/UltraRM-13b --mode train
```
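Conceptually, each stored activation is labeled with the scalar reward the reward model assigns to its (prompt, response) pair. UltraRM-13b ships with its own custom model class, so this sketch substitutes a generic sequence-classification reward model to stay self-contained:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # stand-in RM
rm_tok = AutoTokenizer.from_pretrained(rm_name)
rm = AutoModelForSequenceClassification.from_pretrained(rm_name).eval()

def reward(prompt: str, response: str) -> float:
    """Score a (prompt, response) pair with the reward model."""
    inputs = rm_tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return rm(**inputs).logits[0].item()

print(reward("How do I learn guitar?", "Start with basic chords and practice daily."))
```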
Train a value model:
```
python train_value_model.py --model_name llama3_8B --dataset_name shp --lr 0.0001
```
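The value model itself can be small: a regressor from a hidden state to the labeled reward. A sketch of one plausible architecture and training loop (sizes and data are placeholders; 4096 matches Llama-3-8B's hidden size):

```python
import torch
import torch.nn as nn

class ValueModel(nn.Module):
    """Small MLP mapping a hidden state to a scalar predicted reward."""
    def __init__(self, hidden_size=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 1024), nn.ReLU(), nn.Linear(1024, 1))

    def forward(self, h):
        return self.net(h)

value_model = ValueModel()
opt = torch.optim.Adam(value_model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Placeholders for the (activation, reward) pairs from the two steps above.
activations, rewards = torch.randn(256, 4096), torch.randn(256, 1)
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(value_model(activations), rewards)
    loss.backward()
    opt.step()
```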
Run inference with the test-time intervention:
```
python inference_intervention.py --model_name llama3_8B --dataset_name shp --use_intervention True --lr 1.0 --epochs 30 --value_lr 0.0001
```
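Continuing from the sketches above (`model`, `tok`, `inputs`, `value_model`, `edit_hidden_state`), one way to apply such an edit during generation is a forward hook on the last decoder layer. This is illustrative only; the repo's actual intervention lives in `inference_intervention.py`:

```python
def make_hook(value_model):
    def hook(module, args, output):
        # Decoder layers return a tuple whose first element is the hidden states.
        hidden = output[0] if isinstance(output, tuple) else output
        edited = edit_hidden_state(hidden, value_model)
        if isinstance(output, tuple):
            return (edited,) + output[1:]
        return edited
    return hook

value_model = value_model.to(model.device).eval()
handle = model.model.layers[-1].register_forward_hook(make_hook(value_model))
generated = model.generate(**inputs, max_new_tokens=64)
handle.remove()
print(tok.decode(generated[0], skip_special_tokens=True))
```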
Evaluate the average reward:
```
python measure_reward.py --out_file llama3_8B_shp_0.0001_30_0.5 --model_name llama3_8B --dataset_name shp --reward_model openbmb/UltraRM-13b
```
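Conceptually, this step scores every generated response with the reward model and averages. A sketch reusing the illustrative `reward` function above (the output filename and JSON schema here are assumptions, not the repo's actual format):

```python
import json

scores = []
with open("outputs/llama3_8B_shp_0.0001_30_0.5.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        scores.append(reward(ex["prompt"], ex["response"]))
print(f"average reward: {sum(scores) / len(scores):.4f}")
```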
Evaluate the diversity and coherence:
```
python metrics.py --run_name llama3_8B_shp_0.0001_30_0.5
```
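Diversity is often measured as distinct-n, the fraction of unique n-grams among all generated n-grams. A sketch of that metric (`metrics.py` may compute diversity and coherence differently):

```python
def distinct_n(texts, n=2):
    """Fraction of unique n-grams among all n-grams produced."""
    ngrams, total = set(), 0
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams.add(tuple(tokens[i:i + n]))
            total += 1
    return len(ngrams) / max(total, 1)

print(distinct_n(["the cat sat", "the cat ran"], n=2))  # 0.75
```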
Evaluate the GPT-4 win rate:
```
python gpt4_eval.py --run_name_red llama3_8B_shp_0.0001_30_0.5 --run_name_blue <put the preferred answer in the dataset here>
```
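A pairwise GPT-4 judgment typically looks like the following sketch (the prompt wording and answer parsing are illustrative, not the repo's exact protocol):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT-4 which of two answers is more helpful; returns 'A' or 'B'."""
    msg = (f"Question: {prompt}\n\nAnswer A: {answer_a}\n\n"
           f"Answer B: {answer_b}\n\n"
           "Which answer is more helpful? Reply with exactly 'A' or 'B'.")
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": msg}])
    return resp.choices[0].message.content.strip()
```

The win rate is then the fraction of comparisons where the model's answer is preferred, ideally with answer positions swapped across calls to control for position bias.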
If you find our work helpful, please consider citing our paper:
```bibtex
@article{Kong2024AligningLL,
  title={Aligning Large Language Models with Representation Editing: A Control Perspective},
  author={Lingkai Kong and Haorui Wang and Wenhao Mu and Yuanqi Du and Yuchen Zhuang and Yifei Zhou and Yue Song and Rongzhi Zhang and Kai Wang and Chao Zhang},
  year={2024},
  eprint={2406.05954},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```