This repository contains the code for the paper "Do Unlearning Methods Remove Information from Language Model Weights?".
- `pipeline.py`: Main orchestration script for experiments.
- `unlearn_corpus.py`: Implementation of most unlearning methods.
- `finetune_corpus.py`: Used for fine-tuning and RTT.
- `conf/`: Hydra configuration files.
- `data/`: Directory for dataset files.
- The main experimental logic is in `pipeline.py`. Start here to understand the overall flow.
- For specific method implementations, refer to `unlearn_corpus.py`.
- RTT details can be found in `finetune_corpus.py`.
- Experiment configurations are managed through Hydra. Check the `conf/` directory for different setups; a generic sketch of the Hydra entry-point pattern follows this list.
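Since the experiments are configured through Hydra, `pipeline.py` is expected to follow the standard `@hydra.main` entry-point pattern. The sketch below is a generic illustration of that pattern, not the actual code from this repository; the config name and the function body are placeholders.

```python
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="conf", config_name="config")  # "config" is a placeholder name
def main(cfg: DictConfig) -> None:
    # Hydra composes the config from conf/ plus any key=value overrides
    # passed on the command line, then hands it to this function.
    print(OmegaConf.to_yaml(cfg))
    # ... run unlearning / fine-tuning / RTT according to cfg ...


if __name__ == "__main__":
    main()
```

With this pattern, `key=value` overrides appended to `python pipeline.py` on the command line take precedence over the values in `conf/`.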
- Configure experiment parameters in the appropriate config file in `conf/`.
- Execute experiments using `python pipeline.py`.
- Datasets should be placed in the `data/` directory (a quick pre-flight check is sketched after the list of paths below):
- Years: `data/dates-years-trimmed`
- MMLU: `data/mmlu_cats_random_trimmed`
- WMDP-Deduped: `data/wmdp-deduped`
- Random Birthdays: `data/random_bd`
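Before launching a run, it can be handy to confirm that the expected dataset directories are in place. This is a convenience sketch, not part of the repository; the directory names are taken from the list above.

```python
from pathlib import Path

# Dataset directories listed above; trim this list if you only use a subset.
EXPECTED_DIRS = [
    "data/dates-years-trimmed",
    "data/mmlu_cats_random_trimmed",
    "data/wmdp-deduped",
    "data/random_bd",
]

for d in EXPECTED_DIRS:
    print(f"{d}: {'ok' if Path(d).is_dir() else 'MISSING'}")
```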
- The original MCQ questions are called `split_*.jsonl` (a quick way to inspect these files is sketched below).
- The GPT-4o generated text splits have the prefix `corpus_`.
- The text with incorrect facts (used for RIA) is prefixed with `whp_`.
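Each `.jsonl` file presumably holds one JSON object per line (standard JSON Lines format), so the splits can be inspected without extra tooling. The helper below is hypothetical (not part of the repository) and makes no assumptions about the field names inside each record; the Years dataset directory is used only as an example.

```python
import glob
import json

# Print the field names of the first record in each MCQ split of one dataset.
for path in sorted(glob.glob("data/dates-years-trimmed/split_*.jsonl")):
    with open(path) as f:
        first_record = json.loads(f.readline())
    print(path, "->", sorted(first_record.keys()))
```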