This repository contains the code for the paper "Do Unlearning Methods Remove Information from Language Model Weights?".
- `pipeline.py`: Main orchestration script for experiments.
- `unlearn_corpus.py`: Implementation of most unlearning methods.
- `finetune_corpus.py`: Used for fine-tuning and RTT.
- `conf/`: Hydra configuration files.
- `data/`: Directory for dataset files.
- The main experimental logic is in `pipeline.py`. Start here to understand the overall flow.
- For specific method implementations, refer to `unlearn_corpus.py`.
- RTT details can be found in `finetune_corpus.py`.
- Experiment configurations are managed through Hydra. Check the `conf/` directory for different setups; a generic sketch of the Hydra entry-point pattern follows this list.
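Since the experiments are configured through Hydra, `pipeline.py` is expected to follow the standard `@hydra.main` entry-point pattern. The sketch below is a generic illustration of that pattern, not the actual code from this repository; the config name and the function body are placeholders.

```python
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="conf", config_name="config")  # "config" is a placeholder name
def main(cfg: DictConfig) -> None:
    # Hydra composes the config from conf/ plus any key=value overrides
    # passed on the command line, then hands it to this function.
    print(OmegaConf.to_yaml(cfg))
    # ... run unlearning / fine-tuning / RTT according to cfg ...


if __name__ == "__main__":
    main()
```

With this pattern, `key=value` overrides appended to `python pipeline.py` on the command line take precedence over the values in `conf/`.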
- Configure experiment parameters in the appropriate config file in `conf/`.
- Execute experiments using `python pipeline.py`.
- Datasets should be placed in the `data/` directory (a quick pre-flight check is sketched after the list of paths below):
- Years: `data/dates-years-trimmed`
- MMLU: `data/mmlu_cats_random_trimmed`
- WMDP-Deduped: `data/wmdp-deduped`
- Random Birthdays: `data/random_bd`
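Before launching a run, it can be handy to confirm that the expected dataset directories are in place. This is a convenience sketch, not part of the repository; the directory names are taken from the list above.

```python
from pathlib import Path

# Dataset directories listed above; trim this list if you only use a subset.
EXPECTED_DIRS = [
    "data/dates-years-trimmed",
    "data/mmlu_cats_random_trimmed",
    "data/wmdp-deduped",
    "data/random_bd",
]

for d in EXPECTED_DIRS:
    print(f"{d}: {'ok' if Path(d).is_dir() else 'MISSING'}")
```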
- The original MCQ questions are called `split_*.jsonl` (a quick way to inspect these files is sketched below).
- The GPT-4o generated text splits have the prefix `corpus_`.
- The text with incorrect facts (used for RIA) is prefixed with `whp_`.
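Each `.jsonl` file presumably holds one JSON object per line (standard JSON Lines format), so the splits can be inspected without extra tooling. The helper below is hypothetical (not part of the repository) and makes no assumptions about the field names inside each record; the Years dataset directory is used only as an example.

```python
import glob
import json

# Print the field names of the first record in each MCQ split of one dataset.
for path in sorted(glob.glob("data/dates-years-trimmed/split_*.jsonl")):
    with open(path) as f:
        first_record = json.loads(f.readline())
    print(path, "->", sorted(first_record.keys()))
```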