Language Model Unlearning and Fine-tuning Research

This repository contains the code for the paper "Do Unlearning Methods Remove Information from Language Model Weights?".

(Figure: Mutual Information Graph)

Repository Structure

  • pipeline.py: Main orchestration script for experiments.
  • unlearn_corpus.py: Implementation of most unlearning methods.
  • finetune_corpus.py: Used for fine-tuning and RTT (retraining on T).
  • conf/: Hydra configuration files.
  • data/: Directory for dataset files.

Key Components

  • The main experimental logic is in pipeline.py. Start here to understand the overall flow.
  • For specific method implementations, refer to unlearn_corpus.py.
  • RTT details can be found in finetune_corpus.py.
  • Experiment configurations are managed through Hydra. Check the conf/ directory for different setups.
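Since the experiments are driven by Hydra, the following is a minimal sketch of what a Hydra entry point typically looks like. It is illustrative only: the actual decorator arguments and config fields used in pipeline.py and conf/ may differ (config_name="config" is an assumption).

    # Minimal Hydra entry-point sketch (illustrative; not this repo's exact code).
    import hydra
    from omegaconf import DictConfig, OmegaConf

    @hydra.main(config_path="conf", config_name="config", version_base=None)
    def main(cfg: DictConfig) -> None:
        # Print the fully resolved configuration for this run.
        print(OmegaConf.to_yaml(cfg))

    if __name__ == "__main__":
        main()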

Running Experiments

  1. Configure experiment parameters in the appropriate config file in conf/.
  2. Execute experiments using:
    python pipeline.py
    
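Hydra also supports overriding configuration values from the command line with key=value syntax. The keys below are hypothetical; the real ones depend on the files in conf/:

    # Hypothetical overrides -- substitute keys defined in your config files.
    python pipeline.py dataset=mmlu lr=2e-5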

Data

  • Datasets should be placed in the data/ directory.

Dataset Directories

  1. Years: data/dates-years-trimmed
  2. MMLU: data/mmlu_cats_random_trimmed
  3. WMDP-Deduped: data/wmdp-deduped
  4. Random Birthdays: data/random_bd

Dataset File Naming Conventions

  1. The original MCQ questions are in files named split_*.jsonl.
  2. The GPT-4o-generated text splits are prefixed with corpus_.
  3. Texts containing incorrect facts (used for RIA) are prefixed with whp_.
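Each .jsonl file stores one JSON object per line, so a quick way to inspect a split is sketched below. The record schema is not documented here, so the sketch only counts records and prints the first one:

    import glob
    import json

    # Collect the MCQ split files for one dataset (path taken from this README).
    paths = sorted(glob.glob("data/dates-years-trimmed/split_*.jsonl"))

    # JSONL format: one JSON object per line. The record fields are an
    # assumption to verify, so just count the records and print the first.
    with open(paths[0]) as f:
        records = [json.loads(line) for line in f]
    print(len(records), "records; first:", records[0])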
