SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning

Figure 1: Performance highlight using second-order optimization.

Welcome to the official repository for the EMNLP'24 paper, SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning. This repository contains the code for the experiments used in the paper.

Abstract

Large Language Models (LLMs) have highlighted the necessity of effective unlearning mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims at removing undesired data influences and associated model capabilities without compromising utility out of the scope of unlearning. While interest in studying LLM unlearning is growing,the impact of the optimizer choice for LLM unlearning remains under-explored. In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between {second-order optimization} and influence unlearning (a classical approach using influence functions to update the model for data influence removal). This insight propels us to develop a second-order unlearning framework, termed SOUL, built upon the second-order clipped stochastic optimization (Sophia)-based LLM training method. SOUL extends the static, one-shot model update using influence unlearning to a dynamic, iterative unlearning process. Our extensive experiments show that SOUL consistently outperforms conventional first-order methods across various unlearning tasks, models, and metrics, suggesting the promise of second-order optimization in providing a scalable and easily implementable solution for LLM unlearning.

1) Installation

You can install the required dependencies using the following command:

bash create_env.sh

2) Code structure

The code is structured as follows:

-- configs/: Contains the configuration files for the experiments.
    -- Different folders for different experiments (Tofu, etc.)
-- files/: 
    -- data/: Contains the data files nessary for the experiments.
    -- results/: the log and results of experiments will stored in this directory.
-- lm-evaluation-harness: official repository for the evaluation of LLMs from      
  https://github.com/EleutherAI/lm-evaluation-harness.
-- src/: Contains the source code for the experiments.
    -- dataset/: Contains the data processing and dataloader creation codes.
    -- model/: Contains the main unlearning class which will conduct load model, 
      unlearn,evaluation.
    -- optim/: Contains the optimizer code.
    -- metrics/: Contains the evaluation code.
    -- loggers/: Contains the logger code.
    -- unlearn/: Contains different unlearning methods' code.
    -- exec/:
        -- Fine_tune_hp.py: Code for finetuning on harry potter books.
        -- unlearn_model.py: The main file to run the unlearning experiments.

3) Running the experiments

To run the experiments, you can use the following command:

python src/exec/unlearn_model.py --config_file ${config-file} ${args you want to change from the config file}

4) Cite This Work

@misc{jia2024soul,
      title={SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning}, 
      author={Jinghan Jia and Yihua Zhang and Yimeng Zhang and Jiancheng Liu and Bharat Runwal and James Diffenderfer and Bhavya Kailkhura and Sijia Liu},
      year={2024},
      eprint={2404.18239},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
assets		assets
configs		configs
files/data		files/data
lm-evaluation-harness		lm-evaluation-harness
src		src
LICENSE		LICENSE
README.md		README.md
create_enviornment.sh		create_enviornment.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning

Abstract

1) Installation

2) Code structure

3) Running the experiments

4) Cite This Work

About

Releases

Packages

Contributors 2

Languages

License

OPTML-Group/SOUL

Folders and files

Latest commit

History

Repository files navigation

SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning

Abstract

1) Installation

2) Code structure

3) Running the experiments

4) Cite This Work

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages