[EMNLP 24] ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods.

ruoyuxie/recall

ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods 🔍

📝 Overview

This is the official repository for ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods (EMNLP 2024). The repo contains the original ReCaLL implementation on the WikiMIA benchmark dataset. Check out the project website for more information.
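As the method's name suggests, the score relates the log-likelihood of a target text conditioned on a prefix to its unconditional log-likelihood. The snippet below is a minimal sketch, assuming the score is the ratio of the two average per-token log-likelihoods. The function names `mean_log_likelihood` and `recall_score` are hypothetical and are not taken from this repository; see the paper for the exact definition.

```python
from typing import Sequence


def mean_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-likelihood of a sequence."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def recall_score(
    conditional_logprobs: Sequence[float],
    unconditional_logprobs: Sequence[float],
) -> float:
    """Ratio of conditional to unconditional average log-likelihood.

    conditional_logprobs: log-probs of the target tokens when the model
        is first given a prefix (assumed to be computed elsewhere).
    unconditional_logprobs: log-probs of the same tokens with no prefix.
    """
    ll_cond = mean_log_likelihood(conditional_logprobs)
    ll_uncond = mean_log_likelihood(unconditional_logprobs)
    if ll_uncond == 0:
        raise ValueError("unconditional log-likelihood is zero")
    return ll_cond / ll_uncond


if __name__ == "__main__":
    # Toy numbers only; real values come from a language model.
    print(recall_score([-2.0, -4.0], [-1.0, -2.0]))
```

In practice the token log-probabilities would come from the target model, once with the prefix prepended and once without it.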

⭐ If you find our implementation and paper helpful, please consider citing our work: ⭐

@misc{xie2024recall,
    title={ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods},
    author={Xie, Roy and Wang, Junlin and Huang, Ruomin and Zhang, Minxing and Ge, Rong and Pei, Jian and Gong, Neil Zhenqiang and Dhingra, Bhuwan},
    year={2024},
    eprint={2406.15968},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

🛠 Installation

pip install -r requirements.txt

🚀 Usage

Run ReCaLL with the following command:

cd src
python run.py --target_model <TARGET_MODEL> --ref_model <REFERENCE_MODEL> --output_dir <OUTPUT_PATH> --dataset <DATASET> --sub_dataset <SUB_DATASET> --num_shots <NUM_SHOTS>

Example:

python run.py --target_model "EleutherAI/pythia-6.9b" --ref_model "EleutherAI/pythia-70m" --output_dir ./output --dataset "wikimia" --sub_dataset "128" --num_shots 7
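To compare several prefix lengths, the same command can be repeated with different `--num_shots` values. The sketch below only builds the argument lists, using the flags shown in the example. The helper name `build_command` and the chosen shot counts are illustrative assumptions, not part of the repository.

```python
import shlex
import sys
from typing import List


def build_command(
    num_shots: int,
    target_model: str = "EleutherAI/pythia-6.9b",
    ref_model: str = "EleutherAI/pythia-70m",
    output_dir: str = "./output",
    dataset: str = "wikimia",
    sub_dataset: str = "128",
) -> List[str]:
    """Build the run.py invocation for one shot count (hypothetical helper)."""
    return [
        sys.executable, "run.py",
        "--target_model", target_model,
        "--ref_model", ref_model,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--sub_dataset", sub_dataset,
        "--num_shots", str(num_shots),
    ]


if __name__ == "__main__":
    # Illustrative shot counts; pick whatever range you want to compare.
    for shots in (1, 7, 14, 28):
        print(shlex.join(build_command(shots)))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, cwd="src", check=True)`.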

🔧 Parameters:

| Parameter | Description |
| --- | --- |
| --target_model | Target model to evaluate (e.g., "EleutherAI/pythia-6.9b") |
| --ref_model | Reference model for comparison (e.g., "EleutherAI/pythia-70m") |
| --output_dir | Directory to save output files |
| --dataset | Dataset to use ("wikimia") |
| --sub_dataset | Subset of the dataset (e.g., "128" from the wikimia dataset) |
| --num_shots | Number of shots used to build the prefix |
| --pass_window | (Optional) Allow inputs to exceed the context window |
| --synthetic_prefix | (Optional) Use synthetic prefixes generated by GPT-4o |
| --api_key_path | (Optional) Path to OpenAI API key file (required for synthetic prefixes) |

📊 Example Output

The script outputs results in JSON format and generates visualizations for:

  • ReCaLL score
  • Loss
  • Reference
  • Zlib
  • Min-k%
  • Min-k++

Example visualization from 1 - 28 shots:

📬 Contact

For questions or issues, please open an issue on GitHub or contact the authors directly.
