As predictive models are increasingly employed to make consequential decisions, there is a growing emphasis on developing techniques that can provide algorithmic recourse to affected individuals. While such recourses can be immensely beneficial to affected individuals, potential adversaries could also exploit them to compromise privacy. In this code base, we investigate if and how an adversary can leverage recourses to infer private information about the underlying model's training data.
For a more detailed introduction to the issues presented here, please have a look at our paper, available on arXiv:
"On the Privacy Risks of Algorithmic Recourse". Martin Pawelczyk, Himabindu Lakkaraju* and Seth Neel*. In International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2023.
Our proposed membership inference (MI) attacks are (Pawelczyk et al (2023)):
- Counterfactual distance attack ($\texttt{CFD}$)
- Counterfactual distance LRT attack ($\texttt{CFD LRT}$)
In particular, our attacks take the following form:

$$\hat{m}(\mathbf{x}) = \mathbb{1}\left[ c(\mathbf{x}) \geq \tau \right], \qquad c(\mathbf{x}) = \lVert \mathbf{x} - \check{\mathbf{x}} \rVert,$$

where $\check{\mathbf{x}}$ denotes the recourse (counterfactual) issued for $\mathbf{x}$, $c(\mathbf{x})$ is the counterfactual distance, and $\tau$ is a decision threshold. The $\texttt{CFD LRT}$ variant replaces the raw distance by a likelihood ratio of $c(\mathbf{x})$ under shadow models trained with and without the candidate point.
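Below is a minimal sketch of what such a distance-based attack looks like in Python. The helper `get_counterfactual` is an illustrative assumption, not this repo's API; see distance_experiment.ipynb for the actual implementation.

```python
import numpy as np

def cfd_attack(x, get_counterfactual, tau):
    """Hypothetical sketch of the CFD attack: threshold the counterfactual distance.

    get_counterfactual: assumed callable returning the recourse for x
    under the target model; tau: decision threshold.
    """
    x_cf = get_counterfactual(x)          # recourse suggested for x
    c = np.linalg.norm(x - x_cf)          # counterfactual distance c(x)
    # Training points tend to sit farther from the decision boundary,
    # so a large distance is taken as evidence of membership.
    return int(c >= tau)
```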
This repo also contains re-implementations of two popular loss-based MI attacks:
- Simple Loss attack ($\texttt{Loss}$) (Yeom et al (2018))
- LRT loss attack ($\texttt{Loss LRT}$) (Carlini et al (2021))
The (LRT) loss-based attacks have the following form:

$$\hat{m}(\mathbf{x}) = \mathbb{1}\left[ \Lambda(\mathbf{x}) \geq \tau \right], \qquad \Lambda(\mathbf{x}) = \frac{\mathcal{N}\big(\ell(\mathbf{x});\, \mu_{\text{in}}, \sigma_{\text{in}}^2\big)}{\mathcal{N}\big(\ell(\mathbf{x});\, \mu_{\text{out}}, \sigma_{\text{out}}^2\big)},$$

where the target model's loss $\ell(\mathbf{x})$ is compared under two Gaussians whose parameters $(\mu_{\text{in}}, \sigma_{\text{in}}^2)$ and $(\mu_{\text{out}}, \sigma_{\text{out}}^2)$ are fitted to the losses of shadow models trained with and without the candidate point. The simple $\texttt{Loss}$ attack thresholds the loss directly: $\hat{m}(\mathbf{x}) = \mathbb{1}\left[ \ell(\mathbf{x}) \leq \tau \right]$.
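For concreteness, here is a hedged sketch of the two loss-based attacks, assuming the shadow-model losses have already been computed; all variable and function names are illustrative, not the code base's API.

```python
import numpy as np
from scipy.stats import norm

def loss_lrt_attack(target_loss, shadow_losses_in, shadow_losses_out, tau):
    """Sketch of the LiRA-style loss LRT attack (Carlini et al (2021)).

    shadow_losses_in:  losses of shadow models trained WITH the candidate point
    shadow_losses_out: losses of shadow models trained WITHOUT it
    """
    mu_in, sd_in = np.mean(shadow_losses_in), np.std(shadow_losses_in) + 1e-12
    mu_out, sd_out = np.mean(shadow_losses_out), np.std(shadow_losses_out) + 1e-12
    # Likelihood ratio of the observed loss under the IN vs. OUT Gaussian.
    lam = (norm.pdf(target_loss, loc=mu_in, scale=sd_in)
           / norm.pdf(target_loss, loc=mu_out, scale=sd_out))
    return int(lam >= tau)

def loss_attack(target_loss, tau):
    """Sketch of the simple loss attack (Yeom et al (2018)): low loss -> member."""
    return int(target_loss <= tau)
```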
We recommend setting up a separate conda environment for this code to ensure that matching versions of the dependencies are installed. To set up the environment and run the notebooks, we assume you have working installations of Anaconda and Jupyter. Once everything is set up, you can run distance_experiment.ipynb with the default parameters to get an understanding of how the attack works.
To better understand attack success, we additionally provide the following simple data generating process that isolates the factors which make membership inference attacks successful.
Denote by $n$ the number of samples and by $d$ the feature dimension.

- Design matrix: $\mathbf{X} \in \mathbb{R}^{n \times d}$ with rows $\mathbf{x}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d)$
- True coefficient vector: $\boldsymbol{\beta} \in \mathbb{R}^d$ with $\lVert \boldsymbol{\beta} \rVert_2 = 1$
- Labels: $y_i = \mathbb{1}\left[ \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i > 0 \right]$, where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$
- Signal-to-noise ratio: $\text{SNR} = \lVert \boldsymbol{\beta} \rVert_2^2 / \sigma^2$
In the implemented version, we fix the true weight vector to unit length to keep the signal-to-noise ratio constant as the feature dimension grows.
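A minimal sketch of this generating process, assuming the standard-Gaussian design and threshold labels described above:

```python
import numpy as np

def generate_data(n, d, sigma=1.0, seed=0):
    """Synthetic data with a unit-norm coefficient vector, so the SNR stays fixed in d."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))       # design matrix, rows ~ N(0, I_d)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)          # ||beta||_2 = 1
    eps = sigma * rng.standard_normal(n)  # label noise, variance sigma^2
    y = (X @ beta + eps > 0).astype(int)  # binary labels
    return X, y, beta
```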
To run experiments on real-world data sets, make sure to unzip the data folder and to download the default data set from openml. The link to this data set can be found in data/dataset_descriptions/link.txt.
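If you prefer to fetch data programmatically, scikit-learn's `fetch_openml` can pull data sets from openml directly. The data set name below is only a placeholder; the actual default data set is the one linked in link.txt.

```python
from sklearn.datasets import fetch_openml

# "adult" is a placeholder name; substitute the data set linked in
# data/dataset_descriptions/link.txt.
X, y = fetch_openml("adult", version=2, return_X_y=True, as_frame=True)
```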
If you find this code useful, please consider citing the corresponding work:
```bibtex
@inproceedings{pawelczyk2022privacy,
  title={{On the Privacy Risks of Algorithmic Recourse}},
  author={Pawelczyk, Martin and Lakkaraju, Himabindu and Neel, Seth},
  booktitle={International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year={2023}
}
```