All our experiments are on multi-class logistic regression models trained on ResNet50 embeddings for standard vision datasets.
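For orientation, here is a minimal sketch of how such an influence ranking can be computed, using the standard influence-function approximation $\mathcal{I}(z) = -\nabla_\theta \ell(z_{\rm test})^\top H^{-1} \nabla_\theta \ell(z)$. For brevity it uses binary rather than multi-class logistic regression, and all names are ours, not code from the paper.

```python
# Minimal sketch (our illustration, not the paper's code): influence-based
# ranking of training samples for binary logistic regression.
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_loss(theta, x, y):
    # Gradient of the logistic loss at a single sample (label y in {0, 1}).
    return (sigmoid(x @ theta) - y) * x

def hessian(theta, X, lam=1e-3):
    # Hessian of the L2-regularized mean training loss.
    p = sigmoid(X @ theta)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + lam * np.eye(X.shape[1])

def influence_ranking(theta, X_tr, y_tr, X_te, y_te):
    # Precondition the mean test-loss gradient by H^{-1} once, then score
    # every training sample with a single dot product.
    g_test = np.mean([grad_loss(theta, x, y) for x, y in zip(X_te, y_te)], axis=0)
    v = np.linalg.solve(hessian(theta, X_tr), g_test)
    scores = np.array([-v @ grad_loss(theta, x, y) for x, y in zip(X_tr, y_tr)])
    return np.argsort(-scores)  # training indices, most influential first
```

Our results are as follows.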

<ol>
<li><strong>Our Single-Target attack performs better than a non-influence baseline.</strong> Consider a non-influence baseline attack for increasing the importance of a training sample: reweigh the training loss, placing a high weight on the loss of the target sample. Our attack has a significantly higher success rate than the baseline, with a much smaller accuracy drop, under all settings, as shown in the table below.</li>

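For concreteness, here is a hedged sketch of what such a reweighted-loss baseline can look like; the weight `w` and regularizer `lam` are illustrative choices of ours, not values from the paper.

```python
# Sketch of the non-influence baseline (our reading): retrain on a loss in
# which the target sample's term is upweighted by a factor w >> 1.
import numpy as np

def reweighted_logistic_loss(theta, X, y, target_idx, w=50.0, lam=1e-3):
    z = X @ theta
    losses = np.logaddexp(0.0, z) - y * z   # stable per-sample logistic loss
    weights = np.ones(len(X))
    weights[target_idx] = w                 # emphasize the target sample
    return np.mean(weights * losses) + 0.5 * lam * theta @ theta
```

Minimizing this with any off-the-shelf optimizer (e.g. `scipy.optimize.minimize`) yields the baseline's manipulated model.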


<li><strong>Behavior of our Single-Target attack w.r.t. manipulation radius $C$ and training set size.</strong> Theoretically, the manipulation radius parameter $C$ in our attack objectives is expected to create a trade-off between the manipulated model's accuracy and the attack success rate: increasing $C$ should yield a higher success rate, as the manipulated model is allowed to diverge more from the (optimal) original model, but its accuracy should drop, and vice versa. We observe this trade-off for all three datasets and different values of ranking $k$, as shown in the figure below.

We also anticipate our attack to work better with smaller training sets, as there will be fewer samples competing for the top-$k$ rankings. Experimentally, this is borne out -- the Pet dataset, with the smallest training set, has the highest success rates.</li>

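One way to picture the role of $C$ is as a norm ball around the original parameters within which the attack may move the model. Below is a minimal sketch assuming a projected-gradient attack; the paper's exact objective may differ.

```python
# Illustrative projected-gradient loop for the radius constraint
# ||theta - theta_orig|| <= C (our sketch, not the paper's exact attack).
import numpy as np

def project(theta, theta_orig, C):
    delta = theta - theta_orig
    n = np.linalg.norm(delta)
    return theta if n <= C else theta_orig + (C / n) * delta

def run_attack(theta_orig, attack_grad, C, lr=0.1, steps=200):
    # attack_grad(theta) -> gradient of the attack objective at theta.
    theta = theta_orig.copy()
    for _ in range(steps):
        theta = theta - lr * attack_grad(theta)  # descend the attack objective
        theta = project(theta, theta_orig, C)    # stay within radius C
    return theta
```

A small $C$ keeps the manipulated model close to the (optimal) original one, so accuracy barely drops but the attack has little room to move rankings; a large $C$ loosens the constraint, which is exactly the trade-off observed above.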


<li><strong>Our attacks transfer when influence scores are computed with an unknown test set.</strong> When an unknown test set is used to compute influence scores, our attacks perform better as ranking $k$ increases, as shown in the figure above. This occurs because the rank of the target sample, optimized with the original test set, deteriorates under the unknown test set, and a larger $k$ increases the likelihood of the target still being in the top-$k$ rankings.</li>


<li><strong>How does our Multi-Target attack perform with changing target set size and desired ranking $k$?</strong> Intuitively, our attack should perform better when the size of the target set is large compared to ranking $k$ -- simply because a larger target set offers more candidates for the top-$k$ ranking spots, increasing the chances of some of them making it to the top-$k$. Our experimental results confirm this intuition; as demonstrated in the figure below, we observe that (1) for a fixed value of ranking $k$, a larger target set size leads to a higher success rate -- a target set size of $100$ has the highest success rates for all values of ranking $k$ across the board -- and (2) the success rate decreases with increasing $k$ for all target set sizes and datasets. These results are for the high accuracy-similarity regime, where the original and manipulated models' accuracies differ by less than $3\%$.</li>

<div class='l-body' align="center">
<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/img/2024-10-ifman/multitarget.png">
<figcaption style="text-align: center; margin-top: 10px; margin-bottom: 10px;"> Performance of the Multi-Target attack in the Data Valuation use case. Results for the high-accuracy regime. Success rates are higher when the target set size is greater than the desired ranking $k$.</figcaption>
</div>
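As we read it, the success metric here is the fraction of target samples that land in the top-$k$ influence ranks; a small sketch (names ours):

```python
# Sketch: success rate of the Multi-Target attack, measured as the fraction
# of target samples appearing in the top-k influence rankings (our reading).
def multi_target_success_rate(ranking, target_indices, k):
    # ranking: training indices sorted most-influential-first.
    top_k = set(ranking[:k])
    return sum(t in top_k for t in target_indices) / len(target_indices)
```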


<li><strong>Easy vs. Hard Samples.</strong> We find that target samples which rank very high or very low in the original influence rankings (equivalently, samples whose influence has a large magnitude, positive or negative) are easier to push into the top-$k$ rankings upon manipulation. This is because the influence scores of extreme-rank samples are more sensitive to the model parameters, as shown experimentally in the figure below, making them more susceptible to influence-based attacks.</li>

<div class='l-body' align="center">
<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/img/2024-10-ifman/histogram.png">
<figcaption style="text-align: center; margin-top: 10px; margin-bottom: 10px;"> Histograms of the original ranks of easy-to-manipulate samples (L) and of hard-to-manipulate samples (M), and a scatterplot of influence gradient norm vs. original rank for 50 random target samples (R). Ranking $k:=1$. Easy-to-manipulate samples have extreme original influence ranks (large positive or negative), as samples with extreme rankings also have higher influence gradient norms, where the gradient is taken w.r.t. the model parameters. For other datasets, see App. Sec. A.1 in the paper.</figcaption>
</div>
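The sensitivity in question is the norm of the gradient of a sample's influence score w.r.t. the model parameters. A hedged finite-difference sketch (our illustration, not the paper's code):

```python
# Sketch: sensitivity of a target sample's influence score to the model
# parameters, estimated by central finite differences.
import numpy as np

def influence_grad_norm(influence_of_target, theta, eps=1e-4):
    # influence_of_target(theta) -> scalar influence score at parameters theta.
    g = np.empty_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (influence_of_target(theta + e) - influence_of_target(theta - e)) / (2 * eps)
    return np.linalg.norm(g)
```

Samples with a larger norm are the "easy" ones: a small parameter perturbation moves their influence score a lot.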


<li><strong>Impossibility Theorem for Data Valuation Attacks.</strong> We observe that even with a large $C$, our attacks cannot achieve a $100\%$ success rate. Motivated by this, we ask: do there exist target samples whose influence score cannot be pushed into the top-$k$ ranks? The answer is yes, and we formally state this impossibility result as follows.</li>

For the logistic regression family of models and any target influence ranking $k\in\mathbb{N}$, there exist a training set $Z_{\rm train}$, a test set $Z_{\rm test}$, and a target sample $z_{\rm target} \in Z_{\rm train}$ such that no model in the family can place $z_{\rm target}$ in the top-$k$ influence rankings.
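In symbols (our paraphrase, writing $\Theta$ for the logistic regression parameter family and $\mathrm{rank}(z;\theta)$ for the influence rank of $z$ under the scores induced by $\theta$, $Z_{\rm train}$, and $Z_{\rm test}$):

$$\exists\, Z_{\rm train},\, Z_{\rm test},\, z_{\rm target}\in Z_{\rm train} \;\;\text{such that}\;\; \forall\, \theta\in\Theta:\;\; \mathrm{rank}(z_{\rm target};\theta) > k.$$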
