
Little Understand #1

Open
RorschachChen opened this issue Apr 10, 2022 · 12 comments
Labels
good first issue Good for newcomers

Comments

@RorschachChen

After reading through the example, can I simply think of it as training a model to be addicted to one target label, so that when the poisoned model predicts non-target samples with this noise added, it will output the target label, thereby achieving a backdoor attack?

@YiZeng623
Member

Hi there, thanks for your interest in our work. We will be uploading the full paper to arXiv soon, and here's a preview of the paper if you are interested: http://www.yi-zeng.com/wp-content/uploads/2022/04/Narcissus_Backdoor.pdf

To answer your question:
Yes, sort of. But "addiction to the target class" might not be accurate, as the model does not show any preference in its predictions when observing samples without the trigger (see the video demo here: https://drive.google.com/file/d/1e9iL99hOi3D6UmfjEUjv0lnFAtyrzIWw/view; the target class does not appear in the top-5 predictions until the trigger is revealed).

So, we are actually trying to train a model that is addicted to the Narcissus trigger associated with the target class. You can think of the Narcissus trigger as an optimized feature that does not originally come from the target class but "misrepresents" the target class so well that a model trained on the dataset preserves it as a robust indicator of the target class. Thus, samples from other classes will also be pulled into this "black hole."
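For concreteness, here is a minimal PyTorch-style sketch of that idea: optimizing a single bounded perturbation so that it minimizes a surrogate model's loss on target-class samples. The names (`surrogate_model`, `target_loader`), the input shape, and the hyperparameters are illustrative assumptions rather than the repository's actual code; see Narcissus.ipynb for the authors' implementation.

```python
import torch
import torch.nn.functional as F

def synthesize_trigger(surrogate_model, target_loader, target_label,
                       eps=16 / 255, epochs=10, lr=0.01, device="cpu"):
    """Hypothetical sketch of Narcissus-style trigger synthesis.

    One shared perturbation `delta` (l_inf-bounded by `eps`) is optimized so
    that it minimizes the surrogate model's loss on target-class samples,
    i.e., it becomes a feature that "misrepresents" the target class well.
    All names, shapes, and hyperparameters are illustrative assumptions.
    """
    surrogate_model = surrogate_model.to(device).eval()
    # Assumes CIFAR-10-like 3x32x32 inputs; adjust the shape for other data.
    delta = torch.zeros(1, 3, 32, 32, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(epochs):
        for x, _ in target_loader:              # target-class samples only
            x = x.to(device)
            y = torch.full((x.size(0),), target_label, device=device)
            logits = surrogate_model(torch.clamp(x + delta, 0.0, 1.0))
            loss = F.cross_entropy(logits, y)   # pull toward the target class
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            with torch.no_grad():               # project back into the eps-ball
                delta.clamp_(-eps, eps)
    return delta.detach()
```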

I hope the above explanation resolves your question. Follow-ups are always welcome.

@YiZeng623 added the question (Further information is requested) and good first issue (Good for newcomers) labels, then removed the question label, on Apr 10, 2022
@Buhua-Liu

Hi,

I wonder how your Narcissus attack performs when the poisoning rate is 0. If the ASR remains high when there is no explicit poisoning operation, your Narcissus trigger is basically equivalent to a targeted universal adversarial perturbation.

Thanks!

@YiZeng623
Member

Hi there,

Thanks for your interest in our work.

Your point is quite interesting, and if you only look at inference time, where both attacks apply a universal pattern (a UAP or a backdoor trigger), the procedures are indeed quite similar. However, sadly, Narcissus has no effect when no poisoning is performed (i.e., when the poison ratio is 0).

Additionally, we would like to highlight some more differences between our work and existing targeted UAP attacks:

  1. Adding to what we discussed, we now know that Narcissus requires the poisoning procedure. This is not a drawback. As a backdoor attack, we can launch the attack at inference time on the fly (no model details required, no synthesis after model training, etc.) as long as we know the user trained on a dataset that contains our poison.
  2. Existing targeted UAP attacks cannot achieve an ASR as high as ours, even with a larger l_p-norm budget.
  3. Existing effective targeted UAP attacks are mostly model-dependent. Even though there are some model-independent UAP attacks, their effectiveness is much weaker (their ASRs are too low). Our attack, on the other hand, can achieve a high ASR when we do not know the model (and Table V shows that our attack's effectiveness does not depend heavily on model-structure similarity).
  4. The optimization goals are also different: please refer to Equation (2) in the paper. Our goal is to synthesize a pattern that minimizes the loss within the target class using target-class samples, whereas UAP attacks normally use non-target-class samples (either in-distribution or out-of-distribution) to synthesize a pattern that minimizes the loss toward the target class (a paraphrased sketch of this objective follows the list).
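For reference, a paraphrased form of the objective in item 4 (my own notation, not the paper's exact Equation (2)) could be written as follows, where D_t is the set of target-class samples, f the surrogate model, y_t the target label, L the classification loss, and eps the l_p-norm budget:

```latex
% Hedged paraphrase of the trigger-synthesis objective (not the exact Eq. (2)):
\delta^{*} \;=\; \arg\min_{\lVert \delta \rVert_{p} \,\le\, \epsilon}
  \sum_{x \in \mathcal{D}_{t}} \mathcal{L}\!\left( f(x + \delta),\, y_{t} \right)
```

A targeted UAP, by contrast, would sum the same loss over non-target-class samples (in- or out-of-distribution) while still pushing predictions toward y_t.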

Yi

@Buhua-Liu

Hi, Yi!

Thanks for your explanation. Actually, what interests me most are the minimal poisoning rate and the clean-label setting of Narcissus. It does refresh my understanding of what makes backdoor attacks succeed. I have another question:

The test-stage trigger magnification seems to be a critical (but controversial in terms of stealthiness) design choice for the success of Narcissus. I wonder how the ASR changes when varying the magnification factor. (Although trigger stealthiness is quite an ambiguous notion in this field...)

@YiZeng623
Member

YiZeng623 commented Sep 26, 2022

Thanks for noticing that interesting detail ;) The trigger stealthiness we normally talk about is focused mostly on the training set. The reason is that, at training time, it is more reasonable for a user to evaluate or fully cleanse the data they will use for model training.

The scenario in the test phase (where we use trigger magnification) is different. Deployed models are usually required to respond quickly. For example, a face recognition model needs to give its response (e.g., whether the person trying to enter the room is authorized) without any delay. Such a requirement allows an attacker to reasonably assume that no detailed inspection will be enforced on the input data. Thus, it is easy for us to magnify the trigger during the test phase (triggers are also magnified in existing clean-label backdoor attacks, e.g., the label-consistent attack, Sleeper Agent, or the hidden-trigger backdoor attack).

As for the ASR without magnification, in our experiments Narcissus drops by about 60% on PubFig (it is still higher than the other attacks, though).
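To make the magnification step concrete, here is a minimal sketch of how such test-time magnification could look; the parameter name and default factor are illustrative assumptions, not the repository's exact code:

```python
import torch

def apply_trigger(x_test, trigger, magnification=3.0):
    """Hypothetical sketch of test-time trigger application.

    The attacker scales the synthesized trigger by `magnification` before
    adding it to a non-target test input, then clips back to valid pixels.
    The parameter name and default factor are illustrative assumptions.
    """
    return torch.clamp(x_test + magnification * trigger, 0.0, 1.0)
```

The unmagnified (norm-bounded) trigger is what would be added to the poisoned training samples, which is where stealthiness actually matters.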

@Tianyue818

Tianyue818 commented Nov 23, 2022

Hi,

The ASR of Narcissus on Tiny-ImageNet reaches 85+, which is amazing. Is there a visualization of the poisoned samples on Tiny-ImageNet? As far as I know, an attack that achieves such results at the same poisoning ratio generally requires label flipping and very obvious triggers. Since I couldn't get similar results following the description, could you please provide the related code for Tiny-ImageNet or a more specific experimental setup?

Thanks!

@YiZeng623
Member

YiZeng623 commented Nov 23, 2022

I appreciate your interest. Here are some visualizations attached below. The code we open-sourced is indeed under the settings of Tiny-ImageNet. You can go to the notebook file (Narcissus.ipynb) for a step-by-step implementation breakdown. Enjoy!

[Attachment: visualizations of poisoned Tiny-ImageNet samples]

@Tianyue818

Thank you for your kind reply!

According to README.md, the notebook file (Narcissus.ipynb) is only for CIFAR-10. I would also appreciate a new code release for Tiny-ImageNet.

@YiZeng623
Member

Yeah, my bad. Will be uploading it in the following week. Stay tuned!

@Tianyue818

Thank you!

@nguyenhongson1902

nguyenhongson1902 commented Apr 14, 2023

@YiZeng623 Thank you for your work. Could you elaborate more on Figure 1 in the paper, please? From what I understand, the red color is for target-class examples, and the yellow one is for non-target-class examples. I still don't get the idea of a clean target model, poisoned target model, and surrogate model in this image. Thank you so much.
[Attachment: Figure 1 from the paper]

@PZMDSB

PZMDSB commented Nov 25, 2024

Hello,

Could you please provide the related code for the other two datasets?

Thanks!
