Broken attack methods in Athena #16

Open
andrewwunderlich opened this issue Oct 20, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@andrewwunderlich

First, some context: I am on Windows, and I have successfully created adversarial examples using the FGSM, PGD, and Spatial Transformation methods. I'm fairly confident that I know how to create attacks with the provided framework, so I believe this is a real bug rather than user error.

Several of Athena's included attacks appear to have no effect on the images. Specifically, the JSMA and DeepFool attacks produce no observable change, even at attack intensities much higher than the default values. (There may be other broken attacks that I have not tried yet; if I find more, I will update this thread.) Additionally, the undefended model's predictions are identical for these attacks and for the benign examples in every case, which is further evidence that the attacks are not doing anything.

For example, this is one of the attack configs from attack-zk-mnist.json:

    "configs14": {
        "attack": "jsma",
        "description": "jsma_theta0.3",
        "theta": 0.3,
        "gamma": 0.7
    }

The chosen values of theta = 0.3 and gamma = 0.7 are higher than the default values of theta = 0.15 and gamma = 0.5, so this attack should really be doing something noticeable to the image and should be tricking the undefended model in at least some cases. However, as you can see, the images look completely unperturbed:
image
image
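
For a sense of scale, here is a minimal sketch of the pixel budget implied by the config above. It assumes that `gamma` is the maximum fraction of input features the attack may modify, which is the usual JSMA parameterization in common toolkits; whether Athena's wrapper interprets it the same way is an assumption. `jsma_pixel_budget` is a hypothetical helper, not part of Athena.

```python
import math


def jsma_pixel_budget(image_shape, gamma):
    """Upper bound on the number of features JSMA may modify.

    Assumes gamma is the maximum fraction of features perturbed.
    """
    n_features = math.prod(image_shape)
    return math.floor(n_features * gamma)


# 28x28x1 MNIST image with the gamma from configs14 in the issue
print(jsma_pixel_budget((28, 28, 1), 0.7))
# Same image with the default gamma reported in the issue
print(jsma_pixel_budget((28, 28, 1), 0.5))
```

Under that assumption, a budget of several hundred pixels should leave visible changes in at least some images, which is consistent with the expectation stated above.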

I can provide more code snippets if desired but I am not sure what else would be useful at the moment, as it seems that the source of the error is not in my own attack generating script, but rather in one of the deeper methods supplied by the project source code.

andrewwunderlich added the bug label Oct 20, 2020
@MENG2010
Member

Under investigation.
The baseline AEs were generated using a different toolkit, so the values may differ.

@cjshearer

cjshearer commented Oct 24, 2020

I have the same problem. I have generated 10 different variations of the spatial transformation attack, none of which fool the undefended model (UM) and all of which have exactly the same error rate. I also see that the images are not transformed at all. Here is a link to the current commit. Only 5 are shown in the attack config, but 10 have been added to the /data/ folder. What should I do regarding the task1 report?

@andrewwunderlich
Author

Actually, I have been successful with the spatial transformation attack; that one works fine for me. @cjshearer I'm curious what values you are using for rotation and translation. Keep in mind that you might need large values to generate errors, because CNNs have good spatial invariance properties. I tried rotations from 10 to 50 degrees and found that rotations above 30 degrees generated a lot of errors.
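
A sweep like the one described above could be sketched as follows. This is only an illustration: `rotate_fn` and `predict_fn` are hypothetical placeholders for whatever attack and model wrappers are in use, not Athena APIs.

```python
import numpy as np


def error_rate(predictions, labels):
    """Fraction of samples whose predicted class differs from the true label."""
    return float(np.mean(np.asarray(predictions) != np.asarray(labels)))


def sweep_rotations(images, labels, rotate_fn, predict_fn, angles):
    """Return {angle: error rate} for each rotation angle in degrees.

    rotate_fn(images, angle) must return the rotated images, and
    predict_fn(images) must return predicted class labels. Both are
    placeholders supplied by the caller.
    """
    return {
        angle: error_rate(predict_fn(rotate_fn(images, angle)), labels)
        for angle in angles
    }
```

If every angle yields the same error rate as the benign data, either the model is invariant over that range or the transformation is not being applied, and the images themselves need to be inspected to tell the two apart.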

@cjshearer

@andrewwunderlich That's a good point about CNNs having good spatial invariance. If you only saw errors above 30-degree rotations, then it's not surprising that I had no luck: I capped rotations at 30 degrees and restricted translations/rotations to 10% or 20% of the image pixels (based on the loss landscapes on page 5 of this paper).

Here is a screenshot of the values I used for the spatial-transform attack:
image

Ultimately, I abandoned spatial-transform in favor of BIM.

@andrewwunderlich
Author

Yeah, I see. I wouldn't expect those attacks to be very effective at fooling the CNN. In any case, you should be able to tell whether the attack itself is working just by looking at the images. Regardless of how effective the spatial transformation attack was, I could clearly see in the image plots that the images were being rotated. By contrast, with JSMA and DeepFool the attacked images were completely identical to the benign images, so it was clear that those attack methods were broken.
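
The visual comparison described above can also be done numerically. The following is a minimal sketch, assuming the benign and adversarial examples are available as NumPy arrays of the same shape; `perturbation_report` is a hypothetical helper, not part of Athena.

```python
import numpy as np


def perturbation_report(benign, adversarial, atol=1e-8):
    """Summarize how much an attack changed each sample.

    benign, adversarial: arrays of identical shape (n_samples, ...).
    A sample counts as changed if any value differs by more than atol.
    """
    benign = np.asarray(benign, dtype=np.float64)
    adversarial = np.asarray(adversarial, dtype=np.float64)
    if benign.shape != adversarial.shape:
        raise ValueError("benign and adversarial arrays must have the same shape")

    # Flatten each sample so that norms are computed per sample
    diff = (adversarial - benign).reshape(len(benign), -1)
    linf = np.abs(diff).max(axis=1)
    l2 = np.linalg.norm(diff, axis=1)

    return {
        "fraction_changed": float((linf > atol).mean()),
        "max_linf": float(linf.max()),
        "mean_l2": float(l2.mean()),
    }
```

A `fraction_changed` of 0 would confirm that an attack returned its inputs unmodified, independent of how the plots look or what the model predicts.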


3 participants