
What about tsuc and ttr on ImageNet (ILSVRC2012)? #1

Open
yrj1409 opened this issue Nov 18, 2021 · 11 comments
@yrj1409

yrj1409 commented Nov 18, 2021

I ran the code with the dataset changed from CIFAR-10 to the ILSVRC2012 validation set. The white-box success rate is always 0, which is a bit strange.
Moreover, I think the calculation of 'utr' and 'ttr' is incorrect. Suppose we have two minibatches of size 32. In one minibatch, wb_error is 3 and 2 of those samples transfer successfully, so utrs for that minibatch is 66%. In the other minibatch, wb_error is 32 but none of them transfers, so utrs is 0%. According to the code, utr will be 33% = (66% + 0%)/2, but I think it should actually be 2/(3 + 32), not 33%.
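The difference between the two ways of averaging can be seen directly with the numbers above (a toy sketch, using the hypothetical batch counts from this comment):

```python
# Batch A: 3 white-box errors, 2 of them transfer.
# Batch B: 32 white-box errors, 0 of them transfer.
wb_errors = [3, 32]    # samples misclassified by the white-box model, per batch
transferred = [2, 0]   # of those, samples also misclassified by the black-box model

# What the code currently computes: the mean of per-batch ratios.
utr_mean_of_ratios = sum(t / e for t, e in zip(transferred, wb_errors)) / len(wb_errors)

# What it should compute: one ratio over the pooled counts.
utr_pooled = sum(transferred) / sum(wb_errors)

print(round(utr_mean_of_ratios, 4))  # 0.3333
print(round(utr_pooled, 4))          # 0.0571
```

The mean of ratios gives every batch equal weight regardless of how many white-box errors it contributes, which is exactly the bias described above.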

@QwQ2000
Owner

QwQ2000 commented Nov 20, 2021

Thank you for your comments.

I think your criticism of the calculation of 'utr' and 'ttr' is reasonable. The construction of the DataLoader in the eval function should be adjusted: we could set the drop_last argument to True, or simply set the batch_size to 1, as the last incomplete batch causes the problem you mentioned.

This was coursework for one of my undergraduate courses, so it's not really something serious and it certainly needs improvement. Please give me more details about the failure on the ILSVRC2012 validation set, and I'm willing to help solve the problem. Pull requests are welcome.

@yrj1409
Author

yrj1409 commented Nov 24, 2021

Hello, I think the problem with 'utr' and 'ttr' is not related to the last batch. It is caused by the discrepancy in wb_error (white-box error) among mini-batches. Suppose mini-batches A and B both contain 32 samples (batch_size=32). In A, only 3 samples are misclassified by the white-box model, and 2 of them transfer successfully to the black-box model. In B, 30 samples are misclassified by the white-box model, but none transfers to the black-box model.
According to the code, 'utr' is computed as mean(2/3, 0/30) = 1/3.
But it should actually be (2+0)/(3+30), i.e. (successfully transferred samples)/(samples that successfully attack the wb_model). The calculation of 'ttr' has the same problem.
For ILSVRC2012, I just use the standard validation set, which contains 50,000 images. The models are obtained from the PyTorch (torchvision) pretrained models:
wb_model = torchvision.models.resnet18(pretrained=True)
bb_model = torchvision.models.densenet121(pretrained=True)
Other code is unchanged.

@QwQ2000
Owner

QwQ2000 commented Nov 24, 2021

1. 'utr'/'ttr' problem
Would setting the batch_size of the DataLoader in the eval function to 1 solve this problem?

loader = DataLoader(ds, batch_size=1, shuffle=True, pin_memory=True, num_workers=4)

2. ImageNet problem
Maybe this code segment will work.

#eval.py
from torchvision.datasets import ImageNet
from torchvision.models import resnet18, densenet121

# ImageNet requires the dataset root directory as its first argument
ds = ImageNet('path/to/ILSVRC2012', split='val')

wb_eval_model = resnet18(pretrained=True).to(device)
wb_model = ResNet18FeatureExtractor(wb_eval_model)

bb_model = densenet121(pretrained=True).to(device)

#……

@yrj1409
Author

yrj1409 commented Nov 29, 2021

  1. I think setting batch_size to 1 gives a correct result. But if we want batch_size to be an arbitrary value, the formula for calculating utr needs to be modified as:
    .... wb_error = f(wb_res != src_label)
    .... wb_errors.append(wb_error)
    .... utrs.append(f((tr_res != src_label) & (wb_res != src_label)))
    utr = sum(utrs) / sum(wb_errors)
    # utr = (successfully transferred samples) / (samples successfully attacking the wb_model)
    For ttr, it has the same problem.
  2. For ImageNet, I think the code is okay, but the targeted success rate wb_tsuc is always 0. It seems a little strange.
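The per-batch-count approach in point 1 can be sketched end to end with NumPy (the names wb_res, tr_res, and src_label follow the repo's code; the random data here is only a stand-in for real predictions):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10

# Hypothetical per-batch predictions standing in for the repo's
# wb_res (white-box predictions on adversarial inputs), tr_res
# (black-box predictions), and src_label (ground-truth labels).
wb_errors, utrs = [], []
for _ in range(5):
    src_label = rng.integers(0, num_classes, size=32)
    wb_res = rng.integers(0, num_classes, size=32)
    tr_res = rng.integers(0, num_classes, size=32)

    # Accumulate raw counts per batch, not per-batch ratios.
    wb_errors.append(np.sum(wb_res != src_label))
    utrs.append(np.sum((tr_res != src_label) & (wb_res != src_label)))

# Pool the counts across all batches, then divide once.
utr = sum(utrs) / sum(wb_errors)
assert 0.0 <= utr <= 1.0
```

Because the division happens only once, over pooled counts, batches with few white-box errors no longer get the same weight as batches full of them.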

@QwQ2000
Owner

QwQ2000 commented Nov 30, 2021

I've found that my re-implementation may have some problems with model.eval()/model.train(). In ActivationAttacker.generate, the white-box model should be set to eval mode, otherwise the BatchNorm layers can produce incorrect results.
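The train/eval discrepancy is easy to demonstrate in isolation: in train mode BatchNorm normalizes with the current batch's statistics, while in eval mode it uses the stored running statistics (a freshly constructed layer has running_mean=0, running_var=1, so in eval mode it leaves the input essentially unchanged):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4) * 3 + 5   # batch with non-trivial mean/variance

bn = nn.BatchNorm1d(4)
bn.train()
y_train = bn(x)                 # normalized with *batch* statistics

bn_eval = nn.BatchNorm1d(4)     # identical fresh layer, but in eval mode
bn_eval.eval()
y_eval = bn_eval(x)             # normalized with *running* statistics (0, 1)

print(bool(y_train.mean().abs() < 1e-4))       # True: re-centered per batch
print(torch.allclose(y_eval, x, atol=1e-3))    # True: input passes through
```

During attack generation, feeding adversarial batches through a model left in train mode both distorts the outputs and silently corrupts the running statistics, which would explain inconsistent evaluation numbers.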

@yrj1409
Author

yrj1409 commented Dec 1, 2021

Yes, bb_model should be set to eval mode as well.

@QwQ2000
Owner

QwQ2000 commented Dec 1, 2021

Could you please kindly help me test the results under the corrected evaluation process and eval mode? I'm too busy to maintain this project at the moment - at least until my winter vacation. Pull requests would be highly welcome.

@yrj1409
Author

yrj1409 commented Dec 13, 2021

I think the evaluation process is easy enough. But I wonder whether it is proper to evaluate these models on CIFAR-10, since its images are low resolution (32x32) and these models were designed for the ImageNet dataset.

@QwQ2000
Owner

QwQ2000 commented Dec 14, 2021

According to the CVPR 2019 paper, I think it's OK to use these models.
Models designed for low-resolution datasets may work better than those designed for general purposes.

@yrj1409
Author

yrj1409 commented Dec 29, 2021

I have run the code. ResNet-18 is set as the wb_model and DenseNet-121 is the black-box model.
The CIFAR-10 test-set accuracy is "ResNet-18 acc = 94.35%" and "DenseNet-121 acc = 95.71%".
In eval.py, I set wb_model and bb_model to evaluation mode: wb_model.eval(), bb_model.eval().
Moreover, I modified the calculation of ttr and utr as follows:
for ... in tqdm(...):
.... wb_errors.append(f(wb_res != src_label))
.... errors.append(f(tr_res != src_label))
.... utrs.append(f((tr_res != src_label) & (wb_res != src_label)))
.... wb_tsucs.append(f(wb_res == tgt_label))
.... tsucs.append(f(tr_res == tgt_label))
.... ttrs.append(f((tr_res == tgt_label) & (wb_res == tgt_label)))
utr, ttr = np.sum(utrs) / np.sum(wb_errors), np.sum(ttrs) / np.sum(wb_tsucs)
Then I get these results:
wb_error: 0.9639, wb_tsuc: 0.7548, bb_error: 0.816, bb_tsuc: 0.0939, utr: 0.8227, ttr: 0.1065

@QwQ2000
Owner

QwQ2000 commented Dec 29, 2021

Thank you for your help! I have committed your changes to eval.py and the experiment results to the repository.
