Four adversarial image crafting algorithms are implemented with TensorFlow. They can be found in the `attacks` folder. Each attack returns a TensorFlow operation that can be run via `sess.run(...)`.
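The library builds these attacks as symbolic TensorFlow ops, but the arithmetic of a single gradient-sign step can be sketched in plain NumPy (the `fgsm_step` helper and the toy tensors below are illustrative assumptions, not part of the package):

```python
import numpy as np

def fgsm_step(x, grad, eps=0.01, clip_min=0.0, clip_max=1.0):
    # Move each pixel by eps in the direction that increases the loss,
    # then clip back into the valid pixel range.
    return np.clip(x + eps * np.sign(grad), clip_min, clip_max)

# Toy 2x2 "image" and a made-up loss gradient.
x = np.array([[0.5, 0.2],
              [0.9, 0.1]])
grad = np.array([[0.3, -0.7],
                 [0.0, 0.2]])

x_adv = fgsm_step(x, grad, eps=0.1)
# Each pixel moved by +/- eps (or stayed put where the gradient is zero).
```

Running the op for `nb_epoch > 1` simply repeats this step, recomputing the gradient each time.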
- Fast gradient sign method (FGSM)

  ```
  fgsm(model, x, eps=0.01, nb_epoch=1, clip_min=0.0, clip_max=1.0)
  ```
- Target class gradient sign method (TGSM)

  ```
  tgsm(model, x, y=None, eps=0.01, nb_epoch=1, clip_min=0.0, clip_max=1.0)
  ```

  When `y=None`, this implements the least-likely class method. If `y` is an integer or a list of integers, the source image is modified towards label `y`.
- Jacobian-based saliency map approach (JSMA)

  ```
  jsma(model, x, y, nb_epoch=1.0, eps=1., clip_min=0.0, clip_max=1.0, pair=False, min_proba=0.0)
  ```

  `y` is the target label; it can be an integer or a list. When `nb_epoch` is a floating-point number in `[0, 1]`, it denotes the maximum distortion allowed, and the number of epochs is determined automatically. `min_proba` denotes the minimum confidence required for the target image. If `pair=True`, two pixels are modified at a time.
- Saliency map difference approach (SMDA)

  ```
  smda(model, x, y, nb_epoch=1.0, eps=1., clip_min=0.0, clip_max=1.0, min_proba=0.0)
  ```

  The interface is similar to `jsma`. The only difference is the saliency score calculation: in `jsma`, the saliency score is `dt/dx * (-do/dx)`, while in `smda` it is `dt/dx - do/dx`, which is simpler and more straightforward.
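The two scores can be compared concretely in NumPy (the gradient values below are made up for illustration; the actual code differentiates the model symbolically):

```python
import numpy as np

# dt/dx: derivative of the target-class score w.r.t. each pixel.
# do/dx: derivative of the sum of all other class scores w.r.t. each pixel.
dt_dx = np.array([0.8, -0.2, 0.5])
do_dx = np.array([-0.3, 0.4, 0.6])

# jsma-style score: the product form is large only when a pixel both
# helps the target class AND hurts the other classes.
jsma_score = dt_dx * (-do_dx)   # [0.24, 0.08, -0.3]

# smda-style score: the plain difference of the two effects.
smda_score = dt_dx - do_dx      # approx [1.1, -0.6, -0.1]

# Both approaches then modify the pixel with the highest score.
best_jsma = int(np.argmax(jsma_score))
best_smda = int(np.argmax(smda_score))
```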
- `ex_00.py` trains a simple CNN on MNIST, achieving ~99% accuracy. FGSM is then used to craft adversarial samples from the test data, on which the CNN accuracy drops to 0%, depending on your choice of `eps` and `nb_epoch`. The original labels for the following digits are 0 through 9; the predicted label and its probability are shown below each digit.
- `ex_01.py` creates cross-label adversarial images via the saliency map algorithm (JSMA), left image. In each row, the digit in the green frame is the natural one from which the others are crafted.
- `ex_02.py` creates cross-label adversarial images via the paired saliency map algorithm (JSMA2), right image.
- `ex_03.py` creates digits from blank images via the saliency difference algorithm (SMDA).
- `ex_04.py` creates digits from blank images via the paired saliency map algorithm (JSMA2).
- `ex_05.py` trains a simple CNN on MNIST, achieving ~99% accuracy. LLCM is then used to craft adversarial samples from the test data, on which the CNN accuracy drops to ~1%, depending on your choice of `eps` and `nb_epoch`. The original labels for the following digits are 0 through 9; the predicted label and its probability are shown below each digit.
- `ex_06.py` trains a CNN on CIFAR10, achieving ~85.02% accuracy. FGSM reduces the accuracy to ~0.22%. The following are some adversarial samples generated by FGSM.
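The accuracy-drop measurement these examples report can be sketched with a toy linear stand-in for the CNN (the logistic model, seed, and `eps` below are arbitrary assumptions for illustration; none of this comes from the example scripts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained classifier: predict 1 iff w @ x > 0.
w = rng.normal(size=8)
X = rng.normal(size=(100, 8))
y = (X @ w > 0).astype(int)   # label with the model itself, so
                              # clean accuracy is 100% by construction

def predict(X):
    return (X @ w > 0).astype(int)

def fgsm_linear(X, y, eps):
    # For logistic loss, dL/dx = (p - y) * w, so the sign of the
    # gradient is sign(w) for y == 0 and -sign(w) for y == 1.
    direction = np.where(y[:, None] == 1, -1.0, 1.0) * np.sign(w)
    return X + eps * direction

acc_clean = (predict(X) == y).mean()                      # 1.0
acc_adv = (predict(fgsm_linear(X, y, eps=1.0)) == y).mean()
# With a large enough eps, most samples cross the decision
# boundary and acc_adv collapses far below acc_clean.
```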
Currently there is a bug (keras/issues/5469) when using the `Dropout` layer in Keras on top of TensorFlow, so all `Dropout` layers are commented out. Pure TensorFlow code does not have this problem.