This repository contains source code for the research work described in our AAAI 2021 paper:
Generating Natural Language Attacks in a Hard Label Black Box Setting
The hard label attack has also been implemented in the TextAttack library.
Follow these steps to run the attack from the library:
- Fork and clone the TextAttack repository.
- Install it by running the following commands:
$ cd TextAttack
$ pip install -e ".[dev]"
- Run the following command to attack bert-base-uncased trained on the MovieReview dataset:
$ textattack attack --recipe hard-label-attack --model bert-base-uncased-mr --num-examples 100
Take a look at the models directory in TextAttack to run the attack across any dataset and any target model.
Requirements:
- PyTorch >= 0.4
- TensorFlow >= 1.0
- NumPy
- Python >= 3.6
- TensorFlow 2.1.0
- TensorFlow Hub
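To confirm an environment matches the list above, a small check script can help. This is a minimal sketch, not part of the repository: it assumes the pip distribution names are `torch`, `tensorflow`, `tensorflow-hub` and `numpy`, uses the minimum versions listed above (taking TensorFlow >= 1.0 as the lower bound), and relies on `importlib.metadata`, which needs Python 3.8 or newer. The helper names are hypothetical.

```python
"""Check installed package versions against the README's minimums (sketch)."""
from importlib import metadata

# ASSUMPTION: pip distribution names; minimums copied from the list above.
# None means the README gives no minimum version for that package.
MINIMUMS = {
    "torch": "0.4",
    "tensorflow": "1.0",
    "tensorflow-hub": None,
    "numpy": None,
}


def version_tuple(version):
    """Turn '2.1.0' or '1.13.1+cu117' into a comparable tuple of ints."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at suffixes such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (or no minimum is set)."""
    if minimum is None:
        return True
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'missing' or '<version> (ok|too old)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

The comparison is deliberately simple (numeric components only); for pre-release or unusual version strings, the `packaging` library's version parser would be more robust.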
- Download the pretrained target models (BERT, LSTM, CNN) for each dataset and unzip them.
- Download the counter-fitted vectors from here and place them in the main directory.
- Download the top-50 synonym file from here and place it in the main directory.
- Download the 200-dimensional GloVe vectors from here and unzip them.
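The flag name `--counter_fitting_cos_sim_path` suggests the top-50 synonym file holds each word's nearest neighbours by cosine similarity in the counter-fitted embedding space. We have not verified the exact format of that file, so the sketch below is illustrative only: it assumes the common whitespace-separated text format used by the GloVe and counter-fitted vector releases (one word followed by its float components per line) and computes the top-k cosine neighbours of a word. `load_vectors` and `top_k_neighbours` are hypothetical helpers, not functions from this repository.

```python
import heapq
import math


def load_vectors(lines):
    """Parse 'word v1 v2 ...' lines into {word: unit-normalised vector}.

    ASSUMPTION: whitespace-separated text format, as in GloVe and the
    counter-fitted vectors.
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        norm = math.sqrt(sum(v * v for v in values))
        if norm > 0:  # zero vectors have no defined cosine similarity
            vectors[word] = [v / norm for v in values]
    return vectors


def top_k_neighbours(word, vectors, k=50):
    """Return [(other_word, cosine_similarity), ...] for the k closest words."""
    query = vectors[word]
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = (
        (sum(a * b for a, b in zip(query, vec)), other)
        for other, vec in vectors.items()
        if other != word
    )
    return [(other, sim) for sim, other in heapq.nlargest(k, scored)]
```

Usage would look like `with open("counter-fitted-vectors.txt", encoding="utf-8") as f: vectors = load_vectors(f)` followed by `top_k_neighbours("good", vectors)`, with the file name being an assumption. Pure Python keeps the sketch dependency-free; for the full vocabulary, a NumPy matrix product over all normalised vectors would be far faster.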
Use the following command to get the results for the BERT model:
python3 classification_attack.py \
--dataset_path path_to_data_samples_to_attack \
--target_model type_of_target_model \
--counter_fitting_cos_sim_path path_to_top_50_synonym_file \
--target_dataset dataset_to_attack \
--target_model_path path_to_pretrained_target_model \
--USE_cache_path " " \
--max_seq_length 256 \
--sim_score_window 40 \
--nclasses classes_in_the_dataset_to_attack
Here type_of_target_model is one of bert, wordCNN or wordLSTM, and dataset_to_attack is one of imdb, ag, yelp, yahoo or mr.
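When running many attacks, it is easy to pass the wrong `--nclasses` for a dataset. A minimal sketch of a wrapper that assembles the `classification_attack.py` command from the flags documented above could look like this. `build_classification_cmd` is a hypothetical helper; the class count for imdb comes from the IMDB example in this README, while the other counts are assumptions based on the standard versions of these datasets (MR and Yelp polarity binary, AG News 4 classes, Yahoo! Answers 10 classes).

```python
import shlex

# imdb=2 is taken from the IMDB example in this README. The remaining values
# are ASSUMPTIONS based on the standard datasets (Yelp = polarity version).
NCLASSES = {"imdb": 2, "mr": 2, "yelp": 2, "ag": 4, "yahoo": 10}
TARGET_MODELS = {"bert", "wordCNN", "wordLSTM"}


def build_classification_cmd(dataset, target_model, target_model_path,
                             dataset_path, synonym_file="mat.txt",
                             max_seq_length=256, sim_score_window=40):
    """Assemble the argument list for classification_attack.py (hypothetical helper)."""
    if target_model not in TARGET_MODELS:
        raise ValueError(f"unknown target model: {target_model}")
    if dataset not in NCLASSES:
        raise ValueError(f"unknown dataset: {dataset}")
    return [
        "python3", "classification_attack.py",
        "--dataset_path", dataset_path,
        "--target_model", target_model,
        "--counter_fitting_cos_sim_path", synonym_file,
        "--target_dataset", dataset,
        "--target_model_path", target_model_path,
        "--USE_cache_path", " ",
        "--max_seq_length", str(max_seq_length),
        "--sim_score_window", str(sim_score_window),
        "--nclasses", str(NCLASSES[dataset]),
    ]


if __name__ == "__main__":
    cmd = build_classification_cmd("imdb", "bert", "bert/imdb", "data/imdb")
    # Print a shell-quoted version; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell-quoting problems with values such as the `" "` passed to `--USE_cache_path`.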
Example of attacking BERT on the IMDB dataset:
python3 classification_attack.py \
--dataset_path data/imdb \
--target_model bert \
--counter_fitting_cos_sim_path mat.txt \
--target_dataset imdb \
--target_model_path bert/imdb \
--USE_cache_path " " \
--max_seq_length 256 \
--sim_score_window 40 \
--nclasses 2
Example of attacking BERT on the SNLI dataset:
python3 nli_attack.py \
--dataset_path data/snli \
--target_model bert \
--counter_fitting_cos_sim_path mat.txt \
--target_dataset snli \
--target_model_path bert/snli \
--USE_cache_path "nli_cache" \
--sim_score_window 40
The results are written to the results_hard_label directory for classification tasks and to the results_nli_hard_label directory for entailment (NLI) tasks.
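To find the output of a finished run, a minimal sketch that maps each task to its script and results directory (both names taken from this README) and lists the files written there could look like this. The README does not describe the file names or formats inside these directories, so the sketch only lists files; `TASKS` and `result_files` are hypothetical names.

```python
from pathlib import Path

# Script and results directory per task, as named in this README.
TASKS = {
    "classification": ("classification_attack.py", "results_hard_label"),
    "nli": ("nli_attack.py", "results_nli_hard_label"),
}


def result_files(task, root="."):
    """List files in the results directory for `task` (hypothetical helper).

    Returns an empty list if the directory does not exist yet, e.g. before
    the first attack has finished.
    """
    _script, results_dir = TASKS[task]
    directory = Path(root) / results_dir
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    for task in TASKS:
        print(task, [p.name for p in result_files(task)])
```

Run it from the repository's main directory (the default `root="."`), or pass `root` explicitly if the attacks were launched elsewhere.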
For attacking other target models, see the commands folder.
To train BERT on a particular dataset, use the commands provided in the BERT directory. To train the LSTM and CNN models, run train_classifier.py --<model_name> --<dataset>.
@article{maheshwary2020generating,
title={Generating Natural Language Attacks in a Hard Label Black Box Setting},
author={Maheshwary, Rishabh and Maheshwary, Saket and Pudi, Vikram},
journal={arXiv preprint arXiv:2012.14956},
year={2020}
}