FireBERT

Hardening BERT classifiers against adversarial attack

Gunnar Mein, UC Berkeley MIDS Program ([email protected])
Kevin Hartman, UC Berkeley MIDS Program ([email protected])
Andrew Morris, UC Berkeley MIDS Program ([email protected])

Many thanks to our advisors, Mike Tamir, Daniel Cer, and Mark Butler, for their guidance on this research, and to our significant others for their patience as the three of us hunkered down over the three-month project.

Note: This repository was kept anonymous while the paper was in blind review.

Paper

Please read our paper: FireBERT 1.0. When citing our work, please include a link to this repository.

Instructions

The best way to run our project is to download the .zip files attached to release v1.0. Expand "data.zip" into a "data" folder, and expand "resources-1.zip" and "resources-2.zip" into a "resources" folder.
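For example, on a Unix-like system with unzip available (assuming the archives do not already contain the top-level folders; adjust the destinations if your layout differs):

    # expand the release archives into the folders the notebooks expect
    unzip data.zip -d data
    unzip resources-1.zip -d resources
    unzip resources-2.zip -d resources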

  • To obtain the values from Tables 1 and 2 in the "Results" section of the paper, run the respective "eval_xxxx.ipynb" notebooks (interactively, or non-interactively as sketched after this list).

  • To obtain the values from Table 3, run "generate_adversarials.ipynb".

  • To tune a basic BERT model, run the "bert_xxxx_tuner.ipynb" notebooks.

  • To co-tune FACT on synthetic adversarials, run the "firebert_xxxx_and_adversarial_co-tuner.ipynb" notebooks.

  • To recreate the illustrations in the "Analysis" section, experiment with "analysis.ipynb". It can produce many different graphs; try changing the values at the top of the cells.
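The notebooks can be opened interactively in Jupyter. If you prefer to run one non-interactively, one option is jupyter nbconvert (the filename below is a placeholder; substitute the notebook you actually want to run):

    # execute a notebook in place without opening the browser UI
    jupyter nbconvert --to notebook --execute --inplace eval_xxxx.ipynb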

Major prerequisites

  • tensorflow 2.1 or higher (GPU preferred)

  • torch, torchvision (PyTorch) 1.3.1 or higher

  • pytorch-lightning 0.7.1 or higher

  • transformers (Hugging Face) 2.5.1 or higher
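
One possible way to install these with pip (the version pins below are simply the lower bounds listed above; exact builds, CUDA variants, and newer releases may differ):

    # install the major dependencies at or above the listed minimum versions
    pip install "tensorflow>=2.1" "torch>=1.3.1" torchvision "pytorch-lightning>=0.7.1" "transformers>=2.5.1"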

Hardware and run-time expectations

The authors used 9th-generation Intel i7 personal computers with 64 GB of main memory and NVIDIA 2080 (Max-Q and Ti) graphics cards, as well as various GCP instances. Full evaluation runs on pre-made adversarial samples complete in a few hours. Active-attack benchmarks with TextFooler finish within hours for MNLI but can take days for IMDB. Co-tuning for FACT is expected to run for several hours.