Reimplementation of "A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks"
The code is based on the official repository [2] (PyTorch) and Hugging Face.
Original paper: see [1] below.
Training environment: Ubuntu 18.04, Python 3.6
pip3 install torch torchvision torchaudio
pip install scikit-learn
Download the bert-base-uncased checkpoint from hugginface-ckpt
Download the bert-base-uncased vocab file from hugginface-vocab
Download the CLINC OOS intent detection benchmark dataset from tensorflow-dataset
The downloaded files should be placed in the following layout:
Mahalanobis-BERT
ㄴckpt
    ㄴbert-base-uncased-pytorch_model.bin
ㄴdataset
    ㄴclinc_oos
        ㄴtrain.csv
        ㄴval.csv
        ㄴtest.csv
        ㄴtest_ood.csv
ㄴvocab
    ㄴbert-base-uncased-vocab.txt
ㄴmodels
...
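After downloading, a quick sanity check that the checkpoint and vocab file are readable could look like the sketch below. The paths simply mirror the layout above; this script is not part of main.py.

```python
# Minimal sanity check for the downloaded files (illustrative, not part of main.py).
import torch

# The Hugging Face checkpoint is a plain state dict mapping parameter names to tensors.
state_dict = torch.load("ckpt/bert-base-uncased-pytorch_model.bin", map_location="cpu")
print(f"{len(state_dict)} tensors, first key: {next(iter(state_dict))}")

# The vocab file lists one WordPiece token per line (about 30k entries for bert-base-uncased).
with open("vocab/bert-base-uncased-vocab.txt", encoding="utf-8") as f:
    vocab = [line.rstrip("\n") for line in f]
print(f"{len(vocab)} vocab entries, [CLS] at index {vocab.index('[CLS]')}")
```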
In their paper, the authors conducted an OOD experiment for NLP using the CLINC OOS intent detection benchmark dataset. The OOS dataset contains 150 in-domain services with 150 training sentences each, as well as 1,500 natural out-of-domain utterances. You can download the dataset at Link.
Original dataset paper and GitHub: Paper Link, Git Link
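If the dataset is obtained as data_full.json from the original CLINC repository (Git Link) instead of ready-made CSV files, the sketch below shows one way to produce the files listed in the layout above. The JSON split keys follow the original repository; the two-column text,label CSV format is only an assumption about what this repo's data loader expects, so adjust it to match main.py.

```python
# Hypothetical conversion from the original CLINC data_full.json to the CSV files
# expected in dataset/clinc_oos/. The text,label column layout is an assumption.
import csv
import json

with open("data_full.json", encoding="utf-8") as f:
    data = json.load(f)  # keys include "train", "val", "test", "oos_test", ...

splits = {
    "train": "dataset/clinc_oos/train.csv",
    "val": "dataset/clinc_oos/val.csv",
    "test": "dataset/clinc_oos/test.csv",
    "oos_test": "dataset/clinc_oos/test_ood.csv",
}

for split, path in splits.items():
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["text", "label"])
        for utterance, intent in data[split]:  # each entry is an [utterance, intent] pair
            writer.writerow([utterance, intent])
```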
python main.py --train_or_test train --device gpu --gpu 0
python main.py --train_or_test test --device gpu --gpu 0
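For reference, the score used at test time follows the Mahalanobis detector of [1]: fit class-conditional Gaussians with a single tied covariance to the in-domain training features, then score a test sample by the negative of its minimum Mahalanobis distance to any class mean. The sketch below illustrates this scoring rule on precomputed features (e.g. BERT [CLS] embeddings); it omits the paper's input preprocessing and layer ensembling, and the function names are illustrative rather than taken from main.py.

```python
# Illustrative Mahalanobis confidence score from [1] on precomputed feature vectors.
import numpy as np
from sklearn.covariance import EmpiricalCovariance

def fit_class_gaussians(train_feats: np.ndarray, train_labels: np.ndarray):
    """Per-class means and a single precision matrix shared (tied) across classes."""
    classes = np.unique(train_labels)
    means = np.stack([train_feats[train_labels == c].mean(axis=0) for c in classes])
    centered = np.concatenate(
        [train_feats[train_labels == c] - means[i] for i, c in enumerate(classes)]
    )
    precision = EmpiricalCovariance(assume_centered=True).fit(centered).precision_
    return means, precision

def mahalanobis_score(feats: np.ndarray, means: np.ndarray, precision: np.ndarray):
    """M(x) = max_c -(f(x) - mu_c)^T Sigma^{-1} (f(x) - mu_c); higher = more in-domain."""
    diffs = feats[:, None, :] - means[None, :, :]              # shape (N, C, D)
    dists = np.einsum("ncd,de,nce->nc", diffs, precision, diffs)
    return -dists.min(axis=1)

# Usage (hypothetical variable names): threshold the score, or compute AUROC
# between the test and test_ood splits.
# means, precision = fit_class_gaussians(train_feats, train_labels)
# scores = mahalanobis_score(test_feats, means, precision)
```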
[1] https://arxiv.org/pdf/1807.03888.pdf
[2] https://github.com/pokaxpoka/deep_Mahalanobis_detector