A question answering dataset for commonsense reasoning.
Check out the website!
You can download the data from the website, which also has an evaluation script. The leaderboard is for the random split of the data.
Our implementation is based on this code. To run it, follow these steps:
- Install ESIM dependencies:
cd esim
pip install -r requirements.txt
cd ..
- Place the dataset in the data/ folder.
- Set PYTHONPATH to the commonsenseqa directory:
export PYTHONPATH=$(pwd)
- Run the model with pre-trained GloVe embeddings:
python -m allennlp.run train esim/train-glove-csqa.json -s tmp --include-package esim
- Alternatively, run the model with pre-trained ELMo contextual embeddings:
python -m allennlp.run train esim/train-elmo-csqa.json -s tmp --include-package esim
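The two training commands differ only in the configuration file, so a small wrapper can select between them. The following is a minimal sketch, not part of the repository: the function name build_esim_command and the dictionary keys are hypothetical, while the config paths and flags are copied from the commands above.

```python
import sys

# Config files named in the commands above; the dict keys are illustrative labels.
CONFIGS = {
    "glove": "esim/train-glove-csqa.json",
    "elmo": "esim/train-elmo-csqa.json",
}


def build_esim_command(embedding, serialization_dir="tmp"):
    """Return the training command as an argument list (hypothetical helper)."""
    if embedding not in CONFIGS:
        raise ValueError(
            f"embedding must be one of {sorted(CONFIGS)}, got {embedding!r}"
        )
    return [
        sys.executable, "-m", "allennlp.run", "train", CONFIGS[embedding],
        "-s", serialization_dir,
        "--include-package", "esim",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_esim_command("glove")))
```

Passing the returned list to subprocess.run(cmd, check=True) from the commonsenseqa directory, with PYTHONPATH set as described above, should launch the same training run as the corresponding shell command.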
To run BERT on CommonsenseQA, first install the BERT dependencies:
cd bert/
pip install -r requirements.txt
Then, obtain the CommonsenseQA data and download the pretrained BERT weights. For the paper, we used BERT Large, Uncased. To train BERT Large, you'll most likely need to use a TPU; BERT Base can be trained on a standard GPU.
To run training:
GPU
python run_commonsense_qa.py \
--split=$SPLIT \
--do_train=true \
--do_eval=true \
--data_dir=$DATA_DIR \
--vocab_file=$BERT_DIR/vocab.txt \
--bert_config_file=$BERT_DIR/bert_config.json \
--init_checkpoint=$BERT_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=16 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=$OUTPUT_DIR
TPU
python run_commonsense_qa.py \
--split=$SPLIT \
--use_tpu=true \
--tpu_name=$TPU_NAME \
--do_train=true \
--do_eval=true \
--data_dir=$DATA_DIR \
--vocab_file=$BERT_DIR/vocab.txt \
--bert_config_file=$BERT_DIR/bert_config.json \
--init_checkpoint=$BERT_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=16 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=$OUTPUT_DIR
For TPUs, all directories must be in Google Storage. The environment variables have the following meanings:
- $SPLIT should be either rand or qtoken, depending on the split you'd like to run.
- $DATA_DIR is a location for the CommonsenseQA data.
- $BERT_DIR is a location for the pre-trained BERT files.
- $TPU_NAME is the name of the TPU.
- $OUTPUT_DIR is the directory to write output to.
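A mistyped split name or a local path on a TPU run may only surface once the job has started, so it can help to check the variables first. The sketch below is an illustration rather than code from the repository: check_settings is a hypothetical helper, and it assumes that Google Storage locations are written with the gs:// prefix.

```python
import os

VALID_SPLITS = ("rand", "qtoken")
DIR_VARS = ("DATA_DIR", "BERT_DIR", "OUTPUT_DIR")


def check_settings(env, use_tpu=False):
    """Return a list of problems; an empty list means the settings look fine."""
    problems = []
    if env.get("SPLIT") not in VALID_SPLITS:
        problems.append(
            f"SPLIT should be one of {VALID_SPLITS}, got {env.get('SPLIT')!r}"
        )
    for name in DIR_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif use_tpu and not value.startswith("gs://"):
            # Assumption: Google Storage locations use the gs:// scheme.
            problems.append(f"{name} must be a gs:// path when running on a TPU")
    if use_tpu and not env.get("TPU_NAME"):
        problems.append("TPU_NAME is not set")
    return problems


if __name__ == "__main__":
    # Check the current shell environment for a GPU run and report any issues.
    for problem in check_settings(os.environ, use_tpu=False):
        print(problem)
```

The function takes any mapping, so it can be called with os.environ or with a plain dictionary when experimenting.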
To predict on the test set, run:
GPU (only)
python run_commonsense_qa.py \
--split=$SPLIT \
--do_predict=true \
--data_dir=$DATA_DIR \
--vocab_file=$BERT_DIR/vocab.txt \
--bert_config_file=$BERT_DIR/bert_config.json \
--init_checkpoint=$TRAINED_CHECKPOINT \
--max_seq_length=128 \
--output_dir=$OUTPUT_DIR
Prediction must be run on a GPU (including for BERT Large). All environment variables have the same meanings, and the new variable $TRAINED_CHECKPOINT is simply the prefix for your trained checkpoint files from fine-tuning BERT. It should look something like $OUTPUT_DIR/model.ckpt-1830.
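To avoid copying the step number by hand, the prefix can be derived from the files in the output directory. This is a minimal sketch under the assumption that each checkpoint is saved as TensorFlow-style files, with a model.ckpt-<step>.index file next to the .meta and .data-* files; latest_checkpoint_prefix is a hypothetical helper, not part of the repository.

```python
import os
import re

# Assumed naming: model.ckpt-<step>.index marks one saved checkpoint.
_INDEX_RE = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(output_dir, filenames=None):
    """Return output_dir/model.ckpt-<highest step>, or None if none is found."""
    if filenames is None:
        # Only works for local directories, not Google Storage paths.
        filenames = os.listdir(output_dir)
    best_step, best_name = -1, None
    for name in filenames:
        match = _INDEX_RE.match(name)
        if match and int(match.group(2)) > best_step:
            # Compare steps numerically so that 1830 beats 999.
            best_step, best_name = int(match.group(2)), match.group(1)
    if best_name is None:
        return None
    return output_dir.rstrip("/") + "/" + best_name
```

The returned string can then be exported as $TRAINED_CHECKPOINT. If TensorFlow is available, tf.train.latest_checkpoint(output_dir) is the library's own way to look up the most recent prefix.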