Run:
conda env create -f cgrg.yml
conda activate cgrg
bash setup.sh
- Download Reddit data from the original git repo.
- Put the unzipped folder under ./data/dstc and name it ./data/dstc/raw
- You can skip the above two steps if you use the preprocessed files. Unzip them and put them under ./data. They include a toy test file. Note that the preprocessed files we provide are based on an earlier version of the Reddit dataset, which differs slightly from the version provided in the above GitHub repo.
- Download and unzip the folder containing the pretrained GPT2 model under the ./src folder.
You can create your own processed data in the same format as the preprocessed files linked in step 3. Each instance has the following fields:
instance index (order not required)
previous utterances
target response
grounding sentence s1
control phrase in s1
grounding sentence s2
control phrase in s2
...
...
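The field order above can be sketched as a small parser. This is only an illustration, not the repo's own loader: the `Instance` class, the `parse_fields` and `parse_line` helpers, and in particular the tab delimiter and one-instance-per-line layout are assumptions. Check the provided preprocessed files for the actual layout before relying on it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    """One processed example (hypothetical container, not from the repo)."""
    index: str
    context: str    # previous utterances
    response: str   # target response
    grounding: List[Tuple[str, str]]  # (grounding sentence, control phrase) pairs


def parse_fields(fields: List[str]) -> Instance:
    """Group a flat list of fields into one instance.

    Assumes the fields arrive in the order listed above and that every
    grounding sentence is immediately followed by its control phrase.
    """
    if len(fields) < 3:
        raise ValueError("expected at least index, context, and response")
    rest = fields[3:]
    if len(rest) % 2 != 0:
        raise ValueError("grounding sentences and control phrases must come in pairs")
    pairs = list(zip(rest[0::2], rest[1::2]))
    return Instance(fields[0], fields[1], fields[2], pairs)


def parse_line(line: str, sep: str = "\t") -> Instance:
    # Assumption: one instance per line, fields separated by `sep`.
    # Change `sep` if the real files use a different delimiter.
    return parse_fields(line.rstrip("\n").split(sep))
```

Calling `parse_line` on each line of a file and checking that no `ValueError` is raised is a quick way to validate a custom data file against this field order, provided the delimiter assumption holds.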
If you chose to use the preprocessed data in step 3 above, you can skip step 2 below. Step 3 will take some time.
cd prepare_data
bash preprocess.sh
bash prepare_model_inputs.sh
cd src
bash run.sh
See the requirements in the README file under ./eval. Run:
cd eval
python create_eval_files.py YOUR_OUTPUT_FILE_FROM_STEP_5_ABOVE
python dstc.py pred.txt -rf ref.txt
@inproceedings{wu-etal-2021-cgrg,
author = "Wu, Zeqiu and Galley, Michel and Brockett, Chris and Zhang, Yizhe and Gao, Xiang and Quirk, Chris and Koncel-Kedziorski, Rik and Gao, Jianfeng and Hajishirzi, Hannaneh and Ostendorf, Mari and Dolan, Bill",
title = "A Controllable Model of Grounded Response Generation",
booktitle = "AAAI",
year = "2021",
month = "January",
}