multiwoz is an open source toolkit for building end-to-end trainable task-oriented dialogue models. It is released by Paweł Budzianowski from Cambridge Dialogue Systems Group under Apache License 2.0.
Model | Joint Accuracy | Slot |
---|---|---|
MDBT (Ramadan et al., 2018) | 15.57 | 89.53 |
GLAD (Zhong et al., 2018) | 35.57 | 95.44 |
GCE (Nouri and Hosseini-Asl, 2018) | 36.27 | 98.42 |
TRADE (Wu et al, 2019) | 48.62 | 96.92 |
Model | INFORM | SUCCESS | BLEU |
---|---|---|---|
Baseline (Budzianowski et al. 2018) | 71.29 | 60.96 | 18.8 |
LaRL (Zhao et al. 2019) | 82.78 | 79.2 | 12.8 |
Model | SER | BLEU |
---|---|---|
Baseline (Budzianowski et al. 2018) | 2.99 | 0.632 |
Python 2 with pip
In repo directory:
To download and pre-process the data run:
python create_delex_data.py
To train the model run:
python train.py [--args=value]
Some of these args include:
// hyperparamters for model learning
--max_epochs : numbers of epochs
--batch_size : numbers of turns per batch
--lr_rate : initial learning rate
--clip : size of clipping
--l2_norm : l2-regularization weight
--dropout : dropout rate
--optim : optimization method
// network structure
--emb_size : word vectors emedding size
--use_attn : whether to use attention
--hid_size_enc : size of RNN hidden cell
--hid_size_pol : size of policy hidden output
--hid_size_dec : size of RNN hidden cell
--cell_type : specify RNN type
To evaluate the run:
python test.py [--args=value]
The following benchmark results were produced by this software. We ran a small grid search over various hyperparameter settings and reported the performance of the best model on the test set. The selection criterion was 0.5match + 0.5success+100*BLEU on the validation set. The final parameters were:
// hyperparamters for model learning
--max_epochs : 20
--batch_size : 64
--lr_rate : 0.005
--clip : 5.0
--l2_norm : 0.00001
--dropout : 0.0
--optim : Adam
// network structure
--emb_size : 50
--use_attn : True
--hid_size_enc : 150
--hid_size_pol : 150
--hid_size_dec : 150
--cell_type : lstm
If you use any source codes or datasets included in this toolkit in your work, please cite the corresponding papers. The bibtex are listed below:
[Budzianowski et al. 2018]
@inproceedings{budzianowski2018large,
Author = {Budzianowski, Pawe{\l} and Wen, Tsung-Hsien and Tseng, Bo-Hsiang and Casanueva, I{\~n}igo and Ultes Stefan and Ramadan Osman and Ga{\v{s}}i\'c, Milica},
title={MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling},
booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2018}
}
[Ramadan et al. 2018]
@inproceedings{ramadan2018large,
title={Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing},
author={Ramadan, Osman and Budzianowski, Pawe{\l} and Gasic, Milica},
booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
volume={2},
pages={432--437},
year={2018}
}
If you have found any bugs in the code, please contact: pfb30 at cam dot ac dot uk