We introduce a Visual Dialog task in radiology. The general-domain task description can be found here.
We provide baseline models and results for a Visual Dialog task that uses MIMIC-CXR [1] chest X-ray images and their associated reports. Our silver-standard dataset is constructed with the CheXpert labeler.
Our baseline models include:
- LateFusion [2] model (provided with the general-domain challenge starter code).
- Recursive Visual Attention [3] model, winner of the 2019 general-domain challenge (repository).
- Stacked Attention Network [4]. We modify the model architecture to take the history of the dialog turns into account; a sketch of this change follows the list.
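A minimal sketch of the kind of change we make to SAN, assuming the dialog history is encoded by its own LSTM and fused with the question encoding to form the attention query (class names, dimensions, and the two-hop setup are illustrative, not the repo's actual modules):

```python
import torch
import torch.nn as nn

class HistoryAwareSAN(nn.Module):
    """Sketch: fuse a dialog-history encoding into the SAN attention query."""

    def __init__(self, emb_dim=300, hid_dim=512, img_feat_dim=2048, num_hops=2):
        super().__init__()
        self.ques_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.hist_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Project the concatenated (question, history) states to the query size.
        self.query_proj = nn.Linear(2 * hid_dim, hid_dim)
        self.img_proj = nn.Linear(img_feat_dim, hid_dim)
        # One attention layer per hop, as in the original SAN.
        self.att_layers = nn.ModuleList(
            [nn.Linear(hid_dim, 1) for _ in range(num_hops)]
        )

    def forward(self, img_feats, ques_emb, hist_emb):
        # img_feats: (B, R, img_feat_dim) regional image features
        # ques_emb:  (B, Tq, emb_dim) embedded question tokens
        # hist_emb:  (B, Th, emb_dim) embedded, concatenated dialog history
        _, (q, _) = self.ques_rnn(ques_emb)
        _, (h, _) = self.hist_rnn(hist_emb)
        query = self.query_proj(torch.cat([q[-1], h[-1]], dim=-1))  # (B, hid)

        v = torch.tanh(self.img_proj(img_feats))  # (B, R, hid)
        for att in self.att_layers:
            scores = att(torch.tanh(v + query.unsqueeze(1)))  # (B, R, 1)
            alpha = torch.softmax(scores, dim=1)              # attention over regions
            query = query + (alpha * v).sum(dim=1)            # refine the query
        return query  # fused representation fed to the answer decoder
```

Concatenating the history state with the question state before attention is one simple fusion choice; other fusion schemes are possible.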
Our models are implemented in PyTorch. Install the dependencies with:

```sh
pip install -r requirements.txt
```
To train one of the three models (LateFusion by default), run the training script:
```sh
python train.py \
    --train_json <path_to_train_json> \
    --val_json <path_to_val_json> \
    --train_img_feats <path_to_train_img_features> \
    --val_img_feats <path_to_val_img_features> \
    --word_counts <path_to_word_count_json> \
    --output_dir <path_to_output_dir>
```
You can select a different model by passing the `--model` argument; valid options are `lf`, `rva`, and `san`.
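For example, to train the Recursive Visual Attention model with the same arguments as above:

```sh
python train.py --model rva \
    --train_json <path_to_train_json> \
    --val_json <path_to_val_json> \
    --train_img_feats <path_to_train_img_features> \
    --val_img_feats <path_to_val_img_features> \
    --word_counts <path_to_word_count_json> \
    --output_dir <path_to_output_dir>
```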
To use pre-trained word embeddings, pass an extra `--embeddings <path_to_pickled_embeddings_dict>` argument. The MedNLI domain-specific embeddings used in our experiments can be found here.
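The embeddings file is a pickled Python dict; a minimal sketch of producing one, assuming it maps tokens to NumPy vectors (the file name, token set, and 300-dimensional size below are illustrative):

```python
import pickle
import numpy as np

# Build a {token: vector} dict and pickle it in the format the
# --embeddings flag expects (the vectors here are random stand-ins
# for real pre-trained embeddings).
embeddings = {
    "pneumonia": np.random.rand(300).astype(np.float32),
    "effusion": np.random.rand(300).astype(np.float32),
}

with open("mednli_embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```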
You can track training progress with TensorBoard by running `tensorboard --logdir ./logs --port 8008` and navigating to `localhost:8008` in your browser.
To evaluate a trained model, run the `evaluate.py` script:
```sh
python evaluate.py \
    --test_json <path_to_visdial_json> \
    --test_img_feats <path_to_test_img_features> \
    --word_counts <path_to_train_word_count> \
    --model_path <path_to_saved_model_weights> \
    --model "lf"
```
[1] MIMIC-CXR: A large publicly available database of labeled chest radiographs.
[2] Visual Dialog.
[3] Recursive Visual Attention in Visual Dialog.
[4] Stacked Attention Networks for Image Question Answering.