guacamolia/visual_dialog_cxr
Visual Dialog for Radiology

Introducing a Visual Dialog task in radiology. The general-domain task description can be found here.

Introduction

We provide baseline models and results for a Visual Dialog task that uses MIMIC-CXR [1] chest X-ray images and their associated reports. Our silver-standard dataset is constructed using the CheXpert labeling tool.

Our baseline models include:

  • LateFusion [2] model (provided with the general-domain challenge starter code).
  • Recursive Visual Attention [3] model, the 2019 winner of the general-domain challenge (repository).
  • Stacked Attention Network [4], with modifications to the architecture so that the model takes the history of dialog turns into account.
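To give a rough sense of the late-fusion approach, the sketch below scores candidate answers by encoding each modality independently, concatenating the encodings, and projecting into a shared space. This is a toy NumPy illustration of the general idea only, not the repository's PyTorch implementation; all dimensions and names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def late_fusion_scores(img_feat, ques_feat, hist_feat, opt_feats, W):
    """Score answer options by fusing per-modality encodings.

    Image, question, and history encodings are concatenated ("late"
    fusion), projected into the option-embedding space, and scored
    against every candidate answer by dot product.
    """
    fused = np.concatenate([img_feat, ques_feat, hist_feat])  # (d_i + d_q + d_h,)
    joint = np.tanh(W @ fused)                                # (d_opt,)
    return opt_feats @ joint                                  # one score per option

# Toy dimensions: 8-d image, 4-d question, 4-d history, 6-d options, 100 candidates
W = rng.standard_normal((6, 16))
scores = late_fusion_scores(
    rng.standard_normal(8), rng.standard_normal(4),
    rng.standard_normal(4), rng.standard_normal((100, 6)), W)
ranked = np.argsort(-scores)  # candidate answer indices, best first
```

The attention-based baselines (RVA, SAN) differ in how the image and history are attended over before fusion, but the final answer-ranking step is analogous.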

Prerequisites

Our models are implemented in PyTorch. Install the dependencies with:

pip install -r requirements.txt

Usage

To train one of the three models (LateFusion by default), run the training script:

python train.py \
    --train_json <path_to_train_json>  \
    --val_json <path_to_val_json> \
    --train_img_feats <path_to_train_img_features> \
    --val_img_feats <path_to_val_img_features> \
    --word_counts <path_to_word_count_json> \
    --output_dir <path_to_output_dir>

You can select a different model by passing the --model argument; valid options are lf, rva, and san. To use pre-trained word embeddings, pass an extra argument as --embeddings <path_to_pickled_embeddings_dict>. The MedNLI domain-specific embeddings used in our experiments can be found here.
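For reference, a pickled embeddings dict of this kind can be turned into an embedding matrix roughly as follows. This is a hedged sketch: the `{word: vector}` pickle format, the 300-d dimension, and the random initialization of out-of-vocabulary words are assumptions, not guaranteed properties of the MedNLI file or of this repository's loader.

```python
import pickle

import numpy as np

def build_embedding_matrix(pickle_path, vocab, dim=300):
    """Build a |vocab| x dim matrix from a pickled {word: vector} dict.

    Words missing from the dict are initialized from a small random
    normal, a common fallback; the pickle layout here is an assumption.
    """
    with open(pickle_path, "rb") as f:
        vectors = pickle.load(f)
    rng = np.random.default_rng(0)
    matrix = np.empty((len(vocab), dim), dtype=np.float32)
    for idx, word in enumerate(vocab):
        vec = vectors.get(word)
        matrix[idx] = vec if vec is not None else rng.normal(0.0, 0.1, dim)
    return matrix
```

The resulting matrix can then be used to initialize an embedding layer in place of randomly initialized weights.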

You can track training progress with TensorBoard: run tensorboard --logdir ./logs --port 8008 and navigate to localhost:8008 in your browser.

To test a trained model, run the evaluate.py script:

python evaluate.py \
    --test_json <path_to_visdial_json> \
    --test_img_feats <path_to_test_img_features> \
    --word_counts <path_to_train_word_count> \
    --model_path <path_to_saved_model_weights> \
    --model "lf"
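Visual Dialog models are conventionally evaluated with retrieval metrics over a list of candidate answers (recall at k, mean reciprocal rank, mean rank). As a minimal sketch of how such metrics are computed from the 1-based rank of the ground-truth answer at each dialog round (an illustration of the standard metrics, not necessarily the exact output of evaluate.py):

```python
def retrieval_metrics(gt_ranks, ks=(1, 5, 10)):
    """Compute standard Visual Dialog ranking metrics.

    gt_ranks: 1-based rank of the ground-truth answer among the
    candidate options, one entry per dialog round.
    """
    n = len(gt_ranks)
    metrics = {f"recall@{k}": sum(r <= k for r in gt_ranks) / n for k in ks}
    metrics["mrr"] = sum(1.0 / r for r in gt_ranks) / n
    metrics["mean_rank"] = sum(gt_ranks) / n
    return metrics

# Example: ground truth ranked 1st, 3rd, and 12th over three rounds
metrics = retrieval_metrics([1, 3, 12])
```

Higher recall and MRR, and lower mean rank, indicate a better model.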

References

[1]: MIMIC-CXR: A Large Publicly Available Database of Labeled Chest Radiographs

[2]: Visual Dialog

[3]: Recursive Visual Attention in Visual Dialog

[4]: Stacked Attention Networks for Image Question Answering
