This repository is a modified version of the original HiCu-ICD repository for the purposes of our course project. The original repository contains code for the paper HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding.
Our modified code adds support for the latest packages and libraries, as well as reduces the number of models to be trained for the purposes of our project.
-
Clone the repository to a local directory
-
Once you have gained access to the MIMIC-III v1.4 dataset (see link for requirements), download the required files and move them into a
/data
folder within the root of the repository. Use the following directory structure:HiCu-ICD/ (root) | ... (other files in the repository) └── data/ | | D_ICD_DIAGNOSES.csv | | D_ICD_PROCEDURES.csv | └───mimic3/ | | | NOTEEVENTS.csv | | | DIAGNOSES_ICD.csv | | | PROCEDURES_ICD.csv | | | train_full_hadm_ids.csv | | | train_50_hadm_ids.csv | | | dev_full_hadm_ids.csv | | | dev_50_hadm_ids.csv | | | test_full_hadm_ids.csv | | | test_50_hadm_ids.csv
The
*_hadm_ids.csv
files can be found here. -
Install the required packages using the following command:
# Python v3.11.7 is recommended pip install -r requirements.txt
-
Run the pre-processing script to generate the necessary sampled data files. The
--ratio
flag can be used to specify the ratio of the dataset to be used. For example, to use 5% of the dataset, run the following command:python preprocess_mimic3.py --ratio 0.05
-
Use the training scripts in the
./runs
directory to train each of the models. For example, to train the base MultiResCNN model, run the following command:./runs/run_multirescnn.sh
The evaluation metrics will be saved in the
./models
directory in a timestamped folder.