Our project focuses on assessing and improving local interpretability methods for LLMs in healthcare, using Meta AI's Open Pre-trained Transformers (OPT) model as a baseline classifier. We fine-tune the OPT model on the MIMIC-IV medical dataset, focusing on the free-text clinical notes from the MIMIC-IV-Note dataset. The data is cleaned and split by ICD-9 and ICD-10 codes, creating two separate datasets; the model is fine-tuned on each dataset to predict the corresponding ICD codes, and the interpretability methods are applied to the better-performing model. Post-training, the SHAP and LIME interpretability methods are applied to the fine-tuned model, and we assess their efficacy in terms of faithfulness, using a modified version of DeYoung et al.'s evaluation procedures.
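As a concrete illustration of the explanation stage, the sketch below generates a LIME explanation for a single note with an OPT-based classifier. The checkpoint name, label count, helper name, and note text are placeholders for this example, not the project's actual fine-tuned model or data.

```python
# Illustrative sketch: a LIME explanation for one clinical note with an OPT classifier.
# The checkpoint ("facebook/opt-125m"), num_labels, and the note text are placeholders.
import torch
from lime.lime_text import LimeTextExplainer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForSequenceClassification.from_pretrained("facebook/opt-125m", num_labels=2)
model.eval()

def predict_proba(texts):
    # Wrap the classifier so LIME can query class probabilities for perturbed notes.
    inputs = tokenizer(list(texts), return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer()
note = "Patient admitted with chest pain and shortness of breath."
explanation = explainer.explain_instance(note, predict_proba, num_features=10)
print(explanation.as_list())  # top rationale words with their weights
```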
- MIMIC-IV and MIMIC-IV-Note datasets
- see requirements.txt
- For faithfulness, there are several data processing steps that need to be taken (a sketch putting them together follows this list):
  - First, using your XAI method, split each input text string into the list of words used by that method. The final structure should be a list of lists, where each sublist contains all of the words, in order, for one sample.
  - Next, get the indices of the words used by the XAI method's explanation. These indices should be formatted into an array as follows:
    - [ [ indices of the corresponding text input samples ], [ indices of the words within each text sample ] ]
  - Next, pass the formatted text instances and the indices to the remove_rationalle_words and remove_other_words functions. These return the strings with the rationale words (or all non-rationale words, respectively) removed.
  - Finally, the instances, along with the arrays returned in the previous step, can be passed to the faithfulness function. Note that the remove_rationalle_words and remove_other_words outputs are expected to be wrapped in a larger array containing the explanations from all XAI functions.
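A minimal sketch of how these steps fit together, assuming the utility functions live in faithfulness_lime_utils.py. The import path, the call signatures (the real faithfulness function may also require the model or a prediction function), and the index values below are illustrative assumptions; check the utility file for the actual definitions.

```python
# Minimal sketch of the faithfulness preprocessing steps above. The import path and the
# call signatures are assumptions; check faithfulness_lime_utils.py for the real ones.
from faithfulness_lime_utils import remove_rationalle_words, remove_other_words, faithfulness

# Step 1: split each input text into the word list used by the XAI method
# (a list of lists, one ordered sublist of words per sample).
samples = ["patient admitted with acute chest pain", "discharged on oral antibiotics"]
split_samples = [text.split() for text in samples]

# Step 2: indices of the explanation's rationale words, formatted as
# [ [ sample indices ], [ word indices within each sample ] ].
# Here we pretend the explainer highlighted "chest pain" in sample 0 and
# "antibiotics" in sample 1 (illustrative values only).
rationale_indices = [
    [0, 0, 1],  # which text input sample each rationale word belongs to
    [4, 5, 3],  # position of that word within its sample
]

# Step 3: ablated copies of the inputs, with the rationale words removed and with
# everything except the rationale words removed.
without_rationale = remove_rationalle_words(split_samples, rationale_indices)
only_rationale = remove_other_words(split_samples, rationale_indices)

# Step 4: the faithfulness function expects these arrays wrapped in a larger array
# holding the explanations from all XAI functions (only one method here).
scores = faithfulness(split_samples, [without_rationale], [only_rationale])
```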
- faithfulness_calculation_lime_old.ipynb - An old faithfulness file, kept for debugging purposes
- faithfulness_calculation_lime_notebook.ipynb - The notebook based faithfulness calculation for LIME and OPT. Uses faithfulness_lime_utils.py
- faithfulness_calculation_lime_script.py - The same as the Jupyter notebook "faithfulness_calculation_lime_notebook.ipynb". This script was created because Jupyter notebooks occasionally have issues deallocating GPU memory. Use the job_faithfulness.sh script to run this file.
- shap_faithfulness_calculation.ipynb - File used to calculate faithfulness for SHAP.
- faithfulness_lime_utils.py - Utility file holding faithfulness calculation functions. Used for LIME.
- faithfulness_shap_utils.py - Utility file holding faithfulness calculation functions. Used for SHAP.
- Download the model weights file best_model_state.bin.
- Ensure that the preprocessed outputs are correctly named.
- Run the notebook.
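A minimal sketch of loading the downloaded weights before running the notebook, assuming best_model_state.bin is a PyTorch state_dict for an OPT sequence-classification model; the base checkpoint name and label count below are placeholders, not the project's actual values.

```python
# Minimal sketch of loading the fine-tuned weights before running the notebook.
# The base checkpoint ("facebook/opt-350m") and num_labels are placeholders; use the
# values from your own fine-tuning run. Assumes best_model_state.bin was saved with
# torch.save(model.state_dict(), ...).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/opt-350m", num_labels=50
)
state_dict = torch.load("best_model_state.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode for the interpretability notebooks
```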