Named Entity Recognition for Medical NEs

Files

combine_data.sh: Combine all the training datasets into one set of train, devel and test set.
dataloader.py: PyTorch custom dataloader.
model.py: PyTorch models.
ner.py: Get inference results.
train.py: Train the Medical NER model.
run.sh: Run training and a simple inference.
utils.py: Various utility files.
data/: Directory containing all the data.
models/: Directory containing all the trained *.pth pytorch models and related data.

Notebooks

EDA: notebooks/1. EDA.ipynb, also available as pdf in 1. EDA.pdf.
Classical ML models (lightGBM and CRF): notebooks/2. Classical Models.ipynb.
Bi-LSTM development notebook: notebooks/3. Deep Learning.ipynb.

Get NEs

python ner.py --f <model_path> --s "Optum is trying the best to help people suffering Angelman syndrome and colorectal, endometrial cancer. Treat with paracetamol"

Train

Run the following commands

bash download.sh #to download data [1]
bash combine_data.sh #to combine the downloaded data
bash run.sh #to start training

By default, the bi-lstm model trains on all the combined data. Run the following to train on specific data:

python train.py --train_f <train_data_path> --dev_f <devel_data_path>

[1]: The data and the download scripts are borrowed from here: https://github.com/dmis-lab/biobert

Pretrained models

Available in the models directory.

Note

The models are not optimal, not much effort has been spent on tuning them.
Number of layers, epochs and other model complexities have been kept to minimal due to computational limits.

TODO:

Explore more classical models.
Other seq2seq models like transformers and fine-tuned BERT or fine tuning modern models like GPT-2.
Log files.
Config files for hyperparamters.
Conduct tests using checklist

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Named Entity Recognition for Medical NEs

Files

Notebooks

Get NEs

Train

Pretrained models

Note

TODO:

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
__pycache__		__pycache__
data		data
models		models
notebooks		notebooks
1. EDA.pdf		1. EDA.pdf
README.md		README.md
combine_data.sh		combine_data.sh
dataloader.py		dataloader.py
download.sh		download.sh
model.py		model.py
ner.py		ner.py
run.sh		run.sh
train.py		train.py
utils.py		utils.py

vaibkumr/BioNER

Folders and files

Latest commit

History

Repository files navigation

Named Entity Recognition for Medical NEs

Files

Notebooks

Get NEs

Train

Pretrained models

Note

TODO:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages