- MEDDOPROF-NER: Named Entity Recognition to extract entities related to occupation and employment status.
- MEDDOPROF-NORM: Normalize the extracted entities to codes.
- NER: Conditional Random Fields (CRF) using hand-crafted features.
- NORM: Vector embedding similarity.
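This README does not describe the exact CRF feature set or the embedding model, so the following is only a minimal, stdlib-only sketch of the two ideas listed above, under stated assumptions. `token_features` builds the kind of per-token feature dictionary (word shape, suffix, neighbouring words) commonly fed to dict-based CRF trainers. `normalize` picks the code whose embedding has the highest cosine similarity to a mention's embedding. All function names, feature names, codes and vectors here are hypothetical, not this repository's implementation.

```python
import math
from typing import Dict, List, Mapping, Sequence


def token_features(tokens: Sequence[str], i: int) -> Dict[str, object]:
    """Hypothetical hand-crafted features for token i (word shape, suffix, context)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
    }
    # Context window of one token on each side; BOS/EOS mark sentence edges.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: Sequence[str]) -> List[Dict[str, object]]:
    """One feature dict per token: the input shape dict-based CRF trainers expect."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalize(mention_vec: Sequence[float], code_vecs: Mapping[str, Sequence[float]]) -> str:
    """Return the code whose (hypothetical) embedding is most similar to the mention's."""
    return max(code_vecs, key=lambda code: cosine(mention_vec, code_vecs[code]))


# Toy usage: "Trabaja como enfermera ." ("Works as a nurse.")
feats = sentence_features(["Trabaja", "como", "enfermera", "."])
print(feats[2]["word.lower"], feats[2]["-1:word.lower"])  # enfermera como
print(normalize([0.9, 0.1], {"CODE_A": [1.0, 0.0], "CODE_B": [0.0, 1.0]}))  # CODE_A
```

In a real pipeline the feature dictionaries would be passed to a CRF trainer and the vectors would come from a pretrained embedding model; both are assumptions outside this sketch.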
https://temu.bsc.es/meddoprof/
- Genre: Medical documents
- Language: Spanish
```shell
pip install -r requirements.txt
```
Example commands:
- Train the model:
  ```shell
  python -u -m src.crf --train_model <path_trained_model> --flag_train
  ```
- Predict:
  ```shell
  python -u -m src.crf --train_model <path_trained_model> --flag_predict
  ```
- Evaluate:
  - Using the task organizers' evaluation script: https://github.com/TeMU-BSC/meddoprof-evaluation-library
  - Token-level evaluation using seqeval:
    ```shell
    python -u -m src.crf --data_dir <test_data_dir_with_ground_truth> --train_model <path_trained_model> --flag_evaluate
    ```
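seqeval scores entities reconstructed from BIO tag sequences rather than individual tags: by default, a predicted entity counts as correct only if its type and token boundaries exactly match a gold entity. The sketch below is a simplified, stdlib-only re-implementation of that micro-averaged matching for illustration (BIO scheme only). It is not seqeval's code, and the helper names and tag names are illustrative assumptions.

```python
from typing import List, Set, Tuple


def bio_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Extract (entity_type, start, end_exclusive) spans from one BIO tag sequence."""
    spans: List[Tuple[str, int, int]] = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            # Any O, B-X, or I-X that does not continue the open entity closes it.
            if etype is not None:
                spans.append((etype, start, i))
            # B-X and a stray I-X both open a new entity (conlleval-style leniency).
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def micro_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    gold_set: Set[tuple] = {(s, *sp) for s, tags in enumerate(gold) for sp in bio_spans(tags)}
    pred_set: Set[tuple] = {(s, *sp) for s, tags in enumerate(pred) for sp in bio_spans(tags)}
    tp = len(gold_set & pred_set)  # exact type + boundary matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-PROFESION", "I-PROFESION", "O"], ["O", "B-SITUACION_LABORAL"]]
pred = [["B-PROFESION", "I-PROFESION", "O"], ["O", "O"]]
print(micro_prf(gold, pred))  # (1.0, 0.5, 0.6666666666666666)
```

The missed entity in the second sentence lowers recall but not precision, which mirrors the precision/recall gap visible in the test-split results.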
MEDDOPROF-NER micro-averaged metrics:

| Metric | Train | Test |
|---|---|---|
| Precision | 0.953 | 0.807 |
| Recall | 0.839 | 0.524 |
| F-score | 0.892 | 0.635 |
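For reference, micro-averaging pools true positives (TP), false positives (FP) and false negatives (FN) across all entity types $c$ before taking the ratios, and the F-score is the harmonic mean of the resulting precision and recall. Plugging in the rounded test-split values from the table reproduces the reported F-score:

$$
P = \frac{\sum_c \mathrm{TP}_c}{\sum_c (\mathrm{TP}_c + \mathrm{FP}_c)}, \qquad
R = \frac{\sum_c \mathrm{TP}_c}{\sum_c (\mathrm{TP}_c + \mathrm{FN}_c)}, \qquad
F = \frac{2PR}{P + R}
$$

$$
F_{\text{test}} = \frac{2 \times 0.807 \times 0.524}{0.807 + 0.524} = \frac{0.845736}{1.331} \approx 0.635
$$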
MEDDOPROF-NORM micro-averaged metrics:

| Metric | Train | Test |
|---|---|---|
| Precision | 0.956 | 0.720 |
| Recall | 0.840 | 0.467 |
| F-score | 0.894 | 0.566 |
Occupation Recognition and Normalization in Clinical Notes by Kaushik Acharya
Spanish to English translation using Neural Machine Translation