Note: This repository is part of an assignment given in the Information Communication Theory (情報伝達学) lecture at Tohoku University. Students were expected to do feature engineering with CRFsuite, but I personally preferred implementing an RNN.
This is an implementation of Named Entity Recognition (NER) models based on Recurrent Neural Networks (RNNs). The models are heavily inspired by the following papers:
- Jason P. C. Chiu and Eric Nichols. "Named Entity Recognition with Bidirectional LSTM-CNNs." Transactions of the Association for Computational Linguistics 4 (2016): 357-370.
- James Hammerton. "Named Entity Recognition with Long Short-Term Memory." Proceedings of the Seventh Conference on Natural Language Learning (CoNLL) at HLT-NAACL 2003, pages 172-175.
- Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. "Neural Architectures for Named Entity Recognition." Proceedings of NAACL-HLT 2016, pages 260-270.
Note that this repo is not a re-implementation of these models. The purpose is to see how performance improves as the architecture becomes more complex (e.g. LSTM --> Bidirectional LSTM --> Bidirectional LSTM with character encoding). These models can presumably be applied to other sequence-labeling tasks as well, but I have not tried this yet.
The following models are implemented in Chainer:
- LSTM (Model.py/NERTagger)
- Bi-directional LSTM (Model.py/BiNERTagger)
- Bi-directional LSTM with Character-based encoding (Model.py/BiCharNERTagger)
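For illustration, here is a minimal sketch of the bidirectional variant. This is not the repo's actual code: it uses the Chainer v2+ API (`init_scope`), and the class name is hypothetical.

```python
import chainer
import chainer.links as L

class ToyBiLSTMTagger(chainer.Chain):
    """Minimal BiLSTM tagger sketch: embed -> BiLSTM -> per-token tag scores."""

    def __init__(self, n_vocab, n_tags, n_units):
        super(ToyBiLSTMTagger, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            # one bidirectional LSTM layer; each token gets 2 * n_units features
            self.bilstm = L.NStepBiLSTM(1, n_units, n_units, dropout=0.0)
            self.out = L.Linear(2 * n_units, n_tags)

    def __call__(self, xs):
        # xs: list of int32 arrays, one array of word ids per sentence
        exs = [self.embed(x) for x in xs]
        _, _, hs = self.bilstm(None, None, exs)
        # unnormalized tag scores for every token of every sentence
        return [self.out(h) for h in hs]
```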
The following variants add a CRF layer as the final layer. This loss function is much better than simple cross entropy because it (implicitly) respects the constraints of the BIO tagging scheme (e.g. I-PER must not directly follow O):
- LSTM (CRFModel.py/CRFNERTagger)
- Bi-directional LSTM (CRFModel.py/CRFBiNERTagger)
- Bi-directional LSTM with Character-based encoding (CRFModel.py/CRFBiCharNERTagger)
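Chainer ships a linear-chain CRF as chainer.links.CRF1d; whether the repo uses it directly is not stated here, but the following sketch shows the idea with illustrative values (the tag count is an assumption):

```python
import numpy as np
import chainer.links as L

n_tags = 9  # assumption: BIO tags for four entity types plus O
crf = L.CRF1d(n_tags)

# ys: one (batch, n_tags) score array per time step (batch = 1, length = 3),
# e.g. the per-token outputs of a BiLSTM like the one sketched above
ys = [np.random.randn(1, n_tags).astype(np.float32) for _ in range(3)]
ts = [np.array([t], dtype=np.int32) for t in (1, 2, 0)]  # gold tag ids

loss = crf(ys, ts)            # negative log-likelihood of the gold sequence,
                              # including learned tag-transition costs
score, path = crf.argmax(ys)  # Viterbi decoding of the best tag sequence
```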
- Python 3.*
- Chainer 1.19 (or higher)
- Pretrained word vectors (e.g. GloVe)
  - The scripts will still work (and learn) without them, but performance deteriorates significantly (see the papers above for details). A download example follows this list.
- CoNLL 2003 Dataset
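For example, the standard 6B-token GloVe vectors can be fetched from the Stanford NLP site (pick the dimensionality that matches your --unit setting):

```
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip   # contains glove.6B.50d.txt, 100d, 200d, 300d variants
```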
Place the CoNLL datasets (train, dev, test) in data/ and run preprocess.sh. This converts the raw datasets into a model-readable JSON format. Then run generate_vocab.py and generate_char_vocab.py to generate the vocabulary files. A typical sequence of commands is sketched below.
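A minimal sketch of the preprocessing steps. The eng.* names are the standard CoNLL 2003 distribution file names; whether the vocabulary scripts take extra arguments is an assumption, so check each script's --help.

```
cp /path/to/eng.train data/train   # standard CoNLL 2003 file names assumed
cp /path/to/eng.testa data/dev
cp /path/to/eng.testb data/test
./preprocess.sh                    # raw data -> JSON
python generate_vocab.py           # word vocabulary
python generate_char_vocab.py      # character vocabulary
```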
- Training the model without a CRF layer: train_model.py
- Training the model with a CRF layer: train_crf_model.py

Both scripts accept exactly the same options:
    usage: train_model.py [-h] [--batchsize BATCHSIZE] [--epoch EPOCH] [--gpu GPU]
                          [--out OUT] [--resume RESUME] [--test] [--unit UNIT]
                          [--glove GLOVE] [--dropout] --model-type MODEL_TYPE
                          [--final-layer FINAL_LAYER]

    optional arguments:
      -h, --help            show this help message and exit
      --batchsize BATCHSIZE, -b BATCHSIZE
                            Number of examples in each mini-batch
      --epoch EPOCH, -e EPOCH
                            Number of sweeps over the dataset to train
      --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
      --out OUT, -o OUT     Directory to output the result
      --resume RESUME, -r RESUME
                            Resume the training from snapshot
      --test                Use tiny datasets for quick tests
      --unit UNIT, -u UNIT  Number of LSTM units in each layer
      --glove GLOVE         path to glove vector
      --dropout             use dropout?
      --model-type MODEL_TYPE
                            bilstm / lstm / charlstm
      --final-layer FINAL_LAYER
                            loss function
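For example, training the bidirectional character-level model with dropout on GPU 0 might look like this (the GloVe path is a placeholder):

```
python train_model.py --model-type charlstm \
    --glove path/to/glove.6B.100d.txt --unit 100 \
    --epoch 20 --batchsize 32 --dropout --gpu 0 --out result
```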
- Testing the model without a CRF layer: predict.py
- Testing the model with a CRF layer: crf_predict.py
Options:

    optional arguments:
      -h, --help            show this help message and exit
      --unit UNIT, -u UNIT  Number of LSTM units in each layer
      --glove GLOVE         path to glove vector
      --model-type MODEL_TYPE
                            bilstm / lstm / charlstm
      --model MODEL         path to model file
      --dev                 If true, use validation data
Do not forget to specify --model-type and --model (the path to the trained model file).
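For example (the snapshot name under result/ is a placeholder; use whatever file your training run saved):

```
python predict.py --model-type charlstm \
    --glove path/to/glove.6B.100d.txt --unit 100 \
    --model result/model_snapshot
```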
Performance (accuracy/precision/recall/F-score) can be measured with conlleval.pl (not included in this repo).
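conlleval.pl reads whitespace-separated columns per token from stdin, with the gold tag and the predicted tag as the last two columns. Assuming the prediction script writes such a file (predictions.txt is a placeholder name), evaluation looks like:

```
perl conlleval.pl < predictions.txt
```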
Model | CRF? | Precision | Recall | F-Score |
---|---|---|---|---|
LSTM | No | 70.87 | 65.38 | 68.01 |
Bi-LSTM | No | 76.41 | 74.39 | 75.39 |
Bi-Char-LSTM | No | 84.93 | 81.65 | 83.26 |
LSTM | Yes | 75.49 | 77.17 | 76.32 |
Bi-LSTM | Yes | 79.71 | 81.49 | 80.59 |
Bi-Char-LSTM | Yes | 84.17 | 83.80 | 83.98 |
Training curves:
- Epoch vs. training loss
- Epoch vs. validation loss
- Epoch vs. training accuracy
- Epoch vs. validation accuracy