This is a CNTK implementation of a self-attentive model. This repo applies it to text classification.
- python 3.6
- cntk 2.4 for GPU
- numpy
The toy dataset used is ATIS, taken from CNTK Tutorial 202: Language Understanding with Recurrent Networks.
Download the ATIS training and test datasets.
A larger dataset is also supported: the AG's News Topic Classification Dataset.
This implementation is based on the paper "A Structured Self-Attentive Sentence Embedding".
Inspired by the TensorFlow implementation in this repo.
- Baseline: embedding + stabilizer + bi-GRU (150 units per direction) + fc + fc
- Self-Attentive: embedding + stabilizer + bi-GRU (150 units per direction) + self-attention + fc + fc
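The self-attention step between the bi-GRU and the fully connected layers can be sketched as follows. This is a minimal NumPy illustration of the structured self-attention from the paper (not the repo's CNTK code); the variable names `W_s1`, `W_s2`, `d_a`, and `r` follow the paper's notation, and the shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """Structured self-attention over bi-GRU outputs.

    H:    (n, 2u)   hidden states for a sentence of n tokens
    W_s1: (d_a, 2u) first projection
    W_s2: (r, d_a)  r attention hops
    Returns M (r, 2u), the sentence embedding, and A (r, n),
    the attention matrix: A = softmax(W_s2 tanh(W_s1 H^T)).
    """
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=-1)  # (r, n)
    M = A @ H                                         # (r, 2u)
    return M, A
```

Each of the `r` rows of `A` is a distribution over the tokens, so the sentence is summarized from `r` different "views" before being flattened into the fc layers.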
Toy dataset training:
unzip ag_data.zip
unzip toy_data.zip
python selfAtt.py --lr 0.03 --dataset toy --max_epoch 5 --batch_size 60 --self_attention
- Add the attention penalization term from the paper
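The penalization term referred to above is, in the paper, the Frobenius-norm penalty P = ||AA^T − I||²_F on the attention matrix, which pushes the r attention hops to focus on different parts of the sentence. A minimal NumPy sketch (illustrative, not the repo's CNTK code):

```python
import numpy as np

def attention_penalty(A):
    """Frobenius-norm penalty ||A A^T - I||_F^2 on an (r, n)
    attention matrix A, as proposed in the paper. It is zero when
    the r attention rows are mutually orthogonal one-hot vectors."""
    r = A.shape[0]
    return float(np.sum((A @ A.T - np.eye(r)) ** 2))
```

In training this would be scaled by a coefficient and added to the classification loss.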