Some ML models for retrieval
- VSM (updated 2018/11/13)
- SMN (updated 2018/12/04)
- Bert (updated 2019/02/02)
- LightGBM (updated 2019/03/24)
- DMN (updated 2019/02/02)
- flask (updated 2019/02/02)
- bert_embedding (updated 2019/03/24)
- word2vec (updated 2019/03/24)
- wordNet (updated 2019/03/24)
VSM = Vector Space Model
A hand-written VSM implementation.
retrieval
├── utils
│   └── utils.py // public helper functions
└── vsm
    ├── pre.sh // data preprocessing shell script
    └── vsm.py // VSM implementation
VSM process:
- word alignment
- TF-IDF (smoothing, similarity)
- pairwise calculation
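The steps above can be sketched in plain Python. This is an illustrative reimplementation (smoothed TF-IDF plus cosine similarity), not the actual code in vsm.py:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Smoothed TF-IDF vectors for a list of tokenized documents.

    Uses idf = log((1 + n) / (1 + df)) + 1 smoothing, so no term
    gets a zero or undefined weight.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))           # document frequency per term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (tf[t] / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t in tf
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```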
VSM.vsmCalaulate()
- Accounts for bias by smoothing
- Computes a specific (tf-idf1, tf-idf2) for each tuple (article1, article2)
- Performance is poor this way, even with two worker threads
VSM.vsmTest()
- Ignores the smoothing bias
- Computes tf-idf per article during preprocessing, rather than per tuple (article1, article2)
- Performance is much better this way
- A 3100 × 3100 dataset is processed in 215 s
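The speedup comes from where tf-idf is computed: once per article up front, instead of once per pair. A minimal sketch of that idea (hypothetical function names, not the repo's API):

```python
import itertools
import math
from collections import Counter

def precompute(docs):
    """One pass over the corpus: a smoothed tf-idf vector and its norm
    per article, independent of which pairs are compared later."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        v = {t: (tf[t] / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
             for t in tf}
        vecs.append((v, math.sqrt(sum(w * w for w in v.values()))))
    return vecs

def all_pairs_similarity(docs):
    """Cosine similarity for every article pair, reusing the cached
    vectors instead of recomputing tf-idf per tuple."""
    vecs = precompute(docs)
    sims = {}
    for i, j in itertools.combinations(range(len(docs)), 2):
        (vi, ni), (vj, nj) = vecs[i], vecs[j]
        dot = sum(w * vj.get(t, 0.0) for t, w in vi.items())
        sims[(i, j)] = dot / (ni * nj) if ni and nj else 0.0
    return sims
```

Precomputation makes the per-pair work a single sparse dot product, which is what makes the N × N comparison tractable.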
SMN = Sequential Matching Network
Adapted, with some changes, from MarkWuNLP/MultiTurnResponseSelection
.
├── NN
│   ├── CNN.py // CNN functions
│   ├── Classifier.py // classifier functions
│   ├── Optimization.py // NN optimization functions
│   ├── RNN.py // RNN functions
│   └── logistic_sgd.py // SGD functions
├── SMN
│   ├── PreProcess.py // preprocessing functions
│   ├── SMN_Last.py // model functions
│   ├── SimAsImage.py // CNN pooling & convolution
│   └── sampleConduct.py // generates negative and positive samples
└── utils
    ├── constant.py // constant parameters
    └── utils.py // public helper functions
SMN process:
- word embedding
- GRU
- CNN
- GRU
- score
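Of these stages, the GRU appears twice: once over word embeddings and once over the utterance-level matching vectors. Below is a dependency-free sketch of a single GRU step with toy random weights, to show what each recurrence computes; it is not the implementation in NN/RNN.py:

```python
import math
import random

def matvec(W, v):
    """Dense matrix-vector product (W is a list of rows)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: update gate z, reset gate r, candidate state,
    then a z-weighted blend of old and candidate states."""
    z = [1 / (1 + math.exp(-(a + b))) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [1 / (1 + math.exp(-(a + b))) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    hc = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1 - zi) * hi + zi * hci for zi, hi, hci in zip(z, h, hc)]

def run_gru(xs, hidden, seed=0):
    """Encode a sequence of input vectors into a final hidden state,
    using small random weights (toy demo only, no training)."""
    rng = random.Random(seed)
    dim = len(xs[0])
    rand_mat = lambda r, c: [[rng.uniform(-0.1, 0.1) for _ in range(c)]
                             for _ in range(r)]
    Wz, Wr, Wh = (rand_mat(hidden, dim) for _ in range(3))
    Uz, Ur, Uh = (rand_mat(hidden, hidden) for _ in range(3))
    h = [0.0] * hidden
    for x in xs:
        h = gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh)
    return h
```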
SMN.PreProcess.ParseMultiTurn(input_file)
- Converts training samples into matrices
SMN.PreProcess.ParseMultiTurnTest(input_file)
- Converts test samples into matrices
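"Sample to matrix" for multi-turn data typically means mapping tokens to vocabulary IDs, then truncating and padding to fixed shapes. A sketch of that idea — the names and parameters here are illustrative, not PreProcess.py's actual interface:

```python
def build_matrix(turns, vocab, max_turns=3, max_len=5, pad=0, unk=1):
    """Convert a multi-turn dialogue (list of token lists) into a
    fixed-size index matrix: each turn truncated/padded to max_len,
    and the dialogue padded to max_turns turns."""
    mat = []
    for turn in turns[-max_turns:]:             # keep the most recent turns
        ids = [vocab.get(t, unk) for t in turn[:max_len]]
        ids += [pad] * (max_len - len(ids))     # right-pad short turns
        mat.append(ids)
    while len(mat) < max_turns:
        mat.insert(0, [pad] * max_len)          # pad missing turns at the front
    return mat
```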
SMN.sampleConduct.preWord2vec(input_file, out_file)
- Embeds samples with word2vec
SMN.sampleConduct.SampleConduct()
- Generates negative and positive samples
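A sketch of the negative-sampling idea (illustrative names, not sampleConduct.py's real interface): each true (context, response) pair yields a positive example, plus a negative example whose response is drawn from a different dialogue.

```python
import random

def sample_conduct(pairs, seed=0):
    """For each (context, true_response) pair, emit one positive example
    and one negative example with a response from another dialogue."""
    rng = random.Random(seed)
    responses = [r for _, r in pairs]
    out = []
    for i, (ctx, resp) in enumerate(pairs):
        out.append((ctx, resp, 1))              # positive (true) sample
        j = rng.randrange(len(responses) - 1)   # pick a different index
        if j >= i:
            j += 1
        out.append((ctx, responses[j], 0))      # negative sample
    return out
```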
SMN.SMN_Last.run_model()
- Runs the SMN model