Abstractive summarization using Seq2seq

This project generates abstractive summaries of Chinese conversations. The (often funny) conversations are between customers and car technicians, with 80,000+ samples for training and testing and 20,000 samples for prediction.

Everything is classic and built with TensorFlow 2.0. Word embeddings are pretrained with word2vec, and the seq2seq model uses a bidirectional GRU as encoder, Bahdanau attention, and a unidirectional GRU as decoder. The model also adopts a pointer-generator network and coverage loss to deal with out-of-vocabulary (OOV) words and repetition (ref. arXiv:1704.04368v2). Prediction uses beam search.
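As a rough illustration of the pointer-generator and coverage ideas from arXiv:1704.04368, here is a minimal pure-Python sketch of a single decoder step. It is not the code in pgn.py; the function names, the toy numbers, and the use of plain lists instead of TensorFlow tensors are all assumptions made for readability.

```python
def final_distribution(p_gen, vocab_dist, attention, source_ids, extended_size):
    """Mix the generation and copy distributions for one decoder step.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)
    """
    final = [0.0] * extended_size
    # Generation part: scale the vocabulary distribution by p_gen.
    for word_id, prob in enumerate(vocab_dist):
        final[word_id] = p_gen * prob
    # Copy part: add attention mass to the id of each source word.
    for position, word_id in enumerate(source_ids):
        final[word_id] += (1.0 - p_gen) * attention[position]
    return final


def coverage_step(coverage, attention):
    """Return (coverage loss for this step, updated coverage vector)."""
    # The loss penalizes attending again to positions that are already covered.
    loss = sum(min(a, c) for a, c in zip(attention, coverage))
    new_coverage = [c + a for c, a in zip(coverage, attention)]
    return loss, new_coverage


# Toy example (hypothetical numbers): a vocabulary of 4 words (ids 0-3) and
# one source-only OOV word that receives the temporary id 4.
vocab_dist = [0.1, 0.2, 0.3, 0.4]
source_ids = [2, 4, 2]          # the OOV word sits at source position 1
attention = [0.5, 0.3, 0.2]
p_gen = 0.6

final = final_distribution(p_gen, vocab_dist, attention, source_ids, extended_size=5)

coverage = [0.0] * len(source_ids)
loss_1, coverage = coverage_step(coverage, attention)  # nothing covered yet
loss_2, coverage = coverage_step(coverage, attention)  # same attention again is penalized
```

In this toy step the OOV word gets a non-zero probability only through the copy term, and repeating the same attention on the second step produces a positive coverage loss, which is what discourages repetition.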

The data pipeline is fairly typical for Chinese text: clean the data, segment, tokenize, batch. The tricky parts are dealing with long conversations and adding special tokens to the word2vec model. A TF-IDF filter is used, and the special tokens are added to the w2v model by retraining it.
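The pipeline steps can be sketched roughly as follows. This is not the code in data_loader.py: the special token names, the helper functions, and the idea of using the TF-IDF filter to shorten long conversations are assumptions for illustration. The sketch also assumes the text has already been segmented into space-separated words, and it only reserves vocabulary ids for the special tokens rather than retraining a word2vec model.

```python
import math
import re
from collections import Counter

# Hypothetical special tokens; the names used in the project may differ.
PAD, UNK, START, END = "<PAD>", "<UNK>", "<START>", "<STOP>"


def clean(text):
    """Keep only CJK characters, ASCII letters, digits and spaces."""
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9 ]", " ", text)
    return " ".join(text.split())


def tfidf_filter(docs, max_len):
    """Trim long documents to max_len words, keeping high TF-IDF words in order."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    filtered = []
    for doc in docs:
        if len(doc) <= max_len:
            filtered.append(list(doc))
            continue
        tf = Counter(doc)
        # Smoothed IDF so that words present in every document still score > 0.
        scores = [
            (tf[w] / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w in doc
        ]
        best = sorted(range(len(doc)), key=lambda i: -scores[i])[:max_len]
        filtered.append([doc[i] for i in sorted(best)])
    return filtered


def build_vocab(docs):
    """Reserve the first ids for special tokens, then add words in order of appearance."""
    vocab = {tok: i for i, tok in enumerate([PAD, UNK, START, END])}
    for doc in docs:
        for word in doc:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(doc, vocab, max_len):
    """Wrap a document in START/END, map words to ids, and pad to a fixed length."""
    ids = [vocab[START]] + [vocab.get(w, vocab[UNK]) for w in doc] + [vocab[END]]
    return ids + [vocab[PAD]] * (max_len + 2 - len(ids))


# Toy, already segmented conversations (made up for this example).
raw = [
    "技师 说 ： 你好 ， 发动机 故障灯 亮 了 吗 ？",
    "车主 说 ： 是 的 ， 故障灯 亮 了",
    "好 的",
]
docs = [clean(line).split() for line in raw]
docs = tfidf_filter(docs, max_len=5)
vocab = build_vocab(docs)
batch = [encode(doc, vocab, max_len=5) for doc in docs]
```

Every row of `batch` has the same length, so the rows can be stacked into a single tensor and fed to the encoder.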

Files such as the original dataset, the segmented dataset, and the w2v model are also provided for immediate testing. Note that the embedding matrix file is too large to upload.

Any comments are welcome, and good luck.

Folders and files:
data
.gitattributes
.gitignore
LICENSE
README.md
data_loader.py
main.py
model_layers.py
pgn.py
predict.py
prediction_results.txt
train.py
word2vec.model
word2vec.model.trainables.syn1neg.npy
word2vec.model.wv.vectors.npy


shellrazer/seq2seqAttentionModel
