comparative reviews classification

This repo contains the code for my Master's thesis on the classification of comparative comments.

The repo includes the following parts:

Data folder contains the training dataset and some bad-case files. Please use "jd_comp_final_v5.xlsx".
Result folder contains attention visualization HTML files and model structure pictures.
Old folder contains some original scripts, kept only as a backup (will be removed in the next commit).
Python scripts whose names start with "baidu" use the Baidu API for word segmentation and embedding tasks.
Text Preprocessing scripts: utils.py, langconv.py, zh_wiki.py
Char/Word embedding script: embedding.py (the embeddings must be trained before the first run)
Traditional models script: traditional_ml_models.py
Deep Learning models scripts:
- config.py: model hyperparameters class
- evaluator.py: model evaluation class
- layers.py: attention mechanism implementation
- main.py: the main training program; see the code comments for details (a command-line version is coming soon)
- model_library.py: DL text classification models used in the thesis
- metrics.py: model evaluation class used during training
- reader.py: data generator
- trainer.py: model training class
Average embedding model: average_embedding.py
Some model results and attention visualization: visualization.py
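
As a rough illustration of the idea behind an average embedding model, the sketch below represents a comment by the element-wise mean of its token vectors. It is not taken from average_embedding.py: the function name `average_embedding`, the toy 3-dimensional vectors, and the zero-vector fallback for unknown tokens are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def average_embedding(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the element-wise mean of the vectors of the known tokens.

    Tokens missing from `vectors` are skipped. If no token is known,
    a zero vector of length `dim` is returned (an assumed fallback).
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the values of each dimension together.
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical toy embeddings, for illustration only.
toy_vectors = {
    "A": [1.0, 0.0, 2.0],
    "beats": [0.0, 1.0, 0.0],
    "B": [2.0, 2.0, 1.0],
}

# "clearly" has no vector here, so it is skipped.
sentence_vector = average_embedding(
    ["A", "beats", "B", "clearly"], toy_vectors, dim=3
)
print(sentence_vector)
```

The resulting fixed-length vector could then be passed to any standard classifier; which classifier the thesis uses for this baseline is not stated here.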

TODO_LIST

This repo is not yet complete. The next steps are:

Improve the model prediction modules
Comparative relation extraction (ongoing): crf.py implements the traditional method; a Bi-LSTM-CRF model is planned
