NLP_Book-Classification/Clustering

Background of this project: Apply different text transformation methods (BOW, TF-IDF, Doc2Vec) and several algorithms to classify and cluster five books: chesterton-brown, austen-emma, edgeworth-parents, milton-paradise, and bible-kjv.

Data preprocessing: convert all letters to lowercase, remove punctuation, tokenize the documents and remove stopwords (nltk library), lemmatize the tokens, and transform the text into vectors.
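The preprocessing steps above could be sketched roughly as follows. To keep the example runnable without downloading nltk data, it uses a tiny hand-written stop-word set and an identity lemmatizer as stand-ins; `preprocess`, `bag_of_words`, and `DEMO_STOPWORDS` are hypothetical names, and the sample sentence is invented.

```python
import string
from collections import Counter
from typing import Callable, FrozenSet, List

# Tiny illustrative stop-word set (an assumption). With nltk installed and its
# data downloaded, set(nltk.corpus.stopwords.words("english")) could be used.
DEMO_STOPWORDS = frozenset(
    {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it"}
)


def preprocess(
    text: str,
    stopwords: FrozenSet[str] = DEMO_STOPWORDS,
    lemmatize: Callable[[str], str] = lambda token: token,
) -> List[str]:
    """Lowercase, strip punctuation, tokenize, drop stopwords, lemmatize."""
    lowered = text.lower()
    # Delete every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Whitespace tokenization; a stand-in for a proper tokenizer.
    tokens = no_punct.split()
    kept = [t for t in tokens if t not in stopwords]
    # The default lemmatizer is the identity function; nltk's
    # WordNetLemmatizer().lemmatize could be passed in instead.
    return [lemmatize(t) for t in kept]


def bag_of_words(tokens: List[str]) -> Counter:
    """Turn a token list into a sparse count vector (term -> count)."""
    return Counter(tokens)


tokens = preprocess("The Serpent, subtlest beast of all the field, was silent.")
vector = bag_of_words(tokens)
print(tokens)
print(vector)
```

Passing the stop-word set and the lemmatizer as parameters keeps the pipeline order fixed while letting the nltk components be swapped in later.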

Classification:

Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Decision Tree, Random Forest, Logistic Regression
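A minimal sketch of how the five classifiers listed above might be compared, assuming scikit-learn and TF-IDF features. The hyperparameters, the invented training sentences, and the use of three of the five book labels are all assumptions for illustration; the project's actual settings are not given here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Invented sentences loosely in the style of each book (not real excerpts).
train_texts = [
    "she thought the ball at the manor most agreeable",
    "her sister hoped the gentleman would call at the manor",
    "the ladies walked to the village after the ball",
    "and the lord spake unto the people of the land",
    "the lord gave the land unto the children of the people",
    "and the people went up unto the mountain of the lord",
    "the fallen angel rose from the burning lake of fire",
    "through chaos and night the angel flew toward eden",
    "in eden the serpent spoke beneath the burning sky",
]
train_labels = (
    ["austen-emma"] * 3 + ["bible-kjv"] * 3 + ["milton-paradise"] * 3
)
test_texts = [
    "the gentleman danced with her sister at the ball",
    "the lord spake unto the children of the land",
]

# Hyperparameters are illustrative defaults, not the project's settings.
classifiers = {
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

predictions = {}
for name, clf in classifiers.items():
    # Each pipeline vectorizes the raw text, then fits the classifier.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_texts, train_labels)
    predictions[name] = list(model.predict(test_texts))
    print(f"{name}: {predictions[name]}")
```

Wrapping the vectorizer and the classifier in one pipeline means the vocabulary is learned from the training texts only, which avoids leaking test data into the features.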

Clustering: K-means, Hierarchical, Expectation-Maximization (EM)
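The three clustering methods above might be run along these lines, assuming scikit-learn. Here `AgglomerativeClustering` stands in for hierarchical clustering and `GaussianMixture` (which is fitted by EM) for the EM step; the toy sentences, the cluster count of 2, and the diagonal covariance are assumptions chosen to keep the sketch small.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

# Invented sentences in two loose styles (not real excerpts).
docs = [
    "and the lord spake unto the people of the land",
    "the lord gave the land unto the children of the people",
    "and the people went up unto the mountain of the lord",
    "the fallen angel rose from the burning lake of fire",
    "through chaos and night the angel flew toward eden",
    "in eden the serpent spoke beneath the burning sky",
]

# Agglomerative clustering and Gaussian mixtures need dense input,
# so the sparse TF-IDF matrix is converted to an array.
features = TfidfVectorizer().fit_transform(docs).toarray()

n_clusters = 2  # assumed for this toy data; the project has five books

kmeans_labels = KMeans(
    n_clusters=n_clusters, n_init=10, random_state=0
).fit_predict(features)

# Default linkage is Ward, which merges clusters to minimise variance.
hier_labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)

# A diagonal covariance keeps the mixture stable when there are
# far more features than documents.
em_labels = GaussianMixture(
    n_components=n_clusters, covariance_type="diag", random_state=0
).fit_predict(features)

print("K-means:     ", list(kmeans_labels))
print("Hierarchical:", list(hier_labels))
print("EM (GMM):    ", list(em_labels))
```

Cluster ids are arbitrary, so the methods should be compared by which documents are grouped together rather than by the label numbers themselves.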