Opinion Mining on movie reviews with Scala and Spark
This project was implemented as part of Master's studies and covers the various techniques that were tried in search of the best results.
- Preprocessing: stemming, removal of punctuation and stopwords, bigram creation, retaining only lexicon words
- Feature extraction/selection: word2vec, TF-IDF, PCA
- Classifiers: Decision Tree, Naive Bayes, SVM, Logistic Regression, Random Forest, Gradient-Boosted Trees
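
To give a feel for how some of the steps listed above can fit together, here is a minimal sketch of a Spark ML pipeline that chains punctuation and stopword removal, TF-IDF, and Logistic Regression. It is an illustration and not the project's actual code: the object name `ReviewPipelineSketch`, the column names, the feature count, and the four toy reviews are all assumptions made up for this example. It assumes `spark-sql` and `spark-mllib` are on the classpath. Stemming and lexicon filtering are left out because, as far as I know, Spark ML does not provide them out of the box.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, IDF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession

// Hypothetical sketch; names, columns and parameters are assumptions, not the project's code.
object ReviewPipelineSketch {

  // Builds the pipeline: tokenize -> drop stopwords -> TF -> IDF -> classifier.
  def buildPipeline(): Pipeline = {
    // Splitting on non-word characters lowercases the text and drops punctuation.
    val tokenizer = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("tokens")
      .setPattern("\\W+")

    // Removes English stopwords using Spark's default list.
    val remover = new StopWordsRemover()
      .setInputCol("tokens")
      .setOutputCol("filtered")

    // Term frequencies via the hashing trick, then IDF re-weighting.
    val tf = new HashingTF()
      .setInputCol("filtered")
      .setOutputCol("tf")
      .setNumFeatures(1 << 12)
    val idf = new IDF()
      .setInputCol("tf")
      .setOutputCol("features")

    // Reads the default "features" and "label" columns.
    val classifier = new LogisticRegression().setMaxIter(20)

    new Pipeline().setStages(Array(tokenizer, remover, tf, idf, classifier))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReviewPipelineSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data invented for illustration only (1.0 = positive, 0.0 = negative).
    val reviews = Seq(
      ("A wonderful, moving film with great acting.", 1.0),
      ("Dull plot and terrible acting.", 0.0),
      ("I loved every minute of it!", 1.0),
      ("A boring waste of time.", 0.0)
    ).toDF("text", "label")

    val model = buildPipeline().fit(reviews)
    model.transform(reviews).select("text", "prediction").show(false)

    spark.stop()
  }
}
```

Swapping the last stage for another Spark ML classifier (for example `DecisionTreeClassifier` or `RandomForestClassifier`) leaves the rest of the pipeline unchanged, which makes it convenient to compare classifiers on the same features.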