
This project classifies movie reviews from the IMDb dataset as positive or negative using sentiment analysis. The approach covers data preprocessing (cleaning and vectorizing the text) and a comparison of 11 different machine learning models.


omerskoc/imdb-sentiment-analysis


IMDB Movie Review Sentiment Analysis

Overview

This project classifies IMDB movie reviews as positive or negative using several sentiment analysis techniques. Working from a dataset of 50,000 reviews, the analysis covers data preprocessing, feature extraction with TF-IDF vectorization, and experiments with multiple machine learning models. The highlight is a CNN-LSTM hybrid model, which reached the highest accuracy of the models tested, at 90%.

Problem Formulation

Features

  • Data preprocessing including cleaning, tokenization, and vectorization.
  • Exploration of 11 different machine learning models.
  • Detailed analysis and comparison of model performances.
  • Implementation of a CNN-LSTM hybrid model showcasing the effectiveness of deep learning in NLP.
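The preprocessing and vectorization steps listed above can be sketched as a small scikit-learn pipeline. This is a minimal illustration rather than the notebooks' actual code: the `clean_review` helper, the toy reviews, and the choice of logistic regression as the classifier are all assumptions made for the example.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

TAG_RE = re.compile(r"<[^>]+>")          # HTML remnants such as <br />
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, symbols


def clean_review(text: str) -> str:
    """Hypothetical cleaning step: lowercase, drop markup and non-letters."""
    text = TAG_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


# Toy corpus invented for illustration; 1 = positive, 0 = negative.
reviews = [
    "A wonderful, moving film.<br /><br />Loved every minute!",
    "Brilliant acting and a great story.",
    "An enjoyable and touching movie.",
    "Terrible plot and awful dialogue.",
    "Boring, predictable, a waste of time.",
    "The worst movie I have seen this year.",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each cleaned review into a sparse weighted term vector;
# the classifier (an assumed stand-in) is then fitted on those vectors.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_review, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(reviews, labels)

predictions = model.predict(["A great and wonderful story"])
print(predictions)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the TF-IDF vocabulary fitted on the training reviews only, so the same transformation is reused unchanged at prediction time.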

Usage

To replicate the analysis or apply the models to new data, follow the notebooks provided in the repository.

Dataset

The dataset comprises 50,000 IMDB movie reviews, evenly split between positive and negative sentiments. It's publicly available and was prepared by Stanford University's AI Lab.
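As a rough sketch of how the even split can be checked, the snippet below reads reviews from the directory layout used by the Stanford archive, where `train` and `test` folders each contain `pos` and `neg` subfolders of text files. It assumes the archive has been extracted to a local `aclImdb` directory; the notebooks in this repository may load the data differently (for example from a single CSV file), and the helper names are hypothetical.

```python
from pathlib import Path


def load_split(root: Path, split: str) -> tuple[list[str], list[int]]:
    """Read one split ("train" or "test") into parallel text and label lists."""
    texts: list[str] = []
    labels: list[int] = []
    for label, folder in ((0, "neg"), (1, "pos")):
        for path in sorted((root / split / folder).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels


def class_counts(labels: list[int]) -> dict[int, int]:
    """Count how many reviews carry each label (0 = negative, 1 = positive)."""
    return {label: labels.count(label) for label in sorted(set(labels))}


# Assumed local path; nothing runs if the archive has not been extracted.
root = Path("aclImdb")
if root.is_dir():
    for split in ("train", "test"):
        _, split_labels = load_split(root, split)
        print(split, class_counts(split_labels))
```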

Results

The CNN-LSTM hybrid model achieved the highest accuracy at 90%, outperforming other models. Detailed performance metrics and analysis are provided for each model tested.
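For readers unfamiliar with the architecture, a CNN-LSTM hybrid typically passes embedded token sequences through a 1D convolution that picks up local phrase patterns, then through an LSTM that models their order. The Keras sketch below shows that general shape only. The layer sizes, vocabulary size, and sequence length are illustrative assumptions, not the configuration behind the 90% result, and the model expects padded integer token IDs rather than TF-IDF vectors.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn_lstm(vocab_size: int = 20000, seq_len: int = 200,
                   embed_dim: int = 64) -> keras.Model:
    """Build an illustrative CNN-LSTM binary classifier (all sizes assumed)."""
    model = keras.Sequential([
        keras.Input(shape=(seq_len,)),            # padded integer token IDs
        layers.Embedding(vocab_size, embed_dim),  # token IDs -> dense vectors
        layers.Conv1D(64, 5, activation="relu"),  # local n-gram style features
        layers.MaxPooling1D(4),                   # shorten the sequence
        layers.LSTM(64),                          # order-aware summary vector
        layers.Dense(1, activation="sigmoid"),    # probability of "positive"
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_cnn_lstm()
model.summary()
```

Pooling before the LSTM shortens the sequence the recurrent layer has to process, which is one common reason for combining the two layer types.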

Future Work

Further improvements could explore transformer-based models like BERT and GPT, ensemble methods, and application to other domains or languages.

Citation

If you find this project useful, please consider citing:

  • The original dataset from Stanford University's AI Lab.
  • Relevant publications and resources listed in the References section of the project report.

Contact

For any inquiries or contributions, please contact [[email protected]].
