IMDB Movie Review Sentiment Analysis

Overview

This project classifies IMDB movie reviews as positive or negative. Using a dataset of 50,000 reviews, the analysis covers data preprocessing, feature extraction with TF-IDF vectorization, and experiments with multiple machine learning models. A CNN-LSTM hybrid model performed best, reaching an accuracy of 90%.

Problem Formulation

Features

  • Data preprocessing including cleaning, tokenization, and vectorization.
  • Exploration of 11 different machine learning models.
  • Detailed analysis and comparison of model performances.
  • Implementation of a CNN-LSTM hybrid model showcasing the effectiveness of deep learning in NLP.
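The cleaning, tokenization, and TF-IDF vectorization steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the function names `clean_and_tokenize` and `tfidf_vectors`, the HTML-stripping rule, the smoothed IDF formula, and the two sample reviews are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def clean_and_tokenize(review):
    """Strip HTML tags, lowercase, and split into word tokens (assumed cleaning rules)."""
    no_html = re.sub(r"<[^>]+>", " ", review)
    return re.findall(r"[a-z']+", no_html.lower())


def tfidf_vectors(reviews):
    """Return (sorted vocabulary, one L2-normalized {term: weight} dict per review)."""
    docs = [clean_and_tokenize(r) for r in reviews]
    n_docs = len(docs)
    # Document frequency: number of reviews that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so a term present in every review keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in df}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # Normalize to unit length so long and short reviews are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return sorted(df), vectors


# Hypothetical sample reviews, not taken from the dataset.
sample_reviews = [
    "A wonderful film.<br />Great acting!",
    "A dull film with terrible acting.",
]
vocab, vectors = tfidf_vectors(sample_reviews)
```

In this sketch, terms that appear in only one review (such as "wonderful") receive a higher weight than terms shared by both (such as "film"), which is the property that makes TF-IDF features useful for separating positive from negative reviews.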

Usage

To replicate the analysis or apply the models to new data, run the notebooks provided in the repository.

Dataset

The dataset comprises 50,000 IMDB movie reviews, evenly split between positive and negative labels. It is publicly available and was prepared by Stanford University's AI Lab.

Results

The CNN-LSTM hybrid model achieved the highest accuracy of all models tested, at 90%. Detailed performance metrics and analysis are provided for each model.
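For reference, accuracy is the fraction of reviews whose predicted label matches the true label, and models can be compared by sorting on that score. The sketch below illustrates the calculation only: the helper names `accuracy` and `rank_models`, the model names, and the toy labels are hypothetical and do not reproduce the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions_by_model):
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data for illustration only: 1 = positive review, 0 = negative review.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 1],  # 6 of 8 correct
    "model_b": [1, 0, 1, 1, 0, 0, 1, 1],  # 7 of 8 correct
}
ranking = rank_models(y_true, toy_predictions)
```

Because the dataset is evenly split between the two classes, accuracy is a reasonable headline metric here; on an imbalanced dataset, precision and recall would also need to be reported.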

Future Work

Future work could explore transformer-based models such as BERT and GPT, ensemble methods, and applications to other domains or languages.

Citation

If you find this project useful, please consider citing:

  • The original dataset from Stanford University's AI Lab.
  • Relevant publications and resources listed in the References section of the project report.

Contact

For any inquiries or contributions, please contact [[email protected]].