TMDB Movie Data Analysis

by Chinonso Okonkwo

Objectives

This is a repository for Udacity Data Analyst Project 1 (Investigate a Dataset). The dataset used in the project is also included in this repository.

Installation

The libraries used in this project include:

  • Pandas – For storing and manipulating structured data. Pandas is built on NumPy (upgrade Pandas to version 0.25.1)
  • NumPy – For multi-dimensional array and matrix data structures, and for performing mathematical operations
  • Matplotlib – For all visualizations (including maps and graphs)

Introduction

I analyzed a dataset containing information on about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue. The analysis focuses on answering the following questions:

  • Which movie title had the highest budget?
  • Which movie title had the highest revenue?
  • Which movies are the most popular of all time?
  • Is there a correlation between vote_count and revenue?
  • What kinds of properties are associated with movies that have high revenues?
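
As a rough illustration of how the first four questions can be approached with Pandas, here is a minimal sketch. It runs on a tiny made-up table rather than the real dataset, and the column names `original_title`, `budget` and `popularity` are assumptions; only `vote_count` and `revenue` are named in this README.

```python
import pandas as pd

# Tiny made-up sample standing in for the TMDb data (values are illustrative only).
movies = pd.DataFrame(
    {
        "original_title": ["A", "B", "C", "D"],  # assumed column name
        "budget": [10, 200, 50, 120],            # assumed column name
        "revenue": [30, 500, 40, 900],
        "popularity": [1.2, 8.5, 0.7, 6.1],      # assumed column name
        "vote_count": [100, 4000, 80, 5200],
    }
)

# Title of the row with the largest budget / revenue.
top_budget = movies.loc[movies["budget"].idxmax(), "original_title"]
top_revenue = movies.loc[movies["revenue"].idxmax(), "original_title"]

# Titles of the two most popular movies, highest popularity first.
most_popular = movies.nlargest(2, "popularity")["original_title"].tolist()

# Pearson correlation coefficient between vote_count and revenue.
vote_revenue_corr = movies["vote_count"].corr(movies["revenue"])

print(top_budget, top_revenue, most_popular, round(vote_revenue_corr, 2))
```

A coefficient near 0 would indicate little linear relationship, while values near 1 or -1 would indicate a strong one.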

Project Methodology

The main steps for this project can be summarized as follows:

  • Data Wrangling
    • Data Assessment
    • Data Cleaning
  • Exploratory Analysis
  • Conclusions/Results
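
The wrangling steps above might look roughly like the sketch below. The sample table is made up, and treating a budget or revenue of 0 as a missing value is an assumption for illustration, not necessarily the cleaning rule used in this project.

```python
import pandas as pd

# Made-up raw sample with one duplicated row and two zero entries.
raw = pd.DataFrame(
    {
        "original_title": ["A", "A", "B", "C"],  # assumed column name
        "budget": [10, 10, 0, 50],
        "revenue": [30, 30, 70, 0],
    }
)

# Data assessment: count fully duplicated rows and zero-valued budgets.
n_duplicates = int(raw.duplicated().sum())
n_zero_budget = int((raw["budget"] == 0).sum())

# Data cleaning: drop duplicates, then treat zeros as missing (assumption).
clean = raw.drop_duplicates().copy()
for col in ("budget", "revenue"):
    clean[col] = clean[col].mask(clean[col] == 0)  # 0 becomes NaN
clean = clean.dropna(subset=["budget", "revenue"])

print(n_duplicates, n_zero_budget, len(clean))
```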

Results

Based on the data and the analysis carried out:

  • The most popular movies of all time are Jurassic World, Mad Max: Fury Road, Interstellar, Guardians of the Galaxy and Insurgent.

  • The scatter plot shows no correlation between vote_count and revenue.

  • High popularity ratings are associated with movies that generate high revenue.

  • The budget of a low-revenue movie is about 5 million, while that of a high-revenue movie is over 52 million. This suggests that a movie's budget is correlated with its revenue. The result has limitations, however, because it does not account for other factors such as the year the movie was released (release_year) and the director of the movie.
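
A budget comparison of this kind can be sketched as follows. The data is made up, and splitting movies into low and high revenue groups at the median revenue is an assumption; this README does not state how the two groups were defined.

```python
import pandas as pd

# Made-up sample; budget and revenue values are illustrative only.
movies = pd.DataFrame(
    {
        "budget": [5, 8, 60, 45, 3, 70],
        "revenue": [10, 20, 300, 250, 5, 400],
    }
)

# Split at the median revenue (assumed threshold).
median_revenue = movies["revenue"].median()
low = movies[movies["revenue"] < median_revenue]
high = movies[movies["revenue"] >= median_revenue]

# Average budget within each group.
mean_budget_low = low["budget"].mean()
mean_budget_high = high["budget"].mean()

print(mean_budget_low, mean_budget_high)
```

Repeating the comparison within each `release_year` would be one way to check whether the gap holds once the release year is taken into account.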