Nonso-Analytics/Udacity-Investigate-A-Dataset

TMDB Movie Data Analysis

by Chinonso Okonkwo

Objectives

This is a repository for Udacity Data Analyst Project 1 (Investigate a Dataset). The dataset used in the project is also included in this repository.

Installation

The libraries used in this project are:

  • Pandas – For storing and manipulating structured data. Pandas functionality is built on NumPy (upgrade pandas to version 0.25.1)
  • NumPy – For multi-dimensional array and matrix data structures, and for performing mathematical operations
  • Matplotlib – For all visualizations (including maps and graphs)

Introduction

I analyzed a dataset containing information on about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue. The analysis focuses on answering the following questions:

  • Which movie title had the highest budget?
  • Which movie titles have the highest revenue?
  • Which movies are the most popular of all time?
  • Is there a correlation between vote_count and revenue?
  • What kinds of properties are associated with movies that have high revenues?
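
The first four questions map onto a handful of pandas operations. The snippet below is a minimal sketch, not the code from the project notebook: the rows are hypothetical sample data, and the column names `original_title`, `budget`, `revenue` and `popularity` are assumptions about the dataset layout (only `vote_count` is named in this README).

```python
import pandas as pd

# Hypothetical sample rows; the real analysis would load the TMDb CSV instead.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],  # assumed column name
    "budget": [10, 50, 30, 5],               # assumed column name
    "revenue": [40, 120, 90, 8],             # assumed column name
    "popularity": [1.2, 3.4, 2.8, 0.5],      # assumed column name
    "vote_count": [100, 900, 400, 20],
})

# Title of the movie with the highest budget.
top_budget = movies.loc[movies["budget"].idxmax(), "original_title"]

# Titles of the two movies with the highest revenue.
top_revenue = movies.nlargest(2, "revenue")["original_title"].tolist()

# Titles of the three most popular movies.
most_popular = movies.nlargest(3, "popularity")["original_title"].tolist()

# Pearson correlation coefficient between vote_count and revenue.
vote_revenue_corr = movies["vote_count"].corr(movies["revenue"])

print(top_budget, top_revenue, most_popular, round(vote_revenue_corr, 2))
```

A coefficient near 0 would indicate no linear relationship, while values near 1 or -1 would indicate a strong one.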

Project Methodology

The main steps for this project can be summarized as follows:

  • Data Wrangling
    • Data Assessment
    • Data Cleaning
  • Exploratory Analysis
  • Conclusions/Results
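
As an illustration of the assessment and cleaning steps, here is a minimal sketch on hypothetical data. Dropping duplicate rows and rows with missing `budget` or `revenue` values are common cleaning choices assumed for the example; they are not necessarily the exact steps taken in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one duplicated row and one missing budget.
raw = pd.DataFrame({
    "original_title": ["A", "A", "B", "C"],  # assumed column name
    "budget": [10.0, 10.0, np.nan, 30.0],    # assumed column name
    "revenue": [40.0, 40.0, 120.0, 90.0],    # assumed column name
})

# Data assessment: count fully duplicated rows and missing values per column.
n_duplicates = int(raw.duplicated().sum())
missing_per_column = raw.isnull().sum()

# Data cleaning (assumed steps): drop duplicates, then rows lacking budget or revenue.
clean = (
    raw.drop_duplicates()
       .dropna(subset=["budget", "revenue"])
       .reset_index(drop=True)
)

# Exploratory analysis: summary statistics of the cleaned numeric columns.
summary = clean[["budget", "revenue"]].describe()
print(n_duplicates, missing_per_column["budget"], len(clean))
```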

Results

Based on the data and the analysis carried out:

  • The most popular movies of all time are Jurassic World, Mad Max: Fury Road, Interstellar, Guardians of the Galaxy and Insurgent.

  • The scatter plot shows no correlation between vote_count and the revenue generated.

  • High popularity ratings are associated with movies that generate high revenue.

  • The budget of a movie that generates low revenue is about 5 million, while that of a high-revenue movie is over 52 million. This shows that the budget of a movie is correlated with its revenue, but this result has limitations, such as the year the movie was released (release_year) and the director of the movie.
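
One way to arrive at a low-revenue versus high-revenue budget comparison like the one above is to split the movies at the median revenue and compare the mean budget of each half. The sketch below assumes that approach and uses hypothetical numbers; the README does not state how the two groups were actually defined.

```python
import pandas as pd

# Hypothetical data; column names are assumptions about the dataset layout.
movies = pd.DataFrame({
    "budget": [2, 4, 6, 40, 50, 60],
    "revenue": [1, 3, 5, 100, 200, 300],
})

# Split at the median revenue (an assumed grouping rule).
median_revenue = movies["revenue"].median()
low = movies[movies["revenue"] < median_revenue]
high = movies[movies["revenue"] >= median_revenue]

# Mean budget within each revenue group.
low_mean_budget = low["budget"].mean()
high_mean_budget = high["budget"].mean()

# Pearson correlation between budget and revenue over all rows.
budget_revenue_corr = movies["budget"].corr(movies["revenue"])

print(low_mean_budget, high_mean_budget, round(budget_revenue_corr, 2))
```

A gap between the two group means together with a positive coefficient is consistent with an association between budget and revenue, although neither controls for factors such as release_year or director.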
