Microsoft sees other big companies producing video content and wants to get in on the opportunity. It has decided to create a new movie studio but is unfamiliar with the industry. It specifically wants to know which types of movies perform best at the box office so it can decide how to focus its movie production budget.
Define Success
For this analysis we explored 3 success metrics:
- Gross Revenue (which equates to ticket sales)
- Profit
- Return on Investment (ROI)
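As a rough illustration of how the three metrics relate, the sketch below derives profit and ROI from gross revenue and production budget. The definitions used here (profit = gross revenue minus production budget, ROI = profit divided by production budget) are common conventions assumed for this example, not formulas confirmed by the analysis, and the column names and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "production_budget": [100.0, 50.0, 20.0],
    "gross_revenue": [300.0, 40.0, 80.0],
})

# Assumed definition: profit = gross revenue - production budget.
movies["profit"] = movies["gross_revenue"] - movies["production_budget"]

# Assumed definition: ROI = profit / production budget.
movies["roi"] = movies["profit"] / movies["production_budget"]

print(movies[["title", "gross_revenue", "profit", "roi"]])
```

Under these assumed definitions the metrics can rank movies differently: in the sample rows, Movie A has the highest gross revenue and profit, while the much cheaper Movie C has the highest ROI.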
Discover Best Predictors of Success
For this analysis we explored all features available in our data sets:
- Production Budget
- Genre
- Popularity
Methods
- Exploratory Data Analysis (EDA), since visualizations are good at showing relationships
- Linear Regression using Pearson's Correlation Coefficient (r) as our accuracy metric
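A minimal sketch of the linear regression step, assuming a simple one-feature fit of gross revenue against production budget. The arrays are made-up values and `np.polyfit` is one of several ways to fit a line; the analysis itself may have used a different tool.

```python
import numpy as np

# Hypothetical data: production budget (x) and gross revenue (y).
budget = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
gross = np.array([30.0, 35.0, 80.0, 90.0, 140.0])

# Least-squares fit of a straight line: gross ~ slope * budget + intercept.
slope, intercept = np.polyfit(budget, gross, deg=1)

# Values predicted by the fitted line, e.g. for overlaying on a scatter plot.
predicted = slope * budget + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted line can be drawn over a scatter plot of the same two columns, which ties the regression back to the EDA visualizations.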
Accuracy Metric
Pearson's Correlation Coefficient (r). For this analysis, a correlation is treated as meaningful when |r| >= 0.5, with the strength of the correlation increasing as r approaches 1 or -1. (An r of 1 indicates a perfect positive linear correlation, and an r of -1 indicates a perfect negative linear correlation, which is just as strong.)
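The sketch below shows one way to compute Pearson's r and apply the 0.5 cutoff. It assumes the cutoff applies to the magnitude of r, which the mention of -1 suggests. The helper names `pearson_r` and `meets_threshold` and the sample values are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return float((x_dev * y_dev).sum()
                 / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum()))

def meets_threshold(r, threshold=0.5):
    """Assumed rule: the correlation counts when |r| is at least the threshold."""
    return abs(r) >= threshold

# Hypothetical data: production budget and gross revenue.
budget = [10.0, 20.0, 30.0, 40.0, 50.0]
gross = [30.0, 35.0, 80.0, 90.0, 140.0]

r = pearson_r(budget, gross)
print(f"r={r:.3f}, meets threshold: {meets_threshold(r)}")
```

NumPy's built-in `np.corrcoef(budget, gross)[0, 1]` returns the same coefficient; the explicit version is shown only to make the formula visible.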
Data from:
Popularity rating from TMDb calculated using the methods provided here: https://developers.themoviedb.org/3/getting-started/popularity
Tools
- Python
- Jupyter Notebook
- Pandas
- NumPy
- Matplotlib.pyplot
- Seaborn
- JSON