MLOps Zoomcamp Final Project

Problem Statement

This project uses a well-known case study from the Darden School of Business, published in Harvard Business.
The case follows a couple who plan to get married. Greg wants to buy a ring to propose to Sarah.
The challenge is to find a ring Sarah will like. On a close friend's suggestion, Greg decides to buy a diamond stone instead, so that Sarah can make her own choice.
Greg then collects data on 6000 diamonds, including their price and attributes such as cut, color, and shape.
The final objective is to predict the Price from these attributes.

To set up the environment, create it with conda from the environment file:

conda env create -f environment.yaml

Alternatively, install the dependencies with pip from requirements.txt:

pip install -r requirements.txt

Then run the model.ipynb notebook to train the models and select the best one for prediction.
The best model is saved as the pickle file model_pipeline_final.pkl.
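To reuse the saved model outside the notebook, the pickle file can be loaded back into memory. The following is a minimal sketch, not the notebook's own code: it assumes the file sits in the current working directory, and the helper name `load_pipeline` is made up for illustration.

```python
import pickle
from pathlib import Path

# File name taken from this README; the location is an assumption.
MODEL_PATH = Path("model_pipeline_final.pkl")


def load_pipeline(path=MODEL_PATH):
    """Load a pickled object from disk.

    Only unpickle files you trust: pickle can run arbitrary code on load.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__" and MODEL_PATH.exists():
    # Unpickling needs the same libraries that were installed for training.
    pipeline = load_pipeline()
    print(type(pipeline))
```

If the loaded object is a scikit-learn style pipeline, predictions would come from calling its `predict` method on a DataFrame with the same columns that were used in training.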
The experiment tracking details are stored in the images_mlflow folder, which shows MLflow runs for different model types. The best model is CatBoostRegressor, with a mean absolute percentage error of 4.41%.
This is likely because most of the features are categorical, which CatBoost handles natively.
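For readers unfamiliar with the metric, mean absolute percentage error (MAPE) averages the absolute prediction error relative to the true value. The sketch below is a plain-Python illustration with made-up prices; it is not the evaluation code from the notebook.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """Return MAPE as a percentage: 100 * mean(|actual - predicted| / |actual|)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 0:
            raise ValueError("MAPE is undefined when a true value is zero")
        total += abs((actual - predicted) / actual)
    return 100.0 * total / len(y_true)


# Made-up prices for illustration only; each prediction is off by 5%.
actual_prices = [5000.0, 10000.0, 20000.0]
predicted_prices = [5250.0, 9500.0, 21000.0]
print(round(mean_absolute_percentage_error(actual_prices, predicted_prices), 2))  # 5.0
```

Note that scikit-learn's function of the same name returns a fraction rather than a percentage, so a reported 4.41% would correspond to roughly 0.0441 there.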
The feature importance plot shows that the most important feature is Carat Weight, which makes sense from a modeling perspective.
The webserver folder contains the Dockerfile as well as the code to run the Flask API.
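A Flask prediction service of this kind typically validates the incoming JSON, builds a feature row, and returns the model's prediction. The sketch below shows one possible shape and is not the code in the webserver folder: the `/predict` route, the `REQUIRED_FIELDS` names, and the helpers `build_features` and `create_app` are all assumptions made for illustration.

```python
# Hypothetical feature names; the real ones must match the training data.
REQUIRED_FIELDS = ("Carat Weight", "Cut", "Color")


def build_features(payload):
    """Validate a JSON payload and return one feature row as a dict."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # Keep only the expected fields and ignore anything extra.
    return {name: payload[name] for name in REQUIRED_FIELDS}


def create_app(pipeline):
    """Build a Flask app around an already loaded model pipeline."""
    # Imported here so the validation helper works without Flask installed.
    from flask import Flask, jsonify, request
    import pandas as pd

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        try:
            row = build_features(payload)
        except ValueError as exc:
            return jsonify({"error": str(exc)}), 400
        # Assumes the pipeline accepts a one-row DataFrame.
        price = float(pipeline.predict(pd.DataFrame([row]))[0])
        return jsonify({"price": price})

    return app
```

Keeping the validation in a plain function makes it easy to test without starting a server. Under these assumptions, the app would be started with something like `create_app(pipeline).run(host="0.0.0.0", port=9696)`, where the port is an arbitrary choice.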
This work was done on AWS Cloud on a t2.xlarge instance.

I would like to thank DataTalks.Club, Alexey, and the entire team for such an amazing learning experience. Thanks a lot.
