This project analyzes data from the online dating application OkCupid. In recent years, there has been a massive rise in the use of dating apps to find love. Many of these apps use sophisticated data science techniques to recommend possible matches to users and to optimize the user experience. These apps give us access to a wealth of information that we've never had before about how different people experience romance. The goal of this project is to scope the problem, prepare and analyze the data, and build a machine learning model to answer a question.
The goal is to apply machine learning techniques (classification) to a data set. The primary research question is whether an OkCupid user's drinking habits can be predicted from other variables in their profile. This question matters because a shared lifestyle and shared habits can be an important part of a match, and when users don't enter their drinking habits, OkCupid would like to predict what those habits might be.
The analysis will use descriptive statistics and data visualization to summarize the distribution, counts, and relationships between variables. Since the goal of the project is to make predictions about users' drinking habits, classification algorithms from the supervised learning family of machine learning models will be implemented.
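As a rough illustration of the descriptive step, the sketch below counts the values of one categorical variable and cross-tabulates it against another using only the Python standard library. The records are invented toy data, and the field names `drinks` and `smokes`, as well as the helper names `value_counts` and `crosstab`, are assumptions made for this example rather than details taken from the project.

```python
from collections import Counter, defaultdict

# Toy records invented for illustration only; the field names "drinks" and
# "smokes" are assumptions, not confirmed columns of the project's data set.
profiles = [
    {"drinks": "socially", "smokes": "no"},
    {"drinks": "socially", "smokes": "no"},
    {"drinks": "often", "smokes": "yes"},
    {"drinks": "not at all", "smokes": "no"},
    {"drinks": "socially", "smokes": "yes"},
    {"drinks": None, "smokes": "no"},  # a user who left the field blank
]


def value_counts(rows, column):
    """Count each non-missing value of `column`."""
    return Counter(r[column] for r in rows if r[column] is not None)


def crosstab(rows, row_col, col_col):
    """Count co-occurrences of two categorical columns, skipping missing values."""
    table = defaultdict(Counter)
    for r in rows:
        if r[row_col] is not None and r[col_col] is not None:
            table[r[row_col]][r[col_col]] += 1
    return {key: dict(counts) for key, counts in table.items()}


drink_counts = value_counts(profiles, "drinks")
drink_by_smoke = crosstab(profiles, "drinks", "smokes")

print(drink_counts.most_common())
print(drink_by_smoke)
```

Rows with a missing `drinks` value are skipped here because they have no label to summarize; in the prediction setting those are the profiles a trained classifier would be asked to fill in.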
The project will conclude with an evaluation of the selected machine learning model on a validation data set. The predictions can be checked with a confusion matrix and with metrics such as accuracy, precision, recall, F1, and kappa scores.
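To make these metrics concrete, here is a minimal sketch that computes them from scratch on a handful of made-up labels. The label values, the toy predictions, and all function names are hypothetical, and the kappa score is assumed to mean Cohen's kappa; a real project would more likely rely on a library implementation.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_scores(matrix, i):
    """Precision, recall, and F1 for the class at index `i`."""
    tp = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # column sum
    actual = sum(matrix[i])                    # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohen_kappa(matrix):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / total ** 2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Made-up labels and predictions, for illustration only.
labels = ["not at all", "socially", "often"]
y_true = ["socially", "socially", "socially", "often",
          "not at all", "socially", "often", "not at all"]
y_pred = ["socially", "socially", "often", "often",
          "not at all", "socially", "socially", "socially"]

matrix = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(matrix)
precision, recall, f1 = per_class_scores(matrix, labels.index("socially"))
kappa = cohen_kappa(matrix)

print(matrix)
print(acc, precision, recall, f1, kappa)
```

Accuracy alone can look strong when one class dominates, which is why the per-class scores and kappa are worth reporting alongside it.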