Logistic regression is a supervised classification algorithm that predicts the probability of a target variable. Here the target (dependent) variable is dichotomous, meaning it has only two possible classes. Mathematically, a logistic regression model predicts P(Y=1) as a function of X. It is one of the simplest ML algorithms and is used for classification problems such as spam detection, diabetes prediction, and cancer detection. Here, we use a logistic regression model to predict the gender (Male/Female) of a person from their height and weight. The data set contains three columns:
- Height in inches
- Weight in pounds
- Gender (Male/Female) of the person
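To make the P(Y=1) formulation concrete for the two features listed above, here is a minimal sketch of how a fitted logistic regression turns height and weight into a probability. The coefficient values are hypothetical placeholders chosen for illustration, not values learned from this data set, and encoding Male as class 1 is also an assumption.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to the interval [0, 1] without overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba_male(height_in: float, weight_lb: float,
                       intercept: float, w_height: float, w_weight: float) -> float:
    """P(Y=1 | height, weight), assuming Y=1 encodes Male."""
    z = intercept + w_height * height_in + w_weight * weight_lb
    return sigmoid(z)


# Hypothetical coefficients, for illustration only (not fitted to the real data).
INTERCEPT, W_HEIGHT, W_WEIGHT = -40.0, 0.3, 0.12

p = predict_proba_male(70.0, 180.0, INTERCEPT, W_HEIGHT, W_WEIGHT)
label = "Male" if p >= 0.5 else "Female"  # 0.5 is the usual default threshold
print(f"P(Male) = {p:.3f} -> predicted {label}")
```

The model is linear in the features; the sigmoid only squashes the linear score into a probability, and the 0.5 threshold turns that probability into a class label.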
Let's explore the correlations and see which features separate the Male/Female populations:
(Figures: pairplot per gender; feature correlation.)
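
The same exploration can be sketched numerically with pandas. The data below is a synthetic stand-in generated with arbitrary means and spreads, since the original file is not shown here; the column names `Height`, `Weight`, `Gender` and the helper column `is_male` are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # rows per class in the synthetic stand-in

# Synthetic data with arbitrary parameters; NOT the real data set.
males = pd.DataFrame({
    "Height": rng.normal(69, 3, n),    # inches
    "Weight": rng.normal(187, 20, n),  # pounds
    "Gender": "Male",
})
females = pd.DataFrame({
    "Height": rng.normal(64, 3, n),
    "Weight": rng.normal(136, 19, n),
    "Gender": "Female",
})
df = pd.concat([males, females], ignore_index=True)

# Per-class feature means: a large gap suggests the feature separates the classes.
group_means = df.groupby("Gender")[["Height", "Weight"]].mean()

# Encode the label as 0/1 so it can take part in the correlation matrix.
df["is_male"] = (df["Gender"] == "Male").astype(int)
corr = df[["Height", "Weight", "is_male"]].corr()

print(group_means)
print(corr)
```

For the graphical version, seaborn's `pairplot(df, hue="Gender")` draws the per-class scatter plots and distributions from the same frame.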
The model has hyperparameters that we can tune for potentially better performance. To tune them, we use a mix of cross-validation and grid search. In logistic regression, the most important parameter to tune is the regularization parameter C. Note that regularization is not always part of a logistic regression model.
The regularization parameter C is used to keep regression coefficients from becoming implausibly large, and when data is sparse it can also serve as a method of feature selection. In scikit-learn, C is the inverse of the regularization strength, so smaller values mean stronger regularization. We tune C in two ways:
- Writing our own loops to iterate over the model parameters
- Using GridSearchCV to find the best model
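Both approaches above can be sketched with scikit-learn as follows. This is a sketch under stated assumptions rather than the project's actual code: the data is synthetic, the candidate grid `candidate_Cs` is illustrative, and the `StandardScaler` step is an addition of this sketch that the original may not have used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the height/weight data (arbitrary parameters).
rng = np.random.default_rng(0)
n = 500
X = np.vstack([
    np.column_stack([rng.normal(69, 3, n), rng.normal(187, 20, n)]),  # class 1
    np.column_stack([rng.normal(64, 3, n), rng.normal(136, 19, n)]),  # class 0
])
y = np.array([1] * n + [0] * n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidate_Cs = [0.001, 0.01, 0.1, 1, 10, 100]  # illustrative grid

# Approach 1: our own loop over C, scoring each value with 5-fold cross-validation.
manual_scores = {}
for C in candidate_Cs:
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    manual_scores[C] = cross_val_score(model, X_train, y_train, cv=5).mean()
best_C_manual = max(manual_scores, key=manual_scores.get)

# Approach 2: GridSearchCV runs the same search and refits the best model.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(pipe, {"logisticregression__C": candidate_Cs}, cv=5)
grid.fit(X_train, y_train)

# Accuracy of the refit best model on data not used during tuning.
test_accuracy = grid.score(X_test, y_test)

print("Manual loop best C:", best_C_manual)
print("GridSearchCV best params:", grid.best_params_)
print("Held-out accuracy:", round(test_accuracy, 4))
```

Keeping a separate test split matters here: the cross-validation score is used to choose C, so only the held-out accuracy is an unbiased estimate of how the tuned model performs.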
After completing the above steps, we concluded that the best regularization parameter, C = 1, corresponds to the maximum validation score of 0.9172. The scores obtained were:
- Basic Logistic Regression (Unregularized): 0.9172
- Tuned Logistic Regression Parameters: {'C': 1}; best score is 0.9168
- Logistic Regression Accuracy Score (Regularized): 0.9252
As we can see above, our model predicts gender with roughly 92% accuracy!