This repository contains my own Python implementation of ℓ2²-regularized logistic regression for solving classification problems.
I completed this code as coursework for the University of Washington's DATA 558 Statistical Machine Learning course in May 2018.
Implementation of ℓ2²-regularized logistic regression involves solving the convex and differentiable minimization problem:

$$\min_{\beta \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i x_i^T \beta\right)\right) + \lambda \lVert \beta \rVert_2^2$$

Note that this assumes that the training data is of the form $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$.
Logistic regression (the loss function in the equation above) is a popular model for classification in which the log odds of the posterior probabilities of the K classes are modeled as a linear function of the predictor variables. ℓ2² regularization (the penalty term in the equation above) serves to prevent overfitting in the model.
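In the two-class case (labels coded as −1/+1, as in the objective above), this is equivalent to modeling the posterior probability with the sigmoid function:

$$P(Y = +1 \mid X = x) = \frac{1}{1 + \exp(-x^T \beta)}$$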
This repository contains Python code to solve the ℓ2²-regularized logistic regression minimization problem described above. This can be used to solve classification problems for two or more classes.
The /src folder contains my logistic_regression.py module, which provides all of the functions needed to implement this method. Specifically, the module contains:
- obj - function to calculate the objective value of the minimization problem described above
- computegrad - function to calculate the gradient of the minimization problem described above
- backtracking - function to implement backtracking line search to determine the step size for fast gradient descent (see next bullet)
- fastgradalgo - function to implement the fast gradient descent ('momentum') algorithm to solve the minimization problem described above (a sketch of how these pieces fit together appears after this list)
- crossval - function to implement cross-validation to find the optimal regularization parameter λ
- ovo - function to implement the one-vs-one method for multi-class classification problems
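The sketch below is not the repository's code (the actual signatures may differ), but it illustrates how the obj/computegrad/backtracking/fastgradalgo pieces typically fit together, assuming an (n, d) NumPy feature matrix X, labels y coded as -1/+1, and illustrative default parameters:

```python
import numpy as np

def obj(beta, X, y, lam):
    """Objective value: average logistic loss plus lam * ||beta||_2^2."""
    return np.mean(np.log(1 + np.exp(-y * (X @ beta)))) + lam * np.sum(beta ** 2)

def computegrad(beta, X, y, lam):
    """Gradient of the objective above."""
    p = 1 / (1 + np.exp(y * (X @ beta)))          # equals sigmoid(-y_i * x_i^T beta)
    return -(X.T @ (y * p)) / len(y) + 2 * lam * beta

def backtracking(beta, X, y, lam, t=1.0, alpha=0.5, gamma=0.8, max_iter=100):
    """Backtracking line search: shrink step size t until sufficient decrease holds."""
    g = computegrad(beta, X, y, lam)
    for _ in range(max_iter):
        if obj(beta - t * g, X, y, lam) <= obj(beta, X, y, lam) - alpha * t * (g @ g):
            break
        t *= gamma
    return t

def fastgradalgo(X, y, lam, eps=1e-4, max_iter=1000):
    """Fast (Nesterov momentum) gradient descent on the regularized objective."""
    beta = np.zeros(X.shape[1])
    theta = beta.copy()
    for k in range(max_iter):
        grad = computegrad(theta, X, y, lam)
        if np.linalg.norm(grad) < eps:            # stop once the gradient is small
            break
        t = backtracking(theta, X, y, lam)
        beta_new = theta - t * grad
        theta = beta_new + (k / (k + 3)) * (beta_new - beta)   # momentum step
        beta = beta_new
    return beta
```

The actual crossval and ovo helpers then layer λ selection and one-vs-one multi-class training on top of a solver like this; see the module itself for the exact interfaces.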
This repository also includes the following Jupyter (IPython) notebooks, which demonstrate the functionality of this module:
- Demo1 - shows an example of implementing ℓ2²-regularized logistic regression on a real-world dataset
- Demo2 - shows an example of implementing ℓ2²-regularized logistic regression on a simulated dataset
- Demo3 - compares the results of my ℓ2²-regularized logistic regression to the equivalent functions in scikit-learn on a real-world dataset (see the note after this list for how the regularization parameters correspond)
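As context for the comparison in Demo3: scikit-learn's LogisticRegression minimizes a differently scaled version of the same objective, so matching the two suggests setting C = 1/(2nλ). The snippet below is a sketch with toy data, not the notebook's exact code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                   # toy data for illustration
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=100))

n, lam = X.shape[0], 0.1                                        # lam = regularization parameter above
# sklearn minimizes (1/2)||beta||^2 + C * sum_i log(1 + exp(-y_i * x_i^T beta)),
# so matching it to (1/n) * sum_i log(...) + lam * ||beta||^2 gives C = 1 / (2 * n * lam).
clf = LogisticRegression(penalty='l2', C=1 / (2 * n * lam),
                         fit_intercept=False, solver='lbfgs')
clf.fit(X, y)
```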
To use the code in this repository:
- clone the repository
- navigate to the main directory
- launch Python
- run "import src.logistic_regression" (see the example session below)
Upon completing these steps, all of the functions in the logistic_regression.py module will be available. Please note that these functions were developed using Python 3.6.4, and functionality is not guaranteed for older versions of Python. If you do not already have them, you may need to install the following dependencies in your local environment (one way to do so is shown after this list):
- matplotlib
- numpy
- pandas
- scikit-learn (sklearn)
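One way to install these dependencies, assuming pip is available in your environment:

```
pip install matplotlib numpy pandas scikit-learn
```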