Repository of the code (Python 3) for the homework of Optimization for Data Science, academic year 2019/2020, UNIPD.
numpy, matplotlib, sklearn, argparse, json, time, tabulate. tqdm
In src it is possible to find the main scripts used for this work:
- loss.py: containing the LogisticLoss class
- optimizer.py: containing the abstract Optimizer class and the main three optimization algorithms (Gradient Descent, SGD and SVRG)
- model.py: containing the Model class, which is a simple Linear Classifier with the fit and predict functions. It also contains a main which is a demo of how the model works with some 1D and 2D data and the different optimizations algorithms.
In src there is the script test.py that gives the possibilities to test the model with the choosen optimizer on three different datasets (IRIS, MNIST, a9a).
The script has the following arguments:
- data: String. The dataset to use (mnist, a9a, iris). Default is iris.
- optim: String. the optimization algorithm chosen (gd, sgd, svrg). Default is gd
- tollerance: Float. The minimum norm that the gradient must have in order to stop the optimization algorithm. Default is 0.001
- iter_epoch: Int. The number of example to take in every step of SVRG. Default 5000. Ignored if not SVRG.
- epochs: Int. The number of epochs. Default is 10.
- lr: Float. the step-size. Default is 0.05
- reg_coeff: Float. The regularization lambda coefficient. Default is 0.001
- init: String. The weights initialization. Either "zeros" or "random" (normal distribution). Default is zeros
- verbose: 0/1. Print some information on every step (1) or don't (0). Default is 0.
- seed: Int. Define a random seed. Default is None
Example:
cd C:\Users\fgrim\Desktop\Optimization\optimization
python src\model.py --data "mnist" --optim "gd" --epochs 300 --lr 0.2 --reg_coeff 0.01 --init "random" --seed 42
result.ipynb is a notebook (set up in a colab enviroment, but easy to modfiy to run it in local) that fit different models (different dataset and optimizer) and return some results.
In the results folder there are some json files containing some results and two scripts display.py and reduce.py that were done in order to process and visualize the results
In the data folder there are the MNIST dataset and the a9a dataset. The IRIS dataset is called by sklearn.dataset*.
In the images folder there are some results images