This is our source code for Project 1 in the course FYS-STK4155 Applied Data Analysis and Machine Learning at the University of Oslo.
The project covers introductory regression methods and resampling. We use the following regression methods:
- Ordinary Least Squares
- Ridge
- LASSO
in combination with our own implementation of k-fold cross-validation, with the eventual goal of fitting a two-dimensional polynomial to real terrain data downloaded from USGS EarthExplorer.
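As a rough illustration of how these pieces fit together, the sketch below builds a two-dimensional polynomial design matrix and estimates the test error with k-fold cross-validation. It is not the code in src/main.py: the function names `design_matrix`, `fit_ridge` and `kfold_mse` are hypothetical, the ridge penalty is applied to the intercept as well for simplicity, and LASSO is left out because it has no closed-form solution.

```python
import numpy as np


def design_matrix(x, y, degree):
    """Columns x^i * y^j for all i + j <= degree (the first column is the intercept)."""
    cols = [x**i * y**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)


def fit_ridge(X, z, lam=0.0):
    """Closed-form ridge solution; lam = 0 reduces to ordinary least squares.

    Simplifying assumption: the intercept column is penalised like any other.
    """
    p = X.shape[1]
    return np.linalg.pinv(X.T @ X + lam * np.eye(p)) @ X.T @ z


def kfold_mse(X, z, k=5, lam=0.0, seed=0):
    """Mean squared test error averaged over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(z)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_ridge(X[train], z[train], lam)
        errors.append(np.mean((z[test] - X[test] @ beta) ** 2))
    return float(np.mean(errors))
```

Calling `kfold_mse(design_matrix(x, y, degree), z, k=5, lam=lam)` over a grid of `degree` and `lam` values gives the kind of error curves described in the plots listed further down.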
To run all tests and generate the data and plots used in the report, run main_script.sh.
- src/main.py: Main script containing all classes used in this project.
- src/test_main.py: Contains test functions for main.py. Use pytest to run tests.
- src/beta_variance_ols_plot.py: Calculates the variance of the regression parameters for OLS, for both the Franke and terrain data. Plots are saved as .pdf to doc/figs/
- src/bias_variance_error_Franke.py: Calculates the expected prediction error (EPE) using k-fold cross-validation for OLS, Ridge and LASSO on the Franke data, using different polynomial degrees and hyperparameters. Plots are saved as .pdf to doc/figs/
- src/bias_variance_error_terrain.py: Calculates the EPE using k-fold cross-validation for OLS, Ridge and LASSO on the terrain data, using different polynomial degrees and hyperparameters. Plots are saved as .pdf to doc/figs/
- src/model_plots.py: Creates 3D plots of our best OLS, Ridge and LASSO models for both datasets. Figures are saved as .pdf to doc/figs/
- src/r2_scores.py: Calculates the R2 scores of our best OLS, Ridge and LASSO models for both datasets. Results are printed to the terminal.
- doc/report_1.tex: Main report of the project.
- main_script.sh: Shell script that runs all the necessary Python scripts and builds the TeX report using the newly generated figures.
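For readers unfamiliar with the R2 score reported by src/r2_scores.py, here is a minimal sketch of the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r2_score` is an assumption for illustration and may not match what the repository uses.

```python
import numpy as np


def r2_score(z_true, z_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    ss_res = np.sum((z_true - z_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((z_true - np.mean(z_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives a score of 1, while always predicting the mean of the data gives 0.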
Unfortunately, GitHub does not support embedding PDF graphics, so we link to the figures instead. We use .pdf because we want vector graphics for the figures.
- 3D plotted Ridge fit of Franke's function.
- 3D plotted LASSO fit of Franke's function.
- Bias-variance decomposition of the Ridge error as a function of the hyperparameter.
- 3D plotted Ridge fit of terrain data.
- 3D plotted LASSO fit of terrain data.
- EPE plotted for OLS as a function of model complexity, using only every 150th point in the x and y directions of the terrain data.
- EPE plotted for Ridge as a function of the hyperparameter, using only every 150th point in the x and y directions of the terrain data.
- EPE plotted for LASSO as a function of the hyperparameter, using only every 150th point in the x and y directions of the terrain data.
- Variance of the OLS parameters, using only every 150th point in the x and y directions of the terrain data.
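Two of the ideas behind these figures can be sketched briefly: thinning the terrain grid to every 150th point in each direction, and estimating the variance of the OLS parameters from the textbook expression Var(beta) = sigma^2 (X^T X)^{-1}. The helper names `subsample` and `ols_beta_variance` are hypothetical, and the repository may compute these quantities differently.

```python
import numpy as np


def subsample(grid, step=150):
    """Keep every `step`-th point along both axes of a 2D height map."""
    return grid[::step, ::step]


def ols_beta_variance(X, z):
    """OLS estimate and the variance of each parameter.

    Uses Var(beta) = sigma^2 * diag((X^T X)^-1), with sigma^2 estimated
    from the residuals using n - p degrees of freedom.
    """
    n, p = X.shape
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ X.T @ z
    residuals = z - X @ beta
    sigma2 = residuals @ residuals / (n - p)
    return beta, sigma2 * np.diag(XtX_inv)
```

Subsampling this aggressively shrinks the dataset by a factor of roughly 150 squared, which keeps the cross-validation runs fast at the cost of spatial detail.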