Training BatchNorm and Only BatchNorm: Affine Parameter Effects in MLPs

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contact
  6. Acknowledgments

About The Project

Part of a project in the Deep Learning & Applied AI course at Sapienza University of Rome, spring 2022. The starting point was the paper "Training BatchNorm and only BatchNorm", which investigated the effect of freezing all layers except the batch normalization layers in residual neural networks. This project runs comparable experiments on MLPs of varying dimensions on MNIST (the implementation also works for CIFAR-10). A shallow LeNet-style CNN is also included, mainly to observe the behaviour of a shallow non-residual CNN, but also to see how hyperparameter tuning affects BatchNorm-only performance.

Findings suggest that training only the BatchNorm parameters does give better performance than training the same number of randomly chosen parameters, at least beyond a certain parameter count, and the gap generally widens as the number of parameters grows from there.
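
To make the setup concrete, here is a minimal sketch of the core idea in TensorFlow/Keras: build an MLP, then freeze every layer except the BatchNormalization layers so that only their affine parameters (gamma and beta) are trained. This is not the notebooks' exact code; the layer sizes and function name are illustrative assumptions.

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    def build_bn_only_mlp(input_shape=(28, 28), hidden_units=(512, 256), num_classes=10):
        """MLP in which only the BatchNorm affine parameters remain trainable."""
        model = keras.Sequential([layers.Flatten(input_shape=input_shape)])
        for units in hidden_units:
            model.add(layers.Dense(units, activation="relu"))
            model.add(layers.BatchNormalization())
        model.add(layers.Dense(num_classes, activation="softmax"))

        # Freeze everything except the BatchNorm layers.
        for layer in model.layers:
            layer.trainable = isinstance(layer, layers.BatchNormalization)
        return model

    model = build_bn_only_mlp()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()  # trainable parameters: only the gamma/beta of the BatchNorm layers

Freezing the Dense layers keeps their randomly initialized weights fixed, so only the BatchNorm scale and shift parameters learn; the resulting trainable parameter count is what gets compared against training an equal number of randomly selected weights.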

Getting Started

The notebooks are standard TensorFlow/Keras Jupyter notebooks. For more background on what they investigate, I recommend reading the paper on training BatchNorm and only BatchNorm (link in the acknowledgments).

Prerequisites

The notebooks run with Jupyter and TensorFlow/Keras. They also work fine in Google Colab.

  • Jupyter Notebook
  • TensorFlow
    pip install tensorflow
  • KerasTuner, if you want to run the tuning sections (a minimal example follows this list). If installation becomes an issue, feel free to leave the tuning code commented out.
    pip install keras-tuner --upgrade
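
As a rough illustration of how the tuning sections can use KerasTuner, here is a minimal sketch; the search space, the build_model function, and the hyperparameter names are assumptions for illustration, not the notebooks' actual code.

    import keras_tuner as kt
    from tensorflow import keras
    from tensorflow.keras import layers

    def build_model(hp):
        """Builds a small MLP whose hidden width and learning rate are tuned."""
        model = keras.Sequential([
            layers.Flatten(input_shape=(28, 28)),
            layers.Dense(hp.Int("units", min_value=64, max_value=512, step=64),
                         activation="relu"),
            layers.BatchNormalization(),
            layers.Dense(10, activation="softmax"),
        ])
        model.compile(
            optimizer=keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=5)
    # tuner.search(x_train, y_train, epochs=3, validation_split=0.1)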

(back to top)

Usage

There are two notebooks, one per architecture (LeNet CNN and MLPs). Each notebook has a tuning section at the bottom where the tuning code is commented out. For this project the MLP notebook is the interesting one.

Whether MNIST or CIFAR-10 is used is decided by setting a variable at the top of the notebook, as illustrated below.
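
A hypothetical illustration of that switch (the actual variable name and preprocessing in the notebooks may differ):

    from tensorflow import keras

    DATASET = "mnist"  # or "cifar10"

    if DATASET == "mnist":
        (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    else:
        (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]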

(back to top)

Roadmap

Self-evaluation of further work that could be done on this project.

  • Optimize the random parameter freezing/unfreezing. At the moment only Keras for R makes it possible to freeze specific weights; support in the Python API could come soon and would easily speed up runtime for the larger nets. A gradient-masking workaround is sketched after this list.
  • More rigorous MLP architecture design. As it stands, the dimensions and layer contents are fairly simple and were picked somewhat arbitrarily to get initial results.
  • Testing and tuning more hyperparameters, including activation function placement (before or after BatchNorm) and batch size.
  • Further experiments with other datasets, including extending to non-computer-vision datasets.
  • Experimenting with other architectures.
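
The gradient-masking workaround mentioned in the first item could look roughly like the sketch below. It assumes a TF 2.x-style custom train_step; the class and parameter names are illustrative and not part of the notebooks.

    import tensorflow as tf
    from tensorflow import keras

    class MaskedModel(keras.Model):
        """Trains only a random subset of weights by zeroing all other gradients."""

        def __init__(self, base_model, fraction_trainable=0.01, seed=0):
            super().__init__()
            self.base = base_model  # assumes base_model is already built, so its weights exist
            rng = tf.random.Generator.from_seed(seed)
            # One binary mask per weight tensor; 1 = trainable, 0 = frozen.
            self.masks = [
                tf.cast(rng.uniform(w.shape) < fraction_trainable, w.dtype)
                for w in base_model.trainable_weights
            ]

        def call(self, inputs, training=False):
            return self.base(inputs, training=training)

        def train_step(self, data):
            x, y = data
            with tf.GradientTape() as tape:
                y_pred = self(x, training=True)
                loss = self.compiled_loss(y, y_pred)
            grads = tape.gradient(loss, self.base.trainable_weights)
            grads = [g * m for g, m in zip(grads, self.masks)]  # zero out the frozen weights
            self.optimizer.apply_gradients(zip(grads, self.base.trainable_weights))
            self.compiled_metrics.update_state(y, y_pred)
            return {m.name: m.result() for m in self.metrics}

With an optimizer like Adam and no weight decay, a zero gradient leaves a weight unchanged, so masking the gradients behaves like freezing those weights for the whole run while avoiding per-weight bookkeeping in the training loop.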

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contact

Your Name - [email protected]

Project Link: https://github.com/marcusntnu/mlp_lenet_bathnorm

(back to top)

Acknowledgments

  • "Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs" by Frankle, Schwab, and Morcos (arXiv:2003.00152)

(back to top)
