Fromage 🧀 optimiser

Jeremy Bernstein · Arash Vahdat · Yisong Yue · Ming‑Yu Liu

Voulez-vous du fromage?

To get started with Fromage in your PyTorch code, copy the file fromage.py into your project directory, then write:

from fromage import Fromage
optimizer = Fromage(net.parameters(), lr=0.01, p_bound=None)

An initial learning rate of 0.01 has worked well in all our experiments except model fine-tuning, where 0.001 worked well. Decaying the learning rate when the loss plateaus is a good idea.
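
As a rough illustration of decaying the learning rate on a plateau, here is a minimal, dependency-free sketch. The `PlateauDecay` helper, the decay factor of 0.1 and the patience of 2 are illustrative assumptions, not settings taken from our experiments. PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` offers similar behaviour and is probably the more convenient choice in practice.

```python
class PlateauDecay:
    """Hypothetical helper: shrink the learning rate when the loss stops improving."""

    def __init__(self, lr=0.01, factor=0.1, patience=2):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiplicative decay (assumed value)
        self.patience = patience    # non-improving steps tolerated before decaying
        self.best = float("inf")    # best loss seen so far
        self.bad_epochs = 0         # consecutive steps without improvement

    def step(self, loss):
        """Record a new loss value and return the (possibly decayed) learning rate."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


# Made-up validation losses that plateau at 0.7 before improving again.
schedule = PlateauDecay(lr=0.01, factor=0.1, patience=2)
losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.6]
lrs = [schedule.step(loss) for loss in losses]
```

With a PyTorch optimizer, the returned value could be written into each entry of `optimizer.param_groups` under the `"lr"` key after every evaluation.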

On some benchmarks, Fromage heavily overfit the training set. We were able to control this behaviour by setting the p_bound regularisation flag. This constrains the norm of each layer's weights to lie within a factor of p_bound of its initial value.
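
To make the effect of p_bound concrete, the sketch below rescales a weight vector whenever its norm exceeds p_bound times its initial norm. It assumes the constraint acts as an upper bound enforced by rescaling, and it uses plain lists of floats in place of tensors. The helper names `l2_norm` and `clip_to_bound` are hypothetical; see fromage.py for what the optimiser actually does.

```python
import math


def l2_norm(weights):
    """Euclidean norm of a flat list of weights."""
    return math.sqrt(sum(w * w for w in weights))


def clip_to_bound(weights, initial_norm, p_bound):
    """Rescale `weights` so that their norm is at most p_bound * initial_norm.

    With p_bound=None the weights are returned unchanged (no constraint).
    """
    if p_bound is None:
        return list(weights)
    max_norm = p_bound * initial_norm
    norm = l2_norm(weights)
    if norm > max_norm:
        scale = max_norm / norm
        return [w * scale for w in weights]
    return list(weights)


initial = [3.0, 4.0]                 # norm 5 at initialisation
initial_norm = l2_norm(initial)
grown = [9.0, 12.0]                  # norm 15 after some training
clipped = clip_to_bound(grown, initial_norm, p_bound=2.0)  # norm capped at 10
```

The direction of the weight vector is preserved; only its length is pulled back to the bound.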

About this repository

We've written an academic paper that proposes an optimisation algorithm based on a new geometric characterisation of deep neural networks. The paper is called:

On the distance between two neural networks and the stability of learning.

You can also check out a blog post with some interactive demos of the main idea:

Getting to the bottom.

We're putting this code here so that you can test out our optimisation algorithm in your own applications, and also so that you can attempt to reproduce the experiments in our paper.

If something isn't clear or isn't working, let us know in the Issues section or contact [email protected].

Repository structure

Here is the structure of this repository.

.
├── classify-cifar/         # CIFAR-10 classification experiments.
├── classify-imagenet/      # ImageNet classification experiments.
├── classify-mnist/         # MNIST classification experiments.
├── transformer-wikitext2/  # Transformer training experiments.
├── generate-cifar/         # CIFAR-10 class-conditional GAN experiments.
├── make-plots/             # Code to reproduce the figures in the paper.
├── LICENSE                 # The license on our algorithm.
├── README.md               # The very page you're reading now.
└── fromage.py              # PyTorch code for the Fromage optimiser.

Acknowledgements

This research was supported by Caltech and NVIDIA.
Our GAN implementation is based on a codebase by Jiahui Yu.
Our Transformer code is from the PyTorch example.
Our CIFAR-10 classification code is originally by kuangliu.
Our MNIST code was originally forked from the PyTorch example.
See here and here for closely related work by Yang You and coauthors.

Citation

If you adore le fromage as much as we do, feel free to cite the paper:

@inproceedings{fromage,
  title={On the distance between two neural networks and the stability of learning},
  author={Jeremy Bernstein and Arash Vahdat and Yisong Yue and Ming-Yu Liu},
  booktitle={Neural Information Processing Systems},
  year={2020}
}

License

We are making our algorithm available under a CC BY-NC-SA 4.0 license. The other code we have used is subject to its own license terms, as indicated in the subfolders.
