A curated list of awesome hypernetwork resources, inspired by awesome-computer-vision and awesome implicit representations.
Hypernetworks have become very common in deep learning and appear, in some way or another, in thousands of papers. In the following, I therefore try to compile a list of resources that is merely representative of the most interesting concepts around HyperNetworks. Note that the list is biased towards my own papers.
Please get in touch if you think I missed important references.
HyperNetworks are simply neural networks that produce and/or adapt the parameters of another parametrized model. Unsurprisingly, they date back at least to the early 1990s and to Schmidhuber's work on meta-learning and self-referential learning. Hypernetworks have since been applied across a very large range of deep learning contexts and applications, which I try to cover below.
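To make the definition concrete, here is a minimal sketch in PyTorch (all names and sizes are my own illustrative choices, not taken from any particular paper) of a hypernetwork that maps a conditioning embedding to the weights and bias of a target linear layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNetwork(nn.Module):
    """Maps a conditioning embedding to the parameters of a target linear layer."""
    def __init__(self, embedding_dim, target_in, target_out):
        super().__init__()
        self.target_in, self.target_out = target_in, target_out
        # One hidden layer that outputs all target weights and biases at once.
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, 64),
            nn.ReLU(),
            nn.Linear(64, target_out * target_in + target_out),
        )

    def forward(self, embedding):
        params = self.net(embedding)
        n_w = self.target_out * self.target_in
        weight = params[:n_w].view(self.target_out, self.target_in)
        bias = params[n_w:]
        return weight, bias

# The "target network" is just a functional linear layer whose parameters are
# produced by the hypernetwork instead of being trained directly.
hnet = HyperNetwork(embedding_dim=8, target_in=16, target_out=4)
task_embedding = torch.randn(8)      # e.g. a learned task or input embedding
weight, bias = hnet(task_embedding)
x = torch.randn(5, 16)
y = F.linear(x, weight, bias)        # gradients flow back into the hypernetwork
```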
The core idea of adaptive layers is to make the parameters of a certain layer of the neural network depend on the computation that precedes that layer's own computation. In contrast, the computational node of an ordinary parameter usually has no parents, and its value stays fixed during the "forward" computation.
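To illustrate the contrast, here is a minimal sketch (again with made-up names, not code from any of the papers below) of a layer whose weight matrix is generated on the fly from its own input, in the spirit of dynamic filters and fast weights:

```python
import torch
import torch.nn as nn

class AdaptiveLinear(nn.Module):
    """A linear layer whose weights are generated from its own input,
    rather than being static parameters of the computation graph."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        # Small "controller" that produces the weights for the current input.
        self.weight_generator = nn.Linear(in_features, out_features * in_features)

    def forward(self, x):
        # x: (batch, in_features); one weight matrix per example.
        w = self.weight_generator(x).view(-1, self.out_features, self.in_features)
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1)  # (batch, out_features)

layer = AdaptiveLinear(in_features=16, out_features=4)
y = layer(torch.randn(32, 16))  # the weights differ for every example in the batch
```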
Fast weights and work on RNNs.
- Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks (Schmidhuber 1991)
- Evolving Modular Fast-Weight Networks for Control (Gomez & Schmidhuber 2005)
- Generating Text with Recurrent Neural Networks (Sutskever et al. 2011)
- HyperNetworks (Ha et al. 2016)
- Using Fast Weights to Attend to the Recent Past (Ba et al. 2016)
- Recurrent Independent Mechanisms (Goyal et al. 2021)
Work on CNNs.
- Predicting Parameters in Deep Learning (Denil et al. 2013)
- A Dynamic Convolutional Layer for Short Range Weather Prediction (Klein et al. 2015)
- Dynamic Filter Networks (De Brabandere et al. 2016)
- Fully-Convolutional Siamese Networks for Object Tracking (Bertinetto et al. 2016)
- FiLM: Visual Reasoning with a General Conditioning Layer (Perez et al. 2017)
- Incorporating Side Information by Adaptive Convolution (Kang et al. 2017)
- Learning Implicitly Recurrent CNNs Through Parameter Sharing (Savarese & Maire 2019)
Work on generative models. The following two papers simply condition the generators of a GAN on side information. There is probably more interesting work; please contact me if you know of something. I also list my paper "Continual learning with hypernetworks" here because we use a hypernetwork, among other things, to generate the weights of the decoder of a variational autoencoder.
- Large Scale GAN Training for High Fidelity Natural Image Synthesis (Brock et al. 2018)
- A Style-Based Generator Architecture for Generative Adversarial Networks (Karras et al. 2018)
- Continual learning with hypernetworks (von Oswald et al. 2020)
An overview of multiplicative interactions and hypernetworks.
- Multiplicative Interactions and Where to Find Them (Jayakumar et al. 2020)
Self-attention can be seen as a form of adaptive layer. Nevertheless, I will not cover the transformer literature here, but I do want to mention this paper by Schlag, Irie and Schmidhuber that discusses the equivalence between linear self-attention and fast weights (a small sketch of this view follows the reference below):
- Linear Transformers Are Secretly Fast Weight Programmers (Schlag et al. 2021)
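As a rough illustration of the fast-weight view discussed in that paper, the following simplified, unnormalized sketch (my own, not the authors' code) accumulates outer products of values and keys into a fast weight matrix and queries it at every step:

```python
import torch

def linear_attention_as_fast_weights(queries, keys, values):
    """queries, keys, values: (seq_len, dim). Returns outputs of shape (seq_len, dim).
    In spirit, this is unnormalized causal linear self-attention: the key/value
    pairs 'program' a fast weight matrix W via rank-one outer-product updates."""
    seq_len, dim = queries.shape
    W = torch.zeros(dim, dim)            # fast weights, updated at every step
    outputs = []
    for q, k, v in zip(queries, keys, values):
        W = W + torch.outer(v, k)        # write: rank-one fast-weight update
        outputs.append(W @ q)            # read: query the current fast weights
    return torch.stack(outputs)

out = linear_attention_as_fast_weights(torch.randn(10, 8), torch.randn(10, 8), torch.randn(10, 8))
```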
There have been some very nice ideas that use hypernetworks for neural architecture search. This list is probably neither complete nor perfectly accurate.
- A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks (Stanley et al. 2009)
- Evolving Neural Networks in Compressed Weight Space (Koutník et al. 2010)
- Convolution by Evolution (Fernando et al. 2016)
- SMASH: One-Shot Model Architecture Search through HyperNetworks (Brock et al. 2017)
- Graph HyperNetworks for Neural Architecture Search (Zhang et al. 2019)
Implicit neural representations are continuous functions, usually neural networks, that simply map coordinates in some domain to signal values. Interestingly, hypernetworks are used intensively in this framework.
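A common pattern in this area is a hypernetwork that maps a per-object latent code to the weights of a small coordinate MLP. The following is only a simplified sketch of that pattern (not code from the papers below); it generates a single-hidden-layer network that maps 2D coordinates to a signal value:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN = 32  # width of the coordinate MLP (the implicit representation itself)

class INRHyperNetwork(nn.Module):
    """Maps a latent code describing one object/scene to the weights of a small
    coordinate MLP f(x, y) -> signal value."""
    def __init__(self, latent_dim):
        super().__init__()
        # Parameter shapes of the target MLP (2 coordinates -> HIDDEN -> 1 value).
        self.shapes = [(HIDDEN, 2), (HIDDEN,), (1, HIDDEN), (1,)]
        n_params = sum(math.prod(s) for s in self.shapes)
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_params))

    def forward(self, latent):
        flat = self.net(latent)
        params, offset = [], 0
        for shape in self.shapes:
            n = math.prod(shape)
            params.append(flat[offset:offset + n].view(shape))
            offset += n
        return params  # [W1, b1, W2, b2] of the coordinate MLP

def implicit_function(coords, params):
    W1, b1, W2, b2 = params
    return F.linear(torch.sin(F.linear(coords, W1, b1)), W2, b2)  # sine activation, SIREN-style

hnet = INRHyperNetwork(latent_dim=16)
params = hnet(torch.randn(16))                            # one latent code -> one implicit function
values = implicit_function(torch.rand(1000, 2), params)   # query at 2D coordinates in [0, 1]^2
```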
- Occupancy Networks: Learning 3D Reconstruction in Function Space (Mescheder et al. 2018)
- Deep Meta Functionals for Shape Representation (Littwin & Wolf 2019)
- Implicit Neural Representations with Periodic Activation Functions (Sitzmann et al. 2020)
- Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations (Sitzmann et al. 2019)
- Adversarial Generation of Continuous Images (Skorokhodov et al. 2021)
- MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images (Wang et al. 2021)
Algorithms that tackle meta- and continual learning with the help of hypernetworks have been developed extensively. Naturally, one can view these problems as acting on different time scales and formulate them as bilevel optimization problems (or related formulations), a setting in which hypernetworks can work well.
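For instance, in the continual learning setting of the "Continual learning with hypernetworks" paper listed below, a single hypernetwork maps a learned task embedding to the target network's weights, and earlier tasks are protected by regularizing the hypernetwork's outputs for previous task embeddings. The following is only a rough sketch of that idea, with made-up names and a single target layer, not the authors' implementation:

```python
import torch
import torch.nn as nn

class TaskConditionedHyperNetwork(nn.Module):
    """Maps a learned task embedding to the weights of a target linear layer."""
    def __init__(self, n_tasks, embedding_dim, target_in, target_out):
        super().__init__()
        self.task_embeddings = nn.Embedding(n_tasks, embedding_dim)  # one learned embedding per task
        self.body = nn.Sequential(
            nn.Linear(embedding_dim, 64), nn.ReLU(),
            nn.Linear(64, target_out * target_in + target_out),
        )
        self.target_in, self.target_out = target_in, target_out

    def forward(self, task_id):
        p = self.body(self.task_embeddings(torch.tensor(task_id)))
        n_w = self.target_out * self.target_in
        return p[:n_w].view(self.target_out, self.target_in), p[n_w:]

hnet = TaskConditionedHyperNetwork(n_tasks=5, embedding_dim=8, target_in=16, target_out=4)

# Snapshots of the generated weights for the tasks learned so far (here: tasks 0 and 1).
stored_outputs = {t: tuple(x.detach().clone() for x in hnet(t)) for t in range(2)}

def output_regularizer(current_task):
    """Penalize changes to the weights the hypernetwork generates for previous tasks."""
    reg = torch.tensor(0.0)
    for t in range(current_task):
        w, b = hnet(t)
        w0, b0 = stored_outputs[t]
        reg = reg + ((w - w0) ** 2).sum() + ((b - b0) ** 2).sum()
    return reg

loss_reg = output_regularizer(current_task=2)  # added (scaled) to the current task's loss
```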
- Learning to learn by gradient descent by gradient descent (Andrychowicz et al. 2016)
- Fast Context Adaptation via Meta-Learning (Zintgraf et al. 2018)
- Meta-Learning with Latent Embedding Optimization (Rusu et al. 2018)
- Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace (Lee & Choi 2018)
- Stochastic Hyperparameter Optimization through Hypernetworks (Lorraine & Duvenaud 2018)
- Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes (Requeima et al. 2019)
- Meta-Learning with Warped Gradient Descent (Flennerhag et al. 2019)
- Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (MacKay et al. 2019)
- Meta-Learning Symmetries by Reparameterization (Zhou et al. 2020)
- Meta-Learning via Hypernetworks (Zhao et al. 2020)
- Continual learning with hypernetworks (von Oswald et al. 2020)
- Continual Learning in Recurrent Neural Networks (Ehret et al. 2020)
- Meta Internal Learning (Bensadoun et al. 2021)
I have not seen many papers so far that use hypernetworks to tackle RL problems explicitly. Please contact me if you know of any.
- Hypermodels for Exploration (Dwaracherla et al. 2020)
- Continual Model-Based Reinforcement Learning with Hypernetworks (Huang et al. 2021)
- Recomposing the Reinforcement Learning Building Blocks with Hypernetworks (Keynan et al. 2021)
The following papers use hypernetworks to model a distribution over the weights of the target network. For example, one can use a hypernetwork to transform a simple normal distribution into a potentially complex weight distribution that captures the epistemic uncertainty of the model.
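As a hedged illustration of this idea (not any specific paper's method; all names are my own), the sketch below pushes Gaussian noise through a hypernetwork to obtain weight samples for a target layer and uses the spread of the resulting predictions as a crude estimate of uncertainty:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    """Transforms simple Gaussian noise into samples of a target layer's weights,
    implicitly defining a (potentially complex) distribution over those weights."""
    def __init__(self, noise_dim, target_in, target_out):
        super().__init__()
        self.target_in, self.target_out = target_in, target_out
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(),
            nn.Linear(64, target_out * target_in + target_out),
        )

    def forward(self, z):
        p = self.net(z)
        n_w = self.target_out * self.target_in
        return p[:n_w].view(self.target_out, self.target_in), p[n_w:]

gen = WeightGenerator(noise_dim=8, target_in=16, target_out=1)
x = torch.randn(5, 16)

# Monte Carlo over weight samples: the spread across samples reflects the
# uncertainty encoded in the (learned) weight distribution.
preds = torch.stack([F.linear(x, *gen(torch.randn(8))) for _ in range(20)])
mean, std = preds.mean(dim=0), preds.std(dim=0)
```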
- Implicit Weight Uncertainty in Neural Networks (Pawlowski et al. 2017)
- Bayesian Hypernetworks (Krueger et al. 2017)
- Probabilistic Meta-Representations Of Neural Networks (Karaletsos et al. 2018)
- Neural Autoregressive Flows (Huang et al. 2018)
- Approximating the Predictive Distribution via Adversarially-Trained Hypernetworks (Henning et al. 2018)
- Neural networks with late-phase weights (von Oswald et al. 2020)
- Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights (Karaletsos & Bui 2020)
- Uncertainty estimation under model misspecification in neural network regression (Cervera et al. 2021)
Hypernetwork papers that do not fall into the categories above.
- A Neural Representation of Sketch Drawings (Ha & Eck 2017)
- Measuring the Intrinsic Dimension of Objective Landscapes (Li et al. 2018)
- Neural Style Transfer via Meta Networks (Shen 2018)
- Gated Linear Networks (Veness et al. 2019)
- Hypernetwork Knowledge Graph Embeddings (Balažević et al. 2019)
- On the Modularity of Hypernetworks (Galanti & Wolf 2020)
- Principled Weight Initialization for Hypernetworks (Chang et al. 2020)
The following links provide implementations of different hypernetworks in PyTorch.
- hypnettorch (Christian Henning & Maria Cervera)
- HyperNetworks (Gaurav Mittal)
- Hypernetworks: a versatile and powerful tool (Lior Wolf)
License: MIT