Variational inference is an increasingly popular method in statistics and machine learning for approximating probability distributions. We developed LINFA (Library for Inference with Normalizing Flow and Annealing), a Python library for variational inference designed to accommodate computationally expensive models and difficult-to-sample distributions with dependent parameters. We discuss the theoretical background, capabilities, and performance of LINFA on a number of benchmarks. LINFA is publicly available on GitHub.
Generating samples from a posterior distribution is a fundamental task in Bayesian inference. The development of sampling-based algorithms from the Markov chain Monte Carlo family (…).
However, cases where the computational cost of evaluating the underlying probability distribution is significant occur quite often in engineering and the applied sciences, for example when such evaluation requires the solution of an ordinary or partial differential equation. In such cases, inference can easily become intractable. Additionally, strong and nonlinear dependence between model parameters may result in difficult-to-sample posterior distributions characterized by features at multiple scales or by multiple modes. The LINFA library is specifically designed for cases where the model evaluation is computationally expensive. In such cases, the construction of an adaptively trained surrogate model is key to reducing the computational cost of inference (…).
LINFA is designed as a general inference engine and allows the user to define custom input transformations, computational models, surrogates, and likelihood functions.
As an example, one can define a hyperbolic tangent transformation for the first two variables and an exponential transformation for the third.
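As an illustration, the sketch below (plain NumPy, not LINFA's actual interface; the function names are hypothetical) shows what such an element-wise transformation and its log-Jacobian determinant could look like:

```python
import numpy as np

# Hypothetical sketch of an element-wise input transformation applying
# tanh to the first two parameters and exp to the third, together with
# the log-determinant of its Jacobian.
def transform(z):
    x = z.copy()
    x[:, 0:2] = np.tanh(z[:, 0:2])
    x[:, 2] = np.exp(z[:, 2])
    return x

def log_jacobian(z):
    # d/dz tanh(z) = 1 - tanh(z)^2, d/dz exp(z) = exp(z)
    log_det = np.log1p(-np.tanh(z[:, 0:2]) ** 2).sum(axis=1)
    return log_det + z[:, 2]
```

Because each component is transformed independently, the Jacobian is diagonal and its log-determinant reduces to a simple sum, keeping the cost linear in the number of parameters.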
A new surrogate model can be created using the … constructor. Two functions … handle surrogate model input/output, i.e., saving a trained surrogate to disk and loading it back. A custom likelihood can be defined as a Python function and then assigned as a member function of the … object.
Other Python modules and packages provide implementations of variational inference with a number of additional features. An incomplete list of these packages is reported below.
Online notebooks are also available (see this …).
LINFA is based on normalizing flow transformations and can therefore infer nonlinear parameter dependence. It also provides the ability to adaptively train a surrogate model (NoFAS), which significantly reduces the computational cost of inference for the parameters of expensive computational models. Finally, LINFA provides an adaptive annealing algorithm (AdaAnn), which autonomously selects the appropriate annealing steps based on the current approximation of the posterior distribution.
We tested LINFA on multiple problems. These include inference on unimodal and multimodal posterior distributions specified in closed form, ordinary differential equation models and dynamical systems with gradients computed directly through automatic differentiation in PyTorch, identifiable and non-identifiable physics-based models with fixed and adaptive surrogates, and high-dimensional statistical models. Some of the above tests are included with the library and systematically verified using GitHub Actions. A detailed discussion of these test cases is provided in the Appendix. To run the tests, type …, where …
In this paper, we introduced the LINFA library for variational inference, briefly discussed the relevant background and its capabilities, and reported its performance on a number of test cases. Some interesting directions for future work are mentioned below.
Future versions will support user-defined privacy-preserving synthetic data generation and variational inference through differentially private gradient descent algorithms. This will allow the user to perform inference tasks while preserving a pre-defined privacy budget, as discussed in (…).
The authors gratefully acknowledge the support of NSF Big Data Science & Engineering grant #1918692 and the computational resources provided by the Center for Research Computing at the University of Notre Dame. DES also acknowledges support from NSF CAREER grant #1942662.
Consider the problem of estimating (in a Bayesian sense) the parameters …
In the context of variational inference, we seek to determine an optimal approximation of the posterior within a chosen family of distributions, typically by maximizing a lower bound on the evidence (the ELBO).
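As a minimal illustration of the variational objective (a sketch with a one-dimensional Gaussian variational family; the function names are ours, not LINFA's), the evidence lower bound (ELBO) can be estimated by Monte Carlo using the reparameterization trick:

```python
import numpy as np

# Minimal sketch of the ELBO, E_q[log p(x, theta) - log q(theta)],
# for a Gaussian variational family q with mean mu and standard
# deviation sigma, estimated by Monte Carlo with the
# reparameterization theta = mu + sigma * eps, eps ~ N(0, 1).
def elbo_estimate(log_joint, mu, sigma, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    theta = mu + sigma * eps
    # log density of q evaluated at the sampled theta
    log_q = -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * eps**2
    return np.mean(log_joint(theta) - log_q)
```

For a standard normal target, the estimate equals zero exactly at mu = 0, sigma = 1 (where q matches the target), and decreases as the variational parameters move away from it.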
For computational convenience, normalizing flow transformations are selected to be easily invertible, with a Jacobian determinant that can be computed at a cost growing linearly with the problem dimensionality. Approaches in the literature include RealNVP (…).
LINFA implements two widely used normalizing flow formulations: MAF (…) and RealNVP (…).
RealNVP is another widely used flow where, at each layer, the first … components are left unchanged, while the remaining components are transformed through a scale and a shift conditioned on the first set.
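To make this structure concrete, here is a hedged single-coupling-layer sketch (plain NumPy, with linear maps standing in for the scale and shift networks; this is not LINFA's implementation):

```python
import numpy as np

# Illustrative sketch of one RealNVP coupling layer. The first d
# components pass through unchanged; the remaining components are
# scaled and shifted by functions of the first d. The "networks" s and
# t are modeled as simple linear maps for clarity.
rng = np.random.default_rng(0)
d, D = 2, 4                              # split point and total dimension
Ws = rng.normal(size=(D - d, d)) * 0.1   # hypothetical scale-network weights
Wt = rng.normal(size=(D - d, d)) * 0.1   # hypothetical shift-network weights

def coupling_forward(z):
    z1, z2 = z[:d], z[d:]
    s, t = Ws @ z1, Wt @ z1
    x2 = z2 * np.exp(s) + t              # affine map of the second block
    return np.concatenate([z1, x2]), s.sum()   # output and log|det J|

def coupling_inverse(x):
    x1, x2 = x[:d], x[d:]
    s, t = Ws @ x1, Wt @ x1
    z2 = (x2 - t) * np.exp(-s)           # exact inverse, no iteration needed
    return np.concatenate([x1, z2])
```

The Jacobian of this map is triangular, so its log-determinant is just the sum of the scale outputs, and the inverse is available in closed form, which is what makes the formulation computationally attractive.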
LINFA is designed to accommodate black-box models …
This requires the evaluation of the gradient of the ELBO …
Our solution is to replace the model with a computationally inexpensive surrogate.
To resolve these issues, LINFA implements NoFAS, which updates the surrogate model adaptively by smartly weighting the samples of …
Annealing is a technique to parametrically smooth a target density to improve sampling efficiency and accuracy during inference. In the discrete case, this is achieved by incrementing an inverse temperature …
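The effect of tempering can be illustrated with a short sketch (the density and function names are hypothetical, for illustration only): raising an unnormalized density to a power t in (0, 1] flattens it, shrinking the gap between modes and valleys.

```python
import numpy as np

# Sketch of discrete annealing: an unnormalized target density is
# tempered by an inverse temperature t in (0, 1], p_t ∝ p^t, so small
# t flattens the density and t = 1 recovers the target.
def log_annealed(log_target, theta, t):
    return t * log_target(theta)

def log_bimodal(theta):
    # unnormalized mixture of two Gaussians centered at -2 and +2
    return np.logaddexp(-0.5 * (theta - 2.0) ** 2,
                        -0.5 * (theta + 2.0) ** 2)
```

At small t the log-density gap between the modes at ±2 and the valley at 0 is scaled down by t, making it easier for a sampler or a variational approximation to traverse between modes before the temperature is raised back toward 1.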
A linear annealing scheduler with fixed temperature increments is often used in practice (see, e.g., Rezende & Mohamed (…)).
The AdaAnn scheduler instead determines the increment … adaptively, based on the current posterior approximation.
The denominator is large when the support of the annealed distribution … is wide, leading to smaller temperature increments.
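A simplified sketch of this adaptive step selection (our notation, not the library's API; tau plays the role of the KL tolerance) could look like:

```python
import numpy as np

# Hedged sketch of an AdaAnn-style adaptive temperature increment.
# When the annealed distribution has wide support, the sampled
# log-posterior values vary widely, the Monte Carlo denominator is
# large, and a small step is taken; as the distribution concentrates,
# the variability shrinks and larger steps become possible.
def adaann_increment(log_post_samples, tau=0.01, max_step=0.05):
    denom = np.std(log_post_samples)      # Monte Carlo estimate of the denominator
    return min(tau / max(denom, 1e-12), max_step)
```

Feeding in log-posterior values sampled under a flat (wide-support) annealed distribution thus yields a smaller increment than values sampled under a concentrated one, reproducing the behavior described above.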
A model … is considered, where …
Results in terms of loss profile, variational approximation, and posterior predictive distribution are shown in …
Results from the simple two-dimensional map. Loss profile (left), posterior samples (center), and posterior predictive distribution (right).
We consider a map …
Results from the high-dimensional example. The top row contains the loss profile (left) and samples from the posterior predictive distribution plus the available observations (right). Samples from the posterior distribution are shown in the bottom row.
The two-element Windkessel model (often referred to as the RC model) is described by …, where …
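For concreteness, a minimal forward-Euler integration of the standard two-element Windkessel equation, C dP/dt = Q(t) − P/R, is sketched below (the parameter values and the inflow waveform are hypothetical, chosen only for illustration, and are not the settings of the LINFA benchmark):

```python
import numpy as np

# Minimal forward-Euler integration of the standard two-element (RC)
# Windkessel model, C dP/dt = Q(t) - P/R. Hypothetical parameters in
# rough physiological units (R in mmHg*s/mL, C in mL/mmHg, Q in mL/s).
def simulate_rc(R=1.0, C=1.5, P0=80.0, T=5.0, n=5000):
    dt = T / n
    P = np.empty(n + 1)
    P[0] = P0
    for k in range(n):
        t = k * dt
        Q = 100.0 * (1.0 + np.sin(2.0 * np.pi * t))  # toy periodic inflow
        P[k + 1] = P[k] + dt * (Q - P[k] / R) / C    # forward Euler step
    return P

# Summary outputs (max/min/mean pressure) of the kind typically used
# as observables for inference.
def pressure_summary(P):
    return P.max(), P.min(), P.mean()
```

Scalar summaries of the pressure trace such as these can then serve as the model outputs against which observations are compared during inference.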
Results from the RC model. Loss profile (left), posterior samples for R and C (center), and the posterior predictive distribution for … (right).
The three-parameter Windkessel or RCR model …
The output consists of the maximum, minimum, and average values of the proximal pressure …
This example also demonstrates how NoFAS can be combined with annealing for improved convergence. The results are shown in …
Results from the RCR model. The top row contains the loss profile (left) and samples from the posterior predictive distribution plus the available observations (right). Samples from the posterior distribution are shown in the bottom row.
We consider a modified version of the Friedman 1 dataset (…).
Posterior mean and standard deviation for the positive mode in the modified Friedman test case.
| Post. Mean | Post. SD |
|---|---|
| 10.0285 | 0.1000 |
| 4.2187 | 0.1719 |
| 0.4854 | 0.0004 |
| 10.0987 | 0.0491 |
| 5.0182 | 0.1142 |
| 0.1113 | 0.0785 |
| 0.0707 | 0.0043 |
| -0.1315 | 0.1008 |
| 0.0976 | 0.0387 |
| 0.1192 | 0.0463 |
Loss profile (left) and posterior marginal statistics (right) for the positive mode in the modified Friedman test case.
This section contains the list of all hyperparameters in the library, their default values, and a description of the functionalities they control. General hyperparameters are listed in …
Output parameters

| Type | Description |
|---|---|
| string | Name of the output folder where result files are saved. |
| string | Name of the log file, which stores the iteration number, annealing temperature, and value of the loss function at each iteration. |
| int | Seed for the random number generator. |
Surrogate model parameters (NoFAS)

| Type | Description |
|---|---|
| int | Batch size used when saving results to disk (i.e., once every … iterations). |
| int | Number of NF iterations between successive updates of the surrogate model (default …). |
| int | Maximum allowable number of true model evaluations. |
| int | Number of pre-training iterations for the surrogate model (default …). |
| int | Number of iterations for the surrogate model update (default …). |
| string | Folder where the surrogate model is stored (default …). |
| bool | Start by pre-training a new surrogate and ignore existing surrogates (default …). |
| int | Save interval for the surrogate model (…). |
Device parameters

| Type | Description |
|---|---|
| bool | Do not use GPU acceleration. |
Optimizer and learning rate parameters

| Type | Description |
|---|---|
| string | Type of SGD optimizer (default …). |
| float | Learning rate (default …). |
| float | Learning rate decay (default …). |
| string | Type of learning rate scheduler (…). |
| int | Number of steps before learning rate reduction for the step scheduler. |
| int | Number of iterations between successive loss printouts (default …). |
General parameters

| Type | Description |
|---|---|
| str | Name of the experiment. |
| str | Type of normalizing flow (…). |
| int | Number of normalizing flow layers (default …). |
| int | Number of neurons in MADE and RealNVP hidden layers (default …). |
| int | Number of hidden layers in MADE (default 1). |
| str | Activation function for the MADE network used by MAF (default …). |
| str | Input order for MADE mask creation (…). |
| bool | Adds a batch normalization layer after each MAF or RealNVP layer (default …). |
| int | How often to save results from the normalizing flow iterations. Saved results include posterior samples, loss profile, samples from the posterior predictive distribution, observations, and marginal statistics. |
| int | Input dimensionality (default …). |
| int | Number of samples from the base distribution generated at each iteration (default …). |
| int | Number of additional true model evaluations at each surrogate model update (default …). |
| int | Total number of NF iterations (default …). |
Parameters for the adaptive annealing scheduler (AdaAnn)

| Type | Description |
|---|---|
| bool | Flag to activate the annealing scheduler. |
| string | Type of annealing scheduler (…). |
| float | KL tolerance. It is kept constant during inference and used in the numerator of equation …. |
| float | Initial inverse temperature. |
| int | Number of batch samples during annealing. |
| int | Number of batch samples at …. |
| int | Number of initial parameter updates at …. |
| int | Number of parameter updates after each temperature update. During these updates the temperature is kept fixed. |
| int | Number of parameter updates at …. |
| int | Number of Monte Carlo samples used to evaluate the denominator in equation …. |