diff --git a/joss.06309/10.21105.joss.06309.crossref.xml b/joss.06309/10.21105.joss.06309.crossref.xml new file mode 100644 index 0000000000..a6c1726449 --- /dev/null +++ b/joss.06309/10.21105.joss.06309.crossref.xml @@ -0,0 +1,517 @@ + + + + 20240405T201059-ef8e7f823fcb9feb7f7ef7b485702f34cf3b59c9 + 20240405201059 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 04 + 2024 + + + 9 + + 96 + + + + LINFA: a Python library for variational inference with +normalizing flow and annealing + + + + Yu + Wang + + + Emma R. + Cobian + + + Jubilee + Lee + + + Fang + Liu + + + Jonathan D. + Hauenstein + + + Daniele E. + Schiavazzi + + + + 04 + 05 + 2024 + + + 6309 + + + 10.21105/joss.06309 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.10883597 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/6309 + + + + 10.21105/joss.06309 + https://joss.theoj.org/papers/10.21105/joss.06309 + + + https://joss.theoj.org/papers/10.21105/joss.06309.pdf + + + + + + Stochastic relaxation, Gibbs distributions, +and the Bayesian restoration of images + Geman + IEEE Transactions on pattern analysis and +machine intelligence + 6 + 10.1109/TPAMI.1984.4767596 + 1984 + Geman, S., & Geman, D. (1984). +Stochastic relaxation, Gibbs distributions, and the Bayesian restoration +of images. IEEE Transactions on Pattern Analysis and Machine +Intelligence, 6, 721–741. +https://doi.org/10.1109/TPAMI.1984.4767596 + + + Equation of state calculations by fast +computing machines + Metropolis + The journal of chemical +physics + 6 + 21 + 10.1063/1.1699114 + 1953 + Metropolis, N., Rosenbluth, A. W., +Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of +state calculations by fast computing machines. The Journal of Chemical +Physics, 21(6), 1087–1092. +https://doi.org/10.1063/1.1699114 + + + Monte Carlo sampling methods using Markov +chains and their applications + Hastings + 10.1093/biomet/57.1.97 + 1970 + Hastings, W. K. (1970). Monte Carlo +sampling methods using Markov chains and their applications. +https://doi.org/10.1093/biomet/57.1.97 + + + Sampling-based approaches to calculating +marginal densities + Gelfand + Journal of the American statistical +association + 410 + 85 + 10.1080/01621459.1990.10476213 + 1990 + Gelfand, A. E., & Smith, A. F. +(1990). Sampling-based approaches to calculating marginal densities. +Journal of the American Statistical Association, 85(410), 398–409. +https://doi.org/10.1080/01621459.1990.10476213 + + + Graphical models, exponential families, and +variational inference + Wainwright + Foundations and Trends in Machine +Learning + 1–2 + 1 + 10.1561/2200000001 + 2008 + Wainwright, M. J., Jordan, M. I., +& others. (2008). Graphical models, exponential families, and +variational inference. Foundations and Trends in Machine Learning, +1(1–2), 1–305. +https://doi.org/10.1561/2200000001 + + + Optimal transport: Old and new + Villani + 338 + 10.1007/978-3-540-71050-9 + 2009 + Villani, C., & others. (2009). +Optimal transport: Old and new (Vol. 338). Springer. +https://doi.org/10.1007/978-3-540-71050-9 + + + Normalizing flows: An introduction and review +of current methods + Kobyzev + IEEE transactions on pattern analysis and +machine intelligence + 11 + 43 + 10.1109/TPAMI.2020.2992934 + 2020 + Kobyzev, I., Prince, S. 
J., & +Brubaker, M. A. (2020). Normalizing flows: An introduction and review of +current methods. IEEE Transactions on Pattern Analysis and Machine +Intelligence, 43(11), 3964–3979. +https://doi.org/10.1109/TPAMI.2020.2992934 + + + Normalizing flows for probabilistic modeling +and inference + Papamakarios + The Journal of Machine Learning +Research + 1 + 22 + 2021 + Papamakarios, G., Nalisnick, E., +Rezende, D. J., Mohamed, S., & Lakshminarayanan, B. (2021). +Normalizing flows for probabilistic modeling and inference. The Journal +of Machine Learning Research, 22(1), 2617–2680. + + + Variational inference with normalizing +flows + Rezende + International conference on machine +learning + 2015 + Rezende, D., & Mohamed, S. +(2015). Variational inference with normalizing flows. International +Conference on Machine Learning, 1530–1538. + + + Variational inference with NoFAS: Normalizing +flow with adaptive surrogate for computationally expensive +models + Wang + Journal of Computational +Physics + 467 + 10.1016/j.jcp.2022.111454 + 2022 + Wang, Y., Liu, F., & Schiavazzi, +D. E. (2022). Variational inference with NoFAS: Normalizing flow with +adaptive surrogate for computationally expensive models. Journal of +Computational Physics, 467, 111454. +https://doi.org/10.1016/j.jcp.2022.111454 + + + AdaAnn: Adaptive annealing scheduler for +probability density approximation + Cobian + International Journal for Uncertainty +Quantification + 13 + 10.1615/Int.J.UncertaintyQuantification.2022043110 + 2023 + Cobian, E. R., Hauenstein, J. D., +Liu, F., & Schiavazzi, D. E. (2023). AdaAnn: Adaptive annealing +scheduler for probability density approximation. International Journal +for Uncertainty Quantification, 13. +https://doi.org/10.1615/Int.J.UncertaintyQuantification.2022043110 + + + Density estimation using real +NVP + Dinh + arXiv preprint +arXiv:1605.08803 + 2016 + Dinh, L., Sohl-Dickstein, J., & +Bengio, S. (2016). Density estimation using real NVP. arXiv Preprint +arXiv:1605.08803. + + + Glow: Generative flow with invertible 1x1 +convolutions + Kingma + Advances in neural information processing +systems + 31 + 2018 + Kingma, D. P., & Dhariwal, P. +(2018). Glow: Generative flow with invertible 1x1 convolutions. Advances +in Neural Information Processing Systems, 31. + + + Masked autoregressive flow for density +estimation + Papamakarios + Advances in neural information processing +systems + 30 + 2017 + Papamakarios, G., Pavlakou, T., & +Murray, I. (2017). Masked autoregressive flow for density estimation. In +I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. +Vishwanathan, & R. Garnett (Eds.), Advances in neural information +processing systems (Vol. 30). Curran Associates, Inc. +https://proceedings.neurips.cc/paper_files/paper/2017/file/6c1da886822c67822bcf3679d04369fa-Paper.pdf + + + Improved variational inference with inverse +autoregressive flow + Kingma + Advances in neural information processing +systems + 29 + 2016 + Kingma, D. P., Salimans, T., +Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). +Improved variational inference with inverse autoregressive flow. +Advances in Neural Information Processing Systems, 29, +4743–4751. + + + MADE: Masked autoencoder for distribution +estimation + Germain + International conference on machine +learning + 2015 + Germain, M., Gregor, K., Murray, I., +& Larochelle, H. (2015). MADE: Masked autoencoder for distribution +estimation. International Conference on Machine Learning, +881–889. 
+ + + Batch normalization: Accelerating deep +network training by reducing internal covariate shift + Ioffe + International conference on machine +learning + 2015 + Ioffe, S., & Szegedy, C. (2015). +Batch normalization: Accelerating deep network training by reducing +internal covariate shift. International Conference on Machine Learning, +448–456. + + + Differentially private normalizing flows for +density estimation, data synthesis, and variational inference with +application to electronic health records + Su + arXiv preprint +arXiv:2302.05787 + 2023 + Su, B., Wang, Y., Schiavazzi, D. E., +& Liu, F. (2023). Differentially private normalizing flows for +density estimation, data synthesis, and variational inference with +application to electronic health records. arXiv Preprint +arXiv:2302.05787. + + + Multivariate adaptive regression +splines + Friedman + The annals of statistics + 1 + 19 + 10.1214/aos/1176347963 + 1991 + Friedman, J. H. (1991). Multivariate +adaptive regression splines. The Annals of Statistics, 19(1), 1–67. +https://doi.org/10.1214/aos/1176347963 + + + Tgp: An R package for Bayesian nonstationary, +semiparametric nonlinear regression and design by treed Gaussian process +models + Gramacy + Journal of Statistical +Software + 19 + 10.18637/jss.v019.i09 + 2007 + Gramacy, R. B. (2007). Tgp: An R +package for Bayesian nonstationary, semiparametric nonlinear regression +and design by treed Gaussian process models. Journal of Statistical +Software, 19, 1–46. +https://doi.org/10.18637/jss.v019.i09 + + + Theorems and examples on high dimensional +model representation + Sobol’ + Reliability Engineering and System +Safety + 2 + 79 + 10.1016/S0951-8320(02)00229-6 + 2003 + Sobol’, I. M. (2003). Theorems and +examples on high dimensional model representation. Reliability +Engineering and System Safety, 79(2), 187–193. +https://doi.org/10.1016/S0951-8320(02)00229-6 + + + Greedy inference with structure-exploiting +lazy maps + Brennan + Advances in Neural Information Processing +Systems + 33 + 2020 + Brennan, M., Bigoni, D., Zahm, O., +Spantini, A., & Marzouk, Y. (2020). Greedy inference with +structure-exploiting lazy maps. Advances in Neural Information +Processing Systems, 33, 8330–8342. + + + Preconditioned training of normalizing flows +for variational inference in inverse problems + Siahkoohi + Third symposium on advances in approximate +bayesian inference + 2021 + Siahkoohi, A., Rizzuti, G., +Louboutin, M., Witte, P., & Herrmann, F. (2021). Preconditioned +training of normalizing flows for variational inference in inverse +problems. Third Symposium on Advances in Approximate Bayesian Inference. +https://openreview.net/forum?id=P9m1sMaNQ8T + + + Bayesian inference with optimal +maps + El Moselhy + Journal of Computational +Physics + 23 + 231 + 10.1016/j.jcp.2012.07.022 + 2012 + El Moselhy, T. A., & Marzouk, Y. +M. (2012). Bayesian inference with optimal maps. Journal of +Computational Physics, 231(23), 7815–7850. +https://doi.org/10.1016/j.jcp.2012.07.022 + + + On the distribution of points in a cube and +the approximate evaluation of integrals + Sobol’ + Zhurnal Vychislitel’noi Matematiki i +Matematicheskoi Fiziki + 4 + 7 + 10.1016/0041-5553(67)90144-9 + 1967 + Sobol’, I. M. (1967). On the +distribution of points in a cube and the approximate evaluation of +integrals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, +7(4), 784–802. 
+https://doi.org/10.1016/0041-5553(67)90144-9 + + + PyMC: A modern, and comprehensive +probabilistic programming framework in Python + Abril-Pla + PeerJ Computer Science + 9 + 10.7717/peerj-cs.1516 + 2023 + Abril-Pla, O., Andreani, V., Carroll, +C., Dong, L., Fonnesbeck, C. J., Kochurov, M., Kumar, R., Lao, J., +Luhmann, C. C., Martin, O. A., & others. (2023). PyMC: A modern, and +comprehensive probabilistic programming framework in Python. PeerJ +Computer Science, 9, e1516. +https://doi.org/10.7717/peerj-cs.1516 + + + Bayespy: Variational Bayesian inference in +Python + Luttinen + The Journal of Machine Learning +Research + 1 + 17 + 2016 + Luttinen, J. (2016). Bayespy: +Variational Bayesian inference in Python. The Journal of Machine +Learning Research, 17(1), 1419–1424. + + + Pyro: Deep universal probabilistic +programming + Bingham + Journal of machine learning +research + 28 + 20 + 2019 + Bingham, E., Chen, J. P., Jankowiak, +M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., +Horsfall, P., & Goodman, N. D. (2019). Pyro: Deep universal +probabilistic programming. Journal of Machine Learning Research, 20(28), +1–6. + + + PyVBMC: Efficient Bayesian inference in +Python + Huggins + Journal of Open Source +Software + 86 + 8 + 10.21105/joss.05428 + 2023 + Huggins, B., Li, C., Tobaben, M., +Aarnos, M. J., & Acerbi, L. (2023). PyVBMC: Efficient Bayesian +inference in Python. Journal of Open Source Software, 8(86), 5428. +https://doi.org/10.21105/joss.05428 + + + + + + diff --git a/joss.06309/10.21105.joss.06309.jats b/joss.06309/10.21105.joss.06309.jats new file mode 100644 index 0000000000..4f67dbcfbd --- /dev/null +++ b/joss.06309/10.21105.joss.06309.jats @@ -0,0 +1,2367 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6309 +10.21105/joss.06309 + +LINFA: a Python library for variational inference with +normalizing flow and annealing + + + + +Wang +Yu + +ywang50@nd.edu + + + + +Cobian +Emma R. + +ecobian@nd.edu + + + + +Lee +Jubilee + +jlee222@nd.edu + + + + +Liu +Fang + +fang.liu.131@nd.edu + + + + +Hauenstein +Jonathan D. + +hauenstein@nd.edu + + + + +Schiavazzi +Daniele E. + +dschiavazzi@nd.edu + +* + + + +Department of Applied and Computational Mathematics and +Statistics, University of Notre Dame, Notre Dame, IN 46556, +USA. + + + + +* E-mail: dschiavazzi@nd.edu + + +9 +11 +2023 + +9 +96 +6309 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +Python +variational inference +normalizing flow +adaptive posterior annealing +physics-based models + + + + + + Summary +

Variational inference is an increasingly popular method in + statistics and machine learning for approximating probability + distributions. We developed LINFA (Library for Inference with + Normalizing Flow and Annealing), a Python library for variational + inference to accommodate computationally expensive models and + difficult-to-sample distributions with dependent parameters. We + discuss the theoretical background, capabilities, and performance of + LINFA in various benchmarks. LINFA is publicly available on GitHub at + https://github.com/desResLab/LINFA.

+
+ + Statement of need +

Generating samples from a posterior distribution is a fundamental task in Bayesian inference. The development of sampling-based algorithms from the Markov chain Monte Carlo family (Gelfand & Smith, 1990; Geman & Geman, 1984; Hastings, 1970; Metropolis et al., 1953) has made solving Bayesian inverse problems accessible to a wide audience of both researchers and practitioners. However, the number of samples required by these approaches is typically significant, and the convergence of Markov chains to their stationary distribution can be slow, especially in high dimensions. Additionally, satisfactory convergence may not always be easy to quantify, even though a number of metrics have been proposed in the literature over the years. More recent paradigms have been proposed in the context of variational inference (Wainwright et al., 2008), where an optimization problem is formulated to determine the optimal member of a parametric family of distributions approximating a target posterior density. In addition, flexible approaches that parametrize variational distributions through a composition of transformations (closely related to the concept of transport maps, see, e.g., Villani & others (2009)) have gained popularity under the name of normalizing flows (Dinh et al., 2016; Kingma et al., 2016; Kobyzev et al., 2020; Papamakarios et al., 2021; Rezende & Mohamed, 2015). The combination of variational inference and normalizing flows has received significant recent interest in the context of general algorithms for solving inverse problems (El Moselhy & Marzouk, 2012; Rezende & Mohamed, 2015).

+

However, cases where the computational cost of evaluating the underlying probability distribution is significant occur quite often in engineering and applied sciences, for example when such an evaluation requires the solution of an ordinary or partial differential equation. In these cases, inference can easily become intractable. Additionally, strong and nonlinear dependence between model parameters may result in difficult-to-sample posterior distributions characterized by features at multiple scales or by multiple modes. The LINFA library is specifically designed for problems whose model evaluations are computationally expensive, where the construction of an adaptively trained surrogate model is key to reducing the computational cost of inference (Wang et al., 2022). In addition, LINFA provides an adaptive annealing scheduler, where temperature increments are automatically determined based on the available variational approximant of the posterior distribution. Adaptive annealing thus makes it easier to sample from complicated densities (Cobian et al., 2023).

+
+ + Capabilities +

LINFA is designed as a general inference engine and allows the user + to define custom input transformations, computational models, + surrogates, and likelihood functions.

+ + +

User-defined input parameter transformations - Input transformations may reduce the complexity of inference and surrogate model construction in situations where the ranges of the input variables differ substantially or when the input parameters are bounded. A number of pre-defined univariate transformations are provided, i.e., identity, tanh, linear, and exp. These transformations are independently defined for each input variable, using four parameters $(a,b,c,d)$, providing a nonlinear transformation between the normalized interval $[a,b]$ and the physical interval $[c,d]$. Additional transformations can be defined by implementing the following member functions; a minimal sketch is shown after this list.

+ + +

forward - It evaluates the + transformation from the normalized to the physical space. One + transformation needs to be defined for each input dimension. + For example, the list of lists

+

+ trsf_info = [['tanh',-7.0,7.0,100.0,1500.0], + ['tanh',-7.0,7.0,100.0,1500.0], + ['exp',-7.0,7.0,1.0e-5,1.0e-2]] +

+

defines a hyperbolic tangent transformation for the first + two variables and an exponential transformation for the + third.

+
+ +

compute_log_jacob_func - It evaluates the log Jacobian of the transformation, which must be included in the computation of the log posterior density to account for the change in volume induced by the transformation.

+
+
+
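As a concrete illustration, the following is a minimal sketch of a user-defined transformation implementing the two member functions above for a tanh-type map from $[a,b]$ to $[c,d]$. The class name and constructor are hypothetical; only the method names come from the list above.

```python
import torch

class TanhTransformation:
    """Hypothetical user-defined map from the normalized interval [a, b]
    to the physical interval [c, d] through a hyperbolic tangent."""

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    def forward(self, z):
        # Rescale z from [a, b] to [-1, 1], apply tanh, then map to [c, d].
        u = 2.0 * (z - self.a) / (self.b - self.a) - 1.0
        return 0.5 * (self.c + self.d) + 0.5 * (self.d - self.c) * torch.tanh(u)

    def compute_log_jacob_func(self, z):
        # Log absolute derivative of forward(z), to be added to the log
        # posterior density to account for the change in volume.
        u = 2.0 * (z - self.a) / (self.b - self.a) - 1.0
        dfdz = 0.5 * (self.d - self.c) * (1.0 - torch.tanh(u) ** 2) * 2.0 / (self.b - self.a)
        return torch.log(torch.abs(dfdz))
```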
+ +

User-defined computational models - LINFA can accommodate any type of model, from analytically defined posteriors whose gradient is computed through automatic differentiation, to legacy computational solvers for which the solution gradient is neither available nor easy to compute. New models are created by implementing the methods below; a sketch of a complete model follows this list.

+ + +

genDataFile - This is a + pre-processing function used to generate synthetic + observations. It computes the model output corresponding to + the default parameter values (usually defined as part of the + model) and adds noise with a user-specified distribution. + Observations will be stored in a file and are typically + assigned to model.data so they are + available for computing the log posterior.

+
+ +

solve_t - This function solves the model for multiple values of the physical input parameters, specified in matrix format (one sample per row and one column per input parameter dimension).

+
+
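To make the two methods above concrete, here is a minimal sketch of a user-defined model for the simple two-dimensional map used in the benchmarks below; the class name and constructor are illustrative, while solve_t and genDataFile follow the pattern just described.

```python
import numpy as np
import torch

class SimpleMapModel:
    """Sketch of a user model: f(z) = (z1^3/10 + exp(z2/3), z1^3/10 - exp(z2/3))."""

    def __init__(self):
        self.defParams = torch.tensor([[3.0, 5.0]])  # true parameters z*
        self.data = None                             # observations, filled below

    def solve_t(self, params):
        # params: (n, 2) tensor with one sample per row.
        z1, z2 = params[:, 0], params[:, 1]
        out1 = z1 ** 3 / 10.0 + torch.exp(z2 / 3.0)
        out2 = z1 ** 3 / 10.0 - torch.exp(z2 / 3.0)
        return torch.stack((out1, out2), dim=1)

    def genDataFile(self, num_obs=50, store_file='observations.txt'):
        # Solve at the default parameters and add proportional Gaussian noise.
        x_star = self.solve_t(self.defParams)
        obs = x_star + 0.05 * torch.abs(x_star) * torch.randn(num_obs, 2)
        np.savetxt(store_file, obs.numpy())
        self.data = obs
```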
+
+ +

User-defined surrogate models - For computational + models that are too expensive for online inference, LINFA provides + functionalities to create, train, and fine-tune a + surrogate model. The + Surrogate class implements the following + functionalities:

+ + +

A new surrogate model can be created using the + Surrogate constructor.

+
+ +

limits (i.e. upper and lower bounds) + are stored as a list of lists using the format

+

+ [[low_0, high_0], [low_1, high_1], ...]. +

+
+ +

A pre-grid is defined as an a priori + selected point cloud created inside the hyper-rectangle + defined by limits. The pre-grid can be + either of type 'tensor' (tensor product + grid) where the grid order (number of points in each + dimension) is defined through the argument + gridnum, or of type + 'sobol' (using low-discrepancy + quasi-random Sobol’ sequences, see Sobol’ + (1967)), + in which case the variable gridnum + defines the total number of samples.

+
+ +

Surrogate model Input/Output. The two functions + surrogate_save() and + surrogate_load() are provided to save a + snapshot of a given surrogate or to read it from a file.

+
+ +

The pre_train() function is provided + to perform an initial training of the surrogate model on the + pre-grid. In addition, the update() + function is also available to re-train the model once + additional training examples are available.

+
+ +

The forward() function evaluates the surrogate model at multiple input realizations. If a transformation is defined, the surrogate should always be specified in the normalized domain, with limits defined in terms of the normalized intervals (i.e., $[a,b]$). A usage sketch follows this list.

+
+
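The snippet below sketches a typical surrogate workflow built from the functions listed above. The constructor signature and import path are assumptions for illustration; only the member function names come from the list.

```python
from linfa.nofas import Surrogate  # import path assumed for illustration

# Hypothetical constructor call: a surrogate of model.solve_t over the
# normalized box [-7, 7]^2 with a 4 x 4 'tensor' pre-grid.
surr = Surrogate(model_name='simple_map',
                 model_solve=model.solve_t,
                 limits=[[-7.0, 7.0], [-7.0, 7.0]],
                 pre_grid_type='tensor',
                 gridnum=4)

surr.pre_train()               # initial training on the pre-grid
surr.surrogate_save()          # write a snapshot to disk
surr.surrogate_load()          # ...or restore a previously saved one
x_hat = surr.forward(z_batch)  # evaluate at a batch of normalized inputs
surr.update(z_new)             # fine-tune once new model solutions are available
```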
+
+ +

User-defined likelihood - A user-defined likelihood function can be defined by passing the parameters, the model, the surrogate, and a coordinate transformation using

+

+ log_density(x, model, surrogate, transformation), +

+

and then assigning it as a member function of the + experiment class using:

+

+ exp.model_logdensity = lambda x: log_density(x, model, surr, transf). +

+
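For instance, a Gaussian log-likelihood consistent with the noise model used in the numerical benchmarks below could be sketched as follows; the function body is illustrative, and the transformation and surrogate interfaces are assumed to follow the member functions described earlier.

```python
import torch

def log_density(x, model, surrogate, transform):
    # Log Jacobian of the input transformation (assumed to return one
    # value per sample), added to account for the change in volume.
    adjust = transform.compute_log_jacob_func(x)
    # Evaluate outputs through the surrogate rather than the true model.
    output = surrogate.forward(x)                    # (n_samples, n_out)
    data = model.data                                # (n_obs, n_out)
    std = 0.05 * torch.abs(data.mean(dim=0))         # noise scale (assumed)
    resid = data.unsqueeze(0) - output.unsqueeze(1)  # (n_samples, n_obs, n_out)
    log_lik = -0.5 * ((resid / std) ** 2
                      + torch.log(2.0 * torch.pi * std ** 2)).sum(dim=(1, 2))
    return log_lik + adjust

exp.model_logdensity = lambda x: log_density(x, model, surr, transf)
```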
+ +

Linear and adaptive annealing schedulers - LINFA provides two annealing schedulers by default. The first is the 'Linear' scheduler with constant increments. The second is the 'AdaAnn' adaptive scheduler (Cobian et al., 2023) with hyperparameters reported in [tab:adaann]. For the AdaAnn scheduler, the user can also specify a different number of parameter updates to be performed at the initial temperature $t_{0}$, at the final temperature $t_{1} = 1$, and at any intermediate temperature $t_{0} < t < 1$. Finally, the batch size (the number of samples used to evaluate the expectations in the loss function) can also be differentiated for $t = 1$ and $t < 1$. A typical configuration is sketched below.

+
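For reference, a typical AdaAnn configuration on an experiment object might look as follows; the attribute names follow the hyperparameter tables in the Appendix, while the values reproduce those used in the RCR example below.

```python
exp.annealing = True      # activate the annealing scheduler
exp.scheduler = 'AdaAnn'  # adaptive scheduler ('fixed' for linear annealing)
exp.tol       = 0.01      # KL tolerance tau
exp.t0        = 0.05      # initial inverse temperature
exp.N         = 100       # batch samples during annealing (t < 1)
exp.N_1       = 1000      # batch samples at t = 1
exp.T_0       = 500       # parameter updates at t0
exp.T         = 5         # parameter updates per temperature step
exp.T_1       = 5000      # parameter updates at t = 1
exp.M         = 1000      # MC samples for the AdaAnn step-size denominator
```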
+ +

User-defined hyperparameters - A complete list of + hyperparameters with a description of their functionality can be + found in the Appendix.

+
+
+ + Related software modules and packages for variational + inference +

Other Python modules and packages provide implementations of variational inference with a number of additional features. An incomplete list of these packages is reported below.

+ + +

PyMC + (Abril-Pla + et al., 2023).

+
+ +

BayesPy + (Luttinen, + 2016) (with an accompanying paper, + BayesPy: + Variational Bayesian Inference in Python).

+
+ +

Pyro + (Bingham + et al., 2019) (with + some + examples).

+
+ +

PyVBMC + (Huggins + et al., 2023) with accompanying + JOSS + article.

+
+ +

Online notebooks (see this example) implementing variational inference from scratch in PyTorch.

+
+
+

LINFA is based on normalizing flow transformations and can therefore infer nonlinear parameter dependence. It also provides the ability to adaptively train a surrogate model (NoFAS), which significantly reduces the computational cost of inference for the parameters of expensive computational models. Finally, LINFA provides an adaptive annealing algorithm (AdaAnn) which autonomously selects the appropriate annealing steps based on the current approximation of the posterior distribution.

+
+
+ + Numerical benchmarks +

We tested LINFA on multiple problems. These include inference on unimodal and multi-modal posterior distributions specified in closed form, models based on ordinary differential equations and dynamical systems with gradients directly computed through automatic differentiation in PyTorch, identifiable and non-identifiable physics-based models with fixed and adaptive surrogates, and high-dimensional statistical models. Some of the above tests are included with the library and systematically exercised through GitHub Actions. A detailed discussion of these test cases is provided in the Appendix. To run a test, type

+ python -m unittest linfa.linfa_test_suite.NAME_example +

where NAME is the name of the test case: trivial, highdim, rc, rcr, adaann, or rcr_nofas_adaann.

+
+ + Conclusion and Future Work +

In this paper, we have introduced the LINFA library for variational inference, briefly discussed the relevant background and its capabilities, and reported its performance on a number of test cases. Some interesting directions for future work are mentioned below.

+

Future versions will support user-defined privacy-preserving synthetic data generation and variational inference through differentially private gradient descent algorithms. This will allow the user to perform inference tasks while preserving a pre-defined privacy budget, as discussed in Su et al. (2023). LINFA will also be extended to handle multiple models, opening new possibilities for solving inverse problems that combine variational inference with multi-fidelity surrogates (see, e.g., Siahkoohi et al. (2021)). In addition, for inverse problems with significant dependence among the parameters, it is often possible to simplify the inference task by operating on manifolds of reduced dimensionality (Brennan et al., 2020). New modules for dimensionality reduction will be developed and integrated with the LINFA library. Finally, the ELBO loss typically used in variational inference has known limitations, some of which are related to its close connection with the KL divergence. Future versions of LINFA will provide the option to use alternative losses.

+
+ + Acknowledgements +

The authors gratefully acknowledge support from the NSF Big Data Science & Engineering grant #1918692 and the computational resources provided through the Center for Research Computing at the University of Notre Dame. DES also acknowledges support from NSF CAREER grant #1942662.

+
+ + Appendix + + Background theory + + Variational inference with normalizing flow +

Consider the problem of estimating (in a Bayesian sense) the parameters $\mathbf{z}\in\mathcal{Z}$ of a physics-based or statistical model $\mathbf{x} = \mathbf{f}(\mathbf{z}) + \boldsymbol{\varepsilon}$ from the observations $\mathbf{x}\in\mathcal{X}$ and a known statistical characterization of the error $\boldsymbol{\varepsilon}$. We tackle this problem with variational inference and normalizing flow. A normalizing flow (NF) is a nonlinear transformation $F:\mathbb{R}^{d}\times\boldsymbol{\Lambda}\to\mathbb{R}^{d}$ designed to map an easy-to-sample base distribution $q_{0}(\mathbf{z}_{0})$ into a close approximation $q_{K}(\mathbf{z}_{K})$ of a desired target posterior density $p(\mathbf{z}|\mathbf{x})$. This transformation can be determined by composing $K$ bijections, $\mathbf{z}_{K} = F(\mathbf{z}_{0}) = F_{K}\circ F_{K-1}\circ\cdots\circ F_{k}\circ\cdots\circ F_{1}(\mathbf{z}_{0})$, and evaluating the transformed density through the change of variable formula (see Villani & others (2009)).

+

In the context of variational inference, we seek to determine an optimal set of parameters $\boldsymbol{\lambda}\in\boldsymbol{\Lambda}$ so that $q_{K}(\mathbf{z}_{K})\approx p(\mathbf{z}|\mathbf{x})$. Given observations $\mathbf{x}\in\mathcal{X}$, a likelihood function $\ell_{\mathbf{z}}(\mathbf{x})$ (informed by the distribution of the error $\boldsymbol{\varepsilon}$) and prior $p(\mathbf{z})$, a NF-based approximation $q_{K}(\mathbf{z})$ of the posterior distribution $p(\mathbf{z}|\mathbf{x})$ can be computed by maximizing a lower bound to the log marginal likelihood $\log p(\mathbf{x})$ (the so-called evidence lower bound or ELBO), or, equivalently, by minimizing the free energy bound (see, e.g., Rezende & Mohamed (2015))

+

$$\mathcal{F}(\mathbf{x}) = \mathbb{E}_{q_{K}(\mathbf{z}_{K})}\left[\log q_{K}(\mathbf{z}_{K}) - \log p(\mathbf{x},\mathbf{z}_{K})\right] = \mathbb{E}_{q_{0}(\mathbf{z}_{0})}\left[\log q_{0}(\mathbf{z}_{0})\right] - \mathbb{E}_{q_{0}(\mathbf{z}_{0})}\left[\log p(\mathbf{x},\mathbf{z}_{K})\right] - \mathbb{E}_{q_{0}(\mathbf{z}_{0})}\left[\sum_{k=1}^{K}\log\left|\det\frac{\partial\mathbf{z}_{k}}{\partial\mathbf{z}_{k-1}}\right|\right].$$

+
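In practice, this bound is estimated by Monte Carlo over a batch drawn from the base distribution. A schematic PyTorch version is shown below, assuming a flow object that returns the transformed samples together with the accumulated log-Jacobian terms (the interface is illustrative, not LINFA's actual API):

```python
import math
import torch

def free_energy(flow, log_prob_joint, d, batch_size=100):
    # Draw z0 from a standard normal base distribution.
    z0 = torch.randn(batch_size, d)
    # Push samples through the flow, accumulating sum_k log|det dz_k/dz_{k-1}|.
    zK, sum_log_det = flow(z0)
    # log q0(z0) for the standard normal base density.
    log_q0 = -0.5 * (z0 ** 2).sum(dim=1) - 0.5 * d * math.log(2.0 * math.pi)
    # MC estimate of E[log q0(z0)] - E[log p(x, zK)] - E[sum_k log|det ...|].
    return (log_q0 - log_prob_joint(zK) - sum_log_det).mean()
```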

For computational convenience, normalizing flow transformations + are selected to be easily invertible and their Jacobian + determinant can be computed with a cost that grows linearly with + the problem dimensionality. Approaches in the literature include + RealNVP + (Dinh + et al., 2016), GLOW + (Kingma + & Dhariwal, 2018), and autoregressive transformations + such as MAF + (Papamakarios + et al., 2017) and IAF + (Kingma + et al., 2016). Detailed reviews on a wide range of flow + formulations can be found in Kobyzev et al. + (2020) + and Papamakarios et al. + (2021).

+
+ + MAF and RealNVP +

LINFA implements two widely used normalizing flow formulations, MAF (Papamakarios et al., 2017) and RealNVP (Dinh et al., 2016). MAF belongs to the class of autoregressive normalizing flows. Given the latent variable $\mathbf{z} = (z_{1},z_{2},\dots,z_{d})$, it assumes $p(z_{i}|z_{1},\dots,z_{i-1}) = \phi[(z_{i}-\mu_{i})/e^{\alpha_{i}}]$, where $\phi$ is the standard normal density, $\mu_{i} = f_{\mu_{i}}(z_{1},\dots,z_{i-1})$, $\alpha_{i} = f_{\alpha_{i}}(z_{1},\dots,z_{i-1})$ for $i = 1,2,\dots,d$, and $f_{\mu_{i}}$ and $f_{\alpha_{i}}$ are masked autoencoder neural networks (MADE, Germain et al. (2015)). In a MADE autoencoder the network connectivities are multiplied by Boolean masks so that the input-output relation maintains a lower triangular structure, making the computation of the Jacobian determinant particularly simple. MAF transformations are then composed of multiple MADE layers, possibly interleaved by batch normalization layers (Ioffe & Szegedy, 2015), typically used to add stability during training and increase network accuracy (Papamakarios et al., 2017).

+

RealNVP is another widely used flow where, at each layer, the first $d'$ variables are left unaltered while the remaining $d-d'$ are subject to an affine transformation of the form $\widehat{\mathbf{z}}_{d'+1:d} = \mathbf{z}_{d'+1:d}\odot e^{\boldsymbol{\alpha}} + \boldsymbol{\mu}$, where $\boldsymbol{\mu} = f_{\mu}(\mathbf{z}_{1:d'})$ and $\boldsymbol{\alpha} = f_{\alpha}(\mathbf{z}_{1:d'})$ are MADE autoencoders. In this context, MAF can be seen as a generalization of RealNVP obtained by setting $\mu_{i} = \alpha_{i} = 0$ for $i\leq d'$ (Papamakarios et al., 2017).

+
+ + Normalizing flow with adaptive surrogate (NoFAS) +

LINFA is designed to accommodate black-box models $\mathbf{f}:\mathcal{Z}\to\mathcal{X}$ between the random inputs $\mathbf{z} = (z_{1},z_{2},\dots,z_{d})^{T}\in\mathcal{Z}$ and the outputs $(x_{1},x_{2},\dots,x_{m})^{T}\in\mathcal{X}$, and assumes $n$ observations $\mathbf{x} = \{\mathbf{x}_{i}\}_{i=1}^{n}\subset\mathcal{X}$ to be available. Our goal is to infer $\mathbf{z}$ and to quantify its uncertainty given $\mathbf{x}$. We embrace a variational Bayesian paradigm and sample from the posterior distribution $p(\mathbf{z}|\mathbf{x})\propto\ell_{\mathbf{z}}(\mathbf{x},\mathbf{f})\,p(\mathbf{z})$, with prior $p(\mathbf{z})$, via normalizing flows.

+

This requires the evaluation of the gradient of the free energy bound above with respect to the NF parameters $\boldsymbol{\lambda}$, replacing $p(\mathbf{x},\mathbf{z}_{K})$ with $p(\mathbf{x}|\mathbf{z}_{K})\,p(\mathbf{z}) = \ell_{\mathbf{z}_{K}}(\mathbf{x},\mathbf{f})\,p(\mathbf{z})$, and approximating the expectations with their Monte Carlo (MC) estimates. However, the likelihood function needs to be evaluated at every MC realization, which can be costly if the model $\mathbf{f}(\mathbf{z})$ is computationally expensive. In addition, automatic differentiation through a legacy (e.g., physics-based) solver may be impractical or time-consuming, or may require the development of an adjoint solver.

+

Our solution is to replace the model $\mathbf{f}$ with a computationally inexpensive surrogate $\widehat{\mathbf{f}}:\mathcal{Z}\times\mathcal{W}\to\mathcal{X}$, parameterized by the weights $\mathbf{w}\in\mathcal{W}$, whose derivatives can be obtained at a relatively low computational cost. However, intrinsic bias in the selected surrogate formulation, a limited number of training examples, and locally optimal $\mathbf{w}$ can compromise the accuracy of $\widehat{\mathbf{f}}$.

+

To resolve these issues, LINFA implements NoFAS, which updates the surrogate model adaptively by re-weighting the samples of $\mathbf{z}$ generated by the NF through a memory-aware loss function, as sketched below. Once a newly updated surrogate is obtained, the likelihood function is updated, leading to a new posterior distribution that is approximated by VI-NF, producing, in turn, new samples for the next surrogate model update, and so on. Additional details can be found in Wang et al. (2022).

+
+ + Adaptive Annealing +

Annealing is a technique to parametrically smooth a target density to improve sampling efficiency and accuracy during inference. In the discrete case, this is achieved by incrementing an inverse temperature $t_{k}$ and setting $p_{k}(\mathbf{z},\mathbf{x}) = p^{t_{k}}(\mathbf{z},\mathbf{x})$, for $k = 0,\dots,K$, where $0 < t_{0} < \cdots < t_{K}\leq 1$. The result of exponentiation produces a smooth unimodal distribution for a sufficiently small $t_{0}$, recovering the target density as $t_{k}$ approaches 1. In other words, annealing provides a continuous deformation from an easy-to-approximate unimodal distribution to a desired target density.

+

A linear annealing scheduler with fixed temperature increments is often used in practice (see, e.g., Rezende & Mohamed (2015)), where $t_{j} = t_{0} + j(1-t_{0})/K$ for $j = 0,\dots,K$ with constant increments $\epsilon = (1-t_{0})/K$. Intuitively, small temperature changes are desirable to carefully explore the parameter space at the beginning of the annealing process, whereas larger changes can be taken as $t_{k}$ increases, after annealing has helped to capture important features of the target distribution (e.g., locating all the relevant modes).

+

The AdaAnn scheduler determines the increment $\epsilon_{k}$ that approximately produces a pre-defined change in the KL divergence between two distributions annealed at $t_{k}$ and $t_{k+1} = t_{k}+\epsilon_{k}$, respectively. Letting the KL divergence equal a constant $\tau^{2}/2$, where $\tau$ is referred to as the KL tolerance, the step size $\epsilon_{k}$ becomes

+

$$\epsilon_{k} = \tau\Big/\sqrt{\mathbb{V}_{p^{t_{k}}}\left[\log p(\mathbf{z},\mathbf{x})\right]}.$$

+

The denominator is large when the support of the annealed distribution $p^{t_{k}}(\mathbf{z},\mathbf{x})$ is wider than the support of the target $p(\mathbf{z},\mathbf{x})$, and progressively reduces with increasing $t_{k}$. Further detail on the derivation of the expression for $\epsilon_{k}$ can be found in Cobian et al. (2023).

+
+
+ + Numerical benchmarks + + Simple two-dimensional map with Gaussian likelihood +

A model $f:\mathbb{R}^{2}\to\mathbb{R}^{2}$ is chosen in this experiment, having the closed-form expression $f(\mathbf{z}) = f(z_{1},z_{2}) = (z_{1}^{3}/10 + \exp(z_{2}/3),\; z_{1}^{3}/10 - \exp(z_{2}/3))^{T}$. Observations $\mathbf{x}$ are generated as

+

$$\mathbf{x} = \mathbf{x}^{*} + 0.05\,|\mathbf{x}^{*}|\odot\mathbf{x}_{0},$$

+

where $\mathbf{x}_{0}\sim\mathcal{N}(0,\mathbf{I}_{2})$ and $\odot$ is the Hadamard product. We set the true model parameters at $\mathbf{z}^{*} = (3,5)^{T}$, with output $\mathbf{x}^{*} = f(\mathbf{z}^{*}) = (7.99,\,-2.59)^{T}$, and simulate 50 sets of observations from the distribution above. The likelihood of $\mathbf{z}$ given $\mathbf{x}$ is assumed Gaussian, and we adopt a noninformative uniform prior $p(\mathbf{z})$. We allocate a budget of $4\times 4 = 16$ model solutions to the pre-grid and use the rest to adaptively calibrate $\widehat{f}$ using $2$ samples every $1000$ normalizing flow iterations.

+

Results in terms of loss profile, variational approximation, + and posterior predictive distribution are shown in + [fig:trivial].

+

+ +

Results from the simple two-dimensional map. Loss + profile (left), posterior samples (center), and posterior + predictive distribution (right).

+
+
+ + High-dimensional example +

We consider a map $f:\mathbb{R}^{5}\to\mathbb{R}^{4}$ expressed as $f(\mathbf{z}) = \mathbf{A}\,\mathbf{g}(e^{\mathbf{z}})$, where $g_{i}(\mathbf{r}) = (2\,|2a_{i}-1| + r_{i})/(1+r_{i})$ with $r_{i}>0$ for $i = 1,\dots,5$ is the Sobol’ function (Sobol’, 2003) and $\mathbf{A}$ is a $4\times 5$ matrix. We also set $$\mathbf{a} = (0.084,\,0.229,\,0.913,\,0.152,\,0.826)^{T}\quad\text{and}\quad\mathbf{A} = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1\end{pmatrix}.$$ The true parameter vector is $\mathbf{z}^{*} = (2.75,\,-1.5,\,0.25,\,-2.5,\,1.75)^{T}$. While the Sobol’ function is bijective and analytic, $f$ is over-parameterized and non-identifiable. This is also confirmed by the fact that the curve segment $\gamma(t) = g^{-1}(g(\mathbf{z}^{*}) + \mathbf{v}\,t)\subset\mathcal{Z}$ gives the same model solution as $\mathbf{x}^{*} = f(\mathbf{z}^{*}) = f(\gamma(t))\approx(1.4910,\,1.6650,\,1.8715,\,1.7011)^{T}$ for $t\in(-0.0153,\,0.0686]$, where $\mathbf{v} = (1,\,-1,\,1,\,-1,\,1)^{T}$. This is consistent with the one-dimensional null space of the matrix $\mathbf{A}$. We also generate synthetic observations from the Gaussian distribution $\mathbf{x} = \mathbf{x}^{*} + 0.01\,|\mathbf{x}^{*}|\odot\mathbf{x}_{0}$ with $\mathbf{x}_{0}\sim\mathcal{N}(0,\mathbf{I}_{4})$; results are shown in [fig:highdim].

+

+

+ +

Results from the high-dimensional example. The top + row contains the loss profile (left) and samples from the + posterior predictive distribution plus the available + observations (right). Samples from the posterior distribution + are instead shown in the bottom row.

+
+
+ + Two-element Windkessel Model +

The two-element Windkessel model (often referred to as the RC model) is the simplest representation of the human systemic circulation and requires two parameters, i.e., a resistance $R\in[100,\,1500]$ Barye·s/ml and a capacitance $C\in[1\times 10^{-5},\,1\times 10^{-2}]$ ml/Barye. We provide a periodic time history of the aortic flow (see Wang et al. (2022) for additional details) and use the RC model to predict the time history of the proximal pressure $P_{p}(t)$, specifically its maximum, minimum, and average values over a typical heart cycle, while assuming the distal pressure $P_{d}(t)$ to be constant in time and equal to 55 mmHg. In our experiment, we set the true resistance and capacitance to $z^{*}_{K,1} = R^{*} = 1000$ Barye·s/ml and $z^{*}_{K,2} = C^{*} = 5\times 10^{-5}$ ml/Barye, and determine $P_{p}(t)$ from an RK4 numerical solution of the following algebraic-differential system

+

$$Q_{d} = \frac{P_{p}-P_{d}}{R},\qquad \frac{dP_{p}}{dt} = \frac{Q_{p}-Q_{d}}{C},$$

+

where $Q_{p}$ is the flow entering the RC system and $Q_{d}$ is the distal flow. Synthetic observations are generated by adding Gaussian noise to the true model solution $\mathbf{x}^{*} = (x_{1}^{*},x_{2}^{*},x_{3}^{*}) = (P_{p,\min},\,P_{p,\max},\,P_{p,\text{avg}}) = (78.28,\,101.12,\,85.75)$, i.e., $\mathbf{x}$ follows a multivariate Gaussian distribution with mean $\mathbf{x}^{*}$ and a diagonal covariance matrix with entries $0.05\,x_{i}^{*}$, where $i = 1,2,3$ corresponds to the minimum, maximum, and average pressures, respectively. The aim is to quantify the uncertainty in the RC model parameters given 50 repeated pressure measurements. We imposed a non-informative prior on $R$ and $C$. Results are shown in [fig:rc_res].

+

+ +

Results from the RC model. Loss profile (left), posterior samples for $R$ and $C$ (center), and posterior predictive distribution for $P_{p,\min}$ and $P_{p,\max}$ (right; $P_{p,\text{avg}}$ not shown).

+
+
Three-element Windkessel Circulatory Model (NoFAS + AdaAnn)

The three-parameter Windkessel or RCR model is characterized by proximal and distal resistance parameters $R_{p}, R_{d}\in[100,\,1500]$ Barye·s/ml and one capacitance parameter $C\in[1\times 10^{-5},\,1\times 10^{-2}]$ ml/Barye. This model is not identifiable. The average distal pressure is only affected by the total system resistance, i.e., the sum $R_{p}+R_{d}$, leading to a negative correlation between these two parameters. Thus, an increment in the proximal resistance is compensated by a reduction in the distal resistance (so the average distal pressure remains the same) which, in turn, reduces the friction encountered by the flow exiting the capacitor. An increase in the value of $C$ is then needed to restore the average, minimum, and maximum pressure. This leads to a positive correlation between $C$ and $R_{d}$.

+

The output consists of the maximum, minimum, and average values of the proximal pressure $P_{p}(t)$, i.e., $(P_{p,\min},\,P_{p,\max},\,P_{p,\text{avg}})$ over one heart cycle. The true parameters are $z^{*}_{K,1} = R_{p}^{*} = 1000$ Barye·s/ml, $z^{*}_{K,2} = R_{d}^{*} = 1000$ Barye·s/ml, and $z^{*}_{K,3} = C^{*} = 5\times 10^{-5}$ ml/Barye. The proximal pressure is computed from the solution of the algebraic-differential system $$Q_{p} = \frac{P_{p}-P_{c}}{R_{p}},\qquad Q_{d} = \frac{P_{c}-P_{d}}{R_{d}},\qquad \frac{dP_{c}}{dt} = \frac{Q_{p}-Q_{d}}{C},$$ where the distal pressure is set to $P_{d} = 55$ mmHg. Synthetic observations are generated from $\mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\boldsymbol{\mu} = (f_{1}(\mathbf{z}^{*}),f_{2}(\mathbf{z}^{*}),f_{3}(\mathbf{z}^{*}))^{T} = (P_{p,\min},\,P_{p,\max},\,P_{p,\text{avg}})^{T} = (100.96,\,148.02,\,116.50)^{T}$ and $\boldsymbol{\Sigma}$ is a diagonal matrix with entries $(5.05,\,7.40,\,5.83)^{T}$. The budgeted number of true model solutions is 216; the fixed surrogate model is evaluated on a $6\times 6\times 6 = 216$ pre-grid, while the adaptive surrogate starts from a pre-grid of size $4\times 4\times 4 = 64$, with the other 152 evaluations adaptively selected.

+

This example also demonstrates how NoFAS can be combined with annealing for improved convergence. The results in [fig:rcr_res] are generated using the AdaAnn adaptive annealing scheduler with initial inverse temperature $t_{0} = 0.05$, KL tolerance $\tau = 0.01$, and a batch size of 100 samples. The number of parameter updates is set to 500, 5000, and 5 for $t_{0}$, $t_{1}$, and $t_{0}<t<t_{1}$, respectively, and 1000 Monte Carlo realizations are used to evaluate the denominator in the expression for the step size $\epsilon_{k}$ above. The posterior samples capture well the nonlinear correlation among the parameters and generate a fairly accurate posterior predictive distribution that overlaps with the observations. Additional details can be found in Wang et al. (2022) and Cobian et al. (2023).

+

+

+ +

Results from the RCR model. The top row contains the + loss profile (left) and samples from the posterior predictive + distribution plus the available observations (right). Samples + from the posterior distribution are instead shown in the bottom + row.

+
+
+ + Friedman 1 model (AdaAnn) +

We consider a modified version of the Friedman 1 dataset (Friedman, 1991) to examine the performance of our adaptive annealing scheduler in a high-dimensional context. Following the original model in Friedman (1991), the data are generated as $$y_{i} = \mu_{i}(\boldsymbol{\beta}) + \epsilon_{i},\quad\text{where}\quad \mu_{i}(\boldsymbol{\beta}) = \beta_{1}\sin(\pi x_{i,1}x_{i,2}) + \beta_{2}(x_{i,3}-\beta_{3})^{2} + \sum_{j=4}^{10}\beta_{j}x_{i,j}$$ and $\epsilon_{i}\sim\mathcal{N}(0,1)$. We made a slight modification to this model, using $$\mu_{i}(\boldsymbol{\beta}) = \beta_{1}\sin(\pi x_{i,1}x_{i,2}) + \beta_{2}^{2}(x_{i,3}-\beta_{3})^{2} + \sum_{j=4}^{10}\beta_{j}x_{i,j},$$ and set the true parameter combination to $\boldsymbol{\beta} = (\beta_{1},\dots,\beta_{10}) = (10,\,\pm\sqrt{20},\,0.5,\,10,\,5,\,0,\,0,\,0,\,0,\,0)$. Note that both models contain linear, nonlinear, and interaction terms of the input variables $X_{1}$ to $X_{10}$, five of which ($X_{6}$ to $X_{10}$) are irrelevant to $Y$. Each $X$ is drawn independently from $\mathcal{U}(0,1)$. We used the R package tgp (Gramacy, 2007) to generate a Friedman 1 dataset with a sample size of $n = 1000$. We impose a non-informative uniform prior $p(\boldsymbol{\beta})$ and, unlike for the original model, we now expect a bimodal posterior distribution of $\boldsymbol{\beta}$, since $\beta_{2}$ enters the modified model only through its square. Results in terms of marginal statistics and their convergence for the mode with positive $z_{K,2}$ are illustrated in [table:Friedman_bimodal_stats] and [fig:adaann_res].

+ + + +

Posterior mean and standard deviation for positive mode + in the modified Friedman test case.

| True value                 | Post. mean (Mode 1) | Post. SD (Mode 1) |
|----------------------------|---------------------|-------------------|
| $\beta_{1} = 10$           | 10.0285             | 0.1000            |
| $\beta_{2} = \pm\sqrt{20}$ | 4.2187              | 0.1719            |
| $\beta_{3} = 0.5$          | 0.4854              | 0.0004            |
| $\beta_{4} = 10$           | 10.0987             | 0.0491            |
| $\beta_{5} = 5$            | 5.0182              | 0.1142            |
| $\beta_{6} = 0$            | 0.1113              | 0.0785            |
| $\beta_{7} = 0$            | 0.0707              | 0.0043            |
| $\beta_{8} = 0$            | -0.1315             | 0.1008            |
| $\beta_{9} = 0$            | 0.0976              | 0.0387            |
| $\beta_{10} = 0$           | 0.1192              | 0.0463            |
+
+
+

+ +

Loss profile (left) and posterior marginal + statistics (right) for positive mode in the modified Friedman + test case.

+
+
+ + Hyperparameters in LINFA +

This section lists all hyperparameters in the library, their default values, and a description of the functionalities they control. General hyperparameters are listed in [tab:par_general], those related to the optimization process in [tab:par_optimizers], and those related to the output folder and files in [tab:par_output]. Hyperparameters for the proposed NoFAS and AdaAnn approaches are listed in [tab:surr_optimizers] and [tab:adaann], respectively. Finally, a hyperparameter used to select the hardware device is described in [tab:par_device]. The sketch below shows how these options are typically set.

+ + + +

Output parameters

| Option     | Type   | Description |
|------------|--------|-------------|
| output_dir | string | Name of the output folder where result files are saved. |
| log_file   | string | Name of the log file, which stores the iteration number, annealing temperature, and value of the loss function at each iteration. |
| seed       | int    | Seed for the random number generator. |
+
+
+ + + +

Surrogate model parameters (NoFAS)

| Option              | Type   | Description |
|---------------------|--------|-------------|
| n_sample            | int    | Batch size used when saving results to disk (i.e., once every save_interval iterations). |
| calibrate_interval  | int    | Number of NF iterations between successive updates of the surrogate model (default 1000). |
| budget              | int    | Maximum allowable number of true model evaluations. |
| surr_pre_it         | int    | Number of pre-training iterations for the surrogate model (default 40000). |
| surr_upd_it         | int    | Number of iterations for each surrogate model update (default 6000). |
| surr_folder         | string | Folder where the surrogate model is stored (default './'). |
| use_new_surr        | bool   | Start by pre-training a new surrogate, ignoring existing surrogates (default True). |
| store_surr_interval | int    | Save interval for the surrogate model (None for no saving; default None). |
+
+
+ + + +

Device parameters

| Option  | Type | Description |
|---------|------|-------------|
| no_cuda | bool | Do not use GPU acceleration. |
+
+
+ + + +

Optimizer and learning rate parameters

| Option       | Type   | Description |
|--------------|--------|-------------|
| optimizer    | string | Type of SGD optimizer (default 'Adam'). |
| lr           | float  | Learning rate (default 0.003). |
| lr_decay     | float  | Learning rate decay (default 0.9999). |
| lr_scheduler | string | Type of learning rate scheduler ('StepLR' or 'ExponentialLR'). |
| lr_step      | int    | Number of steps before learning rate reduction for the step scheduler. |
| log_interval | int    | Number of iterations between successive loss printouts (default 10). |
+
+
+ + + +

General parameters

| Option           | Type | Description |
|------------------|------|-------------|
| name             | str  | Name of the experiment. |
| flow_type        | str  | Type of normalizing flow ('maf' or 'realnvp'). |
| n_blocks         | int  | Number of normalizing flow layers (default 5). |
| hidden_size      | int  | Number of neurons in MADE and RealNVP hidden layers (default 100). |
| n_hidden         | int  | Number of hidden layers in MADE (default 1). |
| activation_fn    | str  | Activation function for the MADE network used by MAF (default 'relu'). |
| input_order      | str  | Input order for MADE mask creation ('sequential' or 'random'; default 'sequential'). |
| batch_norm_order | bool | Adds a batchnorm layer after each MAF or RealNVP layer (default True). |
| save_interval    | int  | How often to save results from the normalizing flow iterations. Saved results include posterior samples, loss profile, samples from the posterior predictive distribution, observations, and marginal statistics. |
| input_size       | int  | Input dimensionality (default 2). |
| batch_size       | int  | Number of samples drawn from the base distribution at each iteration (default 100). |
| true_data_num    | int  | Number of additional true model evaluations at each surrogate model update (default 2). |
| n_iter           | int  | Total number of NF iterations (default 25001). |
+
+
+ + + +

Parameters for the adaptive annealing scheduler + (AdaAnn)

| Option    | Type   | Description |
|-----------|--------|-------------|
| annealing | bool   | Flag to activate the annealing scheduler. If False, the target posterior distribution is left unchanged during the iterations. |
| scheduler | string | Type of annealing scheduler ('AdaAnn' or 'fixed'; default 'AdaAnn'). |
| tol       | float  | KL tolerance $\tau$. It is kept constant during inference and enters the numerator of the expression for the AdaAnn step size $\epsilon_{k}$. |
| t0        | float  | Initial inverse temperature. |
| N         | int    | Number of batch samples during annealing. |
| N_1       | int    | Number of batch samples at $t = 1$. |
| T_0       | int    | Number of initial parameter updates at $t_{0}$. |
| T         | int    | Number of parameter updates after each temperature update; during such updates the temperature is kept fixed. |
| T_1       | int    | Number of parameter updates at $t = 1$. |
| M         | int    | Number of Monte Carlo samples used to evaluate the denominator of the expression for $\epsilon_{k}$. |
+
+
+
+
+
+ + + + + + + GemanStuart + GemanDonald + + Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images + IEEE Transactions on pattern analysis and machine intelligence + IEEE + 1984 + 6 + 10.1109/TPAMI.1984.4767596 + 721 + 741 + + + + + + MetropolisNicholas + RosenbluthArianna W + RosenbluthMarshall N + TellerAugusta H + TellerEdward + + Equation of state calculations by fast computing machines + The journal of chemical physics + American Institute of Physics + 1953 + 21 + 6 + 10.1063/1.1699114 + 1087 + 1092 + + + + + + HastingsW Keith + + Monte Carlo sampling methods using Markov chains and their applications + Oxford University Press + 1970 + 10.1093/biomet/57.1.97 + + + + + + GelfandAlan E + SmithAdrian FM + + Sampling-based approaches to calculating marginal densities + Journal of the American statistical association + Taylor & Francis + 1990 + 85 + 410 + 10.1080/01621459.1990.10476213 + 398 + 409 + + + + + + WainwrightMartin J + JordanMichael I + others + + Graphical models, exponential families, and variational inference + Foundations and Trends in Machine Learning + Now Publishers, Inc. + 2008 + 1 + 1–2 + 10.1561/2200000001 + 1 + 305 + + + + + + VillaniCédric + others + + Optimal transport: Old and new + Springer + 2009 + 338 + 10.1007/978-3-540-71050-9 + + + + + + KobyzevIvan + PrinceSimon JD + BrubakerMarcus A + + Normalizing flows: An introduction and review of current methods + IEEE transactions on pattern analysis and machine intelligence + IEEE + 2020 + 43 + 11 + 10.1109/TPAMI.2020.2992934 + 3964 + 3979 + + + + + + PapamakariosGeorge + NalisnickEric + RezendeDanilo Jimenez + MohamedShakir + LakshminarayananBalaji + + Normalizing flows for probabilistic modeling and inference + The Journal of Machine Learning Research + JMLRORG + 2021 + 22 + 1 + 2617 + 2680 + + + + + + RezendeDanilo + MohamedShakir + + Variational inference with normalizing flows + International conference on machine learning + PMLR + 2015 + 1530 + 1538 + + + + + + WangYu + LiuFang + SchiavazziDaniele E + + Variational inference with NoFAS: Normalizing flow with adaptive surrogate for computationally expensive models + Journal of Computational Physics + Elsevier + 2022 + 467 + 10.1016/j.jcp.2022.111454 + 111454 + + + + + + + CobianEmma R + HauensteinJonathan D + LiuFang + SchiavazziDaniele E + + AdaAnn: Adaptive annealing scheduler for probability density approximation + International Journal for Uncertainty Quantification + Begel House Inc. + 2023 + 13 + 10.1615/Int.J.UncertaintyQuantification.2022043110 + + + + + + DinhLaurent + Sohl-DicksteinJascha + BengioSamy + + Density estimation using real NVP + arXiv preprint arXiv:1605.08803 + 2016 + + + + + + KingmaDurk P + DhariwalPrafulla + + Glow: Generative flow with invertible 1x1 convolutions + Advances in neural information processing systems + 2018 + 31 + + + + + + PapamakariosGeorge + PavlakouTheo + MurrayIain + + Masked autoregressive flow for density estimation + Advances in neural information processing systems + + GuyonI. + LuxburgU. Von + BengioS. + WallachH. + FergusR. + VishwanathanS. + GarnettR. + + Curran Associates, Inc. 
+ 2017 + 30 + https://proceedings.neurips.cc/paper_files/paper/2017/file/6c1da886822c67822bcf3679d04369fa-Paper.pdf + + + + + + + + KingmaDurk P + SalimansTim + JozefowiczRafal + ChenXi + SutskeverIlya + WellingMax + + Improved variational inference with inverse autoregressive flow + Advances in neural information processing systems + 2016 + 29 + 4743 + 4751 + + + + + + GermainMathieu + GregorKarol + MurrayIain + LarochelleHugo + + MADE: Masked autoencoder for distribution estimation + International conference on machine learning + PMLR + 2015 + 881 + 889 + + + + + + IoffeSergey + SzegedyChristian + + Batch normalization: Accelerating deep network training by reducing internal covariate shift + International conference on machine learning + PMLR + 2015 + 448 + 456 + + + + + + SuBingyue + WangYu + SchiavazziDaniele E + LiuFang + + Differentially private normalizing flows for density estimation, data synthesis, and variational inference with application to electronic health records + arXiv preprint arXiv:2302.05787 + 2023 + + + + + + FriedmanJerome H + + Multivariate adaptive regression splines + The annals of statistics + Institute of Mathematical Statistics + 1991 + 19 + 1 + 10.1214/aos/1176347963 + 1 + 67 + + + + + + GramacyRobert B + + Tgp: An R package for Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian process models + Journal of Statistical Software + 2007 + 19 + 10.18637/jss.v019.i09 + 1 + 46 + + + + + + Sobol’Ilya M + + Theorems and examples on high dimensional model representation + Reliability Engineering and System Safety + Elsevier + 2003 + 79 + 2 + 10.1016/S0951-8320(02)00229-6 + 187 + 193 + + + + + + BrennanMichael + BigoniDaniele + ZahmOlivier + SpantiniAlessio + MarzoukYoussef + + Greedy inference with structure-exploiting lazy maps + Advances in Neural Information Processing Systems + 2020 + 33 + 8330 + 8342 + + + + + + SiahkoohiAli + RizzutiGabrio + LouboutinMathias + WittePhilipp + HerrmannFelix + + Preconditioned training of normalizing flows for variational inference in inverse problems + Third symposium on advances in approximate bayesian inference + 2021 + https://openreview.net/forum?id=P9m1sMaNQ8T + + + + + + El MoselhyTarek A + MarzoukYoussef M + + Bayesian inference with optimal maps + Journal of Computational Physics + Elsevier + 2012 + 231 + 23 + 10.1016/j.jcp.2012.07.022 + 7815 + 7850 + + + + + + Sobol’Ilya M + + On the distribution of points in a cube and the approximate evaluation of integrals + Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki + Russian Academy of Sciences, Branch of Mathematical Sciences + 1967 + 7 + 4 + 10.1016/0041-5553(67)90144-9 + 784 + 802 + + + + + + Abril-PlaOriol + AndreaniVirgile + CarrollColin + DongLarry + FonnesbeckChristopher J + KochurovMaxim + KumarRavin + LaoJunpeng + LuhmannChristian C + MartinOsvaldo A + others + + PyMC: A modern, and comprehensive probabilistic programming framework in Python + PeerJ Computer Science + PeerJ Inc. + 2023 + 9 + 10.7717/peerj-cs.1516 + e1516 + + + + + + + LuttinenJaakko + + Bayespy: Variational Bayesian inference in Python + The Journal of Machine Learning Research + JMLR. 
org + 2016 + 17 + 1 + 1419 + 1424 + + + + + + BinghamEli + ChenJonathan P + JankowiakMartin + ObermeyerFritz + PradhanNeeraj + KaraletsosTheofanis + SinghRohit + SzerlipPaul + HorsfallPaul + GoodmanNoah D + + Pyro: Deep universal probabilistic programming + Journal of machine learning research + 2019 + 20 + 28 + 1 + 6 + + + + + + HugginsBobby + LiChengkun + TobabenMarlon + AarnosMikko J. + AcerbiLuigi + + PyVBMC: Efficient Bayesian inference in Python + Journal of Open Source Software + The Open Journal + 2023 + 8 + 86 + https://doi.org/10.21105/joss.05428 + 10.21105/joss.05428 + 5428 + + + + + +
diff --git a/joss.06309/10.21105.joss.06309.pdf b/joss.06309/10.21105.joss.06309.pdf new file mode 100644 index 0000000000..db7c437803 Binary files /dev/null and b/joss.06309/10.21105.joss.06309.pdf differ diff --git a/joss.06309/media/1a3b7e4d933d9bdf4bbb379153304a9582bef712.png b/joss.06309/media/1a3b7e4d933d9bdf4bbb379153304a9582bef712.png new file mode 100644 index 0000000000..78865493b1 Binary files /dev/null and b/joss.06309/media/1a3b7e4d933d9bdf4bbb379153304a9582bef712.png differ diff --git a/joss.06309/media/1cac7f5f56892bf8a24224f2ea35f1336cd54f1b.png b/joss.06309/media/1cac7f5f56892bf8a24224f2ea35f1336cd54f1b.png new file mode 100644 index 0000000000..4fe4df0cca Binary files /dev/null and b/joss.06309/media/1cac7f5f56892bf8a24224f2ea35f1336cd54f1b.png differ diff --git a/joss.06309/media/2eaec4a76818e93b9b9882941aef7ef703afc362.png b/joss.06309/media/2eaec4a76818e93b9b9882941aef7ef703afc362.png new file mode 100644 index 0000000000..327a31edd9 Binary files /dev/null and b/joss.06309/media/2eaec4a76818e93b9b9882941aef7ef703afc362.png differ diff --git a/joss.06309/media/4c4bffb56a97ca19317250a2bc63fac79dbef607.png b/joss.06309/media/4c4bffb56a97ca19317250a2bc63fac79dbef607.png new file mode 100644 index 0000000000..ca0a3004bb Binary files /dev/null and b/joss.06309/media/4c4bffb56a97ca19317250a2bc63fac79dbef607.png differ diff --git a/joss.06309/media/513ae401afb64dbf332607f3b0287961e9b0269d.png b/joss.06309/media/513ae401afb64dbf332607f3b0287961e9b0269d.png new file mode 100644 index 0000000000..d3ef5cb8e5 Binary files /dev/null and b/joss.06309/media/513ae401afb64dbf332607f3b0287961e9b0269d.png differ diff --git a/joss.06309/media/525926ee9a119ed1cdade2148856af20429d5d8c.png b/joss.06309/media/525926ee9a119ed1cdade2148856af20429d5d8c.png new file mode 100644 index 0000000000..e6a7b9e635 Binary files /dev/null and b/joss.06309/media/525926ee9a119ed1cdade2148856af20429d5d8c.png differ diff --git a/joss.06309/media/5bf7588f16a97199f322e84388ea950d406cb51c.png b/joss.06309/media/5bf7588f16a97199f322e84388ea950d406cb51c.png new file mode 100644 index 0000000000..0d45855b54 Binary files /dev/null and b/joss.06309/media/5bf7588f16a97199f322e84388ea950d406cb51c.png differ diff --git a/joss.06309/media/65f4baeaf3d37ac4ed61631d0210ef5ded70b6d0.png b/joss.06309/media/65f4baeaf3d37ac4ed61631d0210ef5ded70b6d0.png new file mode 100644 index 0000000000..6463968a19 Binary files /dev/null and b/joss.06309/media/65f4baeaf3d37ac4ed61631d0210ef5ded70b6d0.png differ diff --git a/joss.06309/media/6f4cecc646a81891401cfd2b466cd7835a17cdad.png b/joss.06309/media/6f4cecc646a81891401cfd2b466cd7835a17cdad.png new file mode 100644 index 0000000000..87022ade94 Binary files /dev/null and b/joss.06309/media/6f4cecc646a81891401cfd2b466cd7835a17cdad.png differ diff --git a/joss.06309/media/7402b5de266da8fe49096be58754bd349b724d4f.png b/joss.06309/media/7402b5de266da8fe49096be58754bd349b724d4f.png new file mode 100644 index 0000000000..266cda53a3 Binary files /dev/null and b/joss.06309/media/7402b5de266da8fe49096be58754bd349b724d4f.png differ diff --git a/joss.06309/media/74f5e0d84680a4185e1810fc9e21bb5ecd5c40e6.png b/joss.06309/media/74f5e0d84680a4185e1810fc9e21bb5ecd5c40e6.png new file mode 100644 index 0000000000..0ace68acc1 Binary files /dev/null and b/joss.06309/media/74f5e0d84680a4185e1810fc9e21bb5ecd5c40e6.png differ diff --git a/joss.06309/media/756a55cf8967e719623055cedf2fda47c9c66404.png b/joss.06309/media/756a55cf8967e719623055cedf2fda47c9c66404.png new file mode 100644 index 0000000000..5934d4cbde 
Binary files /dev/null and b/joss.06309/media/756a55cf8967e719623055cedf2fda47c9c66404.png differ diff --git a/joss.06309/media/7fac50cefd5bf44f00401262143b48def080ecd1.png b/joss.06309/media/7fac50cefd5bf44f00401262143b48def080ecd1.png new file mode 100644 index 0000000000..30fac46f94 Binary files /dev/null and b/joss.06309/media/7fac50cefd5bf44f00401262143b48def080ecd1.png differ diff --git a/joss.06309/media/83c10a95c9e8577de7159e5332cb410dd8c94d0a.png b/joss.06309/media/83c10a95c9e8577de7159e5332cb410dd8c94d0a.png new file mode 100644 index 0000000000..18b0d1813e Binary files /dev/null and b/joss.06309/media/83c10a95c9e8577de7159e5332cb410dd8c94d0a.png differ diff --git a/joss.06309/media/8c0395f4f978a6a90a815298a1948c4239bce4a2.png b/joss.06309/media/8c0395f4f978a6a90a815298a1948c4239bce4a2.png new file mode 100644 index 0000000000..1257e81ef0 Binary files /dev/null and b/joss.06309/media/8c0395f4f978a6a90a815298a1948c4239bce4a2.png differ diff --git a/joss.06309/media/b855e6f6aa193e6cdc7690e3a4c5f157655f4159.png b/joss.06309/media/b855e6f6aa193e6cdc7690e3a4c5f157655f4159.png new file mode 100644 index 0000000000..6723a4faca Binary files /dev/null and b/joss.06309/media/b855e6f6aa193e6cdc7690e3a4c5f157655f4159.png differ diff --git a/joss.06309/media/c3535b90c591a17ed3ac29cf77228ab16ca394a5.png b/joss.06309/media/c3535b90c591a17ed3ac29cf77228ab16ca394a5.png new file mode 100644 index 0000000000..20857e9cf1 Binary files /dev/null and b/joss.06309/media/c3535b90c591a17ed3ac29cf77228ab16ca394a5.png differ diff --git a/joss.06309/media/dc902429c568a01233895f939fde04beefadfa2c.png b/joss.06309/media/dc902429c568a01233895f939fde04beefadfa2c.png new file mode 100644 index 0000000000..11d7af88ae Binary files /dev/null and b/joss.06309/media/dc902429c568a01233895f939fde04beefadfa2c.png differ diff --git a/joss.06309/media/f0812f32444b59e7c811fcf2e3d28438ccc816f9.png b/joss.06309/media/f0812f32444b59e7c811fcf2e3d28438ccc816f9.png new file mode 100644 index 0000000000..be2760811e Binary files /dev/null and b/joss.06309/media/f0812f32444b59e7c811fcf2e3d28438ccc816f9.png differ diff --git a/joss.06309/media/fb0b6c03997f222be5f8ed5c25f9e4ff98bf75be.png b/joss.06309/media/fb0b6c03997f222be5f8ed5c25f9e4ff98bf75be.png new file mode 100644 index 0000000000..43dd84e6bf Binary files /dev/null and b/joss.06309/media/fb0b6c03997f222be5f8ed5c25f9e4ff98bf75be.png differ