{WIP} Developer guide update #7510 (Draft)

fonnesbeck wants to merge 8 commits into pymc-devs:main from fonnesbeck:developer_guide_update

Commits (8):
- cdcc1da Updated text (fonnesbeck)
- e9ecab2 Merge branch 'main' into developer_guide_update (fonnesbeck)
- be1222e Updates (fonnesbeck)
- ca9ccce Additional updates (fonnesbeck)
- 3cc7039 Updated authorship (fonnesbeck)
- 36ec6f3 Additional updates to text (fonnesbeck)
- 0a1abe9 Merge branch 'developer_guide_update' of github.com:fonnesbeck/pymc i… (fonnesbeck)
- e877c74 Add ToC (fonnesbeck)

@@ -4,38 +4,70 @@ orphan: true

# PyMC Developer Guide

{doc}`PyMC <index>` is a Python package for Bayesian statistical modeling built on top of the {doc}`PyTensor <pytensor:index>` library.
This document explains the design and implementation of probabilistic programming in PyMC, with comparisons to other probabilistic programming libraries like TensorFlow Probability (TFP) and Pyro.
A user-facing API introduction can be found in the {ref}`API quickstart <pymc_overview>`.
An accessible introduction to building models with PyMC can be found in [our PyData London 2022 tutorial](https://github.com/fonnesbeck/probabilistic_python).

## Table of Contents

- [Distributions](#distributions)
- [Reflection](#reflection)
- [PyMC in Comparison](#pymc-in-comparison)
- [PyMC](#pymc)
- [Tensorflow Probability](#tensorflow-probability)
- [Pyro](#pyro)
- [Behind the scenes of the logp function](#behind-the-scenes-of-the-logp-function)
- [Model Context and Random Variables](#model-context-and-random-variables)
- [Additional things that pm.Model does](#additional-things-that-pmmodel-does)
- [Logp and dlogp](#logp-and-dlogp)
- [Inference](#inference)
- [MCMC](#mcmc)
- [Transition kernel](#transition-kernel)
- [Dynamic HMC](#dynamic-hmc)
- [Variational Inference (VI)](#variational-inference-vi)
- [Some challenges and insights from implementing VI](#some-challenges-and-insights-from-implementing-vi)
- [Forward sampling](#forward-sampling)
- [Extending PyMC](#extending-pymc)
- [What we got wrong](#what-we-got-wrong)
- [Shape](#shape)
- [Random methods in numpy](#random-methods-in-numpy)
- [Samplers are in Python](#samplers-are-in-python)

## Distributions

Probability distributions in PyMC are implemented as classes that inherit from {class}`~pymc.Continuous` or {class}`~pymc.Discrete`.
Both of these inherit from {class}`~pymc.Distribution`, which defines the high-level API.

For a detailed introduction on how a specific statistical distribution should be implemented, check out the {ref}`guide on implementing distributions <implementing_distribution>`.
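
As a quick orientation, here is a minimal sketch of that high-level API (assuming only ``import pymc as pm``; it is not part of the guide's running example):

```python
import pymc as pm

# Concrete distributions sit below the shared Distribution API
assert issubclass(pm.Normal, pm.Continuous)
assert issubclass(pm.Poisson, pm.Discrete)
assert issubclass(pm.Continuous, pm.Distribution)

# .dist() creates a standalone random variable, not registered with any model
rv = pm.Normal.dist(mu=0.0, sigma=1.0)
print(pm.draw(rv, draws=3))  # three random draws from the distribution
```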

## Reflection

Let's consider how the tensor/value semantics for probability distributions are enabled in PyMC.

Model random variables are created by calling probability distribution classes with parameters inside of a `pm.Model` context, using a syntax analogous to statistical notation. For example, a normal distribution with a specified mean and standard deviation is written as:

$$
z \sim \text{Normal}(0, 5)
$$

And in PyMC:

```python
with pm.Model():
    z = pm.Normal("z", 0, 5)
```

The context manager intercepts information about the distribution relevant to the model, such as the variable dimension and any transforms, and registers it with the model.

The call to a {class}`~pymc.Distribution` constructor returns a PyTensor {class}`~pytensor.tensor.TensorVariable`, which is a symbolic representation of the model variable and the graph of inputs it depends on.

```python
print(type(z))
# ==> <class 'pytensor.tensor.variable.TensorVariable'>
```

Under the hood, the variables are created through the {meth}`~pymc.Distribution.dist` classmethod, which calls the {class}`~pytensor.tensor.random.basic.RandomVariable` {class}`~pytensor.graph.op.Op` corresponding to the distribution.

At a high level of abstraction, the idea behind ``RandomVariable`` ``Op``s is to create symbolic variables (``TensorVariable``s) that can be associated with the properties of a probability distribution.
For example, the ``RandomVariable`` ``Op``, which becomes part of the symbolic computation graph, is associated with the random number generators or probability mass/density functions of the distribution.
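
To make that association concrete, the following sketch (purely illustrative, reusing the model above) pokes at the objects involved:

```python
import pymc as pm

with pm.Model():
    z = pm.Normal("z", 0, 5)

# The TensorVariable is the output of a RandomVariable Op in the graph
print(type(z.owner.op))        # the RandomVariable Op that created z
print(pm.draw(z))              # a draw, using the Op's random number generator
print(pm.logp(z, 2.5).eval())  # the log-density at 2.5, approx -2.653
```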

@@ -57,7 +89,7 @@ Now, because the ``NormalRV`` can be associated with the [probability density fu

```python
with pm.Model():
    z = pm.Normal("z", 0, 5)
    symbolic = pm.logp(z, 2.5)
symbolic.eval()
# array(-2.65337645)
```

@@ -92,14 +124,17 @@ $$

```python
with pm.Model() as model:
    z = pm.Normal('z', mu=0., sigma=5.)
    # ==> pytensor.tensor.var.TensorVariable
    x = pm.Normal('x', mu=z, sigma=1., observed=5.)
    # ==> pytensor.tensor.var.TensorVariable

# The log-prior of z=2.5
pm.logp(z, 2.5).eval()
# ==> -2.65337645
# The log-likelihood of x=5 given z=2.5
pm.logp(x, 5.).eval({z: 2.5})
# ==> -4.0439386
# The total model logp (log-prior plus log-likelihood)
model.compile_logp()({'z': 2.5})
# ==> -6.6973152
```

### Tensorflow Probability

@@ -110,25 +145,35 @@ import tensorflow.compat.v1 as tf

```python
import tensorflow.compat.v1 as tf
from tensorflow_probability import distributions as tfd

with tf.Session() as sess:
    z_dist = tfd.Normal(loc=0., scale=5.)
    # ==> <class 'tfp.python.distributions.normal.Normal'>
    z = z_dist.sample()
    # ==> <class 'tensorflow.python.framework.ops.Tensor'>
    x = tfd.Normal(loc=z, scale=1.).log_prob(5.)
    # ==> <class 'tensorflow.python.framework.ops.Tensor'>
    model_logp = z_dist.log_prob(z) + x
    print(sess.run(x, feed_dict={z: 2.5}))
    # ==> -4.0439386
    print(sess.run(model_logp, feed_dict={z: 2.5}))
    # ==> -6.6973152
```

### Pyro

```python
import torch
import pyro
import pyro.distributions as dist

z_dist = dist.Normal(loc=0., scale=5.)
# ==> <class 'pyro.distributions.torch.Normal'>
z = pyro.sample("z", z_dist)
# ==> <class 'torch.Tensor'>
# reset/specify value of z
z.data = torch.tensor(2.5)
x = dist.Normal(loc=z, scale=1.).log_prob(5.)
# ==> <class 'torch.Tensor'>
model_logp = z_dist.log_prob(z) + x
x
# ==> -4.0439386
model_logp
# ==> -6.6973152
```

@@ -159,21 +204,24 @@ As explained above, distribution in a ``pm.Model()`` context automatically turn

Getting the logp of a free RV is just a matter of evaluating the ``logp()`` [on itself](https://github.com/pymc-devs/pymc/blob/6d07591962a6c135640a3c31903eba66b34e71d8/pymc/model.py#L1212-L1213):

```python
class Normal(Continuous):
    def logp(self, value):
        mu = self.mu
        tau = self.tau
        return bound(
            (-tau * (value - mu) ** 2 + pt.log(tau / np.pi / 2.0)) / 2.0,
            tau > 0,
        )
```

The logp evaluations are represented as tensors (``RV.logpt``). When we combine different ``logp`` values (for example, by summing all ``RVs.logpt`` to obtain the total logp for the model), PyTensor manages the dependencies automatically during the graph construction and compilation process.
This dependence among nodes in the model graph means that whenever you want to generate a new function that takes new input tensors, you either need to regenerate the graph with the appropriate dependencies, or replace the node by editing the existing graph.
The latter is facilitated by PyTensor's ``pytensor.clone_replace()`` function.
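
As a standalone sketch of that second option (independent of any PyMC model, assuming only that PyTensor is installed), ``pytensor.clone_replace`` swaps a node in an existing graph for a new input:

```python
import pytensor
import pytensor.tensor as pt

x = pt.scalar("x")
y = x ** 2 + 1.0

# Build a copy of the graph of y in which x has been replaced by a new input
x_new = pt.scalar("x_new")
(y_new,) = pytensor.clone_replace([y], replace={x: x_new})

print(y_new.eval({x_new: 3.0}))  # ==> 10.0
```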

### Model Context and Random Variables

A signature feature of PyMC's syntax is the ``with pm.Model() ...`` expression, which extends the functionality of the context manager in Python to make expressing Bayesian models as natural as possible.

Essentially [what a context manager does](https://www.python.org/dev/peps/pep-0343/) is:

@@ -193,38 +241,33 @@ finally:

```python
VAR = EXPR
VAR.__enter__()
try:
    USERCODE
finally:
    VAR.__exit__()
```

or conceptually:

```python
with EXPR as VAR:
    # DO SOMETHING
    USERCODE
    # DO SOME ADDITIONAL THINGS
```

But what are the implications of this, besides the model instantiation ``model = pm.Model()``?
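
For intuition, here is a small illustration of what the context manager buys us in practice; the only assumption beyond the text above is that ``pm.Model.get_context()`` returns the model currently on top of the context stack, which is how distribution calls locate the active model:

```python
import pymc as pm

with pm.Model() as model:
    # Inside the block, the active model is discoverable from the context
    # stack, so pm.Normal does not need an explicit model argument
    assert pm.Model.get_context() is model
    x = pm.Normal("x", 0.0, 1.0)

# On exit, the variable has been registered with that model
print(model.free_RVs)  # ==> [x]
```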

### Random Variable

As we have seen already, when we call e.g. ``pm.Normal('x', ...)`` within a Model context, it returns a random variable.

```python
with pm.Model() as model:
    x = pm.Normal('x', mu=0., sigma=1.)

print(type(x))
# ==> <class 'pytensor.tensor.var.TensorVariable'>
print(model.free_RVs)
# ==> [x]
print(pm.logp(x, 5.0))
# ==> Elemwise{switch,no_inplace}.0
print(pm.logp(x, 5.).eval({}))
# ==> -13.418938533204672
print(model.compile_logp()({'x': 5.}))
# ==> -13.418938533204672
```

In general, if a variable has observations (``observed`` parameter), the RV is an observed RV; otherwise, if it has a ``transformed`` attribute (``transform`` parameter), it is a transformed RV; otherwise, it takes the most elementary form: a free RV.

Note that this means that random variables with observations cannot be transformed.
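
The following sketch illustrates that classification on a toy model; the exact name of the transformed value variable is an implementation detail and is shown only as an assumption:

```python
import pymc as pm

with pm.Model() as m:
    sd = pm.HalfNormal("sd", 1.0)                       # free RV, given a default log-transform
    y = pm.Normal("y", 0.0, sd, observed=[0.1, -0.3])   # observed RV, never transformed

print(m.free_RVs)      # ==> [sd]
print(m.observed_RVs)  # ==> [y]
print(m.value_vars)    # the (possibly transformed) value variables, e.g. [sd_log__]
```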

<!--

Below, I will take a deeper look into transformed RV. A normal user
might not necessarily come in contact with the concept, since a
transformed RV and ``TransformedDistribution`` are intentionally not

@@ -245,18 +288,11 @@ usually created in order to optimise performance. But getting a

possible (see also in
{ref}`doc <pymc_overview##Transformed-distributions-and-changes-of-variables>`):

```python
lognorm = Exp().apply(pm.Normal.dist(0., 1.))
lognorm
# <pymc.distributions.transforms.TransformedDistribution at 0x7f1536749b00>
```

Now, back to ``model.RV(...)`` - things returned from ``model.RV(...)``
are PyTensor tensor variables, and it is clear from looking at

@@ -305,29 +341,33 @@ transformation by nested applying multiple transforms to a Distribution

z2 = Exp().apply(z)
z2.transform is None # ==> True
-->

### Additional things that ``pm.Model`` does

In a way, ``pm.Model`` is a tape machine that records what is being added to the model; it keeps track of the random variables (observed or unobserved), potential terms (additional tensors to be added to the model logp), and deterministic transformations (as bookkeeping):

* named\_vars
* free\_RVs
* observed\_RVs
* deterministics
* potentials
* missing\_values

More importantly, a ``pm.Model()`` contains methods to compile PyTensor functions that take Random Variables (that are also initialised within the same model) as input, for example:

```python
from pymc.blocking import DictToArrayBijection

with pm.Model() as m:
    z = pm.Normal('z', 0., 10., shape=10)
    x = pm.Normal('x', z, 1., shape=10)

print(m.initial_point())
# {'z': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'x': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])}
raveled = DictToArrayBijection.map(m.initial_point())  # point dict -> flat array (RaveledVars)
print(raveled.data)
# [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(DictToArrayBijection.rmap(raveled))  # flat array -> point dict
```
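
To see a couple of the bookkeeping lists above in action, here is a small, purely illustrative sketch (variable names are arbitrary):

```python
import pymc as pm

with pm.Model() as m:
    z = pm.Normal("z", 0.0, 1.0)
    d = pm.Deterministic("d", z ** 2)       # bookkeeping only; not part of the model logp
    pen = pm.Potential("penalty", -z ** 2)  # extra term added to the model logp

print(m.free_RVs)        # ==> [z]
print(m.deterministics)  # ==> [d]
print(m.potentials)      # ==> [penalty]
```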

@@ -336,26 +376,19 @@ print(m.bijection.rmap(np.arange(20)))

```python
list(filter(lambda x: "logp" in x, dir(pm.Model)))
# ['compile_d2logp',
#  'compile_dlogp',
#  'compile_logp',
#  'd2logp',
#  'datalogp',
#  'dlogp',
#  'logp',
#  'logp_dlogp_function',
#  'observedlogp',
#  'point_logps',
#  'potentiallogp',
#  'varlogp',
#  'varlogp_nojac']
```
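
A minimal sketch of two of the ``compile_*`` helpers in that list (using a fresh scalar model; the printed numbers are approximate):

```python
import pymc as pm

with pm.Model() as m:
    z = pm.Normal("z", 0.0, 5.0)

logp_fn = m.compile_logp()    # compiled function: point dict -> model log-density
dlogp_fn = m.compile_dlogp()  # compiled function: point dict -> gradient

print(logp_fn({"z": 2.5}))   # ==> approx -2.653
print(dlogp_fn({"z": 2.5}))  # ==> approx [-0.1]
```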

### Logp and dlogp

Review comment: This is not a thing anymore