StocPy is an expressive probabilistic programming language, provided as a Python library. The language follows the "lightweight implementations" style introduced by Wingate, Stuhlmüller and Goodman (link to pdf).
StocPy was developed as part of my masters thesis, under the supervision of Dan Roy and Zoubin Ghahramani.
Please send any questions/suggestions/issues to Razvan Ranca - [email protected]
Warning: This is alpha-quality software. Expect bugs.
- Intuitive, succinct and flexible model specification , thanks to Python
- Can use any probabilistic primitive defined by scipy.stats
- Inference engine easily extensible to new inference methods, currently supports Metropolis, Slice Sampling and weighted combinations of the two
- Automatic rewriting of user models to improve inference, currently via partitioning the priors.
A model is specified as a normal Python function which uses the probabilistic primitives provided by the StocPy library. For instance, inferring the mean of a normal distribution of variance 1 based on a single data point would be written as:
def guessMean():
mean = stocPy.normal(0, 1, obs=True)
stocPy.normal(mean, 1, cond=2)
Here, we define a prior on the mean as a Normal(0,1) and say that we wish to observe what values the mean takes in our simulation (i.e. we wish to store the sampled means). In the next line we say that sampling from a normal with variance 1 which is centred around the mean we obtained in the previous step resulted in us observing a certain data point (2 in this case).
Once this model is defined we can perform inference on it by calling:
samples = stocPy.getSamples(guessMean, 10000, alg="met")
Where we are asking for 10,000 samples, generated with the Metropolis inference technique (which is also the default). This will return 10,000 samples for each of the variables we asked to be observed (in the above model, this would be just the mean).
Finally, to quickly visualise the distribution of our sampled mean, we can use one of the provided utility functions:
stocPy.plotSamples(samples)
There are 3 methods available for performing inference. These are:
stocPy.getSamples(model, noSamps)
- Generate a certain number of samplesstocPy.getTimedSamples(model, noSecs)
- Generate samples until a certain number of seconds have ellapsedstocPy.getSamplesByLL(model, noLLs)
- Generate samples until a certain number of model simulations have been run. This is useful when benchmarking different inference techniques which may simulate the model multiple times to generate a single sample.
All 3 of these functions take, via the optional alg
keyword, the method by which to perform inference. The available methods are:
alg="met"
- Metropolis from the prior. The default option.alg="sliceTD"
- Slice sampling based inference.alg="sliceMet"
- A mixture of slice and Metropolis. The mixture weight can be specified by thethresh
parameter. By default this is 0.1, corresponding to a Metropolis:Slice 1:9 mixture.alg="sliceNoTrans"
- Slice which explicitly disallows trans-dimensional jumps. Useful for analysing the properties of different modes of a distribution.
To complement the following descriptions, usage examples covering most of stocPy's functionality are included in the simpleModels and the anglicanModels directories. Each model is explained in its respective ".py" file.
In addition to specifying that a variable should be observed upon it's definition (as in the guessMean example), we can also specifically tell stocPy to observe variables or expressions via the observe
method. The syntax is:
stocPy.observe(expression, name="expressionName")
Using this statement we can record the values of arbitrary expressions at any point in our model specification. The name
parameter must always be specified when calling the observe
method. This allows stocPy to aggregate all observations with the same name and gives the user a large degree of flexibility regarding the types of questions they wish to pose. For instance, the user could assign different names to an observe
statement based on the value of a stochastic variable and thus see the effect that variable has on the observed expression.
A new feature available in StocPy is that of automatically partitioning priors. Intuitively, this re-writes the prior over a variable as a sum of narrower and wider distributions and thus gives Metropolis-Hastings more choice regarding the scale at which it operates. This additional freedom should result in more efficient inference when the marginal prior of a variable is substantially wider than the same variable's marginal posterior.
To use this feature we must specify our variable via the stocPrim
function (even if a custom wrapper exists for the variable) and pass a parameter representing a partition depth (or prior over partition depths). A big partition depth essentially means we believe there is a large difference between the width of the prior and the width of the posterior (rule of thumb: partition depth should be larger than log(priorVariance/posteriorVariance) ).
For instance, we can specify:
mean = stocPy.stocPrim("norm", (0, 1), obs=True, part=5)
to partition the prior over the mean up to a depth of 5. Or we could write:
mean = stocPy.stocPrim("norm", (0, 1), obs=True, part=stocPy.unifCont(2,10))
to place a prior over the partition depth.
Note: This feature is still experimental. Use with care. Not currently supported on the crp
primitve.
In order for the inference engine to keep track of the stochastic values present in the current program trace, the naming method introduced by Wingate, Stuhlmüller and Goodman (link to pdf) is employed. This is done via a source-to-source compilation of the file in which a StocPy model is defined (models defined over multiple files are not currently supported).
It is possible, however, to turn-off the automatic compilation and allow the user to manually specify the names of their variables. This may be useful if the user wishes to avoid the overhead introduced by the automatic compilation (eg: for time-critical models) or, more practically, if the user wishes to experiment with novel naming strategies. In order to get rid of the compilation, it is only necessary to pass autoNames = False
to the inference engine. For instance:
samples = stocPy.getSamples(guessMean, 10000, alg="met", autoNames=False)
Warning: If this parameter is passed, then it is then the user's responsibility to specify a name
parameter for each call to a stocPy primitive. Further, if the names are not carefully chosen the result may be slowed down or even incorrect inference. See the aforementioned paper for details.
In order for StocPy to be able to perform inference on a python model, the model must define its stochastic primitives via StocPy functions, so that the library can keep track of them throughout the model execution.
A generic stocPrim
function is also provided through which any stochastic primitive implemented in scipy.stats can be used in StocPy models.
The usage of the stocPrim is demonstrated in the "simpleModels/poisson.py" model, but in short it is:
stocPy.stocPrim(distributionName, distributionParameters)
Here distributionName
is a string with the exact name of the primitive in scipy.stats (eg: "beta"), and distributionParameters
is a tuple holding an arbitrary number of parameters as taken by the scipy.stats rvs
function corresponding to our distribution.
Note: When using the stocPrim
function the parameters must be given in the same order as that defined by the corresponding scipy.stats rvs
function. A tuple must be given even if there is a single parameter (i.e use (parameter, )
).
It can be convenient to add more commonly used primitives directly to StocPy rather than using the generic stocPrim
function. In this way the primitives can also accept different parametrizations , for instance (start, end) vs (start, length) for a uniform distribution.
At the moment StocPy defines the following wrappers:
- Normal -
stocPy.normal(mean, stDev)
- Poisson -
stocPy.poisson(shape)
- Uniform Continuous -
stocPy.unifCont(start, end)
- Student T -
stocPy.studentT(dof)
- Inverse Gamma -
stocPy.invGamma(shape, scale)
- Beta -
stocPy.beta(a, b)
- Categorical -
stocPy.categorical(probs)
- Chinese Restaurant Process -
stocPy.crp(a, maxClasses=None)
, where the 2nd argument is optional
Creating new wrappers is easily done by performing the following changes in stocPy.py.
- Add the desired scipy.stats distribution to the
dists
array - Add a (key:value) pair consisting of (distribution name : distribution index in
dists
array) to theerps
dictionary - Create a 3 line wrapper function which gets the distribution parameters from the user, calls
initERP
and then returnsgetERP
. See the wrappers provided in stocPy (normal, poisson, etc.) to understand the pattern this wrapper follows (Note: thename
attribute has to come last in the wrapper function's parameter list and must not be passed via *args or **kwargs, otherwise the automatic naming breaks down).
While scipy.stats has good support for univariate distributions, we sometimes need other stochastic primitives. In this case a new class implementing the desired primitive has to be defined. In StocPy any class implementing a new primitive must have:
rvs(self, params)
- which returns a random sample from the defined distribution when parametrized byparams
logpdf(self, val, params)
ORlogpmf(self, val, params)
- depending on whether the primitive is continuous of discrete. This returns the probability density or the probability mass associated with the valueval
under the distribution parametrized byparams
ppf(self, prob, params)
- optional, needed for partitioned priors - the percent point function of the distribution (also known as the inverse cumulative distribution function).
Once a class with the above methods is defined, the steps are similar to those for adding a custom wrapper. Namely:
- Add an instance of the desired class to the
dists
array. - Add a (key:value) pair consisting of (distribution's name : distribution index in
dists
array) to theerps
dictionary - Optional: define a 3 line wrapper function through which the user can access the custom distribution class without having to go through
stocPrim
. In order for the user view of the distribution names to be consistent, it is a convention in StocPy to give a distribution's wrapper function the same name as the distribution's name defined inerps
. This may mean the internal distribution class has a different name, or different capitalization.
Two examples of custom distribution classes are given in stocPy.py
, namely Categorical
and CRP
.
The easiest way is to clone the repository with git clone [email protected]:RazvanRanca/StocPy.git
and install StocPy using python setup.py install
(which might require root access).
This repository is intended to hold a version of StocPy that is relatively easy to use. There is also a development repository which contains some work-in-progress features and a lot more experimental data regarding the performance of the inference methods on different models.
Please send any questions/suggestions/issues to Razvan Ranca - [email protected]