Refactor pm.Simulator (2nd attempt) #4877

ricardoV94 · 2021-07-23T13:19:05Z

The previous attempt at this in #4802 revealed serious problems with pickling dynamically created classes (which persist even after #4858). The alternative presented in this PR is a bit more cumbersome on the user side, but it at least avoids these issues.

Here is a minimal example demonstrating the new API:

import numpy as np
import pymc3 as pm

data = np.random.normal(0, 1, size=10)

def my_simulator_fn(rng, loc, scale, size):
    return rng.normal(loc, scale, size=size)

class MySimulatorRV(pm.SimulatorRV):
    ndim_supp = 0
    ndims_params = [0, 0, 0]
    fn = my_simulator_fn
    distance = "gaussian"
    sum_stat = "sort"

my_simulator = MySimulatorRV()

with pm.Model() as m:
    simulator = pm.Simulator("sim", my_simulator, 0, 1, epsilon=1.0, observed=data)

If anyone has better suggestions for avoiding pickling issues when creating dynamic classes, please let me know! For example, the classes used in these tests had to be defined outside of TestSimulator.setup_class.

codecov · 2021-07-23T13:36:53Z

Codecov Report

Merging #4877 (572cab6) into main (819f045) will increase coverage by 0.37%.
The diff coverage is 96.38%.

❗ Current head 572cab6 differs from pull request most recent head 17d9f2e. Consider uploading reports for the commit 17d9f2e to get more accurate results

@@            Coverage Diff             @@
##             main    #4877      +/-   ##
==========================================
+ Coverage   73.16%   73.53%   +0.37%     
==========================================
  Files          86       86              
  Lines       13838    13809      -29     
==========================================
+ Hits        10125    10155      +30     
+ Misses       3713     3654      -59

Impacted Files                        Coverage Δ
pymc3/distributions/simulator.py      88.60% <95.62%> (+62.74%) ⬆️
pymc3/aesaraf.py                      91.34% <100.00%> (+0.07%) ⬆️
pymc3/distributions/__init__.py       100.00% <100.00%> (ø)
pymc3/distributions/distribution.py   84.12% <100.00%> (+3.69%) ⬆️
pymc3/smc/sample_smc.py               96.87% <100.00%> (+4.33%) ⬆️
pymc3/smc/smc.py                      99.31% <100.00%> (+26.71%) ⬆️
pymc3/distributions/multivariate.py   63.84% <0.00%> (-7.62%) ⬇️
pymc3/tests/conftest.py               88.23% <0.00%> (-2.25%) ⬇️
pymc3/step_methods/hmc/base_hmc.py    90.24% <0.00%> (-0.82%) ⬇️

ricardoV94 · 2021-07-24T20:13:35Z

I decided to leave epsilon as an explicit argument to make it easier to manipulate (e.g., if we want our samplers to tune it), but it could be a class attribute like distance and sum_stat if we don't think that will ever be used.

I can see it being confusing like this...

junpenglao · 2021-07-25T09:37:40Z

Instead of subclassing and initializing the subclass, would something like:

my_simulator = pm.SimulatorRV(
    ndim_supp=0,
    ndims_params=[0, 0, 0],
    fn=my_simulator_fn,
    distance="gaussian",
    sum_stat="sort",
)

work?

michaelosthege · 2021-07-25T10:42:30Z

> Instead of subclassing and initializing the subclass, would something like:
>
> my_simulator = pm.SimulatorRV(
>     ndim_supp=0,
>     ndims_params=[0, 0, 0],
>     fn=my_simulator_fn,
>     distance="gaussian",
>     sum_stat="sort",
> )
>
> work?

The thing needs to be a class, so pm.make_simulator() would have to create a class inside and return that. But these kinds of class definitions often cause problems with pickling.
But I agree that the API looks much nicer. @ricardoV94 what do you think?
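
The pickling concern can be reproduced without PyMC3. The sketch below uses only the standard library `pickle` module; `make_simulator`, `DynamicSimulator`, `ModuleLevelSimulator`, and `can_pickle` are hypothetical names for illustration and are not part of the PR. It assumes the factory creates its class inside the function body, as described above.

```python
import pickle


def my_simulator_fn(loc, scale):
    # Stand-in for a user simulator function (no randomness needed here)
    return loc + scale


class ModuleLevelSimulator:
    # Defined at module level, so pickle can find the class by name
    fn = staticmethod(my_simulator_fn)


def make_simulator(fn):
    # Hypothetical factory: the class is created inside the function,
    # so its qualified name contains "<locals>" and cannot be imported
    class DynamicSimulator:
        pass

    DynamicSimulator.fn = staticmethod(fn)
    return DynamicSimulator


def can_pickle(obj):
    # Report whether the standard pickle module can serialize obj
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


static_ok = can_pickle(ModuleLevelSimulator())
dynamic_ok = can_pickle(make_simulator(my_simulator_fn)())
print(static_ok, dynamic_ok)
```

When run as a script, the instance of the module-level class pickles, while the instance of the factory-made class does not, because pickle stores classes by reference to an importable name.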

junpenglao · 2021-07-26T07:38:59Z

> The thing needs to be a class, so pm.make_simulator() would have to create a class inside and return that. But these kinds of class definitions often cause problems with pickling.

I am assuming that as long as pm.make_simulator() is called OUTSIDE of the with context we should be able to pickle it - @ricardoV94 do we have some minimal reproducible example that is easy to play with?

ricardoV94 · 2021-07-26T12:54:02Z

> > The thing needs to be a class, so pm.make_simulator() would have to create a class inside and return that. But these kinds of class definitions often cause problems with pickling.
>
> I am assuming that as long as pm.make_simulator() is called OUTSIDE of the with context we should be able to pickle it - @ricardoV94 do we have some minimal reproducible example that is easy to play with?

Here is a minimal gist that implements the current API and has some checks: https://gist.github.com/ricardoV94/b632085b20be716b87fd146609168090

And here is another gist with a previous iteration on these ideas (using pickle instead of the more flexible cloudpickle that we are now using in PyMC3): https://gist.github.com/ricardoV94/2bb59a2ac18a29f501f5511c9671ebbc

aloctavodia

a few nitpicks

aloctavodia · 2021-07-28T06:09:57Z

pymc3/smc/sample_smc.py

@@ -157,6 +147,29 @@ def sample_smc(
        %282007%29133:7%28816%29>`__
    """

+    if kernel is not None:


This should not be deprecated, we still want to have kernels.

ricardoV94 · 2021-07-28

Yeah, I know. We should just deprecate the "keywords" for the time being.
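
The pattern visible in the hunk above (warn with DeprecationWarning and stacklevel=2 when a legacy argument is passed) can be sketched in isolation. The function `run_sampler` and its `legacy_option` argument are hypothetical stand-ins, not the actual sample_smc signature.

```python
import warnings


def run_sampler(draws=1000, legacy_option=None):
    """Hypothetical sampler entry point with one deprecated keyword."""
    if legacy_option is not None:
        warnings.warn(
            "legacy_option is deprecated and will be ignored.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return {"draws": draws}


# Capture warnings so the deprecation can be inspected programmatically
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = run_sampler(draws=10, legacy_option="ABC")

print(result, [str(w.message) for w in caught])
```

Callers who still pass the old argument keep working and get a warning that points at their own call site, which is the purpose of stacklevel=2.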

aloctavodia · 2021-07-28T06:10:53Z

pymc3/smc/sample_smc.py

+            DeprecationWarning,
+            stacklevel=2,
+        )
+    if save_sim_data is not None:


We still want this, as doing pm.sample_posterior_predictive could be potentially too expensive.

ricardoV94 · 2021-07-28

We no longer have access to the simulated data in the logp graph.

aloctavodia · 2021-07-28T06:12:42Z

pymc3/smc/sample_smc.py

-        return posterior, {modelcontext(model).observed_RVs[0].name: np.array(sim_data)}
-    else:
-        return posterior
+    return idata if return_inferencedata else trace


When is the trace going to be deprecated? Maybe we could return only InferenceData.

ricardoV94 · 2021-07-28

I would only deprecate it when it is deprecated in pm.sample.

aloctavodia · 2021-07-28T06:27:02Z

pymc3/distributions/simulator.py

+    def _sum_stat(cls, value):
+        return cls.sum_stat(value)
+
+
 class Simulator(NoDistribution):


Nitpick: if we are going to have a Simulator and a SimulatorRV, maybe the first one should be renamed to PseudoLikelihood, SimulatedLikelihood, AbcLikelihood, or something similar. One counterargument to this proposal is that if we are going to distinguish between simulator and pseudolikelihood, the distance and summary statistics should be part of the latter, not the former.

ricardoV94 · 2021-07-28

Yeah, that was one of the ideas: the SimulatorRV would simply be concerned with the random draws, and the Pseudolikelihood would take care of the logp factor.

The downside is that we don't yet know how to create a "dynamic" logp using the optional user-defined sum_stat and distance functions. It means we might need to have users subclass not only the SimulatorRV but also the Pseudolikelihood if they want to use non-default functions. If someone figures out #4831, then we could simply copy their strategy.

Defining the functions in the SimulatorRV was just an ugly hack to avoid forcing users to create two new classes...
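
To make the split concrete, here is a plain-Python sketch of what the pseudolikelihood side computes. It assumes that the "gaussian" distance is the kernel -0.5 * ((obs - sim) / epsilon) ** 2 and that the "sort" summary statistic simply sorts the data; `gaussian_distance` and `pseudo_logp` are hypothetical names, and this is not the Aesara graph the PR builds.

```python
def gaussian_distance(epsilon, obs_stat, sim_stat):
    # Elementwise Gaussian kernel between observed and simulated statistics
    return [-0.5 * ((o - s) / epsilon) ** 2 for o, s in zip(obs_stat, sim_stat)]


def pseudo_logp(observed, simulated, epsilon,
                sum_stat=sorted, distance=gaussian_distance):
    # Summarize both data sets, then sum the distance terms into one scalar
    return sum(distance(epsilon, sum_stat(observed), sum_stat(simulated)))


observed = [0.3, -1.2, 0.8]
simulated = [0.9, 0.1, -1.0]
print(pseudo_logp(observed, simulated, epsilon=1.0))
```

Because sum_stat and distance are ordinary arguments here, swapping them is trivial; the difficulty discussed above is doing the same inside a logp that must be part of the model graph.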

michaelosthege · 2021-07-28T11:03:26Z

pymc3/aesaraf.py

+                warnings.warn(
+                    f"No value variable found for {rv_var}; "
+                    "the random variable will not be replaced."
+                )


What conclusion should a user draw from this warning?
Is it serious? If so, we should raise. Otherwise, maybe just _log.warning()?

ricardoV94 · 2021-07-28

We should probably raise.

lucianopaz · 2021-07-28T11:54:59Z

> > Instead of subclassing and initializing the subclass, would something like:
> >
> > my_simulator = pm.SimulatorRV(
> >     ndim_supp=0,
> >     ndims_params=[0, 0, 0],
> >     fn=my_simulator_fn,
> >     distance="gaussian",
> >     sum_stat="sort",
> > )
> >
> > work?
>
> The thing needs to be a class, so pm.make_simulator() would have to create a class inside and return that. But these kinds of class definitions often cause problems with pickling.
> But I agree that the API looks much nicer. @ricardoV94 what do you think?

Could this be made to work with pickle if we defined SimulatorRV as a metaclass or provided a __reduce__ method for it? Something like what's done here?
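
A minimal sketch of the __reduce__ idea, using only the standard library. The factory `make_simulator` and the class `DynamicSimulator` are hypothetical and do not reflect the PR's implementation. The sketch assumes the factory itself lives at module level and that the user function can be pickled by reference.

```python
import pickle


def my_simulator_fn(loc, scale):
    # Stand-in for a user simulator function defined at module level
    return loc + scale


def make_simulator(fn, distance="gaussian", sum_stat="sort"):
    """Hypothetical factory: builds a fresh class and returns an instance."""

    class DynamicSimulator:
        def __reduce__(self):
            # Ask pickle to rebuild the object by calling the module-level
            # factory again, instead of looking up the local class by name
            return (make_simulator, (fn, distance, sum_stat))

    DynamicSimulator.fn = staticmethod(fn)
    DynamicSimulator.distance = distance
    DynamicSimulator.sum_stat = sum_stat
    return DynamicSimulator()


sim = make_simulator(my_simulator_fn, distance="gaussian", sum_stat="sort")
restored = pickle.loads(pickle.dumps(sim))
print(restored.distance, restored.sum_stat, restored.fn(0, 1))
```

Two caveats apply to this sketch: the restored object is an instance of a newly created class rather than of the original one, and a simulator function that is itself a lambda or a local function would still fail to pickle with the standard module.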

ricardoV94 requested review from aloctavodia, michaelosthege and junpenglao July 23, 2021 13:19

ricardoV94 force-pushed the restore_abc_alt branch from 9688ee7 to 1367252 Compare July 23, 2021 13:20

ricardoV94 force-pushed the restore_abc_alt branch 2 times, most recently from 8d72685 to 293aac1 Compare July 23, 2021 14:05

michaelosthege reviewed Jul 24, 2021

ricardoV94 force-pushed the restore_abc_alt branch from 293aac1 to 62bd386 Compare July 26, 2021 14:00

ricardoV94 added the "help wanted" and "SMC" (Sequential Monte Carlo) labels Jul 26, 2021

aloctavodia reviewed Jul 28, 2021

michaelosthege reviewed Jul 28, 2021

ricardoV94 added 2 commits August 3, 2021 12:13

Refactor pm.Simulator and introduce pm.SimulatorRV

6482cef

Deprecate ABC specific code in SMC

17d9f2e

ricardoV94 force-pushed the restore_abc_alt branch from 572cab6 to 17d9f2e Compare August 3, 2021 10:20

ricardoV94 closed this Aug 4, 2021

ricardoV94 mentioned this pull request Aug 4, 2021

Refactor pm.Simulator #4903

Merged

ricardoV94 deleted the restore_abc_alt branch January 31, 2022 09:20
