-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor pm.Simulator (2nd attempt) #4877
Conversation
9688ee7
to
1367252
Compare
Codecov Report
@@ Coverage Diff @@
## main #4877 +/- ##
==========================================
+ Coverage 73.16% 73.53% +0.37%
==========================================
Files 86 86
Lines 13838 13809 -29
==========================================
+ Hits 10125 10155 +30
+ Misses 3713 3654 -59
|
8d72685
to
293aac1
Compare
I decided to leave epsilon as an explicit argument to make it easier to manipulate (e.g., if we want our samplers to tune it), but can be a class attribute like I can see it as being confusing like this... |
Instead of subclassing and initializing the subclass, would something like: my_simulator = pm.SimulatorRV(
ndim_supp = 0
ndims_params = [0, 0, 0]
fn = my_simulator_fn
distance = "gaussian"
sum_stat = "sort"
) works? |
The thing needs to be a class, so |
I am assuming that as long as |
Here is a minimal gist that implements the current API and has some checks: https://gist.github.com/ricardoV94/b632085b20be716b87fd146609168090 And here is another gist with a previous iteration on these ideas (using pickle instead of the more flexible cloudpickle that we are now using in PyMC3): https://gist.github.com/ricardoV94/2bb59a2ac18a29f501f5511c9671ebbc |
293aac1
to
62bd386
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few nitpicks
pymc3/smc/sample_smc.py
Outdated
@@ -157,6 +147,29 @@ def sample_smc( | |||
%282007%29133:7%28816%29>`__ | |||
""" | |||
|
|||
if kernel is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be deprecated, we still want to have kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I know. We should just deprecate the "keywords" for the time being.
DeprecationWarning, | ||
stacklevel=2, | ||
) | ||
if save_sim_data is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still want this, as doing pm.sample_posterior_predictive
could be potentially too expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer have access to the simulated data in the logp
graph
return posterior, {modelcontext(model).observed_RVs[0].name: np.array(sim_data)} | ||
else: | ||
return posterior | ||
return idata if return_inferencedata else trace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when is the trace going to be deprecated? maybe we could only return inferencedata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only deprecate it when it is deprecated in pm.sample
def _sum_stat(cls, value): | ||
return cls.sum_stat(value) | ||
|
||
|
||
class Simulator(NoDistribution): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, if we are going to have a Simulator
and a SimulatorRV
, maybe the first one should be renamed to PseudoLikelihood
, SimulatedLikelihood
AbcLikelihood
or something similar. One counterargument to this proposal, is that if we are going to distinguish between simulator and pseudolikelihood, the distance and summary statistics should be part of the later not the former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was one of the ideas, the SimulatorRV
would simply be concerned with the random draws and the Pseudolikelihood
would take care of the logp
factor.
The downside is that we don't yet know how to create a "dynamic" logp
using the optional user defined sum_stat
and distance
functions. It means we might need to have users subclass not only the SimulatorRV
but also the Pseudolikelihood
if they want to use non-default functions. If someone figures out #4831, then we could simply copy their strategy.
Defining the functions in the SimulatorRV
was just an ugly hack to avoid forcing users to create two new classes...
warnings.warn( | ||
f"No value variable found for {rv_var}; " | ||
"the random variable will not be replaced." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What conclusion should a user make from from this warning?
Is it serious? If so we should raise. Otherwise maybe just _log.warn()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably raise
Could this be made to work with pickle if we defined |
572cab6
to
17d9f2e
Compare
The previous attempt at this in #4802 revealed big issues between pickling and dynamically created classes (which subsist even after #4858). The alternative presented in this PR is a bit more cumbersome from the user-side but at least avoids these issues.
Here is a minimal example demonstrating the new API:
If anyone has better suggestions about how to avoid pickling issues when creating dynamic classes please let me know! For example the classes used in these tests had to be defined outside of
TestSimulator.setup_class
.