Make draw_values draw from the joint distribution #3214
Conversation
Upstream fetch
…d properly before starting the fix...
This is a pretty big change, but I think introducing …
I'm sorry about the test coverage and test failures. Due to my bad upstream fetch, the local tests that I ran were outdated. I'm debugging the PR now to try to get it up to date and passing.
Thanks @lucianopaz, this is quite an impressive PR. We should definitely fix this bug. Without having thought about this deeply enough, two questions come to mind in terms of possible simplifications: …
This is very interesting! I think Thomas' intuition is right - I am working on a similar PR with #3213, but will be refactoring that class to get functionality like you have. I will make more substantive comments when I can have a closer look at this.
…h. Two failures remain: `TestNutsCheckTrace.test_bad_init` raises a `RuntimeError` instead of a `ValueError`, but that seems to be the fault of `parallel_sampling.py` chaining the exception. Then `test_step_exact` for SMC gives different traces. Finally, sometimes, `test_glm_link_func2` gets into a deadlock; I'm not sure why. I'll check whether they pop up in TravisCI.
@twiecki, I'm not familiar with … About the ancestry information: as I had not fetched the right upstream, I had not seen what @ColCarroll had done with the ancestry in …
Agree on everything you said, @lucianopaz - my strategy seems more error-prone and pretty opaque, but your strategy means having 2 copies of the graph (one in theano, one in the DAG) which may not agree. Interested in other opinions on that.
While I agree from an API standpoint, the amount of code added here is non-trivial, which is my main concern.
…dded `conditional_on` attribute of every distribution. This function does a breadth-first search on the node's `logpt` or `transformed.logpt` graph, looking for named nodes that are different from the root node (or the node's transformed counterpart) and are also not a `TensorConstant` or `SharedVariable`. Each branch is searched until the first named node is found. This way, the parent conditionals of the searched `root` node, which are only one step away from it in the Bayesian network, are returned.

However, this ran into a problem with `Mixture` classes. These add to the `logpt` graph another `logpt` graph from the `comp_dists`. This leads to the problem that the inner `logpt`'s first-level conditionals are also seen as if they were first-level conditionals of the `root`. Furthermore, many copies of nodes made by the added `logpt` ended up being inserted into the computed `conditional_on`. This led to a very strange error, in which loops appeared in the DAG and the depths started to be wrong; in particular, there were no depth-0 nodes. My view is that the explicit `conditional_on` attribute prevents problems like this one from happening, so I left it as is, to discuss.

Other changes in this commit: `test_exact_step` for the SMC uses `draw_values` on a hierarchy, and given that `draw_values`'s behavior changed in hierarchy situations, the exact trace values had to be adjusted as well. Finally, `test_bad_init` was changed to run on one core, so that the parallel exception chaining does not change the exception type.
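For reference, a minimal sketch of the breadth-first search described above, under the stated assumptions (the helper name is illustrative, not pymc3 API):

```python
from collections import deque

import theano


def first_level_conditionals(root):
    """Walk root's graph breadth-first; stop each branch at the first
    named node that is neither the root nor a constant/shared variable.
    Those nodes are taken as the root's first-level conditionals."""
    conditionals = set()
    queue = deque(root.owner.inputs if root.owner else [])
    while queue:
        node = queue.popleft()
        if isinstance(node, (theano.tensor.TensorConstant,
                             theano.tensor.sharedvar.SharedVariable)):
            continue  # constants and shared variables are not parents
        if getattr(node, 'name', None) is not None and node is not root:
            conditionals.add(node)  # stop this branch at a named node
        elif node.owner is not None:
            queue.extend(node.owner.inputs)  # keep descending the graph
    return conditionals
```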
…n distribution.py.
@twiecki, @ColCarroll, I implemented a function … It worked well on all distributions except on the …
@lucianopaz Thanks for giving that a try. I would imagine that there is likely to be some way to get it to work; @ColCarroll mentioned he might have some ideas. My bigger concern is with the DAG class, however. Did you take a look at the way the networkx PR handles this? I would imagine they have all the required functionality in there. Perhaps we could even get @ColCarroll's work to create the networkx graph and then just use that for the graph traversal. Just to make sure my higher-level point is understood: while I think this is a bug, it doesn't seem like a major one. To add a large amount of code that we will have to maintain for solving it (and nothing else) is very costly, so this should be evaluated thoroughly. Finally, it seems like your editor adds automatic line breaks to existing code, which we don't require.
I focused on solving the bugs and still have not taken a look at the networkx implementation. I'll go into that now.
I took another go at removing the … I also reduced the custom code in …
…ss of networkx.DiGraph
I've been thinking about how to completely remove the …
I think this would work with the class we already have in …
… is now stored in a networkx.DiGraph. networkx is then used to compute subgraphs and to perform a topological sort on the graph during `draw_values` execution.
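A toy sketch of what this storage scheme enables (the variable names mirror the minimal example further down in this thread; this is not the PR's actual code):

```python
import networkx as nx

# Dependencies go into a DiGraph: an edge a -> b means b is conditional
# on a. networkx then provides the subgraph and ordering operations.
G = nx.DiGraph()
G.add_edge('mu0', 'mu1')        # mu1 is conditional on mu0
G.add_edge('sigma0', 'sigma1')  # sigma1 is conditional on sigma0
G.add_edge('mu1', 'y')
G.add_edge('sigma1', 'y')

# To draw y jointly, take the subgraph of y and its ancestors and draw
# in topological order, so parents are always drawn before children.
needed = nx.ancestors(G, 'y') | {'y'}
order = list(nx.topological_sort(G.subgraph(needed)))
print(order)  # one valid order: ['mu0', 'mu1', 'sigma0', 'sigma1', 'y']
```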
@twiecki, @ColCarroll, I finished the implementation that completely eliminates the … However, I think that there are 5 things that we would need to add to the …
If you consider that we should discard the …
pymc3/util.py (Outdated)

```diff
@@ -60,14 +67,16 @@ def is_transformed_name(name):
     Returns
     -------
     bool
-        Boolean, whether the string could have been produced by `get_transormed_name`
+        Boolean, whether the string could have been produced by …
```
These shouldn't show up here.
```diff
@@ -1518,3 +1493,324 @@ def all_continuous(vars):
         return False
     else:
         return True
+
+
+class ConstantNodeException(Exception):
```
Maybe move these to a separate file, `graph.py`?
`not_shared_or_constant_variable` needs `model.FreeRV`, `model.MultiObservedRV` and `model.TransformedRV`. If we move it away, we'll get into a circular import problem. That's why I left it in `model.py`.
I like this implementation much better. If we require graph traversal logic, which seems very clear given the purpose of this package, networkx is a perfectly reasonable dependency. Moreover, it's well maintained and packaged, so I don't see many problems with that. I don't know enough about … Finally, the line breaks add a lot of diff noise and are unnecessary under our current style rules; can you please revert those?
@twiecki, sorry I was unable to work on this last week. I'll try to get an implementation of @aseyboldt's suggestion to work. The main difficulty will be that we cannot require the model context to be known: as I pointed out in an earlier reply, almost all tests in the test suite call …
@lucianopaz I think in that case we should rather change the tests to have a model in the context. Almost everything in pymc3 is centered around having access to the model. Not having thought about this part of the code base a lot before, I was surprised to learn that it wasn't required here, so for consistency it makes sense to require it here as well.

My main concern here is code complexity, though. While a DAG sounds nice to have, if we don't need to have it, we shouldn't add it. Moreover, the actual building of the DAG is a fairly simple part of this PR; most complexity comes from traversing it in the right way. If that could be saved by offloading it to theano, that would be amazing. So my suggestion is, if you want to, try @aseyboldt's approach without thinking about the tests.

Finally, sorry if I'm coming across as very critical and difficult here. We greatly appreciate your work, but I think it is important to get this part right, and it seems like there were some design problems from the start that we're bumping up against here; for our future sanity, it'd be best to try and fix those.
…, even in the presence of plates of the same shape.
@twiecki, I understand. I'll start with @aseyboldt's approach now. I wanted to first push a small commit to leave a fix I had worked on before last week, to get …
@twiecki, @aseyboldt, I've started to work on storing the order in which each variable was defined while constructing the model. I ran into a problem early on and wanted to ask for your opinion. My idea was to store a list called …
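Judging from the `draw_values` excerpt later in this thread, the list ended up being named `var_construction_order`; a minimal sketch of the bookkeeping idea (illustrative, not the PR's actual code):

```python
# Illustrative sketch only: how a model could record absolute creation
# order across variable kinds. The per-kind lists (free_RVs,
# deterministics, ...) lose the global order between their members.
class Model:
    def __init__(self):
        self.free_RVs = []
        self.deterministics = []
        self.var_construction_order = []

    def add_random_variable(self, var):
        self.free_RVs.append(var)
        self.var_construction_order.append(var)  # keep global order

    def add_deterministic(self, var):
        self.deterministics.append(var)
        self.var_construction_order.append(var)  # keep global order
```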
@lucianopaz …
To clarify, my naive idea of what is required at minimum (pseudo-code):

```python
curr_point = model.test_point
for var in model.vars:
    sample_func = create_sample_func_with_givens(var, givens=curr_point)
    curr_point[var] = sample_func()
```

Not sure how much I'm missing there, but theano should handle all the rest correctly.
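For readers unfamiliar with the mechanism: `givens` lets theano substitute a value for an intermediate node at compile time, so downstream variables can be evaluated without computing the substituted node. A toy illustration, independent of pymc3:

```python
import theano
import theano.tensor as tt

x = tt.scalar('x')
y = 2 * x
z = y + 1

# Substituting a constant for y removes x from the compiled graph
# entirely, so z can be evaluated with no inputs at all.
f = theano.function([], z, givens=[(y, tt.constant(5.0))])
print(f())  # -> 6.0
```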
I understand. My concern is that the deterministics, for example the …
Ran into another hurdle. What I'm attempting to do inside … is (pseudo-code):

```python
# pseudo-code for the intended queue-based drawing order
queue = list(params that are numbers, numpy arrays, TensorConstants or SharedVariables)
if len(params) > len(queue):
    queue.extend(model.vars + remaining params)
for var in queue:
    value = _draw_value(var, ...)
    # update point and givens with value
    output[var] = value
```

The issue with this is that I'm running into a recursion limit, because sometimes I'm trying to … Furthermore, I'm having trouble thinking of how to make … Do you have any thoughts on these problems?
They are in …
@lucianopaz Does that recursion have any purpose in this new scenario? Maybe we can just drop it.
@junpenglao, now I realize that my post was not clear. When the RV is added to the model, it's appended to one or more of the lists `free_rvs`, `observed_rvs`, `deterministics`, `potentials`, etc. The absolute order between members of `free_rvs` and `deterministics` is lost.
I understand that the recursion comes from how the distribution's … If the distribution's parameters are themselves RVs that come from another distribution, the …
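Schematically, the call chain being described is the standard pattern in pymc3 distributions; a stripped-down sketch (simplified from the real `Normal.random`):

```python
import scipy.stats as stats

from pymc3.distributions import Continuous
from pymc3.distributions.distribution import draw_values, generate_samples


class SimpleNormal(Continuous):
    """Stripped-down Normal, keeping only the part relevant here:
    random() resolves its parameters through draw_values, and
    draw_values may in turn call random() on the parameters' own
    distributions -- hence the mutual recursion discussed above."""

    def __init__(self, mu=0., sd=1., *args, **kwargs):
        super(SimpleNormal, self).__init__(*args, **kwargs)
        self.mu = mu
        self.sd = sd

    def random(self, point=None, size=None):
        mu, sd = draw_values([self.mu, self.sd], point=point)
        return generate_samples(stats.norm.rvs, loc=mu, scale=sd,
                                dist_shape=self.shape, size=size)
```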
While trying to fix the tests to make them all call `draw_values` from inside a model's context, I came across a condition that made me wonder whether we can actually force a model on top of `draw_values`. One can create a distribution's instance outside of a model's context by doing something like `d = Distribution.dist()`. If we now force `draw_values` to be called from inside a model's context, then `d.random()` will not work outside a context anymore. I'm confused about this, because it looks like we're partially breaking `Distribution.dist` by forcing a model on top of it. Can we really force `draw_values` to always be inside a model's context?
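For concreteness, the pattern in question works today without any model on the context stack:

```python
import pymc3 as pm

# A standalone distribution, created and sampled with no model context
d = pm.Normal.dist(mu=0., sd=1.)
print(d.random(size=5))  # five draws, no pm.Model() anywhere in sight
```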
@lucianopaz Should …
I can't look into this more closely right now; my computer broke down and it will take till next week before I can get it fixed.

It seems to me that much of this trouble is because we don't distinguish properly between sampling from a distribution and sampling from a model. The former *should* happen in the random method, and should be well defined without a model; it only makes sense if all parameters of a distribution are given explicitly. The latter only makes sense if we know what model we are talking about. `draw_values` tries to do both at the same time.

Maybe we can work around that mess without too much breakage if we split `draw_values` into two functions. One we still call `draw_values`, so that we don't break distribution implementations (even though that name won't really match what it is doing then?); this one should use theano to compute the parameters but should *not* call other sampling functions, and if the parameters cannot be computed deterministically from the input it should just throw an error. A second function will walk through the variables in creation order and call the appropriate random functions (checking, if necessary, whether they are transformed vars or not).

I think you are right that we need to keep an additional list in the model.
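A hedged sketch of the proposed split (both function names and the `var_construction_order` list are illustrative, not an existing pymc3 API):

```python
import theano
import theano.tensor as tt


def compute_from_givens(params, givens):
    """Deterministic half: evaluate params with theano from explicitly
    given values only. theano raises MissingInputError at compile time
    if something would have to be sampled instead."""
    replacements = [(var, tt.as_tensor_variable(val))
                    for var, val in givens.items()]
    fn = theano.function([], params, givens=replacements,
                         on_unused_input='ignore')
    return fn()


def sample_in_creation_order(model, size=None):
    """Sampling half: walk model variables in creation order and call
    each random method, feeding forward the values drawn so far."""
    point = {}
    for var in model.var_construction_order:  # proposed bookkeeping list
        point[var.name] = var.distribution.random(point=point, size=size)
    return point
```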
@twiecki, I agree that it shouldn't call … @aseyboldt, I think it will be very hard to be backwards compatible while splitting … Again, I'm inclined to explicitly exploit the DAG, as it currently stands in this PR. The current working idea is that when we enter …
@lucianopaz I think it's worth it to still find out why the recursion occurs; it might uncover a bug. If we get to the intended behavior and there will never be recursion, the method using givens should work fine, no?
I've narrowed the origin of the recursion down to the following minimal example:

```python
import pymc3 as pm
from pymc3.distributions.distribution import draw_values

with pm.Model():
    mu0 = pm.Normal('mu0', mu=0., tau=1e-3)
    sigma0 = pm.Gamma('sigma0', alpha=1., beta=1., transform=None)
    mu1 = pm.Normal('mu1', mu=mu0, tau=1e-3)
    sigma1 = pm.Gamma('sigma1', mu=sigma0, sd=1., transform=None)
    y = pm.Normal('y', mu=mu1, sd=sigma1)
    # Call 0
    mu1_draw = draw_values([mu1])
    # Call 1
    sigma1_draw = draw_values([sigma1])
    # Call 2
    y_draw = draw_values([y], point={'sigma1': 1})
```

The `draw_values` implementation I'm testing is:

```python
import numbers

import numpy as np
import theano

from pymc3.model import modelcontext
# _draw_value is the existing helper in pymc3/distributions/distribution.py


def draw_values(params, point=None, size=None, model=None):
    if point is None:
        point = {}
    model = modelcontext(model)
    queue = []
    last = []
    drawn_value_map = {}
    counter = 0
    for i, p in enumerate(params):
        p_name = getattr(p, 'name', None)
        # First put the things we can get a value from directly without
        # evaluating a theano function or calling a random method
        if isinstance(p, (numbers.Number,
                          np.ndarray,
                          theano.tensor.TensorConstant,
                          theano.tensor.sharedvar.SharedVariable)):
            drawn_value_map[i] = counter
            counter += 1
            queue.append(p)
        elif p_name is not None and p_name in point:
            drawn_value_map[i] = counter
            counter += 1
            queue.append(p)
        else:
            last.append((i, p))
    # If params contained model rvs, these should go according to their
    # creation order. If params contained other things, these go at the end.
    if last:
        # var_construction_order is a list with all model rvs,
        # deterministics, etc., placed in the order in which they were
        # added to the model. We only add the variables that are not
        # already in the queue.
        queue.extend([v for v in model.var_construction_order
                      if v not in queue])
        counter = len(queue)
        for i, p in last:
            try:
                # If the param is already in the queue, get its index
                ind = queue.index(p)
            except ValueError:
                # If the param is not in the queue, it is added at the end
                ind = counter
                counter += 1
                queue.append(p)
            drawn_value_map[i] = ind
    # Init drawn values and the updatable point and givens
    drawn = {}
    givens = []
    nodes_missing_inputs = {}
    for ind, param in enumerate(queue):
        try:
            value = _draw_value(param, point=point, givens=givens, size=size)
        except theano.gof.fg.MissingInputError as e:
            # This deals with eventual auto transformed rvs that miss
            # their input value
            nodes_missing_inputs[ind] = e
            continue
        drawn[ind] = value
        givens.append((param, value))
        param_name = getattr(param, 'name', None)
        if param_name is not None:
            point[param_name] = value
    output = []
    # Get the output in the correct order
    for ind, param in enumerate(params):
        qind = drawn_value_map[ind]  # index of param in the queue
        value = drawn.get(qind, None)
        if value is None:
            if qind in nodes_missing_inputs:
                raise nodes_missing_inputs[qind]
            raise RuntimeError('Failed to draw the value from parameter '
                               '{}.'.format(param))
        output.append(value)
    return output
```

Basically, …
The example's call 0 works, but calls 1 and 2 enter the infinite recursion loop and die. The reason call 1 dies is that …

This makes … The reason call 2 dies is that, for some reason, … These problems point to the fact that the …
@lucianopaz what's your email?
@twiecki, my work email is [email protected]. You can send me a message there and then, if you prefer, I can send you my personal gmail account. |
…n PR pymc-devs#3214. It uses a context manager inside `draw_values` that makes all the values drawn from `TensorVariable`s or `MultiObservedRV`s available to nested calls of the original call to `draw_values`. It is partly inspired by how Edward2 approaches the problem of forward sampling: Ed2 tensors fix a `_values` attribute after they first call `sample`, and then only return that. They can do it because of their functional scheme, where the entire graph is recreated each time the generative function is called. Our object-oriented paradigm cannot set a fixed `_values`; it has to know it is in the context of a single `draw_values` call. That is why I opted for context managers to store the drawn values.
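A rough sketch of the context-manager idea described in this commit message (simplified; the real implementation is in #3273):

```python
from contextlib import contextmanager

# Every value drawn during one top-level draw_values call is cached,
# and nested calls reuse the cache instead of re-drawing their parents
# independently (which is what made draws inconsistent).
_draw_context_stack = []


@contextmanager
def draw_context():
    # reuse the outermost cache if we are already inside a draw
    cache = _draw_context_stack[-1] if _draw_context_stack else {}
    _draw_context_stack.append(cache)
    try:
        yield cache
    finally:
        _draw_context_stack.pop()


def draw_value_cached(param, draw_fn):
    # draw_fn is whatever actually samples param (e.g. its random method)
    with draw_context() as cache:
        if param not in cache:
            cache[param] = draw_fn(param)
        return cache[param]
```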
Close in favor of #3273
* Fix for #3225. Made the Triangular `c` attribute be handled consistently with scipy.stats. Added a test and updated the example code.
* Fix for #3210, which uses a completely different approach than PR #3214. It uses a context manager inside `draw_values` that makes all the values drawn from `TensorVariable`s or `MultiObservedRV`s available to nested calls of the original call to `draw_values`. It is partly inspired by how Edward2 approaches the problem of forward sampling: Ed2 tensors fix a `_values` attribute after they first call `sample`, and then only return that. They can do it because of their functional scheme, where the entire graph is recreated each time the generative function is called. Our object-oriented paradigm cannot set a fixed `_values`; it has to know it is in the context of a single `draw_values` call. That is why I opted for context managers to store the drawn values.
* Removed a leftover print statement.
* Added release notes, and added the draw-values context managers to the mixture and multivariate distributions that make many calls to `draw_values` or to other distributions' random methods within their own `random`.
This is a fix I worked on for #3210. The idea is what I discussed in that issue's thread.
Sorry that the commit history is a mess. I had mistakenly thought I had fetched the remote upstream, and did my changes on a really old commit, and had a bit of trouble with the merge process.
With the proposed mechanics for building the variable dependency DAG, `draw_values` is now able to draw from the joint probability distribution. With the issue's example code, the expected joint distribution is:

(plot omitted)

and now the result from `draw_values` is:

(plot omitted)