
Use an incrementally updated RNG state in Model #4729

Merged: 4 commits merged into pymc-devs:v4 on Jun 2, 2021

Conversation

brandonwillard
Contributor

@brandonwillard commented May 28, 2021

This PR updates the role of Model.default_rng by adding a Model.next_rng that is incrementally updated when variables are registered.

Under this change, Model.default_rng still serves as a mutable starting point for RNG seeding, while also (softly) guaranteeing that newly added/registered random variables—that don't specify their own rng parameters—will be unique.

This PR also updates how initial values are specified, stored, and generated. Since computing initial values may require sampling, these updates were needed in order to prevent initial values from unnecessarily affecting the RNG states in confusing ways.

Closes #4728
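The registration flow described above can be caricatured in plain numpy. This is a hypothetical sketch, not the actual `Model` implementation: `TinyModel`, `register_rv`, and the counter-based seeding are illustrative stand-ins for the mechanism the PR describes.

```python
import numpy as np


class TinyModel:
    """Hypothetical plain-numpy sketch of an incrementally updated RNG
    state; this is NOT the actual PyMC3 Model code."""

    def __init__(self, seed=1234):
        self.default_rng = np.random.default_rng(seed)  # mutable seeding start
        self._seed = seed
        self._count = 0
        self.next_rng = np.random.default_rng((self._seed, self._count))

    def register_rv(self, draw_fn, rng=None):
        """Register a variable; hand it a unique rng unless one is supplied."""
        if rng is None:
            rng = self.next_rng
            self._count += 1
            # Incrementally update so the next registration sees a fresh state.
            self.next_rng = np.random.default_rng((self._seed, self._count))
        return draw_fn(rng)


m = TinyModel()
x = m.register_rv(lambda rng: rng.normal())
y = m.register_rv(lambda rng: rng.normal())
# x and y are drawn from distinct RNG states, so the two variables are
# (almost surely) distinct even though they use the same distribution.
```

Variables that pass their own `rng` bypass the update, mirroring the "don't specify their own rng parameters" carve-out above.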

@brandonwillard self-assigned this May 28, 2021
@brandonwillard linked an issue May 28, 2021 that may be closed by this pull request
@ricardoV94
Member

ricardoV94 commented May 28, 2021

I guess this also affects variables created via .dist()? We need to keep that in mind until we decide what to do with that API.

@ricardoV94
Member

Are successive rngs guaranteed to not repeat or just nearly so?

@brandonwillard
Contributor Author

I guess this also affects variables created via .dist()?

How so?

@ricardoV94
Member

NVM, I thought they were also given the model rng

@michaelosthege
Member

If we keep the uncoordinated merge of #4718, it forces me to rebase the #4696 branch, unless someone else steps in to help out.
Possibly more for emotional than technical reasons, I am not at all looking forward to doing this rebase.

Call me overreacting, but out of respect for a fellow core contributor please don't merge anything into v4 until we have discussed how to proceed as a team.

@brandonwillard
Contributor Author

Call me overreacting, but out of respect for a fellow core contributor please don't merge anything into v4 until we have discussed how to proceed as a team.

Out of respect for the core developers and anyone who might be interested in using v4, I would hope that the core developers would prioritize self-contained bug fixes like this and #4718 over large, non-critical features such as #4696.

Regardless, I won't merge this until others weigh in.

@twiecki
Member

twiecki commented May 28, 2021

I don't remember us ever locking branches (including master) for other PRs, so this would be something new. We can certainly discuss whether it's a good idea; I personally don't think it is, as it would slow us down tremendously if a single PR that we have difficulty merging could bring development to a complete halt.

We do have tools to resolve this, namely a rebase, and I don't see how it would be a particularly difficult one in this case. Plus, we will have to rebase #4696 anyway. I'm not at all saying that's on you, @michaelosthege; as you said before, the PR is done from your side, and that's totally fine.

@twiecki
Member

twiecki commented May 28, 2021

@brandonwillard So changing the rng is only to make RVs unique? I guess I'm a bit confused why they aren't unique to begin with and if we shouldn't attack the problem there?

@brandonwillard
Contributor Author

brandonwillard commented May 28, 2021

@brandonwillard So changing the rng is only to make RVs unique? I guess I'm a bit confused why they aren't unique to begin with and if we shouldn't attack the problem there?

No, not exclusively; changing the rng for each variable is also generally more consistent (e.g. it doesn't force one to use in-place Ops and it provides at least the same level of configurability/control).

Otherwise, the uniqueness in question is consistent and fundamental to general graph representations, and also how Aesara is designed: if the same Op is applied to the same inputs, the resulting nodes and their outputs are equivalent. This is the basis of "merge" optimizations, which prevent graphs from computing the same things more than once.

We could make RandomVariable nodes unique in other ways (e.g. by adding new inputs that could be used to uniquely identify them), but the points above would remain, and, because the rng solution is the design-intended approach—and addresses other issues as well—there's no need to invent anything new (that doesn't entail other improvements, of course).

Aside from all that, this lack of uniqueness can result in real bugs, because a compiled function that conflates equivalent RandomVariables—like the x and y in the example for #4728—will drop one of the arguments (definitely when on_unused_input="ignore" is set), so a user attempting to provide such a function with distinct values for each variable will find that the results are not correct.
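The merge behavior being argued here can be mimicked with a toy hash-consing cache. This is purely illustrative and is not how Aesara actually implements merging: the point is only that the same op applied to the same inputs yields one shared node, so two random variables fed the same rng input collapse into a single node, while distinct rng inputs keep them apart.

```python
class Node:
    """Toy stand-in for a graph node under a merge optimization: the same
    op applied to the same inputs is interned as one shared node."""

    _cache = {}

    def __new__(cls, op, *inputs):
        key = (op, inputs)
        if key not in cls._cache:
            cls._cache[key] = super().__new__(cls)
        return cls._cache[key]


# Same rng input: the two "random variables" merge into one node, which is
# exactly the conflation that drops one of x and y in the #4728 example.
x = Node("normal", "rng0", 0.0, 1.0)
y = Node("normal", "rng0", 0.0, 1.0)
assert x is y

# Distinct rng inputs keep the nodes distinct.
x2 = Node("normal", "rng1", 0.0, 1.0)
y2 = Node("normal", "rng2", 0.0, 1.0)
assert x2 is not y2
```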

ricardoV94 previously approved these changes May 29, 2021
@brandonwillard
Contributor Author

brandonwillard commented May 30, 2021

If we changed this so that it worked more like RandomStream (i.e. generate a new RandomState for each value of model.next_rng), we could avoid producing very interconnected graphs. This might be the better approach.

Update: it's definitely the better approach; I'll rebase this with the changes soon.
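The difference between the two approaches can be sketched with numpy generators (hypothetical stand-ins; Aesara's RandomState/RandomStream objects differ in detail): chaining derives each state from the previous one, so every variable's graph drags in its predecessors, whereas RandomStream-style generation seeds each state independently from a single source.

```python
import numpy as np

# Chained states: state k is derived from state k-1, so the graph of the
# k-th variable would be connected to all earlier ones.
chained = [np.random.default_rng(0)]
for _ in range(3):
    chained.append(np.random.default_rng(chained[-1].integers(2**63)))

# RandomStream-style: each state is spawned independently from one seed
# sequence, so the variables' graphs stay disconnected from each other.
seq = np.random.SeedSequence(0)
independent = [np.random.default_rng(child) for child in seq.spawn(4)]

draws = [rng.normal() for rng in independent]
assert len(set(draws)) == len(draws)  # distinct states, distinct draws
```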

@brandonwillard
Contributor Author

brandonwillard commented May 31, 2021

The changes I just pushed allow Model to generate new RNG states like RandomStream does, as mentioned previously.

The Model object will keep a list of the RNG states it has generated, in case we want to allow easier model re-seeding.

Also, the forced in-place RandomVariable Op creation has been removed and replaced with the intended optimization-based approach. In other words, when aesara.function is used with a mode that includes the RandomVariable in-place optimization step, it will convert each RandomVariable Op in the graph into its in-place equivalent, and allow evaluations of a compiled function to update its graph's RNG states (i.e. produce new samples each time).

The standard FAST_RUN mode includes this optimization, but, to be safe, I've created a pymc3.aesaraf.compile_rv_inplace function that will add this particular in-place optimization to an existing mode. PyMC will use this helper function wherever it needs to compile a graph, ensuring that the result produces distinct samples.
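Why the in-place step matters can be shown with a plain-numpy stand-in (hypothetical names; this is not the actual compile_rv_inplace code): a function that never writes the advanced RNG state back replays the same sample on every call, while one that updates the state in place produces a fresh draw each time.

```python
import numpy as np

# Shared "graph" RNG state, stored as a plain bit-generator state dict.
state = np.random.default_rng(123).bit_generator.state


def sample_no_inplace():
    """Reads the stored state but never writes it back: every call replays
    the same draw, like a graph compiled without the in-place rewrite."""
    rng = np.random.default_rng()
    rng.bit_generator.state = state
    return rng.normal()


def sample_inplace():
    """Writes the advanced state back ("in place"), so each evaluation of
    the compiled function yields a new sample."""
    global state
    rng = np.random.default_rng()
    rng.bit_generator.state = state
    draw = rng.normal()
    state = rng.bit_generator.state
    return draw


assert sample_no_inplace() == sample_no_inplace()  # stuck on one sample
assert sample_inplace() != sample_inplace()        # fresh samples each call
```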

@brandonwillard
Contributor Author

The current failure is pretty straightforward: Model.initial_point uses Variable.eval, which compiles a function in FAST_RUN mode and alters the RNG state in-place. Since Model.initial_point is implicitly called by pm.load_trace, the subsequent pm.sample_posterior_predictive call is starting at a different RNG state than the earlier one, and that causes the results to differ.

It's a pretty straightforward fix (i.e. don't use Variable.eval), but it's also a good time to clean up initial_point a little bit, so I'll do that as a means of fixing this.
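The failure mode can be illustrated with a numpy stand-in (none of this is the actual initial_point code): drawing the initial value from the shared stream shifts later samples, while drawing from a copy of the state leaves the stream where it was.

```python
import numpy as np

baseline = np.random.default_rng(0).normal(size=3)

# Naive initial point: sampling with the shared stream consumes a draw, so
# subsequent sampling starts from a shifted RNG state (the bug above).
rng = np.random.default_rng(0)
_initval = rng.normal()
assert not np.allclose(rng.normal(size=3), baseline)

# Side-effect-free initial point: evaluate on a copy of the state, so the
# shared stream is left untouched and later sampling is reproducible.
rng = np.random.default_rng(0)
scratch = np.random.default_rng()
scratch.bit_generator.state = rng.bit_generator.state
_initval = scratch.normal()
assert np.allclose(rng.normal(size=3), baseline)
```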

@brandonwillard force-pushed the use-distinct-rngs branch 2 times, most recently from 3fedaa2 to d279e89, on June 1, 2021 02:49
@brandonwillard force-pushed the use-distinct-rngs branch 4 times, most recently from 5a63903 to a48db14, on June 1, 2021 03:28
@brandonwillard marked this pull request as ready for review on June 1, 2021 03:32
@brandonwillard force-pushed the use-distinct-rngs branch 2 times, most recently from bb3c3c4 to 186b4f5, on June 1, 2021 04:42
Member

@ricardoV94 left a comment


Should we test that rvs can still be merged when they are in fact repeated in the graph expression?

(resolved review thread on pymc3/tests/test_distributions.py)
@brandonwillard
Contributor Author

Should we test that rvs can still be merged when they are in fact repeated in the graph expression?

No, that's very much Aesara's responsibility.

ricardoV94 previously approved these changes Jun 1, 2021
Member

@ricardoV94 left a comment


LGTM

@junpenglao
Member

testval -> initval is a breaking change that is not backward compatible; are we OK without a deprecation cycle here?

@michaelosthege
Member

michaelosthege commented Jun 1, 2021

testval -> initval is a breaking change that is not backward compatible; are we OK without a deprecation cycle here?

I agree that we should reconsider this breaking change. The value is still assigned to .tag.test_value, and "initval" may confuse users into thinking it has something to do with MCMC initialization.
If we decide to deprecate it, there should be a warning in the .dist API, similar to the warning added to the __new__ API.

Member

@michaelosthege left a comment


(see comment about breaking change above)

@brandonwillard
Contributor Author

testval -> initval is a breaking change that is not backward compatible, are we ok without a deprecation cycle here?

What's a breaking change? A deprecation warning was added, but testval is still usable for now.

Regardless, folks, don't forget that this is a new version, and that there's no practical benefit to preserving a misnomer like testval.

@twiecki
Member

twiecki commented Jun 1, 2021

I think the deprecation warning we have here is fine, but should definitely add a note in the release-notes on this change.

twiecki previously approved these changes Jun 1, 2021
@brandonwillard
Contributor Author

The value is still assigned to .tag.test_value, and "initval" may confuse users into thinking it has something to do with MCMC initialization.

The former confirms that this is not a breaking change, and the latter wouldn't be a confusion, because initval does have something to do with MCMC initialization, just like the old test values.

@brandonwillard merged commit ad9b919 into pymc-devs:v4 on Jun 2, 2021
@brandonwillard deleted the use-distinct-rngs branch June 2, 2021 15:35
Successfully merging this pull request may close these issues:

- Make sure Model RVs are distinct via their RNGs (#4728)