
Add control for random state #62

Open
jpn-- opened this issue May 7, 2019 · 9 comments


@jpn--
Contributor

jpn-- commented May 7, 2019

Are there any plans to include "random state" interface for any of the sampling functions (i.e. to guarantee stable output for testing)? Or: was this considered and explicitly not done for some reason?

If this isn't contra-indicated, would a pull request that implements this only partially be of interest? There are a lot of places where random state might be useful, and I don't have time to search through the whole codebase and find/implement them all, but I'd be willing to do so for a handful of "low hanging fruit" places that would be useful for the project I am working on. (I am going to do this anyhow; I just want to know if I should isolate this as an independent PR.)

@quaquel
Owner

quaquel commented May 7, 2019

Can't this be achieved by simply setting the numpy random seed globally?

I agree it would be a useful feature.

@quaquel
Owner

quaquel commented May 7, 2019

So, I quickly checked the code. Sampling relies entirely on scipy.stats or numpy.random. scipy.stats uses the global numpy random state, so setting the global numpy random state should be sufficient for reproducibility. Will try to test this Thursday.

Or do you want to control the RandomState of individual uncertain variables?
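A minimal check of the global-seed approach, assuming a recent NumPy and SciPy: when no `random_state` is given, `scipy.stats` draws from NumPy's global (legacy) random state, so re-seeding it with `numpy.random.seed` reproduces the same samples. The seed, the distribution, and the `draw_samples` helper are arbitrary illustrations.

```python
import numpy as np
import scipy.stats


def draw_samples(seed, n=5):
    """Seed NumPy's global RNG, then draw from scipy.stats and numpy.random."""
    np.random.seed(seed)
    # No random_state argument: scipy.stats falls back to the global NumPy state
    from_scipy = scipy.stats.norm(loc=0, scale=1).rvs(size=n)
    from_numpy = np.random.uniform(size=n)
    return from_scipy, from_numpy


first = draw_samples(42)
second = draw_samples(42)
# Same seed, same draws from both libraries
print(np.array_equal(first[0], second[0]), np.array_equal(first[1], second[1]))  # True True
```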

@jpn--
Contributor Author

jpn-- commented May 7, 2019

I was thinking for thread-safe operations, and to potentially regenerate individual random variables at a later time or on a different machine. This is pretty common for some of the other large scale simulators I work on, as there's a desire to have reproducible results not just in unit tests but in practical applications as well.
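One way to make individual random variables regenerable later, or on a different machine, is to derive an independent stream per variable from a single root seed with NumPy's `SeedSequence`. The sketch assumes each variable is identified by a fixed integer index; `ROOT_SEED`, `variable_generator`, and `sample_variable` are hypothetical names, not part of the workbench.

```python
import numpy as np

ROOT_SEED = 12345  # example root seed


def variable_generator(root_seed, index):
    """Return a Generator dedicated to one variable.

    spawn_key=(index,) gives the same child that
    SeedSequence(root_seed).spawn(...) produces at position `index`.
    """
    child = np.random.SeedSequence(root_seed, spawn_key=(index,))
    return np.random.default_rng(child)


def sample_variable(root_seed, index, n_samples):
    """Hypothetical helper: draw n_samples uniform values for one variable."""
    return variable_generator(root_seed, index).uniform(size=n_samples)


# Generate all three variables in one run...
all_vars = [sample_variable(ROOT_SEED, i, 4) for i in range(3)]
# ...and later regenerate only variable 1, e.g. on another machine
regenerated = sample_variable(ROOT_SEED, 1, 4)
print(np.array_equal(all_vars[1], regenerated))  # True
```

Because each stream depends only on the root seed and the variable's index, adding or dropping other variables does not change the draws for an existing one.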

@quaquel
Owner

quaquel commented May 7, 2019

I completely agree with the use case.

Thread safety is no concern for sampling: it only happens in the main thread. Farming out to other processes / threads happens after generating the experiments.

So the real question is whether a global solution (probably via setting the numpy RandomState) is sufficient, or whether you want to control individual distributions. The latter would require a modification to the Abstract Parameter class (__init__ and from_dist).
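For the per-distribution option, a parameter object could carry its own seed and pass a dedicated NumPy `Generator` to `rvs` through its `random_state` argument. This is a hypothetical sketch: `SeededParameter` and its arguments are illustrative and do not reflect the actual signatures of the workbench's parameter classes or of `from_dist`.

```python
import numpy as np
import scipy.stats


class SeededParameter:
    """Hypothetical parameter that owns its random state."""

    def __init__(self, name, dist, seed=None):
        self.name = name
        self.dist = dist  # a frozen scipy.stats distribution
        self.seed = seed

    def sample(self, n):
        # A fresh Generator per call: the same seed always yields the same
        # draws, independent of the global NumPy state or other parameters.
        rng = np.random.default_rng(self.seed)
        return self.dist.rvs(size=n, random_state=rng)


# uniform(loc=0.1, scale=0.4) covers the interval [0.1, 0.5]
rate = SeededParameter("rate", scipy.stats.uniform(loc=0.1, scale=0.4), seed=7)
np.random.seed(0)  # changing the global state does not affect rate.sample
a = rate.sample(5)
np.random.seed(999)
b = rate.sample(5)
print(np.array_equal(a, b))  # True
```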

@quaquel
Owner

quaquel commented May 8, 2019

So I ran a quick test using the intertemporal version of the lake problem and setting the global numpy random state. This works: it produces exactly the same results. See the attached pdf for a quick proof of principle:
seed_test.pdf

Is this sufficient for your use case or not?

@jpn--
Contributor Author

jpn-- commented May 8, 2019

I've identified my real problem after digging through the depths of the code, and it turns out my real problems are with Platypus and not the workbench. I don't have any desire to put in the effort to change that code as well, so I'm going to call this solution good enough.

For posterity should any other users need to address this issue in the future:

At least part of the issue is that platypus uses the Python standard library random number generator, while the workbench uses the numpy random number generator. If you want any hope of generating more or less reproducible results when the workbench reaches into Platypus, you need to initialize both global random number generators:

import numpy.random
import random
numpy.random.seed(42)
random.seed(42)
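A quick way to check that seeding both generators is enough for code that mixes them. `mixed_run` is an illustrative stand-in for a run that, like the workbench calling into Platypus, draws from both the standard library `random` module and `numpy.random`.

```python
import random

import numpy.random


def mixed_run(n=3):
    """Stand-in for a run drawing from both global RNGs."""
    stdlib_draws = [random.random() for _ in range(n)]  # e.g. Platypus
    numpy_draws = numpy.random.random(n).tolist()  # e.g. the workbench
    return stdlib_draws, numpy_draws


def seeded_run(seed):
    # Seed both global generators before running
    numpy.random.seed(seed)
    random.seed(seed)
    return mixed_run()


print(seeded_run(42) == seeded_run(42))  # True
```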

@jpn-- jpn-- closed this as completed May 8, 2019
@quaquel
Owner

quaquel commented May 9, 2019

Ah, had I known you were using Platypus, I would have been able to tell you this right away.

We might still consider having a set_random_state function in the workbench that sets both the numpy random state and the standard library random state, and ensures that this is properly propagated to any subprocesses that are started.

Would that be a useful feature?
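A minimal sketch of what such a function could look like, assuming it seeds both global generators and that subprocesses are seeded per task rather than per worker, so results do not depend on which worker picks up which experiment. `set_random_state`, `task_seeds`, and `run_experiment` are hypothetical names; the per-task seeds are derived with NumPy's `SeedSequence`.

```python
import random

import numpy as np


def set_random_state(seed):
    """Hypothetical: seed the stdlib and legacy NumPy global RNGs together."""
    random.seed(seed)
    np.random.seed(seed)


def task_seeds(seed, n_tasks):
    """Derive one reproducible 32-bit seed per experiment from a root seed."""
    children = np.random.SeedSequence(seed).spawn(n_tasks)
    return [int(child.generate_state(1)[0]) for child in children]


def run_experiment(task):
    """Hypothetical worker entry point; runs in whatever process receives it."""
    experiment_id, seed = task
    set_random_state(seed)  # re-seed inside the (sub)process before running
    return experiment_id, random.random(), float(np.random.random())


tasks = list(enumerate(task_seeds(42, 4)))
forward = [run_experiment(t) for t in tasks]
backward = [run_experiment(t) for t in reversed(tasks)]
# Execution order does not matter: each task carries its own seed
print(sorted(backward) == forward)  # True
```

With `multiprocessing`, the same `tasks` list could be passed to `Pool.map(run_experiment, tasks)` (inside an `if __name__ == "__main__":` guard); because each task carries its own seed, the outcome does not depend on pool size or scheduling.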

@quaquel quaquel reopened this May 9, 2019
@jpn--
Contributor Author

jpn-- commented May 9, 2019

Modestly useful, not a high priority though.

@EwoutH EwoutH added this to the 2.3.0 milestone Aug 31, 2022
@quaquel
Owner

quaquel commented Sep 1, 2022

Started looking into this. This discussion on StackOverflow is quite useful. It seems the preferred approach is:

import scipy.stats
from numpy.random import Generator, PCG64

# Example values; the original snippet does not define seed, n, p, or size
seed, n, p, size = 42, 10, 0.5, 10

# Case 3 (IMP): SciPy uses an existing random Generator, which can be passed to
# a SciPy-based random variable object
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen = scipy.stats.binom(n, p)  # frozen distribution with its own state
scipy_randomGen.random_state = numpy_randomGen
print(scipy_randomGen.rvs(size=size))
print(numpy_randomGen.binomial(n, p, size))
# prints two different arrays (output as shown in the original example, whose seed is not given):
# [4 4 6 6 5 4 5 4 6 7]
# [4 8 6 3 5 7 6 4 6 4]
# This should be the case which we mostly want (DESIRABLE). If we are using both NumPy-based and
# SciPy-based random number generators/functions, then not only do we have no repetition of
# random number sequences, but we also have reproducibility of results in this case.
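Instead of assigning to the `random_state` attribute, SciPy versions that accept NumPy `Generator` objects also take one per call through the `random_state` argument of `rvs`. A sketch under that assumption; the `sample` helper and the binomial parameters are arbitrary illustrations.

```python
import numpy as np
import scipy.stats
from numpy.random import PCG64, Generator


def sample(seed, n=10, p=0.5, size=10):
    rng = Generator(PCG64(seed))
    # SciPy and NumPy consume successive draws from the same stream
    from_scipy = scipy.stats.binom(n, p).rvs(size=size, random_state=rng)
    from_numpy = rng.binomial(n, p, size)
    return from_scipy, from_numpy


first = sample(42)
second = sample(42)
# Re-creating the Generator with the same seed reproduces both arrays
print(np.array_equal(first[0], second[0]), np.array_equal(first[1], second[1]))  # True True
```

Passing the generator explicitly avoids mutating a distribution object's state, which keeps the dependency on the random stream visible at each call site.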

scipy.stats is being used in samplers.py and plotting_util.py
numpy.random is being used in prim.py and samplers.py
random is being used in optimization.py, evaluators.py, and points.py
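Since the codebase uses both `numpy.random` and the standard library `random`, one possible direction (an assumption, not a decided design) is to replace global state with explicit generator objects derived from a single seed: a NumPy `Generator` for sampling and a `random.Random` instance for code built on the standard library. `make_rngs` is a hypothetical helper.

```python
import random

import numpy as np


def make_rngs(seed):
    """Hypothetical: derive a NumPy Generator and a stdlib Random from one seed."""
    numpy_ss, stdlib_ss = np.random.SeedSequence(seed).spawn(2)
    numpy_rng = np.random.default_rng(numpy_ss)
    # random.Random takes an int seed; build one from 128 bits of the child's state
    stdlib_seed = int.from_bytes(stdlib_ss.generate_state(4).tobytes(), "little")
    stdlib_rng = random.Random(stdlib_seed)
    return numpy_rng, stdlib_rng


numpy_rng, stdlib_rng = make_rngs(2019)
draws = (numpy_rng.uniform(size=3).tolist(), stdlib_rng.sample(range(100), 3))

# Re-creating from the same seed reproduces both streams, without global state
numpy_rng2, stdlib_rng2 = make_rngs(2019)
again = (numpy_rng2.uniform(size=3).tolist(), stdlib_rng2.sample(range(100), 3))
print(draws == again)  # True
```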

@EwoutH EwoutH modified the milestones: 2.3.0, 2.4.0 Oct 25, 2022
@quaquel quaquel modified the milestones: 2.4.0, 2.5.0 Apr 19, 2023
@EwoutH EwoutH modified the milestones: 2.5.0, 3.0 Dec 20, 2023
@EwoutH EwoutH changed the title random state Add control for random state Dec 20, 2023