-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add control for random state #62
Comments
can't this be achieved by simply setting the numpy random seed globally? I agree it would be a useful features |
So, I quickly checked the code. Sampling relies entirely on scipy.stats or numpy.random. Scipy.stats uses the global numpy random state, so setting the global numpy random state should be sufficient for reproducibility. Will try to test this Thursday. Or do you want to control the RandomState of individual uncertain variables? |
I was thinking for thread-safe operations, and to potentially regenerate individual random variables at a later time or on a different machine. This is pretty common for some of the other large scale simulators I work on, as there's a desire to have reproducible results not just in unit tests but in practical applications as well. |
I completely agree with the use case. Thread safety is no concern for sampling: it only happens in the main thread. Farming out to other processes / threads happens after generating the experiments. So the real question is whether having a global solution (probably via setting the numpy RandomState) is sufficient or whether you want to control individual distributions. The latter would require a modification to the Abstract Parameter class (init, and from_dist). |
So I ran a quick test using the inter temporal version of the lake problem and setting the global numpy random state. This works: it produces the exact same results. See the attached pdf for a quick proof of principle: Is this sufficient for your use case or not? |
I've identified my real problem after pulling through the depths of the code -- and it turns out my real problems are with Platypus and not workbench. I don't have any desire to put in the effort to change that code as well, so I'm going to call this solution is good enough. For posterity should any other users need to address this issue in the future: At least part of the issue is that platypus uses the Python standard library random number generator, while the workbench uses the numpy random number generator. If you want any hope of generating more or less reproducible results when the workbench reaches into Platypus, you need to initialize both global random number generators:
|
ah, had I know you were using platypus I would have been able to tell you this right away. We might still consider having a set_random_state function on the workbench which sets both the numpy random state, the random state, and ensures that this is properly propagated to any subprocesses that are started. Would that be a useful feature? |
Modestly useful, not a high priority though. |
Started looking into this. This discussion on StackOverflow is quite useful. It seems the preferred approach is from numpy.random import Generator, PCG64
# Case 3 (IMP) : Scipy uses an existing Random Generator which can being passed to Scipy based
# random generator object
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen.random_state=numpy_randomGen
print(scipy_randomGen.rvs(n, p, size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [4 4 6 6 5 4 5 4 6 7]
# [4 8 6 3 5 7 6 4 6 4]
# This should be the case which we mostly want (DESIRABLE). If we are using both Numpy based and
#Scipy based random number generators/function, then not only do we have no repetition of
#random number sequences but also have reproducibility of results in this case. scipy.stats is being used in samplers.py, and plotting_util.py |
Are there any plans to include "random state" interface for any of the sampling functions (i.e. to guarantee stable output for testing)? Or: was this considered and explicitly not done for some reason?
If this isn't contra-indicated, would a pull request that implements this only partially be of interest? There's a lot of places where random state might be useful, and I don't have time to search through the whole codebase and find/implement them all, but I'd be willing to do so for a handful of "low hanging fruit" places that would be useful for the project I am working on. (I am going to do this anyhow, just want to know if I should isolate this as an independent PR)
The text was updated successfully, but these errors were encountered: