Relying on numpy.random makes results hard to reproduce #113
Here's a link to a pull request that did the same for GloVe; looks simple enough:
When I run the model I still get different results every time despite using a fixed seed:
@cvint13 Anything multithreaded will also be subject to scheduling jitter from the OS – operations won't happen in the same order, and thus results will vary. For full reproducibility, you'd have to move to a (much slower) single-threaded calculation.
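To illustrate the point above: a fixed seed pins down the random *stream*, but when several worker threads share that stream, the *order* in which draws are consumed depends on OS scheduling. The sketch below (illustrative only, not gensim's actual training code) shows the single-threaded case, where consumption order is fixed and results reproduce exactly.

```python
import numpy as np

def draws_single_threaded(seed, n):
    # One worker consumes the random stream in a fixed order,
    # so the same seed always yields the same sequence.
    rng = np.random.RandomState(seed)
    return [rng.rand() for _ in range(n)]

# Two runs with the same seed are bit-for-bit identical.
run1 = draws_single_threaded(42, 5)
run2 = draws_single_threaded(42, 5)
assert run1 == run2
# With multiple threads pulling from one shared RNG, the stream itself
# would still be fixed, but which thread gets which draw would vary
# run to run -- hence the non-reproducible multicore results.
```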
Ahhh ok, that sucks.
For completeness -- @menshikh-iv is it expected that @cvint13 would get different results despite a fixed seed? Are the individual jobs processed in arbitrary (non-deterministic) order by multiple workers?
@piskvorky yes (for the multicore implementation only).
... and it also confuses users of LDA.
In scikit-learn, we've solved this by requiring a random_state argument to anything that wants random numbers; this may either be an np.random.RandomState object, or the seed for one. You might want to borrow this habit.
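A minimal sketch of the pattern described above, modeled on scikit-learn's sklearn.utils.check_random_state (the helper and function names here are illustrative, not gensim code):

```python
import numbers
import numpy as np

def check_random_state(seed):
    """Turn seed into an np.random.RandomState instance.

    Accepts None (fresh global entropy), an int seed, or an
    already-constructed RandomState object, mirroring the
    scikit-learn convention.
    """
    if seed is None:
        return np.random.RandomState()
    if isinstance(seed, (numbers.Integral, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % seed)

def shuffle_data(data, random_state=None):
    # Every function that needs randomness takes random_state and
    # routes it through check_random_state -- no hidden global state.
    rng = check_random_state(random_state)
    data = list(data)
    rng.shuffle(data)
    return data
```

With this convention, passing the same int seed (or the same RandomState) reproduces results exactly, and callers can share one RandomState across several functions when they want correlated-but-controlled randomness.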