Relying on numpy.random makes results hard to reproduce #113
Here's a link to a pull request that did the same for GloVe; looks simple enough:
When I run the model I still get different results every time despite using a fixed seed:
@cvint13 Anything multithreaded will also be subject to scheduling jitter from the OS – operations won't happen in the same order, and thus results will vary. For full reproducibility, you'd have to move to a (much slower) single-threaded calculation.
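To illustrate the point above: a fixed seed pins down the random *stream*, but when several worker threads share that stream, the *order* in which draws are consumed depends on OS scheduling. The sketch below (illustrative only, not gensim's actual training code) shows the single-threaded case, where consumption order is fixed and results reproduce exactly.

```python
import numpy as np

def draws_single_threaded(seed, n):
    # One worker consumes the random stream in a fixed order,
    # so the same seed always yields the same sequence.
    rng = np.random.RandomState(seed)
    return [rng.rand() for _ in range(n)]

# Two runs with the same seed are bit-for-bit identical.
run1 = draws_single_threaded(42, 5)
run2 = draws_single_threaded(42, 5)
assert run1 == run2
# With multiple threads pulling from one shared RNG, the stream itself
# would still be fixed, but which thread gets which draw would vary
# run to run -- hence the non-reproducible multicore results.
```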
Ahhh ok, that sucks.
For completeness -- @menshikh-iv is it expected that @cvint13 would get different results despite a fixed seed? Are the individual jobs processed in arbitrary (non-deterministic) order by multiple workers?
@piskvorky yes (for the multicore implementation only).
... and it also confuses users of LDA.
In scikit-learn, we've solved this by requiring a random_state argument to anything that wants random numbers; this may either be an np.random.RandomState object, or the seed for one. You might want to borrow this habit.
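A minimal sketch of the pattern described above, modeled on scikit-learn's sklearn.utils.check_random_state (the helper and function names here are illustrative, not gensim code):

```python
import numbers
import numpy as np

def check_random_state(seed):
    """Turn seed into an np.random.RandomState instance.

    Accepts None (fresh global entropy), an int seed, or an
    already-constructed RandomState object, mirroring the
    scikit-learn convention.
    """
    if seed is None:
        return np.random.RandomState()
    if isinstance(seed, (numbers.Integral, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % seed)

def shuffle_data(data, random_state=None):
    # Every function that needs randomness takes random_state and
    # routes it through check_random_state -- no hidden global state.
    rng = check_random_state(random_state)
    data = list(data)
    rng.shuffle(data)
    return data
```

With this convention, passing the same int seed (or the same RandomState) reproduces results exactly, and callers can share one RandomState across several functions when they want correlated-but-controlled randomness.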