Fix train error of ConcatenatedDoc2Vec in the notebook of doc2vec-IMDB #1377
Conversation
…into fix-word2vec-notebook Conflicts: gensim/test/test_word2vec.py
docs/notebooks/doc2vec-IMDB.ipynb
Outdated
" if not isinstance(train_model, ConcatenatedDoc2Vec):\n",
" train_model.train(doc_list, total_examples=train_model.corpus_count, epochs=train_model.iter)\n",
" else:\n",
" train_model.train(doc_list)\n",
A simpler and more robust fix would be to change the ConcatenatedDoc2Vec class, in test_doc2vec.py, to make its (no-op) train() match the new train() parameter signature.
Yes, you are right. If the new train() method is used, total_examples and epochs should be provided. But the ConcatenatedDoc2Vec class has no 'corpus_count' attribute.
The call doesn't have to use train_model.corpus_count from inside the model; it can just use len(doc_list). And since the outside loop is handling the multiple passes, the epochs argument should be 1.
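The suggestion above could be sketched as a single unconditional call. This is a hypothetical minimal example, not the notebook's actual code: `FakeModel` is a stand-in for a Doc2Vec or ConcatenatedDoc2Vec model, and `doc_list` stands in for the notebook's shuffled tagged documents.

```python
# Hypothetical sketch of the reviewer's suggestion: pass an explicit count
# (len(doc_list)) instead of reading train_model.corpus_count, and epochs=1
# because the notebook's outer loop already handles the multiple passes.

doc_list = ["doc-%d" % i for i in range(3)]  # stand-in for the tagged docs


class FakeModel:
    """Stand-in for a Doc2Vec / ConcatenatedDoc2Vec model (illustrative only)."""

    def train(self, documents, total_examples=None, epochs=None):
        # Return the arguments so the call pattern is easy to inspect.
        return (len(documents), total_examples, epochs)


train_model = FakeModel()
# Same call works regardless of the model type, no isinstance() check needed:
result = train_model.train(doc_list, total_examples=len(doc_list), epochs=1)
```

With this pattern, the `if not isinstance(...)` branch in the notebook cell becomes unnecessary, provided every model's train() accepts the same keyword arguments.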
@robotcator Please merge
…into fix-notebook
gensim/test/test_doc2vec.py
Outdated
def train(self, sentences, total_examples=None, total_words=None,
          epochs=None, start_alpha=None, end_alpha=None,
          word_count=0,
          queue_factor=2, report_delay=1.0):
Because this is a no-op implementation that ignores any arguments, it can be specified even more compactly and generically, using Python's syntax for arbitrary positional/keyword arguments, e.g.:
def train(self, *ignored, **kwignored):
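Under that suggestion, the wrapper's no-op train() could hypothetically look like the following minimal sketch (only the ConcatenatedDoc2Vec name and the `*ignored, **kwignored` signature come from the discussion above; the docstring and usage lines are illustrative):

```python
class ConcatenatedDoc2Vec:
    """Sketch of a wrapper whose train() accepts, and ignores, any arguments,
    so callers can use the same call signature as for a real Doc2Vec model."""

    def train(self, *ignored, **kwignored):
        # No-op: the constituent models are assumed to be trained separately,
        # so any positional or keyword arguments are silently discarded.
        pass


model = ConcatenatedDoc2Vec()
# All of these calls are accepted without error:
model.train(["some", "docs"], total_examples=2, epochs=1)
model.train([])
```

The advantage of `*ignored, **kwignored` over a full copy of the real signature is that the wrapper never falls out of sync if train()'s parameters change again.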
@robotcator I opened this notebook and got an error
@menshikh-iv all the cells were re-run and the error was fixed.
"*0.193200 : 1 passes : Doc2Vec(dbow,d100,n5,mc2,s0.001,t4)_inferred 35.3s 48.3s\n",
"*0.268640 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4) 48.6s 48.5s\n",
"*0.208000 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4)_inferred 48.6s 47.4s\n",
"*0.216160 : 1 passes : dbow+dmm 0.0s 168.9s\n",
The change in reported elapsed time for these concatenated-model results, here and below, is so large it deserves a closer look. I'm not aware of any change, in this PR or prior related work, that could account for such a difference: from <2.0 seconds to >170 seconds. It might be a spurious report, or something amiss.
Yes, the concatenated-model train() method didn't actually do anything. I will check what's going on with this.
@robotcator What's the status of this PR?
Ping @robotcator
@menshikh-iv sorry for the late reply, I didn't receive the last notification. This PR seems to work fine and all the cells have been re-run. The notebook warning is gone. It's ready to be merged.
There are still suspiciously different reported runtimes on some of the log lines.
In fact, since the last time the notebook was run and its output committed, there have been many changes (outside this PR), so many things could cause the discrepancy; I just suggest the big runtime difference be understood alongside the updated notebook cells.
That's indeed very worrying. @menshikh-iv can you find which commit / PR caused this slowdown?
Is it possible that the performance of the computer caused the difference in output? I re-ran that notebook twice and the second run seems alright.
@piskvorky accepted. I will search and fix if needed.