Word2Vec does not run faster with more workers #157
No. It sounds like a problem with Cython. Can you post the value of […]? How long are your sentences? Is there anything special about the data?
All the answers are included in this IPython notebook.
I can't replicate this. Can you manually modify this line https://github.com/piskvorky/gensim/blob/develop/gensim/models/word2vec_inner.pyx#L205 to be […]? Let's see if it's connected to BLAS somehow.
I am using OpenBLAS, and that is why it does not show up in SciPy. When I […]
OK, I switched my `fast_sentence` to version 2, which uses Cython only, without any BLAS. The speed is 4x lower. However, the behaviour is the same: more workers do not buy you anything. http://nbviewer.ipython.org/gist/aboSamoor/68ee65496ce8ad7fa552
OK, thanks. That means I'm out of ideas. I suppose something is wrong with releasing the GIL in Cython. The next step will be to create a simple, minimal Cython program that releases the GIL, and test that (no gensim). But why are you upside down, abo? Are you Australian?
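A minimal GIL-release check in the spirit of the comment above can be sketched without gensim. This sketch swaps Cython for a standard-library stand-in: `hashlib` is documented to release the GIL while hashing inputs larger than 2047 bytes, so on an unpinned multi-core machine throughput should grow with the thread count. The buffer size, round counts, and function names are illustrative assumptions, not part of gensim.

```python
import hashlib
import threading
import time

# 1 MiB buffer; hashlib releases the GIL for inputs larger than 2047 bytes.
BUF = b"x" * (1 << 20)


def _work(rounds: int) -> None:
    """Hash the buffer `rounds` times; the GIL is released during hashing."""
    for _ in range(rounds):
        hashlib.sha256(BUF).digest()


def throughput(workers: int, rounds_per_worker: int = 50) -> float:
    """Return MiB hashed per second of wall-clock time using `workers` threads."""
    threads = [
        threading.Thread(target=_work, args=(rounds_per_worker,))
        for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return workers * rounds_per_worker / elapsed


if __name__ == "__main__":
    base = throughput(1)
    for n in (1, 2, 4):
        rate = throughput(n)
        print(f"{n} worker(s): {rate:8.1f} MiB/s  (x{rate / base:.2f} vs 1 worker)")
```

If the speedup stays near 1x here too, the bottleneck is probably outside gensim and Cython, for example in how the process is scheduled across cores.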
I tried on a machine with OpenBLAS (FAST_VERSION=1) and the same Cython as you (0.19.2), but still couldn't replicate the problem. The speed went from 194k words/s (1 worker) to 446k words/s (4 workers).
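The `FAST_VERSION` value mentioned above can be read from Python. The sketch below assumes `FAST_VERSION` is importable from `gensim.models.word2vec`; the meanings in the lookup table are an assumption based on gensim's `word2vec_inner` module (0 and 1 both use BLAS and differ in the return type its `sdot` reports, 2 is Cython without BLAS, and -1 was used by older releases when the compiled extension failed to import). `gensim_fast_version` is a hypothetical helper name.

```python
from typing import Optional

# Assumed meanings of gensim's FAST_VERSION flag (see lead-in).
FAST_VERSION_MEANING = {
    -1: "no compiled extension (slow pure-Python path)",
    0: "Cython + BLAS (sdot returns double)",
    1: "Cython + BLAS (sdot returns float)",
    2: "Cython only, no BLAS",
}


def gensim_fast_version() -> Optional[int]:
    """Return gensim's FAST_VERSION, or None if gensim cannot be imported."""
    try:
        from gensim.models.word2vec import FAST_VERSION
    except (ImportError, ValueError):
        # ValueError can come from a binary mismatch with the installed NumPy.
        return None
    return int(FAST_VERSION)


if __name__ == "__main__":
    version = gensim_fast_version()
    if version is None:
        print("gensim is not importable")
    else:
        print(version, FAST_VERSION_MEANING.get(version, "unknown"))
```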
OK, I was able to fix the problem by adding the following line before the multi-worker call: […]
Before, all 4 workers ran on the same CPU, each getting 25% utilization. After adding the above line, I can see 4 CPU cores running at 100%. The speed went up from 110k words/sec to 150k words/sec (not as good a speedup as you get, but maybe that is a different problem). I would appreciate it if you could tell me more about your OpenBLAS setup. The solution is explained in more detail here.
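The exact line added above is not preserved in this thread. One known cause of "all workers on one CPU" is a BLAS library such as OpenBLAS pinning the process's CPU affinity when it loads. As an illustration only, the sketch below (assuming Linux and the standard-library `os.sched_getaffinity` / `os.sched_setaffinity`) inspects and widens the calling thread's affinity; `current_cpus` and `unpin_process` are hypothetical helpers, not gensim APIs and not necessarily what the poster did.

```python
import os
from typing import Optional, Set


def current_cpus() -> Optional[Set[int]]:
    """CPU ids the calling thread may run on, or None where unsupported."""
    if not hasattr(os, "sched_getaffinity"):  # e.g. macOS, Windows
        return None
    return set(os.sched_getaffinity(0))


def unpin_process() -> Optional[Set[int]]:
    """Hypothetical helper: allow the calling thread on every visible CPU.

    Threads started afterwards inherit this mask, so call it before
    launching the workers. The kernel intersects the request with any
    cpuset restriction; including the current mask keeps the request valid.
    """
    before = current_cpus()
    if before is None or not hasattr(os, "sched_setaffinity"):
        return None
    wanted = set(range(os.cpu_count() or 1)) | before
    os.sched_setaffinity(0, wanted)
    return current_cpus()


if __name__ == "__main__":
    print("before:", sorted(current_cpus() or []))
    print("after: ", sorted(unpin_process() or []))
```

If "before" lists a single CPU on a multi-core machine, pinning is the likely culprit. Rebuilding OpenBLAS without affinity support avoids the problem at its source.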
This was OpenBLAS straight from the Debian (Ubuntu) package, with no special tuning. NumPy and SciPy also come from the repo:
What's the status here, @aboSamoor? Did the […]
Yes, it is resolved.
I get a speed of 100k words/sec when running Word2Vec with one worker. Adding five workers results in the same speed, with five CPUs each utilized at up to 20%.
Is that expected?
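Five CPUs at about 20% each add up to roughly one core's worth of work (5 × 20% = 100%), the same total as a single worker. A quick way to quantify this is to divide the process's CPU time by wall-clock time around the workload. The sketch below uses only the standard library; `effective_cores` and `spin_threads` are hypothetical names, and `spin_threads` is a stand-in workload rather than a gensim training call.

```python
import threading
import time
from typing import Callable


def effective_cores(fn: Callable[[], None]) -> float:
    """Run fn once; return process CPU time divided by wall-clock time.

    time.process_time() sums CPU time over all threads of the process, so a
    result near 1.0 means one core's worth of work, however many threads ran.
    """
    cpu0, wall0 = time.process_time(), time.perf_counter()
    fn()
    cpu1, wall1 = time.process_time(), time.perf_counter()
    return (cpu1 - cpu0) / (wall1 - wall0)


def spin_threads(n: int, iterations: int = 2_000_000) -> None:
    """Stand-in workload: n threads of pure-Python looping (holds the GIL)."""

    def busy() -> None:
        total = 0
        for i in range(iterations):
            total += i

    threads = [threading.Thread(target=busy) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    # On a standard (GIL) CPython build this should stay roughly near 1.0.
    print(f"effective cores: {effective_cores(lambda: spin_threads(5)):.2f}")
```

Wrapping the real multi-worker training call the same way would show whether the workers actually run in parallel (a ratio close to the worker count) or effectively share one core (a ratio near 1.0).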