Word2Vec does not run faster with more workers #157

aboSamoor · 2014-01-23T15:19:11Z

I get a speed of 100k words/sec when running Word2Vec with one worker. Adding five workers results in the same speed, with five CPUs utilized up to 20%.

Is that expected?

piskvorky · 2014-01-23T15:37:56Z

No. Sounds like some problem with Cython.

Can you post the value of word2vec.FAST_VERSION and Cython version? And maybe the log too, as a sanity check.

How long are your sentences? Anything special about the data?
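The version check requested above can be gathered in one place. This is a minimal sketch, not part of the original thread: `report_versions` is a hypothetical helper name, and it returns `None` for anything that cannot be imported. It only reads `word2vec.FAST_VERSION` (named above) and Cython's `__version__`.

```python
import importlib


def report_versions():
    """Collect the values asked for above; anything unavailable stays None.

    `report_versions` is a hypothetical helper, not a gensim function.
    """
    info = {"FAST_VERSION": None, "cython": None}
    try:
        word2vec = importlib.import_module("gensim.models.word2vec")
        # FAST_VERSION tells which compiled training routine is in use;
        # later in this thread, 1 corresponds to a BLAS-backed build.
        info["FAST_VERSION"] = getattr(word2vec, "FAST_VERSION", None)
    except Exception:
        # gensim is missing or failed to import for some other reason
        pass
    try:
        cython = importlib.import_module("Cython")
        info["cython"] = getattr(cython, "__version__", None)
    except Exception:
        # Cython is only needed at build time, so it may be absent
        pass
    return info


if __name__ == "__main__":
    for name, value in report_versions().items():
        print(name, "=", value)
```

Pasting this output together with the training log gives the information requested in one step.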

aboSamoor · 2014-01-23T15:51:06Z

All the answers are included in this ipython notebook
http://nbviewer.ipython.org/gist/aboSamoor/fe70098abbb425622ce4

piskvorky · 2014-01-23T18:18:41Z

I can't replicate this.

Can you manually modify this line https://github.com/piskvorky/gensim/blob/develop/gensim/models/word2vec_inner.pyx#L205 to be fast_sentence = fast_sentence2?

Let's see if it's connected to BLAS somehow.

aboSamoor · 2014-01-23T18:28:31Z

I am using OpenBLAS and that is why it does not show up in scipy. When I was using ATLAS the speed was 33k words/sec.

aboSamoor · 2014-01-23T19:43:45Z

Ok, I switched my fast_sentence to version 2, which uses Cython only, without any BLAS. The speed is lower by 4x. However, the behaviour is the same! More workers do not buy you anything.

http://nbviewer.ipython.org/gist/aboSamoor/68ee65496ce8ad7fa552

piskvorky · 2014-01-23T21:09:54Z

Ok, thanks. That means I'm out of ideas. Something wrong with releasing GIL in Cython, I suppose. The next step will be creating some simple, minimal Cython program to release the GIL and test that (no gensim).

But why are you upside down abo, are you Australian?
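A gensim-free scaling check along the lines suggested above could look like the following. Note the substitution: instead of a Cython `nogil` block, this sketch uses `hashlib`, which CPython documents as releasing the GIL while hashing buffers larger than 2047 bytes. The function names, buffer size and round count are illustrative assumptions.

```python
import hashlib
import threading
import time


def hash_many(buf, rounds):
    # hashlib releases the GIL while hashing large buffers, so several
    # threads running this loop can occupy several cores at once.
    for _ in range(rounds):
        hashlib.sha256(buf).digest()


def time_threads(n_threads, buf, rounds):
    """Run `rounds` hashes in each of `n_threads` threads; return elapsed seconds."""
    threads = [
        threading.Thread(target=hash_many, args=(buf, rounds))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    buf = b"x" * (8 * 1024 * 1024)  # 8 MiB, well above the 2047-byte threshold
    one = time_threads(1, buf, 20)
    four = time_threads(4, buf, 20)
    # Four threads do four times the work. If they really run in parallel,
    # `four` stays close to `one`; if they share a single core, `four`
    # approaches 4 * `one`.
    print("1 thread: %.2fs, 4 threads: %.2fs" % (one, four))
```

If the four-thread run takes roughly four times as long on a multi-core machine, the threads are not running in parallel, which points away from gensim and toward the environment.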

piskvorky · 2014-01-23T21:33:09Z

I tried on a machine with OpenBLAS (FAST_VERSION=1) and the same cython as you (0.19.2), but still couldn't replicate the problem. Speed went from 194k/s (1 worker) to 446k/s (4 workers).

aboSamoor · 2014-01-24T03:54:09Z

Ok, I was able to fix the problem by adding the following line before the multi-worker call:

os.system("taskset -p 0xff %d" % os.getpid())

Before, the 4 workers would all run on the same CPU, each getting 25% utilization.

After adding the above line, I can see 4 CPU cores running at 100%. The speed went up from 110k words/sec to 150k words/sec (not as good a speedup as you get, but maybe that is a different problem).

I would appreciate it if you let me know more about your OpenBLAS setup.

The solution is explained in more detail here:
http://stackoverflow.com/questions/15639779/what-determines-whether-different-python-processes-are-assigned-to-the-same-or-d/15641148#15641148
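The `taskset -p 0xff` call above sets the process affinity mask to CPUs 0 through 7. On Linux with Python 3.3 or later, the same check and reset can be done without shelling out, using `os.sched_getaffinity` and `os.sched_setaffinity`. The sketch below is an alternative under those assumptions, not what was used in this thread; `widen_affinity` is a hypothetical helper name.

```python
import os


def widen_affinity():
    """Allow the current process to run on every CPU the OS reports.

    Returns (before, after) sets of CPU indices, or (None, None) on
    platforms without the Linux-only sched_* functions.
    """
    if not hasattr(os, "sched_getaffinity"):
        return None, None
    before = os.sched_getaffinity(0)  # 0 means "the calling process"
    try:
        os.sched_setaffinity(0, range(os.cpu_count() or 1))
    except OSError:
        # The kernel or a container cpuset may refuse; keep the old mask.
        pass
    after = os.sched_getaffinity(0)
    return before, after


if __name__ == "__main__":
    before, after = widen_affinity()
    print("allowed CPUs before:", before)
    print("allowed CPUs after: ", after)
```

If `before` contains a single CPU index, all worker threads were confined to one core, which matches the 25% per-worker utilization described above. Calling the helper before training starts, in the same place as the `taskset` line, should have the same effect.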

piskvorky · 2014-01-24T10:11:09Z

This was OpenBLAS straight from Debian (Ubuntu) package, no special tuning. NumPy and SciPy also from repo:

$ dpkg -l | grep -E 'openblas|numpy|scipy'
ii  libopenblas-base                       0.2.8-2                          amd64        Optimized BLAS (linear algebra) library based on GotoBLAS2
ii  libopenblas-dev                        0.2.8-2                          amd64        Optimized BLAS (linear algebra) library based on GotoBLAS2
ii  python-numpy                           1:1.7.1-1ubuntu1                 amd64        Numerical Python adds a fast array facility to the Python language
ii  python-scipy                           0.12.0-2ubuntu1                  amd64        scientific tools for Python

$ uname -a
Linux hetrad 3.11.0-13-generic #20-Ubuntu SMP Wed Oct 23 07:38:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

piskvorky · 2014-04-10T08:51:26Z

What's the status here, @aboSamoor ? Did the taskset call resolve your issues?

aboSamoor · 2014-04-10T12:03:37Z

Yes, it is resolved.

piskvorky closed this as completed Apr 10, 2014

carbonz0 mentioned this issue Jul 27, 2017

Word2Vec does not run faster with more workers caused by sentences length #1509

Closed

yjk21 mentioned this issue Jan 11, 2018

Word2Vec 3.2.0 performance regression for corpus on s3 with smart-open 1.5.6 #1836

Closed
