The OpenBLAS inference benchmark in IntelOptimizedPaddle.md seems wrong: in all three networks, images/second decreases as BatchSize increases.
VGG-19 (images/second)

| BatchSize | 1    | 2    | 4    | 8    | 16   |
|-----------|------|------|------|------|------|
| OpenBLAS  | 1.07 | 1.08 | 1.06 | 0.88 | 0.65 |
ResNet-50 (images/second)

| BatchSize | 1    | 2    | 4    | 8    | 16   |
|-----------|------|------|------|------|------|
| OpenBLAS  | 3.35 | 3.19 | 3.09 | 2.55 | 1.96 |
GoogLeNet (images/second)

| BatchSize | 1     | 2     | 4     | 8    | 16   |
|-----------|-------|-------|-------|------|------|
| OpenBLAS  | 12.04 | 11.31 | 10.00 | 9.07 | 4.34 |
Possible Reason
Why does images/second increase with increasing BatchSize in the training benchmark, but decrease in inference?
The benchmark configuration keeps `OPENBLAS_NUM_THREADS * trainer_count = core number`.
The minimum BatchSize used in training is 64, which is larger than the core count (40 in the experiment), so we export `OPENBLAS_NUM_THREADS=1` and set `trainer_count=40`.
In inference, however, the BatchSize is smaller than the core count. For example, when `BatchSize=2`, we export `OPENBLAS_NUM_THREADS=20` and set `trainer_count=2`, which may cause a conflict in thread affinity.
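To make the arithmetic concrete, here is a minimal Python sketch of that configuration rule; `NUM_CORES`, `thread_config`, and the variable names are illustrative, not from the benchmark scripts:

```python
import os

NUM_CORES = 40  # physical cores on the benchmark machine


def thread_config(batch_size):
    """Pick trainer_count and OPENBLAS_NUM_THREADS so that
    OPENBLAS_NUM_THREADS * trainer_count == NUM_CORES."""
    if batch_size >= NUM_CORES:
        # Training: BatchSize (>= 64) exceeds the core count, so give
        # every core its own trainer and keep OpenBLAS single-threaded.
        trainer_count = NUM_CORES
    else:
        # Inference: BatchSize is below the core count, so use one
        # trainer per sample and let OpenBLAS fill the remaining cores.
        trainer_count = batch_size
    openblas_threads = NUM_CORES // trainer_count
    return trainer_count, openblas_threads


trainer_count, openblas_threads = thread_config(batch_size=2)
os.environ['OPENBLAS_NUM_THREADS'] = str(openblas_threads)
print(trainer_count, openblas_threads)  # 2 20 -- the BatchSize=2 case above
```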
How can I disable OpenBLAS threading affinity at runtime?
From the OpenBLAS FAQ: you can define the `OPENBLAS_MAIN_FREE` or `GOTOBLAS_MAIN_FREE` environment variable to disable threading affinity at runtime. For example, before running, `export OPENBLAS_MAIN_FREE=1`.
Alternatively, you can disable the affinity feature by enabling `NO_AFFINITY=1` in Makefile.rule. https://github.com/xianyi/OpenBLAS/wiki/Faq#no_affinity
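The same switch can also be applied from inside a Python process, provided it happens before OpenBLAS is loaded. A minimal sketch, assuming the `paddle.v2` import path of this Paddle version:

```python
import os

# OPENBLAS_MAIN_FREE must be visible before the OpenBLAS runtime
# initializes, i.e. before the first import that links against it.
os.environ['OPENBLAS_MAIN_FREE'] = '1'

import paddle.v2 as paddle  # OpenBLAS now starts without pinning the main thread
```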
Thus, I exported `OPENBLAS_MAIN_FREE=1` and retested VGG inference; the results speed up:
| BatchSize | 1          | 2          | 4          | 8          | 16         |
|-----------|------------|------------|------------|------------|------------|
| OpenBLAS  | 1.07->1.08 | 1.08->1.99 | 1.06->3.64 | 0.88->3.57 | 0.65->2.27 |
@tensor-tang Can you help double-check this result?
Solution
If OpenBLAS threading affinity affects the elapsed time, should we set it automatically in the program, the way MKL does? Open questions:

- Should we `export OPENBLAS_MAIN_FREE` in paddle/scripts/submit_local.sh.in and python/paddle/v2/init.py, or directly add `NO_AFFINITY=1` in openblas.cmake?
- Is `OPENBLAS_MAIN_FREE` related to hyperthreading, like `KMP_AFFINITY` is?

@tensor-tang Can you give some suggestions about this?
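A hypothetical sketch of the first option, as it might look inside python/paddle/v2/init.py; the helper name `_set_openblas_env` and the use of `setdefault` are illustrative assumptions, not existing Paddle code:

```python
import os


def _set_openblas_env():
    """Hypothetical helper: disable OpenBLAS thread affinity by default,
    without overriding a value the user has already exported."""
    os.environ.setdefault('OPENBLAS_MAIN_FREE', '1')
    # GotoBLAS-era alias, kept for older OpenBLAS builds.
    os.environ.setdefault('GOTOBLAS_MAIN_FREE', '1')


# Would run at import time, before paddle loads the OpenBLAS-linked core.
_set_openblas_env()
```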