Does trainer_count work when running inference on CPU? #7218

Closed
zengjialuo opened this issue Jan 4, 2018 · 4 comments

Comments

@zengjialuo

zengjialuo commented Jan 4, 2018

I am using a pre-trained ResNet50 model to run inference on a CPU machine. The code is mostly copied from PaddlePaddle/models/image_classification.

I found that with trainer_count = 4, inference is slower than with trainer_count = 1. The test_data size is 64.

With trainer_count = 1, the infer job takes 0.3s per image on a machine with 32 CPU cores, almost the same as the 0.3s it takes on my notebook. Why isn't it faster?

The top command shows that 16 cores are busy and the other 16 are idle, so I ran two other infer jobs (python infer.py) at the same time. Each job took 0.5s per image, which overall is about twice as good as running a single job. But can paddle do this by itself?

@luotao1
Contributor

luotao1 commented Jan 5, 2018

Which paddle version are you using? You can run the paddle version command to check.

@zengjialuo
Author

paddle version info:

PaddlePaddle 0.10.0, compiled with
    with_avx: ON
    with_gpu: OFF
    with_mkldnn: OFF
    with_mklml: OFF
    with_double: OFF
    with_python: ON
    with_rdma: OFF
    with_timer: OFF

@luotao1
Contributor

luotao1 commented Jan 5, 2018

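Judging from the version info, the MKL library should not be disabled when with_avx is ON (where did you install paddle from?).
In the latest version, 0.11.0, the MKL library is used by default when with_avx is ON.

  • You can upgrade to the latest version and use the MKL library instead of OpenBLAS. The optimal environment variables are already set there.
  • If you stay on the current version, you need to set the following environment variables before each infer run: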
unset OPENBLAS_NUM_THREADS
export OPENBLAS_MAIN_FREE=1
export OPENBLAS_NUM_THREADS=THREADS

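Here THREADS = number of CPU cores / trainer_count; if the number of CPU cores < trainer_count, THREADS = 1.
Automatic setting of the OPENBLAS environment variables will be added to the code later.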
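
For anyone who wants to compute THREADS from a script instead of by hand, here is a minimal sketch of the rule above. The helper names openblas_threads and configure_openblas_env are hypothetical and not part of paddle, and rounding the division down is an assumption, since the comment only says "cores / trainer_count". OpenBLAS normally reads these variables when it is loaded, so the sketch assumes it runs before paddle is imported.

```python
import multiprocessing
import os


def openblas_threads(cpu_cores, trainer_count):
    """Apply the rule THREADS = cpu_cores / trainer_count, with a floor of 1.

    Integer (floor) division is an assumption; the original comment does not
    say how to round.
    """
    return max(1, cpu_cores // trainer_count)


def configure_openblas_env(trainer_count, cpu_cores=None, env=None):
    """Set the two OpenBLAS variables named in the comment above.

    Hypothetical helper: `env` defaults to os.environ, and `cpu_cores`
    defaults to the number of logical cores reported by the OS.
    """
    if cpu_cores is None:
        cpu_cores = multiprocessing.cpu_count()
    if env is None:
        env = os.environ
    env["OPENBLAS_MAIN_FREE"] = "1"
    env["OPENBLAS_NUM_THREADS"] = str(openblas_threads(cpu_cores, trainer_count))
    return env


if __name__ == "__main__":
    # Example: trainer_count = 4 on the current machine.
    configure_openblas_env(trainer_count=4)
    print("OPENBLAS_NUM_THREADS =", os.environ["OPENBLAS_NUM_THREADS"])
```

On the 32-core machine from the question, trainer_count = 4 would give OPENBLAS_NUM_THREADS=8 under this rule.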

@zengjialuo
Author

Thanks, the performance is much better now.
