
ResNet-50 is slower on Volta since #8302 #9874

Closed
Caenorst opened this issue Feb 23, 2018 · 12 comments

@Caenorst
Contributor

Caenorst commented Feb 23, 2018

Description

I ran the minimum reproducible example with the setup below at two different versions (before and after #8302). Here are the results:
d03182f (before #8302):
- real data: 5644 samples / s
- synthetic data: 5971 samples / s
c3e3a83 (after #8302):
- real data: 5461 samples / s
- synthetic data: 5740 samples / s
Latest:
- real data: 5425 samples / s
- synthetic data: 5817 samples / s

@ptrendx @DickJC123 @mkolod

Environment info (Required)

CPUs: Intel Xeon E5-2698 v4 (x2)
GPUs: Nvidia V100 (x8)

Build info (Required if built from source)

Starting from the default config.mk (in make/config.mk), I added:

USE_CUDA=1
USE_CUDNN=1
CUDA_ARCH := -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70
USE_CUDA_PATH=/usr/local/cuda
USE_LIBJPEG_TURBO=1
USE_LIBJPEG_TURBO_PATH=/usr
USE_NCCL=1

Minimum reproducible example

python /mxnet/example/image-classification/train_imagenet.py --benchmark 0 --gpu 0,1,2,3,4,5,6,7 --batch-size 1024 --num-epochs 1 --data-train /data/imagenet/train-480-val-256-recordio/train.rec --data-train-idx /data/imagenet/train-480-val-256-recordio/train.idx --data-val /data/imagenet/train-480-val-256-recordio/val.rec --disp-batches 100 --network resnet-v1 --num-layers 50 --data-nthreads 40 --min-random-scale 0.533 --max-random-shear-ratio 0 --max-random-rotate-angle 0 --max-random-h 0 --max-random-l 0 --max-random-s 0 --dtype float16 --kv-store device
@lupesko
Contributor

lupesko commented Feb 26, 2018

@piiswrong @zheng-da - please take a look; this degradation may be related to your commit.

@rahul003
Member

Are the speeds that you mention averages? If so, averaged over how many batches?

@Caenorst
Contributor Author

It's averaged over 1200 batches; I'm ignoring the first 100 batches.
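
For reference, a minimal sketch of one way to compute such an average from the training log (it assumes the standard Speedometer output "Speed: N samples/sec" and --disp-batches 100; adjust the regex and skip count if your log differs):

# Sketch: average the per-interval throughput reported by train_imagenet.py,
# skipping the warm-up batches at the start of the run.
import re
import sys

SKIP_BATCHES = 100      # warm-up batches to ignore
DISP_INTERVAL = 100     # matches --disp-batches 100 in the command above

pattern = re.compile(r"Speed:\s*([\d.]+)\s*samples/sec")
speeds = []
with open(sys.argv[1]) as f:            # path to the captured training log
    for line in f:
        m = pattern.search(line)
        if m:
            speeds.append(float(m.group(1)))

# Each logged value covers DISP_INTERVAL batches; drop the warm-up interval(s).
steady = speeds[SKIP_BATCHES // DISP_INTERVAL:]
print("averaged over %d intervals: %.0f samples/s" % (len(steady), sum(steady) / len(steady)))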

@cjolivier01
Member

@zheng-da

@zheng-da
Contributor

zheng-da commented Mar 2, 2018

I think I know the potential cause of this problem. I'll fix it next week.

@zheng-da
Contributor

zheng-da commented Mar 10, 2018

I searched all commits in PR #8302 and I think I have found the commits that cause the perf issue. However, I have not been able to fix the problem yet. I created a branch that contains the commits: https://github.com/zheng-da/incubator-mxnet/tree/refactor_bn

Basically, the commits that refactor BatchNorm cause the issue.
zheng-da@338dbca
zheng-da@aa5e69e

@Caenorst could you help look into the issue? Thanks
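
To help isolate whether the BatchNorm refactor itself is responsible, one option is a forward-only microbenchmark of the operator run at both commits. A minimal sketch (the shape and iteration counts are assumptions, and it only times the imperative forward pass on a single GPU, so it will not capture backward or multi-GPU effects):

# Rough comparison of BatchNorm forward throughput; run once per commit.
import time
import mxnet as mx

ctx = mx.gpu(0)
N, C, H, W = 128, 256, 56, 56                        # ResNet-50-like activation shape (assumed)
x = mx.nd.random.uniform(shape=(N, C, H, W), ctx=ctx)
gamma = mx.nd.ones((C,), ctx=ctx)
beta = mx.nd.zeros((C,), ctx=ctx)
moving_mean = mx.nd.zeros((C,), ctx=ctx)
moving_var = mx.nd.ones((C,), ctx=ctx)

def run(iters):
    for _ in range(iters):
        mx.nd.BatchNorm(x, gamma, beta, moving_mean, moving_var, fix_gamma=False)
    mx.nd.waitall()                                   # block until the async GPU work finishes

run(50)                                               # warm-up
start = time.time()
run(500)
print("BatchNorm forward: %.3f ms/iter" % ((time.time() - start) / 500 * 1e3))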

@cjolivier01
Member

Is it known which part of the commit is the problem?
Are the performance characteristics of thread_local known for the supported platforms?

@zheng-da
Contributor

@Caenorst can you test it again? I measured the perf on a p3 instance. PR #10116 should improve the perf by about 3% for your test case.

@vandanavk
Contributor

@Caenorst did @zheng-da's PR improve the performance on your setup?

@vrakesh
Contributor

vrakesh commented Nov 27, 2018

@Caenorst Has the performance loss been resolved since @zheng-da's PR? If so, requesting to close the issue.

@kalyc
Contributor

kalyc commented Dec 10, 2018

@lanking520 requesting to close this issue due to lack of activity

@lanking520
Member

@Caenorst Please feel free to reopen this issue if you are still facing this failure. Closing it for now.
