[Dependency Update] Bump up cuDNN & NCCL version #15142

stu1130 · 2019-06-04T01:12:48Z

Description

Ran three models: ResNet50 with ImageNet, LSTM with PTB, and MLP with MNIST
Performance shown below
Environment: P3.16xlarge Deep Learning Base AMI
Codebase: commit 1540a84 for CUDA 9/9.2/10; 1540a84 for CUDA 10
I also applied the #14837 PR change
The unit of throughput is samples per second
Each throughput is calculated as the average of 5 runs
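
The averaging described above can be sketched as follows. The per-run values are made-up placeholders, not measurements from this PR, and `mean_throughput` is a hypothetical helper name used only for illustration.

```python
from statistics import mean


def mean_throughput(runs):
    """Return the arithmetic mean of per-run throughputs (samples per second)."""
    if not runs:
        raise ValueError("need at least one run")
    return mean(runs)


# Placeholder values for illustration only; not measurements from this PR.
example_runs = [2800.0, 2810.0, 2805.0, 2795.0, 2815.0]
print(f"{mean_throughput(example_runs):.5f} samples/s")
```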

ResNet

model: ResNet50
dataset: ImageNet
number of gpu: 8
epochs: 3 (only to test throughput)
preprocess command: sudo pip install gluoncv==0.2.0b20180625
command: python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 128 --dtype float32 --num-data-workers 40 --num-epochs 3 --gpus 0,1,2,3,4,5,6,7 --lr 0.05 --last-gamma --mode symbolic --model resnet50_v1b --rec-train /home/ubuntu/data/train-passthrough.rec --rec-train-idx /home/ubuntu/data/train-passthrough.idx --rec-val /home/ubuntu/data/val-passthrough.rec --rec-val-idx /home/ubuntu/data/val-passthrough.idx
github repo: https://github.com/rahul003/deep-learning-benchmark-mirror.git

CUDA + MKLDNN

Throughput Tables	cuDNN 7.6.0/NCCL 2.4.7	cuDNN 7.5.1/NCCL 2.3.4	Performance Difference
CUDA 10.1	2806.35499	2817.18815	-0.385%
CUDA 10	2826.54083	2831.54405	-0.178%
CUDA 9.2	2812.30931	2832.36803	-0.708%
CUDA 9.0	2783.51629	2815.83939	-1.148%
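
The Performance Difference column appears to be the relative change of the newer cuDNN/NCCL throughput against the older one. The formula below is inferred from the table values rather than stated in the PR, and `percent_difference` is a hypothetical helper name.

```python
def percent_difference(new, baseline):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# CUDA 10.1 row of the ResNet table: cuDNN 7.6.0 versus cuDNN 7.5.1.
diff = percent_difference(2806.35499, 2817.18815)
print(f"{diff:.3f}%")  # prints -0.385%
```

A negative value means the newer library versions were slower on that row.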

Reference (only 3 runs)
without MKLDNN

Throughput Tables	cuDNN 7.6.0/NCCL 2.4.2
CUDA 10.1	2832.42231
CUDA 10	2838.54
CUDA 9.2	2838.424
CUDA 9.0	2833.86458

LSTM

model: LSTM
dataset: PTB (Penn Treebank)
number of gpu: 1
epochs: 10
command:
python2 benchmark_driver.py --framework mxnet --task-name mkl_lstm_ptb_symbolic --num-gpus 1 --epochs 10 --metrics-suffix test --kvstore local
python word_language_model/lstm_bucketing.py --num-hidden 650 --num-embed 650 --gpus 0 --epochs 10 --kv-store local

CUDA + MKLDNN

Throughput Tables	cuDNN 7.6.0/NCCL 2.4.2	cuDNN 7.5.1/NCCL 2.3.4	Performance Difference
CUDA 10.1	1018.89083	1015.61785	0.322%
CUDA 10	852.80333	847.98222	0.569%
CUDA 9.2	1011.61122	1005.25185	0.632%
CUDA 9.0	992.34674	1002.59081	-1.021%

CUDA 10 has a performance regression; see #14725 for details.

Reference (only 3 runs)
without MKLDNN

Throughput Tables	cuDNN 7.6.0/NCCL 2.4.2
CUDA 10.1	1010.1654
CUDA 10	846.05572
CUDA 9.2	1007.27178
CUDA 9.0	978.18158

MLP

model: 3 dense layers with num_hidden=64 and relu as activation
dataset: MNIST
number of gpu: 1
epochs: 10
command:
python2 benchmark_runner.py --framework mxnet --metrics-policy mlp --task-name mlp --metrics-suffix test --num-gpus 1 --command-to-execute 'python3 mlp.py' --data-set mnist

CUDA + MKLDNN

Throughput Tables	cuDNN 7.6.0/NCCL 2.4.2	cuDNN 7.5.1/NCCL 2.3.4	Performance Difference
CUDA 10.1	4438.0091	4422.72478	0.346%
CUDA 10	4433.65315	4638.73873	-4.421%
CUDA 9.2	4439.18763	4425.37599	0.312%
CUDA 9.0	4505.45334	4421.82611	1.891%
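
One way to scan a table like this for rows that stand out is sketched below. The 2% threshold is an arbitrary assumption chosen for illustration, not a noise band stated in this PR, and `flag_outliers` is a hypothetical helper name.

```python
# Rows copied from the MLP table: label, cuDNN 7.6.0 throughput, cuDNN 7.5.1 throughput.
MLP_TABLE = """\
CUDA 10.1\t4438.0091\t4422.72478
CUDA 10\t4433.65315\t4638.73873
CUDA 9.2\t4439.18763\t4425.37599
CUDA 9.0\t4505.45334\t4421.82611
"""

THRESHOLD_PCT = 2.0  # assumed cutoff, not taken from the PR


def flag_outliers(table, threshold_pct=THRESHOLD_PCT):
    """Return (label, percent difference) for rows whose change exceeds the threshold."""
    flagged = []
    for line in table.strip().splitlines():
        label, new, old = line.split("\t")
        diff = (float(new) - float(old)) / float(old) * 100.0
        if abs(diff) > threshold_pct:
            flagged.append((label, round(diff, 3)))
    return flagged


print(flag_outliers(MLP_TABLE))
```

With this cutoff only the CUDA 10 row is flagged; a different threshold would give a different selection.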

Reference (only 3 runs)
without MKLDNN

Throughput Tables	cuDNN 7.6.0/NCCL 2.4.2
CUDA 10.1	4515.74059
CUDA 10	4349.40602
CUDA 9.2	4492.37239
CUDA 9.0	4211.6375

Comments

@szha @lanking520

piyushghai · 2019-06-04T20:13:52Z

@stu1130 Can you look into the CI failures?

@mxnet-label-bot Add[pr-awaiting-review, Backend]

* bump up cudnn version * downgrade tensorRT to 7.5 * bump up NCCL 2.4.7

stu1130 requested a review from szha as a code owner June 4, 2019 01:12

stu1130 changed the title ~~bump up cudnn version~~ [Dependency Update] bump up cudnn version Jun 4, 2019

stu1130 changed the title ~~[Dependency Update] bump up cudnn version~~ [Dependency Update] Bump up cudnn version Jun 4, 2019

stu1130 force-pushed the bump_up_cudnn branch from 8dc6b48 to ef20ff2 Compare June 4, 2019 05:49

marcoabreu added Backend Issues related to the backend of MXNet pr-awaiting-review PR is waiting for code review labels Jun 4, 2019

stu1130 changed the title ~~[Dependency Update] Bump up cudnn version~~ [Dependency Update] Bump up cuDNN & NCCL version Jun 13, 2019

stu1130 changed the title ~~[Dependency Update] Bump up cuDNN & NCCL version~~ [WIP][Dependency Update] Bump up cuDNN & NCCL version Jun 13, 2019

stu1130 force-pushed the bump_up_cudnn branch from 58e25ec to b287eb7 Compare June 14, 2019 20:08

stu1130 changed the title ~~[WIP][Dependency Update] Bump up cuDNN & NCCL version~~ [Dependency Update] Bump up cuDNN & NCCL version Jun 16, 2019

stu1130 added 3 commits June 15, 2019 17:15

bump up cudnn version

29d3efd

downgrade tensorRT to 7.5

5288936

bump up NCCL 2.4.7

eb78eac

stu1130 force-pushed the bump_up_cudnn branch from b287eb7 to eb78eac Compare June 16, 2019 00:15

szha merged commit c4ea674 into apache:master Jun 16, 2019

stu1130 deleted the bump_up_cudnn branch June 16, 2019 20:27

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019

[Dependency Update] Bump up cuDNN & NCCL version (apache#15142)

3ed6d74

* bump up cudnn version * downgrade tensorRT to 7.5 * bump up NCCL 2.4.7
