From 06aaaf83482c6b88fe695461605a487d7d7cb048 Mon Sep 17 00:00:00 2001
From: Da Zheng
Date: Wed, 2 May 2018 10:40:59 -0700
Subject: [PATCH] update perf. (#10761)

---
 docs/faq/perf.md | 231 ++++++++++++++++++++++++-----------------------
 1 file changed, 119 insertions(+), 112 deletions(-)

diff --git a/docs/faq/perf.md b/docs/faq/perf.md
index b5d73f69a03c..ce74391122d1 100644
--- a/docs/faq/perf.md
+++ b/docs/faq/perf.md
@@ -29,65 +29,70 @@ Note that _MXNet_ treats all CPUs on a single machine as a single device.
 So whether you specify `cpu(0)` or `cpu()`, _MXNet_ will use all CPU cores on the machine.
 
 ### Scoring results
-The following table shows performance,
+The following table shows performance of [MXNet-1.2.0.rc1](https://github.com/apache/incubator-mxnet/releases/download/1.2.0.rc1/apache-mxnet-src-1.2.0.rc1-incubating.tar.gz),
 namely number of images that can be predicted per second.
 We used [example/image-classification/benchmark_score.py](https://github.com/dmlc/mxnet/blob/master/example/image-classification/benchmark_score.py)
 to measure the performance on different AWS EC2 machines.
 
-AWS EC2 C4.8xlarge:
-
-| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
-| --- | --- | --- | --- | --- | --- | --- |
-| 1 | 119.57 | 34.23 | 111.36 | 54.42 | 42.83 | 19.51 |
-| 2 | 210.58 | 51.63 | 137.10 | 67.30 | 57.54 | 23.56 |
-| 4 | 318.54 | 70.00 | 187.21 | 76.53 | 63.64 | 25.80 |
-| 8 | 389.34 | 77.39 | 211.90 | 84.26 | 63.89 | 28.11 |
-| 16 | 489.12 | 85.26 | 220.52 | 82.00 | 63.93 | 27.08 |
-| 32 | 564.04 | 87.15 | 208.21 | 83.05 | 62.19 | 25.76 |
-
-AWS EC2 C4.4xlarge:
-
-| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
-| --- | --- | --- | --- | --- | --- | --- |
-| 1 | 109.96 | 23.00 | 71.82 | 28.10 | 30.66 | 11.81 |
-| 2 | 124.56 | 24.86 | 81.61 | 31.32 | 32.73 | 12.82 |
-| 4 | 157.01 | 26.60 | 86.77 | 32.94 | 33.32 | 13.16 |
-| 8 | 178.40 | 30.67 | 88.58 | 33.52 | 33.32 | 13.32 |
-| 16 | 189.52 | 35.61 | 90.36 | 33.63 | 32.94 | 13.18 |
-| 32 | 196.61 | 38.98 | 105.27 | 33.77 | 32.65 | 13.00 |
-
-AWS EC2 C4.2xlarge:
-
-| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
-| --- | --- | --- | --- | --- | --- | --- |
-| 1 | 70.75 | 12.87 | 42.86 | 16.53 | 18.14 | 7.01 |
-| 2 | 71.53 | 13.08 | 45.66 | 17.38 | 18.53 | 7.18 |
-| 4 | 84.72 | 15.38 | 47.50 | 17.80 | 18.96 | 7.35 |
-| 8 | 93.44 | 18.33 | 48.08 | 17.93 | 18.99 | 7.40 |
-| 16 | 97.03 | 20.12 | 55.73 | 18.00 | 18.91 | 7.36 |
-| 32 | 113.90 | 21.10 | 62.54 | 17.98 | 18.80 | 7.33 |
-
-AWS EC2 C4.xlarge:
-
-| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
-| --- | --- | --- | --- | --- | --- | --- |
-| 1 | 37.92 | 6.57 | 23.09 | 8.79 | 9.65 | 3.73 |
-| 2 | 36.77 | 7.31 | 24.00 | 9.00 | 9.84 | 3.78 |
-| 4 | 43.18 | 8.94 | 24.42 | 9.12 | 9.91 | 3.83 |
-| 8 | 47.05 | 10.01 | 28.32 | 9.13 | 9.88 | 3.83 |
-| 16 | 55.74 | 10.61 | 31.96 | 9.14 | 9.86 | 3.80 |
-| 32 | 65.05 | 10.91 | 33.86 | 9.34 | 10.31 | 3.86 |
-
-AWS EC2 C4.large:
-
-| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
-| --- | --- | --- | --- | --- | --- | --- |
-| 1 | 19.86 | 3.67 | 12.20 | 4.59 | 5.11 | 1.97 |
-| 2 | 19.37 | 4.24 | 12.41 | 4.64 | 5.15 | 1.98 |
-| 4 | 22.64 | 4.89 | 14.34 | 4.66 | 5.16 | 2.00 |
-| 8 | 27.19 | 5.25 | 16.17 | 4.66 | 5.16 | 1.99 |
-| 16 | 31.82 | 5.46 | 17.24 | 4.76 | 5.35 | OOM |
-| 32 | 34.67 | 5.55 | 17.64 | 4.88 | OOM | OOM |
+AWS EC2 C5.18xlarge:
+
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|--------|--------------|--------------|-----------|------------|
+| 1 | 390.53 | 81.57 | 124.13 | 62.26 | 76.22 | 32.92 |
+| 2 | 596.45 | 100.84 | 206.58 | 93.36 | 119.55 | 46.80 |
+| 4 | 710.77 | 119.04 | 275.55 | 127.86 | 148.62 | 59.36 |
+| 8 | 921.40 | 120.38 | 380.82 | 157.11 | 167.95 | 70.78 |
+| 16 | 1018.43 | 115.30 | 411.67 | 168.71 | 178.54 | 75.13 |
+| 32 | 1290.31 | 107.19 | 483.34 | 179.38 | 193.47 | 85.86 |
+
+
+AWS EC2 C5.9xlarge:
+
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|-------|--------------|--------------|-----------|------------|
+| 1 | 257.77 | 50.61 | 130.99 | 66.95 | 75.38 | 32.33 |
+| 2 | 410.60 | 63.02 | 195.14 | 87.84 | 102.67 | 41.57 |
+| 4 | 462.59 | 62.64 | 263.15 | 109.87 | 127.15 | 50.69 |
+| 8 | 573.79 | 63.95 | 309.99 | 121.36 | 140.84 | 59.01 |
+| 16 | 709.47 | 67.79 | 350.19 | 128.26 | 147.41 | 64.15 |
+| 32 | 831.46 | 69.58 | 354.91 | 129.92 | 149.18 | 64.25 |
+
+
+AWS EC2 C5.4xlarge:
+
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|-------|--------------|--------------|-----------|------------|
+| 1 | 214.15 | 29.32 | 114.97 | 47.96 | 61.01 | 23.92 |
+| 2 | 310.04 | 34.81 | 150.09 | 60.89 | 71.16 | 27.92 |
+| 4 | 330.69 | 34.56 | 186.63 | 74.15 | 86.86 | 34.37 |
+| 8 | 378.88 | 35.46 | 204.89 | 77.05 | 91.10 | 36.93 |
+| 16 | 424.00 | 36.49 | 211.55 | 78.39 | 91.23 | 37.34 |
+| 32 | 481.95 | 37.23 | 213.71 | 78.23 | 91.68 | 37.26 |
+
+
+AWS EC2 C5.2xlarge:
+
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|-------|--------------|--------------|-----------|------------|
+| 1 | 131.01 | 15.67 | 78.75 | 31.12 | 37.30 | 14.75 |
+| 2 | 182.29 | 18.01 | 98.59 | 39.13 | 45.98 | 17.84 |
+| 4 | 189.31 | 18.25 | 110.26 | 41.35 | 49.21 | 19.32 |
+| 8 | 211.75 | 18.57 | 115.46 | 42.53 | 49.98 | 19.81 |
+| 16 | 236.06 | 19.11 | 117.18 | 42.59 | 50.20 | 19.92 |
+| 32 | 261.13 | 19.46 | 116.20 | 42.72 | 49.95 | 19.80 |
+
+
+AWS EC2 C5.xlarge:
+
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|------|--------------|--------------|-----------|------------|
+| 1 | 36.64 | 3.93 | 27.06 | 10.09 | 12.98 | 5.06 |
+| 2 | 49.21 | 4.49 | 29.67 | 10.80 | 12.94 | 5.14 |
+| 4 | 50.12 | 4.50 | 30.31 | 10.83 | 13.17 | 5.19 |
+| 8 | 54.71 | 4.58 | 30.22 | 10.89 | 13.19 | 5.20 |
+| 16 | 60.23 | 4.70 | 30.20 | 10.91 | 13.23 | 5.19 |
+| 32 | 66.37 | 4.76 | 30.10 | 10.90 | 13.22 | 5.15 |
+
 
 ## Other CPU
 
@@ -101,88 +106,90 @@ We suggest always checking to make sure that a recent cuDNN version is used.
 
 Setting the environment `export MXNET_CUDNN_AUTOTUNE_DEFAULT=1` sometimes also helps.
 
-We show results when using various GPUs including K80 (EC2 p2.2xlarge), M40,
-and P100 (DGX-1).
+We show results when using various GPUs including K80 (EC2 p2.2xlarge), M60 (EC2 g3.4xlarge),
+and V100 (EC2 p3.2xlarge).
 
 ### Scoring results
 
 Based on
 [example/image-classification/benchmark_score.py](https://github.com/dmlc/mxnet/blob/master/example/image-classification/benchmark_score.py)
-and MXNet commit `0a03417`, with cuDNN 5.1
+and [MXNet-1.2.0.rc1](https://github.com/apache/incubator-mxnet/releases/download/1.2.0.rc1/apache-mxnet-src-1.2.0.rc1-incubating.tar.gz), with cuDNN 7.0.5
 
 - K80 (single GPU)
 
- | Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
- | --- | --- | --- | --- | --- | --- | --- |
- | 1 | 202.66 | 70.76 | 74.91 | 42.61 | 70.94 | 24.87 |
- | 2 | 233.76 | 63.53 | 119.60 | 60.09 | 92.28 | 34.23 |
- | 4 | 367.91 | 78.16 | 164.41 | 72.30 | 116.68 | 44.76 |
- | 8 | 624.14 | 119.06 | 195.24 | 79.62 | 129.37 | 50.96 |
- | 16 | 1071.19 | 195.83 | 256.06 | 99.38 | 160.40 | 66.51 |
- | 32 | 1443.90 | 228.96 | 287.93 | 106.43 | 167.12 | 69.73 |
-
-- M40
-
- | Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
- | --- | --- | --- | --- | --- | --- | --- |
- | 1 | 412.09 | 142.10 | 115.89 | 64.40 | 126.90 | 46.15 |
- | 2 | 743.49 | 212.21 | 205.31 | 108.06 | 202.17 | 75.05 |
- | 4 | 1155.43 | 280.92 | 335.69 | 161.59 | 266.53 | 106.83 |
- | 8 | 1606.87 | 332.76 | 491.12 | 224.22 | 317.20 | 128.67 |
- | 16 | 2070.97 | 400.10 | 618.25 | 251.87 | 335.62 | 134.60 |
- | 32 | 2694.91 | 466.95 | 624.27 | 258.59 | 373.35 | 152.71 |
-
-- P100
-
- | Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
- | --- | --- | --- | --- | --- | --- | --- |
- | 1 | 624.84 | 294.6 | 139.82 | 80.17 | 162.27 | 58.99 |
- | 2 | 1226.85 | 282.3 | 267.41 | 142.63 | 278.02 | 102.95 |
- | 4 | 1934.97 | 399.3 | 463.38 | 225.56 | 423.63 | 168.91 |
- | 8 | 2900.54 | 522.9 | 709.30 | 319.52 | 529.34 | 210.10 |
- | 16 | 4063.70 | 755.3 | 949.22 | 444.65 | 647.43 | 270.07 |
- | 32 | 4883.77 | 854.4 | 1197.74 | 493.72 | 713.17 | 294.17 |
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|--------|--------------|--------------|-----------|------------|
+| 1 | 243.93 | 43.59 | 68.62 | 35.52 | 67.41 | 23.65 |
+| 2 | 338.16 | 49.14 | 113.41 | 56.29 | 93.35 | 33.88 |
+| 4 | 478.92 | 53.44 | 159.61 | 74.43 | 119.18 | 45.23 |
+| 8 | 683.52 | 70.50 | 190.49 | 86.23 | 131.32 | 50.54 |
+| 16 | 1004.66 | 109.01 | 254.20 | 105.70 | 155.40 | 62.55 |
+| 32 | 1238.55 | 114.98 | 285.49 | 116.79 | 159.42 | 64.99 |
+
+- M60
+
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|--------|--------------|--------------|-----------|------------|
+| 1 | 243.49 | 59.95 | 101.97 | 48.30 | 95.46 | 39.29 |
+| 2 | 491.04 | 69.14 | 170.35 | 80.27 | 142.61 | 60.17 |
+| 4 | 711.54 | 78.94 | 257.89 | 123.09 | 182.36 | 76.51 |
+| 8 | 1077.73 | 109.34 | 343.42 | 152.82 | 208.74 | 87.27 |
+| 16 | 1447.21 | 144.93 | 390.25 | 166.32 | 220.73 | 92.41 |
+| 32 | 1797.66 | 151.86 | 416.69 | 176.56 | 230.19 | 97.03 |
+
+
+- V100
+
+| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
+|-------|---------|--------|--------------|--------------|-----------|------------|
+| 1 | 659.51 | 205.16 | 136.91 | 76.54 | 162.15 | 61.38 |
+| 2 | 1248.21 | 265.40 | 261.85 | 144.23 | 293.74 | 116.30 |
+| 4 | 2122.41 | 333.97 | 477.22 | 270.03 | 479.14 | 195.17 |
+| 8 | 3894.30 | 420.26 | 831.09 | 450.68 | 699.39 | 294.19 |
+| 16 | 5815.58 | 654.16 | 1332.26 | 658.97 | 947.45 | 398.79 |
+| 32 | 7906.09 | 708.43 | 1784.23 | 817.33 | 1076.81 | 451.82 |
+
 
 ### Training results
 
 Based on
 [example/image-classification/train_imagenet.py](https://github.com/dmlc/mxnet/blob/master/example/image-classification/train_imagenet.py)
-and MXNet commit `0a03417`, with CUDNN 5.1. The benchmark script is available at
+and [MXNet-1.2.0.rc1](https://github.com/apache/incubator-mxnet/releases/download/1.2.0.rc1/apache-mxnet-src-1.2.0.rc1-incubating.tar.gz), with CUDNN 7.0.5. The benchmark script is available at
 [here](https://github.com/mli/mxnet-benchmark/blob/master/run_vary_batch.sh),
-where the batch size for Alexnet is increased by 8x.
+where the batch size for Alexnet is increased by 16x.
 
 - K80 (single GPU)
 
  | Batch | Alexnet(\*8) | Inception-v3 | Resnet 50 |
  | --- | --- | --- | --- |
- | 1 | 230.69 | 9.81 | 13.83 |
- | 2 | 348.10 | 15.31 | 21.85 |
- | 4 | 457.28 | 20.48 | 29.58 |
- | 8 | 533.51 | 24.47 | 36.83 |
- | 16 | 582.36 | 28.46 | 43.60 |
- | 32 | 483.37 | 29.62 | 45.52 |
+ | 1 | 300.30 | 10.48 | 15.61 |
+ | 2 | 406.08 | 16.00 | 23.88 |
+ | 4 | 461.01 | 22.10 | 32.26 |
+ | 8 | 484.00 | 26.80 | 39.42 |
+ | 16 | 490.45 | 31.62 | 46.69 |
+ | 32 | 414.72 | 33.78 | 49.48 |
 
-- M40
+- M60
 
- | Batch | Alexnet(\*8) | Inception-v3 | Resnet 50 |
+ | Batch | Alexnet(\*16) | Inception-v3 | Resnet 50 |
  | --- | --- | --- | --- |
- | 1 | 405.17 | 14.35 | 21.56 |
- | 2 | 606.32 | 23.96 | 36.48 |
- | 4 | 792.66 | 37.38 | 52.96 |
- | 8 | 1016.51 | 52.69 | 70.21 |
- | 16 | 1105.18 | 62.35 | 83.13 |
- | 32 | 1046.23 | 68.87 | 90.74 |
+ | 1 | 380.96 | 14.06 | 20.55 |
+ | 2 | 530.53 | 21.90 | 32.65 |
+ | 4 | 600.17 | 31.96 | 45.57 |
+ | 8 | 633.60 | 40.58 | 54.92 |
+ | 16 | 639.37 | 46.88 | 64.44 |
+ | 32 | 576.54 | 50.05 | 68.34 |
 
-- P100
+- V100
 
- | Batch | Alexnet(\*8) | Inception-v3 | Resnet 50 |
+ | Batch | Alexnet(\*16) | Inception-v3 | Resnet 50 |
  | --- | --- | --- | --- |
- | 1 | 809.94 | 15.14 | 27.20 |
- | 2 | 1202.93 | 30.34 | 49.55 |
- | 4 | 1631.37 | 50.59 | 78.31 |
- | 8 | 1882.74 | 77.75 | 122.45 |
- | 16 | 2012.04 | 111.11 | 156.79 |
- | 32 | 1869.69 | 129.98 | 181.53 |
+ | 1 | 1629.52 | 21.83 | 34.54 |
+ | 2 | 2359.73 | 40.11 | 65.01 |
+ | 4 | 2687.89 | 72.79 | 113.49 |
+ | 8 | 2919.02 | 118.43 | 174.81 |
+ | 16 | 2994.32 | 173.15 | 251.22 |
+ | 32 | 2585.61 | 214.48 | 298.51 |
 
 ## Multiple Devices