MXNet benchmark

This article presents the results of benchmarking MXNet on various GPUs and compares the cost of data processing on AWS and LeaderGPU.

The tables below show the performance test results as throughput: the number of images processed per second.

Scoring results

We used the official MXNet benchmark script, benchmark_score.py, with cuDNN 6.0.

The results are as follows:
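As a rough illustration of what a throughput figure of this kind measures, the sketch below times repeated forward passes after a few warm-up iterations and divides the number of processed images by the elapsed time. It is not the benchmark script itself: `images_per_second`, `forward`, `warmup` and `clock` are hypothetical names, and `forward` stands in for a real network's inference call.

```python
import time


def images_per_second(forward, batch_size, num_batches=100, warmup=5,
                      clock=time.perf_counter):
    """Return throughput in images/sec for a callable that processes one batch.

    `forward` is a placeholder for a real inference call (an assumption made
    for this sketch); it receives the batch size and its result is ignored.
    """
    # Warm-up iterations are excluded from the measurement.
    for _ in range(warmup):
        forward(batch_size)

    start = clock()
    for _ in range(num_batches):
        forward(batch_size)
    elapsed = clock() - start

    return num_batches * batch_size / elapsed


if __name__ == "__main__":
    # Dummy workload standing in for a network forward pass.
    def dummy_forward(batch_size):
        return sum(i * i for i in range(batch_size * 1000))

    for batch in (1, 2, 4, 8, 16, 32):
        print(batch, round(images_per_second(dummy_forward, batch), 2))
```

Passing the clock as a parameter keeps the function easy to check with a fake timer.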

Ltbv 14 (GTX 1080 single):

Batch Alexnet VGG Inception-BN Inception-v3 Resnet 50 Resnet 152
1 484.21 193.85 168.71 81.41 169.28 68.38
2 829.39 188.82 301.96 143.85 257.01 106.14
4 1255.73 279.13 472.93 207.13 335.21 141.79
8 2103.98 361.24 653.59 306.01 391.55 166.75
16 2531.79 467.27 765.40 312.82 429.70 182.00
32 3295.19 525.39 826.41 345.28 453.44 191.84

Ltbv 19 (GTX 1080TI single):

Batch Alexnet VGG Inception-BN Inception-v3 Resnet 50 Resnet 152
1 567.68 275.97 161.42 84.16 184.49 66.96
2 1087.14 193.07 317.53 150.89 322.72 123.60
4 1556.19 314.09 558.75 244.64 465.17 197.84
8 2543.31 611.98 846.34 362.18 578.75 249.68
16 4033.49 757.19 1101.56 478.16 660.71 279.48
32 5435.72 827.58 1216.74 506.96 697.81 291.98

Ltbv 20 (Tesla P100 single):

Batch Alexnet VGG Inception-BN Inception-v3 Resnet 50 Resnet 152
1 540.79 273.88 122.57 64.97 140.88 53.96
2 985.37 352.77 249.28 119.37 245.47 94.08
4 1570.85 478.42 424.92 195.56 374.74 155.38
8 2556.58 586.46 646.51 285.70 489.39 197.19
16 4162.64 819.96 899.54 392.91 609.39 255.19
32 5565.39 889.52 1116.51 468.03 682.52 279.87

Below are the test results for the K80 (AWS EC2 p2.xlarge), M40 and P100 (DGX-1) GPUs.

The same official MXNet benchmark, benchmark_score.py, was used here.

These results are quoted from the official MXNet web page.

The results are as follows:

K80 (single GPU):

Batch Alexnet VGG Inception-BN Inception-v3 Resnet 50 Resnet 152
1 202.66 70.76 74.91 42.61 70.94 24.87
2 233.76 63.53 119.60 60.09 92.28 34.23
4 367.91 78.16 164.41 72.30 116.68 44.76
8 624.14 119.06 195.24 79.62 129.37 50.96
16 1071.19 195.83 256.06 99.38 160.40 66.51
32 1443.90 228.96 287.93 106.43 167.12 69.73

M40 (single GPU):

Batch Alexnet VGG Inception-BN Inception-v3 Resnet 50 Resnet 152
1 412.09 142.10 115.89 64.40 126.90 46.15
2 743.49 212.21 205.31 108.06 202.17 75.05
4 1155.43 280.92 335.69 161.59 266.53 106.83
8 1606.87 332.76 491.12 224.22 317.20 128.67
16 2070.97 400.10 618.25 251.87 335.62 134.60
32 2694.91 466.95 624.27 258.59 373.35 152.71

P100 (single GPU):

Batch Alexnet VGG Inception-BN Inception-v3 Resnet 50 Resnet 152
1 624.84 294.6 139.82 80.17 162.27 58.99
2 1226.85 282.3 267.41 142.63 278.02 102.95
4 1934.97 399.3 463.38 225.56 423.63 168.91
8 2900.54 522.9 709.30 319.52 529.34 210.10
16 4063.70 755.3 949.22 444.65 647.43 270.07
32 4883.77 854.4 1197.74 493.72 713.17 294.17

Next, we turn to pricing. We calculate the time needed to process 1,000,000 images and the cost of doing so, with batch_size 32.

The results are as follows:

GPU          Model         Time        Price (per min)  Total cost
GTX 1080     Alexnet       5m 3sec     0.06 €           0.303 €
GTX 1080 TI  Alexnet       3m 3sec     0.07 €           0.2135 €
P100         Alexnet       2m 55sec    0.08 €           0.23333 €
K80          Alexnet       11m 31sec   0.01 €           0.11517 €
GTX 1080     VGG           31m 43sec   0.06 €           1.903 €
GTX 1080 TI  VGG           20m 8sec    0.07 €           1.40933 €
P100         VGG           18m 40sec   0.08 €           1.49333 €
K80          VGG           72m 50sec   0.01 €           0.72833 €
GTX 1080     Inception-BN  20m 21sec   0.06 €           1.221 €
GTX 1080 TI  Inception-BN  13m 41sec   0.07 €           0.95783 €
P100         Inception-BN  14m 58sec   0.08 €           1.19733 €
K80          Inception-BN  57m 45sec   0.01 €           0.5775 €
GTX 1080     Inception-v3  48m 15sec   0.06 €           2.895 €
GTX 1080 TI  Inception-v3  32m 49sec   0.07 €           2.29717 €
P100         Inception-v3  35m 38sec   0.08 €           2.85067 €
K80          Inception-v3  156m 35sec  0.01 €           1.56583 €
GTX 1080     Resnet 50     36m 46sec   0.06 €           2.206 €
GTX 1080 TI  Resnet 50     23m 53sec   0.07 €           1.67183 €
P100         Resnet 50     24m 27sec   0.08 €           1.956 €
K80          Resnet 50     99m 39sec   0.01 €           0.9965 €
GTX 1080     Resnet 152    86m 50sec   0.06 €           5.21 €
GTX 1080 TI  Resnet 152    57m 5sec    0.07 €           3.99583 €
P100         Resnet 152    59m 32sec   0.08 €           4.76267 €
K80          Resnet 152    239m 1sec   0.01 €           2.39017 €
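
The arithmetic behind the table above can be sketched as follows: the processing time is the number of images divided by the throughput at batch size 32, and the cost is that time in minutes multiplied by the per-minute price. The function names are hypothetical, and the article does not state how times were rounded, so small differences from the published figures are to be expected.

```python
def processing_cost(images_per_sec, price_per_min, num_images=1_000_000):
    """Return (seconds, cost) for processing `num_images` at a given throughput.

    Assumes billing is proportional to elapsed time with no rounding,
    which is a simplification made for this sketch.
    """
    seconds = num_images / images_per_sec
    cost = seconds / 60 * price_per_min
    return seconds, cost


def format_duration(seconds):
    """Format a duration the way the table does, e.g. '5m 3sec'."""
    whole = int(seconds)
    return f"{whole // 60}m {whole % 60}sec"


# GTX 1080, Alexnet, batch size 32: 3295.19 images/sec at 0.06 EUR per minute.
seconds, cost = processing_cost(3295.19, 0.06)
print(format_duration(seconds), f"{cost:.3f} EUR")
```

For the GTX 1080 Alexnet row this gives roughly 303 seconds and about 0.30 €, in line with the first row of the table.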

Training results

This section reviews the results of training networks in MXNet.

These results are based on example/image-classification/train_imagenet.py and cuDNN 6.0. The testing script is also available here. For the Alexnet network, the batch size was multiplied by 8.

Ltbv 14 (GTX 1080 single):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 432.26 11.78 21.21
2 655.14 19.31 34.56
4 989.15 29.61 49.79
8 1167.86 39.83 71.92
16 1343.68 48.72 80.80
32 1407.41 -** 87.93

Ltbv 19 (GTX 1080TI single):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 1068.59 13.75 21.84
2 1341.03 23.20 39.08
4 1573.10 37.93 62.49
8 1770.16 54.98 90.64
16 1850.01 69.26 114.24
32 1729.24 75.57 124.84

Ltbv 20 (Tesla P100 single):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 1138.47 10.60 21.73
2 1462.89 20.29 33.05
4 1717.54 35.05 57.97
8 1914.71 51.05 83.90
16 1977.86 67.17 109.90
32 1754.03 77.74 123.48

** Not enough GPU memory is available for this batch size.

K80 (single GPU):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 230.69 9.81 13.83
2 348.10 15.31 21.85
4 457.28 20.48 29.58
8 533.51 24.47 36.83
16 582.36 28.46 43.60
32 483.37 29.62 45.52

M40 (single GPU):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 405.17 14.35 21.56
2 606.32 23.96 36.48
4 792.66 37.38 52.96
8 1016.51 52.69 70.21
16 1105.18 62.35 83.13
32 1046.23 68.87 90.74

P100 (single GPU):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 809.94 15.14 27.20
2 1202.93 30.34 49.55
4 1631.37 50.59 78.31
8 1882.74 77.75 122.45
16 2012.04 111.11 156.79
32 1869.69 129.98 181.53

Training results on Multiple Devices

This section analyzes the data collected from testing MXNet with several GPUs on LeaderGPU instances.

Ltbv 14 (4x GTX 1080):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 98.30 -* -*
2 193.09 -* -*
4 384.72 26.76 48.76
8 723.01 46.96 88.99
16 13341.50 68.90 155.29
32 1839.47 93.37 236.57

Ltbv 19 (4x GTX 1080TI):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 126.66 -* -*
2 217.72 -* -*
4 422.59 30.22 49.37
8 768.48 56.97 94.79
16 1599.70 103.13 165.64
32 2973.28 172.37 275.70

Ltbv 20 (2x Tesla P100):

Batch Alexnet(*8) Resnet 50 Inception-v3
1 465.45 -* -*
2 637.42 18.47 15.76
4 1002.77 33.48 28.98
8 1857.60 63.66 46.13
16 2755.08 93.42 63.84
32 3500.40 129.25 78.66

* Too many slices, so some splits are empty: the batch is smaller than the number of GPUs it has to be divided across.
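
To show why the smallest batch sizes have no result on multi-GPU instances, here is a minimal sketch of splitting one batch across devices. It is not MXNet's own implementation: `split_batch` is a hypothetical helper, and it assumes the failure occurs whenever the batch holds fewer samples than there are GPUs, which is consistent with the tables above.

```python
def split_batch(batch_size, num_devices):
    """Return the number of samples each device receives for one batch.

    Raises ValueError when at least one device would receive no samples,
    mirroring the "too many slices" situation from the footnote.
    """
    if batch_size < num_devices:
        raise ValueError("Too many slices. Some splits are empty.")
    base, extra = divmod(batch_size, num_devices)
    # The first `extra` devices take one additional sample each.
    return [base + (1 if i < extra else 0) for i in range(num_devices)]


print(split_batch(8, 4))   # Alexnet(*8) at nominal batch 1 on 4 GPUs
print(split_batch(4, 4))   # Resnet 50 at batch 4 on 4 GPUs

try:
    split_batch(2, 4)      # Resnet 50 at batch 2 on 4 GPUs
except ValueError as err:
    print("error:", err)
```

Under this assumption the Alexnet column always has a value, because its batch size is multiplied by 8 and therefore never drops below the number of GPUs.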