Mxnet benchmark

Mxnet™ benchmark with LeaderGPU® servers

Attention: under the newly amended License for Customer Use of Nvidia® GeForce® Software, the GeForce GPUs presented in this benchmark (GTX 1080, GTX 1080 TI) cannot be used in a datacenter for training neural networks; the only datacenter use the license permits is blockchain processing.

The purpose of this article is to present the results of testing mxnet™ on various GPUs and to compare the cost of data processing on AWS and LeaderGPU®.

The tables below show the performance test results, namely the number of images processed per second.

Scoring results

We used the official mxnet™ benchmark script, benchmark_score.py, with cuDNN 6.0.

The results are as follows:
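As a rough illustration of the metric reported in this article, images per second can be derived from the batch size, the number of timed batches and the elapsed wall-clock time. The sketch below is not the code of benchmark_score.py; `run_batch`, `measure_throughput` and the warm-up count are hypothetical names and choices made only for this example.

```python
import time


def images_per_second(batch_size, num_batches, elapsed_seconds):
    """Throughput = images processed / wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return batch_size * num_batches / elapsed_seconds


def measure_throughput(run_batch, batch_size, num_batches=20, warmup=5,
                       clock=time.perf_counter):
    """Time `num_batches` calls of a hypothetical `run_batch` callable.

    `run_batch` stands in for one forward pass over `batch_size` images.
    It is an assumption of this sketch, not an MXNet API.
    """
    for _ in range(warmup):      # untimed warm-up iterations
        run_batch()
    start = clock()
    for _ in range(num_batches):  # timed iterations
        run_batch()
    elapsed = clock() - start
    return images_per_second(batch_size, num_batches, elapsed)
```

On a GPU, `run_batch` has to block until the computation has actually finished; otherwise only the time needed to launch the work is measured.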

Ltbv 14 (GTX 1080 single):

Batch	Alexnet	VGG	Inception-BN	Inception-v3	Resnet 50	Resnet 152
1	484.21	193.85	168.71	81.41	169.28	68.38
2	829.39	188.82	301.96	143.85	257.01	106.14
4	1255.73	279.13	472.93	207.13	335.21	141.79
8	2103.98	361.24	653.59	306.01	391.55	166.75
16	2531.79	467.27	765.40	312.82	429.70	182.00
32	3295.19	525.39	826.41	345.28	453.44	191.84

Ltbv 19 (GTX 1080TI single):

Batch	Alexnet	VGG	Inception-BN	Inception-v3	Resnet 50	Resnet 152
1	567.68	275.97	161.42	84.16	184.49	66.96
2	1087.14	193.07	317.53	150.89	322.72	123.60
4	1556.19	314.09	558.75	244.64	465.17	197.84
8	2543.31	611.98	846.34	362.18	578.75	249.68
16	4033.49	757.19	1101.56	478.16	660.71	279.48
32	5435.72	827.58	1216.74	506.96	697.81	291.98

Ltbv 20 (Tesla P100 single):

Batch	Alexnet	VGG	Inception-BN	Inception-v3	Resnet 50	Resnet 152
1	540.79	273.88	122.57	64.97	140.88	53.96
2	985.37	352.77	249.28	119.37	245.47	94.08
4	1570.85	478.42	424.92	195.56	374.74	155.38
8	2556.58	586.46	646.51	285.70	489.39	197.19
16	4162.64	819.96	899.54	392.91	609.39	255.19
32	5565.39	889.52	1116.51	468.03	682.52	279.87

Below are the test results for the AWS K80 (EC2 p2.xlarge), M40 and P100 (DGX-1) GPUs.

These tests also use the official mxnet™ benchmark script, benchmark_score.py.

The results are quoted from the official mxnet™ web page.

The results are as follows:

K80 (single GPU):

Batch	Alexnet	VGG	Inception-BN	Inception-v3	Resnet 50	Resnet 152
1	202.66	70.76	74.91	42.61	70.94	24.87
2	233.76	63.53	119.60	60.09	92.28	34.23
4	367.91	78.16	164.41	72.30	116.68	44.76
8	624.14	119.06	195.24	79.62	129.37	50.96
16	1071.19	195.83	256.06	99.38	160.40	66.51
32	1443.90	228.96	287.93	106.43	167.12	69.73

M40 (single GPU):

Batch	Alexnet	VGG	Inception-BN	Inception-v3	Resnet 50	Resnet 152
1	412.09	142.10	115.89	64.40	126.90	46.15
2	743.49	212.21	205.31	108.06	202.17	75.05
4	1155.43	280.92	335.69	161.59	266.53	106.83
8	1606.87	332.76	491.12	224.22	317.20	128.67
16	2070.97	400.10	618.25	251.87	335.62	134.60
32	2694.91	466.95	624.27	258.59	373.35	152.71

P100 (single GPU):

Batch	Alexnet	VGG	Inception-BN	Inception-v3	Resnet 50	Resnet 152
1	624.84	294.6	139.82	80.17	162.27	58.99
2	1226.85	282.3	267.41	142.63	278.02	102.95
4	1934.97	399.3	463.38	225.56	423.63	168.91
8	2900.54	522.9	709.30	319.52	529.34	210.10
16	4063.70	755.3	949.22	444.65	647.43	270.07
32	4883.77	854.4	1197.74	493.72	713.17	294.17
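The throughput figures in the tables above can be compared directly by dividing each one by a chosen baseline. The sketch below does this for Resnet 50 at batch size 32, with the K80 as the baseline. The numbers are copied from the tables; the dictionary labels and the helper `speedup_over` are names chosen only for this example.

```python
# Images per second for Resnet 50 at batch size 32, copied from the tables above.
RESNET50_BATCH32 = {
    "GTX 1080": 453.44,
    "GTX 1080 TI": 697.81,
    "Tesla P100 (LeaderGPU)": 682.52,
    "K80": 167.12,
    "M40": 373.35,
    "P100 (DGX-1)": 713.17,
}


def speedup_over(baseline, throughputs):
    """Return each GPU's throughput relative to the baseline GPU."""
    base = throughputs[baseline]
    return {gpu: value / base for gpu, value in throughputs.items()}


if __name__ == "__main__":
    ratios = speedup_over("K80", RESNET50_BATCH32)
    # Print the fastest GPU first.
    for gpu, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{gpu:<24} {ratio:.2f}x")
```

The same helper works for any other network or batch size once the corresponding row is copied in.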

Next, we turn to pricing: the time and cost of processing 1,000,000 images with a batch size of 32.

The results are as follows:

GPU	Model	Time	Price (per min)	Total cost
GTX 1080	Alexnet	5m 3sec	0.06 €	0.303 €
GTX 1080 TI	Alexnet	3m 3sec	0.07 €	0.2135 €
P100	Alexnet	2m 55sec	0.08 €	0.23333 €
K80	Alexnet	11m 31sec	0.01 €	0.11517 €
GTX 1080	VGG	31m 43sec	0.06 €	1.903 €
GTX 1080 TI	VGG	20m 8sec	0.07 €	1.40933 €
P100	VGG	18m 40sec	0.08 €	1.49333 €
K80	VGG	72m 50sec	0.01 €	0.72833 €
GTX 1080	Inception-BN	20m 21sec	0.06 €	1.221 €
GTX 1080 TI	Inception-BN	13m 41sec	0.07 €	0.95783 €
P100	Inception-BN	14m 58sec	0.08 €	1.19733 €
K80	Inception-BN	57m 45sec	0.01 €	0.5775 €
GTX 1080	Inception-v3	48m 15sec	0.06 €	2.895 €
GTX 1080 TI	Inception-v3	32m 49sec	0.07 €	2.29717 €
P100	Inception-v3	35m 38sec	0.08 €	2.85067 €
K80	Inception-v3	156m 35sec	0.01 €	1.56583 €
GTX 1080	Resnet 50	36m 46sec	0.06 €	2.206 €
GTX 1080 TI	Resnet 50	23m 53sec	0.07 €	1.67183 €
P100	Resnet 50	24m 27sec	0.08 €	1.956 €
K80	Resnet 50	99m 39sec	0.01 €	0.9965 €
GTX 1080	Resnet 152	86m 50sec	0.06 €	5.21 €
GTX 1080 TI	Resnet 152	57m 5sec	0.07 €	3.99583 €
P100	Resnet 152	59m 32sec	0.08 €	4.76267 €
K80	Resnet 152	239m 1sec	0.01 €	2.39017 €
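The arithmetic behind the pricing table can be sketched as follows: the processing time is the number of images divided by the throughput at batch size 32, and the cost is that time in minutes multiplied by the per-minute price. The function names are chosen only for this example, and the results may differ from the table in the last digits because of rounding.

```python
def processing_cost(images, images_per_second, price_per_minute):
    """Return (seconds, cost) for processing `images` at a given throughput."""
    seconds = images / images_per_second
    cost = seconds / 60.0 * price_per_minute
    return seconds, cost


def format_duration(seconds):
    """Format a duration the way the table does, e.g. '5m 3sec'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}sec"


if __name__ == "__main__":
    # GTX 1080, Alexnet, batch size 32: 3295.19 images/sec at 0.06 EUR per minute.
    seconds, cost = processing_cost(1_000_000, 3295.19, 0.06)
    print(format_duration(seconds), f"{cost:.3f} EUR")
```

For the GTX 1080 running Alexnet this gives about 5 minutes and roughly 0.30 €, in line with the first row of the table.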

Training results

This section reviews the results of training networks in mxnet™.

These results are based on example/image-classification/train_imagenet.py and cuDNN 6.0. The testing script is also available here. For the Alexnet network, the batch size was multiplied by 8.

Ltbv 14 (GTX 1080 single):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	432.26	11.78	21.21
2	655.14	19.31	34.56
4	989.15	29.61	49.79
8	1167.86	39.83	71.92
16	1343.68	48.72	80.80
32	1407.41	-**	87.93

Ltbv 19 (GTX 1080TI single):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	1068.59	13.75	21.84
2	1341.03	23.20	39.08
4	1573.10	37.93	62.49
8	1770.16	54.98	90.64
16	1850.01	69.26	114.24
32	1729.24	75.57	124.84

Ltbv 20 (Tesla P100 single):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	1138.47	10.60	21.73
2	1462.89	20.29	33.05
4	1717.54	35.05	57.97
8	1914.71	51.05	83.90
16	1977.86	67.17	109.90
32	1754.03	77.74	123.48

** The available GPU memory is not enough for this batch size.

K80 (single GPU):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	230.69	9.81	13.83
2	348.10	15.31	21.85
4	457.28	20.48	29.58
8	533.51	24.47	36.83
16	582.36	28.46	43.60
32	483.37	29.62	45.52

M40 (single GPU):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	405.17	14.35	21.56
2	606.32	23.96	36.48
4	792.66	37.38	52.96
8	1016.51	52.69	70.21
16	1105.18	62.35	83.13
32	1046.23	68.87	90.74

P100 (single GPU):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	809.94	15.14	27.20
2	1202.93	30.34	49.55
4	1631.37	50.59	78.31
8	1882.74	77.75	122.45
16	2012.04	111.11	156.79
32	1869.69	129.98	181.53

Training results on Multiple Devices

This section analyzes the results of testing mxnet™ with several GPUs on LeaderGPU® instances.

Ltbv 14 (4x GTX 1080):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	98.30	-*	-*
2	193.09	-*	-*
4	384.72	26.76	48.76
8	723.01	46.96	88.99
16	13341.50	68.90	155.29
32	1839.47	93.37	236.57

Ltbv 19 (4x GTX 1080TI):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	126.66	-*	-*
2	217.72	-*	-*
4	422.59	30.22	49.37
8	768.48	56.97	94.79
16	1599.70	103.13	165.64
32	2973.28	172.37	275.70

Ltbv 20 (2x Tesla® P100):

Batch	Alexnet(*8)	Resnet 50	Inception-v3
1	465.45	-*	-*
2	637.42	18.47	15.76
4	1002.77	33.48	28.98
8	1857.60	63.66	46.13
16	2755.08	93.42	63.84
32	3500.40	129.25	78.66

* Too many slices therefore some splits are empty
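The missing entries follow a pattern: the 4-GPU servers have no Resnet 50 or Inception-v3 results for batch sizes 1 and 2, and the 2-GPU server has none for batch size 1. This is consistent with a batch being divided into one slice per GPU, which leaves some slices empty when the batch is smaller than the number of GPUs. The sketch below illustrates that idea under this assumption; `split_batch` is a hypothetical helper written for this example, not MXNet's own implementation.

```python
def split_batch(batch_size, num_devices):
    """Split range(batch_size) into contiguous slices, one per device.

    Raises ValueError when any device would receive an empty slice.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    slices = []
    for i in range(num_devices):
        # Integer arithmetic keeps the slice boundaries as even as possible.
        begin = i * batch_size // num_devices
        end = (i + 1) * batch_size // num_devices
        if begin >= end:
            raise ValueError("Too many slices: some splits are empty")
        slices.append(slice(begin, end))
    return slices


if __name__ == "__main__":
    for batch_size, gpus in [(4, 4), (2, 4), (2, 2), (1, 2)]:
        try:
            print(batch_size, gpus, split_batch(batch_size, gpus))
        except ValueError as err:
            print(batch_size, gpus, "failed:", err)
```

Under the same assumption, the Alexnet column is never affected, because its batch size is multiplied by 8 and is therefore at least 8 even in the first row.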

LEGAL WARNING:

PLEASE READ THE LICENSE FOR CUSTOMER USE OF NVIDIA® GEFORCE® SOFTWARE CAREFULLY BEFORE AGREEING TO IT, AND MAKE SURE YOU USE THE SOFTWARE IN ACCORDANCE WITH THE LICENSE, THE MOST IMPORTANT PROVISION IN THIS RESPECT BEING THE FOLLOWING LIMITATION OF USE OF THE SOFTWARE IN DATACENTERS:

«No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.»

BY AGREEING TO THE LICENSE AND DOWNLOADING THE SOFTWARE YOU GUARANTEE THAT YOU WILL MAKE CORRECT USE OF THE SOFTWARE AND YOU AGREE TO INDEMNIFY AND HOLD US HARMLESS FROM ANY CLAIMS, DAMAGES OR LOSSES RESULTING FROM ANY INCORRECT USE OF THE SOFTWARE BY YOU.

Updated: 04.01.2026

Published: 07.12.2017
