Tesla TensorFlow

Summary of model test results for image classification on LeaderGPU Tesla servers

LeaderGPU is a new player in the GPU computing market, and it intends to change the rules of the game. The market currently comprises several large players such as Amazon AWS and Google Cloud. However, a large player does not always mean the best offer. Unlike Amazon AWS and Google Cloud, LeaderGPU provides physical servers rather than VPS instances, whose hardware resources may be shared among several dozen users.

Tests were conducted on LeaderGPU Tesla computing systems using synthetic data for the following network models: Inception V3, ResNet-50, ResNet-152, VGG16 and AlexNet. At the end of this article, you will find the results of tests carried out on other models. The synthetic-data tests used a tf.Variable in place of real input images, analogous to the input each model expects when configured for ImageNet.

The following commands were used to run the test:

# git clone https://github.com/tensorflow/benchmarks.git

# python3.5 benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --num_gpus=2 --model alexnet --batch_size 32

The --model value was changed for each network (alexnet, vgg11, vgg16, etc.), and --batch_size was set to 32, 64, 128, 256 or 512.
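The sweep over models and batch sizes implied by the command above could be generated with a short script. This is a minimal sketch, not the procedure the authors necessarily used: the model list is only a subset, the helper name build_commands is hypothetical, and the script only prints the command lines rather than running them.

```python
import itertools
import shlex

# Path and flags are taken from the command shown above.
SCRIPT = "benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py"

# Assumption: an illustrative subset of the models named in this article.
MODELS = ["alexnet", "vgg11", "vgg16"]
BATCH_SIZES = [32, 64, 128, 256, 512]


def build_commands(models=MODELS, batch_sizes=BATCH_SIZES, num_gpus=2):
    """Return one argument list per (model, batch size) combination."""
    commands = []
    for model, batch in itertools.product(models, batch_sizes):
        commands.append([
            "python3.5", SCRIPT,
            "--num_gpus={}".format(num_gpus),
            "--model", model,
            "--batch_size", str(batch),
        ])
    return commands


# Print the commands instead of executing them, so the sketch is safe to run
# on a machine without GPUs or TensorFlow installed.
for cmd in build_commands():
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each argument list can be passed to subprocess.run on a machine where the benchmark repository has been cloned.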

LeaderGPU Tesla instances

Testing environment:
Instance type: 2 x Tesla P100 PCI (ltbv32), 2 x Tesla V100 PCI (ltbv20), 2 x Tesla V100 NVLink (ltbv46)
GPUs: Nvidia Tesla cards
OS: CentOS 7
CUDA / cuDNN: 9.0 / 7.0.5
TensorFlow: 1.7 from repo
Benchmark GitHub hash: 9165a70
Date of testing: 25.04.2018

Options	Inception V3	VGG16	ResNet-50	ResNet-152	Alexnet
Batch size on GPU	64	32	64	32	512
Optimization	sgd	sgd	sgd	sgd	sgd

Testing synthetic data (images / s)

GPUs	InceptionV3	VGG16	ResNet-50	ResNet-152	Alexnet
2x P100	268.24	224.90	446.08	150.04	5252.43
2x PCI V100	430.77	309.82	667.62	213.04	7545.40
2x NVLink V100	450.75	417.22	698.97	236.90	8786.56
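To make the summary table easier to compare, the throughput of each V100 configuration can be expressed relative to the 2x P100 system. The sketch below uses only the figures from the table above; the function name speedup and the choice of 2x P100 as the baseline are illustrative assumptions.

```python
# Throughput in images / s, copied from the summary table above.
MODELS = ["InceptionV3", "VGG16", "ResNet-50", "ResNet-152", "AlexNet"]
THROUGHPUT = {
    "2x P100": [268.24, 224.90, 446.08, 150.04, 5252.43],
    "2x PCI V100": [430.77, 309.82, 667.62, 213.04, 7545.40],
    "2x NVLink V100": [450.75, 417.22, 698.97, 236.90, 8786.56],
}


def speedup(config, baseline="2x P100"):
    """Return throughput of `config` divided by `baseline`, per model."""
    return {
        model: round(value / base, 2)
        for model, value, base in zip(
            MODELS, THROUGHPUT[config], THROUGHPUT[baseline]
        )
    }


for config in ("2x PCI V100", "2x NVLink V100"):
    print(config, speedup(config))
```

With these numbers, the PCI V100 pair is roughly 1.4 to 1.6 times faster than the P100 pair, and the NVLink V100 pair roughly 1.6 to 1.9 times faster, depending on the model.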

Other results

Testing synthetic data (images / s)

2x PCI Tesla P100

Batch size	Alexnet	vgg11	vgg16	vgg19	lenet	googlenet
32	1411.48	378.47	224.90	199.87	14944.76	788.43
64	2460.54	473.82	256.68	225.58	29215.60	913.38
128	3576.26	539.08	278.83	243.67	47375.83	1035.37
256	4545.45	561.73	-	-	67116.75	1127.05
512	5252.43	-	-	-	83665.27	1165.75

Batch size	overfeat	inceptionv3	inception4	resnet50	resnet101	resnet152
32	548.55	248.72	122.22	389.73	220.26	150.04
64	952.51	268.24	133.96	446.08	253.86	176.09
128	1437.54	283.39	-	483.51	-	-
256	1847.21	-	-	-	-	-
512	2186.47	-	-	-	-	-
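The tab-separated layout used in these tables can be loaded into a dictionary for further analysis. The following is a rough sketch: the helper names parse_table and largest_batch are hypothetical, and it assumes that "-" marks a combination for which no result was recorded (the article does not say why).

```python
# First 2x PCI Tesla P100 table from above, with tabs written as \t.
P100_TABLE = (
    "Batch size\tAlexnet\tvgg11\tvgg16\tvgg19\tlenet\tgooglenet\n"
    "32\t1411.48\t378.47\t224.90\t199.87\t14944.76\t788.43\n"
    "64\t2460.54\t473.82\t256.68\t225.58\t29215.60\t913.38\n"
    "128\t3576.26\t539.08\t278.83\t243.67\t47375.83\t1035.37\n"
    "256\t4545.45\t561.73\t-\t-\t67116.75\t1127.05\n"
    "512\t5252.43\t-\t-\t-\t83665.27\t1165.75\n"
)


def parse_table(text):
    """Map batch size -> {model: images/s, or None where the cell is '-'}."""
    lines = text.strip().splitlines()
    models = lines[0].split("\t")[1:]
    table = {}
    for line in lines[1:]:
        cells = line.split("\t")
        table[int(cells[0])] = {
            model: (None if cell == "-" else float(cell))
            for model, cell in zip(models, cells[1:])
        }
    return table


def largest_batch(table, model):
    """Largest batch size with a recorded result for `model`."""
    return max(b for b, row in table.items() if row[model] is not None)


table = parse_table(P100_TABLE)
for model in ("Alexnet", "vgg11", "vgg16"):
    print(model, largest_batch(table, model))
```

The same parser applies unchanged to the V100 tables, since they share the header and the "-" convention.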

2x PCI Tesla V100

Batch size	Alexnet	vgg11	vgg16	vgg19	lenet	googlenet
32	1665.82	526.55	309.82	282.81	17583.47	1268.95
64	3056.89	695.42	374.22	331.41	32271.30	1487.77
128	4660.06	831.39	410.27	360.79	62652.62	1704.92
256	6255.16	729.42	-	-	98828.17	1921.02
512	7545.40	-	-	-	136553.56	2039.60

Batch size	overfeat	inceptionv3	inception4	resnet50	resnet101	resnet152
32	625.35	371.94	186.38	579.01	318.30	213.04
64	1194.50	430.77	210.41	667.62	379.37	259.16
128	1934.71	462.09	-	746.73	-	-
256	2690.65	-	-	-	-	-
512	3267.15	-	-	-	-	-

2x NVLink Tesla V100

Batch size	Alexnet	vgg11	vgg16	vgg19	lenet	googlenet
32	3743.79	775.95	417.22	360.08	12460.77	1250.49
64	5514.97	904.65	447.46	386.92	28038.87	1546.01
128	6990.88	982.62	465.05	401.43	50064.03	1791.36
256	7960.86	805.59	-	-	94842.75	1895.35
512	8786.56	-	-	-	131914.42	2158.45

Batch size	overfeat	inceptionv3	inception4	resnet50	resnet101	resnet152
32	1404.21	397.70	195.51	602.97	341.20	236.90
64	2216.08	450.75	220.00	698.97	395.01	272.37
128	3005.20	475.38	-	781.50	-	-
256	3656.48	-	-	-	-	-
512	4073.38	-	-	-	-	-
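One way to read the two V100 tables together is to divide the NVLink throughput by the PCI throughput at each batch size. The sketch below does this for AlexNet only, using the figures listed above; the function name nvlink_gain is hypothetical, and the result should not be generalized to the other models without repeating the calculation for them.

```python
# AlexNet throughput (images / s) from the 2x PCI and 2x NVLink V100 tables.
BATCH_SIZES = [32, 64, 128, 256, 512]
ALEXNET_PCI = [1665.82, 3056.89, 4660.06, 6255.16, 7545.40]
ALEXNET_NVLINK = [3743.79, 5514.97, 6990.88, 7960.86, 8786.56]


def nvlink_gain(pci, nvlink):
    """Return NVLink throughput divided by PCI throughput, per batch size."""
    return [round(n / p, 2) for p, n in zip(pci, nvlink)]


gains = nvlink_gain(ALEXNET_PCI, ALEXNET_NVLINK)
for batch, gain in zip(BATCH_SIZES, gains):
    print("batch {:>3}: NVLink / PCI = {:.2f}".format(batch, gain))
```

For AlexNet the ratio falls from about 2.2 at batch size 32 to about 1.2 at batch size 512, so in these measurements the NVLink advantage is largest at small batch sizes.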