RTX 6000 Ada 1, 2, 4 GPU vs RTX 4090 1, 2, 4 GPU vs A6000 1, 2, 4 GPU
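We measured training throughput with tf_cnn_benchmarks on 1, 2, and 4 RTX 6000 Ada GPUs, varying the batch size over 64, 128, 256, 512, and 1024.
The models are resnet50, inception3, vgg16, nasnet, resnet152, and inception4.
Training throughput was measured in both fp16 and fp32.
We report how much faster the RTX 6000 Ada is than the RTX A6000 we measured previously, along with a comparison against the GeForce RTX 4090.
We hope this helps when choosing a GPU server.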
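In the tables, the number in parentheses after each training throughput (images/sec) is the speedup relative to the 1-GPU result for the same GPU.
In each cell, the RTX A6000 measurement is listed first, followed by the RTX 6000 Ada and the RTX 4090. In the 1-GPU column, the number in parentheses after the 6000 Ada and RTX 4090 values is instead the ratio to the RTX A6000 1-GPU value. The 8-GPU column contains RTX A6000 results only.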
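As a rough illustration of how the parenthesized values can be reproduced, the sketch below recomputes them for the fp16 resnet50, batch size 64 row. The helper name `ratio` and the dictionary layout are our own choices for illustration; only the throughput numbers are taken from the tables.

```python
# Throughput (images/sec) for fp16 resnet50, batch size 64, copied from the table.
# Inner keys are GPU counts; this layout is an illustrative assumption.
throughput = {
    "a6000":   {1: 1061.92, 2: 1969.59, 4: 2794.37},
    "6000ada": {1: 1699.54, 2: 3388.55, 4: 6290.93},
    "4090":    {1: 1688.03, 2: 2794.01, 4: 4233.10},
}


def ratio(value: float, baseline: float) -> float:
    """Return value / baseline rounded to three decimals, as printed in the tables."""
    return round(value / baseline, 3)


# 1-GPU column: 6000 Ada and RTX 4090 relative to the A6000 1-GPU result.
vs_a6000 = {
    gpu: ratio(throughput[gpu][1], throughput["a6000"][1])
    for gpu in ("6000ada", "4090")
}

# 2- and 4-GPU columns: each GPU relative to its own 1-GPU result.
scaling = {
    gpu: {n: ratio(values[n], values[1]) for n in (2, 4)}
    for gpu, values in throughput.items()
}

print(vs_a6000)
print(scaling)
```

Dividing a scaling value by the GPU count gives a rough parallel efficiency, which is one way to compare how well each card scales.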
The hardware used was the HPCDIY-ERMGPU8R4S:
CPU: AMD EPYC Rome 7352 DP/UP 24C/48T 2.3G 128M 155W
Memory: 32GB x 16 = 512GB
SSD: Samsung PM983 7.68TB NVMe PCIe3x4 V4 TLC 2.5" 7mm (1.3 DWPD) x 1
The software used was tf_cnn_benchmarks, running on TensorFlow from the nvcr.io/nvidia/tensorflow:20.12-tf1-py3 container image.
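The exact command lines used for the runs are not listed here, so the following is only a sketch of how the sweep over models, batch sizes, GPU counts, and precisions might be enumerated. `--model`, `--batch_size`, `--num_gpus`, and `--use_fp16` are tf_cnn_benchmarks flags, but the script path, the absence of any other flags, and the helper name `build_command` are assumptions made for illustration.

```python
from itertools import product

# Sweep parameters as described in the text.
MODELS = ["resnet50", "inception3", "vgg16", "nasnet", "resnet152", "inception4"]
BATCH_SIZES = [64, 128, 256, 512, 1024]
GPU_COUNTS = [1, 2, 4]
PRECISIONS = ["fp16", "fp32"]


def build_command(model, batch_size, num_gpus, precision, script="tf_cnn_benchmarks.py"):
    """Build one benchmark command as an argument list (script path is assumed)."""
    cmd = [
        "python",
        script,
        f"--model={model}",
        f"--batch_size={batch_size}",
        f"--num_gpus={num_gpus}",
    ]
    if precision == "fp16":
        # fp32 is the default, so the flag is only added for fp16 runs.
        cmd.append("--use_fp16=True")
    return cmd


# Full grid of candidate runs; nothing is executed here.
commands = [
    build_command(model, batch_size, num_gpus, precision)
    for precision, model, batch_size, num_gpus in product(
        PRECISIONS, MODELS, BATCH_SIZES, GPU_COUNTS
    )
]

print(len(commands))
print(" ".join(commands[0]))
```

The tables omit some combinations (for example, there is no vgg16 row for batch size 1024), so this full grid is an upper bound on the runs actually reported.
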

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp16 | resnet50 | 64 | 1061.92 / 1699.54(1.600) / 1688.03(1.590) | 1969.59(1.855) / 3388.55(1.994) / 2794.01(1.655) | 2794.37(2.631) / 6290.93(3.702) / 4233.10(2.508) | 6274.37(5.909) |
A6000 / 6000 Ada / 4090 | fp16 | resnet50 | 128 | 1156.52 / 1844.42(1.595) / 1748.98(1.512) | 2181.27(1.886) / 3479.46(1.886) / 3145.92(1.799) | 4176.46(3.611) / 6685.21(3.625) / 5548.25(3.172) | 7302.03(6.314) |
A6000 / 6000 Ada / 4090 | fp16 | resnet50 | 256 | 1220.76 / 1726.71(1.414) / 1776.41(1.455) | 2382.99(1.952) / 3432.98(1.988) / 3357.26(1.890) | 4634.09(3.796) / 6740.77(3.904) / 6142.79(3.458) | 8816.47(7.222) |
A6000 / 6000 Ada / 4090 | fp16 | resnet50 | 512 | 1242.82 / 1707.91(1.374) / 1760.89(1.417) | 2438.71(1.962) / 3354.28(1.964) / 3408.95(1.936) | 4793.78(3.857) / 6393.59(3.744) / 6582.35(3.740) | 9425.05(7.584) |
A6000 / 6000 Ada / 4090 | fp16 | resnet50 | 1024 | 1253.40 / 1637.85(1.307) / N/A | 2508.13(2.001) / 3220.23(1.966) / N/A | 4944.17(3.945) / 6220.45(3.798) / N/A | 9686.33(7.728) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp16 | inception3 | 64 | 784.08 / 1381.62(1.762) / 1243.98(1.587) | 1420.48(1.812) / 2378.00(1.721) / 2046.64(1.645) | 2672.08(3.408) / 4607.75(3.335) / 3274.80(2.633) | 4687.96(5.979) |
A6000 / 6000 Ada / 4090 | fp16 | inception3 | 128 | 858.16 / 1345.13(1.567) / 1340.23(1.562) | 1685.34(1.964) / 2477.47(1.842) / 2497.32(1.863) | 3025.44(3.525) / 4863.96(3.616) / 4334.59(3.234) | 5977.10(6.965) |
A6000 / 6000 Ada / 4090 | fp16 | inception3 | 256 | 911.90 / 1273.31(1.396) / 1391.36(1.526) | 1799.25(1.973) / 2406.48(1.890) / 2515.63(1.808) | 3465.63(3.800) / 4764.44(3.742) / 5004.93(3.597) | 6522.25(7.152) |
A6000 / 6000 Ada / 4090 | fp16 | inception3 | 512 | 899.23 / 1162.56(1.293) / N/A | 1758.40(1.955) / 2423.35(2.084) / N/A | 3477.81(3.868) / 4727.72(4.067) / N/A | 6849.92(7.618) |
A6000 / 6000 Ada / 4090 | fp16 | inception3 | 1024 | 858.57 / 1276.61(1.487) / N/A | 1648.26(1.920) / N/A / N/A | 3249.41(3.785) / N/A / N/A | 6652.44(7.748) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp16 | vgg16 | 64 | 506.68 / 793.63(1.566) / 782.81(1.545) | 824.63(1.628) / 1345.62(1.696) / 1202.66(1.536) | 1398.37(2.760) / 2050.90(2.584) / 1093.21(1.397) | 1719.74(3.394) |
A6000 / 6000 Ada / 4090 | fp16 | vgg16 | 128 | 535.68 / 855.08(1.596) / 831.85(1.553) | 985.47(1.840) / 1445.02(1.690) / 1311.02(1.576) | 1726.85(3.224) / 2550.04(2.982) / 1714.44(2.061) | 2764.99(5.162) |
A6000 / 6000 Ada / 4090 | fp16 | vgg16 | 256 | 554.17 / 799.86(1.443) / 862.27(1.556) | 1056.00(1.906) / 1619.37(2.025) / 1671.98(1.939) | 1967.01(3.549) / 3072.36(3.841) / 2429.87(2.818) | 3651.36(6.589) |
A6000 / 6000 Ada / 4090 | fp16 | vgg16 | 512 | 540.55 / 840.34(1.555) / 875.55(1.620) | 1053.56(1.949) / 1662.81(1.979) / 1651.70(1.886) | 2040.88(3.776) / 3072.69(3.656) / 2879.19(3.288) | 3926.31(7.264) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp16 | nasnet | 64 | 326.86 / 673.96(2.062) / 636.89(1.949) | 527.01(1.612) / 1020.53(1.514) / 740.25(1.162) | 894.13(2.736) / 1765.51(2.62) / 1203.70(1.890) | 1548.88(4.739) |
A6000 / 6000 Ada / 4090 | fp16 | nasnet | 128 | 385.16 / 831.88(2.16) / 809.24(2.101) | 726.34(1.886) / 1505.78(1.81) / 1242.0(1.535) | 1347.12(3.498) / 2737.73(3.291) / 2026.16(2.504) | 2434.67(6.321) |
A6000 / 6000 Ada / 4090 | fp16 | nasnet | 256 | 411.98 / 769.88(1.869) / 844.1(2.049) | 788.20(1.913) / 1553.08(2.017) / 1530.35(1.813) | 1531.37(3.717) / 3084.45(4.006) / 2877.05(3.408) | 2822.33(6.851) |
A6000 / 6000 Ada / 4090 | fp16 | nasnet | 512 | 409.85 / 784.18(1.913) / N/A | 761.09(1.857) / 1492.19(1.903) / N/A | 1582.06(3.860) / 2844.32(3.627) / N/A | 3068.16(7.486) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp16 | resnet152 | 64 | 454.27 / 847.50(1.866) / 752.43(1.656) | 840.27(1.850) / 1547.61(1.826) / 1177.31(1.565) | 1540.17(3.390) / 2736.35(3.229) / 1644.11(2.185) | 2574.31(5.667) |
A6000 / 6000 Ada / 4090 | fp16 | resnet152 | 128 | 507.12 / 808.70(1.595) / 765.31(1.509) | 963.78(1.900) / 1530.75(1.893) / 1372.78(1.794) | 1809.60(3.568) / 2977.4(3.682) / 2346.98(3.067) | 3320.63(6.548) |
A6000 / 6000 Ada / 4090 | fp16 | resnet152 | 256 | 543.78 / 772.5(1.421) / 774.11(1.424) | 1054.73(1.940) / 1527.95(1.978) / 1467.87(1.896) | 2035.65(3.744) / 2889.61(3.741) / 2769.61(3.578) | 3868.09(7.113) |
A6000 / 6000 Ada / 4090 | fp16 | resnet152 | 512 | 559.22 / 706.09(1.263) / N/A | 1099.23(1.966) / 1413.59(2.002) / N/A | 2052.58(3.670) / 2721.91(3.855) / N/A | 4169.76(7.456) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp16 | inception4 | 64 | 379.13 / 702.43(1.853) / 647.39(1.708) | 723.55(1.908) / 1262.18(1.797) / 1165.49(1.800) | 1343.56(3.544) / 2460.98(3.504) / 1849.91(2.857) | 2538.98(6.697) |
A6000 / 6000 Ada / 4090 | fp16 | inception4 | 128 | 407.79 / 642.41(1.575) / 695.43(1.705) | 788.35(1.933) / 1250.71(1.947) / 1313.02(1.888) | 1548.75(3.798) / 2541.19(3.956) / 2398.77(3.449) | 2941.88(7.214) |
A6000 / 6000 Ada / 4090 | fp16 | inception4 | 256 | 469.59 / 608.43(1.296) / 727.43(1.549) | 883.94(1.882) / 1273.34(2.093) / 1370.57(1.884) | 1776.91(3.784) / 2270.4(3.732) / 2704.61(3.718) | 3393.76(7.227) |
A6000 / 6000 Ada / 4090 | fp16 | inception4 | 512 | 473.03 / 663.91(1.404) / N/A | 930.52(1.967) / 1193.19(1.797) / N/A | 1844.26(3.899) / 2324.46(3.501) / N/A | 3602.49(7.616) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp32 | resnet50 | 64 | 477.96 / 810.77(1.696) / 822.14(1.72) | 910.68(1.905) / 1559.46(1.923) / 1522.85(1.852) | 1715.91(3.590) / 2982.68(3.679) / 2620.04(3.187) | 3167.85(6.628) |
A6000 / 6000 Ada / 4090 | fp32 | resnet50 | 128 | 506.40 / 779.38(1.539) / 832.78(1.645) | 993.42(1.962) / 1529.24(1.962) / 1584.19(1.902) | 1939.24(3.829) / 2959.5(3.797) / 2970.26(3.567) | 3697.20(7.301) |
A6000 / 6000 Ada / 4090 | fp32 | resnet50 | 256 | 520.85 / 714.23(1.371) / 833.72(1.601) | 1036.15(1.989) / 1390.51(1.947) / 1630.34(1.956) | 2042.98(3.922) / 2783.09(3.897) / 3112.87(3.734) | 3980.10(7.642) |
A6000 / 6000 Ada / 4090 | fp32 | resnet50 | 512 | 517.09 / 709.49(1.372) / N/A | 1029.55(1.991) / 1412.86(1.991) / N/A | 2049.59(3.964) / 2560(3.608) / N/A | 4043.83(7.820) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp32 | inception3 | 64 | 350.54 / 565.07(1.612) / 632.08(1.803) | 662.44(1.890) / 1144.54(2.025) / 1170.18(1.851) | 1256.14(3.583) / 2197.35(3.889) / 2082.66(3.295) | 2272.07(6.482) |
A6000 / 6000 Ada / 4090 | fp32 | inception3 | 128 | 378.27 / 540.29(1.428) / 632.24(1.671) | 742.52(1.963) / 1064.29(1.97) / 1198.78(1.896) | 1461.00(3.862) / 2115.36(3.915) / 2300.95(3.639) | 2749.77(7.269) |
A6000 / 6000 Ada / 4090 | fp32 | inception3 | 256 | 390.65 / 488.8(1.251) / N/A | 774.44(1.982) / 1000.28(2.046) / N/A | 1503.87(3.850) / 1837.32(3.759) / N/A | 2913.61(7.458) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp32 | vgg16 | 64 | 312.46 / 381.42(1.221) / 471.91(1.51) | 558.06(1.786) / 748.39(1.962) / 797.67(1.690) | 997.03(3.191) / 1236.09(3.241) / 915.77(1.941) | 1623.46(5.196) |
A6000 / 6000 Ada / 4090 | fp32 | vgg16 | 128 | 321.10 / 379.62(1.182) / 483.77(1.507) | 617.09(1.922) / 735.42(1.937) / 887.10(1.834) | 1127.00(3.510) / N/A / 1213.92(2.509) | 1947.73(6.066) |
A6000 / 6000 Ada / 4090 | fp32 | vgg16 | 256 | 323.93 / 328.32(1.014) / 465.02(1.436) | 638.49(1.971) / 635.2(1.935) / 908.37(1.953) | 1223.70(3.778) / 1194.26(3.637) / 1560.53(3.356) | 2232.47(6.892) |
A6000 / 6000 Ada / 4090 | fp32 | vgg16 | 512 | 304.09 / 293.76(0.966) / N/A | 605.69(1.992) / 587.17(1.999) / N/A | 1192.28(3.921) / 1066.09(3.629) / N/A | 2290.20(7.531) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp32 | nasnet | 64 | 313.64 / 635.03(2.025) / 626.03(1.996) | 561.60(1.791) / 1017.02(1.602) / 763.24(1.219) | 909.26(2.899) / 1804.05(2.841) / 1239.61(1.980) | 1612.12(5.140) |
A6000 / 6000 Ada / 4090 | fp32 | nasnet | 128 | 363.06 / 710.12(1.956) / 710.8(1.958) | 671.66(1.850) / 1315.17(1.852) / 1220.52(1.717) | 1304.10(3.592) / 2552.47(3.594) / 2039.47(2.869) | 2313.57(6.372) |
A6000 / 6000 Ada / 4090 | fp32 | nasnet | 256 | 382.46 / 679.36(1.776) / 726.1(1.898) | 740.62(1.936) / 1388.73(2.044) / 1323.51(1.823) | 1439.74(3.764) / 2651.25(3.903) / 2569.41(3.539) | 2672.53(6.988) |
A6000 / 6000 Ada / 4090 | fp32 | nasnet | 512 | 384.38 / 646.75(1.683) / N/A | 731.87(1.904) / 1213.59(1.876) / N/A | 1477.58(3.844) / 2495.86(3.859) / N/A | 2870.50(7.468) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp32 | resnet152 | 64 | 201.06 / 359.25(1.787) / 360.2(1.792) | 386.65(1.923) / 692.78(1.928) / 659.77(1.832) | 718.98(3.576) / 1279.51(3.562) / 1094.79(3.039) | 1301.79(6.475) |
A6000 / 6000 Ada / 4090 | fp32 | resnet152 | 128 | 218.29 / 323.09(1.48) / 363.1(1.663) | 430.75(1.973) / 628.28(1.945) / 687.65(1.894) | 836.49(3.832) / 1248.85(3.865) / 1239.57(3.414) | 1574.48(7.213) |
A6000 / 6000 Ada / 4090 | fp32 | resnet152 | 256 | 227.44 / 309.61(1.361) / N/A | 450.95(1.983) / 631.6(2.04) / N/A | 885.69(3.894) / 1158.43(3.742) / N/A | 1721.80(7.570) |

gpu | precision | model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) |
---|---|---|---|---|---|---|---|
A6000 / 6000 Ada / 4090 | fp32 | inception4 | 64 | 173.69 / 298.23(1.717) / 323.53(1.863) | 335.13(1.929) / 580.33(1.946) / 547.17(1.691) | 646.08(3.720) / 1124.15(3.769) / 1113.07(3.440) | 1198.78(6.902) |
A6000 / 6000 Ada / 4090 | fp32 | inception4 | 128 | 186.24 / 276.57(1.485) / 327.04(1.756) | 364.38(1.957) / 562.21(2.033) / 630.26(1.927) | 719.10(3.861) / 1043.59(3.773) / 1210.97(3.703) | 1382.76(7.425) |
A6000 / 6000 Ada / 4090 | fp32 | inception4 | 256 | 186.57 / 251.85(1.35) / N/A | 353.50(1.895) / 482.03(1.914) / N/A | 730.18(3.914) / 968.26(3.845) / N/A | 1428.19(7.655) |