Nvidia TCST4MATX-PB Product Data Sheet


NVIDIA T4 TENSOR CORE GPU

Powering Scale-Out AI Training and Inference

Supercharge any server with NVIDIA® T4 GPU, the world’s most performant scale-out accelerator. Its low-profile, 70W design is powered by NVIDIA Turing™ Tensor Cores, delivering revolutionary multi-precision performance to accelerate a wide range of modern applications. This advanced GPU is packaged in an energy-efficient 70-watt, small PCIe form factor, optimized for scale-out servers and purpose-built to deliver state-of-the-art AI.

[Figure: Inference Performance. Speedup over CPU for ResNet-50, DeepSpeech2, and GNMT; bar labels 21X, 27X, and 36X. Comparisons made of one NVIDIA T4 GPU versus servers with dual-socket Xeon Gold 6140 CPU.]

[Figure: Training Performance. Speedup over CPU for ResNet-50; bar label 9.3X. Comparison made of dual NVIDIA T4 GPUs versus servers with dual-socket Xeon Gold 6140 CPU.]

SPECIFICATIONS

GPU Architecture              NVIDIA Turing
NVIDIA Turing Tensor Cores    320
NVIDIA CUDA® Cores            2,560
Single-Precision              8.1 TFLOPS
Mixed-Precision (FP16/FP32)   65 TFLOPS
INT8                          130 TOPS
INT4                          260 TOPS
GPU Memory                    16 GB GDDR6, 300 GB/s
ECC                           Yes
Interconnect Bandwidth        32 GB/sec
System Interface              x16 PCIe Gen3
Form Factor                   Low-Profile PCIe
Thermal Solution              Passive
Compute APIs                  CUDA, NVIDIA TensorRT™, ONNX
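A few derived figures can help when reading the specifications. The Python sketch below is illustrative only: the constants are copied from the specifications above, while the function names and the assumption of two floating-point operations per CUDA core per clock (one fused multiply-add) are not part of the datasheet.

```python
# Figures copied from the specifications above.
CUDA_CORES = 2560
FP32_TFLOPS = 8.1
MEMORY_GB = 16
MEMORY_BANDWIDTH_GB_S = 300
INTERCONNECT_GB_S = 32

# Assumption (not in the datasheet): one fused multiply-add, counted as
# 2 floating-point operations, per CUDA core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def implied_clock_ghz(tflops, cores, flops_per_clock=FLOPS_PER_CORE_PER_CLOCK):
    """Clock rate (GHz) needed to reach `tflops` under the assumption above."""
    return tflops * 1e12 / (cores * flops_per_clock) / 1e9


def memory_sweep_ms(capacity_gb, bandwidth_gb_s):
    """Time (ms) to read the whole memory once at the quoted bandwidth."""
    return capacity_gb / bandwidth_gb_s * 1e3


if __name__ == "__main__":
    clock = implied_clock_ghz(FP32_TFLOPS, CUDA_CORES)
    sweep = memory_sweep_ms(MEMORY_GB, MEMORY_BANDWIDTH_GB_S)
    ratio = MEMORY_BANDWIDTH_GB_S / INTERCONNECT_GB_S
    print(f"Implied FP32 clock: {clock:.2f} GHz")
    print(f"One pass over GPU memory: {sweep:.1f} ms")
    print(f"Memory vs. interconnect bandwidth: {ratio:.1f}x")
```

The last ratio suggests why data is best kept resident in GPU memory: on these quoted peaks, on-board memory is roughly an order of magnitude faster than the PCIe link.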


NVIDIA T4 | DATASHEET | DEC18


Scale-Out Performance Driving Data Center Acceleration

The small-form-factor, 70-watt (W) design makes T4 optimized for scale-out servers, providing 50X higher energy efficiency compared to CPUs and drastically reducing operational costs. In the last two years, NVIDIA’s Inference Platform has increased efficiency by over 10X, and it remains the most energy-efficient solution for distributed AI training and inference.
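As a rough illustration of the efficiency argument, peak throughput per watt can be computed from the datasheet's own numbers. This is a sketch under a stated assumption: it divides each quoted peak rate by the full 70 W board power, which the datasheet does not claim is the power drawn at peak. It does not reproduce the 50X figure, which depends on CPU measurements not given here. The names `BOARD_POWER_W`, `PEAK_RATES`, and `per_watt` are illustrative.

```python
# Board power and peak rates as quoted in this datasheet.
BOARD_POWER_W = 70
PEAK_RATES = {
    "FP32 (TFLOPS)": 8.1,
    "FP16/FP32 mixed (TFLOPS)": 65,
    "INT8 (TOPS)": 130,
    "INT4 (TOPS)": 260,
}


def per_watt(peak_rates, power_w):
    """Divide each peak rate by the board power.

    Assumption: the full board power is drawn at peak throughput.
    """
    return {name: rate / power_w for name, rate in peak_rates.items()}


if __name__ == "__main__":
    for name, value in per_watt(PEAK_RATES, BOARD_POWER_W).items():
        print(f"{name}: {value:.2f} per watt")
```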

Turing Tensor Core technology with multi-precision computing for AI powers breakthrough performance from FP32 to FP16 to INT8, as well as INT4 precisions. It delivers up to 9.3X higher performance than CPUs on training and up to 36X on inference.
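The multi-precision claim can be made concrete by comparing the peak rates this datasheet quotes for each precision. The sketch below is a simple ratio calculation; `PEAK_TERA_OPS` and `relative_to` are illustrative names, and the figures are theoretical peaks (TFLOPS for floating point, TOPS for integer), not measured application speedups.

```python
# Peak rates quoted in this datasheet, in tera-operations per second.
PEAK_TERA_OPS = {
    "FP32": 8.1,
    "FP16/FP32 mixed": 65.0,
    "INT8": 130.0,
    "INT4": 260.0,
}


def relative_to(baseline, peaks):
    """Return each peak rate as a multiple of the baseline precision's rate."""
    base = peaks[baseline]
    return {name: rate / base for name, rate in peaks.items()}


if __name__ == "__main__":
    for name, factor in relative_to("FP32", PEAK_TERA_OPS).items():
        print(f"{name}: {factor:.1f}x the FP32 peak")
```

On these numbers, each step down in precision from mixed FP16/FP32 to INT8 to INT4 doubles the quoted peak rate.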

The NVIDIA T4 data center GPU is the ideal universal accelerator for distributed computing environments. Revolutionary multi-precision performance accelerates deep learning and machine learning training and inference, video transcoding, and virtual desktops. T4 supports all AI frameworks and network types, delivering dramatic performance and efficiency that maximize the utility of at-scale deployments.

To learn more about the NVIDIA T4, visit www.nvidia.com/T4

© 2018 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Turing, CUDA, and TensorRT are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. All other trademarks and copyrights are the property of their respective owners. DEC18
