NVIDIA TCSA100M-PB Product Data Sheet

NVIDIA A100
TENSOR CORE GPU
Unprecedented Acceleration at Every Scale
The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and HPC to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale up to thousands of GPUs or, using new Multi-Instance GPU (MIG) technology, can be partitioned into seven isolated GPU instances to accelerate workloads of all sizes. A100’s third-generation Tensor Core technology now accelerates more levels of precision for diverse workloads, speeding time to insight as well as time to market.
SPECIFICATIONS AND PEAK PERFORMANCE

Part Number                   TCSA100M-PB
EAN                           3536403378035
GPU Architecture              NVIDIA Ampere
NVIDIA Tensor Cores           432
NVIDIA CUDA® Cores            6,912
Double-Precision Performance  FP64: 9.7 TFLOPS | FP64 Tensor Core: 19.5 TFLOPS
Single-Precision Performance  FP32: 19.5 TFLOPS | Tensor Float 32 (TF32): 156 TFLOPS | 312 TFLOPS*
Half-Precision Performance    312 TFLOPS | 624 TFLOPS*
Integer Performance           INT8: 624 TOPS | 1,248 TOPS* | INT4: 1,248 TOPS | 2,496 TOPS*
GPU Memory                    40 GB HBM2
Memory Bandwidth              1.6 TB/sec
ECC                           Yes
System Interface              PCIe Gen4
Form Factor                   PCIe Full Height
Multi-Instance GPU            Up to 7 GPU instances
Max Power Consumption         250 W
Thermal Solution              Passive
Compute APIs                  CUDA, DirectCompute, OpenCL®, OpenACC

* With structural sparsity enabled
GROUNDBREAKING INNOVATIONS
NVIDIA AMPERE ARCHITECTURE
A100 accelerates workloads big and small. Whether using MIG to partition an A100 GPU into smaller instances, or NVLink to connect multiple GPUs to accelerate large-scale workloads, A100 can readily handle different-sized acceleration needs, from the smallest job to the biggest multi-node workload. A100's versatility means IT managers can maximize the utility of every GPU in their data center around the clock.
MULTIINSTANCE GPU MIG
An A100 GPU can be partitioned into as many as seven GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores. MIG gives developers access to breakthrough acceleration for all their applications, and IT administrators can offer right-sized GPU acceleration for every job, optimizing utilization and expanding access to every user and application.
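Once instances have been created (for example with nvidia-smi's MIG commands), each instance appears to a CUDA process as an ordinary device, selected by setting CUDA_VISIBLE_DEVICES to the instance's MIG UUID. The sketch below is a minimal illustration, not NVIDIA sample code: it enumerates the visible devices and prints the memory and SM count each one exposes.

// Minimal sketch: a MIG instance enumerates as an ordinary CUDA device.
// Select an instance before launch, e.g.:
//   CUDA_VISIBLE_DEVICES=MIG-<instance-uuid> ./enumerate
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // On a 1g.5gb instance, totalGlobalMem reports roughly 5 GB and
        // multiProcessorCount a fraction of the full GPU's SMs.
        printf("device %d: %s, %zu MiB, %d SMs\n", d, prop.name,
               prop.totalGlobalMem >> 20, prop.multiProcessorCount);
    }
    return 0;
}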
THIRDGENERATION TENSOR CORES
A100 delivers 312 teraFLOPS (TFLOPS) of deep learning performance. That's 20X the Tensor FLOPS for deep learning training and 20X the Tensor TOPS for deep learning inference compared to NVIDIA Volta GPUs.
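Developers can program Tensor Cores directly through CUDA's warp-level WMMA API, although most applications reach them through cuBLAS, cuDNN, or a deep learning framework. Below is a minimal illustrative kernel, not production code, in which one warp multiplies a single 16x16x16 tile with FP16 inputs and FP32 accumulation (compile with -arch=sm_80 for A100).

// One warp computes C = A * B for a 16x16x16 tile on Tensor Cores.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void tile_mma(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);       // start from C = 0
    wmma::load_matrix_sync(a_frag, a, 16);   // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core MMA
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

Launched as tile_mma<<<1, 32>>>(dA, dB, dC), the 32 threads of one warp cooperatively perform the tile multiply on Tensor Core hardware.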
HBM2
With 40 gigabytes (GB) of high-bandwidth memory (HBM2), A100 delivers improved raw bandwidth of 1.6 TB/sec, as well as higher dynamic random-access memory (DRAM) utilization efficiency at 95 percent. A100 delivers 1.7X higher memory bandwidth over the previous generation.
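A rough way to observe this bandwidth from software is a timed device-to-device copy, which reads and writes every byte once. The sketch below is illustrative (the 1 GiB buffer and iteration count are arbitrary choices); measured figures approach, but do not reach, the theoretical peak.

// Estimate achieved HBM2 bandwidth with timed device-to-device copies.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ull << 30;  // 1 GiB per buffer
    const int iters = 10;
    void *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);  // warm-up
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Each copy moves every byte across DRAM twice: one read, one write.
    double gbps = 2.0 * iters * bytes / (ms * 1e-3) / 1e9;
    printf("achieved bandwidth: %.0f GB/s\n", gbps);
    return 0;
}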
NEXTGENERATION NVLINK
NVIDIA NVLink in A100 delivers 2X higher throughput compared to the previous generation. When combined with NVIDIA NVSwitch, up to 16 A100 GPUs can be interconnected at up to 600 gigabytes per second (GB/sec) to unleash the highest application performance possible on a single server. NVLink is available in A100 SXM GPUs via HGX A100 server boards and in PCIe GPUs via an NVLink Bridge for up to 2 GPUs.
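In CUDA, NVLink-connected GPUs are used through the standard peer-to-peer API; when a direct link is present, the runtime routes peer copies over it rather than through host memory. A minimal sketch, assuming devices 0 and 1 are an NVLink-bridged pair:

// Enable P2P between two GPUs and copy directly from GPU 0 to GPU 1.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int access01 = 0, access10 = 0;
    cudaDeviceCanAccessPeer(&access01, 0, 1);
    cudaDeviceCanAccessPeer(&access10, 1, 0);
    if (!access01 || !access10) {
        printf("P2P not available between devices 0 and 1\n");
        return 1;
    }
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // flags argument must be 0
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    const size_t bytes = 1 << 20;      // 1 MiB test buffer
    void *buf0, *buf1;
    cudaSetDevice(0); cudaMalloc(&buf0, bytes);
    cudaSetDevice(1); cudaMalloc(&buf1, bytes);
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);  // no host staging
    cudaDeviceSynchronize();
    printf("peer copy complete\n");
    return 0;
}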
STRUCTURAL SPARSITY
AI networks are big, with millions to billions of parameters. Not all of these parameters are needed for accurate predictions; some can be converted to zeros to make the models "sparse" without compromising accuracy. Tensor Cores in A100 can provide up to 2X higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also improve the performance of model training.
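Concretely, A100 supports 2:4 fine-grained structured sparsity: in every contiguous group of four weights, at most two are nonzero. The host-side sketch below only illustrates that pruning pattern; it is not NVIDIA's API, and production flows use libraries such as cuSPARSELt or framework pruning tools. It zeroes the two smallest-magnitude weights in each group of four.

// Illustrative 2:4 pruning: keep the 2 largest-magnitude weights in
// each group of 4 and zero the other two.
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

void prune_2_of_4(std::vector<float> &w) {
    for (size_t g = 0; g + 4 <= w.size(); g += 4) {
        // Track the indices of the two smallest magnitudes in the group.
        int lo1 = 0, lo2 = 1;  // lo1 = smallest, lo2 = second smallest
        if (std::fabs(w[g + lo1]) > std::fabs(w[g + lo2])) std::swap(lo1, lo2);
        for (int i = 2; i < 4; ++i) {
            if (std::fabs(w[g + i]) < std::fabs(w[g + lo1])) { lo2 = lo1; lo1 = i; }
            else if (std::fabs(w[g + i]) < std::fabs(w[g + lo2])) { lo2 = i; }
        }
        w[g + lo1] = 0.0f;
        w[g + lo2] = 0.0f;
    }
}

int main() {
    std::vector<float> w = {0.9f, -0.1f, 0.05f, -0.7f, 0.2f, 0.3f, -0.25f, 0.01f};
    prune_2_of_4(w);
    for (float v : w) printf("% .2f ", v);  // two zeros per group of four
    printf("\n");
    return 0;
}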
The NVIDIA A100 Tensor Core GPU is the flagship product of the NVIDIA data center platform for deep learning, HPC, and data analytics. The platform accelerates over 700 HPC applications and every major deep learning framework. It’s available everywhere, from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.
EVERY DEEP LEARNING FRAMEWORK | 700+ GPU-ACCELERATED APPLICATIONS

GPU-accelerated HPC applications include: AMBER, ANSYS Fluent, GAUSSIAN, GROMACS, LS-DYNA, NAMD, OpenFOAM, Simulia Abaqus, VASP, WRF
To learn more about the NVIDIA A100 Tensor Core GPU, visit www.pny.eu
1 BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512 | V100: NVIDIA DGX-1™ server with 8x NVIDIA V100 Tensor Core GPUs using FP32 precision | A100: NVIDIA DGX™ A100 server with 8x A100 using TF32 precision.
2 BERT large inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT™ (TRT) 7.1, precision = INT8, batch size 256 | V100: TRT 7.1, precision = FP16, batch size 256 | A100 with 7 MIG instances of 1g.5gb; pre-production TRT, batch size 94, precision = INT8 with sparsity.
3 V100 used is single V100 SXM2. A100 used is single A100 SXM4. AMBER based on PME-Cellulose, LAMMPS with Atomic Fluid LJ-2.5, FUN3D with dpw, Chroma with szscl21_24_128.
© 2017 NVIDIA Corporation and PNY. All rights reserved. NVIDIA, the NVIDIA logo, Quadro, nView, CUDA, NVIDIA Pascal, and 3D Vision are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. The PNY logotype is a registered trademark of PNY Technologies. OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. All other trademarks and copyrights are the property of their respective owners.
PNY Technologies Europe
Rue Joseph Cugnot BP40181 - 33708 Mérignac Cedex | France
T +33 (0)5 56 13 75 75 | F +33 (0)5 56 13 75 77
For more information visit: www.pny.eu