NVIDIA A100
TENSOR CORE GPU
Unprecedented Acceleration at Every Scale
The NVIDIA A100 Tensor Core GPU delivers unprecedented
acceleration at every scale for AI, data analytics, and HPC to
tackle the world’s toughest computing challenges. As the engine
of the NVIDIA data center platform, A100 can efficiently scale up
to thousands of GPUs or, using new Multi-Instance GPU (MIG)
technology, can be partitioned into seven isolated GPU instances to
accelerate workloads of all sizes. A100’s third-generation Tensor
Core technology now accelerates more levels of precision for diverse
workloads, speeding time to insight as well as time to market.
SPECIFICATIONS

Part Number: TCSA100M-PB
EAN: 3536403378035
GPU Architecture: NVIDIA Ampere
NVIDIA Tensor Cores: 432
NVIDIA CUDA® Cores: 6,912
Peak Double-Precision Performance: FP64: 9.7 TFLOPS | FP64 Tensor Core: 19.5 TFLOPS
Peak Single-Precision Performance: FP32: 19.5 TFLOPS | Tensor Float 32 (TF32): 156 TFLOPS | 312 TFLOPS*
Peak Half-Precision Performance: 312 TFLOPS | 624 TFLOPS*
Peak Integer Performance: INT8: 624 TOPS | 1,248 TOPS* | INT4: 1,248 TOPS | 2,496 TOPS*
GPU Memory: 40 GB HBM2
Memory Bandwidth: 1.6 TB/sec
ECC: Yes
System Interface: PCIe Gen4
Form Factor: PCIe Full Height
Multi-Instance GPU: Up to 7 GPU instances
Max Power Consumption: 250 W
Thermal Solution: Passive
Compute APIs: CUDA, DirectCompute, OpenCL™, OpenACC

* With structural sparsity enabled.
GROUNDBREAKING INNOVATIONS
NVIDIA AMPERE
ARCHITECTURE
A100 accelerates workloads big
and small. Whether using MIG
to partition an A100 GPU into
smaller instances, or NVLink
to connect multiple GPUs to
accelerate large-scale workloads,
A100 can readily handle different-sized acceleration needs, from
the smallest job to the biggest
multi-node workload. A100’s
versatility means IT managers
can maximize the utility of every
GPU in their data center around
the clock.
MULTI-INSTANCE GPU (MIG)
An A100 GPU can be partitioned
into as many as seven GPU
instances, fully isolated at
the hardware level with their
own high-bandwidth memory,
cache, and compute cores.
MIG gives developers access
to breakthrough acceleration
for all their applications, and IT
administrators can offer right-sized GPU acceleration for every
job, optimizing utilization and
expanding access to every user
and application.
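As a minimal sketch of how MIG instances surface to software, the host-side C++ program below enumerates them through NVML, the management library shipped with the NVIDIA driver (link with -lnvidia-ml). Device index 0 is an illustrative assumption; the 1g.5gb profile name is taken from the benchmark notes below.

```c++
// Sketch: enumerate MIG instances with NVML. Assumes MIG mode was enabled
// beforehand (e.g. `nvidia-smi -i 0 -mig 1`) and instances were created
// (e.g. `nvidia-smi mig -cgi 1g.5gb -C`).
#include <cstdio>
#include <nvml.h>

int main() {
    nvmlInit();

    nvmlDevice_t gpu;
    nvmlDeviceGetHandleByIndex(0, &gpu);   // physical A100 at index 0 (assumed)

    unsigned int current = 0, pending = 0;
    nvmlDeviceGetMigMode(gpu, &current, &pending);
    std::printf("MIG mode: current=%u pending=%u\n", current, pending);

    unsigned int slots = 0;
    nvmlDeviceGetMaxMigDeviceCount(gpu, &slots);   // up to 7 on A100

    for (unsigned int i = 0; i < slots; ++i) {
        nvmlDevice_t mig;
        if (nvmlDeviceGetMigDeviceHandleByIndex(gpu, i, &mig) != NVML_SUCCESS)
            continue;                      // slot not populated
        char uuid[NVML_DEVICE_UUID_V2_BUFFER_SIZE];
        nvmlDeviceGetUUID(mig, uuid, sizeof(uuid));
        std::printf("MIG instance %u: %s\n", i, uuid);
    }

    nvmlShutdown();
    return 0;
}
```

Each reported UUID can then be handed to a container or job via CUDA_VISIBLE_DEVICES, which is how right-sized instances reach individual users.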
THIRD-GENERATION
TENSOR CORES
A100 delivers 312 teraFLOPS
(TFLOPS) of deep learning
performance. That’s 20X Tensor
FLOPS for deep learning training
and 20X Tensor TOPS for deep
learning inference compared to
NVIDIA Volta™ GPUs.
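TF32 is picked up automatically by the major frameworks, but it can also be requested explicitly. Below is a minimal CUDA C++ sketch using cuBLAS (CUDA 11 or later); the 1,024×1,024 size and zero-filled inputs are placeholders for illustration.

```c++
// Sketch: run an FP32 GEMM on TF32 Tensor Cores via cuBLAS.
// Build with: nvcc tf32_gemm.cu -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 1024;                     // arbitrary square-matrix size
    const float alpha = 1.0f, beta = 0.0f;

    float *A, *B, *C;
    cudaMalloc(&A, n * n * sizeof(float));
    cudaMalloc(&B, n * n * sizeof(float));
    cudaMalloc(&C, n * n * sizeof(float));
    cudaMemset(A, 0, n * n * sizeof(float));  // zero-fill so the GEMM
    cudaMemset(B, 0, n * n * sizeof(float));  // reads defined data

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Opt in to TF32: FP32 inputs/outputs, Tensor Core math with a
    // reduced-precision mantissa. CUBLAS_DEFAULT_MATH keeps classic FP32.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    // C = alpha * A * B + beta * C, executed on Tensor Cores where possible.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);

    cudaDeviceSynchronize();
    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```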
HBM2
With 40 gigabytes (GB) of high-bandwidth memory (HBM2),
A100 delivers improved raw
bandwidth of 1.6 TB/sec,
as well as higher dynamic
random-access memory
(DRAM) utilization efficiency at
95 percent. A100 delivers 1.7X
higher memory bandwidth over
the previous generation.
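The 1.7X figure follows from the previous-generation V100's published 0.9 TB/sec of HBM2 bandwidth: 1.6 TB/sec ÷ 0.9 TB/sec ≈ 1.7X.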
NEXT-GENERATION NVLINK
NVIDIA NVLink in A100 delivers
2X higher throughput compared
to the previous generation.
When combined with NVIDIA NVSwitch™, up to 16 A100 GPUs
can be interconnected at up to
600 gigabytes per second (GB/sec) to unleash the highest
application performance
possible on a single server.
NVLink is available in A100
SXM GPUs via HGX A100 server
boards and in PCIe GPUs via an
NVLink Bridge for up to 2 GPUs.
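As a sketch of what the bridge enables in software, the CUDA C++ snippet below turns on peer-to-peer access between two GPUs and copies a buffer directly between them, without staging through host memory. Device indices 0 and 1 and the 256 MiB payload are assumptions for illustration.

```c++
// Sketch: direct GPU-to-GPU copies with CUDA peer access, which ride over
// NVLink when two A100 PCIe cards are joined by an NVLink Bridge.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        std::printf("No P2P path between GPU 0 and GPU 1\n");
        return 1;
    }

    const size_t bytes = 256u << 20;    // 256 MiB payload (arbitrary)
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // let GPU 0 address GPU 1's memory
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);   // and vice versa
    cudaMalloc(&dst, bytes);

    // Device-to-device copy; travels over NVLink when the link is present.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```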
STRUCTURAL SPARSITY
AI networks are big, with
millions to billions of
parameters. Not all of these
parameters are needed for
accurate predictions, and
some can be converted to
zeros to make the models
“sparse” without compromising
accuracy. Tensor Cores in A100
can provide up to 2X higher
performance for sparse models.
While the sparsity feature more
readily benefits AI inference,
it can also improve the
performance of model training.
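A100's sparse Tensor Cores target a fine-grained 2:4 pattern: at most two nonzero values in every group of four. The self-contained C++ sketch below applies magnitude-based 2:4 pruning to a weight array. It illustrates the layout only; it is not NVIDIA's pruning tooling, and the sample weights are made up.

```c++
// Sketch: magnitude-based 2:4 structured pruning on the host. For every
// group of 4 weights, the 2 smallest-magnitude entries are zeroed, which
// is the pattern A100's sparse Tensor Cores accelerate.
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

void prune_2_4(std::vector<float>& w) {
    for (size_t g = 0; g + 4 <= w.size(); g += 4) {
        // Track the two largest-magnitude entries in the group of four.
        size_t keep0 = g, keep1 = g + 1;
        if (std::fabs(w[keep1]) > std::fabs(w[keep0])) std::swap(keep0, keep1);
        for (size_t i = g + 2; i < g + 4; ++i) {
            if (std::fabs(w[i]) > std::fabs(w[keep0])) { keep1 = keep0; keep0 = i; }
            else if (std::fabs(w[i]) > std::fabs(w[keep1])) { keep1 = i; }
        }
        for (size_t i = g; i < g + 4; ++i)
            if (i != keep0 && i != keep1) w[i] = 0.0f;  // enforce 2:4 pattern
    }
}

int main() {
    std::vector<float> w = {0.9f, -0.1f, 0.05f, -0.7f,
                            0.2f,  0.3f, -0.25f, 0.01f};
    prune_2_4(w);
    for (float v : w) std::printf("% .2f ", v);
    // Prints: 0.90 0.00 0.00 -0.70 0.00 0.30 -0.25 0.00
    std::printf("\n");
    return 0;
}
```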
The NVIDIA A100 Tensor Core GPU is the flagship product of the NVIDIA data center platform for deep
learning, HPC, and data analytics. The platform accelerates over 700 HPC applications and every major
deep learning framework. It’s available everywhere, from desktops to servers to cloud services, delivering
both dramatic performance gains and cost-saving opportunities.
EVERY DEEP LEARNING FRAMEWORK | 700+ GPU-ACCELERATED APPLICATIONS
GPU-accelerated HPC applications include AMBER, ANSYS Fluent, GAUSSIAN, GROMACS, LS-DYNA, NAMD, OpenFOAM, Simulia Abaqus, VASP, and WRF.
To learn more about the NVIDIA A100 Tensor Core GPU, visit www.pny.eu
1 BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len =
512 | V100: NVIDIA DGX-1™ server with 8x NVIDIA V100 Tensor Core GPU using FP32 precision | A100: NVIDIA DGX™ A100 server with 8x
A100 using TF32 precision.
2 BERT large inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT™ (TRT) 7.1, precision = INT8, batch size 256 | V100: TRT 7.1,
precision FP16, batch size 256 | A100 with 7 MIG instances of 1g.5gb; pre-production TRT, batch size 94, precision INT8 with sparsity.
3 V100 used is single V100 SXM2. A100 used is single A100 SXM4. AMBER based on PME-Cellulose, LAMMPS with Atomic Fluid LJ-2.5,
FUN3D with dpw, Chroma with szscl21_24_128.
© 2017 NVIDIA Corporation and PNY. All rights reserved. NVIDIA, the NVIDIA
logo, Quadro, nView, CUDA, NVIDIA Pascal, and 3D Vision are trademarks and/or
registered trademarks of NVIDIA Corporation in the U.S. and other countries.
The PNY logotype is a registered trademark of PNY Technologies. OpenCL is a
trademark of Apple Inc. used under license to the Khronos Group Inc. All
other trademarks and copyrights are the property of their respective owners.
PNY Technologies Europe
Rue Joseph Cugnot BP40181 - 33708 Mérignac Cedex | France
T +33 (0)5 56 13 75 75 | F +33 (0)5 56 13 75 77
For more information visit: www.pny.eu