HP NVIDIA Tesla M2070Q, NVIDIA Tesla M2090, NVIDIA Tesla K10, NVIDIA Tesla K20, NVIDIA Tesla K20X Specification

...
Overview
HP supports, on select HP ProLiant servers, computational accelerator modules based on NVIDIA® Tesla™ Graphics Processing Unit (GPU) technology.
The following Tesla Modules are available from HP for use in certain HP SL-series servers:
NVIDIA Tesla M2070Q 2-Slot Passive Module
NVIDIA Tesla M2090 2-Slot Passive Module
NVIDIA Tesla K10 Dual GPU PCIe Module
NVIDIA Tesla K10 Rev B Dual GPU Module
NVIDIA Tesla K20 5 GB Module
NVIDIA Tesla K20X 6 GB Module
The NVIDIA Tesla M2070Q module can also be used in HP ProLiant WS460c workstation blades.
Based on NVIDIA's CUDA™ architecture, the Tesla Modules enable seamless integration of GPU computing with HP ProLiant servers for high-performance computing and large, scale-out data center deployments. These Tesla Modules deliver all of the standard benefits of GPU computing while enabling maximum reliability and tight integration with system monitoring and management tools such as HP Insight Cluster Management Utility.
The Tesla M2070Q uses the NVIDIA Fermi GPU architecture to combine the high-performance computing found in the other Tesla Modules with NVIDIA Quadro® professional-class visualization in the same GPU. The Tesla M2070Q is the ideal solution for customers who want to deploy high-performance computing alongside advanced and remote visualization in the same datacenter.
The HP GPU Ecosystem includes HP Cluster Platform specification and qualification, HP-supported GPU-aware cluster software, and third-party GPU-aware cluster software for NVIDIA Tesla Modules on HP ProLiant servers. In particular, the HP Insight Cluster Management Utility (CMU) monitors and displays GPU health sensors such as temperature. CMU can also install and provision the GPU drivers and the CUDA software. The HP HPC Linux Value Pack includes a GPU-enhanced version of IBM Platform LSF with the capability of scheduling jobs based on GPU requirements. This capability is also available from HP in other popular schedulers such as Altair PBS Professional and Adaptive Computing Moab.
[Product photos: NVIDIA K10 module; NVIDIA K20 and K20X modules]
What's New
Support for the NVIDIA Tesla K10 Rev B Dual GPU Module
Models
NVIDIA Passive Tesla Modules
NVIDIA Tesla M2070Q 6 GB GPU Graphics Module                A0C39A
NVIDIA Tesla M2075 6 GB Computational Accelerator           A0R41A
NVIDIA Tesla M2090 6 GB Computational Accelerator           A0J99A
NOTE: 2-slot, passively cooled Tesla modules with 6 GB of memory.
NOTE: Please see the HP ProLiant SL250s Gen8 or SL270s Gen8 server QuickSpecs for Technical Specifications and additional information:
http://h18004.www1.hp.com/products/quickspecs/14232_na/14232_na.html
http://h18004.www1.hp.com/products/quickspecs/14405_na/14405_na.html
NVIDIA Tesla K10 Dual GPU PCIe Computational Accelerator    B3M66A
NOTE: Please see the HP ProLiant SL250s Gen8 or SL270s Gen8 QuickSpecs for Technical Specifications and additional information:
http://h18004.www1.hp.com/products/quickspecs/14232_na/14232_na.html
http://h18004.www1.hp.com/products/quickspecs/14405_na/14405_na.html
NVIDIA Tesla K10 Rev B Dual GPU PCIe Computational Accelerator    E5V47A
NOTE: 2-slot, passively cooled pair of Tesla GPUs, each with 4 GB of memory.
NOTE: Please see the HP ProLiant SL250s Gen8 server QuickSpecs for Technical Specifications and additional information:
http://h18004.www1.hp.com/products/quickspecs/14232_na/14232_na.html
http://h18004.www1.hp.com/products/quickspecs/14405_na/14405_na.html
NVIDIA Tesla K20 5 GB Computational Accelerator             C7S14A
NVIDIA Tesla K20X 6 GB Computational Accelerator            C7S15A
NOTE: 2-slot, passively cooled Tesla GPUs based on the NVIDIA Kepler architecture.
NOTE: Please see the HP ProLiant SL250s Gen8 server QuickSpecs for Technical Specifications and additional information:
http://h18004.www1.hp.com/products/quickspecs/14232_na/14232_na.html
http://h18004.www1.hp.com/products/quickspecs/14405_na/14405_na.html
Standard Features
M2070Q, M2090, K10, K20 and K20X Modules
Performance of the M2090 Module
512 CUDA cores
665 Gigaflops of double-precision peak performance
1331 Gigaflops of single-precision peak performance
GDDR5 memory optimizes performance and reduces data transfers by keeping large data sets in 6 GB of local memory that is attached directly to the GPU.
The NVIDIA Parallel DataCache™ accelerates algorithms such as physics solvers, ray tracing, and sparse matrix multiplication, where data addresses are not known beforehand. It includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.
The NVIDIA GigaThread™ Engine maximizes throughput through context switching that is 10X faster than on the M1060 module, concurrent kernel execution, and improved thread block scheduling.
Asynchronous transfer boosts system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize computing efficiency by transferring data to local memory before it is needed (see the sketch after this list).
The high-speed PCIe Gen 2.0 data transfer maximizes bandwidth between the HP ProLiant server and the Tesla processors.
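To illustrate the asynchronous-transfer item above, here is a minimal CUDA C sketch, not taken from HP or NVIDIA documentation: the chunk count, buffer sizes, and the trivial scale kernel are illustrative assumptions. It stages work through several CUDA streams so that the host-to-device copy for one chunk can overlap kernel execution on another.

    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main(void)
    {
        const int N = 1 << 20;          /* elements per chunk (illustrative) */
        const int CHUNKS = 4;
        float *h_buf, *d_buf;
        cudaStream_t streams[CHUNKS];

        /* Pinned host memory is required for truly asynchronous copies. */
        cudaMallocHost((void **)&h_buf, CHUNKS * N * sizeof(float));
        cudaMalloc((void **)&d_buf, CHUNKS * N * sizeof(float));

        for (int c = 0; c < CHUNKS; ++c)
            cudaStreamCreate(&streams[c]);

        /* Each chunk's copy-in, kernel, and copy-out go to its own stream,
           so transfers for one chunk overlap compute on another. */
        for (int c = 0; c < CHUNKS; ++c) {
            float *h = h_buf + c * N;
            float *d = d_buf + c * N;
            cudaMemcpyAsync(d, h, N * sizeof(float),
                            cudaMemcpyHostToDevice, streams[c]);
            scale<<<(N + 255) / 256, 256, 0, streams[c]>>>(d, N);
            cudaMemcpyAsync(h, d, N * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[c]);
        }
        cudaDeviceSynchronize();

        for (int c = 0; c < CHUNKS; ++c)
            cudaStreamDestroy(streams[c]);
        cudaFreeHost(h_buf);
        cudaFree(d_buf);
        return 0;
    }

The page-locked allocation from cudaMallocHost is what allows the copies to proceed asynchronously with respect to the kernels; with ordinary pageable memory the copies would serialize.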
Performance of the K10 Module
3072 CUDA cores (1536 per GPU); the dual-GPU module presents to software as two CUDA devices (see the sketch after this list)
190 Gigaflops of double-precision peak performance (95 Gigaflops in each GPU)
4577 Gigaflops of single-precision peak performance (2288 Gigaflops in each GPU)
GDDR5 memory optimizes performance and reduces data transfers by keeping large data sets in 8 GB of local memory, 4 GB attached directly to each GPU.
The NVIDIA Parallel DataCache™ accelerates algorithms such as physics solvers, ray tracing, and sparse matrix multiplication, where data addresses are not known beforehand. It includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.
Asynchronous transfer boosts system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize computing efficiency by transferring data to local memory before it is needed.
The high-speed PCIe Gen 3.0 data transfer maximizes bandwidth between the HP ProLiant server and the Tesla processors.
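Because the K10 is a dual-GPU board, a CUDA application sees it as two devices. The following minimal sketch simply enumerates whatever devices the runtime reports; nothing in it is specific to HP's integration.

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);   /* a K10 board reports two devices */

        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s, %zu MB, %d multiprocessors\n",
                   i, prop.name, prop.totalGlobalMem >> 20,
                   prop.multiProcessorCount);
        }
        return 0;
    }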
Performance of the K20 Module
2496 CUDA cores
1.17 Tflops of double-precision peak performance
3.52 Tflops of single-precision peak performance
GDDR5 memory optimizes performance and reduces data transfers by keeping large data sets in 5 GB of local memory that is attached to the GPU.
The NVIDIA Parallel DataCache™ accelerates algorithms such as physics solvers, ray tracing, and sparse matrix multiplication, where data addresses are not known beforehand. It includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.
Asynchronous transfer boosts system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize computing efficiency by transferring data to local memory before it is needed.
Dynamic Parallelism enables GPU threads to spawn new threads automatically (see the sketch after this list).
Hyper-Q enables multiple CPU cores to utilize the CUDA cores on a single GPU simultaneously.
The high-speed PCIe Gen 2.0 data transfer maximizes bandwidth between the HP ProLiant server and the Tesla processors.
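As a rough illustration of the Dynamic Parallelism item above, the following hedged CUDA C sketch (the parent/child kernels and grid sizes are invented for illustration) shows a kernel launching another kernel from the device. This requires compute capability 3.5 (K20/K20X), compilation with nvcc -arch=sm_35 -rdc=true, and linking against cudadevrt.

    __global__ void child(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;
    }

    /* With Dynamic Parallelism, a kernel can launch further kernels
       itself, sizing the new grid from data it has just computed. */
    __global__ void parent(float *data, int n)
    {
        if (threadIdx.x == 0 && blockIdx.x == 0) {
            int threads = 256;
            int blocks  = (n + threads - 1) / threads;
            child<<<blocks, threads>>>(data, n);  /* device-side launch */
            cudaDeviceSynchronize();              /* wait for the child grid */
        }
    }

The benefit is that grid shapes can be decided on the GPU, without a round trip to the CPU between dependent launches.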
Performance of the K20X Module
2688 CUDA cores
1.32 Tflops of double-precision peak performance
3.95 Tflops of single-precision peak performance
GDDR5 memory optimizes performance and reduces data transfers by keeping large data sets in 6 GB of local memory that is attached to the GPU.
The NVIDIA Parallel DataCache™ accelerates algorithms such as physics solvers, ray tracing, and sparse matrix multiplication, where data addresses are not known beforehand. It includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.
Asynchronous transfer boosts system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize computing efficiency by transferring data to local memory before it is needed.
Dynamic Parallelism enables GPU threads to spawn new threads automatically.
Hyper-Q enables multiple CPU cores to utilize the CUDA cores on a single GPU simultaneously (see the sketch after this list).
The high-speed PCIe Gen 2.0 data transfer maximizes bandwidth between the HP ProLiant server and the Tesla processors.
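To make the Hyper-Q item above concrete, here is a small hedged CUDA C sketch; the stream count and kernel body are arbitrary illustrations. It issues independent kernels into several streams: on a Kepler-class K20/K20X, Hyper-Q's multiple hardware work queues allow such small grids to execute concurrently rather than serializing through the single queue of earlier architectures.

    #include <cuda_runtime.h>

    __global__ void busy(float *x)
    {
        /* a small independent kernel; several can run concurrently */
        for (int i = 0; i < 1000; ++i)
            x[threadIdx.x] = x[threadIdx.x] * 0.5f + 1.0f;
    }

    int main(void)
    {
        const int NSTREAMS = 8;       /* illustrative */
        cudaStream_t s[NSTREAMS];
        float *d;
        cudaMalloc((void **)&d, NSTREAMS * 32 * sizeof(float));

        for (int i = 0; i < NSTREAMS; ++i) {
            cudaStreamCreate(&s[i]);
            /* On Kepler, Hyper-Q lets these grids overlap on one GPU. */
            busy<<<1, 32, 0, s[i]>>>(d + i * 32);
        }
        cudaDeviceSynchronize();

        for (int i = 0; i < NSTREAMS; ++i)
            cudaStreamDestroy(s[i]);
        cudaFree(d);
        return 0;
    }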
Reliability
"ECC Memory meets a critical requirement for computing accuracy and reliability for datacenters and supercomputing centers. It offers protection of data in memory to enhance data integrity and reliability for applications. For M2075, M2070Q, M2090, K20, K20X register files, L1/L2 caches, shared memory, and DRAM all are ECC protected. For K10, only external DRAM is ECC protected. Double-bit errors are detected and can trigger alerts with the HP Cluster Management Utility. Also, the Platform LSF job scheduler, available as part of HP HPC Linux Value Pack, can be configured to report when jobs encounter double-bit errors. Passive heatsink design eliminates moving parts and cables reduces mean time between failures.
Programming and Management Ecosystem
The CUDA programming environment has broad support for programming languages and APIs: choose C, C++, OpenCL, DirectCompute, or Fortran to express application parallelism and take advantage of the innovative Tesla architectures. The CUDA software, as well as the GPU drivers, can be installed automatically on HP ProLiant servers by HP Insight Cluster Management Utility.
"Exclusive mode" enables application-exclusive access to a particular GPU. CUDA environment variables enable cluster management software, such as the Platform LSF job scheduler (available as part of HP HPC Linux Value Pack), to limit the Tesla GPUs an application can use.
With HP ProLiant servers, application programmers can control the mapping between processes running on individual cores and the GPUs with which those processes communicate. With judicious mappings, GPU bandwidth, and thus overall performance, can be optimized. The technique is described in a white paper available to HP customers at www.hp.com/go/hpc. A heuristic version of this affinity mapping has also been implemented by HP as an option to the mpirun command, as used for example with HP-MPI, available as part of HP HPC Linux Value Pack (a simplified sketch of such a mapping appears below).
GPU control is available through the nvidia-smi tool, which lets you control the compute mode (e.g. exclusive), enable, disable, and report ECC, and check and reset the double-bit error count. IPMI and iLO gather data such as GPU temperature. HP Cluster Management Utility has incorporated these sensors into its monitoring features, so that cluster-wide GPU data can be presented in real time, stored for historical analysis, and easily used to set up management alerts.
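As a simplified illustration of process-to-GPU affinity mapping, the CUDA C sketch below lets each process select a device by its node-local rank with cudaSetDevice. This is not HP's white-paper technique or the mpirun option itself: the environment variable name and the round-robin policy are assumptions made for the sketch.

    #include <cuda_runtime.h>
    #include <stdlib.h>

    /* Hypothetical helper: choose a GPU from a process's node-local
       rank. A production mapping would also weigh PCIe topology, as
       described in HP's affinity white paper. */
    static int pick_gpu_for_rank(int local_rank)
    {
        int count = 0;
        cudaGetDeviceCount(&count);   /* honors CUDA_VISIBLE_DEVICES */
        return (count > 0) ? (local_rank % count) : 0;
    }

    int main(void)
    {
        /* The variable name is launcher-dependent and assumed here. */
        const char *r = getenv("MPI_LOCALRANKID");
        int local_rank = r ? atoi(r) : 0;

        cudaSetDevice(pick_gpu_for_rank(local_rank));
        /* ...the rest of the application now runs on that GPU... */
        return 0;
    }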