
Using InfiniBand for a scalable compute infrastructure
technology brief, 3rd edition
Abstract
Introduction
InfiniBand technology
    InfiniBand components
    InfiniBand software architecture
        MPI
        IPoIB
        RDMA-based protocols
        RDS
    InfiniBand hardware architecture
    Link operation
Scale-out clusters built on InfiniBand and HP technology
Conclusion
Appendix A: Glossary
For more information
Call to action
Abstract
With business models constantly changing to keep pace with today’s Internet-based, global economy, IT organizations are continually challenged to provide customers with high-performance platforms while controlling cost. An increasing number of enterprise businesses are implementing scale-out architectures as a cost-effective approach to scalable platforms, not just for high-performance computing (HPC) but also for financial services and Oracle-based database applications.
InfiniBand (IB) is one of the most important technologies that enable the adoption of cluster computing. This technology brief describes InfiniBand as an interconnect technology used in cluster computing, provides basic technical information, and explains the advantages of implementing InfiniBand-based scale-out architectures.
Introduction
The overall performance of enterprise servers is determined by the synergistic relationship among three main subsystems: processing, memory, and input/output. The multiprocessor architecture used in the latest single-server systems (Figure 1) provides a high degree of parallel processing capability. However, multiprocessor server architecture cannot scale cost-effectively to a large number of processing cores. Scale-out cluster computing, which builds a complete system by connecting stand-alone servers with an interconnect technology, has therefore become widely implemented in HPC and enterprise data centers around the world.
Figure 1. Architecture of a dual-processor single server (node). The diagram compares peak I/O bandwidths: Ultra-3 SCSI at 320 MBps, Fibre Channel at 400 MBps, Gigabit Ethernet at more than 1 Gbps, and InfiniBand at more than 10 Gbps (4x link, single data rate).
Figure 2 shows an example of cluster architecture that integrates computing, storage, and visualization functions into a single system. Applications are usually distributed to compute nodes through job scheduling tools.
Figure 2. Sample clustering architecture
Scale-out systems allow infrastructure architects to meet performance and cost goals, but interconnect performance, scalability, and reliability are key areas that must be carefully considered. A cluster infrastructure works best when built with an interconnect technology that scales easily, reliably, and economically with system expansion.
Ethernet is a pervasive, mature interconnect technology that can be cost-effective for some application workloads. The emergence of 10-Gigabit Ethernet (10GbE) offers a cluster interconnect that meets higher bandwidth requirements than 1GbE can provide. However, 10GbE still lags the latest InfiniBand technology in latency and bandwidth performance, and lacks native support for the fat-tree and mesh topologies used in scale-out clusters. InfiniBand remains the interconnect of choice for highly parallel environments where applications require low latency and high bandwidth across the entire fabric.
InfiniBand technology
InfiniBand is an industry-standard, channel-based architecture with congestion management and zero-copy data transfer through remote direct memory access (RDMA) as core capabilities, resulting in high-speed, low-latency interconnects for scale-out compute infrastructures.
InfiniBand uses a multi-layer architecture to transfer data from one node to another. In the InfiniBand layer model (Figure 3), separate layers perform different tasks in the message passing process.
The upper layer protocols (ULPs) work closest to the operating system and application; they define the services and determine how much software overhead the data transfer will require. The InfiniBand transport layer is responsible for communication between applications. The transport layer splits each message into one or more packets, encapsulating in each packet a data payload and an identifier of the destination node. A packet can carry a data payload of up to four kilobytes.
The packets are passed to the network layer, which selects a route to the destination node and, if necessary, attaches the route information to the packets. The data link layer attaches a local identifier (LID) to the packet for communication at the subnet level. The physical layer transforms the packet into an electromagnetic signal based on the type of network media (copper or fiber).
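As a rough illustration of the segmentation and addressing described above, the following C sketch splits a message into payloads of at most 4 KB (the maximum InfiniBand MTU) and tags each packet with a destination LID. The structure fields and the LID value are illustrative only and do not reflect the actual InfiniBand header layout.

/* Conceptual sketch of transport-layer segmentation: a message is split into
 * payloads of at most 4 KB, and the link layer adds a destination LID.
 * Field names and values are illustrative, not the real header format. */
#include <stdint.h>
#include <stdio.h>

#define IB_MAX_MTU 4096u                 /* maximum data payload per packet, in bytes */

struct packet {
    uint16_t dest_lid;                   /* local identifier used within the subnet */
    uint32_t length;                     /* payload bytes carried by this packet */
};

int main(void)
{
    uint32_t msg_len = 10000;            /* example: a 10,000-byte message */
    uint32_t n = (msg_len + IB_MAX_MTU - 1) / IB_MAX_MTU;   /* packets needed */

    printf("%u-byte message -> %u packets\n", msg_len, n);
    for (uint32_t i = 0; i < n; i++) {
        struct packet p = {
            .dest_lid = 0x0007,          /* hypothetical destination LID */
            .length = (i == n - 1) ? msg_len - i * IB_MAX_MTU : IB_MAX_MTU,
        };
        printf("packet %u: LID 0x%04x, %u bytes\n", i, p.dest_lid, p.length);
    }
    return 0;
}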
Figure 3. Distributed computing using InfiniBand architecture
InfiniBand has these important characteristics:
• Very high bandwidth: up to 40 Gbps with quad data rate (QDR) links
• Low-latency end-to-end communication: MPI ping-pong latency approaching 1 microsecond (a minimal measurement sketch follows this list)
• Hardware-based protocol handling, resulting in faster throughput and low CPU overhead through efficient OS bypass and RDMA
• Native support for fat-tree and other common mesh topologies in fabric design
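MPI ping-pong latency of this kind is commonly measured with a small benchmark loop like the C sketch below. This is a generic measurement pattern, not code from the brief: rank 0 sends a one-byte message to rank 1, which echoes it back, and half the average round-trip time is reported as the one-way latency.

/* Minimal MPI ping-pong latency sketch; build with an MPI compiler wrapper
 * (for example, mpicc) and run with two ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    char byte = 0;                       /* one-byte message isolates latency */
    double start = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0) {
        double usec = (MPI_Wtime() - start) * 1e6 / (2.0 * iters);
        printf("one-way latency: %.2f microseconds\n", usec);
    }

    MPI_Finalize();
    return 0;
}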
InfiniBand components
InfiniBand architecture involves four key components:
• Host channel adapter
• Subnet manager
• Target channel adapter
• InfiniBand switch
A host node or server requires a host channel adapter (HCA) to connect to an InfiniBand infrastructure. An HCA can be a card installed in an expansion slot or integrated onto the host’s system board. An HCA can communicate directly with another HCA, with a target channel adapter, or with an InfiniBand switch.
InfiniBand uses subnet manager (SM) software to manage the InfiniBand fabric and to monitor interconnect performance and health at the fabric level. A fabric can be as simple as a point-to-point connection or multiple connections through one or more switches. The SM software resides on a node or switch within the fabric, and provides switching and configuration information to all of the switches in the fabric. Additional backup SMs may be located within the fabric for failover should the primary SM fail. All other nodes in the fabric will contain an SM agent that processes management data. Managers and agents communicate using management datagrams (MADs).
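Conceptually, the SM's periodic sweep amounts to discovering every port in the fabric and assigning it a LID, with a standby SM ready to repeat the work if the primary fails. The simplified C sketch below illustrates only that idea; the data structures are hypothetical and stand in for the MAD exchanges a real subnet manager (such as OpenSM) performs.

/* Conceptual sketch of a subnet manager sweep: discover ports and assign
 * each a local identifier (LID). Not a real SM implementation. */
#include <stdint.h>
#include <stdio.h>

struct port {
    uint64_t guid;                       /* factory-assigned globally unique identifier */
    uint16_t lid;                        /* assigned by the subnet manager */
};

/* Assign consecutive LIDs starting at 1 (LID 0 is reserved). */
static void sm_sweep(struct port *ports, int n)
{
    uint16_t next_lid = 1;
    for (int i = 0; i < n; i++)
        ports[i].lid = next_lid++;
}

int main(void)
{
    struct port fabric[] = {             /* hypothetical HCA port GUIDs */
        { 0x0002c90300001111ULL, 0 },
        { 0x0002c90300002222ULL, 0 },
        { 0x0002c90300003333ULL, 0 },
    };
    int n = (int)(sizeof(fabric) / sizeof(fabric[0]));

    sm_sweep(fabric, n);                 /* the primary SM configures the subnet; */
    for (int i = 0; i < n; i++)          /* a backup SM would rerun the same sweep */
        printf("GUID 0x%016llx -> LID %u\n",
               (unsigned long long)fabric[i].guid, fabric[i].lid);
    return 0;
}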
A target channel adapter (TCA) is used to connect an external storage unit or I/O interface to an InfiniBand infrastructure. The TCA includes an I/O controller specific to the device’s protocol (SCSI, Fibre Channel, Ethernet, etc.) and can communicate with an HCA or an InfiniBand switch.
An InfiniBand switch provides scalability by allowing a number of HCAs, TCAs, and other IB switches to connect to an InfiniBand infrastructure. The switch handles network traffic by checking the local link header of each data packet received and forwarding the packet to the proper destination.
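The forwarding decision can be pictured as a table lookup keyed by the destination LID carried in the packet's local route header; the subnet manager populates the table when it configures the fabric. The short C sketch below illustrates the idea with a deliberately small, hypothetical table, not a real switch implementation.

/* Simplified LID-based forwarding: the destination LID read from the packet
 * header indexes a forwarding table that yields the egress port. */
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 48                    /* small table for illustration only */

static uint8_t lft[TABLE_SIZE];          /* index = destination LID, value = egress port */

static int forward(uint16_t dest_lid)
{
    if (dest_lid >= TABLE_SIZE)
        return -1;                       /* unknown destination */
    return lft[dest_lid];                /* port chosen from the packet header alone */
}

int main(void)
{
    lft[3] = 1;                          /* as if the SM programmed: LID 3 via port 1 */
    lft[7] = 4;                          /* LID 7 via port 4 */

    printf("packet to LID 3 -> port %d\n", forward(3));
    printf("packet to LID 7 -> port %d\n", forward(7));
    return 0;
}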
The most basic InfiniBand infrastructure will consist of host nodes or servers equipped with HCAs, an InfiniBand switch, and subnet manager software. More expansive networks will include multiple switches.
InfiniBand software architecture
InfiniBand, like Ethernet, uses a multi-layer processing stack to transfer data between nodes. The InfiniBand architecture, however, offloads communication processing to the adapter and provides OS bypass and RDMA operations as core capabilities, and it offers greater adaptability through a variety of services and protocols.
While the majority of existing InfiniBand clusters operate on the Linux platform, drivers and HCA stacks are also available for Microsoft® Windows®, HP-UX, Solaris, and other operating systems from various InfiniBand hardware and software vendors.
The layered software architecture allows applications to be written without a specific HCA in mind. The functionality of an HCA is defined by its verb set, the table of commands exposed through the application programming interface (API) of the operating system being run. A number of services and software protocols are available (Figure 4) and, depending on type, are implemented in user space or in the kernel.
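On Linux, the verb set is commonly exposed to applications through the libibverbs user-space library. The short C sketch below, a minimal example that assumes libibverbs (for example, from an OFED distribution) is installed, enumerates the available HCAs, opens the first one, and allocates a protection domain, which are the first steps of any verbs-based program.

/* Minimal libibverbs example: enumerate HCAs, open one, query its attributes,
 * and allocate a protection domain. Error handling is abbreviated. */
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);   /* enumerate HCAs */
    if (devs == NULL || num == 0) {
        fprintf(stderr, "no InfiniBand devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);     /* open the first HCA */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                   /* protection domain */

    struct ibv_device_attr attr;
    ibv_query_device(ctx, &attr);                             /* verb: query the HCA */
    printf("HCA %s supports up to %d queue pairs\n",
           ibv_get_device_name(devs[0]), attr.max_qp);

    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}

Compiling typically requires linking against the library, for example: gcc verbs_example.c -libverbs.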