Abstract
With business models constantly changing to keep pace with today’s Internet-based, global economy,
IT organizations are continually challenged to provide customers with high-performance platforms
while controlling cost. An increasing number of enterprise businesses are implementing scale-out
architectures as a cost-effective approach to scalable platforms, not just for high-performance
computing (HPC) but also for financial services and Oracle-based database applications.
InfiniBand (IB) is one of the most important technologies that enable the adoption of cluster computing.
This technology brief describes InfiniBand as an interconnect technology used in cluster computing,
provides basic technical information, and explains the advantages of implementing InfiniBand-based
scale-out architectures.
Introduction
The overall performance of enterprise servers is determined by the synergistic relationship between
three main subsystems: processing, memory, and input/output. The multiprocessor architecture used
in the latest single-server systems (Figure 1) provides a high degree of parallel processing capability.
However, multiprocessor server architecture cannot scale cost-effectively to a large number of
processing cores. Scale-out cluster computing, which builds an entire system by connecting stand-alone
systems with an interconnect technology, has therefore become widely implemented in HPC and
enterprise data centers around the world.
Figure 1. Architecture of a dual-processor single server (node), showing typical I/O interconnect bandwidths: Ultra-3 SCSI (320 MBps), Fibre Channel (400 MBps), Gigabit Ethernet (>1 Gbps), and InfiniBand (>10 Gbps for a 4x, single data rate link)
Figure 2 shows an example of a cluster architecture that integrates computing, storage, and
visualization functions into a single system. Applications are usually distributed to compute nodes
through job scheduling tools.
Figure 2. Sample clustering architecture
Scale-out systems allow infrastructure architects to meet performance and cost goals, but interconnect
performance, scalability, and reliability are key areas that must be carefully considered. A cluster
infrastructure works best when built with an interconnect technology that scales easily, reliably, and
economically with system expansion.
Ethernet is a pervasive, mature interconnect technology that can be cost-effective for some application
workloads. The emergence of 10-Gigabit Ethernet (10GbE) offers a cluster interconnect that meets
higher bandwidth requirements than 1GbE can provide. However, 10GbE still lags the latest
InfiniBand technology in latency and bandwidth performance, and lacks native support for the fat-tree
and mesh topologies used in scale-out clusters. InfiniBand remains the interconnect of choice for
highly parallel environments where applications require low latency and high bandwidth across the
entire fabric.
InfiniBand technology
InfiniBand is an industry-standard, channel-based architecture in which congestion management and
zero-copy data transfers using remote direct memory access (RDMA) are core capabilities, resulting in
high-speed, low-latency interconnects for scale-out compute infrastructures.
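As a minimal sketch of this zero-copy model, the code below registers an application buffer with the Linux verbs library (libibverbs) so the HCA can move data into and out of it directly. The function name and error handling are illustrative, and the protection domain is assumed to have been allocated already.

/* Sketch only: register `len` bytes for zero-copy RDMA. The caller is assumed
 * to have opened the HCA and allocated the protection domain `pd`.
 * Build with: gcc -c rdma_reg.c (link applications with -libverbs) */
#include <stdlib.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_rdma_buffer(struct ibv_pd *pd, size_t len)
{
    void *buf = malloc(len);
    if (!buf)
        return NULL;

    /* Registration pins the buffer and returns lkey/rkey values that let the
     * HCA move data between this memory and the wire without kernel copies
     * on the data path. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        free(buf);
    return mr;
}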
InfiniBand uses a multi-layer architecture to transfer data from one node to another. In the InfiniBand
layer model (Figure 3), separate layers perform different tasks in the message passing process.
The upper layer protocols (ULPs) work closest to the operating system and application; they define the
services and affect how much software overhead the data transfer will require. The InfiniBand
transport layer is responsible for communication between applications. The transport layer splits each
message into data payloads and encapsulates each payload, together with an identifier of the
destination node, into one or more packets. Each packet can carry a data payload of up to 4 KB.
The packets are passed to the network layer, which selects a route to the destination node and, if
necessary, attaches the route information to the packets. The data link layer attaches a local identifier
(LID) to the packet for communication at the subnet level. The physical layer transforms the packet into
an electromagnetic signal based on the type of network media—copper or fiber.
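As a simple illustration of the segmentation step, the short sketch below computes how many packets the transport layer would need for a hypothetical 1 MB message, assuming the 4 KB maximum data payload described above.

/* Illustrative only: packet count for a message, given the 4 KB payload limit. */
#include <stdio.h>

int main(void)
{
    const unsigned long payload_max = 4096;      /* 4 KB payload per packet */
    const unsigned long message_len = 1000000;   /* hypothetical 1 MB message */

    /* Ceiling division: every packet except possibly the last carries a full payload. */
    unsigned long packets = (message_len + payload_max - 1) / payload_max;

    printf("%lu-byte message -> %lu packets of <= %lu bytes payload each\n",
           message_len, packets, payload_max);
    return 0;
}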
Figure 3. Distributed computing using InfiniBand architecture
InfiniBand has these important characteristics:
• Very high bandwidth: up to 40 Gbps with quad data rate (QDR) links
• Low-latency end-to-end communication: MPI ping-pong latency approaching 1 microsecond (see the sketch after this list)
• Hardware-based protocol handling, resulting in faster throughput and low CPU overhead due to efficient OS bypass and RDMA
• Native support for fat-tree and other common mesh topologies in fabric design
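The latency figure above is typically measured with a ping-pong microbenchmark. The minimal MPI sketch below illustrates the approach; the message size, iteration count, and program name are illustrative choices.

/* Minimal ping-pong latency sketch. Build with mpicc; run with two ranks,
 * for example: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    char byte = 0;                 /* 1-byte message exposes latency, not bandwidth */
    MPI_Status status;

    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * iters) * 1e6);   /* half the round-trip time */

    MPI_Finalize();
    return 0;
}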
InfiniBand components
InfiniBand architecture involves four key components:
A host node or server requires a host channel adapter (HCA) to connect to an InfiniBand
infrastructure. An HCA can be a card installed in an expansion slot or integrated onto the host’s
system board. An HCA can communicate directly with another HCA, with a target channel adapter,
or with an InfiniBand switch.
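As a brief, illustrative example of the host side, the sketch below uses libibverbs to enumerate the HCAs visible to a server; the output format is an arbitrary choice.

/* Sketch: list the InfiniBand devices (HCAs) a host can see.
 * Build with: gcc list_hcas.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }

    printf("%d InfiniBand device(s) found\n", n);
    for (int i = 0; i < n; i++)
        printf("  %s (node GUID, network byte order: 0x%016llx)\n",
               ibv_get_device_name(devs[i]),
               (unsigned long long)ibv_get_device_guid(devs[i]));

    ibv_free_device_list(devs);
    return 0;
}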
InfiniBand uses subnet manager (SM) software to manage the InfiniBand fabric and to monitor
interconnect performance and health at the fabric level. A fabric can be as simple as a point-to-point
connection or multiple connections through one or more switches. The SM software resides on a node
or switch within the fabric, and provides switching and configuration information to all of the switches
in the fabric. Additional backup SMs may be located within the fabric for failover should the primary
SM fail. All other nodes in the fabric will contain an SM agent that processes management data.
Managers and agents communicate using management datagrams (MADs).
A target channel adapter (TCA) is used to connect an external storage unit or I/O interface to an
InfiniBand infrastructure. The TCA includes an I/O controller specific to the device’s protocol (SCSI,
Fibre Channel, Ethernet, etc.) and can communicate with an HCA or an InfiniBand switch.
An InfiniBand switch provides scalability by allowing a number of HCAs, TCAs, and other IB switches
to connect to an InfiniBand infrastructure. The switch handles network traffic by checking the local link
header of each data packet received and forwarding the packet to the proper destination.
The most basic InfiniBand infrastructure will consist of host nodes or servers equipped with HCAs, an
InfiniBand switch, and subnet manager software. More expansive networks will include multiple
switches.
InfiniBand software architecture
InfiniBand, like Ethernet, uses a multi-layer processing stack to transfer data between nodes.
InfiniBand architecture, however, provides OS-bypass features, such as offloaded communication
processing and RDMA operations, as core capabilities, and it offers greater adaptability through a
variety of services and protocols.
While the majority of existing InfiniBand clusters operate on the Linux platform, drivers and HCA
stacks are also available for Microsoft® Windows®, HP-UX, Solaris, and other operating systems
from various InfiniBand hardware and software vendors.
The layered software architecture of the HCA allows developers to write code without a specific hardware implementation in mind.
The functionality of an HCA is defined by its verb set, which is a table of commands used by the
application programming interface (API) of the operating system being run. A number of services and
software protocols are available (Figure 4) and, depending on type, can be implemented from user
space or from the kernel.
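To make the idea of a verb set concrete, the sketch below shows the verbs an application might call to create a completion queue and a reliable-connected queue pair on an HCA; the helper function name and queue depths are illustrative assumptions.

/* Sketch only: create the send/receive endpoint (queue pair) exposed through
 * the HCA's verb set. The device context and protection domain are assumed
 * to have been opened and allocated already. Link with -libverbs. */
#include <stddef.h>
#include <infiniband/verbs.h>

struct ibv_qp *create_rc_endpoint(struct ibv_context *ctx, struct ibv_pd *pd)
{
    /* Completion queue: the HCA posts an entry here for each finished work
     * request, so the application can poll for completions from user space. */
    struct ibv_cq *cq = ibv_create_cq(ctx, 64, NULL, NULL, 0);
    if (!cq)
        return NULL;

    /* Reliable-connected QP with small, illustrative queue depths. */
    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .qp_type = IBV_QPT_RC,
        .cap = {
            .max_send_wr  = 16,
            .max_recv_wr  = 16,
            .max_send_sge = 1,
            .max_recv_sge = 1,
        },
    };

    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    if (!qp)
        ibv_destroy_cq(cq);
    return qp;
}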