
RDMA protocol: improving network performance
technology brief
Abstract.............................................................................................................................................. 2
Introduction......................................................................................................................................... 2
Limitations of TCP/IP ............................................................................................................................ 2
RDMA solution .................................................................................................................................... 3
RDMA over TCP .................................................................................................................................. 4
RDMA protocol overview .................................................................................................................. 5
RDMA data transfer operations.......................................................................................................... 6
Send operations........................................................................................................................... 6
RDMA write................................................................................................................................. 7
RDMA read ................................................................................................................................. 7
Terminate .................................................................................................................................... 7
Verbs.............................................................................................................................................. 7
RNIC interface................................................................................................................................. 8
RDMA over InfiniBand.......................................................................................................................... 9
InfiniBand RDMA protocols ............................................................................................................. 10
Direct Access Programming Library .............................................................................................. 10
Sockets Direct Protocol ................................................................................................................ 10
SCSI RDMA Protocol................................................................................................................... 11
InfiniBand link operation ................................................................................................................. 11
Conclusion........................................................................................................................................ 11
For more information.......................................................................................................................... 12
Call to action .................................................................................................................................... 12

Abstract

Remote Direct Memory Access (RDMA) is a data exchange technology that improves network performance by streamlining data processing operations. This technology brief describes how RDMA can be applied to the two most common network interconnects, Ethernet and InfiniBand, to provide efficient throughput in the data center.

Introduction

Advances in computing and storage technologies are placing a considerable burden on the data center’s network infrastructure. As network speeds increase and greater amounts of data are moved, more processing power is required to handle data communication.
A typical data center today uses a variety of disparate interconnects for server-to-server and server-to-storage links. The use of multiple system and peripheral bus interconnects decreases compatibility, interoperability, and management efficiency and drives up the cost of equipment, software, training, and the personnel needed to operate and maintain them. To increase efficiency and lower costs, data center network infrastructure must be transformed into a unified, flexible, high-speed fabric.
Unified high-speed infrastructures require a high-bandwidth, low-latency fabric that can move data efficiently and securely between servers, storage, and applications. Evolving fabric interconnects and associated technologies provide more efficient and scalable computing and data transport within the data center by reducing the overhead burden on processors and memory. More efficient communication protocols and technologies, some of which run over existing infrastructures, free processors for more useful work and improve infrastructure utilization. In addition, the ability of fabric interconnects to converge functions in the data center over fewer, or possibly even one, industry-standard interconnect presents significant benefits.
Remote direct memory access (RDMA) is a data exchange technology that promises to accomplish these goals and make iWARP (a protocol that specifies RDMA over TCP/IP) a reality. Applying RDMA to switched-fabric infrastructures such as InfiniBand™ (IB) can enhance the performance of clustered systems handling large data transfers.

Limitations of TCP/IP

Transmission Control Protocol and Internet Protocol (TCP/IP) represent the suite of protocols that drive the Internet. Every computer connected to the Internet uses these protocols to send and receive information. Information is transmitted in fixed data formats (packets), so that heterogeneous systems can communicate. The TCP/IP stack of protocols was developed to be an internetworking language for all types of computers to transfer data across different physical media. The TCP and IP protocol suite includes over 70,000 software instructions that provide the necessary reliability mechanisms, error detection and correction, sequencing, recovery, and other communications features.
Computers implement the TCP/IP protocol stack to process outgoing and incoming data packets. Today, TCP/IP stacks are usually implemented in operating system software and packets are handled by the main (host) processor. As a result, protocol processing of incoming and outgoing network traffic consumes processor cycles—cycles that could otherwise be used for business and other productivity applications. The processing work and associated time delays may also reduce the ability of applications to scale across multiple servers. As network speeds move beyond 1 gigabit per second (Gb/s) and larger amounts of data are transmitted, processors become burdened by TCP/IP protocol processing and data movement.
The burden of protocol stack processing is compounded by a finite amount of memory bus bandwidth. Incoming network data consumes memory bus bandwidth because each data packet must be transferred in and out of memory several times (Figure 1): received data is written to the device driver buffer, copied into an operating system (OS) buffer, and then copied into application memory space.
Figure 1. Typical flow of network data in receiving host (NOTE: The actual number of memory copies varies with the operating system. Example: Linux uses 2.)
These copy operations add latency, consume memory bus bandwidth, and require host processor (CPU) intervention. In fact, the TCP/IP protocol overhead associated with 1-Gb Ethernet traffic can increase system processor utilization by 20 to 30 percent; consequently, the software overhead of 10-Gb Ethernet operation has the potential to overwhelm system processors. An InfiniBand network that uses TCP operations for compatibility suffers from the same processing overhead as an Ethernet network.
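To make this copy path concrete, the following minimal C sketch shows a conventional sockets receive loop. Every byte delivered to the application has already been copied by the protocol stack from the network interface into a kernel socket buffer and is copied again by recv() into the application buffer. The port number and buffer size are illustrative only.

```c
/* Conventional TCP/IP receive path: the kernel copies each arriving
 * packet from the NIC into its own socket buffer, and recv() then
 * copies the payload again into the application's buffer. Both copies
 * consume memory bus bandwidth and host CPU cycles.
 * Minimal sketch; error handling is abbreviated and the port is illustrative. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5001);      /* illustrative port */

    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 1);

    int conn = accept(listener, NULL, NULL);

    char buf[64 * 1024];                     /* application buffer */
    ssize_t n;
    /* Each recv() triggers a kernel-to-user copy of data that the
     * protocol stack has already copied once on arrival. */
    while ((n = recv(conn, buf, sizeof(buf), 0)) > 0) {
        /* application processes n bytes here */
    }

    close(conn);
    close(listener);
    return 0;
}
```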

RDMA solution

Inherent processor overhead and constrained memory bandwidth are performance obstacles for networks that use TCP, whether out of necessity (Ethernet) or compatibility (InfiniBand).
For Ethernet, the use of a TCP/IP offload engine (TOE) and RDMA can diminish these obstacles. A network interface card (NIC) with a TOE assumes TCP/IP processing duties, freeing the host processor for other tasks. The capability of a TOE is defined by its hardware design, the OS programming interface, and the application being run.
RDMA technology was developed to move data from the memory of one computer directly into the memory of another computer with minimal involvement from their processors. The RDMA protocol includes information that allows a system to place transferred data directly into its final memory destination without additional or interim data copies. This “zero copy” or “direct data placement” (DDP) capability provides the most efficient network communication possible between systems.
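As an illustration of direct data placement, the sketch below registers an application buffer and posts an RDMA Write work request through the verbs programming interface as implemented by the widely used libibverbs library. It is a minimal sketch under stated assumptions, not the protocol specification itself: the protection domain pd and queue pair qp are assumed to be already created and connected, and remote_addr and rkey are placeholders for the peer's advertised buffer address and memory key, exchanged out of band.

```c
/* Direct data placement with the verbs API (libibverbs): once the
 * buffer is registered, the RDMA-capable adapter moves its contents
 * straight into the peer's registered memory with no interim kernel
 * copies. Sketch only: pd, qp, remote_addr, and rkey are assumed to
 * come from connection setup and an out-of-band exchange. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int post_rdma_write(struct ibv_pd *pd, struct ibv_qp *qp,
                    void *buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    /* Register (pin) the application buffer so the adapter can DMA from it.
     * In practice this is done once at setup time and reused. The memory
     * region must stay registered until the transfer completes. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* place data directly in remote memory */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion entry */
    wr.wr.rdma.remote_addr = remote_addr;        /* advertised by the peer */
    wr.wr.rdma.rkey        = rkey;               /* peer's memory key */

    /* Hand the work request to the adapter; the data movement itself
     * proceeds without further host CPU involvement. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```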
Since the intent of both a TOE and RDMA is to relieve host processors of network overhead, the two are sometimes confused with each other. However, a TOE is primarily a hardware solution that takes responsibility for TCP/IP processing, while RDMA is a protocol solution that operates at the upper layers of the network communication stack. Consequently, TOEs and RDMA can work together: a TOE can provide localized connectivity with a device while RDMA enhances data throughput with a more efficient protocol.
For InfiniBand, RDMA operations provide an even greater performance benefit since InfiniBand architecture was designed with RDMA as a core capability (no TOE needed).
RDMA provides a faster path for applications to transmit messages between network devices and is applicable to both Ethernet and InfiniBand. Both these interconnects can support all new and existing network standards such as Sockets Direct Protocol (SDP), iSCSI Extensions for RDMA (iSER), Network File System (NFS), Direct Access File System (DAFS), and Message Passing Interface (MPI).

RDMA over TCP

Ethernet is the most prevalent network interconnect in use today. IT organizations have invested heavily in Ethernet technology and most are unwilling to tear out their networks and replace them. Reliance on Ethernet is justified by its low cost, backward compatibility, and consistent bandwidth upgrades over time. Today’s Ethernet networks, which use TCP/IP operations, commonly operate at 100 megabits per second (Mb/s) and 1 gigabit per second (Gb/s). Next-generation speeds will increase to 10 Gb/s. Customer migration to 10-Gb Ethernet will be tempered by the input/output (I/O) processing burden that TCP/IP operations place on servers.
The addition of RDMA capability to Ethernet will reduce host processor utilization and increase the benefits realized by migrating to 10-Gb Ethernet. Adding RDMA capability to Ethernet will allow data centers to expand the infrastructure with less effect on overall performance. This improves infrastructure flexibility for adapting to future needs.
RDMA over TCP is a communication protocol that moves data directly between the memory of applications on two systems (or nodes), with minimal work by the operating system kernel and without interim data copying into system buffers (Figure 2). This capability enables RDMA over TCP to work over standard TCP/IP-based networks (such as Ethernet) that are commonly used in data centers today. Note that RDMA over TCP does not specify the physical layer and will work over any network that uses TCP/IP.
Figure 2. Data flow with RDMA over TCP (Ethernet)
RDMA over TCP allows many classes of traffic (networking, I/O, file system and block storage, and interprocess messaging) to share the same physical interconnect, enabling that physical interconnect to become the single unifying data center fabric. RDMA over TCP provides more efficient network communications, which can increase the scalability of processor-bound applications. RDMA over TCP also leverages existing Ethernet infrastructures and the expertise of IT networking personnel.
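The sketch below illustrates how an application might use such a fabric, assuming the commonly used RDMA connection manager library (librdmacm): a client establishes a reliable connection and sends one message from a registered buffer. The same code path runs over an RNIC providing RDMA over TCP on Ethernet or over an InfiniBand adapter. The server name, port, and payload are illustrative, and the peer is assumed to be listening and to have posted a matching receive buffer; error handling is abbreviated.

```c
/* Client-side connection setup with the RDMA connection manager
 * (librdmacm). The connection manager hides whether the underlying
 * transport is an iWARP RNIC (RDMA over TCP on Ethernet) or an
 * InfiniBand HCA. Sketch only: server, port, and message are illustrative. */
#include <rdma/rdma_cma.h>
#include <rdma/rdma_verbs.h>
#include <string.h>

int send_one_message(const char *server, const char *port)
{
    struct rdma_addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_port_space = RDMA_PS_TCP;           /* reliable, connected service */
    if (rdma_getaddrinfo(server, port, &hints, &res))
        return -1;

    struct ibv_qp_init_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.cap.max_send_wr = attr.cap.max_recv_wr = 1;
    attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
    attr.qp_type = IBV_QPT_RC;

    struct rdma_cm_id *id;
    if (rdma_create_ep(&id, res, NULL, &attr)) { /* endpoint + queue pair */
        rdma_freeaddrinfo(res);
        return -1;
    }

    char msg[256] = "hello over RDMA";           /* illustrative payload */
    struct ibv_mr *mr = rdma_reg_msgs(id, msg, sizeof(msg));

    if (rdma_connect(id, NULL))                  /* TCP-like connect semantics */
        return -1;

    /* Post the send: the adapter moves the registered buffer directly,
     * bypassing intermediate socket-buffer copies. */
    rdma_post_send(id, NULL, msg, sizeof(msg), mr, IBV_SEND_SIGNALED);

    struct ibv_wc wc;
    rdma_get_send_comp(id, &wc);                 /* wait for completion */

    rdma_disconnect(id);
    rdma_dereg_mr(mr);
    rdma_destroy_ep(id);
    rdma_freeaddrinfo(res);
    return 0;
}
```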