Mellanox ConnectX-5 User Manual

ConnectX®-5 EN Card
ADAPTER CARD
PRODUCT BRIEF
100Gb/s Ethernet Adapter Card
Intelligent RDMA-enabled network adapter card with advanced
application offload capabilities for High-Performance Computing,
Web2.0, Cloud and Storage platforms
ConnectX-5 EN supports two ports of 100Gb Ethernet connectivity, while delivering sub-600ns latency, extremely high message rates, and PCIe switch and NVMe over Fabric offloads. ConnectX-5 provides the highest performance and most flexible solution for the most demanding applications and markets: Machine Learning, Data Analytics, and more.
ConnectX-5 delivers high bandwidth, low latency, and high computation efficiency for high performance, data intensive and scalable compute and storage platforms. ConnectX-5 offers enhancements to HPC infrastructures by providing MPI and SHMEM/PGAS and Rendezvous Tag Matching offload, hardware support for out-of-order RDMA Write and Read operations, as well as additional Network Atomic and PCIe Atomic operations support.
ConnectX-5 EN utilizes RoCE (RDMA over Converged Ethernet) technology, delivering low latency and high performance. ConnectX-5 enhances RDMA network capabilities by completing the Switch Adaptive-Routing capabilities and supporting data delivered out-of-order, while maintaining ordered completion semantics, providing multipath reliability and efficient support for all network topologies including DragonFly and DragonFly+.
ConnectX-5 also supports Burst Buffer offload for background checkpointing without interfering with main CPU operations, and the innovative Dynamically Connected Transport (DCT) service to ensure extreme scalability for compute and storage systems.
STORAGE ENVIRONMENTS
NVMe storage devices are gaining popularity, offering very fast storage access. The evolving NVMe over Fabric (NVMe-oF) protocol leverages RDMA connectivity for remote access. ConnectX-5 offers further enhancements by providing NVMe-oF target offloads, enabling very efficient NVMe storage access with no CPU intervention, and thus improved performance and lower latency.
Moreover, the embedded PCIe switch enables customers to build standalone storage or Machine Learning appliances. As with the earlier generations of ConnectX adapters, standard block and file access protocols can leverage RoCE for high-performance storage access. A consolidated compute and storage network achieves significant cost-performance advantages over multi-fabric networks.
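The following sketch illustrates how such an NVMe-oF target is commonly configured on Linux through the kernel nvmet configfs interface; where supported by driver and firmware, the adapter's target offload then handles the data path in hardware. This is a minimal example under assumed placeholder values (subsystem NQN, block device, address, and port), not a definitive procedure.

    # Minimal sketch: export one local NVMe namespace over NVMe-oF/RDMA using the
    # Linux nvmet configfs interface (requires the nvmet and nvmet-rdma modules).
    # The NQN, device path, address, and port below are placeholder values.
    import os

    NVMET = "/sys/kernel/config/nvmet"
    NQN = "nqn.2018-01.example:cx5-subsys"   # hypothetical subsystem NQN
    BDEV = "/dev/nvme0n1"                    # local NVMe namespace to export
    ADDR, PORT = "192.168.1.10", "4420"      # RDMA (RoCE) listener address/port

    def write(path, value):
        with open(path, "w") as f:
            f.write(value)

    # 1. Create the subsystem and (for this demo only) allow any host to connect.
    subsys = os.path.join(NVMET, "subsystems", NQN)
    os.makedirs(subsys, exist_ok=True)
    write(os.path.join(subsys, "attr_allow_any_host"), "1")

    # 2. Attach the local block device as namespace 1 and enable it.
    ns = os.path.join(subsys, "namespaces", "1")
    os.makedirs(ns, exist_ok=True)
    write(os.path.join(ns, "device_path"), BDEV)
    write(os.path.join(ns, "enable"), "1")

    # 3. Create an RDMA transport port and bind the subsystem to it.
    port = os.path.join(NVMET, "ports", "1")
    os.makedirs(port, exist_ok=True)
    write(os.path.join(port, "addr_trtype"), "rdma")
    write(os.path.join(port, "addr_adrfam"), "ipv4")
    write(os.path.join(port, "addr_traddr"), ADDR)
    write(os.path.join(port, "addr_trsvcid"), PORT)
    os.symlink(subsys, os.path.join(port, "subsystems", NQN))

An initiator can then attach to the exported namespace with a command such as nvme connect -t rdma -a 192.168.1.10 -s 4420 -n nqn.2018-01.example:cx5-subsys (nvme-cli).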
HIGHLIGHTS
NEW FEATURES
– Tag matching and rendezvous offloads
– Adaptive routing on reliable transport
– Burst buffer offloads for background checkpointing
– NVMe over Fabric (NVMe-oF) offloads
– Back-end switch elimination by host chaining
– Embedded PCIe switch
– Enhanced vSwitch/vRouter offloads
– Flexible pipeline
– RoCE for overlay networks
– PCIe Gen 4 support
BENEFITS
– Up to 100Gb/s connectivity per port
– Industry-leading throughput, low latency, low CPU utilization and high message rate
– Maximizes data center ROI with Multi-Host technology
– Innovative rack design for storage and Machine Learning based on Host Chaining technology
– Smart interconnect for x86, Power, Arm, and GPU-based compute and storage platforms
– Advanced storage capabilities including NVMe over Fabric offloads
– Intelligent network adapter supporting flexible pipeline programmability
– Cutting-edge performance in virtualized networks including Network Function Virtualization (NFV)
– Enabler for efficient service chaining capabilities
– Efficient I/O consolidation, lowering data center costs and complexity
Figure: NVMe-oF target offload - the control path (init, login, etc.) is handled by the target software, while the data path (I/O commands, data fetch) between the RDMA fabric and the local NVMe devices is offloaded to the adapter; shown alongside a traditional iSER/iSCSI target stack and an NVMe-oF fabric initiator.
ConnectX-5 enables an innovative storage rack design, Host Chaining, by which different servers can interconnect directly without involving the Top of the Rack (ToR) switch. Alternatively, the Multi-Host technology that was first introduced with ConnectX-4 can be used. Mellanox Multi-Host™ technology, when enabled, allows multiple hosts to be connected into a single adapter by separating the PCIe interface into multiple and independent interfaces. With the various new rack design alternatives, ConnectX-5 lowers the total cost of ownership (TCO) in the data center by reducing CAPEX (cables, NICs, and switch port expenses), and by reducing OPEX by cutting down on switch port management and overall power usage.
Mellanox Accelerated Switching And Packet Processing (ASAP²) Direct technology offloads the vSwitch/vRouter by handling the data plane in the NIC hardware while keeping the control plane unmodified. As a result, vSwitch/vRouter performance is significantly higher without the associated CPU load.
The vSwitch/vRouter offload functions supported by ConnectX-5 include encapsulation and de-capsulation of Overlay Network headers (for example, VXLAN, NVGRE, MPLS, GENEVE, and NSH), stateless offloads of inner packets, packet header re-write enabling NAT functionality, and more.
Moreover, the intelligent ConnectX-5 flexible pipeline capabilities, which include a flexible parser and flexible match-action tables, can be programmed, enabling hardware offloads for future protocols.
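As an illustration of how such vSwitch offload is typically consumed on Linux, the sketch below places the adapter's eSwitch into switchdev mode with devlink, enables hardware offload in Open vSwitch, and then lists the flows that were actually offloaded. The PCI address and service name are placeholder assumptions; exact steps depend on driver, firmware, and OVS versions.

    # Minimal sketch: enable eSwitch switchdev mode and OVS hardware offload on a
    # hypothetical ConnectX-5 Physical Function (PCI address is a placeholder).
    import subprocess

    PCI_DEV = "pci/0000:03:00.0"   # placeholder devlink handle of the PF

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Move the embedded switch (eSwitch) from legacy to switchdev mode, which
    #    exposes VF representor netdevs for the software control plane.
    run(["devlink", "dev", "eswitch", "set", PCI_DEV, "mode", "switchdev"])

    # 2. Let Open vSwitch push matching flows into the NIC via TC flower.
    run(["ovs-vsctl", "set", "Open_vSwitch", ".", "other_config:hw-offload=true"])
    run(["systemctl", "restart", "openvswitch-switch"])  # service name varies by distro

    # 3. Show which datapath flows are currently offloaded to hardware.
    run(["ovs-appctl", "dpctl/dump-flows", "type=offloaded"])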
ConnectX-5 SR-IOV technology provides dedicated adapter resources and guaranteed isolation and protection for virtual machines (VMs) within the server. Moreover, with ConnectX-5 Network Function Virtualization (NFV), a VM can be used as a virtual appliance. With full data-path operation offloads as well as hairpin hardware capability and service chaining, data can be handled by the Virtual Appliance with minimum CPU utilization. With these capabilities, data center administrators benefit from better server utilization while reducing cost, power, and cable complexity, allowing more Virtual Appliances, Virtual Machines and more tenants on the same hardware.
Figure: Eliminating Backend Switch - Host Chaining for Storage Backend vs. Traditional Storage Connectivity
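As a concrete sketch of the SR-IOV provisioning described above, Virtual Functions are typically created on the Physical Function through the standard Linux sysfs interface and then assigned to VMs; the interface name and VF count below are placeholder assumptions.

    # Minimal sketch: create SR-IOV Virtual Functions on a ConnectX-5 port via
    # sysfs (the PF interface name and VF count are placeholder values).
    import os

    PF_IFACE = "enp3s0f0"   # hypothetical Physical Function netdev name
    NUM_VFS = 4
    sriov_path = f"/sys/class/net/{PF_IFACE}/device/sriov_numvfs"

    # The kernel rejects changing a non-zero VF count directly, so reset first.
    for value in ("0", str(NUM_VFS)):
        with open(sriov_path, "w") as f:
            f.write(value)

    # Each VF now appears as its own PCI function (virtfn* links) that can be
    # passed through to a VM with dedicated, isolated adapter resources.
    device_dir = f"/sys/class/net/{PF_IFACE}/device"
    for entry in sorted(os.listdir(device_dir)):
        if entry.startswith("virtfn"):
            print(entry, "->", os.path.basename(os.readlink(os.path.join(device_dir, entry))))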
CLOUD AND WEB2.0 ENVIRONMENTS
Cloud and Web2.0 customers developing their platforms on Software Defined Network (SDN) environments are leveraging their servers' Operating System Virtual-Switching capabilities to enable maximum flexibility. Open vSwitch (OVS) is an example of a virtual switch that allows Virtual Machines to communicate with each other and with the outside world. A virtual switch traditionally resides in the hypervisor, and switching is based on twelve-tuple matching on flows. The software-based virtual switch or virtual router solution is CPU intensive, affecting system performance and preventing full utilization of the available bandwidth.
Figure: Para-Virtualized vSwitch in the hypervisor vs. SR-IOV with Physical Function (PF), Virtual Functions (VFs), and the NIC eSwitch
HOST MANAGEMENT
Mellanox host management and control capabilities include NC-SI over MCTP over SMBus and NC-SI over MCTP over PCIe - Baseboard Management Controller (BMC) interface, as well as PLDM for Monitor and Control (DSP0248) and PLDM for Firmware Update (DSP0267).
COMPATIBILITY
PCI Express Interface
– PCIe Gen 4
– PCIe Gen 3.0, 1.1 and 2.0 compatible
– 2.5, 5.0, 8, 16 GT/s link rate
– Auto-negotiates to x16, x8, x4, x2, or x1 lanes
– PCIe Atomic
– TLP (Transaction Layer Packet) Processing Hints (TPH)
– Embedded PCIe Switch: Up to 8 bifurcations
– PCIe switch Downstream Port Containment (DPC) enablement for PCIe hot-plug
– Access Control Service (ACS) for peer-to-peer secure communication
– Advance Error Reporting (AER)
– Process Address Space ID (PASID) Address Translation Services (ATS)
– IBM CAPI v2 support (Coherent Accelerator Processor Interface)
– Support for MSI/MSI-X mechanisms
Ethernet
– 100GbE / 50GbE / 40GbE / 25GbE / 10GbE / 1GbE
– IEEE 802.3bj, 802.3bm 100 Gigabit Ethernet
– IEEE 802.3by, Ethernet Consortium 25, 50 Gigabit Ethernet, supporting all FEC modes
– IEEE 802.3ba 40 Gigabit Ethernet
– IEEE 802.3ae 10 Gigabit Ethernet
– IEEE 802.3az Energy Efficient Ethernet (fast wake)
– IEEE 802.3ap based auto-negotiation and KR startup
– IEEE 802.3ad, 802.1AX Link Aggregation
– IEEE 802.1Q, 802.1P VLAN tags and priority
– IEEE 802.1Qau (QCN) Congestion Notification
– IEEE 802.1Qaz (ETS)
– IEEE 802.1Qbb (PFC)
– IEEE 802.1Qbg
– IEEE 1588v2
– Jumbo frame support (9.6KB)
Enhanced Features
– Hardware-based reliable transport
– Collective operations offloads
– Vector collective operations offloads
– PeerDirect™ RDMA (aka GPUDirect®) communication acceleration
– 64/66 encoding
– Extended Reliable Connected transport (XRC)
– Dynamically Connected Transport (DCT)
– Enhanced Atomic operations
– Advanced memory mapping support, allowing user mode registration and remapping of memory (UMR)
– On demand paging (ODP)
– MPI Tag Matching
– Rendezvous protocol offload
– Out-of-order RDMA supporting Adaptive Routing
– Burst buffer offload
– In-Network Memory registration-free RDMA memory access
CPU Ofoads
– RDMA over Converged Ethernet
(RoCE)
TCP/UDP/IP stateless ofoad LSO, LRO, checksum ofoad RSS (also on encapsulated packet),
TSS, HDS, VLAN and MPLS tag insertion/stripping, Receive ow
steering
Data Plane Development Kit (DPDK)
for kernel bypass applications
Open VSwitch (OVS) ofoad using
2
ASAP
Flexible match-action ow tables
Tunneling encapsulation/
®
de-capsulation
)
– Intelligent interrupt coalescence – Header rewrite supporting hardware
ofoad of NAT router
Operating Systems/Distributions*
RHEL/CentOS Windows – FreeBSD – VMware – OpenFabrics Enterprise Distribution
(OFED)
– OpenFabrics Windows Distribution
(WinOF-2)
FEATURES
Storage Ofoads
NVMe over Fabric ofoads for target
machine
Erasure Coding ofoad – ofoading
Reed Solomon calculations
T10 DIF – Signature handover
operation at wire speed, for ingress
and egress trafc
– Storage protocols: SRP, iSER, NFS
RDMA, SMB Direct, NVMe-oF
Overlay Networks
RoCE over Overlay NetworksStateless ofoads for overlay
network tunneling protocols
Hardware ofoad of encapsulation
and decapsulation of VXLAN,
NVGRE, and GENEVE overlay
networks
Hardware-Based I/O
Virtualization
Single Root IOV – Address translation and protection – VMware NetQueue support SR-IOV: Up to 1K Virtual Functions SR-IOV: Up to 16 Physical Functions
per host
Virtualization hierarchies (e.g., NPAR
and Multi-Host, when enabled)
Virtualizing Physical Functions on
SR-IOV on every Physical Function
Congurable and user-programmable
QoS
Guaranteed QoS for VMs
a physical port
Connectivity
– Interoperability with Ethernet switches (up to 100GbE)
– Passive copper cable with ESD protection
– Powered connectors for optical and active cable support
HPC Software Libraries
– Open MPI, IBM PE, OSU MPI (MVAPICH/2), Intel MPI
– Platform MPI, UPC, Open SHMEM
Management and Control
– NC-SI over MCTP over SMBus and NC-SI over MCTP over PCIe - Baseboard Management Controller interface
– PLDM for Monitor and Control DSP0248
– PLDM for Firmware Update DSP0267
– SDN management interface for managing the eSwitch
– I²C interface for device control and configuration
– General Purpose I/O pins
– SPI interface to Flash
– JTAG IEEE 1149.1 and IEEE 1149.6
Remote Boot
– Remote boot over Ethernet
– Remote boot over iSCSI
– Unified Extensible Firmware Interface (UEFI)
– Pre-execution Environment (PXE)
* This section describes hardware features and capabilities. Please refer to the driver and firmware release notes for feature availability.
Table 1 - Part Numbers and Descriptions
OPN            Description
MCX512A-ACAT   ConnectX®-5 EN network interface card, 25GbE dual-port SFP28, PCIe3.0 x8, tall bracket, ROHS R6
MCX511F-ACAT   ConnectX®-5 EN network interface card, 25GbE single-port SFP28, PCIe3.0 x16, tall bracket, ROHS R6
MCX512F-ACAT   ConnectX®-5 EN network interface card, 25GbE dual-port SFP28, PCIe3.0 x16, tall bracket, ROHS R6
MCX515A-GCAT   ConnectX®-5 EN network interface card, 50GbE single-port QSFP28, PCIe3.0 x16, tall bracket, ROHS R6
MCX516A-GCAT   ConnectX®-5 EN network interface card, 50GbE dual-port QSFP28, PCIe3.0 x16, tall bracket, ROHS R6
MCX515A-CCAT   ConnectX®-5 EN network interface card, 100GbE single-port QSFP28, PCIe3.0 x16, tall bracket, ROHS R6
MCX516A-CCAT   ConnectX®-5 EN network interface card, 100GbE dual-port QSFP28, PCIe3.0 x16, tall bracket, ROHS R6
MCX516A-BDAT   ConnectX®-5 Ex EN network interface card, 40GbE dual-port QSFP28, PCIe4.0 x16, tall bracket, ROHS R6
MCX516A-CDAT   ConnectX®-5 Ex EN network interface card, 100GbE dual-port QSFP28, PCIe4.0 x16, tall bracket, ROHS R6
Dimensions w/o Bracket: 14.2cm x 6.9cm (Low Profile)
NOTE: All tall-bracket adapters are shipped with the tall bracket mounted and a short bracket as an accessory.
350 Oakmead Parkway, Suite 100, Sunnyvale, CA 94085 Tel: 408-970-3400 • Fax: 408-970-3403
www.mellanox.com
© Copyright 2018. Mellanox Technologies. All rights reserved. Mellanox, Mellanox logo, ConnectX, CORE-Direct, and GPUDirect are registered trademarks of Mellanox Technologies, Ltd. Mellanox Multi-Host is a trademark of Mellanox Technologies, Ltd. All other trademarks are property of their respective owners.
53166PB
Rev 2.0