Improving NFS Performance on HPC
Clusters with Dell Fluid Cache for DAS
This Dell technical white paper explains how to improve Network File
System I/O performance by using Dell Fluid Cache for Direct Attached
Storage in a High Performance Computing Cluster.
Dell HPC Engineering
March 2013, Version 1.0
This document is for informational purposes only and may contain typographical errors and
technical inaccuracies. The content is provided as is, without express or implied warranties of any
kind.
Executive Summary
Most High Performance Computing clusters use some form of a Network File System (NFS) based storage
solution for user data. Easy to configure and administer, free with virtually all Linux distributions, and
well-tested and reliable, NFS has many advantages. Use of nearline SAS drives for backend storage
provides large capacity and good performance at a reasonable cost, but with an inherent performance
limitation for random I/O patterns.
This technical white paper describes how to improve I/O performance in such an NFS storage solution
with the use of Dell Fluid Cache for DAS (DFC) technology. It describes the solution and presents
cluster-level measured results for several I/O patterns. These results quantify the performance
improvements possible with DFC, especially for random I/O patterns. This white paper also includes a
how-to recipe in the Appendix that provides step-by-step instructions on building the solution.
1. Introduction
A Network File System (NFS) based storage solution is a popular choice for High Performance
Computing (HPC) clusters. Most HPC clusters use some form of NFS irrespective of the size of the
cluster. NFS is simple to configure and administer, free with virtually all Linux distributions,
well-tested, and can provide reliable storage for user home directories and application data. A
well-tuned NFS solution can provide great performance for small to mid-sized clusters [1].

Nearline SAS (NL SAS) [2] drives provide large capacity and good performance for a reasonable price,
optimizing the GB/$ metric. However, they limit performance for applications that have random I/O
patterns. Dell Fluid Cache for DAS (Direct Attached Storage) reduces this limitation by caching data
while the backend storage services the I/O request, thus improving the performance of the entire NFS
storage solution.

This study evaluates Dell Fluid Cache for DAS (DFC) [3] with NFS for HPC clusters. A cluster-level study
was conducted to analyze different I/O characteristics and quantify the performance improvements
gained with DFC when compared to the same NFS configuration without using DFC. All the results
presented in this document are measured results that were obtained in the Dell HPC laboratory.

The following section introduces the DFC technology. Subsequent sections describe the design of the
storage solution and the tuning optimizations applied to the solution. Information is provided on tools
that can be used to monitor the solution. An analysis of the performance results follows. The paper
concludes with recommendations on best-fit use cases for DFC with NFS. Two appendices, which
provide step-by-step instructions on how to configure such a solution and information on the
benchmarks and tests that were run for this study, complete this document.
1.1. Dell Fluid Cache for DAS (direct-attached storage)
DFC is write-back host caching software. DFC combines multiple Dell PowerEdge™ Express Flash PCIe
SSDs to provide a read and write cache pool. This PCIe SSD cache pool is used to accelerate response
times with significant improvements in I/O operations per second (IOPS).
Some features of the DFC software include:
• Faster cache reads, writes, read-after-writes, and re-reads
• Data protection as writes are replicated across multiple PCIe SSDs
• Orderly hot swap and hot plug capability that allows adding or removing a device without
halting or rebooting the system
More details on the Dell Fluid Cache for DAS technology can be found at [3].
In an HPC context, the DFC software can be configured on an NFS server. The PCIe SSDs on the NFS
server provide the virtual cache pool.
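Before the cache pool is created, it is useful to confirm that the Express Flash PCIe SSDs are visible to the operating system on the NFS server. The following is a minimal sketch, assuming the SSDs are handled by the mtip32xx driver and appear as /dev/rssd* block devices; the device names are an assumption and should be verified on the actual server.

    # Check that the PCIe SSD driver is loaded and the devices are visible.
    # Device names (/dev/rssd*) are an assumption based on the mtip32xx
    # driver's usual naming; confirm on the actual NFS server.
    lsmod | grep mtip32xx
    ls -l /dev/rssd*
    cat /proc/partitions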
2. Solution design and architecture
This section describes the NFS storage solution used to evaluate the DFC technology. The baseline for
comparison is an NFS server with direct-attached external SAS storage. The configuration of this NFS
server is augmented with PCIe SSDs and DFC software for the DFC comparison. A 64-server Dell
PowerEdge cluster was used as I/O clients to provide I/O load to the storage solution. The following
sections provide details on each of these components as well as information on tuning and monitoring
the solution.
2.1. NFS storage solution (baseline)
The baseline in this study is an NFS configuration. One PowerEdge R720 is used as the NFS server.
PowerVault™ MD1200 storage arrays are direct-attached to the PowerEdge R720 and provide the
storage. The attached storage is formatted as a Red Hat Scalable File System (XFS). This file system is
exported via NFS to the HPC compute cluster.
The NFS server and storage are shown in Figure 1 and Figure 2. Table 1 and Table 2 list the details of
the configuration. Readers familiar with Dell's NSS line of solutions [1] will recognize this baseline
configuration.
Figure 1. NFS storage solution
Figure 2. NFS server
Table 1. NFS server and storage hardware configuration

Server configuration
NFS SERVER                  PowerEdge R720
PROCESSORS                  Dual Intel(R) Xeon(R) CPU E5-2680 @ 2.70 GHz
MEMORY                      128 GB, 16 * 8 GB 1600 MT/s RDIMMs
INTERNAL DISKS              5 * 300 GB 15K SAS disks
                            Two drives configured in RAID-1 for the operating system,
                            with one additional drive as a hot spare
                            Two drives configured in RAID-0 for swap space
INTERNAL RAID CONTROLLER    PERC H710P mini (internal)
EXTERNAL RAID CONTROLLER    PERC H810 adapter (slot 7) connected to the storage
                            enclosures
INTERCONNECT TO CLIENTS     Mellanox ConnectX-3 FDR card (slot 5)

Storage configuration
STORAGE ENCLOSURE           Four PowerVault MD1200 arrays, daisy chained
HARD DISKS                  12 * 3 TB 7,200 rpm NL SAS drives per storage enclosure
Table 2. NFS server software and firmware configuration

Software
OPERATING SYSTEM              Red Hat Enterprise Linux (RHEL) 6.3
KERNEL VERSION                2.6.32-279.14.1.el6.x86_64
FILE SYSTEM                   Red Hat Scalable File System (XFS) 3.1.1-7
SYSTEMS MANAGEMENT            Dell OpenManage Server Administrator 7.1.2

Firmware and Drivers
BIOS                          1.3.6
iDRAC                         1.23.23 (Build 1)
PERC H710/PERC H810 FIRMWARE  21.1.0-0007
PERC DRIVER                   megasas 00.00.06.14-rh1
INFINIBAND FIRMWARE           2.11.500
INFINIBAND DRIVER             Mellanox OFED 1.5.3-3.1.0
The baseline described in this section is very similar to the Dell NSS. One key difference is the use of a
single RAID controller to connect to all four storage arrays. In a pure-NSS environment, two PERC RAID
controllers are recommended for optimal performance. With two PERC cards, the two RAID virtual disks
are combined using Linux Logical Volume Manager (LVM). DFC does not support caching of an LVM
device; hence, a single PERC was used for this study.
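For context, the sketch below shows how the two RAID virtual disks would typically be striped together with LVM in a two-PERC NSS configuration; the device names and stripe size are illustrative assumptions. DFC cannot cache the resulting logical volume, which is why this layout was avoided here.

    # Illustrative only: LVM striping across two RAID virtual disks
    # (one per PERC H810); device names and stripe size are assumptions.
    pvcreate /dev/sdb /dev/sdc
    vgcreate VGNFS /dev/sdb /dev/sdc
    # Stripe the logical volume across both physical volumes
    lvcreate -i 2 -I 1024 -l 100%FREE -n LVNFS VGNFS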
The NFS server and the attached storage arrays are configured and tuned for optimal performance
based on several past studies [4]. A summary of the design choices is provided in Section 2.4.
Detailed instructions on configuring this storage solution are provided in Appendix A: Step-by-step
configuration of Dell Fluid Cache for NFS.
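As a rough illustration of the server-side steps that Appendix A walks through, a minimal sketch follows; the device name, mount point, subnet, and export options are placeholders rather than the exact values used in this study.

    # Create the XFS file system on the RAID virtual disk presented by the
    # PERC H810 and export it over NFS. Device name, mount point, subnet,
    # and options are illustrative placeholders.
    mkfs.xfs /dev/sdb
    mkdir -p /share
    mount -t xfs /dev/sdb /share
    echo "/share 10.149.0.0/255.255.0.0(rw,no_root_squash)" >> /etc/exports
    exportfs -a
    service nfs restart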
2.2. Dell Fluid Cache for DAS based solution
The DFC-based NFS solution builds on top of the baseline configuration described in Section 2.1. It
simply adds PCIe SSDs and the DFC software to the baseline configuration. Details of the configuration
are provided in Table 3 and Table 4.
Table 3. Hardware configuration for DFC

Server configuration
NFS SERVER                  PowerEdge R720
CACHE POOL                  Two 350 GB Dell PowerEdge Express Flash PCIe SSDs
SSD CONTROLLER              Internal (slot 4)
Rest of the configuration is the same as the baseline, as described in Table 1

Storage configuration
Same as the baseline, as described in Table 1
Table 4. Software and firmware configuration for DFC

Software
CACHING SOFTWARE            Dell Fluid Cache for DAS v1.0
Rest of the configuration is the same as the baseline, as described in Table 2

Firmware and Drivers
PCIe SSD DRIVER             mtip32xx 1.3.7-1 (latest available at the time of this
                            study; mtip32xx v2.1.0 is recommended)
Rest of the configuration is the same as the baseline, as described in Table 2

In DFC vocabulary, the cache or cache pool is the set of PCIe SSDs, and the disk that is enabled for
caching is the virtual disk on the PowerVault MD1200 arrays. Most importantly, the methods used to
access the data remain the same as in the baseline case: the I/O clients simply mount the same NFS
exported directory as in the baseline configuration. Detailed instructions on configuring DFC for this
storage solution are provided in Appendix A: Step-by-step configuration of Dell Fluid Cache for NFS.
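For example, a client-side mount might look like the minimal sketch below, where the server's IPoIB address, the export path, and the mount options are illustrative placeholders rather than the exact settings used in this study.

    # On each compute node, mount the same NFS export over the IPoIB
    # interface. Address, path, and options are illustrative.
    mkdir -p /share
    mount -t nfs 10.149.0.1:/share /share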
2.3. I/O clients test bed
The pure NFS baseline solution and the NFS+DFC solution were exercised using a 64-node HPC cluster.
This compute cluster was used to provide I/O load to the storage solution and help benchmark the
capabilities of the solution.
Using the latest quarter-height Dell PowerEdge M420 blade server [5] as the building block for the I/O
cluster, the 64-client cluster was configured in 20U of rack space. Details of the 64-client test bed are
provided in Table 5. Figure 3 shows the entire test bed including the clients. Note that all I/O traffic to
the NFS server used the InfiniBand network and the IPoIB protocol.
Table 5. I/O cluster details

I/O cluster configuration
CLIENTS                     64 PowerEdge M420 blade servers
                            32 blades in each of two PowerEdge M1000e chassis
CHASSIS CONFIGURATION       Two PowerEdge M1000e chassis, each with 32 blades
                            Two Mellanox M4001F FDR10 I/O modules per chassis
                            Two PowerConnect M6220 I/O switch modules per chassis
INFINIBAND FABRIC           Each PowerEdge M1000e chassis has two Mellanox M4001
(for I/O traffic)           FDR10 I/O module switches. Each FDR10 I/O module has four
                            uplinks to a rack Mellanox SX6025 FDR switch for a total
                            of 16 uplinks. The FDR rack switch has a single FDR link
                            to the NFS server.
ETHERNET FABRIC             Each PowerEdge M1000e chassis has two PowerConnect M6220
(for cluster deployment     Ethernet switch modules. Each M6220 switch module has one
and management)             link to a rack PowerConnect 5224 switch. There is one link
                            from the rack PowerConnect switch to an Ethernet interface
                            on the cluster master node.

I/O compute node configuration
CLIENT                      PowerEdge M420 blade server
PROCESSORS                  Dual Intel(R) Xeon(R) CPU E5-2470 @ 2.30 GHz
MEMORY                      48 GB, 6 * 8 GB 1600 MT/s RDIMMs
INTERNAL DISK               1 * 50 GB SATA SSD
INTERNAL RAID CONTROLLER    PERC H310 Embedded
CLUSTER ADMINISTRATION      Broadcom NetXtreme II BCM57810
INTERCONNECT
I/O INTERCONNECT            Mellanox ConnectX-3 FDR10 mezzanine card

I/O cluster software and firmware
BIOS                        1.3.5
iDRAC                       1.23.23 (Build 1)
OPERATING SYSTEM            Red Hat Enterprise Linux (RHEL) 6.2
KERNEL                      2.6.32-220.el6.x86_64
OFED                        Mellanox OFED 1.5.3-3.0.0
Figure 3. Test bed
2.4. Solution tuning
The NFS server and the attached storage arrays are configured and tuned for optimal performance.
These options were selected based on extensive studies done by the Dell HPC team. Results of these
studies and the tradeoffs of the tuning options are available in [4].
Additionally, the DFC configuration was tuned based on experience gained during this study.
This section provides a quick summary of some of the optimizations applied to the storage solution.
Detailed instructions on configuring this storage solution are provided in Appendix A: Step-by-step
configuration of Dell Fluid Cache for NFS.
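As an example of the kind of server-side settings such a summary typically covers, a minimal sketch follows; the parameter values shown are illustrative assumptions, not the specific settings chosen for this solution (those are documented in Appendix A).

    # Illustrative examples of common NFS server tuning knobs on RHEL 6;
    # the values below are assumptions, not the study's actual settings.
    # Increase the number of NFS server threads in /etc/sysconfig/nfs:
    #     RPCNFSDCOUNT=256
    service nfs restart
    # Select the I/O scheduler for the RAID virtual disk (device name is
    # a placeholder):
    echo deadline > /sys/block/sdb/queue/scheduler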