Intel RAID High Availability Storage User Manual

Intel® RAID High Availability Storage
User Guide
Order Number: G85745-001
Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL®'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL® ASSUMES NO LIABILITY WHATSOEVER AND INTEL® DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel® Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL®'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL® AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL® OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL® PRODUCT OR ANY OF ITS PARTS.
Intel® may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel® reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel® sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel® literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.
ii Intel® RAID High Availability Storage User Guide
Important Safety Instructions
Read all caution and safety statements in this document before performing any of the instructions. See Intel® Server Boards and Server Chassis Safety Information at http://support.intel.com/support/motherboards/server/sb/cs-010770.htm.
Wichtige Sicherheitshinweise
Lesen Sie zunächst sämtliche Warn- und Sicherheitshinweise in diesem Dokument, bevor Sie eine der Anweisungen ausführen. Beachten Sie hierzu auch die Sicherheitshinweise zu Intel®-Serverplatinen und -Servergehäusen unter http://support.intel.com/support/motherboards/server/sb/cs-010770.htm.
重要安全指导
在执行任何指令之前,请阅读本文档中的所有注意事项及安全声明。请参阅 http://support.intel.com/support/motherboards/server/sb/cs-010770.htm 上的 Intel® Server Boards and Server Chassis Safety Information(《Intel 服务器主板与服务器机箱安全信息》)。
Consignes de sécurité
Lisez attentivement toutes les consignes de sécurité et les mises en garde indiquées dans ce document avant de suivre toute instruction. Consultez Intel® Server Boards and Server Chassis Safety Information sur le site http://support.intel.com/support/motherboards/server/sb/cs-010770.htm.
Instrucciones de seguridad importantes
Lea todas las declaraciones de seguridad y precaución de este documento antes de realizar cualquiera de las instrucciones. Vea Intel® Server Boards and Server Chassis Safety Information en http://support.intel.com/support/motherboards/server/sb/cs-010770.htm.
WARNINGS
Server power on/off: The push-button on/off power switch on the front panel of the server does not turn off the AC power. To remove AC power from the server, you must unplug the AC power cord from either the power supply or wall outlet.
Hazardous conditions, power supply: Hazardous voltage, current, and energy levels are present inside the power supply enclosure. There are no user-serviceable parts inside it; servicing should only be done by technically qualified personnel.
Hazardous conditions, devices and cables: Hazardous electrical conditions may be present on power, telephone, and communication cables. Turn off the server and disconnect telecommunications systems, networks, modems, and the power cord attached to the server before opening it. Otherwise, personal injury or equipment damage can result.
Table of Contents
1. Introduction
  Concepts of High-Availability DAS
  Intel® RAID High Availability Storage Terminology
  Intel® RAID High Availability Storage Solution Features
2. Hardware and Software Setup
  Installing Intel® RAID Premium Feature Key AXXRPFKHA2
  Installing Intel® RAID High Availability Storage Hardware
    Setting Up a Cluster-in-a-Box Configuration
    Setting Up a Two-Server Configuration with External JBOD Configuration
  Cabling Configurations
  Installing Intel® RAID High Availability Storage Software
    Installing the Operating System and the Failover Clustering Feature
    Installing the Intel RAID Driver
    Installing the Management Tools
3. Creating the Intel® RAID High Availability Storage Configuration
  Validating the Failover Configuration
  Creating the Cluster
  Creating Virtual Drives on the Controller Nodes
    Creating Shared VDs with the Intel® RAID BIOS Console
    Creating Shared VDs with CmdTool264.exe on Windows Server 2012
    Creating Shared VDs with RWC2
  Intel® RAID High Availability Storage SSD Cache Support
4. System Administration
  High Availability Properties
  Understanding Failover Operations
    Understanding and Using Planned Failover
    Understanding Unplanned Failover
  Updating the Intel® RAID High Availability Storage Controller Firmware
  Updating the Intel RAID Driver
    Updating the Driver in Windows Server 2008 R2
    Updating the Driver in Windows Server 2012
  Performing Preventative Measures on Disk Drives and VDs
Troubleshooting
  Reference Checklist of Required Intel® RAID High Availability Storage Components
  Verifying Intel® RAID High Availability Storage Support in Tools and the OS Driver
  Confirming SAS Connections
    Using Intel® RAID BIOS Console to View Connections for Controllers, Expanders, and Drives
    Using Intel® RAID BIOS Console to Verify Dual-Ported SAS Addresses to Disk Drives
    Using CmdTool2 to Verify Dual-Ported SAS Addresses to Disk Drives
    Using RWC2 to Verify Dual-Ported SAS Addresses to Disk Drives
  Understanding SSD Cache Behavior During a Failover
  Error Situations and Solutions
1. Introduction
This document explains how to set up and configure the hardware and software for the Intel® RAID High Availability Storage solution.
The Intel® RAID High Availability Storage solution provides fault tolerance capabilities as a key part of a high-availability data storage system. The RAID High Availability Storage solution combines redundant Intel® RAID controllers, computer nodes, cable connections, SAS expanders, and dual-port SAS drives to provide failover redundancy through multiple paths to data.
The redundant components and software technologies provide a high-availability system with ongoing service that is not interrupted by the following events:
A node failure, which does not interrupt service because the configuration has multiple nodes with cluster failover.
An expander failure, which does not interrupt service because the dual expanders in every enclosure provide redundant data paths.
A drive failure, which does not interrupt service because RAID fault tolerance is part of the configuration.
A system storage expansion or maintenance activity that can be completed without requiring an interruption of service because of redundant components, management software, and maintenance procedures.
Concepts of High-Availability DAS
In terms of data storage and processing, High Availability (HA) means a computer system design that ensures a high level of operational continuity and data access reliability over a given period of time. DAS means Directly Attached Storage. High-availability systems are critical to the success and business needs of small and medium-sized businesses such as retail and health care offices. An Intel® RAID High Availability Storage solution enables a customer to maintain all elements of the high-availability system, with shared direct-attached drives accessible to multiple servers through the use of failover clustering technology.
Simply defined, a cluster is a group of computers working together to run a common set of applications and to present a single logical system to the client and application. Failover clustering provides redundancy to the cluster group to maximize solution up-time by utilizing fault-tolerant components. In the example of two servers with shared storage that comprise a failover cluster, when a server fails, the failover cluster automatically moves control of the shared resources to the surviving server with no interruption of access to data. This configuration allows seamless failover capabilities in the event of planned failover (maintenance mode) for maintenance or upgrade, or in the event of a failure of a system component such as a failure of the CPU, memory, or other non-storage hardware.
SAS zoning is typically required to partition the storage domain between multiple initiators and target devices. However, because multiple initiators can exist in a common shared storage domain, there is a concept of device reservations in which physical drives, drive groups, and virtual drives (VDs) are managed by a single initiator. In the case of Intel® RAID High Availability Storage, I/O transactions and RAID management operations are processed by a single Intel® RAID High Availability Storage controller, and the associated physical drives, drive groups, and VDs are only visible to that controller. This key functionality allows the Intel® RAID High Availability Storage solution to share VDs among multiple initiators as well as exclusively constrain VD access to a particular initiator without the need for SAS zoning.
Node downtime in an HA system can be either planned or unplanned. Planned node downtime is the result of management-initiated events, such as upgrades and maintenance. Unplanned node downtime results from events that are not within the direct control of IT administrators, such as failed software, drivers, or hardware. The Intel® RAID High Availability Storage solution protects your data and system up-time from unplanned node downtime. It also enables you to schedule node downtime to update hardware or firmware, and so on. When you bring one controller node down for scheduled maintenance, the other node takes over with no interruption of service.
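The takeover behavior described above can be pictured with a small conceptual model. The following Python sketch is illustrative only: the Node class, the fail_over function, and the VD names are hypothetical and are not part of any Intel tool or API. It shows only the idea that, when one controller node goes down (planned or unplanned), management of its VDs moves to the peer node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one controller node in a two-node HA cluster."""
    name: str
    online: bool = True
    owned_vds: set = field(default_factory=set)


def fail_over(source: Node, peer: Node, planned: bool) -> str:
    """Move management of all VDs from source to its peer node.

    planned=True models scheduled maintenance; planned=False models a
    failure. In both cases the peer takes over the VDs.
    """
    if not peer.online:
        raise RuntimeError("peer node is offline; no failover target")
    peer.owned_vds |= source.owned_vds
    source.owned_vds = set()
    source.online = False
    return "planned" if planned else "unplanned"


node_a = Node("node-a", owned_vds={"VD0", "VD1"})
node_b = Node("node-b", owned_vds={"VD2"})
kind = fail_over(node_a, node_b, planned=True)
print(kind, sorted(node_b.owned_vds))
```

In this model the only difference between the two modes is the label that is returned; the real solution also handles cache mirroring and cluster resource moves, which are not modeled here.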
Intel® RAID High Availability Storage Terminology
This section defines some additional important Intel® RAID High Availability Storage terms.
Cache Mirror: A cache coherency term describing the duplication of write-back cached data across two controllers.
Exclusive Access: A host access policy in which a VD is only exposed to, and accessed by, a single specified server.
Failover: The process in which the management of drive groups and VDs transitions from one controller to the peer controller to maintain data access and availability.
HA Domain: A type of storage domain that consists of a set of HA controllers, cables, drive enclosures, and storage media.
Peer Controller: A relative term to describe the HA controller in the HA domain that acts as the failover controller.
Server/Controller Node: A processing entity composed of a single host processor unit or multiple host processor units that is characterized by having a single instance of a host operating system.
Server Storage Cluster: An HA storage topology in which a common pool of storage devices is shared by two computer nodes through dedicated Intel® RAID High Availability Storage controllers.
Shared Access: A host access policy in which a VD is exposed to, and can be accessed by, all servers in the HA domain.
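The difference between the exclusive and shared host access policies defined above can be sketched in a few lines of Python. This is a conceptual illustration, not controller firmware logic: the VirtualDrive class, the is_visible function, and the server names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VirtualDrive:
    """Hypothetical VD record carrying a host access policy."""
    name: str
    policy: str                  # "shared" or "exclusive"
    owner: Optional[str] = None  # the single specified server, if exclusive


def is_visible(vd: VirtualDrive, server: str) -> bool:
    """Return True if the VD is exposed to the given server."""
    if vd.policy == "shared":
        return True              # exposed to all servers in the HA domain
    if vd.policy == "exclusive":
        return vd.owner == server
    raise ValueError(f"unknown access policy: {vd.policy}")


boot_vd = VirtualDrive("boot", policy="exclusive", owner="server-1")
data_vd = VirtualDrive("data", policy="shared")
print(is_visible(boot_vd, "server-2"), is_visible(data_vd, "server-2"))
```

An operating system boot volume is a natural fit for exclusive access, while cluster data volumes would use shared access.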
Intel® RAID High Availability Storage Solution Features
The Intel® RAID High Availability Storage solution supports the following HA features.
Server storage cluster topology
Dual-active HA with shared storage
Controller-to-controller intercommunication over SAS
Write-back cache coherency
SSD Cache 1.0 (Read)
Shared and exclusive VD I/O access policies
Operating system boot from the controller (exclusive access)
Controller hardware and property mismatch detection, handling, and reporting
Global hot spare support for all volumes in the HA domain
Planned and unplanned failover modes
Clustering/HA services support:
  Microsoft® failover clustering
Operating system support:
  Microsoft Windows® Server 2008 R2
  Microsoft Windows Server 2012
Full Intel® RAID features, with the following exceptions:
  SATA drives do not support SCSI-3 persistent reservation and are not supported in Intel® RAID High Availability Storage configurations.
  SAS drives that do not support SCSI-3 persistent reservation are not supported in Intel® RAID High Availability Storage configurations.
  T10 Data Integrity Field (DIF) is not supported.
  Self-encrypting drives (SED) and full disk encryption (FDE) are not supported.
  SSD Cache 2.0 (write back) is not supported.
  Dimmer switch is not supported.
  SGPIO sideband signaling for enclosure management is not supported.
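The drive-related exceptions above amount to a simple eligibility rule that can be checked while planning a configuration. The following Python sketch encodes only the two drive rules stated in this list (no SATA drives, and SCSI-3 persistent reservation required). The Drive class, its field names, and the model strings are hypothetical; in practice, this information comes from the drive vendor's specifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Drive:
    """Hypothetical inventory record for one candidate drive."""
    model: str
    interface: str   # "SAS", "NL-SAS", or "SATA"
    scsi3_pr: bool   # supports SCSI-3 persistent reservation


def rejection_reasons(drive: Drive) -> List[str]:
    """Return the reasons a drive cannot be used; empty if it is eligible."""
    reasons = []
    if drive.interface.upper() == "SATA":
        reasons.append("SATA drives are not supported")
    if not drive.scsi3_pr:
        reasons.append("SCSI-3 persistent reservation is required")
    return reasons


candidates = [
    Drive("example-sas-1", "SAS", scsi3_pr=True),
    Drive("example-sata-1", "SATA", scsi3_pr=False),
]
for d in candidates:
    print(d.model, rejection_reasons(d) or "eligible")
```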
2. Hardware and Software Setup
This chapter explains how to set up the hardware and software aspects of the Intel® RAID High Availability Storage solution in a two-controller-node configuration with shared storage and fault-tolerant cabling for one or more JBODs.
You can implement the Intel® RAID High Availability Storage solution with a two-server configuration connected to a JBOD, or with a Cluster-in-a-Box (CiB) configuration in which all the server hardware and disk drives are pre-connected inside a single enclosure. In both of these configurations, dual-ported backplanes are required in order to provide access for both server nodes to the disk drives. This chapter explains how to set up both types of configurations.
Installing Intel® RAID Premium Feature Key AXXRPFKHA2
The Intel® RAID Premium Feature Key AXXRPFKHA2 is a hardware key that enables the High Availability feature of the RAID controller. Follow these steps to install the Intel® RAID Premium Feature Key.
1. Carefully remove the Intel® RAID Premium Feature Key from its packaging.
2. Locate the RAID Premium Feature Key connector on the Intel® RAID Controller. This is a 2-pin shielded connector. The following figure shows the location of the RAID Premium Feature Key connector on the Intel® RAID Controller RS25DB080. In the figure, the arrow points to the location of the RAID Premium Feature Key connector. The location of RAID Premium Feature Key connector on your Intel® RAID Controller may vary. Please refer to the User Guide for your Intel® RAID Controller for the location of this connector.
Figure 1 Locating RAID Premium Feature Key connector on Intel® RAID Controller RS25DB080
3. With the 3-hole edge of the Intel® RAID Premium Feature key pointing to the RAID Premium Feature Key 2-pin connector of the RAID controller, push the key onto the connector on the RAID controller.
Figure 2 Installing RAID Premium Feature Key on Intel® RAID Controller RS25DB080
4. Refer to the documentation for the Intel® RAID Controller at http://www.intel.com/support/motherboards/server/ to finish installing the RAID controller into an Intel® Server Board or a qualified third-party server board.
Installing Intel® RAID High Availability Storage Hardware
The first step to setting up the Intel® RAID High Availability Storage solution is to install and configure the hardware components. The following Intel® RAID High Availability Storage hardware checklist outlines the baseline hardware components needed for a configuration with two controller nodes.
Two identical Intel® RAID High Availability Storage controller assemblies
Two sets of external or internal (CiB) SAS cabling for each Intel® RAID High Availability Storage controller
One SAS drive backplane or external SAS JBOD with the following characteristics:
  Support for 6Gb/s SAS
  Dual-ported backplanes or dual environmental services modules (ESMs)
  Dual expanders connected to a shared-drive backplane that supports dual-ported drives
  Support for SES
  Support for dual-ported SAS and/or NL-SAS HDD and SSD drives
  Support for SCSI-3 persistent reservation
Multiple 6Gb/s SAS or NL-SAS drives that support SCSI-3 persistent reservation
Note: The expanders and the dual-port backplane are programmed to route SAS data such that both Intel® RAID High Availability Storage controllers can discover both SAS addresses for all of the drives. For an Intel® RAID High Availability Storage configuration, the expander must have two four-lane In ports. The expander also requires a number of disk ports assigned according to the cable and backplane board connections. As an option, the expander can be configured with a third four-lane port for cascaded expanders, as shown in the following figure.
Figure 3 Intel® RAID High Availability Storage Expander Configuration
Note: Drive enclosures with dual ESM modules can support split modes or unified modes. For fault-tolerant cabling configurations, you typically configure the enclosure in unified mode. (Check with your drive enclosure vendor to determine the appropriate settings.)
Setting Up a Cluster-in-a-Box Configuration
The CiB system design converges both server and storage by combining the processing power of multiple servers with shared storage elements. A CiB system houses two servers and a common pool of direct attached drives within one custom designed server enclosure. This server enclosure simplifies the setup and deployment of two-node clusters because all necessary connections are preconfigured between the servers and the drives, as shown in the following figure.
Figure 4 Intel® RAID High Availability Storage Cluster-in-a-Box Configuration
The cluster-in-a-box configuration for the Intel® RAID High Availability Storage solution requires a specially designed server and storage chassis that includes two Intel® RAID High Availability Storage controllers and multiple SAS disks. Because all components are inside the enclosure and are pre-connected, the physical setup is minimal. Install the CiB enclosure in a standard rack and follow the manufacturer’s instructions to complete the physical setup.
The following figure shows a diagram of a CiB Intel® RAID High Availability Storage configuration connected to a network. The diagram includes the details of the SAS interconnections.
Figure 5 CiB Intel® RAID High Availability Storage Controller Configuration
Setting Up a Two-Server Configuration with External JBOD Configuration
The Intel® RAID High Availability Storage solution enables you to configure two separate, standard servers with Intel® RAID High Availability Storage controllers that provide access to disks in the same JBOD enclosure, or enclosures, for reliable, high-access redundancy, as shown in the following figure.
Figure 6 High-Availability Standard Server Configuration
This configuration enables you to use standard, readily available server hardware and disk enclosures to set up a reliable Intel® RAID High Availability Storage configuration.
The dual-server-JBOD configuration for the Intel® RAID High Availability Storage solution requires the following hardware for each of the two matched stand-alone server modules:
An Intel® RAID High Availability Storage controller board (RS25NB008, RS25SB008, etc.). The controller board firmware version must support clustering. You must use the same controller board model in both servers.
Note: The Intel® RAID High Availability Storage solution is based on Intel RAID firmware that is Intel® RAID High Availability Storage capable. Other versions of Intel RAID firmware do not provide clustering support. When the second node boots, firmware version checks occur between the two controllers, and the second node presents POST error messages if the Intel® RAID High Availability Storage firmware versions do not match.
A monitor and mouse for each controller node.
Network cabling and SAS cabling to connect the servers and JBOD enclosures.
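The requirements above (same controller board model in both servers, and matching HA-capable firmware) can be verified against an inventory before the second node is booted. The following Python sketch is a planning aid under stated assumptions: the Controller class, the find_mismatches function, and the firmware strings are hypothetical placeholders, and the sketch does not query real hardware or reproduce the POST-time checks that the firmware performs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Controller:
    """Hypothetical inventory record for one HA controller board."""
    node: str
    model: str      # for example RS25NB008 or RS25SB008
    firmware: str   # placeholder version string, not a real release


def find_mismatches(first: Controller, second: Controller) -> List[str]:
    """List the property mismatches between the two controllers."""
    problems = []
    if first.model != second.model:
        problems.append(f"model mismatch: {first.model} vs {second.model}")
    if first.firmware != second.firmware:
        problems.append(
            f"firmware mismatch: {first.firmware} vs {second.firmware}"
        )
    return problems


ctrl_a = Controller("node-a", "RS25NB008", "fw-example-1")
ctrl_b = Controller("node-b", "RS25NB008", "fw-example-2")
print(find_mismatches(ctrl_a, ctrl_b))
```

An empty list means the two records agree on model and firmware; any entry in the list should be resolved before building the cluster.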
The following figure shows a diagram of a two-server Intel® RAID High Availability Storage configuration connected to a network. The diagram includes the details of the SAS interconnections.
Figure 7 Two-Server Intel® RAID High Availability Storage Configuration
Follow these steps to set up the hardware for a dual-server-JBOD configuration for Intel® RAID High Availability Storage clustering.
1. Install an Intel® RAID High Availability Storage controller board in each of the two server modules, following the instructions in the Quick Installation Guide.
2. If necessary, install network boards in the two server modules and install the cabling between them.
3. Install the two server modules and the JBOD enclosure in an industry-standard cabinet, following the instructions in the manufacturer documentation.
4. Connect the host SFF-8088 connectors on the JBOD enclosure, or enclosures, to the SFF-8088 connectors on the front of the two server modules.
See the Cabling Configurations section for specific cabling instructions for one or two JBODs.
5. Connect power cords to the server units and the JBOD enclosure, and power on the units.
Cabling Configurations
This section contains information about initially setting up an Intel® RAID High Availability Storage configuration with one or two JBODs. It also explains how to add a second JBOD to a single-JBOD configuration without interrupting service on the configuration.
The following figure shows how to set up a two-controller-node configuration with a single JBOD enclosure.
Figure 8 Two-Controller-Node Configuration with Single JBOD
The cross-connections between the controllers provide redundant paths that safeguard against ESM failure.
To retain consistent device reporting, the corresponding port numbers for both controllers must be connected to a common enclosure ESM. In this example, port 0 to port 3 of both controllers are connected to ESM A and port 4 to port 7 of both controllers are connected to ESM B.
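The port-to-ESM rule above can be checked against a cabling plan before any cables are connected. The following Python sketch assumes a hypothetical representation of the plan as a dictionary that maps each controller name to a {port number: ESM name} mapping; the function name and data layout are illustrative and do not come from any Intel utility.

```python
from typing import Dict, List


def inconsistent_ports(cabling: Dict[str, Dict[int, str]]) -> List[int]:
    """Return port numbers that do not reach the same ESM on every controller.

    cabling maps a controller name to a {port_number: esm_name} mapping.
    A port that is missing on one controller is also reported.
    """
    all_ports = set().union(*(ports.keys() for ports in cabling.values()))
    bad = []
    for port in sorted(all_ports):
        esms = {ports.get(port) for ports in cabling.values()}
        if len(esms) != 1:
            bad.append(port)
    return bad


# Plan from the example above: ports 0-3 to ESM A, ports 4-7 to ESM B.
correct = {
    "controller-1": {p: "ESM A" if p < 4 else "ESM B" for p in range(8)},
    "controller-2": {p: "ESM A" if p < 4 else "ESM B" for p in range(8)},
}
# Same plan with the second controller's two cables swapped.
swapped = {
    "controller-1": correct["controller-1"],
    "controller-2": {p: "ESM B" if p < 4 else "ESM A" for p in range(8)},
}
print(inconsistent_ports(correct), inconsistent_ports(swapped))
```

An empty result means that every port number leads to a common ESM on both controllers, which is the condition for consistent device reporting.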
The following figure shows how to set up a two-controller-node configuration with two JBOD enclosures.
Figure 9 Two-Controller-Node Configuration with Dual JBODs
The top-down/bottom-up cabling approach shown in this figure has the following benefits:
The cross-connections between controllers and ESMs safeguard against the complete disconnection of a server from the drive enclosure.
Continued access to drives is assured in the event of a complete drive enclosure failure or removal.
Additional expansion drive enclosures can be hot-added without disrupting service, as shown in Figure 9.
The following figure shows an incorrectly cabled configuration with two disk enclosures.
Figure 10 Example of Incorrect Cabling of Two Disk Enclosures
This configuration does not work for the following reasons (the numbers correspond to the labels in the figure):
1. The failure of an ESM in drive enclosure A causes disconnection between a server and both drive enclosures.
2. The lack of a proper connection from a controller port number to a common enclosure ESM results in inconsistent device reporting.
3. The failure of drive enclosure A results in disconnection of drive enclosure B from both servers.
The following figure shows how to add a second disk enclosure to an existing two-server cluster without interrupting service on the HA configuration.