HP XP48, XP128, XP512, XP1024, XP12000 User guide

HP StorageWorks
Cluster Extension XP
user guide
XP48 XP128 XP512
XP1024
XP12000
sixth edition (August 2004)
T1609-96004
This guide explains how to use the HP StorageWorks Cluster Extension XP software.
© Copyright 2003-2004 Hewlett-Packard Development Company, L.P., all rights reserved
Hewlett-Packard Company makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance, or use of this material.
This document contains proprietary information, which is protected by copyright. No part of this document may be photocopied, reproduced, or translated into another language without the prior written consent of Hewlett-Packard. The information contained in this document is subject to change without notice.
All product names mentioned herein may be trademarks of their respective companies. Hewlett-Packard Company shall not be liable for technical or editorial errors or omissions contained herein. The information is provided “as is” without warranty of any kind and is subject to change without notice. The warranties for Hewlett-Packard Company products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.
Printed in the U.S.A.
HP StorageWorks Cluster Extension XP User Guide
product version: 2.05.00 sixth edition (August 2004) part number: T1609-96004

Contents

About this guide
  Intended Audience
  Disk array firmware and software dependencies
  Related information
  Terminology
  Conventions
  HP storage website
  HP authorized reseller
  Revision history
Warranty statement
1 Cluster Extension XP features
  Integration into cluster software
  Disaster tolerance through geographical dispersion
    Disaster tolerance considerations
    Disaster-tolerant architectures
  Automated redirection and monitoring of mirrored Continuous Access XP pairs
  Rolling disaster protection
    What is a rolling disaster?
  Recovering the disaster tolerant environment
  Command-line interface for easy integration
  Graphical user interface
  Quorum service (Microsoft Cluster service only)
2 Cluster Extension XP processes and components
  Cluster Extension XP environments
  Cluster Extension XP execution
  Continuous Access XP and RAID Manager XP
    RAID Manager XP instances
    RAID Manager XP device groups
  Rolling disaster protection and Business Copy XP
    Integration with RAID Manager XP
    Integration with automatic recovery
    Integration with the pair/resync monitor
    Restoring server operation
    Example
  User configuration file
  Pair/resync monitor
  Force flag
  Pre-execution and post-execution programs
  Cluster Extension XP log facility
  Error return codes
  Quorum service for Microsoft Cluster service
    Using the quorum service in a Microsoft Cluster service environment
    Quorum processes
3 User configuration file and Cluster Extension XP objects
  The user configuration file
    File structure
  Specifying object values
  COMMON section objects
  APPLICATION section objects
  Basic configuration example
4 RAID Manager XP dependencies
  RAID Manager XP configuration
    RAID Manager XP configuration file
    Network considerations
    Command device considerations
  Start and stop the RAID Manager XP instances
  Takeover basic functionality test
5 Integration with HACMP
  Configuring resources
  Procedure for HACMP
  User configuration file for HACMP
  Bringing a resource group online
  Taking a resource group offline
  Deleting Cluster Extension XP
  Pair/resync monitor integration
  Timing considerations
  Failure behavior
  Restrictions for IBM HACMP with Cluster Extension XP
6 Integration with Microsoft Cluster service
  Configuring the quorum service
  Configuring Cluster Extension XP resources
    Resource group and resource names
    Cluster Extension XP resource-specific parameters
  Setting non-Cluster Extension XP resource-specific parameters
  Adding a Cluster Extension XP resource
  Changing Cluster Extension XP resource properties
    Advanced properties
  Changing a resource name
  Adding dependencies on a Cluster Extension XP resource
  Bringing a Cluster Extension XP resource online
  Taking a Cluster Extension XP resource offline
  Deleting a Cluster Extension XP resource
  Pair/resync monitor integration
  Timing considerations for Microsoft Cluster service
  Failure behavior with Microsoft Cluster service
    Bouncing Resource Groups
    Unexpected offline conditions
  Restrictions for Microsoft Cluster service with Cluster Extension XP
  Disaster-tolerant configuration example using a file share
  Administration
7 Integration with VCS
  Configuration of the Cluster Extension XP agent
  Configuring the Cluster Extension XP resource
    Cluster Extension resource types
  Resource type definition
  Adding a Cluster Extension XP resource
  Changing Cluster Extension XP attributes
  Linking a Cluster Extension XP resource
  Bringing a Cluster Extension XP resource online
  Taking a Cluster Extension XP resource offline
  Deleting a Cluster Extension XP resource
  Pair/resync monitor integration
  Timing considerations for VCS
  Enable/disable service groups
  Restrictions for VCS with Cluster Extension XP
  Unexpected offline conditions
8 Integration with Serviceguard for Linux
  Configuration of the Cluster Extension XP environment
  Adding a Cluster Extension XP integration to an existing Serviceguard package
  Starting a Serviceguard package with Cluster Extension XP
  Halting a Serviceguard package with Cluster Extension XP
  Deleting Cluster Extension XP from a Serviceguard package
  Pair/resync monitor integration
  Timing considerations for Serviceguard
9 Command-line interface (CLI)
  Configuring the CLI
    Creating the Continuous Access environment and configuring RAID Manager
    Timing considerations
    Restrictions for customized Cluster Extension XP implementations
    Creating and configuring the user configuration file
  CLI Commands
    clxrun
    clxchkmon
    clxqr
10 Troubleshooting
  Start errors
  Failover error handling
  HACMP-specific error handling
    Start errors
    Failover errors
  Microsoft Cluster service-specific error handling
    Solving quorum service problems
    Resource start errors
    Failover errors
  VCS-specific error handling
    Start errors
    Failover errors
  Serviceguard (SG-LX)-specific error handling
    Start errors
    Failover errors
  Pair/resync monitor messages in syslog/errorlog/messages/Event Log
A Recovery procedures
  XP disk pair states
  Recovery sequence
  Quorum service recovery (Microsoft Cluster service only)
    Single site failure recovery
    Failure recovery if both sites have failed
    Procedure for quorum service system cleanup
B Cluster Extension XP resource message catalog
C Cluster Extension XP quorum service message catalog
  Quorum service Event Log messages
Glossary
Index

About this guide

This guide provides information about using and configuring HP StorageWorks Cluster Extension XP in an environment where clustered systems are connected to a disaster recovery array-based mirroring solution. Cluster Extension XP allows creation of dispersed multiplatform cluster configurations with the XP disk array. Cluster Extension XP enables cluster software to automatically fail over applications whose data is stored and continuously mirrored from a local to a remote disk array using HP StorageWorks Continuous Access XP. This guide describes the options available to make your disaster tolerant environment as robust as possible and keep your data available at all times.
Because the XP family of disk arrays supports a broad range of operating systems and cluster software, Cluster Extension XP can be integrated with almost any disk array-supported cluster software. This guide provides you with the information you need to create a disaster tolerant environment spanning two or more data centers, using the XP disk array and its Continuous Access XP remote mirroring feature.
Unless otherwise noted, the term disk array refers to these disk arrays:
• HP Surestore Disk Array XP512
• HP Surestore Disk Array XP48
• HP StorageWorks Disk Array XP128
• HP StorageWorks Disk Array XP1024
• HP StorageWorks XP12000 Disk Array

Intended Audience

This guide is intended for system administrators who maintain the cluster environment and storage subsystems and have the following knowledge:
• A background in data processing and direct-access storage device subsystems and their basic functions.
• Familiarity with disk arrays and RAID technology.
• Familiarity with the operating system, including commands and utilities.
• A general understanding of cluster concepts and the cluster software used in the data center environment.
• Familiarity with related disk array software programs:
  • HP StorageWorks Continuous Access XP
  • HP StorageWorks RAID Manager XP

Disk array firmware and software dependencies

The features and behavior of failover operations depend on the XP firmware and RAID Manager XP versions. This guide describes Cluster Extension XP behavior based on features implemented in the latest XP firmware and RAID Manager XP versions.

Related information

For information about the disk arrays, please refer to the owner’s manuals.
For related product documentation, see the HP web site (www.hp.com):
• HP StorageWorks RAID Manager XP: User’s Guide
• HP StorageWorks Continuous Access XP: User’s Guide
• HP StorageWorks Business Copy XP: User’s Guide
• HP StorageWorks Command View XP: User’s Guide
• HP StorageWorks Disk Array XP Operating System Configuration Guide: IBM AIX
• HP StorageWorks Disk Array XP Operating System Configuration Guide: Sun Solaris
• HP StorageWorks Disk Array XP Operating System Configuration Guide: Windows 2000/2003
• HP StorageWorks Disk Array XP Operating System Configuration Guide: Linux
For information about Serviceguard for Linux, see the HP High Availability web site:
docs.hp.com/hpux/ha/
For information about RS/6000 and HACMP, see the IBM web site:
www.rs6000.ibm.com/aix/library
For VERITAS Cluster Server information, see the VERITAS web site:
support.veritas.com
For Microsoft Cluster service information, see the Microsoft web site:
Windows 2000:
www.microsoft.com/windows2000/library/technologies/cluster/default.asp
Windows 2003:
www.microsoft.com/windowsserver2003/library/technologies/clustering/default.mspx

Terminology

This guide uses terminology to describe cluster-specific and disaster recovery-specific processes. Vendors of cluster software use different terms for the components of their cluster software. To standardize the usage among vendors, this guide uses the following terms:
application service
This is the unit of granularity for a failover or failback operation. It includes all necessary resources that must be present and on which the application depends. For example, a file share must have a disk, a mount point (or drive letter), and an IP address to be considered an application service. A disk is a necessary resource for the application service. Depending on the cluster software, application services can depend on each other and run in parallel on the same system or on different systems.
Vendor equivalent terms:
VCS: service group
HACMP: resource group
Microsoft Cluster service: resource group
SG-LX (Serviceguard): package
resource
The smallest unit in an application service. It describes the necessary parts to build an application service. The implementation of such resources in cluster software is vendor-specific. Some vendors (such as IBM or HP) do not allow accessing the chains between dependent resources.
Vendor equivalent terms:
VCS: resource
HACMP: resource group
Microsoft Cluster service: resource
SG-LX (Serviceguard): package
startup, shutdown
Startup and shutdown are also known as “bringing online” and “taking offline,” or “start” and “stop,” or “run” and “halt” with regard to an application service or resource. Only a few cluster software vendors (such as Veritas or Microsoft) offer starting and stopping of single resources.

Conventions

This guide uses the following text conventions.
Figure 1
Blue text represents a cross-reference. For the online version of this guide, the reference is linked to the target.
www.hp.com
Underlined, blue text represents a website on the Internet. For the online version of this guide, the reference is linked to the target.
literal
Bold text represents literal values that you type exactly as shown, as well as key and field names, menu items, buttons, file names, application names, and dialog box titles.
variable
Italic type indicates that you must supply a value. Italic type is also used for manual titles.
input/output
Monospace font denotes user input and system responses, such as output and messages.
Example
Denotes an example of input or output. The display shown in this guide may not match your configuration exactly.
[ ]
Indicates an optional parameter.
{ }
Indicates that you must specify at least one of the listed options.
|
Separates alternatives in a list of options.

HP storage website

For the most current information about HP StorageWorks XP products, visit the support website. Select the appropriate product or solution from this website:
http://h18006.www1.hp.com/storage/arraysystems.html
For information about product availability, configuration, and connectivity, consult your HP account representative.

HP authorized reseller

For the name of your nearest HP authorized reseller, you can obtain information by telephone:
United States 1-800-345-1518
Canada 1-800-263-5868
Or contact:
www.hp.com

Revision history

February 2001 First release.
March 2001 Added command-line interface chapter.
July 2001 Added MSCS support.
November 2001 Added quorum filter-service for MSCS on XP512/XP48.
May 2002 Updated content for version 1.03 of all Cluster Extension products. Updated content for version 1.04.00 of Cluster Extension for MSCS. Added support for Serviceguard on Linux. Updated content for version 1.1 of Cluster Extension XP quorum service with external arbitrator.
September 2002 Updated content for version 2.00. Changed product terminology from MSCS to Microsoft Cluster service. Added arguments for clxchkmon. Changed LogLevel values. Changed Windows log file directory location. Added message catalog.
December 2002 Updated content for version 2.01 for VCS and Serviceguard. Added rolling disaster protection features. Added GUI features.
January 2003 Updated content for version 2.01 for Windows GUI.
April 2003 Updated content for version 2.02. Added “Cluster Extension XP quorum service message catalog” (page 261).
November 2003 Updated for versions 2.02 and 2.03. Added SUSE Linux and Windows 2003 support. Removed XP256. Changed MC/ServiceGuard to Serviceguard.
March 2004 Modified document for version 2.04.00.
August 2004 New format applied. Modified document for version 2.05.00.

Warranty statement

HP warrants that for a period of ninety calendar days from the date of purchase, as evidenced by a copy of the invoice, the media on which the Software is furnished (if any) will be free of defects in materials and workmanship under normal use.
DISCLAIMER. EXCEPT FOR THE FOREGOING AND TO THE EXTENT ALLOWED BY LOCAL LAW, THIS SOFTWARE IS PROVIDED TO YOU “AS IS” WITHOUT WARRANTIES OF ANY KIND, WHETHER ORAL OR WRITTEN, EXPRESS OR IMPLIED. HP SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OR CONDITIONS OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT, TITLE, ACCURACY OF INFORMATIONAL CONTENT, AND FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow exclusions of implied warranties or conditions, so the above exclusion may not apply to you to the extent prohibited by such local laws. You may have other rights that vary from country to country, state to state, or province to province.
WARNING! YOU EXPRESSLY ACKNOWLEDGE AND AGREE THAT USE OF THE SOFTWARE IS AT YOUR SOLE RISK. HP DOES NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE SOFTWARE WILL MEET YOUR REQUIREMENTS, OR THAT THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED, VIRUS-FREE OR ERROR-FREE, OR THAT DEFECTS IN THE SOFTWARE WILL BE CORRECTED. THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE SOFTWARE IS ASSUMED BY YOU. HP DOES NOT WARRANT OR MAKE ANY REPRESENTATIONS REGARDING THE USE OR THE RESULTS OF THE USE OF THE SOFTWARE OR RELATED DOCUMENTATION IN TERMS OF THEIR CORRECTNESS, ACCURACY, RELIABILITY, CURRENTNESS, OR OTHERWISE. NO ORAL OR WRITTEN INFORMATION OR ADVICE GIVEN BY HP OR HP’S AUTHORIZED REPRESENTATIVES SHALL CREATE A WARRANTY.
LIMITATION OF LIABILITY. EXCEPT TO THE EXTENT PROHIBITED BY LOCAL LAW, IN NO EVENT INCLUDING NEGLIGENCE WILL HP OR ITS SUBSIDIARIES, AFFILIATES, DIRECTORS, OFFICERS, EMPLOYEES, AGENTS OR SUPPLIERS BE LIABLE FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR OTHER DAMAGES (INCLUDING LOST PROFIT, LOST DATA, OR DOWNTIME COSTS), ARISING OUT OF THE USE, INABILITY TO USE, OR THE RESULTS OF USE OF THE SOFTWARE, WHETHER BASED IN WARRANTY, CONTRACT, TORT OR OTHER LEGAL THEORY, AND WHETHER OR NOT ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Your use of the Software is entirely at your own risk. Should the Software prove defective, you assume the entire cost of all service, repair or correction. Some jurisdictions do not allow the exclusion or limitation of liability for incidental or consequential damages, so the above limitation may not apply to you to the extent prohibited by such local laws.
NOTE. EXCEPT TO THE EXTENT ALLOWED BY LOCAL LAW, THESE WARRANTY TERMS DO NOT EXCLUDE, RESTRICT OR MODIFY, AND ARE IN ADDITION TO, THE MANDATORY STATUTORY RIGHTS APPLICABLE TO THE LICENSE OF THE SOFTWARE TO YOU; PROVIDED, HOWEVER, THAT THE CONVENTION ON CONTRACTS FOR THE INTERNATIONAL SALE OF GOODS IS SPECIFICALLY DISCLAIMED AND SHALL NOT GOVERN OR APPLY TO THE SOFTWARE PROVIDED IN CONNECTION WITH THIS WARRANTY STATEMENT.
1

Cluster Extension XP features

The quest to extend high availability over geographically dispersed locations has driven today’s IT personnel to demand cluster solutions capable of recovering from even the most extensive disasters. HP StorageWorks Cluster Extension XP enables you to monitor HP StorageWorks Continuous Access XP-mirrored disk pairs and allows access to the remote data copy if the application becomes unavailable on the local site. If the application service is restarted on the remote site after the local (primary) application service has been shut down, Cluster Extension XP uses its internal database to check whether the current disk states allow automatic access to your data based on consistency and concurrency considerations. Integrated in the cluster software or available as a command-line interface for your own integration, Cluster Extension XP ensures that the data can be accessed if necessary.
Cluster Extension XP software provides these key features:
• integration into cluster software
• disaster tolerance through geographical dispersion
• automated redirection and monitoring of mirrored Continuous Access XP pairs
• command-line interface for easy integration

Integration into cluster software

The value of Cluster Extension XP is to provide tight integration into the cluster software, wherever possible. Cluster Extension XP is a resource to the clustered application service (like the disk or volume group) and must therefore be managed as such. The architecture of Cluster Extension XP allows integration into many cluster software products, including these:
• VERITAS Cluster Server (VCS)
• IBM HACMP
• Windows 2000 Advanced Server and Datacenter Server Cluster service
• Windows Server 2003 Enterprise Edition and Datacenter Edition
• Serviceguard for Linux (SG-LX)
For the current list of supported cluster software, contact your HP representative.

Disaster tolerance through geographical dispersion

Using two or more disk arrays with Continuous Access XP allows you to copy your most valuable data to a remote data center. Cluster Extension XP provides the cluster software with a mechanism to check and allow data access (in case the local application service must be transferred to a remote cluster system). The distance to your remote location is only limited by the technology your cluster software uses to communicate with each system in the cluster, the technology you use for physical data replication, and the degree of failover automation.

Disaster tolerance considerations

Application availability is essential for today’s businesses. The capability to restore the application service in a timely fashion after a failure of the server, the storage, or the whole data center is a must, and this capability is what is meant by disaster tolerance. Complete data center failures can be caused by earthquakes or hurricanes, but more often they are caused by power outages or fires.
To protect against such disasters, a single data center is not sufficient. Systems (storage as well as the servers) must be geographically distributed in order to build a disaster tolerant architecture which protects against planned and unplanned downtimes.
Of course, redundant network cards and storage host bus adapters are a basic requirement. The same applies for the power supplies of both the storage and the server. With this hardware in place, the external power service and the network must also be designed to provide no single point of failure (SPOF).
Today, data is the most valuable asset in your enterprise. The XP family of disk arrays provides a fully redundant architecture, and the flexibility to upgrade firmware online reduces the risk of unplanned and planned downtime. The disk array also provides the feature of remotely mirroring your data to a second disk array.
The cost of having this hardware in place must be weighed against the risk of a true disaster. The investment pays off in a real disaster, when the business-critical applications are still accessible from another location.
Guidelines
The following considerations, applied to the cluster environment, can ensure an application service survives a disaster with minimal downtime.
• geographical dispersion of hardware and applications
• redundant paths to access the network and storage
• alternative power sources
• redundant networks
• data replication
Several ways of implementing such disaster tolerant architectures are possible. All of those solutions can be covered by a clustered solution using the XP family of disk arrays and Continuous Access XP. Cluster Extension XP is needed to enable access to your critical data.

Disaster-tolerant architectures

With Cluster Extension XP, you can extend your cluster solution beyond the limitations of existing data center and campus-wide distances. Cluster Extension XP enables metropolitan-wide failover capabilities, and beyond.
Having a local disk array in each data center also means that the server does not have to write twice because Continuous Access XP mirrors each write-IO to the remote site and therefore relieves the server of the burden, preventing performance bottlenecks.
Disaster tolerant architectures using data replication over the network
Data replication over the network is a way to achieve disaster tolerance and is considered logical replication.
Figure 1. Logical replication over networks (data centers in San Francisco and New York, each with write IO and replication, connected by a WAN: IP, ATM, T3, DWDM)
Logical replication uses specific host-based software to write data to local disks and also to replicate that data to a remote system connected to an attached storage device. Because data is replicated over the network, there is no distance limitation for such solutions.
Logical replication techniques imply that the failover process is mainly manual. This means each site belongs to a different cluster, or only the primary site is clustered, while the secondary site acts as a standby system. It is also possible that no cluster software is involved and that only one system is available at each site.
Data replicated over the network can be at the granularity of a single volume, a file system, or a transaction.
All logical replication techniques have some significant disadvantages: The remote system is a standby system. That is, it must perform the same task as the primary system and cannot be used for any other purpose. If the standby system is activated, it must replay redo logs first and cannot automatically serve as a replication source (for example, Oracle’s standby database implementation).
Another significant disadvantage of such architectures is that the server must write every IO twice, once to the attached storage device and once to the remote system over the network. These replication techniques can only be implemented asynchronously; otherwise, the application experiences noticeable performance degradation.
Because of the nature of replication products, additional CPU power is necessary to mirror write requests.
Logical replication implies that all logs that have not been shipped (or that are in transit) are lost in case of a disaster.
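The exposure described above can be illustrated with a minimal Python sketch. It does not model any particular replication product; the class and method names (`AsyncLogShipper`, `commit`, `ship`, `lost_on_disaster`) are hypothetical, and the sketch assumes that logs are shipped in commit order.

```python
from collections import deque


class AsyncLogShipper:
    """Toy model of asynchronous, host-based log shipping (hypothetical names)."""

    def __init__(self):
        self.pending = deque()  # committed at the primary site, not yet shipped
        self.remote = []        # logs that have reached the standby system

    def commit(self, log_record):
        # The primary acknowledges the write before it reaches the remote site.
        self.pending.append(log_record)

    def ship(self, count=1):
        # Ship up to `count` of the oldest pending logs to the standby system.
        for _ in range(min(count, len(self.pending))):
            self.remote.append(self.pending.popleft())

    def lost_on_disaster(self):
        # Logs still pending at the primary site are lost together with it.
        return list(self.pending)


shipper = AsyncLogShipper()
for record in ["log-1", "log-2", "log-3", "log-4"]:
    shipper.commit(record)
shipper.ship(count=2)

print("available at standby:", shipper.remote)
print("lost if the primary site fails now:", shipper.lost_on_disaster())
```

In this model the size of the loss depends only on how far shipping lags behind the commits at the moment of the disaster.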
Disaster-tolerant architectures using Fibre Channel networks
Disaster tolerant architectures using Fibre Channel networks can be achieved by the use of physical replication.
Figure 2. Physical replication using Fibre Channel (data centers in building 1 and building 2, each with write IO and replication)
As with logical replication products, physical replication often uses host-based software to replicate data. Here, data is written to server-attached storage devices twice. Most of today’s logical volume management products offer this feature.
Using Fibre Channel, you could use dual-attached storage devices, where one port is connected to the local server and one is connected to the remote server. To be able to access your data at the remote location in case of a disaster, each server must have a local and a remote storage device. The volume management software, then, must be set up to mirror each write request to both the local and the remote storage device. With the XP disk arrays, several servers can be connected to each disk array.
This solution is called campus cluster. A single cluster can be used and the failover process can be automated. With campus clusters, both sites can be active.
Data replicated via volume mirroring is based on the granularity of a single volume.
Campus cluster solutions are limited to the distance Fibre Channel supports today. While storage systems must be in a range of 500 meters (direct connect) or up to 10 kilometers (connected via Fibre Channel switches or Fibre Channel hubs), campus cluster solutions can only offer limited protection against natural disasters.
Another limiting factor is the cluster heartbeat protocol or the communications protocol used for cluster reformation processes. Those protocols are vendor-specific implementations and require private networks, which means that they are not routable. The distance limitations of a private network depend on the supplied network infrastructure and on latency issues of the heartbeat or cluster reformation protocol.
Another significant disadvantage of such architectures is that the server must write every IO twice: once to the locally attached storage device and once to the remote attached storage device. These replication techniques are implemented as software running on the server, which reduces the available compute power and degrades server performance.
Because of the nature of volume mirroring products, additional CPU power is necessary to mirror write requests across two host bus adapters.
Most of these products have another significant disadvantage. In case of a path failure, the whole volume must be copied to resynchronize the second volume with the current state of the first volume. If the storage device must be replaced, all volumes must be copied. This significantly affects server performance.
Disaster-tolerant architectures using disk array-based mirroring
Using Continuous Access XP-based mirroring is also considered physical replication. Continuous Access XP is disk array-based mirroring. As with campus clusters, such solutions require two or more disk arrays.
The key difference from the above-mentioned solutions is that the disk array keeps track of the data integrity of the mirrored disks. XP disk arrays offer RAID-1 or RAID-5 protection as a standard feature and allow online addition and replacement of disks, IO adapter cards, and memory. To provide copies of data, internal and external mirroring features are available. For disaster tolerant solutions, Continuous Access XP can mirror your data with no distance limitation.
The ESCON or Fibre Channel protocol is used to transfer data between two disk arrays. Using converters, ESCON and Fibre Channel can be routed over IP networks and T3 to allow unlimited distance between the disk arrays. To replicate data over more than 0.5 km (Fibre Channel) or 3 km (ESCON), special extenders or switches must be purchased.
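As a planning aid, the two distance thresholds quoted above can be captured in a small Python helper. This is only a sketch: the function name `needs_extender` and the link-type keys are hypothetical, and the limits are the figures from this guide, not values read from the disk array.

```python
# Direct-connect limits quoted in this guide, in kilometers.
DIRECT_LINK_LIMIT_KM = {"fibre_channel": 0.5, "escon": 3.0}


def needs_extender(link_type, distance_km):
    """Return True if the inter-array distance exceeds the direct-connect limit."""
    if link_type not in DIRECT_LINK_LIMIT_KM:
        raise ValueError(f"unknown link type: {link_type!r}")
    return distance_km > DIRECT_LINK_LIMIT_KM[link_type]


for link_type, distance_km in [("fibre_channel", 0.3), ("fibre_channel", 12.0), ("escon", 3.5)]:
    verdict = "extender or switch required" if needs_extender(link_type, distance_km) else "direct link possible"
    print(f"{link_type} over {distance_km} km: {verdict}")
```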
The cluster solutions using Continuous Access XP-based disk mirroring are called metropolitan clusters or geographically dispersed clusters. Servers are members of the same cluster dispersed over two or more sites. Since the disk array controls the replication process, the server is relieved from writing any IO-request to the disk more than once.
Continuous Access XP-mirrored disks typically have a read/write-enabled primary disk and a read-only secondary disk. This leads to problems because current cluster software products cannot distinguish between write-protected and write-enabled disks.
Cluster software assumes that the application service has access to read/write-enabled data disks on any system that the application service has been configured to run. Since the secondary volume of a disk pair is not normally accessible, the failover process would typically involve manual intervention.
Figure 3. Physical replication using HP StorageWorks Continuous Access XP (data centers in Manhattan and Brooklyn, each with write IO and replication, connected by Continuous Access XP (extension) links over a MAN: ESCON, IP, ATM, DWDM)
Cluster Extension XP provides the software to enable automated failover and failback procedures integrated as a resource of the application service. Cluster Extension XP uses an internal database to decide whether the data on the failover site is safe to be accessed or not. Manual intervention is required if the current disk array states and the user settings conflict with the rules stored in the Cluster Extension XP internal database.
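The decision described above, a rule lookup that either permits access or stops for manual intervention, can be sketched in Python. The status names, the rule table, and the `accept_stale_data` setting below are invented for illustration; they are not the states, rules, or configuration objects of the Cluster Extension XP internal database.

```python
from enum import Enum


class Action(Enum):
    ALLOW_ACCESS = "allow access"
    MANUAL_INTERVENTION = "manual intervention required"


# Hypothetical rules keyed by (status of the copy at the failover site,
# whether the partner array is reachable). Invented for illustration only.
RULES = {
    ("current", True): Action.ALLOW_ACCESS,
    ("current", False): Action.ALLOW_ACCESS,
    ("possibly_stale", True): Action.MANUAL_INTERVENTION,
    ("possibly_stale", False): Action.MANUAL_INTERVENTION,
}


def decide(copy_status, partner_reachable, accept_stale_data=False):
    """Return the action for a failover attempt under the hypothetical rules."""
    key = (copy_status, partner_reachable)
    # States missing from the rule table are never approved automatically.
    action = RULES.get(key, Action.MANUAL_INTERVENTION)
    # A user setting may relax a rule, but only for a state the table knows.
    if key in RULES and action is Action.MANUAL_INTERVENTION and accept_stale_data:
        return Action.ALLOW_ACCESS
    return action


print(decide("current", True).value)
print(decide("possibly_stale", False).value)
print(decide("possibly_stale", False, accept_stale_data=True).value)
```

The sketch defaults to manual intervention for any state it does not recognize, which mirrors the conservative behavior described in the text: access is automated only when the rules explicitly allow it.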
The limiting factor of metropolitan or geographically dispersed clusters is the cluster heartbeat protocol or the communications protocol used for the cluster reformation processes. Those protocols are vendor-specific implementations and typically require private networks. These protocols are not routable; a router cannot be used. The distance limitations of networks supporting a private network are dependent on the supplied network infrastructure and latency issues of the heartbeat or cluster reformation protocol.
To address these issues, cluster manager software can be used. This software offers disaster tolerance by managing two or more clusters from a single console or server and is considered a continental cluster. Depending on the implementation, automated or semiautomated failover processes between clusters are possible.
As mentioned above, metropolitan or geographically dispersed clusters as well as continental clusters require metropolitan area networks or wide area networks. In most cases, those network connections involve common carriers and special network equipment, which can be very expensive. Compared with a direct connection or a campus network, their reliability can be lower, and more planning is needed to deploy and maintain a disaster tolerant environment.
Using Continuous Access XP, data is accessible and consistent in every failover case and the resynchronization of a completely failed disk array can be done while the application is running with almost no impact to the server performance. This allows reestablishing disaster tolerance without application downtime.

Automated redirection and monitoring of mirrored Continuous Access XP pairs

Disk arrays with Continuous Access XP provide a unique feature that allows the redirection of the mirroring destination. This means Continuous Access XP almost instantaneously swaps the primary/secondary relationship of disk pairs if the application must access the secondary disk. This feature keeps the disk pairs synchronized, so that the failback process is as fast as the failover process. If the links between your disk arrays are broken, each array maintains a bitmap table of changed areas so that only the changed (delta) data must be synchronized when the links become available again. In a failover case, Cluster Extension XP takes the appropriate action for each link/array status and makes sure that your application service has the latest data.
Cluster Extension XP includes a pair/resync monitor to monitor the health of the links between your arrays. Furthermore, it detects a lost and later reestablished link and automatically resynchronizes the suspended disk pairs, ensuring the most current data is available on either site.
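The bitmap-based delta resynchronization described above can be pictured with a small model. This is only a minimal sketch and not the disk array's actual implementation: the class name MirroredVolume, the track granularity, and the in-memory lists are assumptions made for illustration.

```python
class MirroredVolume:
    """Toy model of a mirrored volume that records changed tracks in a bitmap."""

    def __init__(self, tracks):
        self.local = [b""] * tracks     # data on the local array
        self.remote = [b""] * tracks    # data on the remote array
        self.dirty = [False] * tracks   # bitmap: True = not yet replicated
        self.link_up = True

    def write(self, track, data):
        """Write locally; replicate at once if the link is up, else mark the track."""
        self.local[track] = data
        if self.link_up:
            self.remote[track] = data
        else:
            self.dirty[track] = True

    def resync(self):
        """Copy only the tracks marked in the bitmap and return how many were copied."""
        if not self.link_up:
            raise RuntimeError("link is down; cannot resynchronize")
        copied = 0
        for track, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self.remote[track] = self.local[track]
                self.dirty[track] = False
                copied += 1
        return copied
```

In this model, writes made while link_up is False are replayed by resync() once the link returns, and only the marked tracks are transferred.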

Rolling disaster protection

Rolling disaster protection minimizes the impact of downtime and ensures data integrity during recovery operations. Rolling disaster protection combines Continuous Access XP remotely mirrored disk pairs and internal Business Copy XP disk copies to protect data locally as well as remotely. In combination, these features support the highest data protection levels to prevent disastrous loss of data.

What is a rolling disaster?

A rolling disaster refers to catastrophic events or outages that affect the data stored on remote mirrored disk pairs. In a rolling disaster, data stored on remote mirrored disk pairs can be entirely lost during a recovery attempt.
In a rolling disaster, the mirrored disk pairs typically experience the following sequence of events:
1. The primary data center failed.
The cluster software successfully transferred application execution to the remote data center.
2. The Continuous Access XP link failed.
3. The secondary volume of the disk pair is used to continue operation after failover while the CA link is not functional.
The secondary volume represents the latest state of data, whereas the data on the primary volume is now out of date.
4. The primary data center is recovered and the Continuous Access XP link is restored.
5. A recovery operation is initiated to resynchronize (update) the original (primary) disk from the secondary disk.
This is known as a restore operation after a disaster, or a restore after failover operation.
The resynchronization/restore operation can take minutes to days depending on the amount of data that must be updated and transferred between the two disk arrays.
During the recovery operation, data is vulnerable to the effects of a disaster or outage. During a resynchronization operation, data updates are sent in the order of changed tracks and not in the transactional order in which the data was originally written or acknowledged.
6. The secondary site fails during the resynchronization/restore operation.
The restored data at the original, primary site becomes unusable.
Although resynchronization operations are possible while an application is running, resynchronization could lead to unrecoverable data if a rolling disaster occurs. This type of rolling disaster can occur in the following circumstances:
• during manual resynchronization attempts
• during failover operations using Cluster Extension XP in a cluster environment when the Cluster Extension XP AutoRecover object is set to yes, or where the pair/resync monitor is used with the ResyncMonitorAutoRecover object set to yes.
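The reason an interrupted resynchronization leaves unusable data can be shown with a toy example. The sketch below is an illustration only: the two-track "database" and the function name resync_in_track_order are assumptions, chosen to show that copying in track order rather than write order can mix old and new data.

```python
def resync_in_track_order(source, target, changed_tracks, fail_after=None):
    """Copy changed tracks in ascending track order; optionally stop early.

    Returns True if every track was copied, False if the copy was cut short
    (modelling the loss of the source site during resynchronization).
    """
    for copied, track in enumerate(sorted(changed_tracks)):
        if fail_after is not None and copied >= fail_after:
            return False
        target[track] = source[track]
    return True


# Original primary: out of date, but internally consistent.
primary = {10: "table: balance=100", 90: "log: set balance=100"}
# Secondary: the newer, consistent image written after the failover.
secondary = {10: "table: balance=250", 90: "log: set balance=250"}

# The secondary site fails after one of the two changed tracks was copied.
completed = resync_in_track_order(secondary, primary, {10, 90}, fail_after=1)
```

After the interrupted run, the primary holds the new table track next to the old log track, a combination that never existed on either site.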

Recovering the disaster tolerant environment

To ensure survival of critical data during a resynchronization/restore operation, Cluster Extension XP supports the use of preconfigured Business Copy disks and allows suspending any number of Business Copy pairs that can be associated with the primary data disks. Cluster Extension XP recovers automatically, provided that at least one internal Business Copy mirror could be suspended to guarantee a recoverable state.
Cluster Extension XP also resumes internal Business Copy mirrors automatically, if specified, to allow the local site to keep an up-to-date image of the data.
This internal copy represents the state of the primary volume before the data center failure. This copy is needed to survive a possible failure of the secondary volume or disk array during the resynchronization operation. Although the data could be out of date, it represents the best starting point for the recovery effort, unlike the inconsistent data that results from a rolling disaster.
Recovery from a consistent, point-in-time copy ensures the integrity of data and eliminates the need for full tape restore procedures. Rolling disaster protection provides a rapid recovery method and so minimizes downtime.
Figure 5 (page 49) illustrates an example of a disaster-tolerant configuration.
To implement rolling disaster protection, see “Rolling disaster protection and Business Copy XP” (page 45).

Command-line interface for easy integration

Cluster Extension XP provides you with a command line interface to enable disaster tolerant environments even if no cluster software is available for your operating system. This feature is convenient if you use in-house-developed software to migrate application services from one system to another or if you want Cluster Extension XP to check the disk states to make sure you can automatically start your application service on the local disk array.

Graphical user interface

Cluster Extension XP for Microsoft Cluster service and VCS can be configured with the cluster software GUI. Both cluster software products provide a graphical user interface to set and change resource values. Cluster Extension XP offers full integration into the GUI so that you can utilize the capacity of your cluster software.

Quorum service (Microsoft Cluster service only)

Microsoft Cluster service depends on the cluster quorum disk resource to maintain a persistent log of cluster configuration changes and status, as well as a single point to resolve any possible events that could result in a split brain situation. The Cluster Extension XP quorum service adds an additional dimension to disaster tolerance by remotely mirroring the quorum disk resource, thus preventing it from being the single point of failure.
The quorum service allows the quorum disk resource to be mirrored between dispersed sites and supports the movement and failover of the quorum disk between the two sites without disrupting other cluster services. The external arbitrator (also included in Cluster Extension XP) solves the potential split brain syndrome as well. This feature significantly increases the availability of the critical quorum disk resource, thus reducing the possibility of cluster failure due to the loss of the quorum disk.
The quorum service and Cluster Extension have been certified to fulfill all requirements for Microsoft cluster. For certified configurations, see the Microsoft web site:
www.microsoft.com/windows/catalog/server
1. Click the Hardware tab.
2. Select Cluster Solutions from the left side menu.
3. Select Geographically Dispersed Cluster Solution.
2 Cluster Extension XP processes and components
Cluster Extension XP is shipped in the appropriate format for each platform:
Platform                     Implementation
VCS                          agent
IBM HACMP                    pre-event executable
Microsoft Cluster service    resource DLL, quorum service, and external arbitrator
SG-LX                        function call/executable
Customized solutions to failover application services must implement Cluster Extension XP through its command-line interface prior to the disk activation procedure.


Cluster Extension XP environments

The ideal environment for a Cluster Extension XP configuration consists of at least four servers (two at each site) and separated redundant communications links for cluster heartbeats, client access and Continuous Access XP (Extension). All communications interfaces must be installed in pairs to serve as failover components, preventing single points of failure (SPOFs).
Recommendation Use load balancing and alternative pathing software for host-to-storage connections, such as HP StorageWorks Auto Path for IBM AIX or Secure Path for Linux and Windows operating systems. For Sun Solaris operating systems, VERITAS offers such software. These software products enable you to upgrade XP firmware while the application service is running.
Network communications links between the dispersed data centers must be redundant and physically routed differently. This prevents the “backhoe issue,” that is, where all links between data centers are cut together. This is especially important, since the cluster is more vulnerable to “split brain” syndromes. A split brain syndrome is where both data centers’ systems form new clusters which could allow access to both copies of the data. This can be prevented with physically separated network links and redundant network components. Cluster Extension XP allows you to configure the failover behavior in such a way that the application service startup procedure will be stopped if none of the remote cluster members can be reached. The default configuration of Cluster Extension XP expects the cluster software to deal with the “split brain” syndrome.
Since the disk array stores your most valuable data, this data must get across to the remote disk array. At least four Continuous Access XP links must be available when the disk arrays are connected directly and are configured for bidirectional takeover. For extended distances, extender components must be purchased. These components are able to bundle Continuous Access XP links. At least two links are necessary to provide redundancy and protection against single points of failure. Although communications links can cover considerable distances, each network segment must be extended to the dispersed data center in order to maintain a heartbeat among all servers.
Recommendation Use four systems to give local application service failover among local cluster systems priority over remote, more time-consuming failover procedures. When failing over, Cluster Extension XP must reconfigure the disk arrays to change the mirroring direction. This takes more time than just checking for the correct disk array disk states. On the remote site, two systems should be available in the case the failover system experiences a hardware or power failure.
Figure 2 (page 24) depicts a preferred Cluster Extension XP configuration.
Caution Cluster Extension XP can operate with only one system at each location, a single I/O path between the server system and the disk array, and a single link in each direction between disk arrays.
However, those configurations are not considered highly available, nor are they disaster tolerant. Therefore, Cluster Extension XP configurations with single points of failure are not supported by HP.

Cluster Extension XP execution

Cluster Extension XP requires cluster software to automatically fail over and fail back among systems on a local site or between sites. Cluster Extension XP must manipulate the application startup process before disk array disks are activated. Cluster Extension XP, therefore, must be integrated as the first resource (in the order of resources). To activate Continuous Access XP paired disk devices, the paired disk devices must be in read/write mode. Continuous Access XP disks are usually in read/write mode on the primary disk only; the secondary disk is in read-only mode. In case of a failover, the direction of the mirrored pair is changed by Cluster Extension XP automatically. In case of a disaster, the disk array can have several different states for disks in a RAID Manager XP device group. Cluster Extension XP decides whether those disks can be activated.
Cluster Extension XP must be installed on any server in the cluster that can run the application service in the cluster.
Cluster Extension XP stores information about the application environment in an internal object database and uses RAID Manager XP to gather information about the state of the associated disk pairs. The information about the configured disk array environment and failover behavior is transferred either directly by the cluster software or by gathering from the user configuration file.
The internal object database provides Cluster Extension XP with knowledge about supported parameters, their formats, and default values.
Disk array disk states are stored in an internal object database and a rule engine is used to process those disk states. The rule engine matches current disk states and configuration parameters with a defined rule, stores it in the database, and invokes predefined actions. Those actions either prepare the disk array disks to be activated or stop the application service startup process if the matching rule requires it.
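The match-a-rule-then-act flow can be sketched as a first-match rule table. The rules below are hypothetical and are not the rules stored in the Cluster Extension XP database; the names Rule, RULES, and decide are assumptions, and the pair states are used only as example inputs.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    matches: Callable[[str, str], bool]  # (local_state, remote_state) -> bool
    action: str                          # "activate", "takeover", or "stop"


# Hypothetical rule table for illustration; the product's real rules differ.
RULES = [
    Rule("copy in progress",
         lambda local, remote: local == "SVOL_COPY", "stop"),
    Rule("local is primary",
         lambda local, remote: local == "PVOL_PAIR" and remote == "SVOL_PAIR",
         "activate"),
    Rule("local is secondary",
         lambda local, remote: local == "SVOL_PAIR" and remote == "PVOL_PAIR",
         "takeover"),
]


def decide(local_state, remote_state):
    """Return (rule name, action) for the first matching rule.

    If no rule matches, the startup process is stopped so that an
    administrator can intervene.
    """
    for rule in RULES:
        if rule.matches(local_state, remote_state):
            return rule.name, rule.action
    return "no matching rule", "stop"
```

Defaulting to "stop" when nothing matches mirrors the guide's statement that manual intervention is required when the states cannot be handled automatically.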

Continuous Access XP and RAID Manager XP

Continuous Access XP provides remote copy functionality for the disk arrays. Disk arrays can be mirrored to many different remote disk arrays.
Cluster Extension XP does not support two disk arrays as either primary or secondary disk arrays. Cluster Extension XP supports configurations where two (or more) disk arrays use one remote disk array as the failover site. In those cases, the disk array configuration can be considered as a logical one-to-one configuration.
Figure 4 (page 41) depicts an example of a supported configuration.
[Figure 4 labels: App RED; App BLUE; data center in Manhattan; data center in Brooklyn; CA XP (Extension); note: “Links to App RED are not supported.”]
Figure 4. Supported XP disk array configuration
To control Continuous Access XP-mirrored disks from a server, RAID Manager XP must be installed on the server. A special disk, called a command device, must be configured to control the paired disks. The special disk must not be part of Microsoft Cluster service resources and cannot be paired. The command device, which is identified by a “CM” appended to the emulation type, can be assigned to a 36-Mbyte or greater CVS volume. RAID Manager XP uses the command device to communicate with the disk array controller (DKC).
Using Continuous Access XP Extension, consistency groups can be configured. Consistency groups are units in which the disk array keeps data consistent among paired disks.
Continuous Access XP links are unidirectional links. For disaster tolerant configurations, two links must be provided in each direction. Both sender (RCP) and receiver (LCP) ports must be configured on each redundant IO board used for Continuous Access XP.
Continuous Access XP offers two modes of replication:
• synchronous replication
• asynchronous replication
Synchronous replication
Using synchronous mode, all write requests from the server are first transferred to the remote disk array. After each IO has been mirrored in the cache area of the remote array, it is acknowledged to the local disk array. The write request is then acknowledged to the server.
Synchronous replication modes can be configured in the following fence levels:
NEVER    Allows write requests even if the request cannot be replicated to the remote disk array. If a write request cannot be replicated to the remote disk array, the area on the disk is marked in a bitmap table and transferred after a resynchronization request has been issued.
STATUS   This fence level is not supported by Cluster Extension XP.
DATA     Prohibits write requests immediately if a link failure or disk failure occurs and the local disk array cannot replicate data to the remote disk array. Fence level DATA provides data concurrency at any time.
The preceding fence levels provide data integrity on a per disk basis, so a failure affecting a single disk pair does not lead to a halt of the replication activities of non-affected disk pairs.
Synchronous replication can affect the performance of the system if the distance between the disk arrays is significant.
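The difference between fence levels NEVER and DATA can be summarized in a few lines. This is a sketch under assumptions: the function synchronous_write, the WriteRejected exception, and the set used as a bitmap are illustrative names, not part of RAID Manager XP or the disk array firmware.

```python
class WriteRejected(Exception):
    """Raised when fence level DATA prohibits a write."""


def synchronous_write(fence, link_ok, track, dirty_bitmap):
    """Model one server write to a synchronously mirrored disk pair.

    Returns "replicated" if the remote array received the write, or
    "local-only" if the write was accepted locally and marked for resync.
    """
    if fence not in ("NEVER", "DATA"):
        raise ValueError(f"unsupported fence level: {fence}")
    if link_ok:
        return "replicated"
    if fence == "DATA":
        # The local and remote copies must never diverge, so the write fails.
        raise WriteRejected("write prohibited: cannot replicate to remote array")
    # Fence level NEVER: accept the write and remember the changed area.
    dirty_bitmap.add(track)
    return "local-only"
```

With NEVER the application keeps running during a link failure at the cost of a stale remote copy; with DATA the remote copy stays current at the cost of failed writes.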
Asynchronous replication
Continuous Access XP Extension offers a unique feature to replicate data asynchronously.
To keep replicated data consistent among two disk arrays, any incoming write request is ordered and numbered. The write request is then acknowledged to the server, offering the fastest response time for remote mirroring. Each write request is transferred to the remote disk array asynchronously. The remote array orders all write requests before they are destaged to the disk, keeping data consistent.
Asynchronous replication offers excellent performance for remote mirroring and provides data consistency on a group of disks (consistency groups) level.
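The ordering scheme for asynchronous replication can be sketched as follows. The class RemoteArray and its fields are assumptions for illustration; the sketch only shows the idea of numbering writes at the source and applying them strictly in sequence at the destination, not the array's internal design.

```python
import heapq


class RemoteArray:
    """Toy receiver that destages writes strictly in sequence-number order."""

    def __init__(self):
        self._pending = []    # min-heap of (sequence_number, track, data)
        self._next_seq = 1    # next sequence number allowed to reach disk
        self.disk = {}        # track -> data
        self.applied = []     # sequence numbers in the order they were destaged

    def receive(self, seq, track, data):
        """Accept a write that may arrive out of order."""
        heapq.heappush(self._pending, (seq, track, data))
        # Destage every write that is now contiguous with what is on disk.
        while self._pending and self._pending[0][0] == self._next_seq:
            ready_seq, ready_track, ready_data = heapq.heappop(self._pending)
            self.disk[ready_track] = ready_data
            self.applied.append(ready_seq)
            self._next_seq += 1
```

A write that arrives early is held back until all lower-numbered writes have been destaged, so the remote disk always reflects a state that the server actually passed through.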

RAID Manager XP instances

A RAID Manager XP instance is necessary to control pair operations and to gather disk array status information.
The RAID Manager XP instance numbers used for the RaidManagerInstances object must be the same among all systems using Cluster Extension XP.
Several RAID Manager XP instances can be configured to provide additional redundancy. Cluster Extension XP switches to the next available instance when an instance becomes unavailable.
The RAID Manager XP instances should be running at all times to provide the fastest failover capability. Cluster Extension XP provides scripts to include the RAID Manager XP startup procedure in the system startup file (for example, /etc/inittab). However, Cluster Extension XP starts the configured RAID Manager XP instances if it cannot find any running instance.
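The instance selection behavior described above can be sketched as a two-pass search. The function select_instance and the is_running and start callbacks are hypothetical; how Cluster Extension XP actually probes and starts instances is not described here, and starting instances one at a time until one succeeds is an assumption.

```python
def select_instance(configured, is_running, start):
    """Return the RAID Manager instance number to use, or None.

    configured: instance numbers in order of preference.
    is_running: callback, instance number -> bool (hypothetical probe).
    start:      callback, instance number -> bool (hypothetical starter).
    """
    # First pass: prefer an instance that is already running.
    for instance in configured:
        if is_running(instance):
            return instance
    # Second pass: nothing is running, so try to start a configured instance.
    for instance in configured:
        if start(instance):
            return instance
    return None
```

For example, with configured instances 101 and 102 and only 102 running, the function returns 102 without starting anything.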
Quorum service
The Cluster Extension XP quorum service employs static RAID Manager API calls and therefore is not dependent on a RAID Manager instance.

RAID Manager XP device groups

A single device group must be configured for a service group (VCS), a resource group (HACMP), a cluster group (Microsoft Cluster service), or a package (SG-LX). This device group must include all disks being used for the application service.
The device group is the unit in which the failover/failback operation is being carried out. A device group can contain several volume groups.

Rolling disaster protection and Business Copy XP

To implement rolling disaster protection, you must create Business Copy XP disk pairs for the Continuous Access XP disk pairs locally. BC disk pairs used for rolling disaster protection must be created with the -m noread option of the paircreate command. This ensures that BC disks are unavailable to other services, because these disks are intended to be used for rolling disaster protection only. Map the BC SVOLs to a backup server and not to the local cluster node; otherwise, when Cluster Extension XP suspends the BC pairs, the SVOLs become available to the local server, which could result in duplicated volume or disk group IDs or signatures.
To enable rolling disaster protection with Business Copy XP, set the following objects for data centers A and B:
BCEnabledA page 79
BCEnabledB page 79
When these objects are set to YES, rolling disaster protection is enabled and Cluster Extension XP checks whether the configured Business Copy XP disk pairs are in PAIR state. Before initiating the resynchronization operation, Cluster Extension XP suspends specified Business Copy XP disk pairs that are in PAIR state.
If the BCEnabledA and BCEnabledB objects are set to YES, you must configure specific Business Copy XP disk pairs by using MU (mirror unit) numbers. The MU number defines one of the many disk pair relationships you can create with Business Copy XP disk pairs. You can specify any number of MU numbers that are supported by the Business Copy XP software. Disk pair MU numbers are specified by the following objects for data centers A and B:
BCMuListA page 79
BCMuListB page 80
To enable resynchronization of Business Copy XP disk pairs that have been split by Cluster Extension XP, use the following objects for data centers A and B:
BCResyncEnabledA page 79
BCResyncEnabledB page 80
Cluster Extension XP maintains a list of all associated Business Copy XP disk pairs that were in PAIR state before a resynchronization attempt. If pairs were suspended, Cluster Extension XP automatically resynchronizes those disk pairs after the Continuous Access XP remote mirrored disk pairs have been paired. This feature supports automatic resynchronization of locally split BC disk pairs only. You must specify MU numbers for resynchronization by using the following objects for data centers A and B:
BCResyncMuListA page 80
BCResyncMuListB page 80
Caution If rolling disaster protection is enabled and none of the Continuous Access XP mirrored disk pairs have a Business Copy disk pair that is in PAIR state, Cluster Extension XP returns a global error and you will not be able to activate the application service. Ensure that at least one Business Copy disk pair is in PAIR state.
You can use the force flag to start the application service (see “Force flag”). In this case, Cluster Extension XP disables rolling disaster protection.
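The check that precedes a resynchronization can be sketched as below. The function prepare_resync, its return structure, and the use of a plain dictionary are assumptions for illustration; only the rule itself (at least one Business Copy pair in PAIR state, otherwise a global error unless the force flag is used) comes from this section.

```python
def prepare_resync(bc_states, force=False):
    """Decide which Business Copy pairs to suspend before a CA resync.

    bc_states maps an MU number to the state of that Business Copy pair,
    for example {0: "PAIR", 1: "PSUS"}.
    """
    in_pair = sorted(mu for mu, state in bc_states.items() if state == "PAIR")
    if not in_pair:
        if force:
            # Force flag: start anyway, without rolling disaster protection.
            return {"protected": False, "suspend": []}
        raise RuntimeError("global error: no Business Copy pair in PAIR state")
    # These pairs are suspended now and are the candidates for a later resync.
    return {"protected": True, "suspend": in_pair}
```

The returned "suspend" list corresponds to the pairs that were in PAIR state before the resynchronization attempt, which is the list the guide says is kept for the later Business Copy resynchronization.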

Integration with RAID Manager XP

Rolling disaster protection does not require Business Copy XP disk pairs to be defined in the RAID Manager XP horcmX.conf files that are used by Cluster Extension XP. Cluster Extension XP uses the MU number to monitor and control associated Business Copy XP pairs.
However, you must create a RAID Manager XP configuration file to control the Business Copy XP disk pairs, which are outside of Cluster Extension XP control.
The management of Business Copy XP disk pairs is independent of Cluster Extension XP/Continuous Access XP remotely mirrored disk pairs.
Cluster Extension XP uses the MU number to control the Business Copy disk pairs. Therefore, only the RAID Manager XP instances that are configured for Cluster Extension XP are required for rolling disaster protection.
The Rolling Disaster Protection feature cannot suspend Business Copy XP disk pairs on the XP family disk array in the remote data center if the RAID Manager XP instance in the remote data center is not running or not reachable.

Integration with automatic recovery

If the AutoRecover object is set to YES, Cluster Extension XP automatically resynchronizes the Continuous Access XP disk pairs to update the remote disks. If rolling disaster protection is enabled, it suspends the Business Copy XP disk pair that is attached to the remote Continuous Access XP disk.
If this remote Business Copy XP disk pair cannot be suspended because the remote RAID Manager XP instance is not running or cannot be reached, Cluster Extension XP continues the application service activation (bringing the Cluster Extension XP resource online) without automatic resynchronization of the Continuous Access XP disk pair and without the suspension of the Business Copy XP disk pair.
In this case, the Continuous Access XP disk pair must be recovered manually.

Integration with the pair/resync monitor

If the ResyncMonitor object is set to YES, Business Copy XP disk pairs are not used when the pair/resync monitor automatically recovers suspended or failed Continuous Access XP disk pairs.
To protect the remote volume of an out-of-sync Continuous Access XP disk pair against rolling disasters, use the default settings for the pair/resync monitor. Resynchronize the Continuous Access XP disk pair manually after splitting off the Business Copy XP disk pair.

Restoring server operation

Rolling disaster protection automatically recovers the PAIR state of the Continuous Access XP disk pair of an application service. Before you fail over (or fail back) an application service from one data center to the other, you must restore the server operation. After you restart the server, also start the RAID Manager XP instance used to manage the Continuous Access XP disk pairs on those servers. This enables rolling disaster protection to work correctly during a recovery failover/failback operation.

Example

Figure 5 (page 49) depicts an example of a fully configured Cluster Extension XP environment that uses rolling disaster protection. The Business Copy disk pairs are specified as 0 in the Cluster Extension XP BCMuListA and BCMuListB objects.
RAID Manager XP configuration files
Instance 101 (horcm101.conf) manages both the CA disk pairs and BC disk pairs
Instance 5 (horcm5.conf) manages the BC disk pairs locally

dcAserver, horcm101.conf:
    HORCM_MON
    NONE ca101 1000 1000
    HORCM_CMD
    /dev/sdb /dev/sdc
    HORCM_DEV
    caxp_oracle_db ca_003_003 CL1-A 1 7
    caxp_oracle_db ca_004_004 CL1-A 2 0
    bcxpA_oracle_db bc_003_133 CL1-A 1 7 0
    bcxpA_oracle_db bc_004_134 CL1-A 2 0 0
    HORCM_INST
    caxp_oracle_db dcBserver ca101
    bcxpA_oracle_db dcAserver bc5

dcAserver, horcm5.conf:
    HORCM_MON
    NONE bc5 -1 300
    HORCM_CMD
    /dev/sdb /dev/sdc
    HORCM_DEV
    bcxpA_oracle_db bc_003_133 CL1-A 3 7
    bcxpA_oracle_db bc_004_134 CL1-A 4 0
    HORCM_INST
    bcxpA_oracle_db dcAserver ca101

dcBserver, horcm101.conf:
    HORCM_MON
    NONE ca101 1000 1000
    HORCM_CMD
    /dev/sdb /dev/sdc
    HORCM_DEV
    caxp_oracle_db ca_003_003 CL1-A 1 7
    caxp_oracle_db ca_004_004 CL1-A 2 0
    bcxpB_oracle_db bc_003_033 CL1-A 1 7 0
    bcxpB_oracle_db bc_004_034 CL1-A 2 0 0
    HORCM_INST
    caxp_oracle_db dcAserver ca101
    bcxpB_oracle_db dcBserver bc5

dcBserver, horcm5.conf:
    HORCM_MON
    NONE bc5 -1 300
    HORCM_CMD
    /dev/sdb /dev/sdc
    HORCM_DEV
    bcxpB_oracle_db bc_003_033 CL1-A 3 7
    bcxpB_oracle_db bc_004_034 CL1-A 4 0
    HORCM_INST
    bcxpB_oracle_db dcBserver ca101
Figure 5. Example of a disaster-tolerant configuration with rolling disaster protection
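To check a configuration file such as those in Figure 5 before use, it can help to split it into its four sections. The parser below is a minimal sketch written for this guide's example, not a tool shipped with RAID Manager XP; the function name parse_horcm is an assumption, and the sketch treats text after a # as a comment.

```python
SECTIONS = ("HORCM_MON", "HORCM_CMD", "HORCM_DEV", "HORCM_INST")


def parse_horcm(text):
    """Split horcm-style configuration text into {section name: list of field lists}."""
    sections = {name: [] for name in SECTIONS}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line in SECTIONS:
            current = line                    # a new section starts here
        elif current is not None:
            sections[current].append(line.split())
    return sections


# The horcm5.conf content for dcAserver, taken from Figure 5.
SAMPLE = """\
HORCM_MON
NONE bc5 -1 300
HORCM_CMD
/dev/sdb /dev/sdc
HORCM_DEV
bcxpA_oracle_db bc_003_133 CL1-A 3 7
bcxpA_oracle_db bc_004_134 CL1-A 4 0
HORCM_INST
bcxpA_oracle_db dcAserver ca101
"""

config = parse_horcm(SAMPLE)
device_groups = {fields[0] for fields in config["HORCM_DEV"]}
```

Here device_groups contains the single group bcxpA_oracle_db, which is a quick way to confirm that every device line belongs to the device group you expect.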

User configuration file

Cluster Extension XP provides a user configuration file to customize Cluster Extension XP failover/failback behavior. The user can specify all customizable objects of Cluster Extension XP with this file.
Related information “The user configuration file”

Pair/resync monitor

The pair/resync monitor (the clxchkd utility) can be turned on or off with the ResyncMonitor object.
The pair/resync monitor can either monitor or both monitor and resynchronize the state of the RAID Manager XP device group for an application service. The cluster software must be able to stop the monitoring or resynchronization process if the application service is stopped.
If the ResyncMonitorAutoRecover object is set to YES, the monitor tries to resynchronize the remote disk based on the local disk. This occurs only if the disks are in a PVOL/SVOL or SVOL/PVOL relationship. If one or both disk peers are in the state SMPL or the device group state is mixed, automatic resynchronization is not initiated.
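The eligibility condition for automatic resynchronization can be written as a small predicate. This is a sketch of the rule as stated above, not the monitor's code; the function name may_auto_resync and the representation of a device group as a list of volume attributes are assumptions.

```python
def may_auto_resync(local_attrs, remote_attrs):
    """Return True only for a clean PVOL/SVOL or SVOL/PVOL device group.

    Each argument lists the volume attribute ("PVOL", "SVOL", or "SMPL")
    of every disk in the device group on one side.
    """
    local, remote = set(local_attrs), set(remote_attrs)
    if len(local) != 1 or len(remote) != 1:
        return False                      # mixed (or empty) device group state
    # One side must be all PVOL and the other all SVOL; SMPL never qualifies.
    return (local | remote) == {"PVOL", "SVOL"}
```

Any SMPL volume, or a group whose disks disagree with each other, makes the predicate return False, which corresponds to the cases where the monitor does not start a resynchronization.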
The monitor is started from Cluster Extension XP the first time that Cluster Extension XP checks the disk states. Any subsequent execution of the monitor program adds the RAID Manager XP device group to be monitored to the list of to-be-monitored device groups. The monitor interval specified with the ResyncMonitorInterval object is used to monitor the device group state. Do not set the monitor interval below the RAID Manager XP timeout parameter (HORCM_MON in the horcmX.conf file).
Caution If the application service must be stopped, the cluster software or your customized solution to start and stop the application service must be able to stop the monitoring or resynchronization process. If this cannot be ensured, the use of the pair/resync monitor is not supported. It is highly recommended to disable application service failover for the time of the disk pair recovery (resynchronization). Cluster Extension XP assumes that if the monitor is enabled, immediate action will be taken to recover a reported suspended disk pair. If at any time the resynchronization process is running on both disk array sites, data corruption can occur.
The ResyncMonitorAutoRecover option set to YES is supported with this monitor only if the minimum disk array firmware version is 01-11-xx (XP512/XP48) or 21.01.xx (XP128/XP1024), and the minimum RAID Manager XP version is 01.04.00.
The pair/resync monitor uses the syslog() facility (Linux/UNIX) and the Event Log (Windows) to inform you if the link for the device group is broken. A broken link is recognized only when data is written to disk; until then, the data is the same on the primary and secondary disk and the device group state is therefore reported as PAIR.

Force flag

The force flag forces Cluster Extension XP to skip the internal logic and enables write access to the local volume regardless of the disk pair state. This flag can be set when you are sure that the current site contains the latest data, even though a previous application service startup process failed because Cluster Extension XP discovered a disk pair status that could not be handled automatically.
To use the force flag feature, you must create a file called application_name.forceflag in the directory specified by the ApplicationDir object prior to starting the application service that uses Cluster Extension XP. Before you create this file, ensure that the application service is not running elsewhere.
This file will be removed after Cluster Extension XP detects the file.
You cannot use the force flag if the local disk state is SVOL_COPY, which indicates that a copy operation is in progress. A disk cannot be activated when a write operation is in progress to that disk; therefore, Cluster Extension XP returns a global error.
Using the force flag does not enable the automatic recovery features of Cluster Extension XP. After using the force flag, you must recover the suspended or broken disk pairs by using RAID Manager XP commands as described in “Recovery sequence” (page 228).
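Creating the force flag file can be scripted. The helper below is a minimal sketch: the function name create_force_flag is an assumption, and the directory and application name you pass in must match your own ApplicationDir setting and application service name. It does not check whether the application service is running elsewhere; that remains your responsibility.

```python
from pathlib import Path


def create_force_flag(application_dir, application_name):
    """Create <application_name>.forceflag in application_dir and return its path."""
    flag = Path(application_dir) / f"{application_name}.forceflag"
    flag.touch(exist_ok=True)   # an empty file; its content is not used by this sketch
    return flag
```

For example, create_force_flag(app_dir, "oracle_db") creates oracle_db.forceflag in the directory app_dir, where both values are placeholders for your own configuration.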

Pre-execution and post-execution programs

Cluster Extension XP can invoke pre-execution and post-execution programs prior to or after a Cluster Extension XP takeover function. Those programs can be any executable, and must be able to provide return codes to Cluster Extension XP. If the programs add significant execution time to the application service startup process, the timeout values for the startup process must be adjusted in the cluster software.
Cluster Extension XP transfers information as command-line arguments to the pre-execution and post-execution programs. Pre-executables and post-executables must be specified by full path in the PreExecScript and PostExecScript objects. If no executable is specified (empty value for the object), no preprocessing or postprocessing, respectively, is done.
The following arguments are transferred to the scripts in this order:
1. Name
2. Vgs
3. RaidManagerInstances
4. DeviceGroup
5. local device group state (check); for the post-executable, the status after failover
6. local device group state (display); for the post-executable, the status after failover
7. remote device group state (check); for the post-executable, the status after failover
8. remote device group state (display); for the post-executable, the status after failover
9. current fence level
10. disk array serial numbers (local)
11. reserved
12. reserved
13. disk array firmware version (local)
14. RAID Manager XP version (local)
15. application directory path (ApplicationDir object)
16. log file location (LogDir object)
17. DC_A_Hosts node names
18. DC_B_Hosts node names
The pre-executables and post-executables must return a return code. These return codes are used to determine whether a takeover function must be called.
Pre-executable return codes
0 PRE_OK_TAKEOVER
Pre-executable ok and takeover action allowed.
1 PRE_ERROR_GLOBAL
Pre-executable failed; no takeover; stop application service cluster-wide.
2 PRE_ERROR_DC
Pre-executable failed; no takeover; stop application service in this data center.
3 PRE_ERROR_LOCAL
Pre-executable failed; no takeover; stop application service on this system.
4 PRE_ERROR_TAKEOVER
Pre-executable failed; takeover action allowed.
5 PRE_OK_NOTKVR_NOPST
Pre-executable ok; no takeover; no post-exec.
Caution If the pre-execution program returns 1, 2, 3, or 5, a properly configured post-executable will not be executed. If a takeover function fails, the post-executable will not be executed.
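A pre-execution program that follows the argument order and return codes above could be sketched in Python as shown here. The dictionary key names, the parse_args and main helpers, and the policy of returning PRE_ERROR_LOCAL on a malformed argument list are assumptions made for illustration. The sketch also assumes that each of the 18 values arrives as exactly one command-line argument.

```python
import sys

# Hypothetical key names for the 18 positional arguments, in the order listed.
ARG_NAMES = [
    "name", "vgs", "raid_manager_instances", "device_group",
    "local_state_check", "local_state_display",
    "remote_state_check", "remote_state_display",
    "fence_level", "local_serial_numbers",
    "reserved_11", "reserved_12",
    "local_firmware_version", "local_raid_manager_version",
    "application_dir", "log_dir", "dc_a_hosts", "dc_b_hosts",
]

# Pre-executable return codes as documented above.
PRE_OK_TAKEOVER = 0
PRE_ERROR_GLOBAL = 1
PRE_ERROR_DC = 2
PRE_ERROR_LOCAL = 3
PRE_ERROR_TAKEOVER = 4
PRE_OK_NOTKVR_NOPST = 5

# Codes after which the post-executable is not run (see the caution above).
SKIPS_POST_EXEC = {PRE_ERROR_GLOBAL, PRE_ERROR_DC,
                   PRE_ERROR_LOCAL, PRE_OK_NOTKVR_NOPST}


def parse_args(argv):
    """Map argv[1:] to ARG_NAMES; raise ValueError on a wrong argument count."""
    values = argv[1:]
    if len(values) != len(ARG_NAMES):
        raise ValueError(f"expected {len(ARG_NAMES)} arguments, got {len(values)}")
    return dict(zip(ARG_NAMES, values))


def main(argv):
    """Return a pre-executable return code (site-specific checks go here)."""
    try:
        parse_args(argv)
    except ValueError:
        return PRE_ERROR_LOCAL  # assumed policy for a malformed call
    return PRE_OK_TAKEOVER

# A real script would end with: sys.exit(main(sys.argv))
```

Returning the code through sys.exit makes the exit status explicit, which matters because Cluster Extension XP decides on the takeover from that value.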
Post-execution return codes
0 POST_OK
Post-executable ok; continue.
1 POST_ERROR_GLOBAL
Post-executable failed; stop application service cluster-wide.
2 POST_ERROR_DC
Post-executable failed; stop application service in this data center.
3 POST_ERROR_LOCAL
Post-executable failed; stop application service on this system.
4 POST_ERROR_CONTINUE
Post-executable failed; continue without error.
Caution Windows 2000 script and batch files return 0 if the program was
successfully executed, even if you return a different return code.

Cluster Extension XP log facility

The logging module of Cluster Extension XP provides log messages to the cluster software as well as to the Cluster Extension XP log file. The Cluster Extension XP log file includes disk status information.
The Cluster Extension XP log file is located in this directory:
Linux/UNIX /var/opt/hpclx/log
Windows By default, this location is defined as this value:
%ProgramFiles%\Hewlett-Packard\Cluster Extension XP\log\
For the quorum service, the log resides at this location:
%systemroot%\clxq.log
If the log file needs to be cleared and reset, for example, to reduce disk space usage, archive the log file and then delete it. A new log file is generated automatically.
Related information For information about log levels, see “LogLevel”.
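The archive-then-delete procedure for the log file could be automated along these lines. This is a sketch under assumptions: the function name archive_and_clear_log and the timestamped naming scheme are hypothetical, and the caller supplies the full path of the log file, because this section gives only its directory.

```python
import shutil
import time
from pathlib import Path


def archive_and_clear_log(log_file: str, archive_dir: str) -> Path:
    """Copy the log file to archive_dir under a timestamped name, then delete it."""
    source = Path(log_file)
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.name}.{stamp}"
    shutil.copy2(source, target)  # archive first ...
    source.unlink()               # ... then delete the original
    return target
```

The copy is made before the original is deleted, so a failed copy leaves the log file in place.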

Error return codes

Cluster Extension XP provides the following error return codes for failover operations:
local error Prohibits an application service startup on the local system. This can be caused by the inability of Cluster Extension XP to enable disk access, or misconfiguration of the disk array environment.
data center error Prohibits an application service startup on any system in the local data center. This error is returned if the disk state indicates that it makes no sense to allow any other system connected to the same disk array to access the disks.
global error A global error is returned if the configuration or the disk state does not allow an automatic application service startup process. Manual intervention is required in such cases.
When Cluster Extension XP is integrated, an error message string and integer value are displayed. For the command-line interface, a return code is displayed. For more information, see “CLI Commands” on page 195.
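The scope of the three error types can be read as a rule for where an automatic startup may still be attempted. The following Python sketch illustrates that reading only; the ErrorScope enumeration, the startup_candidates function, and the node names are hypothetical, and the actual behavior depends on the cluster software.

```python
from enum import Enum


class ErrorScope(Enum):
    LOCAL = "local error"
    DATA_CENTER = "data center error"
    GLOBAL = "global error"


def startup_candidates(scope, local_node, local_dc_nodes, remote_dc_nodes):
    """Return the nodes on which an automatic startup is not prohibited."""
    if scope is ErrorScope.LOCAL:
        # Only the reporting system is excluded.
        return [n for n in list(local_dc_nodes) + list(remote_dc_nodes)
                if n != local_node]
    if scope is ErrorScope.DATA_CENTER:
        # Every system in the local data center is excluded.
        return list(remote_dc_nodes)
    # Global error: no automatic startup; manual intervention is required.
    return []
```

For example, with hypothetical nodes a1 and a2 in the local data center and b1 in the remote one, a data center error reported on a1 leaves only b1 as a candidate.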

Quorum service for Microsoft Cluster service

Microsoft Cluster service uses a single SCSI disk called the quorum disk to eliminate the potential split brain condition and to coordinate administrative actions performed by cluster nodes. The quorum disk represents a possible single point of failure (SPOF) because when it fails or communication with it is lost, the whole cluster shuts down. If this SPOF is not eliminated, the cluster is at risk even when geographically dispersed.
The quorum service resolves the SPOF problem with the quorum disk. It employs the HP StorageWorks Continuous Access XP (CA) technology to remotely mirror the quorum disk and extends the Microsoft Cluster service functions to manage this mirrored quorum disk.
The quorum service performs two major functions:
• To manage the mirrored quorum disk pair during regular cluster operations and failover.
It detects quorum disk operations from Microsoft Cluster service and swaps the disk pair in a timely manner before Microsoft Cluster service moves the quorum ownership to the mirrored side or before a failover to the mirrored side occurs.
• To avert a split brain scenario should part of the cluster become completely isolated from the rest of the cluster.
When cluster nodes on the mirrored side of a dispersed cluster lose all connections (including all heartbeats and CA links), the quorum service uses external arbitration to address the potential split brain problem.

Using the quorum service in a Microsoft Cluster service environment

In general, there are two types of disks used in a Microsoft Cluster service environment: the application data disk and the quorum disk. The quorum disk is a special case of an application disk.
While the Cluster Extension XP resource DLL provides disaster recovery (DR) and high availability (HA) support for Microsoft Cluster service application disks using the same CA technology, the quorum service focuses on protecting the quorum disk, which functions somewhat differently from data disks. In most circumstances, this protection is critical for the cluster’s control resource. If the quorum disk fails, the whole cluster will be unavailable, regardless of how well the application data disks are protected.
The quorum service has few configuration options. After it is installed, the service works seamlessly with the Microsoft Cluster service and the cluster applications.
Install and use the quorum service together with the Cluster Extension XP resource type to provide complete coverage for the cluster.

Quorum processes

The quorum service creates a synchronous CA disk pair for the quorum disk to prevent it from being a SPOF. When the primary quorum disk fails, the quorum service allows the cluster to use the mirrored (secondary) quorum disk to continue cluster operations. Specifically, the service extends the Microsoft Cluster service quorum management protocol to:
• maintain the existing ownership in normal operations
• decide the new ownership when the original owning host node fails
• fail over ownership when the primary quorum disk fails in the context of mirrored quorum disks
One major task of the quorum service is to swap the quorum pair. Usually the primary disk has both read and write permissions. However, the mirrored disk on the secondary side has no write permission. To use the secondary quorum disk as a quorum, the service swaps the direction of the disk pair to make the quorum disk at the second site writable before Microsoft Cluster service starts to use it. The quorum service interacts with Microsoft Cluster service at the I/O operation level. When it finds the Microsoft Cluster service is going to use the secondary side quorum disk, it swaps the disk pair for Microsoft Cluster service first. It uses the RAID Manager XP library to communicate with the XP disk array.
The quorum service also moderates ownership of the quorum resource. More than one node can request ownership simultaneously. To coordinate those nodes between dispersed locations and to eliminate the dependency on the networks, the quorum service uses three sets of small volumes (disks) as control devices to manage quorum ownership and coordination. The first is always in a paired state; the second is not paired; and the third changes its paired state dynamically. Because they require little space, you can minimize resource requirements by creating CVS (custom volume size) disks for use as control devices.
Cluster operation continues even though access to the quorum disk is lost on cluster nodes that do not own the quorum disk. On those nodes, the quorum service (ClxQSvc) continues to monitor the quorum filter driver in case I/O access is lost to the XP disk system. However, logging to the clxq.log file will be suspended until the quorum filter driver detects the arrival of the missing quorum disk. This behavior follows the Microsoft Cluster service behavior, which means that the Microsoft Cluster service continues to run but the cluster node will not be able to become a quorum disk owner in case the original quorum disk owner fails. The quorum service and the quorum filter driver log this type of incident to the Event Log to inform you properly.
If the quorum service exits without being gracefully stopped by the Windows Service Control Manager (SCM), the SCM will automatically restart the quorum service under virtually all possible conditions.
The quorum service is set to wait for 30 seconds for Microsoft Cluster service startup activity. If the Cluster service does not go into START_PENDING state during this timeframe, the quorum service will stop automatically. The actual interval for Cluster Service startup checks is 1 second. It will continue checking every second until the Cluster service changes to START_PENDING state. START_PENDING is an internal state. The Service window will show STARTING as service status.
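The wait behavior described above (a 30-second overall timeout, checked once per second) can be pictured as a simple polling loop. The Python sketch below is illustrative only and is not the quorum service implementation: get_state is a hypothetical callable that returns the current Cluster service state as a string, and the sleep parameter exists so that the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_start_pending(get_state: Callable[[], str],
                           wait_timeout: float = 30.0,
                           interval: float = 1.0,
                           sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the state is START_PENDING or the wait timeout is used up.

    Returns True if START_PENDING was seen, False if the timeout expired
    (the point at which the quorum service would stop itself).
    """
    waited = 0.0
    while waited < wait_timeout:
        if get_state() == "START_PENDING":
            return True
        sleep(interval)
        waited += interval
    return False
```

Elapsed time is counted in intervals rather than read from a clock, which keeps the sketch deterministic.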
It is recommended that you start ClxQSvc (the quorum service) through its enforced dependency to the Cluster service. This means you should start the Cluster service first, which will cause the quorum service to start automatically before the Cluster service starts.
The wait timeout is the total amount of time (in seconds) that the quorum service will wait for the Cluster service to start up, before giving up and shutting itself down. This value should be sufficient for all cases. However, you can change the value during manual service startups as follows:
1. Open the Services window.
2. Right-click ClxQSvc.
3. Select Properties.
4. In the Start parameters: field, enter
/waitforclussvc <number in seconds>
5. Click Start in the properties window.
The quorum service stops automatically if the Cluster service stops.
The quorum service checks during its initialization if the configured disk pairs for the quorum service control disk 1 (STATUS) and quorum disk are established. The quorum service will not start and does not allow the Cluster service to start if those disk pairs are not in PAIR state.
For emergency startup of the Cluster Service in case all quorum arbitration functions are unavailable or the disks cannot be paired, the quorum service can be started manually:
1. Open the Services window.
2. Right-click ClxQSvc.
3. Select Properties.
4. In the Start parameters: field, enter
/createsplitbrain
5. Click the Start button in the properties window. This could create two separate clusters.
Caution The creation of separate clusters can result in data loss as explained below.
Clicking the Start button in the Services window can lead to a split brain syndrome, where the cluster runs the same applications on two different sets of disks. This can happen if the cluster is running or restarted in the remote data center and any number of cluster nodes are isolated from each other.
A split brain condition occurs when one site of the cluster loses all the connectivity with the second site and each site decides to form its own cluster. The serious consequence of the split brain scenario is the corruption of business data because client data will no longer be consistent. To eliminate the split brain syndrome in cases where there is a total loss of communications between sites, the quorum service employs an external arbitration mechanism.
The external arbitrator runs on a node on your intranet, external to either cluster site, but accessible by all cluster nodes. Its major function is to check the cluster status, upon request, by communicating with the cluster nodes. It can do so as long as it is reachable by one data center, even when one site of the cluster loses all connectivity with the other site. The arbitrator provides the cluster nodes with sufficient information to make the critical decision on whether to form a new cluster, thereby avoiding a split brain condition.
In addition to the external arbitrator, two processes on each cluster node work closely with the arbitrator. One process is created dynamically on a cluster node located at the site not holding the quorum disk when it detects that it has lost all connectivity to the site holding ownership of the quorum disk. Its major function is to communicate with the arbitrator to determine whether the cluster is still functioning. Based on the status of the cluster and which site owns the quorum disk, it may decide to form a new cluster, leave the cluster down, or join an existing cluster. This process is called the “cluster decision maker.”
The second process is created dynamically on the host node owning the quorum resource when the node detects that the mirror link between the two sites has gone down. The main purpose of this process is to shut down the cluster on the site that was controlling the quorum disk when that site became completely isolated from the external network. This prevents a potential split brain condition when the connectivity between the two sites is restored, and the formerly isolated copy of the cluster suddenly becomes available to the external network. This process is called the “isolation checker.” The isolation checker restarts after it has finished if the CA XP link is still down. This can cause the Cluster service to stop if the network link between the arbitrator and the quorum owner in the cluster fails after all heartbeat network links and all CA XP links have failed.
If a broken CA XP link is detected by the quorum service, messages will be logged to the Event Log that show whether the cluster decision maker or the isolation checker is running.
Caution Do not start the Cluster service while the cluster decision maker or the isolation checker is running. The cluster decision maker and the isolation checker will log start and stop messages with the results of their findings to the Event Log.
The quorum service supports two ways to retrieve the required information about the external arbitrator, such as its IP address and port number. One is through a local configuration file, and the other is through the Active Directory server. During installation, the user is asked whether an Active Directory service is available. The quorum service will use the Active Directory if it is available. Otherwise, it will use its local configuration file generated by the installation process.
The quorum service is implemented as a Windows service. To ensure that the service is always available for the quorum disk, a startup dependency is imposed on the cluster service by the quorum service. The quorum service must start and remain functioning during the entire time that the cluster is running. If it is forced to stop, it will also stop the cluster service on that server, ensuring that the quorum disk pair is always properly managed by the quorum service.
3 User configuration file and Cluster Extension XP objects
Objects define the disk array environment and failover/failback behavior. Objects can be customized in the user configuration file or directly in the cluster software.


The user configuration file

Cluster Extension XP uses the user configuration file to gather application service-specific information. This file describes the dependencies between application services and RAID Manager XP device groups in one file for all application services in the cluster. This file must be copied to all nodes that use Cluster Extension XP.
The user configuration file must be placed in the configuration directory:
Linux/UNIX /etc/opt/hpclx/conf
Windows By default, this location is defined as this value:
%ProgramFiles%\Hewlett-Packard\Cluster Extension XP\conf
Related information “Basic configuration example” (page 87)
“Creating and configuring the user configuration file” (page 194)
HACMP
The UCF.cfg file is required for IBM HACMP. A single UCF.cfg file must be maintained and copied to all systems using Cluster Extension XP. The UCF.cfg includes a “common” section to configure the Cluster Extension XP environment and an “application” section to configure the application service-dependent failover/failback behavior. The application section is a multitag component; the APPLICATION tag and application-related objects can appear numerous times in the UCF.cfg.
Related information “User configuration file for HACMP” (page 101)
Microsoft Cluster service
Cluster Extension XP integration with Microsoft Cluster service does not require a user configuration file when the standard environment for Cluster Extension XP is used. The Cluster Extension XP objects that are integrated with Microsoft Cluster service are configurable as resource private properties in the cluster software.
Related information “Configuring Cluster Extension XP resources” (page 117)
VCS
Cluster Extension XP integration with VERITAS Cluster Server does not require a user configuration file when the standard environment for Cluster Extension XP is used. The Cluster Extension XP objects that are integrated with VERITAS Cluster Server are configurable as resource attributes in the cluster software.
Related information “Configuring the Cluster Extension XP resource” (page 154)
SG-LX
An environment configuration file is required for Serviceguard. The file must reside in the same directory as the package control file and is identified by the package name:
package_name_clx.env
The APPLICATION tag is required, although no value is required.
Related information “Configuration of the Cluster Extension XP environment” (page 174)

File structure

The configuration file comprises a common section and application sections. These sections are distinguished by control tags. Cluster Extension XP uses the following objects as control tags:
• COMMON
• APPLICATION
Objects have one of the following formats:
tag a definition of an object, for example, COMMON or APPLICATION.
integer a number, for example, a timeout value.
string a name, which can include alphabetic and numeric characters and underscores, for example, an application startup value.
list a list of space-separated strings, for example, a list of host names (lists of numbers are stored as lists of strings).

Specifying object values

When using the default configuration, you must provide values for these five objects:
• DeviceGroup (page 82)
• DC_A_Hosts (page 82)
• DC_B_Hosts (page 82)
• RaidManagerInstances (page 84)
• XPSerialNumbers (page 86)
You do not need to change the default settings unless you want to change the degree of protection for your paired disks. If you change an object, you may need to change additional objects as well. For example, if you change the FenceLevel object to DATA, you might need to change the DataLoseMirror object also.
Objects are supported according to the requirements or capabilities of the cluster software, as listed in table 1 (page 69).
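A quick completeness check for the five required objects could look like the following Python sketch. It makes no assumption about the configuration file syntax: the settings are taken as an already-parsed dictionary of object names and string values. The function name missing_required_objects is hypothetical.

```python
# The five objects that must be given values in the default configuration.
REQUIRED_OBJECTS = (
    "DeviceGroup",
    "DC_A_Hosts",
    "DC_B_Hosts",
    "RaidManagerInstances",
    "XPSerialNumbers",
)


def missing_required_objects(settings: dict) -> list:
    """Return the required object names that are absent or have an empty value."""
    return [name for name in REQUIRED_OBJECTS
            if not str(settings.get(name, "")).strip()]
```

An empty result means that all five required objects have a value; it does not validate the values themselves.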
Table 1. Cluster Extension XP supported objects
Integrations covered: CLI, HACMP, MS Cluster service, VCS, SG-LX
Name                        Page  Supported
COMMON                      71    •••••
LogDir                      71    •••••
LogLevel                    71    •••••
SearchObject                72    HACMP only
VcsBinPath                  72    VCS only
APPLICATION                 74    •••••
ApplicationDir              74    ••••
ApplicationStartup          75    •••••
AsyncTakeoverTimeout        77    •••••
AutoRecover                 78    •••••
BCEnabledA                  79    •••••
BCEnabledB                  79    •••••
BCMuListA                   79    •••••
BCMuListB                   79    •••••
BCResyncEnabledA            79    •••••
BCResyncEnabledB            80    •••••
BCResyncMuListA             80    •••••
BCResyncMuListB             80    •••••
DataLoseDataCenter          80    •••••
DataLoseMirror              81    •••••
* DC_A_Hosts                82    •••••
* DC_B_Hosts                82    •••••
* DeviceGroup               82    •••••
FastFailbackEnabled         83
FenceLevel                  83    •••••
Filesystems                 83    ••
PostExecCheck               84    •••••
PostExecScript              84    •••••
PreExecScript               84    •••••
* RaidManagerInstances      84    •••••
ResyncMonitor               85    ••••
ResyncMonitorAutoRecover    85    ••••
ResyncMonitorInterval       85    ••••
ResyncWaitTimeout           86    •••••
Vgs                         86    •••••
* XPSerialNumbers           86    •••••
LEGEND * Required
• Supported

COMMON section objects

The common part is used to set the environment of Cluster Extension XP.
The COMMON tag is a single-tag; it can appear in the configuration file only once. The common object does not require any value.
Objects of the type common can only appear once. Those objects must be placed after the COMMON tag in the configuration file.
If the default values fit your environment, there is no need to specify them in the file.
COMMON
Format tag
Description Distinguishes between general (common) and application-specific objects.
LogDir
Format string
Description (Optional) Defines the path to the Cluster Extension XP log file.
Default value Linux/UNIX
/var/opt/hpclx/log
Windows
%ProgramFiles%\Hewlett-Packard\Cluster Extension XP\log
LogLevel
Format string
Description (Optional) Defines the logging level used by Cluster Extension XP.
Valid values error (default) Logs only error messages for events that are nonrecoverable.
warning Logs error messages and warning messages for events that are recoverable.
info Logs error messages, warning messages, and additional information, such as disk status.
debug Logs error messages, warning messages, info messages, and messages that report on execution status, useful for troubleshooting.
SearchObject HACMP only
Format string
Description (Optional) Searches for the application service if the user configuration file specifies multiple applications. This object is not used for VCS, Microsoft Cluster service, or SG-LX.
Default value Vgs
VcsBinPath VCS only
Format string
Description (Optional) Defines the path to the VCS binaries. This object is not used for Microsoft Cluster service, SG-LX, or HACMP.
Default value /opt/VRTSvcs/bin

APPLICATION section objects

The application part defines the failover and failback behavior of Cluster Extension XP for each application service. APPLICATION is a multitag that can appear in the configuration file for each application service using Cluster Extension XP.
The APPLICATION object requires the name of the application service as its value. The objects specified after an APPLICATION tag must appear only once per application. As with the common part objects, the application part objects have predefined default values.
Cluster Extension XP also uses the following rules to define objects:
• If you use the default value, you do not have to specify the object.
• Cluster Extension XP uses objects depending on the setting of other objects. For example, if you set the FenceLevel object to DATA, Cluster Extension XP uses the values specified for the DataLoseMirror or DataLoseDataCenter object. However, these objects are ignored if the FenceLevel object is set to NEVER.
• The pre-execution and post-execution functions in Cluster Extension XP will not be processed if the associated object values are empty. (This is the default setting.)
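The dependency between FenceLevel and other objects can be expressed as a small lookup. The Python sketch below encodes only the rules stated in this guide: the DataLoseMirror and DataLoseDataCenter rule above, and the statement in the AsyncTakeoverTimeout description that the object is used only with fence level ASYNC. For combinations the guide does not describe, the sketch returns None rather than guessing. The function name is_consulted is hypothetical.

```python
from typing import Optional

DATA_LOSE_OBJECTS = {"DataLoseMirror", "DataLoseDataCenter"}


def is_consulted(object_name: str, fence_level: str) -> Optional[bool]:
    """True or False where the guide states the rule; None where it does not."""
    level = fence_level.upper()
    if object_name == "AsyncTakeoverTimeout":
        # Used only if the FenceLevel object value is ASYNC.
        return level == "ASYNC"
    if object_name in DATA_LOSE_OBJECTS:
        if level == "DATA":
            return True
        if level == "NEVER":
            return False
    return None  # combination not described in this section
```

Returning None keeps undocumented combinations visibly undecided instead of silently treating them as ignored.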
CLI HACMP SG-LX
To set APPLICATION object values, use the user configuration file.
VCS
Use the VCS GUI to set APPLICATION object values.
Microsoft Cluster service
To set APPLICATION object values, use the Microsoft Cluster service Cluster Administrator GUI.
APPLICATION
Format tag
Description Distinguishes between general and application-specific objects. Specify the name of the application service. The format of its value is equivalent to a string value.
SG-LX
For Serviceguard, the tag is required; however, specifying a value is not necessary.
ApplicationDir
Format string
Description Specifies the directory where Cluster Extension XP searches for application-specific files, such as the force flag or online file.
If ApplicationDir is set to a nonexistent drive and PairResyncMonitor is not enabled, Cluster Extension is unable to create the online file and cannot put the resource online.
SG-LX
The value of ApplicationDir is derived from the package control file location.
Windows
If ApplicationDir is not set, Cluster Extension uses the local %HPCLX_PATH% values as defined in the registry.
Default values Linux/UNIX
/etc/opt/hpclx
Windows
%HPCLX_PATH%
Files resource_name.createsplitbrain
resource_name.forceflag
resource_name.online
If specified in a user configuration file, resource_name is the value of the APPLICATION tag; otherwise, resource_name is the value of the Cluster Extension XP resource name.
ApplicationStartup
Format string
Description (Optional) Specifies where a cluster group should be brought online.
The ApplicationStartup object can be customized to determine whether an application service starts locally or is transferred back to the remote data center (if possible) to start directly without waiting for resynchronization. This object is used only if an application service has already been transferred to the secondary site and no recovery procedure has been applied to the disk set (the disk pair has not been recovered and is not in PAIR state). This process is considered a failback attempt without prior disk pair recovery.
Cluster Extension XP can detect the most current copy of your data based on the disk state information. If Cluster Extension XP detects that the remote XP disk array has the most current data, it orders a resynchronization of the local disk from the remote disk, or it stops the startup process to enable the cluster software to fail back to the remote XP disk array.
If a resynchronization is ordered, Cluster Extension XP monitors the progress of the copy process. If the application service was running on a secondary XP disk array without replication link, a large number of records may need to be copied. If the copy process takes more time than the configured application startup timeout, the application startup will fail.
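Whether a resynchronization is likely to finish within the application startup timeout can be estimated with simple arithmetic: the amount of data to copy divided by the sustained link throughput. The Python sketch below assumes a constant throughput, which real links rarely deliver, and all figures used with it are hypothetical. The function names are illustrative.

```python
def estimated_copy_seconds(data_mb: float, link_mb_per_s: float) -> float:
    """Estimate the copy time in seconds, assuming constant link throughput."""
    if link_mb_per_s <= 0:
        raise ValueError("link throughput must be positive")
    return data_mb / link_mb_per_s


def fits_startup_timeout(data_mb: float, link_mb_per_s: float,
                         startup_timeout_s: float) -> bool:
    """True if the estimated copy time does not exceed the startup timeout."""
    return estimated_copy_seconds(data_mb, link_mb_per_s) <= startup_timeout_s
```

As a worked example with made-up numbers, 36,000 MB over a link sustaining 20 MB/s takes an estimated 1,800 seconds, so a startup timeout of 900 seconds would be too short.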
Microsoft Cluster service
If the ApplicationStartup resource property is set to FASTFAILBACK and the FailoverThreshold value is set to a number higher than the current number of clustered systems for the resource group, the resource group will restart on configured nodes until one of the following conditions is met:
• The resource is brought online in the remote data center.
• The resource failed because the FailoverThreshold value has been reached.
• The resource failed because the FailoverPeriod timeout value has been reached.
Caution Disable subsequent automated failover procedures for recovery failback operations.
Valid values FASTFAILBACK (default)
The cluster group will be brought online in the remote data center (if possible) without waiting for resynchronization. The application startup process will be stopped locally and Cluster Extension XP reports a data center error. Depending on the cluster software, the application service cannot start on any system in the local data center and the cluster software will transfer the application service back to the remote data center. Use this value to provide the highest application service uptime. Depending on the value configured for the AutoRecover object, Cluster Extension XP will attempt to update the former primary disk based on the secondary disk and swap the personalities of the disk pair so that the local disk will become the primary disk.
In a two-node cluster, this process will not work because the target failback system would not be available. In this case, the application service must be started manually, or the ApplicationStartup object should be set to RESYNCWAIT.
RESYNCWAIT Online local, cluster group must wait until the disk status is PAIR. Cluster Extension XP will initiate a resynchronization of the local disk based on the remote disk. The copy process will be monitored. If no copy progress was made after a monitoring interval expired, the copy process is considered failed and Cluster Extension XP returns a global error. If RESYNCWAIT has been specified for the ApplicationStartup object, the ResyncWaitTimeout object must be specified, in case Cluster Extension XP should wait for resynchronization changes for more or less than 90 seconds, which is the default.
AsyncTakeoverTimeout
Format integer
Description (Optional) Specifies the horctakeover command timeout in seconds. Must be adjusted based on disk mirroring link speed.
This object is used only if the FenceLevel object value is ASYNC.
The takeover operation for fence level ASYNC (Continuous Access XP Extension) offers the option to stop the data transfer process after a specified time value. This is used to allow access to the remote copy if the data transfer process has been stopped due to a Continuous Access XP-link failure. All data that has been copied up to the moment the timeout value has been reached is consistent and available to access at the secondary site.
Default value 1800
Caution Measure or calculate the full XP disk array cache copy time to use the gathered information for the AsyncTakeoverTimeout object. After a takeover command has been invoked, Continuous Access XP Extension copies the side file area residing in the XP disk array cache to the site where the takeover command has been issued (the secondary disks). The side file area cannot exceed the installed cache size. The maximum time for the AsyncTakeoverTimeout object is the time to fully copy the amount of cache size data. The takeover timeout value is used to terminate the copy process to provide access to the secondary disks, for example, if all links or the primary XP disk array are unavailable to copy the side file area. The copy time depends on the performance of the Continuous Access XP link between your sites. The takeover or resynchronization operation could take longer than the timeout value for application service startup in the cluster software. The application service startup might fail in this case. However, the takeover or resynchronization command will continue in the background.
AutoRecover
Format string
Description (Optional) Recovers a suspended or deleted disk pair when the resource is brought online at application service startup time.
If the AutoRecover object is set to YES, Cluster Extension XP will try to resynchronize the remote disk at application startup time. Cluster Extension XP will ignore the return code of the resynchronization command and allow access to the disk ensuring highest application availability.
If the resynchronization attempt fails, Cluster Extension XP will not fail. The internal logic will first apply the concurrency and consistency rules to allow access to the disk set.
If you configure fence level DATA for the device group and set the FenceLevel object to DATA, the AutoRecover object will change Cluster Extension XP’s behavior. Cluster Extension XP will attempt to reestablish the PAI R state and wait for the PAIR state before it allows access to the disk. If the resynchronization or takeover process fails, Cluster Extension XP returns a global error.
78 HP StorageWorks Cluster Extension XP User Guide
Valid values YES (default)
BCEnabledA
Description (Optional) Enables rolling disaster protection for data center A.
Valid values YES
BCEnabledB
Description (Optional) Enables rolling disaster protection for data center B.
Valid values YES
BCMuListA
Description (Optional) Space-separated list defines the MU number of the Business
BCMuListB
NO
Format string
NO (default)
Format string
NO (default)
Format list
Copy XP disk pairs in data center A.
Format list
Description (Optional) Space-separated list defines the MU number of the Business
Copy XP disk pairs in data center B.
BCResyncEnabledA
Format string
Description (Optional) Enables automatic resynchronization of Business Copy XP
disk pairs in data center A. The automatic resynchronization function is supported only when the split BC pair is located in the same data center where Cluster Extension XP is started.
Valid values YES
NO (default)
BCResyncEnabledB
Format string
Description (Optional) Enables automatic resynchronization of Business Copy XP disk pairs in data center B. The automatic resynchronization function is supported only when the split BC pair is located in the same data center where Cluster Extension XP is started.
Valid values YES
NO (default)
BCResyncMuListA
Format list
Description (Optional) Space-separated list that defines the MU numbers of the Business Copy XP disk pairs in data center A.
BCResyncMuListB
Format list
Description (Optional) Space-separated list that defines the MU numbers of the Business Copy XP disk pairs in data center B.
DataLoseDataCenter
Format string
Description (Optional) Specifies whether a resource should be brought online while the disk pair is (or will be) suspended or deleted and there is no connection (CA XP and IP network) to the remote data center.
Used only if the FenceLevel object value is DATA.
RAID Manager XP is able to access its remote peer to invoke takeover actions for Continuous Access XP device groups. It is also able to invoke a swap-takeover operation of the device group from the secondary site. If no configured remote RAID Manager XP instance replies to a request of the local RAID Manager XP instance (remote status EX_ENORMT), all network connections between the local and the remote data center are considered DOWN. If the swap-takeover operation results in a suspended state for the device group, the Continuous Access XP links are considered DOWN.
Because redundant networks and Continuous Access XP links are necessary to build a disaster tolerant environment, this situation can be considered a data center failure. The DataLoseDataCenter object is used to allow or prohibit automatic application service startup in this particular case.
Setting the DataLoseMirror object to YES and the DataLoseDataCenter object to NO is contradictory.
Valid values YES (default)
NO
DataLoseMirror
Format string
Description (Optional) Specifies whether a resource should be brought online while the disk pair is suspended or deleted.
Used only if the FenceLevel object value is DATA and local and remote XP disk status information can be gathered. If the remote XP disk state information is not available (remote state EX_ENORMT), the setting of the DataLoseDataCenter object will be used.
Depending on the value configured for the AutoRecover object, Cluster Extension XP will attempt to recover the PAIR state for the device group. Cluster Extension XP waits until the PAIR state has been established. If this operation fails, Cluster Extension XP will return a global error. Because the DATA fence level ensures no loss of concurrency, manual intervention is required to recover the PAIR state. The PAIR state must be reestablished for all disks in the device group before you can start the application service.
Setting the DataLoseMirror object to YES and the DataLoseDataCenter object to NO is contradictory.
Valid values YES
NO (default)
DC_A_Hosts Required
Format list
Description Space-separated list defines the cluster nodes in data center A.
VCS
This object is a string-vector element. Add a new element to the list for each system name.
DC_B_Hosts Required
Format list
Description Space-separated list defines the cluster nodes in data center B.
VCS
This object is a string-vector element. Add a new element to the list for each system name.
DeviceGroup Required
Format string
Description RAID Manager XP device group, containing the application service disk set.
Files Linux/UNIX /etc/horcmX.conf
Windows drive:\winnt\horcmX.conf or %systemroot%\horcmX.conf
where X is the RAID Manager XP instance number.
FastFailbackEnabled VCS only
Format string
Description (Optional) Disables VCS service groups for the data center. This allows
transferring the service group back to the remote data center immediately. To allow this operation, the VCS configuration file (main.cf) will be write enabled and saved later.
The service group will be disabled for all systems contained in either the DC_A_Hosts object or DC_B_Hosts object. Then, the VCS configuration file will be saved (dumped).
Valid values YES (default)
NO
FenceLevel
Format string
Description (Optional) The FenceLevel object specifies the fence level configured for
the device group. Cluster Extension XP checks whether the current fence level reported by the XP disk array is the same as the configured (expected) fence level. This object is also used to make sure your configurations are supported based on consistency considerations. Different failover and recovery procedures are used for different fence levels.
If you change the FenceLevel object value, also review the values of these objects:
• DataLoseMirror (page 81)
• DataLoseDataCenter (page 80)
• AsyncTakeoverTimeout (page 77)
Valid values DATA
NEVER (default)
ASYNC
Filesystems CLI and HACMP only
Format list
Description Space-separated list of file systems.
PostExecCheck
Format string
Description (Optional) The PostExecCheck object is used to configure Cluster Extension XP to gather XP disk pair status information after the takeover procedure. That information will be passed to the post-executable. In case of a remote data center failure, it could be time consuming to gather that information, especially if your post-executable does not need any XP status information. The arguments passed to the post-executable will include only the local disk status if the PostExecCheck object is set to NO. See “RAID Manager XP configuration” (page 90).
Valid values YES
NO (default)
PostExecScript
Format string
Description (Optional) Specifies an executable with its full path name to be invoked after the takeover action or failover procedure.
PreExecScript
Format string
Description (Optional) Specifies an executable with its full path name to be invoked before the takeover action or failover procedure.
RaidManagerInstances Required
Format list
Description A space-separated list of RAID Manager XP instances Cluster Extension
XP can use to communicate with the disk array. The instance numbers must be the same among all cluster systems. Cluster Extension XP can alternate between the specified instances.
VCS
This object is a string-vector element. Add a new element to the list for each system name.
Files Linux/UNIX /etc/horcmX.conf
Windows %systemroot%\horcmX.conf
where X is the RAID Manager XP instance number.
ResyncMonitor
Format string
Description (Optional) Starts the pair/resync monitor to monitor the disk pair status
and resynchronize disk pairs if the ResyncMonitorAutoRecover attribute is set to YES.
Valid values YES (default: Microsoft Cluster service)
NO (default: HACMP; SG-LX; VCS)
ResyncMonitorAutoRecover
Format string
Description (Optional) Automatically recovers disk pair states if the disk pairs are monitored by the pair/resync monitor.
Valid values YES
NO (default)
ResyncMonitorInterval
Format integer
Description (Optional) Specifies the interval, in seconds, at which the pair/resync monitor checks the disk pair status.
Default value 60
ResyncWaitTimeout
Format integer
Description (Optional) Specifies the timeout value, in seconds, for a disk pair resynchronization. It may take some time to resynchronize disks. The timer expires if the percentage value of the copy status for the device group does not change within the specified time interval. The timeout value is used if the ApplicationStartup object is set to RESYNCWAIT.
Default value 90
Vgs CLI and HACMP only
Format list
Description List of volume groups
XPSerialNumbers Required
Format list
Description Space-separated list of the serial numbers of the XP disk arrays connected to the cluster nodes. At least two serial numbers must be specified: those of the primary and secondary XP disk arrays. Cluster Extension XP checks whether the local disk array is contained in this list.
VCS
This object is a string-vector element. Add a new element to the list for each system name.
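Several of the objects described above depend on each other. The following is a minimal sketch, in Python, of how those dependencies could be checked before a configuration is put into use. The function name check_objects and the dictionary layout are hypothetical and are not part of Cluster Extension XP; only the object names, default values, and rules are taken from the descriptions above.

```python
# Hypothetical helper (not part of Cluster Extension XP): checks a dict of
# object values for one application against rules stated in the reference.
REQUIRED = ("DC_A_Hosts", "DC_B_Hosts", "DeviceGroup",
            "RaidManagerInstances", "XPSerialNumbers")


def check_objects(objects):
    """Return a list of findings for one set of object values (str -> str)."""
    findings = []

    # Objects marked "Required" in the reference must be present.
    for name in REQUIRED:
        if name not in objects:
            findings.append(f"missing required object {name}")

    # XPSerialNumbers needs at least two entries (primary and secondary array).
    if "XPSerialNumbers" in objects and len(objects["XPSerialNumbers"].split()) < 2:
        findings.append("XPSerialNumbers needs at least two serial numbers")

    # Defaults as listed in the reference.
    fence = objects.get("FenceLevel", "NEVER")
    mirror = objects.get("DataLoseMirror", "NO")
    datacenter = objects.get("DataLoseDataCenter", "YES")

    if fence not in ("DATA", "NEVER", "ASYNC"):
        findings.append(f"invalid FenceLevel {fence}")

    # Both DataLose objects are used only with fence level DATA; the
    # reference calls the YES/NO combination below contradictory.
    if fence == "DATA" and mirror == "YES" and datacenter == "NO":
        findings.append("DataLoseMirror YES with DataLoseDataCenter NO is contradictory")

    # AsyncTakeoverTimeout is used only with fence level ASYNC.
    if "AsyncTakeoverTimeout" in objects and fence != "ASYNC":
        findings.append("AsyncTakeoverTimeout is used only if FenceLevel is ASYNC")

    return findings


if __name__ == "__main__":
    sample = {
        "DC_A_Hosts": "host1a host2a",
        "DC_B_Hosts": "host3b host4b",
        "DeviceGroup": "sapdg",
        "RaidManagerInstances": "22",
        "XPSerialNumbers": "30061 30071",
        "FenceLevel": "DATA",
        "DataLoseMirror": "YES",
        "DataLoseDataCenter": "NO",
    }
    for finding in check_objects(sample):
        print(finding)
```

The sketch only reports findings; whether a finding should prevent startup is a decision for the administrator.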

Basic configuration example

The following is an example of a basic UCF.cfg file.
#/etc/opt/hpclx/conf/UCF.cfg
#This is the Cluster Extension XP User Configuration File (UCF.cfg).
#The COMMON tag specifies the configuration for the
#Cluster Extension XP core environment
COMMON
LogLevel info                        #default (not necessary)
APPLICATION sap                      #the application service
Vgs sapdatavg saptmpvg               #the volume groups (not necessary)
Filesystems /sapdata /saptmp         #the filesystems
DeviceGroup sapdg                    #RM dev group for the app service
RaidManagerInstances 22              #RM instance number for dev group
DC_A_Hosts host1a host2a             #Data center A
DC_B_Hosts host3b host4b             #Data center B
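To illustrate the structure of the example file, the following Python sketch reads UCF.cfg-style text into one dictionary per section. It is an illustration only, not the parser used by Cluster Extension XP. It assumes that a # character starts a comment anywhere on a line, that COMMON and APPLICATION name lines open a section, and that every other line is an object name followed by space-separated values; parse_ucf is a hypothetical name.

```python
# Illustrative reader for UCF.cfg-style text (assumed syntax, see lead-in).
SAMPLE = """\
COMMON
LogLevel info                        #default (not necessary)
APPLICATION sap                      #the application service
Vgs sapdatavg saptmpvg               #the volume groups (not necessary)
Filesystems /sapdata /saptmp         #the filesystems
DeviceGroup sapdg                    #RM dev group for the app service
RaidManagerInstances 22              #RM instance number for dev group
DC_A_Hosts host1a host2a             #Data center A
DC_B_Hosts host3b host4b             #Data center B
"""


def parse_ucf(text):
    """Return {"COMMON": {...}, "<application name>": {...}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comment and whitespace
        if not line:
            continue
        words = line.split()
        if words[0] == "COMMON":
            current = sections.setdefault("COMMON", {})
        elif words[0] == "APPLICATION" and len(words) > 1:
            current = sections.setdefault(words[1], {})
        elif current is not None:
            # Object name followed by its space-separated values.
            current[words[0]] = words[1:]
    return sections


if __name__ == "__main__":
    for section, objects in parse_ucf(SAMPLE).items():
        print(section, objects)
```

Every value is kept as a list of strings, so list objects such as Vgs and single-value objects such as DeviceGroup are handled the same way.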
4

RAID Manager XP dependencies

Cluster Extension XP depends on HP StorageWorks RAID Manager XP and the cluster software it is integrated with.
Before you configure Cluster Extension XP, verify that the host and disk array systems are properly configured:
• The disk array and its remote peer have been properly configured.
• The host system recognizes the disk arrays.
• The HP StorageWorks Continuous Access XP links are bidirectional and working properly.
• You are familiar with the disk and volume configuration of the operating system.

RAID Manager XP configuration

To function properly, Cluster Extension XP requires at least one instance of RAID Manager XP. Cluster Extension XP starts the configured RAID Manager XP instance if it is not running. However, if the RAID Manager XP instance cannot be started or returns an error, Cluster Extension XP can switch to an alternate RAID Manager XP instance.
Ensure that the path to the RAID Manager XP binary files is included in the PATH environment variable.
Recommendation Configure two RAID Manager XP instances per system and start those
instances automatically at system boot time.

RAID Manager XP configuration file

The RAID Manager XP configuration file (horcmX.conf) is used to map device groups to the internal disk array disks. A device group is the common unit for failover operations initiated from the server side.
A RAID Manager XP configuration file consists of these four parts:
• HORCM_MON
The monitor part defines the local network and port where the RAID Manager XP instance is listening for incoming requests from a remote instance. It also defines the polling interval and timeout value for requests to other instances.
The first entry defines the network that RAID Manager XP listens to. The default value is NONE. The default setting enables RAID Manager XP to listen on all configured networks.
The timeout value is important to Cluster Extension XP. You can configure the time Cluster Extension XP will wait to receive information back from the remote site. The timeout interval applies for each remote instance configured in the HORCM_INST section of the RAID Manager XP configuration file. If the last instance configured in the HORCM_INST section is the only instance that will answer a request, it will take the number of seconds of the timeout value times the number of unresponsive remote instances until the request can be answered. This must be considered for the application service startup timeout value you can configure in your cluster software.
A general formula for this behavior in case of a complete site failure is the following:
$$t_W = t_{HM} \times (n_{HI} + 1)$$
where
$t_W$ = wait time until a remote error is reported by the local RAID Manager XP instance
$t_{HM}$ = HORCM_MON timeout (in units of 10 msec)
$n_{HI}$ = number of remote instances, specified per device group in HORCM_INST
Recommendation Reduce the default timeout value as you increase the number of different network connections (at least two) to the remote RAID Manager XP instance. The settings of these two parameters directly affect the timing of the failover behavior of Cluster Extension XP. Cluster Extension XP experiences the above-mentioned wait time twice if none of the remote RAID Manager XP instances can be reached. If a post-executable is configured, a third wait time period is added.
• HORCM_CMD
The command device part defines which raw disk device is used to communicate to the disk array. This device cannot be used for any data other than control data of the RAID Manager XP instance. Several command devices may be configured to provide alternate access paths to control Continuous Access XP pair operations.
If command devices are configured in separate lines, RAID Manager XP interprets those devices as different disk arrays. Therefore, you can use one RAID Manager XP instance to control several XP disk arrays. Cluster Extension XP does not support this feature.
• HORCM_DEV
The device group part maps device groups and device names to internal disks (LDevs) in the disk array. Failover operations are carried out for the device groups but can also be initiated for a single disk pair. The device groups and device names must be unique in the RAID Manager XP configuration file. However, device group names should be unique for the whole cluster environment to prevent any kind of user mistake.
For fence level ASYNC, the device group also represents a consistency group.
To combine CA disk pairs and BC disk pairs, you can use the MU number to specify internal BC disks.
Recommendation Use the local and remote LDEV and CU number as the device name to
easily recognize configuration or mapping mistakes. For example, if the local LDEV number is 0a (hex), the local CU number is 0, the remote LDEV number is 1 (hex) and the remote CU is 3, a recommended device name would be disk_00a_301. This approach also ensures unique device names because the LDEV number together with the CU number is a unique disk identifier in a disk array.
Example
# pairdisplay -g testdg -fx -CLI
Group   PairVol       L/R  Port#  TID  LU  Seq#   LDEV#  P/S    Status  Fence  Seq#   P-LDEV#  M
testdg  disk_00a_301  L    CL2-N  3    4   30061  00a    P-VOL  PAIR    NEVER  30071  301      -
testdg  disk_00a_301  R    CL2-N  5    1   30071  301    S-VOL  PAIR    NEVER  -      00a      -
• HORCM_INST
The remote RAID Manager XP instances part defines which remote system can be used to request information about the device group. For most failover operations, the remote RAID Manager XP instance is not necessary. However, it is used for pair consistency checks and is considered important. A remote instance should be configured for each network available between the cluster nodes. The first and preferred network over which RAID Manager XP instances communicate with each other should be the cluster heartbeat network.
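The wait-time formula given in the HORCM_MON description can be worked through with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the timeout of 3000 (in units of 10 msec) and the two remote instances are example values chosen for the calculation, not values taken from this guide.

```python
# Worked example of t_W = t_HM x (n_HI + 1); names and inputs are illustrative.

def horcm_wait_seconds(timeout_10ms, remote_instances):
    """Wait time until the local instance reports a remote error, in seconds."""
    # The HORCM_MON timeout is given in units of 10 msec.
    return timeout_10ms / 100 * (remote_instances + 1)


def clx_total_wait_seconds(timeout_10ms, remote_instances, post_executable=False):
    """Total wait when no remote instance can be reached.

    Per the recommendation above, the wait time is experienced twice,
    and a third time if a post-executable is configured.
    """
    periods = 3 if post_executable else 2
    return periods * horcm_wait_seconds(timeout_10ms, remote_instances)


if __name__ == "__main__":
    print(horcm_wait_seconds(3000, 2))
    print(clx_total_wait_seconds(3000, 2, post_executable=True))
```

Compare the total with the application service startup timeout configured in your cluster software, which must allow for this wait.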

Network considerations

Since RAID Manager XP is an essential resource to Cluster Extension XP, it is highly recommended that you provide reliable network connections for RAID Manager XP communications. It is also recommended to use the heartbeat network (private network) for RAID Manager XP communications. As with the heartbeat network, alternative network paths are highly recommended. The networks that RAID Manager XP uses are configured for each device group in the HORCM_INST part of the RAID Manager XP configuration file.

Command device considerations

At least one command device must be configured for RAID Manager XP. RAID Manager XP allows the same command device to be accessed over redundant paths. This feature should be used to prevent Cluster Extension XP from aborting if a single access path to the command device is missing.
Recommendation Set up a second command device to provide an alternative control to the
paired disks.
Caution If you use Auto Path for AIX to enable alternative pathing on IBM AIX
together with the XP disk array, RAID Manager XP does not support Auto Path virtual paths for command devices.

Start and stop the RAID Manager XP instances

The RAID Manager XP instances configured to be used for Cluster Extension XP should be started at system boot time to provide fastest access to disk status information.
Cluster Extension XP provides scripts (Linux/UNIX) or a service (Windows) to integrate RAID Manager XP instance startup into the system startup process. However, if the system cannot automatically start and monitor RAID Manager XP instances, RAID Manager XP can be started and stopped by executing the following commands:
Linux/UNIX horcmstart.sh instance_numbers
horcmshutdown.sh instance_numbers
Windows horcmstart instance_numbers
horcmshutdown instance_numbers
Starting RAID Manager XP without specifying an instance number will start instance 0 with the associated horcm.conf file. Zero (0) is not recommended as an instance number for a Cluster Extension XP RAID Manager XP instance.
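If you start or stop the instances from your own scripts, the command line can be assembled as in the following Python sketch. The helper horcm_command is hypothetical; it only builds the argument list for the commands shown above and rejects instance 0, which is not recommended for Cluster Extension XP. Running the command (for example with subprocess.run) is left out because it requires an installed RAID Manager XP.

```python
# Hypothetical helper: builds the horcmstart/horcmshutdown argument list.

def horcm_command(action, instances, windows=False):
    """Return the command as a list, e.g. ["horcmstart.sh", "22", "23"]."""
    if action not in ("start", "shutdown"):
        raise ValueError("action must be 'start' or 'shutdown'")
    if not instances:
        # Without an instance number, RAID Manager XP would use instance 0.
        raise ValueError("specify at least one instance number")
    if 0 in instances:
        raise ValueError("instance 0 is not recommended for Cluster Extension XP")
    # Linux/UNIX uses the .sh scripts; Windows uses the plain command names.
    program = f"horcm{action}" + ("" if windows else ".sh")
    return [program] + [str(number) for number in instances]


if __name__ == "__main__":
    print(horcm_command("start", [22, 23]))
    print(horcm_command("shutdown", [22], windows=True))
```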

Takeover basic functionality test

After RAID Manager XP has been configured for the device groups used by Cluster Extension XP, verify that each device group fails over correctly between the disk arrays from each server in the cluster. The device group must already be in PAIR state for this test.
Caution RAID Manager XP keeps configuration data of the XP disk array in system
memory. Therefore, you must stop and restart RAID Manager XP instances on all systems if a configuration change has been applied to any of the involved XP disk arrays.
To test the correct failover and failback behavior, log in to each system used with Cluster Extension XP and invoke the following commands if the local disk is the secondary (SVOL) disk:
Linux/UNIX export HORCMINST=instance_number
pairdisplay -g device_group_name -fx -CLI
horctakeover -g device_group_name [ -t timeout ]
Windows set HORCMINST=instance_number
pairdisplay -g device_group_name -fx -CLI
horctakeover -g device_group_name [ -t timeout ]
The output of the pairdisplay command indicates whether the local disk is the secondary (SVOL) disk and if so, the horctakeover command shows a SWAP-takeover as a result. If pairdisplay shows the local disk as primary (PVOL) disk, log in to a system connected to the secondary (SVOL) disk and invoke the horctakeover command there. If the horctakeover command does not result in a SWAP-takeover, refer to “Recovery procedures” (page 225) and “Troubleshooting” (page 203) to resolve the issue.
The –t option of the horctakeover command is only used for fence level ASYNC.
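The check of the pairdisplay output described above can be sketched in Python as follows. This is an illustration under the assumption that the output has the column layout of the pairdisplay example shown earlier in this chapter; the function names are hypothetical, and the output of your RAID Manager XP version may differ.

```python
# Illustrative check of "pairdisplay -g <group> -fx -CLI" output.
# Assumes the column layout of the example in this guide.
SAMPLE = """\
Group   PairVol       L/R  Port#  TID  LU  Seq#   LDEV#  P/S    Status  Fence  Seq#   P-LDEV#  M
testdg  disk_00a_301  L    CL2-N  3    4   30061  00a    P-VOL  PAIR    NEVER  30071  301      -
testdg  disk_00a_301  R    CL2-N  5    1   30071  301    S-VOL  PAIR    NEVER  -      00a      -
"""


def local_volume_state(pairdisplay_output):
    """Return (role, status, fence) for every local (L) row."""
    lines = [line.split() for line in pairdisplay_output.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    # Locate the columns by their header names.
    lr, ps, status, fence = (header.index(name)
                             for name in ("L/R", "P/S", "Status", "Fence"))
    return [(row[ps], row[status], row[fence]) for row in rows if row[lr] == "L"]


def local_is_paired_secondary(pairdisplay_output):
    """True if all local disks are S-VOL disks in PAIR state."""
    states = local_volume_state(pairdisplay_output)
    return bool(states) and all(role == "S-VOL" and status == "PAIR"
                                for role, status, _ in states)


if __name__ == "__main__":
    print(local_volume_state(SAMPLE))
    print(local_is_paired_secondary(SAMPLE))
```

In the sample, the local disk is the primary (PVOL) disk, so the horctakeover command would be invoked on a system connected to the secondary disk instead.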

Integration with HACMP

Cluster Extension XP is integrated with the HACMP cluster software using the standard customization scheme provided by HACMP. This allows cluster administrators to configure the disk array-specific failover behavior as a pre-event of the standard HACMP event get_disk_vg_fs.
Related information For information about how to install Cluster Extension XP, see HP
StorageWorks Cluster Extension XP: Installation Guide.
See the readme file on the product CD for supported configurations.
5

Configuring resources

The Cluster Extension XP objects must be configured using a user configuration file.
The Cluster Extension XP resource gathers all necessary information about the disk arrays when a resource group is brought online.
If configured, a pair/resync monitor is started to monitor the Cluster Extension XP resource. To use this monitor, HACMP must call a pre-event for the standard HACMP event release_vg_fs.
The Cluster Extension XP binary clxhacmp is called as a pre-event of the standard HACMP event get_disk_vg_fs. It checks the status of the RAID Manager XP device group and, if necessary, takes appropriate actions to allow access to these disks before HACMP tries to access the disks of the particular resource group.

Procedure for HACMP

To integrate Cluster Extension XP into HACMP:
1. Create a new Custom Cluster Event.
#smitty hacmp
Choose Cluster Configuration > Cluster Resources > Cluster Events > Define Custom Cluster Events > Add a Custom Cluster Event.
2. Enter values:
Cluster Event Name: get_disk_vg_fs_pre
Cluster Event Description: Cluster Extension XP
Cluster Event Script File: /opt/hpclx/bin/clxhacmp
3. Configure the previously defined Custom Cluster Event as a pre-event of get_disk_vg_fs.
#smitty hacmp
4. Choose Cluster Configuration > Cluster Resources > Cluster Events > Change/Show Cluster Events.
Select event get_disk_vg_fs.
Define the previously defined custom event get_disk_vg_fs_pre as a pre-event of get_disk_vg_fs.
Cluster Extension XP controls the disk pairs based on RAID Manager XP device groups. The volume group definition of the HACMP resource group is used to determine the corresponding RAID Manager XP device group. The mapping between the HACMP volume group configuration and the corresponding RAID Manager XP device group is defined in the Cluster Extension XP user configuration file /etc/opt/hpclx/config/UCF.cfg. Because of this mapping mechanism, you must specify the volume groups owned by the HACMP resource groups in the user configuration file.
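The mapping idea can be pictured with the following Python sketch. It does not show how clxhacmp performs the lookup internally, which this guide does not describe; the function device_group_for and the dictionary layout are hypothetical, and the values are those of the basic configuration example.

```python
# Illustrative lookup: volume group -> RAID Manager XP device group,
# using the Vgs and DeviceGroup objects of each application section.
APPLICATIONS = {
    "sap": {"Vgs": ["sapdatavg", "saptmpvg"], "DeviceGroup": "sapdg"},
}


def device_group_for(volume_group, applications):
    """Return the device group whose Vgs list contains volume_group, or None."""
    for objects in applications.values():
        if volume_group in objects.get("Vgs", []):
            return objects.get("DeviceGroup")
    # The volume group is not listed in any application section.
    return None


if __name__ == "__main__":
    print(device_group_for("saptmpvg", APPLICATIONS))
    print(device_group_for("othervg", APPLICATIONS))
```

A volume group that is missing from the user configuration file cannot be mapped, which is why the volume groups owned by the HACMP resource groups must be listed there.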