the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for
errors contained herein or for incidental or consequential damages in connection with the furnishing, performance,
or use of this material.
This document contains proprietary information, which is protected by copyright. No part of this document may be
photocopied, reproduced, or translated into another language without the prior written consent of Hewlett-Packard.
The information contained in this document is subject to change without notice.
All product names mentioned herein may be trademarks of their respective companies.
Hewlett-Packard Company shall not be liable for technical or editorial errors or omissions contained herein. The
information is provided “as is” without warranty of any kind and is subject to change without notice. The warranties
for Hewlett-Packard Company products are set forth in the express limited warranty statements accompanying such
products. Nothing herein should be construed as constituting an additional warranty.
Printed in the U.S.A.
HP StorageWorks Cluster Extension XP User Guide
product version: 2.05.00
sixth edition (August 2004)
part number: T1609-96004
Contents
About this guide
Intended Audience
Disk array firmware and software dependencies
Related information
Terminology
Conventions
HP storage website
HP authorized reseller
Revision history
Warranty statement
1 Cluster Extension XP features
Integration into cluster software
Disaster tolerance through geographical dispersion
Automated redirection and monitoring of mirrored Continuous Access XP pairs
Rolling disaster protection
What is a rolling disaster?
Recovering the disaster tolerant environment
Command-line interface for easy integration
Graphical user interface
Quorum service (Microsoft Cluster service only)
2 Cluster Extension XP processes and components
Cluster Extension XP environments
Cluster Extension XP execution
Continuous Access XP and RAID Manager XP
RAID Manager XP instances
RAID Manager XP device groups
Rolling disaster protection and Business Copy XP
Integration with RAID Manager XP
Integration with automatic recovery
Integration with the pair/resync monitor
Restoring server operation
Example
User configuration file
Pair/resync monitor
Force flag
Pre-execution and post-execution programs
Cluster Extension XP log facility
Error return codes
Quorum service for Microsoft Cluster service
Using the quorum service in a Microsoft Cluster service environment
Quorum processes
3 User configuration file and Cluster Extension XP objects
Start and stop the RAID Manager XP instances
Takeover basic functionality test
5 Integration with HACMP
Configuring resources
Procedure for HACMP
User configuration file for HACMP
Bringing a resource group online
Taking a resource group offline
Deleting Cluster Extension XP
Pair/resync monitor integration
Timing considerations
Failure behavior
Restrictions for IBM HACMP with Cluster Extension XP
6 Integration with Microsoft Cluster service
Configuring the quorum service
Configuring Cluster Extension XP resources
Resource group and resource names
Cluster Extension XP resource-specific parameters
Setting non-Cluster Extension XP resource-specific parameters
Adding a Cluster Extension XP resource
Changing Cluster Extension XP resource properties
Advanced properties
Changing a resource name
Adding dependencies on a Cluster Extension XP resource
Bringing a Cluster Extension XP resource online
Taking a Cluster Extension XP resource offline
Deleting a Cluster Extension XP resource
Pair/resync monitor integration
Timing considerations for Microsoft Cluster service
Failure behavior with Microsoft Cluster service
Bouncing Resource Groups
Unexpected offline conditions
Restrictions for Microsoft Cluster service with Cluster Extension XP
Disaster-tolerant configuration example using a file share
Administration
7 Integration with VCS
Configuration of the Cluster Extension XP agent
Configuring the Cluster Extension XP resource
Cluster Extension resource types
Resource type definition
Adding a Cluster Extension XP resource
Changing Cluster Extension XP attributes
Linking a Cluster Extension XP resource
Bringing a Cluster Extension XP resource online
Taking a Cluster Extension XP resource offline
Deleting a Cluster Extension XP resource
Pair/resync monitor integration
Timing considerations for VCS
Enable/disable service groups
Restrictions for VCS with Cluster Extension XP
Unexpected offline conditions
8 Integration with Serviceguard for Linux
Configuration of the Cluster Extension XP environment
Adding a Cluster Extension XP integration to an existing Serviceguard package
Starting a Serviceguard package with Cluster Extension XP
Halting a Serviceguard package with Cluster Extension XP
Deleting Cluster Extension XP from a Serviceguard package
Pair/resync monitor integration
Timing considerations for Serviceguard
9 Command-line interface (CLI)
Configuring the CLI
Creating the Continuous Access environment and configuring RAID Manager
Timing considerations
Restrictions for customized Cluster Extension XP implementations
Creating and configuring the user configuration file
Microsoft Cluster service-specific error handling
Solving quorum service problems
Resource start errors
Failover errors
VCS-specific error handling
Start errors
Failover errors
Serviceguard (SG-LX)-specific error handling
Start errors
Failover errors
Pair/resync monitor messages in syslog/errorlog/messages/Event Log
A Recovery procedures
XP disk pair states
Recovery sequence
Quorum service recovery (Microsoft Cluster service only)
Single site failure recovery
Failure recovery if both sites have failed
Procedure for quorum service system cleanup
B Cluster Extension XP resource message catalog
C Cluster Extension XP quorum service message catalog
Quorum service Event Log messages
Glossary
Index
About this guide
This guide provides information about using and configuring HP
StorageWorks Cluster Extension XP in an environment where clustered
systems are connected to a disaster recovery array-based mirroring
solution. Cluster Extension XP allows creation of dispersed multiplatform
cluster configurations with the XP disk array. Cluster Extension XP enables
cluster software to automatically fail over applications where data is stored
and continuously mirrored from a local to a remote disk array using HP
StorageWorks Continuous Access XP. This guide describes the options you
have to make your disaster tolerant environment as robust as possible to
keep your data available at all times.
Because the XP family disk arrays support a broad range of operating
systems and cluster software, Cluster Extension XP can be integrated with
almost any disk array-supported cluster software. This guide provides you
with the information you need to create a disaster tolerant environment
spanning two or more data centers, utilizing the XP disk array and its
Continuous Access XP remote mirroring feature.
Unless otherwise noted, the term disk array refers to these disk arrays:
HP Surestore Disk Array XP512
HP Surestore Disk Array XP48
HP StorageWorks Disk Array XP128
HP StorageWorks Disk Array XP1024
HP StorageWorks XP12000 Disk Array
Intended Audience
This guide is intended for system administrators who maintain the cluster
environment and storage subsystems and have the following knowledge:
• A background in data processing and direct-access storage device
subsystems and their basic functions.
• Familiarity with disk arrays and RAID technology.
• Familiarity with the operating system, including commands and
utilities.
• A general understanding of cluster concepts and the cluster software
used in the data center environment.
• Familiarity with related disk array software programs:
HP StorageWorks Continuous Access XP
HP StorageWorks RAID Manager XP
Disk array firmware and software dependencies
The features and behavior of failover operations depend on the XP
firmware and RAID Manager XP versions. This guide describes Cluster
Extension XP behavior based on features implemented in the latest XP
firmware and RAID Manager XP versions.
Related information
For information about the disk arrays, please refer to the owner’s manuals.
For related product documentation, see the HP web site (www.hp.com):
• HP StorageWorks RAID Manager XP: User’s Guide
• HP StorageWorks Continuous Access XP: User’s Guide
• HP StorageWorks Business Copy XP: User’s Guide
• HP StorageWorks Command View XP: User’s Guide
• HP StorageWorks Disk Array XP Operating System Configuration
Guide: IBM AIX
• HP StorageWorks Disk Array XP Operating System Configuration
Guide: Sun Solaris
• HP StorageWorks Disk Array XP Operating System Configuration
Guide: Windows 2000/2003
• HP StorageWorks Disk Array XP Operating System Configuration
Guide: Linux
For information about Serviceguard for Linux, see the HP High Availability
web site:
docs.hp.com/hpux/ha/
For information about RS/6000 and HACMP, see the IBM web site:
www.rs6000.ibm.com/aix/library
For VERITAS Cluster Server information, see the VERITAS web site:
support.veritas.com
For Microsoft Cluster service information, see the Microsoft web site.
Terminology
This guide uses terminology to describe cluster-specific and disaster
recovery-specific processes. Vendors of cluster software use different terms
for the components of their cluster software. To standardize the usage
among vendors, this guide uses the following terms:
application service
This is the unit of granularity for a failover or failback
operation. It includes all necessary resources that must
be present and on which the application depends. For
example, a file share must have a disk, a mount point
(or drive letter), and an IP address to be considered an
application service. A disk is a necessary resource for
the application service. Depending on the cluster
software, application services can depend on each other
and run in parallel on the same system or on different
systems.
Vendor equivalent terms
VCS: service group
HACMP: resource group
Microsoft Cluster service: resource group
SG-LX (Serviceguard): package
resource
The smallest unit in an application service. It describes
the necessary parts to build an application service. The
implementation of such resources in cluster software is
vendor-specific. Some vendors (such as IBM or HP) do
not allow accessing the chains between dependent
resources.
Vendor equivalent terms
VCS: resource
HACMP: resource group
Microsoft Cluster service: resource
SG-LX (Serviceguard): package
startup, shutdown
Startup and shutdown are also known as “bringing
online” and “taking offline,” or “start” and “stop,” or
“run” and “halt” with regard to an application service or
resource. Only a few cluster software vendors (such as
Veritas or Microsoft) offer starting and stopping of
single resources.
Conventions
This guide uses the following text conventions.
Figure 1
Blue text represents a cross-reference. For the online
version of this guide, the reference is linked to the
target.
www.hp.com
Underlined, blue text represents a website on the
Internet. For the online version of this guide, the
reference is linked to the target.
literal
Bold text represents literal values that you type exactly
as shown, as well as key and field names, menu items,
buttons, file names, application names, and dialog box
titles.
variable
Italic type indicates that you must supply a value. Italic
type is also used for manual titles.
input/output
Monospace font denotes user input and system
responses, such as output and messages.
Example
Denotes an example of input or output. The display
shown in this guide may not match your configuration
exactly.
[ ]
Indicates an optional parameter.
{ }
Indicates that you must specify at least one of the listed
options.
|
Separates alternatives in a list of options.
HP storage website
For the most current information about HP StorageWorks XP products,
visit the support website. Select the appropriate product or solution from
this website:
For information about product availability, configuration, and connectivity,
consult your HP account representative.
HP authorized reseller
For the name of your nearest HP authorized reseller, you can obtain
information by telephone:
Revision history
November 2001: Added quorum filter-service for MSCS on
XP512/XP48.
May 2002: Updated content for version 1.03 of all Cluster
Extension products.
Updated content for version 1.04.00 of Cluster
Extension for MSCS.
Added support for Serviceguard on Linux.
Updated content for version 1.1 of Cluster Extension
XP quorum service with external arbitrator.
September 2002: Updated content for version 2.00.
Changed product terminology from MSCS to Microsoft Cluster service.
Added arguments for clxchkmon.
Changed LogLevel values.
Changed Windows log file directory location.
Added message catalog.
December 2002: Updated content for version 2.01 for VCS and
Serviceguard.
Added rolling disaster protection features.
Added GUI features.
January 2003: Updated content for version 2.01 for Windows GUI.
April 2003: Updated content for version 2.02.
Added “Cluster Extension XP quorum service message
catalog” (page 261).
November 2003: Updated for versions 2.02 and 2.03. Added SUSE
Linux and Windows 2003 support. Removed XP256.
Changed MC/ServiceGuard to Serviceguard.
March 2004: Modified document for version 2.04.00.
August 2004: New format applied. Modified document for version
2.05.00.
Warranty statement
HP warrants that for a period of ninety calendar days from the date of
purchase, as evidenced by a copy of the invoice, the media on which the
Software is furnished (if any) will be free of defects in materials and
workmanship under normal use.
DISCLAIMER. EXCEPT FOR THE FOREGOING AND TO THE
EXTENT ALLOWED BY LOCAL LAW, THIS SOFTWARE IS
PROVIDED TO YOU “AS IS” WITHOUT WARRANTIES OF ANY
KIND, WHETHER ORAL OR WRITTEN, EXPRESS OR IMPLIED.
HP SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES
OR CONDITIONS OF MERCHANTABILITY, SATISFACTORY
QUALITY, NON-INFRINGEMENT, TITLE, ACCURACY OF
INFORMATIONAL CONTENT, AND FITNESS FOR A
PARTICULAR PURPOSE. Some jurisdictions do not allow exclusions of
implied warranties or conditions, so the above exclusion may not apply to
you to the extent prohibited by such local laws. You may have other rights
that vary from country to country, state to state, or province to province.
WARNING! YOU EXPRESSLY ACKNOWLEDGE AND AGREE
THAT USE OF THE SOFTWARE IS AT YOUR SOLE RISK. HP
DOES NOT WARRANT THAT THE FUNCTIONS CONTAINED IN
THE SOFTWARE WILL MEET YOUR REQUIREMENTS, OR THAT
THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED,
VIRUS-FREE OR ERROR-FREE, OR THAT DEFECTS IN THE
SOFTWARE WILL BE CORRECTED. THE ENTIRE RISK AS TO THE
RESULTS AND PERFORMANCE OF THE SOFTWARE IS ASSUMED
BY YOU. HP DOES NOT WARRANT OR MAKE ANY
REPRESENTATIONS REGARDING THE USE OR THE RESULTS OF
THE USE OF THE SOFTWARE OR RELATED DOCUMENTATION IN
TERMS OF THEIR CORRECTNESS, ACCURACY, RELIABILITY,
CURRENTNESS, OR OTHERWISE. NO ORAL OR WRITTEN
INFORMATION OR ADVICE GIVEN BY HP OR HP’S AUTHORIZED
REPRESENTATIVES SHALL CREATE A WARRANTY.
LIMITATION OF LIABILITY. EXCEPT TO THE EXTENT
PROHIBITED BY LOCAL LAW, IN NO EVENT INCLUDING
NEGLIGENCE WILL HP OR ITS SUBSIDIARIES, AFFILIATES,
DIRECTORS, OFFICERS, EMPLOYEES, AGENTS OR
SUPPLIERS BE LIABLE FOR DIRECT, INDIRECT, SPECIAL,
INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR OTHER
DAMAGES (INCLUDING LOST PROFIT, LOST DATA, OR
DOWNTIME COSTS), ARISING OUT OF THE USE, INABILITY
TO USE, OR THE RESULTS OF USE OF THE SOFTWARE,
WHETHER BASED IN WARRANTY, CONTRACT, TORT OR
OTHER LEGAL THEORY, AND WHETHER OR NOT ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES. Your use of the
Software is entirely at your own risk. Should the Software prove defective,
you assume the entire cost of all service, repair or correction. Some
jurisdictions do not allow the exclusion or limitation of liability for
incidental or consequential damages, so the above limitation may not apply
to you to the extent prohibited by such local laws.
NOTE. EXCEPT TO THE EXTENT ALLOWED BY LOCAL LAW,
THESE WARRANTY TERMS DO NOT EXCLUDE, RESTRICT OR
MODIFY, AND ARE IN ADDITION TO, THE MANDATORY
STATUTORY RIGHTS APPLICABLE TO THE LICENSE OF THE
SOFTWARE TO YOU; PROVIDED, HOWEVER, THAT THE
CONVENTION ON CONTRACTS FOR THE INTERNATIONAL
SALE OF GOODS IS SPECIFICALLY DISCLAIMED AND SHALL
NOT GOVERN OR APPLY TO THE SOFTWARE PROVIDED IN
CONNECTION WITH THIS WARRANTY STATEMENT.
1
Cluster Extension XP features
The quest to extend high availability over geographically dispersed
locations has driven today’s IT personnel to demand cluster solutions
capable of recovering from even the most extensive disasters. HP
StorageWorks Cluster Extension XP enables you to monitor HP
StorageWorks Continuous Access XP-mirrored disk pairs and allows
access to the remote data copy if the application becomes unavailable on
the local site. If the application service is restarted on the remote site, after
the local (primary) application service has been shut down, Cluster
Extension XP uses its internal database to check whether the current disk
states allow automatic access to your data based on consistency and
concurrency considerations. Integrated in the cluster software or available
as a command-line interface for your own integration, Cluster Extension XP
ensures that the data can be accessed if necessary.
Cluster Extension XP software provides these key features:
• integration into cluster software
• disaster tolerance through geographical dispersion
• automated redirection and monitoring of mirrored Continuous Access
XP pairs
• command-line interface for easy integration
Integration into cluster software
Cluster Extension XP integrates tightly with the cluster software
wherever possible. Cluster Extension XP is a resource to
the clustered application service (like the disk or volume group) and must
therefore be managed as such. The architecture of Cluster Extension XP
allows integration into many cluster software products, including these:
• VERITAS Cluster Server (VCS)
• IBM HACMP
• Windows 2000 Advanced Server and Datacenter Server Cluster
service
• Windows Server 2003 Enterprise Edition and Datacenter Edition
• Serviceguard for Linux (SG-LX)
For the current list of supported cluster software, contact your HP
representative.
Disaster tolerance through geographical dispersion
Using two or more disk arrays with Continuous Access XP allows you to
copy your most valuable data to a remote data center. Cluster Extension XP
provides the cluster software with a mechanism to check and allow data
access (in case the local application service must be transferred to a remote
cluster system). The distance to your remote location is only limited by the
technology your cluster software uses to communicate with each system in
the cluster, the technology you use for physical data replication, and the
degree of failover automation.
Disaster tolerance considerations
Application availability is essential for today’s businesses. The capability
to restore the application service in a timely fashion after a failure of the
server, the storage, or the whole data center is a must and is considered
disaster tolerance. Complete data center failures can be caused by
earthquakes or hurricanes, but more often they are caused by power outages
or fires.
To protect against such disasters, a single data center is not sufficient.
Systems (storage as well as the servers) must be geographically distributed
in order to build a disaster tolerant architecture which protects against
planned and unplanned downtimes.
Of course, redundant network cards and storage host bus adapters are a
basic requirement. The same applies for the power supplies of both the
storage and the server. With this hardware in place, the external power
service and the network must also be designed to provide no single point of
failure (SPOF).
Today, data is the most valuable asset in your enterprise. The XP family of
disk arrays provides a fully redundant architecture, and the flexibility to
upgrade firmware online reduces the risk of unplanned and planned
downtime. The disk array also provides the feature of remotely mirroring
your data to a second disk array.
The cost of having this hardware in place must be weighed against the risk
of a true disaster. The investment pays off in a real disaster by ensuring
that business-critical applications remain accessible from another location.
Guidelines
The following considerations, applied to the cluster environment, can
ensure an application service survives a disaster with minimal downtime.
• geographical dispersion of hardware and applications
• redundant paths to access the network and storage
• alternative power sources
• redundant networks
• data replication
Several ways of implementing such disaster tolerant architectures are
possible. All of those solutions can be covered by a clustered solution using
the XP family of disk arrays and Continuous Access XP. Cluster Extension
XP is needed to enable access to your critical data.
Disaster-tolerant architectures
With Cluster Extension XP, you can extend your cluster solution beyond
the limitations of existing data center and campus-wide distances. Cluster
Extension XP enables metropolitan-wide failover capabilities, and beyond.
Having a local disk array in each data center also means that the server does
not have to write twice because Continuous Access XP mirrors each
write-IO to the remote site and therefore relieves the server of the burden,
preventing performance bottlenecks.
Disaster tolerant architectures using data replication over the
network
Data replication over the network is a way to achieve disaster tolerance and
is considered logical replication.
Figure 1. Logical replication over networks (write IO and replication
between a data center in San Francisco and a data center in New York over
a WAN: IP, ATM, T3, DWDM)
Logical replication uses specific host-based software to write data to local
disks and also to replicate that data to a remote system connected to an
attached storage device. Because data is replicated over the network, there
is no distance limitation for such solutions.
Logical replication techniques imply that the failover process is mainly
manual. This means each site belongs to a different cluster, or only the
primary site is clustered, while the secondary site acts as a standby system.
It is also possible that no cluster software is involved and that only one
system is available at each site.
Data replicated over the network can be at the granularity of a single
volume, a file system, or a transaction.
All logical replication techniques have some significant disadvantages: The
remote system is a standby system. That is, it must perform the same task
as the primary system and cannot be used for any other purpose. If the
standby system is activated, it must replay redo logs first and cannot
automatically serve as a replication source (for example, Oracle’s standby
database implementation).
Another significant disadvantage of such architectures is that the server
must write every IO twice, once to the attached storage device and once to
the remote system over the network. These replication techniques can only
be implemented asynchronously; otherwise, the application experiences
noticeable performance degradation.
Because of the nature of replication products, additional CPU power is
necessary to mirror write requests.
Logical replication implies that all logs that have not been shipped (or
that are in transit) are lost in case of a disaster.
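The exposure described above can be illustrated with a small model. The following is a minimal sketch, not part of any replication product: the class name AsyncLogShipper and its methods are hypothetical and exist only for illustration. It queues log records for asynchronous shipping and reports which records would be lost if the primary site failed before the queue was drained.

```python
from collections import deque


class AsyncLogShipper:
    """Toy model of asynchronous, host-based log shipping (hypothetical)."""

    def __init__(self):
        self.pending = deque()   # written locally, not yet shipped
        self.remote = []         # records that reached the standby system

    def write(self, record):
        # The application write completes as soon as the local copy exists.
        self.pending.append(record)

    def ship(self, count=1):
        # Ship up to `count` of the oldest pending records to the standby.
        for _ in range(min(count, len(self.pending))):
            self.remote.append(self.pending.popleft())

    def lost_on_disaster(self):
        # Everything still pending would be lost together with the primary site.
        return list(self.pending)


shipper = AsyncLogShipper()
for n in range(5):
    shipper.write(f"log-{n}")
shipper.ship(3)
lost = shipper.lost_on_disaster()
```

In this model, the records that remain in the pending queue at the moment of the disaster are exactly the ones the standby system never sees.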
Disaster-tolerant architectures using Fibre Channel networks
Disaster tolerant architectures using Fibre Channel networks can be
achieved by the use of physical replication.
Figure 2. Physical replication using Fibre Channel (write IO and replication
between a data center in building 1 and a data center in building 2)
As with logical replication products, physical replication often uses
host-based software to replicate data. Here, data is written to
server-attached storage devices twice. Most of today’s logical volume
management products offer this feature.
Using Fibre Channel, you could use dual-attached storage devices, where
one port is connected to the local server and one is connected to the remote
server. To be able to access your data at the remote location in case of a
disaster, each server must have a local and a remote storage device. The
volume management software, then, must be set up to mirror each write
request to both the local and the remote storage device. With the XP disk
arrays, several servers can be connected to each disk array.
This solution is called a campus cluster. A single cluster can be used and the
failover process can be automated. With campus clusters, both sites can be
active.
Data replicated via volume mirroring is based on the granularity of a single
volume.
Campus cluster solutions are limited to the distance Fibre Channel supports
today. Because storage systems must be within a range of 500 meters (direct
connect) or up to 10 kilometers (connected via Fibre Channel switches or
Fibre Channel hubs), campus cluster solutions can offer only limited
protection against natural disasters.
Another limiting factor is the cluster heartbeat protocol or the
communications protocol used for cluster reformation processes. Those
protocols are vendor-specific implementations and require private
networks. This means that those protocols are not routable. The distance
limitations of a private network depend on the supplied network
infrastructure and latency issues of the heartbeat or cluster reformation
protocol.
Another significant disadvantage of such architectures is that the server
must write every IO twice: once to the locally attached storage device and
once to the remote attached storage device. These replication techniques
are implemented as software running on the server, which reduces the
available compute power and degrades server performance.
Because of the nature of volume mirroring products, additional CPU power
is necessary to mirror write requests across two host bus adapters.
Most of these products have another significant disadvantage. In case of a
path failure, the whole volume must be copied to resynchronize the second
volume with the current state of the first volume. If the storage device must
be replaced, all volumes must be copied. This significantly affects server
performance.
Disaster-tolerant architectures using disk array-based mirroring
Using Continuous Access XP-based mirroring is also considered physical
replication. Continuous Access XP is disk array-based mirroring. As with
campus clusters, such solutions require two or more disk arrays.
The key difference from the above-mentioned solutions is that the disk
array keeps track of the data integrity of the mirrored disks. XP disk arrays
offer RAID-1 or RAID-5 protection as a standard feature and allow online
addition and replacement of disks, IO adapter cards, and memory. To
provide copies of data, internal and external mirroring features are
available. For disaster tolerant solutions, Continuous Access XP can mirror
your data with no distance limitation.
ESCON or Fibre Channel protocol is used to transfer data between two disk
arrays. Using converters, ESCON and Fibre Channel can be routed over IP
networks and T3 to allow unlimited distance between the disk arrays. To
replicate data over more than 0.5 km (Fibre Channel) or 3 km (ESCON),
special extenders or switches must be purchased.
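As a planning aid, the distance thresholds quoted above can be captured in a short check. This is a minimal sketch: the function name needs_extender and the protocol keys are hypothetical, and only the 0.5 km and 3 km limits come from this guide.

```python
# Direct-connect limits quoted in the text above, in kilometers.
DIRECT_LIMIT_KM = {"fibre_channel": 0.5, "escon": 3.0}


def needs_extender(protocol, distance_km):
    """Return True if the link is longer than the direct-connect limit."""
    try:
        limit = DIRECT_LIMIT_KM[protocol]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}") from None
    # "More than" the limit means extenders or switches are required.
    return distance_km > limit
```

For example, needs_extender("fibre_channel", 10) evaluates to True, while needs_extender("escon", 3.0) evaluates to False because the limit itself is still within direct-connect range.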
The cluster solutions using Continuous Access XP-based disk mirroring are
called metropolitan clusters or geographically dispersed clusters. Servers
are members of the same cluster dispersed over two or more sites. Since the
disk array controls the replication process, the server is relieved from
writing any IO-request to the disk more than once.
Continuous Access XP-mirrored disks typically have a read/write-enabled
primary disk and a read-only secondary disk. This leads to problems
because current cluster software products cannot distinguish between
write-protected and write-enabled disks.
Cluster software assumes that the application service has access to
read/write-enabled data disks on any system that the application service has
been configured to run. Since the secondary volume of a disk pair is not
normally accessible, the failover process would typically involve manual
intervention.
Figure 3. Physical replication using HP StorageWorks Continuous Access XP
(write IO and replication between a data center in Manhattan and a data
center in Brooklyn over a MAN: ESCON, IP, ATM, DWDM, using Continuous
Access XP (extension))
Cluster Extension XP provides the software to enable automated failover
and failback procedures integrated as a resource of the application service.
Cluster Extension XP uses an internal database to decide whether the data
on the failover site is safe to be accessed or not. Manual intervention is
required if the current disk array states and the user settings conflict with
the rules stored in the Cluster Extension XP internal database.
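The decision described above can be pictured as a table lookup. The following is a minimal sketch under stated assumptions: the state names, link statuses, actions, and the allow_non_current_data setting are all hypothetical and do not reproduce the contents of the Cluster Extension XP internal database. It only shows the general pattern of mapping observed states to an action and falling back to manual intervention when the states and the user settings conflict.

```python
# Hypothetical rule table: (pair state on the failover site, link status) -> action.
# State names and actions are illustrative only; they are not the product's
# internal database.
RULES = {
    ("synchronized", "link_up"): "swap_and_allow_access",
    ("synchronized", "link_down"): "allow_access",
    ("suspended", "link_up"): "resync_then_allow_access",
    ("suspended", "link_down"): "stop",
    ("copying", "link_up"): "wait_for_copy",
    ("copying", "link_down"): "stop",
}


def decide(pair_state, link_status, allow_non_current_data=False):
    """Return the action for a failover request, or 'manual_intervention'."""
    action = RULES.get((pair_state, link_status))
    if action is None:
        # Unknown combination: never guess, hand control to the operator.
        return "manual_intervention"
    if action == "stop":
        # Data may not be current; proceed only if the user setting permits it.
        return "allow_access" if allow_non_current_data else "manual_intervention"
    return action
```

Keeping the rules in a table rather than in nested conditionals makes every state combination explicit, so a combination that is not listed is handled conservatively instead of silently falling through.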
The limiting factor of metropolitan or geographically dispersed clusters is
the cluster heartbeat protocol or the communications protocol used for the
cluster reformation processes. Those protocols are vendor-specific
implementations and typically require private networks. These protocols
are not routable; a router cannot be used. The distance limitations of
networks supporting a private network are dependent on the supplied
network infrastructure and latency issues of the heartbeat or cluster
reformation protocol.
To address these issues, cluster manager software can be used. This
software offers disaster tolerance by managing two or more clusters from a
single console or server and is considered a continental cluster. Depending
on the implementation, automated or semiautomated failover processes
between clusters are possible.
As mentioned above, metropolitan or geographically dispersed clusters as
well as continental clusters require metropolitan area networks or wide area
networks. In most cases, those network connections involve common
carriers and special network equipment, which can be very expensive.
Compared with a direct connection or a campus network, reliability can be
degraded, and more planning is needed to deploy and maintain a disaster
tolerant environment.
Using Continuous Access XP, data is accessible and consistent in every
failover case and the resynchronization of a completely failed disk array
can be done while the application is running with almost no impact to the
server performance. This allows reestablishing disaster tolerance without
application downtime.
Automated redirection and monitoring of mirrored
Continuous Access XP pairs
Disk arrays with Continuous Access XP provide a unique feature that
allows the redirection of the mirroring destination. This means Continuous
Access XP almost instantaneously swaps the primary/secondary
relationship of disk pairs if the application must access the secondary disk.
This feature ensures that the disk pairs are always synchronized, ensuring
that the failback process is as fast as the failover process. If the links
between your disk arrays are broken, each array maintains a bitmap table so
that only the changed (delta) data is synchronized when the links become
available again. In a
failover case, Cluster Extension XP takes the appropriate action for each
link/array status and makes sure that your application service has the latest
data.
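The bitmap mechanism described above can be pictured with a minimal Python sketch. The class and method names (TrackBitmap, mark, resync) and the track-level granularity are assumptions made for illustration; they do not reflect the disk array firmware or any RAID Manager XP interface.

```python
class TrackBitmap:
    """Hypothetical model of a per-array change bitmap (illustration only)."""

    def __init__(self, track_count):
        self.changed = [False] * track_count

    def mark(self, track):
        # Record that this track was written while the link was down.
        self.changed[track] = True

    def resync(self, source, target):
        """Copy only the marked (delta) tracks and clear the bitmap."""
        copied = []
        for track, dirty in enumerate(self.changed):
            if dirty:
                target[track] = source[track]
                self.changed[track] = False
                copied.append(track)
        return copied


local = ["a", "b", "c", "d"]
remote = list(local)              # the pair starts out synchronized
bitmap = TrackBitmap(len(local))

# Link is down: local writes are tracked instead of mirrored.
local[1] = "B"
bitmap.mark(1)
local[3] = "D"
bitmap.mark(3)

# Link is back: only the two marked tracks are transferred.
copied_tracks = bitmap.resync(local, remote)
```

Because only the marked tracks are copied, the time needed to resynchronize depends on how much data changed while the link was down, not on the size of the volume.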
Cluster Extension XP includes a pair/resync monitor to monitor the health
of the links between your arrays. Furthermore, it detects a lost and later
reestablished link and automatically resynchronizes the suspended disk
pairs, ensuring the most current data is available on either site.
Rolling disaster protection
Rolling disaster protection minimizes the impact of downtime and ensures
data integrity during recovery operations. Rolling disaster protection
combines Continuous Access XP remotely mirrored disk pairs and internal
Business Copy XP disk copies to protect data locally as well as remotely. In
combination, these features support the highest data protection levels to
prevent disastrous loss of data.
What is a rolling disaster?
A rolling disaster refers to catastrophic events or outages that affect the data
stored on remote mirrored disk pairs. In a rolling disaster, data stored on
remote mirrored disk pairs can be entirely lost during a recovery attempt.
In a rolling disaster, the mirrored disk pairs typically experience the
following sequence of events:
1. The primary data center fails.
The cluster software successfully transfers application execution to
the remote data center.
2. The Continuous Access XP link fails.
3. The secondary volume of the disk pair is used to continue operation after
failover while the CA link is not functional.
The secondary volume represents the latest state of data, whereas the
data on the primary volume is now out of date.
4. The primary data center is recovered and the Continuous Access XP
link is restored.
5. A recovery operation is initiated to resynchronize (update) the original
(primary) disk from the secondary disk.
This is known as a restore operation after a disaster, or a restore after
failover operation.
The resynchronization/restore operation can take minutes to days
depending on the amount of data that must be updated and transferred
between the two disk arrays.
During the recovery operation, data is vulnerable to the effects of a
disaster or outage. During a resynchronization operation, data updates
are sent in the order of changed tracks and not in the transactional
order in which the data was originally written or acknowledged.
6. The secondary site fails during the resynchronization/restore operation.
The restored data at the original, primary site becomes unusable.
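Why an interrupted resynchronization leaves unusable data can be shown with a small Python sketch of steps 5 and 6. The function name, the two-track volume, and the sample data are hypothetical; the sketch only assumes, as stated above, that changed tracks are copied in track order rather than in the order the application wrote them.

```python
def resync_by_track(source, target, changed_tracks, tracks_before_failure):
    """Copy changed tracks in ascending track order, stopping early to
    simulate a failure of the source site during the resynchronization."""
    result = dict(target)
    for track in sorted(changed_tracks)[:tracks_before_failure]:
        result[track] = source[track]
    return result


# State of both volumes before the first failure.
initial = {0: "balance=100", 1: "log=empty"}

# After failover the application writes the log entry first (track 1)
# and the balance second (track 0): transactional order is 1, then 0.
history = [
    dict(initial),
    {0: "balance=100", 1: "log=debit 40"},
    {0: "balance=60", 1: "log=debit 40"},
]
secondary = history[-1]

# The resynchronization copies track 0 first and then the secondary fails.
primary = resync_by_track(secondary, initial, changed_tracks={0, 1},
                          tracks_before_failure=1)

# The primary now holds a combination the application never produced.
is_consistent = primary in history
```

In this sketch the primary ends up with the new balance but without the matching log entry, a state that never existed on the secondary volume.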
Although resynchronization operations are possible while an application is
running, resynchronization could lead to unrecoverable data if a rolling
disaster occurs. This type of rolling disaster can occur in the following
circumstances:
• during manual resynchronization attempts
• during failover operations using Cluster Extension XP in a cluster
environment when the Cluster Extension XP AutoRecover object is
set to yes, or where the pair/resync monitor is used with the
ResyncMonitorAutoRecover object set to yes.
Recovering the disaster tolerant environment
To ensure survival of critical data during a resynchronization/restore
operation, Cluster Extension XP supports the use of preconfigured
Business Copy disks and allows suspending any number of Business Copy
pairs that can be associated with the primary data disks. Cluster Extension
XP recovers automatically, provided that at least one internal Business
Copy mirror could be suspended to guarantee a recoverable state.
Cluster Extension XP also resumes internal Business Copy mirrors
automatically, if specified, to allow the local site to keep an up-to-date
image of the data.
This internal copy represents the state of the primary volume before the
data center failure. This copy is needed to survive a possible failure of the
secondary volume or disk array during the resynchronization operation.
Although the data could be out of date, it represents the best starting point
for the recovery effort, unlike the inconsistent data that results from a
rolling disaster.
Recovery from a consistent, point-in-time copy ensures the integrity of data
and eliminates the need for full tape restore procedures. Rolling disaster
protection provides a rapid recovery method and so minimizes downtime.
Figure 5 (page 49) illustrates an example of a disaster-tolerant
configuration.
To implement rolling disaster protection, see “Rolling disaster protection
and Business Copy XP” (page 45).
Command-line interface for easy integration
Cluster Extension XP provides you with a command-line interface to enable
disaster tolerant environments even if no cluster software is available for
your operating system. This feature is convenient if you use
in-house-developed software to migrate application services from one
system to another or if you want Cluster Extension XP to check the disk
states to make sure you can automatically start your application service on
the local disk array.
Graphical user interface
Cluster Extension XP for Microsoft Cluster service and VCS can be
configured with the cluster software GUI. Both cluster software products
provide a graphical user interface to set and change resource values. Cluster
Extension XP offers full integration into the GUI so that you can utilize the
capacity of your cluster software.
Quorum service (Microsoft Cluster service only)
Microsoft Cluster service depends on the cluster quorum disk resource to
maintain a persistent log of cluster configuration changes and status, as
well as a single point to resolve any possible events that could result in a
split brain situation. The Cluster Extension XP quorum service adds an
additional dimension to disaster tolerance by remotely mirroring the
quorum disk resource, thus preventing it from being the single point of
failure.
The quorum service allows the quorum disk resource to be mirrored
between dispersed sites and supports the movement and failover of the
quorum disk between the two sites without disrupting other cluster
services. The external arbitrator (also included in Cluster Extension XP)
solves the potential split brain syndrome as well. This feature significantly
increases the availability of the critical quorum disk resource, thus reducing
the possibility of cluster failure due to the loss of the quorum disk.
The quorum service and Cluster Extension have been certified to fulfill all
requirements for Microsoft cluster. For certified configurations, see the
Microsoft web site:
www.microsoft.com/windows/catalog/server
1. Click the Hardware tab.
2. Select Cluster Solutions from the left side menu.
Cluster Extension XP processes and components
Cluster Extension XP is shipped in the appropriate format for each
platform:
Platform                     Implementation
VCS                          agent
IBM HACMP                    pre-event executable
Microsoft Cluster service    resource DLL, quorum service, and external arbitrator
SG-LX                        function call/executable
Customized solutions to failover application services must implement
Cluster Extension XP through its command-line interface prior to the disk
activation procedure.
Cluster Extension XP environments
The ideal environment for a Cluster Extension XP configuration consists of
at least four servers (two at each site) and separated redundant
communications links for cluster heartbeats, client access and Continuous
Access XP (Extension). All communications interfaces must be installed in
pairs to serve as failover components, preventing single points of failure
(SPOFs).
Recommendation: Use load balancing and alternative pathing software for host-to-storage
connections, such as HP StorageWorks Auto Path for IBM AIX or Secure
Path for Linux and Windows operating systems. For Sun Solaris operating
systems, VERITAS offers such software. These software products enable
you to upgrade XP firmware while the application service is running.
Network communications links between the dispersed data centers must be
redundant and physically routed differently. This prevents the “backhoe
issue,” that is, where all links between data centers are cut together. This is
especially important, since the cluster is more vulnerable to “split brain”
syndromes. A split brain syndrome is where both data centers’ systems
form new clusters which could allow access to both copies of the data. This
can be prevented with physically separated network links and redundant
network components. Cluster Extension XP allows you to configure the
failover behavior in such a way that the application service startup
procedure will be stopped if none of the remote cluster members can be
reached. The default configuration of Cluster Extension XP expects the
cluster software to deal with the “split brain” syndrome.
Since the disk array stores your most valuable data, this data must get
across to the remote disk array. At least four Continuous Access XP links
must be available when the disk arrays are connected directly and are
configured for bidirectional takeover. For extended distances, extender
components must be purchased. These components are able to bundle
Continuous Access XP links. At least two links are necessary to provide
redundancy and protection against single points of failure. Although
communications links can cover considerable distances, each network
segment must be extended to the dispersed data center in order to maintain
a heartbeat among all servers.
Recommendation: Use four systems to give local application service failover among local
cluster systems priority over remote, more time-consuming failover
procedures. When failing over, Cluster Extension XP must reconfigure the
disk arrays to change the mirroring direction. This takes more time than just
checking for the correct disk array disk states. On the remote site, two
systems should be available in case the failover system experiences a
hardware or power failure.
Figure 2 (page 24) depicts a preferred Cluster Extension XP configuration.
Caution: Cluster Extension XP can operate with only one system at each location, with a
single I/O path between the server system and the disk array and a single
link in each direction between disk arrays.
However, those configurations are not considered highly available, nor are
they disaster tolerant. Therefore, Cluster Extension XP configurations with
single points of failure are not supported by HP.
Cluster Extension XP execution
Cluster Extension XP requires cluster software to automatically fail over
and fail back among systems on a local site or between sites. Cluster
Extension XP must manipulate the application startup process before disk
array disks are activated. Cluster Extension XP, therefore, must be
integrated as the first resource (in the order of resources). To activate
Continuous Access XP paired disk devices, the paired disk devices must be
in read/write mode. Continuous Access XP disks are usually in read/write
mode on the primary disk only; the secondary disk is in read-only mode. In
case of a failover, the direction of the mirrored pair is changed by Cluster
Extension XP automatically. In case of a disaster, the disk array can have
several different states for disks in a RAID Manager XP device group.
Cluster Extension XP decides whether those disks can be activated.
Cluster Extension XP must be installed on any server in the cluster that can
run the application service in the cluster.
Cluster Extension XP stores information about the application environment
in an internal object database and uses RAID Manager XP to gather
information about the state of the associated disk pairs. The information
about the configured disk array environment and failover behavior is
either passed in directly by the cluster software or read from the
user configuration file.
The internal object database provides Cluster Extension XP with
knowledge about supported parameters, their formats, and default values.
Disk array disk states are stored in an internal object database and a rule
engine is used to process those disk states. The rule engine matches current
disk states and configuration parameters with a defined rule, stores it in the
database, and invokes predefined actions. Those actions either prepare the
disk array disks for activation or stop the application service startup
process if the matching rule requires it.
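The rule-matching step can be sketched in Python as follows. This is only an illustration of the idea: the Rule fields, the state strings, the action names, and the rules themselves are hypothetical and are not taken from the internal database of Cluster Extension XP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    local_state: str               # state reported for the local disk
    remote_state: str              # state reported for the remote disk
    auto_recover: Optional[bool]   # configuration value; None matches any
    action: str                    # predefined action to invoke


# Illustrative rules only; the real rule set is internal to the product.
RULES = [
    Rule("PVOL_PAIR", "SVOL_PAIR", None, "activate"),
    Rule("SVOL_PAIR", "PVOL_PAIR", None, "swap_and_activate"),
    Rule("PVOL_PSUS", "SVOL_PSUS", True, "resync_and_activate"),
    Rule("PVOL_PSUS", "SVOL_PSUS", False, "activate_without_resync"),
    Rule("SVOL_COPY", "PVOL_COPY", None, "stop_startup"),
]


def match_rule(local_state, remote_state, auto_recover):
    """Return the action of the first matching rule. When no rule matches,
    stop the startup so that an operator can intervene."""
    for rule in RULES:
        if (rule.local_state == local_state
                and rule.remote_state == remote_state
                and rule.auto_recover in (None, auto_recover)):
            return rule.action
    return "stop_startup"
```

For example, match_rule("SVOL_PAIR", "PVOL_PAIR", False) selects the hypothetical swap_and_activate action, while an unknown combination of states falls through to stop_startup.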
Continuous Access XP and RAID Manager XP
Continuous Access XP provides remote copy functionality for the disk
arrays. Disk arrays can be mirrored to many different remote disk arrays.
Cluster Extension XP does not support two disk arrays as either primary or
secondary disk arrays. Cluster Extension XP supports configurations where
two (or more) disk arrays use one remote disk array as the failover site. In
those cases, the disk array configuration can be considered as a logical
one-to-one configuration.
Figure 4 (page 41) depicts an example of a supported configuration.
Figure 4. Supported XP disk array configuration
(Figure labels: data center in Manhattan; data center in Brooklyn; CA XP
(Extension); App RED; App BLUE; "Links to App RED are not supported.")
To control Continuous Access XP-mirrored disks from a server, RAID
Manager XP must be installed on the server. A special disk, called a
command device, must be configured to control the paired disks. The
special disk must not be part of Microsoft Cluster service resources and
cannot be paired. The command device, which is identified by a “CM”
appended to the emulation type, can be assigned to a 36-Mbyte or greater
CVS volume. RAID Manager XP uses the command device to
communicate with the disk array controller (DKC).
Using Continuous Access XP Extension, consistency groups can be
configured. Consistency groups are units in which the disk array keeps data
consistent among paired disks.
Continuous Access XP links are unidirectional links. For disaster tolerant
configurations, two links must be provided in each direction. Both sender
(RCP) and receiver (LCP) ports must be configured on each redundant IO
board used for Continuous Access XP.
Continuous Access XP offers two modes of replication:
• synchronous replication
• asynchronous replication
Synchronous replication
Using synchronous mode, all write requests from the server are first
transferred to the remote disk array. After each IO has been mirrored in the
cache area of the remote array, it is acknowledged to the local disk array.
The write request is then acknowledged to the server.
Synchronous replication modes can be configured in the following fence
levels:
NEVER   Allows write requests even if the request cannot be
        replicated to the remote disk array. If a write request
        cannot be replicated to the remote disk array, the area on
        the disk is marked in a bitmap table and transferred
        after a resynchronization request has been issued.
STATUS  This fence level is not supported by Cluster Extension
        XP.
DATA    Prohibits write requests immediately if a link failure or
        disk failure occurs. The local disk array cannot replicate
        data to the remote disk array. Fence level DATA
        provides data concurrency at any time.
The preceding fence levels provide data integrity on a per disk basis, so a
failure affecting a single disk pair does not lead to a halt of the replication
activities of non-affected disk pairs.
Synchronous replication can affect the performance of the system if the
distance between the disk arrays is significant.
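The difference between the NEVER and DATA fence levels can be illustrated with a simplified Python model. The function name, its parameters, and the use of a set as the bitmap table are assumptions for this sketch; the disk array itself is not controlled this way.

```python
def handle_write(fence_level, link_up, bitmap, track):
    """Return True if the local array accepts the write.

    Simplified model of the NEVER and DATA fence levels; STATUS is
    rejected because Cluster Extension XP does not support it.
    """
    if fence_level not in ("NEVER", "DATA"):
        raise ValueError(f"unsupported fence level: {fence_level}")
    if link_up:
        return True          # the write is mirrored before it is acknowledged
    if fence_level == "NEVER":
        bitmap.add(track)    # remember the track for a later resynchronization
        return True
    return False             # DATA: the write request is prohibited


pending = set()
accepted_never = handle_write("NEVER", link_up=False, bitmap=pending, track=7)
accepted_data = handle_write("DATA", link_up=False, bitmap=set(), track=7)
```

With the link down, the NEVER write is accepted and track 7 is marked for later transfer, while the DATA write is refused so that both sides stay identical.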
Asynchronous replication
Continuous Access XP Extension offers a unique feature to replicate data
asynchronously.
To keep replicated data consistent among two disk arrays, any incoming
write request is ordered and numbered. The write request is then
acknowledged to the server, offering the fastest response time for remote
mirroring. Each write request is transferred to the remote disk array
asynchronously. The remote array orders all write requests before they are
destaged to the disk, keeping data consistent.
Asynchronous replication offers excellent performance for remote
mirroring and provides data consistency on a group of disks (consistency
groups) level.
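The ordering step on the remote array can be sketched in Python as shown below. The RemoteArray class, its attributes, and the use of a heap are assumptions chosen for illustration; the sketch only demonstrates the principle that numbered writes are applied strictly in sequence, even when they arrive out of order.

```python
import heapq


class RemoteArray:
    """Hypothetical receiver that applies writes strictly in sequence order."""

    def __init__(self):
        self.pending = []     # min-heap of (sequence number, track, data)
        self.next_seq = 1     # next sequence number that may be destaged
        self.disk = {}

    def receive(self, seq, track, data):
        heapq.heappush(self.pending, (seq, track, data))
        # Destage every write whose predecessors have all arrived.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready_track, ready_data = heapq.heappop(self.pending)
            self.disk[ready_track] = ready_data
            self.next_seq += 1


remote = RemoteArray()
# Writes 1 to 3 were numbered on the local array but arrive out of order.
remote.receive(2, track=5, data="second")
remote.receive(3, track=5, data="third")
state_while_waiting = dict(remote.disk)   # nothing destaged yet
remote.receive(1, track=5, data="first")
```

Until write 1 arrives, nothing is destaged, so the remote disk never shows a later write without the earlier ones it depends on.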
RAID Manager XP instances
A RAID Manager XP instance is necessary to control pair operations and to
gather disk array status information.
The RAID Manager XP instance numbers used for the
RaidManagerInstances object must be the same among all systems using
Cluster Extension XP.
Several RAID Manager XP instances can be configured to provide
additional redundancy. Cluster Extension XP switches to the next available
instance when an instance becomes unavailable.
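The switch to the next available instance can be pictured with a short Python sketch. The function pick_instance and the is_available callback are hypothetical; how Cluster Extension XP actually probes a RAID Manager XP instance is not described here.

```python
def pick_instance(configured_instances, is_available):
    """Return the first configured instance number that responds,
    or None if no configured instance is available."""
    for instance in configured_instances:
        if is_available(instance):
            return instance
    return None


# Example: instances 101 and 102 are configured, only 102 is running.
running = {102}
chosen = pick_instance([101, 102], lambda number: number in running)
```

The order of the configured list determines which instance is preferred, which is one reason the instance numbers must match on all systems.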
The RAID Manager XP instances should be running at all times to provide
the fastest failover capability. Cluster Extension XP provides scripts to
include the RAID Manager XP startup procedure in the system startup file
(for example, /etc/inittab). However, Cluster Extension XP starts the
configured RAID Manager XP instances if it cannot find any running
instance.
Quorum service
The Cluster Extension XP quorum service employs static RAID Manager
API calls and therefore is not dependent on a RAID Manager instance.
RAID Manager XP device groups
A single device group must be configured for a service group (VCS), a
resource group (HACMP), a cluster group (Microsoft Cluster service), or a
package (SG-LX). This device group must include all disks being used for
the application service.
The device group is the unit in which the failover/failback operation is
being carried out. A device group can contain several volume groups.
Rolling disaster protection and Business Copy XP
To implement rolling disaster protection, you must create Business Copy
XP disk pairs for the Continuous Access XP disk pairs locally. BC disk
pairs used for rolling disaster protection must be created with the
–m noread option of the paircreate command. This ensures that BC disks
are unavailable to other services, because these disks are intended to be
used for rolling disaster protection only. The BC SVOLs must be mapped
to a backup server and not to the local cluster node. When Cluster
Extension XP suspends the BC pairs, they become available to the local
server, which could result in duplicated volume or disk group IDs or
signatures.
To enable rolling disaster protection with Business Copy XP, set the
following objects for data centers A and B:
• BCEnabledA page 79
• BCEnabledB page 79
When these objects are set to YES, rolling disaster protection is enabled
and Cluster Extension XP checks whether the configured Business Copy
XP disk pairs are in PAIR state. Before initiating the resynchronization
operation, Cluster Extension XP suspends specified Business Copy XP disk
pairs that are in PAIR state.
If the BCEnabledA and BCEnabledB objects are set to YES, you must
configure specific Business Copy XP disk pairs by using MU (mirror unit)
numbers. The MU number defines one of the many disk pair relationships
you can create with Business Copy XP disk pairs. You can specify any
number of MU numbers that are supported by the Business Copy XP
software. Disk pair MU numbers are specified by the following objects for
data centers A and B:
• BCMuListA page 79
• BCMuListB page 80
To enable resynchronization of Business Copy XP disk pairs that have been
split by Cluster Extension XP, use the following objects for data centers A
and B:
• BCResyncEnabledA page 79
• BCResyncEnabledB page 80
Cluster Extension XP maintains a list of all associated Business Copy XP
disk pairs that were in PAIR state before a resynchronization attempt. If
pairs were suspended, Cluster Extension XP automatically resynchronizes
those disk pairs after the Continuous Access XP remote mirrored disk pairs
have been paired. This feature supports automatic resynchronization of
locally split BC disk pairs only. You must specify MU numbers for
resynchronization by using the following objects for data centers A and B:
• BCResyncMuListA page 80
• BCResyncMuListB page 80
Caution: If rolling disaster protection is enabled and none of the Continuous Access
XP mirrored disk pairs have a Business Copy disk pair that is in PAIR state,
Cluster Extension XP returns a global error and you will not be able to
activate the application service. Ensure that at least one Business Copy
disk pair is in PAIR state.
You can use the force flag to start the application service. See “Force flag”.
In this case, Cluster Extension XP disables rolling disaster protection.
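The decision described in this section can be summarized in a Python sketch. The function prepare_resync, its return tuple, and the PSUS state string used in the example are assumptions for illustration; only the PAIR requirement and the force flag behavior are taken from the text above.

```python
def prepare_resync(bc_pair_states, force_flag=False):
    """Decide how a resynchronization may start.

    bc_pair_states maps an MU number to the state of that Business Copy
    pair, for example {0: "PAIR", 1: "PSUS"}.
    Returns (proceed, mu_numbers_to_suspend, protected).
    """
    in_pair = sorted(mu for mu, state in bc_pair_states.items()
                     if state == "PAIR")
    if in_pair:
        # Suspend these pairs first, then resynchronize.
        return True, in_pair, True
    if force_flag:
        # Service starts, but rolling disaster protection is disabled.
        return True, [], False
    # Global error: the application service is not activated.
    return False, [], False


protected_start = prepare_resync({0: "PAIR", 1: "PSUS"})
blocked_start = prepare_resync({0: "PSUS"})
forced_start = prepare_resync({0: "PSUS"}, force_flag=True)
```

The three calls correspond to the normal case, the global error case, and the forced start without protection.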
Integration with RAID Manager XP
Rolling disaster protection does not require Business Copy XP disk pairs to
be defined in the RAID Manager XP horcmX.conf files that are used by
Cluster Extension XP. Cluster Extension XP uses the MU number to
monitor and control associated Business Copy XP pairs.
However, you must create a RAID Manager XP configuration file to
control the Business Copy XP disk pairs, which are outside of Cluster
Extension XP control.
The management of Business Copy XP disk pairs is independent of Cluster
Extension XP/Continuous Access XP remotely mirrored disk pairs.
Cluster Extension XP uses the MU number to control the Business Copy
disk pairs. Therefore, only the RAID Manager XP instances that are
configured for Cluster Extension XP are required for rolling disaster
protection.
The Rolling Disaster Protection feature cannot suspend Business Copy XP
disk pairs on the XP family disk array in the remote data center if the RAID
Manager XP instance in the remote data center is not running or not
reachable.
Integration with automatic recovery
If the AutoRecover object is set to YES, Cluster Extension XP
automatically resynchronizes the Continuous Access XP disk pairs to
update the remote disks. If rolling disaster protection is enabled, it suspends
the Business Copy XP disk pair that is attached to the remote Continuous
Access XP disk.
If this remote Business Copy XP disk pair cannot be suspended because the
remote RAID Manager XP instance is not running or cannot be reached,
Cluster Extension XP continues the application service activation (bringing
the Cluster Extension XP resource online) without automatic resynchronization of
the Continuous Access XP disk pair and without the suspension of the
Business Copy XP disk pair.
In this case, the Continuous Access XP disk pair must be recovered
manually.
Integration with the pair/resync monitor
If the ResyncMonitor object is set to YES, Business Copy XP disk pairs
are not used when the pair/resync monitor automatically recovers
suspended or failed Continuous Access XP disk pairs.
To protect the remote volume of an out-of-sync Continuous Access XP disk
pair against rolling disasters, use the default settings for the pair/resync
monitor. Resynchronize the Continuous Access XP disk pair manually after
splitting off the Business Copy XP disk pair.
Restoring server operation
Rolling disaster protection automatically recovers the PAIR state of the
Continuous Access XP disk pair of an application service. Before you
fail over (or fail back) an application service from one data center to the
other, you must restore the server operation. After you restart the server,
also start the RAID Manager instance used to manage the Continuous
Access XP disk pairs on those servers. This enables rolling disaster
protection to work correctly during a recovery failover/failback operation.
Example
Figure 5 (page 49) depicts an example of a fully configured Cluster
Extension XP environment that uses rolling disaster protection. The
Business Copy disk pairs are specified as 0 in the Cluster Extension XP
BCMuListA and BCMuListB objects.
Figure 5. Example of a disaster-tolerant configuration with rolling disaster protection
(Figure labels: RAID Manager XP configuration file; instance 101
(horcm101.conf) manages both the CA disk pairs and BC disk pairs; instance 5
(horcm5.conf) manages the BC disk pairs locally.)
User configuration file
Cluster Extension XP provides a user configuration file to customize
Cluster Extension XP failover/failback behavior. The user can specify all
customizable objects of Cluster Extension XP with this file.
Related information: “The user configuration file”
Pair/resync monitor
The pair/resync monitor clxchkd utility can be turned on or off with the
ResyncMonitor object.
The pair/resync monitor can either monitor or both monitor and
resynchronize the state of the RAID Manager XP device group for an
application service. The cluster software must be able to stop the
monitoring or resynchronization process if the application service is
stopped.
If the ResyncMonitorAutoRecover object is set to YES, the monitor tries
to resynchronize the remote disk based on the local disk. This occurs only if
the disks are in a PVOL/SVOL or SVOL/PVOL relationship. If one or both
disk peers are in the state SMPL or the device group state is mixed,
automatic resynchronization is not initiated.
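The conditions for automatic resynchronization can be expressed as a small Python predicate. The function and parameter names are hypothetical; the sketch encodes only the conditions stated above and not the monitor's actual implementation.

```python
def should_auto_resync(auto_recover, local_role, remote_role, group_state_mixed):
    """Return True only when the monitor would attempt a resynchronization.

    Roles are "PVOL", "SVOL", or "SMPL". A mixed device group state or a
    SMPL peer on either side prevents automatic resynchronization.
    """
    if not auto_recover or group_state_mixed:
        return False
    return {local_role, remote_role} == {"PVOL", "SVOL"}


normal_pair = should_auto_resync(True, "PVOL", "SVOL", False)
reversed_pair = should_auto_resync(True, "SVOL", "PVOL", False)
unpaired = should_auto_resync(True, "SMPL", "SVOL", False)
```

Both pair directions qualify, while the SMPL case does not, matching the PVOL/SVOL and SVOL/PVOL wording above.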
The monitor is started from Cluster Extension XP the first time that Cluster
Extension XP checks the disk states. Any subsequent execution of the
monitor program adds the RAID Manager XP device group to be monitored
to the list of to-be-monitored device groups. The monitor interval specified
with the ResyncMonitorInterval object is used to monitor the device
group state. Do not set the monitor interval below the RAID Manager XP
timeout parameter (HORCM_MON in the horcmX.conf file).
Caution: If the application service must be stopped, the cluster software or your
customized solution to start and stop the application service must be able to
stop the monitoring or resynchronization process. If this cannot be ensured,
the use of the pair/resync monitor is not supported. It is highly
recommended to disable application service failover for the time of the disk
pair recovery (resynchronization). Cluster Extension XP assumes that if the
monitor is enabled, immediate action will be taken to recover a reported
suspended disk pair. If at any time the resynchronization process is running
on both disk array sites, data corruption can occur.
The ResyncMonitorAutoRecover option set to YES is supported with this
monitor only if the minimum disk array firmware version is 01-11-xx
(XP512/XP48) or 21.01.xx (XP128/XP1024), and the minimum RAID
Manager XP version is 01.04.00.
The pair/resync monitor uses the syslog() facility (Linux/UNIX) and the
Event Log (Windows) to inform you if the link for the device group is
broken. A broken link is recognized only when data is written to disk;
otherwise, the data is the same on the primary and secondary disk and
therefore the device group state is reported as PAIR.
Force flag
The force flag forces Cluster Extension XP to skip the internal logic and
enables write access to the local volume regardless of the disk pair state.
This flag can be set when you are sure that the current site contains the
latest data, even though a previous application service startup process failed
because Cluster Extension XP discovered a disk pair status that could not
be handled automatically.
To use the force flag feature, you must create a file called
application_name.forceflag in the directory specified by the
ApplicationDir object prior to starting the application service that uses
Cluster Extension XP. Before you create this file, ensure that the
application service is not running elsewhere.
This file will be removed after Cluster Extension XP detects the file.
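Creating the force flag file could be scripted as in the following Python sketch. The helper create_force_flag, the application name myapp, and the temporary directory standing in for the ApplicationDir directory are assumptions for this example; the sketch also assumes that an empty file is sufficient, which the text above does not state.

```python
import tempfile
from pathlib import Path


def create_force_flag(application_dir, application_name):
    """Create <application_name>.forceflag in the given directory.

    Call this only after confirming that the application service is not
    running elsewhere.
    """
    flag = Path(application_dir) / f"{application_name}.forceflag"
    flag.touch(exist_ok=True)
    return flag


# Demonstration in a temporary directory instead of a real ApplicationDir.
with tempfile.TemporaryDirectory() as demo_dir:
    flag_path = create_force_flag(demo_dir, "myapp")
    flag_created = flag_path.exists()
    flag_name = flag_path.name
```

In a real environment the directory would be the value of the ApplicationDir object, and the file would have to exist before the application service is started.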
You cannot use the force flag if the local disk state is SVOL_COPY, which
indicates that a copy operation is in progress. A disk cannot be activated
when a write operation is in progress to that disk; therefore, Cluster
Extension XP returns a global error.
Using the force flag does not enable the automatic recovery features of
Cluster Extension XP. After using the force flag, you must recover the
suspended or broken disk pairs by using RAID Manager XP commands as
described in “Recovery sequence” (page 228).
Pre-execution and post-execution programs
Cluster Extension XP can invoke pre-execution and post-execution
programs prior to or after a Cluster Extension XP takeover function. Those
programs can be any executable, and must be able to provide return codes
to Cluster Extension XP. If the programs add significant execution time to
the application service startup process, the timeout values for the startup
process must be adjusted in the cluster software.
Cluster Extension XP transfers information as command-line arguments to
the pre-execution and post-execution programs. Pre-executables and
post-executables must be specified by full path in the PreExecScript and
PostExecScript objects. If no executable is specified (empty value for the
object), no preprocessing or postprocessing, respectively, is done.
The following arguments are transferred to the scripts in this order:
The pre-executables and post-executables must return a return code. These
return codes are used to determine whether a takeover function must be
called.
Pre-executable return codes
0   PRE_OK_TAKEOVER
    Pre-executable ok and takeover action allowed.
1   PRE_ERROR_GLOBAL
    Pre-executable failed; no takeover; stop application
    service cluster-wide.
2   PRE_ERROR_DC
    Pre-executable failed; no takeover; stop application
    service in this data center.
3   PRE_ERROR_LOCAL
    Pre-executable failed; no takeover; stop application
    service on this system.
4   PRE_ERROR_TAKEOVER
    Pre-executable failed; takeover action allowed.
5   PRE_OK_NOTKVR_NOPST
    Pre-executable ok; no takeover; no post-exec.
Caution: If the pre-execution program returns 1, 2, 3, or 5, a properly configured
post-executable will not be executed. If a takeover function fails, the
post-executable will not be executed.
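The effect of the pre-executable return codes can be summarized in a Python sketch. The enumeration reuses the return code names listed above; the function plan and its takeover_succeeds parameter are hypothetical and only model the documented behavior.

```python
from enum import IntEnum


class PreExec(IntEnum):
    PRE_OK_TAKEOVER = 0
    PRE_ERROR_GLOBAL = 1
    PRE_ERROR_DC = 2
    PRE_ERROR_LOCAL = 3
    PRE_ERROR_TAKEOVER = 4
    PRE_OK_NOTKVR_NOPST = 5


# Only return codes 0 and 4 allow the takeover function to be called.
TAKEOVER_ALLOWED = {PreExec.PRE_OK_TAKEOVER, PreExec.PRE_ERROR_TAKEOVER}


def plan(pre_code, takeover_succeeds=True):
    """Return (run_takeover, run_post_exec) for a pre-executable return code.

    The post-executable runs only if the takeover was allowed and did
    not fail.
    """
    run_takeover = PreExec(pre_code) in TAKEOVER_ALLOWED
    run_post_exec = run_takeover and takeover_succeeds
    return run_takeover, run_post_exec
```

For example, plan(5) yields neither a takeover nor a post-executable run, and plan(0, takeover_succeeds=False) skips the post-executable because the takeover failed.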
Post-execution return codes
0   POST_OK
    Post-executable ok; continue.
1   POST_ERROR_GLOBAL
    Post-executable failed; stop application service
    cluster-wide.
2   POST_ERROR_DC
    Post-executable failed; stop application service in this
    data center.
3   POST_ERROR_LOCAL
    Post-executable failed; stop application service on this
    system.
4   POST_ERROR_CONTINUE
    Post-executable failed; continue without error.
Caution: Windows 2000 script and batch files return 0 if the program was
successfully executed, even if you return a different return code.
Cluster Extension XP log facility
The logging module of Cluster Extension XP provides log messages to the
cluster software as well as to the Cluster Extension XP log file. The Cluster
Extension XP log file includes disk status information.
The Cluster Extension XP log file is located in this directory:
Linux/UNIX: /var/opt/hpclx/log
Windows: By default, this location is defined as this value:
For the quorum service, the log resides at this location:
%systemroot%\clxq.log
If the log file needs to be cleared and reset, for example, to reduce disk
space usage, archive the log file and then delete it. A new log file is
generated automatically.
Related information: For information about log levels, see “LogLevel”.
Error return codes
Cluster Extension XP provides the following error return codes for failover
operations:
local error: Prohibits an application service startup on the local
system. This can be caused by the inability of Cluster
Extension XP to enable disk access, or
misconfiguration of the disk array environment.
data center error: Prohibits an application service startup on any system
in the local data center. This error is returned if the disk
state indicates that it makes no sense to allow any other
system connected to the same disk array to access the
disks.
global error: A global error is returned if the configuration or the
disk state does not allow an automatic application
service startup process. Manual intervention is required
in such cases.
When Cluster Extension XP is integrated, an error message string and
integer value are displayed. For the command-line interface, a return code
is displayed. For more information, see “CLI Commands” on page 195.
Quorum service for Microsoft Cluster service
Microsoft Cluster service uses a single SCSI disk called the quorum disk to
eliminate the potential split brain condition and to coordinate
administrative actions performed by cluster nodes. The quorum disk
represents a possible single point of failure (SPOF) because when it fails or
communication with it is lost, the whole cluster shuts down. If this SPOF
is not eliminated, the cluster is at risk even when geographically dispersed.
The quorum service resolves the SPOF problem with the quorum disk. It
employs the HP StorageWorks Continuous Access XP (CA) technology to
remotely mirror the quorum disk and extends the Microsoft Cluster service
functions to manage this mirrored quorum disk.
The quorum service performs two major functions:
• To manage the mirrored quorum disk pair during regular cluster
operations and failover.
It detects quorum disk operations from Microsoft Cluster service and
swaps the disk pair in a timely manner before Microsoft Cluster
service moves the quorum ownership to the mirrored side or before a
failover to the mirrored side occurs.
• To avert a split brain scenario should part of the cluster become
completely isolated from the rest of the cluster.
When cluster nodes on the mirrored side of a dispersed cluster lose all
connections (including all heartbeats and CA links), the quorum
service uses external arbitration to address the potential split brain
problem.
Using the quorum service in a Microsoft Cluster service environment
In general, there are two types of disks used in a Microsoft Cluster service
environment: the application data disk and the quorum disk. The quorum
disk is a special case of an application disk.
While the Cluster Extension XP resource DLL provides disaster recovery
(DR) and high availability (HA) support for Microsoft Cluster service
application disks using the same CA technology, the quorum service
focuses on protecting the quorum disk, which functions somewhat
differently from data disks. In most circumstances, this protection is critical
for the cluster’s control resource. If the quorum disk fails, the whole cluster
will be unavailable, regardless of how well the application data disks are
protected.
Quorum processes
The quorum service has few configuration options. After it is installed, the
service works seamlessly with the Microsoft Cluster service and the cluster
applications.
Install and use the quorum service together with the Cluster Extension XP
resource type to provide complete coverage for the cluster.
The quorum service creates a synchronous CA disk pair for the quorum
disk to prevent it from being a SPOF. When the primary quorum disk fails,
the quorum service allows the cluster to use the mirrored (secondary)
quorum disk to continue cluster operations. Specifically, the service
extends the Microsoft Cluster service quorum management protocol to:
• maintain the existing ownership in normal operations
• decide the new ownership when the original owning host node fails
• fail over ownership when the primary quorum disk fails in the context
of mirrored quorum disks
One major task of the quorum service is to swap the quorum pair. Usually
the primary disk has both read and write permissions. However, the
mirrored disk on the secondary side has no write permission. To use the
secondary quorum disk as a quorum, the service swaps the direction of the
disk pair to make the quorum disk at the second site writable before
Microsoft Cluster service starts to use it. The quorum service interacts with
Microsoft Cluster service at the I/O operation level. When it finds the
Microsoft Cluster service is going to use the secondary side quorum disk, it
swaps the disk pair for Microsoft Cluster service first. It uses the RAID
Manager XP library to communicate with the XP disk array.
The quorum service also moderates ownership of the quorum resource.
More than one node can request ownership simultaneously. To coordinate
those nodes between dispersed locations and to eliminate the dependency
on the networks, the quorum service uses three sets of small volumes
(disks) as control devices to manage quorum ownership and coordination.
The first is always in a paired state; the second is not paired; and the third
changes its paired state dynamically. Because they require little space, you
can minimize resource requirements by creating CVS (custom volume size)
disks for use as control devices.
Cluster operation continues even though access to the quorum disk is lost
on cluster nodes that do not own the quorum disk. On those nodes, the
quorum service (ClxQSvc) continues to monitor the quorum filter driver in
case I/O access is lost to the XP disk system. However, logging to the
clxq.log file will be suspended until the quorum filter driver detects the
arrival of the missing quorum disk. This behavior follows the Microsoft
Cluster service behavior, which means that the Microsoft Cluster service
continues to run but the cluster node will not be able to become a quorum
disk owner in case the original quorum disk owner fails. The quorum
service and the quorum filter driver log this type of incident to the Event
Log to inform you properly.
If the quorum service exits without being gracefully stopped by the
Windows Service Control Manager (SCM), the SCM will automatically
restart the quorum service under virtually all possible conditions.
The quorum service is set to wait for 30 seconds for Microsoft Cluster
service startup activity. If the Cluster service does not go into
START_PENDING state during this timeframe, the quorum service will
stop automatically. The actual interval for Cluster Service startup checks is
1 second. It will continue checking every second until the Cluster service
changes to START_PENDING state. START_PENDING is an internal
state. The Service window will show STARTING as service status.
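The startup wait described above amounts to a bounded polling loop. The sketch below illustrates that logic only; get_state is a hypothetical callable standing in for whatever query the quorum service uses, and the state string and parameter names are assumptions made for the example.

```python
import time
from typing import Callable


def wait_for_start_pending(
    get_state: Callable[[], str],
    wait_timeout: float = 30.0,    # documented default wait, in seconds
    check_interval: float = 1.0,   # documented check interval, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll once per interval until the Cluster service reports
    START_PENDING or the wait timeout is used up.

    Returns True if START_PENDING was seen, False if the quorum service
    would give up and stop itself.
    """
    elapsed = 0.0
    while elapsed < wait_timeout:
        if get_state() == "START_PENDING":
            return True
        sleep(check_interval)
        elapsed += check_interval
    return False
```

In this sketch, wait_timeout plays the role of the value that the /waitforclussvc start parameter changes.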
It is recommended that you start ClxQSvc (the quorum service) through its
enforced dependency to the Cluster service. This means you should start
the Cluster service first, which will cause the quorum service to start
automatically before the Cluster service starts.
The wait timeout is the total amount of time (in seconds) that the quorum
service will wait for the Cluster service to start up, before giving up and
shutting itself down. This value should be sufficient for all cases. However,
you can change the value during manual service startups as follows:
1. Open the Services window.
2. Right-click ClxQSvc.
3. Select Properties.
4. In the Start parameters: field, enter
/waitforclussvc <number in seconds>
5. Press Start in the properties window.
The quorum service stops automatically if the Cluster service stops.
The quorum service checks during its initialization if the configured disk
pairs for the quorum service control disk 1 (STATUS) and quorum disk are
established. The quorum service will not start and does not allow the
Cluster service to start if those disk pairs are not in PAIR state.
For emergency startup of the Cluster Service in case all quorum arbitration
functions are unavailable or the disks cannot be paired, the quorum service
can be started manually:
1. Open the Services window.
2. Right-click ClxQSvc.
3. Select Properties.
4. In the Start parameters: field, enter
/createsplitbrain
5. Click the Start button in the properties window. This could create two
separate clusters.
Caution: The creation of separate clusters can result in data loss as explained below.
Clicking the Start button in the Services window can lead to a split brain
syndrome, where the cluster runs the same applications on two different
sets of disks. This can happen if the cluster is running or restarted in the
remote data center and any number of cluster nodes are isolated from each
other.
A split brain condition occurs when one site of the cluster loses all the
connectivity with the second site and each site decides to form its own
cluster. The serious consequence of the split brain scenario is the corruption
of business data because client data will no longer be consistent. To
eliminate the split brain syndrome in cases where there is a total loss of
communications between sites, the quorum service employs an external
arbitration mechanism.
The external arbitrator runs on a node on your intranet, external to either
cluster site, but accessible by all cluster nodes. Its major function is,
upon request, to check the cluster status by communicating with the cluster
nodes. This works as long as the arbitrator is reachable by one data center,
even when one cluster site loses all connectivity with the other site. The
arbitrator provides the cluster nodes with sufficient information to make
the critical decision on whether to form a new cluster, thereby avoiding a
split brain condition.
In addition to the external arbitrator, two processes on each cluster node
work closely with the arbitrator. One process is created dynamically on a
cluster node located at the site not holding the quorum disk when it detects
that it has lost all connectivity to the site holding ownership of the quorum
disk. Its major function is to communicate with the arbitrator to determine
whether the cluster is still functioning. Based on the status of the cluster
and which site owns the quorum disk, it may decide to form a new cluster,
leave the cluster down, or join an existing cluster. This process is called the
“cluster decision maker.”
The second process is created dynamically on the host node owning the
quorum resource when the node detects that the mirror link between two
sites goes down. The main purpose of this process is to shut down the
cluster on the site that was controlling the quorum disk when that site
became completely isolated from the external network. This prevents a
potential split brain condition when the connectivity between the two sites
is restored, and the formerly isolated copy of the cluster suddenly becomes
available to the external network. This process is called the “isolation
checker.” After it finishes, the isolation checker restarts if the CA
XP link is still down. This can cause the Cluster service to stop if the
network link between the arbitrator and the quorum owner in the cluster
fails after all heartbeat network links and all CA XP links have failed.
If a broken CA XP link is detected by the quorum service, messages will be
logged to the Event Log that show whether the cluster decision maker or
the isolation checker is running.
Caution: Do not start the Cluster service while the cluster decision maker or the
isolation checker is running. The cluster decision maker and the isolation
checker will log start and stop messages with the results of their findings to
the Event Log.
The quorum service supports two ways to retrieve the required information
about the external arbitrator, such as its IP address and port number. One is
through a local configuration file, and the other is through the Active
Directory server. During installation, the user is asked whether an Active
Directory service is available. The quorum service will use the Active
Directory if it is available. Otherwise, it will use its local configuration file
generated by the installation process.
The quorum service is implemented as a Windows service. To ensure that
the service is always available for the quorum disk, a startup dependency is
imposed on the cluster service by the quorum service. The quorum service
must start and remain functioning during the entire time that the cluster is
running. If it is forced to stop, it will also stop the cluster service on that
server, ensuring that the quorum disk pair is always properly managed by
the quorum service.
3 User configuration file and Cluster Extension XP objects
Objects define the disk array environment and failover/failback behavior.
Objects can be customized in the user configuration file or directly in the
cluster software.
The user configuration file
Cluster Extension XP uses the user configuration file to gather application
service-specific information. This file describes the dependencies between
application services and RAID Manager XP device groups in one file for all
application services in the cluster. This file must be copied to all nodes that
use Cluster Extension XP.
The user configuration file must be placed in the configuration directory:
Linux
UNIX
Windows: By default, this location is defined as this value:
Related information: “Basic configuration example” (page 87)
“User configuration file for HACMP” (page 101)
“Creating and configuring the user configuration file” (page 194)
HACMP
The UCF.cfg file is required for IBM HACMP. A single UCF.cfg file must
be maintained and copied to all systems using Cluster Extension XP. The
UCF.cfg includes a “common” section to configure the Cluster Extension
XP environment and an “application” section to configure the application
service-dependent failover/failback behavior. The application section is a
multitag component; the APPLICATION tag and application-related
objects can appear numerous times in the UCF.cfg.
Microsoft Cluster service
Cluster Extension XP integration with Microsoft Cluster service does not
require a user configuration file when the standard environment for Cluster
Extension XP is used. The Cluster Extension XP objects that are integrated
with Microsoft Cluster service are configurable as resource private properties
in the cluster software.
Related information: “Configuring Cluster Extension XP resources” (page 117)
VCS
Cluster Extension XP integration with VERITAS Cluster Server does not
require a user configuration file when the standard environment for Cluster
Extension XP is used. The Cluster Extension XP objects that are integrated
with VERITAS Cluster Server are configurable as resource attributes in the
cluster software.
Related information: “Configuring the Cluster Extension XP resource” (page 154)
SG-LX
An environment configuration file is required for Serviceguard. The file
must reside in the same directory as the package control file and is
identified by the package name:
package_name_clx.env
The APPLICATION tag is required, although no value is required.
Related information: “Configuration of the Cluster Extension XP environment” (page 174)
File structure
The configuration file comprises a common section and application
sections. These sections are distinguished by control tags. Cluster
Extension XP uses the following objects as control tags:
•COMMON
• APPLICATION
User configuration file and Cluster Extension XP objects67
Objects have one of the following formats:
tag: a definition of an object, for example, COMMON or APPLICATION.
integer: a number, for example, a timeout value.
string: a name, which can include alphabetic and numeric characters and
underscores, for example, an application startup value.
list: a list of space-separated strings, for example, a list of host names
(lists of numbers are stored as lists of strings).
Specifying object values
When using the default configuration, you must provide values for these
five objects:
• DC_A_Hosts
• DC_B_Hosts
• DeviceGroup
• RaidManagerInstances
• XPSerialNumbers
You do not need to change the default settings unless you want to change
the degree of protection for your paired disks. If you change an object, you
may need to change additional objects as well. For example, if you change
the FenceLevel object to DATA, you might need to change the DataLoseMirror object also.
Objects are supported according to the requirements or capabilities of the
cluster software, as listed in table 1 (page 69).
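The structural rules in this section (COMMON appears at most once, objects appear once per section, and each application needs the five required objects) can be checked mechanically. The following sketch works on an in-memory list of (name, value) pairs in file order rather than on the file itself, because the on-disk syntax is not shown here; the function name validate_entries and the sample values are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Objects marked as required in table 1.
REQUIRED_OBJECTS = (
    "DC_A_Hosts", "DC_B_Hosts", "DeviceGroup",
    "RaidManagerInstances", "XPSerialNumbers",
)


def validate_entries(entries: List[Tuple[str, str]]) -> List[str]:
    """Return a list of problems found in (name, value) pairs."""
    problems: List[str] = []
    common_count = 0
    current: Optional[str] = None       # label of the section being read
    seen: Dict[str, Set[str]] = {}      # section label -> object names seen
    for name, value in entries:
        if name == "COMMON":
            common_count += 1
            if common_count > 1:
                problems.append("COMMON tag appears more than once")
            current = "COMMON"
            seen.setdefault(current, set())
        elif name == "APPLICATION":
            current = f"APPLICATION {value}".strip()
            seen.setdefault(current, set())
        elif current is None:
            problems.append(f"{name} appears before any COMMON or APPLICATION tag")
        elif name in seen[current]:
            problems.append(f"{name} appears more than once in {current}")
        else:
            seen[current].add(name)
    # Every application section needs all required objects.
    for section, names in seen.items():
        if section.startswith("APPLICATION"):
            for required in REQUIRED_OBJECTS:
                if required not in names:
                    problems.append(f"{section} is missing {required}")
    return problems


if __name__ == "__main__":
    sample = [("APPLICATION", "app1"), ("DeviceGroup", "dg1")]
    for problem in validate_entries(sample):
        print(problem)
```

A check of this kind is only a convenience before copying the file to all nodes; it does not replace the validation that Cluster Extension XP performs itself.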
Table 1. Cluster Extension XP supported objects
Support columns, in order: CLI, HACMP, MS Cluster service, VCS, SG-LX

Name                        Page  Supported
COMMON                      71    •••••
LogDir                      71    •••••
LogLevel                    71    •••••
SearchObject                72    • (HACMP only)
VcsBinPath                  72    • (VCS only)
APPLICATION                 74    •••••
ApplicationDir              74    ••••
ApplicationStartup          75    •••••
AsyncTakeoverTimeout        77    •••••
AutoRecover                 78    •••••
BCEnabledA                  79    •••••
BCEnabledB                  79    •••••
BCMuListA                   79    •••••
BCMuListB                   79    •••••
BCResyncEnabledA            79    •••••
BCResyncEnabledB            80    •••••
BCResyncMuListA             80    •••••
BCResyncMuListB             80    •••••
DataLoseDataCenter          80    •••••
DataLoseMirror              81    •••••
* DC_A_Hosts                82    •••••
* DC_B_Hosts                82    •••••
* DeviceGroup               82    •••••
FastFailbackEnabled         83    •
FenceLevel                  83    •••••
Filesystems                 83    ••
PostExecCheck               84    •••••
PostExecScript              84    •••••
PreExecScript               84    •••••
* RaidManagerInstances      84    •••••
ResyncMonitor               85    ••••
ResyncMonitorAutoRecover    85    ••••
ResyncMonitorInterval       85    ••••
ResyncWaitTimeout           86    •••••
Vgs                         86    •••••
* XPSerialNumbers           86    •••••

LEGEND
* Required
• Supported
COMMON section objects
The common part is used to set the environment of Cluster Extension XP.
The COMMON tag is a single-tag; it can appear in the configuration file
only once. The common object does not require any value.
Objects of the type common can only appear once. Those objects must be
placed after the COMMON tag in the configuration file.
If the default values fit your environment, there is no need to specify them
in the file.
COMMON
Format: tag
Description: Distinguishes between general (common) and application-specific objects.
LogDir
Format: string
Description: (Optional) Defines the path to the Cluster Extension XP log file.
LogLevel
Description: (Optional) Defines the logging level used by Cluster Extension XP.
Valid values:
error (default)  Logs only error messages for events that are
                 nonrecoverable.
warning          Logs error messages and warning messages for
                 events that are recoverable.
info             Logs error messages, warning messages, and
                 additional information, such as disk status.
debug            Logs error messages, warning messages, info
                 messages, and messages that report on execution
                 status, useful for troubleshooting.
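The four LogLevel values are cumulative: each level logs everything the previous level logs plus one more message class. A minimal sketch of that ordering follows; the function name is_logged is hypothetical and is not part of Cluster Extension XP.

```python
# Levels in increasing verbosity, as listed under Valid values.
LEVELS = ("error", "warning", "info", "debug")


def is_logged(configured_level: str, message_level: str) -> bool:
    """Return True if a message of message_level is written to the log
    when LogLevel is set to configured_level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


if __name__ == "__main__":
    for configured in LEVELS:
        logged = [m for m in LEVELS if is_logged(configured, m)]
        print(f"LogLevel {configured}: logs {', '.join(logged)}")
```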
SearchObject (HACMP only)
Format: string
Description: (Optional) Searches for the application service if the user configuration
file specifies multiple applications. This object is not used for VCS,
Microsoft Cluster service, or SG-LX.
Default value: Vgs
VcsBinPath (VCS only)
Format: string
Description: (Optional) Defines the path to the VCS binaries. This object is not used for
Microsoft Cluster service, SG-LX, or HACMP.
Default value: /opt/VRTSvcs/bin
APPLICATION section objects
The application part defines the failover and failback behavior of Cluster
Extension XP for each application service. APPLICATION is a multitag
that can appear in the configuration file for each application service using
Cluster Extension XP.
The APPLICATION object requires the name of the application service as
its value. The objects specified after an APPLICATION tag must appear
only once per application. As with the common part objects, the application
part objects have predefined default values.
Cluster Extension XP also uses the following rules to define objects:
• If you use the default value, you do not have to specify the object.
• Cluster Extension XP uses objects depending on the setting of other
objects. For example, if you set the FenceLevel object to DATA,
Cluster Extension XP uses the values specified for the
DataLoseMirror or DataLoseDataCenter object. However, these
objects are ignored if the FenceLevel object is set to NEVER.
• The pre-execution and post-execution functions in Cluster Extension
XP will not be processed if the associated object values are empty.
(This is the default setting.)
CLI, HACMP, SG-LX
To set APPLICATION object values, use the user configuration file.
VCS
Use the VCS GUI to set APPLICATION object values.
Microsoft Cluster service
To set APPLICATION object values, use the Microsoft Cluster service
Cluster Administrator GUI.
APPLICATION
Format: tag
Description: Distinguishes between general and application-specific objects. Specify
the name of the application service. The format of its value is equivalent to
a string value.
SG-LX
For Serviceguard, the tag is required; however, specifying a value is not
necessary.
ApplicationDir
Format: string
Description: Specifies the directory where Cluster Extension XP searches for
application-specific files, such as the force flag or online file.
If ApplicationDir is set to a nonexistent drive and PairResyncMonitor is
not enabled, Cluster Extension is unable to create the online file and
cannot put the resource online.
SG-LX
The value of ApplicationDir is derived from the package control file
location.
Windows
If ApplicationDir is not set, Cluster Extension uses the local
%HPCLX_PATH% values as defined in the registry.
Default values:
Linux/UNIX: /etc/opt/hpclx
Windows: %HPCLX_PATH%
Files:
resource_name.createsplitbrain
resource_name.forceflag
resource_name.online
If specified in a user configuration file, resource_name is the value of the
APPLICATION tag; otherwise, resource_name is the value of the Cluster
Extension XP resource name.
ApplicationStartup
Format: string
Description: (Optional) Specifies where a cluster group should be brought online.
The ApplicationStartup object can be customized to determine whether
an application service starts locally or is transferred back to the remote
data center (if possible) to start directly without waiting for
resynchronization. This object is used only if an application service has
already been transferred to the secondary site and no recovery procedure
has been applied to the disk set (the disk pair has not been recovered and is
not in PAIR state). This process is considered a failback attempt without
prior disk pair recovery.
Cluster Extension XP can detect the most current copy of your data based
on the disk state information. If Cluster Extension XP detects that the
remote XP disk array has the most current data, it orders a
resynchronization of the local disk from the remote disk, or it stops the
startup process to enable the cluster software to fail back to the remote XP
disk array.
If a resynchronization is ordered, Cluster Extension XP monitors the
progress of the copy process. If the application service was running on a
secondary XP disk array without replication link, a large number of
records may need to be copied. If the copy process takes more time than
the configured application startup timeout, the application startup will fail.
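The progress monitoring during resynchronization can be pictured as the loop below. It is a sketch under stated assumptions: read_status is a hypothetical callable returning the pair state and a copy percentage, and treating the ResyncWaitTimeout value (default 90 seconds) as the interval after which missing progress counts as a failure is an interpretation of the RESYNCWAIT description, not confirmed product logic.

```python
import time
from typing import Callable, Tuple


def wait_for_pair(
    read_status: Callable[[], Tuple[str, int]],
    resync_wait_timeout: float = 90.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait until the disk pair reaches PAIR state.

    Returns "online" once PAIR is reached, or "global error" if no copy
    progress was made during one monitoring interval.
    """
    state, last_progress = read_status()
    while state != "PAIR":
        sleep(resync_wait_timeout)
        state, progress = read_status()
        if state != "PAIR" and progress <= last_progress:
            return "global error"
        last_progress = progress
    return "online"
```

The loop has no upper bound of its own. As noted above, the cluster software's application startup timeout can still expire while the copy is making progress.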
Microsoft Cluster service
If the ApplicationStartup resource property is set to FASTFAILBACK
and the FailoverThreshold value is set to a number higher than the
current number of clustered systems for the resource group, the resource
group will restart on configured nodes until one of the following
conditions is met:
• The resource is brought online in the remote data center.
• The resource failed because the FailoverThreshold value has been
reached.
• The resource failed because the FailoverPeriod timeout value has
been reached.
Caution: Disable subsequent automated failover procedures for recovery failback
operations.
Valid values:
FASTFAILBACK (default)
The cluster group will be brought online in the remote
data center (if possible) without waiting for
resynchronization. The application startup process will
be stopped locally and Cluster Extension XP reports a
data center error. Depending on the cluster software,
the application service cannot start on any system in
the local data center and the cluster software will
transfer the application service back to the remote data
center. Use this value to provide the highest
application service uptime. Depending on the value
configured for the AutoRecover object, Cluster
Extension XP will attempt to update the former
primary disk based on the secondary disk and swap the
personalities of the disk pair so that the local disk will
become the primary disk.
In a two-node cluster, this process will not work
because the target failback system would not be
available. In this case, the application service must be
started manually, or the ApplicationStartup object
should be set to RESYNCWAIT.
RESYNCWAIT
Online local; the cluster group must wait until the disk
status is PAIR. Cluster Extension XP will initiate a
resynchronization of the local disk based on the remote
disk. The copy process will be monitored. If no copy
progress is made before a monitoring interval expires,
the copy process is considered failed and Cluster
Extension XP returns a global error. If RESYNCWAIT is
specified for the ApplicationStartup object and Cluster
Extension XP should wait for resynchronization changes
for more or less than the default of 90 seconds, specify
the ResyncWaitTimeout object.
AsyncTakeoverTimeout
Format: integer
Description: (Optional) Specifies the horctakeover command timeout in seconds.
Must be adjusted based on disk mirroring link speed.
This object is used only if the FenceLevel object value is ASYNC.
The takeover operation for fence level ASYNC (Continuous Access XP
Extension) offers the option to stop the data transfer process after a
specified time value. This is used to allow access to the remote copy if the
data transfer process has been stopped due to a Continuous Access
XP-link failure. All data that has been copied up to the moment the
timeout value has been reached is consistent and available to access at the
secondary site.
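Because the side file area cannot exceed the installed cache size, an upper bound for the copy time can be estimated from the cache size and the sustained link throughput. The arithmetic is sketched below; the function name and the sample figures (32 GiB cache, 50 MiB/s link) are illustrative assumptions, not measured values for any XP disk array.

```python
def max_side_file_copy_seconds(cache_size_gib: float,
                               link_throughput_mib_s: float) -> float:
    """Estimate the time to copy a side file area as large as the cache.

    cache_size_gib:        installed cache size in GiB (assumed input)
    link_throughput_mib_s: sustained Continuous Access XP link throughput
                           in MiB per second (assumed input)
    """
    if link_throughput_mib_s <= 0:
        raise ValueError("link throughput must be positive")
    return cache_size_gib * 1024 / link_throughput_mib_s


if __name__ == "__main__":
    seconds = max_side_file_copy_seconds(32, 50)
    print(f"Worst-case copy time: {seconds:.0f} s")
```

Comparing the result with the default value of 1800 seconds indicates whether the timeout needs to be raised for a given link.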
Default value: 1800
Caution: Measure or calculate the full XP disk array cache copy time to use the
gathered information for the AsyncTakeoverTimeout object. After a
takeover command has been invoked, Continuous Access XP Extension
copies the side file area residing in the XP disk array cache to the site
where the takeover command has been issued (the secondary disks). The
side file area cannot exceed the installed cache size. The maximum time
for the AsyncTakeoverTimeout object is the time to fully copy the amount
of cache size data. The takeover timeout value is used to terminate the
copy process to provide access to the secondary disks, for example, if all
links or the primary XP disk array are unavailable to copy the side file
area. The copy time depends on the performance of the Continuous Access
XP link between your sites. The takeover or resynchronization operation
could take longer than the timeout value for application service startup in
the cluster software. The application service startup might fail in this case.
However, the takeover or resynchronization command will continue in the
background.
AutoRecover
Format: string
Description: (Optional) Recovers a suspended or deleted disk pair when the resource is
brought online at application service startup time.
If the AutoRecover object is set to YES, Cluster Extension XP will try to
resynchronize the remote disk at application startup time. Cluster
Extension XP will ignore the return code of the resynchronization
command and allow access to the disk ensuring highest application
availability.
If the resynchronization attempt fails, Cluster Extension XP will not fail.
The internal logic will first apply the concurrency and consistency rules to
allow access to the disk set.
If you configure fence level DATA for the device group and set the
FenceLevel object to DATA, the AutoRecover object will change Cluster
Extension XP’s behavior. Cluster Extension XP will attempt to reestablish
the PAIR state and wait for the PAIR state before it allows access to the
disk. If the resynchronization or takeover process fails, Cluster Extension
XP returns a global error.
Valid values:
YES (default)
NO
BCEnabledA
Format: string
Description: (Optional) Enables rolling disaster protection for data center A.
Valid values:
YES
NO (default)
BCEnabledB
Format: string
Description: (Optional) Enables rolling disaster protection for data center B.
Valid values:
YES
NO (default)
BCMuListA
Format: list
Description: (Optional) Space-separated list defines the MU number of the Business
Copy XP disk pairs in data center A.
BCMuListB
Format: list
Description: (Optional) Space-separated list defines the MU number of the Business
Copy XP disk pairs in data center B.
BCResyncEnabledA
Format: string
Description: (Optional) Enables automatic resynchronization of Business Copy XP
disk pairs in data center A. The automatic resynchronization function is
supported only when the split BC pair is located in the same data center
where Cluster Extension XP is started.
Valid values:
YES
NO (default)
BCResyncEnabledB
Format: string
Description: (Optional) Enables automatic resynchronization of Business Copy XP
disk pairs in data center B. The automatic resynchronization function is
supported only when the split BC pair is located in the same data center
where Cluster Extension XP is started.
Valid values:
YES
NO (default)
BCResyncMuListA
Format: list
Description: (Optional) Space-separated list defines the MU number of the Business
Copy XP disk pairs in data center A.
BCResyncMuListB
Format: list
Description: (Optional) Space-separated list defines the MU number of the Business
Copy XP disk pairs in data center B.
DataLoseDataCenter
Format: string
Description: (Optional) Specifies whether a resource should be brought online while
the disk pair is (or will be) suspended or deleted and there is no connection
(CA XP and IP network) to the remote data center.
Used only if the FenceLevel object value is DATA.
RAID Manager XP is able to access its remote peer to invoke takeover
actions for Continuous Access XP device groups. It is also able to invoke a
swap-takeover operation of the device group from the secondary site. If no
configured remote RAID Manager XP instance replies to a request of the
local RAID Manager XP instance (remote status EX_ENORMT), all
network connections between the local and the remote data center are
considered DOWN. If the swap-takeover operation leads into a suspended
state for the device group, the Continuous Access XP links are considered
DOWN.
Because redundant networks and Continuous Access XP links are
necessary to build a disaster tolerant environment, this situation can be
considered as a data center failure. The DataLoseDataCenter object is
used to allow/prohibit automatic application service startup in this
particular case.
Setting the DataLoseMirror object to YES and the DataLoseDataCenter
object to NO is contradictory.
Valid values: YES (default)
              NO
DataLoseMirror
Format: string
Description: (Optional) Specifies whether a resource should be brought online while
the disk pair is suspended or deleted.
Used only if the FenceLevel object value is DATA and local and remote
XP disk status information can be gathered. If the remote XP disk state
information is not available (remote state EX_ENORMT), the setting of
the DataLoseDataCenter object will be used.
Depending on the value configured for the AutoRecover object, Cluster
Extension XP will attempt to recover the PAIR state for the device group.
Cluster Extension XP waits until the PAIR state has been established. If
this operation fails, Cluster Extension XP will return a global error.
Because the DATA fence level ensures no loss of concurrency, manual
intervention is required to recover the PAIR state. The PAIR state must be
reestablished for all disks in the device group before you can start the
application service.
Setting the DataLoseMirror object to YES and the DataLoseDataCenter
object to NO is contradictory.
Valid values: YES
              NO (default)
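The interaction between the DataLoseDataCenter, DataLoseMirror, and FenceLevel objects can be checked before a configuration is used. The following Python sketch is only an illustration under stated assumptions: the function name and the dictionary form of the settings are hypothetical and are not part of Cluster Extension XP. It applies the defaults listed above and reports the contradictory combination.

```python
def check_data_lose_settings(settings):
    """Return a list of warnings for the DataLose* objects.

    `settings` is a plain dict (object name -> value). This in-memory form
    is an assumption made for this sketch, not a product data structure.
    """
    warnings = []

    # Defaults as listed in the object descriptions.
    mirror = settings.get("DataLoseMirror", "NO")
    center = settings.get("DataLoseDataCenter", "YES")

    # The guide calls this combination contradictory.
    if mirror == "YES" and center == "NO":
        warnings.append(
            "DataLoseMirror=YES with DataLoseDataCenter=NO is contradictory"
        )

    # Both objects are used only if the FenceLevel object value is DATA.
    fence = settings.get("FenceLevel")
    explicit = [
        name
        for name in ("DataLoseMirror", "DataLoseDataCenter")
        if name in settings
    ]
    if fence is not None and fence != "DATA" and explicit:
        warnings.append(
            ", ".join(explicit) + " used only if FenceLevel is DATA"
        )

    return warnings
```

An empty list means that none of the checked rules was violated. The sketch does not cover any other object.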
DC_A_Hosts (Required)
Format: list
Description: Space-separated list defines the cluster nodes in data center A.
VCS: This object is a string-vector element. Add a new element to the list for
each system name.
DC_B_Hosts (Required)
Format: list
Description: Space-separated list defines the cluster nodes in data center B.
VCS: This object is a string-vector element. Add a new element to the list for
each system name.
DeviceGroup (Required)
Format: string
Description: RAID Manager XP device group, containing the application service disk
set.
Files:
Linux/UNIX: /etc/horcmX.conf
Windows: drive:\winnt\horcmX.conf or %systemroot%\horcmX.conf
where X is the RAID Manager XP instance number.
FastFailbackEnabled (VCS only)
Format: string
Description: (Optional) Disables VCS service groups for the data center. This allows
transferring the service group back to the remote data center immediately.
To allow this operation, the VCS configuration file (main.cf) will be write
enabled and saved later.
The service group will be disabled for all systems contained in either the
DC_A_Hosts object or DC_B_Hosts object. Then, the VCS configuration
file will be saved (dumped).
Valid values: YES (default)
              NO
FenceLevel
Format: string
Description: (Optional) The FenceLevel object specifies the fence level configured for
the device group. Cluster Extension XP checks whether the current fence
level reported by the XP disk array is the same as the configured
(expected) fence level. This object is also used to make sure your
configurations are supported based on consistency considerations.
Different failover and recovery procedures are used for different fence
levels.
If you change the FenceLevel object value, also review the values of these
objects:

PostExecCheck
Format: string
Description: (Optional) The PostExecCheck object is used to configure Cluster
Extension XP to gather XP disk pair status information after the takeover
procedure. That information will be passed to the post-executable. In case
of a remote data center failure, it could be time consuming to gather that
information, especially if your post-executable does not need any XP
status information. The arguments passed to the post-executable will
include only the local disk status if the PostExecCheck object is set to
NO. See “RAID Manager XP configuration” (page 90).
Valid values: YES
              NO (default)
PostExecScript
Format: string
Description: (Optional) Specifies an executable with its full path name to be invoked
after the takeover action or failover procedure.
PreExecScript
Format: string
Description: (Optional) Specifies an executable with its full path name to be invoked
before the takeover action or failover procedure.
RaidManagerInstances (Required)
Format: list
Description: A space-separated list of RAID Manager XP instances Cluster Extension
XP can use to communicate with the disk array. The instance numbers
must be the same among all cluster systems. Cluster Extension XP can
alternate between the specified instances.
VCS: This object is a string-vector element. Add a new element to the list for
each system name.
Files:
Linux/UNIX: /etc/horcmX.conf
Windows: %systemroot%\horcmX.conf
where X is the RAID Manager XP instance number.
ResyncMonitor
Format: string
Description: (Optional) Starts the pair/resync monitor, which monitors the disk
pair status and resynchronizes disk pairs if the ResyncMonitorAutoRecover
object is set to YES.
Valid values: YES (default: Microsoft Cluster service)
              NO (default: HACMP; SG-LX; VCS)
ResyncMonitorAutoRecover
Format: string
Description: (Optional) Automatically recovers disk pair states if the disk pairs are
monitored by the pair/resync monitor.
Valid values: YES
              NO (default)
ResyncMonitorInterval
Format: integer
Description: (Optional) Specifies the interval, in seconds, at which the pair/resync
monitor checks the disk pair status.
Default value: 60
ResyncWaitTimeout
Format: integer
Description: (Optional) Specifies the timeout value, in seconds, for a disk pair
resynchronization. It may take some time to resynchronize disks. The
timer times out if there is no change in the percentage value of the copy
status for the device group in the specified time interval. The timeout value
is used if the ApplicationStartup object is set to RESYNCWAIT.
Default value: 90
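The ResyncWaitTimeout timer restarts whenever the copy percentage changes and expires only when the percentage stays unchanged for the whole interval. The following Python sketch illustrates that rule on a list of recorded samples. The function name, the sample format, and the choice to treat an unchanged interval of exactly the timeout length as expired are assumptions made for this illustration. They do not describe how Cluster Extension XP implements the timer.

```python
def resync_timed_out(samples, timeout_seconds=90):
    """Return True if the copy percentage stalled for `timeout_seconds`.

    `samples` is a time-ordered list of (seconds_since_start, copy_percent)
    tuples, a hypothetical record of successive status checks.
    """
    if not samples:
        return False

    last_change_time, last_percent = samples[0]
    for seconds, percent in samples[1:]:
        if percent != last_percent:
            # Progress was made, so the timer starts again from here.
            last_change_time, last_percent = seconds, percent
        elif seconds - last_change_time >= timeout_seconds:
            # No change in the percentage value for the whole interval.
            return True
    return False


# Steady progress: the timer never expires.
progressing = [(0, 10), (60, 20), (120, 30), (180, 40)]
# No progress for 90 seconds: the timer expires.
stalled = [(0, 10), (60, 10), (90, 10)]
```

A slow resynchronization that keeps making progress does not time out, no matter how long it takes in total.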
Vgs (CLI and HACMP only)
Format: list
Description: List of volume groups
XPSerialNumbers (Required)
Format: list
Description: A space-separated list of at least two serial numbers: the serial
numbers of the primary and secondary XP disk arrays, that is, the disk
arrays of the connected cluster nodes. Cluster Extension XP checks whether
the local disk array is contained in this list.
VCS: This object is a string-vector element. Add a new element to the list for
each system name.
Basic configuration example
The following is an example of a basic UCF.cfg file.
#/etc/opt/hpclx/conf/UCF.cfg
#This is the Cluster Extension XP User Configuration File (UCF.cfg).
#The COMMON tag specifies the configuration for the
#Cluster Extension XP core environment
COMMON
    LogLevel              info                #default (not necessary)
APPLICATION sap                               #the application service
    Vgs                   sapdatavg saptmpvg  #the volume groups (not necessary)
    Filesystems           /sapdata /saptmp    #the filesystems
    DeviceGroup           sapdg               #RM dev group for the app service
    RaidManagerInstances  22                  #RM instance number for dev group
    DC_A_Hosts            host1a host2a       #Data center A
    DC_B_Hosts            host3b host4b       #Data center B
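To make the structure of the example easier to see, the following Python sketch reads a file of this shape into a dictionary. It is a rough illustration only. The function name is hypothetical, and the parsing rules (everything after # is a comment, values are separated by white space, and each object belongs to the most recent COMMON or APPLICATION tag) are assumptions inferred from the example above, not a description of the parser used by Cluster Extension XP.

```python
def parse_ucf(text):
    """Parse UCF-style text into {'COMMON': {...}, 'APPLICATIONS': {...}}."""
    config = {"COMMON": {}, "APPLICATIONS": {}}
    section = None

    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue

        name, *values = line.split()
        if name == "COMMON":
            section = config["COMMON"]
        elif name == "APPLICATION":
            if not values:
                raise ValueError("APPLICATION tag without a name")
            section = config["APPLICATIONS"].setdefault(values[0], {})
        elif section is None:
            raise ValueError(f"object {name!r} appears before any tag")
        else:
            # Every object is stored as a list of its space-separated values.
            section[name] = values

    return config


EXAMPLE = """\
COMMON
LogLevel info #default (not necessary)
APPLICATION sap #the application service
Vgs sapdatavg saptmpvg #the volume groups (not necessary)
DeviceGroup sapdg #RM dev group for the app service
RaidManagerInstances 22 #RM instance number for dev group
DC_A_Hosts host1a host2a #Data center A
DC_B_Hosts host3b host4b #Data center B
"""

parsed = parse_ucf(EXAMPLE)
```

Storing every object as a list keeps list-type objects such as DC_A_Hosts and string-type objects such as DeviceGroup in one uniform shape.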
4 RAID Manager XP dependencies
Cluster Extension XP depends on HP StorageWorks RAID Manager XP
and the cluster software it is integrated with.
Before you configure Cluster Extension XP, verify that the host and disk
array systems are properly configured:
• The disk array and its remote peer have been properly configured.
• The host system recognizes the disk arrays.
• The HP StorageWorks Continuous Access XP links are bidirectional
and working properly.
• You are familiar with the disk and volume configuration of the
operating system.
RAID Manager XP configuration
To function properly, Cluster Extension XP requires at least one instance of
RAID Manager XP. Cluster Extension XP starts the configured RAID
Manager XP instance if it is not running. However, if the RAID Manager
XP instance cannot be started or returns an error, Cluster Extension XP can
switch to an alternate RAID Manager XP instance.
Ensure that the path to the RAID Manager XP binary files is included in the
PATH environment variable.
Recommendation: Configure two RAID Manager XP instances per system and start those
instances automatically at system boot time.
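The PATH requirement can be checked with a short script before Cluster Extension XP is configured. The following Python sketch is one possible way to do this. The function name is hypothetical, and the command names passed to it are taken from the commands mentioned in this guide.

```python
import shutil


def missing_commands(names, path=None):
    """Return the command names that cannot be found on the search path.

    With path=None, shutil.which() searches the PATH environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


# Command names as used in this guide (Linux/UNIX variants).
RAID_MANAGER_COMMANDS = [
    "horcmstart.sh",
    "horcmshutdown.sh",
    "pairdisplay",
    "horctakeover",
]

not_found = missing_commands(RAID_MANAGER_COMMANDS)
```

If `not_found` is not empty, add the directory that contains the RAID Manager XP binary files to PATH.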
RAID Manager XP configuration file
The RAID Manager XP configuration file (horcmX.conf) is used to map
device groups to the internal disk array disks. A device group is the
common unit for failover operations initiated from the server side.
A RAID Manager XP configuration file consists of these four parts:
• HORCM_MON
The monitor part defines the local network and port where the RAID
Manager XP instance is listening for incoming requests from a remote
instance. It also defines the polling interval and timeout value for
requests to other instances.
The first entry defines the network that RAID Manager XP listens to.
The default value is NONE. The default setting enables RAID
Manager XP to listen on all configured networks.
The timeout value is important to Cluster Extension XP because it sets
the time Cluster Extension XP will wait to receive
information back from the remote site. The timeout interval applies to
each remote instance configured in the HORCM_INST section of the
RAID Manager XP configuration file. If the last instance configured in
the HORCM_INST section is the only instance that answers a
request, the request is answered only after the timeout value multiplied
by the number of non-responding remote instances has elapsed. Take this
into account when you configure the application service startup
timeout value in your cluster software.
A general formula for this behavior in case of a complete site failure is
the following:
t_W = t_HM (in 10 msec) x (n_HI + 1)
where
t_W  = wait time until remote error will be reported by local RAID
       Manager XP instance
t_HM = HORCM_MON timeout
n_HI = number of remote instances, specified per device group in
       HORCM_INST
Recommendation: Reduce the default timeout value as you increase the number of
different (at least two) network connections to the remote RAID Manager
XP instance. The settings of these two parameters directly affect the timing
of the failover behavior of Cluster Extension XP. Cluster Extension XP
experiences the above-mentioned wait time twice if none of the remote RAID
Manager XP instances can be reached. If a post-executable is
configured, a third wait time period is added.
• HORCM_CMD
The command device part defines which raw disk device is used to
communicate to the disk array. This device cannot be used for any data
other than control data of the RAID Manager XP instance. Several
command devices may be configured to provide alternate access paths
to control Continuous Access XP pair operations.
If command devices are configured on separate lines, RAID Manager
XP interprets those devices as belonging to different disk arrays. This
allows one RAID Manager XP instance to control several XP disk arrays.
However, Cluster Extension XP does not support this feature.
• HORCM_DEV
The device group part maps device groups and device names to
internal disks (LDevs) in the disk array. Failover operations are carried
out for the device groups but can also be initiated for a single disk pair.
The device groups and device names must be unique in the RAID
Manager XP configuration file. In addition, device group names should
be unique across the whole cluster environment to prevent user
mistakes.
For fence level ASYNC, the device group also represents a
consistency group.
To combine CA disk pairs and BC disk pairs, you can use the MU
number to specify internal BC disks.
Recommendation: Use the local and remote LDEV and CU number as the device name to
easily recognize configuration or mapping mistakes. For example, if the
local LDEV number is 0a (hex), the local CU number is 0, the remote
LDEV number is 1 (hex) and the remote CU is 3, a recommended device
name would be disk_00a_301. This approach also ensures unique device
names because the LDEV number together with the CU number is a unique
disk identifier in a disk array.
Example
# pairdisplay -g testdg -fx -CLI
Group   PairVol       L/R  Port#  TID  LU  Seq#   LDEV#  P/S    Status  Fence  Seq#   P-LDEV#  M
testdg  disk_00a_301  L    CL2-N  3    4   30061  00a    P-VOL  PAIR    NEVER  30071  301
testdg  disk_00a_301  R    CL2-N  5    1   30071  301    S-VOL  PAIR    NEVER  -      00a      -
• HORCM_INST
The remote RAID Manager XP instances part defines which remote
system can be used to request information of the device group. For
most failover operations, the remote RAID Manager XP instance is
not necessary. However, it is used for pair consistency checks and
considered important. A remote instance should be configured for
each network available between the cluster nodes. The first and
preferred network over which RAID Manager XP instances communicate
with each other should be the cluster heartbeat network.
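Two points from the configuration file parts described above can be illustrated with a short calculation: the wait time formula from the HORCM_MON part and the device naming recommendation from the HORCM_DEV part. The following Python sketch is an illustration only. All function names are hypothetical, the timeout of 3000 and the two remote instances are example values, and the name format assumes a one-digit hexadecimal CU number followed by a two-digit hexadecimal LDEV number, as in the disk_00a_301 example.

```python
def remote_error_wait_seconds(timeout_10ms, remote_instances):
    """t_W = t_HM x (n_HI + 1), converted from 10 msec units to seconds."""
    return (timeout_10ms / 100) * (remote_instances + 1)


def worst_case_wait_seconds(timeout_10ms, remote_instances,
                            post_executable=False):
    """Total wait when no remote instance can be reached.

    The wait time is experienced twice, and a third period is added
    if a post-executable is configured.
    """
    periods = 3 if post_executable else 2
    return periods * remote_error_wait_seconds(timeout_10ms, remote_instances)


def device_name(local_cu, local_ldev, remote_cu, remote_ldev):
    """Build a device name from local and remote CU and LDEV numbers."""
    return (f"disk_{local_cu:x}{local_ldev:02x}"
            f"_{remote_cu:x}{remote_ldev:02x}")


# Example values: timeout of 3000 (in 10 msec units), two remote instances.
single_wait = remote_error_wait_seconds(3000, 2)
total_wait = worst_case_wait_seconds(3000, 2, post_executable=True)

# Local CU 0, local LDEV 0a, remote CU 3, remote LDEV 01.
name = device_name(0, 0x0A, 3, 0x01)
```

Compare the total wait with the application service startup timeout configured in the cluster software. If the total is larger, the startup can time out before Cluster Extension XP has finished.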
Network considerations
Since RAID Manager XP is an essential resource to Cluster Extension XP,
it is highly recommended that you provide reliable network connections for
RAID Manager XP communications. It is also recommended to use the
heartbeat network (private network) for RAID Manager XP
communications. As with the heartbeat network, alternative network paths
are highly recommended. The networks that RAID Manager XP uses for
each device group are configured in the HORCM_INST part of
the RAID Manager XP configuration file.
Command device considerations
At least one command device must be configured for RAID Manager XP.
RAID Manager XP allows the same command device to be accessed through
redundant paths. This feature should be used to prevent Cluster Extension
XP from aborting if a single access path to the command device is missing.
Recommendation: Set up a second command device to provide an alternative control
path to the paired disks.
Caution: If you use Auto Path for AIX to enable alternative pathing on IBM AIX
together with the XP disk array, RAID Manager XP does not support Auto
Path virtual paths for command devices.
Start and stop the RAID Manager XP instances
The RAID Manager XP instances configured for use with Cluster
Extension XP should be started at system boot time to provide the fastest
access to disk status information.
Cluster Extension XP provides scripts (Linux/UNIX) or a service
(Windows) to integrate RAID Manager XP instance startup into the system
startup process. However, if the system cannot automatically start and
monitor RAID Manager XP instances, RAID Manager XP can be started
and stopped by executing the following commands:
Linux/UNIX: horcmstart.sh instance_numbers
            horcmshutdown.sh instance_numbers
Windows:    horcmstart instance_numbers
            horcmshutdown instance_numbers
Starting RAID Manager XP without specifying an instance number will
start instance 0 with the associated horcm.conf file. Zero (0) is not
recommended as an instance number for a Cluster Extension XP RAID
Manager XP instance.
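If the start and stop commands are called from a script, a small helper can build the command line and guard against instance 0. The following Python sketch is an illustration under stated assumptions. The function name is hypothetical, and rejecting instance 0 outright is a policy chosen for this sketch, because the guide only advises against using it.

```python
def horcm_command(action, instances, windows=False):
    """Build the argument list for starting or stopping RAID Manager XP.

    action    -- "start" or "shutdown"
    instances -- instance numbers, for example [22, 23]
    windows   -- use the Windows command names (no .sh suffix)
    """
    if action not in ("start", "shutdown"):
        raise ValueError("action must be 'start' or 'shutdown'")
    if not instances:
        # Without an instance number, instance 0 would be used.
        raise ValueError("specify at least one instance number")
    if 0 in instances:
        raise ValueError("instance 0 is not recommended for Cluster Extension XP")

    program = f"horcm{action}" + ("" if windows else ".sh")
    return [program] + [str(number) for number in instances]


start_cmd = horcm_command("start", [22, 23])
stop_cmd = horcm_command("shutdown", [22], windows=True)
```

The returned list can be passed to `subprocess.run(start_cmd, check=True)` on a system where the RAID Manager XP binary files are in PATH.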
Takeover basic functionality test
After RAID Manager XP has been configured for the device groups used
by Cluster Extension XP, verify from each server in the cluster that each
device group fails over correctly between the disk arrays. For this test,
the device group must already be in PAIR state.
Caution: RAID Manager XP keeps configuration data of the XP disk array in system
memory. Therefore, you must stop and restart RAID Manager XP instances
on all systems if a configuration change has been applied to any of the
involved XP disk arrays.
To test the correct failover and failback behavior, log in to each system used
with Cluster Extension XP and invoke the following commands if the local
disk is the secondary (SVOL) disk:
The output of the pairdisplay command indicates whether the local disk is
the secondary (SVOL) disk and if so, the horctakeover command shows a
SWAP-takeover as a result. If pairdisplay shows the local disk as primary
(PVOL) disk, log in to a system connected to the secondary (SVOL) disk
and invoke the horctakeover command there. If the horctakeover
command does not result in a SWAP-takeover, refer to “Recovery
procedures” (page 225) and “Troubleshooting” (page 203) to resolve the
issue.
The -t option of the horctakeover command is only used for fence level ASYNC.
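The check of the pairdisplay output can be scripted. The following Python sketch reads output in the column layout of the pairdisplay -CLI example shown for the HORCM_DEV part and reports the role and status of the local volumes. It is an illustration only. The function names are hypothetical, and the sketch assumes that the first line is the column header and that columns are separated by white space.

```python
def local_volume_roles(pairdisplay_cli_output):
    """Map each local PairVol name to its (P/S role, Status) pair."""
    lines = [line.split() for line in pairdisplay_cli_output.splitlines()
             if line.strip()]
    header, rows = lines[0], lines[1:]

    # Locate the columns by header name (first match for repeated names).
    name_col = header.index("PairVol")
    side_col = header.index("L/R")
    role_col = header.index("P/S")
    status_col = header.index("Status")

    roles = {}
    for row in rows:
        if row[side_col] == "L":  # keep the local side only
            roles[row[name_col]] = (row[role_col], row[status_col])
    return roles


def local_side_is_secondary(pairdisplay_cli_output):
    """True if every local volume is an S-VOL in PAIR state."""
    roles = local_volume_roles(pairdisplay_cli_output)
    return bool(roles) and all(
        role == "S-VOL" and status == "PAIR"
        for role, status in roles.values()
    )


SAMPLE = """\
Group PairVol L/R Port# TID LU Seq# LDEV# P/S Status Fence Seq# P-LDEV# M
testdg disk_00a_301 L CL2-N 3 4 30061 00a P-VOL PAIR NEVER 30071 301
testdg disk_00a_301 R CL2-N 5 1 30071 301 S-VOL PAIR NEVER - 00a -
"""

roles = local_volume_roles(SAMPLE)
```

For the sample, the local disk is the primary (PVOL) disk, so `local_side_is_secondary(SAMPLE)` is false and the horctakeover command would be invoked from a system connected to the secondary disk instead.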
5 Integration with HACMP
Cluster Extension XP is integrated with the HACMP cluster software using
the standard customization scheme provided by HACMP. This allows
cluster administrators to configure the disk array-specific failover behavior
as pre-event of the standard HACMP event get_disk_vg_fs.
Related information: For information about how to install Cluster Extension XP, see HP
See the readme file on the product CD for supported configurations.
Configuring resources
The Cluster Extension XP objects must be configured using a user
configuration file.
The Cluster Extension XP resource gathers all necessary information about
the disk arrays when a resource group is brought online.
If configured, a pair/resync monitor is started to monitor the Cluster
Extension XP resource. To use this monitor, HACMP must call a pre-event
for the standard HACMP event release_vg_fs.
The Cluster Extension XP binary clxhacmp is called as a pre-event of the
standard HACMP event get_disk_vg_fs. It checks the status of the
RAID Manager XP device group and, if necessary, takes appropriate actions
to allow access to these disks before HACMP tries to access the disks
of the particular resource group.
Define the previously defined custom event get_disk_vg_fs_pre as a
pre-event of get_disk_vg_fs.
Cluster Extension XP controls the disk pairs based on RAID Manager XP
device groups. The volume group definition of the HACMP resource group
is used to determine the corresponding RAID Manager XP device group.
The mapping between the HACMP volume group configuration and the
corresponding RAID Manager XP device group is defined in the Cluster
Extension XP user configuration file /etc/opt/hpclx/config/UCF.cfg.
Because of this mapping mechanism, you must specify the volume groups
owned by the HACMP resource groups in the user configuration file.
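The mapping from a volume group to its device group can be pictured as a lookup in the user configuration file. The following Python sketch shows the idea on an in-memory dictionary. The function name and the dictionary layout are hypothetical, and the sketch does not describe how clxhacmp performs the lookup.

```python
def device_group_for_volume_group(applications, volume_group):
    """Return (application, device group) for the given volume group.

    `applications` maps an application name to its configured objects,
    for example {"sap": {"Vgs": [...], "DeviceGroup": "sapdg"}}.
    """
    matches = [
        (application, objects["DeviceGroup"])
        for application, objects in applications.items()
        if volume_group in objects.get("Vgs", [])
    ]
    if len(matches) != 1:
        # Zero matches: the volume group is missing from the Vgs objects.
        # More than one: the volume group is listed for several applications.
        raise LookupError(
            f"{volume_group!r} matches {len(matches)} applications"
        )
    return matches[0]


# Values taken from the basic configuration example.
applications = {
    "sap": {"Vgs": ["sapdatavg", "saptmpvg"], "DeviceGroup": "sapdg"},
}
match = device_group_for_volume_group(applications, "saptmpvg")
```

A volume group that is not listed in any Vgs object cannot be mapped, which is why the volume groups must be specified in the user configuration file.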