Steps for Failure and Recovery Scenarios ............................................................................................... 8
Failure of Primary Site within Primary Cluster....................................................................................... 8
Failback from the Secondary Site to the Primary Site ............................................................................9
Failure of Secondary Site within the Primary Cluster ........................................................................... 10
Failover from Primary Cluster to Recovery Cluster...............................................................................11
Failback from the Recovery Cluster to the Secondary Site within the Primary Cluster............................... 11
Failback from the Recovery Site Directly to the Primary Site in the Primary Cluster .................................. 12
For more information.......................................................................................................................... 14
Introduction
Overview
Cascading failover is the ability for an application to fail over from a primary to a secondary location, and
then to fail over to a recovery location. The primary location, which comprises the primary and secondary
sites, contains a metropolitan cluster built with the HP Metrocluster solution, while the recovery location
contains a standard Serviceguard cluster. Continentalclusters provides a “push-button” recovery between
Serviceguard clusters. Data replication also follows the cascading model: data is synchronously replicated
from the primary disk array to the secondary disk array in the Metrocluster, and is periodically replicated
manually, via storage data replication technology, to the third disk array in the Serviceguard recovery
cluster.
Continentalclusters with cascading failover uses three main data centers distributed between a
metropolitan cluster, which serves as a primary cluster, and a standard cluster, which serves as a
recovery cluster.
In the primary cluster, there are two disk arrays, either of which can have the source volumes for a
particular application. Throughout this document, the term primary disk array refers to the disk array
that holds the volumes that are being replicated to the remote disk array for a particular application,
and the data center where this disk array is located is called the primary site. The term secondary disk
array refers to the disk array that holds the volumes that the data is being replicated to using the
storage specific replication technology for a particular application, and the data center where the
secondary disk array for that application is located is known as the secondary site. Thus, primary and
secondary sites are roles that can be played by either disk array in the primary cluster. However,
once the data replication link from the secondary disk array to the recovery disk array has been
defined, the primary and secondary site roles are fixed.
The recovery disk array holds a remote replicated copy of the data in the recovery cluster. The data
center that houses the recovery disk array is called the recovery site. The data is replicated from the
secondary disk array to the recovery disk array through manual operations or custom made scripts.
The basic design of the cascading failover solution is shown in Figure 1. The primary cluster, shown
on the left, is configured as a Metrocluster with three data centers physically located on three different
sites—two main sites (primary and secondary sites) and an arbitrator site (a third location) which is
not shown in the figure below. The primary and secondary site roles can vary relative to the application,
given that data replication is possible from both disk arrays in the primary cluster to the disk array in the
recovery cluster. A fourth data center (the recovery site) is used for the recovery cluster, which is a
standard Serviceguard configuration. Also, the primary and recovery clusters are configured together as
a Continentalclusters.
Figure 1 - Replication Setup
The figure also shows, at a high level, how the data replication is connected. In the primary cluster, the
primary replication group has two devices: a source device (connected to the primary site and labeled as
device A) and a destination device (connected to the secondary site and labeled as device B). Data is
replicated continuously from source to destination via storage data replication facilities (e.g. Continuous
Access). On the secondary site, a local mirror is associated with the destination devices (labeled as
device B’). The mirror technology is storage specific (e.g. Business Copy). This local mirror also acts as
the source device for the recovery replication group and for recovery during rolling disasters. In the
recovery cluster, the destination devices (labeled as device C) of the recovery replication group are
connected to the nodes in the cluster. Data is periodically replicated to the destination devices via
storage data replication technology. A local mirror of the destination devices (labeled as device C’) is
also required for cases of rolling disasters.
Currently, HP StorageWorks XP Continuous Access (CA) and EMC Symmetrix SRDF technologies are
supported for the multi-site disaster tolerant solution.
Purpose
This document introduces the cascaded configuration in a Continentalclusters environment. This
includes the configuration, maintenance, and recovery procedures for the cascaded configuration
environment.
Terminology
Throughout this document, the following terms refer to high level operations performed under a
cascaded configuration scenario. These definitions are specific to this document and are not necessarily
the general terms used in other documents on data replication, clustering, and multi-site disaster tolerant
solutions.
The device where I/O can be performed is referred to as the source. The data that is written to the
source device will be replicated to another device remotely. This remote device is referred to as the
destination (or destination device). In XP CA data replication terminology, source and destination are
called PVOL and SVOL, respectively. In EMC SRDF terminology, source and destination are called R1
and R2, respectively. Local mirror is the name for a local
copy of a device on the same disk array. Business Copy (BC) for HP XP and Business Continuance
Volumes (BCV) for EMC SRDF are the supported local mirroring capabilities used for the Metrocluster
and Continentalclusters products. A replication group is a pairing of source and destination devices
where the source replicates data to its assigned destination. In the cascaded configuration, there are
two replication groups. The replication group in the primary cluster is referred to as the primary
replication group and the replication group from the secondary site to the recovery site is referred to
as the recovery replication group.
The following three actions are used to perform data replication specific operations; a command-line
sketch mapping them to the supported replication technologies follows the list.
• Establish: To “establish” a replication group or the replication link means to enable data to be
replicated from the source to the destination, or vice versa. This sometimes involves a block- or
track-level copy to the out-of-date copy, so while the copy is in progress the device being
replicated to is inconsistent. Unless stated otherwise, data is copied from the source to the
destination.
• Split: To “split” a replication group, the data replication link, or the local mirror means to stop
the data replication from source to destination, or from a device to its local mirror.
• Restore: To “restore” means to copy data from a local mirror to a device in the replication
group.
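As an illustration only, the following sketch shows how these actions typically map to the Raid Manager
and SymCLI command line interfaces. The device group name pri_grp is hypothetical, and the exact
options required depend on the array, microcode, instance configuration, and fence level in use; consult
the Raid Manager and SymCLI documentation for the authoritative syntax.

    # XP Continuous Access / Business Copy (Raid Manager), hypothetical group name
    paircreate  -g pri_grp -vl -f never    # establish the replication group
    pairdisplay -g pri_grp -fc             # check pair status and copy progress
    pairsplit   -g pri_grp                 # split (suspend) the replication
    pairresync  -g pri_grp                 # re-establish after a split
    pairresync  -g pri_grp -restore        # restore: copy from the mirror back to the source

    # EMC Symmetrix (SymCLI), hypothetical group name
    symrdf -g pri_grp establish            # establish the SRDF pair
    symrdf -g pri_grp query                # check pair status
    symrdf -g pri_grp split                # split the SRDF pair
    symmir -g pri_grp establish            # establish the local BCV mirror
    symmir -g pri_grp split                # split the local BCV mirror
    symmir -g pri_grp restore              # restore from the BCV to the standard device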
Audience
It is assumed that readers are already familiar with Serviceguard configuration tasks,
Continentalclusters installation and configuration procedures, HP StorageWorks XP, EMC Symmetrix,
BC, BCV, XP CA, SRDF, XP CA and Symmetrix multi-hop concepts, and configuration and usage of
the Raid Manager command line interface and Symmetrix Command Line Interface (SymCLI).
Configuration
Cluster and Storage Requirements
Most physical data replication technologies have an option to perform automatic link recovery, which
resynchronizes data from the source to the destination devices once the data replication links have
recovered. For the cascading failover solution, this option must be disabled. This keeps the links from
automatically re-establishing replication after a failure of all links, and allows time to split the local
mirror on the secondary disk array (device B’) and the recovery replication group before re-establishing
data replication for the primary replication group (that is, between the primary disk array and the
secondary disk array) upon link recovery.
Other than the requirement stated above, there are no additional cluster or storage requirements
beyond what is required by the Continentalclusters product.
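The specific setting that disables automatic link recovery is array and microcode dependent and is not
shown here. As an illustration only of the manual sequence this requirement is meant to allow, the
following sketch assumes hypothetical Raid Manager device groups pri_grp (device A to B), bc_grp
(device B to B’), and rec_grp (device B’ to C):

    pairdisplay -g pri_grp -fc    # confirm the primary group remains suspended after link recovery
    pairsplit   -g bc_grp         # split the local mirror (device B') from device B
    pairsplit   -g rec_grp        # split the recovery replication group (B' to C), if established
    pairresync  -g pri_grp        # only then re-establish the primary replication group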
Volume Group Setup
Use the following procedure to set up volume groups for the devices used on the disk arrays.
Currently, LVM and VxVM volume management software suites are supported for this configuration.
Please refer to LVM and VxVM documentation for specific commands.
1. Before creating the volume groups, make sure that the data replication link is established
between the primary devices on the primary disk array and the destination devices on the
secondary disk array (i.e. devices A and B), and that the local mirror devices (device B’) in the
secondary disk array are established as mirrors of the local standard devices.
2. Create the volume group only on one cluster node that connects to the primary disk array.
3. For LVM and VxVM, metadata needs to be replicated to the destination devices (devices C
and C’) in the recovery disk array. The following steps replicate the data to the recovery site; a
command-line sketch of these steps follows the numbered list below.
a. Split the local mirror devices (device B’) in the secondary disk array from the primary
replication group.
b. Establish the data replication link between the local mirror devices in secondary disk
array (device B’) and the destination devices (device C) in the recovery disk array
(i.e. establish the recovery replication group).
c. Check the data synchronization process and wait until synchronization completes.
d. Once the synchronization completes, split the data replication link between the
secondary disk array and the recovery disk array (i.e. the recovery replication
group).
e. Establish the local mirror devices (device C’) in the recovery disk array as mirrors of
the standard devices.
f. Re-establish the local mirror devices (device B’) in the primary replication group as
mirrors of the standard devices.
4. LVM requires the volume group information to be imported. Use the LVM-specific commands to
configure the volume groups on all of the nodes that are connected to the disk arrays.
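As an illustration only, the following sketch shows how steps a through f might be performed with the
Raid Manager CLI. The device group names bc_grp (device B to B’), rec_grp (device B’ to C), and
rec_bc_grp (device C to C’) are hypothetical, and the exact options (fence level, MU numbers, timeouts)
depend on the array and HORCM configuration; the equivalent SymCLI operations use symrdf and
symmir.

    # a. Split the local mirror (device B') from the primary replication group
    pairsplit   -g bc_grp
    # b. Establish the recovery replication group (B' -> C)
    paircreate  -g rec_grp -vl -f never
    # c. Wait for the synchronization to complete (timeout value is environment specific)
    pairevtwait -g rec_grp -s pair -t 3600
    # d. Split the recovery replication group once synchronization completes
    pairsplit   -g rec_grp
    # e. Establish the local mirror (device C') in the recovery disk array
    paircreate  -g rec_bc_grp -vl
    # f. Re-establish the local mirror (device B') to the primary replication group
    pairresync  -g bc_grp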
Before Serviceguard and Continentalclusters can be configured, the created volume groups must work
properly on all nodes connected to the disk arrays. Refer to the specific volume management
documentation for the procedure for importing and activating the volume groups on a given node; a
minimal LVM sketch is shown below.
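For example, with HP-UX LVM the volume group information is typically carried to the other nodes with
a map file. The volume group name (/dev/vg_app), map file name, and minor number below are
hypothetical, and the exact options may differ in your environment; refer to the LVM documentation for
the authoritative procedure.

    # On the node where the volume group was created
    vgchange -a n /dev/vg_app                        # deactivate the volume group
    vgexport -p -s -m /tmp/vg_app.map /dev/vg_app    # preview export and write a map file
    rcp /tmp/vg_app.map node2:/tmp/vg_app.map        # copy the map file to each remaining node

    # On every other node connected to the disk arrays
    mkdir /dev/vg_app
    mknod /dev/vg_app/group c 64 0x010000            # group file; minor number must be unique
    vgimport -s -m /tmp/vg_app.map /dev/vg_app       # import the volume group from the map file
    vgchange -a y /dev/vg_app                        # activate to verify, then deactivate
    vgchange -a n /dev/vg_app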
Data Initialization
If a metropolitan cluster already exists with data replication established and a local mirror configured
for rolling disasters, and you are now adding Continentalclusters to the configuration (that is, adding a
recovery site with a recovery disk array), only Procedure 2 is required.
Procedure 1: Mirroring from the Primary to the Secondary Disk Array
This procedure is illustrated in Figure 2.
Figure 2 - Mirroring from the Primary to the Secondary Disk Array
Execute the following steps; a command-line sketch follows the list:
1. Establish mirroring of the secondary device (device B) to the local mirror device (device B’) in
the secondary disk array.
2. Establish replication for the primary replication group, between the primary site and the
secondary site in the primary cluster.
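As a sketch only, using the same hypothetical Raid Manager device group names as above (bc_grp for
device B to B’, pri_grp for device A to B); exact options depend on the array and HORCM
configuration:

    # 1. Establish the local mirror (B -> B') in the secondary disk array
    paircreate  -g bc_grp -vl
    pairevtwait -g bc_grp -s pair -t 3600    # wait for the mirror to reach PAIR status
    # 2. Establish the primary replication group (A -> B)
    paircreate  -g pri_grp -vl -f never
    pairdisplay -g pri_grp -fc               # monitor the pair status and copy progress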