HP Cascading Failover in a Continentalclusters White Paper

Cascading Failover in a Continentalclusters
Introduction
   Overview
   Purpose
   Terminology
   Audience
Configuration
   Cluster and Storage Requirements
   Volume Group Setup
   Data Initialization
   Refreshing data on the Recovery Site
Package Setup
   Primary Cluster Package Setup
   Recovery Cluster Package Setup
Steps for Failure and Recovery Scenarios
   Failure of Primary Site within Primary Cluster
   Failback from the Secondary Site to the Primary Site
   Failure of Secondary Site within the Primary Cluster
   Failover from Primary Cluster to Recovery Cluster
   Failback from the Recovery Cluster to the Secondary Site within the Primary Cluster
   Failback from the Recovery Site Directly to the Primary Site in the Primary Cluster
For more information

Overview

Cascading failover is the ability for an application to fail over from a primary to a secondary location, and then to a recovery location. The primary location, comprising the primary and secondary sites, contains a metropolitan cluster built with the HP Metrocluster solution; the recovery location is a standard Serviceguard cluster. Continentalclusters provides a “push-button” recovery between Serviceguard clusters. Data replication also follows the cascading model: data is synchronously replicated from the primary disk array to the secondary disk array in the Metrocluster, and data is periodically replicated manually, via storage data replication technology, to the third disk array in the Serviceguard recovery cluster.
Continentalclusters with cascading failover uses three main data centers distributed between a metropolitan cluster, which serves as a primary cluster, and a standard cluster, which serves as a recovery cluster.
In the primary cluster, there are two disk arrays, either of which can have the source volumes for a particular application. Throughout this document, the term primary disk array refers to the disk array that holds the volumes being replicated to the remote disk array for a particular application, and the data center where this disk array is located is called the primary site. The term secondary disk array refers to the disk array that holds the volumes to which the data is replicated using the storage-specific replication technology for a particular application, and the data center where the secondary disk array for that application is located is known as the secondary site. Thus, primary and secondary sites are roles that can be played by either disk array in the primary cluster. However, once the data replication link from the secondary disk array to the recovery disk array has been defined, the primary and secondary site roles are fixed.
The recovery disk array holds a remote replicated copy of the data in the recovery cluster. The data center that houses the recovery disk array is called the recovery site. The data is replicated from the secondary disk array to the recovery disk array through manual operations or custom-made scripts.
The basic design of the cascading failover solution is shown in Figure 1. The primary cluster, shown on the left, is configured as a Metrocluster with three data centers physically located at three different sites: two main sites (the primary and secondary sites) and an arbitrator site (a third location), which is not shown in the figure below. The primary and secondary site roles are relative to the application, given that data replication is possible from both disk arrays in the primary cluster to the disk array in the recovery cluster. A fourth data center (the recovery site) is used for the recovery cluster, which is a standard Serviceguard configuration. The primary and recovery clusters together are configured as a Continentalclusters.
Figure 1 - Replication Setup
The figure also shows at a high level how the data replication is connected. In the primary cluster, the primary replication group has two devices: a source device (connected to the primary site and labeled device A) and a destination device (connected to the secondary site and labeled device B). Data is replicated continuously from source to destination via storage data replication facilities (e.g., Continuous Access). On the secondary site, a local mirror is associated with the destination devices (labeled device B’). The mirror technology is storage specific (e.g., Business Copy). This local mirror also acts as the source device for the recovery replication group and is used for recovery during rolling disasters. In the recovery cluster, the destination devices (labeled device C) of the recovery replication group are connected to the nodes in the cluster. Data is periodically replicated to the destination devices via storage data replication technology. A local mirror of the destination devices (labeled device C’) is also required for cases of rolling disasters.
Currently, HP StorageWorks XP Continuous Access (CA) and EMC Symmetrix SRDF technologies are supported for the multi-site disaster tolerant solution.
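For example, with the XP Raid Manager CLI the state of each replication hop can be checked with pairdisplay. The group names and instance numbers below are hypothetical; in a real configuration they would be defined in horcm.conf for the arrays involved.

   # Hypothetical Raid Manager group names:
   #   pri_rg - CA pair, device A -> device B  (primary replication group)
   #   sec_bc - BC pair on the secondary array, device B -> device B'
   #   rec_rg - CA pair, device B' -> device C (recovery replication group)
   #   rec_bc - BC pair on the recovery array, device C -> device C'
   pairdisplay -g pri_rg -IH0 -CLI   # -IH<n>: remote-copy (CA) instance
   pairdisplay -g sec_bc -IM1 -CLI   # -IM<n>: local-mirror (BC) instance
   pairdisplay -g rec_rg -IH2 -CLI
   pairdisplay -g rec_bc -IM3 -CLI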

Purpose

This document introduces the cascaded configuration in a Continentalclusters environment, including the configuration, maintenance, and recovery procedures for the cascaded configuration environment.

Terminology

Throughout this document, the following terms refer to high-level operations performed in a cascaded configuration scenario. These definitions are specific to this document; they are not general terms used in other documents on data replication, clustering, and multi-site disaster tolerant solutions.
The device where I/O can be performed is referred to as the source. Data written to the source device is replicated to another device remotely; this remote device is referred to as the destination (or destination device). For the XP CA data replication technology, the corresponding terms for source and destination are PVOL and SVOL, respectively. For the EMC SRDF data replication technology, they are R1 and R2, respectively. A local mirror is a local copy of a device on the same disk array. Business Copy (BC) for HP XP and Business Continuance Volumes (BCV) for EMC SRDF are the supported local mirroring capabilities used with the Metrocluster and Continentalclusters products. A replication group is a pairing of source and destination devices in which the source replicates data to its assigned destination. In the cascaded configuration, there are two replication groups: the replication group in the primary cluster is referred to as the primary replication group, and the replication group from the secondary site to the recovery site is referred to as the recovery replication group.
The following three actions are used to describe data replication specific operations.

Establish: To “establish” a replication group or the replication link refers to enabling data to be replicated from the source to the destination, or vice versa. This sometimes involves a block- or track-level copy to the out-of-date copy; while the copy is in progress, the device being replicated to is inconsistent. Unless stated otherwise, data is copied from the source to the destination.

Split: To “split” a replication group, the data replication link, or the local mirror refers to stopping the data replication from the source to the destination, or from a device to its local mirror.

Restore: To “restore” refers to copying data from a local mirror to a device in the replication group.
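As a rough illustration, these three actions map onto the supported command line interfaces approximately as shown below. The group names app_rg and app_bc are hypothetical, and the exact options depend on the configuration and fence level in use.

   # Establish: copy data from source to destination until the pair is in sync
   paircreate -g app_rg -vl -f never   # XP Raid Manager (CA); -vl: local device is the source
   symrdf -g app_rg establish          # SymCLI (SRDF)

   # Split: stop replication; the destination becomes host-accessible
   pairsplit -g app_rg                 # XP Raid Manager
   symrdf -g app_rg split              # SymCLI

   # Restore: copy a local mirror back to its standard device
   pairresync -g app_bc -restore       # XP Raid Manager (BC)
   symmir -g app_bc restore            # SymCLI (BCV)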

Audience

It is assumed that readers are already familiar with Serviceguard configuration tasks; Continentalclusters installation and configuration procedures; HP StorageWorks XP and EMC Symmetrix arrays; BC, BCV, XP CA, and SRDF; XP CA and Symmetrix multi-hop concepts; and the configuration and usage of the Raid Manager command line interface and the Symmetrix Command Line Interface (SymCLI).

Configuration

Cluster and Storage Requirements

Most physical data replication technologies have an option to perform automatic link recovery, which synchronizes data from source to destination devices when the data replication links have recovered. For the cascading failover solution, this option must be disabled. This keeps the links from automatically trying to establish a new connection after a failure of all links, which is required to allow time to split the local mirror on the secondary disk array (device B’) and the recovery replication group before re-establishing data replication for the primary replication group (that is, between the primary disk array and the secondary disk array) upon link recovery.
Other than what has been stated above, there are no additional cluster or storage requirements besides what is required by the Continentalclusters product.

Volume Group Setup

Use the following procedure to set up volume groups for the devices used on the disk arrays. Currently, LVM and VxVM volume management software suites are supported for this configuration. Please refer to LVM and VxVM documentation for specific commands.
1. Before creating the volume groups, make sure that the data replication link is established between the primary devices on the primary disk array and the destination devices on the secondary disk array (i.e. devices A and B), and that the local mirror devices (device B’) in the secondary disk array are established as mirrors of the local standard devices.
2. Create the volume group only on one cluster node that connects to the primary disk array.
3. For LVM and VxVM, metadata needs to be replicated to the destination devices (devices C and C’) in the recovery disk array. The following steps replicate the data to the recovery site (a command-level sketch follows this list):
a. Split the local mirror devices (device B’) in the secondary disk array from the primary replication group.
b. Establish the data replication link between the local mirror devices in the secondary disk array (device B’) and the destination devices (device C) in the recovery disk array (i.e. establish the recovery replication group).
c. Check the data synchronization process and wait until synchronization completes.
d. Once the synchronization completes, split the data replication link between the secondary disk array and the recovery disk array (i.e. the recovery replication group).
e. Establish the local mirror devices (device C’) in the recovery disk array as mirrors of the standard devices.
f. Re-establish the local mirror devices (device B’) to the primary replication group as mirrors of the standard devices.
4. LVM requires the volume group information to be imported. Follow the LVM-specific commands to configure the volume groups on all nodes that are connected to the disk arrays.
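As a rough sketch, step 3 might look like the following with the XP Raid Manager CLI, reusing the hypothetical group names introduced earlier (sec_bc, rec_rg, rec_bc); fence levels, timeouts, and instance numbers depend on your configuration.

   # a. Split the local mirror (B') from its source (B) on the secondary array.
   pairsplit -g sec_bc -IM1
   # b. Establish the recovery replication group (B' -> C).
   paircreate -g rec_rg -vl -f never -IH2
   # c. Wait until the initial synchronization completes (timeout in seconds).
   pairevtwait -g rec_rg -s pair -t 3600 -IH2
   # d. Split the recovery replication group.
   pairsplit -g rec_rg -IH2
   # e. Establish the local mirror (C') on the recovery array and wait for sync.
   paircreate -g rec_bc -vl -IM3
   pairevtwait -g rec_bc -s pair -t 3600 -IM3
   # f. Resynchronize the local mirror (B') back into the primary replication group.
   pairresync -g sec_bc -IM1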
Before Serviceguard and Continentalclusters can be configured, the created volume groups must work properly on all nodes connected to the disk arrays. Please refer to the specific volume group management documentation for the procedure to import and activate a volume group on a given node.
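For LVM on HP-UX, a minimal sketch of the import step might look like the following; the volume group name /dev/vgapp, the map file path, and the minor number are hypothetical.

   # On the node where the volume group was created:
   vgchange -a n /dev/vgapp                      # deactivate before export
   vgexport -p -s -m /tmp/vgapp.map /dev/vgapp   # -p: preview only; writes the map file
   # Copy /tmp/vgapp.map to every other node connected to the arrays.

   # On each of the other nodes:
   mkdir /dev/vgapp
   mknod /dev/vgapp/group c 64 0x010000          # minor number must be unique on the node
   vgimport -s -m /tmp/vgapp.map /dev/vgapp
   vgchange -a y /dev/vgapp                      # activate to verify, then deactivate
   vgchange -a n /dev/vgapp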

Data Initialization

If there is already a metropolitan cluster with data replication established and a local mirror configured for rolling disasters, and you are now adding Continentalclusters to the configuration (that is, adding a recovery site with a recovery disk array), only Procedure 2 is required.
Procedure 1: Mirroring from the Primary to the Secondary Disk Array
This procedure is illustrated in Figure 2.
Figure 2 - Mirroring from the Primary to the Secondary Disk Array
Execute the following steps:
1. Establish mirroring of the secondary device (device B) to the local mirror device (device B’) in the secondary disk array.
2. Establish replication of the primary replication group, which is between the primary site and the secondary site in the primary cluster.
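With the XP Raid Manager CLI, a minimal sketch of these two steps, using the same hypothetical group names as earlier, might be:

   # 1. Establish the local mirror B -> B' on the secondary disk array.
   paircreate -g sec_bc -vl -IM1    # run from an instance where B is the local device
   pairevtwait -g sec_bc -s pair -t 3600 -IM1
   # 2. Establish the primary replication group A -> B and wait for the copy.
   paircreate -g pri_rg -vl -f data -IH0   # fence level per your CA design
   pairevtwait -g pri_rg -s pair -t 3600 -IH0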