Permission is granted to copy, distribute and/or modify this document under the terms of the GNU
Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section
being this copyright notice and license. A copy of the license version 1.2 is included in the section
entitled “GNU Free Documentation License”.
SUSE®, openSUSE®, the openSUSE® logo, Novell®, the Novell® logo, and the N® logo are registered trademarks of Novell, Inc. in the United States and other countries. Linux* is a registered trademark of Linus Torvalds. All other third-party trademarks are the property of their respective owners. A trademark symbol (®, ™, etc.) denotes a Novell trademark; an asterisk (*) denotes a third-party trademark.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither Novell, Inc., SUSE LINUX Products GmbH, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.
SUSE® Linux Enterprise High Availability Extension is an integrated suite of open source clustering technologies that enables you to implement highly available physical and virtual Linux clusters. For quick and efficient configuration and administration, the High Availability Extension includes both a graphical user interface (GUI) and a command line interface (CLI).
This guide is intended for administrators who need to set up, configure, and maintain High Availability (HA) clusters. Both approaches (GUI and CLI) are covered in detail to help administrators choose the appropriate tool that matches their needs for performing the key tasks.
The guide is divided into the following parts:
Installation and Setup
Before starting to install and configure your cluster, make yourself familiar with cluster fundamentals and architecture, get an overview of the key features and benefits, as well as the modifications since the last release. Learn which hardware and software requirements must be met and what preparations to take before executing the next steps. Perform the installation and basic setup of your HA cluster using YaST.
Configuration and Administration
Add, configure and manage resources, using either the GUI or the crm command line interface. Learn how to make use of load balancing and fencing. In case you consider writing your own resource agents or modifying existing ones, get some background information on how to create different types of resource agents.
Storage and Data Replication
SUSE Linux Enterprise High Availability Extension ships with a cluster-aware file system (Oracle Cluster File System, OCFS2) and volume manager (clustered Logical Volume Manager, cLVM). For replication of your data, the High Availability Extension also delivers DRBD (Distributed Replicated Block Device), which you can use to mirror the data of a high availability service from the active node of a cluster to its standby node.
Troubleshooting and Reference
Managing your own cluster requires you to perform a certain amount of troubleshooting. Learn about the most common problems and how to fix them. Find a comprehensive reference of the command line tools the High Availability Extension offers for administering your own cluster. Also, find a list of the most important facts and figures about cluster resources and resource agents.
Many chapters in this manual contain links to additional documentation resources.
These include additional documentation that is available on the system as well as documentation available on the Internet.
For an overview of the documentation available for your product and the latest documentation updates, refer to http://www.novell.com/documentation.
1 Feedback
Several feedback channels are available:
• To report bugs for a product component or to submit enhancement requests, please use https://bugzilla.novell.com/. If you are new to Bugzilla, you might find the Bug Writing FAQs helpful, available from the Novell Bugzilla home page.
• We want to hear your comments and suggestions about this manual and the other
documentation included with this product. Please use the User Comments feature
at the bottom of each page of the online documentation and enter your comments
there.
2 Documentation Conventions
The following typographical conventions are used in this manual:
• /etc/passwd: directory names and filenames
• placeholder: replace placeholder with the actual value
• PATH: the environment variable PATH
• ls, --help: commands, options, and parameters
• user: users or groups
• Alt, Alt + F1: a key to press or a key combination; keys are shown in uppercase as on a keyboard
• File, File > Save As: menu items, buttons
• This paragraph is only relevant for the specified architectures. The arrows mark the beginning and the end of the text block.
• Dancing Penguins (Chapter Penguins, ↑Another Manual): This is a reference to a chapter in another manual.
Part I. Installation and Setup
1 Conceptual Overview
SUSE® Linux Enterprise High Availability Extension is an integrated suite of open
source clustering technologies that enables you to implement highly available physical
and virtual Linux clusters, and to eliminate single points of failure. It ensures the high
availability and manageability of critical network resources including data, applications,
and services. Thus, it helps you maintain business continuity, protect data integrity,
and reduce unplanned downtime for your mission-critical Linux workloads.
It ships with essential monitoring, messaging, and cluster resource management functionality (supporting failover, failback, and migration (load balancing) of individually managed cluster resources). The High Availability Extension is available as an add-on to SUSE Linux Enterprise Server 11.
1.1 Product Features
SUSE® Linux Enterprise High Availability Extension helps you ensure and manage
the availability of your network resources. The following list highlights some of the
key features:
Support for a Wide Range of Clustering Scenarios
Including active/active and active/passive (N+1, N+M, N to 1, N to M) scenarios, as well as hybrid physical and virtual clusters (allowing virtual servers to be clustered with physical servers to improve service availability and resource utilization).
A multi-node active cluster can contain up to 16 Linux servers. Any server in the cluster can restart resources (applications, services, IP addresses, and file systems) from a failed server in the cluster.
Flexible Solution
The High Availability Extension ships with the OpenAIS messaging and membership layer and the Pacemaker Cluster Resource Manager. Using Pacemaker, administrators can continually monitor the health and status of their resources, manage dependencies, and automatically stop and start services based on highly configurable rules and policies. The High Availability Extension allows you to tailor a cluster to the specific applications and hardware infrastructure that fit your organization. Time-dependent configuration enables services to automatically migrate back to repaired nodes at specified times.
Storage and Data Replication
With the High Availability Extension you can dynamically assign and reassign
server storage as needed. It supports Fibre Channel or iSCSI storage area networks
(SANs). Shared disk systems are also supported, but they are not a requirement.
SUSE Linux Enterprise High Availability Extension also comes with a cluster-aware file system (Oracle Cluster File System, OCFS2) and volume manager (clustered Logical Volume Manager, cLVM). For replication of your data, the High Availability Extension also delivers DRBD (Distributed Replicated Block Device), which you can use to mirror the data of a high availability service from the active node of a cluster to its standby node.
Support for Virtualized Environments
SUSE Linux Enterprise High Availability Extension supports the mixed clustering
of both physical and virtual Linux servers. SUSE Linux Enterprise Server 11 ships
with Xen, an open source virtualization hypervisor. The cluster resource manager
in the High Availability Extension is able to recognize, monitor and manage services
running within virtual servers created with Xen, as well as services running in
physical servers. Guest systems can be managed as services by the cluster.
Resource Agents
SUSE Linux Enterprise High Availability Extension includes a huge number of
resource agents to manage resources such as Apache, IPv4, IPv6 and many more.
It also ships with resource agents for popular third party applications such as IBM
WebSphere Application Server. For a list of Open Cluster Framework (OCF) resource agents included with your product, refer to Chapter 18, HA OCF Agents
(page 201).
User-friendly Administration
For easy configuration and administration, the High Availability Extension ships with both a graphical user interface (like YaST and the Linux HA Management Client) and a powerful unified command line interface. Both approaches provide a single point of administration for effectively monitoring and administrating your cluster. Learn how to do so in the following chapters.
1.2 Product Benefits
The High Availability Extension allows you to configure up to 16 Linux servers into a high-availability cluster (HA cluster), where resources can be dynamically switched or moved to any server in the cluster. Resources can be configured to automatically migrate in the event of a server failure, or they can be moved manually to troubleshoot hardware or balance the workload.
The High Availability Extension provides high availability from commodity components.
Lower costs are obtained through the consolidation of applications and operations onto
a cluster. The High Availability Extension also allows you to centrally manage the
complete cluster and to adjust resources to meet changing workload requirements (thus,
manually “load balance” the cluster). Allowing clusters of more than two nodes also
provides savings by allowing several nodes to share a “hot spare”.
An equally important benet is the potential reduction of unplanned service outages as
well as planned outages for software and hardware maintenance and upgrades.
Reasons that you would want to implement a cluster include:
• Increased availability
• Improved performance
• Low cost of operation
• Scalability
• Disaster recovery
• Data protection
• Server consolidation
• Storage consolidation
Shared disk fault tolerance can be obtained by implementing RAID on the shared disk
subsystem.
The following scenario illustrates some of the benefits the High Availability Extension can provide.
Example Cluster Scenario
Suppose you have configured a three-server cluster, with a Web server installed on each of the three servers in the cluster. Each of the servers in the cluster hosts two Web sites. All the data, graphics, and Web page content for each Web site are stored on a shared disk subsystem connected to each of the servers in the cluster. The following figure depicts how this setup might look.
Figure 1.1 Three-Server Cluster
During normal cluster operation, each server is in constant communication with the other servers in the cluster and performs periodic polling of all registered resources to detect failure.
Suppose Web Server 1 experiences hardware or software problems and the users depending on Web Server 1 for Internet access, e-mail, and information lose their connections. The following figure shows how resources are moved when Web Server 1 fails.
Figure 1.2 Three-Server Cluster after One Server Fails
Web Site A moves to Web Server 2 and Web Site B moves to Web Server 3. IP addresses and certificates also move to Web Server 2 and Web Server 3.
When you configured the cluster, you decided where the Web sites hosted on each Web server would go should a failure occur. In the previous example, you configured Web Site A to move to Web Server 2 and Web Site B to move to Web Server 3. This way, the workload once handled by Web Server 1 continues to be available and is evenly distributed between any surviving cluster members.
When Web Server 1 failed, the High Availability Extension software
• Detected a failure and verified with STONITH that Web Server 1 was really dead
• Remounted the shared data directories that were formerly mounted on Web Server 1 on Web Server 2 and Web Server 3
• Restarted applications that were running on Web Server 1 on Web Server 2 and Web Server 3
• Transferred IP addresses to Web Server 2 and Web Server 3
In this example, the failover process happened quickly and users regained access to
Web site information within seconds, and in most cases, without needing to log in again.
Now suppose the problems with Web Server 1 are resolved, and Web Server 1 is returned to a normal operating state. Web Site A and Web Site B can either automatically fail back (move back) to Web Server 1, or they can stay where they are. This depends on how you configured the resources for them. Migrating the services back to Web Server 1 will incur some downtime, so the High Availability Extension also allows you to defer the migration until a period when it will cause little or no service interruption. There are advantages and disadvantages to both alternatives.
The High Availability Extension also provides resource migration capabilities. You
can move applications, Web sites, etc. to other servers in your cluster as required for
system management.
For example, you could have manually moved Web Site A or Web Site B from Web
Server 1 to either of the other servers in the cluster. You might want to do this to upgrade
or perform scheduled maintenance on Web Server 1, or just to increase performance
or accessibility of the Web sites.
1.3 Cluster Configurations
Cluster configurations with the High Availability Extension might or might not include a shared disk subsystem. The shared disk subsystem can be connected via high-speed Fibre Channel cards, cables, and switches, or it can be configured to use iSCSI. If a server fails, another designated server in the cluster automatically mounts the shared disk directories that were previously mounted on the failed server. This gives network users continuous access to the directories on the shared disk subsystem.
IMPORTANT: Shared Disk Subsystem with cLVM
When using a shared disk subsystem with cLVM, that subsystem must be connected to all servers in the cluster from which it needs to be accessed.
Typical resources might include data, applications, and services. The following figure shows how a typical Fibre Channel cluster configuration might look.
Figure 1.3 Typical Fibre Channel Cluster Configuration
Although Fibre Channel provides the best performance, you can also configure your cluster to use iSCSI. iSCSI is an alternative to Fibre Channel that can be used to create a low-cost Storage Area Network (SAN). The following figure shows how a typical iSCSI cluster configuration might look.
Figure 1.4 Typical iSCSI Cluster Configuration
Although most clusters include a shared disk subsystem, it is also possible to create a cluster without a shared disk subsystem. The following figure shows how a cluster without a shared disk subsystem might look.
Figure 1.5 Typical Cluster Configuration Without Shared Storage
1.4 Architecture
This section provides a brief overview of the High Availability Extension architecture.
It identifies and provides information on the architectural components, and describes how those components interoperate.
1.4.1 Architecture Layers
The High Availability Extension has a layered architecture. Figure 1.6, “Architecture”
(page 11) illustrates the different layers and their associated components.
Figure 1.6 Architecture
Messaging and Infrastructure Layer
The primary or first layer is the messaging/infrastructure layer, also known as the OpenAIS layer. This layer contains components that send out the messages containing “I'm alive” signals, as well as other information. The program of the High Availability Extension resides in the messaging/infrastructure layer.
Resource Allocation Layer
The next layer is the resource allocation layer. This layer is the most complex, and
consists of the following components:
Cluster Resource Manager (CRM)
Every action taken in the resource allocation layer passes through the Cluster Resource Manager. If other components of the resource allocation layer (or components which are in a higher layer) need to communicate, they do so through the local CRM.
On every node, the CRM maintains the Cluster Information Base (CIB) (page 12), containing definitions of all cluster options, nodes, resources, their relationships, and the current status. One CRM in the cluster is elected as the Designated Coordinator (DC), meaning that it has the master CIB. All other CIBs in the cluster are replicas of the master CIB. Normal read and write operations on the CIB are serialized through the master CIB. The DC is the only entity in the cluster that can decide that a cluster-wide change needs to be performed, such as fencing a node or moving resources around.
Cluster Information Base (CIB)
The Cluster Information Base is an in-memory XML representation of the entire cluster configuration and current status. It contains definitions of all cluster options, nodes, resources, constraints, and their relationships to each other. The CIB also synchronizes updates to all cluster nodes. There is one master CIB in the cluster, maintained by the DC. All other nodes contain a CIB replica.
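As a quick illustration (a hedged sketch; exact output and options can vary between Pacemaker versions), the standard command line tools can be used on any node to inspect the CIB and to see which node is currently the DC:
# Show the cluster status once; the "Current DC" line names the node
# that holds the master CIB.
crm_mon -1
# Dump the complete CIB (configuration and status) as XML.
cibadmin --query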
Policy Engine (PE)
Whenever the Designated Coordinator needs to make a cluster-wide change (react to a new CIB), the Policy Engine calculates the next state of the cluster based on the current state and configuration. The PE also produces a transition graph containing a list of (resource) actions and dependencies to achieve the next cluster state. The PE runs on every node to speed up DC failover.
Local Resource Manager (LRM)
The LRM calls the local Resource Agents (see Section “Resource Layer” (page 13))
on behalf of the CRM. It can thus perform start / stop / monitor operations and report
the result to the CRM. It also hides the difference between the supported script
standards for Resource Agents (OCF, LSB, Heartbeat Version 1). The LRM is the
authoritative source for all resource-related information on its local node.
Resource Layer
The highest layer is the Resource Layer. The Resource Layer includes one or more
Resource Agents (RA). Resource Agents are programs (usually shell scripts) that have
been written to start, stop, and monitor a certain kind of service (a resource). Resource
Agents are called only by the LRM. Third parties can include their own agents in a defined location in the file system and thus provide out-of-the-box cluster integration for their own software.
1.4.2 Process Flow
SUSE Linux Enterprise High Availability Extension uses Pacemaker as CRM. The CRM is implemented as a daemon (crmd) that has an instance on each cluster node. Pacemaker centralizes all cluster decision-making by electing one of the crmd instances to act as a master. Should the elected crmd process (or the node it is on) fail, a new one is established.
A CIB, reflecting the cluster's configuration and the current state of all resources in the cluster, is kept on each node. The contents of the CIB are automatically kept in sync across the entire cluster.
Many actions performed in the cluster will cause a cluster-wide change. These actions
can include things like adding or removing a cluster resource or changing resource
constraints. It is important to understand what happens in the cluster when you perform
such an action.
For example, suppose you want to add a cluster IP address resource. To do this, you can use one of the command line tools or the GUI to modify the CIB. It is not required to perform the actions on the DC; you can use either tool on any node in the cluster and they will be relayed to the DC. The DC will then replicate the CIB change to all cluster nodes.
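A hedged sketch of such a change with the crm shell, assuming the hypothetical resource name admin-ip and the placeholder address 192.168.1.100 (the parameter names follow the common IPaddr2 resource agent; verify them against the agents installed on your system):
# Add a cluster IP address resource from any cluster node; the change
# is relayed to the DC and replicated to all cluster nodes.
crm configure primitive admin-ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=10s
# Verify that the new resource shows up and is started.
crm_mon -1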
Based on the information in the CIB, the PE then computes the ideal state of the cluster and how it should be achieved, and feeds a list of instructions to the DC. The DC sends commands via the messaging/infrastructure layer which are received by the crmd peers on other nodes. Each crmd uses its LRM (implemented as lrmd) to perform resource modifications. The lrmd is not cluster-aware and interacts directly with resource agents (scripts).
The peer nodes all report the results of their operations back to the DC. Once the DC
concludes that all necessary operations are successfully performed in the cluster, the
cluster will go back to the idle state and wait for further events. If any operation was
not carried out as planned, the PE is invoked again with the new information recorded
in the CIB.
In some cases, it may be necessary to power off nodes in order to protect shared data or complete resource recovery. For this, Pacemaker comes with a fencing subsystem, stonithd. STONITH is an acronym for “Shoot The Other Node In The Head” and is usually implemented with a remote power switch. In Pacemaker, STONITH devices are modeled as resources (and configured in the CIB) to enable them to be easily monitored for failure. However, stonithd takes care of understanding the STONITH topology such that its clients simply request a node be fenced and it does the rest.
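As a hedged illustration, a STONITH device is configured like any other resource. The following sketch assumes a hypothetical IPMI-controlled power switch with placeholder address and credentials; the actual plugin name and parameters depend on your fencing hardware and the STONITH agents installed on your system:
# Model a (hypothetical) IPMI-based STONITH device as a cluster resource
# so that it can be monitored for failure like any other resource.
crm configure primitive fence-node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=192.168.1.201 userid=admin passwd=secret \
    op monitor interval=60m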
1.5 What's New?
With SUSE Linux Enterprise Server 11, the cluster stack has changed from Heartbeat
to OpenAIS. OpenAIS implements an industry standard API, the Application Interface Specification (AIS), published by the Service Availability Forum. The cluster resource manager from SUSE Linux Enterprise Server 10 has been retained, but has been significantly enhanced, ported to OpenAIS, and is now known as Pacemaker.
For more details on what changed in the High Availability components from SUSE® Linux Enterprise Server 10 SP2 to SUSE Linux Enterprise High Availability Extension 11, refer to the following sections.
1.5.1 New Features and Functions Added
Migration Threshold and Failure Timeouts
The High Availability Extension now comes with the concept of a migration threshold and failure timeout. You can define a number of failures for resources, after which they will migrate to a new node. By default, the node will no longer be allowed to run the failed resource until the administrator manually resets the resource's failcount. However, it is also possible to expire the failures by setting the resource's failure-timeout option.
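A hedged example with the crm shell, assuming a hypothetical Apache resource named web-server (migration-threshold and failure-timeout are the standard meta attribute names; adjust the values to your own policy):
# Move the resource away from a node after three failures and expire
# the recorded failures automatically after 60 seconds.
crm configure primitive web-server ocf:heartbeat:apache \
    params configfile=/etc/apache2/httpd.conf \
    op monitor interval=30s \
    meta migration-threshold=3 failure-timeout=60s
# Manually reset the failcount of the resource on a given node, if needed.
crm resource failcount web-server delete node1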
Resource and Operation Defaults
You can now set global defaults for resource options and operations.
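For example, with the crm shell the global defaults might be set roughly as follows (a sketch; resource-stickiness and the operation timeout are only example options and values):
# Set a default stickiness for all resources and a default operation timeout.
crm configure rsc_defaults resource-stickiness=100
crm configure op_defaults timeout=60s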
Support for Offline Configuration Changes
Often it is desirable to preview the effects of a series of changes before updating the configuration atomically. You can now create a “shadow” copy of the configuration that can be edited with the command line interface, before committing it and thus changing the active cluster configuration atomically.
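A sketch of this workflow with the crm_shadow tool (the shadow name test is arbitrary; option names may differ slightly between Pacemaker versions):
# Create a shadow copy of the CIB named "test" and work against it.
crm_shadow --create test
# ... edit the shadow configuration with the crm shell or cibadmin ...
# Show the differences between the shadow copy and the active cluster.
crm_shadow --diff test
# Apply the shadow configuration to the live cluster in one atomic step.
crm_shadow --commit test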
Reusing Rules, Options and Sets of Operations
Rules, instance_attributes, meta_attributes, and sets of operations can be defined once and referenced in multiple places.
Using XPath Expressions for Certain Operations in the CIB
The CIB now accepts XPath-based create, modify, delete operations. For
more information, refer to the cibadmin help text.
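For illustration only (assuming a primitive with the hypothetical id web-server exists in your configuration), an XPath-based query and delete with cibadmin could look like this:
# Query the CIB entry for the primitive with id="web-server".
cibadmin --query --xpath "//primitive[@id='web-server']"
# Delete the matching entry from the CIB.
cibadmin --delete --xpath "//primitive[@id='web-server']"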
Multi-dimensional Collocation and Ordering Constraints
For creating a set of collocated resources, previously you could either define a resource group (which could not always accurately express the design) or you could define each relationship as an individual constraint, causing a constraint explosion as the number of resources and combinations grew. Now you can also use an alternate form of collocation constraints by defining resource_sets.
Connection to the CIB From Non-cluster Machines
Provided Pacemaker is installed on a machine, it is possible to connect to the
cluster even if the machine itself is not a part of it.
Triggering Recurring Actions at Known Times
By default, recurring actions are scheduled relative to when the resource started,
but this is not always desirable. To specify a date/time that the operation should
be relative to, set the operation’s interval-origin. The cluster uses this point to calculate the correct start-delay such that the operation will occur at origin + (interval
* N).
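A hedged sketch of an operation that uses interval-origin (the resource name db is hypothetical, and it is assumed that the crm shell passes the interval-origin operation attribute through to the CIB):
# Run the recurring monitor every 24 hours, anchored to 02:00,
# instead of relative to the time the resource was started.
crm configure primitive db ocf:heartbeat:mysql \
    op monitor interval=24h interval-origin="2009-01-01 02:00:00"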
1.5.2 Changed Features and Functions
Naming Conventions for Resource and Cluster Options
All resource and cluster options now use dashes (-) instead of underscores (_). For example, the master_max meta option has been renamed to master-max.
Renaming of master_slave Resource
The master_slave resource has been renamed to master. Master resources
are a special type of clone that can operate in one of two modes.
Container Tag for Attributes
The attributes container tag has been removed.
Operation Field for Prerequisites
The pre-req operation field has been renamed to requires.
Interval for Operations
All operations must have an interval. For start/stop actions the interval must be set
to 0 (zero).
Attributes for Collocation and Ordering Constraints
The attributes of collocation and ordering constraints were renamed for clarity.
Cluster Options for Migration Due to Failure
The resource-failure-stickiness cluster option has been replaced by
the migration-threshold cluster option. See also Migration Threshold and
Failure Timeouts (page 15).
Arguments for Command Line Tools
The arguments for command-line tools have been made consistent. See also Naming Conventions for Resource and Cluster Options (page 16).
Validating and Parsing XML
The cluster configuration is written in XML. Instead of a Document Type Definition (DTD), a more powerful RELAX NG schema is now used to define the pattern for the structure and content. libxml2 is used as the parser.
id Fields
id fields are now XML IDs, which have the following limitations:
• IDs cannot contain colons.
• IDs cannot begin with a number.
• IDs must be globally unique (not just unique for that tag).
References to Other Objects
Some fields (such as those in constraints that refer to resources) are IDREFs. This means that they must reference existing resources or objects in order for the configuration to be valid. Removing an object which is referenced elsewhere will therefore fail.
1.5.3 Removed Features and Functions
Setting Resource Meta Options
It is no longer possible to set resource meta-options as top-level attributes. Use
meta attributes instead.
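As a sketch, a meta option such as target-role is now set as a meta attribute, for example with the crm shell (the resource name web-server is hypothetical):
# Set a resource meta option as a meta attribute instead of a
# top-level attribute.
crm resource meta web-server set target-role Stopped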
Setting Global Defaults
Resource and operation defaults are no longer read from crm_config.
2 Getting Started
In the following, learn about the system requirements and which preparations
to take before installing the High Availability Extension. Find a short overview of the
basic steps to install and set up a cluster.
2.1 Hardware Requirements
The following list specifies hardware requirements for a cluster based on SUSE® Linux Enterprise High Availability Extension. These requirements represent the minimum hardware configuration. Additional hardware might be necessary, depending on how you intend to use your cluster.
• 1 to 16 Linux servers with software as specified in Section 2.2, “Software Requirements” (page 20). The servers do not require identical hardware (memory, disk space, etc.).
• At least two TCP/IP communication media. Cluster nodes use multicast for communication so the network equipment must support multicasting. The communication media should support a data rate of 100 Mbit/s or higher. Preferably, the Ethernet channels should be bonded.
• Optional: A shared disk subsystem connected to all servers in the cluster from
where it needs to be accessed.
• A STONITH mechanism. STONITH is an acronym for “Shoot the other node in the head”. A STONITH device is a power switch which the cluster uses to reset nodes that are thought to be dead or behaving in a strange manner. Resetting non-heartbeating nodes is the only reliable way to ensure that no data corruption is performed by nodes that hang and only appear to be dead.
For more information, refer to Chapter 8, Fencing and STONITH (page 81).
2.2 Software Requirements
Ensure that the following software requirements are met:
• SUSE® Linux Enterprise Server 11 with all available online updates installed on
all nodes that will be part of the cluster.
• SUSE Linux Enterprise High Availability Extension 11 including all available online
updates installed on all nodes that will be part of the cluster.
2.3 Shared Disk System Requirements
A shared disk system (Storage Area Network, or SAN) is recommended for your cluster
if you want data to be highly available. If a shared disk subsystem is used, ensure the
following:
• The shared disk system is properly set up and functional according to the manufacturer’s instructions.
• The disks contained in the shared disk system should be configured to use mirroring or RAID to add fault tolerance to the shared disk system. Hardware-based RAID is recommended. Host-based software RAID is not supported for all configurations.
• If you are using iSCSI for shared disk system access, ensure that you have properly configured iSCSI initiators and targets (see the sketch after this list).
• When using DRBD to implement a mirroring RAID system that distributes data
across two machines, make sure to only access the replicated device. Use the same
(bonded) NICs that the rest of the cluster uses to leverage the redundancy provided
there.
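As a hedged sketch of the iSCSI step with open-iscsi (the target address 192.168.1.1 is a placeholder, and the target is assumed to be exported already):
# Discover the iSCSI targets offered by the (example) storage server.
iscsiadm -m discovery -t sendtargets -p 192.168.1.1
# Log in to the discovered target(s) so that the LUNs appear as local disks.
iscsiadm -m node --login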