
Red Hat Cluster Suite: Configuring and Managing a Cluster
Red Hat Cluster Suite: Configuring and Managing a Cluster
Copyright © 2000-2006 by Red Hat, Inc., Mission Critical Linux, Inc., and K.M. Sorenson
Red Hat, Inc.
rh-cs(EN)-4-Print-RHI (2007-01-05T17:28)
For Part I Using the Red Hat Cluster Manager and Part III Appendixes, permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation. A copy of the license is available at http://www.gnu.org/licenses/fdl.html. The content described in this paragraph is copyrighted by © Mission Critical Linux, Inc. (2000), K.M. Sorenson (2000), and Red Hat, Inc. (2000-2003).
The material in Part II Configuring a Linux Virtual Server Cluster may be distributed only subject to the terms and conditions set forth in the Open Publication License, V1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/). Distribution of substantively modified versions of this material is prohibited without the explicit permission of the copyright holder. Distribution of the work or derivative of the work in any standard (paper) book form for commercial purposes is prohibited unless prior permission is obtained from the copyright holder. The content described in this paragraph is copyrighted by © Red Hat, Inc. (2000-2003).
Red Hat and the Red Hat "Shadow Man" logo are registered trademarks of Red Hat, Inc. in the United States and other countries. All other trademarks referenced herein are the property of their respective owners.
The GPG fingerprint of the security@redhat.com key is: CA 20 86 86 2B D6 9D FC 65 F6 EC C4 21 91 80 CD DB 42 A6 0E
Table of Contents
Introduction........................................................................................................................ i
1. How To Use This Manual .................................................................................... i
2. Document Conventions....................................................................................... ii
3. More to Come ......................................................................................................v
3.1. Send in Your Feedback .........................................................................v
4. Activate Your Subscription ................................................................................ vi
4.1. Provide a Red Hat Login..................................................................... vi
4.2. Provide Your Subscription Number ................................................... vii
4.3. Connect Your System......................................................................... vii
I. Using the Red Hat Cluster Manager ............................................................................ i
1. Red Hat Cluster Manager Overview....................................................................1
1.1. Red Hat Cluster Manager Features .......................................................2
2. Hardware Installation and Operating System Configuration ...............................9
2.1. Choosing a Hardware Configuration ....................................................9
2.2. Cluster Hardware Components ...........................................................14
2.3. Setting Up the Nodes ..........................................................................18
2.4. Installing and Configuring Red Hat Enterprise Linux ........................22
2.5. Setting Up and Connecting the Cluster Hardware..............................27
3. Installing and Configuring Red Hat Cluster Suite Software ..............................35
3.1. Software Installation and Configuration Tasks ...................................35
3.2. Overview of the Cluster Configuration Tool....................................36
3.3. Installing the Red Hat Cluster Suite Packages....................................39
3.4. Starting the Cluster Configuration Tool...........................................40
3.5. Naming The Cluster............................................................................43
3.6. Configuring Fence Devices.................................................................44
3.7. Adding and Deleting Members...........................................................49
3.8. Configuring a Failover Domain ..........................................................55
3.9. Adding Cluster Resources...................................................................60
3.10. Adding a Cluster Service to the Cluster............................................62
3.11. Propagating The Configuration File: New Cluster ...........................65
3.12. Starting the Cluster Software ............................................................65
4. Cluster Administration.......................................................................................67
4.1. Overview of the Cluster Status Tool .................................................67
4.2. Displaying Cluster and Service Status................................................68
4.3. Starting and Stopping the Cluster Software........................................71
4.4. Modifying the Cluster Configuration..................................................71
4.5. Backing Up and Restoring the Cluster Database................................72
4.6. Updating the Cluster Software............................................................74
4.7. Changing the Cluster Name ................................................................74
4.8. Disabling the Cluster Software ...........................................................74
4.9. Diagnosing and Correcting Problems in a Cluster..............................75
5. Setting Up Apache HTTP Server.......................................................................77
5.1. Apache HTTP Server Setup Overview ...............................................77
5.2. Configuring Shared Storage ................................................................77
5.3. Installing and Configuring the Apache HTTP Server.........................78
II. Configuring a Linux Virtual Server Cluster ............................................................81
6. Introduction to Linux Virtual Server..................................................................83
6.1. Technology Overview .........................................................................83
6.2. Basic Configurations ...........................................................................84
7. Linux Virtual Server Overview..........................................................................85
7.1. A Basic LVS Configuration ................................................................85
7.2. A Three Tiered LVS Configuration.....................................................87
7.3. LVS Scheduling Overview..................................................................89
7.4. Routing Methods.................................................................................91
7.5. Persistence and Firewall Marks ..........................................................93
7.6. LVS Cluster — A Block Diagram ......................................................94
8. Initial LVS Configuration...................................................................................97
8.1. Configuring Services on the LVS Routers ..........................................97
8.2. Setting a Password for the Piranha Configuration Tool ..................98
8.3. Starting the Piranha Configuration Tool Service.............................99
8.4. Limiting Access To the Piranha Configuration Tool .....................100
8.5. Turning on Packet Forwarding..........................................................101
8.6. Configuring Services on the Real Servers ........................................101
9. Setting Up a Red Hat Enterprise Linux LVS Cluster.......................................103
9.1. The NAT LVS Cluster.......................................................................103
9.2. Putting the Cluster Together .............................................................106
9.3. Multi-port Services and LVS Clustering...........................................107
9.4. FTP In an LVS Cluster......................................................................109
9.5. Saving Network Packet Filter Settings .............................................112
10. Configuring the LVS Routers with Piranha Configuration Tool ................115
10.1. Necessary Software.........................................................................115
10.2. Logging Into the Piranha Configuration Tool .............................115
10.3. CONTROL/MONITORING........................................................116
10.4. GLOBAL SETTINGS ..................................................................118
10.5. REDUNDANCY............................................................................120
10.6. VIRTUAL SERVERS ...................................................................122
10.7. Synchronizing Configuration Files .................................................133
10.8. Starting the Cluster .........................................................................135
III. Appendixes...............................................................................................................137
A. Supplementary Hardware Information............................................................139
A.1. Attached Storage Requirements.......................................................139
A.2. Setting Up a Fibre Channel Interconnect .........................................139
A.3. SCSI Storage Requirements.............................................................141
B. Selectively Installing Red Hat Cluster Suite Packages ...................................147
B.1. Installing the Red Hat Cluster Suite Packages .................................147
C. Multipath-usage.txt File for Red Hat Enterprise Linux 4 Update 3 ......157
Index................................................................................................................................165
Colophon.........................................................................................................................171
Introduction
The Red Hat Cluster Suite is a collection of technologies working together to provide data integrity and the ability to maintain application availability in the event of a failure. Administrators can deploy enterprise cluster solutions using a combination of hardware redundancy along with the failover and load-balancing technologies in Red Hat Cluster Suite.
Red Hat Cluster Manager is a high-availability cluster solution specifically suited for database applications, network file servers, and World Wide Web (Web) servers with dynamic content. A Red Hat Cluster Manager system features data integrity and application availability using redundant hardware, shared disk storage, power management, robust cluster communication, and robust application failover mechanisms.
Administrators can also deploy highly available application services using Piranha, a load-balancing and advanced routing cluster solution based on Linux Virtual Server (LVS) technology. Using Piranha, administrators can build highly available e-commerce sites that feature complete data integrity and service availability, in addition to load balancing capabilities. Refer to Part II Configuring a Linux Virtual Server Cluster for more information.
This guide assumes that the user has an advanced working knowledge of Red Hat Enterprise Linux and understands the concepts of server computing. For more information about using Red Hat Enterprise Linux, refer to the following resources:
Red Hat Enterprise Linux Installation Guide for information regarding installation.
Red Hat Enterprise Linux Introduction to System Administration for introductory information for new Red Hat Enterprise Linux system administrators.
Red Hat Enterprise Linux System Administration Guide for more detailed information about configuring Red Hat Enterprise Linux to suit your particular needs as a user.
Red Hat Enterprise Linux Reference Guide provides detailed information suited for more experienced users to reference when needed, as opposed to step-by-step instructions.
Red Hat Enterprise Linux Security Guide details the planning and the tools involved in creating a secured computing environment for the data center, workplace, and home.
HTML, PDF, and RPM versions of the manuals are available on the Red Hat Enterprise Linux Documentation CD and online at:
http://www.redhat.com/docs/
1. How To Use This Manual
This manual contains information about setting up a Red Hat Cluster Manager system. These tasks are described in the following chapters:
Chapter 2 Hardware Installation and Operating System Configuration
Chapter 3 Installing and Configuring Red Hat Cluster Suite Software
Part II Configuring a Linux Virtual Server Cluster describes how to achieve load balancing in a Red Hat Enterprise Linux cluster by using the Linux Virtual Server.
Appendix A Supplementary Hardware Information contains detailed configuration information on specific hardware devices and shared storage configurations.
Appendix B Selectively Installing Red Hat Cluster Suite Packages contains information about custom installation of Red Hat Cluster Suite and Red Hat GFS RPMs.
Appendix C Multipath-usage.txt File for Red Hat Enterprise Linux 4 Update 3 contains information from the Multipath-usage.txt file. The file provides guidelines for using dm-multipath with Red Hat Cluster Suite for Red Hat Enterprise Linux 4 Update 3.
This guide assumes you have a thorough understanding of Red Hat Enterprise Linux system administration concepts and tasks. For detailed information on Red Hat Enterprise Linux system administration, refer to the Red Hat Enterprise Linux System Administration Guide. For reference information on Red Hat Enterprise Linux, refer to the Red Hat Enterprise Linux Reference Guide.
2. Document Conventions
In this manual, certain words are represented in different fonts, typefaces, sizes, and weights. This highlighting is systematic; different words are represented in the same style to indicate their inclusion in a specific category. The types of words that are represented this way include the following:
command
Linux commands (and other operating system commands, when used) are represented this way. This style should indicate to you that you can type the word or phrase on the command line and press [Enter] to invoke a command. Sometimes a command contains words that would be displayed in a different style on their own (such as file names). In these cases, they are considered to be part of the command, so the entire phrase is displayed as a command. For example:
Use the cat testfile command to view the contents of a file, named testfile, in the current working directory.
file name
File names, directory names, paths, and RPM package names are represented this way. This style indicates that a particular file or directory exists with that name on your system. Examples:
The .bashrc file in your home directory contains bash shell definitions and aliases for your own use.
The /etc/fstab file contains information about different system devices and file systems.
Install the webalizer RPM if you want to use a Web server log file analysis program.
application
This style indicates that the program is an end-user application (as opposed to system software). For example:
Use Mozilla to browse the Web.
[key]
A key on the keyboard is shown in this style. For example:
To use [Tab] completion, type in a character and then press the [Tab] key. Your terminal displays the list of files in the directory that start with that letter.
[key]-[combination]
A combination of keystrokes is represented in this way. For example:
The [Ctrl]-[Alt]-[Backspace] key combination exits your graphical session and returns you to the graphical login screen or the console.
text found on a GUI interface
A title, word, or phrase found on a GUI interface screen or window is shown in this style. Text shown in this style identifies a particular GUI screen or an element on a GUI screen (such as text associated with a checkbox or field). Example:
Select the Require Password checkbox if you would like your screensaver to require a password before stopping.
top level of a menu on a GUI screen or window
A word in this style indicates that the word is the top level of a pulldown menu. If you click on the word on the GUI screen, the rest of the menu should appear. For example:
Under File on a GNOME terminal, the New Tab option allows you to open multiple shell prompts in the same window.
Instructions to type in a sequence of commands from a GUI menu look like the following example:
Go to Applications (the main menu on the panel) => Programming => Emacs Text Editor to start the Emacs text editor.
button on a GUI screen or window
This style indicates that the text can be found on a clickable button on a GUI screen. For example:
Click on the Back button to return to the webpage you last viewed.
computer output
Text in this style indicates text displayed to a shell prompt such as error messages and responses to commands. For example:
The ls command displays the contents of a directory. For example:
Desktop    about.html    logs    paulwesterberg.png
Mail       backupfiles   mail    reports
The output returned in response to the command (in this case, the contents of the directory) is shown in this style.
prompt
A prompt, which is a computer’s way of signifying that it is ready for you to input something, is shown in this style. Examples:
$
#
[stephen@maturin stephen]$
leopard login:
user input
Text that the user types, either on the command line or into a text box on a GUI screen, is displayed in this style. In the following example, text is displayed in this style:
To boot your system into the text based installation program, you must type in the text command at the boot: prompt.
replaceable
Text used in examples that is meant to be replaced with data provided by the user is displayed in this style. In the following example, version-number is displayed in this style:
The directory for the kernel source is /usr/src/kernels/version-number/, where version-number is the version and type of kernel installed on this system.
Additionally, we use several different strategies to draw your attention to certain pieces of information. In order of urgency, these items are marked as a note, tip, important, caution, or warning. For example:
Note
Remember that Linux is case sensitive. In other words, a rose is not a ROSE is not a rOsE.
Tip
The directory /usr/share/doc/ contains additional documentation for packages installed on your system.
Important
If you modify the DHCP configuration file, the changes do not take effect until you restart the DHCP daemon.
Caution
Do not perform routine tasks as root — use a regular user account unless you need to use the root account for system administration tasks.
Warning
Be careful to remove only the necessary partitions. Removing other partitions could result in data loss or a corrupted system environment.
3. More to Come
This manual is part of Red Hat’s growing commitment to provide useful and timely support to Red Hat Enterprise Linux users.
3.1. Send in Your Feedback
If you spot a typo, or if you have thought of a way to make this manual better, we would love to hear from you. Please submit a report in Bugzilla (http://bugzilla.redhat.com/bugzilla/) against the component rh-cs.
Be sure to mention the manual’s identifier:
rh-cs(EN)-4-Print-RHI (2007-01-05T17:28)
By mentioning this manual’s identifier, we know exactly which version of the guide you have.
If you have a suggestion for improving the documentation, try to be as specific as possible. If you have found an error, please include the section number and some of the surrounding text so we can find it easily.
4. Activate Your Subscription
Before you can access service and software maintenance information, and the support documentation included in your subscription, you must activate your subscription by registering with Red Hat. Registration includes these simple steps:
Provide a Red Hat login
Provide a subscription number
Connect your system
The first time you boot your installation of Red Hat Enterprise Linux, you are prompted to register with Red Hat using the Setup Agent. If you follow the prompts during the Setup Agent, you can complete the registration steps and activate your subscription.
If you can not complete registration during the Setup Agent (which requires network access), you can alternatively complete the Red Hat registration process online at http://www.redhat.com/register/.
4.1. Provide a Red Hat Login
If you do not have an existing Red Hat login, you can create one when prompted during the Setup Agent or online at:
https://www.redhat.com/apps/activate/newlogin.html
A Red Hat login enables your access to:
Software updates, errata and maintenance via Red Hat Network
Red Hat technical support resources, documentation, and Knowledgebase
If you have forgotten your Red Hat login, you can search for your Red Hat login online at:
https://rhn.redhat.com/help/forgot_password.pxt
4.2. Provide Your Subscription Number
Your subscription number is located in the package that came with your order. If your package did not include a subscription number, your subscription was activated for you and you can skip this step.
You can provide your subscription number when prompted during the Setup Agent or by visiting http://www.redhat.com/register/.
4.3. Connect Your System
The Red Hat Network Registration Client helps you connect your system so that you can begin to get updates and perform systems management. There are three ways to connect:
1. During the Setup Agent — Check the Send hardware information and Send system package list options when prompted.
2. After the Setup Agent has been completed — From Applications (the main menu on the panel), go to System Tools, then select Red Hat Network.
3. After the Setup Agent has been completed — Enter the following command from the command line as the root user:
/usr/bin/up2date --register
I. Using the Red Hat Cluster Manager
Clustered systems provide reliability, scalability, and availability to critical production services. Using the Red Hat Cluster Manager, administrators can create high availability clusters for filesharing, Web servers, and more. This part discusses the installation and configuration of cluster systems using the recommended hardware and Red Hat Enterprise Linux.
This section is licensed under the GNU Free Documentation License. For details refer to the Copyright page.
Table of Contents
1. Red Hat Cluster Manager Overview............................................................................1
2. Hardware Installation and Operating System Configuration ...................................9
3. Installing and Configuring Red Hat Cluster Suite Software ...................................35
4. Cluster Administration................................................................................................67
5. Setting Up Apache HTTP Server ...............................................................................77
Chapter 1.
Red Hat Cluster Manager Overview
Red Hat Cluster Manager allows administrators to connect separate systems (called members or nodes) together to create failover clusters that ensure application availability and data integrity under several failure conditions. Administrators can use Red Hat Cluster Manager with database applications, file sharing services, web servers, and more.
To set up a failover cluster, you must connect the nodes to the cluster hardware, and configure the nodes into the cluster environment. The foundation of a cluster is an advanced host membership algorithm. This algorithm ensures that the cluster maintains complete data integrity by using the following methods of inter-node communication:
Network connections between the cluster systems
A Cluster Configuration System daemon (ccsd) that synchronizes configuration between cluster nodes
To make an application and the data it uses highly available in a cluster, you must configure a cluster service for that application. A cluster service is made up of cluster resources, components that can be failed over from one node to another, such as an IP address, an application initialization script, or a Red Hat GFS shared partition. Building a cluster using Red Hat Cluster Manager allows transparent client access to cluster services. For example, you can provide clients with access to highly-available database applications by building a cluster service using Red Hat Cluster Manager to manage service availability and shared Red Hat GFS storage partitions for the database data and end-user applications.
You can associate a cluster service with a failover domain, a subset of cluster nodes that are eligible to run a particular cluster service. In general, any eligible, properly-configured node can run the cluster service. However, each cluster service can run on only one cluster node at a time in order to maintain data integrity. You can specify whether or not the nodes in a failover domain are ordered by preference. You can also specify whether or not a cluster service is restricted to run only on nodes of its associated failover domain. (When associated with an unrestricted failover domain, a cluster service can be started on any cluster node in the event no member of the failover domain is available.)
You can set up an active-active configuration in which the members run different cluster services simultaneously, or a hot-standby configuration in which primary members run all the cluster services, and a backup member takes over only if a primary member fails.
If a hardware or software failure occurs, the cluster automatically restarts the failed node’s cluster services on the functional node. This cluster-service failover capability ensures that no data is lost, and there is little disruption to users. When the failed node recovers, the cluster can re-balance the cluster services across the nodes.
In addition, you can cleanly stop the cluster services running on a cluster system and then restart them on another system. This cluster-service relocation capability allows you to maintain application and data availability when a cluster node requires maintenance.
1.1. Red Hat Cluster Manager Features
Cluster systems deployed with Red Hat Cluster Manager include the following features:
No-single-point-of-failure hardware configuration
Clusters can include a dual-controller RAID array, multiple bonded network channels, multiple paths between cluster members and storage, and redundant uninterruptible power supply (UPS) systems to ensure that no single failure results in application down time or loss of data.
Note
For information about using dm-multipath with Red Hat Cluster Suite, refer to Appendix C Multipath-usage.txt File for Red Hat Enterprise Linux 4 Update 3.
Alternatively, a low-cost cluster can be set up to provide less availability than a no-single-point-of-failure cluster. For example, you can set up a cluster with a single-controller RAID array and only a single Ethernet channel.
Certain low-cost alternatives, such as host RAID controllers, software RAID without cluster support, and multi-initiator parallel SCSI configurations are not compatible or appropriate for use as shared cluster storage.
Cluster configuration and administration framework
Red Hat Cluster Manager allows you to easily configure and administer cluster services to make resources such as applications, server daemons, and shared data highly available. To create a cluster service, you specify the resources used in the cluster service as well as the properties of the cluster service, such as the cluster service name, application initialization (init) scripts, disk partitions, mount points, and the cluster nodes on which you prefer the cluster service to run. After you add a cluster service, the cluster management software stores the information in a cluster configuration file, and the configuration data is aggregated to all cluster nodes using the Cluster Configuration System (or CCS), a daemon installed on each cluster node that allows retrieval of changes to the XML-based /etc/cluster/cluster.conf configuration file. (A minimal sketch of this file appears after this feature list.)
Red Hat Cluster Manager provides an easy-to-use framework for database applications. For example, a database cluster service serves highly-available data to a database application. The application running on a cluster node provides network access to database client systems, such as Web applications. If the cluster service fails over to another node, the application can still access the shared database data. A network-accessible database cluster service is usually assigned an IP address, which is failed over along with the cluster service to maintain transparent access for clients.
The cluster-service framework can also easily extend to other applications through the use of customized init scripts.
Cluster administration user interface
The Red Hat Cluster Suite management graphical user interface (GUI) facilitates the administration and monitoring tasks of cluster resources such as the following: creating, starting, and stopping cluster services; relocating cluster services from one node to another; modifying the cluster service configuration; and monitoring the cluster nodes. The CMAN interface allows administrators to individually control the cluster on a per-node basis.
Failover domains
By assigning a cluster service to a restricted failover domain, you can limit the nodes that are eligible to run a cluster service in the event of a failover. (A cluster service that is assigned to a restricted failover domain cannot be started on a cluster node that is not included in that failover domain.) You can order the nodes in a failover domain by preference to ensure that a particular node runs the cluster service (as long as that node is active). If a cluster service is assigned to an unrestricted failover domain, the cluster service can start on any available cluster node when no node in the failover domain is available.
Data integrity assurance
To ensure data integrity, only one node can run a cluster service and access cluster-service data at one time. The use of power switches in the cluster hardware configuration enables a node to power-cycle another node before restarting that node's cluster services during the failover process. This prevents any two systems from simultaneously accessing the same data and corrupting it. It is strongly recommended that fence devices (hardware or software solutions that remotely power, shutdown, and reboot cluster nodes) are used to guarantee data integrity under all failure conditions. Watchdog timers are an alternative used to ensure correct operation of cluster service failover.
Ethernet channel bonding
To monitor the health of the other nodes, each node monitors the health of the remote power switch, if any, and issues heartbeat pings over network channels. With Ethernet channel bonding, multiple Ethernet interfaces are configured to behave as one, reducing the risk of a single point of failure in the typical switched Ethernet connection between systems.
Cluster-service failover capability
If a hardware or software failure occurs, the cluster takes the appropriate action to maintain application availability and data integrity. For example, if a node completely fails, a healthy node (in the associated failover domain, if used) starts the service or services that the failed node was running prior to failure. Cluster services already running on the healthy node are not significantly disrupted during the failover process.
Note
For Red Hat Cluster Suite 4, node health is monitored through a cluster network heartbeat. In previous versions of Red Hat Cluster Suite, node health was monitored on shared disk. Shared disk is not required for node-health monitoring in Red Hat Cluster Suite 4.
When a failed node reboots, it can rejoin the cluster and resume running the cluster service. Depending on how the cluster services are configured, the cluster can rebalance services among the nodes.
Manual cluster-service relocation capability
In addition to automatic cluster-service failover, a cluster allows you to cleanly stop cluster services on one node and restart them on another node. You can perform planned maintenance on a node system while continuing to provide application and data availability.
Event logging facility
To ensure that problems are detected and resolved before they affect cluster-service availability, the cluster daemons log messages by using the conventional Linux syslog subsystem.
Application monitoring
The infrastructure in a cluster monitors the state and health of an application. In this manner, should an application-specific failure occur, the cluster automatically restarts the application. In response to the application failure, the cluster first attempts to restart the application on the node it was initially running on; failing that, the application is restarted on another cluster node. You can specify which nodes are eligible to run a cluster service by assigning a failover domain to the cluster service.
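The following is a minimal, hypothetical sketch of the /etc/cluster/cluster.conf file referenced above, as it might look for a two-node cluster with an APC fence device and a single service. All node names, addresses, credentials, and service details are placeholder values, and the exact elements and attributes vary by Red Hat Cluster Suite release and fence hardware; in practice the file is generated and modified with the Cluster Configuration Tool rather than written by hand.

<?xml version="1.0"?>
<cluster name="example_cluster" config_version="1">
  <!-- Member nodes and the fencing method used for each -->
  <clusternodes>
    <clusternode name="node1.example.com" votes="1">
      <fence>
        <method name="1">
          <device name="apc-switch" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.example.com" votes="1">
      <fence>
        <method name="1">
          <device name="apc-switch" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <!-- Fence device definitions referenced by the methods above -->
  <fencedevices>
    <fencedevice name="apc-switch" agent="fence_apc"
                 ipaddr="10.0.0.5" login="apc" passwd="apc"/>
  </fencedevices>
  <!-- Resource manager: failover domain, resources, and the cluster service -->
  <rm>
    <failoverdomains>
      <failoverdomain name="webdomain" ordered="0" restricted="0">
        <failoverdomainnode name="node1.example.com" priority="1"/>
        <failoverdomainnode name="node2.example.com" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <service name="webservice" domain="webdomain" autostart="1">
      <ip address="10.0.0.100" monitor_link="1"/>
      <script name="httpd-init" file="/etc/init.d/httpd"/>
    </service>
  </rm>
</cluster>

The structure mirrors the concepts described in this section: clusternode entries with per-node fencing methods, fencedevice definitions used by those methods, and a resource manager (rm) section that ties resources and a failover domain into a cluster service.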
1.1.1. Red Hat Cluster Manager Subsystem Overview
Table 1-1 summarizes the Red Hat Cluster Manager software subsystems and their components.
Cluster Configuration Tool
    system-config-cluster: Command used to manage cluster configuration in a graphical setting.

Cluster Configuration System (CCS)
    ccs_tool: Notifies ccsd of an updated cluster.conf file. Also used for upgrading a configuration file from a Red Hat GFS 6.0 (or earlier) cluster to the format of the Red Hat Cluster Suite 4 configuration file.
    ccs_test: Diagnostic and testing command that is used to retrieve information from configuration files through ccsd.
    ccsd: CCS daemon that runs on all cluster nodes and provides configuration file data to cluster software.

Resource Group Manager (rgmanager)
    clusvcadm: Command used to manually enable, disable, relocate, and restart user services in a cluster.
    clustat: Command used to display the status of the cluster, including node membership and services running.
    clurgmgrd: Daemon used to handle user service requests, including service start, service disable, service relocate, and service restart.

Fence
    fence_ack_manual: User interface for the fence_manual agent.
    fence_apc: Fence agent for APC power switch.
    fence_bladecenter: Fence agent for IBM BladeCenters with Telnet interface.
    fence_brocade: Fence agent for Brocade Fibre Channel switch.
    fence_bullpap: Fence agent for Bull NovaScale Platform Administration Processor (PAP) interface.
    fence_drac: Fence agent for Dell Remote Access Controller/Modular Chassis (DRAC/MC).
    fence_egenera: Fence agent used with Egenera BladeFrame system.
    fence_gnbd: Fence agent used with GNBD storage.
    fence_ilo: Fence agent for HP iLO interfaces (formerly fence_rib).
    fence_ipmilan: Fence agent for Intelligent Platform Management Interface (IPMI).
    fence_manual: Fence agent for manual interaction. Note: Manual fencing is not supported for production environments.
    fence_mcdata: Fence agent for McData Fibre Channel switch.
    fence_node: Command used by lock_gulmd when a fence operation is required. This command takes the name of a node and fences it based on the node's fencing configuration.
    fence_rps10: Fence agent for WTI Remote Power Switch, Model RPS-10 (only used with two-node clusters).
    fence_rsa: Fence agent for IBM Remote Supervisor Adapter II (RSA II).
    fence_sanbox2: Fence agent for SANBox2 Fibre Channel switch.
    fence_vixel: Fence agent for Vixel Fibre Channel switch.
    fence_wti: Fence agent for WTI power switch.
    fenced: The fence daemon. Manages the fence domain.

DLM
    libdlm.so.1.0.0: Library for Distributed Lock Manager (DLM) support.
    dlm.ko: Kernel module that is installed on cluster nodes for Distributed Lock Manager (DLM) support.

LOCK_GULM
    lock_gulm.o: Kernel module that is installed on GFS nodes using the LOCK_GULM lock module.
    lock_gulmd: Server/daemon that runs on each node and communicates with all nodes in a GFS cluster.
    libgulm.so.xxx: Library for GULM lock manager support.
    gulm_tool: Command that configures and debugs the lock_gulmd server.

LOCK_NOLOCK
    lock_nolock.o: Kernel module installed on a node using GFS as a local file system.

GNBD
    gnbd.o: Kernel module that implements the GNBD device driver on clients.
    gnbd_serv.o: Kernel module that implements the GNBD server. It allows a node to export local storage over the network.
    gnbd_export: Command to create, export, and manage GNBDs on a GNBD server.
    gnbd_import: Command to import and manage GNBDs on a GNBD client.

Table 1-1. Red Hat Cluster Manager Software Subsystem Components
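As an illustrative example of how the rgmanager commands listed above are typically used (the service and node names below are placeholders, not part of any default configuration), an administrator might check cluster status and move a service as follows:

# Display node membership and the status of configured services.
clustat

# Relocate the service "webservice" to the node "node2.example.com".
clusvcadm -r webservice -m node2.example.com

# Disable the service, then enable it again.
clusvcadm -d webservice
clusvcadm -e webservice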
Chapter 2.
Hardware Installation and Operating System Configuration
To set up the hardware configuration and install Red Hat Enterprise Linux, follow these steps:
Choose a cluster hardware configuration that meets the needs of applications and users; refer to Section 2.1 Choosing a Hardware Configuration.
Set up and connect the members and the optional console switch and network switch or hub; refer to Section 2.3 Setting Up the Nodes.
Install and configure Red Hat Enterprise Linux on the cluster members; refer to Section 2.4 Installing and Configuring Red Hat Enterprise Linux.
Set up the remaining cluster hardware components and connect them to the members; refer to Section 2.5 Setting Up and Connecting the Cluster Hardware.
After setting up the hardware configuration and installing Red Hat Enterprise Linux, install the cluster software.
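Installing the cluster software is covered in Chapter 3 and Appendix B; purely as a hedged sketch, on a node registered with Red Hat Network the core Red Hat Cluster Suite packages can usually be pulled in with up2date. The package list below is typical for Red Hat Cluster Suite 4, but verify the exact set (including the kernel-module package variant that matches your kernel) against Appendix B Selectively Installing Red Hat Cluster Suite Packages.

# Install the core cluster packages from Red Hat Network (run as root).
up2date ccs cman cman-kernel dlm dlm-kernel fence magma magma-plugins \
    rgmanager system-config-cluster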
2.1. Choosing a Hardware Configuration
The Red Hat Cluster Manager allows administrators to use commodity hardware to set up a cluster configuration that meets the performance, availability, and data integrity needs of applications and users. Cluster hardware ranges from low-cost minimum configurations that include only the components required for cluster operation, to high-end configurations that include redundant Ethernet channels, hardware RAID, and power switches.
Regardless of configuration, the use of high-quality hardware in a cluster is recommended, as hardware malfunction is a primary cause of system down time.
Although all cluster configurations provide availability, only some configurations protect against every single point of failure. In addition, all cluster configurations provide data integrity, but only some configurations protect data under every failure condition. Therefore, administrators must fully understand the needs of their computing environment and also the availability and data integrity features of different hardware configurations to choose the cluster hardware that meets the requirements.
When choosing a cluster hardware configuration, consider the following:
Performance requirements of applications and users
Choose a hardware configuration that provides adequate memory, CPU, and I/O resources. Be sure that the configuration chosen can handle any future increases in workload as well.
Cost restrictions
The hardware configuration chosen must meet budget requirements. For example, systems with multiple I/O ports usually cost more than low-end systems with fewer expansion capabilities.
Availability requirements
In a mission-critical production environment, a cluster hardware configuration must protect against all single points of failure, including: disk, storage interconnect, Ethernet channel, and power failure. Environments that can tolerate an interruption in availability (such as development environments) may not require as much protection.
Data integrity under all failure conditions requirement
Using fence devices in a cluster configuration ensures that service data is protected under every failure condition. These devices enable a node to power cycle another node before restarting its services during failover. Power switches protect against data corruption in cases where an unresponsive (or hung) node tries to write data to the disk after its replacement node has taken over its services.
If you are not using power switches in the cluster, cluster service failures can result in services being run on more than one node, which can cause data corruption. Refer to Section 2.5.2 Configuring a Fence Device for more information about the benefits of using power switches in a cluster. It is required that production environments use power switches in the cluster hardware configuration.
2.1.1. Minimum Hardware Requirements
A minimum hardware configuration includes only the hardware components that are required for cluster operation, as follows:
At least two servers to run cluster services
Ethernet connection for sending heartbeat pings and for client network access
Network switch or hub to connect cluster nodes and resources
A fence device
The hardware components described in Table 2-1 can be used to set up a minimum cluster configuration. This configuration does not ensure data integrity under all failure conditions, because it does not include power switches. Note that this is a sample configuration; it is possible to set up a minimum configuration using other hardware.
Warning
The minimum cluster configuration is not a supported solution and should not be used in a production environment, as it does not ensure data integrity under all failure conditions.
At least two server systems: Each system becomes a node exclusively for use in the cluster; system hardware requirements are similar to that of Red Hat Enterprise Linux 4.
One network interface card (NIC) for each node: One network interface connects to a hub or switch for cluster connectivity.
Network cables with RJ45 connectors: Network cables connect to the network interface on each node for client access and heartbeat packets.
RAID storage enclosure: The RAID storage enclosure contains one controller with at least two host ports.
Two HD68 SCSI cables: Each cable connects one host bus adapter to one port on the RAID controller, creating two single-initiator SCSI buses.

Table 2-1. Example of Minimum Cluster Configuration
The minimum hardware configuration is a cost-effective cluster configuration for development purposes; however, it contains components that can cause a service outage if they fail. For example, if the RAID controller fails, then all cluster services become unavailable.
To improve availability, protect against component failure, and ensure data integrity under all failure conditions, more hardware is required. Refer to Table 2-2.
Disk failure: Hardware RAID to replicate data across multiple disks.
RAID controller failure: Dual RAID controllers to provide redundant access to disk data.
Network interface failure: Ethernet channel bonding and failover.
Power source failure: Redundant uninterruptible power supply (UPS) systems.
Machine failure: Power switches.

Table 2-2. Improving Availability and Data Integrity
Figure 2-1 illustrates a hardware configuration with improved availability. This configuration uses a fence device (in this case, a network-attached power switch) and the nodes are configured for Red Hat GFS storage attached to a Fibre Channel SAN switch. For more information about configuring and using Red Hat GFS, refer to the Red Hat GFS Administrator's Guide.
Figure 2-1. Hardware Configuration for Improved Availability
A hardware configuration that ensures data integrity under failure conditions can include the following components:
At least two servers to run cluster services
Switched Ethernet connection between each node for heartbeat pings and for client network access
Dual-controller RAID array or redundant access to SAN or other storage
Network power switches to enable each node to power-cycle the other nodes during the failover process
Ethernet interfaces configured to use channel bonding
At least two UPS systems for a highly-available source of power
The components described in Table 2-3 can be used to set up a no single point of failure cluster configuration that includes two single-initiator SCSI buses and power switches to ensure data integrity under all failure conditions. Note that this is a sample configuration; it is possible to set up a no single point of failure configuration using other hardware.
Two servers (up to 16 supported): Each node includes two network interfaces, one for client network access and one for the fence device connection.
One network switch: A network switch enables the connection of multiple nodes to a network.
Three network cables (each node): Two cables to connect each node to the redundant network switches and a cable to connect to the fence device.
Two RJ45 to DB9 crossover cables: RJ45 to DB9 crossover cables connect a serial port on each node to the Cyclades terminal server.
Two power switches: Power switches enable each node to power-cycle the other node before restarting its services. Two RJ45 Ethernet cables for a node are connected to each switch.
FlashDisk RAID Disk Array with dual controllers: Dual RAID controllers protect against disk and controller failure. The RAID controllers provide simultaneous access to all the logical units on the host ports.
Two HD68 SCSI cables: HD68 cables connect each host bus adapter to a RAID enclosure "in" port, creating two single-initiator SCSI buses.
Two terminators: Terminators connected to each "out" port on the RAID enclosure terminate both single-initiator SCSI buses.
Redundant UPS systems: UPS systems provide a highly-available source of power. The power cables for the power switches and the RAID enclosure are connected to two UPS systems.

Table 2-3. Example of a No Single Point of Failure Configuration
Cluster hardware configurations can also include other optional hardware components that are common in a computing environment. For example, a cluster can include a network switch or network hub, which enables the connection of the nodes to a network. A cluster may also include a console switch, which facilitates the management of multiple nodes and eliminates the need for separate monitors, mice, and keyboards for each node.
One type of console switch is a terminal server, which enables connection to serial consoles and management of many nodes from one remote location. As a low-cost alternative, you can use a KVM (keyboard, video, and mouse) switch, which enables multiple nodes to share one keyboard, monitor, and mouse. A KVM switch is suitable for configurations in which access to a graphical user interface (GUI) to perform system management tasks is preferred.
When choosing a system, be sure that it provides the required PCI slots, network slots, and serial ports. For example, a no single point of failure configuration requires multiple bonded Ethernet ports. Refer to Section 2.3.1 Installing the Basic Cluster Hardware for more information.
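Ethernet channel bonding itself is configured after the operating system is installed (refer to Section 2.5.1 Configuring Ethernet Channel Bonding); the following is only a sketch of what that configuration typically looks like on Red Hat Enterprise Linux 4, with placeholder device names and IP address, so that you can budget the required Ethernet ports when selecting systems.

# /etc/modprobe.conf -- load the bonding driver; mode=1 is active-backup.
alias bond0 bonding
options bonding miimon=100 mode=1

# /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface (example address).
DEVICE=bond0
IPADDR=10.0.0.11
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- first slave; ifcfg-eth1 is identical
# except for the DEVICE line.
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none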
2.1.2. Choosing the Type of Fence Device
The Red Hat Cluster Manager implementation consists of a generic power management layer and a set of device-specific modules which accommodate a range of power management types. When selecting the appropriate type of fence device to deploy in the cluster, it is important to recognize the implications of specific device types.
Important
Use of a fencing method is an integral part of a production cluster environment. Configuration of a cluster without a fence device is not supported.
Red Hat Cluster Manager supports several types of fencing methods, including network power switches, fabric switches, and Integrated Power Management hardware. Table 2-5 summarizes the supported types of fence devices and some examples of brands and models that have been tested with Red Hat Cluster Manager.
Ultimately, choosing the right type of fence device to deploy in a cluster environment depends on the data integrity requirements versus the cost and availability of external power switches.
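Whichever fence device you choose, it is worth confirming that it can actually be driven from each node before relying on it. As a hypothetical example only (the address, credentials, and outlet number are placeholders, and options differ between agents; consult the agent's man page), a network-attached APC switch could be exercised from the command line like this:

# Power-cycle the machine plugged into outlet 1 of the APC switch (example values).
fence_apc -a 10.0.0.5 -l apc -p apc -n 1 -o reboot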
2.2. Cluster Hardware Components
Use the following section to identify the hardware components required for the cluster configuration.
Cluster nodes
    Quantity: 16 (maximum supported)
    Required: Yes
    Description: Each node must provide enough PCI slots, network slots, and storage adapters for the cluster hardware configuration. Because attached storage devices must have the same device special file on each node, it is recommended that the nodes have symmetric I/O subsystems. It is also recommended that the processor speed and amount of system memory be adequate for the processes run on the cluster nodes. Refer to Section 2.3.1 Installing the Basic Cluster Hardware for more information.

Table 2-4. Cluster Node Hardware
Table 2-5 includes several different types of fence devices.
A single cluster requires only one type of power switch.
Network-attached power switches
    Description: Remote (LAN, Internet) fencing using RJ45 Ethernet connections and remote terminal access to the device.
    Models: APC MasterSwitch 92xx/96xx; WTI NPS-115/NPS-230, IPS-15, IPS-800/IPS-800-CE, and TPS-2

Fabric switches
    Description: Fence control interface integrated in several models of fabric switches used for Storage Area Networks (SANs). Used as a way to fence a failed node from accessing shared data.
    Models: Brocade Silkworm 2x00, McData Sphereon, Vixel 9200

Integrated power management interfaces
    Description: Remote power management features in various brands of server systems; can be used as a fencing agent in cluster systems.
    Models: HP Integrated Lights-Out (iLO), IBM BladeCenter with firmware dated 7-22-04 or later

Table 2-5. Fence Devices
Table 2-6 through Table 2-9 show a variety of hardware components for an administrator to choose from. An individual cluster does not require all of the components listed in these tables.
Network interface
    Quantity: One for each network connection
    Required: Yes
    Description: Each network connection requires a network interface installed in a node.

Network switch or hub
    Quantity: One
    Required: Yes
    Description: A network switch or hub allows connection of multiple nodes to a network.

Network cable
    Quantity: One for each network interface
    Required: Yes
    Description: A conventional network cable, such as a cable with an RJ45 connector, connects each network interface to a network switch or a network hub.

Table 2-6. Network Hardware Table
Host bus adapter
    Quantity: One per node
    Required: Yes
    Description: To connect to shared disk storage, install either a parallel SCSI or a Fibre Channel host bus adapter in a PCI slot in each cluster node. For parallel SCSI, use a low voltage differential (LVD) host bus adapter. Adapters have either HD68 or VHDCI connectors.

External disk storage enclosure
    Quantity: At least one
    Required: Yes
    Description: Use Fibre Channel or single-initiator parallel SCSI to connect the cluster nodes to a single or dual-controller RAID array. To use single-initiator buses, a RAID controller must have multiple host ports and provide simultaneous access to all the logical units on the host ports. To use a dual-controller RAID array, a logical unit must fail over from one controller to the other in a way that is transparent to the operating system. SCSI RAID arrays that provide simultaneous access to all logical units on the host ports are recommended. To ensure symmetry of device IDs and LUNs, many RAID arrays with dual redundant controllers must be configured in an active/passive mode. Refer to Appendix A Supplementary Hardware Information for more information.

SCSI cable
    Quantity: One per node
    Required: Only for parallel SCSI configurations
    Description: SCSI cables with 68 pins connect each host bus adapter to a storage enclosure port. Cables have either HD68 or VHDCI connectors. Cables vary based on adapter type.

SCSI terminator
    Quantity: As required by hardware configuration
    Required: Only for parallel SCSI configurations and only as necessary for termination
    Description: For a RAID storage enclosure that uses "out" ports (such as FlashDisk RAID Disk Array) and is connected to single-initiator SCSI buses, connect terminators to the "out" ports to terminate the buses.

Fibre Channel hub or switch
    Quantity: One or two
    Required: Only for some Fibre Channel configurations
    Description: A Fibre Channel hub or switch may be required.

Fibre Channel cable
    Quantity: As required by hardware configuration
    Required: Only for Fibre Channel configurations
    Description: A Fibre Channel cable connects a host bus adapter to a storage enclosure port, a Fibre Channel hub, or a Fibre Channel switch. If a hub or switch is used, additional cables are needed to connect the hub or switch to the storage adapter ports.

Table 2-7. Shared Disk Storage Hardware Table
UPS system
    Quantity: One or more
    Required: Strongly recommended for availability
    Description: Uninterruptible power supply (UPS) systems protect against downtime if a power outage occurs. UPS systems are highly recommended for cluster operation. Connect the power cables for the shared storage enclosure and both power switches to redundant UPS systems. Note that a UPS system must be able to provide voltage for an adequate period of time, and should be connected to its own power circuit.

Table 2-8. UPS System Hardware Table
Terminal server
    Quantity: One
    Required: No
    Description: A terminal server enables you to manage many nodes remotely.

KVM switch
    Quantity: One
    Required: No
    Description: A KVM switch enables multiple nodes to share one keyboard, monitor, and mouse. Cables for connecting nodes to the switch depend on the type of KVM switch.

Table 2-9. Console Switch Hardware Table
2.3. Setting Up the Nodes
After identifying the cluster hardware components described in Section 2.1 Choosing a Hardware Configuration, set up the basic cluster hardware and connect the nodes to the optional console switch and network switch or hub. Follow these steps:
1. In all nodes, install the required network adapters and host bus adapters. Refer to Section 2.3.1 Installing the Basic Cluster Hardware for more information about performing this task.
2. Set up the optional console switch and connect it to each node. Refer to Section 2.3.3 Setting Up a Console Switch for more information about performing this task.
If a console switch is not used, then connect each node to a console terminal.
3. Set up the network switch or hub and use network cables to connect it to the nodes and the terminal server (if applicable). Refer to Section 2.3.4 Setting Up a Network Switch or Hub for more information about performing this task.
After performing the previous tasks, install Red Hat Enterprise Linux as described in Section 2.4 Installing and Configuring Red Hat Enterprise Linux.
2.3.1. Installing the Basic Cluster Hardware
Nodes must provide the CPU processing power and memory required by applications.
In addition, nodes must be able to accommodate the SCSI or Fibre Channel adapters, network interfaces, and serial ports that the hardware configuration requires. Systems have a limited number of pre-installed serial and network ports and PCI expansion slots. Table 2-10 helps determine how much capacity the employed node systems require.
Cluster Hardware Component: SCSI or Fibre Channel adapter to shared disk storage
PCI Slots: One for each bus adapter

Cluster Hardware Component: Network connection for client access and Ethernet heartbeat pings
Ethernet Ports: One for each network connection

Cluster Hardware Component: Point-to-point Ethernet connection for 2-node clusters (optional)
Ethernet Ports: One for each connection

Cluster Hardware Component: Terminal server connection (optional)
Serial Ports: One
Table 2-10. Installing the Basic Cluster Hardware
Most systems come with at least one serial port. If a system has graphics display capability, it is possible to use the serial console port for a power switch connection. To expand your serial port capacity, use multi-port serial PCI cards. For multiple-node clusters, use a network power switch.
Also, ensure that local system disks are not on the same SCSI bus as the shared disks. For example, use two-channel SCSI adapters, such as the Adaptec 39160-series cards, and put the internal devices on one channel and the shared disks on the other channel. Using multiple SCSI cards is also possible.
Refer to the system documentation supplied by the vendor for detailed installation information. Refer to Appendix A Supplementary Hardware Information for hardware-specific information about using host bus adapters in a cluster.
2.3.2. Shared Storage Considerations
In a cluster, shared disks can be used to store cluster service data. Because this storage must be available to all nodes running the cluster service configured to use the storage, it cannot be located on disks that depend on the availability of any one node.
There are some factors to consider when setting up shared disk storage in a cluster:
• It is recommended to use a clustered file system such as Red Hat GFS to configure Red Hat Cluster Manager storage resources, as it offers shared storage that is suited for high-availability cluster services. For more information about installing and configuring Red Hat GFS, refer to the Red Hat GFS Administrator’s Guide.
• Whether you are using Red Hat GFS, local, or remote (for example, NFS) storage, it is strongly recommended that you connect any storage systems or enclosures to redundant UPS systems for a highly-available source of power. Refer to Section 2.5.3 Configuring UPS Systems for more information.
• The use of software RAID or Logical Volume Management (LVM) for shared storage is not supported. This is because these products do not coordinate access to shared storage from multiple hosts. Software RAID or LVM may be used on non-shared storage on cluster nodes (for example, boot and system partitions, and other file systems that are not associated with any cluster services).
  An exception to this rule is CLVM, the daemon and library that supports clustering of LVM2. CLVM allows administrators to configure shared storage for use as a resource in cluster services when used in conjunction with the CMAN cluster manager and the Distributed Lock Manager (DLM) mechanism for prevention of simultaneous node access to data and possible corruption. In addition, CLVM works with GULM as its cluster manager and lock manager.
• For remote file systems such as NFS, you may use gigabit Ethernet for improved bandwidth over 10/100 Ethernet connections. Consider redundant links or channel bonding for improved remote file system availability. Refer to Section 2.5.1 Configuring Ethernet Channel Bonding for more information.
• Multi-initiator SCSI configurations are not supported due to the difficulty in obtaining proper bus termination. Refer to Appendix A Supplementary Hardware Information for more information about configuring attached storage.
• A shared partition can be used by only one cluster service.
• Do not include any file systems used as a resource for a cluster service in the node’s local /etc/fstab files, because the cluster software must control the mounting and unmounting of service file systems.
• For optimal performance of shared file systems, make sure to specify a 4 KB block size with the mke2fs -b command. A smaller block size can cause long fsck times. Refer to Section 2.5.3.2 Creating File Systems.
After setting up the shared disk storage hardware, partition the disks and create file systems on the partitions. Refer to Section 2.5.3.1 Partitioning Disks, and Section 2.5.3.2 Creating File Systems for more information on configuring disks.
2.3.3. Setting Up a Console Switch
Although a console switch is not required for cluster operation, it can be used to facilitate node management and eliminate the need for separate monitors, mice, and keyboards for each cluster node. There are several types of console switches.
For example, a terminal server enables connection to serial consoles and management of many nodes from a remote location. For a low-cost alternative, use a KVM (keyboard, video, and mouse) switch, which enables multiple nodes to share one keyboard, monitor, and mouse. A KVM switch is suitable for configurations in which GUI access to perform system management tasks is preferred.
Set up the console switch according to the documentation provided by the vendor.
After the console switch has been set up, connect it to each cluster node. The cables used depend on the type of console switch. For example, a Cyclades terminal server uses RJ45 to DB9 crossover cables to connect a serial port on each node to the terminal server.
2.3.4. Setting Up a Network Switch or Hub
A network switch or hub, although not required for operating a two-node cluster, can be used to facilitate cluster and client system network operations. Clusters of more than two nodes require a switch or hub.
Set up a network switch or hub according to the documentation provided by the vendor.
After setting up the network switch or hub, connect it to each node by using conventional network cables. A terminal server, if used, is connected to the network switch or hub through a network cable.
2.4. Installing and Configuring Red Hat Enterprise Linux
After the setup of basic cluster hardware, proceed with installation of Red Hat Enterprise Linux on each node and ensure that all systems recognize the connected devices. Follow these steps:
1. Install Red Hat Enterprise Linux on all cluster nodes. Refer to the Red Hat Enterprise Linux Installation Guide for instructions.
In addition, when installing Red Hat Enterprise Linux, it is strongly recommended to do the following:
• Gather the IP addresses for the nodes and for the bonded Ethernet ports before installing Red Hat Enterprise Linux. Note that the IP addresses for the bonded Ethernet ports can be private IP addresses (for example, 10.x.x.x).
• Do not place local file systems (such as /, /etc, /tmp, and /var) on shared disks or on the same SCSI bus as shared disks. This helps prevent the other cluster nodes from accidentally mounting these file systems, and also reserves the limited number of SCSI identification numbers on a bus for cluster disks.
• Place /tmp and /var on different file systems. This may improve node performance.
• When a node boots, be sure that the node detects the disk devices in the same order in which they were detected during the Red Hat Enterprise Linux installation. If the devices are not detected in the same order, the node may not boot.
• When using certain RAID storage configured with Logical Unit Numbers (LUNs) greater than zero, it may be necessary to enable LUN support by adding the following to /etc/modprobe.conf:
options scsi_mod max_scsi_luns=255
2. Reboot the nodes.
3. When using a terminal server, configure Red Hat Enterprise Linux to send console messages to the console port.
4. Edit the /etc/hosts file on each cluster node and include the IP addresses used in the cluster or ensure that the addresses are in DNS. Refer to Section 2.4.1 Editing the /etc/hosts File for more information about performing this task.
5. Decrease the alternate kernel boot timeout limit to reduce boot time for nodes. Refer to Section 2.4.2 Decreasing the Kernel Boot Timeout Limit for more information about performing this task.
6. Ensure that no login (or getty) programs are associated with the serial ports that are being used for the remote power switch connection (if applicable). To perform this task, edit the /etc/inittab file and use a hash symbol (#) to comment out the entries that correspond to the serial ports used for the remote power switch. Then, invoke the init q command. (An illustrative example follows this procedure.)
7. Verify that all systems detect all the installed hardware:
• Use the dmesg command to display the console startup messages. Refer to Section 2.4.3 Displaying Console Startup Messages for more information about performing this task.
• Use the cat /proc/devices command to display the devices configured in the kernel. Refer to Section 2.4.4 Displaying Devices Configured in the Kernel for more information about performing this task.
8. Verify that the nodes can communicate over all the network interfaces by using the
ping command to send test packets from one node to another.
9. If intending to configure Samba services, verify that the required RPM packages for Samba services are installed.
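As an illustration of step 6, if a node's /etc/inittab contained a getty entry for a serial port used by the power switch connection, it would be commented out and the change applied as shown below. The entry name, speed, and terminal type vary by system and are shown here only as assumptions:

# s1:2345:respawn:/sbin/agetty ttyS1 9600 vt100

init q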
2.4.1. Editing the /etc/hosts File
The /etc/hosts file contains the IP address-to-hostname translation table. The
/etc/hosts file on each node must contain entries for IP addresses and associated
hostnames for all cluster nodes.
As an alternative to the /etc/hosts file, name services such as DNS or NIS can be used to define the host names used by a cluster. However, to limit the number of dependencies and optimize availability, it is strongly recommended to use the /etc/hosts file to define IP addresses for cluster network interfaces.
The following is an example of an /etc/hosts file on a node of a cluster that does not use DNS-assigned hostnames:
127.0.0.1 localhost.localdomain localhost
192.168.1.81 node1.example.com node1
192.168.1.82 node2.example.com node2
192.168.1.83 node3.example.com node3
The previous example shows the IP addresses and hostnames for three nodes (node1, node2, and node3).
Important
Do not assign the node hostname to the localhost (127.0.0.1) address, as this causes issues with the CMAN cluster management system.
Verify correct formatting of the local host entry in the /etc/hosts file to ensure that it does not include non-local systems in the entry for the local host. An example of an incorrect local host entry that includes a non-local system (server1) is shown next:
127.0.0.1 localhost.localdomain localhost server1
An Ethernet connection may not operate properly if the format of the /etc/hosts file is not correct. Check the /etc/hosts file and correct the file format by removing non-local systems from the local host entry, if necessary.
Note that each network adapter must be configured with the appropriate IP address and netmask.
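For example, on a Red Hat Enterprise Linux 4 node the address for eth0 is normally set in /etc/sysconfig/network-scripts/ifcfg-eth0; a minimal file matching the node1 address used above might look like the following (all values are illustrative and should be adjusted to your network):

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.81
NETMASK=255.255.255.0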
The following example shows a portion of the output from the /sbin/ip addr list command on a cluster node:
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1356 qdisc pfifo_fast qlen 1000
    link/ether 00:05:5d:9a:d8:91 brd ff:ff:ff:ff:ff:ff
    inet 10.11.4.31/22 brd 10.11.7.255 scope global eth0
    inet6 fe80::205:5dff:fe9a:d891/64 scope link
       valid_lft forever preferred_lft forever
You may also add the IP addresses for the cluster nodes to your DNS server. Refer to the Red Hat Enterprise Linux System Administration Guide for information on configuring DNS, or consult your network administrator.
2.4.2. Decreasing the Kernel Boot Timeout Limit
It is possible to reduce the boot time for a node by decreasing the kernel boot timeout limit. During the Red Hat Enterprise Linux boot sequence, the boot loader allows for specifying an alternate kernel to boot. The default timeout limit for specifying a kernel is ten seconds.
To modify the kernel boot timeout limit for a node, edit the appropriate files as follows:
When using the GRUB boot loader, the timeout parameter in /boot/grub/grub.conf should be modified to specify the appropriate number of seconds for the timeout parameter. To set this interval to 3 seconds, edit the parameter to the following:
timeout = 3
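In context, the top of a typical /boot/grub/grub.conf would then begin similar to the following; the remaining boot entries are left unchanged, and the splash image line is shown only as a common default:

default=0
timeout=3
splashimage=(hd0,0)/grub/splash.xpm.gz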
When using the LILO or ELILO boot loaders, edit the /etc/lilo.conf file (on x86 systems) or the elilo.conf file (on Itanium systems) and specify the desired value (in tenths of a second) for the timeout parameter. The following example sets the timeout limit to three seconds:
timeout = 30
To apply any changes made to the /etc/lilo.conf file, invoke the /sbin/lilo command.
On an Itanium system, to apply any changes made to the
/boot/efi/efi/redhat/elilo.conf file, invoke the /sbin/elilo command.
2.4.3. Displaying Console Startup Messages
Use the dmesg command to display the console startup messages. Refer to the dmesg(8) man page for more information.
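Because the startup output can be lengthy, it is sometimes convenient to filter it for a particular device class; for example, the following command (an optional convenience, not a required step) shows only the SCSI-related messages:

dmesg | grep -i scsi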
The following example of output from the dmesg command shows that two external SCSI buses and nine disks were detected on the node. (Lines with backslashes display as one line on most screens):
May 22 14:02:10 storage3 kernel: scsi0 : Adaptec AHA274x/284x/294x \
    (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
May 22 14:02:10 storage3 kernel:
May 22 14:02:10 storage3 kernel: scsi1 : Adaptec AHA274x/284x/294x \
    (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
May 22 14:02:10 storage3 kernel:
May 22 14:02:10 storage3 kernel: scsi : 2 hosts.
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST39236LW Rev: 0004
May 22 14:02:11 storage3 kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdb at scsi1, channel 0, id 0, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdc at scsi1, channel 0, id 1, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdd at scsi1, channel 0, id 2, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sde at scsi1, channel 0, id 3, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdf at scsi1, channel 0, id 8, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdg at scsi1, channel 0, id 9, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdh at scsi1, channel 0, id 10, lun 0
May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdi at scsi1, channel 0, id 11, lun 0
May 22 14:02:11 storage3 kernel: Vendor: Dell Model: 8 BAY U2W CU Rev: 0205
May 22 14:02:11 storage3 kernel: Type: Processor \
    ANSI SCSI revision: 03
May 22 14:02:11 storage3 kernel: scsi1 : channel 0 target 15 lun 1 request sense \
    failed, performing reset.
May 22 14:02:11 storage3 kernel: SCSI bus is being reset for host 1 channel 0.
May 22 14:02:11 storage3 kernel: scsi : detected 9 SCSI disks total.
The following example of the dmesg command output shows that a quad Ethernet card was detected on the node:
May 22 14:02:11 storage3 kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker
May 22 14:02:11 storage3 kernel: tulip.c:v0.91g-ppc 7/16/99
May 22 14:02:11 storage3 kernel: eth0: Digital DS21140 Tulip rev 34 at 0x9800, \
    00:00:BC:11:76:93, IRQ 5.
May 22 14:02:12 storage3 kernel: eth1: Digital DS21140 Tulip rev 34 at 0x9400, \
    00:00:BC:11:76:92, IRQ 9.
May 22 14:02:12 storage3 kernel: eth2: Digital DS21140 Tulip rev 34 at 0x9000, \
    00:00:BC:11:76:91, IRQ 11.
May 22 14:02:12 storage3 kernel: eth3: Digital DS21140 Tulip rev 34 at 0x8800, \
    00:00:BC:11:76:90, IRQ 10.
2.4.4. Displaying Devices Configured in the Kernel
To be sure that the installed devices (such as network interfaces) are configured in the kernel, use the cat /proc/devices command on each node. For example:
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  6 lp
  7 vcs
 10 misc
 13 input
 14 sound
 29 fb
 89 i2c
116 alsa
128 ptm
136 pts
171 ieee1394
180 usb
216 rfcomm
226 drm
254 pcmcia

Block devices:
  1 ramdisk
  2 fd
  3 ide0
  8 sd
  9 md
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
253 device-mapper
The previous example shows:
• Onboard serial ports (ttyS)
• USB devices (usb)
• SCSI devices (sd)
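As a quick check on each node, the same information can be filtered for the SCSI disk driver entry; this is an optional convenience, not a required step:

grep -w sd /proc/devices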
2.5. Setting Up and Connecting the Cluster Hardware
After installing Red Hat Enterprise Linux, set up the cluster hardware components and verify the installation to ensure that the nodes recognize all the connected devices. Note that the exact steps for setting up the hardware depend on the type of configuration. Refer to Section 2.1 Choosing a Hardware Configuration for more information about cluster configurations.
To set up the cluster hardware, follow these steps:
1. Shut down the nodes and disconnect them from their power source.
2. When using power switches, set up the switches and connect each node to a power switch. Refer to Section 2.5.2 Configuring a Fence Device for more information.
In addition, it is recommended to connect each power switch (or each node’s power cord if not using power switches) to a different UPS system. Refer to Section 2.5.3 Configuring UPS Systems for information about using optional UPS systems.
3. Set up shared disk storage according to the vendor instructions and connect the nodes to the external storage enclosure. Refer to Section 2.3.2 Shared Storage Considerations.
In addition, it is recommended to connect the storage enclosure to redundant UPS systems. Refer to Section 2.5.3 Configuring UPS Systems for more information about using optional UPS systems.
4. Turn on power to the hardware, and boot each cluster node. During the boot-up process, enter the BIOS utility to modify the node setup, as follows:
• Ensure that the SCSI identification number used by the host bus adapter is unique for the SCSI bus it is attached to. Refer to Section A.3.4 SCSI Identification Numbers for more information about performing this task.
• Enable or disable the onboard termination for each host bus adapter, as required by the storage configuration. Refer to Section A.3.2 SCSI Bus Termination for more information about performing this task.
• Enable the node to automatically boot when it is powered on.
5. Exit from the BIOS utility, and continue to boot each node. Examine the startup messages to verify that the Red Hat Enterprise Linux kernel has been configured and can recognize the full set of shared disks. Use the dmesg command to display console startup messages. Refer to Section 2.4.3 Displaying Console Startup Messages for more information about using the dmesg command.
6. Set up the bonded Ethernet channels, if applicable. Refer to Section 2.5.1 Configuring Ethernet Channel Bonding for more information.
7. Run the ping command to verify packet transmission between all cluster nodes.
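For example, assuming the node names used earlier in this chapter, the following commands run from node1 verify connectivity to the other members; the -c option limits the number of test packets sent:

ping -c 3 node2
ping -c 3 node3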
2.5.1. Configuring Ethernet Channel Bonding
Ethernet channel bonding in a no-single-point-of-failure cluster system allows for a fault-tolerant network connection by combining two Ethernet devices into one virtual device. The resulting channel-bonded interface ensures that if one Ethernet device fails, the other device becomes active. This type of channel bonding, called an active-backup policy, allows connection of both bonded devices to one switch, or allows each Ethernet device to be connected to a separate hub or switch, which eliminates the single point of failure in the network hub/switch.
Channel bonding requires each cluster node to have two Ethernet devices installed. When it is loaded, the bonding module uses the MAC address of the first enslaved network device and assigns that MAC address to the other network device if the first device fails link detection.
To configure two network devices for channel bonding, perform the following:
1. Create a bonding device in /etc/modprobe.conf. For example:
alias bond0 bonding
options bonding miimon=100 mode=1
This loads the bonding device with the bond0 interface name and passes options to the bonding driver to configure it as an active-backup master device for the enslaved network interfaces.
2. Edit the /etc/sysconfig/network-scripts/ifcfg-ethX configuration file for both eth0 and eth1 so that the files show identical contents. For example:
DEVICE=ethX
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
This will enslave ethX (replace X with the assigned number of the Ethernet devices) to the bond0 master device.
3. Create a network script for the bonding device (for example,
/etc/sysconfig/network-scripts/ifcfg-bond0), which would appear like
the following example:
DEVICE=bond0
USERCTL=no
ONBOOT=yes
BROADCAST=192.168.1.255
NETWORK=192.168.1.0
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
IPADDR=192.168.1.10
4. Reboot the system for the changes to take effect.
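After the reboot, the state of the bond and its slave devices can be inspected through the status file provided by the bonding driver; for example:

cat /proc/net/bonding/bond0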
2.5.2. Configuring a Fence Device
Fence devices enable a node to power-cycle another node before restarting its services as part of the failover process. The ability to remotely disable a node ensures data integrity is maintained under any failure condition. Deploying a cluster in a production environment requires the use of a fence device. Only development (test) environments should use a configuration without a fence device. Refer to Section 2.1.2 Choosing the Type of Fence Device for a description of the various types of power switches.
In a cluster configuration that uses fence devices such as power switches, each node is connected to a switch through either a serial port (for two-node clusters) or network connection (for multi-node clusters). When failover occurs, a node can use this connection to power-cycle another node before restarting its services.
Fence devices protect against data corruption if an unresponsive (or hanging) node becomes responsive after its services have failed over, and issues I/O to a disk that is also receiving I/O from another node. In addition, if CMAN detects node failure, the failed node will be removed from the cluster. If a fence device is not used in the cluster, then a failed node may result in cluster services being run on more than one node, which can cause data corruption and possibly system crashes.
A node may appear to hang for a few seconds if it is swapping or has a high system workload. For this reason, adequate time is allowed prior to concluding that a node has failed.
If a node fails, and a fence device is used in the cluster, the fencing daemon power-cycles the hung node before restarting its services. This causes the hung node to reboot in a clean state and prevents it from issuing I/O and corrupting cluster service data.
When used, fence devices must be set up according to the vendor instructions; however, some cluster-specific tasks may be required to use them in a cluster. Consult the manufacturer documentation on configuring the fence device. Note that the cluster-specific information provided in this manual supersedes the vendor information.
When cabling a physical fence device such as a power switch, take special care to ensure that each cable is plugged into the appropriate port and configured correctly. This is crucial because there is no independent means for the software to verify correct cabling. Failure to cable correctly can lead to an incorrect node being power cycled or fenced off from shared storage via fabric-level fencing, or to a node inappropriately concluding that it has successfully power cycled a failed node.
2.5.3. Configuring UPS Systems
Uninterruptible power supplies (UPS) provide a highly-available source of power. Ideally, a redundant solution should be used that incorporates multiple UPS systems (one per server). For maximal fault-tolerance, it is possible to incorporate two UPS systems per server as well as APC Automatic Transfer Switches to manage the power and shutdown of the server. Which solution to use depends solely on the level of availability desired.
It is not recommended to use a single UPS infrastructure as the sole source of power for the cluster. A UPS solution dedicated to the cluster is more flexible in terms of manageability and availability.
A complete UPS system must be able to provide adequate voltage and current for a prolonged period of time. While there is no single UPS to fit every power requirement, a solution can be tailored to fit a particular configuration.
If the cluster disk storage subsystem has two power supplies with separate power cords, set up two UPS systems, and connect one power switch (or one node’s power cord if not using power switches) and one of the storage subsystem’s power cords to each UPS system. A redundant UPS system configuration is shown in Figure 2-2.
Figure 2-2. Redundant UPS System Configuration
An alternative redundant power configuration is to connect the power switches (or the nodes’ power cords) and the disk storage subsystem to the same UPS system. This is the most cost-effective configuration, and provides some protection against power failure. However, if a power outage occurs, the single UPS system becomes a possible single point of failure. In addition, one UPS system may not be able to provide enough power to all the attached devices for an adequate amount of time. A single UPS system configuration is shown in Figure 2-3.
Figure 2-3. Single UPS System Configuration
Many vendor-supplied UPS systems include Red Hat Enterprise Linux applications that monitor the operational status of the UPS system through a serial port connection. If the battery power is low, the monitoring software initiates a clean system shutdown. As this occurs, the cluster software is properly stopped, because it is controlled by a SysV runlevel script (for example, /etc/rc.d/init.d/rgmanager).
Refer to the UPS documentation supplied by the vendor for detailed installation information.
2.5.3.1. Partitioning Disks
After shared disk storage has been set up, partition the disks so they can be used in the cluster. Then, create file systems or raw devices on the partitions.
Use parted to modify a disk partition table and divide the disk into partitions. While in parted, use the p command to display the partition table and the mkpart command to create new partitions. The following example shows how to use parted to create a partition on a disk:
Invoke parted from the shell using the command parted and specifying an available shared disk device. At the (parted) prompt, use the p command to display the current partition table. The output should be similar to the following:
Disk geometry for /dev/sda: 0.000-4340.294 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
Decide how large a partition is required. Create a partition of this size using the mkpart command in parted. Although mkpart does not create a file system, it normally requires a file system type at partition creation time. parted uses a range on the disk to determine partition size; the size is the space between the end and the beginning of the given range. The following example shows how to create two partitions of 20 MB each on an empty disk.
(parted) mkpart primary ext3 0 20
(parted) mkpart primary ext3 20 40
(parted) p
Disk geometry for /dev/sda: 0.000-4340.294 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.030     21.342  primary
2         21.343     38.417  primary
When more than four partitions are required on a single disk, it is necessary to create an extended partition. If an extended partition is required, the mkpart command also performs this task. In this case, it is not necessary to specify a file system type.
Note
Only one extended partition may be created, and the extended partition must be one of the four primary partitions.
(parted) mkpart extended 40 2000
(parted) p
Disk geometry for /dev/sda: 0.000-4340.294 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.030     21.342  primary
2         21.343     38.417  primary
3         38.417   2001.952  extended
An extended partition allows the creation of logical partitions inside of it. The following example shows the division of the extended partition into two logical partitions.
(parted) mkpart logical ext3 40 1000
(parted) p
Disk geometry for /dev/sda: 0.000-4340.294 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.030     21.342  primary
2         21.343     38.417  primary
3         38.417   2001.952  extended
5         38.447    998.841  logical
(parted) mkpart logical ext3 1000 2000
(parted) p
Disk geometry for /dev/sda: 0.000-4340.294 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.030     21.342  primary
2         21.343     38.417  primary
3         38.417   2001.952  extended
5         38.447    998.841  logical
6        998.872   2001.952  logical
A partition may be removed using parted’s rm command. For example:
(parted) rm 1
(parted) p
Disk geometry for /dev/sda: 0.000-4340.294 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
2         21.343     38.417  primary
3         38.417   2001.952  extended
5         38.447    998.841  logical
6        998.872   2001.952  logical
After all required partitions have been created, exit parted using the quit command.
If a partition was added, removed, or changed while both nodes are powered on and connected to the shared storage, reboot the other node for it to recognize the modifications. After partitioning a disk, format the partition for use in the cluster. For example, create the file systems for shared partitions. Refer to Section 2.5.3.2 Creating File Systems for more information on configuring file systems.
For basic information on partitioning hard disks at installation time, refer to the Red Hat Enterprise Linux Installation Guide.
2.5.3.2. Creating File Systems
Use the mke2fs command to create an ext3 file system. For example:
mke2fs -j -b 4096 /dev/sde3
For optimal performance of shared file systems, make sure to specify a 4 KB block size with the mke2fs -b command. A smaller block size can cause long fsck times.
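If you want to confirm the block size after the file system has been created, the superblock can be inspected with tune2fs; for example, using the same illustrative device as above:

tune2fs -l /dev/sde3 | grep "Block size"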
Chapter 3. Installing and Configuring Red Hat Cluster Suite Software
This chapter describes how to install and configure Red Hat Cluster Suite software and consists of the following sections:
Section 3.1 Software Installation and Configuration Tasks
Section 3.2 Overview of the Cluster Configuration Tool
Section 3.3 Installing the Red Hat Cluster Suite Packages
Section 3.4 Starting the Cluster Configuration Tool
Section 3.5 Naming The Cluster
Section 3.6 Configuring Fence Devices
Section 3.7 Adding and Deleting Members
Section 3.8 Configuring a Failover Domain
Section 3.9 Adding Cluster Resources
Section 3.10 Adding a Cluster Service to the Cluster
Section 3.11 Propagating The Configuration File: New Cluster
Section 3.12 Starting the Cluster Software
3.1. Software Installation and Configuration Tasks
Installing and configuring Red Hat Cluster Suite software consists of the following steps:
1. Installing Red Hat Cluster Suite software.
Refer to Section 3.3 Installing the Red Hat Cluster Suite Packages.
2. Starting the Cluster Configuration Tool.
a. Creating a new configuration file or using an existing one.
b. Choosing a lock method: either DLM or GULM.
Refer to Section 3.4 Starting the Cluster Configuration Tool.
3. Naming the cluster. Refer to Section 3.5 Naming The Cluster.
4. Creating fence devices. Refer to Section 3.6 Configuring Fence Devices.
5. Creating cluster members. Refer to Section 3.7 Adding and Deleting Members.
6. Creating failover domains. Refer to Section 3.8 Configuring a Failover Domain.
7. Creating resources. Refer to Section 3.9 Adding Cluster Resources.
8. Creating cluster services.
Refer to Section 3.10 Adding a Cluster Service to the Cluster.
9. Propagating the configuration file to the other nodes in the cluster.
Refer to Section 3.11 Propagating The Configuration File: New Cluster.
10. Starting the cluster software. Refer to Section 3.12 Starting the Cluster Software.
3.2. Overview of the Cluster Configuration Tool
The Cluster Configuration Tool (Figure 3-1) is a graphical user interface (GUI) for creating, editing, saving, and propagating the cluster configuration file,
/etc/cluster/cluster.conf. The Cluster Configuration Tool is part of the Red
Hat Cluster Suite management GUI, (the system-config-cluster package) and is accessed by the Cluster Configuration tab in the Red Hat Cluster Suite management GUI.
Figure 3-1. Cluster Configuration Tool
The Cluster Configuration Tool uses a hierarchical structure to show relationships among components in the cluster configuration. A triangle icon to the left of a component name indicates that the component has one or more subordinate components assigned to it. To expand or collapse the portion of the tree below a component, click the triangle icon.
The Cluster Configuration Tool represents the cluster configuration with the following components in the left frame:
• Cluster Nodes — Defines cluster nodes. Nodes are represented by name as subordinate elements under Cluster Nodes. Using configuration buttons at the bottom of the right frame (below Properties), you can add nodes, delete nodes, edit node properties, and configure fencing methods for each node.
• Fence Devices — Defines fence devices. Fence devices are represented as subordinate elements under Fence Devices. Using configuration buttons at the bottom of the right frame (below Properties), you can add fence devices, delete fence devices, and edit fence-device properties. Fence devices must be defined before you can configure fencing (with the Manage Fencing For This Node button) for each node.
• Managed Resources — Defines failover domains, resources, and services.
  • Failover Domains — Use this section to configure one or more subsets of cluster nodes used to run a service in the event of a node failure. Failover domains are represented as subordinate elements under Failover Domains. Using configuration buttons at the bottom of the right frame (below Properties), you can create failover domains (when Failover Domains is selected) or edit failover domain properties (when a failover domain is selected).
  • Resources — Use this section to configure resources to be managed by the system. Choose from the available list of file systems, IP addresses, NFS mounts and exports, and user-created scripts and configure them individually. Resources are represented as subordinate elements under Resources. Using configuration buttons at the bottom of the right frame (below Properties), you can create resources (when Resources is selected) or edit resource properties (when a resource is selected).
  • Services — Use this section to create and configure services that combine cluster resources, nodes, and failover domains as needed. Services are represented as subordinate elements under Services. Using configuration buttons at the bottom of the right frame (below Properties), you can create services (when Services is selected) or edit service properties (when a service is selected).
Warning
Do not manually edit the contents of the /etc/cluster/cluster.conf file without guidance from an authorized Red Hat representative or unless you fully understand the consequences of editing the /etc/cluster/cluster.conf file manually.
Figure 3-2 shows the hierarchical relationship among cluster configuration components. The cluster comprises cluster nodes. The cluster nodes are connected to one or more fencing devices. Nodes can be grouped into failover domains for a cluster service. The services comprise managed resources such as NFS exports, IP addresses, and shared GFS partitions. The structure is ultimately reflected in the /etc/cluster/cluster.conf XML structure. The Cluster Configuration Tool provides a convenient way to create and manipulate the /etc/cluster/cluster.conf file.
Figure 3-2. Cluster Configuration Structure
3.3. Installing the Red Hat Cluster Suite Packages
You can install Red Hat Cluster Suite and (optionally) Red Hat GFS RPMs automatically by running the up2date utility at each node for the Red Hat Cluster Suite and Red Hat GFS products.
Tip
You can access the Red Hat Cluster Suite and Red Hat GFS products by using Red Hat Network to subscribe to and access the channels containing the Red Hat Cluster Suite and Red Hat GFS packages. From the Red Hat Network channel, you can manage entitlements for your cluster nodes and upgrade packages for each node within the Red Hat Network Web-based interface. For more information on using Red Hat Network, visit http://rhn.redhat.com.
To automatically install RPMs, follow these steps at each node:
1. Log on as the root user.
Note
The following steps specify using up2date --installall with the --force option. Using the --force option includes kernels that are required for successful installation of Red Hat Cluster Suite and Red Hat GFS. (Without the --force option,
up2date skips kernels by default.)
2. Run up2date --force --installall=channel-label for Red Hat Cluster Suite. The following example shows running the command for i386 RPMs:
# up2date --force --installall=rhel-i386-as-4-cluster
3. (Optional) If you are installing Red Hat GFS, run up2date --force
--installall=channel-label for Red Hat GFS. The following example shows
running the command for i386 RPMs:
# up2date --force --installall=rhel-i386-as-4-gfs-6.1
Note
The preceding procedure accommodates most installation requirements. However, if your installation has extreme limitations on storage and RAM, refer to Appendix B Selectively Installing Red Hat Cluster Suite Packages for more detailed information about Red Hat Cluster Suite and Red Hat GFS RPM packages and customized installation of those packages.
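To spot-check that the core cluster packages are present on a node after the installation, you can query the RPM database. The package names below are the typical RHEL 4 Cluster Suite package names; adjust the list to match your installation:

rpm -q ccs cman fence rgmanager system-config-cluster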
3.4. Starting the Cluster Configuration Tool
You can start the Cluster Configuration Tool by logging in to a cluster node as root with the ssh -Y command and issuing the system-config-cluster command. For example, to start the Cluster Configuration Tool on cluster node nano-01, do the following:
1. Log in to a cluster node and run system-config-cluster. For example:
$ ssh -Y root@nano-01
. . .
# system-config-cluster
a. If this is the first time you have started the Cluster Configuration Tool, the
program prompts you to either open an existing configuration or create a new one. Click Create New Configuration to start a new configuration file (refer to Figure 3-3).
Figure 3-3. Starting a New Configuration File
Note
The Cluster Management tab for the Red Hat Cluster Suite management GUI is available after you save the configuration file with the Cluster Configuration Tool, exit, and restart the Red Hat Cluster Suite management GUI (system-config-cluster). (The Cluster Management tab displays the status of the cluster service manager, cluster nodes, and resources, and shows statistics concerning cluster service operation. To manage the cluster system further, choose the Cluster Configuration tab.)
b. For a new configuration, a Lock Method dialog box is displayed requesting
a choice of either the GULM or DLM lock method (and multicast address for DLM).
Figure 3-4. Choosing a Lock Method
2. Starting the Cluster Configuration Tool displays a graphical representation of the configuration (Figure 3-5) as specified in the cluster configuration file,
/etc/cluster/cluster.conf.
Figure 3-5. The Cluster Configuration Tool
3.5. Naming The Cluster
Naming the cluster consists of specifying a cluster name, a configuration version (optional), and values for Post-Join Delay and Post-Fail Delay. Name the cluster as follows:
1. At the left frame, click Cluster.
2. At the bottom of the right frame (labeled Properties), click the Edit Cluster Properties button. Clicking that button causes a Cluster Properties dialog box to be displayed. The Cluster Properties dialog box presents text boxes for Name, Config Version, and two Fence Daemon Properties parameters: Post-Join Delay and Post-Fail Delay.
3. At the Name text box, specify a name for the cluster. The name should be descriptive enough to distinguish it from other clusters and systems on your network (for example, nfs_cluster or httpd_cluster). The cluster name cannot exceed 15 characters.
Tip
Choose the cluster name carefully. The only way to change the name of a Red Hat cluster is to create a new cluster configuration with the new name.
4. (Optional) The Config Version value is set to 1 by default and is automatically incremented each time you save your cluster configuration. However, if you need to set it to another value, you can specify it at the Config Version text box.
5. Specify the Fence Daemon Properties parameters: Post-Join Delay and Post-Fail Delay.
a. The Post-Join Delay parameter is the number of seconds the fence daemon
(fenced) waits before fencing a node after the node joins the fence domain. The Post-Join Delay default value is 3. A typical setting for Post-Join Delay is between 20 and 30 seconds, but can vary according to cluster and network performance.
b. The Post-Fail Delay parameter is the number of seconds the fence daemon
(fenced) waits before fencing a node (a member of the fence domain) after the node has failed. The Post-Fail Delay default value is 0. Its value may be varied to suit cluster and network performance.
Note
For more information about Post-Join Delay and Post-Fail Delay, refer to the fenced(8) man page.
6. Save cluster configuration changes by selecting File => Save.
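For reference, the cluster name, configuration version, and fence daemon delays set in this procedure are stored as attributes in /etc/cluster/cluster.conf. A fragment written by the Cluster Configuration Tool might look roughly like the following; the name and delay values are illustrative only:

<cluster name="nfs_cluster" config_version="1">
    <fence_daemon post_join_delay="20" post_fail_delay="0"/>
    ...
</cluster>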
3.6. Configuring Fence Devices
Configuring fence devices for the cluster consists of selecting one or more fence devices and specifying fence-device-dependent parameters (for example, name, IP address, login, and password).
To configure fence devices, follow these steps:
1. Click Fence Devices. At the bottom of the right frame (labeled Properties), click the Add a Fence Device button. Clicking Add a Fence Device causes the Fence
Device Configuration dialog box to be displayed (refer to Figure 3-6).
Figure 3-6. Fence Device Configuration
2. At the Fence Device Configuration dialog box, click the drop-down box under Add a New Fence Device and select the type of fence device to configure.
3. Specify the information in the Fence Device Configuration dialog box according to the type of fence device. Refer to the following tables for more information.
Field Description
Name A name for the APC device connected to the cluster.
IP Address The IP address assigned to the device.
Login The login name used to access the device.
Password The password used to authenticate the connection to the device.
Table 3-1. Configuring an APC Fence Device
Field Description
Name A name for the Brocade device connected to the cluster.
IP Address The IP address assigned to the device.
Login The login name used to access the device.
Password The password used to authenticate the connection to the device.
Table 3-2. Configuring a Brocade Fibre Channel Switch
Field Description
IP Address The IP address assigned to the PAP console.
Login The login name used to access the PAP console.
Password The password used to authenticate the connection to the PAP
console.
Table 3-3. Configuring a Bull Platform Administration Processor (PAP) Interface
Field Description
Name The name assigned to the DRAC.
IP Address The IP address assigned to the DRAC.
Login The login name used to access the DRAC.
Password The password used to authenticate the connection to the DRAC.
Table 3-4. Configuring a Dell Remote Access Controller/Modular Chassis (DRAC/MC) Interface
Field Description
Name A name for the BladeFrame device connected to the cluster.
CServer The hostname (and optionally the username in the form of
username@hostname) assigned to the device. Refer to the fence_egenera(8) man page.
Table 3-5. Configuring an Egenera BladeFrame
Field Description
Name A name for the GNBD device used to fence the cluster. Note that
the GFS server must be accessed via GNBD for cluster node fencing support.
Server The hostname of each GNBD to disable. For multiple hostnames,
separate each hostname with a space.
Table 3-6. Configuring a Global Network Block Device (GNBD) fencing agent
Field Description
Name A name for the server with HP iLO support.
Login The login name used to access the device.
Password The password used to authenticate the connection to the device.
Hostname The hostname assigned to the device.
Table 3-7. Configuring an HP Integrated Lights Out (iLO) card
Field Description
Name A name for the IBM Bladecenter device connected to the cluster.
IP Address The IP address assigned to the device.
Login The login name used to access the device.
Password The password used to authenticate the connection to the device.
Table 3-8. Configuring an IBM Blade Center that Supports Telnet
Field Description
Name A name for the RSA device connected to the cluster.
IP Address The IP address assigned to the device.
Login The login name used to access the device.
Password The password used to authenticate the connection to the device.
Table 3-9. Configuring an IBM Remote Supervisor Adapter II (RSA II)
Field Description
IP Address The IP address assigned to the IPMI port.
Login The login name of a user capable of issuing power on/off
commands to the given IPMI port.
Password The password used to authenticate the connection to the IPMI port.
Table 3-10. Configuring an Intelligent Platform Management Interface (IPMI)
Field Description
Name A name to assign the Manual fencing agent. Refer to
fence_manual(8) for more information.
Table 3-11. Configuring Manual Fencing
Note
Manual fencing is not supported for production environments.
Field Description
Name A name for the McData device connected to the cluster.
IP Address The IP address assigned to the device.
Login The login name used to access the device.
Password The password used to authenticate the connection to the device.
Table 3-12. Configuring a McData Fibre Channel Switch
Field Description
Name A name for the WTI RPS-10 power switch connected to the cluster.
Device The device the switch is connected to on the controlling host (for
example, /dev/ttys2).
Port The switch outlet number.
Table 3-13. Configuring an RPS-10 Power Switch (two-node clusters only)
Field Description
Name A name for the SANBox2 device connected to the cluster.
IP Address The IP address assigned to the device.
Login The login name used to access the device.
Password The password used to authenticate the connection to the device.
Table 3-14. Configuring a QLogic SANBox2 Switch
Field Description
Name A name for the Vixel switch connected to the cluster.
IP Address The IP address assigned to the device.
Password The password used to authenticate the connection to the device.
Table 3-15. Configuring a Vixel SAN Fibre Channel Switch
Field Description
Name A name for the WTI power switch connected to the cluster.
IP Address The IP address assigned to the device.
Password The password used to authenticate the connection to the device.
Table 3-16. Configuring a WTI Network Power Switch
4. Click OK.
5. Choose File => Save to save the changes to the cluster configuration.
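Fence devices configured here appear in the fencedevices section of /etc/cluster/cluster.conf. For an APC power switch, the resulting fragment might look roughly like the following; all values are illustrative:

<fencedevices>
    <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.50" login="apc" passwd="apc"/>
</fencedevices>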
3.7. Adding and Deleting Members
The procedure to add a member to a cluster varies depending on whether the cluster is a newly-configured cluster or a cluster that is already configured and running. To add a member to a new cluster, refer to Section 3.7.1 Adding a Member to a Cluster. To add a member to an existing cluster, refer to Section 3.7.2 Adding a Member to a Running
Cluster. To delete a member from a cluster, refer to Section 3.7.3 Deleting a Member from a Cluster.
3.7.1. Adding a Member to a Cluster
To add a member to a new cluster, follow these steps:
1. Click Cluster Node.
2. At the bottom of the right frame (labeled Properties), click the Add a Cluster Node button. Clicking that button causes a Node Properties dialog box to be displayed. For a DLM cluster, the Node Properties dialog box presents text boxes for Cluster
Node Name and Quorum Votes (refer to Figure 3-7). For a GULM cluster, the Node Properties dialog box presents text boxes for Cluster Node Name and Quorum Votes, and presents a checkbox for GULM Lockserver (refer to Figure 3-8).
Figure 3-7. Adding a Member to a New DLM Cluster
Figure 3-8. Adding a Member to a New GULM Cluster
3. At the Cluster Node Name text box, specify a node name. The entry can be a name or an IP address of the node on the cluster subnet.
Note
Each node must be on the same subnet as the node from which you are running the Cluster Configuration Tool and must be defined either in DNS or in the
/etc/hosts file of each cluster node.
Note
The node on which you are running the Cluster Configuration Tool must be explicitly added as a cluster member; the node is not automatically added to the cluster configuration as a result of running the Cluster Configuration Tool.
4. Optionally, at the Quorum Votes text box, you can specify a value; however in most configurations you can leave it blank. Leaving the Quorum Votes text box blank causes the quorum votes value for that node to be set to the default value of 1.
5. If the cluster is a GULM cluster and you want this node to be a GULM lock server, click the GULM Lockserver checkbox (marking it as checked).
6. Click OK.
7. Configure fencing for the node:
a. Click the node that you added in the previous step.
b. At the bottom of the right frame (below Properties), click Manage Fencing
For This Node. Clicking Manage Fencing For This Node causes the Fence Configuration dialog box to be displayed.
c. At the Fence Configuration dialog box, at the bottom of the right frame (below Properties), click Add a New Fence Level. Clicking Add a New Fence Level causes a fence-level element (for example, Fence-Level-1, Fence-Level-2, and so on) to be displayed below the node in the left frame of the Fence Configuration dialog box.
d. Click the fence-level element.
e. At the bottom of the right frame (below Properties), click Add a New Fence
to this Level. Clicking Add a New Fence to this Level causes the Fence Properties dialog box to be displayed.
f. At the Fence Properties dialog box, click the Fence Device Type drop-down
box and select the fence device for this node. Also, provide additional information required (for example, Port and Switch for an APC Power Device).
g. At the Fence Properties dialog box, click OK. Clicking OK causes a fence
device element to be displayed below the fence-level element.
h. To create additional fence devices at this fence level, return to step 7d. Otherwise, proceed to the next step.
i. To create additional fence levels, return to step 7c. Otherwise, proceed to the next step.
j. If you have configured all the fence levels and fence devices for this node, click
Close.
8. Choose File => Save to save the changes to the cluster configuration.
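As a rough sketch of the result, a node added with one fence level that uses the APC device defined earlier would appear in /etc/cluster/cluster.conf similar to the following; the node name, port, and switch values are illustrative:

<clusternode name="node1.example.com" votes="1">
    <fence>
        <method name="1">
            <device name="apc1" switch="1" port="1"/>
        </method>
    </fence>
</clusternode>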
3.7.2. Adding a Member to a Running Cluster
The procedure for adding a member to a running cluster depends on whether the cluster contains only two nodes or more than two nodes. To add a member to a running cluster, follow the steps in one of the following sections according to the number of nodes in the cluster:
For clusters with only two nodes —
Section 3.7.2.1 Adding a Member to a Running Cluster That Contains Only Two Nodes
For clusters with more than two nodes —
Section 3.7.2.2 Adding a Member to a Running Cluster That Contains More Than Two Nodes
3.7.2.1. Adding a Member to a Running Cluster That Contains Only Two Nodes
To add a member to an existing cluster that is currently in operation, and contains only two nodes, follow these steps:
1. Add the node and configure fencing for it as in
Section 3.7.1 Adding a Member to a Cluster.
2. Click Send to Cluster to propagate the updated configuration to other running nodes in the cluster.
3. Use the scp command to send the updated /etc/cluster/cluster.conf file from one of the existing cluster nodes to the new node.
4. At the Red Hat Cluster Suite management GUI Cluster Status Tool tab, disable each service listed under Services.
5. Stop the cluster software on the two running nodes by running the following commands at each node in this order:
a. service rgmanager stop
b. service gfs stop, if you are using Red Hat GFS
c. service clvmd stop
d. service fenced stop
e. service cman stop
f. service ccsd stop
6. Start cluster software on all cluster nodes (including the added one) by running the following commands in this order:
a. service ccsd start
b. service cman start
c. service fenced start
d. service clvmd start
e. service gfs start, if you are using Red Hat GFS
f. service rgmanager start
7. Start the Red Hat Cluster Suite management GUI. At the Cluster Configuration Tool tab, verify that the configuration is correct. At the Cluster Status Tool tab verify that the nodes and services are running as expected.
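In addition to the GUI, cluster membership and service status can be checked from a shell on any member; both commands are provided by the cluster packages:

cman_tool nodes
clustat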
3.7.2.2. Adding a Member to a Running Cluster That Contains More Than Two Nodes
To add a member to an existing cluster that is currently in operation, and contains more than two nodes, follow these steps:
1. Add the node and configure fencing for it as in
Section 3.7.1 Adding a Member to a Cluster.
2. Click Send to Cluster to propagate the updated configuration to other running nodes in the cluster.
3. Use the scp command to send the updated /etc/cluster/cluster.conf file from one of the existing cluster nodes to the new node.
4. Start cluster services on the new node by running the following commands in this order:
a. service ccsd start
b. service lock_gulmd start or service cman start according to the
type of lock manager used
c. service fenced start (DLM clusters only)
d. service clvmd start
e. service gfs start, if you are using Red Hat GFS
f. service rgmanager start
5. Start the Red Hat Cluster Suite management GUI. At the Cluster Configuration Tool tab, verify that the configuration is correct. At the Cluster Status Tool tab
verify that the nodes and services are running as expected.
3.7.3. Deleting a Member from a Cluster
To delete a member from an existing cluster that is currently in operation, follow these steps:
1. At one of the running nodes (not to be removed), run the Red Hat Cluster Suite management GUI. At the Cluster Status Tool tab, under Services, disable or relocate each service that is running on the node to be deleted.
2. Stop the cluster software on the node to be deleted by running the following commands at that node in this order:
a. service rgmanager stop
b. service gfs stop, if you are using Red Hat GFS
c. service clvmd stop
d. service fenced stop (DLM clusters only)
e. service lock_gulmd stop or service cman stop according to the
type of lock manager used
f. service ccsd stop
3. At the Cluster Configuration Tool (on one of the running members), delete the member as follows:
a. If necessary, click the triangle icon to expand the Cluster Nodes property.
b. Select the cluster node to be deleted. At the bottom of the right frame (labeled
Properties), click the Delete Node button.
c. Clicking the Delete Node button causes a warning dialog box to be displayed
requesting confirmation of the deletion (Figure 3-9).
Figure 3-9. Confirm Deleting a Member
d. At that dialog box, click Yes to confirm deletion.
e. Propagate the updated configuration by clicking the Send to Cluster button.
(Propagating the updated configuration automatically saves the configuration.)
4. Stop the cluster software on all remaining running nodes (including GULM lock-server nodes for GULM clusters) by running the following commands at each node in this order:
a. service rgmanager stop
b. service gfs stop, if you are using Red Hat GFS
c. service clvmd stop
d. service fenced stop (DLM clusters only)
e. service lock_gulmd stop or service cman stop according to the
type of lock manager used
f. service ccsd stop
5. Start cluster software on all remaining cluster nodes (including the GULM lock-server nodes for a GULM cluster) by running the following commands in this order:
a. service ccsd start
b. service lock_gulmd start or service cman start according to the
type of lock manager used
c. service fenced start (DLM clusters only)
d. service clvmd start
e. service gfs start, if you are using Red Hat GFS
f. service rgmanager start
6. Start the Red Hat Cluster Suite management GUI. At the Cluster Configuration Tool tab, verify that the configuration is correct. At the Cluster Status Tool tab verify that the nodes and services are running as expected.
3.8. Configuring a Failover Domain
A failover domain is a named subset of cluster nodes that are eligible to run a cluster service in the event of a node failure. A failover domain can have the following characteristics:
Unrestricted — Allows you to specify that a subset of members are preferred, but that a
cluster service assigned to this domain can run on any available member.
Restricted — Allows you to restrict the members that can run a particular cluster service.
If none of the members in a restricted failover domain are available, the cluster service cannot be started (either manually or by the cluster software).
Unordered — When a cluster service is assigned to an unordered failover domain, the
member on which the cluster service runs is chosen from the available failover domain members with no priority ordering.
Ordered — Allows you to specify a preference order among the members of a failover
domain. The member at the top of the list is the most preferred, followed by the second member in the list, and so on.
By default, failover domains are unrestricted and unordered.
In a cluster with several members, using a restricted failover domain can minimize the work of setting up the cluster to run a cluster service (such as httpd, which requires you to set up the configuration identically on all members that run the cluster service). Instead of setting up the entire cluster to run the cluster service, you need to set up only the members in the restricted failover domain that you associate with the cluster service.
Tip
To configure a preferred member, you can create an unrestricted failover domain comprising only one cluster member. Doing that causes a cluster service to run on that cluster member primarily (the preferred member), but allows the cluster service to fail over to any of the other members.
The following sections describe adding a failover domain, removing a failover domain, and removing members from a failover domain:
Section 3.8.1 Adding a Failover Domain
Section 3.8.2 Removing a Failover Domain
Section 3.8.3 Removing a Member from a Failover Domain
3.8.1. Adding a Failover Domain
To add a failover domain, follow these steps:
1. At the left frame of the Cluster Configuration Tool, click Failover Domains.
2. At the bottom of the right frame (labeled Properties), click the Create a Failover Domain button. Clicking the Create a Failover Domain button causes the Add Failover Domain dialog box to be displayed.
3. At the Add Failover Domain dialog box, specify a failover domain name at the Name for new Failover Domain text box and click OK. Clicking OK causes the Failover Domain Configuration dialog box to be displayed (Figure 3-10).
Note
The name should be descriptive enough to distinguish its purpose relative to other names used in your cluster.
Figure 3-10. Failover Domain Configuration: Configuring a Failover Domain
4. Click the Available Cluster Nodes drop-down box and select the members for this failover domain.
5. To restrict failover to members in this failover domain, click (check) the Restrict
Failover To This Domains Members checkbox. (With Restrict Failover To This Domains Members checked, services assigned to this failover domain fail over only
to nodes in this failover domain.)
6. To prioritize the order in which the members in the failover domain assume control of a failed cluster service, follow these steps:
a. Click (check) the Prioritized List checkbox (Figure 3-11). Clicking Prioritized List causes the Priority column to be displayed next to the Member Node column.
Figure 3-11. Failover Domain Configuration: Adjusting Priority
b. For each node that requires a priority adjustment, click the node listed in the
Member Node/Priority columns and adjust priority by clicking one of the Adjust Priority arrows. Priority is indicated by the position in the Member Node column and the value in the Priority column. The node priorities are listed highest to lowest, with the highest priority node at the top of the Member Node column (having the lowest Priority number).
7. Click Close to create the domain.
8. At the Cluster Configuration Tool, perform one of the following actions depending on whether the configuration is for a new cluster or for one that is operational and running:
New cluster — If this is a new cluster, choose File => Save to save the changes to
the cluster configuration.
Running cluster — If this cluster is operational and running, and you want to
propagate the change immediately, click the Send to Cluster button. Clicking Send to Cluster automatically saves the configuration change. If you do not want
to propagate the change immediately, choose File => Save to save the changes to the cluster configuration.
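The Cluster Configuration Tool records failover domains in the /etc/cluster/cluster.conf file. The following is a rough sketch of what a restricted, prioritized failover domain looks like in that file; the element and attribute names shown here are typical for this release but should be verified against the file the tool generates on your system, and the domain and node names are examples only:

<rm>
    <failoverdomains>
        <failoverdomain name="httpd-domain" ordered="1" restricted="1">
            <failoverdomainnode name="node-01" priority="1"/>
            <failoverdomainnode name="node-02" priority="2"/>
        </failoverdomain>
    </failoverdomains>
</rm>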
3.8.2. Removing a Failover Domain
To remove a failover domain, follow these steps:
1. At the left frame of the Cluster Configuration Tool, click the failover domain that you want to delete (listed under Failover Domains).
2. At the bottom of the right frame (labeled Properties), click the Delete Failover Domain button. Clicking the Delete Failover Domain button causes a warning dialog box to be displayed asking if you want to remove the failover domain. Confirm that the failover domain identified in the warning dialog box is the one you want to delete and click Yes. Clicking Yes causes the failover domain to be removed from the list of failover domains under Failover Domains in the left frame of the Cluster Configuration Tool.
3. At the Cluster Configuration Tool, perform one of the following actions depending on whether the configuration is for a new cluster or for one that is operational and running:
New cluster — If this is a new cluster, choose File => Save to save the changes to
the cluster configuration.
Running cluster — If this cluster is operational and running, and you want to
propagate the change immediately, click the Send to Cluster button. Clicking Send to Cluster automatically saves the configuration change. If you do not want
to propagate the change immediately, choose File => Save to save the changes to the cluster configuration.
3.8.3. Removing a Member from a Failover Domain
To remove a member from a failover domain, follow these steps:
1. At the left frame of the Cluster Configuration Tool, click the failover domain that you want to change (listed under Failover Domains).
2. At the bottom of the right frame (labeled Properties), click the Edit Failover Domain Properties button. Clicking the Edit Failover Domain Properties button causes the Failover Domain Configuration dialog box to be displayed (Figure 3-10).
3. At the Failover Domain Configuration dialog box, in the Member Node column, click the node name that you want to delete from the failover domain and click the Remove Member from Domain button. Clicking Remove Member from Domain removes the node from the Member Node column. Repeat this step for each node that is to be deleted from the failover domain. (Nodes must be deleted one at a time.)
4. When finished, click Close.
5. At the Cluster Configuration Tool, perform one of the following actions depending
on whether the configuration is for a new cluster or for one that is operational and running:
New cluster — If this is a new cluster, choose File => Save to save the changes to
the cluster configuration.
Running cluster — If this cluster is operational and running, and you want to
propagate the change immediately, click the Send to Cluster button. Clicking Send to Cluster automatically saves the configuration change. If you do not want
to propagate the change immediately, choose File => Save to save the changes to the cluster configuration.
3.9. Adding Cluster Resources
To specify a device for a cluster service, follow these steps:
1. On the Resources property of the Cluster Configuration Tool, click the Create
a Resource button. Clicking the Create a Resource button causes the Resource Configuration dialog box to be displayed.
2. At the Resource Configuration dialog box, under Select a Resource Type, click the
drop-down box. At the drop-down box, select a resource to configure. The resource options are described as follows:
GFS
Name — Create a name for the file system resource.
Mount Point — Choose the path to which the file system resource is mounted.
Device — Specify the device file associated with the file system resource.
Options — Options to pass to the mount call for the new file system.
File System ID — When creating a new file system resource, you can leave this field blank. Leaving the field blank causes a file system ID to be assigned automatically after you click OK at the Resource Configuration dialog box. If you need to assign a file system ID explicitly, specify it in this field.
Force Unmount checkbox — If checked, forces the file system to unmount. The default setting is unchecked.
File System
Name — Create a name for the file system resource.
File System Type — Choose the file system for the resource using the drop-
down menu.
Mount Point — Choose the path to which the file system resource is mounted.
Device — Specify the device file associated with the file system resource.
Options — Options to pass to the mount call for the new file system.
File System ID — When creating a new file system resource, you can leave
this field blank. Leaving the field blank causes a file system ID to be assigned automatically after you click OK at the Resource Configuration dialog box. If you need to assign a file system ID explicitly, specify it in this field.
Checkboxes — Specify mount and unmount actions when a service is stopped (for example, when disabling or relocating a service):
Force unmount — If checked, forces the file system to unmount. The default
setting is unchecked.
Reboot host node if unmount fails — If checked, reboots the node if un-
mounting this file system fails. The default setting is unchecked.
Check file system before mounting — If checked, causes fsck to be run on
the file system before mounting it. The default setting is unchecked.
IP Address
IP Address — Type the IP address for the resource.
Monitor Link checkbox — Check the box to enable or disable link status monitoring of the IP address resource.
NFS Mount
Name — Create a symbolic name for the NFS mount.
Mount Point — Choose the path to which the file system resource is mounted.
Host — Specify the NFS server name.
Export Path — NFS export on the server.
NFS and NFS4 options — Specify NFS protocol:
NFS — Specifies using NFSv3 protocol. The default setting is NFS.
NFS4 — Specifies using NFSv4 protocol.
Options — NFS-specific options to pass to the mount call for the new file system. For more information, refer to the nfs(5) man page.
Force Unmount checkbox — If checked, forces the file system to unmount. The default setting is unchecked.
NFS Client
Name — Enter a name for the NFS client resource.
Target — Enter a target for the NFS client resource. Supported targets are host-
names, IP addresses (with wild-card support), and netgroups.
Read-Write and Read Only options — Specify the type of access rights for this NFS client resource:
Read-Write — Specifies that the NFS client has read-write access. The de-
fault setting is Read-Write.
Read Only — Specifies that the NFS client has read-only access.
Options — Additional client access rights. For more information, refer to the General Options section of the exports(5) man page.
NFS Export
Name — Enter a name for the NFS export resource.
Script
Name — Enter a name for the custom user script.
File (with path) — Enter the path where this custom script is located (for example, /etc/init.d/userscript).
Samba Service
Name — Enter a name for the Samba server.
Work Group — Enter the Windows workgroup name or Windows NT domain
of the Samba service.
Note
When creating or editing a cluster service, connect a Samba-service resource directly to the service, not to a resource within a service. That is, at the Ser-
vice Management dialog box, use either Create a new resource for this service or Add a Shared Resource to this service; do not use Attach a new Private Resource to the Selection or Attach a Shared Resource to the selection.
3. When finished, click OK.
4. Choose File => Save to save the change to the /etc/cluster/cluster.conf configuration file.
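Resources created with the Cluster Configuration Tool are written to the resources section of /etc/cluster/cluster.conf. The following is a rough sketch of what file system, IP address, and script resources look like in that file; the element and attribute names are typical for this release but should be checked against the file generated on your system, and all names, devices, paths, and addresses are examples only:

<rm>
    <resources>
        <fs name="httpd-content" fstype="ext3" mountpoint="/var/www/html"
            device="/dev/sda3" force_unmount="1"/>
        <ip address="10.11.4.240" monitor_link="1"/>
        <script name="httpd-init" file="/etc/rc.d/init.d/httpd"/>
    </resources>
</rm>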
3.10. Adding a Cluster Service to the Cluster
To add a cluster service to the cluster, follow these steps:
1. At the left frame, click Services.
2. At the bottom of the right frame (labeled Properties), click the Create a Service button. Clicking Create a Service causes the Add a Service dialog box to be displayed.
3. At the Add a Service dialog box, type the name of the service in the Name text box and click OK. Clicking OK causes the Service Management dialog box to be displayed (refer to Figure 3-12).
Tip
Use a descriptive name that clearly distinguishes the service from other services in the cluster.
Figure 3-12. Adding a Cluster Service
4. If you want to restrict the members on which this cluster service is able to run, choose a failover domain from the Failover Domain drop-down box. (Refer to Section 3.8 Configuring a Failover Domain for instructions on how to configure a failover domain.)
5. Autostart This Service checkbox — This is checked by default. If Autostart This Service is checked, the service is started automatically when a cluster is started and running. If Autostart This Service is not checked, the service must be started manually any time the cluster comes up from the stopped state.
6. Run Exclusive checkbox — This sets a policy wherein the service runs only on nodes that have no other services running on them. For example, for a very busy web server that is clustered for high availability, it would be advisable to keep that service on a node alone, with no other services competing for its resources — that is, with Run Exclusive checked. On the other hand, services that consume few resources (like NFS and Samba) can run together on the same node with little concern over contention for resources. For those types of services you can leave Run Exclusive unchecked.
7. Select a recovery policy to specify how the resource manager should recover from a service failure. At the upper right of the Service Management dialog box, there are three Recovery Policy options available:
Restart — Restart the service on the node where the service is currently located. The default setting is Restart. If the service cannot be restarted on the current node, the service is relocated.
Relocate — Relocate the service to another node rather than restarting it on the node where it is currently located.
Disable — Do not restart the service at all.
8. Click the Add a Shared Resource to this service button and choose a resource from the list of resources that you configured in Section 3.9 Adding Cluster Resources.
Note
If you are adding a Samba-service resource, connect a Samba-service resource directly to the service, not to a resource within a service. That is, at the Service
Management dialog box, use either Create a new resource for this service or Add a Shared Resource to this service; do not use Attach a new Private Resource to the Selection or Attach a Shared Resource to the selection.
9. If needed, you may also create a private resource, which becomes a subordinate resource, by clicking the Attach a new Private Resource to the Selection button. The process is the same as creating a shared resource, as described in Section 3.9 Adding Cluster Resources. The private resource appears as a child of the shared resource with which you associated it. Click the triangle icon next to the shared resource to display any associated private resources.
10. When finished, click OK.
11. Choose File => Save to save the changes to the cluster configuration.
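In /etc/cluster/cluster.conf, the resulting cluster service ties the shared resources together under a single service entry. The following is a rough sketch of what such an entry can look like; the attribute names and the ref-style references are typical for this release but should be verified against the file generated on your system, and the service, domain, and resource names are examples only:

<service name="webserver" domain="httpd-domain" autostart="1" recovery="relocate">
    <ip ref="10.11.4.240"/>
    <fs ref="httpd-content"/>
    <script ref="httpd-init"/>
</service>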
Note
To verify the existence of the IP service resource used in a cluster service, you must use the /sbin/ip addr list command on a cluster node. The following output shows the
/sbin/ip addr list command executed on a node running a cluster service:
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1356 qdisc pfifo_fast qlen 1000
    link/ether 00:05:5d:9a:d8:91 brd ff:ff:ff:ff:ff:ff
    inet 10.11.4.31/22 brd 10.11.7.255 scope global eth0
    inet6 fe80::205:5dff:fe9a:d891/64 scope link
    inet 10.11.4.240/22 scope global secondary eth0
       valid_lft forever preferred_lft forever
3.11. Propagating The Configuration File: New Cluster
For newly defined clusters, you must propagate the configuration file to the cluster nodes as follows:
1. Log in to the node where you created the configuration file.
2. Using the scp command, copy the /etc/cluster/cluster.conf file to all nodes in the cluster.
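For example, from the node where the configuration file was created, the copy can be done in a short loop (the node names node2 and node3 are assumptions for this example; list the actual members of your cluster):

for node in node2 node3; do
    scp /etc/cluster/cluster.conf root@$node:/etc/cluster/cluster.conf
done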
Note
Propagating the cluster configuration file this way is necessary for the first time a cluster is created. Once a cluster is installed and running, the cluster configuration file is propagated using the Red Hat cluster management GUI Send to Cluster button. For more information about propagating the cluster configuration using the GUI Send to Cluster button, refer to Section 4.4 Modifying the Cluster Configuration.
3.12. Starting the Cluster Software
After you have propagated the cluster configuration to the cluster nodes, you can either reboot each node or start the cluster software on each cluster node by running the following commands at each node in this order:
1. service ccsd start
2. service lock_gulmd start or service cman start according to the type of lock manager used
3. service fenced start (DLM clusters only)
4. service clvmd start
5. service gfs start, if you are using Red Hat GFS
6. service rgmanager start
7. Start the Red Hat Cluster Suite management GUI. At the Cluster Configuration Tool tab, verify that the configuration is correct. At the Cluster Status Tool tab verify that the nodes and services are running as expected.
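The start sequence above can also be run as a short loop on each node. The following is a sketch only, assuming a DLM cluster that uses Red Hat GFS; substitute lock_gulmd for cman and omit fenced on a GULM cluster, and omit gfs if GFS is not used:

for svc in ccsd cman fenced clvmd gfs rgmanager; do
    service $svc start
done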
Chapter 4.
Cluster Administration
This chapter describes the various administrative tasks for maintaining a cluster after it has been installed and configured.
4.1. Overview of the Cluster Status Tool
The Cluster Status Tool is part of the Red Hat Cluster Suite management GUI (the system-config-cluster package) and is accessed by a tab in the Red Hat Cluster Suite management GUI. The Cluster Status Tool displays the status of cluster members and services and provides control of cluster services.
The members and services displayed in the Cluster Status Tool are determined by the cluster configuration file (/etc/cluster/cluster.conf). The cluster configuration file is maintained via the Cluster Configuration Tool in the cluster management GUI.
Warning
Do not manually edit the contents of the /etc/cluster/cluster.conf file without guidance from an authorized Red Hat representative or unless you fully understand the consequences of editing the /etc/cluster/cluster.conf file manually.
You can access the Cluster Status Tool by clicking the Cluster Management tab at the cluster management GUI (refer to Figure 4-1).
Use the Cluster Status Tool to enable, disable, restart, or relocate a service. To enable a service, select the service in the Services area and click Enable. To disable a service, select the service in the Services area and click Disable. To restart a service, select the service in the Services area and click Restart. To relocate a service from one member to another, drag the service to the other member and drop the service onto that member; the service is then restarted on that member. (Relocating a service to its current member — that is, dragging a service to its current member and dropping the service onto that member — restarts the service.)
Figure 4-1. Cluster Status Tool
4.2. Displaying Cluster and Service Status
Monitoring cluster and application service status can help identify and resolve problems in the cluster environment. The following tools assist in displaying cluster status information:
The Cluster Status Tool
The clustat utility
Important
Members that are not running the cluster software cannot determine or report the status of other members of the cluster.
Cluster and service status includes the following information:
Cluster member system status
Service status and which cluster system is running the service or owns the service
The following tables describe how to analyze the status information shown by the Cluster Status Tool and the clustat utility.
Member Status Description
Member The node is part of the cluster.
Note: A node can be a member of a cluster; however, the node may be inactive and incapable of running services. For example, if rgmanager is not running on the node, but all other cluster software components are running in the node, the node appears as a Member in the Cluster Status Tool. However, without
rgmanager running, the node does not appear in the clustat
display.
Dead The member system is unable to participate as a cluster member.
The most basic cluster software is not running on the node.
Table 4-1. Member Status for the Cluster Status Tool
Member Status Description
Online The node is communicating with other nodes in the cluster.
Inactive The node is unable to communicate with the other nodes in the
cluster. If the node is inactive, clustat does not display the node. If rgmanager is not running in a node, the node is inactive.
Note: Although a node is inactive, it may still appear as a Member in the Cluster Status Tool. However, if the node is inactive, it is incapable of running services.
Table 4-2. Member Status for clustat
Service Status Description
Started The service resources are configured and available on the cluster
system that owns the service.
Pending The service has failed on a member and is pending start on
another member.
Disabled The service has been disabled, and does not have an assigned
owner. A disabled service is never restarted automatically by the cluster.
Stopped The service is not running; it is waiting for a member capable of
starting the service. A service remains in the stopped state if autostart is disabled.
Failed The service has failed to start on the cluster, and the cluster cannot
successfully stop the service. A failed service is never restarted automatically by the cluster.
Table 4-3. Service Status
The Cluster Status Tool displays the current cluster status in the Services area and automatically updates the status every 10 seconds. Additionally, you can display a snapshot of the current cluster status from a shell prompt by invoking the clustat utility. Example 4-1 shows the output of the clustat utility.
# clustat
Member Status: Quorate, Group Member

  Member Name                          State      ID
  ------ ----                          -----      --
  tng3-2                               Online     0x0000000000000002
  tng3-1                               Online     0x0000000000000001

  Service Name        Owner (Last)                State
  ------- ----        ----- ------                -----
  webserver           (tng3-1)                    failed
  email               tng3-2                      started
Example 4-1. Output of clustat
To monitor the cluster and display status at specific time intervals from a shell prompt, invoke clustat with the -i time option, where time specifies the number of seconds between status snapshots. The following example causes the clustat utility to display cluster status every 10 seconds:
# clustat -i 10
4.3. Starting and Stopping the Cluster Software
To start the cluster software on a member, type the following commands in this order:
1. service ccsd start
2. service lock_gulmd start or service cman start according to the type of lock manager used
3. service fenced start (DLM clusters only)
4. service clvmd start
5. service gfs start, if you are using Red Hat GFS
6. service rgmanager start
To stop the cluster software on a member, type the following commands in this order:
1. service rgmanager stop
2. service gfs stop, if you are using Red Hat GFS
3. service clvmd stop
4. service fenced stop (DLM clusters only)
5. service lock_gulmd stop or service cman stop according to the type of lock manager used
6. service ccsd stop
Stopping the cluster services on a member causes its services to fail over to an active member.
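As with starting, the stop sequence can be run as a short loop. The following is a sketch only, assuming a DLM cluster that uses Red Hat GFS; substitute lock_gulmd for cman and omit fenced on a GULM cluster, and omit gfs if GFS is not used:

for svc in rgmanager gfs clvmd fenced cman ccsd; do
    service $svc stop
done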
4.4. Modifying the Cluster Configuration
To modify the cluster configuration (the cluster configuration file, /etc/cluster/cluster.conf), use the Cluster Configuration Tool. For more information about using the Cluster Configuration Tool, refer to Chapter 3 Installing and Configuring Red Hat Cluster Suite Software.
Warning
Do not manually edit the contents of the /etc/cluster/cluster.conf file without guidance from an authorized Red Hat representative or unless you fully understand the consequences of editing the /etc/cluster/cluster.conf file manually.
Important
Although the Cluster Configuration Tool provides a Quorum Votes parameter in the Properties dialog box of each cluster member, that parameter is intended only for use
during initial cluster configuration. Furthermore, it is recommended that you retain the default Quorum Votes value of 1. For more information about using the Cluster Configuration Tool, refer to Chapter 3 Installing and Configuring Red Hat Cluster Suite Software.
To edit the cluster configuration file, click the Cluster Configuration tab in the cluster configuration GUI. Clicking the Cluster Configuration tab displays a graphical representation of the cluster configuration. Change the configuration file according to the following steps:
1. Make changes to cluster elements (for example, create a service).
2. Propagate the updated configuration file throughout the cluster by clicking Send to Cluster.
Note
The Cluster Configuration Tool does not display the Send to Cluster button if the cluster is new and has not been started yet, or if the node from which you are running the Cluster Configuration Tool is not a member of the cluster. If the Send to Cluster button is not displayed, you can still use the Cluster Configuration Tool; however, you cannot propagate the configuration. You can still save the configuration file. For information about using the Cluster Configuration Tool for a new cluster configuration, refer to Chapter 3 Installing and Configuring Red Hat Cluster Suite Software.
3. Clicking Send to Cluster causes a Warning dialog box to be displayed. Click Yes to save and propagate the configuration.
4. Clicking Yes causes an Information dialog box to be displayed, confirming that the current configuration has been propagated to the cluster. Click OK.
5. Click the Cluster Management tab and verify that the changes have been propagated to the cluster members.
4.5. Backing Up and Restoring the Cluster Database
The Cluster Configuration Tool automatically retains backup copies of the three most recently used configuration files (besides the currently used configuration file). Retaining the backup copies is useful if the cluster does not function correctly because of misconfiguration and you need to return to a previous working configuration.
Each time you save a configuration file, the Cluster Configuration Tool saves backup copies of the three most recently used configuration files as
/etc/cluster/cluster.conf.bak.1, /etc/cluster/cluster.conf.bak.2,
and /etc/cluster/cluster.conf.bak.3. The backup file
/etc/cluster/cluster.conf.bak.1 is the newest backup, /etc/cluster/cluster.conf.bak.2 is the second newest backup, and /etc/cluster/cluster.conf.bak.3 is the third newest backup.
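Before restoring, it can be useful to compare the current configuration file with a backup from a shell prompt; for example, using the newest backup described above:

diff -u /etc/cluster/cluster.conf.bak.1 /etc/cluster/cluster.conf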
If a cluster member becomes inoperable because of misconfiguration, restore the configu­ration file according to the following steps:
1. At the Cluster Configuration Tool tab of the Red Hat Cluster Suite management GUI, click File => Open.
2. Clicking File => Open causes the system-config-cluster dialog box to be displayed.
3. At the system-config-cluster dialog box, select a backup file (for example,
/etc/cluster/cluster.conf.bak.1). Verify the file selection in the Selection
box and click OK.
4. Increment the configuration version beyond the current working version number as follows:
a. Click Cluster => Edit Cluster Properties.
b. At the Cluster Properties dialog box, change the Config Version value and
click OK.
5. Click File => Save As.
6. Clicking File => Save As causes the system-config-cluster dialog box to be displayed.
7. At the system-config-cluster dialog box, select
/etc/cluster/cluster.conf and click OK. (Verify the file selection in the
Selection box.)
8. Clicking OK causes an Information dialog box to be displayed. At that dialog box, click OK.
9. Propagate the updated configuration file throughout the cluster by clicking Send to Cluster.
Note
The Cluster Configuration Tool does not display the Send to Cluster button if the cluster is new and has not been started yet, or if the node from which you are running the Cluster Configuration Tool is not a member of the cluster. If the Send to Cluster button is not displayed, you can still use the Cluster Configuration Tool; however, you cannot propagate the configuration. You can still save the configuration file. For information about using the Cluster Configuration Tool for a new cluster configuration, refer to Chapter 3 Installing and Configuring Red Hat Cluster Suite Software.
10. Clicking Send to Cluster causes a Warning dialog box to be displayed. Click Yes
to propagate the configuration.
11. Click the Cluster Management tab and verify that the changes have been propagated to the cluster members.
4.6. Updating the Cluster Software
For information about updating the cluster software, contact an authorized Red Hat support representative.
4.7. Changing the Cluster Name
Although the Cluster Configuration Tool provides a Cluster Properties dialog box with a cluster Name parameter, the parameter is intended only for use during initial cluster configuration. The only way to change the name of a Red Hat cluster is to create a new cluster with the new name. For more information about using the Cluster Configuration Tool, refer to Chapter 3 Installing and Configuring Red Hat Cluster Suite Software.
4.8. Disabling the Cluster Software
It may become necessary to temporarily disable the cluster software on a cluster member. For example, if a cluster member experiences a hardware failure, you may want to reboot that member, but prevent it from rejoining the cluster to perform maintenance on the system.
Use the /sbin/chkconfig command to stop the member from joining the cluster at boot-up as follows:
chkconfig --level 2345 rgmanager off
chkconfig --level 2345 gfs off
chkconfig --level 2345 clvmd off
chkconfig --level 2345 fenced off
chkconfig --level 2345 lock_gulmd off
chkconfig --level 2345 cman off
chkconfig --level 2345 ccsd off
Once the problems with the disabled cluster member have been resolved, use the following commands to allow the member to rejoin the cluster:
chkconfig --level 2345 rgmanager on
chkconfig --level 2345 gfs on
chkconfig --level 2345 clvmd on
chkconfig --level 2345 fenced on
chkconfig --level 2345 lock_gulmd on
chkconfig --level 2345 cman on
chkconfig --level 2345 ccsd on
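The same chkconfig changes can also be applied in a short loop. The following is a sketch only; include only the services that are actually installed on the member, and replace on with off to disable them again:

for svc in rgmanager gfs clvmd fenced lock_gulmd cman ccsd; do
    chkconfig --level 2345 $svc on
done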
You can then reboot the member for the changes to take effect or run the following commands in the order shown to restart cluster software:
1. service ccsd start
2. service lock_gulmd start or service cman start according to the type of lock manager used
3. service fenced start (DLM clusters only)
4. service clvmd start
5. service gfs start, if you are using Red Hat GFS
6. service rgmanager start
4.9. Diagnosing and Correcting Problems in a Cluster
For information about diagnosing and correcting problems in a cluster, contact an autho­rized Red Hat support representative.
Chapter 5.
Setting Up Apache HTTP Server
This chapter contains instructions for configuring Red Hat Enterprise Linux to make the Apache HTTP Server highly available.
The following is an example of setting up a cluster service that fails over an Apache HTTP Server. Although the actual variables used in the service depend on the specific configuration, the example may assist in setting up a service for a particular environment.
5.1. Apache HTTP Server Setup Overview
First, configure Apache HTTP Server on all nodes in the cluster. If using a failover domain, assign the service to all cluster nodes configured to run the Apache HTTP Server. Refer to Section 3.8 Configuring a Failover Domain for instructions. The cluster software ensures that only one cluster system runs the Apache HTTP Server at one time. The example configuration consists of installing the httpd RPM package on all cluster nodes (or on nodes in the failover domain, if used) and configuring a shared GFS resource for the Web content.
When installing the Apache HTTP Server on the cluster systems, run the following command to ensure that the cluster nodes do not automatically start the service when the system boots:
chkconfig --del httpd
Rather than having the system init scripts spawn the httpd daemon, the cluster infrastructure initializes the service on the active cluster node. This ensures that the corresponding IP address and file system mounts are active on only one cluster node at a time.
When adding an httpd service, a floating IP address must be assigned to the service so that the IP address will transfer from one cluster node to another in the event of failover or service relocation. The cluster infrastructure binds this IP address to the network interface on the cluster system that is currently running the Apache HTTP Server. This IP address ensures that the cluster node running httpd is transparent to the clients accessing the service.
The file systems that contain the Web content cannot be automatically mounted on the shared storage resource when the cluster nodes boot. Instead, the cluster software must mount and unmount the file system as the httpd service is started and stopped. This prevents the cluster systems from accessing the same data simultaneously, which may result in data corruption. Therefore, do not include the file systems in the /etc/fstab file.
5.2. Configuring Shared Storage
To set up the shared file system resource, perform the following tasks as root on one cluster system:
1. On one cluster node, use the interactive parted utility to create a partition to use for the document root directory. Note that it is possible to create multiple document root directories on different disk partitions. Refer to Section 2.5.3.1 Partitioning Disks for more information.
2. Use the mkfs command to create an ext3 file system on the partition you created in the previous step. Specify the drive letter and the partition number. For example:
mkfs -t ext3 /dev/sde3
3. Mount the file system that contains the document root directory. For example:
mount /dev/sde3 /var/www/html
Do not add this mount information to the /etc/fstab file because only the cluster software can mount and unmount file systems used in a service.
4. Copy all the required files to the document root directory.
5. If you have CGI files or other files that must be in different directories or in separate partitions, repeat these steps, as needed.
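For step 1 above, a parted session to create the partition might look like the following (the device name /dev/sde and the partition size are assumptions for illustration; adjust them for your storage):

parted /dev/sde
(parted) mkpart primary ext3 0 20000
(parted) print
(parted) quit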
5.3. Installing and Configuring the Apache HTTP Server
The Apache HTTP Server must be installed and configured on all nodes in the assigned failover domain, if used, or in the cluster. The basic server configuration must be the same on all nodes on which it runs for the service to fail over correctly. The following example shows a basic Apache HTTP Server installation that includes no third-party modules or performance tuning.
On all nodes in the cluster (or nodes in the failover domain, if used), install the httpd RPM package. For example:
rpm -Uvh httpd-version.arch.rpm
To configure the Apache HTTP Server as a cluster service, perform the following tasks:
1. Edit the /etc/httpd/conf/httpd.conf configuration file and customize the file according to your configuration. For example:
Specify the directory that contains the HTML files. Also specify this mount point
when adding the service to the cluster configuration. It is only required to change this field if the mountpoint for the website’s content differs from the default setting of /var/www/html/. For example:
DocumentRoot "/mnt/httpdservice/html"
Specify a unique IP address to which the service will listen for requests. For ex-
ample:
Listen 192.168.1.100:80
This IP address then must be configured as a cluster resource for the service using the Cluster Configuration Tool.
If the script directory resides in a non-standard location, specify the directory that
contains the CGI programs. For example:
ScriptAlias /cgi-bin/ "/mnt/httpdservice/cgi-bin/"
Specify the path that was used in the previous step, and set the access permissions
to default to that directory. For example:
<Directory "/mnt/httpdservice/cgi-bin">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>
Additional changes may need to be made to tune the Apache HTTP Server or add module functionality. For information on setting up other options, refer to the Red
Hat Enterprise Linux System Administration Guide and the Red Hat Enterprise Linux Reference Guide.
2. The standard Apache HTTP Server start script, /etc/rc.d/init.d/httpd is also used within the cluster framework to start and stop the Apache HTTP Server on the active cluster node. Accordingly, when configuring the service, specify this script by adding it as a Script resource in the Cluster Configuration Tool.
3. Copy the configuration file over to the other nodes of the cluster (or nodes of the failover domain, if configured).
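For example, to copy the customized configuration file from the node where it was edited to another node (the host name node2 is an assumption for this example):

scp /etc/httpd/conf/httpd.conf root@node2:/etc/httpd/conf/httpd.conf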
Before the service is added to the cluster configuration, ensure that the Apache HTTP Server directories are not mounted. Then, on one node, invoke the Cluster Configuration Tool to add the service, as follows. This example assumes a failover domain named
httpd-domain was created for this service.
1. Add the init script for the Apache HTTP Server service.
Select the Resources tab and click Create a Resource. The Resource Configuration properties dialog box is displayed.
Select Script from the drop-down menu.
Enter a Name to be associated with the Apache HTTP Server service.
Specify the path to the Apache HTTP Server init script (for example,
/etc/rc.d/init.d/httpd) in the File (with path) field.
Click OK.
2. Add a device for the Apache HTTP Server content files and/or custom scripts.
Click Create a Resource.
In the Resource Configuration dialog, select File System from the drop-down
menu.
Enter the Name for the resource (for example, httpd-content).
Choose ext3 from the File System Type drop-down menu.
Enter the mount point in the Mount Point field (for example,
/var/www/html/).
Enter the device special file name in the Device field (for example, /dev/sda3).
3. Add an IP address for the Apache HTTP Server service.
Click Create a Resource.
Choose IP Address from the drop-down menu.
Enter the IP Address to be associated with the Apache HTTP Server service.
Make sure that the Monitor Link checkbox is left checked.
Click OK.
4. Click the Services property.
5. Create the Apache HTTP Server service.
Click Create a Service. Type a Name for the service in the Add a Service dialog.
In the Service Management dialog, select a Failover Domain from the drop-
down menu or leave it as None.
Click the Add a Shared Resource to this service button. From the available list,
choose each resource that you created in the previous steps. Repeat this step until all resources have been added.
Click OK.
6. Choose File => Save to save your changes.
II. Configuring a Linux Virtual Server Cluster
Building a Linux Virtual Server (LVS) system offers a highly available and scalable solution for production services using specialized routing and load-balancing techniques configured through the Piranha Configuration Tool. This part discusses the configuration of high-performance systems and services with Red Hat Enterprise Linux and LVS.
This section is licensed under the Open Publication License, V1.0 or later. For details refer to the Copyright page.
Table of Contents
6. Introduction to Linux Virtual Server.........................................................................83
7. Linux Virtual Server Overview ..................................................................................85
8. Initial LVS Configuration............................................................................................97
9. Setting Up a Red Hat Enterprise Linux LVS Cluster.............................................103
10. Configuring the LVS Routers with Piranha Configuration Tool.........................115
Chapter 6.
Introduction to Linux Virtual Server
Using Red Hat Enterprise Linux, it is possible to create highly available server clustering solutions able to withstand many common hardware and software failures with little or no interruption of critical services. By allowing multiple computers to work together in offering these critical services, system administrators can plan and execute system maintenance and upgrades without service interruption.
The chapters in this part guide you through the following steps in understanding and deploying a clustering solution based on the Red Hat Enterprise Linux Linux Virtual Server (LVS) technology:
Explains the Linux Virtual Server technology used by Red Hat Enterprise Linux to create
a load-balancing cluster
Explains how to configure a Red Hat Enterprise Linux LVS cluster
Guides you through the Piranha Configuration Tool, a graphical interface used for
configuring and monitoring an LVS cluster
6.1. Technology Overview
Red Hat Enterprise Linux implements highly available server solutions via clustering. It is important to note that cluster computing consists of three distinct branches:
Compute clustering (such as Beowulf) uses multiple machines to provide greater computing power for computationally intensive tasks. This type of clustering is not addressed by Red Hat Enterprise Linux.
High-availability (HA) clustering uses multiple machines to add an extra level of reliability for a service or group of services.
Load-balance clustering uses specialized routing techniques to dispatch traffic to a pool
of servers.
Red Hat Enterprise Linux addresses the latter two types of clustering technology, using a collection of programs to monitor the health of the systems and services in the cluster.
Note
The clustering technology included in Red Hat Enterprise Linux is not synonymous with fault tolerance. Fault tolerant systems use highly specialized and often very expensive
hardware to implement a fully redundant environment in which services can run uninterrupted by hardware failures.
However, fault tolerant systems do not account for operator and software errors, which Red Hat Enterprise Linux can address through service redundancy. Also, since Red Hat Enterprise Linux is designed to run on commodity hardware, it creates an environment with a high level of system availability at a fraction of the cost of fault tolerant hardware.
6.2. Basic Configurations
While Red Hat Enterprise Linux can be configured in a variety of different ways, the configurations can be broken into two major categories:
High-availability clusters using Red Hat Cluster Manager
Load-balancing clusters using Linux Virtual Servers
This part explains what a load-balancing cluster system is and how to configure a load-balancing system using Linux Virtual Servers on Red Hat Enterprise Linux.
6.2.1. Load-Balancing Clusters Using Linux Virtual Servers
To an outside user accessing a hosted service (such as a website or database application), a Linux Virtual Server (LVS) cluster appears as one server. In reality, however, the user is actually accessing a cluster of two or more servers behind a pair of redundant LVS routers that distribute client requests evenly throughout the cluster system. Load-balanced clustered services allow administrators to use commodity hardware and Red Hat Enterprise Linux to create continuous and consistent access to all hosted services while also addressing availability requirements.
An LVS cluster consists of at least two layers. The first layer is composed of a pair of similarly configured Linux machines or cluster members. One of these machines acts as the LVS router, configured to direct requests from the Internet to the cluster. The second layer consists of a cluster of machines called real servers. The real servers provide the critical services to the end-user while the LVS router balances the load on these servers.
For a detailed overview of LVS clustering, refer to Chapter 7 Linux Virtual Server Overview.