Red Hat Cluster Manager User Manual

Red Hat Cluster Manager
The Red Hat Cluster Manager Installation and
Administration Guide
ISBN: N/A
Red Hat, Inc.
1801 Varsity Drive, Raleigh, NC 27606 USA
+1 919 754 3700 (Voice), +1 919 754 3701 (FAX), 888 733 4281 (Voice)
P.O. Box 13588, Research Triangle Park, NC 27709 USA
© 2002 Red Hat, Inc.
© 2000 Mission Critical Linux, Inc.
© 2000 K.M. Sorenson
rh-cm(EN)-1.0-Print-RHI (2002-04-17T17:16-0400)
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation. A copy of the license is included on the GNU Free Documentation License website.
Red Hat, Red Hat Network, the Red Hat "Shadow Man" logo, RPM, Maximum RPM, the RPM logo, Linux Library, PowerTools, Linux Undercover, RHmember, RHmember More, Rough Cuts, Rawhide and all Red Hat-based trademarks and logos are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.
Linux is a registered trademark of Linus Torvalds. Motif and UNIX are registered trademarks of The Open Group. Itanium is a registered trademark of Intel Corporation. Netscape is a registered trademark of Netscape Communications Corporation in the United States and
other countries. Windows is a registered trademark of Microsoft Corporation. SSH and Secure Shell are trademarks of SSH Communications Security, Inc. FireWire is a trademark of Apple Computer Corporation. S/390 and zSeries are trademarks of International Business Machines Corporation. All other trademarks and copyrights referred to are the property of their respective owners.
Acknowledgments
The Red Hat Cluster Manager software was originally based on the open source Kimberlite cluster project (http://oss.missioncriticallinux.com/kimberlite/), which was developed by Mission Critical Linux, Inc.
Subsequent to its inception based on Kimberlite, developers at Red Hat have made a large number of enhancements and modifications. The following is a non-comprehensive list highlighting some of these enhancements.
· Packaging and integration into the Red Hat installation paradigm in order to simplify the end user's experience.

· Addition of support for high availability NFS services.

· Addition of support for high availability Samba services.

· Addition of support for using watchdog timers as a data integrity provision.

· Addition of service monitoring which will automatically restart a failed application.

· Rewrite of the service manager to facilitate additional cluster-wide operations.

· Addition of the Red Hat Cluster Manager GUI, a graphical monitoring tool.

· A set of miscellaneous bug fixes.
The Red Hat Cluster Manager software incorporates STONITH compliant power switch modules from the Linux-HA project http://www.linux-ha.org/stonith/.
Contents

Red Hat Cluster Manager

Acknowledgments

Chapter 1  Introduction to Red Hat Cluster Manager
  1.1  Cluster Overview
  1.2  Cluster Features
  1.3  How To Use This Manual

Chapter 2  Hardware Installation and Operating System Configuration
  2.1  Choosing a Hardware Configuration
  2.2  Steps for Setting Up the Cluster Systems
  2.3  Steps for Installing and Configuring the Red Hat Linux Distribution
  2.4  Steps for Setting Up and Connecting the Cluster Hardware

Chapter 3  Cluster Software Installation and Configuration
  3.1  Steps for Installing and Initializing the Cluster Software
  3.2  Checking the Cluster Configuration
  3.3  Configuring syslog Event Logging
  3.4  Using the cluadmin Utility

Chapter 4  Service Configuration and Administration
  4.1  Configuring a Service
  4.2  Displaying a Service Configuration
  4.3  Disabling a Service
  4.4  Enabling a Service
  4.5  Modifying a Service
  4.6  Relocating a Service
  4.7  Deleting a Service
  4.8  Handling Services that Fail to Start

Chapter 5  Database Services
  5.1  Setting Up an Oracle Service
  5.2  Tuning Oracle Services
  5.3  Setting Up a MySQL Service
  5.4  Setting Up a DB2 Service

Chapter 6  Network File Sharing Services
  6.1  Setting Up an NFS Service
  6.2  Setting Up a High Availability Samba Service

Chapter 7  Apache Services
  7.1  Setting Up an Apache Service

Chapter 8  Cluster Administration
  8.1  Displaying Cluster and Service Status
  8.2  Starting and Stopping the Cluster Software
  8.3  Removing a Cluster Member
  8.4  Modifying the Cluster Configuration
  8.5  Backing Up and Restoring the Cluster Database
  8.6  Modifying Cluster Event Logging
  8.7  Updating the Cluster Software
  8.8  Reloading the Cluster Database
  8.9  Changing the Cluster Name
  8.10 Reinitializing the Cluster
  8.11 Disabling the Cluster Software
  8.12 Diagnosing and Correcting Problems in a Cluster

Chapter 9  Configuring and Using the Red Hat Cluster Manager GUI
  9.1  Setting up the JRE
  9.2  Configuring Cluster Monitoring Parameters
  9.3  Enabling the Web Server
  9.4  Starting the Red Hat Cluster Manager GUI

Appendix A  Supplementary Hardware Information
  A.1  Setting Up Power Switches
  A.2  SCSI Bus Configuration Requirements
  A.3  SCSI Bus Termination
  A.4  SCSI Bus Length
  A.5  SCSI Identification Numbers
  A.6  Host Bus Adapter Features and Configuration Requirements
  A.7  Tuning the Failover Interval

Appendix B  Supplementary Software Information
  B.1  Cluster Communication Mechanisms
  B.2  Cluster Daemons
  B.3  Failover and Recovery Scenarios
  B.4  Cluster Database Fields
  B.5  Using Red Hat Cluster Manager with Piranha
1 Introduction to Red Hat Cluster Manager
The Red Hat Cluster Manager is a collection of technologies working together to provide data integrity and the ability to maintain application availability in the event of a failure. Using redundant hardware, shared disk storage, power management, and robust cluster communication and application failover mechanisms, a cluster can meet the needs of the enterprise market.

Especially suited for database applications, network file servers, and World Wide Web (Web) servers with dynamic content, a cluster can also be used in conjunction with the Piranha load-balancing cluster software, based on the Linux Virtual Server (LVS) project, to deploy a highly available e-commerce site that has complete data integrity and application availability, in addition to load balancing capabilities. See Section B.5, Using Red Hat Cluster Manager with Piranha for more information.

1.1 Cluster Overview

To set up a cluster, an administrator must connect the cluster systems (often referred to as member systems) to the cluster hardware and configure the systems into the cluster environment. The foundation of a cluster is an advanced host membership algorithm. This algorithm ensures that the cluster maintains complete data integrity at all times by using the following methods of inter-node communication:

· Quorum partitions on shared disk storage to hold system status

· Ethernet and serial connections between the cluster systems for heartbeat channels

To make an application and its data highly available in a cluster, the administrator must configure a cluster service — a discrete group of service properties and resources, such as an application and shared disk storage. A service can be assigned an IP address to provide transparent client access to the service. For example, an administrator can set up a cluster service that provides clients with access to highly available database application data.

Both cluster systems can run any service and access the service data on shared disk storage. However, each service can run on only one cluster system at a time, in order to maintain data integrity. Administrators can set up an active-active configuration in which both cluster systems run different services, or a hot-standby configuration in which a primary cluster system runs all the services and a backup cluster system takes over only if the primary system fails.
Figure 1–1 Example Cluster
Figure 1–1, Example Cluster shows an example of a cluster in an active-active configuration. If a hardware or software failure occurs, the cluster will automatically restart the failed system's services on the functional cluster system. This service failover capability ensures that no data is lost, and there is little disruption to users. When the failed system recovers, the cluster can re-balance the services across the two systems.

In addition, a cluster administrator can cleanly stop the services running on a cluster system and then restart them on the other system. This service relocation capability enables the administrator to maintain application and data availability when a cluster system requires maintenance.
1.2 Cluster Features
A cluster includes the following features:
No-single-point-of-failure hardware configuration

Clusters can include a dual-controller RAID array, multiple network and serial communication channels, and redundant uninterruptible power supply (UPS) systems to ensure that no single failure results in application downtime or loss of data.

Alternately, a low-cost cluster can be set up to provide less availability than a no-single-point-of-failure cluster. For example, an administrator can set up a cluster with a single-controller RAID array and only a single heartbeat channel.
Note
Certain low-cost alternatives, such as software RAID and multi-initiator parallel SCSI, are not compatible with or appropriate for use on the shared cluster storage. Refer to Section 2.1, Choosing a Hardware Configuration, for more information.
Service configuration framework

Clusters enable an administrator to easily configure individual services to make data and applications highly available. To create a service, an administrator specifies the resources used in the service and properties for the service, including the service name, application start and stop script, disk partitions, mount points, and the cluster system on which an administrator prefers to run the service. After the administrator adds a service, the cluster enters the information into the cluster database on shared storage, where it can be accessed by both cluster systems.

The cluster provides an easy-to-use framework for database applications. For example, a database service serves highly available data to a database application. The application running on a cluster system provides network access to database client systems, such as Web servers. If the service fails over to another cluster system, the application can still access the shared database data. A network-accessible database service is usually assigned an IP address, which is failed over along with the service to maintain transparent access for clients.
The cluster service framework can be easily extended to other applications, as well.
Data integrity assurance

To ensure data integrity, only one cluster system can run a service and access service data at one time. The use of power switches in the cluster configuration enables each cluster system to power-cycle the other cluster system before restarting its services during the failover process. This prevents the two systems from simultaneously accessing the same data and corrupting it. Although not required, it is recommended that power switches be used to guarantee data integrity under all failure conditions. Watchdog timers are an optional variety of power control used to ensure correct operation of service failover.
Cluster administration user interface

A user interface simplifies cluster administration and enables an administrator to easily create, start, stop, and relocate services, and to monitor the cluster.
Multiple cluster communication methods

To monitor the health of the other cluster system, each cluster system monitors the health of the remote power switch, if any, and issues heartbeat pings over network and serial channels. In addition, each cluster system periodically writes a timestamp and cluster state information to two quorum partitions located on shared disk storage. System state information includes whether the system is an active cluster member. Service state information includes whether the service is running and which cluster system is running the service. Each cluster system checks to ensure that the other system's status is up to date.

To ensure correct cluster operation, if a system is unable to write to both quorum partitions at startup time, it will not be allowed to join the cluster. In addition, if a cluster system is not updating its timestamp, and if heartbeats to the system fail, the cluster system will be removed from the cluster.
Figure 1–2 Cluster Communication Mechanisms
Figure 1–2, Cluster Communication Mechanisms shows how systems communicate in a cluster configuration. Note that the terminal server used to access system consoles via serial ports is not a required cluster component.
Service failover capability

If a hardware or software failure occurs, the cluster will take the appropriate action to maintain application availability and data integrity. For example, if a cluster system completely fails, the other cluster system will restart its services. Services already running on the surviving system are not disrupted.
When the failed system reboots and is able to write to the quorum partitions, it can rejoin the cluster and run services. Depending on how the services are configured, the cluster can re-balance the services across the two cluster systems.
Manual service relocation capability

In addition to automatic service failover, a cluster enables administrators to cleanly stop services on one cluster system and restart them on the other system. This allows administrators to perform planned maintenance on a cluster system while providing application and data availability.
Event logging facility

To ensure that problems are detected and resolved before they affect service availability, the cluster daemons log messages by using the conventional Linux syslog subsystem. Administrators can customize the severity level of the logged messages. (An illustrative syslog excerpt appears at the end of this section.)
Application Monitoring

The cluster services infrastructure can optionally monitor the state and health of an application. In this manner, should an application-specific failure occur, the cluster will automatically restart the application. The cluster first attempts to restart the application on the member it was initially running on; failing that, it restarts the application on the other cluster member.
Status Monitoring Agent

A cluster status monitoring agent is used to gather vital cluster and application state information. This information is then accessible both locally on the cluster member and remotely. A graphical user interface can then display status information from multiple clusters in a manner which does not degrade system performance.
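As a brief, purely illustrative sketch of the event logging facility: a line such as the following in /etc/syslog.conf collects messages from one syslog facility into a dedicated log file. The facility name local4 and the file path are assumptions made here for illustration only; the facility and severity level actually used by the cluster daemons are configured as described in Section 3.3, Configuring syslog Event Logging.

    # /etc/syslog.conf excerpt (illustrative): route messages logged to the
    # local4 facility at severity "info" or higher into a dedicated file.
    local4.info                     /var/log/cluster.log

    # Signal syslogd to re-read its configuration:
    #   killall -HUP syslogd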
1.3 How To Use This Manual
This manual contains information about setting up the cluster hardware, and installing the Linux distribution and the cluster software. These tasks are described in Chapter 2, Hardware Installation and Operating System Configuration and Chapter 3, Cluster Software Installation and Configuration.
For information about setting up and managing cluster services, see Chapter 4, Service Configuration and Administration. For information about managing a cluster, see Chapter 8, Cluster Administration.
Appendix A, Supplementary Hardware Information contains detailed configuration information on specific hardware devices and shared storage configurations. Appendix B, Supplementary Software Information contains background information on the cluster software and other related information.
2 Hardware Installation and Operating System Configuration
To set up the hardware configuration and install the Linux distribution, follow these steps:
· Choose a cluster hardware configuration that meets the needs of applications and users; see Section 2.1, Choosing a Hardware Configuration.

· Set up and connect the cluster systems and the optional console switch and network switch or hub; see Section 2.2, Steps for Setting Up the Cluster Systems.

· Install and configure the Linux distribution on the cluster systems; see Section 2.3, Steps for Installing and Configuring the Red Hat Linux Distribution.

· Set up the remaining cluster hardware components and connect them to the cluster systems; see Section 2.4, Steps for Setting Up and Connecting the Cluster Hardware.
After setting up the hardware configuration and installing the Linux distribution, the cluster software can be installed.
2.1 Choosing a Hardware Configuration
The Red Hat Cluster Manager allows administrators to use commodity hardware to set up a cluster configuration that will meet the performance, availability, and data integrity needs of applications and users. Cluster hardware ranges from low-cost minimum configurations that include only the components required for cluster operation, to high-end configurations that include redundant heartbeat channels, hardware RAID, and power switches.

Regardless of configuration, the use of high-quality hardware in a cluster is recommended, as hardware malfunction is the primary cause of system down time.

Although all cluster configurations provide availability, only some configurations protect against every single point of failure. Similarly, all cluster configurations provide data integrity, but only some configurations protect data under every failure condition. Therefore, administrators must fully understand the needs of their computing environment and also the availability and data integrity features of different hardware configurations in order to choose the cluster hardware that will meet the proper requirements.
When choosing a cluster hardware configuration, consider the following:
Performance requirements of applications and users
Choose a hardware configuration that will provide adequate memory, CPU, and I/O resources. Be sure that the configuration chosen will be able to handle any future increases in workload, as well.
Cost restrictions
The hardware configuration chosen must meet budget requirements. For example, systems with multiple I/O ports usually cost more than low-end systems with fewer expansion capabilities.
Availability requirements
If a computing environment requires the highest degree of availability, such as a production environment, then a cluster hardware configuration that protects against all single points of failure, including disk, storage interconnect, heartbeat channel, and power failures, is recommended. Environments that can tolerate an interruption in availability, such as development environments, may not require as much protection. See Section 2.4.1, Configuring Heartbeat Channels, Section 2.4.3, Configuring UPS Systems, and Section 2.4.4, Configuring Shared Disk Storage for more information about using redundant hardware for high availability.
Data integrity under all failure conditions requirement
Using power switches in a cluster configuration guarantees that service data is protected under every failure condition. These devices enable a cluster system to power cycle the other cluster system before restarting its services during failover. Power switches protect against data corruption if an unresponsive (or hanging) system becomes responsive after its services have failed over, and then issues I/O to a disk that is also receiving I/O from the other cluster system.

In addition, if a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum partitions. If you are not using power switches in the cluster, this error condition may result in services being run on more than one cluster system, which can cause data corruption. See Section 2.4.2, Configuring Power Switches for more information about the benefits of using power switches in a cluster. It is recommended that production environments use power switches or watchdog timers in the cluster configuration.
2.1.1 Shared Storage Requirements
The operation of the cluster depends on reliable, coordinated access to shared storage. In the event of hardware failure, it is desirable to be able to disconnect one member from the shared storage for repair without disrupting the other member. Shared storage is truly vital to the cluster configuration.
Testing has shown that it is difficult, if not impossible, to configure reliable multi-initiator parallel SCSI configurations at data rates above 80 MBytes/sec. using standard SCSI adapters. Further tests have shown that these configurations cannot support online repair, because the bus does not work reliably when the HBA terminators are disabled and external terminators are used. For these reasons, multi-initiator SCSI configurations using standard adapters are not supported. Single-initiator parallel SCSI buses, connected to multi-ported storage devices, or Fibre Channel, are required.
The Red Hat Cluster Manager requires that both cluster members have simultaneous access to the shared storage. Certain host RAID adapters are capable of providing this type of access to shared RAID units. These products require extensive testing to ensure reliable operation, especially if the shared RAID units are based on parallel SCSI buses. These products typically do not allow for online repair of a failed system. No host RAID adapters are currently certified with Red Hat Cluster Manager. Refer to the Red Hat web site at http://www.redhat.com for the most up-to-date supported hardware matrix.

The use of software RAID, or software Logical Volume Management (LVM), is not supported on shared storage. This is because these products do not coordinate access from multiple hosts to shared storage. Software RAID or LVM may be used on non-shared storage on cluster members (for example, boot and system partitions and other filesystems that are not associated with any cluster services).
2.1.2 Minimum Hardware Requirements
A minimum hardware configuration includes only the hardware components that are required for cluster operation, as follows:
· Two servers to run cluster services

· Ethernet connection for a heartbeat channel and client network access

· Shared disk storage for the cluster quorum partitions and service data
See Section 2.1.5, Example of a Minimum Cluster Configuration for an example of this type of hardware configuration.

The minimum hardware configuration is the most cost-effective cluster configuration; however, it includes multiple points of failure. For example, if the RAID controller fails, then all cluster services will be unavailable. When deploying the minimal hardware configuration, software watchdog timers should be configured as a data integrity provision.

To improve availability, protect against component failure, and guarantee data integrity under all failure conditions, the minimum configuration can be expanded. Table 2–1, Improving Availability and Guaranteeing Data Integrity shows how to improve availability and guarantee data integrity:
Table 2–1 Improving Availability and Guaranteeing Data Integrity

Problem: Disk failure
Solution: Hardware RAID to replicate data across multiple disks.

Problem: RAID controller failure
Solution: Dual RAID controllers to provide redundant access to disk data.

Problem: Heartbeat channel failure
Solution: Point-to-point Ethernet or serial connection between the cluster systems.

Problem: Power source failure
Solution: Redundant uninterruptible power supply (UPS) systems.

Problem: Data corruption under all failure conditions
Solution: Power switches or hardware-based watchdog timers.
A no-single-point-of-failure hardware configuration that guarantees data integrity under all failure conditions can include the following components:
· Two servers to run cluster services

· Ethernet connection between each system for a heartbeat channel and client network access

· Dual-controller RAID array to replicate quorum partitions and service data

· Two power switches to enable each cluster system to power-cycle the other system during the failover process

· Point-to-point Ethernet connection between the cluster systems for a redundant Ethernet heartbeat channel

· Point-to-point serial connection between the cluster systems for a serial heartbeat channel

· Two UPS systems for a highly-available source of power
See Section 2.1.6, Example of a No-Single-Point-Of-Failure Configuration for an example of this type of hardware configuration.

Cluster hardware configurations can also include other optional hardware components that are common in a computing environment. For example, a cluster can include a network switch or network hub, which enables the connection of the cluster systems to a network. A cluster may also include a console switch, which facilitates the management of multiple systems and eliminates the need for separate monitors, mice, and keyboards for each cluster system.

One type of console switch is a terminal server, which enables connection to serial consoles and management of many systems from one remote location. As a low-cost alternative, you can use a KVM (keyboard, video, and mouse) switch, which enables multiple systems to share one keyboard, monitor, and mouse. A KVM is suitable for configurations in which access to a graphical user interface (GUI) to perform system management tasks is preferred.

When choosing a cluster system, be sure that it provides the PCI slots, network slots, and serial ports that the hardware configuration requires. For example, a no-single-point-of-failure configuration requires multiple serial and Ethernet ports. Ideally, choose cluster systems that have at least two serial ports. See Section 2.2.1, Installing the Basic System Hardware for more information.
2.1.3 Choosing the Type of Power Controller
The Red Hat Cluster Manager implementation consists of a generic power management layer and a set of device-specific modules which accommodate a range of power management types. When selecting the appropriate type of power controller to deploy in the cluster, it is important to recognize the implications of specific device types. The following describes the types of supported power switches, followed by a summary table. For a more detailed description of the role a power switch plays to ensure data integrity, refer to Section 2.4.2, Configuring Power Switches.
Serial- and Network-attached power switches are separate devices which enable one cluster member to power cycle another member. They resemble a power plug strip on which individual outlets can be turned on and off under software control through either a serial or network cable.
Watchdog timers provide a means for failed systems to remove themselves from the cluster prior to another system taking over their services, rather than allowing one cluster member to power cycle another. The normal operational mode for watchdog timers is that the cluster software must periodically reset a timer prior to its expiration. If the cluster software fails to reset the timer, the watchdog will trigger under the assumption that the system may have hung or otherwise failed. The healthy cluster member allows a window of time to pass prior to concluding that another cluster member has failed (by default, this window is 12 seconds). The watchdog timer interval must be less than the duration of time for one cluster member to conclude that another has failed. In this manner, a healthy system can assume that, prior to taking over services for a failed cluster member, the failed member has safely removed itself from the cluster (by rebooting) and therefore poses no risk to data integrity. The underlying watchdog support is included in the core Linux kernel. Red Hat Cluster Manager utilizes these watchdog features via its standard APIs and configuration mechanism.
There are two types of watchdog timers: hardware-based and software-based. Hardware-based watchdog timers typically consist of system board components such as the Intel® i810 TCO chipset. This circuitry has a high degree of independence from the main system CPU. This independence is beneficial in failure scenarios of a true system hang, as in this case it will pull down the system's reset lead, resulting in a system reboot. There are some PCI expansion cards that provide watchdog features.
The second type of watchdog timer is software-based. This category of watchdog does not have any dedicated hardware. The implementation is a kernel thread which is run periodically and, if the timer duration has expired, initiates a system reboot. The vulnerability of the software watchdog timer is that under certain failure scenarios, such as system hangs while interrupts are blocked, the kernel thread will not be called. As a result, in such conditions it cannot be definitively depended on for data integrity. This can cause the healthy cluster member to take over services for a hung node, which could cause data corruption under certain scenarios.
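The following is a minimal sketch, not the cluster software's own setup procedure, of how the kernel's software watchdog module might be loaded by hand for evaluation. The 10-second margin is an example chosen to stay below the default 12-second failover window mentioned above; choose a value appropriate to your configuration.

    # Load the software watchdog driver with an example 10-second margin
    # (the margin must expire before the other member declares this one failed).
    /sbin/modprobe softdog soft_margin=10

    # Or have it loaded automatically whenever /dev/watchdog is opened by
    # adding lines such as these to /etc/modules.conf:
    #   alias char-major-10-130 softdog
    #   options softdog soft_margin=10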
Finally, administrators can choose not to employ a power controller at all. If choosing the "None" type, note that there are no provisions for a cluster member to power cycle a failed member. Similarly, the failed member cannot be guaranteed to reboot itself under all failure conditions. Deploying clusters with a power controller type of "None" is useful for simple evaluation purposes, but because it affords the weakest data integrity provisions, it is not recommended for use in a production environment.
Ultimately, the right type of power controller deployed in a cluster environment depends on the data integrity requirements weighed against the cost and availability of external power switches.
Table 2–2, Power Switches summarizes the types of supported power management modules and discusses their advantages and disadvantages individually.
Table 2–2 Power Switches

Serial-attached power switches
  Notes: Two serial-attached power controllers are used in a cluster (one per member system).
  Pros: Affords strong data integrity guarantees. The power controller itself is not a single point of failure, as there are two in a cluster.
  Cons: Requires purchase of power controller hardware and cables; consumes serial ports.

Network-attached power switches
  Notes: A single network-attached power controller is required per cluster.
  Pros: Affords strong data integrity guarantees.
  Cons: Requires purchase of power controller hardware. The power controller itself can become a single point of failure (although they are typically very reliable devices).

Hardware Watchdog Timer
  Notes: Affords strong data integrity guarantees.
  Pros: Obviates the need to purchase external power controller hardware.
  Cons: Not all systems include supported watchdog hardware.

Software Watchdog Timer
  Notes: Offers acceptable data integrity provisions.
  Pros: Obviates the need to purchase external power controller hardware; works on any system.
  Cons: Under some failure scenarios, the software watchdog will not be operational, opening a small vulnerability window.

No power controller
  Notes: No power controller function is in use.
  Pros: Obviates the need to purchase external power controller hardware; works on any system.
  Cons: Vulnerable to data corruption under certain failure scenarios.
2.1.4 Cluster Hardware Tables
Use the following tables to identify the hardware components required for your cluster configuration. In some cases, the tables list specific products that have been tested in a cluster, although a cluster is expected to work with other products.
The complete set of qualified cluster hardware components changes over time. Consequently, the table below may be incomplete. For the most up-to-date itemization of supported hardware components, refer to the Red Hat documentation website at http://www.redhat.com/docs.
Table 2–3 Cluster System Hardware Table

Cluster system
  Quantity: Two
  Description: Red Hat Cluster Manager supports IA-32 hardware platforms. Each cluster system must provide enough PCI slots, network slots, and serial ports for the cluster hardware configuration. Because disk devices must have the same name on each cluster system, it is recommended that the systems have symmetric I/O subsystems. In addition, it is recommended that each system have a minimum of 450 MHz CPU speed and 256 MB of memory. See Section 2.2.1, Installing the Basic System Hardware for more information.
  Required: Yes
Table 2–4, Power Switch Hardware Table includes several different types of power switches. A single cluster requires only one type of power switch shown below.
Table 2–4 Power Switch Hardware Table

Serial power switches
  Quantity: Two
  Description: Power switches enable each cluster system to power-cycle the other cluster system. See Section 2.4.2, Configuring Power Switches for information about using power switches in a cluster. Note that clusters are configured with either serial or network attached power switches, not both. The following serial attached power switch has been fully tested: RPS-10 (model M/HD in the US, and model M/EC in Europe), which is available from http://www.wti.com/rps-10.htm; refer to Section A.1.1, Setting up RPS-10 Power Switches. Latent support is provided for the following serial attached power switch, which has not yet been fully tested: APC Serial On/Off Switch (part AP9211), http://www.apc.com
  Required: Strongly recommended for data integrity under all failure conditions

Null modem cable
  Quantity: Two
  Description: Null modem cables connect a serial port on a cluster system to a serial power switch. This serial connection enables each cluster system to power-cycle the other system. Some power switches may require different cables.
  Required: Only if using serial power switches

Mounting bracket
  Quantity: One
  Description: Some power switches support rack mount configurations and require a separate mounting bracket (e.g., the RPS-10).
  Required: Only for rack mounting power switches

Network power switch
  Quantity: One
  Description: Network attached power switches enable each cluster member to power cycle all others. Refer to Section 2.4.2, Configuring Power Switches for information about using network attached power switches, as well as caveats associated with each. The following network attached power switches have been fully tested:
    · WTI NPS-115 or NPS-230, available from http://www.wti.com. Note that the NPS power switch can properly accommodate systems with dual redundant power supplies. Refer to Section A.1.2, Setting up WTI NPS Power Switches.
    · Baytech RPC-3 and RPC-5, http://www.baytech.net
  Latent support is provided for the APC Master Switch (AP9211 or AP9212), http://www.apc.com
  Required: Strongly recommended for data integrity under all failure conditions

Watchdog Timer
  Quantity: Two
  Description: Watchdog timers cause a failed cluster member to remove itself from a cluster prior to a healthy member taking over its services. Refer to Section 2.4.2, Configuring Power Switches for more information.
  Required: Recommended for data integrity on systems which provide integrated watchdog hardware
The following table shows a variety of storage devices for an administrator to choose from. An individual cluster does not require all of the components listed below.
Table 2–5 Shared Disk Storage Hardware Table

External disk storage enclosure
  Quantity: One
  Description: Use Fibre Channel or single-initiator parallel SCSI to connect the cluster systems to a single or dual-controller RAID array. To use single-initiator buses, a RAID controller must have multiple host ports and provide simultaneous access to all the logical units on the host ports. To use a dual-controller RAID array, a logical unit must fail over from one controller to the other in a way that is transparent to the operating system. The following are recommended SCSI RAID arrays that provide simultaneous access to all the logical units on the host ports (this is not a comprehensive list; rather, it is limited to those RAID boxes which have been tested):
    · Winchester Systems FlashDisk RAID Disk Array, which is available from http://www.winsys.com.
    · Dot Hill's SANnet Storage Systems, which is available from http://www.dothill.com
    · Silicon Image CRD-7040 & CRA-7040, CRD-7220, CRD-7240 & CRA-7240, and CRD-7400 & CRA-7400 controller based RAID arrays, available from http://www.synetexinc.com
  In order to ensure symmetry of device IDs and LUNs, many RAID arrays with dual redundant controllers are required to be configured in an active/passive mode. See Section 2.4.4, Configuring Shared Disk Storage for more information.
  Required: Yes

Host bus adapter
  Quantity: Two
  Description: To connect to shared disk storage, you must install either a parallel SCSI or a Fibre Channel host bus adapter in a PCI slot in each cluster system. For parallel SCSI, use a low voltage differential (LVD) host bus adapter. Adapters have either HD68 or VHDCI connectors. Recommended parallel SCSI host bus adapters include the following:
    · Adaptec 2940U2W, 29160, 29160LP, 39160, and 3950U2
    · Adaptec AIC-7896 on the Intel L440GX+ motherboard
    · Qlogic QLA1080 and QLA12160
    · Tekram Ultra2 DC-390U2W
    · LSI Logic SYM22915
  A recommended Fibre Channel host bus adapter is the Qlogic QLA2200. See Section A.6, Host Bus Adapter Features and Configuration Requirements for device features and configuration information. Host-bus adapter based RAID cards are only supported if they correctly support multi-host operation. At the time of publication, there were no fully tested host-bus adapter based RAID cards. Refer to http://www.redhat.com for the latest hardware information.
  Required: Yes

SCSI cable
  Quantity: Two
  Description: SCSI cables with 68 pins connect each host bus adapter to a storage enclosure port. Cables have either HD68 or VHDCI connectors. Cables vary based on adapter type.
  Required: Only for parallel SCSI configurations

SCSI terminator
  Quantity: Two
  Description: For a RAID storage enclosure that uses "out" ports (such as the FlashDisk RAID Disk Array) and is connected to single-initiator SCSI buses, connect terminators to the "out" ports in order to terminate the buses.
  Required: Only for parallel SCSI configurations, and only if necessary for termination

Fibre Channel hub or switch
  Quantity: One or two
  Description: A Fibre Channel hub or switch is required.
  Required: Only for some Fibre Channel configurations

Fibre Channel cable
  Quantity: Two to six
  Description: A Fibre Channel cable connects a host bus adapter to a storage enclosure port, a Fibre Channel hub, or a Fibre Channel switch. If a hub or switch is used, additional cables are needed to connect the hub or switch to the storage adapter ports.
  Required: Only for Fibre Channel configurations
Table 2–6 Network Hardware Table

Network interface
  Quantity: One for each network connection
  Description: Each network connection requires a network interface installed in a cluster system.
  Required: Yes

Network switch or hub
  Quantity: One
  Description: A network switch or hub allows connection of multiple systems to a network.
  Required: No

Network cable
  Quantity: One for each network interface
  Description: A conventional network cable, such as a cable with an RJ45 connector, connects each network interface to a network switch or a network hub.
  Required: Yes
Table 2–7 Point-To-Point Ethernet Heartbeat Channel Hardware Table

Network interface
  Quantity: Two for each channel
  Description: Each Ethernet heartbeat channel requires a network interface installed in both cluster systems.
  Required: No

Network crossover cable
  Quantity: One for each channel
  Description: A network crossover cable connects a network interface on one cluster system to a network interface on the other cluster system, creating an Ethernet heartbeat channel.
  Required: Only for a redundant Ethernet heartbeat channel
Table 2–8 Point-To-Point Serial Heartbeat Channel Hardware Table

Serial card
  Quantity: Two for each serial channel
  Description: Each serial heartbeat channel requires a serial port on both cluster systems. To expand your serial port capacity, you can use multi-port serial PCI cards. Recommended multi-port cards include the Vision Systems VScom 200H PCI card, which provides two serial ports and is available from http://www.vscom.de, and the Cyclades-4YoPCI+ card, which provides four serial ports and is available from http://www.cyclades.com. Note that since configuration of serial heartbeat channels is optional, it is not required to invest in additional hardware specifically for this purpose. Should future support be provided for more than two cluster members, serial heartbeat channel support may be deprecated.
  Required: No

Null modem cable
  Quantity: One for each channel
  Description: A null modem cable connects a serial port on one cluster system to a corresponding serial port on the other cluster system, creating a serial heartbeat channel.
  Required: Only for serial heartbeat channel
Table 2–9 Console Switch Hardware Table

Terminal server
  Quantity: One
  Description: A terminal server enables you to manage many systems from one remote location.
  Required: No

KVM
  Quantity: One
  Description: A KVM enables multiple systems to share one keyboard, monitor, and mouse. Cables for connecting systems to the switch depend on the type of KVM.
  Required: No
Table 2–10 UPS System Hardware Table

UPS system
  Quantity: One or two
  Description: Uninterruptible power supply (UPS) systems protect against downtime if a power outage occurs. UPS systems are highly recommended for cluster operation. Ideally, connect the power cables for the shared storage enclosure and both power switches to redundant UPS systems. In addition, a UPS system must be able to provide voltage for an adequate period of time, and should be connected to its own power circuit. A recommended UPS system is the APC Smart-UPS 1400 Rackmount, available from http://www.apc.com.
  Required: Strongly recommended for availability
2.1.5 Example of a Minimum Cluster Configuration
The hardware components described in Table 2–11, Minimum Cluster Hardware Configuration Components can be used to set up a minimum cluster configuration. This configuration does not guarantee data integrity under all failure conditions, because it does not include power switches. Note that this is a sample configuration; it is possible to set up a minimum configuration using other hardware.
Table 2–11 Minimum Cluster Hardware Configuration Components

Two servers
  Each cluster system includes the following hardware:
    · Network interface for client access and an Ethernet heartbeat channel
    · One Adaptec 29160 SCSI adapter (termination disabled) for the shared storage connection

Two network cables with RJ45 connectors
  Network cables connect a network interface on each cluster system to the network for client access and Ethernet heartbeats.

RAID storage enclosure
  The RAID storage enclosure contains one controller with at least two host ports.

Two HD68 SCSI cables
  Each cable connects one HBA to one port on the RAID controller, creating two single-initiator SCSI buses.
2.1.6 Example of a No-Single-Point-Of-Failure Configuration
The components described in Table 2–12, No-Single-Point-Of-Failure Configuration Components can be used to set up a no-single-point-of-failure cluster configuration that includes two single-initiator SCSI buses and power switches to guarantee data integrity under all failure conditions. Note that this is a sample configuration; it is possible to set up a no-single-point-of-failure configuration using other hardware.
Table 2–12 No-Single-Point-Of-Failure Configuration Components

Two servers
  Each cluster system includes the following hardware:
    · Two network interfaces for:
        Point-to-point Ethernet heartbeat channel
        Client network access and Ethernet heartbeat connection
    · Three serial ports for:
        Point-to-point serial heartbeat channel
        Remote power switch connection
        Connection to the terminal server
    · One Tekram Ultra2 DC-390U2W adapter (termination enabled) for the shared disk storage connection

One network switch
  A network switch enables the connection of multiple systems to a network.

One Cyclades terminal server
  A terminal server allows for management of remote systems from a central location. (A terminal server is not required for cluster operation.)

Three network cables
  Network cables connect the terminal server and a network interface on each cluster system to the network switch.

Two RJ45 to DB9 crossover cables
  RJ45 to DB9 crossover cables connect a serial port on each cluster system to the Cyclades terminal server.

One network crossover cable
  A network crossover cable connects a network interface on one cluster system to a network interface on the other system, creating a point-to-point Ethernet heartbeat channel.

Two RPS-10 power switches
  Power switches enable each cluster system to power-cycle the other system before restarting its services. The power cable for each cluster system is connected to its own power switch.

Three null modem cables
  Null modem cables connect a serial port on each cluster system to the power switch that provides power to the other cluster system. This connection enables each cluster system to power-cycle the other system. A null modem cable also connects a serial port on one cluster system to a corresponding serial port on the other system, creating a point-to-point serial heartbeat channel.

FlashDisk RAID Disk Array with dual controllers
  Dual RAID controllers protect against disk and controller failure. The RAID controllers provide simultaneous access to all the logical units on the host ports.

Two HD68 SCSI cables
  HD68 cables connect each host bus adapter to a RAID enclosure "in" port, creating two single-initiator SCSI buses.

Two terminators
  Terminators connected to each "out" port on the RAID enclosure terminate both single-initiator SCSI buses.

Redundant UPS Systems
  UPS systems provide a highly-available source of power. The power cables for the power switches and the RAID enclosure are connected to two UPS systems.

Figure 2–1, No-Single-Point-Of-Failure Configuration Example shows an example of a no-single-point-of-failure hardware configuration that includes the hardware described in the previous table, two single-initiator SCSI buses, and power switches to guarantee data integrity under all error conditions. A "T" enclosed in a circle represents a SCSI terminator.
Figure 2–1 No-Single-Point-Of-Failure Configuration Example
2.2 Steps for Setting Up the Cluster Systems
After identifying the cluster hardware components described in Section 2.1, Choosing a Hardware Configuration, set up the basic cluster system hardware and connect the systems to the optional console switch and network switch or hub. Follow these steps:
1. In both cluster systems, install the required network adapters, serial cards, and host bus adapters. See Section 2.2.1, Installing the Basic System Hardware for more information about performing this task.
2. Set up the optional console switch and connect it to each cluster system. See Section 2.2.2, Setting Up a Console Switch for more information about performing this task.
If a console switch is not used, then connect each system to a console terminal.
3. Set up the optional network switch or hub and use conventional network cables to connect it to the cluster systems and the terminal server (if applicable). See Section 2.2.3, Setting Up a Network Switch or Hub for more information about performing this task.

If a network switch or hub is not used, then conventional network cables should be used to connect each system and the terminal server (if applicable) to a network.
After performing the previous tasks, install the Linux distribution as described in Section 2.3, Steps for Installing and Configuring the Red Hat Linux Distribution.
2.2.1 Installing the Basic System Hardware
Cluster systems must provide the CPU processing power and memory required by applications. It is recommended that each system have a minimum of 450 MHz CPU speed and 256 MB of memory.
In addition, cluster systems must be able to accommodate the SCSI or FC adapters, network interfaces, and serial ports that the hardware configuration requires. Systems have a limited number of preinstalled serial and network ports and PCI expansion slots. The following table will help to determine how much capacity the cluster systems employed will require:
Table 2–13 Installing the Basic System Hardware

Remote power switch connection (optional, but strongly recommended)
  Serial ports: One

SCSI or Fibre Channel adapter to shared disk storage
  PCI slots: One for each bus adapter

Network connection for client access and Ethernet heartbeat
  Network slots: One for each network connection

Point-to-point Ethernet heartbeat channel (optional)
  Network slots: One for each channel

Point-to-point serial heartbeat channel (optional)
  Serial ports: One for each channel

Terminal server connection (optional)
  Serial ports: One

Most systems come with at least one serial port. Ideally, choose systems that have at least two serial ports. If a system has graphics display capability, it is possible to use the serial console port for a serial heartbeat channel or a power switch connection. To expand your serial port capacity, use multi-port serial PCI cards.
In addition, be sure that local system disks will not be on the same SCSI bus as the shared disks. For example, use two-channel SCSI adapters, such as the Adaptec 39160-series cards, and put the internal devices on one channel and the shared disks on the other channel. Using multiple SCSI cards is also possible.
See the system documentation supplied by the vendor for detailed installation information. See Appendix A, Supplementary Hardware Information for hardware-specific information about using host bus adapters in a cluster.
Figure 2–2 Typical Cluster System External Cabling
Figure 2–2, Typical Cluster System External Cabling shows the bulkhead of a sample cluster system and the external cable connections for a typical cluster configuration.
2.2.2 Setting Up a Console Switch
Although a console switch is not required for cluster operation, it can be used to facilitate cluster system management and eliminate the need for separate monitors, mice, and keyboards for each cluster system. There are several types of console switches.

For example, a terminal server enables connection to serial consoles and management of many systems from a remote location. For a low-cost alternative, use a KVM (keyboard, video, and mouse) switch, which enables multiple systems to share one keyboard, monitor, and mouse. A KVM switch is suitable for configurations in which access to a graphical user interface (GUI) to perform system management tasks is preferred.
Set up the console switch according to the documentation provided by the vendor.

After the console switch has been set up, connect it to each cluster system. The cables used depend on the type of console switch. For example, a Cyclades terminal server uses RJ45 to DB9 crossover cables to connect a serial port on each cluster system to the terminal server.
2.2.3 Setting Up a Network Switch or Hub
Although a network switch or hub is not required for cluster operation, it can be used to facilitate cluster and client system network operations.
Set up a network switch or hub according to the documentation provided by the vendor.

After the network switch or hub has been set up, connect it to each cluster system by using conventional network cables. When using a terminal server, a network cable connects it to the network switch or hub.
2.3 Steps for Installing and Configuring the Red Hat Linux Distribution
1. Install the Red Hat Linux distribution on both cluster systems. If customizing the kernel, be sure to follow the kernel requirements and guidelines described in Section 2.3.1, Kernel Requirements.
2. Reboot the cluster systems.
3. When using a terminal server, configure Linux to send console messages to the console port.
4. Edit the /etc/hosts file on each cluster system and include the IP addresses used in the cluster. See Section 2.3.2, Editing the /etc/hosts File for more information about performing this task.
5. Decrease the alternate kernel boot timeout limit to reduce cluster system boot time. See Section 2.3.3, Decreasing the Kernel Boot Timeout Limit for more information about performing this task.
6. Ensure that no login (or getty) programs are associated with the serial ports that are being used for the serial heartbeat channel or the remote power switch connection (if applicable). To perform this task, edit the /etc/inittab file and use a pound symbol (#) to comment out the entries that correspond to the serial ports used for the serial channel and the remote power switch. Then, invoke the init q command. See the example following this list.
7. Verify that both systems detect all the installed hardware:
Use the dmesg command to display the console startup messages. See Section 2.3.4, Displaying Console Startup Messages for more information about performing this task.
Use the cat /proc/devices command to display the devices configured in the kernel. See Section 2.3.5, Displaying Devices Configured in the Kernel for more information about performing this task.
8. Verify that the cluster systems can communicate over all the network interfaces by using the ping command to send test packets from one system to the other.
9. If intending to configure Samba services, verify that the Samba-related RPM packages are installed on your system.
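The following is a sketch of the edit described in step 6; the exact getty entry varies by system, and the ttyS1 line shown here is only illustrative of the format to comment out:

#S1:2345:respawn:/sbin/agetty ttyS1 19200

After commenting out the entries in /etc/inittab, make init re-read the file:

init q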
2.3.1 Kernel Requirements
When manually configuring the kernel, adhere to the following kernel requirements:
Enable IP Aliasing support in the kernel by setting the CONFIG_IP_ALIAS kernel option to y. When specifying kernel options, under Networking Options, select IP aliasing support.
Enable support for the /proc file system by setting the CONFIG_PROC_FS kernel option to y. When specifying kernel options, under Filesystems, select /proc filesystem support.
Ensure that the SCSI driver is started before the cluster software. For example, edit the startup scripts so that the driver is started before the cluster script. It is also possible to statically build the SCSI driver into the kernel, instead of including it as a loadable module, by modifying the /etc/modules.conf file.
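As a reference for the CONFIG_IP_ALIAS and CONFIG_PROC_FS requirements above, a custom kernel configuration should contain lines such as the following (the same settings can also be checked in the config file shipped with the installed kernel, whose exact path varies by kernel version):

CONFIG_IP_ALIAS=y
CONFIG_PROC_FS=y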
In addition, when installing the Linux distribution, it is strongly recommended to do the following:
Gather the IP addresses for the cluster systems and for the point-to-point Ethernet heartbeat interfaces before installing a Linux distribution. Note that the IP addresses for the point-to-point Ethernet interfaces can be private IP addresses (for example, 10.x.x.x).
Optionally, reserve an IP address to be used as the "cluster alias". This address is typically used to facilitate remote monitoring.
Enable the following Linux kernel options to provide detailed information about the system configuration and events and help you diagnose problems:
– Enable SCSI logging support by setting the CONFIG_SCSI_LOGGING kernel option to y. When specifying kernel options, under SCSI Support, select SCSI logging facility.
– Enable support for sysctl by setting the CONFIG_SYSCTL kernel option to y. When specifying kernel options, under General Setup, select Sysctl support.
Do not place local file systems, such as /, /etc, /tmp, and /var on shared disks or on the same SCSI bus as shared disks. This helps prevent the other cluster member from accidentally mounting these file systems, and also reserves the limited number of SCSI identification numbers on a bus for cluster disks.
Place /tmp and /var on different file systems. This may improve system performance.
When a cluster system boots, be sure that the system detects the disk devices in the same order in which they were detected during the Linux installation. If the devices are not detected in the same order, the system may not boot.
When using RAID storage configured with Logical Unit Numbers (LUNs) greater than zero, it is necessary to enable LUN support by adding the following to /etc/modules.conf:
options scsi_mod max_scsi_luns=255
After modifying modules.conf, it is necessary to rebuild the initial ram disk using mkinitrd. Refer to the Official Red Hat Linux Customization Guide for more information about creating ramdisks using mkinitrd.
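The following is a minimal sketch of the mkinitrd invocation; the kernel version shown is illustrative and must match the kernel that will be booted:

# mkinitrd -f /boot/initrd-2.4.9-e.3.img 2.4.9-e.3

When using LILO, re-run /sbin/lilo afterward so that the boot loader references the rebuilt image.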
2.3.2 Editing the /etc/hosts File
The /etc/hosts file contains the IP address-to-hostname translation table. The /etc/hosts file on each cluster system must contain entries for the following:
IP addresses and associated host names for both cluster systems
IP addresses and associated host names for the point-to-point Ethernet heartbeat connections (these can be private IP addresses)
As an alternative to the /etc/hosts file, naming services such as DNS or NIS can be used to define the host names used by a cluster. However, to limit the number of dependencies and optimize availability, it is strongly recommended to use the /etc/hosts file to define IP addresses for cluster network interfaces.
The following is an example of an /etc/hosts file on a cluster system:
127.0.0.1 localhost.localdomain localhost
193.186.1.81 cluster2.yourdomain.com cluster2
10.0.0.1 ecluster2.yourdomain.com ecluster2
193.186.1.82 cluster3.yourdomain.com cluster3
10.0.0.2 ecluster3.yourdomain.com ecluster3
193.186.1.83 clusteralias.yourdomain.com clusteralias
The previous example shows the IP addresses and host names for two cluster systems (cluster2 and cluster3), the private IP addresses and host names for the Ethernet interface used for the point-to-point heartbeat connection on each cluster system (ecluster2 and ecluster3), as well as the IP alias clusteralias used for remote cluster monitoring.
Verify correct formatting of the local host entry in the /etc/hosts file to ensure that it does not include non-local systems in the entry for the local host. An example of an incorrect local host entry that includes a non-local system (server1) is shown next:
127.0.0.1 localhost.localdomain localhost server1
A heartbeat channel may not operate properly if the format is not correct. For example, the channel will erroneously appear to be offline. Check the /etc/hosts file and correct the file format by removing non-local systems from the local host entry, if necessary.
Note that each network adapter must be configured with the appropriate IP address and netmask. The following is an example of a portion of the output from the /sbin/ifconfig command on a
cluster system:
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:00:BC:11:76:93
          inet addr:192.186.1.81  Bcast:192.186.1.245  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:65508254 errors:225 dropped:0 overruns:2 frame:0
          TX packets:40364135 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:19 Base address:0xfce0

eth1      Link encap:Ethernet  HWaddr 00:00:BC:11:76:92
          inet addr:10.0.0.1  Bcast:10.0.0.245  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:18 Base address:0xfcc0
The previous example shows two network interfaces on a cluster system: the eth0 network interface for the cluster system, and the eth1 network interface for the point-to-point heartbeat connection.
2.3.3 Decreasing the Kernel Boot Timeout Limit
It is possible to reduce the boot time for a cluster system by decreasing the kernel boot timeout limit. During the Linux boot sequence, the bootloader allows for specifying an alternate kernel to boot. The default timeout limit for specifying a kernel is ten seconds.
To modify the kernel boot timeout limit for a cluster system, edit the /etc/lilo.conf file and specify the desired value (in tenths of a second) for the timeout parameter. The following example sets the timeout limit to three seconds:
timeout = 30
To apply any changes made to the /etc/lilo.conf file, invoke the /sbin/lilo command. Similarly, when using the grub boot loader, the timeout parameter in /boot/grub/grub.conf
should be modified to specify the appropriate number of seconds before timing out. To set this interval to 3 seconds, edit the parameter to the following:
timeout = 3
2.3.4 Displaying Console Startup Messages
Use the dmesg command to display the console startup messages. See the dmesg(8) manual page for more information.
The following example of the dmesg command output shows that a serial expansion card was recognized during startup:
May 22 14:02:10 storage3 kernel: Cyclades driver 2.3.2.5 2000/01/19 14:35:33
May 22 14:02:10 storage3 kernel: built May 8 2000 12:40:12
May 22 14:02:10 storage3 kernel: Cyclom-Y/PCI #1: 0xd0002000-0xd0005fff, IRQ9, 4 channels starting from port 0.
The following example of the dmesg command output shows that two external SCSI buses and nine disks were detected on the system (note that lines ending with backslashes are continued and will be printed as one line on most screens):
May 22 14:02:10 storage3 kernel: scsi0 : Adaptec AHA274x/284x/294x \
                (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
May 22 14:02:10 storage3 kernel:
May 22 14:02:10 storage3 kernel: scsi1 : Adaptec AHA274x/284x/294x \
                (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
May 22 14:02:10 storage3 kernel:
May 22 14:02:10 storage3 kernel: scsi : 2 hosts.
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST39236LW   Rev: 0004
May 22 14:02:11 storage3 kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdb at scsi1, channel 0, id 0, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdc at scsi1, channel 0, id 1, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdd at scsi1, channel 0, id 2, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sde at scsi1, channel 0, id 3, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdf at scsi1, channel 0, id 8, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdg at scsi1, channel 0, id 9, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdh at scsi1, channel 0, id 10, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE  Model: ST318203LC  Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdi at scsi1, channel 0, id 11, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: Dell     Model: 8 BAY U2W CU  Rev: 0205
May 22 14:02:11 storage3 kernel:   Type:   Processor \
                ANSI SCSI revision: 03
May 22 14:02:11 storage3 kernel: scsi1 : channel 0 target 15 lun 1 request sense \
                failed, performing reset.
May 22 14:02:11 storage3 kernel: SCSI bus is being reset for host 1 channel 0.
May 22 14:02:11 storage3 kernel: scsi : detected 9 SCSI disks total.
The following example of the dmesg command output shows that a quad Ethernet card was detected on the system:
May 22 14:02:11 storage3 kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker
May 22 14:02:11 storage3 kernel: tulip.c:v0.91g-ppc 7/16/99 becker@cesdis.gsfc.nasa.gov
May 22 14:02:11 storage3 kernel: eth0: Digital DS21140 Tulip rev 34 at 0x9800, \
                00:00:BC:11:76:93, IRQ 5.
May 22 14:02:12 storage3 kernel: eth1: Digital DS21140 Tulip rev 34 at 0x9400, \
                00:00:BC:11:76:92, IRQ 9.
May 22 14:02:12 storage3 kernel: eth2: Digital DS21140 Tulip rev 34 at 0x9000, \
                00:00:BC:11:76:91, IRQ 11.
May 22 14:02:12 storage3 kernel: eth3: Digital DS21140 Tulip rev 34 at 0x8800, \
                00:00:BC:11:76:90, IRQ 10.
2.3.5 Displaying Devices Configured in the Kernel
To be sure that the installed devices, including serial and network interfaces, are configured in the kernel, use the cat /proc/devices command on each cluster system. Use this command to also determine if there is raw device support installed on the system. For example:
# cat /proc/devices
Character devices:
  1 mem
  2 pty
  3 ttyp
  4 ttyS
  5 cua
  7 vcs
 10 misc
 19 ttyC
 20 cub
128 ptm
136 pts
162 raw

Block devices:
  2 fd
  3 ide0
  8 sd
 65 sd
#
The previous example shows:
Onboard serial ports (ttyS)
Serial expansion card (ttyC)
Raw devices (raw)
SCSI devices (sd)
2.4 Steps for Setting Up and Connecting the Cluster Hardware
After installing Red Hat Linux, set up the cluster hardware components and verify the installation to ensure that the cluster systems recognize all the connected devices. Note that the exact steps for setting up the hardware depend on the type of configuration. See Section 2.1, Choosing a Hardware Configuration for more information about cluster configurations.
To set up the cluster hardware, follow these steps:
1. Shut down the cluster systems and disconnect them from their power source.
2. Set up the point-to-point Ethernet and serial heartbeat channels, if applicable. See Section 2.4.1, Configuring Heartbeat Channels for more information about performing this task.
3. When using power switches, set up the devices and connect each cluster system to a power switch. See Section 2.4.2, Configuring Power Switches for more information about performing this task.
In addition, it is recommended to connect each power switch (or each cluster system’s power cord if not using power switches) to a different UPS system. See Section 2.4.3, Configuring UPS Systems for information about using optional UPS systems.
4. Set up the shared disk storage according to the vendor instructions and connect the cluster systems to the external storage enclosure. See Section 2.4.4, Configuring Shared Disk Storage for more information about performing this task.
In addition, it is recommended to connect the storage enclosure to redundant UPS systems. See Section 2.4.3, Configuring UPS Systems for more information about using optional UPS systems.
5. Turn on power to the hardware, and boot each cluster system. During the boot-up process, enter the BIOS utility to modify the system setup, as follows:
Ensure that the SCSI identification number used by the HBA is unique for the SCSI bus it is attached to. See Section A.5, SCSI Identification Numbers for more information about performing this task.
Enable or disable the onboard termination for each host bus adapter, as required by the storage configuration. See Section 2.4.4, Configuring Shared Disk Storage and Section A.3, SCSI Bus Termination for more information about performing this task.
Enable the cluster system to automatically boot when it is powered on.
6. Exit from the BIOS utility, and continue to boot each system. Examine the startup messages to verify that the Linux kernel has been configured and can recognize the full set of shared disks. Use the dmesg command to display console startup messages. See Section 2.3.4, Displaying Console Startup Messages for more information about using this command.
7. Verify that the cluster systems can communicate over each point-to-point Ethernet heartbeat connection by using the ping command to send packets over each network interface (see the example following this list).
8. Set up the quorum disk partitions on the shared disk storage. See Configuring Quorum Partitions in Section 2.4.4 for more information about performing this task.
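The following is a sketch of the check in step 7, using the point-to-point heartbeat host names from the /etc/hosts example in Section 2.3.2 (substitute the names or addresses used in your own configuration):

# ping -c 3 ecluster3

Run the command from the cluster system that owns ecluster2, then repeat it in the other direction and over each additional heartbeat interface.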
2.4.1 Configuring Heartbeat Channels
The cluster uses heartbeat channels as a policy input during failover of the cluster systems. For example, if a cluster system stops updating its timestamp on the quorum partitions, the other cluster system will check the status of the heartbeat channels to determine if additional time should be allotted prior to initiating a failover.
A cluster must include at least one heartbeat channel. It is possible to use an Ethernet connection for both client access and a heartbeat channel. However, it is recommended to set up additional heartbeat channels for high availability, using redundant Ethernet heartbeat channels in addition to one or more serial heartbeat channels.
For example, if using both an Ethernet and a serial heartbeat channel, and the cable for the Ethernet channel is disconnected, the cluster systems can still check status through the serial heartbeat channel.
To set up a redundant Ethernet heartbeat channel, use a network crossover cable to connect a network interface on one cluster system to a network interface on the other cluster system.
To set up a serial heartbeat channel, use a null modem cable to connect a serial port on one cluster system to a serial port on the other cluster system. Be sure to connect corresponding serial ports on the cluster systems; do not connect to the serial port that will be used for a remote power switch connection. In the future, should support be added for more than two cluster members, then usage of serial based heartbeat channels may be deprecated.
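As a minimal sketch, assuming the private addressing used in the /etc/hosts example in Section 2.3.2, the dedicated point-to-point Ethernet interface on one cluster system could be configured with an /etc/sysconfig/network-scripts/ifcfg-eth1 file such as the following (the interface name is illustrative; the other cluster system would use 10.0.0.2):

DEVICE=eth1
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes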
2.4.2 Configuring Power Switches
Power switches enable a cluster system to power-cycle the other cluster system before restarting its services as part of the failover process. The ability to remotely disable a system ensures data integrity is maintained under any failure condition. It is recommended that production environments use power switches or watchdog timers in the cluster configuration. Only development (test) environments should use a configuration without power switches (type "None"). Refer to Section 2.1.3, Choosing the Type of Power Controller for a description of the various types of power switches. Note that within this section, the general term "power switch" also includes watchdog timers.
In a cluster configuration that uses physical power switches, each cluster system’s power cable is connected to a power switch through either a serial or network connection (depending on switch type). When failover occurs, a cluster system can use this connection to power-cycle the other cluster system before restarting its services.
Power switches protect against data corruption if an unresponsive (or hanging) system becomes responsive after its services have failed over, and issues I/O to a disk that is also receiving I/O from the other cluster system. In addition, if a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum partitions. If power switches or watchdog timers are not used in the cluster, then this error condition may result in services being run on more than one cluster system, which can cause data corruption and possibly system crashes.
It is strongly recommended to use power switches in a cluster. However, administrators who are aware of the risks may choose to set up a cluster without power switches.
A cluster system may hang for a few seconds if it is swapping or has a high system workload. For this reason, adequate time is allowed prior to concluding another system has failed (typically 12 seconds).
A cluster system may "hang" indefinitelybecause of a hardware failure or kernel error. In this case, the other cluster will notice that the hung system is not updating its timestamp on the quorum partitions, and is not responding to pings over the heartbeat channels.
If a cluster system determines that a hung system is down, and power switches are used in the cluster, the cluster system will power-cycle the hung system before restarting its services. Clusters configured to use watchdog timers will self-reboot under most system hangs. This will cause the hung system to reboot in a clean state, and prevent it from issuing I/O and corrupting service data.
If power switches are not used in the cluster, and a cluster system determines that a hung system is down, it will set the status of the failed system to DOWN on the quorum partitions, and then restart the hung system's services. If the hung system becomes responsive, it will notice that its status is DOWN, and initiate a system reboot. This will minimize the time that both cluster systems may be able to issue I/O to the same disk, but it does not provide the data integrity guarantee of power switches. If the hung system never becomes responsive and no power switches are in use, then a manual reboot is required.
When used, power switches must be set up according to the vendor instructions. However, some cluster-specific tasks may be required to use a power switch in the cluster. See Section A.1, Setting Up Power Switches for detailed information on power switches (including information about watchdog timers). Be sure to take note of any caveats or functional attributes of specific power switch types. Note that the cluster-specific information provided in this document supersedes the vendor information.
When cabling power switches, take special care to ensure that each cable is plugged into the appropriate outlet. This is crucial because there is no independent means for the software to verify correct cabling. Failure to cable correctly can lead to an incorrect system being power cycled, or for one system to inappropriately conclude that it has successfully power cycled another cluster member.
After setting up the power switches, perform these tasks to connect them to the cluster systems:
1. Connect the power cable for each cluster system to a power switch.
2. On each cluster system, connect a serial port to the serial port on the power switch that provides power to the other cluster system. The cable used for the serial connection depends on the type of power switch. For example, an RPS-10 power switch uses null modem cables, while a network attached power switch requires a network cable.
3. Connect the power cable for each power switch to a power source. It is recommended to connect each power switch to a different UPS system. See Section 2.4.3, Configuring UPS Systems for more information.
After the installation of the cluster software, test the power switches to ensure that each cluster system can power-cycle the other system before starting the cluster. See Section 3.2.2, Testing the Power Switches for information.
2.4.3 Configuring UPS Systems
Uninterruptible power supply (UPS) systems provide a highly-available source of power. Ideally, a redundant solution should be used that incorporates multiple UPS systems (one per server). For maximal fault-tolerance, it is possible to incorporate two UPS systems per server, as well as APC's Automatic Transfer Switches to manage the power and shutdown of the server. The choice of solution depends solely on the level of availability desired.
It is not recommended to use a large UPS infrastructure as the sole source of power for the cluster. A UPS solution dedicated to the cluster itself allows for more flexibility in terms of manageability and availability.
A complete UPS system must be able to provide adequate voltage and current for a prolonged period of time. While there is no single UPS to fit every power requirement, a solution can be tailored to fit a particular configuration. Visit APC’s UPS configurator at http://www.apcc.com/template/size/apc to find the correct UPS configuration for your server. The APC Smart-UPS product line ships with software management for Red Hat Linux. The name of the RPM package is pbeagent.
If the cluster disk storage subsystem has two power supplies with separate power cords, set up two UPS systems, and connect one power switch (or one cluster system’s power cord if not using power switches) and one of the storage subsystem’s power cords to each UPS system. A redundant UPS system configuration is shown in Figure 2–3, Redundant UPS System Configuration.
Figure 2–3 Redundant UPS System Configuration
An alternative redundant power configuration is to connect both power switches (or both cluster systems' power cords) and the disk storage subsystem to the same UPS system. This is the most cost-effective configuration, and provides some protection against power failure. However, if a power outage occurs, the single UPS system becomes a possible single point of failure. In addition, one UPS system may not be able to provide enough power to all the attached devices for an adequate amount of time. A single UPS system configuration is shown in Figure 2–4, Single UPS System Configuration.
Figure 2–4 Single UPS System Configuration
Many vendor-supplied UPS systems include Linux applications that monitor the operational status of the UPS system through a serial port connection. If the battery power is low, the monitoring software will initiate a clean system shutdown. As this occurs, the cluster software will be properly stopped, because it is controlled by a System V run level script (for example, /etc/rc.d/init.d/cluster).
See the UPS documentation supplied by the vendor for detailed installation information.
2.4.4 Configuring Shared Disk Storage
In a cluster, shared disk storage is used to hold service data and two quorum partitions. Because this storage must be available to both cluster systems, it cannot be located on disks that depend on the availability of any one system. See the vendor documentation for detailed product and installation information.
There are some factors to consider when setting up shared disk storage in a cluster:
External RAID
It is strongly recommended to use RAID 1 (mirroring) to make service data and the quorum partitions highly available. Optionally, parity RAID can also be employed for high availability. Do not use RAID 0 (striping) alone for quorum partitions because this reduces storage availability.
Multi-Initiator SCSI configurations
Multi-initiator SCSI configurations are not supported due to the difficulty in obtaining proper bus termination.
The Linux device name for each shared storage device must be the same on each cluster system. For example, a device named /dev/sdc on one cluster system must be named /dev/sdc on the other cluster system. Using identical hardware for both cluster systems usually ensures that these devices will be named the same.
A disk partition can be used by only one cluster service.
Do not include any file systems used in a cluster service in the cluster system's local /etc/fstab files, because the cluster software must control the mounting and unmounting of service file systems.
For optimal performance, use a 4 KB block size when creating shared file systems. Note that some of the mkfs file system build utilities have a default 1 KB block size, which can cause long fsck times.
The following list details the parallel SCSI requirements, and must be adhered to if employed in a cluster environment:
SCSI buses must be terminated at each end, and must adhere to length and hot plugging restrictions.
Devices (disks, host bus adapters, and RAID controllers) on a SCSI bus must have a unique SCSI identification number.
See Section A.2, SCSI Bus Configuration Requirements for more information.
In addition, it is strongly recommended to connect the storage enclosure to redundant UPS systems for a highly-available source of power. See Section 2.4.3, Configuring UPS Systems for more information.
See Setting Up a Single-Initiator SCSI Bus in Section 2.4.4 and Setting Up a Fibre Channel Interconnect in Section 2.4.4 for more information about configuring shared storage.
After setting up the shared disk storage hardware, partition the disks and then either create file systems or raw devices on the partitions. Two raw devices must be created for the primary and the backup quorum partitions. See Configuring Quorum Partitions in Section 2.4.4, Partitioning Disks in Section 2.4.4, Creating Raw Devices in Section 2.4.4, and Creating File Systems in Section 2.4.4 for more information.
Setting Up a Single-Initiator SCSI Bus
A single-initiator SCSI bus has only one cluster system connected to it, and provides host isolation and better performance than a multi-initiator bus. Single-initiator buses ensure that each cluster system is protected from disruptions due to the workload, initialization, or repair of the other cluster system.
When using a single or dual-controller RAID array that has multiple host ports and provides simultaneous access to all the shared logical units from the host ports on the storage enclosure, the setup of two single-initiator SCSI buses to connect each cluster system to the RAID array is possible. If a logical unit can fail over from one controller to the other, the process must be transparent to the operating system. Note that some RAID controllers restrict a set of disks to a specific controller or port. In this case, single-initiator bus setups are not possible.
A single-initiator bus must adhere to the requirements described in Section A.2, SCSI Bus Configuration Requirements. In addition, see Section A.6, Host Bus Adapter Features and Configuration Requirements for detailed information about terminating host bus adapters and configuring a single-initiator bus.
To set up a single-initiator SCSI bus configuration, the following is required:
Enable the on-board termination for each host bus adapter.
Enable the termination for each RAID controller.
Use the appropriate SCSI cable to connect each host bus adapter to the storage enclosure.
Setting host bus adapter termination is usually done in the adapter BIOS utility during system boot. To set RAID controller termination, refer to the vendor documentation.
Figure 2–5, Single-Initiator SCSI Bus Configuration shows a configuration that uses two single-initiator SCSI buses.
Figure 2–5 Single-Initiator SCSI Bus Configuration
Figure 2–6, Single-Controller RAID Array Connected to Single-Initiator SCSI Buses shows the termination in a single-controller RAID array connected to two single-initiator SCSI buses.
Figure 2–6 Single-Controller RAID Array Connected to Single-Initiator SCSI Buses
Figure 2–7 Dual-Controller RAID Array Connected to Single-Initiator SCSI Buses
Setting Up a Fibre Channel Interconnect
Fibre Channel can be used in either single-initiator or multi-initiator configurations.
A single-initiator Fibre Channel interconnect has only one cluster system connected to it. This may provide better host isolation and better performance than a multi-initiator bus. Single-initiator interconnects ensure that each cluster system is protected from disruptions due to the workload, initialization, or repair of the other cluster system.
If employing a RAID array that has multiple host ports, and the RAID array provides simultaneous access to all the shared logical units from the host ports on the storage enclosure, set up two single-initiator Fibre Channel interconnects to connect each cluster system to the RAID array. If a logical unit can fail over from one controller to the other, the process must be transparent to the operating system.
Figure 2–8, Single-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects shows a single-controller RAID array with two host ports, and the host bus adapters connected directly to the RAID controller, without using Fibre Channel hubs or switches.
Figure 2–8 Single-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects
Figure 2–9 Dual-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects
If a dual-controller RAID array with two host ports on each controller is used, a Fibre Channel hub or switch is required to connect each host bus adapter to one port on both controllers, as shown in Figure 2–9, Dual-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects.
If a multi-initiator Fibre Channel is used, then a Fibre Channel hub or switch is required. In this case, each HBA is connected to the hub or switch, and the hub or switch is connected to a host port on each RAID controller.
Configuring Quorum Partitions
Two raw devices on shared disk storage must be created for the primary quorum partition and the backup quorum partition. Each quorum partition must have a minimum size of 10 MB. The amount of data in a quorum partition is constant; it does not increase or decrease over time.
The quorum partitions are used to hold cluster state information. Periodically, each cluster system writes its status (either UP or DOWN), a timestamp, and the state of its services. In addition, the quorum partitions contain a version of the cluster database. This ensures that each cluster system has a common view of the cluster configuration.
To monitor cluster health, the cluster systems periodically read state information from the primary quorum partition and determine if it is up to date. If the primary partition is corrupted, the cluster systems read the information from the backup quorum partition and simultaneously repair the primary
partition. Data consistency is maintained through checksums and any inconsistencies between the partitions are automatically corrected.
If a system is unable to write to both quorum partitions at startup time, it will not be allowed to join the cluster. In addition, if an active cluster system can no longer write to both quorum partitions, the system will remove itself from the cluster by rebooting (and may be remotely power cycled by the healthy cluster member).
The following are quorum partition requirements:
Both quorum partitions must have a minimum size of 10 MB.
Quorum partitions must be raw devices. They cannot contain file systems.
Quorum partitions can be used only for cluster state and configuration information.
The following are recommended guidelines for configuring the quorum partitions:
It is strongly recommended to set up a RAID subsystem for shared storage, and use RAID 1 (mirroring) to make the logical unit that contains the quorum partitions highly available. Optionally, parity RAID can be used for high availability. Do not use RAID 0 (striping) alone for quorum partitions.
Place both quorum partitions on the same RAID set, or on the same disk if RAID is not employed, because both quorum partitions must be available in order for the cluster to run.
Do not put the quorum partitions on a disk that contains heavily-accessed service data. If possible, locate the quorum partitions on disks that contain service data that is rarely accessed.
See Partitioning Disks in Section 2.4.4 and Creating Raw Devices in Section 2.4.4 for more information about setting up the quorum partitions.
See Section 3.1.1, Editing the rawdevices File for information about editing the rawdevices file to bind the raw character devices to the block devices each time the cluster systems boot.
Partitioning Disks
After shared disk storage hardware has been set up, partition the disks so they can be used in the cluster. Then, create file systems or raw devices on the partitions. For example, two raw devices must be created for the quorum partitions using the guidelines described in Configuring Quorum Partitions in Section 2.4.4.
Invoke the interactive fdisk command to modify a disk partition table and divide the disk into partitions. While in fdisk, use the p command to display the current partition table and the n command to create new partitions.
The following example shows how to use the fdisk command to partition a disk:
1. Invoke the interactive fdisk command, specifying an available shared disk device. At the prompt, specify the p command to display the current partition table.
# fdisk /dev/sde
Command (m for help): p

Disk /dev/sde: 255 heads, 63 sectors, 2213 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sde1             1       262   2104483+  83  Linux
/dev/sde2           263       288    208845   83  Linux
2. Determine the number of the next available partition, and specify the n command to add the partition. If there are already three partitions on the disk, then specify e for extended partition or p to create a primary partition.
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
3. Specify the partition number required:
Partition number (1-4): 3
4. Press the [Enter] key or specify the next available cylinder:
First cylinder (289-2213, default 289): 289
5. Specify the partition size that is required:
Last cylinder or +size or +sizeM or +sizeK (289-2213, default 2213): +2000M
Note that large partitions will increase the cluster service failover time if a file system on the partition must be checked with fsck. Quorum partitions must be at least 10 MB.
6. Specify the w command to write the new partition table to disk:
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: If you have created or modified any DOS 6.x partitions, please see the fdisk manual page for additional information.
Syncing disks.
7. If a partition was added while both cluster systems are powered on and connected to the shared storage, reboot the other cluster system in order for it to recognize the new partition.
After partitioning a disk, format the partition for use in the cluster. For example, create file systems or raw devices for quorum partitions.
See Creating Raw Devices in Section 2.4.4 and Creating File Systems in Section 2.4.4 for more information.
For basic information on partitioning hard disks at installation time, see The Official Red Hat Linux x86 Installation Guide. Appendix E. An Introduction to Disk Partitions of The Official Red Hat Linux x86 Installation Guide also explains the basic concepts of partitioning.
For basic information on partitioning disks using fdisk, refer to the following URL: http://kb.redhat.com/view.php?eid=175.
Creating Raw Devices
After partitioning the shared storage disks, create raw devices on the partitions. File systems are block devices (for example, /dev/sda1) that cache recently-used data in memory in order to improve performance. Raw devices do not utilize system memory for caching. See Creating File Systems in Section 2.4.4 for more information.
Linux supports raw character devices that are not hard-coded against specific block devices. Instead, Linux uses a character major number (currently 162) to implement a series of unbound raw devices in the /dev/raw directory. Any block device can have a character raw device front-end, even if the block device is loaded later at runtime.
To create a raw device, edit the /etc/sysconfig/rawdevices file to bind a raw character device to the appropriate block device. Once bound to a block device, a raw device can be opened, read, and written.
Quorum partitions and some database applications require raw devices, because these applications perform their own buffer caching for performance purposes. Quorum partitions cannot contain file systems because if state data was cached in system memory, the cluster systems would not have a consistent view of the state data.
Raw character devices must be bound to block devices each time a system boots. To ensure that this occurs, edit the /etc/sysconfig/rawdevices file and specify the quorum partition bindings. If using a raw device in a cluster service, use this file to bind the devices at boot time. See Section 3.1.1, Editing the rawdevices File for more information.
After editing /etc/sysconfig/rawdevices, the changes will take effect either after rebooting or by executing the following command:
# service rawdevices restart
Query all the raw devices by using the command raw -aq:
# raw -aq
/dev/raw/raw1   bound to major 8, minor 17
/dev/raw/raw2   bound to major 8, minor 18
Note that, for raw devices, there is no cache coherency between the raw device and the block device. In addition, requests must be 512-byte aligned both in memory and on disk. For example, the standard dd command cannot be used with raw devices because the memory buffer that the command passes to the write system call is not aligned on a 512-byte boundary.
For more information on using the raw command, refer to the raw(8) manual page.
Creating File Systems
Use the mkfs command to create an ext2 file system on a partition. Specify the drive letter and the partition number. For example:
# mkfs -t ext2 -b 4096 /dev/sde3
For optimal performance of shared filesystems, a 4 KB block size was specified in the above example. Note that it is necessary in most cases to specify a 4 KB block size when creating a filesystem since many of the mkfs file system build utilities default to a 1 KB block size, which can cause long fsck times.
Similarly, to create an ext3 filesystem, the following command can be used:
# mkfs -t ext3 -b 4096 /dev/sde3
For more information on creating filesystems, refer to the mkfs(8) manual page.
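To confirm the block size of an existing ext2 or ext3 file system, a check along the following lines can be used (the device name and grep pattern are illustrative):

# tune2fs -l /dev/sde3 | grep "Block size"
Block size:               4096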
3 Cluster Software Installation and Configuration
After installing and configuring the cluster hardware, the cluster system software can be installed. The following sections describe installing and initializing the cluster software, checking the cluster configuration, configuring syslog event logging, and using the cluadmin utility.
3.1 Steps for Installing and Initializing the Cluster Software
Before installing Red Hat Cluster Manager, be sure to install all of the required software, as described in Section 2.3.1, Kernel Requirements.
In order to preserve the existing cluster configuration database when running updates to the cluster software, back up the cluster database and stop the cluster software before reinstallation. See Section 8.7, Updating the Cluster Software for more information.
To install Red Hat Cluster Manager, invoke the command rpm --install clumanager-x.rpm, where x is the version of Red Hat Cluster Manager currently available. This package is installed by default in Red Hat Linux Advanced Server, so it is typically not necessary to manually install this individual package.
To initialize and start the cluster software, perform the following tasks:
1. Edit the /etc/sysconfig/rawdevices file on both cluster systems and specify the raw device special files and character devices for the primary and backup quorum partitions. See Configuring Quorum Partitions in Section 2.4.4 and Section 3.1.1, Editing the rawdevices File for more information.
2. Run the /sbin/cluconfig utility on one cluster system. If updating the cluster software, the utility will inquire before using the existing cluster database. The utility will remove the cluster database if it is not used.
The utility will prompt for the following cluster-specific information, which will be entered into member fields in the cluster database. A copy of this information is located in the /etc/cluster.conf file:
Raw device special files for the primary and backup quorum partitions, as specified
in the /etc/sysconfig/rawdevices file (for example, /dev/raw/raw1 and /dev/raw/raw2)
Cluster system host names that are returned by the hostname command
Number of heartbeat connections (channels), both Ethernet and serial
Device special file for each heartbeat serial line connection (for example, /dev/ttyS1)
IP host name associated with each heartbeat Ethernet interface
IP address for remote cluster monitoring, also referred to as the "cluster alias". Refer to Section
3.1.2, Configuring the Cluster Alias for further information.
Device special files for the serial ports to which the power switches are connected, if any (for example, /dev/ttyS0), or IP address of a network attached power switch.
Power switch type (for example,
RPS10 or None if not using power switches)
The system will prompt whether or not to enable remote monitoring. Refer to Section 3.1.2, Configuring the Cluster Alias for more information.
See Section 3.1.4, Example of the cluconfig Utility for an example of running the utility.
3. After completing the cluster initialization on one cluster system, perform the following tasks on the other cluster system:
Run the /sbin/cluconfig --init=raw_file command, where raw_file specifies the primary quorum partition. The script will use the information specified for the first cluster system as defaults. For example:
cluconfig --init=/dev/raw/raw1
4. Check the cluster configuration:
Invoke the cludiskutil utility with the -t option on both cluster systems to ensure that the quorum partitions map to the same physical device. See Section 3.2.1, Testing the Quorum Partitions for more information.
If using power switches, invoke the clustonith command on both cluster systems to test the remote connections to the power switches. See Section 3.2.2, Testing the Power Switches for more information.
5. Optionally, configure event logging so that cluster messages are logged to a separate file. See Section 3.3, Configuring syslog Event Logging for information.
6. Start the cluster by invoking the cluster start command located in the System V init directory on both cluster systems. For example:
service cluster start
After initializing the cluster, proceed to add cluster services. See Section 3.4, Using the cluadmin Utility and Section 4.1, Configuring a Service for more information.
3.1.1 Editing the rawdevices File
The /etc/sysconfig/rawdevices file is used to map the raw devices for the quorum partitions each time a cluster system boots. As part of the cluster software installation procedure, edit the rawdevices file on each cluster system and specify the raw character devices and block devices for the primary and backup quorum partitions. This must be done prior to running the cluconfig utility.
If raw devices are employed in a cluster service, the rawdevices file is also used to bind the devices at boot time. Edit the file and specify the raw character devices and block devices that you want to bind each time the system boots. To make the changes to the rawdevices file take effect without requiring a reboot, perform the following command:
service rawdevices restart
The following is an example rawdevices file that designates two quorum partitions:
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw1 /dev/sdb1
/dev/raw/raw2 /dev/sdb2
See Configuring Quorum Partitions in Section 2.4.4 for more information about setting up the quorum partitions. See Creating Raw Devices in Section 2.4.4 for more information on using the raw command to bind raw character devices to block devices.
Note
The rawdevices configuration must be performed on both cluster members.
3.1.2 Configuring the Cluster Alias
A cluster alias is a means of binding an IP address to one of the active cluster members. At any point in time this IP address will only be bound by one of the cluster members. This IP address is a useful convenience for system management and monitoring purposes. For example, suppose an administrator wishes to telnet into an active cluster member, but does not care which cluster member. In this case, simply telnet to the cluster alias IP address (or its associated name). The principal usage of the cluster alias is to allow the cluster GUI monitoring interface to connect to an active cluster member. In this manner, if one of the cluster members is not currently active it is still possible to derive cluster status without having to designate a specific cluster member to connect to.
While running cluconfig, you will be prompted as to whether or not you wish to configure a cluster alias. This appears as the following prompt:
Enter IP address for cluster alias [NONE]: 172.16.33.105
As shown above, the default value is set to NONE, which means that there is no cluster alias, but the user overrides this default and configures an alias using an IP address of 172.16.33.105. The IP address used for a cluster alias is distinct from the IP addresses associated with the cluster member’s hostnames. It is also different from IP addresses associated with cluster services.
3.1.3 Enabling Remote Monitoring
While running cluconfig to specify cluster configuration parameters, the utility will prompt for the following:
Do you wish to enable monitoring, both locally and remotely, via \
the Cluster GUI? yes/no [yes]:
Answering yes enables the cluster to be locally and remotely monitored by the cluster GUI. This is currently the only security provision controlling cluster monitoring access. The cluster GUI is only capable of performing monitoring requests and cannot make any active configuration changes.
Answering no disables Cluster GUI access completely.
3.1.4 Example of the cluconfig Utility
This section details an example of the cluconfig cluster configuration utility, which prompts you for information about the cluster members, and then enters the information into the cluster database. A copy of this information is located in the /etc/cluster.conf file. In this example, the information entered in cluconfig prompts applies to the following configuration:
IP address to be used for the cluster alias: 10.0.0.154
On the storage0 cluster system:
Ethernet heartbeat channels: storage0
Power switch serial port: /dev/ttyC0
Power switch: RPS10
Quorum partitions: /dev/raw/raw1 and /dev/raw/raw2
On the storage1 cluster system:
Ethernet heartbeat channels: storage1 and cstorage1
Serial heartbeat channel: /dev/ttyS1
Power switch serial port: /dev/ttyS0
Power switch: RPS10
Quorum partitions: /dev/raw/raw1 and /dev/raw/raw2
/sbin/cluconfig
Red Hat Cluster Manager Configuration Utility (running on storage0)
- Configuration file exists already.
Would you like to use those prior settings as defaults? (yes/no) [yes]: yes
Enter cluster name [Development Cluster]:
Enter IP address for cluster alias [10.0.0.154]: 10.0.0.154
--------------------------------
Information for Cluster Member 0
--------------------------------
Enter name of cluster member [storage0]: storage0
Looking for host storage0 (may take a few seconds)...
Enter number of heartbeat channels (minimum = 1) [1]: 1
Information about Channel 0
Channel type: net or serial [net]:
Enter hostname of the cluster member on heartbeat channel 0 \
[storage0]: storage0
Looking for host storage0 (may take a few seconds)...
Information about Quorum Partitions
Enter Primary Quorum Partition [/dev/raw/raw1]: /dev/raw/raw1
Enter Shadow Quorum Partition [/dev/raw/raw2]: /dev/raw/raw2
Information About the Power Switch That Power Cycles Member ’storage0’
Choose one of the following power switches:
  o NONE
  o RPS10
  o BAYTECH
  o APCSERIAL
  o APCMASTER
  o WTI_NPS
Power switch [RPS10]: RPS10
Enter the serial port connected to the power switch \
[/dev/ttyS0]: /dev/ttyS0
--------------------------------
Information for Cluster Member 1
--------------------------------
Enter name of cluster member [storage1]: storage1
Looking for host storage1 (may take a few seconds)...
Information about Channel 0
Enter hostname of the cluster member on heartbeat channel 0 \
[storage1]: storage1
Looking for host storage1 (may take a few seconds)...
Information about Quorum Partitions
Enter Primary Quorum Partition [/dev/raw/raw1]: /dev/raw/raw1
Enter Shadow Quorum Partition [/dev/raw/raw2]: /dev/raw/raw2

Information About the Power Switch That Power Cycles Member ’storage1’
Choose one of the following power switches:
  o NONE
  o RPS10
  o BAYTECH
  o APCSERIAL
  o APCMASTER
  o WTI_NPS
Power switch [RPS10]: RPS10
Enter the serial port connected to the power switch \
[/dev/ttyS0]: /dev/ttyS0
Cluster name: Development Cluster
Cluster alias IP address: 10.0.0.154
Cluster alias netmask: 255.255.254.0
--------------------
Member 0 Information
--------------------
Name: storage0
Primary quorum partition: /dev/raw/raw1
Shadow quorum partition: /dev/raw/raw2
Heartbeat channels: 1
Channel type: net, Name: storage0
Power switch IP address or hostname: storage0
Identifier on power controller for member storage0: storage0
--------------------
Member 1 Information
--------------------
Name: storage1
Primary quorum partition: /dev/raw/raw1
Shadow quorum partition: /dev/raw/raw2
Heartbeat channels: 1
Channel type: net, Name: storage1
Power switch IP address or hostname: storage1
Identifier on power controller for member storage1: storage1
--------------------------
Power Switch 0 Information
--------------------------
Power switch IP address or hostname: storage0
Type: RPS10
Login or port: /dev/ttyS0
Password: 10
--------------------------
Power Switch 1 Information
--------------------------
Power switch IP address or hostname: storage1
Type: RPS10
Login or port: /dev/ttyS0
Password: 10
Save the cluster member information? yes/no [yes]:
Writing to configuration file...done
Configuration information has been saved to /etc/cluster.conf.
----------------------------
Setting up Quorum Partitions
----------------------------
Running cludiskutil -I to initialize the quorum partitions: done
Saving configuration information to quorum partitions: done

Do you wish to enable monitoring, both locally and remotely, via the \
Cluster GUI? yes/no [yes]: yes
----------------------------------------------------------------
Configuration on this member is complete.
To configure the next member, invoke the following command on that system:
# /sbin/cluconfig --init=/dev/raw/raw1
See the manual to complete the cluster installation
3.2 Checking the Cluster Configuration
To ensure that the cluster software has been correctly configured, use the following tools located in the /sbin directory:
Test the quorum partitions and ensure that they are accessible.
Invoke the cludiskutil utility with the -t option to test the accessibility of the quorum partitions. See Section 3.2.1, Testing the Quorum Partitions for more information.
Test the operation of the power switches. If power switches are used in the cluster hardware configuration, run the clustonith command
on each cluster system to ensure that it can remotely power-cycle the other cluster system. Do not run this command while the cluster software is running. See Section 3.2.2, Testing the Power Switches for more information.
Ensure that both cluster systems are running the same software version. Invoke the rpm -q clumanager command on each cluster system to display the revision of
the installed cluster RPM.
The following section explains the cluster utilities in further detail.
3.2.1 Testing the Quorum Partitions
The quorum partitions must refer to the same physical device on both cluster systems. Invoke the cludiskutil utility with the -t option to test the quorum partitions and verify that they are accessible.
If the command succeeds, run the cludiskutil -p command on both cluster systems to display a summary of the header data structure for the quorum partitions. If the output is different on the systems, the quorum partitions do not point to the same devices on both systems. Check to make sure that the raw devices exist and are correctly specified in the /etc/sysconfig/rawdevices file. See Configuring Quorum Partitions in Section 2.4.4 for more information.
The following example shows that the quorum partitions refer to the same physical device on two cluster systems (devel0 and devel1):
/sbin/cludiskutil -p
----- Shared State Header ------
Magic# = 0x39119fcd
Version = 1
Updated on Thu Sep 14 05:43:18 2000
Updated by node 0
--------------------------------
/sbin/cludiskutil -p
----- Shared State Header ------
Magic# = 0x39119fcd
Version = 1
Updated on Thu Sep 14 05:43:18 2000
Updated by node 0
--------------------------------
The Magic# and Version fields will be the same for all cluster configurations. The last two lines of output indicate the date that the quorum partitions were initialized with cludiskutil -I, and the numeric identifier for the cluster system that invoked the initialization command.
If the output of the cludiskutil utility with the -p option is not the same on both cluster systems, perform the following:
Examine the /etc/sysconfig/rawdevices file on each cluster system and ensure that the raw character devices and block devices for the primary and backup quorum partitions have been accurately specified. If they are not the same, edit the file and correct any mistakes. Then re-run the cluconfig utility. See Section 3.1.1, Editing the rawdevices File for more information.
Ensure that you have created the raw devices for the quorum partitions on each cluster system. See Configuring Quorum Partitions in Section 2.4.4 for more information.
On each cluster system, examine the system startup messages at the point where the system probes the SCSI subsystem to determine the bus configuration. Verify that both cluster systems identify the same shared storage devices and assign them the same name.
Verify that a cluster system is not attempting to mount a file system on the quorum partition. For example, make sure that the actual device (for example, /dev/sdb1) is not included in an /etc/fstab file.
After performing these tasks, re-run the cludiskutil utility with the -p option.
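These checks can be run loosely as follows; the device names shown (for example, /dev/sdb1) are illustrative only and must match the actual quorum configuration:

# Review the raw device bindings used for the quorum partitions
cat /etc/sysconfig/rawdevices

# Confirm that the underlying block device is not listed for mounting
grep sdb1 /etc/fstab    # no output is expected if the device is correctly excluded

# Re-check the on-disk quorum header after any corrections
/sbin/cludiskutil -p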
3.2.2 Testing the Power Switches
If either network- or serial-attached power switches are employed in the cluster hardware configuration, install the cluster software and invoke the clustonith command to test the power switches. Invoke the command on each cluster system to ensure that it can remotely power-cycle the other cluster system. If testing is successful, then the cluster can be started. If using watchdog timers or the switch type "None", then this test can be omitted.
The clustonith command can accurately test a power switch only if the cluster software is not running. This is due to the fact that for serial attached switches, only one program at a time can access the serial port that connects a power switch to a cluster system. When the clustonith command is
invoked, it checks the status of the cluster software. If the cluster software is running, the command exits with a message to stop the cluster software.
The format of the clustonith command is as follows:
clustonith [-sSlLvr] [-t devicetype] [-F options-file] \
[-p stonith-parameters]
Options:
-s Silent mode, suppresses error and log messages
-S Display switch status
-l List the hosts a switch can access
-L List the set of supported switch types
-r hostname Power cycle the specified host
-v Increases verbose debugging level
When testing power switches, the first step is to ensure that each cluster member can successfully communicate with its attached power switch. The following example of the clustonith command output shows that the cluster member is able to communicate with its power switch:
clustonith -S
WTI Network Power Switch device OK.
An example output of the clustonith command when it is unable to communicate with its power switch appears below:
clustonith -S
Unable to determine power switch type. Unable to determine default power switch type.
The above error could be indicative of the following types of problems:
For serial attached power switches:
– Verify that the device special file for the remote power switch connection serial port (for example, /dev/ttyS0) is specified correctly in the cluster database, as established via the cluconfig command. If necessary, use a terminal emulation package such as minicom to test if the cluster system can access the serial port.
– Ensure that a non-cluster program (for example, a getty program) is not using the serial port
for the remote power switch connection. You can use the lsof command to perform this task (see the example following this list).
– Check that the cable connection to the remote power switch is correct. Verify that the correct
type of cable is used (for example, an RPS-10 power switch requires a null modem cable), and that all connections are securely fastened.
– Verify that any physical dip switches or rotary switches on the power switch are set properly.
If using an RPS-10 power switch, see Section A.1.1, Setting up RPS-10 Power Switches for more information.
For network based power switches:
– Verify that the network connection to network-based switches is operational. Most switches
have a link light that indicates connectivity.
– It should be possible to ping the network switch; if not, then the switch may not be properly
configured for its network parameters.
– Verify that the correct password and login name (depending on switch type) have been specified in the cluster configuration database (as established by running cluconfig). A useful diagnostic approach is to verify telnet access to the network switch using the same parameters as specified in the cluster configuration.
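The checks suggested above can be run from a shell; the serial device and the switch hostname (nps1) below are placeholders for the values recorded in the cluster database:

# Serial-attached switch: check whether another process holds the port
lsof /dev/ttyS0

# Network-attached switch: confirm basic reachability and login access
ping -c 3 nps1
telnet nps1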
After successfully verifying communication with the switch, attempt to power cycle the other cluster member. Prior to doing this, it is recommended to verify that the other cluster member is not actively performing any important functions (such as serving cluster services to active clients). The following command depicts a successful power cycle operation:
clustonith -r clu3
Successfully power cycled host clu3.
3.2.3 Displaying the Cluster Software Version
Invoke the rpm -qa clumanager command to display the revision of the installed cluster RPM. Ensure that both cluster systems are running the same version.
3.3 Configuring syslog Event Logging
It is possible to edit the /etc/syslog.conf file to enable the cluster to log events to a file that is different from the /var/log/messages log file. Logging cluster messages to a separate file will help to diagnose problems more clearly.
The cluster systems use the syslogd daemon to log cluster-related events to a file, as specified in the /etc/syslog.conf file. The log file facilitates diagnosis of problems in the cluster. It is recommended to set up event logging so that the syslogd daemon logs cluster messages only from the system on which it is running. Therefore, you need to examine the log files on both cluster systems to get a comprehensive view of the cluster.
The syslogd daemon logs messages from the following cluster daemons:
cluquorumd — Quorum daemon
clusvcmgrd — Service manager daemon
clupowerd — Power daemon
cluhbd — Heartbeat daemon
clumibd — Administrative system monitoring daemon
The importance of an event determines the severity level of the log entry. Important events should be investigated before they affect cluster availability. The cluster can log messages with the following severity levels, listed in order of decreasing severity:
emerg — The cluster system is unusable.
alert — Action must be taken immediately to address the problem.
crit — A critical condition has occurred.
err — An error has occurred.
warning — A significant event that may require attention has occurred.
notice — An event that does not affect system operation has occurred.
info — A normal cluster operation has occurred.
debug — Diagnostic output detailing normal cluster operations.
The default logging severity levels for the cluster daemons are warning and higher.
Examples of log file entries are as follows:
May 31 20:42:06 clu2 clusvcmgrd[992]: <info> Service Manager starting
May 31 20:42:06 clu2 clusvcmgrd[992]: <info> mount.ksh info: /dev/sda3 \
is not mounted
May 31 20:49:38 clu2 clulog[1294]: <notice> stop_service.ksh notice: \
Stopping service dbase_home
May 31 20:49:39 clu2 clusvcmgrd[1287]: <notice> Service Manager received \
a NODE_UP event for stor5
Jun 01 12:56:51 clu2 cluquorumd[1640]: <err> updateMyTimestamp: unable to \
update status block.
Jun 01 12:34:24 clu2 cluquorumd[1268]: <warning> Initiating cluster stop
Jun 01 12:34:24 clu2 cluquorumd[1268]: <warning> Completed cluster stop
Jul 27 15:28:40 clu2 cluquorumd[390]: <err> shoot_partner: successfully shot partner.
Each entry in the log file contains the following information:
[1] Timestamp
[2] Cluster system on which the event was logged
[3] Subsystem that generated the event
[4] Severity level of the event
[5] Description of the event
After configuring the cluster software, optionally edit the /etc/syslog.conf file to enable the cluster to log events to a file that is different from the default log file, /var/log/messages. The cluster utilities and daemons log their messages using a syslog tag called local4. Using a cluster-specific log file facilitates cluster monitoring and problem solving. To log cluster events to both the
/var/log/cluster and /var/log/messages files, add lines similar to the following to the /etc/syslog.conf file:
#
# Cluster messages coming in on local4 go to /var/log/cluster
#
local4.*                                /var/log/cluster
To prevent the duplication of messages and log cluster events only to the /var/log/cluster file, add lines similar to the following to the /etc/syslog.conf file:
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;news.none;authpriv.none;local4.none    /var/log/messages
To apply the previous changes, you can invoke the killall -HUP syslogd command, or restart syslog with a command similar to /etc/rc.d/init.d/syslog restart.
In addition, it is possible to modify the severity level of the events that are logged by the individual cluster daemons. See Section 8.6, Modifying Cluster Event Logging for more information.
3.4 Using the cluadmin Utility
The cluadmin utility provides a command-line user interface that enables an administrator to monitor and manage the cluster systems and services. Use the cluadmin utility to perform the following tasks:
Add, modify, and delete services
Disable and enable services
Display cluster and service status
Modify cluster daemon event logging
Back up and restore the cluster database
The cluster uses an advisory lock to prevent the cluster database from being simultaneously modified by multiple users on either cluster system. Users can only modify the database if they hold the advisory lock.
When the cluadmin utility is invoked, the cluster software checks if the lock is already assigned to a user. If the lock is not already assigned, the cluster software assigns the requesting user the lock. When the user exits from the cluadmin utility, the lock is relinquished.
If another user holds the lock, a warning will be displayed indicating that there is already a lock on the database. The cluster software allows for the option of taking the lock. If the lock is taken by the current requesting user, the previous holder of the lock can no longer modify the cluster database.
Take the lock only if necessary, because uncoordinated simultaneous configuration sessions may cause unpredictable cluster behavior. In addition, it is recommended to make only one change to the cluster database (for example, adding, modifying, or deleting services) at a time. The cluadmin command line options are as follows:
-d or --debug
Displays extensive diagnostic information.
-h, -?, or --help
Displays help about the utility, and then exits.
-n or --nointeractive
Bypasses the cluadmin utility’s top-level command loop processing. This option is used for cluadmin debugging purposes.
-t or --tcl
Adds a Tcl command to the cluadmin utility’s top-level command interpreter. To pass a Tcl command directly to the utility’s internal Tcl interpreter, at the cluadmin> prompt, preface the Tcl command with tcl. This option is used for cluadmin debugging purposes.
-V or --version
Displays information about the current version of cluadmin.
When the cluadmin utility is invoked without the -n option, the cluadmin> prompt appears. You can then specify commands and subcommands. Table 3–1, cluadmin Commands describes the commands and subcommands for the cluadmin utility:
Table 3–1 cluadmin Commands

help
Displays help for the specified cluadmin command or subcommand.
Example: help service add

cluster status
Displays a snapshot of the current cluster status. See Section 8.1, Displaying Cluster and Service Status for information.
Example: cluster status

cluster loglevel
Sets the logging for the specified cluster daemon to the specified severity level. See Section 8.6, Modifying Cluster Event Logging for information.
Example: cluster loglevel cluquorumd 7

cluster reload
Forces the cluster daemons to re-read the cluster configuration database. See Section 8.8, Reloading the Cluster Database for information.
Example: cluster reload

cluster name
Sets the name of the cluster to the specified name. The cluster name is included in the output of the clustat cluster monitoring command. See Section 8.9, Changing the Cluster Name for information.
Example: cluster name dbasecluster

cluster backup
Saves a copy of the cluster configuration database in the /etc/cluster.conf.bak file. See Section 8.5, Backing Up and Restoring the Cluster Database for information.
Example: cluster backup

cluster restore
Restores the cluster configuration database from the backup copy in the /etc/cluster.conf.bak file. See Section 8.5, Backing Up and Restoring the Cluster Database for information.
Example: cluster restore

cluster saveas
Saves the cluster configuration database to the specified file. See Section 8.5, Backing Up and Restoring the Cluster Database for information.
Example: cluster saveas cluster_backup.conf

cluster restorefrom
Restores the cluster configuration database from the specified file. See Section 8.5, Backing Up and Restoring the Cluster Database for information.
Example: cluster restorefrom cluster_backup.conf

service add
Adds a cluster service to the cluster database. The command prompts you for information about service resources and properties. See Section 4.1, Configuring a Service for information.
Example: service add

service modify
Modifies the resources or properties of the specified service. You can modify any of the information that you specified when the service was created. See Section 4.5, Modifying a Service for information.
Example: service modify dbservice

service show state
Displays the current status of all services or the specified service. See Section 8.1, Displaying Cluster and Service Status for information.
Example: service show state dbservice

service relocate
Causes a service to be stopped on the cluster member it is currently running on and restarted on the other. Refer to Section 4.6, Relocating a Service for more information.
Example: service relocate nfs1

service show config
Displays the current configuration for the specified service. See Section 4.2, Displaying a Service Configuration for information.
Example: service show config dbservice

service disable
Stops the specified service. You must enable a service to make it available again. See Section 4.3, Disabling a Service for information.
Example: service disable dbservice

service enable
Starts the specified disabled service. See Section 4.4, Enabling a Service for information.
Example: service enable dbservice

service delete
Deletes the specified service from the cluster configuration database. See Section 4.7, Deleting a Service for information.
Example: service delete dbservice

apropos
Displays the cluadmin commands that match the specified character string argument or, if no argument is specified, displays all cluadmin commands.
Example: apropos service

clear
Clears the screen display.
Example: clear

exit
Exits from cluadmin.
Example: exit

quit
Exits from cluadmin.
Example: quit
While using the cluadmin utility, press the [Tab] key to help identify cluadmin commands. For example, pressing the [Tab] key at the cluadmin> prompt displays a list of all the commands. Entering a letter at the prompt and then pressing the [Tab] key displays the commands that begin with the specified letter. Specifying a command and then pressing the [Tab] key displays a list of all the subcommands that can be specified with that command.
Users can additionally display the history of cluadmin commands by pressing the up arrow and down arrow keys at the prompt. The command history is stored in the .cluadmin_history file in the user's home directory.
4 Service Configuration and Administration
The following sections describe how to configure, display, enable/disable, modify, relocate, and delete a service, as well as how to handle services that fail to start.
4.1 Configuring a Service
The cluster systems must be prepared before any attempts to configure a service. For example, set up disk storage or applications used in the services. Then, add information about the service properties and resources to the cluster database by using the cluadmin utility. This information is used as parameters to scripts that start and stop the service.
To configure a service, follow these steps:
1. If applicable, create a script that will start and stop the application used in the service. See Section
4.1.2, Creating Service Scripts for information.
2. Gather information about service resources and properties. See Section 4.1.1, Gathering Service Information for information.
3. Set up the file systems or raw devices that the service will use. See Section 4.1.3, Configuring Service Disk Storage for information.
4. Ensure that the application software can run on each cluster system and that the service script, if any, can start and stop the service application. See Section 4.1.4, Verifying Application Software and Service Scripts for information.
5. Back up the /etc/cluster.conf file. See Section 8.5, Backing Up and Restoring the Cluster Database for information.
6. Invoke the cluadmin utility and specify the service add command. The cluadmin utility will prompt for information about the service resources and properties obtained in Step 2. If the service passes the configuration checks, it will be started on the user-designated cluster system, unless the user wants to keep the service disabled. For example:
cluadmin> service add
For more information about adding a cluster service, see the following:
Section 5.1, Setting Up an Oracle Service
Section 5.3, Setting Up a MySQL Service
Section 5.4, Setting Up a DB2 Service
Section 6.1, Setting Up an NFS Service
Section 6.2, Setting Up a High Availability Samba Service
Section 7.1, Setting Up an Apache Service
4.1.1 Gathering Service Information
Before creating a service, gather all available information about the service resources and properties. When adding a service to the cluster database, the cluadmin utility will prompt for this information.
In some cases, it is possible to specify multiple resources for a service (for example, multiple IP addresses and disk devices).
The service properties and resources that a user is able to specify are described in the following table.
Table 4–1 Service Property and Resource Information

Service name
Each service must have a unique name. A service name can consist of one to 63 characters and must consist of a combination of letters (either uppercase or lowercase), integers, underscores, periods, and dashes. However, a service name must begin with a letter or an underscore.

Preferred member
Specify the cluster system, if any, on which the service will run unless failover has occurred or unless the service is manually relocated.

Preferred member relocation policy
When enabled, this policy will automatically relocate a service to its preferred member when that system joins the cluster. If this policy is disabled, the service will remain running on the non-preferred member. For example, if an administrator enables this policy and the failed preferred member for the service reboots and joins the cluster, the service will automatically restart on the preferred member.

Script location
If applicable, specify the full path name for the script that will be used to start and stop the service. See Section 4.1.2, Creating Service Scripts for more information.

IP address
One or more Internet protocol (IP) addresses may be assigned to a service. This IP address (sometimes called a "floating" IP address) is different from the IP address associated with the host name Ethernet interface for a cluster system, because it is automatically relocated along with the service resources when failover occurs. If clients use this IP address to access the service, they will not know which cluster system is running the service, and failover is transparent to the clients.
Note that cluster members must have network interface cards configured in the IP subnet of each IP address used in a service.
Netmask and broadcast addresses for each IP address can also be specified; if they are not, then the cluster uses the netmask and broadcast addresses from the network interconnect in the subnet.

Disk partition
Specify each shared disk partition used in a service.

Mount points, file system types, mount options, NFS export options, and Samba shares
If using a file system, specify the type of file system, the mount point, and any mount options. Mount options available to specify are the standard file system mount options that are described in the mount(8) manual page. It is not necessary to provide mount information for raw devices (if used in a service). The ext2 and ext3 file systems are the recommended file systems for a cluster. Although a different file system may be used (such as reiserfs), only ext2 and ext3 have been thoroughly tested and are supported.
Specify whether or not to enable forced unmount for a file system. Forced unmount allows the cluster service management infrastructure to unmount a file system even if it is being accessed by an application or user (that is, even if the file system is "busy"). This is accomplished by terminating any applications that are accessing the file system.
cluadmin will prompt whether or not to NFS export the filesystem and, if so, what access permissions should be applied. Refer to Section 6.1, Setting Up an NFS Service for details. Specify whether or not to make the filesystem accessible to Windows clients via Samba.

Service Check Interval
Specifies the frequency (in seconds) at which the system will check the health of the application associated with the service. For example, it will verify that the necessary NFS or Samba daemons are running. For additional service types, the monitoring consists of examining the return status when calling the "status" clause of the application service script. Specifying a value of 0 for the service check interval will disable checking.

Disable service policy
If a user does not want to automatically start a service after it is added to the cluster, it is possible to keep the new service disabled until the user enables it.
4.1.2 Creating Service Scripts
The cluster infrastructure starts and stops specified applications by running service-specific scripts. For both NFS and Samba services, the associated scripts are built into the cluster services infrastructure. Consequently, when running cluadmin to configure NFS and Samba services, do not enter a service script name. For other application types it is necessary to designate a service script. For example, when configuring a database application in cluadmin, specify the fully qualified pathname of the corresponding database start script.
The format of the service scripts conforms to the conventions followed by the System V init scripts. This convention dictates that the scripts have a start, stop, and status clause. These should return an exit status of 0 on success. The cluster infrastructure will stop a cluster service that fails to successfully start. Inability of a service to start will result in the service being placed in a disabled state.
In addition to performing the stop and start functions, service scripts are also used for application service monitoring purposes. This is performed by calling the status clause of a service script. To enable service monitoring, specify a nonzero value for the Status check interval: prompt in cluadmin. If a nonzero exit is returned by a status check request to the service script, then the cluster infrastructure will first attempt to restart the application on the member it was previously running on. Status functions do not have to be fully implemented in service scripts. If no real monitoring is performed by the script, then a stub status clause should be present which returns success.
The operations performed within the status clause of an application can be tailored to best meet the application's needs as well as site-specific parameters. For example, a simple status check for a database would consist of verifying that the database process is still running. A more comprehensive check would consist of a database table query.
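A minimal sketch of a service script following the System V start/stop/status convention described above might look like the following; the echo commands are placeholders for the real application start and stop commands, and the status clause is only a stub that reports success:

#!/bin/sh
#
# Skeleton cluster service script (start/stop/status)
#
case "$1" in
start)
        # Replace with the command that starts the application
        echo "starting application"
        ;;
stop)
        # Replace with the command that stops the application
        echo "stopping application"
        ;;
status)
        # Stub status clause; replace with a real health check if desired
        exit 0
        ;;
esac
exit 0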
The /usr/share/cluster/doc/services/examples directory contains a template that can be used to create service scripts, in addition to examples of scripts. See Section 5.1, Setting Up
an Oracle Service, Section 5.3, Setting Up a MySQL Service, Section 7.1, Setting Up an Apache Service, and Section 5.4, Setting Up a DB2 Service for sample scripts.
4.1.3 Configuring Service Disk Storage
Prior to creating a service, set up the shared file systems and raw devices that the service will use. See Section 2.4.4, Configuring Shared Disk Storage for more information.
If employing raw devices in a cluster service, it is possible to use the /etc/sysconfig/rawdevices file to bind the devices at boot time. Edit the file and specify the raw character devices and block devices that are to be bound each time the system boots. See Section 3.1.1, Editing the rawdevices File for more information.
Note that software RAID and host-based RAID are not supported for shared disk storage. Only certified SCSI adapter-based RAID cards can be used for shared disk storage.
Administrators should adhere to the following service disk storage recommendations:
For optimal performance, use a 4 KB block size when creating file systems (see the example following this list). Note that some of the mkfs file system build utilities default to a 1 KB block size, which can cause long fsck times.
To facilitate quicker failover times, it is recommended that the ext3 filesystem be used. Refer to Creating File Systems in Section 2.4.4 for more information.
For large file systems, use the mount command with the nocheck option to bypass code that checks all the block groups on the partition. Specifying the nocheck option can significantly decrease the time required to mount a large file system.
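For example, a shared file system might be created with a 4 KB block size and mounted with the nocheck option as follows; the device name and mount point are illustrative only:

# Create an ext3 file system with a 4 KB block size (example device)
mke2fs -j -b 4096 /dev/sde1

# Mount a large file system without checking all block groups
mount -t ext3 -o nocheck /dev/sde1 /mnt/service1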
4.1.4 Verifying Application Software and Service Scripts
Prior to setting up a service, install any application that will be used in a service on each system. After installing the application, verify that the application runs and can access shared disk storage. To prevent data corruption, do not run the application simultaneously on both systems.
If using a script to start and stop the service application, install and test the script on both cluster systems, and verify that it can be used to start and stop the application. See Section 4.1.2, Creating Service Scripts for information.
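For example, using the Oracle service script from Section 5.1 as an illustration, the script can be exercised by hand on one member at a time before the service is added to the cluster:

# Start and stop the application through its service script
/home/oracle/oracle start
/home/oracle/oracle stop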
4.2 Displaying a Service Configuration
Administrators can display detailed information about the configuration of a service. This information includes the following:
Service name
Whether the service was disabled after it was added
Preferred member system
Whether the service will relocate to its preferred member when it joins the cluster
Service Monitoring interval
Service start script location
IP addresses
Disk partitions
File system type
Mount points and mount options
NFS exports
Samba shares
To display cluster service status, see Section 8.1, Displaying Cluster and Service Status. To display service configuration information, invoke the cluadmin utility and specify the service show config command. For example:
cluadmin> service show config
0) dummy
1) nfs_pref_clu4
2) nfs_pref_clu3
3) nfs_nopref
4) ext3
5) nfs_eng
6) nfs_engineering c) cancel
Choose service: 6
name: nfs_engineering
disabled: no
preferred node: clu3
relocate: yes
IP address 0: 172.16.33.164
device 0: /dev/sdb11
  mount point, device 0: /mnt/users/engineering
  mount fstype, device 0: ext2
  mount options, device 0: rw,nosuid,sync
  force unmount, device 0: yes
NFS export 0: /mnt/users/engineering/ferris
Client 0: ferris, rw
NFS export 0: /mnt/users/engineering/denham
Client 0: denham, rw
NFS export 0: /mnt/users/engineering/brown
Client 0: brown, rw
cluadmin>
If the name of the service is known, it can be specified with the service show config service_name command.
4.3 Disabling a Service
A running service can be disabled in order to stop the service and make it unavailable. Once disabled, a service can then be re-enabled. See Section 4.4, Enabling a Service for information.
There are several situations in which a running service may need to be disabled:
To modify a service
A running service must be disabled before it can be modified. See Section 4.5, Modifying a Service for more information.
To temporarily stop a service
A running service can be disabled, making it unavailable to clients without having to completely delete the service.
To disable a running service, invoke the cluadmin utility and specify the service disable service_name command. For example:
cluadmin> service disable user_home
Are you sure? (yes/no/?) y
notice: Stopping service user_home ...
notice: Service user_home is disabled
service user_home disabled
4.4 Enabling a Service
A disabled service can be enabled to start the service and make it available. To enable a disabled service, invoke the cluadmin utility and specify the service enable service_name command:
cluadmin> service enable user_home
Are you sure? (yes/no/?) y
notice: Starting service user_home ...
notice: Service user_home is running
service user_home enabled
4.5 Modifying a Service
All properties that were specified when a service was created can be modified. For example, specified IP addresses can be changed. More resources can also be added to a service (for example, more file systems). See Section 4.1.1, Gathering Service Information for information.
A service must be disabled before it can be modified. If an attempt is made to modify a running service, the cluster manager will prompt to disable it. See Section 4.3, Disabling a Service for more information.
Because a service is unavailable while being modified, be sure to gather all the necessary service information before disabling it in order to minimize service down time. In addition, back up the cluster database before modifying a service. See Section 8.5, Backing Up and Restoring the Cluster Database for more information.
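For example, the configuration database can be saved from within cluadmin before making changes:

cluadmin> cluster backup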
To modify a disabled service, invoke the cluadmin utility and specify the service modify service_name command. For example:
cluadmin> service modify web1
Service properties and resources can also be modified, as needed. The cluster will check the service modifications and allow correction of any mistakes. The cluster will verify the submitted service modification and then start the service, unless prompted to keep the service disabled. If changes are not submitted, the service will be started, if possible, using the original configuration.
4.6 Relocating a Service
In addition to providing automatic service failover, a cluster enables administrators to cleanly stop a service on one cluster system and then start it on the other cluster system. This service relocation functionality allows administrators to perform maintenance on a cluster system while maintaining application and data availability.
To relocate a service by using the cluadmin utility, invoke the service relocate command. For example:
cluadmin> service relocate nfs1
If a specific service is not designated, then a menu of running services will appear to choose from. If an error occurs while attempting to relocate a service, a useful diagnostic approach would be to try
to disable the individual service and then enable the service on the other cluster member.
4.7 Deleting a Service
A cluster service can be deleted. Note that the cluster database should be backed up before deleting a service. See Section 8.5, Backing Up and Restoring the Cluster Database for information.
To delete a service by using the cluadmin utility, follow these steps:
1. Invoke the cluadmin utility on the cluster system that is running the service, and specify the service disable service_name command. See Section 4.3, Disabling a Service for more information.
2. Specify the service delete service_name command to delete the service.
For example:
cluadmin> service disable user_home
Are you sure? (yes/no/?) y
notice: Stopping service user_home ...
notice: Service user_home is disabled
service user_home disabled
cluadmin> service delete user_home
Deleting user_home, are you sure? (yes/no/?): y
user_home deleted.
cluadmin>
4.8 Handling Services that Fail to Start
The cluster puts a service into the disabled state if it is unable to successfully start the service. A disabled state can be caused by various problems, such as a service start did not succeed, and the subsequent service stop also failed.
Be sure to carefully handle failed services. If service resources are still configured on the owner system, starting the service on the other cluster system may cause significant problems. For example, if a file system remains mounted on the owner system, and you start the service on the other cluster system, the file system will be mounted on both systems, which can cause data corruption. If the enable fails, the service will remain in the disabled state.
It is possible to modify a service that is in the disabled state. It may be necessary to do this in order to correct the problem that caused the disabled state. After modifying the service, it will be enabled on the owner system, if possible, or it will remain in the disabled state. The following list details steps to follow in the event of service failure:
1. Modify cluster event logging to log debugging messages. See Section 8.6, Modifying Cluster Event Logging for more information.
2. Use the cluadmin utility to attempt to enable or disable the service on the cluster system that owns the service. See Section 4.3, Disabling a Service and Section 4.4, Enabling a Service for more information.
3. If the service does not start or stop on the owner system, examine the /var/log/messages log file, and diagnose and correct the problem. You may need to modify the service to fix incorrect information in the cluster database (for example, an incorrect start script), or you may need to perform manual tasks on the owner system (for example, unmounting file systems).
4. Repeat the attempt to enable or disable the service on the owner system. If repeated attempts fail to correct the problem and enable or disable the service, reboot the owner system.
5. If still unable to successfully start the service, verify that the service can be manually restarted outside of the cluster framework. For example, this may include manually mounting the filesystems and manually running the service start script.
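Such a manual check, outside of the cluster framework, might look like the following; the device, mount point, and script path are placeholders taken from the Oracle example in Section 5.1 and must match the failed service's configuration:

# Mount the service's file system by hand (example device and mount point)
mount -t ext2 /dev/sda1 /u01

# Run the service start script directly and watch for errors
/home/oracle/oracle start

# Clean up afterwards so the cluster can manage the service again
/home/oracle/oracle stop
umount /u01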
5 Database Services
This chapter contains instructions for configuring Red Hat Linux Advanced Server to make database services highly available.
Note
The following descriptions present example database configuration instructions. Be aware that differences may exist in newer versions of each database product. Consequently, this information may not be directly applicable.
5.1 Setting Up an Oracle Service
A database service can serve highly-available data to a database application. The application can then provide network access to database client systems, such as Web servers. If the service fails over, the application accesses the shared database data through the new cluster system. A network-accessible database service is usually assigned an IP address, which is failed over along with the service to maintain transparent access for clients.
This section provides an example of setting up a cluster service for an Oracle database. Although the variables used in the service scripts depend on the specific Oracle configuration, the example may aid in setting up a service for individual environments. See Section 5.2, Tuning Oracle Services for information about improving service performance.
In the example that follows:
The service includes one IP address for the Oracle clients to use.
The service has two mounted file systems, one for the Oracle software (/u01) and the other for the Oracle database (/u02), which were set up before the service was added.
An Oracle administration account with the name oracle was created on both cluster systems before the service was added.
Network access in this example is through Perl DBI proxy.
The administration directory is on a shared disk that is used in conjunction with the Oracle service (for example, /u01/app/oracle/admin/db1).
The Oracle service example uses five scripts that must be placed in /home/oracle and owned by the Oracle administration account. The oracle script is used to start and stop the Oracle service. Specify this script when you add the service. This script calls the other Oracle example scripts. The startdb and stopdb scripts start and stop the database. The startdbi and stopdbi scripts
start and stop a Web application that has been written using Perl scripts and modules and is used to interact with the Oracle database. Note that there are many ways for an application to interact with an Oracle database.
The following is an example of the oracle script, which is used to start and stop the Oracle service. Note that the script is run as user oracle, instead of root.
#!/bin/sh
#
# Cluster service script to start/stop oracle
#

cd /home/oracle

case $1 in
'start')
        su - oracle -c ./startdbi
        su - oracle -c ./startdb
        ;;
'stop')
        su - oracle -c ./stopdb
        su - oracle -c ./stopdbi
        ;;
esac
The following is an example of the startdb script, which is used to start the Oracle Database Server instance:
#!/bin/sh
#
# Script to start the Oracle Database Server instance.
#
########################################################################
#
# ORACLE_RELEASE
#
# Specifies the Oracle product release.
#
########################################################################
ORACLE_RELEASE=8.1.6
######################################################################## #
# ORACLE_SID # # Specifies the Oracle system identifier or "sid", which is the name of # the Oracle Server instance. # ########################################################################
export ORACLE_SID=TESTDB
######################################################################## # # ORACLE_BASE # # Specifies the directory at the top of the Oracle software product and # administrative file structure. # ########################################################################
export ORACLE_BASE=/u01/app/oracle
######################################################################## # # ORACLE_HOME # # Specifies the directory containing the software for a given release. # The Oracle recommended value is $ORACLE_BASE/product/<release> # ########################################################################
export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
######################################################################## # # LD_LIBRARY_PATH # # Required when using Oracle products that use shared libraries. # ########################################################################
export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
######################################################################## # # PATH #
# Verify that the user's search path includes $ORACLE_HOME/bin # ########################################################################
export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
######################################################################## # # This does the actual work. # # The oracle server manager is used to start the Oracle Server instance # based on the initSID.ora initialization parameters file specified. # ########################################################################
/u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl << EOF
spool /home/oracle/startdb.log
connect internal;
startup pfile = /u01/app/oracle/admin/db1/pfile/initTESTDB.ora open;
spool off
EOF
exit 0
The following is an example of the stopdb script, which is used to stop the Oracle Database Server instance:
#!/bin/sh
#
# Script to STOP the Oracle Database Server instance.
#
######################################################################
#
# ORACLE_RELEASE
#
# Specifies the Oracle product release.
#
######################################################################
ORACLE_RELEASE=8.1.6
###################################################################### # # ORACLE_SID #
# Specifies the Oracle system identifier or "sid", which is the name # of the Oracle Server instance. # ######################################################################
export ORACLE_SID=TESTDB
###################################################################### # # ORACLE_BASE # # Specifies the directory at the top of the Oracle software product # and administrative file structure. # ######################################################################
export ORACLE_BASE=/u01/app/oracle
###################################################################### # # ORACLE_HOME # # Specifies the directory containing the software for a given release. # The Oracle recommended value is $ORACLE_BASE/product/<release> # ######################################################################
export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
###################################################################### # # LD_LIBRARY_PATH # # Required when using Oracle products that use shared libraries. # ######################################################################
export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
###################################################################### # # PATH # # Verify that the user's search path includes $ORACLE_HOME/bin #
######################################################################
export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
###################################################################### # # This does the actual work. # # The oracle server manager is used to STOP the Oracle Server instance # in a tidy fashion. # ######################################################################
/u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl << EOF
spool /home/oracle/stopdb.log
connect internal;
shutdown abort;
spool off
EOF
exit 0
The following is an example of the startdbi script, which is used to start a networking DBI proxy daemon:
#!/bin/sh
#
#####################################################################
#
# This script allows our Web Server application (perl scripts) to
# work in a distributed environment. The technology we use is
# based upon the DBD::Oracle/DBI CPAN perl modules.
#
# This script STARTS the networking DBI Proxy daemon.
#
#####################################################################
export ORACLE_RELEASE=8.1.6
export ORACLE_SID=TESTDB
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
# # This line does the real work. #
/usr/bin/dbiproxy --logfile /home/oracle/dbiproxy.log --localport 1100 &
exit 0
The following is an example of the stopdbi script, which is used to stop a networking DBI proxy daemon:
#!/bin/sh
#
###################################################################
#
# Our Web Server application (perl scripts) works in a distributed
# environment. The technology we use is based upon the
# DBD::Oracle/DBI CPAN perl modules.
#
# This script STOPS the required networking DBI Proxy daemon.
#
###################################################################
PIDS=$(ps ax | grep /usr/bin/dbiproxy | awk '{print $1}')

for pid in $PIDS
do
        kill -9 $pid
done

exit 0
The following example shows how to use cluadmin to add an Oracle service.
cluadmin> service add oracle
The user interface will prompt you for information about the service. Not all information is required for all services.
Enter a question mark (?) at a prompt to obtain help.
Enter a colon (:) and a single-character command at a prompt to do one of the following:
c - Cancel and return to the top-level cluadmin command
r - Restart to the initial prompt while keeping previous responses
p - Proceed with the next prompt
Preferred member [None]: ministor0
Relocate when the preferred member joins the cluster (yes/no/?) \
[no]: yes
User script (e.g., /usr/foo/script or None) \
[None]: /home/oracle/oracle
Do you want to add an IP address to the service (yes/no/?): yes
IP Address Information
IP address: 10.1.16.132
Netmask (e.g. 255.255.255.0 or None) [None]: 255.255.255.0
Broadcast (e.g. X.Y.Z.255 or None) [None]: 10.1.16.255
Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
or are you (f)inished adding IP addresses: f
Do you want to add a disk device to the service (yes/no/?): yes
Disk Device Information
Device special file (e.g., /dev/sda1): /dev/sda1
Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2
Mount point (e.g., /usr/mnt/service1 or None) [None]: /u01
Mount options (e.g., rw, nosuid): [Return]
Forced unmount support (yes/no/?) [no]: yes
Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
or are you (f)inished adding device information: a
Device special file (e.g., /dev/sda1): /dev/sda2
Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2
Mount point (e.g., /usr/mnt/service1 or None) [None]: /u02
Mount options (e.g., rw, nosuid): [Return]
Forced unmount support (yes/no/?) [no]: yes
Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
or are you (f)inished adding devices: f
Disable service (yes/no/?) [no]: no
name: oracle
disabled: no
preferred node: ministor0
relocate: yes
user script: /home/oracle/oracle
IP address 0: 10.1.16.132
  netmask 0: 255.255.255.0
  broadcast 0: 10.1.16.255
device 0: /dev/sda1
  mount point, device 0: /u01
  mount fstype, device 0: ext2
  force unmount, device 0: yes
device 1: /dev/sda2
  mount point, device 1: /u02
  mount fstype, device 1: ext2
  force unmount, device 1: yes
Add oracle service as shown? (yes/no/?) y
notice: Starting service oracle ...
info: Starting IP address 10.1.16.132
info: Sending Gratuitous arp for 10.1.16.132 (00:90:27:EB:56:B8)
notice: Running user script '/home/oracle/oracle start'
notice, Server starting
Added oracle.
cluadmin>
5.2 Tuning Oracle Services
The Oracle database recovery time after a failover is directly proportional to the number of outstanding transactions and the size of the database. The following parameters control database recovery time:
LOG_CHECKPOINT_TIMEOUT
LOG_CHECKPOINT_INTERVAL
FAST_START_IO_TARGET
REDO_LOG_FILE_SIZES
To minimize recovery time, set the previous parameters to relatively low values. Note that excessively low values will adversely impact performance. Try different values in order to find the optimal value.
Oracle provides additional tuning parameters that control the number of database transaction retries and the retry delay time. Be sure that these values are large enough to accommodate the failover time
in the cluster environment. This will ensure that failover is transparent to database client application programs and does not require programs to reconnect.
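As an illustration only, the first three checkpoint-related parameters listed above could be set in the instance's init<SID>.ora file; the values below are placeholders, not recommendations, and should be tuned for the specific workload (redo log file sizes are set when the log files are created, not in this file):

# Example initTESTDB.ora fragment (placeholder values)
LOG_CHECKPOINT_TIMEOUT = 300
LOG_CHECKPOINT_INTERVAL = 10000
FAST_START_IO_TARGET = 1000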
5.3 Setting Up a MySQL Service
A database service can serve highly-available data to a MySQL database application. The application can then provide network access to database client systems, such as Web servers. If the service fails over, the application accesses the shared database data through the new cluster system. A network-accessible database service is usually assigned one IP address, which is failed over along with the service to maintain transparent access for clients.
An example of a MySQL database service is as follows:
The MySQL server and the database instance both reside on a file system that is located on a disk partition on shared storage. This allows the database data and its run-time state information, which is required for failover, to be accessed by both cluster systems. In the example, the file system is mounted as /var/mysql, using the shared disk partition /dev/sda1.
An IP address is associated with the MySQL database to accommodate network access by clients of the database service. This IP address will automatically be migrated among the cluster members as the service fails over. In the example below, the IP address is 10.1.16.12.
The script that is used to start and stop the MySQL database is the standard System V init script, which has been modified with configuration parameters to match the file system on which the database is installed.
By default, a client connection to a MySQL server will time out after eight hours of inactivity. This connection limit can be modified by setting the wait_timeout variable when you start mysqld. For example, to set timeouts to 4 hours, start the MySQL daemon as follows:
mysqld -O wait_timeout=14400
To check if a MySQL server has timed out, invoke the mysqladmin version command and examine the uptime. Invoke the query again to automatically reconnect to the server.
Depending on the Linux distribution, one of the following messages may indicate a MySQL server timeout:
CR_SERVER_GONE_ERROR
CR_SERVER_LOST
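The timeout behavior can be exercised from a shell, using the commands mentioned above:

# Start the server with a four-hour idle timeout (14400 seconds)
mysqld -O wait_timeout=14400 &

# Display server status and uptime; if the previous connection timed out,
# running this command again automatically reconnects
mysqladmin version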
A sample script to start and stop the MySQL database is located in /usr/share/cluster/doc/services/examples/mysql.server, and is shown below:
#!/bin/sh
# Copyright Abandoned 1996 TCX DataKonsult AB & Monty Program KB & Detron HB
# This file is public domain and comes with NO WARRANTY of any kind

# Mysql daemon start/stop script.

# Usually this is put in /etc/init.d (at least on machines SYSV R4
# based systems) and linked to /etc/rc3.d/S99mysql. When this is done
# the mysql server will be started when the machine is started.

# Comments to support chkconfig on RedHat Linux
# chkconfig: 2345 90 90
# description: A very fast and reliable SQL database engine.

PATH=/sbin:/usr/sbin:/bin:/usr/bin
basedir=/var/mysql
bindir=/var/mysql/bin
datadir=/var/mysql/var
pid_file=/var/mysql/var/mysqld.pid
mysql_daemon_user=root # Run mysqld as this user.
export PATH

mode=$1

if test -w /           # determine if we should look at the root config file
then                   # or user config file
  conf=/etc/my.cnf
else
  conf=$HOME/.my.cnf   # Using the users config file
fi

# The following code tries to get the variables safe_mysqld needs from the
# config file. This isn't perfect as this ignores groups, but it should
# work as the options doesn't conflict with anything else.

if test -f "$conf"     # Extract those fields we need from config file.
then
  if grep "^datadir" $conf > /dev/null
  then
    datadir=`grep "^datadir" $conf | cut -f 2 -d= | tr -d ' '`
  fi
  if grep "^user" $conf > /dev/null
  then
    mysql_daemon_user=`grep "^user" $conf | cut -f 2 -d= | tr -d ' ' | head -1`
  fi
  if grep "^pid-file" $conf > /dev/null
  then
    pid_file=`grep "^pid-file" $conf | cut -f 2 -d= | tr -d ' '`
  else
    if test -d "$datadir"
    then
      pid_file=$datadir/`hostname`.pid
    fi
  fi
  if grep "^basedir" $conf > /dev/null
  then
    basedir=`grep "^basedir" $conf | cut -f 2 -d= | tr -d ' '`
    bindir=$basedir/bin
  fi
  if grep "^bindir" $conf > /dev/null
  then
    bindir=`grep "^bindir" $conf | cut -f 2 -d= | tr -d ' '`
  fi
fi

# Safeguard (relative paths, core dumps..)
cd $basedir

case "$mode" in
  'start')
    # Start daemon
    if test -x $bindir/safe_mysqld
    then
      # Give extra arguments to mysqld with the my.cnf file. This script may
      # be overwritten at next upgrade.
      $bindir/safe_mysqld --user=$mysql_daemon_user --pid-file=$pid_file --datadir=$datadir &
    else
      echo "Can't execute $bindir/safe_mysqld"
    fi
    ;;

  'stop')
    # Stop daemon. We use a signal here to avoid having to know the
    # root password.
    if test -f "$pid_file"
    then
      mysqld_pid=`cat $pid_file`
      echo "Killing mysqld with pid $mysqld_pid"
      kill $mysqld_pid
      # mysqld should remove the pid_file when it exits.
    else
      echo "No mysqld pid file found. Looked for $pid_file."
    fi
    ;;

  *)
    # usage
    echo "usage: $0 start|stop"
    exit 1
    ;;
esac
The following example shows how to use cluadmin to add a MySQL service.
cluadmin> service add
The user interface will prompt you for information about the service. Not all information is required for all services.
Enter a question mark (?) at a prompt to obtain help.
Enter a colon (:) and a single-character command at a prompt to do one of the following:
c - Cancel and return to the top-level cluadmin command
r - Restart to the initial prompt while keeping previous responses
p - Proceed with the next prompt
Currently defined services:
databse1
apache2
dbase_home
mp3_failover
Service name: mysql_1
Preferred member [None]: devel0
Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes
User script (e.g., /usr/foo/script or None) [None]: \
/etc/rc.d/init.d/mysql.server
Do you want to add an IP address to the service (yes/no/?): yes
IP Address Information
IP address: 10.1.16.12
Netmask (e.g. 255.255.255.0 or None) [None]: [Return]
Broadcast (e.g. X.Y.Z.255 or None) [None]: [Return]
Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
or are you (f)inished adding IP addresses: f
Do you want to add a disk device to the service (yes/no/?): yes
Disk Device Information
Device special file (e.g., /dev/sda1): /dev/sda1
Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2
Mount point (e.g., /usr/mnt/service1 or None) [None]: /var/mysql
Mount options (e.g., rw, nosuid): rw
Forced unmount support (yes/no/?) [no]: yes
Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
or are you (f)inished adding device information: f
Disable service (yes/no/?) [no]: yes
name: mysql_1
disabled: yes
preferred node: devel0
relocate: yes
user script: /etc/rc.d/init.d/mysql.server
IP address 0: 10.1.16.12
  netmask 0: None
  broadcast 0: None
device 0: /dev/sda1
  mount point, device 0: /var/mysql
  mount fstype, device 0: ext2
  mount options, device 0: rw
  force unmount, device 0: yes
Add mysql_1 service as shown? (yes/no/?) y
Added mysql_1.
cluadmin>
5.4 Setting Up a DB2 Service
This section provides an example of setting up a cluster service that will fail over IBM DB2 Enterprise/Workgroup Edition on a cluster. This example assumes that NIS is not running on the cluster systems. To install the software and database on the cluster systems, follow these steps:
1. On both cluster systems, log in as root and add the IP address and host name that will be used to access the DB2 service to the /etc/hosts file. For example:
10.1.16.182 ibmdb2.class.cluster.com ibmdb2
2. Choose an unused partition on a shared disk to use for hosting DB2 administration and instance data, and create a file system on it. For example:
# mke2fs /dev/sda3
3. Create a mount point on both cluster systems for the file system created in Step 2. For example:
# mkdir /db2home
4. On the first cluster system, devel0, mount the file system created in Step 2 on the mount point created in Step 3. For example:
devel0# mount -t ext2 /dev/sda3 /db2home
5. On the first cluster system, devel0, mount the DB2 cdrom and copy the setup response file included in the distribution to /root. For example:
devel0% mount -t iso9660 /dev/cdrom /mnt/cdrom
devel0% cp /mnt/cdrom/IBM/DB2/db2server.rsp /root
6. Modify the setup response file, db2server.rsp, to reflect local configuration settings. Make sure that the UIDs and GIDs are reserved on both cluster systems. For example:
-----------Instance Creation Settings------------
-------------------------------------------------
DB2.UID = 2001
DB2.GID = 2001
DB2.HOME_DIRECTORY = /db2home/db2inst1

-----------Fenced User Creation Settings----------
--------------------------------------------------
UDF.UID = 2000
UDF.GID = 2000
UDF.HOME_DIRECTORY = /db2home/db2fenc1

-----------Instance Profile Registry Settings------
---------------------------------------------------
DB2.DB2COMM = TCPIP

----------Administration Server Creation Settings---
-----------------------------------------------------
ADMIN.UID = 2002
ADMIN.GID = 2002
ADMIN.HOME_DIRECTORY = /db2home/db2as
---------Administration Server Profile Registry Settings-
----------------------------------------------------------
ADMIN.DB2COMM = TCPIP

---------Global Profile Registry Settings-------------
-------------------------------------------------------
DB2SYSTEM = ibmdb2
7. Start the installation. For example:
devel0# cd /mnt/cdrom/IBM/DB2
devel0# ./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &
8. Check for errors during the installation by examining the installation log file, /tmp/db2setup.log. Every step in the installation must be marked as SUCCESS at the end of the log file.
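One quick way to scan the log for problems is to list any lines that report FAILURE or CANCEL; this is only a convenience check, not a substitute for reading the log. For example:
devel0# grep -E 'FAILURE|CANCEL' /tmp/db2setup.log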
9. Stop the DB2 instance and administration server on the first cluster system. For example:
devel0# su - db2inst1
devel0# db2stop
devel0# exit
devel0# su - db2as
devel0# db2admin stop
devel0# exit
10. Unmount the DB2 instance and administration data partition on the first cluster system. For ex­ample:
devel0# umount /db2home
11. Mount the DB2 instance and administration data partition on the second cluster system, devel1. For example:
devel1# mount -t ext2 /dev/sda3 /db2home
12. Mount the DB2 CDROM on the second cluster system and remotely copy the db2server.rsp file to /root. For example:
devel1# mount -t iso9660 /dev/cdrom /mnt/cdrom
devel1# rcp devel0:/root/db2server.rsp /root
13. Start the installation on the second cluster system, devel1. For example:
devel1# cd /mnt/cdrom/IBM/DB2
devel1# ./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &
14. Check for errors during the installation by examining the installation log file. Every step in the installation must be marked as SUCCESS except for the following:
DB2 Instance Creation                        FAILURE
Update DBM configuration file for TCP/IP     CANCEL
Update parameter DB2COMM                     CANCEL
Auto start DB2 Instance                      CANCEL
DB2 Sample Database                          CANCEL
Start DB2 Instance                           CANCEL
Administration Server Creation               FAILURE
Update parameter DB2COMM                     CANCEL
Start Administration Server                  CANCEL
15. Test the database installation by invoking the following commands, first on one cluster system, and then on the other cluster system:
# mount -t ext2 /dev/sda3 /db2home
# su - db2inst1
# db2start
# db2 connect to sample
# db2 select tabname from syscat.tables
# db2 connect reset
# db2stop
# exit
# umount /db2home
16. Create the DB2 cluster start/stop script on the DB2 administration and instance data partition. For example:
# vi /db2home/ibmdb2
# chmod u+x /db2home/ibmdb2

#!/bin/sh
#
# IBM DB2 Database Cluster Start/Stop Script
#
DB2DIR=/usr/IBMdb2/V6.1
case $1 in
"start")
$DB2DIR/instance/db2istrt
;;
"stop")
$DB2DIR/instance/db2ishut
;;
esac
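Before handing this script to the cluster software, it can be useful to confirm that it starts and stops DB2 cleanly when run by hand on the member that currently has /db2home mounted. For example:
# /db2home/ibmdb2 start
# /db2home/ibmdb2 stop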
17. Modify the /usr/IBMdb2/V6.1/instance/db2ishut file on both cluster systems to forcefully disconnect active applications before stopping the database. For example:
for DB2INST in ${DB2INSTLIST?}; do
    echo "Stopping DB2 Instance "${DB2INST?}"..." >> ${LOGFILE?}
    find_homedir ${DB2INST?}
    INSTHOME="${USERHOME?}"
    su ${DB2INST?} -c " \
        source ${INSTHOME?}/sqllib/db2cshrc 1> /dev/null 2> /dev/null; \
        ${INSTHOME?}/sqllib/db2profile 1> /dev/null 2> /dev/null; \
>>>>>>> db2 force application all; \
        db2stop " 1>> ${LOGFILE?} 2>> ${LOGFILE?}
    if [ $? -ne 0 ]; then
        ERRORFOUND=${TRUE?}
    fi
done
18. Edit the inittab file and comment out the DB2 line to enable the cluster service to handle starting and stopping the DB2 service. This is usually the last line in the file. For example:
# db:234:once:/etc/rc.db2 > /dev/console 2>&1 # Autostart DB2 Services
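To have init pick up the modified inittab without a reboot, you can ask it to re-examine the file; this uses the standard SysV init interface rather than anything specific to the cluster software:
# telinit q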
Use the cluadmin utility to create the DB2 service. Add the IP address from Step 1, the shared partition created in Step 2, and the start/stop script created in Step 16.
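The following is an abbreviated sketch of that cluadmin dialog; it mirrors the MySQL example earlier in this chapter, and the service name db2_1 is only illustrative:
cluadmin> service add
Service name: db2_1
Preferred member [None]: devel0
User script (e.g., /usr/foo/script or None) [None]: /db2home/ibmdb2
Do you want to add an IP address to the service (yes/no/?): yes
IP address: 10.1.16.182
Do you want to add a disk device to the service (yes/no/?): yes
Device special file (e.g., /dev/sda1): /dev/sda3
Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2
Mount point (e.g., /usr/mnt/service1 or None) [None]: /db2home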
To install the DB2 client on a third system, invoke these commands:
display# mount -t iso9660 /dev/cdrom /mnt/cdrom
display# cd /mnt/cdrom/IBM/DB2
display# ./db2setup -d -r /root/db2client.rsp
To configure a DB2 client, add the service’s IP address to the /etc/hosts file on the client system:
10.1.16.182 ibmdb2.class.cluster.com ibmdb2
Then, add the following entry to the /etc/services file on the client system:
db2cdb2inst1 50000/tcp
Invoke the following commands on the client system:
# su - db2inst1
# db2 catalog tcpip node ibmdb2 remote ibmdb2 server db2cdb2inst1
# db2 catalog database sample as db2 at node ibmdb2
# db2 list node directory
# db2 list database directory
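To confirm that the client can reach the clustered database through the cataloged node, you can connect to the database alias and run a simple query, mirroring the test in Step 15. This assumes the DB2 service has been enabled and that the db2inst1 password is known on the client:
# db2 connect to db2 user db2inst1
# db2 select tabname from syscat.tables
# db2 connect reset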