
Red Hat Cluster Manager
The Red Hat Cluster Manager Installation and
Administration Guide
ISBN: N/A
Red Hat, Inc.
1801 Varsity Drive
Raleigh, NC 27606 USA
+1 919 754 3700 (Voice)
+1 919 754 3701 (FAX)
888 733 4281 (Voice)
P.O. Box 13588
Research Triangle Park, NC 27709 USA
© 2002 Red Hat, Inc.
© 2000 Mission Critical Linux, Inc.
© 2000 K.M. Sorenson
rh-cm(EN)-1.0-Print-RHI (2002-04-17T17:16-0400)
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation. A copy of the license is included on the GNU Free Documentation License website.
Red Hat, Red Hat Network, the Red Hat "Shadow Man" logo, RPM, Maximum RPM, the RPM logo, Linux Library, PowerTools, Linux Undercover, RHmember, RHmember More, Rough Cuts, Rawhide and all Red Hat-based trademarks and logos are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.
Linux is a registered trademark of Linus Torvalds. Motif and UNIX are registered trademarks of The Open Group. Itanium is a registered trademark of Intel Corporation. Netscape is a registered trademark of Netscape Communications Corporation in the United States and
other countries. Windows is a registered trademark of Microsoft Corporation. SSH and Secure Shell are trademarks of SSH Communications Security, Inc. FireWire is a trademark of Apple Computer Corporation. S/390 and zSeries are trademarks of International Business Machines Corporation. All other trademarks and copyrights referred to are the property of their respective owners.
Acknowledgments
The Red Hat Cluster Manager software was originally based on the open source Kimberlite cluster project (http://oss.missioncriticallinux.com/kimberlite/), which was developed by Mission Critical Linux, Inc.
Subsequent to its inception based on Kimberlite, developers at Red Hat have made a large number of enhancements and modifications. The following is a non-comprehensive list highlighting some of these enhancements.
Packaging and integration into the Red Hat installation paradigm in order to simplify the end user’s experience.
Addition of support for high availability NFS services.
Addition of support for high availability Samba services.
Addition of support for using watchdog timers as a data integrity provision.
Addition of service monitoring which will automatically restart a failed application.
Rewrite of the service manager to facilitate additional cluster-wide operations.
Addition of the Red Hat Cluster Manager GUI, a graphical monitoring tool.
A set of miscellaneous bug fixes.
The Red Hat Cluster Manager software incorporates STONITH compliant power switch modules from the Linux-HA project http://www.linux-ha.org/stonith/.
Contents

Red Hat Cluster Manager
Acknowledgments

Chapter 1 Introduction to Red Hat Cluster Manager
  1.1 Cluster Overview
  1.2 Cluster Features
  1.3 How To Use This Manual

Chapter 2 Hardware Installation and Operating System Configuration
  2.1 Choosing a Hardware Configuration
  2.2 Steps for Setting Up the Cluster Systems
  2.3 Steps for Installing and Configuring the Red Hat Linux Distribution
  2.4 Steps for Setting Up and Connecting the Cluster Hardware

Chapter 3 Cluster Software Installation and Configuration
  3.1 Steps for Installing and Initializing the Cluster Software
  3.2 Checking the Cluster Configuration
  3.3 Configuring syslog Event Logging
  3.4 Using the cluadmin Utility

Chapter 4 Service Configuration and Administration
  4.1 Configuring a Service
  4.2 Displaying a Service Configuration
  4.3 Disabling a Service
  4.4 Enabling a Service
  4.5 Modifying a Service
  4.6 Relocating a Service
  4.7 Deleting a Service
  4.8 Handling Services that Fail to Start

Chapter 5 Database Services
  5.1 Setting Up an Oracle Service
  5.2 Tuning Oracle Services
  5.3 Setting Up a MySQL Service
  5.4 Setting Up a DB2 Service

Chapter 6 Network File Sharing Services
  6.1 Setting Up an NFS Service
  6.2 Setting Up a High Availability Samba Service

Chapter 7 Apache Services
  7.1 Setting Up an Apache Service

Chapter 8 Cluster Administration
  8.1 Displaying Cluster and Service Status
  8.2 Starting and Stopping the Cluster Software
  8.3 Removing a Cluster Member
  8.4 Modifying the Cluster Configuration
  8.5 Backing Up and Restoring the Cluster Database
  8.6 Modifying Cluster Event Logging
  8.7 Updating the Cluster Software
  8.8 Reloading the Cluster Database
  8.9 Changing the Cluster Name
  8.10 Reinitializing the Cluster
  8.11 Disabling the Cluster Software
  8.12 Diagnosing and Correcting Problems in a Cluster

Chapter 9 Configuring and Using the Red Hat Cluster Manager GUI
  9.1 Setting up the JRE
  9.2 Configuring Cluster Monitoring Parameters
  9.3 Enabling the Web Server
  9.4 Starting the Red Hat Cluster Manager GUI

Appendix A Supplementary Hardware Information
  A.1 Setting Up Power Switches
  A.2 SCSI Bus Configuration Requirements
  A.3 SCSI Bus Termination
  A.4 SCSI Bus Length
  A.5 SCSI Identification Numbers
  A.6 Host Bus Adapter Features and Configuration Requirements
  A.7 Tuning the Failover Interval

Appendix B Supplementary Software Information
  B.1 Cluster Communication Mechanisms
  B.2 Cluster Daemons
  B.3 Failover and Recovery Scenarios
  B.4 Cluster Database Fields
  B.5 Using Red Hat Cluster Manager with Piranha
1 Introduction to Red Hat Cluster Manager
The Red Hat Cluster Manager is a collection of technologies working together to provide data integrity and the ability to maintain application availability in the event of a failure. Using redundant hardware, shared disk storage, power management, and robust cluster communication and application failover mechanisms, a cluster can meet the needs of the enterprise market.

Specially suited for database applications, network file servers, and World Wide Web (Web) servers with dynamic content, a cluster can also be used in conjunction with the Piranha load balancing cluster software, based on the Linux Virtual Server (LVS) project, to deploy a highly available e-commerce site that has complete data integrity and application availability, in addition to load balancing capabilities. See Section B.5, Using Red Hat Cluster Manager with Piranha for more information.

1.1 Cluster Overview

To set up a cluster, an administrator must connect the cluster systems (often referred to as member systems) to the cluster hardware, and configure the systems into the cluster environment. The foundation of a cluster is an advanced host membership algorithm. This algorithm ensures that the cluster maintains complete data integrity at all times by using the following methods of inter-node communication:

Quorum partitions on shared disk storage to hold system status

Ethernet and serial connections between the cluster systems for heartbeat channels
To make an application and data highly available in a cluster, the administrator must configure a cluster service — a discrete group of service properties and resources, such as an application and shared disk storage. A service can be assigned an IP address to provide transparent client access to the service. For example, an administrator can set up a cluster service that provides clients with access to highly-available database application data.

Both cluster systems can run any service and access the service data on shared disk storage. However, each service can run on only one cluster system at a time, in order to maintain data integrity. Administrators can set up an active-active configuration in which both cluster systems run different services, or a hot-standby configuration in which a primary cluster system runs all the services, and a backup cluster system takes over only if the primary system fails.
Figure 1–1 Example Cluster
Figure 1–1, Example Cluster shows an example of a cluster in an active-active configuration. If a hardware or software failure occurs, the cluster will automatically restart the failed system’s services on the functional cluster system. This service failover capability ensures that no data is lost, and there is little disruption to users. When the failed system recovers, the cluster can re-balance the services across the two systems.
In addition, a cluster administrator can cleanly stop the services running on a cluster system and then restart them on the other system. This service relocation capability enables the administrator to maintain application and data availability when a cluster system requires maintenance.
1.2 Cluster Features
A cluster includes the following features:
No-single-point-of-failure hardware configuration

Clusters can include a dual-controller RAID array, multiple network and serial communication channels, and redundant uninterruptible power supply (UPS) systems to ensure that no single failure results in application down time or loss of data.

Alternately, a low-cost cluster can be set up to provide less availability than a no-single-point-of-failure cluster. For example, an administrator can set up a cluster with a single-controller RAID array and only a single heartbeat channel.
Note
Certain low-cost alternatives, such as software RAID and multi-initiator parallel SCSI, are not compatible or appropriate for use on the shared cluster storage. Refer to Section 2.1, Choosing a Hardware Configuration, for more information.
Service configuration framework

Clusters enable an administrator to easily configure individual services to make data and applications highly available. To create a service, an administrator specifies the resources used in the service and properties for the service, including the service name, application start and stop script, disk partitions, mount points, and the cluster system on which an administrator prefers to run the service. After the administrator adds a service, the cluster enters the information into the cluster database on shared storage, where it can be accessed by both cluster systems.

The cluster provides an easy-to-use framework for database applications. For example, a database service serves highly-available data to a database application. The application running on a cluster system provides network access to database client systems, such as Web servers. If the service fails over to another cluster system, the application can still access the shared database data. A network-accessible database service is usually assigned an IP address, which is failed over along with the service to maintain transparent access for clients.

The cluster service framework can be easily extended to other applications, as well.
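The application start and stop script mentioned above is usually modeled on a standard init script. The sketch below is illustrative only: the script name, application path, daemon options, and PID file are hypothetical placeholders, and later chapters provide complete, service-specific examples.

    #!/bin/sh
    # Minimal sketch of a cluster service start/stop script.
    # All paths and names below are examples, not part of the product.

    APP=/usr/local/myapp/bin/myappd     # hypothetical application daemon
    PIDFILE=/var/run/myappd.pid

    case "$1" in
      start)
        # The cluster has already mounted the service's shared disk partitions.
        $APP --pidfile $PIDFILE &
        ;;
      stop)
        # Stop the application cleanly so the service can be relocated or failed over.
        [ -f $PIDFILE ] && kill `cat $PIDFILE`
        ;;
      *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac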
Data integrity assurance

To ensure data integrity, only one cluster system can run a service and access service data at one time. Using power switches in the cluster configuration enables each cluster system to power-cycle the other cluster system before restarting its services during the failover process. This prevents the two systems from simultaneously accessing the same data and corrupting it. Although not required, it is recommended that power switches are used to guarantee data integrity under all failure conditions. Watchdog timers are an optional variety of power control to ensure correct operation of service failover.
Cluster administration user interface

A user interface simplifies cluster administration and enables an administrator to easily create, start, stop, and relocate services, and to monitor the cluster.
Multiple cluster communication methods

To monitor the health of the other cluster system, each cluster system monitors the health of the remote power switch, if any, and issues heartbeat pings over network and serial channels. In addition, each cluster system periodically writes a timestamp and cluster state information to two quorum partitions located on shared disk storage. System state information includes whether the system is an active cluster member. Service state information includes whether the service is running and which cluster system is running the service. Each cluster system checks to ensure that the other system’s status is up to date.

To ensure correct cluster operation, if a system is unable to write to both quorum partitions at startup time, it will not be allowed to join the cluster. In addition, if a cluster system is not updating its timestamp, and if heartbeats to the system fail, the cluster system will be removed from the cluster.
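As a concrete illustration, each quorum partition is a small partition on the shared storage that the cluster software reads and writes directly, typically through the Linux raw (character) device interface so that state updates bypass the buffer cache. The device names below are examples only; the actual partitions and how they are bound are covered during cluster software configuration in Chapter 3.

    # Example only: bind two small shared-storage partitions to raw devices
    # (partition names are illustrative and depend on your storage layout).
    raw /dev/raw/raw1 /dev/sda1     # primary quorum partition
    raw /dev/raw/raw2 /dev/sda2     # backup (shadow) quorum partition

    # Query all current raw device bindings to verify the result.
    raw -qa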
Figure 1–2 Cluster Communication Mechanisms
Figure 1–2, Cluster Communication Mechanisms shows how systems communicate in a cluster configuration. Note that the terminal server used to access system consoles via serial ports is not a required cluster component.
Service failover capability

If a hardware or software failure occurs, the cluster will take the appropriate action to maintain application availability and data integrity. For example, if a cluster system completely fails, the other cluster system will restart its services. Services already running on this system are not disrupted.
When the failed system reboots and is able to write to the quorum partitions, it can rejoin the cluster and run services. Depending on how the services are configured, the cluster can re-balance the services across the two cluster systems.
Manual service relocation capability

In addition to automatic service failover, a cluster enables administrators to cleanly stop services on one cluster system and restart them on the other system. This allows administrators to perform planned maintenance on a cluster system, while providing application and data availability.
Event logging facility

To ensure that problems are detected and resolved before they affect service availability, the cluster daemons log messages by using the conventional Linux syslog subsystem. Administrators can customize the severity level of the logged messages.
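For example, because the daemons log through syslog, their messages can be routed to a dedicated file with an ordinary /etc/syslog.conf entry. The facility shown below is an assumption for illustration; the facility actually used by the cluster daemons, and how to adjust the logged severity level, are described in Section 3.3, Configuring syslog Event Logging.

    # Illustrative only: send cluster daemon messages (assuming they use the
    # local4 facility) to a dedicated log file, then tell syslogd to reload.
    echo 'local4.*        /var/log/cluster.log' >> /etc/syslog.conf
    killall -HUP syslogd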
Application Monitoring

The cluster services infrastructure can optionally monitor the state and health of an application. In this manner, should an application-specific failure occur, the cluster will automatically restart the application. The cluster first attempts to restart the application on the member it was initially running on; failing that, it restarts the application on the other cluster member.
Status Monitoring Agent

A cluster status monitoring agent is used to gather vital cluster and application state information. This information is then accessible both locally on the cluster member as well as remotely. A graphical user interface can then display status information from multiple clusters in a manner which does not degrade system performance.
1.3 How To Use This Manual
This manual contains information about setting up the cluster hardware, and installing the Linux distribution and the cluster software. These tasks are described in Chapter 2, Hardware Installation and Operating System Configuration and Chapter 3, Cluster Software Installation and Configuration.
For information about setting up and managing cluster services, see Chapter 4, Service Configuration and Administration. For information about managing a cluster, see Chapter 8, Cluster Administration.
Appendix A, Supplementary Hardware Information contains detailed configuration information on specific hardware devices and shared storage configurations. Appendix B, Supplementary Software Information contains background information on the cluster software and other related information.
2 Hardware Installation and Operating System Configuration
To set up the hardware configuration and install the Linux distribution, follow these steps:
Choose a cluster hardware configuration that meets the needs of applications and users, see Section 2.1, Choosing a Hardware Configuration.
Set up and connect the cluster systems and the optional console switch and network switch or hub, see Section 2.2, Steps for Setting Up the Cluster Systems.
Install and configure the Linux distribution on the cluster systems, see Section 2.3, Steps for Installing and Configuring the Red Hat Linux Distribution.
Set up the remaining cluster hardware components and connect them to the cluster systems, see Section 2.4, Steps for Setting Up and Connecting the Cluster Hardware.
After setting up the hardware configuration and installing the Linux distribution, you can install the cluster software.
2.1 Choosing a Hardware Configuration
The Red Hat Cluster Manager allows administrators to use commodity hardware to set up a cluster configuration that will meet the performance, availability, and data integrity needs of applications and users. Cluster hardware ranges from low-cost minimum configurations that include only the components required for cluster operation, to high-end configurations that include redundant heartbeat channels, hardware RAID, and power switches.

Regardless of configuration, the use of high-quality hardware in a cluster is recommended, as hardware malfunction is the primary cause of system down time.

Although all cluster configurations provide availability, only some configurations protect against every single point of failure. In addition, all cluster configurations provide data integrity, but only some configurations protect data under every failure condition. Therefore, administrators must fully understand the needs of their computing environment and also the availability and data integrity features of different hardware configurations in order to choose the cluster hardware that will meet the proper requirements.
When choosing a cluster hardware configuration, consider the following:
Performance requirements of applications and users
Choose a hardware configuration that will provide adequate memory, CPU, and I/O resources. Be sure that the configuration chosen will be able to handle any future increases in workload, as well.
Cost restrictions
The hardware configuration chosen must meet budget requirements. For example, systems with multiple I/O ports usually cost more than low-end systems with fewer expansion capabilities.
Availability requirements
If a computing environment requires the highest degree of availability, such as a production environment, then a cluster hardware configuration that protects against all single points of failure, including disk, storage interconnect, heartbeat channel, and power failures, is recommended. Environments that can tolerate an interruption in availability, such as development environments, may not require as much protection. See Section 2.4.1, Configuring Heartbeat Channels, Section 2.4.3, Configuring UPS Systems, and Section 2.4.4, Configuring Shared Disk Storage for more information about using redundant hardware for high availability.
Data integrity under all failure conditions requirement
Using power switches in a cluster configuration guarantees that service data is protected under every failure condition. These devices enable a cluster system to power-cycle the other cluster system before restarting its services during failover. Power switches protect against data corruption if an unresponsive (or hanging) system becomes responsive after its services have failed over, and then issues I/O to a disk that is also receiving I/O from the other cluster system.

In addition, if a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum partitions. If you are not using power switches in the cluster, this error condition may result in services being run on more than one cluster system, which can cause data corruption. See Section 2.4.2, Configuring Power Switches for more information about the benefits of using power switches in a cluster. It is recommended that production environments use power switches or watchdog timers in the cluster configuration.
2.1.1 Shared Storage Requirements
The operation of the cluster depends on reliable, coordinated access to shared storage. In the event of hardware failure, it is desirable to be able to disconnect one member from the shared storage for repair without disrupting the other member. Shared storage is truly vital to the cluster configuration.
Testing has shown that it is difficult, if not impossible, to configure reliable multi-initiator parallel SCSI configurations at data rates above 80 MBytes/sec. using standard SCSI adapters. Further tests have shown that these configurations can not support online repair because the bus does not work reliably when the HBA terminators are disabled, and external terminators are used. For these reasons, multi-initiator SCSI configurations using standard adapters are not supported. Single-initiator parallel SCSI buses, connected to multi-ported storage devices, or Fibre Channel, are required.
The Red Hat Cluster Manager requires that both cluster members have simultaneous access to the shared storage. Certain host RAID adapters are capable of providing this type of access to shared RAID units. These products require extensive testing to ensure reliable operation, especially if the shared RAID units are based on parallel SCSI buses. These products typically do not allow for online repair of a failed system. No host RAID adapters are currently certified with Red Hat Cluster Manager. Refer to the Red Hat web site at http://www.redhat.com for the most up-to-date supported hardware matrix.
The use of software RAID, or software Logical Volume Management (LVM), is not supported on shared storage. This is because these products do not coordinate access from multiple hosts to shared storage. Software RAID or LVM may be used on non-shared storage on cluster members (for example, boot and system partitions and other filesystems which are not associated with any cluster services).
2.1.2 Minimum Hardware Requirements
A minimum hardware configuration includes only the hardware components that are required for cluster operation, as follows:
Two servers to run cluster services
Ethernet connection for a heartbeat channel and client network access
Shared disk storage for the cluster quorum partitions and service data.
See Section 2.1.5, Example of a Minimum Cluster Configuration for an example of this type of hardware configuration.
The minimum hardware configuration is the most cost-effective cluster configuration; however, it includes multiple points of failure. For example, if the RAID controller fails, then all cluster services will be unavailable. When deploying the minimal hardware configuration, software watchdog timers should be configured as a data integrity provision.
To improve availability, protect against component failure, and guarantee data integrity under all failure conditions, the minimum configuration can be expanded. Table 2–1, Improving Availability and Guaranteeing Data Integrity shows how to improve availability and guarantee data integrity:
Table 2–1 Improving Availability and Guaranteeing Data Integrity

Problem: Disk failure
  Solution: Hardware RAID to replicate data across multiple disks.

Problem: RAID controller failure
  Solution: Dual RAID controllers to provide redundant access to disk data.

Problem: Heartbeat channel failure
  Solution: Point-to-point Ethernet or serial connection between the cluster systems.

Problem: Power source failure
  Solution: Redundant uninterruptible power supply (UPS) systems.

Problem: Data corruption under all failure conditions
  Solution: Power switches or hardware-based watchdog timers.
A no-single-point-of-failure hardware configuration that guarantees data integrity under all failure conditions can include the following components:
Two servers to run cluster services
Ethernet connection between each system for a heartbeat channel and client network access
Dual-controller RAID array to replicate quorum partitions and service data
Two power switches to enable each cluster system to power-cycle the other system during the failover process
Point-to-point Ethernet connection between the cluster systems for a redundant Ethernet heartbeat channel
Point-to-point serial connection between the cluster systems for a serial heartbeat channel
Two UPS systems for a highly-available source of power
See Section 2.1.6, Example of a No-Single-Point-Of-Failure Configuration for an example of this type of hardware configuration.
Cluster hardware configurations can also include other optional hardware components that are common in a computing environment. For example, a cluster can include a network switch or network hub, which enables the connection of the cluster systems to a network. A cluster may also include a console switch, which facilitates the management of multiple systems and eliminates the need for separate monitors, mice, and keyboards for each cluster system.
One type of console switch is a terminal server, which enables connection to serial consoles and management of many systems from one remote location. As a low-cost alternative, you can use a KVM (keyboard, video, and mouse) switch, which enables multiple systems to share one keyboard, monitor, and mouse. A KVM is suitable for configurations in which access to a graphical user interface (GUI) to perform system management tasks is preferred.
When choosing a cluster system, be sure that it provides the PCI slots, network slots, and serial ports that the hardware configuration requires. For example, a no-single-point-of-failure configuration requires multiple serial and Ethernet ports. Ideally, choose cluster systems that have at least two serial ports. See Section 2.2.1, Installing the Basic System Hardware for more information.
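A quick way to confirm how many usable serial ports and network interfaces a candidate system actually exposes is to query them from Linux; the commands below are generic examples and are not specific to the cluster software.

    # List the serial ports the kernel has detected; a UART type of "unknown"
    # usually means the port is absent or disabled in the BIOS.
    setserial -g /dev/ttyS0 /dev/ttyS1 /dev/ttyS2 /dev/ttyS3

    # List PCI devices to verify extra serial cards and Ethernet adapters.
    lspci | grep -i -e serial -e ethernet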
2.1.3 Choosing the Type of Power Controller
The Red Hat Cluster Manager implementation consists of a generic power management layer and a set of device-specific modules which accommodate a range of power management types. When selecting the appropriate type of power controller to deploy in the cluster, it is important to recognize the implications of specific device types. The following describes the types of supported power switches, followed by a summary table. For a more detailed description of the role a power switch plays to ensure data integrity, refer to Section 2.4.2, Configuring Power Switches.
Serial- and Network-attached power switches are separate devices which enable one cluster member to power cycle another member. They resemble a power plug strip on which individual outlets can be turned on and off under software control through either a serial or network cable.
Watchdog timers provide a means for failed systems to remove themselves from the cluster prior to another system taking over their services, rather than allowing one cluster member to power cycle another. The normal operational mode for watchdog timers is that the cluster software must periodically reset a timer prior to its expiration. If the cluster software fails to reset the timer, the watchdog will trigger under the assumption that the system may have hung or otherwise failed. The healthy cluster member allows a window of time to pass prior to concluding that another cluster member has failed (by default, this window is 12 seconds). The watchdog timer interval must be less than the duration of time for one cluster member to conclude that another has failed. In this manner, a healthy system can assume that, prior to taking over services for a failed cluster member, the failed member has safely removed itself from the cluster (by rebooting) and therefore poses no risk to data integrity. The underlying watchdog support is included in the core Linux kernel. Red Hat Cluster Manager utilizes these watchdog features via its standard APIs and configuration mechanism.
There are two types of watchdog timers: hardware-based and software-based. Hardware-based watchdog timers typically consist of system board components such as the Intel® i810 TCO chipset. This circuitry has a high degree of independence from the main system CPU. This independence is beneficial in failure scenarios of a true system hang, as in this case it will pull down the system’s reset lead resulting in a system reboot. There are also some PCI expansion cards that provide watchdog features.
The second type of watchdog timer is software-based. This category of watchdog does not have any dedicated hardware. The implementation is a kernel thread which is periodically run and, if the timer duration has expired, will initiate a system reboot. The vulnerability of the software watchdog timer is that under certain failure scenarios, such as system hangs while interrupts are blocked, the kernel thread will not be called. As a result, in such conditions it cannot be definitively depended on for data integrity. The healthy cluster member may then take over services for a hung node, which could cause data corruption under certain scenarios.
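As an illustration, both watchdog variants are provided by standard kernel modules that register the /dev/watchdog device, which the cluster software then opens and resets periodically once watchdog support is configured. The module names and margin value below are assumptions for the sake of example; they vary by kernel version and system board.

    # Software watchdog: soft_margin is the expiry window in seconds and should
    # be smaller than the cluster failover window (12 seconds by default).
    modprobe softdog soft_margin=10

    # Hardware watchdog example for boards with the Intel i810 TCO chipset
    # (module name is an assumption and differs across kernel versions).
    modprobe i810-tco

    # Either driver exposes the standard watchdog device.
    ls -l /dev/watchdog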
Finally, administrators can choose not to employ a power controller at all. If choosing the "None" type, note that there are no provisions for a cluster member to power cycle a failed member. Similarly, the failed member cannot be guaranteed to reboot itself under all failure conditions. Deploying clusters with a power controller type of "None" is useful for simple evaluation purposes, but because it affords the weakest data integrity provisions, it is not recommended for use in a production environment.
Ultimately, the right type of power controller deployed in a cluster environment depends on the data integrity requirements weighed against the cost and availability of external power switches.
Table 2–2, Power Switches summarizes the types of supported power management modules and discusses their advantages and disadvantages individually.
Table 2–2 Power Switches

Serial-attached power switches
  Notes: Two serial attached power controllers are used in a cluster (one per member system).
  Pros: Affords strong data integrity guarantees. The power controller itself is not a single point of failure, as there are two in a cluster.
  Cons: Requires purchase of power controller hardware and cables; consumes serial ports.

Network-attached power switches
  Notes: A single network attached power controller is required per cluster.
  Pros: Affords strong data integrity guarantees.
  Cons: Requires purchase of power controller hardware. The power controller itself can become a single point of failure (although they are typically very reliable devices).

Hardware Watchdog Timer
  Notes: Affords strong data integrity guarantees.
  Pros: Obviates the need to purchase external power controller hardware.
  Cons: Not all systems include supported watchdog hardware.

Software Watchdog Timer
  Notes: Offers acceptable data integrity provisions.
  Pros: Obviates the need to purchase external power controller hardware; works on any system.
  Cons: Under some failure scenarios, the software watchdog will not be operational, opening a small vulnerability window.

No power controller
  Notes: No power controller function is in use.
  Pros: Obviates the need to purchase external power controller hardware; works on any system.
  Cons: Vulnerable to data corruption under certain failure scenarios.
2.1.4 Cluster Hardware Tables
Use the following tables to identify the hardware components required for your cluster configuration. In some cases, the tables list specific products that have been tested in a cluster, although a cluster is expected to work with other products.
The complete set of qualified cluster hardware components changes over time. Consequently, the tables below may be incomplete. For the most up-to-date itemization of supported hardware components, refer to the Red Hat documentation website at http://www.redhat.com/docs.
Table 2–3 Cluster System Hardware Table

Cluster system
  Quantity: Two
  Description: Red Hat Cluster Manager supports IA-32 hardware platforms. Each cluster system must provide enough PCI slots, network slots, and serial ports for the cluster hardware configuration. Because disk devices must have the same name on each cluster system, it is recommended that the systems have symmetric I/O subsystems. In addition, it is recommended that each system have a minimum of 450 MHz CPU speed and 256 MB of memory. See Section 2.2.1, Installing the Basic System Hardware for more information.
  Required: Yes
Table 2–4, Power Switch Hardware Table includes several different types of power switches. A single cluster requires only one type of power switch shown below.
Table 2–4 Power Switch Hardware Table

Serial power switches
  Quantity: Two
  Description: Power switches enable each cluster system to power-cycle the other cluster system. See Section 2.4.2, Configuring Power Switches for information about using power switches in a cluster. Note that clusters are configured with either serial or network attached power switches and not both. The following serial attached power switch has been fully tested: RPS-10 (model M/HD in the US, and model M/EC in Europe), which is available from http://www.wti.com/rps-10.htm. Refer to Section A.1.1, Setting up RPS-10 Power Switches. Latent support is provided for the following serial attached power switch, which has not yet been fully tested: APC Serial On/Off Switch (part AP9211), http://www.apc.com.
  Required: Strongly recommended for data integrity under all failure conditions

Null modem cable
  Quantity: Two
  Description: Null modem cables connect a serial port on a cluster system to a serial power switch. This serial connection enables each cluster system to power-cycle the other system. Some power switches may require different cables.
  Required: Only if using serial power switches

Mounting bracket
  Quantity: One
  Description: Some power switches support rack mount configurations and require a separate mounting bracket (e.g. RPS-10).
  Required: Only for rack mounting power switches

Network power switch
  Quantity: One
  Description: Network attached power switches enable each cluster member to power cycle all others. Refer to Section 2.4.2, Configuring Power Switches for information about using network attached power switches, as well as caveats associated with each. The following network attached power switches have been fully tested:
    · WTI NPS-115 or NPS-230, available from http://www.wti.com. Note that the NPS power switch can properly accommodate systems with dual redundant power supplies. Refer to Section A.1.2, Setting up WTI NPS Power Switches.
    · Baytech RPC-3 and RPC-5, http://www.baytech.net
  Latent support is provided for the APC Master Switch (AP9211 or AP9212), http://www.apc.com.
  Required: Strongly recommended for data integrity under all failure conditions

Watchdog timer
  Quantity: Two
  Description: Watchdog timers cause a failed cluster member to remove itself from a cluster prior to a healthy member taking over its services. Refer to Section 2.4.2, Configuring Power Switches for more information.
  Required: Recommended for data integrity on systems which provide integrated watchdog hardware
The following table shows a variety of storage devices for an administrator to choose from. An individual cluster does not require all of the components listed below.
Table 2–5 Shared Disk Storage Hardware Table

External disk storage enclosure
  Quantity: One
  Description: Use Fibre Channel or single-initiator parallel SCSI to connect the cluster systems to a single- or dual-controller RAID array. To use single-initiator buses, a RAID controller must have multiple host ports and provide simultaneous access to all the logical units on the host ports. To use a dual-controller RAID array, a logical unit must fail over from one controller to the other in a way that is transparent to the operating system.
  The following SCSI RAID arrays, which provide simultaneous access to all the logical units on the host ports, are recommended (this is not a comprehensive list; rather, it is limited to those RAID boxes which have been tested):
    · Winchester Systems FlashDisk RAID Disk Array, which is available from http://www.winsys.com.
    · Dot Hill’s SANnet Storage Systems, which is available from http://www.dothill.com.
    · Silicon Image CRD-7040 & CRA-7040, CRD-7220, CRD-7240 & CRA-7240, and CRD-7400 & CRA-7400 controller based RAID arrays, available from http://www.synetexinc.com.
  In order to ensure symmetry of device IDs and LUNs, many RAID arrays with dual redundant controllers are required to be configured in an active/passive mode. See Section 2.4.4, Configuring Shared Disk Storage for more information.
  Required: Yes

Host bus adapter
  Quantity: Two
  Description: To connect to shared disk storage, you must install either a parallel SCSI or a Fibre Channel host bus adapter in a PCI slot in each cluster system. For parallel SCSI, use a low voltage differential (LVD) host bus adapter. Adapters have either HD68 or VHDCI connectors. Recommended parallel SCSI host bus adapters include the following:
    · Adaptec 2940U2W, 29160, 29160LP, 39160, and 3950U2
    · Adaptec AIC-7896 on the Intel L440GX+ motherboard
    · Qlogic QLA1080 and QLA12160
    · Tekram Ultra2 DC-390U2W
    · LSI Logic SYM22915
  A recommended Fibre Channel host bus adapter is the Qlogic QLA2200.
  See Section A.6, Host Bus Adapter Features and Configuration Requirements for device features and configuration information. Host-bus adapter based RAID cards are only supported if they correctly support multi-host operation. At the time of publication, there were no fully tested host-bus adapter based RAID cards. Refer to http://www.redhat.com for the latest hardware information.
  Required: Yes

SCSI cable
  Quantity: Two
  Description: SCSI cables with 68 pins connect each host bus adapter to a storage enclosure port. Cables have either HD68 or VHDCI connectors. Cables vary based on adapter type.
  Required: Only for parallel SCSI configurations

SCSI terminator
  Quantity: Two
  Description: For a RAID storage enclosure that uses "out" ports (such as the FlashDisk RAID Disk Array) and is connected to single-initiator SCSI buses, connect terminators to the "out" ports in order to terminate the buses.
  Required: Only for parallel SCSI configurations, and only if necessary for termination

Fibre Channel hub or switch
  Quantity: One or two
  Description: A Fibre Channel hub or switch is required.
  Required: Only for some Fibre Channel configurations

Fibre Channel cable
  Quantity: Two to six
  Description: A Fibre Channel cable connects a host bus adapter to a storage enclosure port, a Fibre Channel hub, or a Fibre Channel switch. If a hub or switch is used, additional cables are needed to connect the hub or switch to the storage adapter ports.
  Required: Only for Fibre Channel configurations
Table 2–6 Network Hardware Table

Network interface
  Quantity: One for each network connection
  Description: Each network connection requires a network interface installed in a cluster system.
  Required: Yes

Network switch or hub
  Quantity: One
  Description: A network switch or hub allows connection of multiple systems to a network.
  Required: No

Network cable
  Quantity: One for each network interface
  Description: A conventional network cable, such as a cable with an RJ45 connector, connects each network interface to a network switch or a network hub.
  Required: Yes
Table 2–7 Point-To-Point Ethernet Heartbeat Channel Hardware Table

Network interface
  Quantity: Two for each channel
  Description: Each Ethernet heartbeat channel requires a network interface installed in both cluster systems.
  Required: No

Network crossover cable
  Quantity: One for each channel
  Description: A network crossover cable connects a network interface on one cluster system to a network interface on the other cluster system, creating an Ethernet heartbeat channel.
  Required: Only for a redundant Ethernet heartbeat channel
Table 2–8 Point-To-Point Serial Heartbeat Channel Hardware Table

Serial card
  Quantity: Two for each serial channel
  Description: Each serial heartbeat channel requires a serial port on both cluster systems. To expand your serial port capacity, you can use multi-port serial PCI cards. Recommended multi-port cards include the Vision Systems VScom 200H PCI card, which provides two serial ports and is available from http://www.vscom.de, and the Cyclades-4YoPCI+ card, which provides four serial ports and is available from http://www.cyclades.com. Note that since configuration of serial heartbeat channels is optional, it is not required to invest in additional hardware specifically for this purpose. Should future support be provided for more than two cluster members, serial heartbeat channel support may be deprecated.
  Required: No

Null modem cable
  Quantity: One for each channel
  Description: A null modem cable connects a serial port on one cluster system to a corresponding serial port on the other cluster system, creating a serial heartbeat channel.
  Required: Only for a serial heartbeat channel
Table 2–9 Console Switch Hardware Table

Terminal server
  Quantity: One
  Description: A terminal server enables you to manage many systems from one remote location.
  Required: No

KVM
  Quantity: One
  Description: A KVM enables multiple systems to share one keyboard, monitor, and mouse. Cables for connecting systems to the switch depend on the type of KVM.
  Required: No
Table 2–10 UPS System Hardware Table

UPS system
  Quantity: One or two
  Description: Uninterruptible power supply (UPS) systems protect against downtime if a power outage occurs. UPS systems are highly recommended for cluster operation. Ideally, connect the power cables for the shared storage enclosure and both power switches to redundant UPS systems. In addition, a UPS system must be able to provide voltage for an adequate period of time, and should be connected to its own power circuit. A recommended UPS system is the APC Smart-UPS 1400 Rackmount, available from http://www.apc.com.
  Required: Strongly recommended for availability
2.1.5 Example of a Minimum Cluster Configuration
The hardware components described in Table 2–11, Minimum Cluster Hardware Configuration Components can be used to set up a minimum cluster configuration. This configuration does not guarantee data integrity under all failure conditions, because it does not include power switches. Note that this is a sample configuration; it is possible to set up a minimum configuration using other hardware. A simple way to verify the resulting shared storage connection is shown after the table.
Table 2–11 Minimum Cluster Hardware Configuration Components

Two servers
  Each cluster system includes the following hardware:
    · Network interface for client access and an Ethernet heartbeat channel
    · One Adaptec 29160 SCSI adapter (termination disabled) for the shared storage connection

Two network cables with RJ45 connectors
  Network cables connect a network interface on each cluster system to the network for client access and Ethernet heartbeats.

RAID storage enclosure
  The RAID storage enclosure contains one controller with at least two host ports.

Two HD68 SCSI cables
  Each cable connects one HBA to one port on the RAID controller, creating two single-initiator SCSI buses.
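Once the minimum configuration is cabled, a simple sanity check is to confirm that both servers see the logical unit presented by the RAID controller. The commands below are generic examples; the device name shown is an assumption and must be verified on your own systems.

    # Run on each cluster system: the RAID logical unit should be listed on both.
    cat /proc/scsi/scsi

    # The shared device must have the same name on both systems (here assumed
    # to be /dev/sda); compare the partition tables to confirm.
    fdisk -l /dev/sda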
2.1.6 Example of a No-Single-Point-Of-Failure Configuration
The components described in Table 2–12, No-Single-Point-Of-Failure Configuration Components can be used to set up a no-single-point-of-failure cluster configuration that includes two single-initiator SCSI buses and power switches to guarantee data integrity under all failure conditions. Note that this is a sample configuration; it is possible to set up a no-single-point-of-failure configuration using other hardware.
Table 2–12 No-Single-Point-Of-Failure Configuration Components

Two servers
  Each cluster system includes the following hardware:
    · Two network interfaces for: point-to-point Ethernet heartbeat channel; client network access and Ethernet heartbeat connection
    · Three serial ports for: point-to-point serial heartbeat channel; remote power switch connection; connection to the terminal server
    · One Tekram Ultra2 DC-390U2W adapter (termination enabled) for the shared disk storage connection

One network switch
  A network switch enables the connection of multiple systems to a network.

One Cyclades terminal server
  A terminal server allows for management of remote systems from a central location. (A terminal server is not required for cluster operation.)

Three network cables
  Network cables connect the terminal server and a network interface on each cluster system to the network switch.

Two RJ45 to DB9 crossover cables
  RJ45 to DB9 crossover cables connect a serial port on each cluster system to the Cyclades terminal server.

One network crossover cable
  A network crossover cable connects a network interface on one cluster system to a network interface on the other system, creating a point-to-point Ethernet heartbeat channel.

Two RPS-10 power switches
  Power switches enable each cluster system to power-cycle the other system before restarting its services. The power cable for each cluster system is connected to its own power switch.

Three null modem cables
  Null modem cables connect a serial port on each cluster system to the power switch that provides power to the other cluster system. This connection enables each cluster system to power-cycle the other system. A null modem cable also connects a serial port on one cluster system to a corresponding serial port on the other system, creating a point-to-point serial heartbeat channel.

FlashDisk RAID Disk Array with dual controllers
  Dual RAID controllers protect against disk and controller failure. The RAID controllers provide simultaneous access to all the logical units on the host ports.

Two HD68 SCSI cables
  HD68 cables connect each host bus adapter to a RAID enclosure "in" port, creating two single-initiator SCSI buses.

Two terminators
  Terminators connected to each "out" port on the RAID enclosure terminate both single-initiator SCSI buses.

Redundant UPS systems
  UPS systems provide a highly-available source of power. The power cables for the power switches and the RAID enclosure are connected to two UPS systems.

Figure 2–1, No-Single-Point-Of-Failure Configuration Example shows an example of a no-single-point-of-failure hardware configuration that includes the hardware described in the previous table, two single-initiator SCSI buses, and power switches to guarantee data integrity under all error conditions. A "T" enclosed in a circle represents a SCSI terminator.
Figure 2–1 No-Single-Point-Of-Failure Configuration Example
2.2 Steps for Setting Up the Cluster Systems
After identifying the cluster hardware components described in Section 2.1, Choosing a Hardware Configuration, set up the basic cluster system hardware and connect the systems to the optional console switch and network switch or hub. Follow these steps:
1. In both cluster systems, install the required network adapters, serial cards, and host bus adapters. See Section 2.2.1, Installing the Basic System Hardware for more information about performing this task.
2. Set up the optional console switch and connect it to each cluster system. See Section 2.2.2, Setting Up a Console Switch for more information about performing this task.
If a console switch is not used, then connect each system to a console terminal.
3. Set up the optional network switch or hub and use conventional network cables to connect it to the cluster systems and the terminal server (if applicable). See Section 2.2.3, Setting Up a Network Switch or Hub for more information about performing this task.

If a network switch or hub is not used, then conventional network cables should be used to connect each system and the terminal server (if applicable) to a network.
After performing the previous tasks, install the Linux distribution as described in Section 2.3, Steps for Installing and Configuring the Red Hat Linux Distribution.