Documentation License, Version 1.1 or any later version published by the Free Software Foundation.
A copy of the license is included on the GNU Free Documentation License website.
Red Hat, Red Hat Network, the Red Hat "Shadow Man" logo, RPM, Maximum RPM, the RPM logo,
Linux Library, PowerTools, Linux Undercover, RHmember, RHmember More, Rough Cuts, Rawhide
and all Red Hat-based trademarks and logos are trademarks or registered trademarks of Red Hat, Inc.
in the United States and other countries.
Linux is a registered trademark of Linus Torvalds.
Motif and UNIX are registered trademarks of The Open Group.
Itanium is a registered trademark of Intel Corporation.
Netscape is a registered trademark of Netscape Communications Corporation in the United States and
other countries.
Windows is a registered trademark of Microsoft Corporation.
SSH and Secure Shell are trademarks of SSH Communications Security, Inc.
FireWire is a trademark of Apple Computer Corporation.
S/390 and zSeries are trademarks of International Business Machines Corporation.
All other trademarks and copyrights referred to are the property of their respective owners.
Acknowledgments
The Red Hat Cluster Manager software was originally based on the open source Kimberlite cluster project (http://oss.missioncriticallinux.com/kimberlite/), which was developed by Mission Critical Linux, Inc.
Subsequent to its inception based on Kimberlite, developers at Red Hat have made a large number
of enhancements and modifications. The following is a non-comprehensive list highlighting some of
these enhancements.
• Packaging and integration into the Red Hat installation paradigm in order to simplify the end user’s experience.
• Addition of support for high availability NFS services.
• Addition of support for high availability Samba services.
• Addition of support for using watchdog timers as a data integrity provision.
• Addition of service monitoring which automatically restarts a failed application.
• Rewrite of the service manager to facilitate additional cluster-wide operations.
• Addition of the Red Hat Cluster Manager GUI, a graphical monitoring tool.
• A set of miscellaneous bug fixes.
The Red Hat Cluster Manager software incorporates STONITH-compliant power switch modules from the Linux-HA project (http://www.linux-ha.org/stonith/).
1 Introduction to Red Hat Cluster Manager
The Red Hat Cluster Manager is a collection of technologies working together to provide data integrity and the ability to maintain application availability in the event of a failure. Using redundant hardware, shared disk storage, power management, and robust cluster communication and application failover mechanisms, a cluster can meet the needs of the enterprise market.

Specially suited for database applications, network file servers, and World Wide Web (Web) servers with dynamic content, a cluster can also be used in conjunction with the Piranha load-balancing cluster software, based on the Linux Virtual Server (LVS) project, to deploy a highly available e-commerce site that has complete data integrity and application availability, in addition to load balancing capabilities. See Section B.5, Using Red Hat Cluster Manager with Piranha for more information.
1.1 Cluster Overview
To set up a cluster, an administrator must connect the cluster systems (often referred to as member systems) to the cluster hardware and configure the systems into the cluster environment. The foundation of a cluster is an advanced host membership algorithm. This algorithm ensures that the cluster maintains complete data integrity at all times by using the following methods of inter-node communication:
• Quorum partitions on shared disk storage to hold system status
• Ethernet and serial connections between the cluster systems for heartbeat channels
To make an application and data highly available in a cluster, the administrator must configure a cluster service, which is a discrete group of service properties and resources, such as an application and shared disk storage. A service can be assigned an IP address to provide transparent client access to the service. For example, an administrator can set up a cluster service that provides clients with access to highly-available database application data.

Both cluster systems can run any service and access the service data on shared disk storage. However, each service can run on only one cluster system at a time, in order to maintain data integrity. Administrators can set up an active-active configuration in which both cluster systems run different services, or a hot-standby configuration in which a primary cluster system runs all the services and a backup cluster system takes over only if the primary system fails.
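As a rough illustration of the difference between these two configurations, the following sketch assigns each service to its preferred member when that member is healthy, and to a surviving member otherwise. The member names, service names, and placement rule are hypothetical and are not the actual Red Hat Cluster Manager algorithm or configuration syntax.

    # Illustrative sketch only; names and the placement rule are simplified assumptions.
    def place_services(services, members_up):
        """Map each service to the member that should run it."""
        placement = {}
        for name, preferred in services.items():
            if preferred in members_up:
                placement[name] = preferred                 # run on the preferred member
            elif members_up:
                placement[name] = sorted(members_up)[0]     # fail over to a surviving member
            else:
                placement[name] = None                      # no member available
        return placement

    # Active-active: each member is preferred for a different service.
    active_active = {"database": "clu1", "nfs_export": "clu2"}
    # Hot-standby: one member is preferred for every service.
    hot_standby = {"database": "clu1", "nfs_export": "clu1"}

    print(place_services(active_active, {"clu1", "clu2"}))  # services split across members
    print(place_services(hot_standby, {"clu2"}))            # clu1 down; clu2 runs everything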
Figure 1–1 Example Cluster
Figure 1–1, Example Cluster shows an example of a cluster in an active-active configuration.
If a hardware or software failure occurs, the cluster will automatically restart the failed system’s services on the functional cluster system. This service failover capability ensures that no data is lost, and there is little disruption to users. When the failed system recovers, the cluster can re-balance the services across the two systems.

In addition, a cluster administrator can cleanly stop the services running on a cluster system and then restart them on the other system. This service relocation capability enables the administrator to maintain application and data availability when a cluster system requires maintenance.
1.2 Cluster Features
A cluster includes the following features:
• No-single-point-of-failure hardware configuration

Clusters can include a dual-controller RAID array, multiple network and serial communication channels, and redundant uninterruptible power supply (UPS) systems to ensure that no single failure results in application downtime or loss of data.

Alternatively, a low-cost cluster can be set up to provide less availability than a no-single-point-of-failure cluster. For example, an administrator can set up a cluster with a single-controller RAID array and only a single heartbeat channel.
Note

Certain low-cost alternatives, such as software RAID and multi-initiator parallel SCSI, are neither compatible with nor appropriate for use as shared cluster storage. Refer to Section 2.1, Choosing a Hardware Configuration, for more information.
• Service configuration framework

Clusters enable an administrator to easily configure individual services to make data and applications highly available. To create a service, an administrator specifies the resources used in the service and properties for the service, including the service name, application start and stop script, disk partitions, mount points, and the cluster system on which an administrator prefers to run the service. After the administrator adds a service, the cluster enters the information into the cluster database on shared storage, where it can be accessed by both cluster systems.
The cluster provides an easy-to-use framework for database applications. For example, a database service serves highly-available data to a database application. The application running on a cluster system provides network access to database client systems, such as Web servers. If the service fails over to another cluster system, the application can still access the shared database data. A network-accessible database service is usually assigned an IP address, which is failed over along with the service to maintain transparent access for clients.

The cluster service framework can be easily extended to other applications, as well.
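To make the notion of a service concrete, the following sketch shows the kind of properties an administrator supplies when defining a service. The field names and values are hypothetical, chosen only for illustration; they are not the format of the cluster database.

    # Hypothetical service definition; field names are invented for illustration.
    database_service = {
        "name": "database",
        "preferred_member": "clu1",              # member preferred to run the service
        "ip_address": "10.0.0.10",               # floating IP that fails over with the service
        "start_script": "/etc/init.d/dbsvc start",
        "stop_script": "/etc/init.d/dbsvc stop",
        "devices": [
            {"partition": "/dev/sda3", "mount_point": "/db/data"},
        ],
        "monitor_interval": 30,                  # seconds between application health checks
    }

    print(database_service["name"], "prefers", database_service["preferred_member"])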
• Data integrity assurance

To ensure data integrity, only one cluster system can run a service and access service data at one time. Using power switches in the cluster configuration enables each cluster system to power-cycle the other cluster system before restarting its services during the failover process. This prevents the two systems from simultaneously accessing the same data and corrupting it. Although not required, it is recommended that power switches be used to guarantee data integrity under all failure conditions. Watchdog timers are an optional variety of power control to ensure correct operation of service failover.
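The sketch below illustrates the power-cycle-before-takeover rule described above. The power_cycle() helper is an assumed placeholder for whatever drives the serial- or network-attached power switch; it is not an actual Red Hat Cluster Manager API.

    # Illustrative only: refuse to take over services until the failed member
    # has been successfully power cycled (fenced).
    def take_over_services(failed_member, services, power_cycle):
        if not power_cycle(failed_member):
            # Fencing failed; starting the services now could let both systems
            # write to the shared storage at once and corrupt data.
            raise RuntimeError("could not power cycle %s; not starting services" % failed_member)
        for service in services:
            service.start()      # safe: the failed member has been rebooted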
• Cluster administration user interface

A user interface simplifies cluster administration and enables an administrator to easily create, start, stop, and relocate services, and to monitor the cluster.
• Multiple cluster communication methods

Each cluster system monitors the health of the remote power switch, if any, and issues heartbeat pings over network and serial channels to monitor the health of the other cluster system. In addition, each cluster system periodically writes a timestamp and cluster state information to two quorum partitions located on shared disk storage. System state information includes whether the system is an active cluster member. Service state information includes whether the service is running and which cluster system is running the service. Each cluster system checks to ensure that the other system’s status is up to date.

To ensure correct cluster operation, if a system is unable to write to both quorum partitions at startup time, it will not be allowed to join the cluster. In addition, if a cluster system is not updating its timestamp, and if heartbeats to the system fail, the cluster system will be removed from the cluster.
Figure 1–2 Cluster Communication Mechanisms

Figure 1–2, Cluster Communication Mechanisms shows how systems communicate in a cluster configuration. Note that the terminal server used to access system consoles via serial ports is not a required cluster component.
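The following minimal sketch mimics the quorum-partition idea: each member periodically records a timestamp and its state, and a peer is considered failed only after its recorded timestamp has gone stale for longer than the failover window. It writes a JSON file for simplicity; the real cluster writes raw quorum partitions on shared storage, and the field names here are assumptions.

    import json
    import time

    QUORUM_PATH = "/tmp/quorum-demo.json"   # stand-in for a raw quorum partition
    FAILOVER_WINDOW = 12                    # seconds without updates before a peer is suspect

    def write_state(member, active=True):
        """Record this member's state and a fresh timestamp."""
        state = {"member": member, "active": active, "timestamp": time.time()}
        with open(QUORUM_PATH, "w") as f:
            json.dump(state, f)

    def peer_is_stale():
        """Return True if the recorded timestamp is older than the failover window."""
        try:
            with open(QUORUM_PATH) as f:
                state = json.load(f)
        except FileNotFoundError:
            return True
        return time.time() - state["timestamp"] > FAILOVER_WINDOW

    write_state("clu1")
    print("peer stale?", peer_is_stale())   # False immediately after a write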
• Service failover capability

If a hardware or software failure occurs, the cluster will take the appropriate action to maintain application availability and data integrity. For example, if a cluster system completely fails, the other cluster system will restart the failed system’s services. Services already running on the surviving system are not disrupted.

When the failed system reboots and is able to write to the quorum partitions, it can rejoin the cluster and run services. Depending on how the services are configured, the cluster can re-balance the services across the two cluster systems.
• Manual service relocation capability

In addition to automatic service failover, a cluster enables administrators to cleanly stop services on one cluster system and restart them on the other system. This allows administrators to perform planned maintenance on a cluster system while providing application and data availability.
• Event logging facility

To ensure that problems are detected and resolved before they affect service availability, the cluster daemons log messages by using the conventional Linux syslog subsystem. Administrators can customize the severity level of the logged messages.
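Because the daemons use the standard syslog facility, any program on the member can log in the same way. The sketch below uses Python's standard syslog module; the identifier and messages are made up for illustration.

    import syslog

    # Log as a daemon-class program; severity levels map to syslog priorities.
    syslog.openlog(ident="cluster-demo", facility=syslog.LOG_DAEMON)
    syslog.syslog(syslog.LOG_INFO, "member clu1 joined the cluster")
    syslog.syslog(syslog.LOG_ERR, "heartbeat to member clu2 failed")
    syslog.closelog()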
• Application Monitoring

The cluster services infrastructure can optionally monitor the state and health of an application. In this manner, should an application-specific failure occur, the cluster will automatically restart the application. In response to an application failure, the cluster first attempts to restart the application on the member it was initially running on; failing that, it restarts the application on the other cluster member.
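A simplified sketch of that monitoring behavior follows. The status and restart commands are placeholders supplied by the administrator, not actual cluster scripts, and the real implementation differs; the point is the order of recovery: restart in place first, then let the service move to the other member.

    import subprocess
    import time

    def is_healthy(status_cmd):
        """A zero exit status from the application's status command means healthy."""
        return subprocess.call(status_cmd, shell=True) == 0

    def monitor(status_cmd, restart_cmd, interval=30, max_local_restarts=1):
        restarts = 0
        while True:
            if not is_healthy(status_cmd):
                if restarts < max_local_restarts:
                    subprocess.call(restart_cmd, shell=True)   # restart on this member first
                    restarts += 1
                else:
                    return "relocate"   # give up locally; run the service on the other member
            time.sleep(interval)

    # Example (hypothetical commands):
    # monitor("/etc/init.d/dbsvc status", "/etc/init.d/dbsvc restart")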
• Status Monitoring Agent

A cluster status monitoring agent is used to gather vital cluster and application state information. This information is then accessible both locally on the cluster member and remotely. A graphical user interface can then display status information from multiple clusters in a manner which does not degrade system performance.
1.3 How To Use This Manual

This manual contains information about setting up the cluster hardware and installing the Linux distribution and the cluster software. These tasks are described in Chapter 2, Hardware Installation and Operating System Configuration and Chapter 3, Cluster Software Installation and Configuration.

For information about setting up and managing cluster services, see Chapter 4, Service Configuration and Administration. For information about managing a cluster, see Chapter 8, Cluster Administration.

Appendix A, Supplementary Hardware Information contains detailed configuration information on specific hardware devices and shared storage configurations. Appendix B, Supplementary Software Information contains background information on the cluster software and other related information.
2 Hardware Installation and Operating System Configuration
To set up the hardware configuration and install the Linux distribution, follow these steps:

• Choose a cluster hardware configuration that meets the needs of applications and users; see Section 2.1, Choosing a Hardware Configuration.
• Set up and connect the cluster systems and the optional console switch and network switch or hub; see Section 2.2, Steps for Setting Up the Cluster Systems.
• Install and configure the Linux distribution on the cluster systems; see Section 2.3, Steps for Installing and Configuring the Red Hat Linux Distribution.
• Set up the remaining cluster hardware components and connect them to the cluster systems; see Section 2.4, Steps for Setting Up and Connecting the Cluster Hardware.

After setting up the hardware configuration and installing the Linux distribution, the cluster software can be installed.
2.1 Choosing a Hardware Configuration

The Red Hat Cluster Manager allows administrators to use commodity hardware to set up a cluster configuration that will meet the performance, availability, and data integrity needs of applications and users. Cluster hardware ranges from low-cost minimum configurations that include only the components required for cluster operation, to high-end configurations that include redundant heartbeat channels, hardware RAID, and power switches.

Regardless of configuration, the use of high-quality hardware in a cluster is recommended, as hardware malfunction is the primary cause of system downtime.

Although all cluster configurations provide availability, only some configurations protect against every single point of failure. Similarly, all cluster configurations provide data integrity, but only some configurations protect data under every failure condition. Therefore, administrators must fully understand the needs of their computing environment and also the availability and data integrity features of different hardware configurations in order to choose the cluster hardware that will meet the proper requirements.

When choosing a cluster hardware configuration, consider the following:

Performance requirements of applications and users

Choose a hardware configuration that will provide adequate memory, CPU, and I/O resources. Be sure that the configuration chosen will be able to handle any future increases in workload, as well.
Cost restrictions

The hardware configuration chosen must meet budget requirements. For example, systems with multiple I/O ports usually cost more than low-end systems with fewer expansion capabilities.

Availability requirements

If a computing environment requires the highest degree of availability, such as a production environment, then a cluster hardware configuration that protects against all single points of failure, including disk, storage interconnect, heartbeat channel, and power failures, is recommended. Environments that can tolerate an interruption in availability, such as development environments, may not require as much protection. See Section 2.4.1, Configuring Heartbeat Channels, Section 2.4.3, Configuring UPS Systems, and Section 2.4.4, Configuring Shared Disk Storage for more information about using redundant hardware for high availability.

Data integrity under all failure conditions requirement

Using power switches in a cluster configuration guarantees that service data is protected under every failure condition. These devices enable a cluster system to power-cycle the other cluster system before restarting its services during failover. Power switches protect against data corruption if an unresponsive (or hanging) system becomes responsive after its services have failed over and then issues I/O to a disk that is also receiving I/O from the other cluster system.

In addition, if a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum partitions. If you are not using power switches in the cluster, this error condition may result in services being run on more than one cluster system, which can cause data corruption. See Section 2.4.2, Configuring Power Switches for more information about the benefits of using power switches in a cluster. It is recommended that production environments use power switches or watchdog timers in the cluster configuration.
2.1.1 Shared Storage Requirements

The operation of the cluster depends on reliable, coordinated access to shared storage. In the event of hardware failure, it is desirable to be able to disconnect one member from the shared storage for repair without disrupting the other member. Shared storage is truly vital to the cluster configuration.

Testing has shown that it is difficult, if not impossible, to configure reliable multi-initiator parallel SCSI configurations at data rates above 80 MB/sec using standard SCSI adapters. Further tests have shown that these configurations cannot support online repair, because the bus does not work reliably when the HBA terminators are disabled and external terminators are used. For these reasons, multi-initiator SCSI configurations using standard adapters are not supported. Single-initiator parallel SCSI buses, connected to multi-ported storage devices, or Fibre Channel, are required.

The Red Hat Cluster Manager requires that both cluster members have simultaneous access to the shared storage. Certain host RAID adapters are capable of providing this type of access to shared RAID units. These products require extensive testing to ensure reliable operation, especially if the shared RAID units are based on parallel SCSI buses. These products typically do not allow for online repair of a failed system. No host RAID adapters are currently certified with Red Hat Cluster Manager. Refer to the Red Hat web site at http://www.redhat.com for the most up-to-date supported hardware matrix.

The use of software RAID, or software Logical Volume Management (LVM), is not supported on shared storage. This is because these products do not coordinate access from multiple hosts to shared storage. Software RAID or LVM may be used on non-shared storage on cluster members (for example, boot and system partitions and other filesystems which are not associated with any cluster services).
2.1.2 Minimum Hardware Requirements

A minimum hardware configuration includes only the hardware components that are required for cluster operation, as follows:

• Two servers to run cluster services
• Ethernet connection for a heartbeat channel and client network access
• Shared disk storage for the cluster quorum partitions and service data

See Section 2.1.5, Example of a Minimum Cluster Configuration for an example of this type of hardware configuration.

The minimum hardware configuration is the most cost-effective cluster configuration; however, it includes multiple points of failure. For example, if the RAID controller fails, then all cluster services will be unavailable. When deploying the minimal hardware configuration, software watchdog timers should be configured as a data integrity provision.

To improve availability, protect against component failure, and guarantee data integrity under all failure conditions, the minimum configuration can be expanded. Table 2–1, Improving Availability and Guaranteeing Data Integrity shows how to improve availability and guarantee data integrity:
Table 2–1 Improving Availability and Guaranteeing Data Integrity

Problem: Disk failure
Solution: Hardware RAID to replicate data across multiple disks.

Problem: RAID controller failure
Solution: Dual RAID controllers to provide redundant access to disk data.

Problem: Heartbeat channel failure
Solution: Point-to-point Ethernet or serial connection between the cluster systems.

Problem: Power source failure
Solution: Redundant uninterruptible power supply (UPS) systems.

Problem: Data corruption under all failure conditions
Solution: Power switches or hardware-based watchdog timers.
A no-single-point-of-failure hardware configuration that guarantees data integrity under all failure conditions can include the following components:

• Two servers to run cluster services
• Ethernet connection between each system for a heartbeat channel and client network access
• Dual-controller RAID array to replicate quorum partitions and service data
• Two power switches to enable each cluster system to power-cycle the other system during the failover process
• Point-to-point Ethernet connection between the cluster systems for a redundant Ethernet heartbeat channel
• Point-to-point serial connection between the cluster systems for a serial heartbeat channel
• Two UPS systems for a highly-available source of power

See Section 2.1.6, Example of a No-Single-Point-Of-Failure Configuration for an example of this type of hardware configuration.
Cluster hardware configurations can also include other optional hardware components that are common in a computing environment. For example, a cluster can include a network switch or network hub, which enables the connection of the cluster systems to a network. A cluster may also include a console switch, which facilitates the management of multiple systems and eliminates the need for separate monitors, mice, and keyboards for each cluster system.

One type of console switch is a terminal server, which enables connection to serial consoles and management of many systems from one remote location. As a low-cost alternative, you can use a KVM (keyboard, video, and mouse) switch, which enables multiple systems to share one keyboard, monitor, and mouse. A KVM is suitable for configurations in which access to a graphical user interface (GUI) to perform system management tasks is preferred.

When choosing a cluster system, be sure that it provides the PCI slots, network slots, and serial ports that the hardware configuration requires. For example, a no-single-point-of-failure configuration requires multiple serial and Ethernet ports. Ideally, choose cluster systems that have at least two serial ports. See Section 2.2.1, Installing the Basic System Hardware for more information.
2.1.3 Choosing the Type of Power Controller

The Red Hat Cluster Manager implementation consists of a generic power management layer and a set of device-specific modules which accommodate a range of power management types. When selecting the appropriate type of power controller to deploy in the cluster, it is important to recognize the implications of specific device types. The following describes the types of supported power switches, followed by a summary table. For a more detailed description of the role a power switch plays in ensuring data integrity, refer to Section 2.4.2, Configuring Power Switches.

Serial- and network-attached power switches are separate devices which enable one cluster member to power-cycle another member. They resemble a power plug strip on which individual outlets can be turned on and off under software control through either a serial or network cable.

Watchdog timers provide a means for failed systems to remove themselves from the cluster prior to another system taking over their services, rather than allowing one cluster member to power-cycle another. The normal operational mode for watchdog timers is that the cluster software must periodically reset the timer prior to its expiration. If the cluster software fails to reset the timer, the watchdog will trigger under the assumption that the system may have hung or otherwise failed. The healthy cluster member allows a window of time to pass prior to concluding that another cluster member has failed (by default, this window is 12 seconds). The watchdog timer interval must be less than the duration of time for one cluster member to conclude that another has failed. In this manner, a healthy system can assume that, prior to taking over services for a failed cluster member, the failed member has safely removed itself from the cluster (by rebooting) and therefore poses no risk to data integrity. The underlying watchdog support is included in the core Linux kernel. Red Hat Cluster Manager utilizes these watchdog features via its standard APIs and configuration mechanism.
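The kernel exposes this support through the standard /dev/watchdog device: once the device is opened, the driver reboots the system unless the device is written to ("petted") before the timeout expires. The sketch below shows the general pattern only; the interval and iteration count are arbitrary assumptions, and running it requires root privileges and a loaded watchdog driver (hardware or softdog).

    import time

    PET_INTERVAL = 5   # seconds; must be comfortably shorter than the watchdog timeout

    def pet_watchdog(iterations=3, device="/dev/watchdog"):
        with open(device, "wb", buffering=0) as wd:
            for _ in range(iterations):
                wd.write(b"\0")          # any write resets ("pets") the timer
                time.sleep(PET_INTERVAL)
            wd.write(b"V")               # "magic close": disarm the watchdog on clean exit

    # pet_watchdog()   # uncomment only on a system where an unexpected reboot is acceptable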
There are two types of watchdog timers: hardware-based and software-based. Hardware-based watchdog timers typically consist of system board components such as the Intel® i810 TCO chipset. This circuitry has a high degree of independence from the main system CPU. This independence is beneficial in failure scenarios of a true system hang, as in this case it will pull down the system’s reset lead, resulting in a system reboot. There are also some PCI expansion cards that provide watchdog features.
The second type of watchdog timer is software-based. This category of watchdog does not have any dedicated hardware. The implementation is a kernel thread which is periodically run; if the timer duration has expired, it will initiate a system reboot. The vulnerability of the software watchdog timer is that under certain failure scenarios, such as a system hang while interrupts are blocked, the kernel thread will not be called. As a result, in such conditions it cannot be definitively depended on for data integrity. This can cause the healthy cluster member to take over services for a hung node, which could cause data corruption under certain scenarios.
Finally, administrators can choose not to employ a power controller at all. If choosing the "None" type, note that there are no provisions for a cluster member to power-cycle a failed member. Similarly, the failed member cannot be guaranteed to reboot itself under all failure conditions. Deploying clusters with a power controller type of "None" is useful for simple evaluation purposes, but because it affords the weakest data integrity provisions, it is not recommended for use in a production environment.
Ultimately, the right type of power controller to deploy in a cluster environment depends on the data integrity requirements weighed against the cost and availability of external power switches.

Table 2–2, Power Switches summarizes the types of supported power management modules and discusses their advantages and disadvantages individually.
Table 2–2 Power Switches

Serial-attached power switches
Notes: Two serial-attached power controllers are used in a cluster (one per member system).
Pros: Affords strong data integrity guarantees; the power controller itself is not a single point of failure, as there are two in a cluster.
Cons: Requires purchase of power controller hardware and cables; consumes serial ports.

Network-attached power switches
Notes: A single network-attached power controller is required per cluster.
Pros: Affords strong data integrity guarantees.
Cons: Requires purchase of power controller hardware. The power controller itself can become a single point of failure (although they are typically very reliable devices).

Hardware watchdog timer
Notes: Affords strong data integrity guarantees.
Pros: Obviates the need to purchase external power controller hardware.
Cons: Not all systems include supported watchdog hardware.

Software watchdog timer
Notes: Offers acceptable data integrity provisions.
Pros: Obviates the need to purchase external power controller hardware; works on any system.
Cons: Under some failure scenarios, the software watchdog will not be operational, opening a small vulnerability window.

No power controller
Notes: No power controller function is in use.
Pros: Obviates the need to purchase external power controller hardware; works on any system.
Cons: Vulnerable to data corruption under certain failure scenarios.
2.1.4 Cluster Hardware Tables

Use the following tables to identify the hardware components required for your cluster configuration. In some cases, the tables list specific products that have been tested in a cluster, although a cluster is expected to work with other products.

The complete set of qualified cluster hardware components changes over time. Consequently, the tables below may be incomplete. For the most up-to-date itemization of supported hardware components, refer to the Red Hat documentation website at http://www.redhat.com/docs.
Table 2–3 Cluster System Hardware Table

Cluster system
Quantity: Two
Description: Red Hat Cluster Manager supports IA-32 hardware platforms. Each cluster system must provide enough PCI slots, network slots, and serial ports for the cluster hardware configuration. Because disk devices must have the same name on each cluster system, it is recommended that the systems have symmetric I/O subsystems. In addition, it is recommended that each system have a minimum of 450 MHz CPU speed and 256 MB of memory. See Section 2.2.1, Installing the Basic System Hardware for more information.
Required: Yes

Table 2–4, Power Switch Hardware Table includes several different types of power switches. A single cluster requires only one type of power switch shown below.
Table 2–4 Power Switch Hardware Table

Serial power switches
Quantity: Two
Description: Power switches enable each cluster system to power-cycle the other cluster system. See Section 2.4.2, Configuring Power Switches for information about using power switches in a cluster. Note that clusters are configured with either serial or network attached power switches and not both.
The following serial attached power switch has been fully tested:
· RPS-10 (model M/HD in the US, and model M/EC in Europe), which is available from http://www.wti.com/rps-10.htm. Refer to Section A.1.1, Setting up RPS-10 Power Switches.
Latent support is provided for the following serial attached power switch. This switch has not yet been fully tested:
· APC Serial On/Off Switch (part AP9211), http://www.apc.com
Required: Strongly recommended for data integrity under all failure conditions

Null modem cable
Quantity: Two
Description: Null modem cables connect a serial port on a cluster system to a serial power switch. This serial connection enables each cluster system to power-cycle the other system. Some power switches may require different cables.
Required: Only if using serial power switches

Mounting bracket
Quantity: One
Description: Some power switches support rack mount configurations and require a separate mounting bracket (e.g. RPS-10).
Required: Only for rack mounting power switches

Network power switch
Quantity: One
Description: Network attached power switches enable each cluster member to power cycle all others. Refer to Section 2.4.2, Configuring Power Switches for information about using network attached power switches, as well as caveats associated with each.
The following network attached power switches have been fully tested:
· WTI NPS-115, or NPS-230, available from http://www.wti.com. Note that the NPS power switch can properly accommodate systems with dual redundant power supplies. Refer to Section A.1.2, Setting up WTI NPS Power Switches.
· Baytech RPC-3 and RPC-5, http://www.baytech.net
Latent support is provided for the APC Master Switch (AP9211, or AP9212), www.apc.com
Required: Strongly recommended for data integrity under all failure conditions

Watchdog timer
Quantity: Two
Description: Watchdog timers cause a failed cluster member to remove itself from a cluster prior to a healthy member taking over its services. Refer to Section 2.4.2, Configuring Power Switches for more information.
Required: Recommended for data integrity on systems which provide integrated watchdog hardware
The following table shows a variety of storage devices for an administrator to choose from. An individual cluster does not require all of the components listed below.
Table 2–5 Shared Disk Storage Hardware Table

External disk storage enclosure
Quantity: One
Description: Use Fibre Channel or single-initiator parallel SCSI to connect the cluster systems to a single or dual-controller RAID array. To use single-initiator buses, a RAID controller must have multiple host ports and provide simultaneous access to all the logical units on the host ports. To use a dual-controller RAID array, a logical unit must fail over from one controller to the other in a way that is transparent to the operating system.
The following are recommended SCSI RAID arrays that provide simultaneous access to all the logical units on the host ports (this is not a comprehensive list; rather, it is limited to those RAID boxes which have been tested):
· Winchester Systems FlashDisk RAID Disk Array, which is available from http://www.winsys.com
· Dot Hill’s SANnet Storage Systems, which is available from http://www.dothill.com
· Silicon Image CRD-7040 & CRA-7040, CRD-7220, CRD-7240 & CRA-7240, CRD-7400 & CRA-7400 controller-based RAID arrays, available from http://www.synetexinc.com
In order to ensure symmetry of device IDs and LUNs, many RAID arrays with dual redundant controllers are required to be configured in an active/passive mode. See Section 2.4.4, Configuring Shared Disk Storage for more information.
Required: Yes

Host bus adapter
Quantity: Two
Description: To connect to shared disk storage, you must install either a parallel SCSI or a Fibre Channel host bus adapter in a PCI slot in each cluster system.
For parallel SCSI, use a low voltage differential (LVD) host bus adapter. Adapters have either HD68 or VHDCI connectors. Recommended parallel SCSI host bus adapters include the following:
· Adaptec 2940U2W, 29160, 29160LP, 39160, and 3950U2
· Adaptec AIC-7896 on the Intel L440GX+ motherboard
· Qlogic QLA1080 and QLA12160
· Tekram Ultra2 DC-390U2W
· LSI Logic SYM22915
A recommended Fibre Channel host bus adapter is the Qlogic QLA2200.
See Section A.6, Host Bus Adapter Features and Configuration Requirements for device features and configuration information.
Host-bus-adapter-based RAID cards are only supported if they correctly support multi-host operation. At the time of publication, there were no fully tested host-bus-adapter-based RAID cards. Refer to http://www.redhat.com for the latest hardware information.
Required: Yes

SCSI cable
Quantity: Two
Description: SCSI cables with 68 pins connect each host bus adapter to a storage enclosure port. Cables have either HD68 or VHDCI connectors. Cables vary based on adapter type.
Required: Only for parallel SCSI configurations

SCSI terminator
Quantity: Two
Description: For a RAID storage enclosure that uses "out" ports (such as the FlashDisk RAID Disk Array) and is connected to single-initiator SCSI buses, connect terminators to the "out" ports in order to terminate the buses.
Required: Only for parallel SCSI configurations, and only if necessary for termination

Fibre Channel hub or switch
Quantity: One or two
Description: A Fibre Channel hub or switch is required.
Required: Only for some Fibre Channel configurations

Fibre Channel cable
Quantity: Two to six
Description: A Fibre Channel cable connects a host bus adapter to a storage enclosure port, a Fibre Channel hub, or a Fibre Channel switch. If a hub or switch is used, additional cables are needed to connect the hub or switch to the storage adapter ports.
Required: Only for Fibre Channel configurations
Table 2–6 Network Hardware Table

Network interface
Quantity: One for each network connection
Description: Each network connection requires a network interface installed in a cluster system. Each Ethernet heartbeat channel requires a network interface installed in both cluster systems.

Network switch or hub
Quantity: One
Description: A network switch or hub allows connection of multiple systems to a network.
Required: No

Network cable
Quantity: One for each network interface
Description: A conventional network cable, such as a cable with an RJ45 connector, connects each network interface to a network switch or a network hub.

Network crossover cable
Quantity: One for each channel
Description: A network crossover cable connects a network interface on one cluster system to a network interface on the other cluster system, creating an Ethernet heartbeat channel.
Required: Only for a redundant Ethernet heartbeat channel
Table 2–8 Point-To-Point Serial Heartbeat Channel Hardware Table

Serial card
Quantity: Two for each serial channel
Description: Each serial heartbeat channel requires a serial port on both cluster systems. To expand your serial port capacity, you can use multi-port serial PCI cards. Recommended multi-port cards include the following:
· Vision Systems VScom 200H PCI card, which provides two serial ports, available from http://www.vscom.de
· Cyclades-4YoPCI+ card, which provides four serial ports, available from http://www.cyclades.com
Note that since configuration of serial heartbeat channels is optional, it is not required to invest in additional hardware specifically for this purpose. Should future support be provided for more than two cluster members, serial heartbeat channel support may be deprecated.
Required: No

Null modem cable
Quantity: One for each channel
Description: A null modem cable connects a serial port on one cluster system to a corresponding serial port on the other cluster system, creating a serial heartbeat channel.
Required: Only for a serial heartbeat channel
Table 2–9 Console Switch Hardware Table

Terminal server
Quantity: One
Description: A terminal server enables you to manage many systems from one remote location.
Required: No

KVM
Quantity: One
Description: A KVM enables multiple systems to share one keyboard, monitor, and mouse. Cables for connecting systems to the switch depend on the type of KVM.
Required: No
Table 2–10 UPS System Hardware Table

UPS system
Quantity: One or two
Description: Uninterruptible power supply (UPS) systems protect against downtime if a power outage occurs. UPS systems are highly recommended for cluster operation. Ideally, connect the power cables for the shared storage enclosure and both power switches to redundant UPS systems. In addition, a UPS system must be able to provide voltage for an adequate period of time, and should be connected to its own power circuit.
A recommended UPS system is the APC Smart-UPS 1400 Rackmount, available from http://www.apc.com.
Required: Strongly recommended for availability
2.1.5 Example of a Minimum Cluster Configuration

The hardware components described in Table 2–11, Minimum Cluster Hardware Configuration Components can be used to set up a minimum cluster configuration. This configuration does not guarantee data integrity under all failure conditions, because it does not include power switches. Note that this is a sample configuration; it is possible to set up a minimum configuration using other hardware.

Table 2–11 Minimum Cluster Hardware Configuration Components

Each cluster system includes the following hardware:
· Network interface for client access and an Ethernet heartbeat channel
· One Adaptec 29160 SCSI adapter (termination disabled) for the shared storage connection

Two network cables with RJ45 connectors: Network cables connect a network interface on each cluster system to the network for client access and Ethernet heartbeats.

RAID storage enclosure: The RAID storage enclosure contains one controller with at least two host ports.

Two HD68 SCSI cables: Each cable connects one HBA to one port on the RAID controller, creating two single-initiator SCSI buses.
2.1.6 Example of a No-Single-Point-Of-Failure Configuration

The components described in Table 2–12, No-Single-Point-Of-Failure Configuration Components can be used to set up a no-single-point-of-failure cluster configuration that includes two single-initiator SCSI buses and power switches to guarantee data integrity under all failure conditions. Note that this is a sample configuration; it is possible to set up a no-single-point-of-failure configuration using other hardware.

Table 2–12 No-Single-Point-Of-Failure Configuration Components

Each cluster system includes the following hardware:
· Two network interfaces for: point-to-point Ethernet heartbeat channel; client network access and Ethernet heartbeat connection
· Three serial ports for: point-to-point serial heartbeat channel; remote power switch connection; connection to the terminal server
· One Tekram Ultra2 DC-390U2W adapter (termination enabled) for the shared disk storage connection

One network switch: A network switch enables the connection of multiple systems to a network.

One Cyclades terminal server: A terminal server allows for management of remote systems from a central location. (A terminal server is not required for cluster operation.)

Three network cables: Network cables connect the terminal server and a network interface on each cluster system to the network switch.

Two RJ45 to DB9 crossover cables: RJ45 to DB9 crossover cables connect a serial port on each cluster system to the Cyclades terminal server.

One network crossover cable: A network crossover cable connects a network interface on one cluster system to a network interface on the other system, creating a point-to-point Ethernet heartbeat channel.

Two RPS-10 power switches: Power switches enable each cluster system to power-cycle the other system before restarting its services. The power cable for each cluster system is connected to its own power switch.

Three null modem cables: Null modem cables connect a serial port on each cluster system to the power switch that provides power to the other cluster system. This connection enables each cluster system to power-cycle the other system. A null modem cable also connects a serial port on one cluster system to a corresponding serial port on the other system, creating a point-to-point serial heartbeat channel.

FlashDisk RAID Disk Array with dual controllers: Dual RAID controllers protect against disk and controller failure. The RAID controllers provide simultaneous access to all the logical units on the host ports.

Two HD68 SCSI cables: HD68 cables connect each host bus adapter to a RAID enclosure "in" port, creating two single-initiator SCSI buses.

Two terminators: Terminators connected to each "out" port on the RAID enclosure terminate both single-initiator SCSI buses.

Redundant UPS systems: UPS systems provide a highly-available source of power. The power cables for the power switches and the RAID enclosure are connected to two UPS systems.

Figure 2–1, No-Single-Point-Of-Failure Configuration Example shows an example of a no-single-point-of-failure hardware configuration that includes the hardware described in the previous table, two single-initiator SCSI buses, and power switches to guarantee data integrity under all error conditions. A "T" enclosed in a circle represents a SCSI terminator.
Figure 2–1 No-Single-Point-Of-Failure Configuration Example
2.2 Steps for Setting Up the Cluster Systems

After identifying the cluster hardware components described in Section 2.1, Choosing a Hardware Configuration, set up the basic cluster system hardware and connect the systems to the optional console switch and network switch or hub. Follow these steps:

1. In both cluster systems, install the required network adapters, serial cards, and host bus adapters. See Section 2.2.1, Installing the Basic System Hardware for more information about performing this task.

2. Set up the optional console switch and connect it to each cluster system. See Section 2.2.2, Setting Up a Console Switch for more information about performing this task.

If a console switch is not used, then connect each system to a console terminal.

3. Set up the optional network switch or hub and use conventional network cables to connect it to the cluster systems and the terminal server (if applicable). See Section 2.2.3, Setting Up a Network Switch or Hub for more information about performing this task.

If a network switch or hub is not used, then conventional network cables should be used to connect each system and the terminal server (if applicable) to a network.

After performing the previous tasks, install the Linux distribution as described in Section 2.3, Steps for Installing and Configuring the Red Hat Linux Distribution.