Microsoft, Windows, Windows NT, Windows NT Advanced Server, SQL Server for Windows
NT are trademarks and/or registered trademarks of Microsoft Corporation.
ADABAS D is a trademark and/or registered trademark of Software AG.
Compaq Recovery Server Solutions for SAP R/3 on various platforms
First Edition (December 1996)
Document Number 465A/1196
INTRODUCTION
The purpose of this White Paper is to help customers implement Compaq Recovery Server
solutions in an environment using SAP R/3 with its related database platform. This White Paper
addresses the process for:
• Setting up the platforms for either Compaq Standby Recovery Server or Compaq On-Line
Recovery Server
• Configuration specifications necessary for SAP R/3
• Using the database to be implemented in a recovery mode
This White Paper includes information extracted from other Compaq White Papers and technical
documentation. The level of detail in this White Paper should explain the technical concepts fully
and provide information on implementing the concepts in practical situations. You can find
additional details in the following Compaq White Papers:
Compaq Standby Recovery Server (document number 180B/0495)
Compaq On-Line Recovery Server (document number 043A/1095)
You can also find details in the Recovery Server Option User Guide (part number
213818-002), which comes with the Recovery Server Option Kit and is also available as an
independent product.
The majority of this White Paper discusses the On-Line Recovery Server implementation.
Because the Standby Recovery Server solution is application-independent, SAP R/3 requires no
special or specific configuration changes. The On-Line Recovery Server solution requires specific
configuration and implementation processes to automatically start up the database and the SAP
R/3 instance on the recovery server. This process of starting the database and R/3 is illustrated by
means of script files, which are platform-specific and must be adapted to the particular
configuration of each platform.
The modular, distributed architecture of SAP R/3 makes it suitable for either of the Compaq
Recovery Server solutions. Due to the sophistication of this architecture, and the critical nature of
an R/3 system, the recovery procedures must be fully compliant with the specifications for recovery
software provided by the SAP High Availability Guide. This document describes methods and
techniques that have been tested as specified in that guide.
COMPAQ RECOVERY SERVER SOLUTIONS OVERVIEW
In the Standby Recovery Server configuration, one server functions as the primary server and
another server functions as a hot standby recovery server that remains idle until there is a
switchover. All disk storage is external to both servers. The disk storage switches from the
primary server to the recovery server when a fault is detected via the Compaq Recovery Server
Switch. The Recovery Server Switch is an electrically controlled SCSI switch that allows selected
storage devices to be switched dynamically from the failed server to the surviving server.
The On-Line Recovery Server configuration pairs an independently operating Compaq server as
an automatic, hot standby for the primary server. If the primary server fails, the ProLiant Storage
System(s) attached to the failed server will be automatically switched over to the surviving server
via the Compaq Recovery Server Switch.
Table 1 summarizes the differences between the two recovery server configurations.
TABLE 1
COMPARISON OF STANDBY AND ON-LINE CONFIGURATION
Standby                                        On-Line
Single network identity. Only the primary      Two network identities. The primary and
server is active on the network.               recovery servers are active servers on the
                                               network.
Single active server.                          Two active servers.
Switchover restores operating system and       Switchover restores only switched disks.
applications.                                  Operating system is stored and runs on local
                                               disks.
Benefits all applications.                     Benefits specific application(s).
No local disks.                                Local disk (to contain at least the operating
                                               system) required along with switched disks.
SAP R/3 AND RECOVERY SERVER SOLUTIONS
IMPLEMENTATION CONSIDERATIONS
A typical R/3 platform consists of one database server and a variable number of application
servers, depending on the processing requirements imposed by the workload. SAP R/3 services
are distributed among these servers according to processing requirements imposed by the user
workload. Some of these services can have several instances running on different servers, but
others must run just once on a certain server in the configuration. These are the single points of
failure of an R/3 system.
To minimize unplanned R/3 system downtime, all single points of failure in a system should be
secured. A single point of failure can be defined as a component that will lead to (severe) service
loss in case of failure.
Table 2 shows the services offered by the R/3 system. Single points of failure, the services that
run only once per R/3 system, are marked with an asterisk (*).
TABLE 2
R/3 SERVICES
Service              Number of Instances
DBMS *               1 per R/3 System
Dispatcher           1 per App-Server
Dialog service       1 ... n per App-Server
Update service       0 ... n per App-Server
Enqueue service *    1 per R/3 System
Batch service        0 ... n per App-Server
Message service *    1 per R/3 System
Gateway service      1 per R/3 Instance
Spool service        1 per App-Server
If only one single point of failure is protected, some R/3 services will still be left unprotected. If
one of the remaining single points of failure subsequently fails, the R/3 system will be
unavailable (until R/3 has been reconfigured and restarted). It is therefore recommended to
concentrate all single points of failure on a server that is protected by one of the Compaq
Recovery Server solutions.
The database server, also running the enqueue and message services, as well as sapcomm and
saprouter, is a primary candidate for a recovery server implementation because its availability is
crucial to the functioning of the whole R/3 system. If an R/3 application server fails, users are
still able to work because they can reconnect to a dialog instance running on a different
application server or on the database server. Furthermore, in such a case, the reconnection could
be automatically provided by the R/3 load balancing mechanism.
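As a rough illustration of the recommendation above, the following Python sketch treats the services that run only once per R/3 system as single points of failure and reports any that are placed on a server not protected by a Recovery Server solution. The host names, the `placement` mapping, and the function name are hypothetical and are not part of SAP R/3 or of any Compaq tool.

```python
# Services that run exactly once per R/3 system, per the R/3 services table.
SINGLE_POINTS_OF_FAILURE = {"DBMS", "Enqueue service", "Message service"}


def unprotected_single_points(placement, protected_hosts):
    """Return the single points of failure that run on an unprotected host.

    placement maps a service name to the (hypothetical) host it runs on.
    A service missing from placement is reported as unprotected.
    """
    return sorted(
        service
        for service in SINGLE_POINTS_OF_FAILURE
        if placement.get(service) not in protected_hosts
    )


# Hypothetical landscape: the message service was left on an application server.
placement = {
    "DBMS": "dbserver",
    "Enqueue service": "dbserver",
    "Message service": "appserver1",
}
print(unprotected_single_points(placement, {"dbserver"}))
```

An empty result would indicate that every single point of failure sits on a protected server, which is the layout this paper recommends.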
STANDBY RECOVERY SERVER TECHNOLOGY
The Standby Recovery Server solution automatically switches all shared disk storage from a
failed R/3-Database server to a standby recovery server that is waiting to boot the Windows NT
operating system and re-establish access to database files that are stored on the shared drives.
When a switchover occurs, the recovery server electrically switches all of the disk storage from
the primary server to the disk controller contained in the recovery server. When the switchover
is complete, the recovery server begins a normal operating system boot sequence using the same
disks that were previously attached to the primary server.
The primary and recovery server systems are not required to be identical. However, there are
configuration guidelines that must be met in order for the Standby Recovery Server
configuration to function properly, and Compaq recommends that the two servers be
identical.
Both the primary server and recovery server are connected by a SCSI cable to a Compaq
ProLiant Storage System, which holds a single copy of the Windows NT operating system, R/3
application software, database executables, and the database.
The Recovery Server Switch, an electrically controlled SCSI switch, must be installed in each
switchable Compaq ProLiant Storage System. This Recovery Server Switch actually
accomplishes the electrical switching of the disk storage that has a SCSI cable connection to the
primary and recovery servers. All disk storage in Standby Recovery Server configurations must
be contained in external ProLiant Storage Systems that have had the Recovery Server Switch
installed. No disks can be installed internal to the Compaq server. No disks can be attached to the
integrated SCSI controller of the Compaq server. The integrated SCSI controller can, however, be
used for CD-ROM drives and tape drives.
The primary and recovery servers are physically linked as shown in Figure 3 by the Recovery
Server Interconnect, an RS-232 serial cable with specific pinout connections required for the
Standby Recovery Server solution. The Recovery Server Interconnect is required for proper
operation.
NOTE: The Standby Recovery Server solution will NOT function with other serial
cables such as null modem cables.
The hardware configuration of the primary and recovery servers should be identical down to the slot
number of each controller board. Theoretically the servers could differ in memory and CPU
configuration as these are dynamically determined by the Windows NT operating system during
the boot process. However, Compaq strongly recommends that the primary and recovery servers
have identical hardware configurations.
Each server contains at least one SMART SCSI Array Controller or SMART-2 SCSI Array
Controller. SMART and SMART-2 Array Controllers can only be attached to disk drives that are
contained in external ProLiant Storage Systems that have the Recovery Server Switch installed.
Corresponding array controllers in primary and recovery servers that are connected to the same
ProLiant Storage System MUST be of the same type: either both SMART Array Controllers or
both SMART-2 Array Controllers. Corresponding array controllers in the primary and recovery
servers MUST have the same slot placement in each system.
Each server’s network interface controller (NIC) must be identical in type, slot placement, and
configuration. Integrated NICs can only be used if they are identical between the primary and
recovery servers. Otherwise, the integrated NIC must be disabled and identical NICs must be
installed in the same expansion slot number and identically configured in each server.
The Recovery Server Option Driver must be installed on the Windows NT Server configured as
the primary server. The Standby Recovery Server failure detection mechanism is based on the
Recovery Server Option Driver running on the primary server. As long as the recovery server
receives the heartbeat message within the time-out interval, it assumes that the primary server
has not failed. Any failure in the primary server that stops the Recovery Server Option Driver
from generating the periodic heartbeat message will be a detectable failure. The Recovery Server
Option Driver can be obtained from the Compaq Support Software Diskette for Microsoft
Windows NT (Windows NT SSD).
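The controller and NIC matching rules above lend themselves to a simple check before deployment. The Python sketch below compares two hypothetical inventories that map expansion slot numbers to controller type labels. The inventory format and the labels are assumptions made for illustration and are not produced by any Compaq utility.

```python
def slot_mismatches(primary, recovery):
    """Return the slot numbers whose controller differs between the two servers.

    Each argument maps an expansion slot number to a controller type label.
    A slot populated on only one server also counts as a mismatch.
    """
    slots = set(primary) | set(recovery)
    return sorted(s for s in slots if primary.get(s) != recovery.get(s))


# Hypothetical inventories: the array controllers match, the NIC slots do not.
primary = {2: "SMART-2", 4: "NIC-A"}
recovery = {2: "SMART-2", 5: "NIC-A"}
print(slot_mismatches(primary, recovery))
```

With these sample inventories the check reports slots 4 and 5, because the same NIC type sits in different slots on the two servers.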
Normal Operation
Figure 3 illustrates normal operation of Standby Recovery Server. Both the primary and recovery
servers are attached to the same network. The primary R/3-Database server supports users
attached to it via the network and the standby recovery server is idle.
Under normal operation, as soon as the recovery server has completed its power-on self test
(POST) sequence, it executes the Compaq Recovery Agent contained in the system ROM BIOS.
The Recovery Agent monitors a periodic “heartbeat message” transmitted by the Recovery Server
Option Driver to the recovery server via the Recovery Server Interconnect. At this point, no
operating system is loaded on the recovery server. Thus, the standby recovery server is
electronically attached to the network but is not accessible via the network. Its only function is to
wait for the R/3-Database server to fail.
The receipt of the heartbeat message within a configured time-out period indicates that the
R/3-Database server is functioning properly. The recovery server responds to each heartbeat
with an acknowledgment message across the serial connection. As long as the recovery server
continues to receive heartbeats according to schedule, it remains in the idle mode.
The Recovery Server Option is an extension of the Automatic Server Recovery (ASR) functions
currently supported in Compaq ProLiant servers. See the Compaq Hardware documentation that
came with the server for more information about ASR.
Switchover from the R/3-Database primary server to the standby recovery server occurs when
the R/3-Database server fails. If the recovery server does not receive a heartbeat message within
the time-out value set by the system configuration utility, the recovery server presumes that the
R/3-Database server has failed. (Loss of the heartbeat message could occur either because the
R/3-Database server has failed or because the connection of the Recovery Server Interconnect
cable has been broken.)
The switchover events occur as follows:
1. The Compaq Recovery Agent in the system ROM BIOS sends commands over the SCSI
bus to the Recovery Server Switch installed in the common set of ProLiant Storage
Systems. These commands cause the switch to disconnect the storage drives electrically
from the primary R/3-Database server and then to connect them electrically to the standby
recovery server.
2. The standby recovery server proceeds through a normal boot sequence using the disk
storage that was previously attached to the R/3-Database server.
3. Because the servers are identically configured, when the boot process is completed, the
recovery server assumes the logical network identity that was previously held by the
primary R/3-Database server.
4. The Application Server R/3 instances are restarted.
5. At this point, after restarting the database and R/3, the users can log on to R/3.
With the Compaq automated switchover process, the recovery server becomes the active server
and is back on-line in a matter of minutes, without administrator intervention.
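The detection logic that leads to a switchover amounts to a watchdog timer on the recovery server. The following Python sketch models it. The class and method names are hypothetical, times are plain numbers of seconds supplied by the caller, and the real Recovery Agent runs in the system ROM BIOS rather than as Python code.

```python
class HeartbeatWatchdog:
    """Model of the recovery server waiting for heartbeats from the primary."""

    def __init__(self, timeout_seconds, start_time=0.0):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = start_time
        self.switched_over = False

    def heartbeat(self, now):
        """Record a heartbeat and return the acknowledgment sent back."""
        self.last_heartbeat = now
        return "ACK"

    def poll(self, now):
        """Trigger a switchover once no heartbeat arrived within the time-out."""
        overdue = now - self.last_heartbeat > self.timeout_seconds
        if overdue and not self.switched_over:
            self.switched_over = True
        return self.switched_over


# A 10-minute (600-second) time-out is used here as an example value.
watchdog = HeartbeatWatchdog(timeout_seconds=600)
watchdog.heartbeat(now=30)
print(watchdog.poll(now=500))   # heartbeat still recent: no switchover
print(watchdog.poll(now=631))   # more than 600 s of silence: switchover
```

The sketch deliberately makes the switchover a one-way transition: once it has been triggered, a late heartbeat does not undo it, which mirrors the need to power cycle the configuration before the primary server can resume its role.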
[Figure: Standby Recovery Server configuration. Primary ProLiant server: R/3 and database,
Recovery Server Option Driver, Windows NT, SMART-2 controller. Recovery ProLiant server:
System ROM BIOS Recovery Agent, no OS loaded, SMART-2 controller. The servers are linked by
the network and the Recovery Server Interconnect, and each is connected by a SCSI cable to the
Recovery Server Switch in the ProLiant Storage System, where all storage is located, including
the database and R/3.]
Figure 4 illustrates a standby recovery server configuration after the switchover has occurred.
The recovery server has assumed the function of the R/3-Database server. The R/3-Database
server has completed an ASR reboot and is waiting to be serviced. The effects of server failure
and switchover on clients are discussed later in this paper, in the section entitled “Client
Behavior.” See the Compaq Hardware documentation that came with the server for more
information about the ASR reboot.
If power is lost to both servers in a Standby Recovery Server configuration, the R/3-Database
server will not boot in an unattended manner when the power is restored. An external power
failure of this type will be recorded in the R/3-Database server NVRAM as a server failure
requiring service, not as a power outage. Thus, when the R/3-Database server is powered on, the
administrator is prompted to run diagnostics or to press F8 to continue a normal boot sequence.
This illustrates the importance of an uninterruptible power supply.
If the system is unattended when the power is restored, the recovery server times out, switches the
storage disks, and boots from the disks because the R/3-Database server is not sending the
heartbeat message.
Switchover Time
The Standby Recovery Server is designed for business-critical servers that cannot sustain
periods of downtime exceeding several minutes. The time required for the recovery server to
assume the function of the R/3-Database server is the sum of the following six factors:
1. The time that elapses from the moment at which a failure occurs in the primary processor to
the moment at which that failure manifests itself in the loss of a heartbeat message. This
time period may be very short (a few seconds) in the case of catastrophic failures such as
loss of the processor, or it may be relatively long (several minutes) in the case of certain
software failures.
2. The defined time-out period that the Recovery Agent in the system ROM BIOS waits for a
heartbeat message before initiating a switchover is the ASR time-out value. It is set in the
system configuration with a default value of 10 minutes. Available values range from 5 to 30
minutes.
3. Once a switchover has been initiated, the time required to initialize the SMART
Controllers and begin the Windows NT operating system boot process from the drives,
which by this time are electrically connected to the recovery server. This is typically
between 2 and 4 minutes.
4. The time required for the Windows NT operating system to boot. This is dependent upon
the size and number of disk drives that are attached, but is usually accomplished within 3
minutes.
5. The time required for the database to start and recover from the previous failure once the
Windows NT operating system is active and the time required for R/3 to start and be
available to users. This phase depends on the length of the database recovery period, which
is difficult to predict, but generally takes less than 5 minutes.
6. The time for the users to log in.
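To make the sum concrete, the Python sketch below adds up the six factors using the typical figures quoted above. The failure-manifestation time and the user log-in time are placeholders chosen for illustration, since the paper gives no fixed value for them, so the result is an estimate and not a guarantee. The function and parameter names are hypothetical.

```python
def switchover_minutes(manifest, asr_timeout, controller_init,
                       os_boot, db_and_r3_start, user_login):
    """Sum the six switchover factors, all expressed in minutes."""
    if not 5 <= asr_timeout <= 30:
        raise ValueError("ASR time-out must be between 5 and 30 minutes")
    return (manifest + asr_timeout + controller_init
            + os_boot + db_and_r3_start + user_login)


estimate = switchover_minutes(
    manifest=0.1,         # assumed: heartbeat lost almost immediately
    asr_timeout=10,       # default ASR time-out
    controller_init=4,    # upper end of the typical 2 to 4 minutes
    os_boot=3,            # usually within 3 minutes
    db_and_r3_start=5,    # generally less than 5 minutes
    user_login=1,         # assumed
)
print(round(estimate, 1))
```

With the default values, the ASR time-out is the largest single term in the sum, so the choice of that value has the greatest influence on the total outage.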
Faults
Many factors affect server operation. In the Standby Recovery Server configuration, several types
of faults can occur such as the following:
• Failures of the R/3-Database server, the types of faults for which the Recovery Server Option
was designed
• Loss of heartbeat resulting from serial cable problems, not from server problems
• Failures that affect operation of the R/3-Database server but do not cause a switchover
Failure Detection
The failure detection mechanism in the Standby Recovery Server is based on the Recovery Server
Option Driver software that runs in the R/3-Database server. As long as the recovery server
receives the heartbeat message within the time-out period, it presumes that the R/3-Database
server has not failed. Any failure in the R/3-Database server that stops the Recovery Server
Option Driver from generating the periodic heartbeat message will be a detectable failure.
Examples of detectable failures include:
• Catastrophic and unrecoverable hardware failure in the R/3-Database server such as loss of
the processor or uncorrectable memory errors
• Loss of the R/3-Database server power supply
Generally, any failure that is detected by ASR will be detected and acted upon by the recovery
server.
NOTE: There is a class of failures that causes the R/3-Database server to malfunction
without causing loss of the heartbeat message. For example, failure of the Network
Interface Controller could render the R/3-Database server unusable, but the Recovery
Server Option Driver would still send the heartbeat message to the recovery server.
Failures of this type cannot be detected by the recovery server; therefore, an automatic
switchover will not occur. Generally, the failures detected by the recovery server are the
same ones that are detected by the ASR mechanism.
The Recovery Server Interconnect can experience three types of failures. These failures and the
behavior they cause are described as follows.
NOTE: This discussion assumes that Compaq Insight Manager is being used.
• R/3-Database Server Cable Failure
If the Recovery Server Interconnect is disconnected from the R/3-Database server, the
recovery server cannot receive the heartbeat message. The Recovery Server Option Driver
in the R/3-Database server can detect this condition. It sends an Insight Manager alarm
indicating that the R/3-Database server has detected a cable fault and that it is shutting
down the Windows NT operating system in anticipation of the switchover that will occur
because the recovery server is no longer receiving the heartbeat message.
• Recovery Server Cable Failure
If the Recovery Server Interconnect is disconnected from the recovery server, the recovery
server detects this condition and does not attribute loss of the heartbeat message to failure of
the R/3-Database server. Because the R/3-Database server can no longer receive the
acknowledgment message from the recovery server, however, the R/3-Database server sends
an Insight Manager alarm indicating possible failure of the recovery server.
• Damaged Cable
If the Recovery Server Interconnect is physically cut, the heartbeat message and the
acknowledgment message cannot travel between the R/3-Database server and the recovery
server. Loss of the acknowledgment message causes the R/3-Database server to send an
Insight Manager alarm indicating possible failure of the recovery server. Meanwhile, loss of
the heartbeat message for longer than the time-out period causes the recovery server to
switch the storage disks from the R/3-Database server to the recovery server and then boot.
Upon failure, the R/3-Database server normally becomes totally inactive. In the case of a
damaged cable, however, the R/3-Database server continues running after its connection to
the storage disks has been lost and the recovery server has booted. The Windows NT
operating system on the R/3-Database server can no longer function correctly, but the
network protocol portion of the Windows NT operating system is still active.
When the recovery server boots, it presents the same network identification as that used by
the original R/3-Database server. As a result, network clients might not be able to log in to
the server because both the primary and recovery servers are using the same network
identification.
This type of failure is unlikely and is preventable with simple precautionary steps to protect
the serial cable and its connections. Screw the serial cable down securely; and for maximum
cable protection, rack mount the servers.
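The three interconnect failure cases described above can be summarized as a lookup table. This Python sketch only restates the behavior given in this section. The key names, the dictionary layout, and the function name are hypothetical.

```python
# Behavior of each Recovery Server Interconnect fault, as described above.
INTERCONNECT_FAULTS = {
    "disconnected_at_primary": {
        "alarm": "cable fault detected, shutting down Windows NT",
        "primary_shuts_down": True,
        "switchover": True,
    },
    "disconnected_at_recovery": {
        "alarm": "possible failure of the recovery server",
        "primary_shuts_down": False,
        "switchover": False,
    },
    "cable_cut": {
        "alarm": "possible failure of the recovery server",
        "primary_shuts_down": False,
        "switchover": True,
    },
}


def duplicate_identity_risk(fault):
    """True when a switchover happens while the primary server keeps running."""
    behavior = INTERCONNECT_FAULTS[fault]
    return behavior["switchover"] and not behavior["primary_shuts_down"]


for name in INTERCONNECT_FAULTS:
    print(name, duplicate_identity_risk(name))
```

Only the damaged-cable case combines a switchover with a primary server that is still running, which is why it is the one case that can leave two servers presenting the same network identification.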
Servicing the Failed Server
To re-establish Standby Recovery Server operation after a switchover, a failed R/3-Database
server must be repaired or replaced and brought back on-line. The Standby Recovery Server
makes it possible for the system administrator to schedule service on the R/3-Database server at
a convenient time while the recovery server is active. The R/3-Database server hardware can be
serviced on site or off site.
Once a switchover occurs, no drives are electrically attached to the disk controllers in the
R/3-Database server. For this reason, there might be some constraints on diagnostic activities
that can be performed on the failed R/3-Database server on site. However, by disconnecting the
R/3-Database server from the Recovery Server Interconnect and adding other drives to the
R/3-Database server, full on-site diagnosis can be performed on the failed R/3-Database server
while the recovery server is running. The failed R/3-Database server can also be disconnected
from the recovery server and the ProLiant Storage System and moved off site for service.
Restoring the Configuration After Switchover
After the R/3-Database server has been serviced or replaced, restore the original configuration.
The recovery server and ProLiant Storage Systems must be power cycled to reinitialize the
Recovery Server Switch. The disk drives will be electrically connected to the original
R/3-Database server and it will boot the Windows NT operating system. The recovery server will
return to its role of listening for the heartbeat message from the R/3-Database server.
After setting up and configuring both the primary and recovery servers, verify that both servers
operate correctly and will switch over when needed.
Client Behavior
When a failure of the R/3-Database server occurs, users attached to it experience a service
outage. The length of this outage is described previously in the section “Switchover Time.” The
symptoms experienced by the users vary depending on whether their dialog instance was on the
R/3-Database server or on a dedicated application server. In the former case, an error message
displays, communicating to the user that the application server has been shut down. In the latter
case, the SAPGUI will become unresponsive.
Disk Subsystem Considerations
The following sections discuss disk subsystem considerations, which include disk integrity, disk
volume configuration for Windows NT, and performance considerations.
Disk Integrity
Failure of the R/3-Database server can be caused by several different conditions ranging from
software faults in the Windows NT operating system to hardware failure. Depending on the
nature of the fault and the disk activities occurring at the time of the fault, the disk data
structures can be corrupted and might require corrective processing before the recovery server
boots the Windows NT operating system. In Microsoft Windows NT 3.5x, the disk integrity
check and corrective processing are performed automatically.
Disk Volume Configuration for Windows NT 3.5x
Compaq recommends that the NTFS file system be used for all Windows NT disk partitions.
Additionally, Compaq recommends that the Windows NT system disk and other executables be
placed on a separate SMART or SMART-2 controller logical drive. Use other logical drives to
Performance Considerations
In a Standby Recovery Server configuration, the Array Accelerator, which serves as a read/write
cache for I/O requests directed to the SMART or SMART-2 Array Controller, must be disabled
when using a SMART controller or changed to 100% read cache when using a SMART-2
controller.
For the SMART controller, the Compaq System Configuration Utility automatically disables the
Array Accelerator when the SMART controller is attached to switchable disks in a Standby
Recovery Server configuration. For the SMART-2 controller, the Compaq Array Configuration
Utility automatically changes the Array Accelerator to 100% read cache when the SMART-2
controller is attached to switchable disks in a Standby Recovery Server configuration.
The system performance impact of changing the Array Accelerator configuration is determined
by the interaction of the controllers with software and other hardware in the system and by tuning
of the system. As a result, the performance of the overall system(s) needs to be considered to
determine if adjustments are required to compensate for this factor. In certain cases, changing the
Array Accelerator configuration will degrade system(s) performance.
For example, the database might be tuned so that it is processor constrained and not I/O
constrained. In this case, enabling or disabling the SMART Controller Array Accelerator would
have little effect on overall system performance. However, in an I/O-constrained system, disabling
the Array Accelerator would lower the system performance. In all cases, system performance
should be considered when planning for a Standby Recovery Server configuration.
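The Array Accelerator rule described in this section reduces to a two-entry mapping. The Python sketch below restates it. The function name and the controller labels used as keys are hypothetical, and the actual setting is applied by the Compaq configuration utilities, not by code like this.

```python
# Required Array Accelerator setting for controllers attached to switchable disks.
REQUIRED_ACCELERATOR_SETTING = {
    "SMART": "disabled",
    "SMART-2": "100% read cache",
}


def accelerator_setting_ok(controller_type, current_setting):
    """Return True when the controller's Array Accelerator setting matches the rule."""
    return REQUIRED_ACCELERATOR_SETTING[controller_type] == current_setting


print(accelerator_setting_ok("SMART-2", "100% read cache"))
print(accelerator_setting_ok("SMART", "enabled"))
```

A check of this kind could be run against both the primary and the recovery server, since the setting must be correct on each of them.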
SETTING UP A STANDBY RECOVERY SERVER
The following sections discuss setting up a Standby Recovery Server, including information
on system configuration, testing the configuration, and R/3-Database-specific settings.
System Configuration
The primary and the recovery server must have identical hardware configurations, including
identical slot locations of all controller boards. Any time you change the configuration of the
primary and the recovery server, you must run the Compaq System Configuration Utility on each system.
NOTE: When configuring the SMART controller that is connected to the Recovery
Server Switch, set the Array Accelerator Status to Disabled on both the primary server
and the recovery server. When configuring the SMART-2 controller that is connected to
the Recovery Server Switch, set the Array Accelerator to 100% read cache on both the
primary server and the recovery server. Failure to properly configure the Array
Accelerators could result in the disk drives attached to the controller becoming corrupted
after returning to the primary server from the recovery server.
When configuring the recovery server, be sure to set the ASR time-out value higher than the total
time required for the primary server to boot and become operational.
A finite amount of time is required for the primary server to boot the operating system and
become operational. If the Automatic Server Recovery (ASR) time-out value is set for less than
that amount of time, then the recovery server times out and triggers a switchover, even though no
server failure has occurred.
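The ASR time-out rule above can be expressed as a small check. The Python sketch below is illustrative only: the function name is hypothetical, the boot time is something you would measure on your own primary server, and the 5 to 30 minute range is the one quoted in the discussion of switchover time.

```python
ASR_MIN_MINUTES = 5
ASR_MAX_MINUTES = 30


def asr_timeout_is_safe(asr_timeout_minutes, primary_boot_minutes):
    """Check an ASR time-out against the allowed range and the primary's boot time.

    A time-out at or below the boot time would let the recovery server time out
    and switch over while a healthy primary server is still starting.
    """
    in_range = ASR_MIN_MINUTES <= asr_timeout_minutes <= ASR_MAX_MINUTES
    return in_range and asr_timeout_minutes > primary_boot_minutes


print(asr_timeout_is_safe(10, primary_boot_minutes=7))   # default time-out, 7-minute boot
print(asr_timeout_is_safe(5, primary_boot_minutes=7))    # too short for this server
```

In practice it seems prudent to leave some margin above the measured boot time, because database recovery after an unclean shutdown can lengthen startup.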
If the original, verified system configuration is changed, it is necessary to reconfigure the system
and to verify that the new configuration is correct. For example, if you add a disk drive, you must
reconfigure the system.
To reconfigure the system, follow these steps:
1. Shut down the application software and operating system on the primary server.
2. Turn off the primary server.
3. Turn off the recovery server.
4. Turn off the ProLiant Storage System(s).
5. Make the hardware changes: Add or remove disks, add or remove adapter cards, etc.
6. Power on the ProLiant Storage System(s).
7. Power on the primary server.
8. Run the System Configuration Utility to configure the primary server if necessary. If using a
SMART Controller, ensure that the Array Accelerators are disabled. If using a SMART-2
Controller, ensure that the Array Accelerators are set to 100% read cache.
NOTE: If you are using a SMART-2 Controller and you have made changes to the disk
configuration, you will need to run the Compaq Array Configuration Utility to configure the
Array Accelerator setting.
9. Verify that the application software and the operating system are functioning correctly.
10. Shut down the application software and the operating system on the primary server.
11. Turn off the primary server.
12. Turn on the recovery server.
13. Press the F8 key on the recovery server to switch the storage disks manually to the
recovery server.
14. Run the System Configuration Utility to configure the recovery server if necessary. Verify
that all SMART Controller Array Accelerators are disabled. Verify that all SMART-2
Controller Array Accelerators are set to 100% read cache.
NOTE: If you are using a SMART-2 Controller and you have made changes to the disk
configuration, you will need to run the Compaq Array Configuration Utility to configure the
Array Accelerator setting.
15. Verify that the application software and the operating system are functioning correctly.
16. Shut down the application software and the operating system on the recovery server.
17. Turn off the recovery server.
18. Turn off the ProLiant Storage System(s).
19. Turn on the ProLiant Storage System(s).
20. Turn on the primary server.
21. Turn on the recovery server.
22. The primary server should boot. The recovery server should begin monitoring the
primary server.
23. Test the configuration to verify that it will switch over properly to the recovery server.
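The power-related steps in the procedure above follow a consistent ordering, which the Python sketch below replays as a simple simulation. Two properties are checked: the hardware changes in step 5 happen only while every device is off, and the storage system is never off while a server is on. The second property is an observation drawn from the order of the steps, not a rule stated in this paper. The names POWER_STEPS and check_sequence, and the assumption that all three devices are powered on before step 1, are illustrative.

```python
# (step number, device, powered on?) for the power-related steps only.
POWER_STEPS = [
    (2, "primary", False),
    (3, "recovery", False),
    (4, "storage", False),
    (6, "storage", True),
    (7, "primary", True),
    (11, "primary", False),
    (12, "recovery", True),
    (17, "recovery", False),
    (18, "storage", False),
    (19, "storage", True),
    (20, "primary", True),
    (21, "recovery", True),
]

# Assumed starting point: everything is running before step 1.
ALL_ON = {"primary": True, "recovery": True, "storage": True}


def check_sequence(steps, initial, hardware_change_step=5):
    """Replay the power steps and return True if both properties hold."""
    state = dict(initial)
    hardware_change_checked = False
    for number, device, powered in steps:
        # On the first step after the hardware change, confirm that
        # everything was powered off while the change was made.
        if not hardware_change_checked and number > hardware_change_step:
            if any(state.values()):
                return False
            hardware_change_checked = True
        state[device] = powered
        # Observed property: no server runs while storage is off.
        if (state["primary"] or state["recovery"]) and not state["storage"]:
            return False
    return hardware_change_checked
```

Calling check_sequence(POWER_STEPS, ALL_ON) returns True for the order given in the procedure. Reordering the list so that the storage system is turned off before the recovery server makes the check fail.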
Testing the Configuration
Once you have set up and configured both the primary and recovery servers, you must verify that
both servers operate correctly and will switch over when needed. You can perform a switchover
test using either of two methods:
• Recommended Switchover Test
• Alternate Switchover Test
Recommended Switchover Test Method
Compaq recommends testing a configuration by powering down the primary server while the
operating system is running. This allows the recovery server to detect that the primary server is
not available, to switch access to the storage disks from the primary server to the recovery server,
and to boot the operating system on the recovery server.
To perform this test, turn off the primary server while it is active with the operating system and
applications. After the recovery server ASR time-out period expires, the recovery server switches
the storage system(s) from the primary to the recovery server. The recovery server then boots
from the storage disks. This test verifies the configuration and demonstrates the effect of the
failure and switchover event.
Alternate Switchover Test Method
You can also perform a manual switchover from the primary server to the recovery server.
To perform this test, follow these steps:
1. Shut down the operating system and power off the system on the primary server.
2. Press the F8 key while this message displays on the recovery server:
Press F8 to switch now.
3. Press the Y key to confirm your selection on the recovery server.
After a brief period, the recovery server boots the operating system and assumes the role of the
primary server. If the recovery server does not boot, check your configuration and repeat the test.
R/3 Software Specific Settings
Because this solution is application-independent, no special SAP R/3 or database-specific
configuration is required. The surviving application servers must, of course, stop and restart their
SAP application services to bind to the physically different machine. However, ensure that R/3 and
the database start automatically at boot time so that no intervention is required after a switchover.
You can ensure this by developing a script to perform the necessary tasks and installing it as a
service with Automatic startup. You can set this up with a Microsoft Windows NT Resource Kit
tool called SRVANY.
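As an illustration of the kind of script described above, the following Python sketch starts a list of Windows NT services in order by calling the standard net start command. It is a sketch under stated assumptions, not the script used in this paper: the service names are placeholders, starting the database before R/3 is an assumed ordering, and a batch file would serve equally well. The dry-run default only prints the commands, so the logic can be reviewed before anything is started.

```python
import subprocess

# Placeholder names: replace them with the service names shown in the
# Services applet for your database and SAP R/3 installation.
# Assumed order: the database is started before R/3.
SERVICES_IN_START_ORDER = ["ExampleDatabaseService", "ExampleSAPService"]


def build_start_commands(services):
    """Return one `net start <service>` argument list per service, in order."""
    return [["net", "start", name] for name in services]


def start_all(services, dry_run=True):
    """Start each service in order; with dry_run=True only print the commands."""
    for command in build_start_commands(services):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if `net start` fails,
            # so a failed database start stops the sequence.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    start_all(SERVICES_IN_START_ORDER, dry_run=True)
```

Stopping at the first failure is a deliberate choice in this sketch: starting R/3 against a database that did not come up would only hide the original error.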
The following steps describe the installation procedure:
1. Copy SRVANY.EXE to your system and install it as a Windows NT service with a
meaningful name, for example:
INSTSRV R3UP c:\reskit35\srvany.exe
2. Configure the service for Automatic startup via the Services applet ("Startup..." dialog) of the Control Panel.
3. Set the account for the service (the SAP administrator) via the Services applet ("Startup..."
dialog) of the Control Panel.
4. Run the Registry Editor (REGEDT32.EXE):
a) Create a “Parameters” key under the following: