The information contained herein is subject to change without notice.
The only warranties for HP products and services are set forth in the express warranty statements
accompanying such products and services. Nothing herein should be construed as constituting an additional
warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Printed in the US.
Confidential computer software. Valid license from HP required for possession, use or copying. Consistent
with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and
Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard
commercial license.
Trademark Notices
UNIX is a registered trademark of The Open Group.
Printing History
The printing date and part number indicate the current edition. The printing date changes when a new
edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to
change.) The part number changes when extensive technical changes are incorporated.
New editions of this manual will incorporate all material updated since the previous edition.
May 2005 Edition 7
June 2004 Edition 6
December 2003 Edition 5
July 2003 Edition 4
April 2003 Edition 3
February 2003 Edition 2
September 2001 Edition 1
Internal Date: July 17, 2001
This guide is intended for use by system administrators and others involved in managing HP-UX system
hardware resources. It describes the installation and use of the Event Monitoring Service (EMS) Hardware
Monitors, an important tool in managing the operation and health of system hardware resources.
The book is organized as follows:
• Chapter 1, “Introduction,” provides a foundation for understanding what the hardware monitors are and
how they work. This material will help you use the hardware event monitors efficiently.
• Chapter 2, “Installing and Using Monitors,” describes the procedures for creating and managing
monitoring requests.
• Chapter 3, “Detailed Description,” gives a detailed picture of the components involved in hardware
monitoring, their interaction, and the files involved.
• Chapter 4, “Using the Peripheral Status Monitor,” covers the Peripheral Status Monitor (PSM), which
serves as the interface between the event-driven hardware event monitors and MC/ServiceGuard.
• Chapter 5, “Hardware Monitor Configuration Files,” describes how to control the operation of hardware
monitors by modifying the configuration files.
NOTE The information previously contained in the chapter titled “Monitor Data Sheets” has been
moved to the Web at http://docs.hp.com/hpux/onlinedocs/diag/ems/emd_summ.htm.
An HP-UX man page is available for each monitor. To access the man page, type:
man monitorname
where monitorname is the executable file listed in the data sheet.
Typographical Conventions
This guide uses the following typographical conventions:
NOTE Notes contain important information.
CAUTION Caution messages indicate procedures which, if not observed, could result in damage to your
equipment or loss of your data.
WARNING Warning messages indicate procedures or practices which, if not observed, could
result in personal injury.
Supporting Documentation
The following documentation contains information related to the installation and use of the hardware event
monitors:
• Support Plus: Diagnostics User's Guide - provides information on installing the EMS Hardware Monitors
• Managing MC/ServiceGuard (B3936-90024) - provides information on creating package dependencies for
hardware resources
• Using EMS HA Monitors (B5735-90001) - provides detailed information on using EMS to create
monitoring requests.
Note: This manual pertains to High Availability (HA) Monitors rather than to the EMS Hardware
Monitors.
Related Web sites
The following Web sites provide information on hardware monitoring.
• http://docs.hp.com/en/diag.html - the online library for information about EMS Hardware Monitors
• http://docs.hp.com/en/onlinedocs/diag/ems/emd_summ.htm - data sheets for the hardware event
monitors
Reader Comments
We welcome your comments on our documentation. If you have editorial suggestions or recommended
improvements for this document, please write to us. You can give your feedback at the online customer
feedback web site http://www.docs.hp.com/en/feedback.html. Please include the following information in
your message:
• Title of the manual you are referencing.
• Manual part number (from the title page).
• Edition number or publication date (from the title page).
• Your name.
• Your company’s name.
Serious errors, such as technical inaccuracies that may render a program or a hardware device inoperative,
should be reported to the HP Response Center or directly to a Support Engineer.
Introduction
1 Introduction
This chapter introduces the EMS Hardware Monitors. The topics discussed in this chapter include the
following:
• What is hardware monitoring?
• How does hardware monitoring work?
• Benefits of hardware monitoring
• Products supported by hardware monitoring
• Tips for hardware monitoring
• Hardware monitoring terms
NOTE Do I Really Need to Read This Chapter?
Although it is not essential that you read this material before using the hardware monitors, it
will help you understand how monitoring works, which in turn should help you use it
effectively. New users are strongly encouraged to read through the general overview material
before proceeding to Chapter 2, “Installing and Using Monitors”.
Chapter 1
Hardware Monitoring Overview
What is Hardware Monitoring?
Hardware monitoring is the process of watching a hardware resource (such as a disk) for the occurrence of
any unusual activity, called an event. When an event occurs, it is reported using a variety of notification
methods (such as email). Event detection and notification are all handled automatically with minimal
involvement on your part.
To achieve a high level of system reliability and availability, it is essential that you know when any system
resource is experiencing a problem. Hardware monitoring gives you the ability to detect problems with your
system hardware resources. By providing immediate detection and notification, hardware monitoring allows
you to quickly identify and correct problems—often before they impact system operation.
Another important feature of hardware monitoring is its integration with applications responsible for
maintaining system availability, such as MC/ServiceGuard. It is vital that these applications be alerted to
hardware problems immediately so they can take the necessary action to avoid system interruption.
Hardware monitoring is easily integrated with MC/ServiceGuard, and the necessary notification methods are
provided for communication with other applications such as HP OpenView.
Hardware monitoring is designed to provide a high level of protection against system hardware failure with
minimal impact on system performance. By using hardware monitoring, you can virtually eliminate
undetected hardware failures that could interrupt system operation or cause data loss.
How Does Hardware Monitoring Work?
The following figure shows the basic components involved in hardware monitoring.
Figure 1-1 Components Involved in Hardware Monitoring
The typical hardware monitoring process works as follows:
1. While monitoring its hardware resources, the hardware event monitor detects some type of abnormal
behavior on one of the resources.
2. The hardware event monitor creates the appropriate event message, which includes suggested corrective
action, and passes it to the Event Monitoring Service (EMS).
3. EMS sends the event message to the system administrator using the notification method specified in the
monitoring request.
4. The system administrator (or Hewlett-Packard service provider) receives the messages, corrects the
problem, and returns the hardware to its normal operating condition.
5. If the Peripheral Status Monitor (PSM) has been properly configured, events are also processed by the PSM.
The PSM changes the device status to DOWN if the event is serious enough. The change in device status is
passed to EMS, which in turn alerts MC/ServiceGuard. The DOWN status causes MC/ServiceGuard to fail
over any package associated with the failed hardware resource.
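The flow in steps 1 through 3 can be sketched in a few lines of Python. This is a simplified model for illustration only, not the EMS interface: the class and function names, the hardware path, and the message text are hypothetical. It shows a monitor building an event message that carries suggested corrective action, and an EMS-like dispatcher delivering that message through each notification method named in the monitoring request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EventMessage:
    """Event message built by a hypothetical hardware event monitor (steps 1 and 2)."""
    resource: str          # hardware path of the affected device
    severity: str
    description: str
    suggested_action: str  # the message includes suggested corrective action


def email_method(event: EventMessage) -> str:
    # Full-content method: delivers the whole message, including the suggested action.
    return (f"email: {event.severity} event on {event.resource}: "
            f"{event.description}. Suggested action: {event.suggested_action}")


def console_method(event: EventMessage) -> str:
    # Alert-only method: announces the event but omits the details.
    return f"console: {event.severity} event on {event.resource}"


class EventDispatcher:
    """Plays the role of EMS in step 3: deliver the event through each requested method."""

    def __init__(self) -> None:
        self.methods: Dict[str, Callable[[EventMessage], str]] = {
            "email": email_method,
            "console": console_method,
        }

    def deliver(self, event: EventMessage, requested: List[str]) -> List[str]:
        return [self.methods[name](event) for name in requested]


event = EventMessage("10_12_5.0.0", "CRITICAL", "disk not responding",
                     "check the disk's power and cabling")
for notice in EventDispatcher().deliver(event, ["email", "console"]):
    print(notice)
```

In this sketch the email method carries the whole message while the console method only announces the event, mirroring the difference between full-content and alert-only notification methods described in “Tips for Hardware Monitoring.”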
NOTE The Difference Between Hardware Event Monitoring and Hardware Status
Monitoring
Hardware event monitoring is the detection of events experienced by a hardware resource. It is
the task of the EMS Hardware Monitors to detect hardware events. Events are temporary in
the sense that the monitor detects them but does not remember them. Of course, the event itself
may not be temporary; a failed disk will likely remain failed until it is replaced.
Hardware status monitoring is an extension of event monitoring that converts an event to a
change in device status. This conversion, performed by the PSM, provides a mechanism for
remembering the occurrence of an event by storing the resultant status. This persistence
provides compatibility with applications such as MC/ServiceGuard, which require a change in
device status to manage high availability packages.
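The difference between event monitoring and status monitoring can be illustrated with a small Python model. It is a sketch under stated assumptions, not the PSM's actual implementation: the class names are hypothetical, and treating SERIOUS and CRITICAL as the severities that mark a device DOWN is an assumption made for this example. The event monitor returns a report and forgets it, while the status converter stores a per-device status that persists across later events.

```python
from typing import Dict


class EventMonitor:
    """Detects and reports events, but keeps no record of them."""

    def report(self, resource: str, severity: str) -> Dict[str, str]:
        return {"resource": resource, "severity": severity}


class StatusConverter:
    """PSM-like component: converts events into a remembered per-device status."""

    # Assumption for this sketch: these severities are serious enough to mark a device DOWN.
    DOWN_SEVERITIES = {"SERIOUS", "CRITICAL"}

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}

    def process(self, event: Dict[str, str]) -> str:
        resource = event["resource"]
        if event["severity"] in self.DOWN_SEVERITIES:
            self.status[resource] = "DOWN"
        else:
            # A less severe event never overwrites an existing status.
            self.status.setdefault(resource, "UP")
        return self.status[resource]


monitor = EventMonitor()
psm = StatusConverter()
psm.process(monitor.report("10_12_5.0.0", "CRITICAL"))
psm.process(monitor.report("10_12_5.0.0", "INFORMATION"))
print(psm.status["10_12_5.0.0"])  # DOWN
```

Because the stored status survives later, less severe events, an application such as MC/ServiceGuard has a lasting condition to act on. The sketch deliberately has no path back to UP; it only illustrates persistence.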
Benefits of Hardware Monitoring
Hardware monitoring provides the following benefits:
• Reduces system downtime by detecting hardware failures when they occur, allowing you to quickly
identify and correct problems.
• Integrates with MC/ServiceGuard and other applications responsible for maintaining system availability.
These applications can now add many hardware resources to the components they monitor.
• Minimizes the time required to isolate and repair failures through detailed messages describing what the
problem is and how to fix it.
• Includes a default monitoring configuration that offers immediate protection for your system hardware
without any intervention on your part after monitoring is enabled.
• Provides a common tool for monitoring a wide variety of system hardware resources.
• Offers a variety of notification methods to alert you when a problem occurs. You no longer need to check
the system console to determine if something has gone wrong.
• Requires minimal maintenance once installed and configured. New hardware resources added to the
system are automatically included in the monitoring structure.
Products Supported by Hardware Monitors
EMS Hardware Monitors are provided for a wide range of system hardware resources. The following list
identifies the types of hardware supported by monitors at the time of publication. A detailed list of the specific
hardware products supported by each hardware monitor is available at http://docs.hp.com/en/diag/, the
online library for information about EMS Hardware Monitors (look for “Supported Products” under EMS
Hardware Monitors).
• HP disk arrays, including AutoRAID Disk Arrays and High Availability Disk Arrays
• HP disk devices, including CD-ROM drives and MO drives
• HP SCSI tape devices, including many DLT libraries and autochangers
• HP Fibre Channel SCSI Multiplexer
• HP Fibre Channel Adapters
• HP Fibre Channel Adapter (A5158)
• High Availability Storage Systems
• HP Fibre Channel Arbitrated Loop Hubs
• HP Fibre Channel Switch
• System memory
• Core hardware
• Low Priority Machine Checks (LPMCs)
• HP-UX kernel resources
• HP Fibre Channel disk array FC60
• SCSI1, SCSI2, SCSI3 interface cards
• System information
• HP UPSs (Uninterruptible Power Systems)
• Devices supported by HP device management software (Remote Monitor)
NOTE Will new products be supported?
Hewlett-Packard's strategy is to provide monitoring for all critical system hardware resources,
including new products. For the latest information on what products are supported by EMS
Hardware Monitors, visit the hardware monitoring web pages available at
www.docs.hp.com/en/diag/ - the online library for information about EMS Hardware
Monitors (look for "Supported Products" under EMS Hardware Monitors).
Tips for Hardware Monitoring
Here are some tips for using hardware monitoring.
✓ Keep hardware monitoring enabled to protect your system from undetected failures. Hardware
monitoring is an important tool for maintaining high availability on your system. In a high-availability
environment, the failure of a hardware resource makes the system vulnerable to another failure. Until
the failed hardware is repaired, the backup hardware resource represents a single point of failure.
Without hardware monitoring, you may not be aware of the failure. With hardware monitoring, you are
alerted to the failure, so you can repair it and restore high availability as quickly as possible.
✓ Integrate the PSM into your MC/ServiceGuard strategy. An important feature of hardware
monitoring is its ability to communicate with applications responsible for maintaining system
availability, such as MC/ServiceGuard. The PSM allows you to integrate hardware monitoring into
MC/ServiceGuard by giving you the ability to fail over a package based on an event detected by
hardware monitoring. If you are using MC/ServiceGuard, you should consider using the PSM to include
your system hardware resources in the MC/ServiceGuard strategy. In addition, the necessary notification
methods are provided for communicating with network management applications such as HP OpenView.
✓ Utilize the many notification methods available. The notification methods provided by hardware
monitoring provide a great deal of flexibility in designing a strategy to keep you informed of how well your
system hardware is working. The default monitoring configuration was selected to provide a variety of
notification methods for all supported hardware resources. As you become familiar with hardware monitoring, you
may want to customize the monitoring to meet your individual requirements.
✓ Use e-mail and/or text file notification methods for all your requests. Both of these methods,
which are included in the default monitoring, receive the entire content of the message so you can read it
immediately. Methods such as console and syslog alert you to the occurrence of an event but do not deliver
the entire message. You are required to retrieve the message using the resdata utility, which requires an
additional step.
✓ Use the “All monitors” option when creating a monitoring request. This option applies the monitoring
request to all monitors. It ensures that any new class of hardware resource added to your system is
automatically monitored, so new hardware is protected from undetected hardware failure with no effort on
your part.
✓ Easily replicate your hardware monitoring on all your systems. Once you have implemented a
hardware monitoring strategy on one of your systems, you can replicate that same monitoring on other
systems. Simply copy all of the hardware monitor configuration files to each system that will use the same
monitoring. The monitor configuration files are found at /var/stm/config/tools/monitor. Of course,
you must have installed hardware event monitoring on each system before you copy the configuration files
to it. Be sure to enable monitoring on all systems.
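If you script the replication step, a minimal Python sketch might look like the following. It assumes a Python 3 interpreter is available on the source system. The function name copy_monitor_configs and any staging directory you pass to it are hypothetical; the sketch copies only regular files at the top level of /var/stm/config/tools/monitor, which assumes that is where all of the configuration files live, and it leaves transferring the files to the other systems to whatever method you normally use.

```python
import shutil
from pathlib import Path
from typing import List

# Location of the hardware monitor configuration files, as given above.
MONITOR_CONFIG_DIR = Path("/var/stm/config/tools/monitor")


def copy_monitor_configs(source: Path, destination: Path) -> List[str]:
    """Copy every regular file in `source` into `destination` (hypothetical helper).

    Subdirectories are skipped; the sketch assumes the configuration files
    sit directly in `source`.
    """
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            # copy2 also copies permission bits and timestamps.
            shutil.copy2(item, destination / item.name)
            copied.append(item.name)
    return copied
```

For example, copy_monitor_configs(MONITOR_CONFIG_DIR, Path("/tmp/monitor_config_staging")) stages the files for transfer. Hardware event monitoring must still be installed on each target system before the files are copied there, and monitoring must be enabled on every system.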
Hardware Monitoring Terms
The following terms are used throughout this guide. Understanding them is important when learning how
the hardware event monitors work and how to use them effectively.
Table 1-1 Hardware Monitoring Terms
Term | Definition
Asynchronous event detection | The ability to detect an event at the time it occurs. When an event occurs, the monitor is immediately aware of it. This method provides quicker notification response than polling.
Default monitoring request | The default monitoring configuration created when the EMS Hardware Monitors are installed. The default requests ensure that a complete level of protection is automatically provided for all supported hardware resources.
Event Monitoring Service (EMS) | The application framework used for monitoring system resources on HP-UX 10.20 and 11.x. EMS Hardware Monitors use the EMS framework for reporting events and creating PSM monitoring requests. The EMS framework is also used by EMS High Availability Monitors.
EMS Hardware Monitors | The monitors described in this manual. They monitor hardware resources such as I/O devices (disk arrays, tape drives, etc.), interface cards, and memory. They are distributed on the Support Plus Media and are managed with the Hardware Monitoring Request Manager (monconfig).
EMS High Availability (HA) Monitors | These monitors are different from EMS Hardware Monitors and are not described in this manual. They monitor disk resources, cluster resources, network resources, and system resources. They are designed for a high availability environment and are available at additional cost. For more information, refer to Using EMS HA Monitors, which is available at http://docs.hp.com/en/ha.html.
Event severity level | Each event that occurs within the hardware is assigned a severity level, which reflects the impact the event may have on system operation. The severity levels provide the mechanism for directing event notification. For example, you may choose a notification method for critical events that will alert you immediately to their occurrence, and direct less important events to a log file for examination at your convenience. Also, when hardware monitoring is used with MC/ServiceGuard to determine failover criteria, severe and critical events cause failover.
Hardware event | Any unusual or notable activity experienced by a hardware resource, for example, a disk drive that is not responding or a tape drive that does not have a tape loaded. When any such activity occurs, the occurrence is reported as an event to the event monitor.
Hardware event monitor | A monitor daemon that gathers information on the operational status of hardware resources. Each monitor is responsible for watching a specific group or type of hardware resources. For example, the tape monitor handles all tape devices on the system. The monitor may use polling or asynchronous event detection for tracking events. Unlike a status monitor, an event monitor does not “remember” the occurrence of an event. It simply detects and reports the event. An event can be converted into a more permanent status condition using the PSM.
Hardware resource | A hardware device used in system operation. Resources supported by hardware monitoring include mass storage devices such as disks and tapes, connectivity devices such as hubs and multiplexors, and device adapters.
MC/ServiceGuard | Hewlett-Packard's application for creating and managing High Availability clusters of HP 9000 Series 800 computers. A High Availability computer system allows application services to continue in spite of a hardware or software failure. Hardware monitoring integrates with MC/ServiceGuard to ensure that hardware problems are detected and reported immediately, allowing MC/ServiceGuard to take the necessary action to maintain system availability. MC/ServiceGuard is available at additional cost.
Monitoring request | A group of settings that define how events for a specific monitor are handled by EMS. A monitoring request identifies the severity levels of interest and the type of notification method to use when an event occurs. A monitoring request is applied to each hardware device (or instance) supported by the monitor. Monitoring requests for hardware events are created using the Hardware Monitoring Request Manager; monitoring requests for changes in hardware status are created using the EMS GUI.
Multiple-view | As of the HP-UX 11.00/10.20 June 2000 release (IPR 0006), certain monitors allow event reporting to be tailored for different targets (clients). This “multiple-view” (“Predictive-enabled”) feature will be added to all hardware monitors in future releases. Previously, hardware monitors generated events the same way for all targets, which was a limitation because different targets, such as HP Support Applications, may have different requirements for events.
Peripheral Status Monitor (PSM) | Included with the hardware event monitors, the PSM is a monitor daemon that acts as a hardware status monitor by converting events to changes in hardware resource status. This provides compatibility with MC/ServiceGuard, which uses changes in status to manage cluster resources. Through the EMS GUI, the PSM is also used to create hardware status monitoring requests.
Polling | The process of connecting to a hardware resource at regular intervals to determine its status. Any events that occur between polling intervals will not be detected until the next poll, unless the monitor supports asynchronous event monitoring.
Predictive-enabled | See “multiple-view.” This feature enables hardware monitors to work with HP Support Applications.
Resource instance | A specific hardware device. The resource instance is the last element of the resource path and is typically the hardware path to the resource (e.g., 10_12_5.0.0), but it may also be a product ID, as in the case of AutoRAID disk arrays. There may be multiple instances for a monitor, each one representing a unique hardware device for which the monitor is responsible.
Resource path | Hardware event monitors are organized into classes (and subclasses) for creating monitoring requests. These classes identify the unique path to each hardware resource supported by the monitor. Two similar resource paths exist for each hardware resource: an event path used for creating event monitoring requests, and a status path used for creating PSM monitoring requests.
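The interaction between event severity levels and monitoring requests described in Table 1-1 can be modeled as a simple filter. In this Python sketch, the five severity names and their ordering, the MonitoringRequest class, and the method names are assumptions for illustration; they are not taken from the Hardware Monitoring Request Manager. Each request names the severity levels it is interested in and one notification method, following the example in the “Event severity level” entry: serious and critical events go to email, and less important events go to a log file.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet, List


class Severity(IntEnum):
    # Assumed severity names, ordered from least to most severe.
    INFORMATION = 1
    MINOR_WARNING = 2
    MAJOR_WARNING = 3
    SERIOUS = 4
    CRITICAL = 5


@dataclass(frozen=True)
class MonitoringRequest:
    """Hypothetical request: the severity levels of interest plus one notification method."""
    severities: FrozenSet[Severity]
    method: str


def route(severity: Severity, requests: List[MonitoringRequest]) -> List[str]:
    """Return the notification method of every request interested in this severity."""
    return [r.method for r in requests if severity in r.severities]


requests = [
    # Serious and critical events: alert someone immediately.
    MonitoringRequest(frozenset({Severity.SERIOUS, Severity.CRITICAL}), "email"),
    # Everything less severe: keep in a log file for later review.
    MonitoringRequest(frozenset(s for s in Severity if s < Severity.SERIOUS), "textfile"),
]

print(route(Severity.CRITICAL, requests))       # ['email']
print(route(Severity.MINOR_WARNING, requests))  # ['textfile']
```

A real monitoring request is also applied to every resource instance the monitor supports; that expansion is omitted here to keep the filtering logic visible.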
Installing and Using Monitors
2 Installing and Using Monitors
This chapter explains how to use the EMS Hardware Monitors to manage your hardware resources. The
topics discussed in this chapter include:
• An overview of the steps involved
• Installing EMS Hardware Monitors
• Adding and managing monitor requests
• Disabling and enabling EMS Hardware Monitors
NOTE You don't need to understand every term and concept before you begin protecting your system
with EMS Hardware Monitors using the procedures in this chapter. If a term or concept puzzles you,
refer to Chapter 1, “Introduction,” or to Chapter 3, “Detailed Description.”
Chapter 2
The Steps Involved
The steps involved in installing and configuring hardware monitoring are shown in Figure 2-1 on page 27.
Each step is described in detail in this chapter on the page indicated. Installing the Support Tools is
necessary only if you have a Diagnostic/IPR Media release earlier than the June 1999 release. With HP-UX 11i,
the Support Tools are automatically installed when the OS is installed.
Step 1: Install the Support Tools from the most current Support Plus Media available. You can
also download this package over the Web. See “Installing EMS Hardware Monitors”. This step is
necessary only if you have a Diagnostic/IPR Media release earlier than the June 1999 release.
Step 2: Examine the list of supported products to see if any of your devices has special requirements in
order to be monitored. For example, if monitoring FC-AL hubs, edit the file:
/var/stm/config/tools/monitor/dm_fc_hub. See “Fibre Channel Arbitrated Loop Hub Monitor”.
Step 3: Enable hardware event monitoring. See “Enabling Hardware Event Monitoring”. This step is
necessary only if you have a Diagnostic/IPR Media release earlier than the June 1999 release.
Step 4: Determine whether default monitoring requests are adequate. See “Viewing Current Monitoring
Requests”.
Step 5: Add or modify monitoring requests as necessary. See “Adding a Monitoring Request” and
“Modifying Monitoring Requests”.
Step 6: Verify monitor operation (recommended, but optional). See “Verifying Hardware Event
Monitoring”.
NOTE How Long Will it Take to Get Hardware Monitoring Working? (For Diagnostic/IPR
Media released earlier than the June 1999 release only.)
You can get hardware monitoring installed and working in minutes. Once the software is
installed, you simply need to run the Hardware Monitoring Request Manager and enable
monitoring. The default hardware monitoring configuration should meet your monitoring
requirements without any changes or modifications. If you find that the default monitoring
should be customized, you can always return later and add or modify monitoring requests as
needed.
NOTE If I'm Already Using EMS HA Monitors, Can I Also Use the EMS GUI to Manage
Hardware Monitoring?
For the most part, no. Hardware event monitoring is managed using the Hardware Monitoring
Request Manager, which serves the same function the EMS GUI serves for the EMS HA
monitors. The only portion of hardware monitoring that is managed using the EMS GUI is
status monitoring done using the PSM described in Chapter 4, “Using the Peripheral Status
Monitor.”
Figure 2-1 The Steps for Installing and Configuring Hardware Monitoring
Installing EMS Hardware Monitors
The EMS Hardware Monitors software is distributed with the Support Tools (diagnostics). All the necessary
files for hardware monitoring are installed automatically when the Support Tools are installed. The Support
Tools can be installed in any of the following ways:
• The Support Plus Media: install the OnlineDiag depot from the Support Plus Media using swinstall.
• HP Software Depot website: download the “Support Tools for the HP 9000” in the “Enhancement
Releases” product category, then use swinstall to install the OnlineDiag depot.
• Automatic: with HP-UX 11i, the Support Tools are automatically installed from the OE CD-ROM when
the operating system is installed.
Complete instructions for installing STM are contained in Chapter 5 of the Support Plus: Diagnostics User's Guide.
The following software components are installed for hardware monitoring:
• All hardware event monitors
• Monitor configuration files
• Monitoring Request Manager
• EMS framework, including the EMS graphical interface
All EMS Hardware Monitors on the CD-ROM will be installed on your system, but only those that support
hardware resources you are using will be active. If you add a new hardware resource to your system that uses
an installed monitor, the monitor will be launched when the system is restarted or following the execution of
the IOSCAN utility (which performs a real/hard ioscan).
NOTE Reinstalling or upgrading the STM software will erase the current PSM configuration. Any
MC/ServiceGuard package dependencies or EMS monitoring requests you have created with
the PSM will be lost. Before reinstalling the STM software, record the current PSM
configuration so you can easily recreate it after the software has been installed. Or you can
comment out the PSM dependencies in the ServiceGuard configuration files, then re-enable
them after the STM software has been installed.
IOSCAN Utility
When you execute the IOSCAN utility, a “real/hard” ioscan is performed. The utility performs a scan of your
system hardware, gathering the most-current information.
Conversely, ‘ioscan -k’ is used by hardware monitors and diagnostics to obtain their information about
configured devices. The data returned by ‘ioscan -k’ is only as current as the last system reboot or the last
“real/hard” ioscan. This means that if a device or component is added to or removed from the
system, a “real/hard” ioscan should be executed to update the IOSCAN table in the kernel for
use by the hardware monitors and diagnostics. Otherwise, the hardware monitors and diagnostics will
operate on a stale, inaccurate picture of the system’s configuration.
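The difference between a cached query and a “real/hard” scan can be illustrated with a toy model in Python. This is not how HP-UX implements the kernel I/O table; the class, its methods, and the hardware paths are hypothetical. The point is only that a cached query keeps returning the boot-time picture until a hard scan refreshes it.

```python
from typing import Dict, List


class KernelIoTable:
    """Toy model of the kernel's I/O configuration table (hypothetical, not HP-UX internals)."""

    def __init__(self, hardware: Dict[str, str]) -> None:
        self.hardware = hardware        # devices physically present right now
        self.cached = dict(hardware)    # snapshot taken at boot time

    def hard_scan(self) -> List[str]:
        # Like a "real/hard" ioscan: probe the hardware and refresh the snapshot.
        self.cached = dict(self.hardware)
        return sorted(self.cached)

    def cached_query(self) -> List[str]:
        # Like 'ioscan -k': answer from the snapshot without probing.
        return sorted(self.cached)


table = KernelIoTable({"0/0/1/1.2.0": "disk"})
table.hardware["0/0/2/0.1.0"] = "tape"   # a tape drive is added after boot
print(table.cached_query())  # ['0/0/1/1.2.0']  (stale: the new tape drive is missing)
table.hard_scan()
print(table.cached_query())  # ['0/0/1/1.2.0', '0/0/2/0.1.0']
```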
Supported System Configuration
To use the hardware event monitors, your system must meet the following requirements:
• HP 9000 Series 700 or 800 Computer
• HP-UX 10.20 or 11.x (hardware event monitoring is not currently available on the special high security
systems HP-UX 10.26 (TOS) and HP-UX 11.04 (VVOS))
• Support Plus Media, the more current the better. The hardware event monitors were first distributed in
the HP-UX 10.20/11.00 February 1999 release (IPR 9902). Before the September 1999 release, the
Support Plus Media was called the Diagnostic/IPR Media.
Rather than use the Support Plus Media, you can download the Support Tools (including STM and the
hardware event monitors) over the Web. See Chapter 5 of the Support Plus: Diagnostics User's Guide for
more information.
• If you are using MC/ServiceGuard (optional), you must have version A.10.11 on HP-UX 10.20, or version
A.11.04 on HP-UX 11.x.
Removing EMS Hardware Monitors
The hardware monitoring software can be removed using the swremove utility. Run swremove and select the
OnlineDiag bundle. This will remove the hardware monitoring software components and the STM software
components.
Checking for Special Requirements
Some devices have special requirements in order to be monitored. Examine the tables of supported products
below to see if any of your devices have special requirements.
Table 2-1 Disk Arrays
Product
HP AutoRAID Disk Array
Supported by: AutoRAID Disk Array Monitor
HP High Availability Disk Array
Supported by: High-Availability Disk Array Monitor
HP Fast/Wide SCSI Disk Array
Supported by: Fast/Wide SCSI Disk Array Monitor
HP Fibre Channel High Availability Disk Array (Model 60/FC)