HP EMS User Manual

EMS Hardware Monitors User's Guide

Manufacturing Part Number: B6191-90028

September 2001

© Copyright 2001 Hewlett-Packard Company

Legal Notices

The information contained in this document is subject to change without notice.

Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be
liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.
Copyright © 2001 Hewlett-Packard Company. This document contains information which is protected by copyright. All rights are reserved. Reproduction,
adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.

Corporate Offices:

Hewlett-Packard Co. 3000 Hanover St. Palo Alto, CA 94304
Use, duplication or disclosure by the U.S. Government Department of Defense is subject to restrictions as set forth in paragraph (b)(3)(ii) of the Rights in Technical Data and Software clause in FAR 52.227-7013.
Rights for non-DOD U.S. Government Departments and Agencies are as set forth in FAR 52.227-19(c)(1,2). Use of this manual and flexible disc(s), compact disc(s), or tape cartridge(s) supplied for this pack is restricted
to this product only. Additional copies of the programs may be made for security and back-up purposes only. Resale of the programs in their present form or with alterations, is expressly prohibited.
A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can be obtained from your local Sales and Service Office.
© Copyright 1980, 1984, 1986 AT&T Technologies, Inc. UNIX and System V are registered trademarks of AT&T in the USA and other countries.
UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Limited.
© Copyright 1979, 1980, 1983, 1985-1990 Regents of the University of California. This software is based in part on the Fourth Berkeley Software Distribution under license from the Regents of the University of California.
Copyright © The Regents of the University of Colorado, a body corporate 1979 This document has been reproduced and modified with the permission of the Regents of the University of
Colorado, a body corporate. PostScript is a trademark of Adobe Systems, Inc. Ethernet is a trademark of Xerox Corporation. © Copyright 1985-1986, 1988 Massachusetts Institute of Technology. X Window System is a trademark of the
Massachusetts Institute of Technology. MS-DOS and Microsoft are U.S. registered trademarks of Microsoft Corporation. OSF/Motif is a trademark of the Open Software Foundation, Inc. in the U.S. and other countries. Certification
for conformance with OSF/Motif user environment pending. AutoRAID(tm) is a trademark of Hewlett-Packard Company for the Advanced Disk Array product.

Printing History

September 2001 Edition 1

The printing date and part number indicate the current edition. The printing date changes when a new edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to change.) The part number changes when extensive technical changes are incorporated.
New editions of this manual will incorporate all material updated since the previous edition.

Internal Date: July 17, 2001

Event Management Lab Hewlett-Packard Co. 19091 Pruneridge Ave. Cupertino, CA 95014

1. Introduction
Hardware Monitoring Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
What is Hardware Monitoring? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
How Does Hardware Monitoring Work? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Benefits of Hardware Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Products Supported by Hardware Monitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Tips for Hardware Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Hardware Monitoring Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2. Installing and Using Monitors
The Steps Involved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Installing EMS Hardware Monitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Supported System Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Removing EMS Hardware Monitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Checking for Special Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Using Hardware Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
What Is a Monitoring Request? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Some Monitoring Request Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Running the Monitoring Request Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Enabling Hardware Event Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Default Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Listing Monitor Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Viewing Current Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Adding a Monitoring Request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Example of Adding a Monitoring Request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Modifying Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Verifying Hardware Event Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Checking Detailed Monitoring Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Retrieving and Interpreting Event Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Sample Event Message . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Deleting Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Disabling Hardware Event Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Contents
3. Detailed Description
The Detailed Picture of Hardware Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Components from Three Different Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Hardware Monitoring Request Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
EMS Hardware Event Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Polling or Asynchronous? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Startup Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Peripheral Status Monitor (PSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Event Monitoring Service (EMS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
File Locations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Startup Process (in Detail) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Asynchronous Event Detection (in Detail) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Event Polling (in Detail) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4. Using the Peripheral Status Monitor
Peripheral Status Monitor Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
How Does the PSM Work? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
PSM Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
PSM States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
PSM Resource Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Configuring MC/ServiceGuard Package Dependencies with the PSM . . . . . . . . . . . . 71
Configuring Package Dependencies using SAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Configuring Package Dependencies by Editing the Configuration File . . . . . . . . . . 73
Creating EMS Monitoring Requests for PSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Monitoring Request Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Specifying When to Send Event - <Notify>. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Determining the Frequency of Events - <Options> . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Setting the Polling Interval - <Polling Interval> . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Selecting Protocols for Sending Events - <Notify Via> . . . . . . . . . . . . . . . . . . . . . . . 76
Adding a Notification Comment - <Comment> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Copying Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Modifying Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Removing Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Viewing Monitoring Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Using the set_fixed Utility to Restore Hardware UP State . . . . . . . . . . . . . . . . . . . . . 83
5. Hardware Monitor Configuration Files
Monitor Configuration Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Considerations for Modifying the Monitor Configuration File Settings . . . . . . . . . . 88
Sample Global Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Sample Monitor-Specific Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Startup Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Considerations for Modifying the Startup Configuration File Settings . . . . . . . . . . 93
Default File Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Peripheral Status Monitor (PSM) Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . 96
File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Considerations for Modifying the PSM Configuration File . . . . . . . . . . . . . . . . . . . . 97
Example File Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6. Special Procedures
Fibre Channel Arbitrated Loop Hub Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Supported Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Special Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Resource Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Executable File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Monitor Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
PSM State Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Initial Monitor Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Adding or Removing an FC-AL Hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Configuration Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Fibre Channel Switch Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Supported Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Special Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Resource Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Executable File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Monitor Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
PSM State Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Initial Monitor Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Adding or Removing an FC Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Configuration Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

About This Manual

This guide is intended for use by system administrators and others involved in managing HP-UX system hardware resources. It describes the installation and use of EMS Hardware Monitors—an important tool in managing the operation and health of system hardware resources.

The book is organized as follows:

• Chapter 1, “Introduction,” provides a foundation for understanding what the hardware monitors are and how they work. Although not required reading, this material will help you use the hardware event monitors efficiently.
• Chapter 2, “Installing and Using Monitors,” describes the procedures for creating and managing monitoring requests.
• Chapter 3, “Detailed Description,” gives a detailed picture of the components involved in hardware monitoring, their interaction, and the files involved.
• Chapter 4, “Using the Peripheral Status Monitor,” covers the PSM, which serves as the interface between the event-driven hardware event monitors and MC/ServiceGuard.
• Chapter 5, “Hardware Monitor Configuration Files,” describes how to control the operation of hardware monitors by modifying the configuration files.
• Chapter 6, “Special Procedures,” describes monitor-specific tasks.
NOTE The information previously contained in the chapter entitled “Monitor Data Sheets” has been moved to the Web at http://docs.hp.com/hpux/onlinedocs/diag/ems/emd_summ.htm. An HP-UX man page is available for each monitor. To access the man page, type:
man monitorname
where monitorname is the executable file listed in the data sheet.

Typographical Conventions

This guide uses the following typographical conventions:

NOTE Notes contain important information.

CAUTION Caution messages indicate procedures which, if not observed, could result in damage to your equipment or loss of your data.

WARNING Warning messages indicate procedures or practices which, if not observed, could result in personal injury.

Supporting Documentation

The following documentation contains information related to the installation and use of the hardware event monitors:
Support Plus: Diagnostics User's Guide - provides information on installing the EMS Hardware Monitors.
Managing MC/ServiceGuard (B3936-90024) - provides information on creating package dependencies for hardware resources.
Using EMS HA Monitors (B5735-90001) - provides detailed information on using EMS to create monitoring requests. Note: this manual pertains to high availability (HA) monitors rather than to the hardware monitors.

Related Web Site

The following web sites provide information on hardware monitoring.

http://docs.hp.com/hpux/diag—the online library for information about EMS Hardware Monitors
http://docs.hp.com/hpux/onlinedocs/diag/ems/emd_summ.htm—Data sheets for the hardware event monitors

Reader Comments

We welcome your comments about our documentation. If you have editorial suggestions or recommended improvements for this document, please write to us. You can reach us through e-mail at
dst-support@am.exch.hp.com, or by sending your letter to: Documentation Manager M/S 42L9, Hewlett Packard Company, 19483 Pruneridge Ave., Cupertino, CA 95014-0604. USA. Please include the following
information in your message:
• Title of the manual you are referencing.
• Manual part number (from the title page).
• Edition number or publication date (from the title page).
• Your name.
• Your company’s name.
Serious errors, such as technical inaccuracies that may render a program or a hardware device inoperative, should be reported to your HP Response Center or directly to a Support Engineer.

1 Introduction

This chapter introduces the EMS Hardware Monitors. The topics discussed in this chapter include:

• What is hardware monitoring?

• How does hardware monitoring work?
• Benefits of hardware monitoring
• Products supported by hardware monitoring
• Tips for hardware monitoring
• Hardware monitoring terms

NOTE Do I Really Need to Read This Chapter?

Although it is not essential that you read this material before using the hardware monitors, it will help you understand how monitoring works, which in turn should help you use it effectively. New users are strongly encouraged to read through the general overview material before proceeding to Chapter 2, “Installing and Using Monitors”.


Hardware Monitoring Overview

What is Hardware Monitoring?

Hardware monitoring is the process of watching a hardware resource (such as a disk) for the occurrence of any unusual activity, called an event. When an event occurs, it is reported using a variety of notification methods (such as email). Event detection and notification are all handled automatically with minimal involvement on your part.
To achieve a high level of system reliability and availability, it is essential that you know when any system resource is experiencing a problem. Hardware monitoring gives you the ability to detect problems with your system hardware resources. By providing immediate detection and notification, hardware monitoring allows you to quickly identify and correct problems—often before they impact system operation.
Another important feature of hardware monitoring is its integration with applications responsible for maintaining system availability, such as MC/ServiceGuard. It is vital that these applications be alerted to hardware problems immediately so they can take the necessary action to avoid system interruption. Hardware monitoring is easily integrated with MC/ServiceGuard, and the necessary notification methods are provided for communication with other applications such as HP OpenView.
Hardware monitoring is designed to provide a high level of protection against system hardware failure with minimal impact on system performance. By using hardware monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.

How Does Hardware Monitoring Work?

The following figure shows the basic components involved in hardware monitoring.
Figure 1-1 Components Involved in Hardware Monitoring
The typical hardware monitoring process works as follows:
1. While monitoring its hardware resources, the hardware event monitor detects some type of abnormal behavior on one of the resources.
2. The hardware event monitor creates the appropriate event message, which includes suggested corrective action, and passes it to the Event Monitoring Service (EMS).
3. EMS sends the event message to the system administrator using the notification method specified in the monitoring request.
4. The system administrator (or Hewlett-Packard service provider) receives the messages, corrects the problem, and returns the hardware to its normal operating condition.
5. If the Peripheral Status Monitor (PSM) has been properly configured, events are also processed by the PSM. The PSM changes the device status to DOWN if the event is serious enough. The change in device status is passed to EMS, which in turn alerts MC/ServiceGuard. The DOWN status will cause MC/ServiceGuard to fail over any package associated with the failed hardware resource.
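The flow in steps 1 through 3 can be modeled in a few lines of Python. This is only an illustrative sketch, not the EMS implementation: the severity names, the `Event` and `MonitoringRequest` classes, the `dispatch` function, and the target values ("root", "/tmp/events.log") are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Illustrative severity scale; names and ordering are assumptions."""
    INFORMATION = 1
    MINOR_WARNING = 2
    MAJOR_WARNING = 3
    SERIOUS = 4
    CRITICAL = 5


@dataclass(frozen=True)
class Event:
    resource: str       # the hardware resource that reported the problem
    severity: Severity
    description: str
    action: str         # suggested corrective action carried in the message


@dataclass(frozen=True)
class MonitoringRequest:
    min_severity: Severity  # lowest severity this request is interested in
    notify_via: str         # notification method, e.g. "email" or "textfile"
    target: str             # hypothetical address or file name


def dispatch(event, requests):
    """Return one (method, target, message) tuple per matching request."""
    message = f"{event.resource}: {event.description} Action: {event.action}"
    return [
        (req.notify_via, req.target, message)
        for req in requests
        if event.severity >= req.min_severity
    ]


requests = [
    MonitoringRequest(Severity.CRITICAL, "email", "root"),
    MonitoringRequest(Severity.INFORMATION, "textfile", "/tmp/events.log"),
]
event = Event("disk0", Severity.MAJOR_WARNING,
              "Drive not responding.", "Check cabling.")
deliveries = dispatch(event, requests)
```

In this model the major-warning event matches only the textfile request; a critical event would match both requests and therefore be delivered twice, once per notification method.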
NOTE The Difference Between Hardware Event Monitoring and Hardware Status Monitoring
Hardware event monitoring is the detection of events experienced by a hardware resource. It is the task of the EMS Hardware Monitors to detect hardware events. Events are temporary in the sense that the monitor detects them but does not remember them. Of course the event itself may not be temporary—a failed disk will likely remain failed until it is replaced.
Hardware status monitoring is an extension of event monitoring that converts an event to a change in device status. This conversion, performed by the Peripheral Status Monitor, provides a mechanism for remembering the occurrence of an event by storing the resultant status. This persistence provides compatibility with applications such as MC/ServiceGuard, which require a change in device status to manage high availability packages.
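The distinction between a transient event and a remembered status can be sketched as follows. This is a toy model under stated assumptions, not the PSM itself: the numeric severities, the `StatusTracker` class, the threshold at which a resource goes DOWN, and the `mark_fixed` method are all hypothetical.

```python
SERIOUS = 4    # hypothetical numeric severities; higher is worse
CRITICAL = 5


class StatusTracker:
    """Toy stand-in for status monitoring: remembers state per resource."""

    def __init__(self, down_threshold=SERIOUS):
        self.down_threshold = down_threshold
        self._status = {}

    def record_event(self, resource, severity):
        # An event monitor reports the event and forgets it;
        # the tracker stores the resulting status instead.
        if severity >= self.down_threshold:
            self._status[resource] = "DOWN"
        else:
            # Less serious events never clear an existing DOWN status.
            self._status.setdefault(resource, "UP")

    def status(self, resource):
        # Resources with no recorded events are assumed to be UP.
        return self._status.get(resource, "UP")

    def mark_fixed(self, resource):
        # Called once the hardware has been repaired.
        self._status[resource] = "UP"


tracker = StatusTracker()
tracker.record_event("disk0", CRITICAL)
tracker.record_event("disk0", 1)   # a later minor event does not restore UP
```

The point of the design is that the status outlives the event: `tracker.status("disk0")` stays DOWN until something explicitly marks the resource as repaired, which is the kind of persistent state an application such as MC/ServiceGuard needs.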

Benefits of Hardware Monitoring

Hardware monitoring provides the following benefits:
• Reduces system downtime by detecting hardware failures when they occur, allowing you to quickly identify and correct problems.
• Integrates into MC/ServiceGuard and other applications responsible for maintaining system availability. These applications can now add many hardware resources to the components they monitor.
• Minimizes the time required to isolate and repair failures through detailed messages describing what the problem is and how to fix it.
• Includes a default monitoring configuration that offers immediate protection for your system hardware without any intervention on your part after monitoring is enabled.
• Provides a common tool for monitoring a wide variety of system hardware resources.
• Offers a variety of notification methods to alert you when a problem occurs. No longer do you need to check the system console to determine if something has gone wrong.
• Requires minimal maintenance once installed and configured. New hardware resources added to the system are automatically included in the monitoring structure.

Products Supported by Hardware Monitors

EMS hardware monitors are provided for a wide range of system hardware resources. The following list identifies the types of hardware supported by monitors at the time of publication. A detailed list of the specific hardware products supported by each hardware monitor is included in http://docs.hp.com/hpux/diag/, the online library for information about EMS Hardware Monitors (look for “Supported Products” under EMS Hardware Monitors).
• HP disk arrays, including AutoRAID Disk Arrays and High Availability Disk Arrays
• HP disk devices, including CD-ROM drives and MO drives
• HP SCSI tape devices, including many DLT libraries and autochangers
• HP Fibre Channel SCSI Multiplexer
• HP Fibre Channel Adapters
• HP Fibre Channel Adapter (A5158)
• High Availability Storage Systems
• HP Fibre Channel Arbitrated Loop Hubs
• HP Fibre Channel Switch
• System memory
• Core hardware
• Low Priority Machine Checks (LPMCs)
• HP-UX kernel resources
• HP Fibre Channel disk array FC60
• SCSI1, SCSI2, SCSI3 interface cards
• System information
• HP UPSs (Uninterruptible Power Systems)
• Devices supported by HP device management software (Remote Monitor)
NOTE Will new products be supported?
Hewlett-Packard's strategy is to provide monitoring for all critical system hardware resources, including new products. For the latest information on what products are supported by EMS Hardware Monitors, visit the hardware monitoring web pages available at www.docs.hp.com/hpux/systems/.

Tips for Hardware Monitoring

Here are some tips for using hardware monitoring.

Keep hardware monitoring enabled to protect your system from undetected failures. Hardware monitoring is an important tool for maintaining high availability on your system. In a high-availability environment, the failure of a hardware resource makes the system vulnerable to another failure. Until the failed hardware is repaired, the backup hardware resource represents a single point of failure. Without hardware monitoring, you may not be aware of the failure. With hardware monitoring, you are alerted to the failure, which allows you to repair it and restore high availability as quickly as possible.

Integrate the peripheral status monitor (PSM) into your MC/ServiceGuard strategy. An important feature of hardware monitoring is its ability to communicate with applications responsible for maintaining system availability, such as MC/ServiceGuard. The peripheral status monitor (PSM) allows you to integrate hardware monitoring into MC/ServiceGuard. The PSM gives you the ability to fail over a package based on an event detected by hardware monitoring. If you are using MC/ServiceGuard, you should consider using the PSM to include your system hardware resources in the MC/ServiceGuard strategy. In addition, the necessary notification methods are provided for communicating with network management applications such as HP OpenView.

Utilize the many notification methods available. The notification methods provided by hardware monitoring give you a great deal of flexibility in designing a strategy to keep you informed of how well your system hardware is working. The default monitoring configuration was selected to provide a variety of notification methods for all supported hardware resources. As you become familiar with hardware monitoring, you may want to customize the monitoring to meet your individual requirements.

Use email and/or textfile notification methods for all your requests. Both of these methods, which are included in the default monitoring, receive the entire content of the message so you can read it immediately. Methods such as console and syslog alert you to the occurrence of an event but do not deliver the entire message. You must then retrieve the full message with the resdata utility, which is an additional step.

Use the 'All monitors' option when creating a monitoring request. This applies the monitoring request to all monitors, which ensures that a new class of hardware resource added to your system is monitored automatically. New hardware is therefore protected from undetected hardware failure with no effort on your part.

Easily replicate your hardware monitoring on all your systems. Once you have implemented a hardware monitoring strategy on one of your systems, you can replicate that same monitoring on other systems. Simply copy all of the hardware monitor configuration files to each system that will use the same monitoring. The monitor configuration files are located in /var/stm/config/tools/monitor. You must install hardware event monitoring on each system before you copy the configuration files to it. Be sure to enable monitoring on all systems.
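As a rough illustration of the replication tip, the following Python sketch builds one copy command per target system. The configuration directory is the one named in the text; everything else is an assumption for the example, including the host names, the `replication_commands` function, and the use of `scp -rp` as the copy tool (substitute whatever remote copy mechanism your site uses). The function only builds command strings and does not run them.

```python
import posixpath
import shlex

CONFIG_DIR = "/var/stm/config/tools/monitor"  # location given in the text


def replication_commands(hosts, config_dir=CONFIG_DIR, copy_tool=("scp", "-rp")):
    """Build one copy command per target host; nothing is executed here."""
    parent = posixpath.dirname(config_dir)
    commands = []
    for host in hosts:
        # Copy the whole directory into the same parent path on the target.
        argv = [*copy_tool, config_dir, f"{host}:{parent}/"]
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


# "hostb" and "hostc" are hypothetical target systems.
commands = replication_commands(["hostb", "hostc"])
for command in commands:
    print(command)
```

Review the generated commands before running them, and remember that hardware event monitoring must already be installed on each target system and enabled afterward.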

Hardware Monitoring Terms

The following terms are used throughout this guide. Understanding them is important when learning how the hardware event monitors work and how to use them effectively.
Table 1-1 Hardware Monitoring Terms
Term De fin it io n
Asynchronous event detection
Default monitoring request
Event Monitoring Service (EMS)
EMS hardware monitors
EMS high availability (HA) monitors
The ability to detect an event at the time it occurs. When an event occurs the monitor is immediately aware of it. This method provides quicker notification response than polling.
The default monitoring configuration created when the EMS Hardware Monitors are installed. The default requests ensure that a complete level of protection is automatically provided for all supported hardware resources.
The application framework used for monitoring system resources on HP-UX 10.20 and 11.x. EMS hardware monitors use the EMS framework for reporting events and creating PSM monitoring requests. The EMS framework is also used by EMS High Availability monitors.
The monitors described in this manual. They monitor hardware resources such as I/O devices (disk arrays, tape drives, etc.), interface cards, and memory. They are distributed on the Support Plus Media and are managed with the Hardware Monitoring Request Manager (monconfig).
These monitors are different than EMS hardware monitors and are not described in this manual. They monitor disk resources, cluster resources, network resources and system resources. They are designed for a high availability environment and are available at additional cost. For more information, refer to Using EMS HA Monitors, which can be downloaded from http://docs.hp.com/hpux/ha/
Chapter 1
Event severity level Each event that occurs within the hardware is assigned a
severity level, which reflects the impact the event may have on system operation. The severity levels provide the mechanism for directing event notification. For example, you may choose a notification method for critical events that will alert you immediately to their occurrence, and direct less important events to a log file for examination at your convenience. Also, when used with MC/ServiceGuard to determine failover criteria, severe and critical events cause failover.
Hardware event Any unusual or notable activity experienced by a hardware
resource. For example, a disk drive that is not responding, or a tape drive that does not have a tape loaded. When any such activity occurs, the occurrence is reported as an event to the event monitor.
Table 1-1 Hardware Monitoring Terms
Term Definition
Hardware event monitor: A monitor daemon that gathers information on the operational status of hardware resources. Each monitor is responsible for watching a specific group or type of hardware resources. For example, the tape monitor handles all tape devices on the system. The monitor may use polling or asynchronous event detection for tracking events.
Unlike a status monitor, an event monitor does not “remember” the occurrence of an event. It simply detects and reports the event. An event can be converted into a more permanent status condition using the Peripheral Status Monitor.
Hardware resource: A hardware device used in system operation. Resources supported by hardware monitoring include mass storage devices such as disks and tapes, connectivity devices such as hubs and multiplexors, and device adapters.
MC/ServiceGuard: Hewlett-Packard's application for creating and managing high availability clusters of HP 9000 Series 800 computers. A high availability computer system allows application services to continue in spite of a hardware or software failure. Hardware monitoring integrates with MC/ServiceGuard to ensure that hardware problems are detected and reported immediately, allowing MC/ServiceGuard to take the necessary action to maintain system availability. MC/ServiceGuard is available at additional cost.
Monitoring request: A group of settings that define how events for a specific monitor are handled by EMS. A monitoring request identifies the severity levels of interest and the type of notification method to use when an event occurs. A monitoring request is applied to each hardware device (or instance) supported by the monitor.
Monitoring requests are created for hardware events using the Hardware Monitoring Request Manager. Monitoring requests are created for changes in hardware status using the EMS GUI.
Multiple-view: As of the June 2000 release (IPR 0006), certain monitors will allow event reporting to be tailored for different targets (clients). This “multiple-view” (“Predictive-enabled”) feature will be added to all hardware monitors in future releases. Previously, hardware monitors generated events the same way for all targets. The problem is that different targets, such as Predictive Support, may have different requirements for events.
Peripheral Status Monitor (PSM): Included with the hardware event monitors, the PSM is a monitor daemon that acts as a hardware status monitor by converting events to changes in hardware resource status. This provides compatibility with MC/ServiceGuard, which uses changes in status to manage cluster resources. Through the EMS GUI, the PSM is also used to create hardware status monitoring requests.
Polling: The process of connecting to a hardware resource at regular intervals to determine its status. Any events that occur between polling intervals will not be detected until the next poll, unless the monitor supports asynchronous event monitoring.
Predictive-enabled: See “multiple-view.” This feature enables hardware monitors to work with Predictive Support.
Resource instance: A specific hardware device. The resource instance is the last element of the resource path and is typically the hardware path to the resource (e.g., 10_12_5.0.0), but it may also be a product ID as in the case of AutoRAID disk arrays. There may be multiple instances for a monitor, each one representing a unique hardware device for which the monitor is responsible.
Resource path: Hardware event monitors are organized into classes (and subclasses) for creating monitoring requests. These classes identify the unique path to each hardware resource supported by the monitor. Two similar resource paths exist for each hardware resource: an event path used for creating event monitoring requests, and a status path used for creating PSM monitoring requests.
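The difference between polling and asynchronous event detection described in the terms above can be illustrated with a small sketch. It is not EMS code: FakeDevice and its methods are hypothetical names invented for this illustration, and the event text is borrowed from the "Hardware event" example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a hardware resource; not an EMS API."""
    pending: List[str] = field(default_factory=list)
    listeners: List[Callable[[str], None]] = field(default_factory=list)

    def raise_event(self, description: str) -> None:
        # The event is queued for pollers and pushed to asynchronous listeners.
        self.pending.append(description)
        for notify in self.listeners:
            notify(description)

    def poll(self) -> List[str]:
        # A poll returns everything that happened since the previous poll.
        events, self.pending = self.pending, []
        return events


device = FakeDevice()

seen_async: List[str] = []
device.listeners.append(seen_async.append)  # asynchronous: told immediately

seen_by_polling: List[str] = []
device.raise_event("tape drive has no tape loaded")

# Between polls, the asynchronous listener already knows; the poller does not.
snapshot_before_poll = (list(seen_async), list(seen_by_polling))

# At the next polling interval the poller catches up.
seen_by_polling.extend(device.poll())
```

In this sketch the poller learns of the event only when poll() is next called, which mirrors the statement that events occurring between polling intervals are not detected until the next poll.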

2 Installing and Using Monitors

This chapter instructs you on how to use the EMS Hardware Monitors to manage your hardware resources. The topics discussed in this chapter include:

• An overview of the steps involved

• Installing EMS Hardware Monitors
• Adding and managing monitor requests
• Disabling and enabling EMS Hardware Monitors
NOTE You don't need to completely understand the terms and concepts to begin protecting your system with EMS Hardware Monitors by following the procedures in this chapter. If a term or concept puzzles you, refer to Chapter 1, “Introduction,” or to Chapter 3, “Detailed Description.”

The Steps Involved

The steps involved in installing and configuring hardware monitoring are shown in Figure 2-1 on page 23. Each step is described in detail in this chapter on the page indicated.

Step 1: (HP-UX 10.20 or 11.x only.) Install the Support Tools from the most recent Support Plus Media available to you, or download the package over the Web. See “Installing EMS Hardware Monitors”. With HP-UX 11i, the Support Tools are installed automatically when the OS is installed, so this step is not needed.

Step 2: Examine the list of supported products to see if any of your devices have special requirements in order to be monitored. For example, if monitoring FC-AL hubs, edit the file: /var/stm/config/tools/monitor/dm_fc_hub. See “Fibre Channel Arbitrated Loop Hub Monitor”.
Step 3: (Releases earlier than June 1999 only.) Enable hardware event monitoring. See “Enabling Hardware Event Monitoring”. This step is only necessary if you have a release of the Diagnostic/IPR Media earlier than June 1999.
Step 4: Determine whether default monitoring requests are adequate. See “Viewing Current Monitoring Requests”.
Step 5: Add or modify monitoring requests as necessary. See “Adding a Monitoring Request” and “Modifying Monitoring Requests”.
Step 6: Verify monitor operation (optional but recommended). See “Verifying Hardware Event Monitoring”.
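
Which of the steps above apply depends on the operating system and on the age of the media. The following is a minimal sketch of that decision, not part of the EMS software. The function name, the release labels ("10.20", "11.x", "11i"), and the use of the first day of June 1999 as the cutoff date are assumptions made for illustration.

```python
from datetime import date


def applicable_steps(os_release, media_release):
    """Return the step numbers that apply to a system.

    os_release: "10.20", "11.x" or "11i" (labels chosen for this sketch).
    media_release: release date of the Diagnostic/IPR or Support Plus Media.
    """
    steps = []
    if os_release in ("10.20", "11.x"):
        steps.append(1)  # Support Tools must be installed by hand
    steps.append(2)  # always check devices for special requirements
    if media_release < date(1999, 6, 1):
        steps.append(3)  # older releases need monitoring enabled explicitly
    steps.extend([4, 5, 6])  # review defaults, adjust requests, verify
    return steps


# Example: an HP-UX 11i system skips Steps 1 and 3.
steps_for_11i = applicable_steps("11i", date(2001, 9, 1))
```

Steps 5 and 6 are returned for every system even though the text describes them as "as necessary" and optional; the sketch only filters out the steps that the text marks as conditional.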

NOTE How Long Will it Take to Get Hardware Monitoring Working?

You can get hardware monitoring installed and working in minutes. Once the software is installed, you simply need to run the Hardware Monitoring Request Manager and enable monitoring. The default hardware monitoring configuration should meet your monitoring requirements without any changes or modifications. If you find that the default monitoring should be customized, you can always return later and add or modify monitoring requests as needed.
NOTE If I'm Already Using EMS HA Monitors, Can I Also Use the EMS GUI to Manage Hardware Monitoring?
For the most part, no. Hardware event monitoring is managed using the Hardware Monitoring Request Manager, which serves the same function the EMS GUI serves for the EMS HA monitors. The only portion of hardware monitoring that is managed using the EMS GUI is status monitoring done using the Peripheral Status Monitor (PSM) described in Chapter 4, “Using the Peripheral Status Monitor.”
Figure 2-1 The Steps for Installing and Configuring Hardware Monitoring

Installing EMS Hardware Monitors

The EMS Hardware Monitors software is distributed with the Support Tools (diagnostics). All the necessary files for hardware monitoring are installed automatically when the Support Tools are installed. The Support Tools can be installed in any of the following ways:
• The Support Plus Media: installing the OnlineDiag depot from the Support Plus Media using swinstall.
• HP Software Depot website: downloading the “Support Tools for the HP 9000” in the “Enhancement Releases” product category, then using swinstall to install the OnlineDiag depot.
• Automatic: with HP-UX 11i, the Support Tools are automatically installed from the OE CD-ROM when the operating system is installed.

Complete instructions for installing STM are contained in Chapter 5 of the Support Plus: Diagnostics User's Guide.

The following software components are installed for hardware monitoring:

• All hardware event monitors

• Monitor configuration files
• Monitoring Request Manager
• EMS framework, including the EMS graphical interface
All EMS Hardware Monitors on the CD-ROM will be installed on your system, but only those that support hardware resources you are using will be active. If you add a new hardware resource to your system that uses an installed monitor, the monitor will be launched when the system is restarted, following the execution of the IOSCAN utility, or when the Enable Monitoring option is selected from the Hardware Monitoring Request Manager. This ensures that immediate monitoring and notification are provided for the new hardware resource.
NOTE Reinstalling or upgrading the STM software will erase the current Peripheral Status Monitor (PSM) configuration. Any MC/ServiceGuard package dependencies or EMS monitoring requests you have created with the PSM will be lost. Before reinstalling the STM software, record the current PSM configuration so you can easily recreate it after the software has been installed. Alternatively, comment out the PSM dependencies in the ServiceGuard configuration files, then re-enable them after the STM software has been installed.

Supported System Configuration

To use the hardware event monitors your system must meet the following requirements:
• HP 9000 Series 700 or 800 Computer
• HP-UX 10.20 or 11.x (Hardware event monitoring is not currently available on the special high security systems, HP-UX 10.26 (TOS) and HP-UX 11.04 (VVOS).)
• Support Plus Media, the more current the better. The hardware event monitors were first distributed in the February 1999 release (IPR 9902). Before the September 1999 release, the Support Plus Media was called the Diagnostic/IPR Media.
Rather than use the Support Plus Media, you can download the Support Tools (including STM and the hardware event monitors) over the Web. See Chapter 5 of the Support Plus: Diagnostics User's Guide for more information.
• If you are using MC/ServiceGuard (optional), you must have version A.10.11 on HP-UX 10.20, or version A.11.04 for HP-UX 11.x.
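
The requirements listed above can be expressed as a simple pre-installation check. This is a sketch under stated assumptions, not an HP utility: the function name and its arguments are hypothetical, the MC/ServiceGuard versions are compared literally as written in the list, and the Support Plus Media requirement is not modeled.

```python
UNSUPPORTED_RELEASES = {"10.26", "11.04"}  # TOS and VVOS high security systems
SERVICEGUARD_FOR = {"10.20": "A.10.11", "11": "A.11.04"}


def check_system(series, hpux_release, serviceguard=None):
    """Return a list of problems; an empty list means the listed requirements are met."""
    problems = []
    if series not in (700, 800):
        problems.append("not an HP 9000 Series 700 or 800 computer")

    is_1020 = hpux_release == "10.20"
    is_11x = hpux_release.startswith("11.")
    if hpux_release in UNSUPPORTED_RELEASES or not (is_1020 or is_11x):
        problems.append("HP-UX release %s is not supported" % hpux_release)
    elif serviceguard is not None:
        # MC/ServiceGuard is optional; check it only when it is in use.
        required = SERVICEGUARD_FOR["10.20" if is_1020 else "11"]
        if serviceguard != required:
            problems.append(
                "MC/ServiceGuard %s expected, found %s" % (required, serviceguard)
            )
    return problems


# Example: a Series 800 system on HP-UX 11.00 without MC/ServiceGuard.
result = check_system(800, "11.00")
```

Returning a list of problems rather than a single true or false value lets the caller report every unmet requirement at once.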

Removing EMS Hardware Monitors

The hardware monitoring software can be removed using the swremove utility. Run swremove and select the OnlineDiag bundle. This will remove the hardware monitoring software components and the STM software components.

Checking for Special Requirements

Some devices have special requirements in order to be monitored. Examine the tables of supported products below to see if any of your devices have special requirements.
Table 2-1 Disk Arrays

Product: HP AutoRAID Disk Array
Supported by: AutoRAID Disk Array Monitor
Model/Product Number: 12H, 12
Special Requirements: Requires the following ARMServer versions:
• B.10.01.10 for HP-UX 10.X (Patch PHCO_21309)
• B.11.00.10 for HP-UX 11.X (Patch PHCO_21435)

Product: HP High Availability Disk Array
Supported by: High-Availability Disk Array Monitor
Model/Product Number: 30/FC, 20, 10
Special Requirements: None

Product: HP Fast/Wide SCSI Disk Array
Supported by: Fast/Wide SCSI Disk Array Monitor
Model/Product Number: C243XHA
Special Requirements: None

Product: HP Fibre Channel High Availability Disk Array (Model 60/FC)
Supported by: Disk Array FC60 Monitor
Model/Product Number: 60/FC
Special Requirements: Requires the AM60Srvr daemon, part of the Array Manager 60 disk array management software.
• For HP-UX 11.00: patch PHCO_21315
• For HP-UX 10.20: patch PHCO_21314

Table 2-2 Disk Products

Product: All disks bound to the sdisk and disc30 drivers and not under the control of another event monitor (such as a disk array monitor).
Supported by: Disk Monitor
Model/Product Number: NA
Special Requirements: None

Table 2-3 Tape Products (all supported by the SCSI Tape Devices Monitor)

Product                                                         Model/Product Number                         Special Requirements
DDS-2 Autoloader                                                A3400A                                       None
DDS-3 Autoloader                                                A3716A                                       None
DDS-4 Autoloader                                                C6370A, C6371A                               March 00 Release
DLT4000 4/48 Library; HP-UX; Differential SCSI                  A3544A                                       None
DLT4000 2/48 Library; HP-UX; Differential SCSI                  A3545A                                       None
DLT4000 2/28 Library; HP-UX; Differential SCSI                  A3546A                                       None
DLT 4000 and 7000; 2/28; Drives Differential; Robotics SE/Diff  A4850A                                       None
DLT 4000 & 7000; 4/48; Drives Differential; Robotics SE/Diff    A4855A                                       None
DLT 4000 and 7000; 15 slot; Deskside/Rack; Differential         A4851A                                       None
DLT 4000 and 7000; 588 slot; Drives Diff; Robotics SE           A4845A                                       None
DLT 4000 and 7000; 100 slot; Drives Diff; Robotics SE           A4846A                                       None
DLT 4000 and 7000; 30 slot; Differential                        A4853A                                       None
DLT7000 8-slot Library                                          A5501A                                       March 00 Release
DLT8000 8-slot Library                                          A1375A                                       March 00 Release
DLT8000 20-slot Library                                         A5583A, A5584A, A4680AZ, A4680AHP, A4681AHP  March 00 Release
DLT8000 40-slot Library                                         A5585A, A5586A, A4682AZ, A4682AHP, A4683AHP  March 00 Release
DLT8000 60-slot Library                                         A5587A, A5588A, A4684AZ, A4684AHP, A4685AHP  March 00 Release
DLT8000 100-slot Library                                        A4665A, A4666A                               June 00 Release
DLT8000 120-slot Library                                        A4667A, A4668A                               June 00 Release
DLT8000 140-slot Library                                        A4669A, A4670A                               June 00 Release
DLT8000 700-slot Library                                        A5597A                                       March 00 Release
DLT8000 180-slot Library                                        A5617A                                       March 00 Release
All SCSI tape resources bound to the following device drivers:                                               None
  tape2 - HP-PB tape driver
  stape - HSC tape driver

All the tape products in the above table are monitored by the SCSI Tape Devices Monitor.

Table 2-4 High Availability Storage Systems

Product: HP High Availability Storage System
Supported by: High-Availability Storage System Monitor
Model/Product Number: 1010D
Special Requirements: None

Table 2-5 Fibre Channel SCSI Multiplexers

Product: HP Fibre Channel SCSI Multiplexer
Supported by: Fibre Channel SCSI Multiplexer Monitor
Model/Product Number: A3308A
Special Requirements: Firmware version 3840

Table 2-6 Fibre Channel Adapters

Product: HP Fibre Mass Storage Channel Adapters
Supported by: Fibre Channel Mass Storage Channel Adapter
Model/Product Number: A3404A, A3591A, A3636A, A3740A
Special Requirements: The following driver revisions are required:
• B.10.20 TFC plus Dart40
• B.11.00 release IPR9808 (Rocklin version)

Product: A5158A (Tachyon-TL) Fibre Channel Mass Storage Channel Adapter
Supported by: A5158A Fibre Channel Mass Storage Channel Adapter
Model/Product Number: A5158A
Special Requirements:
• B.11.00 Tachlite driver (td) Dart 48
• B.11.00 release IPR 0003

Table 2-7 Fibre Channel Arbitrated Loop (FC-AL) Hub

Product: HP Fibre Channel Arbitrated Loop Hubs
Supported by: Fibre Channel Arbitrated Loop Hub Monitor
Model/Product Number: A3724A, A4839A
Special Requirements: The FC-AL Hub monitor requires:
Device firmware revisions:
• Device Agent Firmware revision 2.14 or greater
• Hub Controller Firmware revision 3.06 or greater
Firmware and installation instructions are available at http://www.software.hp.com
C++ runtime support patches:
• 10.20 PHSS_17872 (has a dependency: PHSS_17225)
• 11.00 PHSS_21906
Before using the hub monitor, edit the monitor configuration file (/var/stm/config/tools/monitor/dm_fc_hub.cfg) to indicate what hubs will be monitored. See “Fibre Channel Arbitrated Loop Hub Monitor” on page 102.
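
The minimum firmware revisions listed for the FC-AL Hub monitor (Device Agent 2.14 or greater, Hub Controller 3.06 or greater) can be checked with a short comparison. This is a hedged sketch: the function names are hypothetical, and it assumes that revisions are dot-separated integers compared field by field, so "3.10" counts as newer than "3.06" while "3.1" would count as older. How the hub actually reports its revisions is not covered here.

```python
MINIMUM_REVISIONS = {"device_agent": "2.14", "hub_controller": "3.06"}


def parse_revision(text):
    """Turn a revision such as "2.14" into a tuple of ints, here (2, 14)."""
    return tuple(int(part) for part in text.split("."))


def firmware_ok(device_agent, hub_controller):
    """Return True when both revisions meet the listed minimums."""
    found = {"device_agent": device_agent, "hub_controller": hub_controller}
    return all(
        parse_revision(found[name]) >= parse_revision(minimum)
        for name, minimum in MINIMUM_REVISIONS.items()
    )


# Example: a hub that is exactly at the minimum revisions passes the check.
at_minimum = firmware_ok("2.14", "3.06")
```

Comparing tuples of integers avoids the pitfall of comparing the revisions as plain strings or as floating-point numbers.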
Table 2-8 Fibre Channel Switch

Product: HP Fibre Channel Switch
Supported by: Fibre Channel Switch Monitor
Model/Product Number: A5223A
Special Requirements: The FC Switch monitor requires:
C++ runtime support patches:
• 10.20 PHSS_17872 (has a dependency: PHSS_17225)
• 11.00 PHSS_21906
Before using the switch monitor, edit the monitor configuration file (/var/stm/config/tools/monitor/dm_fc_sw.cfg) to indicate what switches will be monitored. See “Fibre Channel Arbitrated Loop Hub Monitor” on page 102.

Table 2-9 Memory

Product: All system memory.
Supported by: Memory Monitor
Model/Product Number: NA
Special Requirements: None

Table 2-10 System

Product: Core hardware (hardware within the SPU cabinet)
Supported by: Core Hardware Monitor
Model/Product Number: NA
Special Requirements: HP-UX 11.x

Product: Low Priority Machine Checks (LPMCs)
Supported by: LPMC Monitor
Model/Product Number: NA
Special Requirements: HP-UX 11.x

Product: HP-UX Kernel Resources
Supported by: Kernel Resource Monitor
Model/Product Number: NA
Special Requirements: HP-UX 11.x. Requires configuration through SAM.

Product: System Status
Supported by: System Status Monitor
Model/Product Number: NA
Special Requirements: None