The information in this document is subject to change without notice.
Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the
implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held
liable for errors contained herein or direct, indirect, special, incidental or consequential damages in
connection with the furnishing, performance, or use of this material.
Warranty
A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can
be obtained from your local Sales and Service Office.
Restricted Rights Legend
Use, duplication or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph
(c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 for DOD
agencies, and subparagraphs (c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause
at FAR 52.227-19 for other agencies.
HEWLETT-PACKARD DEVELOPMENT COMPANY L.P.
20555 S.H. 249
Houston, Texas 77070
U.S.A.
Use of this document and any supporting software media supplied for this pack is restricted to this product
only. Additional copies of the programs may be made for security and back-up purposes only. Resale of the
programs, in their present form or with alterations, is expressly prohibited.
Copyright Notices
Copyright 2001-2008 Hewlett-Packard Development Company L.P. All rights reserved. Reproduction,
adaptation, or translation of this document without prior written permission is prohibited, except as allowed
under the copyright laws.
Trademark Notices
UNIX is a registered trademark in the United States and other countries, licensed exclusively through The
Open Group.
Publishing History
New editions of this manual will incorporate information that is new or has changed since the previous
edition was published (minor typographical or formatting corrections do not result in the publication of a new
edition). The publishing date, manufacturing part number, and edition number all change each time a new
edition is published, providing unique identification for each edition.
Edition / Manufacturing Part Number / Date
First Edition / 5991-4793 / March 2006
Second Edition / 5991-5308 / April 2006
Third Edition / 5992-0539 / March 2007
Fourth Edition / 5992-3799 / March 2008
Conventions
We use the following typographical conventions.
audit (5) An HP-UX manpage. audit is the name and 5 is the section in the HP-UX Reference. On the
web and on the Instant Information CD, it may be a hot link to the manpage itself. From the
HP-UX command line, you can enter “man audit” or “man 5 audit” to view the manpage.
See man (1).
Book Title The title of a book. On the web and on the Instant Information CD, it may be a hot link to
the book itself.
KeyCap The name of a keyboard key. Note that Return and Enter both refer to the same key.
Emphasis Text that is emphasized.
Emphasis Text that is strongly emphasized.
Term The defined use of an important word or phrase.
ComputerOut Text displayed by the computer.
UserInput Commands and other text that you type.
Command A command name or qualified command phrase.
Variable The name of a variable that you may replace in a command or function or information in a
[] The contents are optional in formats and command descriptions.
{} The contents are required in formats and command descriptions. If the contents are a list
. . . The preceding element may be repeated an arbitrary number of times.
| Separates items in a list of choices.
The PCI Error Handling feature allows an HP-UX system to avoid a Machine Check Abort (MCA) or a
High Priority Machine Check (HPMC) if a PCI error occurs (for example, a parity error).
If a PCI error occurs on a bus without the PCI Error Handling feature installed, an MCA or an HPMC
occurs and the system crashes.
With the PCI Error Handling feature installed, if a PCI error occurs on a bus containing an I/O card that
supports PCI Error Recovery:
• The corresponding device driver reports the error.
• The PCI bus is quarantined to isolate the system from further I/O, preventing the error from damaging
the system.
• The olrad command and the Attention Button can be used to recover online, restoring the slot, card, and
driver to a usable state.
Accessing and Installing the PCI Error Handling Feature
The PCI Error Handling feature can be accessed and installed on supported systems from the Software Pack
CD-ROM, or the HP Software Depot.
Confirm PCI Error Handling is Supported
Step 1. Review the PCI Error Handling Support Matrix document posted at http://www.docs.hp.com to
confirm if PCI Error Handling is supported with your configuration and system firmware version.
Step 2. To confirm which system firmware version is installed on your system, or any cell in your system,
use the sysrev command from the management processor Command Menu (CM) prompt as
follows:
MP:CM> sysrev
The sysrev command output on Superdome systems is different from the sysrev command output
on the other systems that support PCI Error Handling.
The sysrev command output on Superdome systems will list the system firmware version under
the SYS FW heading as illustrated in the following example:
PCI Error Handling Product Note
On the mid-range systems that support PCI Error Handling, the system firmware version will be
listed with the Pri SFW heading as illustrated in the following example:
MP:CM> sysrev
Cabinet firmware revision report
PROGRAMMABLE HARDWARE :
System Backplane : GPM FM OSP
------- ------- -------
1.002 1.002 1.002
PCI-X Backplane : LPM HS
------- -------
2.000 1.000
Core IO : Master Slave
-------- -------
2.010 2.010
LPM PDHC
------- -------
Cell 0 : 1.002 1.010
Cell 1 : 1.002 1.010
Cell 2 : 1.002 1.010
Cell 3 : 1.002 1.010
FIRMWARE:
Core IO
Master : A.007.008
Event Dict. : 0.009
Slave : A.007.008
Event Dict. : 0.009
Cell 0
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001 (PA)
Cell 1
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001 (PA)
Cell 2
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001 (PA)
Cell 3
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001
NOTE The sysrev command output on some systems includes extra zeros in the system
firmware version number. These zeros can be ignored. For example, 3.88 and 3.088
on HP Integrity systems are the same firmware version, and 23.1 and 23.001 on HP
9000 systems are the same firmware version.
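The zero-padding rule in the note can be sketched in a few lines of Python. This is an illustration only, not part of the HP product. It assumes the extra zeros are padding at the start of a dot-separated field, so each field can be read as an integer. The function names and the minimum version used in the last line are hypothetical; take the real minimum from the PCI Error Handling Support Matrix.

```python
def parse_firmware_version(text):
    """Turn a version such as '23.001' or '3.088' into a tuple of ints.

    Each dot-separated field is read as an integer, so padding zeros
    ('23.001' versus '23.1') do not affect the result.
    """
    return tuple(int(field) for field in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is the minimum version or later."""
    return parse_firmware_version(installed) >= parse_firmware_version(minimum)


# The two pairs named in the note compare as equal.
print(parse_firmware_version("3.88") == parse_firmware_version("3.088"))
print(parse_firmware_version("23.1") == parse_firmware_version("23.001"))

# Hypothetical minimum version, for illustration only.
print(meets_minimum("23.001", "22.5"))
```

Comparing tuples of integers rather than raw strings avoids treating 23.001 and 23.1 as different versions.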
Step 3. The system firmware is the main component of the firmware recipe required to support PCI Error
Handling. If you do not have the minimum system firmware version listed in the PCI Error
Handling Support Matrix (or a later version), you do not have a firmware recipe that supports PCI
Error Handling installed on your system. Contact your HP representative for assistance on
accessing and installing a firmware recipe that supports the PCI Error Handling feature. If you
have the supported system firmware version (or later) installed on your system, you are ready to
install PCI Error Handling from the Application Release (AR) media.
For information about installing the PCI Error Handling product from the AR media, see:
http://www.docs.hp.com/en/5992-1978/5992-1978.pdf
under Chapter 7, Installing HP Applications and Patches.
NOTE In addition to installing the PCIErrorHandling bundle, the btlan, igelan, and iether drivers
require patches to enable PCI Error Handling. Also, the latest versions of the fcd and mpt
drivers must be installed to enable PCI Error Handling.
The patch required for the btlan driver is included with the PCIErrorHandling bundle.
The patches required for the igelan and iether drivers must be downloaded and installed
separately from the IT Resource Center at http://www.itrc.hp.com.
- The iether driver requires patch PHNE_32199 or later.
- The igelan driver requires patch PHNE_34037 or later.
The latest version of the fcd driver (FibrChanl-01 bundle, version B.11.23.0401, or later) must
be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.
The latest version of the mpt driver (scsiU320-00 bundle, version B.11.23.0606, or later) must
be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.
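The bundle requirements above can be checked mechanically once the installed revisions are known. The following Python sketch is illustrative only. The two minimum revisions come from the note. The comparison rule (leading letter compared alphabetically, remaining fields numerically), the function names, and the sample inventory are assumptions.

```python
# Minimum bundle revisions quoted in the note above.
MINIMUM_BUNDLE_REVISIONS = {
    "FibrChanl-01": "B.11.23.0401",  # fcd driver
    "scsiU320-00": "B.11.23.0606",   # mpt driver
}


def revision_key(revision):
    """Split 'B.11.23.0401' into ('B', 11, 23, 401) for comparison.

    Assumption: the leading letter sorts alphabetically and the
    remaining fields sort numerically.
    """
    prefix, *numbers = revision.strip().split(".")
    return (prefix,) + tuple(int(n) for n in numbers)


def outdated_bundles(installed):
    """Return the bundles that are missing or older than the minimum."""
    problems = []
    for bundle, minimum in MINIMUM_BUNDLE_REVISIONS.items():
        current = installed.get(bundle)
        if current is None or revision_key(current) < revision_key(minimum):
            problems.append(bundle)
    return problems


# Hypothetical inventory, for illustration only.
installed = {"FibrChanl-01": "B.11.23.0512", "scsiU320-00": "B.11.23.0509"}
print(outdated_bundles(installed))
```

The patch requirements for igelan and iether are left out on purpose, because "or later" for a patch depends on which patches supersede it, not on a simple numeric comparison.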
Installing PCI Error Handling from the Software Depot
To install PCI Error Handling from the Software Depot:
Step 1. Go to the HP Software Depot at http://h20293.www2.hp.com
Step 2. Select “Enhancement releases and patch bundles”
Step 4. Follow the instructions to download and install PCI Error Handling
NOTE In addition to installing the PCIErrorHandling bundle, the btlan, igelan, and iether drivers
require patches to enable PCI Error Handling. Also, the latest versions of the fcd and mpt
drivers must be installed to enable PCI Error Handling.
The patch required for the btlan driver is included with the PCIErrorHandling bundle.
The patches required for the igelan and iether drivers must be downloaded and installed
separately from the IT Resource Center at http://www.itrc.hp.com.
- The iether driver requires patch PHNE_32199 or later.
- The igelan driver requires patch PHNE_34037 or later.
The latest version of the fcd driver (FibrChanl-01 bundle, version B.11.23.0401, or later) must
be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.
The latest version of the mpt driver (scsiU320-00 bundle, version B.11.23.0606, or later) must
be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.
New Error Messages for PCI Error Handling
When the PCI Error Handling feature is installed, new error messages are included for each of the drivers
that support PCI Error Handling.
— Error messages for the btlan, igelan, and iether drivers appear in the console log only and do not get
logged in syslog.
— Error messages for the fcd and mpt drivers are logged in syslog and diaglog.
— If an I/O card has multiple ports, error messages may not be reported for all of the ports on the card if the
PCI Error Handling feature suspends the driver before the error is detected on all of the ports.
New btlan Driver Error Message
There is 1 new error message for the btlan driver (100BaseT – Networking) that will appear in the console
log as illustrated in the following example:
New igelan Driver Error Messages
There are 3 new error messages for the igelan driver (Gigabit Networking) that will appear in the console log
as illustrated in the following examples:
Thu Jan 24 MST 2008 21:50:49.540624 DISASTER Subsys:IETHER Loc:00000
<1002> 1000Base-T in path 6/0/0/1/0
Was moved to DEAD state due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thu Jan 24 MST 2008 21:50:49.565469 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/0
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thu Jan 24 MST 2008 21:50:49.585899 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/1
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
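Console-log entries shaped like the three examples above can be split into fields with a short Python parser. This is a sketch only. It assumes every entry follows the layout of the examples: a header line ending in the severity, Subsys and Loc fields, a line with the message number and hardware path, the message text, then a line of tildes. The function and field names are hypothetical.

```python
import re

HEADER = re.compile(
    r"(?P<severity>[A-Z]+)\s+Subsys:(?P<subsys>\S+)\s+Loc:(?P<loc>\S+)\s*$"
)
DETAIL = re.compile(
    r"<(?P<code>\d+)>\s+(?P<card>.+?) in path (?P<path>[\d/]+)\s*$"
)

# The three entries shown above, copied as sample input.
SAMPLE = """\
Thu Jan 24 MST 2008 21:50:49.540624 DISASTER Subsys:IETHER Loc:00000
<1002> 1000Base-T in path 6/0/0/1/0
Was moved to DEAD state due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thu Jan 24 MST 2008 21:50:49.565469 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/0
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thu Jan 24 MST 2008 21:50:49.585899 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/1
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"""


def parse_console_log(text):
    """Parse tilde-separated entries shaped like the examples above."""
    entries = []
    for block in re.split(r"^~+\s*$", text, flags=re.MULTILINE):
        lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
        if len(lines) < 3:
            continue  # not a complete entry
        header = HEADER.search(lines[0])
        detail = DETAIL.match(lines[1])
        if not header or not detail:
            continue  # layout differs from the examples; skip it
        entries.append({
            "severity": header.group("severity"),
            "subsystem": header.group("subsys"),
            "code": int(detail.group("code")),
            "card": detail.group("card"),
            "path": detail.group("path"),
            "message": " ".join(lines[2:]),
        })
    return entries


entries = parse_console_log(SAMPLE)
suspended = [e["path"] for e in entries if "suspended" in e["message"]]
print(suspended)
```

Filtering on the message text rather than the message number keeps the sketch from assuming what each number means beyond what the examples show.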
New fcd Driver Error Messages
There are 6 new error messages for the fcd driver (2-Gigabit 2 port and 4-Gigabit 1 port FibreChannel - mass
storage) that will be logged in syslog and diaglog as illustrated in the following examples:
1/0/10/1/0: Fibre Channel Driver has detected a Fatal PCI Error.
1/0/10/1/0: Fibre Channel Driver Received Suspend Request.
1/0/10/1/0: Fibre Channel Driver has been Successfully Suspended.
1/0/10/1/1: Fibre Channel Driver Received Suspend Request.
1/0/10/1/1: Fibre Channel Driver has detected a Fatal PCI Error.
1/0/10/1/1: Fibre Channel Driver has been Successfully Suspended.
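Because a multi-port card logs one set of messages per port, it can help to group the fcd messages by hardware path. The Python sketch below is illustrative only. It assumes each log line contains the `path: Fibre Channel Driver ...` text shown above, possibly after a syslog prefix. The function names are hypothetical.

```python
import re
from collections import defaultdict

FCD_LINE = re.compile(
    r"(?P<path>\d+(?:/\d+)+): Fibre Channel Driver (?P<event>.+?)\s*$"
)

# The six messages shown above, copied as sample input.
SAMPLE_LINES = [
    "1/0/10/1/0: Fibre Channel Driver has detected a Fatal PCI Error.",
    "1/0/10/1/0: Fibre Channel Driver Received Suspend Request.",
    "1/0/10/1/0: Fibre Channel Driver has been Successfully Suspended.",
    "1/0/10/1/1: Fibre Channel Driver Received Suspend Request.",
    "1/0/10/1/1: Fibre Channel Driver has detected a Fatal PCI Error.",
    "1/0/10/1/1: Fibre Channel Driver has been Successfully Suspended.",
]


def events_by_path(lines):
    """Group the event text of each fcd message under its hardware path."""
    grouped = defaultdict(list)
    for line in lines:
        match = FCD_LINE.search(line)
        if match:
            grouped[match.group("path")].append(match.group("event"))
    return dict(grouped)


def suspended_paths(lines):
    """Return the paths whose driver instance reported a successful suspend."""
    return sorted(
        path
        for path, events in events_by_path(lines).items()
        if any("Successfully Suspended" in event for event in events)
    )


print(suspended_paths(SAMPLE_LINES))
```

A port that logged a fatal error or a suspend request but never reported a successful suspend would be absent from the result, which is one way to spot a port that needs a closer look.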
New mpt Driver Error Messages
There are six new error messages for the mpt driver that will be logged in syslog and diaglog for the
following events:
• When the driver is taken offline due to a PCI error
• When the driver is suspended due to a PCI error
• When the driver is resumed after a PCI error
• When the resume operation fails due to a PCI error
• When a firmware update on the card associated with the driver fails due to a PCI error
• When an initiator ID change fails due to a PCI error
How to Online Recover from a PCI Error
The olrad command and the Attention Button can be used to attempt online recovery from a PCI error
without requiring a system reboot.
Recovery Using the olrad Command
Step 1. If the PCI slot remains powered ON, use the olrad -p OFF slot_id command to power it OFF.
Step 2. If power OFF succeeds, try a Post Replace operation at the slot using the olrad -R slot_id
command.
Step 3. If the Post Replace operation fails, there is a high probability that the I/O card is bad. HP
recommends replacing the I/O card with an I/O card that has the same HP Manufacturing Part
Number and the same (or later) release version number, then repeating the Post Replace operation
described in Step 2.
Step 4. If the Post Replace operation succeeds and the I/O card/slot recovers from the error, the software
state of the components will be marked CLAIMED in the ioscan(1M) output. If you continue to
experience errors on this slot, there is a high probability that the I/O card is bad. HP recommends
replacing the I/O card with an I/O card that has the same HP Manufacturing Part Number and the
same (or later) release version number, then repeating the Post Replace operation described in Step 2.
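The decision flow of Steps 1 through 3 can be sketched as a small Python wrapper. This is an illustration under stated assumptions, not an HP-supplied tool. It uses only the two olrad invocations quoted in the steps, and it assumes that olrad signals success with exit status 0. The function names and the slot ID in the usage lines are hypothetical. The command runner is passed in as a parameter so the flow can be rehearsed without touching any hardware.

```python
import subprocess


def run_command(argv):
    """Run a command; assume (not confirmed here) that exit status 0 means success."""
    return subprocess.run(argv).returncode == 0


def recover_slot(slot_id, slot_powered_on, run=run_command):
    """Follow Steps 1 and 2 for one slot and describe the outcome."""
    if slot_powered_on:
        # Step 1: power the slot off.
        if not run(["olrad", "-p", "OFF", slot_id]):
            return "power-off failed"
    # Step 2: attempt the Post Replace operation.
    if run(["olrad", "-R", slot_id]):
        return "post-replace succeeded; check ioscan for CLAIMED"
    # Step 3: a failure here suggests the I/O card may need replacing.
    return "post-replace failed; consider replacing the I/O card"


# Dry run: record the commands instead of executing them.
issued = []


def dry_run(argv):
    issued.append(" ".join(argv))
    return True


print(recover_slot("0-0-1-4", True, run=dry_run))  # hypothetical slot ID
print(issued)
```

Calling `recover_slot` without the `run` argument would execute the real commands, which should only be done on a system where the procedure above applies.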
IMPORTANT If you use Serviceguard, HP recommends the PCI Error Handling feature only be enabled if
your storage devices are configured with multiple paths and are protected by high availability
storage software such as PVLink, SecurePath, or MirrorDisk/UX. If PCI Error Handling is
enabled, but your storage devices are configured with only a single path, a system reboot may
be necessary to recover from a PCI error.
NOTE With the PCI Error Handling solution installed, there is still a remote possibility that an MCA
or HPMC could occur during a PCI OLA operation (online addition of an I/O card). At the
beginning of a PCI OLA operation, there is a brief time during which the PCI Error Handling
infrastructure determines if the driver associated with the card is PCI Error Handling capable.
Any PCI error that occurs during this brief window of exposure can cause an MCA or HPMC.
This exposure only exists during PCI OLA operations. This exposure does not exist during PCI
OLR operations (online replacement of an I/O card), or during ordinary I/O card operations.
The following example shows how the PCI Error Handling feature is used to handle a PCI error involving the
iether driver:
NOTE The PCI Error Handling procedure detailed in this example may vary slightly from what you
will experience, depending on the platform and I/O card driver.
A. A PCI error occurs and error messages are displayed on the console:
To use the Attention Button to recover from a PCI error, refer to the Interface Card OL* Support Guide,
Manufacturing Part Number B2355-90862, for instructions on using the Attention Button, then use the
Attention Button to complete the same steps that are illustrated in “Recovery Using the olrad Command”:
Step 1. Confirm the driver/card is suspended
Step 2. Confirm card is in an error state
Step 3. Power off the slot
Step 4. Confirm the slot power is off
Step 5. Resume the card
Step 6. Confirm the card has been resumed
PCI Error Handling Documentation
The documentation that supports this release of the PCI Error Handling feature consists of:
• PCI Error Handling Product Note, March 2007, Manufacturing Part Number 5992-0539, available in
the High Availability category at http://docs.hp.com/en/ha.html
• SW Depot web page, available from the HP Software Depot at http://h20293.www2.hp.com
• olrad manpage: after installing the PCI Error Handling feature, enter man olrad from the command
line to view the olrad manpage that includes PCI Error Handling information.
• Interface Card OL* Support Guide, September 2004, Manufacturing Part Number B2355-90862,
available at: http://docs.hp.com
• Patch Management User Guide for HP-UX 11.x Systems, February 2007, Manufacturing Part Number
5991-6449, available at: http://docs.hp.com
Known Problems
IMPORTANT If you use Serviceguard, HP recommends the PCI Error Handling feature only be enabled if
your storage devices are configured with multiple paths and are protected by high availability
storage software such as PVLink, SecurePath, or MirrorDisk/UX. If PCI Error Handling is
enabled, but your storage devices are configured with only a single path, Serviceguard may not
detect a loss of connectivity and therefore may not initiate a failover.
NOTE With the PCI Error Handling solution installed, there is still a remote possibility that an MCA
or HPMC could occur during a PCI OLA operation (online addition of an I/O card). At the
beginning of a PCI OLA operation, there is a brief time during which the PCI Error Handling
infrastructure determines if the driver associated with the card is PCI Error Handling capable.
Any PCI error that occurs during this brief window of exposure can cause an MCA or HPMC.
This exposure only exists during PCI OLA operations. This exposure does not exist during PCI
OLR operations (online replacement of an I/O card), or during ordinary I/O card operations.
Removing the PCI Error Handling Feature
To remove the PCI Error Handling feature, use the swremove command:
# swremove -x autoreboot=true PCIErrorHandling
This will remove the PCI Error Handling feature and reboot your system, leaving the bundle wrapper and
kernel patches on your system. The kernel patches that were included with the product bundle are
recommended for your system. Therefore, we advise that you do not remove them.
For more information on managing patches on your system, see the Patch Management User Guide for HP-UX
11.x Systems, February 2007, Manufacturing Part Number 5991-6449. This document is available on the
Support Plus media and on the Hewlett-Packard documentation web site:
http://www.docs.hp.com
Use the swlist command to verify PCI Error Handling has been removed from your system. The swlist
command will not display PCIErrorHandling if it has been removed from your system.
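The verification can be scripted by scanning captured swlist output for the bundle name. The Python sketch below is illustrative only. It assumes each installed bundle appears as the first whitespace-separated field on its own line. The function name, the sample listings, and the revision strings in them are hypothetical and do not reproduce real swlist output.

```python
def bundle_listed(swlist_output, bundle="PCIErrorHandling"):
    """Return True if any line's first field is exactly the bundle name."""
    for line in swlist_output.splitlines():
        fields = line.split()
        if fields and fields[0] == bundle:
            return True
    return False


# Hypothetical listings, for illustration only.
BEFORE_REMOVAL = """\
# sample listing
  PCIErrorHandling  X.00.00  hypothetical description
  OtherBundle       X.00.00  hypothetical description
"""
AFTER_REMOVAL = """\
# sample listing
  OtherBundle       X.00.00  hypothetical description
"""

print(bundle_listed(BEFORE_REMOVAL))
print(bundle_listed(AFTER_REMOVAL))
```

Matching the first field exactly, rather than searching for the name anywhere in the text, avoids a false match when the name appears only in a comment or description.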
Terms and Definitions
HPMC High Priority Machine Check – Highest Priority interruption on PA-RISC based systems
MCA Machine Check Abort – Highest Priority interruption on Itanium based systems
Post Replace Operation - By issuing the olrad -R slot_id command after an I/O card is replaced, slot
power is turned on, suspended drivers are resumed, driver scripts (post_replace) for the slot (slot_id) and
affected slots (if any) are run, and the attention LED for the slot (slot_id) is set to OFF.