HP PCI Error Handling and Recovery User's Guide

PCI Error Handling Product Note
HP-UX Servers and Workstations
Fourth Edition
Manufacturing Part Number : 5992-3799
March 2008
United States
© Copyright 2001-2008 Hewlett-Packard Development Company LP. All rights reserved.
The information in this document is subject to change without notice.
Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held
liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.
Warranty
A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can be obtained from your local Sales and Service Office.
Restricted Rights Legend
Use, duplication or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 for DOD agencies, and subparagraphs (c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause at FAR 52.227-19 for other agencies.
HEWLETT-PACKARD DEVELOPMENT COMPANY L.P. 20555 S.H. 249 Houston, Texas 77070 U.S.A.
Use of this document and any supporting software media supplied for this pack is restricted to this product only. Additional copies of the programs may be made for security and back-up purposes only. Resale of the programs, in their present form or with alterations, is expressly prohibited.
Copyright Notices
Copyright 2001-2008 Hewlett-Packard Development Company L.P. All rights reserved. Reproduction, adaptation, or translation of this document without prior written permission is prohibited, except as allowed under the copyright laws.
Trademark Notices
UNIX is a registered trademark in the United States and other countries, licensed exclusively through The Open Group.
Publishing History
New editions of this manual will incorporate information that is new or has changed since the previous edition was published (minor typographical or formatting corrections do not result in the publication of a new edition). The publishing date, manufacturing part number, and edition number all change each time a new edition is published, providing unique identification for each edition.
Edition / Manufacturing Part Number / Date
First Edition / 5991-4793 / March 2006
Second Edition / 5991-5308 / April 2006
Third Edition / 5992-0539 / March 2007
Fourth Edition / 5992-3799 / March 2008
Conventions
We use the following typographical conventions.
audit (5) An HP-UX manpage. audit is the name and 5 is the section in the HP-UX Reference. On the web and on the Instant Information CD, it may be a hot link to the manpage itself. From the HP-UX command line, you can enter “man audit” or “man 5 audit” to view the manpage. See man (1).
Book Title The title of a book. On the web and on the Instant Information CD, it may be a hot link to the book itself.
KeyCap The name of a keyboard key. Note that Return and Enter both refer to the same key.
Emphasis Text that is emphasized.
Emphasis Text that is strongly emphasized.
Term The defined use of an important word or phrase.
ComputerOut Text displayed by the computer.
UserInput Commands and other text that you type.
Command A command name or qualified command phrase.
Variable The name of a variable that you may replace in a command or function or information in a display that represents several possible values.
[] The contents are optional in formats and command descriptions.
{} The contents are required in formats and command descriptions. If the contents are a list separated by |, you must choose one of the items.
. . . The preceding element may be repeated an arbitrary number of times.
| Separates items in a list of choices.
Contents
PCI Error Handling Product Note
  What is PCI Error Handling?
  Accessing and Installing the PCI Error Handling Feature
    Confirm PCI Error Handling is Supported
    Installing PCI Error Handling from the Software Depot
  New Error Messages for PCI Error Handling
    New btlan Driver Error Message
    New igelan Driver Error Messages
    New iether Driver Error Messages
    New fcd Driver Error Messages
    New mpt Driver Error Messages
  How to Online Recover from a PCI Error
    Recovery Using the olrad Command
    Recovery Using the Attention Button
  PCI Error Handling Documentation
  Known Problems
  Removing the PCI Error Handling Feature
  Terms and Definitions
PCI Error Handling Product Note

What is PCI Error Handling?

The PCI Error Handling feature allows an HP-UX system to avoid a Machine Check Abort (MCA) or a High Priority Machine Check (HPMC), if a PCI error occurs (for example, a parity error).
If a PCI error occurs on a bus without the PCI Error Handling feature installed, an MCA or an HPMC will occur, then the system will crash.
With the PCI Error Handling feature installed, if a PCI error occurs on a bus containing an I/O card that supports PCI Error Recovery:
• The corresponding device driver reports the error
• The PCI bus is quarantined to isolate the system from further I/O, preventing the error from damaging the system
• The olrad command and the Attention Button can be used to recover online, restoring the slot, card, and driver to a usable state

Accessing and Installing the PCI Error Handling Feature

The PCI Error Handling feature can be accessed and installed on supported systems from the Software Pack CD-ROM, or the HP Software Depot.

Confirm PCI Error Handling is Supported

Step 1. Review the PCI Error Handling Support Matrix document posted at http://www.docs.hp.com to
confirm if PCI Error Handling is supported with your configuration and system firmware version.
Step 2. To confirm which system firmware version is installed on your system, or any cell in your system,
use the sysrev command from the management processor Command Menu (CM) prompt as follows:
MP:CM> sysrev
The sysrev command output on Superdome systems is different from the sysrev command output on the other systems that support PCI Error Handling.
The sysrev command output on Superdome systems will list the system firmware version under the SYS FW heading as illustrated in the following example:
MP:CM> sysrev
Utility Subsystem FW Revision Level: 15.22
| Cabinet #0 | Cabinet #1 | Cab #8 | Cab #9 |
-----------------------+-----------------+-----------------+--------+--------+
| SYS FW | PDHC | SYS FW | PDHC | | |
Cell (slot 0) | 3.64 | 15.12 | 3.82 | 15.12 | | |
Cell (slot 1) | 3.82 | 15.12 | 3.66 | 15.12 | | |
Cell (slot 2) | 3.88 | 15.14 | 3.66 | 15.12 | | |
Cell (slot 3) | 3.82 | 15.12 | 3.50 | 15.10 | | |
Cell (slot 4) | 3.82 | 15.12 | 3.86 | 15.14 | | |
Cell (slot 5) | 3.64 | 15.12 | 3.82 | 15.12 | | |
Cell (slot 6) | 3.82 | 15.12 | 3.84 | 15.10 | | |
Cell (slot 7) | 3.88 | 15.14 | 3.82 | 15.12 | | |
| | | | |
MP | 15.22 | | | |
ED | 3.13 | | | |
CLU | 15.2 | 15.2 | 15.2 | 15.2 |
PM | 15.0 | 15.0 | 15.0 | 15.0 |
CIO (bay 0, chassis 1) | 15.0 | 15.0 | 15.0 | 15.0 |
CIO (bay 0, chassis 3) | 15.0 | 15.0 | 15.0 | 15.0 |
CIO (bay 1, chassis 1) | 15.0 | 15.0 | 15.0 | 15.0 |
CIO (bay 1, chassis 3) | 15.0 | | 15.0 | 15.0 |
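When many cells have to be checked, the SYS FW column can be pulled out of captured sysrev output with a short script. The following Python sketch is an illustration only. It assumes the two-cabinet, pipe-delimited layout shown in the example above; the function name superdome_sys_fw is hypothetical and is not part of any HP tool.

```python
def superdome_sys_fw(sysrev_text):
    """Map (cabinet, cell slot) -> SYS FW version string.

    Assumption: each "Cell (slot N)" row lists SYS FW and PDHC for
    cabinet 0, then SYS FW and PDHC for cabinet 1, as in the example.
    """
    versions = {}
    for line in sysrev_text.splitlines():
        if not line.startswith("Cell (slot"):
            continue
        label, *cols = [c.strip() for c in line.split("|")]
        slot = int(label[len("Cell (slot"):].rstrip(")"))
        # Column 0 is SYS FW for cabinet 0, column 2 is SYS FW for cabinet 1.
        for cabinet, index in ((0, 0), (1, 2)):
            if index < len(cols) and cols[index]:
                versions[(cabinet, slot)] = cols[index]
    return versions


sample = (
    "Cell (slot 0) | 3.64 | 15.12 | 3.82 | 15.12 | | |\n"
    "Cell (slot 2) | 3.88 | 15.14 | 3.66 | 15.12 | | |\n"
    "MP | 15.22 | | | |\n"
)
for key, version in sorted(superdome_sys_fw(sample).items()):
    print(key, version)
```

Rows that do not describe a cell (MP, ED, CLU, and so on) are skipped, because only the cell rows carry a system firmware version.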
On the mid-range systems that support PCI Error Handling, the system firmware version will be listed with the Pri SFW heading as illustrated in the following example:
MP:CM> sysrev
Cabinet firmware revision report
PROGRAMMABLE HARDWARE :
System Backplane : GPM FM OSP
------- ------- -------
1.002 1.002 1.002
PCI-X Backplane : LPM HS
------- -------
2.000 1.000
Core IO : Master Slave
-------- -------
2.010 2.010
LPM PDHC
------- -------
Cell 0 : 1.002 1.010
Cell 1 : 1.002 1.010
Cell 2 : 1.002 1.010
Cell 3 : 1.002 1.010
FIRMWARE:
Core IO
Master : A.007.008
Event Dict. : 0.009
Slave : A.007.008
Event Dict. : 0.009
Cell 0
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001 (PA)
Cell 1
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001 (PA)
Cell 2
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001 (PA)
Cell 3
PDHC : A.003.027
Pri SFW : 23.001 (PA)
Sec SFW : 23.001
NOTE The sysrev command output on some systems includes extra zeros in the system firmware version number. These zeros can be ignored. For example, 3.88 and 3.088 on HP Integrity systems are the same firmware version, and 23.1 and 23.001 on HP 9000 systems are the same firmware version.
Step 3. The system firmware is the main component of the firmware recipe required to support PCI Error
Handling. If you do not have the minimum system firmware version listed in the PCI Error Handling Support Matrix (or a later version), you do not have a firmware recipe that supports PCI Error Handling installed on your system. Contact your HP representative for assistance on accessing and installing a firmware recipe that supports the PCI Error Handling feature. If you have the supported system firmware version (or later) installed on your system, you are ready to install PCI Error Handling from the Application Release (AR) media.
For information about installing the PCI Error Handling product from the AR media, see:
http://www.docs.hp.com/en/5992-1978/5992-1978.pdf
under Chapter 7, Installing HP Applications and Patches.
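The comparison against the minimum firmware version, including the rule that extra zeros can be ignored, can be sketched in Python as follows. This is an illustration under stated assumptions: it handles only purely numeric versions such as 3.88 or 23.001, the function names are hypothetical, and the minimum values used in the usage lines are made up. Take the real minimum from the PCI Error Handling Support Matrix.

```python
def parse_fw_version(text):
    """Convert '3.088' to (3, 88) and '23.001' to (23, 1).

    Converting each dot-separated part to an integer drops the extra
    zeros, so '3.88' and '3.088' compare as equal.
    """
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is the minimum version or later."""
    return parse_fw_version(installed) >= parse_fw_version(minimum)


# Hypothetical minimum versions, for illustration only.
print(meets_minimum("3.088", "3.66"))
print(meets_minimum("23.001", "23.1"))
```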
NOTE In addition to installing the PCIErrorHandling bundle, the btlan, igelan, and iether drivers require patches to enable PCI Error Handling. Also, the latest versions of the fcd and mpt drivers must be installed to enable PCI Error Handling.
• The patch required for the btlan driver is included with the PCIErrorHandling bundle.
• The patches required for the igelan and iether drivers must be downloaded and installed separately from the IT Resource Center at http://www.itrc.hp.com.
— The iether driver requires patch PHNE_32199 or later.
— The igelan driver requires patch PHNE_34037 or later.
• The latest version of the fcd driver (FibrChanl-01 bundle, version B.11.23.0401, or later) must be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.
• The latest version of the mpt driver (scsiU320-00 bundle, version B.11.23.0606, or later) must be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.

Installing PCI Error Handling from the Software Depot

To install PCI Error Handling from the Software Depot:
Step 1. Go to the HP Software Depot at http://h20293.www2.hp.com
Step 2. Select “Enhancement releases and patch bundles”
Step 3. Select HP-UX Software Pack (Optional HP-UX 11i v2 Core Enhancements)
Step 4. Follow the instructions to download and install PCI Error Handling
NOTE In addition to installing the PCIErrorHandling bundle, the btlan, igelan, and iether drivers require patches to enable PCI Error Handling. Also, the latest versions of the fcd and mpt drivers must be installed to enable PCI Error Handling.
• The patch required for the btlan driver is included with the PCIErrorHandling bundle.
• The patches required for the igelan and iether drivers must be downloaded and installed separately from the IT Resource Center at http://www.itrc.hp.com.
— The iether driver requires patch PHNE_32199 or later.
— The igelan driver requires patch PHNE_34037 or later.
• The latest version of the fcd driver (FibrChanl-01 bundle, version B.11.23.0401, or later) must be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.
• The latest version of the mpt driver (scsiU320-00 bundle, version B.11.23.0606, or later) must be downloaded and installed from the Software Depot at http://h20293.www2.hp.com.

New Error Messages for PCI Error Handling

When the PCI Error Handling feature is installed, new error messages are included for each of the drivers that support PCI Error Handling.
— Error messages for the btlan, igelan, and iether drivers appear in the console log only and do not get logged in syslog.
— Error messages for the fcd and mpt drivers are logged in syslog and diaglog.
— If an I/O card has multiple ports, error messages may not be reported for all of the ports on the card if the PCI Error Handling feature suspends the driver before the error is detected on all of the ports.

New btlan Driver Error Message

There is 1 new error message for the btlan driver (100BaseT – Networking) that will appear in the console log as illustrated in the following example:
--------------------------100 Mb/s LAN/9000 Networking----------------------@#%
Fri Dec 02 PST 2005 11:14:54.650350 DISASTER Subsys:BTLAN Loc:00000
<6006> 10/100BASE-T adapter in slot(Crd In#) 6 detected a PCI
error. The adapter was moved to DEAD state.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

New igelan Driver Error Messages

There are 3 new error messages for the igelan driver (Gigabit Networking) that will appear in the console log as illustrated in the following examples:
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Fri Dec 02 PST 2005 11:27:44.700477 DISASTER Subsys:IGELAN Loc:00000
<1002> 1000Base-T in path 1/0/8/1/0/6/1
Was moved to DEAD state due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Fri Dec 02 PST 2005 11:27:44.700477 DISASTER Subsys:IGELAN Loc:00000
<1003> 1000Base-T in path 1/0/8/1/0/6/1
Resume failed due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Fri Dec 02 PST 2005 11:27:44.753351 DISASTER Subsys:IGELAN Loc:00000
<1004> 1000Base-T in path 1/0/8/1/0/6/1
Is being suspended due to a PCI Error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

New iether Driver Error Messages

The new error messages for the iether driver (Gigabit Networking) will appear in the console log as illustrated in the following examples:
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.540624 DISASTER Subsys:IETHER Loc:00000
<1002> 1000Base-T in path 6/0/0/1/0
Was moved to DEAD state due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.565469 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/0
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.585899 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/1
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
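Because the igelan and iether messages appear only in the console log, a site that captures console output to a file may want to extract the affected hardware paths from it. The following Python sketch is one possible way to do that. It assumes the entry layout shown in the igelan and iether examples above (a Subsys field, a message number in angle brackets, the phrase "in path", and a closing row of tildes); the function name is hypothetical, and btlan entries, which report a slot rather than a path, are not matched.

```python
import re

# One console log entry: subsystem, message number, hardware path, message text.
ENTRY = re.compile(
    r"Subsys:(?P<subsys>\w+)\s+Loc:\d+\s+"
    r"<(?P<msg_id>\d+)>\s+[^~]*?in path (?P<path>[\d/]+)\s+"
    r"(?P<text>[^~]+?)\s*~"
)


def parse_pci_error_entries(console_text):
    """Return one dict per igelan/iether style entry found in the text."""
    entries = []
    for match in ENTRY.finditer(console_text):
        entries.append({
            "subsys": match.group("subsys"),
            "msg_id": int(match.group("msg_id")),
            "path": match.group("path"),
            # Collapse line breaks inside the message into single spaces.
            "text": " ".join(match.group("text").split()),
        })
    return entries


sample = """\
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.540624 DISASTER Subsys:IETHER Loc:00000
<1002> 1000Base-T in path 6/0/0/1/0
Was moved to DEAD state due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"""
for entry in parse_pci_error_entries(sample):
    print(entry["subsys"], entry["msg_id"], entry["path"], entry["text"])
```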

New fcd Driver Error Messages

There are 6 new error messages for the fcd driver (2-Gigabit 2 port and 4-Gigabit 1 port FibreChannel - mass storage) that will be logged in syslog and diaglog as illustrated in the following example(s):
1/0/10/1/0: Fibre Channel Driver has detected a Fatal PCI Error.
1/0/10/1/0: Fibre Channel Driver Received Suspend Request.
1/0/10/1/0: Fibre Channel Driver has been Successfully Suspended.
1/0/10/1/1: Fibre Channel Driver Received Suspend Request.
1/0/10/1/1: Fibre Channel Driver has detected a Fatal PCI Error.
1/0/10/1/1: Fibre Channel Driver has been Successfully Suspended.
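On a dual-port card the messages for the two ports can be interleaved, so it can help to group them by hardware path. The Python sketch below shows one way to do this. It assumes each relevant syslog line contains a hardware path followed by a colon and the text "Fibre Channel Driver", as in the example above. Any prefix that syslog adds in front of the path is ignored. The function names are hypothetical.

```python
import re
from collections import defaultdict

# A hardware path such as 1/0/10/1/0, a colon, then the driver message.
FCD_LINE = re.compile(r"(?P<path>\d+(?:/\d+)+): (?P<text>Fibre Channel Driver .+)")


def fcd_events_by_path(lines):
    """Group fcd driver messages by the hardware path that reported them."""
    events = defaultdict(list)
    for line in lines:
        match = FCD_LINE.search(line)
        if match:
            events[match.group("path")].append(match.group("text").strip())
    return dict(events)


def suspended_paths(events):
    """Paths whose messages include a successful suspend."""
    return sorted(
        path for path, texts in events.items()
        if any("Successfully Suspended" in text for text in texts)
    )


sample = [
    "1/0/10/1/0: Fibre Channel Driver has detected a Fatal PCI Error.",
    "1/0/10/1/0: Fibre Channel Driver Received Suspend Request.",
    "1/0/10/1/0: Fibre Channel Driver has been Successfully Suspended.",
]
print(suspended_paths(fcd_events_by_path(sample)))
```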

New mpt Driver Error Messages

There are six new error messages for the mpt driver that will be logged in syslog and diaglog for the following events:
• When the driver is taken offline due to a PCI error
• When the driver is suspended due to a PCI error
• When the driver is resumed after a PCI error
• When the resume operation fails due to a PCI error
• When a firmware update on the card associated with the driver fails due to a PCI error
• When an initiator ID change fails due to a PCI error

How to Online Recover from a PCI Error

The olrad command and the Attention Button can be used to attempt online recovery from a PCI error without requiring a system reboot.

Recovery Using the olrad Command

Step 1. If the PCI slot remains powered ON, use the olrad -p OFF slot_id command to power it OFF.
Step 2. If power OFF succeeds, try a Post Replace operation at the slot using the olrad -R slot_id command.
Step 3. If the Post Replace operation fails, there is a high probability that the I/O card is bad. HP recommends replacing the I/O card with an I/O card that has the same HP Manufacturing Part Number and the same (or later) release version number, then repeat the Post Replace operation described in Step 2.
Step 4. If the Post Replace operation succeeds and the I/O card/slot recovers from the error, the software
state of the components will be marked CLAIMED in the ioscan(1M) output. If you continue to experience errors on this slot, there is a high probability that the I/O card is bad. HP recommends replacing the I/O card with an I/O card that has the same HP Manufacturing Part Number and the same (or later) release version number, then repeat the Post Replace operation described in Step 2.
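The order of the steps above can be sketched as a small Python wrapper. This is an illustration, not an HP-supplied tool. It uses only the olrad invocations shown in this document (olrad -p off slot_id and olrad -R slot_id) and assumes that olrad exits with status 0 on success, which should be confirmed against the olrad manpage. The function names and the run parameter are hypothetical; run exists so that the sequence can be exercised without touching hardware. The sketch always issues the power-off command and does not first check whether the slot is still powered ON.

```python
import subprocess


def run_command(argv):
    """Run a command and report whether it exited with status 0 (assumed success)."""
    return subprocess.run(argv).returncode == 0


def recover_slot(slot_id, run=run_command):
    """Power off the slot, then attempt a Post Replace operation.

    Returns 'recovered', 'power_off_failed', or 'post_replace_failed'.
    """
    if not run(["olrad", "-p", "off", slot_id]):
        return "power_off_failed"
    if not run(["olrad", "-R", slot_id]):
        # Step 3: the I/O card is probably bad; replace it and retry Post Replace.
        return "post_replace_failed"
    return "recovered"


# Dry run with a stand-in for olrad that records the commands and always succeeds.
issued = []
print(recover_slot("0-1-1-0", run=lambda argv: issued.append(argv) or True))
print(issued)
```

A result of 'recovered' only means both commands reported success; confirm the CLAIMED state in the ioscan(1M) output as described in Step 4.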
IMPORTANT If you use Serviceguard, HP recommends the PCI Error Handling feature only be enabled if
your storage devices are configured with multiple paths and are protected by high availability storage software such as PVLink, SecurePath, or MirrorDisk/UX. If PCI Error Handling is enabled, but your storage devices are configured with only a single path, a system reboot may be necessary to recover from a PCI error.
NOTE With the PCI Error Handling solution installed, there is still a remote possibility that an MCA
or HPMC could occur during a PCI OLA operation (online addition of an I/O card). At the beginning of a PCI OLA operation, there is a brief time during which the PCI Error Handling infrastructure determines if the driver associated with the card is PCI Error Handling capable. Any PCI error that occurs during this brief window of exposure can cause an MCA or HPMC. This exposure only exists during PCI OLA operations. This exposure does not exist during PCI OLR operations (online replacement of an I/O card), or during ordinary I/O card operations.
The following example shows how the PCI Error Handling feature is used to handle a PCI error involving the iether driver:
NOTE The PCI Error Handling procedure detailed in this example may vary slightly from what you
will experience, depending on the platform and IO card driver.
A. A PCI error occurs and error messages are displayed on the console:
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.540624 DISASTER Subsys:IETHER Loc:00000
<1002> 1000Base-T in path 6/0/0/1/0
Was moved to DEAD state due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.565469 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/0
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.585899 DISASTER Subsys:IETHER Loc:00000
<1004> 1000Base-T in path 6/0/0/1/1
Is being suspended due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
B. Execute the olrad -q command to confirm the card is in the suspended state:
Capable
Slot Path Bus Max Spd Pwr Occu Susp OLAR OLD Max Mode
Num Spd Mode
0-1-1-0 6/0/0/1 24576 133 133 On Yes Yes Yes N/A PCI-X PCI-X
0-1-1-1 6/0/1/1 24832 133 66 On Yes No Yes N/A PCI-X PCI
0-1-1-8 6/0/12/1 26880 133 133 On Yes Yes Yes N/A PCI-X PCI-X
0-1-1-9 6/0/10/1 26624 133 133 Off No N/A N/A N/A PCI-X PCI-X
0-1-1-10 6/0/9/1 26368 133 133 On Yes No Yes N/A PCI-X PCI-X
0-1-1-11 6/0/8/1 26112 133 133 Off No N/A N/A N/A PCI-X PCI-X
PCI-Express Slots Information
-----------------------------
Driver(s)
Capable
Slot Path Link Max Max Link Pwr Occu Susp OLAR OLD Mode
Spd Link Link Width
Spd Width
0-1-1-2 6/0/2/0/0/0 2.5 2.5 x8 x8 Off No N/A N/A N/A PCIe
0-1-1-3 6/0/4/0/0/0 2.5 2.5 x8 x1 On Yes N/A N/A N/A PCIe
0-1-1-4 6/0/5/0/0/0 2.5 2.5 x8 x8 Off No N/A N/A N/A PCIe
0-1-1-5 6/0/6/0/0/0 2.5 2.5 x8 x4 On Yes No Yes N/A PCIe
0-1-1-6 6/0/14/0/0/0 2.5 2.5 x8 x4 On Yes No Yes N/A PCIe
0-1-1-7 6/0/13/0/0/0 2.5 2.5 x8 x8 On Yes N/A N/A N/A PCIe
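The Susp column of captured olrad -q output can also be checked with a script. The Python sketch below is illustrative and rests on the column order seen in the example above: in PCI and PCI-X rows Susp is the eighth column, and in PCIe rows it is the ninth. The function name is hypothetical, and the column positions should be checked against the output of your own system before relying on them.

```python
def suspended_slots(olrad_q_text):
    """Return the slot IDs whose Susp column reads 'Yes'."""
    slots = []
    for line in olrad_q_text.splitlines():
        fields = line.split()
        # Data rows have 12 columns and start with a slot ID such as 0-1-1-0.
        if len(fields) != 12 or fields[0].count("-") != 3:
            continue
        # Assumed layout: the last column is the mode, which selects the Susp position.
        susp_index = 8 if fields[11] == "PCIe" else 7
        if fields[susp_index] == "Yes":
            slots.append(fields[0])
    return slots


sample = """\
Slot Path Bus Max Spd Pwr Occu Susp OLAR OLD Max Mode
0-1-1-0 6/0/0/1 24576 133 133 On Yes Yes Yes N/A PCI-X PCI-X
0-1-1-1 6/0/1/1 24832 133 66 On Yes No Yes N/A PCI-X PCI
0-1-1-5 6/0/6/0/0/0 2.5 2.5 x8 x4 On Yes No Yes N/A PCIe
"""
print(suspended_slots(sample))
```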
C. Execute ioscan -kfnH on the iether driver to confirm the card is in error state:
Class I H/W Path Driver S/W State H/W Type Description
====================================================================
ba 0 6/0/0 lba ERROR BUS_NEXUS Local PCI-X Bus Adapter
(12ee)
lan 0 6/0/0/1/0 iether ERROR INTERFACE HP A7012-60001 PCI/PCI-X
1000Base-T Dual-port Adapter
lan 1 6/0/0/1/1 iether ERROR INTERFACE HP A7012-60001 PCI/PCI-X
1000Base-T Dual-port Adapter
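To pick out the components in the ERROR state from captured ioscan output, a script along the following lines could be used. This Python sketch is an illustration that assumes the column order shown above (Class, I, H/W Path, Driver, S/W State, H/W Type, Description) and treats wrapped description lines as continuations to be skipped. The function name is hypothetical.

```python
import re

# class, instance, hardware path, driver, software state, hardware type, description
IOSCAN_ROW = re.compile(
    r"^(\w+)\s+(\d+)\s+([\d/]+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def devices_in_error(ioscan_text):
    """Return (hardware path, driver) for each row whose S/W State is ERROR."""
    found = []
    for line in ioscan_text.splitlines():
        match = IOSCAN_ROW.match(line.strip())
        if match and match.group(5) == "ERROR":
            found.append((match.group(3), match.group(4)))
    return found


sample = """\
Class I H/W Path Driver S/W State H/W Type Description
ba 0 6/0/0 lba ERROR BUS_NEXUS Local PCI-X Bus Adapter
(12ee)
lan 0 6/0/0/1/0 iether ERROR INTERFACE HP A7012-60001 PCI/PCI-X
1000Base-T Dual-port Adapter
"""
print(devices_in_error(sample))
```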
D. To recover from the error, use the olrad -p off command to power off the slot:
# olrad -p off 0-1-1-0
E. Execute the olrad -q command to confirm the power is off:
# olrad -q
Capable
Slot Path Bus Max Spd Pwr Occu Susp OLAR OLD Max Mode
Num Spd Mode
0-1-1-0 6/0/0/1 24576 133 133 Off Yes Yes Yes N/A PCI-X PCI-X
0-1-1-1 6/0/1/1 24832 133 66 On Yes No Yes N/A PCI-X PCI
0-1-1-8 6/0/12/1 26880 133 133 On Yes Yes Yes N/A PCI-X PCI-X
0-1-1-9 6/0/10/1 26624 133 133 Off No N/A N/A N/A PCI-X PCI-X
0-1-1-10 6/0/9/1 26368 133 133 On Yes No Yes N/A PCI-X PCI-X
0-1-1-11 6/0/8/1 26112 133 133 Off No N/A N/A N/A PCI-X PCI-X
PCI-Express Slots Information
-----------------------------
Driver(s)
Capable
Slot Path Link Max Max Link Pwr Occu Susp OLAR OLD Mode
Spd Link Link Width
Spd Width
0-1-1-2 6/0/2/0/0/0 2.5 2.5 x8 x8 Off No N/A N/A N/A PCIe
0-1-1-3 6/0/4/0/0/0 2.5 2.5 x8 x1 On Yes N/A N/A N/A PCIe
0-1-1-4 6/0/5/0/0/0 2.5 2.5 x8 x8 Off No N/A N/A N/A PCIe
0-1-1-5 6/0/6/0/0/0 2.5 2.5 x8 x4 On Yes No Yes N/A PCIe
0-1-1-6 6/0/14/0/0/0 2.5 2.5 x8 x4 On Yes No Yes N/A PCIe
0-1-1-7 6/0/13/0/0/0 2.5 2.5 x8 x8 On Yes N/A N/A N/A PCIe
F. Use the olrad -R command to resume the card:
# olrad -R 0-1-1-0
Activity : Start of Post Replace
Target slot : 0-1-1-0
Activity : End of Post Replace
Target slot : 0-1-1-0
Activity : Target slot powered on, drivers resumed, OK to start using the card
Target slot : 0-1-1-0
G. Execute the olrad -q command again to confirm the card has been resumed:
# olrad -q
Capable
Slot Path Bus Max Spd Pwr Occu Susp OLAR OLD Max Mode
Num Spd Mode
0-1-1-0 6/0/0/1 24576 133 133 On Yes No Yes N/A PCI-X PCI-X
0-1-1-1 6/0/1/1 24832 133 66 On Yes No Yes N/A PCI-X PCI
0-1-1-8 6/0/12/1 26880 133 133 On Yes Yes Yes N/A PCI-X PCI-X
0-1-1-9 6/0/10/1 26624 133 133 Off No N/A N/A N/A PCI-X PCI-X
0-1-1-10 6/0/9/1 26368 133 133 On Yes No Yes N/A PCI-X PCI-X
0-1-1-11 6/0/8/1 26112 133 133 Off No N/A N/A N/A PCI-X PCI-X
PCI-Express Slots Information
-----------------------------
Driver(s)
Capable
Slot Path Link Max Max Link Pwr Occu Susp OLAR OLD Mode
Spd Link Link Width
Spd Width
0-1-1-2 6/0/2/0/0/0 2.5 2.5 x8 x8 Off No N/A N/A N/A PCIe
0-1-1-3 6/0/4/0/0/0 2.5 2.5 x8 x1 On Yes N/A N/A N/A PCIe
0-1-1-4 6/0/5/0/0/0 2.5 2.5 x8 x8 Off No N/A N/A N/A PCIe
0-1-1-5 6/0/6/0/0/0 2.5 2.5 x8 x4 On Yes No Yes N/A PCIe
0-1-1-6 6/0/14/0/0/0 2.5 2.5 x8 x4 On Yes No Yes N/A PCIe
0-1-1-7 6/0/13/0/0/0 2.5 2.5 x8 x8 On Yes N/A N/A N/A PCIe

Recovery Using the Attention Button

To use the Attention Button to recover from a PCI error, refer to the Interface Card OL* Support Guide, Manufacturing Part Number B2355-90862, for instructions on using the Attention Button. Then use the Attention Button to complete the same steps that are illustrated in “Recovery Using the olrad Command”:
Step 1. Confirm the driver/card is suspended
Step 2. Confirm card is in an error state
Step 3. Power off the slot
Step 4. Confirm the slot power is off
Step 5. Resume the card
Step 6. Confirm the card has been resumed

PCI Error Handling Documentation

The documentation that supports this release of the PCI Error Handling feature consists of:
• PCI Error Handling Product Note, March 2007, Manufacturing Part Number 5992-0539 — available in the High Availability category at http://docs.hp.com/en/ha.html
• SW Depot web page — available from the HP Software Depot at http://h20293.www2.hp.com
• olrad manpage — after installing the PCI Error Handling feature, enter man olrad from the command line to view the olrad manpage that includes PCI Error Handling information.
• Interface Card OL* Support Guide, September 2004, Manufacturing Part Number B2355-90862 — available at: http://docs.hp.com
• Patch Management User Guide for HP-UX 11.x Systems, February 2007, Manufacturing Part Number 5991-6449 — available at: http://docs.hp.com

Known Problems

IMPORTANT If you use Serviceguard, HP recommends the PCI Error Handling feature only be enabled if your storage devices are configured with multiple paths and are protected by high availability storage software such as PVLink, SecurePath, or MirrorDisk/UX. If PCI Error Handling is enabled, but your storage devices are configured with only a single path, Serviceguard may not detect the loss of connectivity and therefore may not initiate a failover.
NOTE With the PCI Error Handling solution installed, there is still a remote possibility that an MCA
or HPMC could occur during a PCI OLA operation (online addition of an I/O card). At the beginning of a PCI OLA operation, there is a brief time during which the PCI Error Handling infrastructure determines if the driver associated with the card is PCI Error Handling capable. Any PCI error that occurs during this brief window of exposure can cause an MCA or HPMC. This exposure only exists during PCI OLA operations. This exposure does not exist during PCI OLR operations (online replacement of an I/O card), or during ordinary I/O card operations.

Removing the PCI Error Handling Feature

To remove the PCI OL* Error Handling feature, use the swremove command:
# swremove -x autoreboot=true PCIErrorHandling
This will remove the PCI Error Handling feature and reboot your system, leaving the bundle wrapper and kernel patches on your system. The kernel patches that were included with the product bundle are recommended for your system. Therefore, we advise that you do not remove them.
For more information on managing patches on your system, see the Patch Management User Guide for HP-UX 11.x Systems, February 2007, Manufacturing Part Number 5991-6449. This document is available on the Support Plus media and on the Hewlett-Packard documentation web site: http://www.docs.hp.com
Use the swlist command to verify PCI Error Handling has been removed from your system. The swlist command will not display PCIErrorHandling if it has been removed from your system.
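The swlist check can be scripted against captured swlist output. The Python sketch below is illustrative only. It assumes that each software entry in the output begins with the bundle or product name as the first whitespace-separated field and that lines beginning with # are comments; the sample output in the usage lines is made up for the example, and the function name is hypothetical.

```python
def is_listed(swlist_output, name="PCIErrorHandling"):
    """True if a non-comment line of the output starts with the given name."""
    for line in swlist_output.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == name:
            return True
    return False


# Hypothetical output, for illustration only.
sample_after_removal = """\
# Initializing...
  ExampleBundle  B.11.23  An unrelated bundle
"""
print(is_listed(sample_after_removal))
```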

Terms and Definitions

HPMC High Priority Machine Check – Highest Priority interruption on PA-RISC based systems
MCA Machine Check Abort – Highest Priority interruption on Itanium based systems
Post Replace Operation - By issuing the olrad -R slot_id command after an I/O card is replaced, slot power is turned on, suspended drivers are resumed, driver scripts (post_replace) for the slot (slot_id) and affected slots (if any) are run, and the attention LED for the slot (slot_id) is set to OFF.