IBM SY33-0193-00 User Manual

IBM PC Servers
SY33-0193-00
IBM SerialRAID Adapter for PC Servers
Hardware Maintenance Manual Supplement
October 1998
Note
Before using this information and the product it supports, be sure to read the general information under “Notices” in the product documentation.
First Edition (October 1998)
The following paragraph does not apply to any country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.
This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements or changes in the products or the programs described in this publication at any time.
It is possible that this publication may contain reference to, or information about, IBM products (machines and programs), programming, or services that are not announced in your country. Such references or information must not be construed to mean that IBM intends to announce such IBM products, programming, or services in your country.
Requests for technical information about IBM products should be made to your IBM Authorized Dealer or your Marketing Representative.
Copyright International Business Machines Corporation 1998. All rights reserved.
Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Contents
About This Supplement ............................... v
How This Book Is Organized ............................. v
Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Introducing the IBM SerialRAID Adapter ..................... 1
General Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Service Request Numbers (SRNs) ......................... 3
Displaying SRNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
The SRN Table .................................... 3
Using the SRN Table ................................. 3
Software and Microcode Errors .......................... 12
SSA Loop Configurations that Are Not Valid .................... 12
Dealing with Fast-Write Problems ......................... 12
SSA Link Errors .................................. 15
SSA Link Error Problem Determination ...................... 15
Link Status Lights .................................. 16
Locating a Broken Loop .............................. 17
Removing and Replacing FRUs ......................... 19
Exchanging Disk Drives .............................. 19
Replacement Disk Drives ............................ 19
Exchanging a Non-Array Disk Drive ....................... 19
Exchanging an Array Disk ............................ 20
Exchanging DRAMs on the IBM SerialRAID Adapter ............... 22
Removing a DRAM ............................... 22
Installing a DRAM ................................ 23
Exchanging the Fast-Write Cache Card ...................... 23
Removing the Cache Card ........................... 23
Installing the Cache Card ............................ 26
Service Aids and Other Utilities ......................... 29
Disk Service Aids .................................. 29
Accessing Service Aids from the DOS Configurator .............. 29
Accessing Service Aids from the RSM Configurator .............. 30
Service Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
To Set Service Mode with the DOS Configurator ................ 31
To Set Service Mode With the RSM Configurator ............... 33
Format Disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
To Format Using the DOS Configurator ..................... 34
To Format With the RSM Configurator ..................... 35
Certify Disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
To Certify using the DOS Configurator ..................... 35
To Certify With the RSM Configurator ..................... 36
The Identify Function ................................ 37
To Identify with the DOS Configurator ..................... 37
To Identify With the RSM Configurator ..................... 37
Download Microcode Function ........................... 37
Finding the Physical Location of a Device ..................... 38
Finding the Device When Service Aids Are Available ............. 38
Finding the Device When No Service Aids Are Available ............ 38
The Event/Error Logger ............................... 39
Analyze SSA Event Log ............................. 39
View SSA Event Log .............................. 39
Stop SSA Event Logging ............................ 39
Modify Event Logger Time Out ......................... 39
Error Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Maintenance Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Converting a New Resource to a Free Resource ................. 41
Using the DOS Configurator: .......................... 41
Using the RSM Configurator: .......................... 42
Deleting a Resource from the System Resource List ............... 43
Using the DOS Configurator: .......................... 43
Removing a Disk Drive from an Array ....................... 43
Listing or Deleting Records of Old Arrays in the NVRAM ............. 43
Adding a Disk Drive to an Array .......................... 44
Modifying Attributes of Resources ......................... 44
Creating an Array .................................. 45
Attaching a Resource to the System ........................ 47
Maintenance Analysis Procedures (MAPs) ................... 49
Introduction to Using the MAPs .......................... 49
A Note on Configurator Utilities .......................... 49
MAP 2010: START . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2010-1
MAP 2320: SSA Link . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2320-1
MAP 2323: SSA Intermittent Link Error .................... 2323-1
MAP 2324: SSA RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2324-1
MAP 2410: SSA Repair Verification ...................... 2410-1
About This Supplement
This book is intended for service representatives who maintain PC servers that use the IBM SerialRAID Adapter.
How This Book Is Organized
“Introducing the IBM SerialRAID Adapter” on page 1 introduces the IBM SerialRAID Adapter.
“Service Request Numbers (SRNs)” on page 3 provides a table of service request numbers (SRNs) that are related to the IBM SerialRAID Adapter.
“Removing and Replacing FRUs” on page 19 describes how to exchange disk drives, DRAMs and the Fast-Write cache.
“Service Aids and Other Utilities” on page 29 describes the SSA service aids.
“Maintenance Tasks” on page 41 describes a number of tasks involving the configurator utilities that are called during the maintenance analysis procedures (MAPs).
“Maintenance Analysis Procedures (MAPs)” on page 49 provides maintenance analysis procedures for the IBM SerialRAID Adapter.
Important
This manual is intended for trained service personnel who are familiar with IBM PC Server products.
Before servicing an IBM product, be sure to review “Safety Information” in the product documentation.
Related Publications
Other manuals that you might find useful are:
IBM SerialRAID Adapter: Installation and User’s Guide, S33-3283-00
IBM SerialRAID Adapter: Technical Reference, SA33-3275-01
For more information, contact IBM or your IBM Authorized Dealer.
Introducing the IBM SerialRAID Adapter
The IBM SerialRAID Adapter is a Peripheral Component Interconnect (PCI) adapter that serves as the interface between systems based on PCI architecture and devices that use Serial Storage Architecture (SSA). The adapter has four ports, which can be connected in pairs to drive two SSA loops. Each loop can contain a maximum of 48 disk drives. Two adapters, each one located in a different PC Server, can drive the same SSA loop. This arrangement is referred to as a Cluster Configuration. (See the IBM SerialRAID Adapter Installation and User's Guide for more details.)
The four SSA ports on the adapter can operate at 20 MB/s full-duplex over point-to-point copper cables up to 25 meters long. SSA uses an industry-standard interface based on SCSI-2 commands, queuing model, status and sense bytes.
[Figure: the adapter card, showing the internal connectors, the Fast-Write Cache card, and the four SSA connectors: 1 = SSA Loop B Port 2, 2 = SSA Loop B Port 1, 3 = SSA Loop A Port 2, 4 = SSA Loop A Port 1.]
Lights
The four SSA connectors on the adapter card are arranged in two pairs; connectors A1 and A2 are one pair, B1 and B2 are the other. Next to each pair of connectors is a light that functions as follows:
On continuously: Power is turned on to the adapter and both ports for that loop are operational; that is, the devices in the loop have power turned on, are connected correctly to the adapter, and are operational.
Flashing continuously: One of the ports is not operational. This condition occurs if the cable to the port is not connected correctly, or if the device in the loop connected next to the adapter is not operational.
Off: Both ports are non-operational.
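The three light states can be summarized in a small lookup. The following Python sketch is illustrative only and is not part of the adapter software; the names LoopLight and operational_ports are hypothetical, and the mapping simply restates the list above.

```python
from enum import Enum


class LoopLight(Enum):
    """State of the light next to one pair of SSA connectors (hypothetical names)."""
    ON = "on continuously"
    FLASHING = "flashing continuously"
    OFF = "off"


def operational_ports(light: LoopLight) -> int:
    """Return how many of the loop's two ports are operational.

    ON       -> both ports operational
    FLASHING -> one port is not operational (check the cable to that port
                and the device connected next to the adapter)
    OFF      -> both ports are non-operational
    """
    return {LoopLight.ON: 2, LoopLight.FLASHING: 1, LoopLight.OFF: 0}[light]
```

A flashing light therefore narrows the problem to one port, its cable, or the first device on that side of the loop.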
General Information
For general information regarding the IBM SerialRAID Adapter, RAID technology and SSA loops, see the IBM SerialRAID Adapter: Installation and User's Guide.
The IBM SerialRAID Adapter also contains array management software that provides RAID-5 functions to control the arrays of the RAID system. An array can have from 3 to 16 member disk drives and is handled as one large disk drive by the operating system. The array management software translates requests to the single large disk into requests to the individual member disk drives. Configuration software is available that allows the user to define which disk drives in the loop, if any, are to be included in an array.
Up to three adapters can be present in one system unit. For performance reasons, it is recommended that they all be placed on the same bus.
A module on the IBM SerialRAID Adapter contains a lithium battery.
CAUTION: A lithium battery can cause fire, explosion, or a severe burn. Do not recharge, disassemble, heat above 100°C (212°F), solder directly to the cell, incinerate, or expose cell contents to water. Keep away from children. Replace only with the part number specified with your system. Use of another battery might present a risk of fire or explosion.
The battery connector is polarized; do not try to reverse the polarity.
Dispose of the battery according to local regulations.
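The configuration limits stated in this manual (two SSA loops per adapter, a maximum of 48 disk drives per loop, 3 to 16 member disk drives per array, and up to three adapters per system unit) can be expressed as simple checks. The Python sketch below is only an illustration of those limits; the function name and its parameters are hypothetical and do not correspond to any configurator interface.

```python
# Limits quoted from this manual.
MAX_LOOPS_PER_ADAPTER = 2      # four ports, connected in pairs
MAX_DRIVES_PER_LOOP = 48
MIN_ARRAY_MEMBERS = 3
MAX_ARRAY_MEMBERS = 16
MAX_ADAPTERS_PER_SYSTEM = 3


def check_configuration(drives_per_loop, array_sizes, adapters_in_system):
    """Return a list of limit violations (empty if none were found).

    drives_per_loop    -- number of disk drives on each loop of one adapter
    array_sizes        -- number of member disk drives in each RAID-5 array
    adapters_in_system -- number of adapters installed in the system unit
    """
    problems = []
    if len(drives_per_loop) > MAX_LOOPS_PER_ADAPTER:
        problems.append("an adapter drives at most 2 SSA loops")
    for index, count in enumerate(drives_per_loop):
        if count > MAX_DRIVES_PER_LOOP:
            problems.append(f"loop {index}: {count} disk drives exceeds 48")
    for index, members in enumerate(array_sizes):
        if not MIN_ARRAY_MEMBERS <= members <= MAX_ARRAY_MEMBERS:
            problems.append(f"array {index}: {members} members is outside 3 to 16")
    if adapters_in_system > MAX_ADAPTERS_PER_SYSTEM:
        problems.append("more than 3 adapters in one system unit")
    return problems
```

For example, check_configuration([48, 12], [3, 16], 3) reports no problems, while a loop with 49 disk drives or an array with 2 members is flagged.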
Service Request Numbers (SRNs)
Service request numbers (SRNs) are generated by the error logging facility and by the diagnostics. SRNs help you to identify the cause of a problem, the failing field-replaceable units (FRUs), and the service actions that might be needed to solve the problem.
Displaying SRNs
To see the SRNs, run the Remote Systems Management (RSM) Configurator (see the Installation and User Guide for details of how to load and start the RSM configurator).
Use the configurator to display the SRNs as follows:
1. On the opening page, select Event Logger.
2. On the second page, select Analyse.
The error log is analysed and all errors with a severity level that calls for service intervention are displayed.
The SRN Table
The table in this section lists the SRNs and describes the actions you should take. The table columns are:
SRN        The service request number.
FRU list   The FRU or FRUs that might be causing the problem, and how likely it is (by percentage) that the FRU is causing the problem.
Problem    A description of the problem and the action you must take.
Abbreviations used in the table are:
DMA    Direct memory access
FRU    Field-replaceable unit
PAA    P = Adapter port number; AA = SSA address (see also “Finding the Device When No Service Aids Are Available” on page 38)
PCI    Peripheral Component Interconnect
POST   Power-On Self-Test
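Several SRNs in the table carry the PAA field in their last three characters (20PAA, 21PAA to 29PAA, 43PAA, 44PAA, and 45PAA). The following Python sketch shows one way to split such an SRN into its parts. It assumes that an SRN is always five characters and that P and AA are written as hexadecimal digits; neither assumption is stated in this supplement, and the names PaaSrn and split_paa_srn are hypothetical.

```python
import re
from typing import NamedTuple


class PaaSrn(NamedTuple):
    prefix: str   # first two characters, for example "45"
    port: str     # P  = adapter port number
    address: str  # AA = SSA address


# SRN prefixes that the table lists in the form xxPAA.
PAA_PREFIXES = {"20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
                "43", "44", "45"}

# Assumption: five characters, each one a hexadecimal digit.
_SRN_PATTERN = re.compile(r"([0-9A-F]{2})([0-9A-F])([0-9A-F]{2})")


def split_paa_srn(srn: str) -> PaaSrn:
    """Split an xxPAA-style SRN; raise ValueError for any other SRN."""
    match = _SRN_PATTERN.fullmatch(srn.strip().upper())
    if match is None or match.group(1) not in PAA_PREFIXES:
        raise ValueError(f"{srn!r} is not an SRN of the form xxPAA")
    return PaaSrn(*match.groups())
```

The fields are returned as text rather than numbers, because how the port number and SSA address map to a physical location is described in “Finding the Device When No Service Aids Are Available” on page 38.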
Using the SRN Table
Important: You should have been sent here from either diagnostics or a START MAP. Do not start problem determination from the SRN table; always go to the START MAP for the unit in which the device is installed.
1. Find the SRN in the table. If you cannot find the SRN, refer to the documentation for the subsystem or device. If you still cannot find the SRN, you have a problem with the diagnostics, the microcode, or the documentation. Call your support center for assistance.
2. Read carefully the “Action” you must do for the problem. Do not exchange adapters unless you are instructed to do so.
3. Normally exchange only one adapter at a time. Always use instructions provided with the system unit when exchanging adapters. After each adapter is exchanged, go to “MAP 2410: SSA Repair Verification” on page 2410-1 to verify the repair.
SRN: 20PAA
FRU list: Device (45%) (“Exchanging Disk Drives” on page 19). SSA adapter card (45%). External SSA cables (6%). Internal SSA connections (4%) (Hardware Maintenance Manual).
Description: An open SSA link has been detected.
Action: Run the Disk service aid to isolate the failure (see “Service Aids and Other Utilities” on page 29). If the SSA service aids are not available, go to the service information for the unit in which the device is installed.

SRN: 21PAA to 29PAA
FRU list: Device (45%) (“Exchanging Disk Drives” on page 19). SSA adapter card (45%). External SSA cables (6%). Internal SSA connections (4%) (Hardware Maintenance Manual).
Description: An SSA ‘Threshold exceeded’ link error has been detected.
Action: Go to “MAP 2010: START” on page 2010-1.

SRN: 2A002
FRU list: Device (50%) (“Exchanging Disk Drives” on page 19). SSA adapter card (50%).
Description: Async code 02 has been received. Probably, a software error has occurred.
Action: Go to “Software and Microcode Errors” on page 12 before exchanging any FRUs.

SRN: 2A003
FRU list: Device (50%) (“Exchanging Disk Drives” on page 19). SSA adapter card (50%).
Description: Async code 03 has been received. Probably, a software error has occurred.
Action: Go to “Software and Microcode Errors” on page 12 before exchanging any FRUs.

SRN: 2A004
FRU list: Device (50%) (“Exchanging Disk Drives” on page 19). SSA adapter card (50%).
Description: Async code 04 has been received. Probably, a software error has occurred.
Action: Go to “Software and Microcode Errors” on page 12 before exchanging any FRUs.

SRN: 2FFFF
FRU list: None
Description: An async code that is not valid has been received.
Action: Go to “Software and Microcode Errors” on page 12.

SRN: 303FF
FRU list: Device (100%) (“Exchanging Disk Drives” on page 19).
Description: A SCSI status that is not valid has been received.
Action: Go to “Software and Microcode Errors” on page 12.

SRN: 40000
FRU list: SSA adapter card (100%)
Description: The SSA adapter card has failed.
Action: Exchange the FRU for a new FRU.

SRN: 40004
FRU list: 4 MB DRAM module 0 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 4 MB DRAM in adapter card module 0 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 40008
FRU list: 8 MB DRAM module 0 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: An 8 MB DRAM in adapter card module 0 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 40016
FRU list: 16 MB DRAM module 0 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 16 MB DRAM in adapter card module 0 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 40032
FRU list: 32 MB DRAM module 0 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 32 MB DRAM in adapter card module 0 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 40064
FRU list: 64 MB DRAM module 0 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 64 MB DRAM in adapter card module 0 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 40128
FRU list: 128 MB DRAM module 0 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 128 MB DRAM in adapter card module 0 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 41004
FRU list: 4 MB DRAM module 1 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 4 MB DRAM in adapter card module 1 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 41008
FRU list: 8 MB DRAM module 1 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: An 8 MB DRAM in adapter card module 1 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 41016
FRU list: 16 MB DRAM module 1 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 16 MB DRAM in adapter card module 1 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 41032
FRU list: 32 MB DRAM module 1 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 32 MB DRAM in adapter card module 1 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 41064
FRU list: 64 MB DRAM module 1 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 64 MB DRAM in adapter card module 1 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 41128
FRU list: 128 MB DRAM module 1 (100%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: A 128 MB DRAM in adapter card module 1 has failed.
Action: Exchange the FRU for a new FRU.

SRN: 42000
FRU list: SSA adapter card (50%). DRAM modules (50%) (“Exchanging DRAMs on the IBM SerialRAID Adapter” on page 22).
Description: The SSA adapter has detected that both DRAM modules are failing.
Action:
1. Check whether both DRAM modules are correctly installed on the adapter card. Make any necessary corrections.
2. If this problem has occurred immediately after an upgrade to the adapter card, check whether the correct type of DRAM modules have been installed. Make any necessary corrections.
3. If the problem remains, exchange the adapter card FRU for a new one. Do not exchange any DRAM modules yet.
4. Install the DRAM modules from the original adapter card onto the new adapter card, then install the new adapter card.
5. If the problem remains, exchange the DRAM modules for new modules.
6. Install the new DRAM modules onto the original adapter card. Reinstall the original adapter card.

SRN: 42200
FRU list: None
Description: Other adapters on the SSA loop are using levels of microcode that are not compatible.
Action: Install the latest level of microcode on all other adapters in this SSA loop. First refer to “Software and Microcode Errors” on page 12 and, if necessary, “Download Microcode Function” on page 37.

SRN: 42500
FRU list: Fast-Write Cache Card (98%) (“Exchanging the Fast-Write Cache Card” on page 23). SSA adapter card (2%) (Installation and User Guide).
Description: The Fast-Write Cache Card has failed.
Action:
1. Exchange the Cache Card for a new one.
2. Switch on power to the using system.
3. New error codes are produced if the original cache card contained data that was not moved to a disk drive. Run diagnostics on the adapter and, if an SRN is produced, do the actions for that SRN.

SRN: 42510
FRU list: None
Description: Not enough DRAM available to run the fast-write cache operation.
Action:
1. Start the using-system service aids.
2. Select Display or Change Configuration or Vital Product Data (VPD).
3. Select Display Vital Product Data.
4. Find the VPD for the SSA adapter that is logging the error.
5. Note the DRAM and cache sizes (Device Specifics Z0 and Z1).
6. For fast-write operations, you must have a 32 MB DRAM. Check that you have the correct size of DRAM.

SRN: 42515
FRU list: Fast-Write Cache Card (90%) (“Exchanging the Fast-Write Cache Card” on page 23). SSA adapter card (10%) (using-system Installation and User's Guide).
Description: A fast-write disk is installed, but no Fast-Write Cache Card has been detected. This problem can be caused because:
– The cache card is not installed correctly.
– The Fast-Write feature is not installed on this machine, but a disk drive that is configured for fast-write operations has been added to the subsystem.
Action:
1. If you have not already done so, run diagnostics on the adapter. If a different SRN is generated, solve that problem first.
2. Do the following actions as appropriate:
   – If the cache card is not installed correctly, remove it from the adapter and reinstall it correctly.
   – If the cache card is installed correctly, it might have failed. Exchange, for new FRUs, the FRUs that are shown in the FRU list for this SRN.
   – If the Fast-Write feature is not installed, and you want to delete the fast-write function for one or more disk drives that have been added to this subsystem:
3. Verify with the customer that the fast-write function can be deleted for the disk drives configured for fast-write.
4. Using the RSM configurator, select the resource in question and select Delete FW. See “Dealing with Fast-Write Problems” on page 12 for more details.

SRN: 42520
FRU list: Fast-Write Cache Card (100%)
Description: A Fast-Write Cache Card has failed. Data has been written to the cache card and cannot be recovered. The location of the lost data is not known. The disk drive is offline.
Action:
1. Ask the customer to determine:
   – Which disk drives are affected by this error
   – How much data has been lost
   – Which data recovery procedures can be done
2. Ask the customer to disable the Fast-Write feature for:
   – Each device for which Fast-Write is offline
   – All other devices that are connected to the failing adapter and have Fast-Write enabled
   For details of how to disable Fast-Write see “Dealing with Fast-Write Problems” on page 12.
3. Exchange the Fast-Write Cache Card for a new one.
4. Ask the customer to re-enable Fast-Write for the devices that are attached to the new Fast-Write Cache Card.

SRN: 42521
FRU list: Fast-Write Cache Card (100%) (“Exchanging the Fast-Write Cache Card” on page 23)
Description: A Fast-Write Cache Card has failed. Data has been written to the card and cannot be recovered. The disk drives that have lost data cannot be identified. All unsynchronized fast-write disk drives that are attached to this adapter are offline.
Action:
1. Ask the customer to determine:
   – Which disk drives are affected by this error
   – How much data has been lost
   – Which data recovery procedures can be done
2. Ask the customer to disable Fast-Write for:
   – Each device for which Fast-Write is offline
   – All other devices that are connected to the failing adapter and have Fast-Write enabled
   For details of how to disable Fast-Write see “Dealing with Fast-Write Problems” on page 12.
3. Exchange the Fast-Write Cache Card for a new one.
4. Ask the customer to re-enable Fast-Write for the devices that are attached to the new Fast-Write Cache Card.

SRN: 42522
FRU list: Fast-Write Cache Card (100%) (“Exchanging the Fast-Write Cache Card” on page 23)
Description: A Fast-Write Cache Card has failed. Data has been written to the card and cannot be recovered. One or more 4 KB blocks of data for a known disk have been lost and cannot be read.
Action:
1. Ask the customer to determine:
   – Which disk drives are affected by this error
   – How much data has been lost
   – Which data recovery procedures can be done
2. Ask the customer to disable Fast-Write for:
   – Each device for which Fast-Write is offline
   – All other devices that are connected to the failing adapter and have Fast-Write enabled
   For details of how to disable Fast-Write see “Dealing with Fast-Write Problems” on page 12.
3. Exchange the Fast-Write Cache Card for a new one.
4. Ask the customer to re-enable Fast-Write for the devices that are attached to the new Fast-Write Cache Card.

SRN: 42523
FRU list: None
Description: The Fast-Write Cache Card has a bad version number.
Action: Install the correct adapter microcode for this card.

SRN: 42524
FRU list: Fast-Write Cache Option Card (100%)
Description: A fast-write disk drive (or drives) that does not contain synchronized data has been detected. The Fast-Write Cache Card, however, cannot be detected. The disk drive (or drives) is offline.
Action:
– If the Fast-Write Cache Card has been removed, reinstall it.
– If the Fast-Write Cache Card has failed:
  1. Ask the customer to disable Fast-Write for:
     – Each device for which Fast-Write is offline
     – All other devices that are connected to the failing adapter, and have Fast-Write enabled
     For details of how to disable Fast-Write see “Dealing with Fast-Write Problems” on page 12.
  2. Exchange the Fast-Write Cache Card for a new one.
  3. Ask the customer to re-enable Fast-Write for the devices that are attached to the new Fast-Write Cache Card.

SRN: 42525
FRU list: None
Description: The wrong Fast-Write Cache Card has been detected by a fast-write disk drive that contains unsynchronized data.
Action: The failing disk drive is offline. If the disk drive has just been moved from another adapter, do either of the following actions:
– Return the disk drive to its original adapter.
– Move the original Fast-Write Cache Card to this adapter so that the data can be synchronized.
If you cannot do either action, or the data on the disk drive has no value:
1. Ask the customer to disable Fast-Write for:
   – Each device for which the Fast-Write option is offline
   – All other devices that are connected to the failing adapter, and have Fast-Write enabled
   For details of how to disable Fast-Write see “Dealing with Fast-Write Problems” on page 12.
2. Ask the customer to re-enable Fast-Write for the devices that are attached to the new Fast-Write Cache Card.

SRN: 42526
FRU list: SSA adapter card (100%) (Installation and User Guide).
Description: This adapter card does not provide support for the Fast-Write Cache.
Action: Install the correct SSA adapter (if applicable).

SRN: 42527
FRU list: None
Description: A dormant fast-write cache entry exists.
Action: The fast-write cache contains unsynchronized data for a disk drive that is no longer available. If possible, reconnect the disk drive to the adapter to enable the data to be synchronized. If you cannot reconnect the disk drive (for example, because the disk drive has failed), the user should delete the dormant fast-write cache entry. Although the resource is no longer available, the RSM configurator will show the resource. Go to the Resource View page of the RSM and select Detach or Delete as appropriate.

SRN: 42528
FRU list: None
Description: A fast-write disk drive has been detected that was previously unsynchronized, but has since been configured on a different adapter.
Action: If this disk drive contains data that should be kept, return the disk drive to the adapter to which it was previously connected.
If the disk drive does not contain data that should be kept, ask the user to delete all offline items:
1. Open the RSM Configurator and go to the Resource View.
2. Select Detach or Delete as appropriate.
When the items have been deleted, the disk drive becomes free.

SRN: 43PAA
FRU list: Device (90%) (“Exchanging Disk Drives” on page 19). SSA adapter card (10%).
Description: An SSA device on the link is preventing the completion of the loop configuration.
Action: Go to “MAP 2010: START” on page 2010-1.

SRN: 44PAA
FRU list: Device (100%) (“Exchanging Disk Drives” on page 19).
Description: An SSA device has a ‘Failed’ status.
Action: If the SSA service aids are available, run the Disk service aid (see “Service Aids and Other Utilities” on page 29) to find the failing device. If no device is listed as Rejected, use the PAA part of the SRN to determine which device is failing. Before you exchange the failing device, run nonconcurrent diagnostics to that device to determine the cause of the problem. If the SSA service aids are not available, note the value of PAA in this SRN, and go to “Finding the Physical Location of a Device” on page 38. Exchange the failing FRU for a new FRU.

SRN: 45PAA
FRU list: Device (40%) (“Exchanging Disk Drives” on page 19). SSA adapter card (40%). SSA cables, or other SSA connections in the device enclosure (20%) (Hardware Maintenance Manual).
Description: The SSA adapter has detected an open SSA loop.
Action: If the SSA service aids are available, run the Disk service aid (see “Service Aids and Other Utilities” on page 29) to determine which part of the loop is failing. If the SSA service aids are not available, note the value of PAA in this SRN, and go to “Finding the Physical Location of a Device” on page 38.

SRN: 46000
FRU list: None
Description: An array is in the Offline state because more than one disk drive is not available. At least one member disk drive of the array is present, but more than one member disk drive is missing.
Action: Go to “MAP 2010: START” on page 2010-1.

SRN: 47000
FRU list: None
Description: An attempt has been made to store in the SSA adapter the details of more than 32 arrays.
Action: Go to “MAP 2010: START” on page 2010-1.

SRN: 47500
FRU list: None
Description: Part of the array data might have been lost.
Action: Go to “MAP 2010: START” on page 2010-1.

SRN: 48000
FRU list: None
Description: The SSA adapter has detected a link configuration that is not valid.
Action: See “SSA Loop Configurations that Are Not Valid” on page 12.

SRN: 49000
FRU list: None
Description: An array is in the Degraded state because a disk drive is not available to the array, and a write command has been sent to that array.
Action: A disk drive might not be available for one of the following reasons:
– The disk drive has failed.
– The disk drive has been removed from the subsystem.
– An SSA link has failed.
– A power failure has occurred.
Go to “MAP 2010: START” on page 2010-1.

SRN: 49100
FRU list: None
Description: An array is in the Exposed state because a disk drive is not available to the array.
Action: A disk drive can become not available for several reasons:
– The disk drive has failed.
– The disk drive has been removed from the subsystem.
– An SSA link has failed.
– A power failure has occurred.
Go to “MAP 2010: START” on page 2010-1.

SRN: 49500
FRU list: None
Description: No hot-spare disk drives are available for an array that is configured for hot-spare disk drives.
Action: Go to “MAP 2010: START” on page 2010-1.
SRN     FRU List                  Problem
49700   None                      Description: The parity for the array is not complete.
                                  Action: Go to “MAP 2010: START” on page 2010-1.
50000   SSA adapter card (100%)   Description: The SSA adapter failed to respond to the device driver.
                                  Action: Exchange the FRU for a new FRU.
50001   SSA adapter card (100%)   Description: A data parity error has occurred.
                                  Action: Exchange the FRU for a new FRU.
50002   SSA adapter card (100%)   Description: An SSA adapter DMA error has occurred.
                                  Action: Exchange the FRU for a new FRU.
50004   SSA adapter card (100%)   Description: Channel check.
                                  Action: Exchange the FRU for a new FRU.
50005   SSA adapter card (100%)   Description: A software error has occurred.
                                  Action: Go to “Software and Microcode Errors” on page 12 before exchanging the FRU.
50006   SSA adapter card (100%)   Description: A channel check has occurred.
                                  Action: Exchange the FRU for a new FRU.
50008   SSA adapter card (100%)   Description: Unable to read or write the PCI registers.
                                  Action: Exchange the FRU for a new FRU.
50010   SSA adapter card (100%)   Description: An SSA adapter or device driver protocol error has occurred.
                                  Action: Go to “Software and Microcode Errors” on page 12 before exchanging the FRU.
50012   SSA adapter card (100%)   Description: The SSA adapter microcode has hung.
                                  Action: Run nonconcurrent diagnostics to the SSA adapter. If the diagnostics fail, exchange the FRU for a new FRU. If the diagnostics do not fail, go to “Software and Microcode Errors” on page 12 before exchanging the FRU.
D4000   SSA adapter card (100%)   Description: The diagnostics cannot configure the SSA adapter.
                                  Action: Exchange the FRU for a new FRU.
D4100   SSA adapter card (100%)   Description: The diagnostics cannot open the SSA adapter.
                                  Action: Exchange the FRU for a new FRU.
D4300   SSA adapter card (100%)   Description: The diagnostics have detected an SSA adapter POST failure.
                                  Action: Exchange the FRU for a new FRU.
DFFFF   SSA adapter card (100%)   Note: The description and action for this SRN are valid only if you have run diagnostics to the SSA attachment. If this SRN has occurred because you have run diagnostics on some other device, see the service information for that device.
                                  Description: A command or parameter that has been sent or received is not valid. This problem is caused either by the SSA adapter, or by an error in the microcode.
                                  Action: Go to “Software and Microcode Errors” on page 12 before exchanging the FRU.
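For quick reference, the SRN rows listed above can be held in a small lookup structure. The following Python sketch is illustrative only: the names SRN_TABLE and lookup_srn are hypothetical and are not part of any IBM tool, the action texts are abbreviated, and the sketch assumes that the descriptions and actions pair up in the order in which they are listed. Only the SRNs from this part of the table are included.

```python
# Illustrative lookup built from the SRN rows listed above.
# SRN_TABLE and lookup_srn are hypothetical names, not part of any IBM tool.

ADAPTER = "SSA adapter card (100%)"
EXCHANGE = "Exchange the FRU for a new FRU."
SOFTWARE = 'Go to "Software and Microcode Errors" before exchanging the FRU.'
MAP_2010 = 'Go to "MAP 2010: START".'
DIAGNOSE = ("Run nonconcurrent diagnostics to the SSA adapter. Exchange the "
            "FRU if they fail; otherwise go to "
            '"Software and Microcode Errors".')

# SRN -> (FRU list, abbreviated action)
SRN_TABLE = {
    "49700": (None, MAP_2010),
    "50000": (ADAPTER, EXCHANGE),
    "50001": (ADAPTER, EXCHANGE),
    "50002": (ADAPTER, EXCHANGE),
    "50004": (ADAPTER, EXCHANGE),
    "50005": (ADAPTER, SOFTWARE),
    "50006": (ADAPTER, EXCHANGE),
    "50008": (ADAPTER, EXCHANGE),
    "50010": (ADAPTER, SOFTWARE),
    "50012": (ADAPTER, DIAGNOSE),
    "D4000": (ADAPTER, EXCHANGE),
    "D4100": (ADAPTER, EXCHANGE),
    "D4300": (ADAPTER, EXCHANGE),
    "DFFFF": (ADAPTER, SOFTWARE),
}


def lookup_srn(srn):
    """Return (FRU list, action) for a listed SRN, or None if it is not listed."""
    return SRN_TABLE.get(srn.strip().upper())


if __name__ == "__main__":
    for code in ("50005", "d4300", "12345"):
        print(code, "->", lookup_srn(code))
```

An SRN that is not in this part of the table returns None, so the caller still has to consult the full SRN table.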
Software and Microcode Errors
Some SRNs indicate that a problem might have been caused by a software error or by a microcode error. If you have one of these SRNs, do the following actions:
1. Make a note of the contents of the error log for the device that has the problem.
2. Go to the system service aids and select Display Vital Product Data to display the VPD of the failing system. Make a note of the VPD for all the SSA adapters and disk drives.
3. Report the problem to your support center. The center can tell you whether you have a known problem, and can, if necessary, provide you with a correction for the software or microcode.
SSA Loop Configurations that Are Not Valid
Note: This section is related to SRN 48000.
SRN 48000 shows that the SSA loop contains more devices or adapters than are allowed. The maximum numbers allowed depend on the adapter. Refer to the IBM SerialRAID Adapter: Installation and User's Guide for details.
If the SRN occurred when you or the customer turned on the system:
1. Turn off the system.
2. Review the configuration that you are trying to make, and determine why that configuration is not valid.
3. Correct your configuration by reconfiguring the SSA cables or by removing the excess devices or adapters from the loop.
4. Turn on the system.
If the SRN occurred because additional devices or adapters were added to a working SSA loop:
1. Remove the additional devices or adapters that are causing the problem, and put the loop back into its original, working configuration.
Note: It is important that you do these actions, because they enable the configuration code to reset itself from the effects of the error.
2. Review the configuration that you are trying to make, and determine why that configuration is not valid.
3. Correct your configuration by reconfiguring the SSA cables or by removing the excess devices or adapters from the loop.
Dealing with Fast-Write Problems
Fast-Write problems are indicated by Service Request Numbers (SRNs) in the series 425xx.
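If SRNs are being sorted by script, the 425xx series can be recognized with a simple prefix test. This is a minimal sketch that assumes "xx" stands for any two further characters of a five-character SRN; the function name is_fast_write_srn is hypothetical.

```python
def is_fast_write_srn(srn):
    """Return True if the SRN is in the 425xx series.

    Assumption: "xx" means any two further characters, so the check is
    limited to the length and the leading "425".
    """
    srn = srn.strip().upper()
    return len(srn) == 5 and srn.startswith("425")


if __name__ == "__main__":
    for code in ("42500", "425AB", "49700"):
        print(code, is_fast_write_srn(code))
```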
When you are advised to remove the Fast-Write function from a resource, use the RSM configurator as follows:
1. Start the RSM configurator and select all the resources that have Fast-Write enabled.
2. Perform the following actions on each resource in turn:
a. Go to the Resource View page for the resource you want to change.
b. Select Delete FW.
Note: All Fast-Write resources are identified by a lightning-flash symbol shown against them.
SSA Link Errors
SSA link errors can be caused by a number of reasons, for example if:
Power is removed from an SSA device
An SSA device is failing
An SSA device is removed
A cable is disconnected.
Errors might be indicated in various ways, such as:
SRN 45PAA
A flashing link status (Ready) light on the SSA device at each end of the failing link
The indication of an open link when using the Disk Service Aid.
SSA Link Error Problem Determination
Instead of using the normal MAPs to solve a link error problem, you can refer directly to the link status lights to isolate the failing FRU. The descriptions given here show you how to do this.
In an SSA loop, devices are connected through two or more SSA links to an SSA RAID Adapter. Each SSA link is the connection between two SSA nodes (devices or adapters); for example:
Disk drive to disk drive
Adapter to disk drive module
Adapter to adapter
An SSA link can contain several parts. When doing problem determination, think of the link and all its parts as one complete item.
Here are some examples of SSA links. Each link contains more than one part.
Example 1
This link is between two disk drives that are in the same subsystem. It has three parts.
[Figure: one SSA Subsystem containing Disk Drive 1, Internal Connection, Disk Drive 2]
Example 2
This link is between two disk drives that are in the same subsystem. It has five parts.
[Figure: one SSA Subsystem containing Disk Drive 1, Internal Connection, Dummy Disk Drive, Internal Connection, Disk Drive 2]
Example 3
This link is between two disk drives that are not in the same subsystem. It has seven parts.
[Figure: two SSA Subsystems joined by a cable: Disk Drive, Internal Connection, SSA Connector Card, Cable, SSA Connector Card, Internal Connection, Disk Drive]
Example 4
This link is between a disk drive and an SSA RAID Adapter. It has five parts.
[Figure: one SSA Subsystem cabled to an adapter: Disk Drive, Internal Connection, SSA Connector Card, Cable, Adapter]
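The four examples can also be written down as ordered lists of parts, which makes it easier to treat each link as one complete item. The Python sketch below is only an illustration: the part names come from Examples 1 to 4, while EXAMPLE_LINKS and describe_link are hypothetical names introduced here.

```python
# Each SSA link is modelled as an ordered list of its parts, from one end
# node to the other. Part names are taken from Examples 1 to 4; the data
# structure and function names are illustrative assumptions.

EXAMPLE_LINKS = {
    "Example 1": ["Disk Drive 1", "Internal Connection", "Disk Drive 2"],
    "Example 2": ["Disk Drive 1", "Internal Connection", "Dummy Disk Drive",
                  "Internal Connection", "Disk Drive 2"],
    "Example 3": ["Disk Drive", "Internal Connection", "SSA Connector Card",
                  "Cable", "SSA Connector Card", "Internal Connection",
                  "Disk Drive"],
    "Example 4": ["Disk Drive", "Internal Connection", "SSA Connector Card",
                  "Cable", "Adapter"],
}


def describe_link(parts):
    """Return (number of parts, first end node, last end node) for a link."""
    if len(parts) < 2:
        raise ValueError("a link needs a node at each end")
    return len(parts), parts[0], parts[-1]


if __name__ == "__main__":
    for name, parts in EXAMPLE_LINKS.items():
        count, first, last = describe_link(parts)
        print(f"{name}: {count} parts, from {first} to {last}")
```

The two end nodes of each list are the devices whose link status lights are checked during problem determination.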
Link Status Lights
If a fault occurs that prevents the operation of a particular link, the link status lights of the various parts of the complete link show that the error has occurred.
You can find the failing link by looking for the flashing green status light at each end of the affected link. Some configurations might have other indicators along the link (for example, SSA connector cards) to help with FRU isolation.
The meanings of the disk drive and adapter lights are summarized here.
Status of Light                                 Meaning
Off                                             Both SSA links are inactive.
Permanently on                                  Both SSA links are active.
Slow flash (two seconds on, two seconds off)    Only one SSA link is active.
If you need more information about the lights, see:
For adapter lights, “Introducing the IBM SerialRAID Adapter” on page 1 in this book.
For other lights, the service information for the device that contains the lights.
Locating a Broken Loop
Using the RSM configurator, go to the Physical View of the selected adapter and look for the symbol Break. This indicates a broken SSA loop.
Using the DOS configurator you can access the disk service aids to show the SSA loop that is broken.
CONFIG    SSA Configurator and Service Aids    yymmdd DOS Version

Main Menu
  Disk Service Aids
  Link       SSA UID      Status
  Port A1
             UIDxxxxx
             UIDyxxxx
             UID3xxxx
             UID4xxxx
             --------
             UID5xxxx
  Port A2
  Port B1
             No disks
  Port B2

<ESCAPE> Exit   <ENTER> Select   <F1> Help   <F2> Format
<F3> Certify   <F4> ServiceMode   <F5> Diagnostics   <F9> FlashOn
<F10> FlashOff
This example screen shows a break (the dotted line) in the SSA loop between the second and third disk drives. In the condition shown by the display, the Ready lights on the second and third disk drives are both flashing.
To help locate these disk drives, select the disk drive, and press F9 (FlashOn). The Check light on the selected disk drive flashes. This action does not affect the customer’s operations.
For more information about the service aids, see “Service Aids and Other Utilities” on page 29.
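The way a break is read from the Disk Service Aids listing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of the configurator itself: it assumes the entries between two ports are available as an ordered list, that the break appears as a row of dashes, and the UID values used below are invented placeholders. The names BREAK_MARKER and drives_at_break are hypothetical.

```python
BREAK_MARKER = "--------"  # assumed text of the dotted line that marks a break


def drives_at_break(listing):
    """Return the entries on each side of the first break marker.

    `listing` is the ordered list of entries shown between two ports.
    The result is a (before, after) tuple; either side is None when the
    break is directly next to a port. Returns None if there is no break.
    """
    if BREAK_MARKER not in listing:
        return None
    i = listing.index(BREAK_MARKER)
    before = listing[i - 1] if i > 0 else None
    after = listing[i + 1] if i + 1 < len(listing) else None
    return before, after


if __name__ == "__main__":
    # Placeholder UIDs, not taken from a real system.
    listing = ["UID-A", "UID-B", BREAK_MARKER, "UID-C", "UID-D"]
    print(drives_at_break(listing))  # ('UID-B', 'UID-C')
```

The two entries returned are the disk drives whose Ready lights would be flashing, and they are the ones to select before pressing F9 (FlashOn).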
Removing and Replacing FRUs
Exchanging Disk Drives
When a maintenance procedure requires you to replace a faulty disk drive with a new one, first check whether the disk drive to be removed is a member of a RAID array.
If the disk drive to be changed IS NOT a member of an array, go to “Exchanging a Non-Array Disk Drive.”
If the disk drive to be changed IS a member of an array, go to “Exchanging an Array Disk” on page 20.
Replacement Disk Drives
There are two points to note about a disk drive you are installing to replace a faulty unit.
If the replacement disk drive is a new unit from the factory, or one previously used in an AIX machine, it will be placed on the list of New Resources. It must be converted to a Free Resource before it can be used by the PC.
If the replacement disk was previously formatted as a member of a RAID array in a different system, it will be identified as a Pre-Configured disk. It must be converted to a Free Resource before it can be used in the new system.
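The two points above amount to a simple rule: a replacement disk that appears as a New Resource or as a Pre-Configured disk must be converted before use, and a Free Resource needs no conversion. A minimal Python sketch of that rule follows; the function name and the idea of passing the state as a string are illustrative assumptions.

```python
# Resource states named in the text. Representing them as strings and the
# function name below are assumptions made for this illustration.
FREE = "Free Resource"
MUST_CONVERT = {"New Resource", "Pre-Configured"}


def must_convert_to_free(state):
    """Return True if a disk in this state must be converted before use."""
    if state == FREE:
        return False
    if state in MUST_CONVERT:
        return True
    raise ValueError(f"unknown resource state: {state!r}")


if __name__ == "__main__":
    for state in ("New Resource", "Pre-Configured", "Free Resource"):
        print(state, "->", must_convert_to_free(state))
```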
Exchanging a Non-Array Disk Drive
The procedure depends on whether you are using the DOS configurator or the RSM configurator.
Using the DOS Configurator
1. From the Main Menu, select SSA Adapter List.
2. Select the required adapter from the list displayed.
3. From the Adapter Menu, select Disk Service Aids.
4. Select the disk drive that you want to change. If necessary use the Identify function to find the disk drive. Press F9 (FlashOn); the Check light flashes on the selected disk drive; press F10 (FlashOff) to remove the function.
5. Put the disk drive into Service Mode. Place the cursor on the disk drive entry and press F4.
6. Remove the old drive and replace it with a new one (see the unit Hardware Maintenance Manual).
7. Press Esc to exit from the Service Aids window. This action automatically resets Service Mode on the new disk drive.
8. Repeat the procedure given above for any other disk drive that you are changing.
9. If necessary, convert the newly-installed disk drive into a free resource (see the configurator information in the IBM SerialRAID Adapter: Installation and User's Guide).
Using the RSM Configurator
1. Start the RSM configurator and select the appropriate adapter from the Adapter List.
2. On the Adapter View page, select the Physical View.
3. On the Physical View page, select the disk drive you want to change.
4. On the Disk View page, click on the Service Mode button at the bottom of the page to put the disk into service mode.
If necessary, use the Identify function to find the actual disk drive. Click FlashOn; the Check light flashes on the selected disk drive. Click FlashOff to remove the function.
5. You can now remove the faulty disk drive and insert the replacement.
6. To reset Service Mode on the new disk, move back to the Physical View page and click on the Reset Service Mode button at the base of the page.
If necessary, convert the newly-installed disk drive into a free resource (see the configurator information in the IBM SerialRAID Adapter: Installation and User's Guide).
Exchanging an Array Disk
This section describes how a disk is logically removed from an array and replaced by a compatible Free Resource. Such action could be necessary, for example, to check a disk drive that is giving a high level of read/write errors but has not been rejected from the array.
This action is also necessary if an array disk develops a hard fault and there is no hot-spare available. If this happens, at the next write operation the faulty disk is automatically de-configured and moved to the Rejected list. In the array, the faulty disk is replaced by a Blank Reserved. To restore the array to its full operational status you need to replace the Blank Reserved in the array with a suitable Free Resource.
Note: If there was a Hot-Spare available when the array disk became faulty, the Hot-Spare is automatically integrated into the array and the faulty disk is moved to the list of Rejected disks. The procedure in this event is to convert the Rejected disk to a Free Resource, then change it as described in “Exchanging a Non-Array Disk Drive” on page 19. The new disk can then be reassigned as a Hot-Spare to replace the one that was used.
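The two outcomes described above (hot-spare available, or no hot-spare available) can be summarized in a short Python sketch. It models only the bookkeeping stated in the text and is not the adapter's actual logic; the function name reject_faulty_member, the list-based representation, and the disk names in the usage example are hypothetical.

```python
BLANK_RESERVED = "Blank Reserved"


def reject_faulty_member(members, faulty, hot_spares):
    """Model what happens to an array when one member develops a hard fault.

    Returns (new_members, rejected, remaining_hot_spares):
    - with a hot-spare available, the hot-spare takes the faulty disk's place;
    - with none available, a Blank Reserved takes its place.
    In both cases the faulty disk ends up on the Rejected list.
    """
    if faulty not in members:
        raise ValueError(f"{faulty!r} is not a member of the array")
    spares = list(hot_spares)  # copy, so the caller's list is not modified
    replacement = spares.pop(0) if spares else BLANK_RESERVED
    new_members = [replacement if m == faulty else m for m in members]
    return new_members, [faulty], spares


if __name__ == "__main__":
    array = ["disk1", "disk2", "disk3"]  # placeholder disk names
    print(reject_faulty_member(array, "disk2", ["spare1"]))
    print(reject_faulty_member(array, "disk2", []))
```

When the result contains a Blank Reserved, the array still needs a suitable Free Resource exchanged into that position to return to full operational status, as the procedures that follow describe.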
Exchanging an Array Disk Using the DOS Configurator
1. Start the DOS Configurator.
2. From the Main Menu, select SSA Adapter List, then select the required adapter from the list.
3. From the Adapter Menu, select RAID 5 Resources.
4. Select the array from which you want to remove a disk drive.
5. Select View Members.
6. Select the disk drive that you want to remove.
7. Press F7 (Exchange Members). This displays a list of Free Resources (disk drives) that are compatible as replacements for the array disk drive. The list also includes the item Blank Reserved. If there are no Free Resources available and you intend to physically remove the array disk, you must exchange the array disk with the Blank Reserved.
8. Select an appropriate Free Resource (or the Blank Reserved). The selected item replaces the disk drive in the array. The array disk drive is logically removed and returned to the list of Free Resources.
9. If you need to perform maintenance on the disk drive removed from the array:
a. Go to the list of Free Resources.
b. Select the disk drive that you logically removed from the array.
c. Set Service Mode.
You can now physically remove the disk drive for maintenance.
Exchanging an Array Disk Using the RSM Configurator
1. Start the RSM configurator and select the appropriate adapter from the adapter list.
2. On the Adapter View page, select the Logical View.
3. On the Logical View page, select the RAID type. This opens the Resource list, listing the defined arrays.
4. Select the required array.
5. On the Array View page scroll down to the list of array members.
6. Select the disk drive that you want to remove. This will take you to the Disk View page for this disk.
7. On the Disk View page, click on the Comp.Exchange button. A list is displayed showing the Free Resource candidates that are suitable as replacements for the array disk drive.
8. Select an appropriate Free Resource.