IBM 88631SU, System x3850, 8863, 7362 Service Manual

IBM System x3850 Type 8863, 7362

Problem Determination and Service Guide
Note: Before using this information and the product it supports, read the general information in Appendix B, “Notices,” on page 159.
14th Edition (February 2007)
US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Guidelines for trained service technicians . . . . . . . . . . . . . . . viii
Inspecting for unsafe conditions . . . . . . . . . . . . . . . . . viii
Guidelines for servicing electrical equipment . . . . . . . . . . . . . viii
Safety statements . . . . . . . . . . . . . . . . . . . . . . . .x
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . .1
Related documentation . . . . . . . . . . . . . . . . . . . . . .1
Notices and statements used in this document . . . . . . . . . . . . . .2
Features and specifications . . . . . . . . . . . . . . . . . . . . .3
Server controls, LEDs, and connectors . . . . . . . . . . . . . . . .4
Front view . . . . . . . . . . . . . . . . . . . . . . . . . .4
Rear view . . . . . . . . . . . . . . . . . . . . . . . . . .6
System-board layouts . . . . . . . . . . . . . . . . . . . . . . .8
I/O board internal connectors and jumpers . . . . . . . . . . . . . .8
Memory-card connectors . . . . . . . . . . . . . . . . . . . . .9
Memory-card LEDs . . . . . . . . . . . . . . . . . . . . . . .9
Microprocessor-board connectors and LEDs . . . . . . . . . . . . .10
PCI-X board connectors . . . . . . . . . . . . . . . . . . . .10
PCI-X board LEDs . . . . . . . . . . . . . . . . . . . . . .11
SAS-backplane connectors . . . . . . . . . . . . . . . . . . .11
Chapter 2. Diagnostics . . . . . . . . . . . . . . . . . . . . .13
Diagnostic tools . . . . . . . . . . . . . . . . . . . . . . . .13
POST . . . . . . . . . . . . . . . . . . . . . . . . . . . .13
POST beep codes . . . . . . . . . . . . . . . . . . . . . .14
Error logs . . . . . . . . . . . . . . . . . . . . . . . . . .18
POST error codes . . . . . . . . . . . . . . . . . . . . . . .20
POST and SMI error messages . . . . . . . . . . . . . . . . . .34
Checkout procedure . . . . . . . . . . . . . . . . . . . . . . .46
About the checkout procedure . . . . . . . . . . . . . . . . . .46
Performing the checkout procedure . . . . . . . . . . . . . . . .46
(Trained service technicians only) Checkpoint codes . . . . . . . . . . .47
Problem isolation tables . . . . . . . . . . . . . . . . . . . . .47
CD or DVD drive problems . . . . . . . . . . . . . . . . . . .48
General problems . . . . . . . . . . . . . . . . . . . . . . .49
Hard disk drive problems . . . . . . . . . . . . . . . . . . . .49
Intermittent problems . . . . . . . . . . . . . . . . . . . . . .49
Keyboard, mouse, or pointing-device problems . . . . . . . . . . . .50
USB keyboard, mouse, or pointing-device problems . . . . . . . . . .51
Memory problems . . . . . . . . . . . . . . . . . . . . . . .52
Microprocessor problems . . . . . . . . . . . . . . . . . . . .53
Monitor problems . . . . . . . . . . . . . . . . . . . . . . .53
Optional-device problems . . . . . . . . . . . . . . . . . . . .56
Power problems . . . . . . . . . . . . . . . . . . . . . . .57
Serial port problems . . . . . . . . . . . . . . . . . . . . . .58
ServerGuide problems . . . . . . . . . . . . . . . . . . . . .59
Software problems . . . . . . . . . . . . . . . . . . . . . .59
Universal Serial Bus (USB) port problems . . . . . . . . . . . . . .60
Video problems . . . . . . . . . . . . . . . . . . . . . . . .60
Light path diagnostics . . . . . . . . . . . . . . . . . . . . . .60
Light path diagnostic LEDs . . . . . . . . . . . . . . . . . . .63
Remind button . . . . . . . . . . . . . . . . . . . . . . . .67
© Copyright IBM Corp. 2007 iii
Power-supply LEDs . . . . . . . . . . . . . . . . . . . . . . .67
Diagnostic programs, messages, and error codes . . . . . . . . . . . .69
Real-time diagnostics . . . . . . . . . . . . . . . . . . . . .70
Running the on-board diagnostic programs . . . . . . . . . . . . .70
Diagnostic text messages . . . . . . . . . . . . . . . . . . . .71
Viewing the test log . . . . . . . . . . . . . . . . . . . . . .71
Diagnostic error codes . . . . . . . . . . . . . . . . . . . . .71
Recovering from a BIOS update failure . . . . . . . . . . . . . . . .88
System-error log messages . . . . . . . . . . . . . . . . . . . .89
Solving SCSI problems . . . . . . . . . . . . . . . . . . . . . 102
Solving power problems . . . . . . . . . . . . . . . . . . . . . 102
Solving Ethernet controller problems . . . . . . . . . . . . . . . . 103
Solving undetermined problems . . . . . . . . . . . . . . . . . . 104
Calling IBM for service . . . . . . . . . . . . . . . . . . . . . 105
Chapter 3. Parts listing, Type 8863, 7362 . . . . . . . . . . . . . . 107
Server replaceable units . . . . . . . . . . . . . . . . . . . . . 108
Alcohol wipes . . . . . . . . . . . . . . . . . . . . . . . . .110
Power cords . . . . . . . . . . . . . . . . . . . . . . . . .110
Chapter 4. Removing and replacing server components . . . . . . . .113
Installation guidelines . . . . . . . . . . . . . . . . . . . . . .113
System reliability guidelines . . . . . . . . . . . . . . . . . . .114
Working inside the server with the power on . . . . . . . . . . . . .114
Handling static-sensitive devices . . . . . . . . . . . . . . . . .114
Returning a device or component . . . . . . . . . . . . . . . . .115
Removing the cover and bezel . . . . . . . . . . . . . . . . . . .115
Tier 1 CRU information . . . . . . . . . . . . . . . . . . . . .116
Battery . . . . . . . . . . . . . . . . . . . . . . . . . .116
DVD Drive . . . . . . . . . . . . . . . . . . . . . . . . .118
Hot-swap fan . . . . . . . . . . . . . . . . . . . . . . . .118
Hot-swap power supply . . . . . . . . . . . . . . . . . . . . 120
Memory module . . . . . . . . . . . . . . . . . . . . . . . 122
Tier 2 CRU information . . . . . . . . . . . . . . . . . . . . . 132
Operator information panel assembly . . . . . . . . . . . . . . . 132
I/O board . . . . . . . . . . . . . . . . . . . . . . . . . 133
PCI-X adapter guide . . . . . . . . . . . . . . . . . . . . . 134
Power-supply structure . . . . . . . . . . . . . . . . . . . . 135
SAS backplane . . . . . . . . . . . . . . . . . . . . . . . 136
FRU information . . . . . . . . . . . . . . . . . . . . . . . . 137
Front-panel assembly . . . . . . . . . . . . . . . . . . . . . 137
Microprocessor tray and microprocessor . . . . . . . . . . . . . . 138
PCI-X board assembly . . . . . . . . . . . . . . . . . . . . 142
PCI-X switch card assembly . . . . . . . . . . . . . . . . . . 143
Power backplane . . . . . . . . . . . . . . . . . . . . . . 144
Chapter 5. Configuration information and instructions . . . . . . . . 147
Updating the firmware . . . . . . . . . . . . . . . . . . . . . . 147
Configuring the server . . . . . . . . . . . . . . . . . . . . . . 147
Using the ServerGuide Setup and Installation CD . . . . . . . . . . . 148
Using the Configuration/Setup Utility program . . . . . . . . . . . . 148
Installing and using the baseboard management controller utility programs 153
Using the SAS/SATA Configuration Utility program . . . . . . . . . . 154
Configuring the Ethernet controller . . . . . . . . . . . . . . . . 154
Using the PXE boot agent utility program . . . . . . . . . . . . . . 154
Using the ServeRAID configuration programs . . . . . . . . . . . . 155
Appendix A. Getting help and technical assistance . . . . . . . . . . 157
Before you call . . . . . . . . . . . . . . . . . . . . . . . . 157
Using the documentation . . . . . . . . . . . . . . . . . . . . . 157
Getting help and information from the World Wide Web . . . . . . . . . 158
Software service and support . . . . . . . . . . . . . . . . . . . 158
Hardware service and support . . . . . . . . . . . . . . . . . . . 158
Appendix B. Notices . . . . . . . . . . . . . . . . . . . . . . 159
Edition notice . . . . . . . . . . . . . . . . . . . . . . . . . 159
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . 160
Important notes . . . . . . . . . . . . . . . . . . . . . . . . 160
Product recycling and disposal . . . . . . . . . . . . . . . . . . 161
Battery return program . . . . . . . . . . . . . . . . . . . . . 162
Electronic emission notices . . . . . . . . . . . . . . . . . . . . 163
Federal Communications Commission (FCC) statement . . . . . . . . 163
Industry Canada Class A emission compliance statement . . . . . . . . 163
Australia and New Zealand Class A statement . . . . . . . . . . . . 163
United Kingdom telecommunications safety requirement . . . . . . . . 163
European Union EMC Directive conformance statement . . . . . . . . 163
Taiwanese Class A warning statement . . . . . . . . . . . . . . . 164
Chinese Class A warning statement . . . . . . . . . . . . . . . . 164
Japanese Voluntary Control Council for Interference (VCCI) statement 164
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Safety
Before installing this product, read the Safety Information.
Antes de instalar este produto, leia as Informações de Segurança.
Před instalací tohoto produktu si přečtěte příručku bezpečnostních instrukcí.
Læs sikkerhedsforskrifterne, før du installerer dette produkt.
Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.
Avant d’installer ce produit, lisez les consignes de sécurité.
Vor der Installation dieses Produkts die Sicherheitshinweise lesen.
Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.
Antes de instalar este produto, leia as Informações sobre Segurança.
Antes de instalar este producto, lea la información de seguridad.
Läs säkerhetsinformationen innan du installerar den här produkten.
Guidelines for trained service technicians
This section contains information for trained service technicians.
Inspecting for unsafe conditions
Use the information in this section to help you identify potential unsafe conditions in an IBM product that you are working on. Each IBM product, as it was designed and manufactured, has required safety items to protect users and service technicians from injury. The information in this section addresses only those items. Use good judgment to identify potential unsafe conditions that might be caused by non-IBM alterations or attachment of non-IBM features or options that are not addressed in this section. If you identify an unsafe condition, you must determine how serious the hazard is and whether you must correct the problem before you work on the product.
Consider the following conditions and the safety hazards that they present:
• Electrical hazards, especially primary power. Primary voltage on the frame can cause serious or fatal electrical shock.
• Explosive hazards, such as a damaged CRT face or a bulging capacitor.
• Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cord is disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and observe any sharp edges.
3. Check the power cord:
   • Make sure that the third-wire ground connector is in good condition. Use a meter to measure third-wire ground continuity for 0.1 ohm or less between the external ground pin and the frame ground.
   • Make sure that the power cord is the correct type, as specified in “Power cords” on page 110.
   • Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety of any non-IBM alterations.
6. Check inside the server for any obvious unsafe conditions, such as metal filings, contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not been removed or tampered with.
Guidelines for servicing electrical equipment
Observe the following guidelines when servicing electrical equipment:
• Check the area for electrical hazards such as moist floors, nongrounded power extension cords, power surges, and missing safety grounds.
• Use only approved tools and test equipment. Some hand tools have handles that are covered with a soft material that does not provide insulation from live electrical currents.
• Regularly inspect and maintain your electrical hand tools for safe operational condition. Do not use worn or broken tools or testers.
• Do not touch the reflective surface of a dental mirror to a live electrical circuit. The surface is conductive and can cause personal injury or equipment damage if it touches a live electrical circuit.
• Some rubber floor mats contain small conductive fibers to decrease electrostatic discharge. Do not use this type of mat to protect yourself from electrical shock.
• Do not work alone under hazardous conditions or near equipment that has hazardous voltages.
• Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical outlet so that you can turn off the power quickly in the event of an electrical accident.
• Disconnect all power before you perform a mechanical inspection, work near power supplies, or remove or install main units.
• Before you work on the equipment, disconnect the power cord. If you cannot disconnect the power cord, have the customer power-off the wall box that supplies power to the equipment and lock the wall box in the off position.
• Never assume that power has been disconnected from a circuit. Check it to make sure that it has been disconnected.
• If you have to work on equipment that has exposed electrical circuits, observe the following precautions:
  - Make sure that another person who is familiar with the power-off controls is near you and is available to turn off the power if necessary.
  - When you are working with powered-on electrical equipment, use only one hand. Keep the other hand in your pocket or behind your back to avoid creating a complete circuit that could cause an electrical shock.
  - When using a tester, set the controls correctly and use the approved probe leads and accessories for that tester.
  - Stand on a suitable rubber mat to insulate you from grounds such as metal floor strips and equipment frames.
• Use extreme care when measuring high voltages.
• To ensure proper grounding of components such as power supplies, pumps, blowers, fans, and motor generators, do not service these components outside of their normal operating locations.
• If an electrical accident occurs, use caution, turn off the power, and send another person to get medical aid.
Safety statements
Important:
Each caution and danger statement in this documentation begins with a number. This number is used to cross reference an English-language caution or danger statement with translated versions of the caution or danger statement in the Safety Information document.
For example, if a caution statement begins with a number 1, translations for that caution statement appear in the Safety Information document under statement 1.
Be sure to read all caution and danger statements in this documentation before performing the instructions. Read any additional safety information that comes with your server or optional device before you install the device.
Statement 1:
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
• Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this product during an electrical storm.
• Connect all power cords to a properly wired and grounded electrical outlet.
• Connect to properly wired outlets any equipment that will be attached to this product.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural damage.
• Disconnect the attached power cords, telecommunications systems, networks, and modems before you open the device covers, unless instructed otherwise in the installation and configuration procedures.
• Connect and disconnect cables as described in the following table when installing, moving, or opening covers on this product or attached devices.

To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlet.
5. Turn device ON.

To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2:
CAUTION: When replacing the lithium battery, use only IBM Part Number 33F8354 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3:
CAUTION: When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters) are installed, note the following:
• Do not remove the covers. Removing the covers of the laser product could result in exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of controls or adjustments or performance of procedures other than those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam.
Statement 4:
18 kg (39.7 lb) 32 kg (70.5 lb) 55 kg (121.2 lb)
CAUTION: Use safe practices when lifting.
Statement 5:
CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
Statement 8:
CAUTION: Never remove the cover on a power supply or any part that has the following label attached.
Hazardous voltage, current, and energy levels are present inside any component that has this label attached. There are no serviceable parts inside these components. If you suspect a problem with one of these parts, contact a service technician.
Statement 10:
CAUTION: Do not place any object on top of rack-mounted devices.
Chapter 1. Introduction
The IBM® System x3850 server is a 3-U-high (see footnote 1) rack model server for high-volume network transaction processing. This high-performance, symmetric multiprocessing (SMP) server is ideally suited for networking environments that require superior microprocessor performance, input/output (I/O) flexibility, and high manageability.
The server comes with a limited warranty. For more information about the terms of the warranty, see the Warranty and Support Information document on the IBM Documentation CD.
You can obtain up-to-date information about the server and other IBM server products at http://www.ibm.com/servers/eserver/serverproven/compat/us/.
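The rack-unit arithmetic behind the "3-U-high" description can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 1.75-inch unit comes from the rack footnote in this chapter, and the 128.35 mm figure is the server height listed in the features and specifications table.

```python
# One rack unit ("U") is 1.75 inches, as stated in the rack footnote.
RACK_UNIT_IN = 1.75
MM_PER_INCH = 25.4  # exact by definition of the inch


def rack_units_to_mm(units: int) -> float:
    """Return the height of `units` rack units in millimetres."""
    return units * RACK_UNIT_IN * MM_PER_INCH


def fits_in_rack_space(device_height_mm: float, units: int) -> bool:
    """Hypothetical helper: True if the device is no taller than `units` U."""
    return device_height_mm <= rack_units_to_mm(units)


if __name__ == "__main__":
    # Height of a 3U opening, and whether the 128.35 mm server fits in it.
    print(f"3U = {rack_units_to_mm(3):.2f} mm")
    print(fits_in_rack_space(128.35, 3))
```

The check compares only the stated chassis height against the nominal rack opening; it does not account for rails or mounting hardware.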
Related documentation
This Problem Determination and Service Guide contains information to help you solve problems yourself, and it contains information for a service technician. In addition to this Problem Determination and Service Guide, the following documentation comes with the server:
• Installation Guide
  This printed document contains instructions for setting up the server and basic instructions for installing some options, and how to get help.
• User’s Guide
  This document is in Portable Document Format (PDF) on the IBM Documentation CD. It provides general information about the server, including information about features, and how to configure the server. It also contains detailed instructions for installing, removing, and connecting optional devices that the server supports.
• Rack Installation Instructions
  This printed document contains instructions for installing the server in a rack.
• Safety Information
  This document is in PDF on the IBM Documentation CD. It contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
• Warranty and Support Information
  This document is in PDF on the IBM Documentation CD. It contains information about the terms of the warranty and about service and assistance.
Depending on the server model, additional documentation might be included on the IBM Documentation CD.
The server might have features that are not described in the documentation that you received with the server. The documentation might be updated occasionally to include information about those features, or technical updates might be available to provide additional information that is not included in the server documentation. These updates are available from the IBM Web site. Complete the following steps to check for updated documentation and technical updates:
1. Go to http://www.ibm.com/support/.
2. In the Browse by topic section, click Publications.
3. On the Publications page, in the Brand field, select Servers.
4. In the Family field, select System x3850.
5. Click Continue.
Footnote 1: Racks are marked in vertical increments of 1.75 inches each. Each increment is referred to as a unit, or a “U”. A 1-U-high device is 1.75 inches tall.
Notices and statements used in this document
The caution and danger statements that appear in this document are also in the multilingual Safety Information document, which is on the IBM Documentation CD. Each statement is numbered for reference to the corresponding statement in the Safety Information document.
The following notices and statements are used in this document:
• Note: These notices provide important tips, guidance, or advice.
• Important: These notices provide information or advice that might help you avoid inconvenient or problem situations.
• Attention: These notices indicate potential damage to programs, devices, or data. An attention notice is placed just before the instruction or situation in which damage could occur.
• Caution: These statements indicate situations that can be potentially hazardous to you. A caution statement is placed just before the description of a potentially hazardous procedure step or situation.
• Danger: These statements indicate situations that can be potentially lethal or extremely hazardous to you. A danger statement is placed just before the description of a potentially lethal or extremely hazardous procedure step or situation.
Features and specifications
The following information is a summary of the features and specifications of the server. Depending on the server model, some features might not be available, or some specifications might not apply.
Table 1. Features and specifications

Microprocessor:
• Intel® Xeon
• 1 MB Level-2 cache
• 667 MHz front-side bus (FSB)
• Support for up to four microprocessors
Note: Use the Configuration/Setup Utility program to determine the type and speed of the microprocessors.

Memory:
• Minimum: 2 GB depending on server model, expandable to 32 GB
• Type: 333 MHz, registered, ECC, PC2-3200 double data rate (DDR) II, SDRAM
• Sizes: 1 GB or 2 GB in pairs
• Connectors: Two-way interleaved, four dual inline memory module (DIMM) connectors per memory card
• Maximum: Four memory cards, each card containing two pairs of PC2-3200 DDRII DIMMs

Drives:
• Slim DVD-ROM: IDE
• Serial Attached SCSI (SAS) hard disk drives

Expansion bays:
• Six SAS, 2.5-inch bays
• One 12.7-mm removable-media drive bay (DVD-ROM drive installed)

Expansion slots:
Six PCI-X 2.0 hot-plug 266 MHz/64-bit slots

Upgradeable microcode:
System BIOS, diagnostics, service processor, BMC, and SAS microcode

Power supply:
• Standard: One dual-rated power supply
  - 1300 watts at 220 V ac input
  - 650 watts at 110 V ac input
• Upgradeable to two power supplies (hot-swappable at 220 V ac only)

Size:
• 3U
• Height: 128.35 mm (5.05 in.)
• Depth: 715 mm (28.15 in.)
• Width: 440 mm (17.32 in.)
• Weight: approximately 38.5 kg (85 lb) when fully configured or 31.75 kg (70 lb) minimum
Racks are marked in vertical increments of 4.45 cm (1.75 inches). Each increment is referred to as a unit, or “U.” A 1-U-high device is 4.45 cm (1.75 inches) tall.

Integrated functions:
• Baseboard management controller
• IBM EXA-32 Chipset with integrated memory and I/O controller
• Service processor support for Remote Supervisor Adapter II SlimLine
• Light path diagnostics
• Three Universal Serial Bus (USB) ports
  - Two on rear of server
  - One on front of server
• Broadcom 5704C dual 10/100/1000 Gigabit Ethernet controllers
• ATI 7000-M video
  - 16 MB video memory
  - SVGA compatible
• Mouse connector
• Keyboard connector
• Serial connector

Acoustical noise emissions:
• Sound power, idle: 6.6 bel declared
• Sound power, operating: 6.6 bel declared

Environment:
• Air temperature:
  Server on:
  - 10° to 35°C (50° to 95°F); altitude: 0 to 914 m (3000 ft). If the server has a dual-core microprocessor, at maximum power reduce the 35°C by 1°C per 300 m above sea level, or the microprocessor might throttle to remain within the internal thermal specifications.
  - 10° to 32°C (50° to 90°F); altitude: 914 m to 2133 m (7000 ft)
• Humidity:
  - Server on: 8% to 80%
  - Server off: 8% to 80%

Electrical input:
• Sine-wave input (50-60 Hz) required
• Input voltage high range:
  - Minimum: 200 V ac
  - Maximum: 240 V ac
• Approximate input kilovolt-amperes (kVA):
  - Minimum: 0.08 kVA
  - Maximum: 1.6 kVA

Notes:
1. Power consumption and heat output vary depending on the number and type of optional features installed and the power-management optional features in use.
2. These levels were measured in controlled acoustical environments according to the procedures specified by the American National Standards Institute (ANSI) S12.10 and ISO 7779 and are reported in accordance with ISO 9296. Actual sound-pressure levels in a given location might exceed the average values stated because of room reflections and other nearby noise sources. The declared sound-power levels indicate an upper limit, below which a large number of computers will operate.
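The altitude-dependent air-temperature limits in Table 1 can be read as a small function. The Python sketch below is an interpretation, not an IBM-supplied formula: the function name and parameters are hypothetical, the 1°C per 300 m reduction is treated as continuous rather than stepped, and it is applied only in the 0 to 914 m band because the table states it only there.

```python
def max_ambient_c(altitude_m: float, dual_core_at_max_power: bool = False) -> float:
    """Return the upper air-temperature limit (deg C) for a powered-on server.

    Assumptions (not stated in the guide): the derating is linear, and it
    applies only within the 0 to 914 m band where the table mentions it.
    """
    if altitude_m < 0 or altitude_m > 2133:
        raise ValueError("altitude outside the specified 0 to 2133 m range")
    if altitude_m <= 914:
        limit = 35.0
        if dual_core_at_max_power:
            # Reduce the 35 deg C limit by 1 deg C per 300 m above sea level.
            limit -= altitude_m / 300.0
        return limit
    # 914 m to 2133 m band.
    return 32.0


if __name__ == "__main__":
    print(max_ambient_c(0))
    print(max_ambient_c(600, dual_core_at_max_power=True))
    print(max_ambient_c(1500))
```

The lower limit of 10°C is the same in both bands, so only the upper limit is modeled here.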
Server controls, LEDs, and connectors
This section describes the controls, light-emitting diodes (LEDs), and connectors on the front and rear of the server.
Front view
The following illustration shows the controls, LEDs, and connectors on the front of the server.
[Illustration: front of the server, with callouts for the hard disk drive status LED, hard disk drive activity LED, operator information panel, DVD-eject button, electrostatic-discharge connector, and DVD drive activity LED]
Hard disk drive status LED: If a ServeRAID-8i adapter is installed, when this LED is lit it indicates that the associated hard disk drive has failed. If the LED flashes slowly (one flash per second), the drive is being rebuilt. If the LED flashes rapidly (three flashes per second), the controller is identifying the drive.
Hard disk drive activity LED: On some server models, each hot-swap hard disk drive has an activity LED. When this LED is flashing, it indicates that the drive is in use.
Operator information panel: This panel contains controls and LEDs. The following illustration shows the controls and LEDs on the operator information panel.
[Illustration: operator information panel, with callouts for the power-control button, USB connector, power-on LED, hard disk drive activity LED, locator LED, information LED, release latch, and system-error LED]
The following controls, connectors, and LEDs are on the operator information panel:
• USB connector: Connect a USB device to this connector.
• Power-control button: Press this button to turn the server on and off manually. A power-control-button shield comes with the server.
• Information LED: When this LED is lit, it indicates that there is a suboptimal condition in the server and that light path diagnostics will light an additional LED to help isolate the condition. If the LOG LED on the light path diagnostics panel is lit, information is available in the baseboard management controller (BMC) log or in the system-event log about the condition. The condition might be that the BMC log is full or almost full.
  This LED and LEDs on the light path diagnostics panel remain lit until you resolve the condition. If the only condition is that the BMC log is full or almost full, clear the BMC log or the system-event log through the Configuration/Setup Utility program to turn off the lit LEDs. See the User’s Guide on the IBM Documentation CD for information about clearing the logs. Clear the logs after you have resolved all conditions.
  Important: If the server has a baseboard management controller, clear the BMC log and system-event log after you resolve all conditions. This will turn off the information LED and LOG LED, if all conditions are resolved.
• Release latch: Slide this latch to the left to access the light path diagnostics panel.
• System-error LED: When this LED is lit, it indicates that a system error has occurred. An LED on the light path diagnostics panel is also lit to help isolate the error.
• Locator LED: When this LED is lit, it has been lit remotely by the system administrator to aid in visually locating the server.
• Hard disk drive activity LED: When this LED is flashing, it indicates that a SAS hard disk drive is in use.
• Power-on LED: When this LED is lit and not flashing, it indicates that the server is turned on. When this LED is flashing, it indicates that the server is turned off and still connected to an ac power source. When this LED is off, it indicates that ac power is not present, or the power supply or the LED itself has failed.
  Note: If this LED is off, it does not mean that there is no electrical power in the server. The LED might be burned out. To remove all electrical power from the server, you must disconnect the power cords from the electrical outlets.
DVD-eject button: Press this button to release a CD or DVD from the DVD drive.
DVD drive activity LED: When this LED is lit, it indicates that the DVD drive is in use.
Electrostatic-discharge connector: Connect an electrostatic-discharge wrist strap to this connector.
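The front-panel LED meanings described above amount to a lookup from an observed pattern to a condition. The following Python sketch captures the power-on LED and the hard disk drive status LED as tables. The dictionary keys, function names, and fallback message are hypothetical, and the drive-status meanings apply only when a ServeRAID-8i adapter is installed, as the text states.

```python
# Power-on LED states, as described for the operator information panel.
POWER_ON_LED = {
    "lit": "server is turned on",
    "flashing": "server is turned off and still connected to an ac power source",
    "off": "ac power is not present, or the power supply or the LED has failed",
}

# Hard disk drive status LED (meaningful only with a ServeRAID-8i adapter).
HDD_STATUS_LED = {
    "lit": "associated hard disk drive has failed",
    "slow_flash": "drive is being rebuilt",               # one flash per second
    "fast_flash": "controller is identifying the drive",  # three flashes per second
}


def classify_flash_rate(flashes_per_second: int) -> str:
    """Map a counted flash rate to the pattern names used in HDD_STATUS_LED."""
    if flashes_per_second == 1:
        return "slow_flash"
    if flashes_per_second == 3:
        return "fast_flash"
    raise ValueError("flash rate not described in this guide")


def describe(table: dict, pattern: str) -> str:
    """Look up a pattern; unknown patterns get a neutral fallback message."""
    return table.get(pattern, "pattern not described in this guide")


if __name__ == "__main__":
    print(describe(POWER_ON_LED, "flashing"))
    print(describe(HDD_STATUS_LED, classify_flash_rate(3)))
```

An "off" power-on LED is deliberately not reported as "no power": as the note above says, the LED itself might have failed, so the power cords must still be disconnected to remove all electrical power.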
Rear view
The following illustration shows the connectors and LEDs on the rear of the server.
Power-supply connector: Connect the power cord to this connector.
Video connector: Connect a monitor to this connector.
USB 1 connector: Connect a USB device to this connector.
SP Ethernet 10/100 connector: Use this connector to connect the service
processor to a network.
SP Ethernet 10/100 activity LED: This LED is on the SP Ethernet 10/100
connector. When this LED is lit, it indicates that there is activity between the server and the network.
SP Ethernet 10/100 link LED: This LED is on the SP Ethernet 10/100 connector.
When this LED is lit, it indicates that there is an active connection on the Ethernet port.
USB 2 connector: Connect a USB device to this connector.
System serial connector: Connect a 9-pin serial device to this connector.
SP Serial connector: Connect a 9-pin serial device to this connector.
Mouse connector: Connect a mouse or other device to this connector.
Keyboard connector: Connect a keyboard to this connector.
Remote Supervisor Adapter II SlimLine status LED: This LED is on the I/O
board and is visible on the rear of the server. When this LED flashes, it indicates that there is activity on the Remote Supervisor Adapter II SlimLine. When this LED is lit continuously, it indicates that there is a problem with the Remote Supervisor Adapter II SlimLine.
IXA RS485 connector: Use this connector to connect to an iSeries server when an
Integrated xSeries Adapter (IXA) is installed. The cable for this connection comes with the server.
The optional Integrated xSeries Adapter (IXA) can be installed only in slot 2. You must move jumpers J35 and J40 on the IXA. For details about installing the IXA, see the documentation that comes with the adapter.
I/O board error LED: This LED is on the I/O board and is visible on the rear of the
server. When this LED is lit, it indicates that there is a problem with the I/O board.
Gigabit Ethernet 2 activity LED: This LED is on the Gigabit Ethernet 2 connector.
When this LED flashes, it indicates that there is activity between the server and the network.
Gigabit Ethernet 2 connector: Use this connector to connect the server to a
network.
Gigabit Ethernet 2 link LED: This LED is on the Gigabit Ethernet 2 connector.
When this LED is lit, it indicates that there is an active connection on the Ethernet port.
Gigabit Ethernet 1 activity LED: This LED is on the Gigabit Ethernet 1 connector.
When this LED flashes, it indicates that there is activity between the server and the network.
Gigabit Ethernet 1 connector: Use this connector to connect the server to a
network.
Gigabit Ethernet 1 link LED: This LED is on the Gigabit Ethernet 1 connector.
When this LED is lit, it indicates that there is an active connection on the Ethernet port.
System-board layouts
The following illustrations show the connectors, LEDs, and jumpers on the memory card, microprocessor board, PCI-X board, SAS backplane, and I/O board. The illustrations in this document might differ slightly from your hardware.
I/O board internal connectors and jumpers
The following illustration shows the internal connectors and jumpers on the I/O board.
Table 2 describes the function of each three-pin jumper block.
Table 2. I/O board jumper blocks
Jumper name Description
Force power on (J2) The default position is pins 1 and 2. Change the position of this
jumper to pins 2 and 3 to force the server to start up when you connect the server to ac power.
Power-on password (J9) The default position is pins 1 and 2. Change the position of this
jumper to pins 2 and 3 to bypass the power-on password check.
Changing the position of this jumper does not affect the administrator password check if an administrator password is set. If the administrator password is lost, the operator information panel must be replaced.
Boot recovery (J14) The default position is pins 1 and 2 (use the primary page during
startup). Move the jumper to pins 2 and 3 to use the secondary page during startup.
Wake on LAN® bypass (J15) The default position is pins 1 and 2. Move the jumper to pins 2 and
3 to prevent a Wake on LAN packet from waking the system when the system is in the powered-off state.
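The jumper settings in Table 2 can also be expressed as a lookup, which might be convenient when recording a server configuration. The following Python sketch is an illustration under stated assumptions: the names JUMPER_BLOCKS and jumper_effect are hypothetical, and the effect strings paraphrase the table.

```python
# Every jumper block in Table 2 has three pins and the same two positions.
DEFAULT_PINS = (1, 2)
ALTERNATE_PINS = (2, 3)

# Jumper ID -> (jumper name, effect of moving the jumper to pins 2 and 3).
JUMPER_BLOCKS = {
    "J2": ("Force power on",
           "force the server to start up when it is connected to ac power"),
    "J9": ("Power-on password",
           "bypass the power-on password check"),
    "J14": ("Boot recovery",
            "use the secondary page during startup"),
    "J15": ("Wake on LAN bypass",
            "prevent a Wake on LAN packet from waking a powered-off system"),
}


def jumper_effect(jumper: str, pins: tuple) -> str:
    """Describe what a jumper position does, based on Table 2."""
    name, alternate_effect = JUMPER_BLOCKS[jumper]
    if tuple(pins) == DEFAULT_PINS:
        return f"{name} ({jumper}): default position"
    if tuple(pins) == ALTERNATE_PINS:
        return f"{name} ({jumper}): {alternate_effect}"
    raise ValueError(f"pins {pins} are not a valid position for {jumper}")


if __name__ == "__main__":
    print(jumper_effect("J9", (2, 3)))
```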
Memory-card connectors
The following illustration shows the connectors on the memory card.
Memory-card LEDs
The following illustration shows the LEDs on the memory card.
Callouts from the memory-card illustrations: light path diagnostics button, light path diagnostics button power LED, DIMM 1 through DIMM 4, memory card error LED.
Top view of the memory card: Memory Port Power, Error, Memory Hot-Swap Enabled.
DIMM error LEDs: DIMM 1 error LED through DIMM 4 error LED.
Microprocessor-board connectors and LEDs
The following illustration shows the connectors and LEDs on the microprocessor board.
Illustration callouts: light path diagnostics button; fans 1 through 8; memory cards 1 through 4; microprocessor sockets 1 through 4; microprocessor error LEDs 1 through 4; microprocessor card error LED; microprocessor 3 and microprocessor 4 VRM connectors; VRM 3 and VRM 4 error LEDs.
PCI-X board connectors
The following illustration shows the connectors on the PCI-X board.
Illustration callouts: PCI slot 1 (266 MHz, 64-bit); PCI-X slots 2 through 6 (266 MHz, 64-bit); attention LED; power LED; ServeRAID-8i; Active PCI cable; I/O board; SAS internal power cable connector.
PCI-X board LEDs
The following illustration shows the LEDs on the PCI-X board.
PCI power LEDs
Power good LED
PCI attention LEDs
SAS-backplane connectors
The following illustration shows the connectors on the SAS backplane.
Illustration callouts: front of SAS backplane; back of SAS backplane; SAS hard disk drive connectors; SAS signal cable 2; SAS signal cable 1; SAS power.
Chapter 2. Diagnostics
This chapter provides basic troubleshooting information to help you solve some common problems that might occur with the server.
If you cannot locate and correct the problem using the information in this chapter, see Appendix A, “Getting help and technical assistance,” on page 157 for more information.
Diagnostic tools
The following tools are available to help you diagnose and solve hardware-related problems:
v POST beep codes, error messages, and error logs
The power-on self-test (POST) generates beep codes and messages to indicate successful test completion or the detection of a problem. See “POST” for more information.
v Problem isolation tables
Use these tables to help you diagnose various symptoms. See “Problem isolation tables” on page 47.
v Light path diagnostics
Use the light path diagnostics to diagnose system errors quickly. See “Light path diagnostics” on page 60 for more information.
v Diagnostic programs and error messages
The diagnostic programs are stored in memory on the microprocessor tray. These programs are the primary method of testing the major components of the server. See “Diagnostic programs, messages, and error codes” on page 69 for more information.
POST
When you turn on the server, it performs a series of tests to check the operation of server components and some of the options in the server. This series of tests is called the power-on self-test, or POST.
If POST finishes without detecting any problems, a single beep sounds, and the first screen of the operating system opens, or an application program starts.
If POST detects a problem, more than one beep might sound, or an error message appears on the screen. See “Beep code descriptions” on page 14 and “POST error codes” on page 20 for more information.
Notes:
1. If a power-on password is set, you must type the password and press Enter, when prompted, before POST will continue.
2. A single problem might cause several error messages. When this occurs, correct the cause of the first error message. The other error messages usually will not occur the next time you run the test.
© Copyright IBM Corp. 2007 13
Page 30
POST beep codes
A beep code is a combination of short or long beeps or a series of short beeps separated by pauses. For example, a “1-2-3” beep code is one beep, a pause, two beeps, a pause, and three beeps.
When POST is completed, one beep is emitted to indicate that the server is working correctly. If POST detects a problem during startup, other beep codes might occur. See “Beep code descriptions” to help diagnose and solve problems that are detected during startup. If no beep code sounds, see “No-beep symptoms” on page 18.
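The numeric beep-code notation can be read mechanically: each number is a group of beeps, and each hyphen is a pause. The following Python sketch shows one way to expand a code such as “1-2-3” into words. The function names are hypothetical and are used for illustration only.

```python
from typing import List

# Spelled-out group sizes for the counts used in this guide.
BEEP_WORDS = {1: "one beep", 2: "two beeps", 3: "three beeps", 4: "four beeps"}


def parse_beep_code(code: str) -> List[int]:
    """Split a code such as '1-2-3' into the number of beeps in each group."""
    return [int(part) for part in code.split("-")]


def describe_beep_code(code: str) -> str:
    """Expand a numeric beep code into beeps separated by pauses."""
    groups = parse_beep_code(code)
    return ", a pause, ".join(BEEP_WORDS.get(n, f"{n} beeps") for n in groups)


if __name__ == "__main__":
    print(describe_beep_code("1-2-3"))
```

This applies only to the numeric codes; codes such as “Two short beeps” are already written out in the table.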
Beep code descriptions
The following table describes the beep codes and suggested actions to correct the detected problems.
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Beep code Description Action
1-1-3 CMOS write/read test failed.
1. Reseat the following components:
a. Battery
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
1-1-4 BIOS ROM checksum failed.
1. Reseat the microprocessor tray.
2. (Trained service technician only) Replace the microprocessor tray.
1-2-1 Programmable interval timer failed.
1. Reseat the I/O board.
2. Replace the I/O board.
1-2-2 DMA initialization failed.
1. Reseat the I/O board.
2. Replace the I/O board.
1-2-3 DMA page register write/read failed.
1. Reseat the I/O board.
2. Replace the I/O board.
1-2-4 RAM refresh verification failed.
1. Reseat the following components:
a. DIMM
b. Memory card
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
1-3-1 1st 64K RAM test failed.
1. Reseat the following components:
a. DIMM
b. Memory card
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
2-1-1 Secondary DMA register failed.
1. Reseat the I/O board.
2. Replace the I/O board.
2-1-2 Primary DMA register failed.
1. Reseat the I/O board.
2. Replace the I/O board.
2-1-3 Primary interrupt mask register failed.
1. Reseat the I/O board.
2. Replace the I/O board.
2-1-4 Secondary interrupt mask register failed.
1. Reseat the I/O board.
2. Replace the I/O board.
2-2-2 Keyboard controller failed.
1. Reseat the I/O board.
2. Replace the I/O board.
3-1-1 Timer tick interrupt failed.
1. Reseat the I/O board.
2. Replace the I/O board.
3-1-2 Interval timer channel 2 failed.
1. Reseat the I/O board.
2. Replace the I/O board.
3-1-4 Time-of-day clock failed.
1. Reseat the following components:
a. Battery
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
3-3-2 Critical SMBUS error occurred.
1. Disconnect power cord, wait 30 seconds, and retry.
2. Reseat the following components:
a. DIMM
b. Memory card
c. Microprocessor tray
d. I/O board
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMM
b. Memory card
c. (Trained service technician only) Microprocessor tray
d. I/O board
3-3-3 No operational memory in system.
Note: Make sure you re-enable the memory in the Configuration/Setup Utility program. See “Memory problems” on page 52.
1. Make sure that all memory cards contain the correct number of DIMMs; install or reseat DIMMS; then, restart the server.
2. Reseat the following components:
a. DIMM
b. Memory card
c. Microprocessor tray
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMM
b. Memory card
c. (Trained service technician only) Microprocessor tray
Two short beeps Information only, configuration has changed.
1. Run the Configuration/Setup Utility program.
2. Run the diagnostic programs.
Three short beeps Memory error.
Note: Make sure you re-enable the memory in the Configuration/Setup Utility program. See “Memory problems” on page 52.
1. Reseat the following components:
a. DIMM
b. Memory card
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMM
b. Memory card
c. (Trained service technician only) Microprocessor tray
One continuous beep Microprocessor error.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor
b. (Trained service technician only) Optional microprocessor
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor
b. (Trained service technician only) Optional microprocessor
c. (Trained service technician only) Microprocessor tray
Repeating short beeps Keyboard error.
1. Reseat the following components:
a. Keyboard
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
Repeating long beeps Memory error. Reseat the DIMMs.
One long and one short beep Card error.
1. Reseat the following components:
a. Microprocessor tray
b. I/O board
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor tray
b. I/O board
One long and two short beeps Card error.
1. Reseat the following components:
a. Microprocessor tray
b. I/O board
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor tray
b. I/O board
Two long and two short beeps Card error.
1. Reseat the following components:
a. Microprocessor tray
b. I/O board
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor tray
b. I/O board
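The numeric beep codes in the preceding table lend themselves to a simple lookup. The following Python sketch maps each numeric code to the description given in the table. The names BEEP_CODE_DESCRIPTIONS and lookup_beep_code are hypothetical, and the sketch does not replace the Action column, which must still be followed in the order shown.

```python
# Numeric beep codes and descriptions, copied from the beep code table.
BEEP_CODE_DESCRIPTIONS = {
    "1-1-3": "CMOS write/read test failed.",
    "1-1-4": "BIOS ROM checksum failed.",
    "1-2-1": "Programmable interval timer failed.",
    "1-2-2": "DMA initialization failed.",
    "1-2-3": "DMA page register write/read failed.",
    "1-2-4": "RAM refresh verification failed.",
    "1-3-1": "1st 64K RAM test failed.",
    "2-1-1": "Secondary DMA register failed.",
    "2-1-2": "Primary DMA register failed.",
    "2-1-3": "Primary interrupt mask register failed.",
    "2-1-4": "Secondary interrupt mask register failed.",
    "2-2-2": "Keyboard controller failed.",
    "3-1-1": "Timer tick interrupt failed.",
    "3-1-2": "Interval timer channel 2 failed.",
    "3-1-4": "Time-of-day clock failed.",
    "3-3-2": "Critical SMBUS error occurred.",
    "3-3-3": "No operational memory in system.",
}


def lookup_beep_code(code: str) -> str:
    """Return the table description for a numeric beep code."""
    normalized = code.strip()
    if normalized in BEEP_CODE_DESCRIPTIONS:
        return BEEP_CODE_DESCRIPTIONS[normalized]
    return "Code not listed in the numeric beep code table."


if __name__ == "__main__":
    print(lookup_beep_code("3-3-3"))
```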
No-beep symptoms
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
No-beep symptom Description Action
No beeps occur, and the system operates correctly.
1. (Trained service technician only) Reseat the operator information panel.
2. (Trained service technician only) Replace the operator information panel.
No beeps occur after successful completion of POST. The power-on status is Disabled.
1. Run the Configuration/Setup Utility program and select Start Options; then, set Power-On Status to Enable.
2. (Trained service technician only) Reseat the operator information panel.
3. (Trained service technician only) Replace the operator information panel.
No beeps occur, and there is no video.
See “Solving undetermined problems” on page 104.
Error logs
The POST error log contains the three most recent error codes and messages that were generated during POST. The BMC log and the system-error log contain messages that were generated during POST and all system status messages from the service processor.
Notes:
v The BMC log is limited in size and is designed so that when the log is full, new
entries will not overwrite existing entries; therefore, you must periodically clear the BMC log from the Configuration/Setup Utility program (the menu choices are described in the User’s Guide).
v When troubleshooting an error, make sure to clear the BMC log so that you can
find current errors more easily.
v Entries written to the BMC log early in the POST procedure will show an
incorrect date as the default timestamp; however, the date and time will correct themselves as POST continues.
v Each BMC log entry appears on its own page; to display all the data for an entry,
use the up arrow (↑) and down arrow (↓) or the Page Up and Page Down keys. To move from one entry to the next, move the cursor to the Get Next Entry or Get Previous Entry line; then, press Enter.
v The log indicates an Assertion Event when an event has occurred. It indicates a
Deassertion Event when the event is no longer occurring.
v Some of the error codes and messages in the BMC log are abbreviated.
v Viewing the BMC log through the web interface of the optional Remote
Supervisor Adapter II SlimLine allows all messages to be translated.
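The first note above describes a log that does not overwrite existing entries when it is full, which is why it must be cleared periodically. The following Python sketch models that behavior. It is an assumption-laden illustration: the class name, the capacity value, and the choice to drop new entries when the log is full are hypothetical and do not describe the BMC firmware.

```python
class NonOverwritingLog:
    """A fixed-size log that keeps existing entries when it is full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # hypothetical size; the real limit is not stated here
        self.entries = []

    def is_full(self) -> bool:
        return len(self.entries) >= self.capacity

    def record(self, message: str) -> bool:
        """Store a message. Return False if the log is full and the message is not stored."""
        if self.is_full():
            return False
        self.entries.append(message)
        return True

    def clear(self) -> None:
        """Empty the log, as is done from the Configuration/Setup Utility program."""
        self.entries.clear()


if __name__ == "__main__":
    log = NonOverwritingLog(capacity=2)
    for event in ("fan threshold", "voltage threshold", "temperature threshold"):
        print(event, "stored" if log.record(event) else "not stored (log full)")
```

With this behavior, the oldest entries survive and the newest events are lost, so a full log can hide the current problem until it is cleared.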
The following illustration shows an example of a BMC log entry.
BMC System Event Log
----------------------------------------------------------
Get Next Entry
Get Previous Entry
Clear BMC SEL
Entry Number= 00005 / 00011
Record ID= 0005
Record Type= 02
Timestamp= 2005/01/25 16:15:17
Entry Details: Generator ID= 0020
Sensor Type= 04
Assertion Event
Fan Threshold
Lower Non-critical - going high
Sensor Number= 40
Event Direction/Type= 01
Event Data= 52 00 1A
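If a BMC log entry is transcribed as text, its “Name= value” fields can be separated from the descriptive lines. The following Python sketch parses the example entry shown above. It assumes one field per line, which is a transcription convention chosen for this sketch and not a format defined by the BMC; the function name is hypothetical, and the field values are not interpreted.

```python
from datetime import datetime

# The example entry, transcribed with one field per line (assumed layout).
SAMPLE_ENTRY = """\
Entry Number= 00005 / 00011
Record ID= 0005
Record Type= 02
Timestamp= 2005/01/25 16:15:17
Entry Details: Generator ID= 0020
Sensor Type= 04
Assertion Event
Fan Threshold
Lower Non-critical - going high
Sensor Number= 40
Event Direction/Type= 01
Event Data= 52 00 1A
"""


def parse_bmc_entry(text: str):
    """Return (fields, notes): 'Name= value' pairs and the remaining descriptive lines."""
    fields = {}
    notes = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            # Drop a leading group label such as 'Entry Details:'.
            key = key.split(":")[-1].strip()
            fields[key] = value.strip()
        else:
            notes.append(line)
    return fields, notes


if __name__ == "__main__":
    entry_fields, entry_notes = parse_bmc_entry(SAMPLE_ENTRY)
    current, total = (int(part) for part in entry_fields["Entry Number"].split("/"))
    when = datetime.strptime(entry_fields["Timestamp"], "%Y/%m/%d %H:%M:%S")
    print(f"Entry {current} of {total}, logged {when.isoformat()}")
    print("Event:", "; ".join(entry_notes))
```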
You can view the contents of the POST error log, the BMC log, and the system-error log from the Configuration/Setup Utility program. You can also view the contents of the BMC log from the diagnostic programs.
Note: When troubleshooting PCI-X slots, note that the error logs report the PCI-X
buses numerically. The numerical assignments vary depending on the configuration. You can check the assignments by running the Configuration/Setup Utility program (see the User’s Guide for more information).
Viewing error logs from the Configuration/Setup Utility program
For complete information about using the Configuration/Setup Utility program, see the User’s Guide.
To view the error logs, complete the following steps:
1. Turn on the server.
2. When the prompt Press F1 for Configuration/Setup appears, press F1. If you
have set both a power-on password and an administrator password, you must type the administrator password to view the error logs.
3. Use one of the following procedures:
v To view the POST error log, select Error Logs, and then select POST Error
Log.
v To view the BMC log, select Advanced Settings, select Baseboard
Management Controller (BMC) settings, and then select BMC System Event Log.
v To view the system-error log (available only if an optional Remote Supervisor
Adapter II SlimLine is installed), select Event/Error Logs, and then select System Event/Error Log.
Viewing the BMC log from the diagnostic programs
The BMC log contains the same information whether it is viewed from the Configuration/Setup Utility program or from the diagnostic programs.
Notes:
v Some of the error codes and messages in the BMC log are abbreviated.
v Viewing the BMC log through the web interface of the optional Remote
Supervisor Adapter II SlimLine allows all messages to be translated.
For information about using the diagnostic programs, see “Running the on-board diagnostic programs” on page 70.
To view the BMC log, complete the following steps:
1. If the server is running, turn off the server and all attached devices.
2. Turn on all attached devices; then, turn on the server.
3. When the prompt F2 for Diagnostics appears, press F2. If you have set both a
power-on password and an administrator password, you must type the administrator password to run the diagnostic programs.
4. From the top of the screen, select Hardware Info.
5. From the list, select BMC Log.
POST error codes
The following table describes the POST error codes and suggested actions to correct the detected problems.
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Error code Description Action
062 Three consecutive boot failures using the
default configuration.
1. Flash the system firmware to the latest level (see “Updating the firmware” on page 147).
2. Reseat the I/O board.
3. Replace the I/O board.
101, 102 Tick timer internal interrupt, internal timer channel 2.
1. Reseat the I/O board.
2. Replace the I/O board.
114 Adapter read-only memory (ROM) error.
1. Remove all adapters and reinstall them one at a time, restarting the server each time, to identify the failing adapter; then, replace the failing adapter.
2. Reseat the microprocessor tray.
3. Reseat the I/O board.
4. (Trained service technician only) Replace the microprocessor tray.
5. Replace the I/O board.
151 Real-time clock error.
1. Reseat the following components:
a. Battery
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
161 Real-time clock battery error.
1. Reseat the following components:
a. Battery
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
162 Device configuration error.
1. Run the Configuration/Setup Utility program, select Load Default Settings, and save the settings.
2. Reseat the following components:
a. Battery
b. Failing device
c. I/O board
3. Remove the battery for 60 minutes; then, reinstall the battery and restart the server.
4. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
163 Real-time clock error.
1. Run the Configuration/Setup Utility program,
select Load Default Settings, make sure that the date and time are correct, and save the settings.
2. Reseat the following components:
a. Battery
b. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
175 Bad EEPROM CRC#1.
1. Restart the server.
2. Update the BMC firmware (see “Updating the firmware” on page 147).
3. Reseat the microprocessor tray.
4. (Trained service technician only) Replace the microprocessor tray.
178 System VPD not available.
1. Restart the server.
2. Update the BMC firmware (see “Updating the firmware” on page 147).
3. Reseat the microprocessor tray.
4. (Trained service technician only) Replace the microprocessor tray.
184 Power-on password damaged.
1. Run the Configuration/Setup Utility program, select Load Default Settings, and save the settings.
2. Reseat the following components:
a. Battery
b. I/O board
3. Remove the battery for 60 minutes; then, reinstall the battery and restart the server.
4. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
187 VPD serial number not set.
1. Set the serial number by updating the BIOS code level (see “Updating the firmware” on page 147).
2. Reseat the following components:
a. I/O board
b. Optional Remote Supervisor Adapter II SlimLine
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
188 Bad EEPROM CRC #2.
1. Restart the server.
2. Update the BMC firmware (see “Updating the firmware” on page 147).
3. Reseat the microprocessor tray.
4. (Trained service technician only) Replace the microprocessor tray.
189 An attempt was made to access the server
with an incorrect password.
Restart the server and enter the administrator password; then, run the Configuration/Setup Utility program and change the power-on password.
289 A DIMM has been disabled by the user or
by the system.
Note: Make sure you re-enable the memory in the Configuration/Setup Utility program. See “Memory problems” on page 52.
1. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
2. Make sure that the DIMM is installed correctly (see “Memory module” on page 122).
3. Reseat the DIMM.
4. Replace the DIMM.
301 Keyboard or keyboard controller error.
1. If you have installed a USB keyboard, run the Configuration/Setup Utility program and enable keyboardless operation to prevent the POST error message 301 from being displayed during startup.
2. Reseat the following components:
a. Keyboard
b. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
303 Keyboard controller error.
1. Reseat the following components:
a. I/O board
b. Keyboard
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
1600 The baseboard management controller
failed BIST (built-in self-test).
1. Update the BMC firmware (see “Updating the firmware” on page 147).
2. Reseat the following components:
a. Microprocessor tray
b. I/O board
c. PCI or PCI-X adapters
3. (Trained service technician only) Replace the microprocessor tray.
1601 Systems-management adapter
communication error.
1. Make sure that the Remote Supervisor Adapter II SlimLine is installed correctly.
2. Update the Remote Supervisor Adapter II SlimLine firmware (see “Updating the firmware” on page 147).
3. Update the BMC firmware (see “Updating the firmware” on page 147).
4. Reseat the following components:
a. Microprocessor tray
b. I/O board
c. PCI or PCI-X adapter
5. (Trained service technician only) Replace the microprocessor tray.
1602 Systems-management adapter
communication error.
1. Make sure that the Remote Supervisor Adapter II SlimLine is installed correctly.
2. Update the Remote Supervisor Adapter II SlimLine firmware (see “Updating the firmware” on page 147).
3. Update the BMC firmware (see “Updating the firmware” on page 147).
4. Reseat the following components:
a. Microprocessor tray
b. I/O board
c. (Trained service technician only) PCI-X board
5. Replace the Remote Supervisor Adapter II SlimLine.
6. (Trained service technician only) Replace the microprocessor tray.
1762 Fixed disk configuration error.
1. Run the Configuration/Setup Utility program and load the defaults.
2. Reseat the following components:
a. SAS cables
b. SAS hard disk drive
c. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
178x Fixed disk error.
1. Reseat the hard disk drive cables.
2. Replace the hard disk drive cables.
3. Run the hard disk drive diagnostic tests.
4. Reseat the following components:
a. Optional ServeRAID™-8i adapter.
b. Hard disk drive.
c. I/O board.
5. Replace the components listed in step 4 one at a time, in the order shown, restarting the server each time.
1800 Unavailable PCI hardware interrupt.
1. Run the Configuration/Setup Utility program and adjust the adapter settings.
2. Remove each adapter one at a time, restarting the server each time, until the problem is isolated.
1962 A drive does not contain a valid boot sector.
1. Make sure that a bootable operating system is installed.
2. Run the hard disk drive diagnostic tests.
3. Reseat the following components:
a. SAS drive
b. SAS hard disk drive backplane cable
c. I/O board
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
5962 IDE CD or DVD drive configuration error.
1. Run the Configuration/Setup Utility program and load the default settings (see “Configuration/Setup Utility menu choices” on page 149).
2. Reseat the following components:
a. CD or DVD drive cable
b. CD or DVD drive
c. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
8603 Pointing-device error.
1. Reseat the following components:
a. Pointing device
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
0001295 ECC circuit check.
1. Reseat the following components:
a. DIMM
b. Memory card
2. Replace the components in step 1 one at a time, in the order shown, restarting the server each time.
00012000 Processor machine check error.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor
b. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor
b. (Trained service technician only) Microprocessor tray
00019501 Processor 1 is not functioning; check processor LEDs.
1. Reseat the following components:
a. Microprocessor tray
b. (Trained service technician only) Microprocessor 1
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 1
b. (Trained service technician only) Microprocessor tray
00019502 Processor 2 is not functioning; check processor LEDs.
1. Reseat the following components:
a. Microprocessor tray
b. (Trained service technician only) Microprocessor 2
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 2
b. (Trained service technician only) Microprocessor tray
00019503 Processor 3 is not functioning; check VRM and processor LEDs.
1. Reseat the following components:
a. Microprocessor tray
b. VRM 3
c. (Trained service technician only) Microprocessor 3
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. VRM 3
b. (Trained service technician only) Microprocessor 3
c. (Trained service technician only) Microprocessor tray
00019504 Processor 4 is not functioning; check VRM and processor LEDs.
1. Reseat the following components:
a. Microprocessor tray
b. VRM 4
c. (Trained service technician only) Microprocessor 4
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. VRM 4
b. (Trained service technician only) Microprocessor 4
c. (Trained service technician only) Microprocessor tray
00019701 Processor 1 failed BIST.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 1
b. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 1
b. (Trained service technician only) Microprocessor tray
00019702 Processor 2 failed BIST.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 2
b. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 2
b. (Trained service technician only) Microprocessor tray
00019703 Processor 3 failed BIST.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 3
b. VRM 3
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 3
b. VRM 3
c. (Trained service technician only) Microprocessor tray
00019704 Processor 4 failed BIST.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 4
b. VRM 4
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 4
b. VRM 4
c. (Trained service technician only) Microprocessor tray
00180100 A PCI adapter has requested memory resources that are not available.
1. Change the order of the adapters in the PCI-X slots. Make sure that the boot device is positioned early in the scan order (see the User’s Guide for information about the scan order).
2. Make sure that the settings for the PCI or PCI-X adapter and all other adapters in the Configuration/Setup Utility program are correct. If the memory resource settings are not correct, change them.
3. If all memory resources are being used, remove an adapter to make memory available to the PCI or PCI-X adapter. Disabling the BIOS on the adapter should correct the error. See the documentation that comes with the adapter.
00180200 No more I/O space is available for a PCI adapter.
1. If the error code indicates a particular PCI or PCI-X slot or device, remove that device.
2. If the error continues, reseat the following components:
a. Each adapter
b. (Trained service technician only) PCI-X board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
00180300 No more memory (above 1 MB for a PCI adapter).
1. If the error code indicates a particular PCI or PCI-X slot or device, remove that device.
2. Reseat the following components:
a. Each adapter
b. (Trained service technician only) PCI-X board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
00180400 No more memory (below 1 MB for a PCI adapter).
1. Reseat the following components:
a. Each adapter
b. (Trained service technician only) PCI-X board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
00180500 PCI option ROM checksum error.
1. Remove the failing PCI or PCI-X adapter.
2. Reseat the following components:
a. Each adapter
b. (Trained service technician only) PCI-X board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
00180600 PCI built-in self-test failure.
1. If the error code indicates a particular PCI or PCI-X slot or device, remove that device.
Note: Slot 0 indicates the I/O board.
2. Reseat the following components:
a. Each adapter
b. (Trained service technician only, if the specified board is a FRU) The board indicated in the error code. (See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107, to determine CRU or FRU status.)
3. Replace the components listed in step 2 one at a time, in the order shown above, restarting the server each time.
00180700, 00180800 General PCI error.
1. Make sure that no devices have been disabled in the Configuration/Setup Utility program.
2. Reseat the following components:
a. Failing adapter
Note: If an error LED is lit on the PCI-X board or on an adapter, reseat that adapter first; if no LEDs are lit, reseat each adapter one at a time, restarting the server each time, to isolate the failing adapter.
b. (Trained service technician only) PCI-X board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
00181000 PCI error.
1. Remove the adapters from the PCI or PCI-X slots.
2. Reseat the following components:
a. Failing adapter
Note: If an error LED is lit on the PCI-X board or on an adapter, reseat that adapter first; if no LEDs are lit, reseat each adapter one at a time, restarting the server each time, to isolate the failing adapter.
b. (Trained service technician only) PCI-X board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
01295085 ECC checking hardware test error.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor
b. DIMM
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor
b. DIMM
c. (Trained service technician only) Microprocessor tray
01298001 No update data for processor 1.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 1.
4. (Trained service technician only) Replace microprocessor 1.
01298002 No update data for processor 2.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 2.
4. (Trained service technician only) Replace microprocessor 2.
01298004 No update data for processor 3.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 3.
4. (Trained service technician only) Replace microprocessor 3.
01298005 No update data for processor 4.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 4.
4. (Trained service technician only) Replace microprocessor 4.
01298101 Bad update data for processor 1.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 1.
4. (Trained service technician only) Replace microprocessor 1.
01298102 Bad update data for processor 2.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 2.
4. (Trained service technician only) Replace microprocessor 2.
01298103 Bad update data for processor 3.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 3.
4. (Trained service technician only) Replace microprocessor 3.
01298104 Bad update data for processor 4.
1. Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
2. Update the BIOS code again (see “Updating the firmware” on page 147).
3. (Trained service technician only) Reseat microprocessor 4.
4. (Trained service technician only) Replace microprocessor 4.
01298200 Processor speed mismatch.
Make sure that all microprocessors have the same cache size (see “Configuration/Setup Utility menu choices” on page 149).
19990301 Fixed disk sector error.
1. Reseat the following components:
a. Hard disk drive
b. SAS hard disk drive backplane
c. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
19990305 An operating system was not found.
1. Make sure that a bootable operating system is installed.
2. Run the hard disk drive diagnostic tests.
3. Reseat the following components:
a. Hard disk drive
b. SAS hard disk drive backplane and cables
c. DVD drive and cables
d. I/O board
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
19990650 AC power has been restored.
1. Check the power cables.
2. Check for interruption of the power supply (see “Power-supply LEDs” on page 67).
3. Reseat the following components:
a. Power supply
b. (Trained service technician only) Power backplane
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
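Most entries in the POST error code table end with the same pattern: reseat a list of components, then replace the same components one at a time, in the order shown, restarting the server after each replacement. The following Python sketch illustrates that ordering only. The function name service_steps and the step wording are hypothetical and are not part of any IBM tool; the component list is taken from the 1962 entry.

```python
from typing import Iterable, Iterator


def service_steps(components: Iterable[str]) -> Iterator[str]:
    """Yield reseat actions for every component, then replace actions in the same order."""
    parts = list(components)
    for part in parts:
        yield f"Reseat: {part}"
    for part in parts:
        # The table calls for a server restart after each individual replacement.
        yield f"Replace: {part} (then restart the server)"


# Components from the 1962 entry, in the order the table lists them.
steps_1962 = list(service_steps([
    "SAS drive",
    "SAS hard disk drive backplane cable",
    "I/O board",
]))

for number, step in enumerate(steps_1962, start=1):
    print(f"{number}. {step}")
```

Stop working through the generated steps as soon as the problem is solved, as the notes at the top of the table require.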
POST and SMI error messages
The BIOS can log two types of error messages in the BMC log and the system-error log: POST events, which occur during system startup, and SMI events, which are generally run-time errors detected by hardware. The following table describes the possible POST and SMI error messages and the suggested actions to correct the detected problems.
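Many of the PCI messages in the table that follows end with the same run of fields (Chassis#, Slot#, Bus#, Dev.ID, Vend.ID, Status, DevFun#), and their actions depend on whether the slot number is 0. The following Python sketch shows one way to pull those fields out of a logged line and apply the slot rule. The names PCI_FIELDS, parse_pci_event, and suggested_action are hypothetical, and reading Chassis#, Slot#, and Bus# as decimal values is an assumption based on the sample values printed in the table.

```python
import re
from typing import Dict, Optional, Union

# Field layout copied from the PCI messages in the table. Treating Chassis#,
# Slot#, and Bus# as decimal is an assumption; the 0x fields are hexadecimal.
PCI_FIELDS = re.compile(
    r"Chassis#=(?P<chassis>\d+)\s+"
    r"Slot#=(?P<slot>\d+)\s+"
    r"Bus#=(?P<bus>\d+)\s+"
    r"Dev\.ID=0x(?P<device_id>[0-9A-Fa-f]+)\s+"
    r"Vend\.ID=0x(?P<vendor_id>[0-9A-Fa-f]+)\s+"
    r"Status=0x(?P<status>[0-9A-Fa-f]+)\s+"
    r"DevFun#=0x(?P<devfun>[0-9A-Fa-f]+)"
)

DECIMAL_FIELDS = ("chassis", "slot", "bus")


def parse_pci_event(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the message text and numeric fields, or None if the fields are absent."""
    match = PCI_FIELDS.search(line)
    if match is None:
        return None
    event: Dict[str, Union[str, int]] = {"description": line[: match.start()].strip()}
    for name, text in match.groupdict().items():
        base = 10 if name in DECIMAL_FIELDS else 16
        event[name] = int(text, base)
    return event


def suggested_action(slot: int) -> str:
    """Apply the slot rule from the Action column of the PCI entries."""
    if slot > 0:
        return "Reseat the adapter; if the error remains, replace the adapter."
    return "Replace the PCI board."


# Sample line taken from the table (it uses concrete values instead of X, Y, Z).
sample = (
    "Master read parity error on PCI primary Chassis#=4 Slot#=2 Bus#=3 "
    "Dev.ID=0xaa99 Vend.ID=0xccbb Status=0xeedd DevFun#=0xff"
)
event = parse_pci_event(sample)
if event is not None:
    print(event["description"])
    print(suggested_action(int(event["slot"])))
```

For the entries marked "Informational only," the slot rule applies only if the message remains.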
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Error message Action
POST reporting Processor Event: Invalid configuration of processor card. Chassis Number = X.
Make sure that all microprocessors have the same part number.
POST reporting Processor Event: Processor mismatch detected. Chassis Number = X. Processor Number = Y.
1. Make sure that the BIOS code is at the latest level.
2. Make sure that all microprocessors have the same part number.
3. (Trained service technician only) Replace the microprocessor.
POST reporting Processor Event: POST does not support current stepping of processor. Chassis Number = X, Processor Number = Y.
1. Make sure that the BIOS code is at the latest level.
2. Make sure that all microprocessors have the same part number.
3. (Trained service technician only) Replace the microprocessor.
POST reporting Processor Event: Unable to apply microcode (patch) update. Chassis Number = X. Processor Number = Y.
(Trained service technician only) Replace the microprocessor.
POST reporting Processor Event: Processor failed BIST. Chassis Number= X. Processor Number = Y.
(Trained service technician only) Replace the microprocessor.
POST reporting memory event: North Bridge Uncorrectable memory error occurred. Chassis Number = X. Memory Card = Y. Memory DIMM = Z.
1. Reseat the DIMM.
2. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
3. Replace the DIMM.
4. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
POST reporting memory event: North Bridge Correctable memory threshold occurred. Chassis Number = X. Memory Card = Y. Memory DIMM = Z. Failing Symbol = 0xcb.
1. Reseat the DIMM.
2. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
3. Replace the DIMM.
4. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
POST reporting memory event: DIMM Disabled - Failed ECC Test. Chassis Number = X. Memory Card = Y. Memory DIMM = Z.
1. Reseat the DIMM.
2. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
3. Replace the DIMM.
4. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
POST reporting memory event: DIMM Disabled - Failed POST/BIOS Memory Test. Chassis Number = X. Memory Card = Y. Memory DIMM = Z.
1. Reseat the DIMM.
2. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
3. Replace the DIMM.
4. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
Unknown SERR/PERR detected on PCI bus Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Address of special cycle DPE on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Master read parity error on PCI primary Chassis#=4 Slot#=2 Bus#=3 Dev.ID=0xaa99 Vend.ID=0xccbb Status=0xeedd DevFun#=0xff
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Received target parity error on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Master write parity error on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Device signaled SERR on PCI primary. Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Slave signaled parity error on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Signaled target abort on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Additional correctable ECC error on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
Informational only; if the message remains:
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Received Master Abort on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Additional uncorrectable ECC error on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Split completion discarded on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Correctable ECC error on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
Informational only; if the message remains:
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Unexpected split completion on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Uncorrectable ECC error on PCI primary Chassis#=4 Slot#=2 Bus#=3 Dev.ID=0xaa99 Vend.ID=0xccbb Status=0xeedd DevFun#=0xff
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Received split completion error on PCI primary Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Address of special cycle DPE Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Master read parity error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Received target parity error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Master write parity error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Device signaled SERR Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Slave signaled parity error. Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Signaled target abort Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Additional correctable ECC error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
Informational only; if the message remains:
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Received master abort Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Additional uncorrectable ECC error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Split completion discarded Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Correctable ECC error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
Informational only; if the message remains:
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Unexpected split completion Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Uncorrectable ECC error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI-PCI bridge secondary: Received split completion error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI ECC Error (Corrected) Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
Informational only; if the message remains:
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Address Parity Error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Data Parity Error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
SERR# asserted Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PERR Received by PCI Bridge on a PCIX split completion Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Invalid Address Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus TCE Extent error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Page Fault Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Unauthorized Access Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Parity error in DMA read data buffer Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus timeout Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus DMA delay read timeout Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Internal error on PCIX split completion Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus DMA read reply (RIO) timeout Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Internal RAM error on DMA write Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus MVE valid bit off Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus ECC Error (Corrected) Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus SERR# Detected Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus data parity error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus No DEVSEL# Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus timeout Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Retry count expired Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Target Abort. Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Invalid size Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Access not enabled Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Internal RAM error on MMIO Store Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Split response received Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCIX split completion error status received Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
Unexpected PCIX split completion received Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCIX split completion timeout Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Recoverable error summary bit Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus CSR error summary bit Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Internal RAM error on MMIO load Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Bad command Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Length field invalid Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Load greater than 8 and no write buffer enabled Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCIX Discontiguous byte enable error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus 4K address boundary crossing error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Store wrap state machine check Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Target state machine check Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Invalid transaction PM/DW Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Invalid transaction PM/DR Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus Invalid transaction PS/DW Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Bus DMA write command FIFO parity error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Secondary Status Register Dump Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI Secondary Status Register Dump Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
PCI to PCI Bridge Discard Timer Error Chassis#=X Slot#=Y Bus#=Z Dev.ID=0xSSSS Vend.ID=0xTTTT Status=0xUUUU DevFun#=0xVV
1. If the slot number is greater than 0, complete the following steps:
a. Reseat the adapter.
b. Replace the adapter.
2. If the slot number is 0, replace the PCI board.
SMI handler reporting Memory Mirroring Failover Occurred. Running from mirrored image.
Note: This message immediately follows an uncorrectable memory error.
1. Reseat the DIMM or memory card.
2. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
3. Replace the DIMM or memory card.
4. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
SMI handler reporting Processor Event: Unrecoverable error. Chassis Number = X. Processor ID = Y.
(Trained service technician only) Replace the microprocessor.
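The PCI messages in the table above share one field layout and one decision rule: the slot number selects between servicing the adapter and replacing the PCI board. The following Python sketch illustrates that rule; it is not an IBM tool. The names `PciEvent`, `parse_pci_event`, and `suggested_actions` are hypothetical, the sample values are made up, and the sketch assumes that the X, Y, and Z placeholders appear as decimal integers in a real log.

```python
import re
from dataclasses import dataclass

# Field layout copied from the message templates above. Assumption: in a real
# log the X, Y, and Z placeholders appear as decimal integers.
_PCI_EVENT = re.compile(
    r"^(?P<text>.*?)\s+Chassis#=(?P<chassis>\d+)\s+Slot#=(?P<slot>\d+)"
    r"\s+Bus#=(?P<bus>\d+)\s+Dev\.ID=0x(?P<dev_id>[0-9A-Fa-f]+)"
    r"\s+Vend\.ID=0x(?P<vend_id>[0-9A-Fa-f]+)"
    r"\s+Status=0x(?P<status>[0-9A-Fa-f]+)"
    r"\s+DevFun#=0x(?P<dev_fun>[0-9A-Fa-f]+)$"
)


@dataclass(frozen=True)
class PciEvent:
    """One logged PCI message (hypothetical container for illustration)."""
    text: str
    chassis: int
    slot: int
    bus: int
    dev_id: int
    vend_id: int
    status: int
    dev_fun: int


def parse_pci_event(line: str) -> PciEvent:
    """Split a logged PCI message into its description and numeric fields."""
    match = _PCI_EVENT.match(line.strip())
    if match is None:
        raise ValueError(f"not a PCI event message: {line!r}")
    f = match.groupdict()
    return PciEvent(
        text=f["text"],
        chassis=int(f["chassis"]),
        slot=int(f["slot"]),
        bus=int(f["bus"]),
        dev_id=int(f["dev_id"], 16),
        vend_id=int(f["vend_id"], 16),
        status=int(f["status"], 16),
        dev_fun=int(f["dev_fun"], 16),
    )


def suggested_actions(event: PciEvent) -> list[str]:
    """Return the actions from the table, in the order they should be tried."""
    if event.slot > 0:
        return ["Reseat the adapter.", "Replace the adapter."]
    return ["Replace the PCI board."]


# Made-up sample message that follows the template layout.
sample = ("PCI Bus timeout Chassis#=1 Slot#=3 Bus#=2 Dev.ID=0x1234 "
          "Vend.ID=0x1014 Status=0x4000 DevFun#=0x08")
event = parse_pci_event(sample)
print(event.slot, suggested_actions(event))
```

The sketch does not model the “Informational only; if the message remains” qualifier on the correctable ECC messages; a caller would act on those entries only if they recur.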
Checkout procedure
The checkout procedure is the sequence of tasks that you should follow to diagnose a problem in the server.
About the checkout procedure
Before performing the checkout procedure for diagnosing hardware problems, review the following information:
• Read the safety information beginning on page vii.
• The diagnostics programs provide the primary methods of testing the major components of the server, for example, the I/O board, Ethernet controller, keyboard, mouse (pointing device), serial ports, and hard disk drives. You can also use them to test some external devices. If you are not sure whether a problem is caused by the hardware or by the software, you can use the diagnostics programs to confirm that the hardware is working correctly.
• When you run the diagnostics programs, a single problem might cause several error messages. If you receive several error messages, correct the cause of the first error message. The other error messages might not occur the next time you run the diagnostics programs.
• Before running the diagnostics programs, you must determine whether the failing server is part of a shared hard disk drive cluster (two or more servers sharing external storage devices). If it is part of a cluster, you can run all diagnostics programs except the ones that test the storage unit (that is, a hard disk drive in the storage unit) or the storage adapter that is attached to the storage unit. The failing server might be part of a cluster if any of the following conditions is true:
- You have identified the failing server as part of a cluster (two or more servers sharing external storage devices).
- One or more external storage units are attached to the failing server and at least one of the attached storage units is also attached to another server or unidentifiable device.
- One or more servers are located near the failing server.
Important: If the server is part of a shared hard disk drive cluster, run one test at a time. Do not run any suite of tests, such as “quick” or “normal” tests, because this might enable the hard disk drive diagnostic tests.
• If the server is suspended and a POST error code is displayed, see “Error logs” on page 18. If the server is suspended and no error message is displayed, see “Problem isolation tables” on page 47 and “Solving undetermined problems” on page 104.
• For information about power-supply problems, see “Solving power problems” on page 102 and “Power-supply LEDs” on page 67.
• For intermittent problems, check the error log; see “Error logs” on page 18 and “Diagnostic programs, messages, and error codes” on page 69.
Performing the checkout procedure
To perform the checkout procedure, complete the following steps:
001 IS THE SERVER PART OF A CLUSTER?
002 No. Go to step 004.
003 Yes. Schedule maintenance. Shut down all failing systems that are related to the cluster. Go to step 004.
004
v Turn off the server and all external devices. v Check all cables and power cords. v Set all display controls to the middle positions. v Turn on all external devices. v Turn on the server. v Check the operator information panel system-error LED; if it is
flashing, check the light path diagnostics LEDs (see “Light path diagnostics” on page 60).
v Check for the following correct responses:
A single beep. Readable instructions on the main menu.
YOU RECEIVE THE CORRECT RESPONSES?
DID
005 No. Find the failure symptom in “Problem isolation tables”; if necessary, see “Solving undetermined problems” on page 104.
006 Yes. Run the diagnostic programs (see “Running the on-board diagnostic programs” on page 70).
• If you receive an error, see “Diagnostic error codes” on page 71.
• If the diagnostics programs were completed successfully and you still suspect a problem, see “Solving undetermined problems” on page 104.
If the server does not turn on, see “Problem isolation tables.”
Important: If the server has a baseboard management controller, clear the BMC log and system-event log after you resolve all conditions. This will turn off the information LED and LOG LED, if all conditions are resolved.
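The branching in steps 004 through 006 can be summarized as a small decision function. This sketch only restates the procedure above. The function name and its parameters are hypothetical, and the returned strings paraphrase the referenced sections.

```python
from typing import Optional


def next_checkout_action(correct_responses: bool,
                         diagnostics_error: Optional[bool] = None) -> str:
    """Return the next action after the power-on checks in step 004.

    correct_responses: True if the server gave a single beep and readable
        instructions on the main menu.
    diagnostics_error: None if the diagnostic programs have not been run yet,
        otherwise True or False for whether they reported an error.
    """
    if not correct_responses:
        # Step 005
        return 'Find the failure symptom in "Problem isolation tables".'
    if diagnostics_error is None:
        # Step 006
        return "Run the diagnostic programs."
    if diagnostics_error:
        return 'See "Diagnostic error codes".'
    return 'If you still suspect a problem, see "Solving undetermined problems".'


print(next_checkout_action(correct_responses=True))
```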
(Trained service technicians only) Checkpoint codes
Checkpoint codes give the check that was taking place at the time the system stopped; they do not provide error codes or suggested replacement parts. The checkpoint display will indicate where the server has stopped without waiting for the video to initialize at each startup during problem isolation.
There are two types of checkpoint codes: CPLD hardware checkpoint codes, and BIOS checkpoint codes. The BIOS checkpoint codes might change when the BIOS code is updated.
The checkpoint display for the System x3850 is located on the I/O board.
Checkpoint codes can be found at http://w3.pc.ibm.com/helpcenter/infotips/techinfo/MIGR-58350.html.
Problem isolation tables
Use the problem isolation tables to find solutions to problems that have definite symptoms.
If you cannot find the problem in the error symptom charts, go to “Running the on-board diagnostic programs” on page 70 to test the server.
If you have just added new software or a new option and the server is not working, use the following procedures before using the problem isolation tables:
1. Check the light path diagnostics LEDs on the operator information panel (see “Light path diagnostics” on page 60).
2. Remove the software or device that you just added.
3. Run the diagnostic tests to determine whether the server is running correctly.
4. Reinstall the new software or new device.
CD or DVD drive problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a trained service technician.
Symptom: The CD or DVD drive is not recognized.
Action:
1. Make sure that:
• The IDE channel to which the CD or DVD drive is attached (primary or secondary) is enabled in the Configuration/Setup Utility program.
• All cables and jumpers are installed correctly.
• The correct device driver is installed for the CD or DVD drive.
2. Run the CD or DVD drive diagnostic programs.
3. Reseat the following components:
a. CD or DVD drive
b. CD or DVD drive cable
c. I/O board
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
Symptom: A CD or DVD is not working correctly.
Action:
1. Clean the CD or DVD.
2. Run the CD or DVD drive diagnostic programs.
3. Reseat the CD or DVD drive.
4. Replace the CD or DVD drive.
Symptom: The CD or DVD drive tray is not working.
Action:
1. Make sure that the server is turned on.
2. Insert the end of a straightened paper clip into the manual tray-release opening.
3. Reseat the CD or DVD drive.
4. Replace the CD or DVD drive.
General problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a trained service technician.
Symptom: A cover lock is broken, an LED is not working, or a similar problem has occurred.
Action: If the part is a CRU, replace it. If the part is a FRU, the part must be replaced by a trained service technician.
Hard disk drive problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a trained service technician.
Symptom: Not all drives are recognized by the hard disk drive diagnostic test (the Fixed Disk test).
Action: Remove the drive indicated on the diagnostic tests; then, run the hard disk drive diagnostic test again. If the remaining drives are recognized, replace the drive that you removed with a new one.
Symptom: The server stops responding during the hard disk drive diagnostic test.
Action: Remove the hard disk drive that was being tested when the server stopped responding, and run the diagnostic test again. If the hard disk drive diagnostic test runs successfully, replace the drive that you removed with a new one.
Symptom: A hard disk drive was not detected while the operating system was being started.
Action: Reseat all hard disk drives and cables; then, run the hard disk drive diagnostic tests again.
Intermittent problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a trained service technician.
Symptom: A problem occurs only occasionally and is difficult to diagnose.
Action:
1. Make sure that:
• All cables and cords are connected securely to the rear of the server and attached devices.
• When the server is turned on, air is flowing from the fan grille. If there is no airflow, the fan is not working. This can cause the server to overheat and shut down.
2. Check the system-error log or BMC log (see “Error logs” on page 18).
Keyboard, mouse, or pointing-device problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a trained service technician.
Symptom: All or some keys on the keyboard do not work.
Action:
1. If the server is attached to a KVM switch, make sure the switch is working correctly by plugging the keyboard cable directly into the correct port on the rear of the server, thus bypassing the KVM switch.
2. Make sure that:
• The keyboard cable is securely connected to the server and the keyboard and mouse cables are not reversed.
• The server and the monitor are turned on.
3. Reseat the following components:
a. Keyboard
b. I/O board
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
Symptom: The mouse or pointing device does not work.
Action:
1. If the server is attached to a KVM switch, make sure the switch is working correctly by plugging the mouse or pointing device cable directly into the correct port on the rear of the server, thus bypassing the KVM switch.
2. Make sure that:
• The mouse or pointing-device cable is securely connected and the keyboard and mouse cables are not reversed.
• The mouse device drivers are installed correctly.
• The mouse is enabled in the Configuration/Setup Utility program.
3. Reseat the following components:
a. Mouse or pointing device
b. I/O board
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
USB keyboard, mouse, or pointing-device problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a trained service technician.
Symptom: All or some keys on the keyboard do not work.
Action:
1. If you have installed a USB keyboard, run the Configuration/Setup Utility program and enable keyboardless operation to prevent the POST error message 301 from being displayed during startup.
2. Make sure that:
• The keyboard cable is securely connected and the keyboard and mouse cables are not reversed.
• The server and the monitor are turned on.
3. Reseat the following components:
a. Keyboard
b. I/O board
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
Symptom: The USB mouse or USB pointing device does not work.
Action:
1. Make sure that:
• The mouse or pointing-device USB cable is securely connected to the server, the keyboard and mouse or pointing-device cables are not reversed, and the device drivers are installed correctly.
• The server and the monitor are turned on.
• Keyboardless operation has been enabled in the Configuration/Setup Utility program.
2. If a USB hub is in use, disconnect the USB device from the hub and connect it directly to the server.
3. Reseat the following components:
a. Mouse or pointing device
b. I/O board
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
Memory problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
The amount of system memory that is displayed is less than the amount of installed physical memory.
Note: If you change the memory, you must update the memory configuration in the Configuration/Setup Utility program.
1. Make sure that:
v No error LEDs are lit on the operator information panel or on the memory
card.
v Memory mirroring does not account for the discrepancy.
v The memory modules are seated correctly.
v You have installed the correct type of memory.
v All banks of memory are enabled. The server might have automatically
disabled a memory bank when it detected a problem, or a memory bank might have been manually disabled.
2. Check the POST error log for error message 289:
v If a DIMM was disabled by a system-management interrupt (SMI), replace the DIMM.
v If a DIMM was disabled by the user or by POST, run the Configuration/Setup Utility program and enable the DIMM.
3. Run memory diagnostics (see “Running the on-board diagnostic programs” on page 70).
4. Make sure there is no memory mismatch when the server is at the minimum memory configuration (two 1 GB DIMMs; see “Minimum configuration” on page 104).
5. Add one pair of DIMMs at a time, making sure the DIMMs match for each pair added.
6. Add one memory card at a time, making sure the memory matches for each card added.
7. Reseat the following components:
a. DIMM
b. Memory card
8. Replace the components listed in step 7 one at a time, in the order shown, restarting the server each time.
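The handling of POST error message 289 in step 2 depends only on what disabled the DIMM. As a minimal sketch, assuming the cause is known and passed in as one of the strings "smi", "user", or "post" (these labels and the function name are hypothetical, not part of any IBM tool), the decision can be written as:

```python
def action_for_disabled_dimm(disabled_by: str) -> str:
    """Map the cause of a disabled DIMM (POST error 289) to the action in step 2."""
    cause = disabled_by.strip().lower()
    if cause == "smi":
        # Disabled by a system-management interrupt: the DIMM is suspect.
        return "Replace the DIMM."
    if cause in ("user", "post"):
        # Disabled manually or by POST: re-enable it in the setup utility.
        return "Run the Configuration/Setup Utility program and enable the DIMM."
    raise ValueError(f"Unknown cause: {disabled_by!r}")
```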
Microprocessor problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
The server emits a continuous beep during POST, indicating that the startup (boot) microprocessor is not working correctly.
1. Correct any errors indicated by the light path (see “Light path diagnostics” on page 60).
2. Make sure that all microprocessors are supported on this server, and that they all match in speed and cache size.
3. (Trained service technician only) Make sure that microprocessor 1 is seated correctly.
4. Reseat the following components:
a. (Trained service technician only) Microprocessor 1
b. Microprocessor VRM 3 or 4
c. Microprocessor tray
5. (Trained service technician only) If there is no indication of which microprocessor has failed, isolate the error by testing with one microprocessor at a time.
6. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 1
b. Microprocessor VRM 3 or 4
c. (Trained service technician only) Microprocessor tray
7. (Trained service technician only) If multiple microprocessor, light path, or error-code indications point to a damaged microprocessor, switch two microprocessors to see whether the error moves with the microprocessor or stays with the microprocessor socket.
Note: For sockets 3 and 4, if the error stays with the socket, switch VRM 3 and VRM 4.
v If the error moved with the microprocessor, replace the microprocessor.
v If the error moved with the VRM, replace the VRM.
v If the error stayed at the same socket, replace the microprocessor tray.
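The swap test in step 7 is a three-way decision. The sketch below restates it in Python; the enum and function names are hypothetical and exist only to make the branches explicit.

```python
from enum import Enum


class ErrorFollows(Enum):
    """What the error followed after the swap (hypothetical labels)."""
    MICROPROCESSOR = "microprocessor"
    VRM = "vrm"
    SOCKET = "socket"


def part_to_replace(error_follows: ErrorFollows) -> str:
    """Return the part to replace, per the three outcomes listed in step 7."""
    if error_follows is ErrorFollows.MICROPROCESSOR:
        return "microprocessor"
    if error_follows is ErrorFollows.VRM:
        return "VRM"
    # The error stayed at the same socket after both swaps.
    return "microprocessor tray"
```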
Monitor problems
Some IBM monitors have their own self-tests. If you suspect a problem with your monitor, see the documentation that comes with the monitor for instructions for testing and adjusting the monitor. If you cannot diagnose the problem, call for service.
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
Testing the monitor
1. Make sure the monitor cables are firmly connected.
2. Try using a different monitor on the server, or try using the monitor being tested on a different server.
3. Run the diagnostic programs. If the monitor passes the diagnostic programs, the problem might be a video device driver.
4. Reseat the following components:
a. Remote Supervisor Adapter II SlimLine, if present
b. I/O board
5. Replace the components listed in step 4 one at a time, in the order shown, restarting the server each time.
The screen is blank.
1. If the server is attached to a KVM switch, make sure the switch is working correctly by plugging the monitor cable directly into the correct port on the rear of the server, thus bypassing the KVM switch.
2. Make sure that:
v The server is powered on. If there is no power to the server, see “Power
problems” on page 57.
v The monitor cables are connected correctly.
v The monitor is turned on and the brightness and contrast controls are adjusted correctly.
v No beep codes sounded when the server was turned on.
Important: In some memory configurations, the 3-3-3 beep code might sound during POST, followed by a blank monitor screen. If this occurs, do the following:
a. Turn off the server.
b. Move the memory card to a different slot.
c. Turn on the server.
Note: BIOS will see a new configuration and automatically re-enable the memory slots that were previously disabled.
d. Turn off the server.
e. Return the memory card to the slot it was removed from in step 2b.
f. Turn on the server.
3. Make sure that the correct server is controlling the monitor, if applicable.
4. Make sure that damaged BIOS code is not affecting the video; see “Recovering from a BIOS update failure” on page 88.
5. Observe the checkpoint LEDs on the I/O board; if the codes are changing, go to the next step. If the codes are not changing, see “(Trained service technicians only) Checkpoint codes” on page 47.
6. See “Solving undetermined problems” on page 104.
The monitor works when you turn on the server, but the screen goes blank when you start some application programs.
1. Make sure that:
v The application program is not setting a display mode that is higher than the
capability of the monitor.
v You installed the necessary device drivers for the application.
2. Run video diagnostics (see “Running the on-board diagnostic programs” on page 70).
v If the video diagnostics pass, the video is good; see “Solving undetermined
problems” on page 104.
v (Trained service technician only) If the video diagnostics fail, reseat the I/O
board.
v Replace the I/O board.
The monitor has screen jitter, or the screen image is wavy, unreadable, rolling, or distorted.
1. If the monitor self-tests show the monitor is working correctly, consider the location of the monitor. Magnetic fields around other devices (such as transformers, appliances, fluorescent lights, and other monitors) can cause screen jitter or wavy, unreadable, rolling, or distorted screen images. If this happens, turn off the monitor.
Attention: Moving a color monitor while it is turned on might cause screen discoloration.
Move the device and the monitor at least 305 mm (12 in.) apart, and turn on the monitor.
Notes:
a. To prevent diskette drive read/write errors, make sure that the distance between the monitor and any external diskette drive is at least 76 mm (3 in.).
b. Non-IBM monitor cables might cause unpredictable problems.
2. Reseat the following components:
a. Monitor
b. Remote Supervisor Adapter II SlimLine, if present
c. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
Wrong characters appear on the screen.
1. If the wrong language is displayed, update the BIOS code (see “Updating the firmware” on page 147) with the correct language.
2. Reseat the following components:
a. Monitor
b. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
Optional-device problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
An IBM optional device that was just installed does not work.
1. Make sure that:
v The device is designed for the server (see the ServerProven® list on the World Wide Web at http://www.ibm.com/servers/eserver/serverproven/compat/us/).
v You followed the installation instructions that came with the device.
v The device is installed correctly.
v You have not loosened any other installed devices or cables.
v You updated the configuration information in the Configuration/Setup Utility program. Whenever memory or any other device is changed, you must update the configuration.
2. Reseat the device that you just installed.
3. Replace the device that you just installed.
An IBM optional device that used to work does not work now.
1. Make sure that all of the hardware and cable connections for the device are secure.
2. Make sure the memory is enabled in the Configuration/Setup Utility program. See “Memory problems” on page 52.
3. If the device comes with test instructions, use those instructions to test the device.
4. If the failing device is a SCSI device, make sure that:
v The cables for all external SCSI devices are connected correctly.
v The last device in each SCSI chain, or the end of the SCSI cable, is terminated correctly.
v Any external SCSI device is turned on. You must turn on an external SCSI device before turning on the server.
5. Reseat the failing device.
6. Replace the failing device.
POST reporting PCI Event: Redundant PCI Host Bridge IB Link Failed. Slot Number = NA. Bus Number = NA.Device ID = 0xffff. Vendor ID = 0xffff
1. Check for bent pins between the microprocessor board and the PCI-X board.
2. PCI-X board assembly.
3. Microprocessor tray assembly.
Power problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
The power-on button does not work, and the reset button does work (the server does not turn on).
Note: The power button will not function until 20 seconds after ac power has been applied to the server.
1. Make sure that the operator information panel power-control button is working correctly:
a. Disconnect the server power cords.
b. Reconnect the power cords.
c. (Trained service technician only) Reseat the operator information panel
cables, and then repeat steps 1a and 1b.
v (Trained service technician only) If the server turns on, reseat the
operator information panel. If the problem persists, replace the operator information panel.
v If the server does not turn on, bypass the operator information panel
power-control button by using the force power-on jumper (see “I/O board internal connectors and jumpers” on page 8); if the server turns on, reseat the operator information panel. If the problem persists, replace the operator information panel.
2. Make sure that the reset button is working correctly:
a. Disconnect the server power cords.
b. Reconnect the power cords.
c. (Trained service technician only) Reseat the light path panel cable, and then repeat steps 1a and 1b.
v (Trained service technician only) If the server turns on, replace the light path panel.
v If the server does not turn on, go to step 3.
3. Make sure that:
v The power cords are correctly connected to the server and to a working electrical outlet.
v The type of memory that is installed is correct.
v The memory card is fully seated.
v The LEDs on the power supply do not indicate a problem.
v The microprocessors are installed in the correct sequence.
4. Reseat the following components:
a. Memory card
b. (Trained service technician only) Power switch connector
c. (Trained service technician only) Power backplane
d. I/O board
5. Replace the components listed in step 4 one at a time, in the order shown, restarting the server each time.
6. If you just installed an optional device, remove it, and restart the server. If the server now turns on, you might have installed more devices than the power supply supports.
7. See “Power-supply LEDs” on page 67.
8. See “Solving undetermined problems” on page 104.
The server does not turn off.
1. Determine whether you are using an Advanced Configuration and Power Interface (ACPI) or a non-ACPI operating system. If you are using a non-ACPI operating system, complete the following steps:
a. Press Ctrl+Alt+Delete.
b. Turn off the server by holding the power-control button for 5 seconds.
c. Restart the server.
d. If the server fails POST and the power-control button does not work,
disconnect the ac power cord for 20 seconds; then, reconnect the ac power cord and restart the server.
2. If the problem remains or if you are using an ACPI-aware operating system, suspect the I/O board.
The server unexpectedly shuts down, and the LEDs on the operator information panel are not lit.
See “Solving undetermined problems” on page 104.
Serial port problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
The number of serial ports that are identified by the operating system is less than the number of installed serial ports.
1. Make sure that:
v Each port is assigned a unique address in the Configuration/Setup Utility program and none of the serial ports is disabled.
v The serial-port adapter, if you installed one, is seated properly.
2. Reseat the serial port adapter.
3. Replace the serial port adapter.
A serial device does not work.
1. Make sure that:
v The device is compatible with the server.
v The serial port is enabled and is assigned a unique address.
v The device is connected to the correct port (see “System-board layouts” on page 8).
2. Reseat the following components:
a. Failing serial device
b. Serial cable
c. Remote Supervisor Adapter II SlimLine, if present
d. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
ServerGuide problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
The ServerGuide™ Setup and Installation CD will not start.
v Make sure that the server supports the ServerGuide program and has a startable (bootable) CD or DVD drive.
v If the startup (boot) sequence settings have been changed, make sure that the CD or DVD drive is first in the startup sequence.
v If more than one CD or DVD drive is installed, make sure that only one drive is set as the primary drive. Start the CD from the primary drive.
The ServeRAID Manager program cannot view all installed drives, or the operating system cannot be installed.
v Make sure that the hard disk drive is connected correctly.
v Make sure that the SAS hard disk drive cables are securely connected.
The operating-system installation program continuously loops.
Make more space available on the hard disk.
The ServerGuide program will not start the operating-system CD.
Make sure that the operating-system CD is supported by the ServerGuide program. See the ServerGuide Setup and Installation CD label for a list of supported operating-system versions.
The operating system cannot be installed; the option is not available.
Make sure that the server supports the operating system. If it does, either no logical drive is defined (SCSI RAID systems), or the ServerGuide System Partition is not present. Run the ServerGuide program and make sure that setup is complete.
Software problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
You suspect a software problem.
1. To determine whether the problem is caused by the software, make sure that:
v The server has the minimum memory that is needed to use the software. For
memory requirements, see the information that comes with the software.
Note: If you have just installed an adapter, the server might have an
adapter-address conflict.
v The software is designed to operate on the server.
v Other software works on the server.
v The software works on another server.
2. If you received any error messages when using the software, see the information that comes with the software for a description of the messages and suggested solutions to the problem.
3. Contact your place of purchase of the software.
Universal Serial Bus (USB) port problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Symptom Action
A USB device does not work.
1. Run USB diagnostics (see “Running the on-board diagnostic programs” on page 70).
2. Make sure that:
v The correct USB device driver is installed.
v The operating system supports USB devices.
3. If a standard PS/2 keyboard or mouse is connected, any USB keyboard or mouse will not work during POST.
4. Make sure that the USB configuration options are set correctly in the Configuration/Setup Utility program menu (see the User’s Guide for more information).
5. If you are using a USB hub, disconnect the USB device from the hub and connect it directly to the server.
Video problems
See “Monitor problems” on page 53.
Light path diagnostics
Light path diagnostics provides a path that you can follow to help you identify the source of an error. The server must be connected to a power source for the LEDs inside the server to be lit; the server does not have to be turned on for the LEDs to be lit.
Press the reset button to reset the server and run the power-on self-test (POST). You might have to use a pen or the end of a straightened paper clip to press the button.
The server is designed so that LEDs remain lit when the server is connected to an ac power source but is not turned on, provided that the power supply is operating correctly. This feature helps you to isolate the problem when the operating system is shut down.
Any PCI-X, memory, microprocessor, and VRM LED can be lit again without ac power after you remove the microprocessor tray so that you can isolate a problem. After ac power has been removed from the server, power remains available to these LEDs for up to 24 hours.
To view the PCI-X, memory, microprocessor, and VRM LEDs, press and hold the light-path-diagnostics button on the PCI-X board, memory card, or microprocessor board for 30 seconds to light the error LEDs.
The LEDs that were lit while the server was running will be lit again while the button is pressed.
Many errors are first indicated by a lit information LED or system-error LED on the operator information panel on the front of the server. If one or both of these LEDs are lit, one or more LEDs elsewhere in the server might also be lit and can direct you to the source of the error.
Note: Read the safety information beginning on page vii and “Handling
static-sensitive devices” on page 114.
View the LEDs in the following order:
1. Check the operator information panel on the front of the server.
v If the information LED is lit, it indicates that information about a suboptimal
condition in the server is available in the BMC log or in the system-error log.
Important: If the server has a baseboard management controller, clear the
BMC log and system-event log after you resolve all conditions. This will turn off the information LED and LOG LED, if all conditions are resolved.
v If the system-error LED is lit, it indicates that an error has occurred; go to
step 2.
The following illustration shows the operator information panel.
Callouts: power-control button, information LED, USB connector, release latch, power-on LED, hard disk drive activity LED, system-error LED, locator LED.
2. To view the light path diagnostics panel, press the release latch on the front of the operator information panel to the left; then, slide it forward. This reveals the light path diagnostics panel. Lit LEDs on this panel indicate the type of error that has occurred.
Light path diagnostics panel labels: LINK, CPU, MEM, SP, PS, VRM, NMI, DASD, TEMP, LOG, PCI, RAID, FAN, OVER SPEC, REMIND, NONRED, PCI BRD, CPU BRD, I/O BRD.
Look at the system service label on the top of the server, which gives an overview of internal components that correspond to the LEDs on the light path diagnostics panel. This information and the information in “Light path diagnostic LEDs” on page 63 can often provide enough information to correct the error.
3. Remove the server cover and look inside the server for lit LEDs. Certain components inside the server have LEDs that will be lit to indicate the location of a problem. For example, a VRM error will light the LED next to the failing VRM on the microprocessor tray.
The following illustration shows the LEDs and connectors on the microprocessor tray.
Callouts: light path diagnostics button, fans 1 through 8, memory cards 1 through 4, microprocessor 1 through 4 sockets and error LEDs, microprocessor card error LED, microprocessor 3 and 4 VRM connectors, VRM 3 and VRM 4 error LEDs.
The following illustration shows the LEDs on the PCI-X board.
Callouts: PCI attention LEDs, PCI power LEDs, power good LED.
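The viewing order described in steps 1 through 3 can be summarized as a small decision sketch. The Python function below is illustrative only; its name and the two boolean inputs are hypothetical, and the returned strings paraphrase the steps above.

```python
from typing import List


def light_path_next_steps(information_led_lit: bool, system_error_led_lit: bool) -> List[str]:
    """List what to check next, based on the two operator information panel LEDs."""
    steps: List[str] = []
    if information_led_lit:
        # Step 1: a suboptimal condition is recorded in the logs.
        steps.append("Check the BMC log or the system-error log.")
    if system_error_led_lit:
        # Step 2: the light path diagnostics panel shows the type of error.
        steps.append("Open the light path diagnostics panel and note the lit LEDs.")
        # Step 3: LEDs inside the server show the location of the problem.
        steps.append("Remove the server cover and look for lit LEDs inside the server.")
    return steps
```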
Light path diagnostic LEDs
The following tables describe the LEDs on the light path diagnostics panel and on the boards inside the server and suggested actions to correct the detected problems.
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Lit light path diagnostics LED with the system-error or system-information LED also lit Description Action
All LEDs off (the power LED is lit; the information LED might be lit).
OVERSPEC There is insufficient power to power the system. The NONRED and LOG LEDs might also be lit.
1. Add an optional power supply if only one power supply is installed.
2. Use 220 VAC input power.
3. Reseat the following components:
a. Power supply
b. (Trained service technician only) Power backplane
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
5. Use 220 VAC instead of 110 VAC.
PS A power supply has failed or has been removed; also see “Power-supply LEDs” on page 67.
Note: In a redundant power configuration, the dc power LED on one power supply might be off.
1. Reinstall the removed power supply.
2. Check the individual power-supply LEDs to find the failing power supply.
3. Reseat the following components:
a. Failing power supply
b. (Trained service technician only) Power backplane
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the server each time.
5. If a 240 VA fault has occurred, ac power must be removed before dc power can be restored.
LINK Reserved
No action necessary.
CPU A microprocessor has failed, is missing, or has been improperly installed.
Note: Make sure that the microprocessors are installed in the correct sequence; see “Removing and installing a microprocessor” on page 138.
1. Check the BMC log or the system-error log to determine the reason for the lit LED.
2. Find the failing, missing, or mismatched microprocessor by checking the LEDs on the microprocessor tray.
3. Reseat the following components:
a. (Trained service technician only) Failing microprocessor
b. Microprocessor tray
4. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Failing microprocessor
b. (Trained service technician only) Microprocessor tray
VRM A dc-dc regulator has failed or is missing.
1. Check the BMC log or the system-error log to determine the reason for the lit LED (for a VRM).
2. Find the failing or missing VRM by checking the LEDs on the microprocessor tray.
3. Install any missing VRMs.
4. Reseat the following components:
a. Failing VRM
b. (Trained service technician only) Microprocessor associated with the VRM
c. Microprocessor tray
5. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Failing VRM
b. (Trained service technician only) Microprocessor associated with the VRM
c. (Trained service technician only) Microprocessor tray
LOG Information is present in the BMC log and system-error log. One or both logs may be full or close to full.
1. The system-error log is 75% full; save the log if necessary and clear it (see Error Logs at “Configuration/Setup Utility menu choices” on page 149).
2. Check the log for possible errors (see “Error logs” on page 18).
MEM Memory failure.
Note: The error LED on the memory card is also lit.
1. Remove the memory card with the lit error LED on the top of the card; then, press the light path button on the memory card to identify the failed card or DIMM.
2. Reseat the DIMM.
3. Replace the following components one at a time, in the order shown, restarting the server and re-enabling the memory each time:
a. Memory card
b. DIMM
c. (Trained service technician only) Microprocessor tray
NMI A hardware error has been reported to the operating system.
Note: The PCI or MEM LED might also be lit.
1. See the BMC log and the system-error log (see “Error logs” on page 18).
2. If the PCI LED is lit, follow the instructions for that LED.
3. If the MEM LED is lit, follow the instructions for that LED.
4. Restart the server.
PCI A PCI adapter has failed.
Note: The error LED next to the failing adapter on the PCI-X board is also lit.
1. See the BMC log or the system-error log (see “Error logs” on page 18).
2. Reseat the following components:
a. Failing adapter
b. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
SP There is a fault in the Remote Supervisor Adapter II SlimLine.
1. Reseat the Remote Supervisor Adapter II SlimLine.
2. Update the firmware for the Remote Supervisor Adapter II SlimLine (see “Updating the firmware” on page 147).
3. Replace the Remote Supervisor Adapter II SlimLine.
DASD A hard disk drive has failed or has
been removed.
Note: The error LED on the failing
hard disk drive is also lit.
1. Reinstall the removed drive.
2. Reseat the following components:
a. Failing hard disk drive
b. SAS hard disk drive backplane
c. SAS 6x cable
d. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
RAID The RAID adapter (ServeRAID 8i)
has indicated a fault.
1. See the BMC log or the system-error log (see “Error logs” on page 18).
2. Reseat the following components:
a. RAID adapter
b. Hard disk drives
c. I/O board
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
NONRED The server is operating with nonredundant power. If a power supply or its ac power source fails, the system will be over spec.
Note: The LOG LED might also be lit.
1. If the PS LED on the light path diagnostics panel is lit, follow the instructions for that LED.
2. Replace the failing power supply.
3. Remove optional devices.
4. Use 220 VAC instead of 110 VAC.
TEMP A system temperature or component has exceeded specifications.
Note: A fan LED might also be lit.
1. See the BMC log or the system-error log (see “Error logs” on page 18) for the source of the fault.
2. Make sure that the airflow of the server is not blocked.
3. If a fan LED is lit, reseat the fan.
4. Replace the fan for which the LED is lit.
5. Make sure that the room is neither too hot nor too cold (see “Environment” in “Features and specifications” on page 3).
6. If one of the VRDs indicates “hot,” ac power must be removed before dc power can be restored.
FAN A fan has failed or has been removed.
Note: A failing fan can also cause the TEMP LED to be lit.
1. Reinstall the removed fan.
2. If an individual fan LED is lit, replace the fan.
3. Reseat the microprocessor tray.
4. (Trained service technician only) Replace the microprocessor tray.
PCI BRD The PCI-X board has failed.
1. (Trained service technician only) Reseat the PCI-X board assembly.
2. (Trained service technician only) Replace the PCI-X board assembly.
CPU BRD The microprocessor tray has failed.
1. Reseat the microprocessor tray.
2. (Trained service technician only) Replace the microprocessor tray.
I/O BRD The I/O board has failed.
1. Reseat the I/O board.
2. Replace the I/O board.
Remind button
You can use the remind button to put the system-error LED on the operator information panel into Remind mode. When you press the remind button, you acknowledge the error but indicate that you will not take immediate action. The system-error LED flashes while it is in Remind mode.
The system-error LED stays in Remind mode until one of the following conditions occurs:
v All known errors are corrected.
v The server is restarted.
v A new error occurs (the LED is lit again).
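The Remind-mode behavior described above can be summarized as a small state model. The following Python sketch is illustrative only: the class and method names are hypothetical and do not correspond to any firmware interface, and the behavior after a restart (the LED is lit again if errors remain) is an assumption that the text does not state.

```python
from enum import Enum


class LedState(Enum):
    OFF = "off"
    LIT = "lit"
    REMIND = "flashing"  # the LED flashes while it is in Remind mode


class SystemErrorLed:
    """Hypothetical model of the system-error LED on the operator information panel."""

    def __init__(self):
        self.state = LedState.OFF
        self.open_errors = 0

    def report_error(self):
        # A new error lights the LED, even if it was in Remind mode.
        self.open_errors += 1
        self.state = LedState.LIT

    def press_remind(self):
        # Acknowledge the error without taking immediate action.
        if self.state is LedState.LIT:
            self.state = LedState.REMIND

    def correct_all_errors(self):
        # Correcting all known errors ends Remind mode.
        self.open_errors = 0
        self.state = LedState.OFF

    def restart_server(self):
        # Restarting ends Remind mode. Assumption: errors that still exist
        # are detected again after the restart and light the LED.
        self.state = LedState.LIT if self.open_errors else LedState.OFF
```

Pressing the remind button has an effect only while the LED is lit; each of the three conditions listed above moves the LED out of the flashing state.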
Power-supply LEDs
The following table describes the power-supply LEDs and suggested actions to correct the detected problems.
The following minimum configuration is required for the power-supply dc good LED to be lit:
v Power supply
v Power backplane
v Power cord
v Microprocessor tray
The following minimum configuration is required for the server to turn on:
v One microprocessor
v Two 1 GB DIMMs on the memory card
v One power supply
v Power backplane
v Power cord
v I/O board
v PCI-X board assembly
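When you strip a server down to isolate a power problem, it can help to compare what is still installed against the two minimum configurations listed above. The following Python sketch shows one way to do that with set arithmetic; the component labels and the function name are hypothetical, chosen only for this illustration.

```python
# Minimum configuration for the power-supply dc good LED to be lit.
DC_GOOD_MINIMUM = {
    "power supply",
    "power backplane",
    "power cord",
    "microprocessor tray",
}

# Minimum configuration for the server to turn on.
POWER_ON_MINIMUM = {
    "microprocessor",
    "two 1 GB DIMMs on the memory card",
    "power supply",
    "power backplane",
    "power cord",
    "I/O board",
    "PCI-X board assembly",
}


def missing_components(installed, required):
    """Return the required components that are not in the installed list."""
    return sorted(required - set(installed))
```

For example, `missing_components(["power supply", "power cord"], DC_GOOD_MINIMUM)` reports that the microprocessor tray and the power backplane are still needed before the dc good LED can be lit.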
[Figure: power supply 1 (PS1), showing the AC power LED (green) and the DC power LED (green)]
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Power-supply LEDs (AC good, DC good), operator information panel power LED: Description and Action
Off, Off, Off: No power to the server, or a problem with the ac power source.
1. Check the ac power to the server.
2. Make sure that the power cord is connected to a functioning power source.
3. Remove one power supply at a time.
Lit, Off, Off: DC source power problem.
1. Make sure that the microprocessor tray is connected to the power backplane.
2. Remove one power supply at a time.
3. View the system-error log (see “Error logs” on page 18).
Lit, Lit, Off: Standby power problem.
1. View the system-error log (see “Error logs” on page 18).
2. Isolate by removing one power supply at a time.
3. (Trained service technician only) Replace the power backplane.
Lit, Lit, Flashing: System power-on problem.
1. View the system-error log (see “Error logs” on page 18).
2. Press the power-control button on the operator information panel.
3. (Trained service technician only) Use the force-power-on jumper as a debugging aid (see “I/O board internal connectors and jumpers” on page 8) to determine whether the information panel switch and cable are faulty.
4. Remove the optional Remote Supervisor Adapter II SlimLine, and try to turn on the server.
5. Reseat the microprocessor tray.
6. (Trained service technician only) Replace the microprocessor tray.
Lit, Lit, Lit: The power is good. No action.
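The LED combinations in the preceding table amount to a lookup from three LED states to a problem description. The following Python sketch encodes only the five combinations that the table lists; the dictionary and function names are hypothetical, and any combination outside the table is reported as not listed rather than guessed at.

```python
# Keys: (AC good LED, DC good LED, operator information panel power LED)
POWER_LED_STATES = {
    ("off", "off", "off"): "No power to the server, or a problem with the ac power source",
    ("lit", "off", "off"): "DC source power problem",
    ("lit", "lit", "off"): "Standby power problem",
    ("lit", "lit", "flashing"): "System power-on problem",
    ("lit", "lit", "lit"): "The power is good",
}


def describe_power_leds(ac_good, dc_good, panel_power):
    """Return the table description for a combination of LED states."""
    key = (ac_good.lower(), dc_good.lower(), panel_power.lower())
    return POWER_LED_STATES.get(key, "Combination not listed in the table")
```

The description identifies the row of the table; the numbered actions for that row must still be followed in the order listed.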
Diagnostic programs, messages, and error codes
The server diagnostic programs are the primary method of testing the major components of the server.
As you run the diagnostic programs, text messages and error codes are displayed on the screen and are saved in the test log. A diagnostic text message or error code indicates that a problem has been detected; to determine what action you should take as a result of a message or error code, see the table in “Diagnostic error codes” on page 71.
Real-time diagnostics
Real-time diagnostics can help you diagnose certain devices on IBM System x and xSeries® servers while the operating system is running. Using these diagnostic actions, you can prevent and minimize server downtime.
For more information and to download the real-time diagnostics, go to the following Web page:
http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-50681
Running the on-board diagnostic programs
To run the on-board diagnostic programs, complete the following steps:
1. If the server is running, turn off the server and all attached devices.
2. Turn on all attached devices; then, turn on the server.
3. When the prompt F2 for Diagnostics appears, press F2. If you have set both a power-on password and an administrator password, you must type the administrator password to run the diagnostic programs.
4. From the top of the screen, select either Extended or Basic.
5. From the diagnostic programs screen, select the test that you want to run, and follow the instructions on the screen.
To determine what action you should take as a result of a diagnostic text message or error code, see the table in “Diagnostic error codes” on page 71.
A single problem might cause several text messages or error codes. When this occurs, correct the cause of the first message or error code. The other messages and error codes usually will not occur the next time you run the test.
For help with the diagnostic programs, press F1. You also can press F1 from within a help screen to obtain online documentation from which you can select different categories. To exit from the help information, press Esc.
If the server stops during testing and you cannot continue, restart the server and try running the diagnostic programs again. If the problem remains, replace the component that was being tested when the server stopped.
The keyboard and mouse (pointing device) tests assume that a keyboard and mouse are attached to the server.
If no mouse or a USB mouse is attached to the server, you cannot use the Next Cat and Prev Cat buttons to select categories. All other mouse-selectable functions are available through function keys.
You can use the regular keyboard test to test a USB keyboard, and you can use the regular mouse test to test a USB mouse. You can run the USB interface test only if no USB devices are attached. The USB test will not run if a Remote Supervisor Adapter II SlimLine is installed.
To view server configuration information (such as system configuration, memory contents, interrupt request (IRQ) use, direct memory access (DMA) use, device drivers, and so on), select Hardware Info from the top of the screen.
If the diagnostic programs do not detect any hardware errors but the problem remains during normal server operations, a software error might be the cause. If you suspect a software problem, see the information that comes with your software.
Diagnostic text messages
Diagnostic text messages are displayed while the tests are running. A diagnostic text message contains one of the following results:
Passed: The test was completed without any errors.
Failed: The test detected an error.
User Aborted: You stopped the test before it was completed.
Not Applicable: You attempted to test a device that is not present in the server.
Aborted: The test could not proceed because of the server configuration.
Warning: The test could not be run. There was no failure of the hardware that was being tested, but there might be a hardware failure elsewhere, or another problem prevented the test from running; for example, there might be a configuration problem, or the hardware might be missing or is not being recognized.
The result is followed by an error code or other additional information about the error.
Viewing the test log
To view the test log when the tests are completed, select Utility from the top of the screen and then select View Test Log. The test-log data is maintained only while you are running the diagnostic programs. When you exit from the diagnostic programs, the test log is cleared.
To save the test log to a file on a diskette or to the hard disk, click Save Log on the diagnostic programs screen and specify a location and name for the saved log file.
Notes:
1. To create and use a diskette, you must add an optional external diskette drive to the server.
2. To save the test log to a diskette, you must use a diskette that you have formatted yourself; this function does not work with preformatted diskettes. If the diskette has sufficient space for the test log, the diskette can contain other data.
Diagnostic error codes
The following table describes the error codes that the diagnostic programs might generate and suggested actions to correct the detected problems.
If the diagnostic programs generate error codes that are not listed in the table, make sure that the latest levels of BIOS, Remote Supervisor Adapter II SlimLine, and ServeRAID code are installed.
In the error codes, x can be any numeral or letter. However, if the three-digit number in the central position of the code is 000, 195, or 197, do not replace a CRU or FRU. These numbers appearing in the central position of the code have the following meanings:
000 The server passed the test. Do not replace a CRU or FRU.
195 The Esc key was pressed to end the test. Do not replace a CRU or FRU.
197 This is a warning, but it does not indicate a hardware failure. Take the action indicated in the “Action” column, but do not replace a CRU or FRU. See the description for Warning in “Diagnostic text messages” on page 71 for more information.
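The rule for the central position of an error code can be checked mechanically. The following Python sketch splits a three-part code and reports whether its central number is one of the three values (000, 195, 197) for which no CRU or FRU should be replaced. The function names and the pattern are assumptions made for this illustration; the diagnostic programs do not provide them.

```python
import re

# Central-position values for which a CRU or FRU must not be replaced.
NO_REPLACE_CODES = {
    "000": "The server passed the test.",
    "195": "The Esc key was pressed to end the test.",
    "197": "Warning; does not indicate a hardware failure.",
}

# Three groups of three characters; x in the tables can be any numeral or letter.
_CODE_PATTERN = re.compile(r"^([0-9A-Za-z]{3})-([0-9A-Za-z]{3})-([0-9A-Za-z]{3})$")


def split_error_code(code):
    """Split a code such as '001-250-001' into its three parts."""
    match = _CODE_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a three-part diagnostic error code: {code!r}")
    return match.groups()


def may_call_for_replacement(code):
    """Return False when the central number is 000, 195, or 197."""
    _, central, _ = split_error_code(code)
    return central not in NO_REPLACE_CODES
```

A result of True means only that the code is not excluded by this rule; whether a component is actually replaced still depends on the actions listed for that code.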
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8863, 7362,” on page 107 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only)”, that step must be performed only by a
trained service technician.
Error code Description Action
001-198-000 Test aborted.
1. Check the system-error log and the BMC log for messages indicating the cause of the error, and take the indicated action.
2. From the diagnostic programs, run Quick Memory Test All Banks; then, if an error is detected, take the indicated action. Make sure you re-enable the memory in the Configuration/Setup Utility program. See “Memory problems” on page 52.
3. Reinstall and, if necessary, update the BIOS code on the server; then, rerun the test (see “Updating the firmware” on page 147).
001-250-00x Test failed, where
v x of 0 = ECC logic on I/O board
v x of 1 = ECC logic on memory card
1. Check the system-error log and the BMC log for messages indicating the cause of the error, and take the indicated action.
2. From the diagnostic programs, run Quick Memory Test All Banks; then, if an error is detected, take the indicated action.
3. From the diagnostic programs, rerun the ECC test; then, if an error is detected, take the indicated action.
4. Reseat the following components:
a. Memory card
b. I/O board
5. Replace the components listed in step 4 one at a time, in the order shown, restarting the server each time. Make sure you re-enable the memory in the Configuration/Setup Utility program. See “Memory problems” on page 52.
001-292-000 Core system: failed/CMOS checksum failed. Load the BIOS default settings using the Configuration/Setup Utility program and run the test again (see “Configuration/Setup Utility menu choices” on page 149).
001-xxx-000 Failed core tests.
1. Reseat the I/O board.
2. Replace the I/O board.
001-xxx-001 Failed core tests.
1. Reseat the I/O board.
2. Replace the I/O board.
005-xxx-000 Failed video test.
1. Reseat the I/O board.
2. Replace the I/O board.
011-xxx-000 Failed COM1 serial port test.
1. Reseat the I/O board.
2. Replace the I/O board.
015-xxx-001 Failed USB test.
1. Reseat the I/O board.
2. Replace the I/O board.
015-xxx-015 Failed USB external loopback test.
1. Reseat the I/O board.
2. Replace the I/O board.
015-xxx-198 Remote Supervisor Adapter II SlimLine
installed or USB device connected during USB test.
1. If a Remote Supervisor Adapter II SlimLine is installed as an option, remove it and run the test again.
2. Remove all USB devices and run the test again.
3. Reseat the I/O board.
4. Replace the I/O board.
020-xxx-000 Failed PCI Interface test.
1. Reseat the following components:
a. (Trained service technician only) PCI-X switch card assembly
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
020-xxx-001 Failed hot-swap slot 1 PCI latch test.
1. (Trained service technician only) Reseat the PCI-X switch card assembly.
2. (Trained service technician only) Replace the PCI-X switch card assembly.
020-xxx-002 Failed hot-swap slot 2 PCI latch test.
1. (Trained service technician only) Reseat the PCI-X switch card assembly.
2. (Trained service technician only) Replace the PCI-X switch card assembly.
020-xxx-003 Failed hot-swap slot 3 PCI latch test.
1. (Trained service technician only) Reseat the PCI-X switch card assembly.
2. (Trained service technician only) Replace the PCI-X switch card assembly.
020-xxx-004 Failed hot-swap slot 4 PCI latch test.
1. (Trained service technician only) Reseat the PCI-X switch card assembly.
2. (Trained service technician only) Replace the PCI-X switch card assembly.
020-xxx-005 Failed hot-swap slot 5 PCI latch test.
1. (Trained service technician only) Reseat the PCI-X switch card assembly.
2. (Trained service technician only) Replace the PCI-X switch card assembly.
020-xxx-006 Failed hot-swap slot 6 PCI latch test.
1. (Trained service technician only) Reseat the PCI-X switch card assembly.
2. (Trained service technician only) Replace the PCI-X switch card assembly.
030-265-001 Communication Error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-266-001 Eight SAS/SATA Channel Error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-267-001 Central Management Seq error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-268-001 Link m Cntrl 0 Sequencer error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-269-001 Link m Cntrl 1 Sequencer error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-270-001 On Chip Memory access error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-271-001 SRAM access error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-272-001 NVRAM access error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-273-001 FLASH access error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-274-001 Base Addr Register Key error.
1. Update the microcode for the Serial Attached SCSI (SAS) controller (see “Updating the firmware” on page 147).
2. Update the BIOS code (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
030-xxx-00n Failed SCSI test on PCI slot n, where n represents the slot number of the failing adapter.
1. Check the BMC log or system-error log before replacing a CRU or FRU (see “Error logs” on page 18).
2. Reseat and, if necessary, replace the adapter in slot n.
035-002-0nn ServeRAID interface timeout.
1. The ServeRAID controller might not be configured correctly. Obtain the basic and extended configuration status bytes and see the ServeRAID Hardware Maintenance Manual for more information.
2. Reseat the following components:
a. SAS hard disk drive backplane cables
b. ServeRAID controller
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
035-253-0nn ServeRAID controller 0nn initialization failure; 0nn = the controller number.
1. The ServeRAID controller might not be configured correctly. See the ServeRAID Hardware Maintenance Manual for more information.
2. Reseat the following components:
a. SAS hard disk drive backplane cables
b. ServeRAID controller
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the server each time.
035-253-s99 RAID adapter initialization failure.
1. Reseat the following components:
a. ServeRAID adapter
b. SAS hard disk drive backplane cable
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
035-254-0nn Setup error; unable to allocate memory to
run test.
Check the system resources and make more memory available (see “Configuration/Setup Utility menu choices” on page 149); then, run the test again.
035-255-0nn Internal error.
1. Reseat the SAS hard disk drive backplane cable.
2. Replace the SAS hard disk drive backplane.
035-260-0nn System to controller interface failure.
1. Reseat the following components:
a. ServeRAID adapter
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
035-265-0nn Adapter Communication error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-266-0nn Adapter CPU test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-267-0nn Adapter Local RAM test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-268-0nn Adapter NVSRAM test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-269-0nn Adapter Cache test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-271-0nn Adapter XOR engine test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-272-0nn Adapter Drive test error. Replace the attached drive.
035-273-0nn Adapter Drive error. Replace the attached drive.
035-274-0nn Adapter Parameters set error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-275-001 Adapter Communication error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-276-001 Adapter CPU test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-277-001 Adapter Local RAM test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-278-001 Adapter NVSRAM test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-279-001 Adapter Cache test error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-280-001 Adapter Drive test error. Replace the attached drive.
035-281-001 Adapter Drive error. Replace the attached drive.
035-282-001 Adapter Parameters set error.
1. Update the RAID controller firmware (see “Updating the firmware” on page 147).
2. Reseat and, if necessary, replace the RAID controller.
035-283-001 Adapter Battery error. Replace the battery module on the RAID controller.
035-xxx-cnn c = ServeRAID channel number, nn = SCSI ID of failing fixed disk drive.
1. Check the BMC log or system-error log before replacing a FRU.
2. Reseat and, if necessary, replace the hard disk drive on channel c, SCSI ID nn.
035-xxx-snn nn = SCSI ID of failing fixed disk.
1. Check the BMC log or system-error log before replacing a FRU.
2. Reseat and, if necessary, replace the SCSI disk with ID nn on the adapter in slot s.
075-xxx-000 Failed power supply test.
1. Reseat the following components:
a. Power supply
b. (Trained service technician only) Power backplane
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
089-xxx-0nn Failed microprocessor test, where nn = APIC ID.
APIC ID (physical mode) Microprocessor
00, 01, 02, 03 1
04, 05, 06, 07 2
10, 11, 12, 13 3
14, 15, 16, 17 4
APIC ID (logical mode) Microprocessor
00, 01, 02, 03 1
24, 25, 26, 27 2
10, 11, 12, 13 3
34, 35, 36, 37 4
1. Reseat the following components:
a. (Trained service technician only) Microprocessor nn
b. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time:
a. (Trained service technician only) Microprocessor nn
b. (Trained service technician only) Microprocessor tray
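The APIC ID tables for error code 089-xxx-0nn can be expressed as a lookup that maps the last two characters of the code to a microprocessor number. The following Python sketch is a convenience illustration only. The function name is hypothetical, the IDs are compared as two-character strings exactly as printed (no number base is assumed), and you must know whether the server reports APIC IDs in physical or logical mode.

```python
# APIC IDs as printed in the tables, grouped by microprocessor number.
PHYSICAL_MODE = {
    1: ("00", "01", "02", "03"),
    2: ("04", "05", "06", "07"),
    3: ("10", "11", "12", "13"),
    4: ("14", "15", "16", "17"),
}
LOGICAL_MODE = {
    1: ("00", "01", "02", "03"),
    2: ("24", "25", "26", "27"),
    3: ("10", "11", "12", "13"),
    4: ("34", "35", "36", "37"),
}


def microprocessor_for(code, logical_mode=False):
    """Map an 089-xxx-0nn code to a microprocessor number, or None if nn is not listed."""
    code = code.strip()
    if not code.startswith("089-"):
        raise ValueError(f"not a microprocessor test error code: {code!r}")
    apic_id = code[-2:]  # nn, the last two characters of the code
    table = LOGICAL_MODE if logical_mode else PHYSICAL_MODE
    for microprocessor, apic_ids in table.items():
        if apic_id in apic_ids:
            return microprocessor
    return None
```

A return value of None means that the APIC ID is not in the table for the selected mode, which usually suggests that the other mode applies.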
155-xxx-xxx Failed Active Memory™ latch test.
Note: Make sure you re-enable the memory in the Configuration/Setup Utility program. See “Memory problems” on page 52.
1. Reseat the memory card.
2. Replace the memory card.
166-051-000 System Management: Failed. Unable to
communicate with ASM. It may be busy. Run the test again.
1. Update the firmware (BIOS, service processor, and diagnostics; see “Updating the firmware” on page 147).
2. Run the diagnostic test again.
3. Correct other error conditions (including failed systems-management tests and items that are logged in the Remote Supervisor Adapter II SlimLine system-error log) and retry.
4. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
5. Reseat the Remote Supervisor Adapter II SlimLine.
6. Replace the Remote Supervisor Adapter II SlimLine.
166-060-000 System Management: Failed. Unable to
communicate with ASM. It may be busy. Run the test again.
1. Update the firmware (BIOS, service processor, and diagnostics; see “Updating the firmware” on page 147).
2. Run the diagnostic test again.
3. Correct other error conditions (including failed systems-management tests and items that are logged in the Remote Supervisor Adapter II SlimLine system-error log) and retry.
4. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
5. Reseat the Remote Supervisor Adapter II SlimLine.
6. Replace the Remote Supervisor Adapter II SlimLine.
166-070-000 System Management: Failed. Unable to
communicate with ASM. It may be busy. Run the test again.
1. Update the firmware (BIOS, service processor, and diagnostics; see “Updating the firmware” on page 147).
2. Run the diagnostic test again.
3. Correct other error conditions (including failed systems-management tests and items that are logged in the Remote Supervisor Adapter II SlimLine system-error log) and retry.
4. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
5. Reseat the Remote Supervisor Adapter II SlimLine.
6. Replace the Remote Supervisor Adapter II SlimLine.
166-198-000 BIOS cannot detect ASM. Reseat ASM
adapter in correct slot; ASM restart failure. Unplug and cold boot server to reset ASM.
1. Run the diagnostic test again.
2. Correct other error conditions (including other failed systems-management tests and items that are logged in the Remote Supervisor Adapter II SlimLine system-error log) and retry.
3. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
4. Reseat the following components:
a. Remote Supervisor Adapter II SlimLine
b. I/O board
5. Replace the components listed in step 4 one at a time, in the order shown, restarting the server each time.
166-201-000 ISMP indicates I2C errors on bus X. Reseat and, if necessary, replace the I/O board.
166-201-001 ISMP indicates I2C errors on bus P.
1. Reseat the following components:
a. (Trained service technician only) Power backplane
b. I/O board
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Power backplane
b. I/O board
c. (Trained service technician only) Microprocessor tray
166-201-002 ISMP indicates I2C errors on bus I. Reseat and, if necessary, replace the I/O board.
166-201-003 ISMP indicates I2C errors on bus C.
1. Reseat the following components:
a. Microprocessor tray
b. I/O board
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor tray
b. I/O board
166-201-004 ISMP indicates I2C errors on bus M.
1. Reseat the following components:
a. I/O board
b. Memory card
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. I/O board
b. Memory card
c. (Trained service technician only) Microprocessor tray
166-201-005 ISMP indicates I2C errors on bus S.
1. Reseat the following components:
a. SAS hard disk drive backplane cables
b. I/O board
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. SAS hard disk drive backplane
b. I/O board
166-201-006 ISMP indicates I2C errors on bus O.
1. Reseat the following components:
a. (Trained service technician only) Operator information panel
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
166-201-007 ISMP indicates I2C errors on bus M0.
1. Reseat the following components:
a. Memory card
b. I/O board
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Memory card
b. I/O board
c. (Trained service technician only) Microprocessor tray
166-201-008 ISMP indicates I2C errors on bus M1.
1. Reseat the following components:
a. Memory card
b. I/O board
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Memory card
b. I/O board
c. (Trained service technician only) Microprocessor tray
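The 166-201-xxx entries above all follow one pattern: the last field of the code identifies the I2C bus, and each bus has its own ordered list of components to reseat. As a quick-reference aid, that mapping can be sketched as a lookup table. This is a transcription of the entries above and not an IBM utility; the names `I2C_BUS_ERRORS` and `reseat_order` are hypothetical.

```python
# Error code -> (I2C bus name, components to reseat, in the listed order).
# Transcribed from the 166-201-xxx table entries; always confirm against
# the table itself, including any "(Trained service technician only)" notes.
I2C_BUS_ERRORS = {
    "166-201-000": ("X", ["I/O board"]),
    "166-201-001": ("P", ["Power backplane", "I/O board", "Microprocessor tray"]),
    "166-201-002": ("I", ["I/O board"]),
    "166-201-003": ("C", ["Microprocessor tray", "I/O board"]),
    "166-201-004": ("M", ["I/O board", "Memory card", "Microprocessor tray"]),
    "166-201-005": ("S", ["SAS hard disk drive backplane cables", "I/O board"]),
    "166-201-006": ("O", ["Operator information panel", "I/O board"]),
    "166-201-007": ("M0", ["Memory card", "I/O board", "Microprocessor tray"]),
    "166-201-008": ("M1", ["Memory card", "I/O board", "Microprocessor tray"]),
}


def reseat_order(error_code):
    """Return (bus, components) for a 166-201-xxx code.

    Raises KeyError for codes that are not in the table above.
    """
    bus, components = I2C_BUS_ERRORS[error_code]
    return bus, list(components)  # copy so callers cannot alter the table
```

Note that the replacement lists in the table sometimes differ from the reseat lists (for example, bus S replaces the SAS hard disk drive backplane, not its cables), so this sketch covers the reseat step only.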
166-260-000 ASM restart failure.
1. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
2. Reseat the Remote Supervisor Adapter II SlimLine.
3. Replace the Remote Supervisor Adapter II SlimLine.
166-342-000 System management BIST indicates failed tests.
1. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
2. Reseat the Remote Supervisor Adapter II SlimLine.
3. Replace the Remote Supervisor Adapter II SlimLine.
166-400-000 ISMP Self Test Result failed tests: xxx where xxx=flash, ROM, or RAM.
1. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
2. Update the BMC firmware (see “Updating the firmware” on page 147).
3. Reseat the I/O board.
4. Replace the I/O board.
166-400-100 DMC Self Test Result failed tests: xxx where xxx=flash, ROM, or RAM.
1. Disconnect all server and option power cords from the server, wait 30 seconds, reconnect the power cords, and retry.
2. Update the BIOS code, BMC, service processor, and diagnostics firmware (see “Updating the firmware” on page 147).
180-197-000 SCSI ASPI driver not installed.
1. Remove the RAID adapter, if one is installed, and run the test again.
2. Reseat the following components:
a. SAS hard disk drive backplane cables
b. I/O board
c. Microprocessor tray
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. SAS hard disk drive backplane
b. I/O board
c. (Trained service technician only) Microprocessor tray
180-361-003 Failed fan LED test.
1. Reseat the following components:
a. Fan
b. I/O board
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
180-xxx-000 Diagnostics LED failure. Run the diagnostic LED test for the failing LED.
180-xxx-001 Failed front LED panel test.
1. Reseat the following components:
a. (Trained service technician only) Operator information panel
b. I/O board
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Operator information panel
b. I/O board
c. (Trained service technician only) Microprocessor tray
180-xxx-002 Failed diagnostics LED panel test.
1. Reseat the following components:
a. (Trained service technician only) Operator information panel
b. I/O board
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Operator information panel
b. I/O board
c. (Trained service technician only) Microprocessor tray
180-xxx-005 Failed SCSI backplane LED test.
1. Reseat the following components:
a. SAS hard disk drive backplane cable
b. I/O board
c. Microprocessor tray
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. SAS hard disk drive backplane cable
b. SAS hard disk drive backplane
c. I/O board
d. (Trained service technician only) Microprocessor tray
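Several codes in this table, such as 180-xxx-001, contain a literal "xxx" field. When looking up a code reported by the diagnostic programs, it can help to treat that field as a placeholder. The following Python sketch assumes that "xxx" stands for any three digits; the table does not state this explicitly, so treat it as an assumption. The function names `pattern_to_regex` and `matches` are hypothetical.

```python
import re


def pattern_to_regex(table_code):
    """Turn a table code such as '180-xxx-001' into a compiled regex.

    Assumption: a field written as 'xxx' matches any three digits.
    All other fields must match exactly.
    """
    parts = table_code.split("-")
    regex_parts = [r"\d{3}" if part == "xxx" else re.escape(part) for part in parts]
    return re.compile("-".join(regex_parts))


def matches(table_code, reported_code):
    """Return True if reported_code fits the table_code pattern."""
    return pattern_to_regex(table_code).fullmatch(reported_code) is not None
```

For example, under this assumption `matches("180-xxx-005", "180-123-005")` is true, while `matches("180-xxx-005", "180-123-001")` is false.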