IBM LS20, AMD Opteron LS20 Type 8850 Service Manual

AMD Opteron LS20 Type 8850 for IBM BladeCenter
Problem Determination and Service Guide
Note: Before using this information and the product it supports, read the general information in Appendix B, “Notices,” on page 93.
Fifth Edition (October 2010)
© Copyright IBM Corporation 2005.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Safety
Guidelines for trained service technicians
Inspecting for unsafe conditions
Guidelines for servicing electrical equipment
Safety statements
Chapter 1. Introduction
Related documentation
Notices and statements in this document
LS20 Type 8850 specifications for non-NEBS/ETSI environments
Blade server control panel buttons and LEDs
Turning on the blade server
Turning off the blade server
System-board layouts
System-board connectors
System-board jumpers
System-board LEDs
Chapter 2. Diagnostics
Diagnostic tools
POST
POST beep codes
Error logs
POST error codes
Checkout procedure
About the checkout procedure
Performing the checkout procedure
Troubleshooting tables
CD or DVD drive problems
Diskette drive problems
General problems
Hard disk drive problems
Intermittent problems
Keyboard, mouse, or pointing-device problems
Memory problems
Microprocessor problems
Monitor or video problems
Network connection problems
Optional-device problems
Power error messages
Power problems
ServerGuide problems
Service processor problems
Software problems
Universal Serial Bus (USB) port problems
Light path diagnostics
Viewing the light path diagnostics LEDs
Light path diagnostics LEDs
Diagnostic programs, messages, and error codes
Running the diagnostic programs
Diagnostic text messages
Viewing the test log
Diagnostic error codes
Recovering from a BIOS update failure
Service processor (BMC) error codes
Solving SCSI problems
Solving undetermined problems
Calling IBM for service
Chapter 3. Parts listing, Type 8850
Server replaceable units
Product recovery CDs
Chapter 4. Removing and replacing blade server components
Installation guidelines
System reliability guidelines
Handling static-sensitive devices
Returning a device or component
Removing and installing the blade server in a BladeCenter unit
Operating the blade server cover
Removing and replacing the bezel assembly
Removing and replacing Tier 1 CRUs
SCSI hard disk drive
Memory modules (DIMMs)
I/O expansion card
Battery
Removing and replacing FRUs
Microprocessor
System board assembly
Chapter 5. Configuration information and instructions
Updating the firmware
Configuring the blade server
Using the Configuration/Setup Utility program
Starting the Configuration/Setup Utility program
Configuration/Setup Utility menu choices
Using passwords
Configuring the Gigabit Ethernet controllers
Blade server Ethernet controller enumeration
Configuring a SCSI RAID
Appendix A. Getting help and technical assistance
Before you call
Using the documentation
Getting help and information from the World Wide Web
Software service and support
Hardware service and support
Appendix B. Notices
Edition notice
Trademarks
Important notes
Product recycling and disposal
Battery return program
Index
Safety
Before installing this product, read the Safety Information.
Antes de instalar este produto, leia as Informações de Segurança.
Před instalací tohoto produktu si přečtěte příručku bezpečnostních instrukcí.
Læs sikkerhedsforskrifterne, før du installerer dette produkt.
Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.
Avant d'installer ce produit, lisez les consignes de sécurité.
Vor der Installation dieses Produkts die Sicherheitshinweise lesen.
Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.
Antes de instalar este produto, leia as Informações sobre Segurança.
Antes de instalar este producto, lea la información de seguridad.
Läs säkerhetsinformationen innan du installerar den här produkten.
Guidelines for trained service technicians
This section contains information for trained service technicians.
Inspecting for unsafe conditions
Use the information in this section to help you identify potential unsafe conditions in an IBM product that you are working on. Each IBM product, as it was designed and manufactured, has required safety items to protect users and service technicians from injury. The information in this section addresses only those items. Use good judgment to identify potential unsafe conditions that might be caused by non-IBM alterations or attachment of non-IBM features or options that are not addressed in this section. If you identify an unsafe condition, you must determine how serious the hazard is and whether you must correct the problem before you work on the product.
Consider the following conditions and the safety hazards that they present:
• Electrical hazards, especially primary power. Primary voltage on the frame can cause serious or fatal electrical shock.
• Explosive hazards, such as a damaged CRT face or a bulging capacitor.
• Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cord is disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and observe any sharp edges.
3. Check the power cord:
  • Make sure that the third-wire ground connector is in good condition. Use a meter to measure third-wire ground continuity for 0.1 ohm or less between the external ground pin and the frame ground.
  • Make sure that the power cord is the correct type, as specified in the documentation for your BladeCenter unit type.
  • Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety of any non-IBM alterations.
6. Check inside the server for any obvious unsafe conditions, such as metal filings, contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not been removed or tampered with.
Guidelines for servicing electrical equipment
Observe the following guidelines when servicing electrical equipment:
• Check the area for electrical hazards such as moist floors, nongrounded power extension cords, and missing safety grounds.
• Use only approved tools and test equipment. Some hand tools have handles that are covered with a soft material that does not provide insulation from live electrical current.
• Regularly inspect and maintain your electrical hand tools for safe operational condition. Do not use worn or broken tools or testers.
• Do not touch the reflective surface of a dental mirror to a live electrical circuit. The surface is conductive and can cause personal injury or equipment damage if it touches a live electrical circuit.
• Some rubber floor mats contain small conductive fibers to decrease electrostatic discharge. Do not use this type of mat to protect yourself from electrical shock.
• Do not work alone under hazardous conditions or near equipment that has hazardous voltages.
• Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical outlet so that you can turn off the power quickly in the event of an electrical accident.
• Disconnect all power before you perform a mechanical inspection, work near power supplies, or remove or install main units.
• Before you work on the equipment, disconnect the power cord. If you cannot disconnect the power cord, have the customer power off the wall box that supplies power to the equipment and lock the wall box in the off position.
• Never assume that power has been disconnected from a circuit. Check it to make sure that it has been disconnected.
• If you have to work on equipment that has exposed electrical circuits, observe the following precautions:
  – Make sure that another person who is familiar with the power-off controls is near you and is available to turn off the power if necessary.
  – When you are working with powered-on electrical equipment, use only one hand. Keep the other hand in your pocket or behind your back to avoid creating a complete circuit that could cause an electrical shock.
  – When using a tester, set the controls correctly and use the approved probe leads and accessories for that tester.
  – Stand on a suitable rubber mat to insulate you from grounds such as metal floor strips and equipment frames.
• Use extreme care when measuring high voltages.
• To ensure proper grounding of components such as power supplies, pumps, blowers, fans, and motor generators, do not service these components outside of their normal operating locations.
• If an electrical accident occurs, use caution, turn off the power, and send another person to get medical aid.
Safety statements
Important:
Each caution and danger statement in this documentation begins with a number.
This number is used to cross reference an English-language caution or danger
statement with translated versions of the caution or danger statement in the Safety
Information document.
For example, if a caution statement begins with a number 1, translations for that
caution statement appear in the Safety Information document under statement 1.
Be sure to read all caution and danger statements in this documentation before
performing the instructions. Read any additional safety information that comes with
your server or optional device before you install the device.
Statement 1:
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
• Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this product during an electrical storm.
• Connect all power cords to a properly wired and grounded electrical outlet.
• Connect to properly wired outlets any equipment that will be attached to this product.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural damage.
• Disconnect the attached power cords, telecommunications systems, networks, and modems before you open the device covers, unless instructed otherwise in the installation and configuration procedures.
• Connect and disconnect cables as described in the following table when installing, moving, or opening covers on this product or attached devices.
To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlet.
5. Turn device ON.
To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2:
CAUTION:
When replacing the lithium battery, use only IBM Part Number 33F8354 or an
equivalent type battery recommended by the manufacturer. If your system has
a module containing a lithium battery, replace it only with the same module
type made by the same manufacturer. The battery contains lithium and can
explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3:
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or
transmitters) are installed, note the following:
• Do not remove the covers. Removing the covers of the laser product could result in exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of controls or adjustments or performance of procedures other than those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam.
Statement 4:
18 kg (39.7 lb) 32 kg (70.5 lb) 55 kg (121.2 lb)
CAUTION: Use safe practices when lifting.
Statement 5:
CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
Statement 8:
CAUTION:
Never remove the cover on a power supply or any part that has the following
label attached.
Hazardous voltage, current, and energy levels are present inside any
component that has this label attached. There are no serviceable parts inside
these components. If you suspect a problem with one of these parts, contact
a service technician.
Statement 10:
CAUTION:
Do not place any object on top of rack-mounted devices.
Chapter 1. Introduction
This Problem Determination and Service Guide contains information to help you
solve problems that might occur in your AMD Opteron LS20 Type 8850 for IBM
BladeCenter server. It describes the diagnostic tools that come with the server, error
codes and suggested actions, and instructions for replacing failing components.
Replaceable components are of three types:
• Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
• Tier 2 customer replaceable unit: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
• Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
For information about the terms of the warranty and getting service and assistance,
see the Warranty and Support Information document.
Related documentation
In addition to this document, the following documentation also comes with the
server:
• Installation and User’s Guide
  This printed document contains general information about the server, including how to install supported options and how to configure the server.
• Safety Information
  This document is in Portable Document Format (PDF) on the Documentation CD. It contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
• Warranty and Support Information
  This document is in PDF on the Documentation CD. It contains information about the terms of the warranty and about service and assistance.
Depending on the server model, additional documentation might be included on the
Documentation CD.
The blade server might have features that are not described in the documentation
that comes with the server. The documentation might be updated occasionally to
include information about those features, or technical updates might be available to
provide additional information that is not included in the blade server
documentation. The most recent versions of all BladeCenter documentation are at
http://www.ibm.com/support/.
In addition to the documentation in this library, be sure to review the IBM
BladeCenter Planning and Installation Guide for your BladeCenter unit type for
information to help you prepare for system installation and configuration. This
document is available at http://www.ibm.com/pc/eserver/bladecenter/.
Notices and statements in this document
The caution and danger statements that appear in this document are also in the multilingual Safety Information document, which is on the Documentation CD. Each statement is numbered for reference to the corresponding statement in the Safety Information document.
The following notices and statements are used in this document:
• Note: These notices provide important tips, guidance, or advice.
• Important: These notices provide information or advice that might help you avoid inconvenient or problem situations.
• Attention: These notices indicate potential damage to programs, devices, or data. An attention notice is placed just before the instruction or situation in which damage could occur.
• Caution: These statements indicate situations that can be potentially hazardous to you. A caution statement is placed just before the description of a potentially hazardous procedure step or situation.
• Danger: These statements indicate situations that can be potentially lethal or extremely hazardous to you. A danger statement is placed just before the description of a potentially lethal or extremely hazardous procedure step or situation.
LS20 Type 8850 specifications for non-NEBS/ETSI environments
The following table provides a summary of the features and specifications of the
LS20 Type 8850 blade server operating in a non-NEBS/ETSI environment.
Note: Power, cooling, removable-media drives, external ports, and advanced
system management are provided by the BladeCenter unit.
Microprocessor: Supports up to two microprocessors
• AMD Opteron processor
• AMD chipset
Note: Use the Configuration/Setup Utility program to determine the type and speed of the microprocessors in your blade server.
Memory:
• Dual channel (DDR1) with 4 dual inline memory module (DIMM) slots (two for each microprocessor)
• Type: 2-way interleaved, DDR1, PC3200, Very Low Profile (VLP), ECC SDRAM registered x4 (Chipkill) DIMMs only (Chipkill is not supported for 512 MB DIMMs)
• Supports 512 MB, 1 GB, and 2 GB DIMMs (as of the date of this publication)
Drives: Support for two internal small-form-factor SCSI drives
Integrated functions:
• Dual-channel Gigabit Ethernet controller
• Expansion card interface
• Baseboard management controller (BMC) with IPMI firmware
• ATI Radeon 7000M video controller
• LSI 1020 SCSI controller
• Light path diagnostics
• Local service processor (BMC)
• RS-485 interface for communication with the management module
• Automatic server restart (ASR)
• Serial over LAN (SOL)
• Intelligent Platform Management Interface (IPMI)
• 4 USB buses for communication with keyboard, mouse, diskette drive, and CD-ROM drive
Predictive Failure Analysis (PFA) alerts:
• Microprocessor
• Memory
Electrical input: 12 V dc
Environment:
• Air temperature:
  – Blade server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (2998.69 ft)
  – Blade server on: 10° to 32°C (50° to 89.6°F). Altitude: 914 m to 2134 m (2998.69 ft to 7000 ft)
  – Blade server off: -40° to 60°C (-40° to 140°F)
• Humidity:
  – Blade server on: 8% to 80%
  – Blade server off: 5% to 80%
Size:
• Height: 24.5 cm (9.7 inches)
• Depth: 44.6 cm (17.6 inches)
• Width: 2.9 cm (1.14 inches)
• Maximum weight: 5.0 kg (11 lb)
Note: The operating system in the blade server must provide USB support for the
blade server to recognize and use the keyboard, mouse, CD drive, and diskette
drive. The BladeCenter unit uses USB for internal communications with these
devices.
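The powered-on environmental limits in the specifications above (air temperature by altitude band, and humidity) can be checked against site readings. The following Python sketch is illustrative only and is not part of any IBM tool. The function name and parameters are hypothetical, and it assumes that a reading of exactly 914 m falls in the lower altitude band.

```python
def within_operating_limits(temp_c: float, altitude_m: float, humidity_pct: float) -> bool:
    """Check readings against the powered-on limits in the specifications.

    Assumption: exactly 914 m is treated as part of the lower altitude band.
    """
    # The specifications cover altitudes from 0 m to 2134 m only.
    if not 0 <= altitude_m <= 2134:
        return False
    # Maximum air temperature drops from 35 C to 32 C above 914 m.
    max_temp_c = 35.0 if altitude_m <= 914 else 32.0
    temp_ok = 10.0 <= temp_c <= max_temp_c
    # Powered-on humidity range is 8% to 80%.
    humidity_ok = 8.0 <= humidity_pct <= 80.0
    return temp_ok and humidity_ok
```

For example, a 34°C reading passes at 500 m but fails at 1500 m, where the upper limit is 32°C.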
Blade server control panel buttons and LEDs
This section describes the blade server control panel buttons and LEDs.
Note: The control panel door is shown in the closed (normal) position in the following illustration. To access the power-control button, you must open the control panel door.
Activity LED
Location LED
Information LED
Blade-error LED
Power-control button
Power-on LED
Keyboard/video/mouse select button
CD/diskette/USB select button
Keyboard/video/mouse (KVM) select button: Press this button to associate the shared BladeCenter unit keyboard port, video port, and mouse port with the blade server. The LED on this button flashes while the request is being processed, then is lit when the ownership of the keyboard, video, and mouse has been transferred to the blade server. It can take approximately 20 seconds to switch the keyboard, video, and mouse control to the blade server.
You can also press keyboard keys in the following sequence to switch keyboard, mouse, and video control between blade servers:
NumLock NumLock blade_server_number Enter
Where blade_server_number is the two-digit number for the blade bay in which the blade server is installed.
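The key sequence above can be generated for a given blade bay as follows. This Python sketch is illustrative only: the function name is hypothetical, and the accepted range of 1 to 99 is an assumption that follows only from the two-digit format, not from the number of bays in a particular BladeCenter unit.

```python
def kvm_key_sequence(bay: int) -> list[str]:
    """Return the keys to press to switch KVM control to the blade in `bay`.

    Assumption: any bay number that fits in two digits (1-99) is accepted.
    """
    if not 1 <= bay <= 99:
        raise ValueError("bay number must be between 1 and 99")
    # The bay number is always entered as two digits, for example 03 for bay 3.
    digits = f"{bay:02d}"
    return ["NumLock", "NumLock", *digits, "Enter"]
```

For bay 3, the sketch returns NumLock, NumLock, 0, 3, Enter.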
Although the keyboard that is attached to the BladeCenter unit is a PS/2-style keyboard, internal communication with it is through the USB. The operating system in the blade server must provide USB support for the blade server to recognize and use the keyboard and mouse. When you are not running an operating system that has USB device drivers, such as in the following situations, the keyboard responds very slowly:
• Running the blade server integrated diagnostics
• Running a BIOS update diskette on a blade server
• Updating the diagnostics on a blade server
• Running the Broadcom firmware CD for a blade server
If there is no response when you press the keyboard/video/mouse select button, you can use the management-module Web interface to determine whether local control has been disabled on the blade server.
If you install a supported Microsoft Windows operating system on the blade server
while it is not the current owner of the keyboard, mouse, and video, a delay of up to
1 minute occurs the first time you switch the keyboard, mouse, and video to the
blade server. During this one-time-only delay, the blade server device manager
enumerates the keyboard, mouse, and video and loads the device drivers. All
subsequent switching takes place in the normal keyboard/video/mouse switching
time frame (up to 20 seconds).
CD/diskette/USB select button: Press this button to associate the shared
BladeCenter unit removable-media drives and USB ports with the blade server. The
LED on the button flashes while the request is being processed, then is lit when the
ownership of the removable-media drives and USB ports has been transferred to
the blade server. It can take approximately 20 seconds for the operating system in
the blade server to recognize the removable-media drives and USB ports.
The operating system in the blade server must provide USB support for the blade
server to recognize and use the removable-media drives and USB ports. The
BladeCenter unit uses USB for internal communication with these devices. If there
is no response when you press the CD/diskette/USB select button, you can use the
management-module Web interface to determine whether local control has been
disabled on the blade server.
Activity LED: When this green LED is lit, it indicates that there is activity on the
hard disk drive or network.
Location LED: When this blue LED is lit, it has been turned on by the system
administrator to aid in visually locating the blade server. The location LED on the
BladeCenter unit will be lit also. The location LED can be turned off through the
management-module Web interface or through IBM Director Console.
Information LED: When this amber LED is lit, it indicates that information about a
system error for the blade server has been placed in the Management Module
Event Log. The information LED can be turned off through the management-module
Web interface or through IBM Director Console.
Blade-error LED: When this amber LED is lit, it indicates that a system error has
occurred in the blade server. The blade-error LED will turn off only after the error is
corrected.
Power-control button: This button is behind the control panel door. Press this
button to turn on or turn off the blade server.
Note: The power-control button has effect only if local power control is enabled for
the blade server. Local power control is enabled and disabled through the
management-module Web interface.
Power-on LED: This green LED indicates the power status of the blade server in
the following manner:
• Flashing rapidly: The service processor (BMC) on the blade server is handshaking with the management module.
• Flashing slowly: The blade server has power but is not turned on.
• Lit continuously: The blade server has power and is turned on.
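The three power-on LED states listed above map naturally to a lookup table, which can be useful in a service checklist or inventory script. The following Python sketch is illustrative only. The names `PowerLed`, `POWER_LED_MEANING`, and `describe_power_led` are hypothetical, and the sketch assumes that the LED state is entered by a person observing the blade server, not read from hardware.

```python
from enum import Enum


class PowerLed(Enum):
    """Observed states of the green power-on LED."""
    FLASHING_RAPIDLY = "flashing rapidly"
    FLASHING_SLOWLY = "flashing slowly"
    LIT_CONTINUOUSLY = "lit continuously"


# Meanings taken from the power-on LED description in this section.
POWER_LED_MEANING = {
    PowerLed.FLASHING_RAPIDLY: "BMC is handshaking with the management module",
    PowerLed.FLASHING_SLOWLY: "blade server has power but is not turned on",
    PowerLed.LIT_CONTINUOUSLY: "blade server has power and is turned on",
}


def describe_power_led(observed: str) -> str:
    """Translate an observed LED state, typed by the technician, into its meaning."""
    # Enum lookup by value raises ValueError for an unrecognized state.
    state = PowerLed(observed.strip().lower())
    return POWER_LED_MEANING[state]
```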
Turning on the blade server
After you connect the blade server to power through the BladeCenter unit, the blade server can start in any of the following ways:
• You can press the power-control button on the front of the blade server (behind the control panel door, see “Blade server control panel buttons and LEDs” on page 4) to start the blade server.
  Notes:
  1. Wait until the power-on LED on the blade server flashes slowly before pressing the blade server power-control button. Until then, the service processor in the management module is initializing; therefore, the power-control button on the blade server does not respond.
  2. While the blade server is powering up, the power-on LED on the front of the server is lit. See “Blade server control panel buttons and LEDs” on page 4 for the power-on LED states.
• If a power failure occurs, the BladeCenter unit and then the blade server can start automatically when power is restored (if the blade server is configured through the management module to do so).
• You can turn on the blade server remotely by means of the service processor in the management module.
• If the operating system supports the Wake on LAN feature and the blade server power-on LED is flashing slowly, the Wake on LAN feature can turn on the blade server, if the Wake on LAN feature has not been disabled through the management module.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the BladeCenter unit. The blade server can respond to requests from the service processor, such as a remote request to turn on the blade server. To remove all power from the blade server, you must remove it from the BladeCenter unit.
Shut down the operating system before you turn off the blade server. See the operating-system documentation for information about shutting down the operating system.
The blade server can be turned off in any of the following ways:
• You can press the power-control button on the blade server (behind the control panel door, see “Blade server control panel buttons and LEDs” on page 4). This also starts an orderly shutdown of the operating system, if this feature is supported by the operating system.
  Note: After turning off the blade server, wait at least 5 seconds before you press the power-control button to turn on the blade server again.
• If the operating system stops functioning, you can press and hold the power-control button for more than 4 seconds to turn off the blade server.
• The management module can turn off the blade server.
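The two timing rules for the power-control button described above (holding for more than 4 seconds, and waiting at least 5 seconds before turning the blade server on again) can be summarized in a short Python sketch. It is illustrative only: the constant and function names are hypothetical, and the sketch models the documented behavior, not the firmware.

```python
FORCE_OFF_HOLD_SECONDS = 4.0  # holding longer than this turns the blade server off
MIN_OFF_WAIT_SECONDS = 5.0    # minimum wait before turning the blade server on again


def classify_press(hold_seconds: float) -> str:
    """Classify a press of the power-control button on a running blade server."""
    if hold_seconds > FORCE_OFF_HOLD_SECONDS:
        # Used when the operating system has stopped functioning.
        return "forced power-off"
    # A normal press also starts an orderly OS shutdown if the OS supports it.
    return "normal power-off"


def may_power_on_again(seconds_since_off: float) -> bool:
    """True when the recommended wait after turning off has elapsed."""
    return seconds_since_off >= MIN_OFF_WAIT_SECONDS
```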
System-board layouts
The following illustrations show the connectors, LEDs, switches, and jumpers on the system board. The illustrations in this document might differ slightly from your hardware.
System-board connectors
The following illustration shows the connectors on the system board.
I/O expansion option connector (J10)
SCSI connector 1 (J12)
SCSI connector 0 (J11)
I/O expansion option connector (J13)
Battery (BH1)
DIMM 1 (J31)
DIMM 2 (J30)
Microprocessor 1 and heat sink
Microprocessor socket 2 and heat sink filler
Control panel connector
DIMM 3 (J4)
DIMM 4 (J2)
System-board jumpers
The following illustration shows the jumpers on the system board.
Note: This server does not have a power-on password override jumper. See “Using passwords” on page 87 for information about bypassing a power-on password.
BIOS backup page (J16): Normal operation; Start from backup BIOS image
WOL bypass (J15): Normal operation; Disable WOL
System-board LEDs
The following illustration shows the LEDs on the system board. You have to remove the blade server from the BladeCenter unit, open the cover, and press the light path diagnostics switch to light any error LEDs that were turned on during processing.
SCSI 1 error LED (CR53) (future use)
SCSI 0 error LED (CR51) (future use)
BMC error LED (CR47) (future use)
Light path diagnostics panel
Microprocessor 1 error LED (CR54)
Microprocessor 2 error LED (CR50)
DIMM 1 error LED (CR52)
DIMM 2 error LED (CR48)
DIMM 3 error LED (CR46)
DIMM 4 error LED (CR49)
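The reference designators listed above can be kept in a lookup table so that a lit LED can be translated to the component it identifies. The following Python sketch is illustrative only. The names `SYSTEM_BOARD_LEDS`, `FUTURE_USE`, and `component_for_led` are hypothetical; the designators and component names are copied from the list above.

```python
# Reference designator -> component, copied from the system-board LED list.
SYSTEM_BOARD_LEDS = {
    "CR53": "SCSI 1",
    "CR51": "SCSI 0",
    "CR47": "BMC",
    "CR54": "Microprocessor 1",
    "CR50": "Microprocessor 2",
    "CR52": "DIMM 1",
    "CR48": "DIMM 2",
    "CR46": "DIMM 3",
    "CR49": "DIMM 4",
}

# LEDs that the list marks as reserved for future use.
FUTURE_USE = {"CR53", "CR51", "CR47"}


def component_for_led(designator: str) -> str:
    """Return a description of the error LED with the given designator."""
    key = designator.strip().upper()
    if key not in SYSTEM_BOARD_LEDS:
        raise KeyError(f"unknown LED designator: {designator!r}")
    description = f"{SYSTEM_BOARD_LEDS[key]} error LED"
    if key in FUTURE_USE:
        description += " (future use)"
    return description
```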
The following illustration shows the light path diagnostics panel on the system board.
NMI: NMI error LED
MIS: Microprocessor mismatch error LED
S BRD: System-board error LED
TEMP: Over temperature error LED
Light path diagnostics LED
Light path diagnostics switch
Chapter 2. Diagnostics
This chapter describes the diagnostic tools that are available to help you solve problems that might occur in the server.
If you cannot locate and correct the problem using the information in this chapter, see Appendix A, “Getting help and technical assistance,” on page 91 for more information.
Diagnostic tools
The following tools are available to help you diagnose and solve hardware-related problems:
• POST beep codes, error messages, and error logs
The power-on self-test (POST) generates beep codes and messages to indicate successful test completion or the detection of a problem. See “POST” for more information.
• Troubleshooting tables
These tables list problem symptoms and actions to correct the problems. See “Troubleshooting tables” on page 27 for more information.
• Light path diagnostics
Use the light path diagnostics to diagnose system errors quickly. See “Light path diagnostics” on page 40 for more information.
• Diagnostic programs, messages, and error codes
The diagnostic programs are the primary method of testing the major components of the blade server. These programs are stored in read-only memory (ROM) on the blade server. See “Diagnostic programs, messages, and error codes” on page 43 for more information.
POST
When you turn on the blade server, it performs a series of tests to check the operation of server components and some optional devices in the server. This series of tests is called the power-on self-test, or POST.
If a power-on password is set, you must type the password and press Enter, when prompted, for POST to run.
If POST is completed without detecting any problems, a single beep sounds, and the server startup is completed.
If POST detects a problem, more than one beep might sound, or an error message is displayed. See “Beep code descriptions” on page 10 and “POST error codes” on page 19 for more information.
POST beep codes
A beep code is a combination of short or long beeps or a series of short beeps that are separated by pauses. For example, a “1-2-3” beep code is one short beep, a pause, two short beeps, a pause, and three short beeps. A beep code other than one beep indicates that POST has detected a problem. To determine the meaning of a beep code, see “Beep code descriptions” on page 10. If no beep code sounds, see “No-beep symptoms” on page 16.
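The beep code notation can be illustrated with a short Python sketch. It is not part of the blade server firmware or of any IBM tool; the function names are hypothetical, and representing the single success beep as the string "1" is an assumption made only for this illustration.

```python
def describe_beep_code(code: str) -> str:
    """Expand a hyphen-separated beep code such as "1-2-3" into words."""
    words = {1: "one", 2: "two", 3: "three", 4: "four"}
    phrases = []
    for part in code.split("-"):
        count = int(part)
        noun = "beep" if count == 1 else "beeps"
        phrases.append(f"{words.get(count, str(count))} short {noun}")
    # Groups of beeps are separated by pauses.
    return ", a pause, ".join(phrases)


def indicates_problem(code: str) -> bool:
    """A beep code other than one beep means POST detected a problem.

    Assumption: the single success beep is written as "1".
    """
    return code.strip() != "1"


print(describe_beep_code("1-2-3"))
```

For the “1-2-3” example above, the sketch prints the same sequence that the text describes: one short beep, a pause, two short beeps, a pause, three short beeps.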
Beep code descriptions
The following table describes the beep codes and suggested actions to correct the detected problems.
A single problem might cause more than one error message. When this occurs, correct the cause of the first error message. The other error messages usually will not occur the next time POST runs.
Exception: If there are multiple error codes or light path diagnostics LEDs that indicate a microprocessor error, the error might be in a microprocessor or in a microprocessor socket. See “Microprocessor problems” on page 32 for information about diagnosing microprocessor problems.
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.
Beep code Description Action
1-1-2 Microprocessor register test failed.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 2
b. (Trained service technician only) Microprocessor 1
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 2
b. (Trained service technician only) Microprocessor 1
c. (Trained service technician only) System board assembly
1-1-3 CMOS write/read test failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
1-1-4 BIOS ROM checksum failed.
1. Flash the BIOS.
2. Reseat the DIMMs.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
1-2-1 Programmable interval timer failed. (Trained service technician only) Replace the system board assembly.
1-2-2 DMA initialization failed. (Trained service technician only) Replace the system board assembly.
1-2-3 DMA page register write/read failed. (Trained service technician only) Replace the system board assembly.
1-2-4 RAM refresh verification failed.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
1-3-1 First 64K RAM test failed.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
1-3-2 First 64K RAM parity test failed.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
2-1-1 Secondary DMA register failed. (Trained service technician only) Replace the system board assembly.
2-1-2 Primary DMA register failed. (Trained service technician only) Replace the system board assembly.
2-1-3 Primary interrupt mask register failed. (Trained service technician only) Replace the system board assembly.
2-1-4 Secondary interrupt mask register failed. (Trained service technician only) Replace the system board assembly.
2-2-1 Interrupt vector loading failed. (Trained service technician only) Replace the system board assembly.
2-2-2 Keyboard controller failed.
1. Reseat the keyboard.
2. Check the management module functionality (refer to the documentation for your BladeCenter unit type).
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Keyboard
b. (Trained service technician only) System board assembly
2-2-3 CMOS power failure and checksum checks failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
2-2-4 CMOS configuration information validation failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
2-3-1 Screen initialization failed. (Trained service technician only) Replace the system board assembly.
2-3-2 Screen memory failed. (Trained service technician only) Replace the system board assembly.
2-3-3 Screen retrace failed. (Trained service technician only) Replace the system board assembly.
2-3-4 Search for video ROM failed. (Trained service technician only) Replace the system board assembly.
2-4-1 Video failed; screen believed operable. (Trained service technician only) Replace the system board assembly.
2-4-4 Unsupported memory configuration.
1. If POST error code 289 is displayed, follow the actions for that code (see “POST error codes” on page 19 for more information on the 289 error).
2. Check the DIMM error LEDs.
3. Check the management module for DIMM errors (refer to the documentation for your BladeCenter unit type).
3-1-1 Timer tick interrupt failed. (Trained service technician only) Replace the system board assembly.
3-1-2 Interval timer channel 2 failed. (Trained service technician only) Replace the system board assembly.
3-1-3 RAM test failed above address 0FFFFh.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
3-1-4 Time-of-day clock failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
3-2-1 Serial port failed. (Trained service technician only) Replace the system board assembly.
3-2-3 Math coprocessor test failed.
1. (Trained service technician only) Reseat the microprocessor.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor
b. (Trained service technician only) System board assembly
3-2-4 Failure comparing CMOS memory size against actual.
1. Reseat the following components:
a. DIMMs
b. Battery
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
c. Battery
3-3-1 Memory size mismatch occurred.
1. Verify that both DIMMs in the bank are of the same size, speed, type, and technology.
2. Reseat the following components:
a. DIMMs
b. Battery
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
c. Battery
3-3-2 Critical SMBUS error occurred.
1. Power down the blade server and reseat it in the BladeCenter unit.
2. Reseat the DIMMs.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
3-3-3 No operational memory in system.
Important: In some memory configurations, the 3-3-3 beep code might sound during POST followed by a blank display screen. If this occurs and the Boot Fail Count feature in the Start Options of the Configuration/Setup Utility program is set to Enabled (its default setting), you must restart the blade server three times to force the system BIOS to reset the CMOS values to the default configuration (memory connector or bank of connectors enabled).
1. Install or reseat DIMMs and restart the server.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
Two short beeps Information only, configuration has changed. Run the Configuration/Setup Utility program.
Three short beeps Memory error.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
One continuous beep Microprocessor error.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 1
b. (Trained service technician only) Microprocessor 2
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 1
b. (Trained service technician only) Microprocessor 2
c. (Trained service technician only) System board assembly
Repeating short beeps Keyboard error.
1. Reseat the keyboard.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Keyboard
b. (Trained service technician only) System board assembly
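For readers who keep their own service notes in script form, the beep code table can be treated as a lookup from code to description and first suggested action. The following Python sketch is illustrative only: it covers a small subset of the rows, and the names BEEP_CODES, first_action, and needs_trained_technician are hypothetical, not part of any IBM utility.

```python
from typing import Optional

TECH_ONLY = "(Trained service technician only)"

# Subset of the beep code table: code -> (description, first suggested action).
BEEP_CODES = {
    "1-1-3": ("CMOS write/read test failed.", "Reseat the battery."),
    "1-1-4": ("BIOS ROM checksum failed.", "Flash the BIOS."),
    "1-2-1": ("Programmable interval timer failed.",
              TECH_ONLY + " Replace the system board assembly."),
    "1-2-4": ("RAM refresh verification failed.", "Reseat the DIMMs."),
    "2-2-2": ("Keyboard controller failed.", "Reseat the keyboard."),
    "3-1-4": ("Time-of-day clock failed.", "Reseat the battery."),
    "3-3-2": ("Critical SMBUS error occurred.",
              "Power down the blade server and reseat it in the BladeCenter unit."),
}


def first_action(code: str) -> Optional[str]:
    """Return the first suggested action, or None if the code is not in this subset."""
    entry = BEEP_CODES.get(code.strip())
    return entry[1] if entry else None


def needs_trained_technician(action: str) -> bool:
    """True when the action step is marked for trained service technicians only."""
    return action.startswith(TECH_ONLY)
```

Returning only the first action reflects the rule that the suggested actions are followed in the order listed until the problem is solved; later steps still come from the table itself.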
No-beep symptoms
The following table describes situations in which no beep code sounds when POST is completed.
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.
No-beep symptom Description Action
No beep and the system operates correctly. (Trained service technician only) Replace the system board assembly.
No beep and no video (System error LED is OFF). See “Solving undetermined problems” on page 50.
No beep and no video (System Attention LED is ON). See “Light path diagnostics” on page 40.
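The no-beep table can be read as a small decision rule, sketched below in Python. This is an illustration, not an IBM tool. The function name and parameters are hypothetical, and the single led_on flag assumes that “System error LED” and “System Attention LED” in the table refer to the same indicator.

```python
TECH_ONLY = "(Trained service technician only)"


def no_beep_action(operates_correctly: bool, video: bool, led_on: bool) -> str:
    """Map a no-beep symptom to the action listed in the no-beep table.

    Assumption: led_on stands for the system error/attention LED.
    """
    if operates_correctly:
        return TECH_ONLY + " Replace the system board assembly."
    if not video and not led_on:
        return "See 'Solving undetermined problems' on page 50."
    if not video and led_on:
        return "See 'Light path diagnostics' on page 40."
    # Any other combination is not described by the table.
    return "Not covered by the no-beep table."
```

Combinations that the table does not list, such as no beep with working video on a server that is not operating correctly, fall through to the last return rather than being guessed at.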