IBM LS20, AMD Opteron LS20 Type 8850 Service Manual

AMD Opteron LS20 Type 8850 for IBM BladeCenter
Problem Determination and Service Guide
Note: Before using this information and the product it supports, read the general information in Appendix B, “Notices,” on page 93.
Fifth Edition (October 2010)
© Copyright IBM Corporation 2005.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Safety ............................v
Guidelines for trained service technicians ...............vi
Inspecting for unsafe conditions ..................vi
Guidelines for servicing electrical equipment .............vi
Safety statements ........................vii
Chapter 1. Introduction ......................1
Related documentation ......................1
Notices and statements in this document ................2
LS20 Type 8850 specifications for non-NEBS/ETSI environments .......3
Blade server control panel buttons and LEDs ..............4
Turning on the blade server.....................6
Turning off the blade server.....................6
System-board layouts .......................7
System-board connectors ....................7
System-board jumpers .....................7
System-board LEDs ......................8
Chapter 2. Diagnostics ......................9
Diagnostic tools .........................9
POST.............................9
POST beep codes .......................9
Error logs ..........................17
POST error codes .......................19
Checkout procedure .......................26
About the checkout procedure ..................26
Performing the checkout procedure ................27
Troubleshooting tables ......................27
CD or DVD drive problems ...................28
Diskette drive problems .....................29
General problems .......................29
Hard disk drive problems ....................30
Intermittent problems......................30
Keyboard, mouse, or pointing-device problems ............31
Memory problems .......................32
Microprocessor problems ....................32
Monitor or video problems ....................33
Network connection problems ..................34
Optional-device problems ....................35
Power error messages .....................35
Power problems .......................37
ServerGuide problems .....................38
Service processor problems ...................39
Software problems ......................39
Universal Serial Bus (USB) port problems ..............39
Light path diagnostics ......................40
Viewing the light path diagnostics LEDs...............40
Light path diagnostics LEDs ...................42
Diagnostic programs, messages, and error codes ............43
Running the diagnostic programs .................43
Diagnostic text messages ....................44
Viewing the test log ......................45
Diagnostic error codes .....................45
Recovering from a BIOS update failure ................49
Service processor (BMC) error codes ................50
Solving SCSI problems ......................50
Solving undetermined problems ...................50
Calling IBM for service ......................51
Chapter 3. Parts listing, Type 8850 .................53
Server replaceable units .....................54
Product recovery CDs .....................55
Chapter 4. Removing and replacing blade server components ......59
Installation guidelines ......................59
System reliability guidelines ...................60
Handling static-sensitive devices .................60
Returning a device or component .................60
Removing and installing the blade server in a BladeCenter unit .......61
Operating the blade server cover ..................64
Removing and replacing the bezel assembly ..............66
Removing and replacing Tier 1 CRUs ................67
SCSI hard disk drive ......................67
Memory modules (DIMMs) ...................69
I/O expansion card ......................72
Battery ...........................76
Removing and replacing FRUs ...................78
Microprocessor ........................78
System board assembly ....................84
Chapter 5. Configuration information and instructions .........85
Updating the firmware ......................85
Configuring the blade server ....................86
Using the Configuration/Setup Utility program .............86
Starting the Configuration/Setup Utility program ............87
Configuration/Setup Utility menu choices ..............87
Using passwords .......................87
Configuring the Gigabit Ethernet controllers ..............88
Blade server Ethernet controller enumeration ..............89
Configuring a SCSI RAID .....................89
Appendix A. Getting help and technical assistance ..........91
Before you call .........................91
Using the documentation .....................91
Getting help and information from the World Wide Web ..........92
Software service and support ...................92
Hardware service and support ...................92
Appendix B. Notices ......................93
Edition notice .........................93
Trademarks ..........................94
Important notes.........................95
Product recycling and disposal ...................95
Battery return program ......................96
Index ............................97
Safety
Before installing this product, read the Safety Information.
Antes de instalar este produto, leia as Informações de Segurança.
Před instalací tohoto produktu si přečtěte příručku bezpečnostních instrukcí.
Læs sikkerhedsforskrifterne, før du installerer dette produkt.
Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.
Avant d'installer ce produit, lisez les consignes de sécurité.
Vor der Installation dieses Produkts die Sicherheitshinweise lesen.
Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.
Antes de instalar este produto, leia as Informações sobre Segurança.
Antes de instalar este producto, lea la información de seguridad.
Läs säkerhetsinformationen innan du installerar den här produkten.
Guidelines for trained service technicians
This section contains information for trained service technicians.
Inspecting for unsafe conditions
Use the information in this section to help you identify potential unsafe conditions in an IBM product that you are working on. Each IBM product, as it was designed and manufactured, has required safety items to protect users and service technicians from injury. The information in this section addresses only those items. Use good judgment to identify potential unsafe conditions that might be caused by non-IBM alterations or attachment of non-IBM features or options that are not addressed in this section. If you identify an unsafe condition, you must determine how serious the hazard is and whether you must correct the problem before you work on the product.
Consider the following conditions and the safety hazards that they present:
v Electrical hazards, especially primary power. Primary voltage on the frame can
cause serious or fatal electrical shock.
v Explosive hazards, such as a damaged CRT face or a bulging capacitor.
v Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cord is disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and
observe any sharp edges.
3. Check the power cord:
v Make sure that the third-wire ground connector is in good condition. Use a
meter to measure third-wire ground continuity for 0.1 ohm or less between the external ground pin and the frame ground.
v Make sure that the power cord is the correct type, as specified in the
documentation for your BladeCenter unit type.
v Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety
of any non-IBM alterations.
6. Check inside the server for any obvious unsafe conditions, such as metal filings,
contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not
been removed or tampered with.
Guidelines for servicing electrical equipment
Observe the following guidelines when servicing electrical equipment:
v Check the area for electrical hazards such as moist floors, nongrounded power
extension cords, and missing safety grounds.
v Use only approved tools and test equipment. Some hand tools have handles that
are covered with a soft material that does not provide insulation from live
electrical current.
v Regularly inspect and maintain your electrical hand tools for safe operational
condition. Do not use worn or broken tools or testers.
v Do not touch the reflective surface of a dental mirror to a live electrical circuit.
The surface is conductive and can cause personal injury or equipment damage if it touches a live electrical circuit.
v Some rubber floor mats contain small conductive fibers to decrease electrostatic
discharge. Do not use this type of mat to protect yourself from electrical shock.
v Do not work alone under hazardous conditions or near equipment that has
hazardous voltages.
v Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical
outlet so that you can turn off the power quickly in the event of an electrical accident.
v Disconnect all power before you perform a mechanical inspection, work near
power supplies, or remove or install main units.
v Before you work on the equipment, disconnect the power cord. If you cannot
disconnect the power cord, have the customer power off the wall box that supplies power to the equipment and lock the wall box in the off position.
v Never assume that power has been disconnected from a circuit. Check it to
make sure that it has been disconnected.
v If you have to work on equipment that has exposed electrical circuits, observe
the following precautions:
– Make sure that another person who is familiar with the power-off controls is
near you and is available to turn off the power if necessary.
– When you are working with powered-on electrical equipment, use only one
hand. Keep the other hand in your pocket or behind your back to avoid creating a complete circuit that could cause an electrical shock.
– When using a tester, set the controls correctly and use the approved probe
leads and accessories for that tester.
– Stand on a suitable rubber mat to insulate you from grounds such as metal
floor strips and equipment frames.
v Use extreme care when measuring high voltages.
v To ensure proper grounding of components such as power supplies, pumps,
blowers, fans, and motor generators, do not service these components outside of their normal operating locations.
v If an electrical accident occurs, use caution, turn off the power, and send another
person to get medical aid.
Safety statements
Important:
Each caution and danger statement in this documentation begins with a number.
This number is used to cross reference an English-language caution or danger
statement with translated versions of the caution or danger statement in the Safety
Information document.
For example, if a caution statement begins with a number 1, translations for that
caution statement appear in the Safety Information document under statement 1.
Be sure to read all caution and danger statements in this documentation before
performing the instructions. Read any additional safety information that comes with
your server or optional device before you install the device.
Statement 1:
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
v Do not connect or disconnect any cables or perform installation,
maintenance, or reconfiguration of this product during an electrical storm.
v Connect all power cords to a properly wired and grounded electrical
outlet.
v Connect to properly wired outlets any equipment that will be attached to
this product.
v When possible, use one hand only to connect or disconnect signal
cables.
v Never turn on any equipment when there is evidence of fire, water, or
structural damage.
v Disconnect the attached power cords, telecommunications systems,
networks, and modems before you open the device covers, unless instructed otherwise in the installation and configuration procedures.
v Connect and disconnect cables as described in the following table when
installing, moving, or opening covers on this product or attached devices.
To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlet.
5. Turn device ON.
To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2:
CAUTION:
When replacing the lithium battery, use only IBM Part Number 33F8354 or an
equivalent type battery recommended by the manufacturer. If your system has
a module containing a lithium battery, replace it only with the same module
type made by the same manufacturer. The battery contains lithium and can
explode if not properly used, handled, or disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3:
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or
transmitters) are installed, note the following:
v Do not remove the covers. Removing the covers of the laser product could
result in exposure to hazardous laser radiation. There are no serviceable parts inside the device.
v Use of controls or adjustments or performance of procedures other than
those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam.
Statement 4:
18 kg (39.7 lb) 32 kg (70.5 lb) 55 kg (121.2 lb)
CAUTION: Use safe practices when lifting.
Statement 5:
CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
Statement 8:
CAUTION:
Never remove the cover on a power supply or any part that has the following
label attached.
Hazardous voltage, current, and energy levels are present inside any
component that has this label attached. There are no serviceable parts inside
these components. If you suspect a problem with one of these parts, contact
a service technician.
Statement 10:
CAUTION:
Do not place any object on top of rack-mounted devices.
Chapter 1. Introduction
This Problem Determination and Service Guide contains information to help you
solve problems that might occur in your AMD Opteron LS20 Type 8850 for IBM
BladeCenter server. It describes the diagnostic tools that come with the server, error
codes and suggested actions, and instructions for replacing failing components.
Replaceable components are of three types:
v Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your
responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
v Tier 2 customer replaceable unit: You may install a Tier 2 CRU yourself or
request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service
technicians.
For information about the terms of the warranty and getting service and assistance,
see the Warranty and Support Information document.
Related documentation
In addition to this document, the following documentation also comes with the
server:
v Installation and User’s Guide
This printed document contains general information about the server, including how to install supported options and how to configure the server.
v Safety Information
This document is in Portable Document Format (PDF) on the Documentation CD. It contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
v Warranty and Support Information
This document is in PDF on the Documentation CD. It contains information about the terms of the warranty and about service and assistance.
Depending on the server model, additional documentation might be included on the
Documentation CD.
The blade server might have features that are not described in the documentation
that comes with the server. The documentation might be updated occasionally to
include information about those features, or technical updates might be available to
provide additional information that is not included in the blade server
documentation. The most recent versions of all BladeCenter documentation are at
http://www.ibm.com/support/.
In addition to the documentation in this library, be sure to review the IBM
BladeCenter Planning and Installation Guide for your BladeCenter unit type for
information to help you prepare for system installation and configuration. This
document is available at http://www.ibm.com/pc/eserver/bladecenter/.
Notices and statements in this document
The caution and danger statements that appear in this document are also in the multilingual Safety Information document, which is on the Documentation CD. Each statement is numbered for reference to the corresponding statement in the Safety Information document.
The following notices and statements are used in this document:
v Note: These notices provide important tips, guidance, or advice.
v Important: These notices provide information or advice that might help you avoid
inconvenient or problem situations.
v Attention: These notices indicate potential damage to programs, devices, or
data. An attention notice is placed just before the instruction or situation in which
damage could occur.
v Caution: These statements indicate situations that can be potentially hazardous
to you. A caution statement is placed just before the description of a potentially
hazardous procedure step or situation.
v Danger: These statements indicate situations that can be potentially lethal or
extremely hazardous to you. A danger statement is placed just before the
description of a potentially lethal or extremely hazardous procedure step or
situation.
LS20 Type 8850 specifications for non-NEBS/ETSI environments
The following table provides a summary of the features and specifications of the
LS20 Type 8850 blade server operating in a non-NEBS/ETSI environment.
Note: Power, cooling, removable-media drives, external ports, and advanced
system management are provided by the BladeCenter unit.
Microprocessor:
Supports up to two microprocessors
v AMD Opteron processor
v AMD chipset
Note: Use the Configuration/Setup Utility program to determine the type and speed of the microprocessors in your blade server.
Memory:
v Dual channel (DDR1) with 4 dual
inline memory module (DIMM) slots (two for each microprocessor)
v Type: 2-way interleaved, DDR1,
PC3200, Very Low Profile (VLP), ECC SDRAM registered x4 (Chipkill) DIMMs only (Chipkill is not supported for 512 MB DIMMs)
v Supports 512 MB, 1 GB, and 2 GB
DIMMs (as of the date of this publication)
Drives: Support for two internal small-form-factor SCSI drives
Integrated functions:
v Dual-channel Gigabit Ethernet
controller
v Expansion card interface
v Baseboard management controller
(BMC) with IPMI firmware
v ATI Radeon 7000M video
controller
v LSI 1020 SCSI controller
v Light path diagnostics
v Local service processor (BMC)
v RS-485 interface for
communication with the management module
v Automatic server restart (ASR)
v Serial over LAN (SOL)
v Intelligent Platform Management
Interface (IPMI)
v 4 USB buses for communication
with keyboard, mouse, diskette drive, and CD-ROM drive
Predictive Failure Analysis (PFA) alerts:
v Microprocessor
v Memory
Electrical Input: 12Vdc
Environment:
v Air temperature:
– Blade server on: 10° to 35°C (50°
to 95°F). Altitude: 0 to 914 m (2998.69 ft)
– Blade server on: 10° to 32°C (50°
to 89.6°F). Altitude: 914 m to 2134 m (2998.69 ft to 7000 ft)
– Blade server off: -40° to 60°C
(-40° to 140°F)
v Humidity:
– Blade server on: 8% to 80%
– Blade server off: 5% to 80%
Size:
v Height: 24.5 cm (9.7 inches)
v Depth: 44.6 cm (17.6 inches)
v Width: 2.9 cm (1.14 inches)
v Maximum weight: 5.0 kg (11 lb)
Note: The operating system in the blade server must provide USB support for the
blade server to recognize and use the keyboard, mouse, CD drive, and diskette
drive. The BladeCenter unit uses USB for internal communications with these
devices.
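The environment limits in the specifications above can be expressed as a small check. The following Python sketch is illustrative only: the function names are hypothetical, and treating an altitude of exactly 914 m as belonging to the lower band is an assumption, because the specification lists 914 m in both bands.

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return temp_c * 9 / 5 + 32


def max_operating_temp_c(altitude_m: float) -> float:
    """Return the upper air-temperature limit (in Celsius) for a powered-on blade server."""
    if 0 <= altitude_m <= 914:
        return 35.0  # 0 to 914 m band (914 m assigned here by assumption)
    if 914 < altitude_m <= 2134:
        return 32.0  # 914 m to 2134 m band
    raise ValueError("altitude is outside the range covered by the specifications")


def within_operating_envelope(temp_c: float, altitude_m: float, humidity_pct: float) -> bool:
    """Check powered-on temperature (10 C lower limit) and humidity (8% to 80%)."""
    temp_ok = 10.0 <= temp_c <= max_operating_temp_c(altitude_m)
    humidity_ok = 8.0 <= humidity_pct <= 80.0
    return temp_ok and humidity_ok
```

For example, within_operating_envelope(34, 1500, 50) is false, because the limit at 1500 m is 32°C.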
Blade server control panel buttons and LEDs
This section describes the blade server control panel buttons and LEDs.
Note: The control panel door is shown in the closed (normal) position in the following illustration. To access the power-control button, you must open the control panel door.
Activity LED
Location LED
Information LED
Blade-error LED
Power-control button
Power-on LED
Keyboard/video/mouse select button
CD/diskette/USB select button
Keyboard/video/mouse (KVM) select button: Press this button to associate the shared BladeCenter unit keyboard port, video port, and mouse port with the blade server. The LED on this button flashes while the request is being processed, then is lit when the ownership of the keyboard, video, and mouse has been transferred to the blade server. It can take approximately 20 seconds to switch the keyboard, video, and mouse control to the blade server.
You can also press keyboard keys in the following sequence to switch keyboard, mouse, and video control between blade servers:
NumLock NumLock blade_server_number Enter
Where blade_server_number is the two-digit number for the blade bay in which the blade server is installed.
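As an illustration of the key sequence described above, the following Python sketch builds the list of keys to press for a given blade bay. The function name is hypothetical, and the accepted range of 1 to 99 is an assumption that only reflects the two-digit format; the number of bays that actually exist depends on the BladeCenter unit type.

```python
def kvm_switch_keys(blade_bay: int) -> list[str]:
    """Return the keys to press, in order, to switch KVM control to a blade bay."""
    if not 1 <= blade_bay <= 99:
        raise ValueError("blade bay must fit in two digits (1-99)")
    digits = f"{blade_bay:02d}"  # zero-pad to the two-digit form, for example 3 -> "03"
    return ["NumLock", "NumLock", *digits, "Enter"]
```

For blade bay 3, the sketch returns NumLock, NumLock, 0, 3, Enter.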
Although the keyboard that is attached to the BladeCenter unit is a PS/2-style keyboard, internal communication with it is through the USB. The operating system in the blade server must provide USB support for the blade server to recognize and use the keyboard and mouse. When you are not running an operating system that has USB device drivers, such as in the following situations, the keyboard responds very slowly:
v Running the blade server integrated diagnostics
v Running a BIOS update diskette on a blade server
v Updating the diagnostics on a blade server
v Running the Broadcom firmware CD for a blade server
If there is no response when you press the keyboard/video/mouse select button, you can use the management-module Web interface to determine whether local control has been disabled on the blade server.
If you install a supported Microsoft Windows operating system on the blade server
while it is not the current owner of the keyboard, mouse, and video, a delay of up to
1 minute occurs the first time you switch the keyboard, mouse, and video to the
blade server. During this one-time-only delay, the blade server device manager
enumerates the keyboard, mouse, and video and loads the device drivers. All
subsequent switching takes place in the normal keyboard/video/mouse switching
time frame (up to 20 seconds).
CD/diskette/USB select button: Press this button to associate the shared
BladeCenter unit removable-media drives and USB ports with the blade server. The
LED on the button flashes while the request is being processed, then is lit when the
ownership of the removable-media drives and USB ports has been transferred to
the blade server. It can take approximately 20 seconds for the operating system in
the blade server to recognize the removable-media drives and USB ports.
The operating system in the blade server must provide USB support for the blade
server to recognize and use the removable-media drives and USB ports. The
BladeCenter unit uses USB for internal communication with these devices. If there
is no response when you press the CD/diskette/USB select button, you can use the
management-module Web interface to determine whether local control has been
disabled on the blade server.
Activity LED: When this green LED is lit, it indicates that there is activity on the
hard disk drive or network.
Location LED: When this blue LED is lit, it has been turned on by the system
administrator to aid in visually locating the blade server. The location LED on the
BladeCenter unit will be lit also. The location LED can be turned off through the
management-module Web interface or through IBM Director Console.
Information LED: When this amber LED is lit, it indicates that information about a
system error for the blade server has been placed in the Management Module
Event Log. The information LED can be turned off through the management-module
Web interface or through IBM Director Console.
Blade-error LED: When this amber LED is lit, it indicates that a system error has
occurred in the blade server. The blade-error LED will turn off only after the error is
corrected.
Power-control button: This button is behind the control panel door. Press this
button to turn on or turn off the blade server.
Note: The power-control button has effect only if local power control is enabled for
the blade server. Local power control is enabled and disabled through the
management-module Web interface.
Power-on LED: This green LED indicates the power status of the blade server in
the following manner:
v Flashing rapidly: The service processor (BMC) on the blade server is
handshaking with the management module.
v Flashing slowly: The blade server has power but is not turned on.
v Lit continuously: The blade server has power and is turned on.
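The power-on LED states listed above can be summarized as a lookup. The following Python sketch is illustrative only; the names PowerLed, describe_power_led, and ready_to_power_on are hypothetical. The readiness check combines two statements from this guide: the power-control button should be pressed to start the blade server only when the LED is flashing slowly, and the button has effect only if local power control is enabled.

```python
from enum import Enum


class PowerLed(Enum):
    FLASHING_RAPIDLY = "flashing rapidly"
    FLASHING_SLOWLY = "flashing slowly"
    LIT_CONTINUOUSLY = "lit continuously"


# Meaning of each power-on LED state, as described in this section.
LED_MEANING = {
    PowerLed.FLASHING_RAPIDLY: "service processor (BMC) is handshaking with the management module",
    PowerLed.FLASHING_SLOWLY: "blade server has power but is not turned on",
    PowerLed.LIT_CONTINUOUSLY: "blade server has power and is turned on",
}


def describe_power_led(state: PowerLed) -> str:
    """Return the documented meaning of a power-on LED state."""
    return LED_MEANING[state]


def ready_to_power_on(state: PowerLed, local_power_control_enabled: bool) -> bool:
    """True when pressing the power-control button is expected to start the blade server."""
    return state is PowerLed.FLASHING_SLOWLY and local_power_control_enabled
```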
Turning on the blade server
After you connect the blade server to power through the BladeCenter unit, the blade server can start in any of the following ways:
v You can press the power-control button on the front of the blade server (behind
the control panel door, see “Blade server control panel buttons and LEDs” on
page 4) to start the blade server.
Notes:
1. Wait until the power-on LED on the blade server flashes slowly before pressing the blade server power-control button. During this time, the service processor in the management module is initializing; therefore, the power-control button on the blade server does not respond.
2. While the blade server is powering up, the power-on LED on the front of the server is lit. See “Blade server control panel buttons and LEDs” on page 4 for the power-on LED states.
v If a power failure occurs, the BladeCenter unit and then the blade server can
start automatically when power is restored (if the blade server is configured through the management module to do so).
v You can turn on the blade server remotely by means of the service processor in
the management module.
v If the operating system supports the Wake on LAN feature and the blade server
power-on LED is flashing slowly, the Wake on LAN feature can turn on the blade server, if the Wake on LAN feature has not been disabled through the management module.
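The Wake on LAN condition in the last item above combines three requirements. A minimal Python sketch, with a hypothetical function name and parameter names, shows how they fit together:

```python
def wake_on_lan_can_start(
    os_supports_wol: bool,
    power_led_flashing_slowly: bool,
    wol_disabled_in_management_module: bool,
) -> bool:
    """True when Wake on LAN can turn on the blade server, per the conditions above."""
    return (
        os_supports_wol
        and power_led_flashing_slowly
        and not wol_disabled_in_management_module
    )
```

All three conditions must hold; for example, the feature cannot start the blade server if it has been disabled through the management module, even when the operating system supports it.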
Turning off the blade server
When you turn off the blade server, it is still connected to power through the BladeCenter unit. The blade server can respond to requests from the service processor, such as a remote request to turn on the blade server. To remove all power from the blade server, you must remove it from the BladeCenter unit.
Shut down the operating system before you turn off the blade server. See the operating-system documentation for information about shutting down the operating system.
The blade server can be turned off in any of the following ways:
v You can press the power-control button on the blade server (behind the control
panel door, see “Blade server control panel buttons and LEDs” on page 4). This also starts an orderly shutdown of the operating system, if this feature is supported by the operating system.
Note: After turning off the blade server, wait at least 5 seconds before you press the power-control button to turn on the blade server again.
v If the operating system stops functioning, you can press and hold the
power-control button for more than 4 seconds to turn off the blade server.
v The management module can turn off the blade server.
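The timing rules above (a hold of more than 4 seconds to force the blade server off, and a wait of at least 5 seconds before turning it on again) can be sketched as follows. The function names, constant names, and return strings are hypothetical and serve only to illustrate the thresholds stated in this section.

```python
FORCED_OFF_HOLD_SECONDS = 4.0    # hold longer than this to force the blade server off
MIN_RESTART_DELAY_SECONDS = 5.0  # wait at least this long before turning it on again


def classify_press(hold_seconds: float) -> str:
    """Classify a press of the power-control button on a running blade server."""
    if hold_seconds <= 0:
        raise ValueError("hold time must be positive")
    if hold_seconds > FORCED_OFF_HOLD_SECONDS:
        return "forced power-off"
    # A shorter press requests an orderly shutdown, if the operating system supports it.
    return "normal power-off request"


def may_power_on_again(seconds_since_power_off: float) -> bool:
    """True once the recommended delay after turning off has elapsed."""
    return seconds_since_power_off >= MIN_RESTART_DELAY_SECONDS
```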
System-board layouts
The following illustrations show the connectors, LEDs, switches, and jumpers on the system board. The illustrations in this document might differ slightly from your hardware.
System-board connectors
The following illustration shows the connectors on the system board.
I/O expansion option connector (J10)
SCSI connector 1 (J12)
SCSI connector 0 (J11)
I/O expansion option connector (J13)
Battery (BH1)
DIMM 1 (J31)
DIMM 2 (J30)
Microprocessor 1 and heat sink
Microprocessor socket 2 and heat sink filler
Control panel connector
DIMM 3 (J4)
DIMM 4 (J2)
System-board jumpers
The following illustration shows the jumpers on the system board.
Note: This server does not have a power-on password override jumper. See “Using passwords” on page 87 for information about bypassing a power-on password.
BIOS backup page jumper (J16):
v Normal operation
v Start from backup BIOS image
WOL bypass jumper (J15):
v Normal operation
v Disable WOL
System-board LEDs
The following illustration shows the LEDs on the system board. You have to remove the blade server from the BladeCenter unit, open the cover, and press the light path diagnostics switch to light any error LEDs that were turned on during processing.
SCSI 1 error LED (CR53) (future use)
SCSI 0 error LED (CR51) (future use)
BMC error LED (CR47) (future use)
Light path diagnostics panel
Microprocessor 1 error LED (CR54)
Microprocessor 2 error LED (CR50)
DIMM 1 error LED (CR52)
DIMM 2 error LED (CR48)
DIMM 3 error LED (CR46)
DIMM 4 error LED (CR49)
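When you read a reference designator from the system board, a lookup such as the following Python sketch can translate it to the LED name listed above. The dictionary contents come from this list; the names SYSTEM_BOARD_LEDS and led_name are hypothetical.

```python
# Reference designators and LED names, as listed for the system board.
SYSTEM_BOARD_LEDS = {
    "CR53": "SCSI 1 error LED (future use)",
    "CR51": "SCSI 0 error LED (future use)",
    "CR47": "BMC error LED (future use)",
    "CR54": "Microprocessor 1 error LED",
    "CR50": "Microprocessor 2 error LED",
    "CR52": "DIMM 1 error LED",
    "CR48": "DIMM 2 error LED",
    "CR46": "DIMM 3 error LED",
    "CR49": "DIMM 4 error LED",
}


def led_name(designator: str) -> str:
    """Return the LED name for a reference designator such as 'CR54'."""
    key = designator.strip().upper()  # accept lowercase input and stray spaces
    if key not in SYSTEM_BOARD_LEDS:
        raise ValueError(f"unknown LED designator: {designator!r}")
    return SYSTEM_BOARD_LEDS[key]
```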
The following illustration shows the light path diagnostics panel on the system board.
v NMI: NMI error LED
v MIS: Microprocessor mismatch error LED
v S BRD: System-board error LED
v TEMP: Over temperature error LED
v Light path diagnostics LED
v Light path diagnostics switch
Chapter 2. Diagnostics
This chapter describes the diagnostic tools that are available to help you solve problems that might occur in the server.
If you cannot locate and correct the problem using the information in this chapter, see Appendix A, “Getting help and technical assistance,” on page 91 for more information.
Diagnostic tools
The following tools are available to help you diagnose and solve hardware-related problems:
• POST beep codes, error messages, and error logs
  The power-on self-test (POST) generates beep codes and messages to indicate successful test completion or the detection of a problem. See “POST” for more information.
• Troubleshooting tables
  These tables list problem symptoms and actions to correct the problems. See “Troubleshooting tables” on page 27 for more information.
• Light path diagnostics
  Use the light path diagnostics to diagnose system errors quickly. See “Light path diagnostics” on page 40 for more information.
• Diagnostic programs, messages, and error codes
  The diagnostic programs are the primary method of testing the major components of the blade server. These programs are stored in read-only memory (ROM) on the blade server. See “Diagnostic programs, messages, and error codes” on page 43 for more information.
POST
When you turn on the blade server, it performs a series of tests to check the operation of server components and some optional devices in the server. This series of tests is called the power-on self-test, or POST.
If a power-on password is set, you must type the password and press Enter, when prompted, for POST to run.
If POST is completed without detecting any problems, a single beep sounds, and the server startup is completed.
If POST detects a problem, more than one beep might sound, or an error message is displayed. See “Beep code descriptions” on page 10 and “POST error codes” on page 19 for more information.
POST beep codes
A beep code is a combination of short or long beeps or a series of short beeps that are separated by pauses. For example, a “1-2-3” beep code is one short beep, a pause, two short beeps, a pause, and three short beeps. A beep code other than one beep indicates that POST has detected a problem. To determine the meaning of a beep code, see “Beep code descriptions” on page 10. If no beep code sounds, see “No-beep symptoms” on page 16.
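The notation described above can be illustrated with a short Python sketch. It is not part of any IBM tool; the function names `describe_beep_code` and `is_normal_completion` are hypothetical, and the sketch assumes that every group in a hyphenated code consists of short beeps, as in the “1-2-3” example.

```python
def parse_beep_code(code: str) -> list[int]:
    """Split a hyphenated beep code such as "1-2-3" into beep counts."""
    groups = [int(part) for part in code.strip().split("-")]
    if any(count < 1 for count in groups):
        raise ValueError(f"not a valid beep code: {code!r}")
    return groups


def describe_beep_code(code: str) -> str:
    """Spell out a beep code as groups of short beeps separated by pauses."""
    words = {1: "one", 2: "two", 3: "three", 4: "four"}
    phrases = []
    for count in parse_beep_code(code):
        noun = "beep" if count == 1 else "beeps"
        phrases.append(f"{words.get(count, str(count))} short {noun}")
    return ", a pause, ".join(phrases)


def is_normal_completion(code: str) -> bool:
    """A single beep means that POST completed without detecting a problem."""
    return parse_beep_code(code) == [1]


print(describe_beep_code("1-2-3"))
```

Any code for which `is_normal_completion` returns False should be looked up in “Beep code descriptions.”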
Beep code descriptions
The following table describes the beep codes and suggested actions to correct the detected problems.
A single problem might cause more than one error message. When this occurs, correct the cause of the first error message. The other error messages usually will not occur the next time POST runs.
Exception: If there are multiple error codes or light path diagnostics LEDs that indicate a microprocessor error, the error might be in a microprocessor or in a microprocessor socket. See “Microprocessor problems” on page 32 for information about diagnosing microprocessor problems.
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.
Beep code Description Action
1-1-2 Microprocessor register test failed.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 2
b. (Trained service technician only) Microprocessor 1
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 2
b. (Trained service technician only) Microprocessor 1
c. (Trained service technician only) System board assembly
1-1-3 CMOS write/read test failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
1-1-4 BIOS ROM checksum failed.
1. Flash the BIOS.
2. Reseat the DIMMs.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
1-2-1 Programmable interval timer failed.
(Trained service technician only) Replace the system board assembly.
1-2-2 DMA initialization failed.
(Trained service technician only) Replace the system board assembly.
1-2-3 DMA page register write/read failed.
(Trained service technician only) Replace the system board assembly.
1-2-4 RAM refresh verification failed.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System
board assembly
1-3-1 First 64K RAM test failed.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System
board assembly
1-3-2 First 64K RAM parity test failed.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System
board assembly
2-1-1 Secondary DMA register failed.
(Trained service technician only) Replace the system board assembly.
2-1-2 Primary DMA register failed.
(Trained service technician only) Replace the system board assembly.
2-1-3 Primary interrupt mask register failed.
(Trained service technician only) Replace the system board assembly.
2-1-4 Secondary interrupt mask register failed.
(Trained service technician only) Replace the system board assembly.
2-2-1 Interrupt vector loading failed.
(Trained service technician only) Replace the system board assembly.
2-2-2 Keyboard controller failed.
1. Reseat the keyboard.
2. Check the management module functionality (refer to the documentation for your BladeCenter unit type).
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Keyboard
b. (Trained service technician only) System board assembly
2-2-3 CMOS power failure and checksum checks failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
2-2-4 CMOS configuration information validation failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
2-3-1 Screen initialization failed.
(Trained service technician only) Replace the system board assembly.
2-3-2 Screen memory failed.
(Trained service technician only) Replace the system board assembly.
2-3-3 Screen retrace failed.
(Trained service technician only) Replace the system board assembly.
2-3-4 Search for video ROM failed.
(Trained service technician only) Replace the system board assembly.
2-4-1 Video failed; screen believed operable.
(Trained service technician only) Replace the system board assembly.
2-4-4 Unsupported memory configuration.
1. Correct based on the actions for 289 POST Error Code if displayed (see “POST error codes” on page 19 for more information on the 289 error).
2. Check the DIMM error LEDs.
3. Check the management module for DIMM errors (refer to the documentation for your BladeCenter unit type).
3-1-1 Timer tick interrupt failed.
(Trained service technician only) Replace the system board assembly.
3-1-2 Interval timer channel 2 failed.
(Trained service technician only) Replace the system board assembly.
3-1-3 RAM test failed above address 0FFFFh.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System
board assembly
3-1-4 Time-of-day clock failed.
1. Reseat the battery.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System
board assembly
3-2-1 Serial port failed.
(Trained service technician only) Replace the system board assembly.
3-2-3 Math coprocessor test failed.
1. (Trained service technician only) Reseat the microprocessor.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor
b. (Trained service technician only) System board assembly
3-2-4 Failure comparing CMOS memory size against actual.
1. Reseat the following components:
a. DIMMs
b. Battery
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System
board assembly
c. Battery
3-3-1 Memory size mismatch occurred.
1. Verify that both DIMMs in the bank are of the same size, speed, type and technology.
2. Reseat the following components:
a. DIMMs
b. Battery
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
c. Battery
3-3-2 Critical SMBUS error occurred.
1. Power down the blade server and reseat it in the BladeCenter unit.
2. Reseat the DIMMs.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
3-3-3 No operational memory in system.
Important: In some memory configurations, the 3-3-3 beep code might sound during POST followed by a blank display screen. If this occurs and the Boot Fail Count feature in the Start Options of the Configuration/Setup Utility program is set to Enabled (its default setting), you must restart the blade server three times to force the system BIOS to reset the CMOS values to the default configuration (memory connector or bank of connectors enabled).
1. Install or reseat DIMMs and restart the server.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
Two short beeps Information only, configuration has changed.
Run the Configuration/Setup Utility program.
Three short beeps Memory error.
1. Reseat the DIMMs.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System
board assembly
One continuous beep Microprocessor error.
1. Reseat the following components:
a. (Trained service technician only)
Microprocessor 1
b. (Trained service technician only)
Microprocessor 2
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only)
Microprocessor 1
b. (Trained service technician only)
Microprocessor 2
c. (Trained service technician only) System
board assembly
Repeating short beeps Keyboard error.
1. Reseat the keyboard.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Keyboard
b. (Trained service technician only) System
board assembly
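When beep codes are recorded during service, a small lookup can return the description from the table above. The following Python sketch is illustrative only: the dictionary holds an excerpt of the table rather than all of it, and the names `BEEP_CODES` and `lookup_beep_code` are hypothetical.

```python
# Excerpt of the beep code table; descriptions are copied from this guide.
BEEP_CODES = {
    "1-1-2": "Microprocessor register test failed.",
    "1-1-3": "CMOS write/read test failed.",
    "1-1-4": "BIOS ROM checksum failed.",
    "1-2-4": "RAM refresh verification failed.",
    "2-2-2": "Keyboard controller failed.",
    "2-4-4": "Unsupported memory configuration.",
    "3-1-4": "Time-of-day clock failed.",
    "3-3-2": "Critical SMBUS error occurred.",
    "3-3-3": "No operational memory in system.",
}


def lookup_beep_code(code: str):
    """Return the description for a beep code, or None if it is not listed."""
    return BEEP_CODES.get(code.strip())


print(lookup_beep_code("3-3-3"))
```

A code that returns None is not in the excerpt and must be checked against the complete table.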
No-beep symptoms
The following table describes situations in which no beep code sounds when POST is completed.
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.
No-beep symptom Description Action
No beep and the system operates correctly
(Trained service technician only) Replace the system board assembly.
No beep and no video (System error LED is OFF)
See “Solving undetermined problems” on page 50.
No beep and no video (System Attention LED is ON)
See “Light path diagnostics” on page 40.
Error logs
The BMC log contains all system status messages from the blade server service processor. The management-module event log in your BladeCenter unit contains messages that were generated on each blade server during POST and status messages from the BladeCenter service processor. (See the documentation for the management module in your BladeCenter unit type.)
The following illustration shows an example of a BMC log entry.
                 BMC System Event Log
----------------------------------------------------------
Get Next Entry
Get Previous Entry
Clear BMC SEL

Entry Number=   00005 / 00011
Record ID=      0005
Record Type=    02
Timestamp=      2005/01/25 16:15:17
Entry Details:  Generator ID= 0020
                Sensor Type= 04
                Assertion Event
                Fan Threshold
                Lower Non-critical - going high
                Sensor Number= 40
                Event Direction/Type= 01
                Event Data= 52 00 1A
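If the text of a log entry is copied for later analysis, its “Name= value” fields can be separated with a few lines of Python. The following sketch is an illustration only and is not an IBM utility. It assumes that each field is on its own line, that the Event Data bytes are hexadecimal, and that lines without an equal sign are descriptive text; the names `parse_bmc_entry` and `event_data_bytes` are hypothetical.

```python
SAMPLE_ENTRY = """\
Entry Number= 00005 / 00011
Record ID= 0005
Record Type= 02
Timestamp= 2005/01/25 16:15:17
Entry Details: Generator ID= 0020
Sensor Type= 04
Assertion Event
Fan Threshold
Lower Non-critical - going high
Sensor Number= 40
Event Direction/Type= 01
Event Data= 52 00 1A
"""


def parse_bmc_entry(text):
    """Split an entry into a dict of fields and a list of descriptive lines."""
    fields = {}
    notes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # "Entry Details:" is a label that shares a line with the first detail.
        if line.startswith("Entry Details:"):
            line = line[len("Entry Details:"):].strip()
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
        else:
            notes.append(line)
    return fields, notes


def event_data_bytes(fields):
    """Convert the Event Data field to a list of integers (assumed hex)."""
    return [int(byte, 16) for byte in fields["Event Data"].split()]


fields, notes = parse_bmc_entry(SAMPLE_ENTRY)
print(fields["Sensor Number"], notes, event_data_bytes(fields))
```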
Important:
• A single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After you correct the cause of the first error message, the other error messages usually will not occur the next time you run the test.
• The management-module event log in your BladeCenter unit lists messages according to the position of the blade server in the blade bays. If a blade server is moved from one bay to another, the management-module event log will report messages for that blade server using the new bay number; messages for that blade server that were generated prior to the move will still be listed using the previous bay number.
The BMC log is limited in size. When the log is full, new entries will not overwrite existing entries; therefore, you must periodically clear the BMC log through the Configuration/Setup Utility program (the menu choices are described in the Installation and User’s Guide). When you are troubleshooting an error, be sure to clear the BMC log so that you can find current errors more easily.
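The behavior of a full log can be modeled with a short Python sketch. This is a simplified model for illustration, not the BMC implementation; the class name `BoundedLog` and the capacity value are hypothetical, and the sketch assumes that an entry arriving at a full log is not recorded.

```python
class BoundedLog:
    """A fixed-size log that never overwrites existing entries."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []

    def append(self, entry) -> bool:
        """Record an entry; return False if the log is full and it was dropped."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(entry)
        return True

    def clear(self) -> None:
        """Discard all entries, as Clear BMC SEL does for the real log."""
        self.entries.clear()


log = BoundedLog(capacity=2)
log.append("fan threshold")
log.append("over temperature")
print(log.append("DIMM error"))  # False: the log is full
log.clear()
print(log.append("DIMM error"))  # True: there is room after clearing
```

The model shows why a full log hides current errors: until the log is cleared, the newest events are the ones that are missing.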
Entries that are written to the BMC log during the early phase of POST show an incorrect date and time as the default time stamp; however, the date and time are corrected as POST continues.
Each BMC log entry appears on its own page. To display all the data for an entry, use the Up Arrow (↑) and Down Arrow (↓) keys or the Page Up and Page Down keys. To move from one entry to the next, select Get Next Entry or Get Previous Entry.
The BMC log indicates an assertion event when an event has occurred. It indicates a deassertion event when the event is no longer occurring.
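Because each condition is logged once when it begins and once when it ends, the conditions that are still present can be found by pairing the two kinds of event. The following Python sketch illustrates the idea; the function name `active_events`, the tuple layout, and the sensor names are hypothetical, and the entries are assumed to be in chronological order.

```python
def active_events(log):
    """Return the set of sensors whose condition is still asserted.

    log is an iterable of (sensor, kind) pairs, where kind is
    "assert" or "deassert".
    """
    active = set()
    for sensor, kind in log:
        if kind == "assert":
            active.add(sensor)
        elif kind == "deassert":
            active.discard(sensor)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return active


history = [
    ("fan 40", "assert"),
    ("temperature 12", "assert"),
    ("fan 40", "deassert"),
]
print(active_events(history))
```

In this example only the temperature condition remains, because the fan condition was deasserted later in the log.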
Some of the error codes and messages in the BMC log are abbreviated.
You can view the contents of the BMC log from the Configuration/Setup Utility program and from the diagnostic programs.
When troubleshooting PCI-X slots, note that the error logs report the PCI-X buses numerically. The numerical assignments vary depending on the configuration. You can check the assignments by running the Configuration/Setup Utility program (see the Installation and User’s Guide for more information).
Viewing the BMC log from the Configuration/Setup Utility program
For complete information about using the Configuration/Setup Utility program, see the Installation and User’s Guide.
To view the BMC log, complete the following steps:
1. Turn on the server.
2. When the prompt Press F1 for Configuration/Setup appears, press F1. If you have set a power-on password, you must type the password and press Enter to start the Configuration/Setup Utility program.
3. Select Advanced Settings, select Baseboard Management Controller (BMC) settings, and then select BMC System Event Log.
Viewing the BMC log from the diagnostic programs
The BMC log contains the same information whether it is viewed from the Configuration/Setup Utility program or from the diagnostic programs.
For information about using the diagnostic programs, see “Running the diagnostic programs” on page 43.
To view the BMC log, complete the following steps:
1. If the blade server is running, turn off the blade server.
2. Turn on the blade server.
3. When the prompt F2 for Diagnostics appears, press F2.
4. From the top of the screen, select Hardware Info.
5. From the list, select BMC Log.
For complete information about using the Configuration/Setup Utility program, see the Installation and User’s Guide.
POST error codes
The following table describes the POST error codes and suggested actions to correct the detected problems.
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.
Error code Description Action
161 Real-time clock battery error.
1. Run the Configuration/Setup Utility program.
2. Reseat the battery.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
162 Device configuration error.
1. Run the Configuration/Setup Utility program, select Load Default Settings, and save the settings.
2. Reseat the following components:
a. Battery
b. Failing device
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. Failing device
c. (Trained service technician only) System board assembly
163 Real-time clock error.
1. Run the Configuration/Setup Utility program, select Load Default Settings, make sure that the date and time are correct, and save the settings.
2. Reseat the battery.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Battery
b. (Trained service technician only) System board assembly
184 Power-on password damaged.
1. Run the Configuration/Setup Utility program, select Load Default Settings, and save the settings.
2. (Trained service technician only) Replace the system board assembly.
189 An attempt was made to access the server with invalid passwords.
Restart the server, run the Configuration/Setup Utility program and change the power-on password.
199 Processor power rating unsupported.
1. (Trained service technician only) Make sure that the microprocessor that was just installed has a supported power rating. If the power rating is not supported, replace the microprocessor with one that has a supported power rating.
2. Update BIOS code.
3. (Trained service technician only) Replace the microprocessor.
201 Memory test error.
1. Update BIOS code and rerun diagnostics.
2. Reseat the DIMMs.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. DIMMs
b. (Trained service technician only) System board assembly
229 Cache error.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 1
b. (Trained service technician only) Microprocessor 2
2. Replace the components listed in step 1 one at a time, in the order shown, restarting the server each time.
289 DIMM disabled by user or system.
1. If the DIMM was disabled by the user, run the Configuration/Setup Utility program and enable the DIMM.
2. Reseat the disabled DIMM.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Disabled DIMM
b. (Trained service technician only) System board assembly
301 Keyboard or keyboard controller error.
1. If you have installed a USB keyboard, run the Configuration/Setup Utility program and enable keyboardless operation to prevent the POST error message 301 from being displayed during startup.
2. Reseat the keyboard.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Keyboard
b. (Trained service technician only) System
board assembly
303 Keyboard controller error.
(Trained service technician only) Replace the system board assembly.
602 Invalid diskette boot record.
1. Remove and reinsert the diskette, and restart the server; or restart the server using another bootable diskette.
2. Reseat the following components:
a. Diskette drive
b. Diskette drive cable
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Diskette drive
b. Diskette drive cable
c. (Trained service technician only) System
board assembly
604 Diskette drive error.
1. Run the Configuration/Setup Utility program.
2. Reseat the following components:
a. Diskette drive
b. Diskette drive cable
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Diskette drive
b. Diskette drive cable
c. (Trained service technician only) System
board assembly
662 Diskette drive configuration error.
1. Run the Configuration/Setup Utility program.
2. Reseat the following components:
a. Diskette drive
b. Diskette drive cable
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Diskette drive
b. Diskette drive cable
c. (Trained service technician only) System board assembly
11xx System board serial port 1 or 2 error.
1. Reseat the external cable on the serial port.
2. Run the Configuration/Setup Utility program.
3. (Trained service technician only) Replace the system board assembly.
1162 Serial port configuration conflicts.
1. Run the Configuration/Setup Utility program and ensure that the IRQ and I/O port assignments needed by the serial port are available.
2. If all interrupts are being used by adapters, remove an adapter or force other adapters to share an interrupt.
1200 Processor machine check.
1. Reseat the following components:
a. (Trained service technician only) Microprocessor 1
b. (Trained service technician only) Microprocessor 2
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor 1
b. (Trained service technician only) Microprocessor 2
c. (Trained service technician only) System board assembly
1295 ECC circuit check.
(Trained service technician only) Replace the system board assembly.
1762 Hard disk configuration error.
1. Reseat the hard disk drive.
2. Run the Configuration/Setup Utility program and load the defaults.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Hard disk drive
b. (Trained service technician only) System
board assembly
18xx Unavailable PCI hardware interrupt.
1. Run the Configuration/Setup Utility program and adjust the adapter settings.
2. Remove each adapter one at a time, restarting the server each time, until the problem is isolated.
1962 A drive does not contain a valid boot sector.
1. Make sure that a bootable operating system is installed.
2. Run the Fixed Disk diagnostic test.
3. Reseat the hard disk drive.
4. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Hard disk drive
b. (Trained service technician only) System
board assembly
2400 Video controller test failure.
1. Verify that the keyboard/mouse/video select button LED on the front of the blade server is on, indicating that the blade server is connected to the shared BladeCenter monitor.
2. Verify that the monitor is connected correctly to the BladeCenter unit.
3. Reseat the video adapter (if installed).
4. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Video adapter (if installed)
b. (Trained service technician only) System
board assembly
2462 Video memory configuration error.
1. Verify that the keyboard/mouse/video select button LED on the front of the blade server is on, indicating that the blade server is connected to the shared BladeCenter monitor.
2. Verify that the monitor is connected correctly to the BladeCenter unit.
3. (Trained service technician only) Replace the system board assembly.
5962 IDE CD or DVD drive configuration error.
1. Run the Configuration/Setup Utility program, select Load Default Settings, and save the settings.
2. Reseat the following components:
a. CD or DVD drive power cable
b. CD or DVD drive IDE cable
c. CD or DVD drive
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. CD or DVD drive power cable
b. CD or DVD drive IDE cable
c. CD or DVD drive
d. (Trained service technician only) System board assembly
e. Battery
8603 Pointing-device error.
1. Reseat the pointing device.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Pointing device
b. (Trained service technician only) System board assembly
00019xxx Processor x is not functioning or failed built-in self test.
1. (Trained service technician only) Reseat microprocessor x.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) Microprocessor x
b. (Trained service technician only) System board assembly
00180xxx A PCI adapter requested a resource that is not available.
1. Run the Configuration/Setup Utility program to verify that the interrupt resource settings are correct.
2. Reseat the failing adapter (if installed).
3. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Failing adapter
b. (Trained service technician only) System board assembly
01295085 ECC-checking hardware test error.
1. (Trained service technician only) Reseat the microprocessor.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. (Trained service technician only) System
board assembly
b. (Trained service technician only)
Microprocessor
012980xx Patch code missing.
1. Flash the BIOS code.
2. (Trained service technician only) Reseat microprocessor x.
3. (Trained service technician only) Replace microprocessor x.
012981xx Processor patch (microcode) update failed.
1. Flash the BIOS code.
2. (Trained service technician only) Reseat microprocessor x.
3. (Trained service technician only) Replace microprocessor x.
01298200 Processor speed mismatch. Make sure that microprocessors 1 and 2 have the
same number of cores, cache size and type, clock speed, internal and external clock frequencies (see “Configuration/Setup Utility menu choices” on page
87).
19990301 Hard disk sector error.
1. Reseat the hard disk drive.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Hard disk drive
b. (Trained service technician only) System
board assembly
19990305 An operating system was not found.
1. Make sure that a bootable operating system is installed.
2. Run the Fixed Disk diagnostic test.
3. Reseat the hard disk drive.
4. Replace the following components one at a time, in the order shown, restarting the server each time.
a. Hard disk drive
b. (Trained service technician only) System
board assembly
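
Codes such as 012980xx stand for a family of codes that share a prefix. As a minimal sketch of how the entries on this page could be looked up in a script, the following Python function treats each x as a wildcard. The names POST_ERROR_CODES and describe_post_error are hypothetical, the rule that x matches any single character is an assumption made for this sketch, and the dictionary covers only the codes listed on this page.

```python
# Illustrative subset: only the POST error codes listed on this page.
# Assumption: an "x" in a listed code matches any single character.
POST_ERROR_CODES = {
    "01295085": "ECC-checking hardware test error.",
    "012980xx": "Patch code missing.",
    "012981xx": "Processor patch (microcode) update failed.",
    "01298200": "Processor speed mismatch.",
    "19990301": "Hard disk sector error.",
    "19990305": "An operating system was not found.",
}


def describe_post_error(code):
    """Return the description for a displayed POST error code, or None."""
    code = code.strip()
    # Prefer an exact match over a wildcard pattern.
    if code in POST_ERROR_CODES:
        return POST_ERROR_CODES[code]
    for pattern, description in POST_ERROR_CODES.items():
        if len(pattern) == len(code) and all(
            p == "x" or p == c for p, c in zip(pattern, code)
        ):
            return description
    return None
```

A code that is not in the dictionary returns None, which a caller could treat as a cue to consult the full “POST error codes” table.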
Checkout procedure
The checkout procedure is the sequence of tasks that you should follow to diagnose a problem in the blade server.
About the checkout procedure
Before performing the checkout procedure for diagnosing hardware problems, review the following information:
• Read the safety information that begins on page v.
• The diagnostic programs provide the primary methods of testing the major components of the blade server. If you are not sure whether a problem is caused by the hardware or by the software, you can use the diagnostic programs to confirm that the hardware is working correctly.
• When you run the diagnostic programs, a single problem might cause more than one error message. When this happens, correct the cause of the first error message. The other error messages usually will not occur the next time you run the diagnostic programs.
  Exception: If there are multiple error codes or light path diagnostics LEDs that indicate a microprocessor error, the error might be in a microprocessor or in a microprocessor socket. See “Microprocessor problems” on page 32 for information about diagnosing microprocessor problems.
• If the server is halted and a POST error code is displayed, see “Error logs” on page 17. If the server is halted and no error message is displayed, see “Troubleshooting tables” on page 27 and “Solving undetermined problems” on page 50.
• For intermittent problems, check the error log; see “Error logs” on page 17 and “Diagnostic programs, messages, and error codes” on page 43.
• If the blade server front panel shows no LEDs, verify the blade server status and errors in the BladeCenter Web interface; also see “Solving undetermined problems” on page 50.
• If device errors occur, see “Troubleshooting tables” on page 27.
Performing the checkout procedure
To perform the checkout procedure, complete the following steps:
1. If the blade server is running, turn off the blade server.
2. Turn on the blade server. Make sure that the server has control of the video (the keyboard/video/mouse button is lit). If the server does not start, see “Troubleshooting tables.”
3. Record any POST beep codes that are heard or POST error messages that are displayed on the monitor. If an error is displayed, look up the first error in “POST error codes” on page 19.
4. Check the control panel blade-error LED; if it is lit, check the light path diagnostics LEDs (see “Light path diagnostics” on page 40).
5. Check for the following results:
   • Successful completion of POST, indicated by a single beep
   • Successful completion of startup, indicated by a readable display of the operating-system desktop
6. Did a single beep sound and are there readable instructions on the main menu?
   • No: Find the failure symptom in “Troubleshooting tables”; if necessary, see “Solving undetermined problems” on page 50.
   • Yes: Run the diagnostic programs (see “Running the diagnostic programs” on page 43).
     – If you receive an error, see “Diagnostic error codes” on page 45.
     – If the diagnostic programs were completed successfully and you still suspect a problem, see “Solving undetermined problems” on page 50.
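
The branch in step 6 can be written down as a small decision function. This is only an illustrative Python sketch of the procedure above; the function name next_checkout_step and its parameters are hypothetical and are not part of any IBM tool.

```python
def next_checkout_step(single_beep, readable_display, diagnostics_error=None):
    """Return the next reference to follow after step 6 of the checkout procedure.

    diagnostics_error is None if the diagnostic programs have not been run yet,
    True if they reported an error, and False if they completed successfully.
    """
    # "No" branch: POST or startup did not complete as expected.
    if not (single_beep and readable_display):
        return "Troubleshooting tables"
    # "Yes" branch: run the diagnostic programs first.
    if diagnostics_error is None:
        return "Running the diagnostic programs"
    if diagnostics_error:
        return "Diagnostic error codes"
    # Diagnostics passed; this applies only if a problem is still suspected.
    return "Solving undetermined problems"
```

Each return value is the title of the section that the procedure points to next.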
Troubleshooting tables
Use the troubleshooting tables to find solutions to problems that have identifiable symptoms.
If you cannot find the problem in these tables, see “Running the diagnostic programs” on page 43 for information about testing the server.
If you have just added new software or a new optional device and the server is not working, complete the following steps before using the troubleshooting tables:
1. Remove the software or device that you just added.
2. Run the diagnostic tests to determine whether the server is running correctly.
3. Reinstall the new software or new device.
CD or DVD drive problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: The CD or DVD drive is not recognized.
Action:
1. Make sure that:
   • All cables and jumpers are installed correctly.
   • The correct device driver is installed for the CD or DVD drive.
2. Reseat the CD or DVD drive.
3. Replace the CD or DVD drive.

Symptom: A CD or DVD is not working correctly.
Action:
1. Clean the CD or DVD.
2. Reseat the CD or DVD drive.
3. Replace the CD or DVD drive.

Symptom: The CD or DVD drive tray is not working.
Action:
Note: The blade server must have ownership of the CD or DVD drive.
1. Insert the end of a straightened paper clip into the manual tray-release opening.
2. Reseat the CD or DVD drive.
3. Replace the CD or DVD drive.

Symptom: The CD or DVD drive is detected as /dev/sr0 by SUSE LINUX. (If the SUSE LINUX operating system is installed remotely onto a blade server that is not the current owner of the media tray [CD or DVD drive, diskette drive, and USB port], SUSE LINUX detects the CD or DVD drive as /dev/sr0 instead of /dev/cdrom.)
Action:
Establish a link between /dev/sr0 and /dev/cdrom as follows:
1. Enter the following command:
   rm /dev/cdrom; ln -s /dev/sr0 /dev/cdrom
2. Insert the following line in the /etc/fstab file:
   /dev/cdrom /media/cdrom auto ro,noauto,user,exec 0 0

Symptom: The CD or DVD drive is not recognized after being switched back to the blade server running Windows 2000 Advanced Server with SP3 applied. (When the CD or DVD drive owned by blade server x is switched to another blade server, then is switched back to blade server x, the operating system in blade server x no longer recognizes the CD or DVD drive. This happens when you have not safely stopped the drives before switching ownership of the media tray [CD or DVD drive, diskette drive, and USB port].)
Action:
Note: Because the BladeCenter unit uses a USB bus to communicate with the media tray devices, switching ownership of the media tray to another blade server is the same as unplugging a USB device. Before switching ownership of the CD or DVD drive (media tray) to another blade server, safely stop the media tray devices on the blade server that currently owns the media tray, as follows:
1. Double-click the Unplug/Eject Hardware icon in the Windows taskbar at the bottom right of the screen.
2. Select USB Floppy and click Stop.
3. Select USB Mass Storage Device and click Stop.
4. Click Close.
You can now safely switch ownership of the media tray to another blade server.
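
The /dev/sr0 workaround in this table has two parts: replacing /dev/cdrom with a symbolic link and adding one line to /etc/fstab. The following Python sketch mirrors both parts so that the change can be repeated safely. The function names relink_cdrom and ensure_fstab_entry are hypothetical, the dev_dir parameter is an assumption added so that the sketch can be tried in a scratch directory, and running it against the real /dev and /etc/fstab would require root privileges.

```python
import os

# The fstab line given in the table above.
FSTAB_ENTRY = "/dev/cdrom /media/cdrom auto ro,noauto,user,exec 0 0"


def relink_cdrom(dev_dir="/dev", target="sr0", link="cdrom"):
    """Rough equivalent of: rm /dev/cdrom; ln -s /dev/sr0 /dev/cdrom"""
    target_path = os.path.join(dev_dir, target)
    link_path = os.path.join(dev_dir, link)
    # Remove an existing file or (possibly dangling) symbolic link first.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target_path, link_path)
    return link_path


def ensure_fstab_entry(fstab_text, entry=FSTAB_ENTRY):
    """Return fstab_text with entry appended, unless its device is already listed."""
    device = entry.split()[0]
    for line in fstab_text.splitlines():
        fields = line.split()
        # A commented-out line never has the bare device as its first field.
        if fields and fields[0] == device:
            return fstab_text
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    return fstab_text + entry + "\n"
```

ensure_fstab_entry works on text rather than on the file itself, so the caller decides when and how to write /etc/fstab back; calling it twice leaves the text unchanged the second time.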
Diskette drive problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: Diskette drive activity LED stays on, or the system bypasses the diskette drive.
Action:
1. If there is a diskette in the drive, verify that:
   • The diskette is inserted correctly in the drive.
   • The diskette is good and not damaged. (Try another diskette if you have one.) The drive light comes on (one-second flash) when the diskette is inserted.
   • The diskette contains the necessary files to start the computer.
   • The diskette drive is enabled in the Configuration/Setup Utility program.
   • The software program is working properly.
   • The cable is installed correctly (in the proper orientation).
2. To prevent diskette drive read/write errors, be sure the distance between monitors and diskette drives is at least 76 mm (3 in.).
3. Reseat the following components:
   a. Diskette drive cable
   b. Diskette drive
   c. Media tray card
4. Replace the components listed in step 3 one at a time, in the order shown, restarting the blade server each time.
General problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: A cover lock is broken, an LED is not working, or a similar problem has occurred.
Action: If the part is a CRU, replace it. If the part is a FRU, the part must be replaced by a trained service technician.
Hard disk drive problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: Not all drives are recognized by the Fixed Disk diagnostic test.
Action: Remove the drive indicated by the diagnostic test; then, run the Fixed Disk diagnostic test again. If the remaining drives are recognized, replace the drive that you removed with a new one.

Symptom: The server stops responding during the Fixed Disk diagnostic test.
Action: Remove the hard disk drive that was being tested when the server stopped responding, and run the diagnostic test again. If the Fixed Disk diagnostic test runs successfully, replace the drive that you removed with a new one.

Symptom: A hard disk drive passes the Fixed Disk diagnostic test but the problem remains.
Action: Run the SCSI Fixed Disk diagnostic test.
Note: This test is not available on servers that have RAID arrays or servers that have IDE or SATA disk drives.
Intermittent problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: A problem occurs only occasionally and is difficult to diagnose.
Action:
1. Make sure that:
   • When the blade server is turned on, air is flowing from the rear of the blade server at the blower grill. If there is no airflow, the blower is not working. This causes the blade server to overheat and shut down.
   • The SCSI bus and devices are configured correctly.
2. Check the BMC log (see “Error logs” on page 17).
Keyboard, mouse, or pointing-device problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: All or some keys on the keyboard do not work.
Action:
1. Make sure that:
   • The keyboard cable is securely connected to the BladeCenter management module, and the keyboard and mouse cables are not reversed.
   • Both the blade server and the monitor are turned on.
2. Replace the keyboard.
3. Replace the management module on the BladeCenter unit; see the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type.

Symptom: The mouse or pointing device does not work.
Action:
1. Make sure that:
   • The keyboard/mouse/video select button LED on the front of the blade server is lit, indicating that the blade server is connected to the shared BladeCenter monitor.
   • The mouse or pointing-device cable is securely connected to the BladeCenter management module, and the keyboard and mouse cables are not reversed.
   • The mouse works correctly with other blade servers.
   • The mouse device drivers are installed correctly.
   • Both the blade server and the monitor are turned on.
   • The mouse is recognized as a USB device, not PS/2, by the blade server. Although the mouse is a PS/2-style device, communication with the mouse is through an internal USB bus in the BladeCenter unit. Some operating systems permit you to select the type of mouse during installation of the operating system. Select USB.
2. Replace the mouse or pointing device.
3. Replace the management module on the BladeCenter unit; see the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type.

Symptom: Mouse function lost during Red Hat installation.
Action: If, while installing Red Hat Enterprise Linux 2.1 to a blade server, you or someone else selects a different blade server as owner of the keyboard, video, and mouse (KVM), you might lose mouse function for the installation process. Do not switch KVM owners until the installation process begins to install the packages (after the “About to Install” window).
Memory problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: The amount of system memory displayed is less than the amount of physical memory installed.
Action:
1. Make sure that:
   • The memory modules are seated properly.
   • You have installed the correct type of memory.
   • If you changed the memory, you updated the memory configuration with the Configuration/Setup Utility program.
   • All banks of memory on the DIMMs are enabled. The blade server might have automatically disabled a DIMM bank when it detected a problem, or a DIMM bank could have been manually disabled.
2. Check the BMC log for error message 289:
   • If the DIMM was disabled by a system-management interrupt (SMI), replace the DIMM.
   • If the DIMM was disabled by the user or by POST:
     a. Start the Configuration/Setup Utility program.
     b. Enable the DIMM.
     c. Save the configuration and restart the computer.
3. Reseat the DIMM.
4. Replace the DIMM.
5. (Trained service technician only) Replace the system board assembly.
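
Step 2 of the memory action depends on what disabled the DIMM. A minimal Python sketch of that branch follows; the function name dimm_recovery_steps and the cause labels "smi", "user", and "post" are hypothetical labels chosen for illustration, not values taken from the BMC log format.

```python
def dimm_recovery_steps(disabled_by):
    """Return the steps from the memory table for a DIMM flagged by error message 289."""
    cause = disabled_by.strip().lower()
    if cause == "smi":
        # Disabled by a system-management interrupt: the DIMM is replaced.
        return ["Replace the DIMM."]
    if cause in ("user", "post"):
        # Disabled by the user or by POST: the DIMM is re-enabled in setup.
        return [
            "Start the Configuration/Setup Utility program.",
            "Enable the DIMM.",
            "Save the configuration and restart the computer.",
        ]
    raise ValueError("cause not covered by the table: " + disabled_by)
```

Causes that the table does not mention raise an error rather than returning a guess.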
Microprocessor problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: The server emits a continuous beep during POST, indicating that the startup (boot) microprocessor is not working correctly.
Action:
1. (Trained service technician only) Reseat microprocessor 1.
2. (Trained service technician only) Replace microprocessor 1.
Monitor or video problems
Some IBM monitors have their own self-tests. If you suspect a problem with your monitor, see the documentation that comes with the monitor for instructions for testing and adjusting the monitor. If you cannot diagnose the problem, call for service.
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: The screen is blank.
Action:
1. Make sure that:
   • The keyboard/mouse/video select button LED on the front of the blade server is lit, indicating that the blade server is connected to the shared BladeCenter monitor.
   • The system power cord is plugged into the BladeCenter power module and a working electrical outlet.
   • The monitor cables are connected properly.
   • The monitor is turned on and the Brightness and Contrast controls are adjusted correctly.
   • Damaged BIOS code is not affecting the video; see “Recovering from a BIOS update failure” on page 49.
   Important: In some memory configurations, the 3-3-3 beep code might sound during POST followed by a blank display screen. If this occurs and the Boot Fail Count feature in the Start Options of the Configuration/Setup Utility program is set to Enabled (its default setting), you must restart the blade server three times to force the system BIOS to reset the CMOS values to the default configuration (memory connector or bank of connectors enabled).
2. If you have verified these items and the screen remains blank, replace:
   a. Monitor
   b. Management module on the BladeCenter unit (see the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type).

Symptom: Only the cursor appears.
Action: Make sure that the keyboard, video and mouse on the BladeCenter have not been switched to another blade server. If the problem remains, see “Solving undetermined problems” on page 50.

Symptom: The monitor goes blank when you direct it to a working blade server, or goes blank when you start some application programs in the blade servers.
Action: Make sure that the monitor cable is connected to the video port on the BladeCenter management module. Some IBM monitors have their own self-tests. If you suspect a problem with the monitor, see the information that comes with the monitor for adjusting and testing instructions.
If you still cannot find the problem, try using the monitor with another blade server. If the problem remains, see the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type.

Symptom: The screen is wavy, unreadable, rolling, distorted, or has screen jitter.
Action:
1. If the monitor self-tests show the monitor is working properly, consider the location of the monitor. Magnetic fields around other devices (such as transformers, appliances, fluorescent lights, and other monitors) can cause screen jitter or wavy, unreadable, rolling, or distorted screen images. If this happens, turn off the monitor. (Moving a color monitor while it is turned on might cause screen discoloration.) Then move the device and the monitor at least 305 mm (12 in.) apart. Turn on the monitor.
   Notes:
   a. To prevent diskette drive read/write errors, be sure the distance between monitors and diskette drives is at least 76 mm (3 in.).
   b. Non-IBM monitor cables might cause unpredictable problems.
2. Replace the monitor.
3. (Trained service technician only) Replace the system board assembly.

Symptom: Wrong characters appear on the screen.
Action:
1. If the wrong language is displayed, update the firmware or operating system with the correct language in the blade server that has ownership of the monitor.
2. Replace the monitor.
3. (Trained service technician only) Replace the system board assembly.

Symptom: No video.
Action:
1. Make sure that the correct blade server is selected, if applicable.
2. Make sure that all cables are fastened securely.
Network connection problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: One or more blade servers are unable to communicate with the network.
Action:
Make sure that:
• The switch modules for the network interface being used are installed in the correct BladeCenter bays and are configured and operating correctly. See the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type for details.
• The settings in the switch module are appropriate for the blade server (settings in the switch module are blade-specific).
If the problem remains, see “Solving undetermined problems” on page 50.
Optional-device problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: An IBM optional device that was just installed does not work.
Action:
1. Make sure that:
   • The option is designed for the server (see the ServerProven® list on the World Wide Web at http://www.ibm.com/us/compat/).
   • You followed the installation instructions that came with the option.
   • The option is installed correctly.
   • You have not loosened any other installed devices or cables.
   • You updated the configuration information in the Configuration/Setup Utility program. Whenever memory or any other device is changed, you must update the configuration.
2. If the option comes with its own test instructions, use those instructions to test the option.
3. Reseat the device that you just installed.
4. Replace the device that you just installed.
Power error messages
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Messages:
• System Power Good fault
• VRM Power Good fault
• System over recommended voltage for +12v.
• System under recommended voltage for +12v.
Action (for each of these messages):
1. Check the BladeCenter unit power. (See the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type for details.)
2. Reseat the blade server.
3. Replace the blade server.

Messages:
• System over recommended voltage for +1.25v.
• System over recommended voltage for +1.5v.
• System over recommended voltage for +2.5v.
• System over recommended voltage for +3.3v.
• System over recommended 5V fault.
• VRM voltage over recommended tolerance.
• System under recommended voltage for +1.25v.
• System under recommended voltage for +1.5v.
• System under recommended voltage for +2.5v.
• System under recommended voltage for +3.3v.
• System under recommended 5V fault.
Action (for each of these messages):
1. Reseat the blade server.
2. Replace the blade server.
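
In this table, the two Power Good faults and the two +12v messages start with a check of the BladeCenter unit power; every other message goes straight to reseating and then replacing the blade server. The Python sketch below encodes that pattern. The function name power_message_actions is hypothetical, and the function does not verify that a message actually appears in the table.

```python
import re

# Messages whose action starts with a check of the BladeCenter unit power,
# in addition to any message that refers to the +12v rail.
CHECK_UNIT_POWER_FIRST = ("System Power Good fault", "VRM Power Good fault")


def power_message_actions(message):
    """Return the ordered actions that the table lists for a power error message."""
    text = message.strip().rstrip(".")
    steps = ["Reseat the blade server.", "Replace the blade server."]
    # "+1.25v" must not be mistaken for "+12v", so match the exact suffix.
    if text in CHECK_UNIT_POWER_FIRST or re.search(r"\+12v$", text, re.IGNORECASE):
        return ["Check the BladeCenter unit power."] + steps
    return steps
```

For example, the +12v messages return three steps, while the +1.25v messages return only the reseat and replace steps.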
Power problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: Power switch does not work and reset button, if supported, does work.
Action:
1. Reseat the control-panel connector.
2. Replace the bezel assembly.
3. (Trained service technician only) Replace the system board assembly.

Symptom: The blade server does not turn on.
Action:
1. Make sure that:
   a. The power LED on the front of the BladeCenter unit is on.
   b. The LEDs on all the BladeCenter power modules are on.
   c. If the blade server is in blade bay 7 through 14 (in a BladeCenter unit) or in blade bay 5 through 8 (in a BladeCenter T unit), power modules must be present in all four power-module bays.
   d. The power-on LED on the blade server control panel is blinking slowly.
      • If the power LED is flashing rapidly and continues to do so, the blade server is not communicating with the management module; reseat the blade server and go to step 3.
      • If the power LED is off, the blade bay is not receiving power, the blade server is defective, or the LED information panel is loose or defective.
   e. Local power control for the blade server is enabled (use the BladeCenter management module Web interface to verify), or the blade server was instructed through the management module (Web interface or IBM Director) to start.
2. If you just installed a device in the blade server, remove it, and restart the blade server. If the blade server now starts, you might have installed more devices than the power to that blade bay supports.
3. Try another blade server in the blade bay; if it works, replace the faulty blade server.
4. See “Solving undetermined problems” on page 50.

Symptom: The blade server does not start and the following conditions are present:
• The amber system-error LED on the BladeCenter unit system LED panel is lit.
• The amber blade error LED on the blade server control panel is lit.
• The management-module event log contains the message Processor speed mismatch.
Action:
Make sure that microprocessors 1 and 2 are identical (number of cores, cache size and type, clock speed, and internal and external clock frequencies).
(Trained service technician only) If the microprocessors are not identical, remove the microprocessor with the incorrect specifications and replace it with a microprocessor that has the correct specifications.

Symptom: The blade server turns off for no apparent reason.
Action:
1. Make sure that each blade bay has a blade server, expansion unit, or filler blade correctly installed. If these components are missing or incorrectly installed, an over-temperature condition might result in shutdown.
2. (Trained service technician only) If the microprocessor error LED is lit, replace the microprocessor.

Symptom: The blade server does not turn off.
Action:
1. Verify whether you are using an ACPI or non-ACPI operating system. If you are using a non-ACPI operating system:
   a. Press Ctrl+Alt+Delete.
   b. Turn off the system by holding the power-control button for 4 seconds.
   c. If the blade server fails during POST and the power-control button does not work, remove the blade server from the bay and reseat it.
2. If the problem remains or if you are using an ACPI-aware operating system, suspect the system board.
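
Two of the checks for a blade server that does not turn on are mechanical enough to express in code: the bay ranges that need all four power modules, and the reading of the power-on LED. The Python sketch below restates them; the names requires_all_power_modules, POWER_LED_MEANING, and interpret_power_led are hypothetical, and the LED state strings are labels chosen for this sketch.

```python
def requires_all_power_modules(bay, unit_type="BladeCenter"):
    """Return True if a blade server in this bay needs all four power modules present."""
    if unit_type == "BladeCenter":
        return 7 <= bay <= 14
    if unit_type == "BladeCenter T":
        return 5 <= bay <= 8
    raise ValueError("unit type not covered by the table: " + unit_type)


# Readings of the power-on LED, restated from step 1d of the table.
POWER_LED_MEANING = {
    "blinking slowly": "Expected state for this check; continue with the next item.",
    "flashing rapidly": (
        "The blade server is not communicating with the management module; "
        "reseat the blade server and go to step 3."
    ),
    "off": (
        "The blade bay is not receiving power, the blade server is defective, "
        "or the LED information panel is loose or defective."
    ),
}


def interpret_power_led(state):
    """Return the table's reading of a power-on LED state, or a fallback note."""
    return POWER_LED_MEANING.get(
        state.strip().lower(), "State not described in the table."
    )
```

Bays outside the listed ranges return False, which only means that the table states no four-module requirement for them.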
ServerGuide problems
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained service technician.

Symptom: The ServerGuide Setup and Installation CD will not start.
Action:
• Make sure that the CD or DVD drive is associated with the blade server that you are configuring.
• Make sure that the server supports the ServerGuide program and has a startable (bootable) CD or DVD drive.
• If the startup (boot) sequence settings have been changed, make sure that the CD or DVD drive is first in the startup sequence.

Symptom: The ServeRAID Manager program cannot view all installed drives, or the operating system cannot be installed.
Action:
• Make sure that there are no duplicate SCSI IDs or IRQ assignments.
• Make sure that the hard disk drive is connected correctly.

Symptom: The operating-system installation program continuously loops.
Action: Make more space available on the hard disk.

Symptom: The ServerGuide program will not start the operating-system CD.
Action: Make sure that the operating-system CD is supported by the ServerGuide program. See the ServerGuide Setup and Installation CD label for a list of supported operating-system versions.

Symptom: The operating system cannot be installed; the option is not available.
Action: Make sure that the server supports the operating system. If it does, either no logical drive is defined (SCSI RAID systems), or the ServerGuide System Partition is not present. Run the ServerGuide program and make sure that setup is complete.
Service processor problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Symptom: Service processor in the management module reports a general monitor failure.
Action: Disconnect the BladeCenter unit from all electrical sources, wait for 30 seconds, reconnect the BladeCenter unit to the electrical sources, and restart the server. If the problem remains, see “Solving undetermined problems” on page 50, and the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type.
Software problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Symptom: You suspect a software problem.
Action:
1. To determine whether the problem is caused by the software, make sure that:
   v The server has the minimum memory that is needed to use the software. For memory requirements, see the information that comes with the software.
     Note: If you have just installed an adapter or memory, the server might have a memory-address conflict.
   v The software is designed to operate on the server.
   v Other software works on the server.
   v The software works on another server.
2. If you received any error messages when using the software, see the information that comes with the software for a description of the messages and suggested solutions to the problem.
3. Contact your place of purchase of the software.
Universal Serial Bus (USB) port problems
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Symptom: A USB device does not work.
Action: Make sure that:
v The correct USB device driver is installed.
v The operating system supports USB devices.
Light path diagnostics
Light path diagnostics is a system of LEDs on the control panel and on various internal components of the blade server. When an error occurs, LEDs are lit throughout the blade server. By viewing the LEDs in a particular order, you can often identify the source of the error.
Some internal components contain LEDs that can be lit for a short time after you remove the blade server from the BladeCenter unit. After you remove the blade server, you can press and hold the light path diagnostics switch for a maximum of 25 seconds to light the LEDs that were lit before you removed the blade server from the BladeCenter unit. Power remains available to light these LEDs for up to 24 hours after the component is removed from the server. The following components have this feature:
v Microprocessors
v Memory modules (DIMMs)
v Light path diagnostics panel
Viewing the light path diagnostics LEDs
Before working inside the server to view light path diagnostics LEDs, read the safety information that begins on page v and “Handling static-sensitive devices” on page 60.
If an error occurs, view the light path diagnostics LEDs in the following order:
1. Look at the control panel on the front of the server (see “Blade server control panel buttons and LEDs” on page 4).
v If the information LED is lit, it indicates that information about a suboptimal
condition in the server is available in the BMC log or in the management-module event log.
v If the blade-error LED is lit, it indicates that an error has occurred; go to step
2.
2. To view the light path diagnostics panel and LEDs, complete the following steps:
a. Remove the blade server from the BladeCenter unit.
b. Place the blade server on a flat, static-protective surface.
c. Remove the cover from the blade server.
d. Press and hold the light path diagnostics switch to relight the LEDs that
were lit before you removed the blade server from the BladeCenter unit. The LEDs will remain lit for as long as you press the switch, to a maximum of 25 seconds.
The following illustration shows the locations of the system board error LEDs. The illustration identifies these items:
v SCSI 0 error LED (CR51) (future use)
v SCSI 1 error LED (CR53) (future use)
v BMC error LED (CR47) (future use)
v Light path diagnostics panel
v Microprocessor 1 error LED (CR54)
v Microprocessor 2 error LED (CR50)
v DIMM 1 error LED (CR52)
v DIMM 2 error LED (CR48)
v DIMM 3 error LED (CR46)
v DIMM 4 error LED (CR49)
The following illustration shows LEDs on the light path diagnostics panel on the system board. The panel labels and the items that they identify are:
v NMI: NMI error LED
v MIS: Microprocessor mismatch error LED
v S BRD: System-board error LED
v TEMP: Over temperature error LED
v Light path diagnostics LED
v Light path diagnostics switch
Note which LEDs are lit on the system board and the light path diagnostics panel. Using this information and the information in “Light path diagnostics LEDs” on page 42 can often provide enough information to diagnose the error.
Light path diagnostics LEDs
The following table describes the LEDs on the light path diagnostics panel and on the system board, and suggested actions to correct the detected problems.
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Lit light path diagnostics LED: None
Description: An error has occurred and cannot be isolated, or the service processor has failed.
Action: An error has occurred that is not represented by a light path diagnostics LED. Check the BMC log for more information about the error.
Lit light path diagnostics LED: DIMM x error
Description: A memory error occurred.
Action:
1. Make sure that the DIMM indicated by the lit LED is supported.
2. Reseat the DIMM indicated by the lit LED.
3. Replace the DIMM indicated by the lit LED.
Note: Multiple DIMM LEDs do not necessarily indicate multiple DIMM failures. If more than one DIMM LED is lit, reseat or replace one DIMM at a time until the error goes away. Refer to the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type for further isolation.
Lit light path diagnostics LED: Microprocessor x error
Description: The microprocessor has failed.
Action:
1. Make sure that the microprocessor indicated by the lit LED is installed correctly. (See “Microprocessor” on page 78 for installation instructions.)
2. (Trained service technician only) Replace the microprocessor indicated by the lit LED.
Lit light path diagnostics LED: MIS (Microprocessor mismatch)
Description: The microprocessors do not match.
Action: Make sure that microprocessors 1 and 2 are identical (number of cores, cache size and type, clock speed, internal and external clock frequencies); also, see “Troubleshooting tables” on page 27.
Lit light path diagnostics LED: NMI (NMI error)
Description: The system board has failed.
Action:
1. Replace the blade server cover, reinsert the blade server in the BladeCenter unit, and then restart the blade server.
2. Check the BMC log for information about the error.
3. (Trained service technician only) If the problem remains, replace the system board assembly.
Lit light path diagnostics LED: S BRD (System board error)
Description: The system board has failed.
Action:
1. Replace the blade server cover, reinsert the blade server in the BladeCenter unit, and then restart the server.
2. (Trained service technician only) Replace the system board assembly.
Lit light path diagnostics LED: TEMP (Over temperature error)
Description: The system temperature has exceeded a threshold level.
Action:
1. Check to see if a blower on the BladeCenter unit has failed. If it has, replace the blower (see the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type for more information).
2. Make sure that the room temperature is not too high. (See “LS20 Type 8850 specifications for non-NEBS/ETSI environments” on page 3 for temperature information.)
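As an illustration only, the panel labels and descriptions from the table above can be held in a small lookup structure. The following Python sketch is not part of any IBM tool; the names `PANEL_LEDS`, `NO_LED_LIT`, and `describe_lit_leds` are hypothetical, and the strings are copied from the table.

```python
# Panel label -> (LED name, description), copied from the table above.
PANEL_LEDS = {
    "NMI": ("NMI error", "The system board has failed."),
    "MIS": ("Microprocessor mismatch", "The microprocessors do not match."),
    "S BRD": ("System board error", "The system board has failed."),
    "TEMP": ("Over temperature error",
             "The system temperature has exceeded a threshold level."),
}

# Description used for the "None" row of the table.
NO_LED_LIT = ("An error has occurred and cannot be isolated, "
              "or the service processor has failed.")


def describe_lit_leds(lit_labels):
    """Return one description line per lit panel LED.

    lit_labels is a list of panel labels such as ["MIS", "TEMP"].
    An empty list corresponds to the "None" row of the table.
    """
    if not lit_labels:
        return [NO_LED_LIT]
    lines = []
    for label in lit_labels:
        name, description = PANEL_LEDS[label]  # KeyError for unknown labels
        lines.append(f"{label} ({name}): {description}")
    return lines
```

The sketch covers only the four panel LEDs; the DIMM and microprocessor LEDs on the system board would need their own entries.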
Diagnostic programs, messages, and error codes
The diagnostic programs are the primary method of testing the major components of the server. As you run the diagnostic programs, text messages and error codes are displayed on the screen and are saved in the test log. A diagnostic text message or error code indicates that a problem has been detected; to determine what action you should take as a result of a message or error code, see the table in “Diagnostic error codes” on page 45.
Running the diagnostic programs
To run the diagnostic programs, complete the following steps:
1. If the blade server is running, turn off the blade server.
2. Turn on the blade server.
3. When the prompt F2 for Diagnostics appears, press F2.
4. From the top of the screen, select either Extended or Basic.
5. From the pulldown menu, select the test that you want to run, and follow the instructions on the screen.
For help with the diagnostic programs, press F1. You also can press F1 from within a help screen to obtain online documentation from which you can select different categories. To exit from the help information, press Esc.
To determine what action you should take as a result of a diagnostic text message or error code, see the table in “Diagnostic error codes” on page 45.
If the diagnostic programs do not detect any hardware errors but the problem remains during normal server operations, a software error might be the cause. If you suspect a software problem, see the information that comes with your software.
A single problem might cause more than one error message. When this happens, correct the cause of the first error message. The other error messages usually will not occur the next time you run the diagnostic programs.
Exception: If there are multiple error codes or light path diagnostics LEDs that indicate a microprocessor error, the error might be in a microprocessor or in a microprocessor socket. See “Microprocessor problems” on page 32 for information about diagnosing microprocessor problems.
If the blade server stops during testing and you cannot continue, restart the blade server and try running the diagnostic programs again. If the problem remains, replace the component that was being tested when the blade server stopped.
The diagnostic programs assume that a keyboard and mouse are attached to the BladeCenter unit and that the blade server controls them. If you run the diagnostic programs with either no mouse or a mouse attached to the BladeCenter unit that is not controlled by the blade server, you cannot use the Next Cat and Prev Cat buttons to select categories. All other mouse-selectable functions are available through function keys.
To view server configuration information (such as system configuration, memory contents, interrupt request (IRQ) use, direct memory access (DMA) use, device drivers, and so on), select Hardware Info from the top of the screen.
Diagnostic text messages
Diagnostic text messages are displayed while the tests are running. A diagnostic text message contains one of the following results:
Passed: The test was completed without any errors.
Failed: The test detected an error.
User Aborted: You stopped the test before it was completed.
Not Applicable: You attempted to test a device that is not present in the server.
Aborted: The test could not proceed because of the server configuration.
Warning: The test could not be run. There was no failure of the hardware that was
being tested, but there might be a hardware failure elsewhere, or another problem prevented the test from running; for example, there might be a configuration problem, or the hardware might be missing or is not being recognized.
The result is followed by an error code or other additional information about the error.
Viewing the test log
To view the test log when the tests are completed, select Utility from the top of the screen and then select View Test Log. The test-log data is maintained only while you are running the diagnostic programs. When you exit from the diagnostic programs, the test log is cleared.
To save the test log to a file on a diskette or to the hard disk, select Save Log on the diagnostic programs screen and specify a location and name for the saved log file.
Note: To save the test log to a diskette, you must use a diskette that you have formatted yourself; this function does not work with preformatted diskettes. If the diskette has sufficient space for the test log, the diskette can contain other data.
Diagnostic error codes
The following table describes the error codes that the diagnostic programs might generate and suggested actions to correct the detected problems.
If the diagnostic programs generate error codes that are not listed in the table, make sure that the latest level of the BIOS code is installed.
In the error codes, x can be any numeral or letter. However, if the three-digit number in the central position of the code is 000, 195, or 197, do not replace a CRU or FRU. These numbers appearing in the central position of the code have the following meanings:
000 The blade server passed the test. Do not replace a CRU or FRU.
195 The Esc key was pressed to end the test. Do not replace a CRU or FRU.
197 This is a warning error, but it does not indicate a hardware failure. Take the action that is indicated in the Action column, but do not replace a CRU or FRU. See the description for Warning in the section “Diagnostic text messages” on page 44 for more information.
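The rule about the central position can be sketched in code. The following Python fragment is a minimal illustration and not part of the diagnostic programs; the function names and the assumption that a code always consists of three hyphen-separated fields are hypothetical.

```python
# Central-position values that never call for replacing a CRU or FRU
# (from the list above): 000 = passed, 195 = Esc pressed, 197 = warning.
NO_REPLACE_CENTRAL = {"000", "195", "197"}


def central_field(error_code):
    """Return the middle field of a code such as '001-197-000'."""
    fields = error_code.strip().split("-")
    if len(fields) != 3:
        raise ValueError(
            f"expected three hyphen-separated fields: {error_code!r}")
    return fields[1]


def may_involve_replacement(error_code):
    """Return False when the central field is 000, 195, or 197.

    True means only that the rule above does not rule out a
    replacement; the Action column still decides what to do.
    """
    return central_field(error_code) not in NO_REPLACE_CENTRAL
```

For example, `may_involve_replacement("001-197-000")` is false because 197 marks a warning, not a hardware failure.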
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine which components are customer
replaceable units (CRU) and which components are field replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Error code: 001-197-000, 001-198-000
Description: Test aborted.
Action:
1. Check the test log for messages indicating the cause of the error, and take the indicated action.
2. From the diagnostic programs, run Quick Memory Test All Banks; then, if an error is detected, take the indicated action.
3. Reinstall and, if necessary, update the BIOS code on the server; then, rerun the test (see “Updating the firmware” on page 85).
Error code: 001-250-00n
Description: ECC Test logic Failed. See PCDR logs. n = failing CPU [1 - 4].
Action:
1. Restart the server.
2. Run the diagnostic test again.
3. (Trained service technician only) If the problem still exists, replace the failing microprocessor.
Error code: 001-292-000
Description: Core system: failed/CMOS checksum failed.
Action: Load the BIOS default settings using the Configuration/Setup Utility program and run the test again (see “Configuration/Setup Utility menu choices” on page 87).
Error code: 001-xxx-000
Description: Failed core tests.
Action: (Trained service technician only) Replace the system board assembly.
Error code: 001-xxx-001
Description: Failed core tests.
Action: (Trained service technician only) Replace the system board assembly.
Error code: 005-xxx-000
Description: Failed video test.
Action: (Trained service technician only) Replace the system board assembly.
Error code: 030-xxx-000
Description: Failed internal SCSI interface test.
Action: (Trained service technician only) Replace the system board assembly.
Error code: 035-xxx-099
Description: No adapters were found.
Action: Reseat the adapter (if installed).
Error code: 075-xxx-000
Description: Failed power supply test.
Action: Replace the power supply (refer to the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type).
Error code: 089-xxx-001
Description: Failed microprocessor test.
Action:
1. Reseat microprocessor 1.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
   a. (Trained service technician only) Microprocessor 1
   b. (Trained service technician only) System board assembly
Error code: 089-xxx-002
Description: Failed optional microprocessor test.
Action:
1. Reseat microprocessor 2.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
   a. (Trained service technician only) Microprocessor 2
   b. (Trained service technician only) System board assembly
Error code: 166-406-001
Description: BMC indicates failure in I2C bus test.
Action:
1. Remove the blade server from the BladeCenter unit, wait 30 seconds, reseat it in the BladeCenter unit and run the test again.
2. Update the BMC firmware (see “Updating the firmware” on page 85).
3. (Trained service technician only) Replace the system board assembly.
Error code: 166-407-001
Description: System Management: Failed. BMC indicates failure in I2C bus test.
Action:
1. Remove the blade server from the BladeCenter unit, wait 30 seconds, reseat it in the BladeCenter unit and run the test again.
2. Update the BMC firmware (see “Updating the firmware” on page 85).
3. (Trained service technician only) Replace the system board assembly.
Error code: 166-nnn-001
Description: System Management: Failed. BMC indicates failure in self test. Note: nnn = 300 to 320.
Action:
1. Remove the blade server from the BladeCenter unit, wait 30 seconds, reseat it in the BladeCenter unit and run the test again.
2. Update the BMC firmware (see “Updating the firmware” on page 85).
3. (Trained service technician only) Replace the system board assembly.
Error code: 166-nnn-001
Description: System Management: Failed. BMC indicates failure in self test. Note: nnn = 400 to 420, excluding 406 and 407.
Action:
1. Remove the blade server from the BladeCenter unit, wait 30 seconds, reseat it in the BladeCenter unit and run the test again.
2. Update the BMC firmware (see “Updating the firmware” on page 85).
3. (Trained service technician only) Replace the system board assembly.
Error code: 180-xxx-000
Description: Diagnostics LED failure.
Action: Run the LED test in the diagnostics program.
Error code: 180-xxx-001
Description: Failed front LED panel test.
Action:
1. Reseat the control-panel connector.
2. Replace the bezel assembly.
3. (Trained service technician only) Replace the system board assembly.
Error code: 201-198-xxx
Description: Memory test aborted. See PC Doctor logs.
Action:
1. Restart the server.
2. Run the diagnostic test again.
3. Reinstall the diagnostic program.
4. Refer to the error log for additional information.
Error code: 201-199-xxx
Description: Unexpected error. Test aborted. See PC Doctor logs.
Action:
1. Restart the server.
2. Run the diagnostic test again.
3. Reinstall the diagnostic program.
4. Refer to the error log for additional information.
Error code: 201-xxx-PBD
Description: Test failed.
v P = CPU number [1 - 4].
v B = Failing bank/pair number.
v D = Failing DIMM number within failing CPU number P.
v D = 9 = Both DIMMs in the failing bank/pair of DIMMs.
Action:
1. Reseat the failing DIMM; then, run the memory diagnostic test.
2. Replace the failing DIMM or DIMMs; then, run the memory diagnostic test.
3. (Trained service technician only) Replace the system board assembly.
Error code: 202-xxx-001
Description: Failed system cache test.
Action:
1. Reseat microprocessor 1.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
   a. (Trained service technician only) Microprocessor 1
   b. (Trained service technician only) System board assembly
Error code: 202-xxx-002
Description: Failed system cache test.
Action:
1. Reseat microprocessor 2.
2. Replace the following components one at a time, in the order shown, restarting the server each time.
   a. (Trained service technician only) Microprocessor 2
   b. (Trained service technician only) System board assembly
Error code: 217-198-xxx
Description: Could not establish drive parameters.
Action:
1. Reseat hard disk drive x.
2. Flash the BIOS code.
3. Replace the following components one at a time, in the order shown, restarting the server each time.
   a. Hard disk drive x
   b. (Trained service technician only) System board assembly
Error code: 217-xxx-000
Description: Failed hard disk drive test. Note: If RAID is configured, the fixed disk number refers to the RAID logical drive.
Action:
1. Reseat hard disk drive 1.
2. Replace hard disk drive 1.
Error code: 217-xxx-001
Description: Failed hard disk drive test. Note: If RAID is configured, the fixed disk number refers to the RAID logical drive.
Action:
1. Reseat hard disk drive 2.
2. Replace hard disk drive 2.
Error code: 405-xxx-000
Description: Failed Ethernet test on controller on the system board.
Action:
1. Make sure that Ethernet is not disabled in the Configuration/Setup Utility program.
2. Update the Ethernet firmware (see “Configuring the Gigabit Ethernet controllers” on page 88).
3. (Trained service technician only) Replace the system board assembly.
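The P, B, and D positions of a 201-xxx-PBD memory error code, as described in the table above, can be decoded mechanically. The following Python sketch is an illustration under stated assumptions, not the logic of the diagnostic programs: the names `MemoryFailure` and `decode_pbd` are hypothetical, and the sketch assumes that each of P, B, and D is a single decimal digit.

```python
from typing import NamedTuple, Optional


class MemoryFailure(NamedTuple):
    cpu: int              # P: CPU number, 1 to 4
    bank: int             # B: failing bank/pair number
    both_dimms: bool      # True when D is 9 (both DIMMs in the bank/pair)
    dimm: Optional[int]   # D: failing DIMM number, or None when D is 9


def decode_pbd(error_code):
    """Decode the last field of a 201-xxx-PBD memory error code."""
    fields = error_code.strip().split("-")
    if len(fields) != 3 or fields[0] != "201":
        raise ValueError(f"not a 201-xxx-PBD code: {error_code!r}")
    if fields[1] in ("198", "199"):
        # 201-198-xxx and 201-199-xxx are aborted tests; their last
        # field is not a PBD value.
        raise ValueError(f"aborted-test code, no PBD field: {error_code!r}")
    pbd = fields[2]
    if len(pbd) != 3 or not (pbd.isascii() and pbd.isdigit()):
        raise ValueError(f"PBD field must be three digits: {error_code!r}")
    p, b, d = (int(ch) for ch in pbd)
    if not 1 <= p <= 4:
        raise ValueError(f"CPU number out of range 1-4: {p}")
    both = d == 9
    return MemoryFailure(cpu=p, bank=b, both_dimms=both,
                         dimm=None if both else d)
```

For example, `decode_pbd("201-123-129")` reports CPU 1, bank/pair 2, with both DIMMs in that pair marked as failing. The central value 123 is a placeholder chosen for the example.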
Recovering from a BIOS update failure
The server has an advanced recovery feature that will automatically switch to a backup BIOS page if the BIOS code in the server has become damaged, such as from a power failure during an update.
The flash memory of the blade server consists of a primary page and a backup page. If the BIOS code in the primary page is damaged, the baseboard management controller will detect the error and automatically switch to the backup page to start the server. If this happens, a POST message Booted from backup POST/BIOS image is displayed. The backup page version may not be the same version as the primary image.
You can then recover or restore the original primary page BIOS by using a BIOS flash diskette.
To recover the BIOS code and restore the server operation to the primary page, complete the following steps:
1. Download the latest version of the BIOS code from http://www.ibm.com/pc/support/.
2. Update the BIOS code, following the instructions that come with the update file that you downloaded. This will automatically restore and update the primary page.
3. Restart the blade server.
If that procedure fails, the blade server might not restart correctly or might not display video. To manually restore the BIOS code, complete the following steps:
1. Read the safety information that begins on page v and “Handling static-sensitive devices” on page 60.
2. Turn off the blade server.
3. Remove the blade server from the BladeCenter unit (see “Removing and installing the blade server in a BladeCenter unit” on page 61).
4. Remove the cover (see “Operating the blade server cover” on page 64).
5. Locate the BIOS backup page jumper (J16) on the system board (see “System-board jumpers” on page 7).
6. Move the J16 jumper to pins 3 and 4 to enable the backup page.
7. Replace the cover and reinstall the blade server in the BladeCenter unit, making sure that the media tray is selected by the relevant blade server.
8. Insert the BIOS flash diskette into the diskette drive.
9. Restart the blade server. The system begins the power-on self-test (POST).
10. Select 1 - Update POST/BIOS from the menu that contains various flash (update) options.
11. When you are prompted whether you want to move the current POST/BIOS
image to the backup ROM location, press N.
Attention: If you press Y, the damaged BIOS will be copied into the secondary page.
12. When you are prompted whether you want to save the current code to a diskette, press N.
13. Select Update the BIOS.
Attention: Do not restart the blade server at this time.
14. Remove the flash diskette from the diskette drive.
15. Turn off the blade server, remove it from the BladeCenter unit, and remove the cover of the blade server.
16. Move the J16 jumper to pins 1 and 2 to return to normal startup mode.
17. Replace the cover and reinstall the blade server in the BladeCenter unit; then restart the blade server. The system starts up.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
Service processor (BMC) error codes
The baseboard management controller (BMC) log contains up to 512 of the most recent service processor errors in IPMI format. These messages are a combination of plain text and error code numbers. You can view the BMC log from the Configuration/Setup Utility menu by selecting Advanced Setup > Baseboard Management Controller (BMC) Settings > BMC System Event Log.
You can view additional information and error codes in plain text by viewing the management-module event log in your BladeCenter unit.
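The "up to 512 of the most recent" behavior of the BMC log resembles a fixed-size buffer that discards the oldest entry when full. The following Python sketch illustrates that idea only; it assumes oldest-first eviction, which the text implies but does not state, and the entry strings are placeholders, not IPMI records.

```python
from collections import deque

BMC_LOG_CAPACITY = 512  # maximum number of entries, from the text above


def new_event_log():
    """Return an empty log that keeps only the most recent entries."""
    return deque(maxlen=BMC_LOG_CAPACITY)


# Placeholder entries; a real BMC log stores IPMI-format records.
log = new_event_log()
for n in range(600):
    log.append(f"event {n}")

oldest_kept = log[0]    # the first 88 events have been discarded
newest_kept = log[-1]
```

Because older entries are lost once the log is full, the management-module event log can be a useful second source when an error occurred long before the log is inspected.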
Solving SCSI problems
For any SCSI error message, one or more of the following devices might be causing the problem:
v A failing SCSI device (adapter, drive, or controller)
v An improper SCSI configuration
v Duplicate SCSI IDs in the same SCSI chain
For any SCSI error message, make sure that the SCSI devices are configured correctly.
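Checking for duplicate SCSI IDs in one chain amounts to counting how often each ID is used. The following Python sketch shows that check on hand-entered data; the function name and the device names are hypothetical, and the sketch does not query any hardware.

```python
from collections import Counter


def duplicate_scsi_ids(chain):
    """Return the sorted SCSI IDs that appear more than once.

    chain is an iterable of (device_name, scsi_id) pairs that describe
    the devices on a single SCSI chain.
    """
    counts = Counter(scsi_id for _name, scsi_id in chain)
    return sorted(scsi_id for scsi_id, n in counts.items() if n > 1)


# Hypothetical chain in which two drives were both set to ID 0.
example_chain = [("drive A", 0), ("drive B", 1), ("drive C", 0)]
conflicts = duplicate_scsi_ids(example_chain)
```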
Solving undetermined problems
Note: When you are diagnosing a problem in the LS20 Type 8850, you must
determine whether the problem is in the blade server or in the BladeCenter unit.
v If all of the blade servers have the same symptom, it is probably a BladeCenter
unit problem; for more information, see the Hardware Maintenance Manual and Troubleshooting Guide or Problem Determination and Service Guide for your BladeCenter unit type.
v If the BladeCenter unit contains more than one blade server and only one of the
blade servers has the problem, troubleshoot the blade server that has the problem.
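The two rules in the note above can be written as a small decision function. The following Python sketch is illustrative only; the function name and return strings are hypothetical, and the handling of cases that the note does not cover (a single installed blade server, or several but not all blade servers affected) is an assumption.

```python
from typing import Mapping


def locate_problem(symptom_by_bay: Mapping[int, bool]) -> str:
    """Suggest where to troubleshoot first.

    symptom_by_bay maps a blade bay number to True when the blade
    server in that bay shows the symptom.
    """
    total = len(symptom_by_bay)
    affected = sum(1 for has_symptom in symptom_by_bay.values() if has_symptom)
    if affected == 0:
        return "none"
    if total > 1 and affected == total:
        # All blade servers have the same symptom.
        return "bladecenter-unit"
    if total > 1 and affected == 1:
        # More than one blade server, only one has the problem.
        return "blade-server"
    # Assumption: the note gives no rule for the remaining cases.
    return "undetermined"
```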
If the diagnostic tests did not diagnose the failure or if the blade server is inoperative, use the information in this section.
If you suspect that a software problem is causing failures (continuous or intermittent), see “Software problems” on page 39.
Damaged data in CMOS memory or damaged BIOS code can cause undetermined problems. To reset the CMOS data, remove and replace the battery to override the power-on password and clear the CMOS memory; see “Battery” on page 76. If you suspect that the BIOS code is damaged, see “Recovering from a BIOS update failure” on page 49.
Check the LEDs on all the power supplies of the BladeCenter unit where the blade server is installed. If the LEDs indicate that the power supplies are working correctly, and reseating the blade server does not correct the problem, complete the following steps:
1. Make sure that the control panel connector is correctly seated on the system board (see “System-board connectors” on page 7 for the location of the connector).
2. If no LEDs on the control panel are working, replace the bezel assembly; then, try to power-on the blade server from the BladeCenter Web interface (see the BladeCenter documentation for more information).
3. Turn off the blade server.
4. Remove the blade server from the BladeCenter unit and remove the cover.
5. Make sure that the server is cabled correctly.
6. Remove or disconnect the following devices, one at a time, until you find the failure. Reinstall, turn on, and reconfigure the blade server each time.
   v I/O adapter.
   v Drives.
   v Memory modules. The minimum configuration requirement is 1 GB (two 512 MB DIMMs).
   The following minimum configuration is required for the blade server to start:
   v System board
   v One microprocessor
   v Two 512 MB DIMMs
   v A functioning BladeCenter unit
7. Install and turn on the blade server. If the problem remains, suspect the following components in the following order:
   a. DIMM
   b. System board
   c. Microprocessor
If the problem is solved when you remove an I/O adapter from the server but the problem recurs when you reinstall the same adapter, suspect the adapter; if the problem recurs when you replace the adapter with a different one, suspect the system board.
If you suspect a networking problem and the blade server passes all the system tests, suspect a network cabling problem that is external to the system.
Calling IBM for service
See Appendix A, “Getting help and technical assistance,” on page 91 for information about calling IBM for service.
When you call for service, have as much of the following information available as possible:
v Machine type and model
v Microprocessor and hard disk drive upgrades
v Failure symptoms
– Does the server fail the diagnostic programs? If so, what are the error codes?
– What occurs? When? Where?
– Is the failure repeatable?
– Has the current server configuration ever worked?
– What changes, if any, were made before it failed?
– Is this the original reported failure, or has this failure been reported before?
v Diagnostic program type and version level
v Hardware configuration (print screen of the system summary)
v BIOS code level
v Operating-system type and version level
You can solve some problems by comparing the configuration and software setups between working and nonworking servers. When you compare servers to each other for diagnostic purposes, consider them identical only if all the following factors are exactly the same in all the servers:
v Machine type and model
v BIOS level
v Adapters and attachments, in the same locations
v Address jumpers, terminators, and cabling
v Software versions and levels
v Diagnostic program type and version level
v Configuration option settings
v Operating-system control-file setup
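The comparison described above can be sketched as a field-by-field check. In this Python sketch, each server is represented by a dictionary; the key names are hypothetical labels for the factors in the list and do not come from any IBM utility. A factor that is missing from either record is treated as a difference, because the servers count as identical only if every factor is exactly the same.

```python
# Hypothetical keys, one per factor in the list above.
FACTORS = (
    "machine_type_model",
    "bios_level",
    "adapters_and_locations",
    "jumpers_terminators_cabling",
    "software_versions",
    "diagnostic_program_version",
    "configuration_option_settings",
    "os_control_file_setup",
)


def differing_factors(a: dict, b: dict) -> list:
    """Return the factors that are missing or not exactly equal."""
    return [f for f in FACTORS
            if f not in a or f not in b or a[f] != b[f]]


def are_identical(servers: list) -> bool:
    """True only if every server matches the first one on every factor."""
    first = servers[0]
    return all(not differing_factors(first, s) for s in servers[1:])


working = {f: "same" for f in FACTORS}
failing = dict(working, bios_level="older level")
print(differing_factors(working, failing))
```

Listing the differing factors, not just a yes or no answer, shows where a working and a nonworking server diverge.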
Chapter 3. Parts listing, Type 8850
The following replaceable components are available for the LS20 Type 8850 blade server. To check for an updated parts listing on the Web, complete the following steps:
1. Go to http://www.ibm.com/support/
2. Under Search technical support, type 8850 and click Search.
3. Under Document type, select Parts information and click Go.
Note: The illustrations in this document might differ slightly from your hardware.
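[Illustration: blade server parts, with callouts 1 through 8 that correspond to the Index column in the parts table]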
Server replaceable units
Replaceable components are of three types:
v Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
v Tier 2 customer replaceable unit: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
For information about the terms of the warranty and getting service and assistance, see the Warranty and Support Information document.
Index Description Part number (CRU Tier 1, CRU Tier 2, or FRU No.)
1 Cover and label (all models) 31R3404
2 Hard disk drive, 36 GB SCSI (option) 90P1315
2 Hard disk drive, 36 GB SCSI (option) 39R7336
2 Hard disk drive, 73 GB SCSI (option) 90P1316
2 Hard disk drive, 73 GB SCSI (option) 39R7338
3 Heat sink, microprocessor (all models) 31R3399
4 Microprocessor with heat sink, Opteron 246 2.0-1 MB (model 51x, CTO) 31R3444 (FRU)
4 Microprocessor with heat sink, Opteron 250 2.4-1 MB (model 71x, 72x, CTO) 31R3394 (FRU)
4 Microprocessor with heat sink, Opteron 252, 2.6-1 MB (model 81x, 82x, CTO) 42C4458 (FRU)
4 Microprocessor with heat sink, Opteron 270 2.0-1 MB, Dual Core (model 55x, 56x, CTO) 39M4804 (FRU)
4 Microprocessor with heat sink, Opteron 275 2.2-1 MB, Dual Core (models 65x, 66x, CTO) 25R8447 (FRU)
4 Microprocessor with heat sink, Opteron 280 2.4-2 MB, Dual Core (models 76x, CTO) 25R8565 (FRU)
4 Microprocessor with heat sink, Opteron 254 2.8-1 MB (models 92x, CTO) 42C4460 (FRU)
5 Memory, VLP, 512 MB PC3200 (models 51x, 55x, 65x, 71x, CTO) 73P5124 (Tier 1 CRU)
5 Memory, VLP, 1 GB PC3200 (model 81x, CTO) 73P5125
5 Memory, VLP, 512 MB PC3200CL3 (models 56x, 66x, 76x, CTO) 39M5845 (Tier 1 CRU)
5 Memory, VLP, 1 GB PC3200CL3 (model 82x, 92x, CTO) 39M5848
5 Memory, VLP, 2 GB PC3200 (option) 73P5126
5 Memory, VLP, 2 GB PC3200 (option) 39M5851
5 Memory, VLP, 4 GB PC3200 (option) 43X0616
6 Bezel assembly (all models) 40K6247
7 Filler, microprocessor heat sink (all models) 31R3407
Index Description Part number (CRU Tier 1, CRU Tier 2, or FRU No.)
8 System board assembly (all models) 46C7433
Alcohol wipes (all models) 59P4739
Base, with insulator (all models) 31R3403
Battery, 3.0 volt (all models) 33F8354
Ethernet expansion card (option) 13N2306
Fibre channel expansion card (option) 26R0836
Fibre channel expansion card, SFF (option) 26K4859
Gigabit Ethernet expansion card (option) 46M5963
Grease, heat sink (all models) 41Y9292
Kit, miscellaneous parts (all models) 26K5963 (Tier 1 CRU)
v Screw, M3.5 x 6 Phillips pan head, system board
v Light pipe, hard disk drive LEDs
v Screw, M3 x 3 Phillips (large head), mobile hard disk drive tray
v Screw, 4-40 x 4.76 mm flat head
v Guide, blade top edge
v Socket, alignment
v Pin, alignment
v Flyer, ID stickers with instructions
v Assembly, system board lightbox with transparency
v Cover, connector plug
v Tray, daughter card mount, new form factor
Label, FRU list (all models) 31R3401
Label, FRU list (all models) 40K6250
Label, system service (all models) 31R3400
Retention module, heat sink 31R3408
Tray, expansion card (all models) 26K5969
Tray, SCSI hard disk drive 26K5970
Product recovery CDs
Table 1. Recovery CDs
Description CRU part number
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, U.S. English 44W4046
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, French 44W4047
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, Italian 44W4048
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, German 44W4049
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, Spanish 44W4050
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, Traditional Chinese 44W4051
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, Japanese 44W4052
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, Simplified Chinese 44W4053
Microsoft Windows Server 2003 R2 Standard Edition 32-bit w/SP2 1-4 microprocessors, Korean 44W4054
Microsoft Windows Server 2003 R2 Standard Edition 64-bit w/SP2 1-4 microprocessors, U.S. English 44W4055
Microsoft Windows Server 2003 R2 Standard Edition 64-bit w/SP2 1-4 microprocessors, Japanese 44W4056
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, U.S. English 44W4057
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, French 44W4058
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, German 44W4059
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, Spanish 44W4060
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, Simplified Chinese 44W4061
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, Traditional Chinese 44W4062
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, Japanese 44W4063
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-2 microprocessors, Korean 44W4064
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, U.S. English 44W4065
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, French 44W4066
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, Italian 44W4067
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, German 44W4068
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, Spanish 44W4069
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, Simplified Chinese 44W4070
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, Traditional Chinese 44W4071
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, Japanese 44W4072
Microsoft Windows Server 2003 R2 Enterprise Edition 32-bit w/SP2 1-8 microprocessors, Korean 44W4073
Microsoft Windows Server 2003 R2 Enterprise Edition 64-bit w/SP2 1-2 microprocessors, U.S. English 44W4074
Microsoft Windows Server 2003 R2 Enterprise Edition 64-bit w/SP2 1-2 microprocessors, Japanese 44W4075
Microsoft Windows Server 2003 R2 Enterprise Edition 64-bit w/SP2 1-2 microprocessors, Italian 44W4078
Microsoft Windows Server Enterprise 32 Embedded Software Package 68Y9467
Chapter 4. Removing and replacing blade server components
Replaceable components are of three types:
v Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
v Tier 2 customer replaceable unit: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
See Chapter 3, “Parts listing, Type 8850,” on page 53 to determine whether a component is a Tier 1 CRU, Tier 2 CRU, or FRU.
For information about the terms of the warranty and getting service and assistance, see the Warranty and Support Information document.
Installation guidelines
Before you install options, read the following information:
v Read the safety information that begins on page v and the guidelines in “Handling static-sensitive devices” on page 60. This information will help you work safely.
v Observe good housekeeping in the area where you are working. Place removed covers and other parts in a safe place.
v Back up all important data before you make changes to disk drives.
v Before you remove a hot-swap blade server from the BladeCenter® unit, you must shut down the operating system and turn off the blade server. You do not have to shut down the BladeCenter unit itself.
v Blue on a component indicates touch points, where you can grip the component to remove it from or install it in the blade server, open or close a latch, and so on.
v Orange on a component or an orange label on or near a component indicates that the component can be hot-swapped, which means that if the server and operating system support hot-swap capability, you can remove or install the component while the server is running. (Orange can also indicate touch points on hot-swap components.) See the instructions for removing or installing a specific hot-swap component for any additional procedures that you might have to perform before you remove or install the component.
v For a list of supported options for the blade server, see http://www.ibm.com/pc/us/compat/.
System reliability guidelines
To help ensure proper cooling and system reliability, observe the following guidelines:
v Make sure that microprocessor socket 2 always contains either a microprocessor heat sink filler or a microprocessor and heat sink. If the blade server has only one microprocessor, it must be installed in microprocessor socket 1.
v To maintain proper system cooling, do not operate the BladeCenter unit without a blade server, expansion unit, or filler blade installed in each blade bay. See the documentation for your BladeCenter unit type for additional information.
Handling static-sensitive devices
Attention: Static electricity can damage the blade server and other electronic devices. To avoid damage, keep static-sensitive devices in their static-protective packages until you are ready to install them.
To reduce the possibility of damage from electrostatic discharge, observe the following precautions:
v When working on the BladeCenter T unit, use an electrostatic discharge (ESD) wrist strap, especially when you will be handling modules, options, and blade servers. To work properly, the wrist strap must have a good contact at both ends (touching your skin at one end and firmly connected to the ESD connector on the front or back of the BladeCenter T unit).
v Limit your movement. Movement can cause static electricity to build up around you.
v Handle the device carefully, holding it by its edges or its frame.
v Do not touch solder joints, pins, or exposed circuitry.
v Do not leave the device where others can handle and damage it.
v While the device is still in its static-protective package, touch it for at least 2 seconds to an unpainted metal part of the BladeCenter unit or to any unpainted metal surface on any other grounded rack component in the rack in which you are installing the device. This drains static electricity from the package and from your body.
v Remove the device from its package and install it directly into the blade server without setting down the device. If it is necessary to set down the device, put it back into its static-protective package. Do not place the device on the blade server cover or on a metal surface.
v Take additional care when handling devices during cold weather. Heating reduces indoor humidity and increases static electricity.
Returning a device or component
If you are instructed to return a device or component, follow all packaging instructions, and use any packaging materials for shipping that are supplied to you.
Removing and installing the blade server in a BladeCenter unit
Attention:
v To maintain proper system cooling, do not operate the BladeCenter unit without a blade server, expansion unit, or filler blade installed in each blade bay.
v Note the bay number. Reinstalling a blade server into a different bay than the one from which it was removed could have unintended consequences. Some configuration information and update options are established according to bay number; if you reinstall the blade server into a different bay, you might need to reconfigure the blade server.
To remove the blade server from a BladeCenter unit, complete the following steps.
1. If the blade server is operating, shut down the operating system; then, press the power-control button (behind the blade server control panel door) to turn off the blade server (see “Turning off the blade server” on page 6 for more information).
Attention: Wait at least 30 seconds, until the hard disk drives stop spinning, before proceeding to the next step.
2. Open the two release levers as shown in the illustration. The blade server moves out of the bay approximately 0.6 cm (0.25 inch).
3. Pull the blade server out of the bay. Spring-loaded doors further back in the bay move into place to cover the bay temporarily.
4. Place either a filler blade or another blade server in the bay within 1 minute. The recessed spring-loaded doors will move out of the way as you insert the blade or filler blade.
To install a blade server in a BladeCenter unit, complete the following steps.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
1. Read the safety information that begins on page v and “Installation guidelines” on page 59 through “Handling static-sensitive devices” on page 60.
2. If you have not done so already, install any options that you want, such as SCSI drives or memory, in the blade server.
3. Make sure that the release levers on the blade server are in the open position (perpendicular to the blade server).
4. If you installed a filler blade or another blade in the bay from which you removed the blade server, remove it from the bay.
Attention: You must install the blade server in the same blade bay from which you removed it. Some blade server configuration information and update options are established according to bay number. Reinstalling a blade server into a different blade bay from the one from which it was removed could have unintended consequences, and you might have to reconfigure the blade server.
5. Slide the blade server into the blade bay from which you removed it until it stops. The spring-loaded doors farther back in the bay that cover the bay opening move out of the way as you insert the blade server.
Note: When installing any blade server or option in blade bays 7 through 14 (in a BladeCenter unit) or in blade bays 5 through 8 (in a BladeCenter T unit), power modules must be present in all four power-module bays.
6. Push the release levers on the front of the blade server closed.
7. Turn on the blade server (see “Turning on the blade server” on page 6 for instructions).
8. Make sure that the power-on LED on the blade control panel is lit continuously, indicating that the blade server is receiving power and is turned on.
9. (Optional) Write identifying information on one of the user labels that come with the blade servers and place the label on the BladeCenter unit bezel.
Important: Do not place the label on the blade server or in any way block the ventilation holes on the blade server (see the Installation and User’s Guide for information about the label placement).
10. If you have other blade servers to install, do so now.
Note: Reinstall the bezel assembly on the BladeCenter T unit after you have finished installing the blade servers (see the BladeCenter T Types 8720 and 8730 Installation and User’s Guide for detailed instructions for reinstalling the bezel assembly).
If you have changed the configuration of the blade server, or this is a different blade server than the one you removed, you must configure the blade server with the Configuration/Setup Utility and you might have to install the blade server operating system. Detailed information about these tasks is available in the Installation and User’s Guide.
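The power-module note in step 5 of the installation procedure is a simple range check on the bay number. The following Python sketch encodes only what that note states; the dictionary, the function name, and the chassis labels used as keys are assumptions made for this example.

```python
# Blade bays for which the note requires power modules in all four
# power-module bays, keyed by chassis type (labels are illustrative).
FOUR_MODULE_BAYS = {
    "BladeCenter": range(7, 15),    # blade bays 7 through 14
    "BladeCenter T": range(5, 9),   # blade bays 5 through 8
}


def needs_all_four_power_modules(chassis: str, bay: int) -> bool:
    """True if the target bay requires all four power modules."""
    if chassis not in FOUR_MODULE_BAYS:
        raise ValueError(f"unknown chassis type: {chassis!r}")
    return bay in FOUR_MODULE_BAYS[chassis]


print(needs_all_four_power_modules("BladeCenter", 7))
```

The sketch says nothing about the remaining bays, because the note does not state a requirement for them; see the documentation for your BladeCenter unit type.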
Operating the blade server cover
To open the blade server cover, complete the following steps.
[Illustration callout: blade-cover release]
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. If the blade server is installed in a BladeCenter unit, remove it (see “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions).
3. Carefully lay the blade server down on a flat, non-conductive surface, with the cover side up.
4. Press the blade-cover release on each side of the blade server and lift the cover open.
5. Lay the cover flat, or lift it from the blade server and store it for future use.
To close the blade server cover, complete the following steps.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
[Illustration callout: blade-cover release]
Important: The blade server cannot be inserted into the BladeCenter unit until the cover is installed and closed. Do not attempt to override this protection.
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. If you removed the blade server bezel assembly, replace it now (see “Removing and replacing the bezel assembly” on page 66 for instructions).
3. Lower the cover so that the slots at the rear slide down onto the pins at the rear of the blade server. Before closing the cover, check that all components are installed and seated correctly and that you have not left loose tools or parts inside the blade server.
4. Pivot the cover to the closed position until it clicks into place.
5. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
Removing and replacing the bezel assembly
To remove the bezel assembly, complete the following steps.
[Illustration callouts: bezel-assembly release, control-panel connector, control-panel cable, bezel assembly]
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. Open the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
3. Press the bezel-assembly release on each side of the blade server and pull the bezel assembly away from the blade server approximately 1.2 cm (0.5 inch).
4. Disconnect the control-panel cable from the control-panel connector.
5. Pull the bezel assembly away from the blade server.
6. Store the bezel assembly in a safe place.
To install the bezel assembly, complete the following steps.
[Illustration callouts: bezel-assembly release, control-panel connector, control-panel cable, bezel assembly]
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. Connect the control-panel cable to the control-panel connector on the system board.
3. Carefully slide the bezel assembly onto the blade server until it clicks into place.
Removing and replacing Tier 1 CRUs
Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
The illustrations in this document might differ slightly from your hardware.
SCSI hard disk drive
To remove a SCSI hard disk drive, complete the following steps:
[Illustration callouts: hard drive release lever, SCSI ID 1, SCSI ID 0]
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. Shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
3. Carefully lay the blade server on a flat, non-conductive surface.
4. Remove the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
5. Locate the hard disk drive to be removed (SCSI ID 0 or SCSI ID 1).
6. While pulling the blue release lever at the front of the hard disk drive tray, slide the drive out of the SCSI connector and disengage it from the drive tray.
To install a replacement SCSI hard disk drive, complete the following steps.
[Illustration callouts: hard drive release lever, SCSI ID 1, SCSI ID 0]
1. Identify the location (SCSI ID 0 or SCSI ID 1) in which the hard disk drive will be installed.
Attention: Do not press on the top of the drive. Pressing the top could damage the drive.
2. Place the drive into the hard disk drive tray and push it toward the rear of the drive, into the connector until the drive moves past the lever at the front of the tray.
3. Install the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
4. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
Memory modules (DIMMs)
The following notes describe the types of dual inline memory modules (DIMMs) that the blade server supports and other information that you must consider when installing DIMMs:
v The system board contains four DIMM connectors and supports two-way memory interleaving.
v As of the date of this publication, the DIMM options that are available for the blade server are 512 MB, 1 GB, and 2 GB. Depending on the memory configuration set in blade server BIOS, as of the date of this publication, the blade server can support a minimum of 1 GB and a maximum of 8 GB of system memory.
v For optimum performance, one pair of DIMMs is associated with each microprocessor. The blade server comes with two DIMMs installed for each microprocessor.
Pair              DIMM connectors
Microprocessor 1  1 (J31) and 2 (J30)
Microprocessor 2  3 (J4) and 4 (J2)
v One pair of DIMMs can be used when two microprocessors are installed, but the blade server will operate at a lower level of efficiency. When one microprocessor is installed, you can install DIMMs only in the DIMM connectors that are associated with that microprocessor.
v When you install memory, you must install a pair of matched DIMMs.
v Both DIMMs in a pair must be the same size, speed, type, technology, and physical design. You can mix compatible DIMMs from different manufacturers.
v The second pair does not have to be DIMMs of the same size, speed, type, technology, and physical design as the first pair.
v Install only 2.6 V, 184-pin, DDR1, PC3200, VLP, registered SDRAM with ECC DIMMs. For a current list of supported DIMMs for the blade server, see the ServerProven list at http://www.ibm.com/pc/us/compat/.
v Installing or removing DIMMs changes the configuration information for the blade server. After installing or removing a DIMM, you must change and save the new configuration information by using the Configuration/Setup Utility program. When you restart the blade server, it displays a message indicating that the memory configuration has changed. Start the Configuration/Setup Utility program and select Save Settings. See “Configuration/Setup Utility menu choices” on page 87 for more information.
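The population rules in the preceding notes can be checked mechanically before you open the blade server. The following Python sketch is a planning aid under stated assumptions, not an IBM utility: the function and constant names are hypothetical, DIMMs are described only by their size in MB (speed, type, technology, and physical design are not modeled), and microprocessor 1 is assumed to be present whenever one microprocessor is installed.

```python
# Connector pairs per microprocessor, as listed in the notes above.
PAIRS = {1: (1, 2), 2: (3, 4)}
MIN_TOTAL_MB = 1024   # minimum of 1 GB of system memory
MAX_TOTAL_MB = 8192   # maximum of 8 GB of system memory


def check_dimms(sizes_mb: dict, microprocessors: int) -> list:
    """Return a list of rule violations; an empty list means no violation.

    sizes_mb maps a DIMM connector number (1-4) to a DIMM size in MB.
    Connectors that are absent from the mapping are treated as empty.
    """
    problems = []
    total = 0
    for cpu, (first, second) in PAIRS.items():
        a = sizes_mb.get(first)
        b = sizes_mb.get(second)
        if a is None and b is None:
            continue  # this pair of connectors is empty
        total += (a or 0) + (b or 0)
        if cpu > microprocessors:
            problems.append(
                f"connectors {first} and {second}: "
                f"microprocessor {cpu} is not installed")
        if a is None or b is None:
            problems.append(
                f"connectors {first} and {second}: install DIMMs as a pair")
        elif a != b:
            problems.append(
                f"connectors {first} and {second}: DIMM sizes do not match")
    if not MIN_TOTAL_MB <= total <= MAX_TOTAL_MB:
        problems.append(f"total memory {total} MB is outside 1 GB to 8 GB")
    return problems


print(check_dimms({1: 512, 2: 512}, microprocessors=1))
```

Each pair is checked independently, which reflects the note that the second pair does not have to match the first.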
To remove a memory module (DIMM), complete the following steps:
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. Shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
3. Carefully lay the blade server on a flat, non-conductive surface.
4. Remove the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
5. Locate the DIMM connectors on the system board. Determine the connector that contains the DIMM to be replaced.
[Illustration callouts: Microprocessor 1 with DIMM 1 (J31) and DIMM 2 (J30); Microprocessor 2 with DIMM 3 (J4) and DIMM 4 (J2)]
6. To remove a DIMM, repeat the following steps for each DIMM that you want to remove:
Attention: Opening both DIMM retaining clips at the same time will cause the DIMM to be ejected rapidly from the system board connector. This action can damage the DIMM and other system board components.
a. Carefully open the retaining clip on one end of the DIMM until it stops and the DIMM begins to rise out of the system board connector.
[Illustration callouts: DIMM, retaining clip]
b. While holding the top edge of the DIMM, carefully open the other retaining clip until it stops and the DIMM is free from the system board connector.
c. Carefully lift the DIMM out of the blade server and place it on a non-conductive surface.
To install a replacement memory module (DIMM), complete the following steps:
1. Read the documentation that comes with the DIMMs.
2. Locate the DIMM connectors on the system board. Determine the connector into which the DIMM will be installed.
[Illustration callouts: Microprocessor 1 with DIMM 1 (J31) and DIMM 2 (J30); Microprocessor 2 with DIMM 3 (J4) and DIMM 4 (J2)]
3. Touch the static-protective package that contains the DIMM option, for at least 2 seconds, to any unpainted metal surface on the BladeCenter unit or on any other grounded rack component in the rack in which you are installing the DIMM option; then, remove the DIMM from its package.
4. To install a DIMM, repeat the following steps for each DIMM that you install:
a. Make sure that both of the connector retaining clips are in the fully open position.
[Illustration callouts: DIMM, retaining clip]
b. Turn the DIMM so that the DIMM keys align correctly with the connector on the system board.
Attention: To avoid breaking the retaining clips or damaging the DIMM connectors, handle the clips gently.
c. Insert the DIMM by pressing the DIMM along the guides into the connector. Make sure that the retaining clips snap into the closed positions.
Important: If there is a gap between the DIMM and the retaining clips, the DIMM has not been correctly installed. In this case, open the retaining clips and remove the DIMM; then, reinsert the DIMM.
5. Install the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
6. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
I/O expansion card
The following sections describe how to remove and replace small-form-factor and standard-form-factor I/O expansion cards in the blade server.
Small-form-factor expansion card
To remove a small-form-factor expansion card, complete the following steps:
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. Shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
3. Carefully lay the blade server on a flat, non-conductive surface.
4. Open the cover (see “Operating the blade server cover” on page 64 for instructions).
[Illustration callouts: expansion card; label on the card: “PRESS HERE WHEN INSTALLING CARD”]
5. Gently pivot the wide end of the card out of the expansion card connectors; then, slide the notched end of the card out of the raised hook on the tray and lift the card out of the blade server.
To install a replacement small-form-factor expansion card, complete the following steps:
1. Install the small-form-factor expansion card, as shown in the following illustration.
[Illustration callouts: expansion card; label on the card: “PRESS HERE WHEN INSTALLING CARD”]
a. Orient the expansion card.
b. Slide the notch in the narrow end of the card into the raised hook on the tray; then, gently pivot the card into the expansion card connectors.
2. Install the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
3. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
Standard-form-factor expansion card
To remove a standard-form-factor expansion card, complete the following steps:
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. Shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
3. Carefully lay the blade server on a flat, non-conductive surface.
4. Open the cover (see “Operating the blade server cover” on page 64 for instructions).
[Illustration callouts: expansion card, expansion card tray; label on the card: “PRESS HERE WHEN INSTALLING CARD”]
5. Gently pivot the wide end of the card out of the expansion card connectors; then, slide the notched end of the card out of the raised hook on the tray and lift the card out of the blade server.
6. To remove the expansion card tray, remove the screws that secure it to the system board.
To install a replacement standard-form-factor expansion card, complete the following steps.
[Illustration callouts: expansion card, expansion card tray; label on the card: “PRESS HERE WHEN INSTALLING CARD”]
1. If the expansion card tray was removed, secure the tray to the system board with the screws from the option kit or from the removed drive tray.
2. Orient the expansion card and slide the notch in the narrow end of the card into the raised hook on the tray; then, gently pivot the wide end of the card into the expansion card connectors.
3. Install the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
4. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
Battery
Consider the following information when you replace the battery in the server:
v When replacing the battery, you must replace it with a lithium battery of the same type from the same manufacturer.
v To order replacement batteries, call 1-800-426-7378 within the United States, and 1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada, call your IBM marketing representative or authorized reseller.
v After you replace the battery, you must reconfigure the server and reset the system date and time.
v To avoid possible danger, read and follow the safety statement below.
Statement 2:
CAUTION: When replacing the lithium battery, use only IBM Part Number 33F8354 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
To remove the battery, complete the following steps:
1. Read the safety information that begins on page v and “Installation guidelines” on page 59.
2. Turn off the blade server and remove it from the BladeCenter unit (see “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions).
3. Open the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
4. Locate the battery on the system board.
[Figure: location of the battery (BH1) on the system board]
5. Remove the battery:
a. Use one fingernail to press the top of the battery clip away from the battery. The battery pops up when released.
b. Use your thumb and index finger to lift the battery from the socket.
To install the replacement battery, complete the following steps:
1. Follow any special handling and installation instructions that come with the battery.
2. Insert the replacement battery:
a. Tilt the battery so that you can insert it into the socket on the side opposite the battery clip.
b. Press the battery down into the socket until it clicks into place. Make sure that the battery clip holds the battery securely.
3. Install the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
4. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
5. Turn on the blade server and run the Configuration/Setup Utility program. Set configuration parameters as needed (see “Using the Configuration/Setup Utility program” on page 86 for information).
Removing and replacing FRUs
FRUs must be installed only by trained service technicians.
The illustrations in this document might differ slightly from your hardware.
Microprocessor
Read the following important guidelines before removing a microprocessor that is not faulty (for example, when replacing the system board assembly).
Attention: Do not use a thermal grease syringe with this FRU.
If you are not replacing a defective heat sink or microprocessor, the grease on the heat sink and microprocessor will remain effective if you perform the following steps:
1. Carefully handle the heat sink and microprocessor when removing or installing these components. Do not touch the grease or otherwise allow it to become contaminated.
2. For dual-microprocessor systems, since the microprocessor and the heat sink are a matched set, first transfer the heat sink and microprocessor from one socket to the new system board; then, transfer the other heat sink and microprocessor. (This will ensure that the grease remains evenly distributed between each heat sink and microprocessor.)
The following sections contain the instructions for removing and replacing a microprocessor.
Notes:
v The heat sink FRU is packaged with the thermal grease applied to the underside. This thermal grease is not available as a separate FRU. The heat sink must be replaced when new grease is required, such as when a defective microprocessor is replaced or if the grease is contaminated.
v If you need to install a new heat sink for any reason, first remove the thermal grease from the microprocessor with an alcohol pad before attaching the new heat sink.
v The microprocessor FRU for this system board includes a heat sink.
v A heat sink FRU can be ordered separately if the grease becomes contaminated.
To remove a microprocessor, complete the following steps:
1. Read the safety information that begins on page v, the “Installation guidelines” on page 59 and “Handling static-sensitive devices” on page 60.
2. Shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit (see “Removing and installing the blade server in a BladeCenter unit” on page 61).
3. Carefully lay the blade server on a flat, non-conductive surface.
4. Open the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
5. Remove the bezel assembly (see “Removing and replacing the bezel assembly” on page 66 for instructions).
6. Identify the microprocessor to be removed.
Note: If you are replacing a failed microprocessor, verify that you have selected the correct microprocessor for replacement (see “Light path diagnostics” on page 40).
[Figure: heat sink and microprocessor]
7. Remove the heat sink:
a. Loosen two captive screws on one side of the heat sink fully; then, loosen the other two captive screws.
Note: Loosening the screws on one side fully before loosening the other screws will help to break the thermal bond that adheres the heat sink to the microprocessor.
b. Gently pull the heat sink off of the microprocessor.
Attention: Do not use any tools or sharp objects to lift the locking lever on the microprocessor socket. Doing so might result in permanent damage to the system board.
8. Release the microprocessor locking lever by moving it away from the microprocessor socket and around the locking lever retainer tab; then, rotate it upward to the fully open position.
[Figure: microprocessor locking lever and locking lever retainer tab]
Attention:
v You must ensure that the locking lever on the microprocessor socket is in the fully open position before you remove the microprocessor from the microprocessor socket. Failure to do so might result in permanent damage to the microprocessor, microprocessor socket, and system board.
v Avoid touching the components and gold pins on the microprocessor.
[Figure: microprocessor, microprocessor locking lever, and microprocessor socket]
9. Pull the microprocessor out of the socket.
To install a replacement microprocessor, complete the following steps:
[Figure: heat sink and microprocessor]
Attention: Do not use any tools or sharp objects to lift the locking lever on the microprocessor socket. Doing so might result in permanent damage to the system board.
1. If the microprocessor locking lever is closed, release it by moving it away from the microprocessor socket and around the locking lever retainer tab; then, rotate it upward to the fully open position.
[Figure: microprocessor locking lever and locking lever retainer tab]
Attention:
v You must ensure that the locking lever on the microprocessor socket is in the fully open position before you install the microprocessor into the microprocessor socket. Failure to do so might result in permanent damage to the microprocessor, microprocessor socket, and system board.
v Avoid touching the components and gold pins on the microprocessor. Make sure that the microprocessor is completely and correctly seated in the socket. Incomplete insertion might cause damage to the system board or to the microprocessor.
[Figure: microprocessor locking lever and microprocessor socket]
2. Touch the static-protective package that contains the replacement microprocessor to any unpainted metal surface on the blade server, or on any other grounded component in the rack, for at least 2 seconds; then, remove the microprocessor from the package.
Attention: Do not use excessive force when pressing the microprocessor into the socket.
3. Center the microprocessor over the microprocessor socket. Align the triangle on the corner of the microprocessor with the triangle on the corner of the socket and make sure that the pin patterns of the microprocessor and microprocessor socket match; then, carefully press the microprocessor into the socket.
[Figure: microprocessor, microprocessor locking lever, microprocessor socket, and orientation indicator]
Attention: Make sure that the microprocessor is oriented and aligned correctly in the socket before you try to close the lever.
4. Carefully close the lever to secure the microprocessor in the socket.
Note: A new microprocessor comes in a kit with a heat sink.
5. Install a heat sink on the microprocessor.
Attention:
v Do not set down the heat sink after you remove the plastic cover.
v Do not touch the thermal grease on the bottom of the heat sink. Touching the thermal grease will contaminate it. If the thermal grease on the microprocessor or heat sink becomes contaminated, contact your service technician.
[Figure: heat sink with thermal grease on the underside]
a. Remove the plastic protective cover from the thermal material on the bottom of the heat sink.
b. Make sure that the thermal material is still on the bottom of the heat sink; then, align and place the heat sink on top of the microprocessor in the retention bracket, grease side down. Press firmly on the heat sink.
c. Align the four captive screws on the heat sink with the holes on the heat-sink retention module.
d. Press firmly on the captive screws and tighten them, alternating between screws until they are tight. Do not overtighten the screws by using excessive force. If you are using a torque wrench, tighten the screws to 1.00 to 1.26 Newton-meters (Nm) (8.85 to 11.15 inch-pounds).
Attention: If you need to remove the heat sink after installing it, note that the thermal material might have formed a strong bond between the heat sink and the microprocessor. Do not force the heat sink and microprocessor apart; doing so can damage the microprocessor pins. Loosening the captive screws on one side of the heat sink fully before loosening the captive screws on the other side helps break the bond between the components without damaging them.
6. Install the bezel assembly (see “Removing and replacing the bezel assembly” on page 66 for instructions).
7. Install the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
8. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
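The torque range given in step 5d for the heat-sink captive screws can be cross-checked with a simple unit conversion. The following Python sketch is illustrative only and is not part of any IBM tool; the function and constant names are assumptions made for this example, and the conversion factor is the standard value for newton-meters to inch-pounds.

```python
# Illustrative sketch: names below are hypothetical, not from IBM documentation.

# Standard conversion factor: 1 newton-meter expressed in inch-pounds (lbf-in).
NM_TO_INCH_POUNDS = 8.8507457916

# Torque range for the heat-sink captive screws, as stated in step 5d.
TORQUE_MIN_NM = 1.00
TORQUE_MAX_NM = 1.26


def nm_to_inch_pounds(torque_nm: float) -> float:
    """Convert a torque value from newton-meters to inch-pounds."""
    return torque_nm * NM_TO_INCH_POUNDS


def torque_in_range(torque_nm: float) -> bool:
    """Return True if a torque wrench setting falls inside the stated range."""
    return TORQUE_MIN_NM <= torque_nm <= TORQUE_MAX_NM


if __name__ == "__main__":
    for value in (TORQUE_MIN_NM, TORQUE_MAX_NM):
        print(f"{value:.2f} Nm = {nm_to_inch_pounds(value):.2f} inch-pounds")
```

Rounded to two decimal places, the converted limits agree with the 8.85 to 11.15 inch-pounds stated in the step.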
System board assembly
This section describes how to replace the system board assembly. When replacing the system board, you will replace the system board and blade base as one assembly. After replacement, you must either update the system with the latest firmware or restore the pre-existing firmware that the customer provides on a diskette or CD image.
Note: See “System-board layouts” on page 7 for more information on the locations of the connectors, jumpers and LEDs on the system board.
To remove the system board assembly, complete the following steps:
1. Read the safety information that begins on page v, the “Installation guidelines” on page 59 and “Handling static-sensitive devices” on page 60.
2. Shut down the operating system and turn off the blade server (see “Turning off the blade server” on page 6).
3. Remove the blade server from the BladeCenter (see “Removing and installing the blade server in a BladeCenter unit” on page 61).
4. Remove the blade server cover (see “Operating the blade server cover” on page 64).
5. Remove the blade server bezel assembly (see “Removing and replacing the bezel assembly” on page 66).
6. Remove any of the installed components listed below from the system board assembly; then, place them on a non-conductive surface or install them on the new system board assembly.
v I/O expansion card. See “I/O expansion card” on page 72.
v Hard disk drives. See “SCSI hard disk drive” on page 67.
v Microprocessors and heat sinks. See “Microprocessor” on page 78.
v DIMMs. See “Memory modules (DIMMs)” on page 69.
v Battery. See “Battery” on page 76.
To replace the system board assembly, complete the following steps:
1. Install any of the components listed below that were removed from the old system board assembly.
v I/O expansion card. See “I/O expansion card” on page 72.
v Hard disk drives. See “SCSI hard disk drive” on page 67.
v Microprocessors and heat sinks. See “Microprocessor” on page 78.
v DIMMs. See “Memory modules (DIMMs)” on page 69.
v Battery. See “Battery” on page 76.
2. Install the bezel assembly (see “Removing and replacing the bezel assembly” on page 66 for instructions).
3. Install the blade server cover (see “Operating the blade server cover” on page 64 for instructions).
4. Install the blade server into the BladeCenter unit. See “Removing and installing the blade server in a BladeCenter unit” on page 61 for instructions.
Chapter 5. Configuration information and instructions
This chapter provides information about updating the firmware and using the configuration utilities.
Updating the firmware
IBM will periodically make firmware updates available for the blade server. Use the following table to determine the methods you can use to install these firmware updates.
Important: To avoid problems and to maintain proper system performance, always ensure that the blade server BIOS, service processor, and diagnostic firmware levels are consistent for all blade servers within the BladeCenter unit.
Firmware                           Update    RDM      Update  Management-module  Switch-module  Switch-module
                                   diskette           Xpress  Web interface      Web interface  Telnet interface
BIOS code                          Yes       Yes      Yes     No                 No             No
Diagnostic code                    Yes       Yes (1)  Yes     No                 No             No
Service processor code (BMC code)  Yes       Yes      Yes     Yes                No             No

(1) User must set up a custom task.
At some time, you might have to update the service processor (BMC) to apply the latest firmware. Download the latest firmware for the service processor from the IBM Support Web site at http://www.ibm.com/pc/support/. Use the management-module Web interface to update the service processor (BMC) firmware. The Web interface is described in the IBM Eserver BladeCenter Management Module User’s Guide.
More information about updating the service processor is available in the Installation and User’s Guide.
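The recommendation to keep BIOS, service processor, and diagnostic firmware levels consistent across all blade servers in a BladeCenter unit can be illustrated with a small check. The following Python sketch assumes that the firmware levels have already been collected by hand (for example, from the management-module Web interface) into a dictionary; the data layout, blade names, level strings, and function name are all hypothetical and do not correspond to any IBM interface.

```python
# Illustrative sketch: the inventory format and all names are assumptions.
from collections import defaultdict

# The three firmware types named in the "Important" note.
FIRMWARE_TYPES = ("bios", "service_processor", "diagnostics")


def find_inconsistent_firmware(inventory):
    """Report firmware types that are not at one common level.

    inventory maps a blade name to a dict of firmware type -> level string.
    Returns {firmware_type: {level: [blade names]}} for each firmware type
    that appears at more than one level across the blades.
    """
    mismatches = {}
    for fw_type in FIRMWARE_TYPES:
        blades_by_level = defaultdict(list)
        for blade, levels in inventory.items():
            # A missing entry is grouped under "unknown" so it is still reported.
            blades_by_level[levels.get(fw_type, "unknown")].append(blade)
        if len(blades_by_level) > 1:
            mismatches[fw_type] = dict(blades_by_level)
    return mismatches


if __name__ == "__main__":
    # Hypothetical, manually recorded firmware levels.
    example_inventory = {
        "blade01": {"bios": "1.10", "service_processor": "1.05", "diagnostics": "1.02"},
        "blade02": {"bios": "1.08", "service_processor": "1.05", "diagnostics": "1.02"},
        "blade03": {"bios": "1.10", "service_processor": "1.05", "diagnostics": "1.02"},
    }
    for fw_type, levels in find_inconsistent_firmware(example_inventory).items():
        print(f"{fw_type} levels differ: {levels}")
```

An empty result means every blade in the recorded inventory reports the same level for each firmware type; any entry in the result identifies the blades that need an update.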
Configuring the blade server
The following configuration programs come with the blade server:
v Configuration/Setup Utility program
The Configuration/Setup Utility program is part of the basic input/output system (BIOS) code in the blade server. Use it to change system settings, such as interrupt requests (IRQ), date and time, and passwords. See “Using the Configuration/Setup Utility program” for more information.
v LSI Logic Configuration Utility program
The LSI Logic Configuration Utility program is part of the BIOS code in the blade server. Use it to set the device scan order and to set the SCSI controller IDs. For more information about this function, see the Installation and User’s Guide.
The IBM Remote Deployment Manager (RDM) Version 4.20 program is available for purchase. You can use IBM RDM Version 4.20 (or later) to install a BIOS code update onto a blade server by following the instructions in the documentation that comes with the RDM program. To determine if an operating system supports the RDM program or for updated information about RDM and information about purchasing the software, go to http://www.ibm.com/pc/ww/eserver/xseries/systems_management/index.html.
For information about setting up the network configuration for remote management, such as with the IBM Director products, see the IBM Eserver BladeCenter Planning and Installation Guide for your BladeCenter unit type. You can obtain the planning guides from http://www.ibm.com/pc/support/.
Using the Configuration/Setup Utility program
This section provides the instructions to start the Configuration/Setup Utility program and descriptions of the menu choices. Use the Configuration/Setup Utility program to:
v View configuration information
v View and change assignments for devices and I/O ports
v Set the date and time
v Set and change passwords
v Set the startup characteristics of the server and the order of startup devices
v Set and change settings for advanced hardware features
v View and clear error logs
v Change interrupt request (IRQ) settings
v Enable USB legacy keyboard and mouse support
v Resolve configuration conflicts