BladeCenter QS21 Type 0792
Problem Determination and Service Guide
Note
Before using this information and the product it supports, read the general information in Appendix C, “Notices,” on page 113
and the Warranty and Support Information on the Documentation CD.
Fifth Edition (September 2008)
© Copyright International Business Machines Corporation 2006, 2008.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . .1
Related documentation . . . . . . . . . . . . . . . . . . . . . .1
Notices and statements used in this document . . . . . . . . . . . . . .2
Features and specifications . . . . . . . . . . . . . . . . . . . . .2
Support for local storage . . . . . . . . . . . . . . . . . . . . . .3
Turning on the blade server . . . . . . . . . . . . . . . . . . . . .3
Turning off the blade server . . . . . . . . . . . . . . . . . . . . .4
Blade server controls and LEDs . . . . . . . . . . . . . . . . . . .6
System board LEDs . . . . . . . . . . . . . . . . . . . . . . .7
System board internal and expansion card connectors . . . . . . . . . . .8
Chapter 2. Configuring the blade server . . . . . . . . . . . . . . .9
Communicating with the blade server . . . . . . . . . . . . . . . . .9
Using the Advanced Management Module . . . . . . . . . . . . . .9
Using the Web interface . . . . . . . . . . . . . . . . . . .10
Using the command-line interface . . . . . . . . . . . . . . . .10
Using Serial over LAN . . . . . . . . . . . . . . . . . . . . .10
Using the serial interface . . . . . . . . . . . . . . . . . . . .10
Using the SMS utility program . . . . . . . . . . . . . . . . . .11
Starting SMS . . . . . . . . . . . . . . . . . . . . . . .11
Viewing FRU information . . . . . . . . . . . . . . . . . . .12
Adding FRU information . . . . . . . . . . . . . . . . . .13
Updating the system and BMC firmware . . . . . . . . . . . . . . .15
Updating steps . . . . . . . . . . . . . . . . . . . . . . . .16
Determining current blade server firmware levels . . . . . . . . . . .17
Updating the BMC firmware . . . . . . . . . . . . . . . . . . .18
Using the BMC update package . . . . . . . . . . . . . . . .18
Using the Advanced Management Module . . . . . . . . . . . . .18
Installing the system firmware . . . . . . . . . . . . . . . . . .20
The firmware update package . . . . . . . . . . . . . . . . . .21
Using the package . . . . . . . . . . . . . . . . . . . . .21
Updating the system firmware automatically . . . . . . . . . . . .22
Installing the firmware manually . . . . . . . . . . . . . . . . . .22
Updating the system firmware images . . . . . . . . . . . . . . .23
Updating the optional expansion card firmware . . . . . . . . . . . . .23
Integrating the Gigabit Ethernet controller into the BladeCenter . . . . . . .23
Updating the Ethernet controller firmware . . . . . . . . . . . . . . .24
Using the update package . . . . . . . . . . . . . . . . . . . .24
Firmware update steps . . . . . . . . . . . . . . . . . . . . .25
Blade server Ethernet controller enumeration . . . . . . . . . . . . . .26
Chapter 3. Parts listing . . . . . . . . . . . . . . . . . . . . .27
Replaceable components . . . . . . . . . . . . . . . . . . . . .27
Chapter 4. Installing and removing replaceable units . . . . . . . . .29
Installation guidelines . . . . . . . . . . . . . . . . . . . . . .29
System reliability guidelines . . . . . . . . . . . . . . . . . . .30
Handling static-sensitive devices . . . . . . . . . . . . . . . . .30
Removing the blade server from the BladeCenter unit . . . . . . . . . .31
Removing the blade server . . . . . . . . . . . . . . . . . . .31
Opening and removing the blade server cover . . . . . . . . . . . . .32
Removing the BladeCenter PCI Express I/O Expansion Unit . . . . . . . .32
Installing the optional InfiniBand card . . . . . . . . . . . . . . . . .33
Adding I/O DDR2 memory modules . . . . . . . . . . . . . . . . .36
Replacing DIMM fillers . . . . . . . . . . . . . . . . . . . . . .37
Installing the SAS expansion card . . . . . . . . . . . . . . . . . .38
Installing the BladeCenter PCI Express I/O Expansion Unit . . . . . . . .40
Removing the blade-server front bezel assembly . . . . . . . . . . . .41
Replacing the system board base and planar . . . . . . . . . . . . . .41
Replacing the battery . . . . . . . . . . . . . . . . . . . . . .42
Using the miscellaneous parts kit . . . . . . . . . . . . . . . . . .45
Replacing the ball studs . . . . . . . . . . . . . . . . . . . .45
Finishing the installation . . . . . . . . . . . . . . . . . . . . .47
Installing the front bezel assembly . . . . . . . . . . . . . . . . .47
Closing the blade server cover . . . . . . . . . . . . . . . . . .49
Input/output connectors and devices . . . . . . . . . . . . . . . . .49
Chapter 5. Diagnostics and troubleshooting . . . . . . . . . . . . .51
Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . .51
Basic checks . . . . . . . . . . . . . . . . . . . . . . . . .51
Finding troubleshooting information . . . . . . . . . . . . . . . . .52
Troubleshooting charts . . . . . . . . . . . . . . . . . . . . . .52
Problems indicated by the front panel LEDs . . . . . . . . . . . . .52
Problems indicated by the system board LEDS . . . . . . . . . . . .54
Power problems . . . . . . . . . . . . . . . . . . . . . . .57
Power throttling . . . . . . . . . . . . . . . . . . . . . . .57
Network connection problems . . . . . . . . . . . . . . . . . .57
Service processor problems . . . . . . . . . . . . . . . . . . .58
Software problems . . . . . . . . . . . . . . . . . . . . . .58
Recovering the system firmware code . . . . . . . . . . . . . . . .59
Checking the boot image . . . . . . . . . . . . . . . . . . . .59
Booting from the TEMP image . . . . . . . . . . . . . . . . . .59
Recovering the TEMP image from the PERM image . . . . . . . . . .59
Supported boot media . . . . . . . . . . . . . . . . . . . . . .59
Booting the system . . . . . . . . . . . . . . . . . . . . . . .60
Diagnostic programs and messages . . . . . . . . . . . . . . . . .62
Running diagnostics and preboot DSA . . . . . . . . . . . . . . .62
Diagnostic text messages . . . . . . . . . . . . . . . . . . . .63
Viewing the test log . . . . . . . . . . . . . . . . . . . . . .63
DSA error messages . . . . . . . . . . . . . . . . . . . . . .63
CPU test results . . . . . . . . . . . . . . . . . . . . . . .64
BMC test results . . . . . . . . . . . . . . . . . . . . . . .64
Memory tests . . . . . . . . . . . . . . . . . . . . . . . .70
System firmware startup messages . . . . . . . . . . . . . . . . .71
Boot errors and handling . . . . . . . . . . . . . . . . . . . . .72
Boot list . . . . . . . . . . . . . . . . . . . . . . . . . .72
System firmware update errors . . . . . . . . . . . . . . . . . .74
Memory initialization errors . . . . . . . . . . . . . . . . . . .75
USB errors . . . . . . . . . . . . . . . . . . . . . . . . .75
Network boot errors . . . . . . . . . . . . . . . . . . . . . .77
SAS boot errors . . . . . . . . . . . . . . . . . . . . . . .79
I/O DIMM boot-time errors . . . . . . . . . . . . . . . . . . . .86
Other error messages . . . . . . . . . . . . . . . . . . . . .88
BMC firmware messages . . . . . . . . . . . . . . . . . . . . .89
NMI error messages . . . . . . . . . . . . . . . . . . . . . .92
Problem reporting . . . . . . . . . . . . . . . . . . . . . . . .94
Problem description . . . . . . . . . . . . . . . . . . . . . . .94
Solving undetermined problems . . . . . . . . . . . . . . . . . . .95
Calling IBM for service . . . . . . . . . . . . . . . . . . . . . .96
Appendix A. Using the SMS utility . . . . . . . . . . . . . . . . .97
Starting the SMS utility . . . . . . . . . . . . . . . . . . . . . .97
The SMS utility menu . . . . . . . . . . . . . . . . . . . . . .97
Select Language . . . . . . . . . . . . . . . . . . . . . . .98
Setup Remote IPL (Initial Program Load) . . . . . . . . . . . . . .98
IP Parameters . . . . . . . . . . . . . . . . . . . . . . .99
Adapter Configuration . . . . . . . . . . . . . . . . . . . . . 100
Ping Test . . . . . . . . . . . . . . . . . . . . . . . . . 101
Advanced Setup: DHCP . . . . . . . . . . . . . . . . . . . . 101
Change SCSI Settings . . . . . . . . . . . . . . . . . . . . 101
Select Console . . . . . . . . . . . . . . . . . . . . . . . 101
Select Boot Options . . . . . . . . . . . . . . . . . . . . . 102
Firmware Boot Side Options . . . . . . . . . . . . . . . . . . 104
Progress Indicator History . . . . . . . . . . . . . . . . . . . 104
FRU information . . . . . . . . . . . . . . . . . . . . . . . 105
Adding FRU information . . . . . . . . . . . . . . . . . . . 106
SAS Settings . . . . . . . . . . . . . . . . . . . . . . . . 108
Appendix B. Getting help and technical assistance . . . . . . . . . . 111
Before you call . . . . . . . . . . . . . . . . . . . . . . . . 111
Using the documentation . . . . . . . . . . . . . . . . . . . . . 111
Getting help and information from the World Wide Web . . . . . . . . . 111
Software service and support . . . . . . . . . . . . . . . . . . .112
Hardware service and support . . . . . . . . . . . . . . . . . . .112
Appendix C. Notices . . . . . . . . . . . . . . . . . . . . . .113
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . .114
Important notes . . . . . . . . . . . . . . . . . . . . . . . .114
Product recycling and disposal . . . . . . . . . . . . . . . . . . .115
Battery return program . . . . . . . . . . . . . . . . . . . . . .116
Electronic emission notices . . . . . . . . . . . . . . . . . . . .117
Federal Communications Commission (FCC) statement . . . . . . . .117
Industry Canada Class A emission compliance statement . . . . . . . .118
Avis de conformité à la réglementation d’Industrie Canada . . . . . . .118
Australia and New Zealand Class A statement . . . . . . . . . . . .118
United Kingdom telecommunications safety requirement . . . . . . . .118
Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A
EU-Richtlinie zur Elektromagnetischen Verträglichkeit . . . . . . . .118
Deutschland: Einhaltung des Gesetzes über die elektromagnetische
Verträglichkeit von Geräten . . . . . . . . . . . . . . . . .118
Zulassungsbescheinigung laut dem Deutschen Gesetz über die
elektromagnetische Verträglichkeit von Geräten (EMVG) (bzw. der EMC
EG Richtlinie 2004/108/EG) für Geräte der Klasse A . . . . . . . .118
European Union EMC Directive conformance statement . . . . . . . .119
Taiwanese Class A warning statement . . . . . . . . . . . . . . .119
Japanese Voluntary Control Council for Interference (VCCI) statement 120
Korean Class A warning statement . . . . . . . . . . . . . . . . 120
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Safety
Before installing this product, read the Safety Information.
Antes de instalar este produto, leia as Informações de Segurança.
Před instalací tohoto produktu si přečtěte příručku bezpečnostních instrukcí.
Læs sikkerhedsforskrifterne, før du installerer dette produkt.
Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.
Avant d’installer ce produit, lisez les consignes de sécurité.
Vor der Installation dieses Produkts die Sicherheitshinweise lesen.
Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.
Antes de instalar este produto, leia as Informações sobre Segurança.
Antes de instalar este producto, lea la información de seguridad.
Läs säkerhetsinformationen innan du installerar den här produkten.
Guidelines for trained service technicians:
This section contains information for trained service technicians.
Inspecting for unsafe conditions:
Use the information in this section to help you identify potential unsafe conditions in
an IBM product that you are working on. Each IBM product, as it was designed and
manufactured, has required safety items to protect users and service technicians
from injury. The information in this section addresses only those items. Use good
judgment to identify potential unsafe conditions that might be caused by non-IBM
alterations or attachment of non-IBM features or options that are not addressed in
this section. If you identify an unsafe condition, you must determine how serious the
hazard is and whether you must correct the problem before you work on the
product.
Consider the following conditions and the safety hazards that they present:
v Electrical hazards, especially primary power.
v Primary voltage on the frame can cause serious or fatal electrical shock.
v Explosive hazards, such as a damaged CRT face or a bulging capacitor.
v Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cord is disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and
observe any sharp edges.
3. Check the power cord:
v Make sure that the third-wire ground connector is in good condition. Use a
meter to measure third-wire ground continuity for 0.1 ohm or less between
the external ground pin and the frame ground.
v Make sure that the power cord is the correct type, as specified in the
documentation for your BladeCenter unit type.
v Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety
of any non-IBM alterations.
6. Check inside the blade server for any obvious unsafe conditions, such as metal
filings, contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not
been removed or tampered with.
Guidelines for servicing electrical equipment:
Observe the following guidelines when servicing electrical equipment:
v Check the area for electrical hazards such as moist floors, nongrounded power
extension cords, and missing safety grounds.
v Use only approved tools and test equipment. Some hand tools have handles that
are covered with a soft material that does not provide insulation from live
electrical current.
v Regularly inspect and maintain your electrical hand tools for safe operational
condition. Do not use worn or broken tools or testers.
v Do not touch the reflective surface of a dental mirror to a live electrical circuit.
The surface is conductive and can cause personal injury or equipment damage if
it touches a live electrical circuit.
v Some rubber floor mats contain small conductive fibers to decrease electrostatic
discharge. Do not use this type of mat to protect yourself from electrical shock.
v Do not work alone under hazardous conditions or near equipment that has
hazardous voltages.
v Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical
outlet so that you can turn off the power quickly in the event of an electrical
accident.
v Disconnect all power before you perform a mechanical inspection, work near
power supplies, or remove or install main units.
v Before you work on the equipment, disconnect the power cord. If you cannot
disconnect the power cord, have the customer power off the wall box that
supplies power to the equipment and lock the wall box in the off position.
v Never assume that power has been disconnected from a circuit. Check it to
make sure that it has been disconnected.
v If you have to work on equipment that has exposed electrical circuits, observe
the following precautions:
– Make sure that another person who is familiar with the power-off controls is
near you and is available to turn off the power if necessary.
– When you are working with powered-on electrical equipment, use only one
hand. Keep the other hand in your pocket or behind your back to avoid
creating a complete circuit that could cause an electrical shock.
– When using a tester, set the controls correctly and use the approved probe
leads and accessories for that tester.
– Stand on a suitable rubber mat to insulate you from grounds such as metal
floor strips and equipment frames.
v Use extreme care when measuring high voltages.
v To ensure proper grounding of components such as power supplies, pumps,
blowers, fans, and motor generators, do not service these components outside of
their normal operating locations.
v If an electrical accident occurs, use caution, turn off the power, and send another
person to get medical aid.
Important:
All caution and danger statements in this documentation begin with a
number. This number is used to cross reference an English caution or
danger statement with translated versions of the caution or danger
statement in the IBM Safety Information book.
For example, if a caution statement begins with a number 1,
translations for that caution statement appear in the IBM Safety
Information book under statement 1.
Be sure to read all caution and danger statements in this
documentation before performing the instructions. Read any additional
safety information that comes with the blade server or optional device
before you install the device.
Statement 1:
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
v Do not connect or disconnect any cables or perform installation,
maintenance, or reconfiguration of this product during an electrical
storm.
v Connect all power cords to a properly wired and grounded electrical
outlet.
v Connect to properly wired outlets any equipment that will be attached to
this product.
v When possible, use one hand only to connect or disconnect signal
cables.
v Never turn on any equipment when there is evidence of fire, water, or
structural damage.
v Disconnect the attached power cords, telecommunications systems,
networks, and modems before you open the device covers, unless
instructed otherwise in the installation and configuration procedures.
v Connect and disconnect cables as described in the following table when
installing, moving, or opening covers on this product or attached
devices.
To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlet.
5. Turn device ON.
To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2:
CAUTION:
When replacing the lithium battery, use only IBM Part Number 43W9859 or
03N2449 or an equivalent type battery recommended by the manufacturer. If
your system has a module containing a lithium battery, replace it only with
the same module type made by the same manufacturer. The battery contains
lithium and can explode if not properly used, handled, or disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3:
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or
transmitters) are installed, note the following:
v Do not remove the covers. Removing the covers of the laser product could
result in exposure to hazardous laser radiation. There are no serviceable
parts inside the device.
v Use of controls or adjustments or performance of procedures other than
those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly
with optical instruments, and avoid direct exposure to the beam.
Class 1 Laser Product
Laser Klasse 1
Laser Klass 1
Luokan 1 Laserlaite
Appareil A Laser de Classe 1
Statement 4:
≥ 18 kg (39.7 lb) ≥ 32 kg (70.5 lb) ≥ 55 kg (121.2 lb)
CAUTION:
Use safe practices when lifting.
Statement 5:
CAUTION:
The power control button on the device and the power switch on the power
supply do not turn off the electrical current supplied to the device. The device
also might have more than one power cord. To remove all electrical current
from the device, ensure that all power cords are disconnected from the power
source.
Statement 8:
CAUTION:
Never remove the cover on a power supply or any part that has the following
label attached.
Hazardous voltage, current, and energy levels are present inside any
component that has this label attached. There are no serviceable parts inside
these components. If you suspect a problem with one of these parts, contact
a service technician.
Statement 13:
DANGER
Overloading a branch circuit is potentially a fire hazard and a shock hazard
under certain conditions. To avoid these hazards, ensure that your system
electrical requirements do not exceed branch circuit protection
requirements. Refer to the information that is provided with your device for
electrical specifications.
Statement 21:
CAUTION:
Hazardous energy is present when the blade is connected to the power
source. Always replace the blade cover before installing the blade.
WARNING: Handling the cord on this product or cords associated with accessories
sold with this product, will expose you to lead, a chemical known to the State of
California to cause cancer, and birth defects or other reproductive harm. Wash
hands after handling.
ADVERTENCIA: El contacto con el cable de este producto o con cables de
accesorios que se venden junto con este producto, pueden exponerle al plomo, un
elemento químico que en el estado de California de los Estados Unidos está
considerado como un causante de cancer y de defectos congénitos, además de
otros riesgos reproductivos. Lávese las manos después de usar el producto.
Chapter 1. Introduction
This Problem Determination and Service Guide contains information to help you
solve problems that might occur when installing and using your IBM® BladeCenter®.
It describes the diagnostic tools that come with the BladeCenter QS21, error codes
and suggested actions. It also describes how to replace failing components.
Replaceable components are of three types:
v Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your
responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for
the installation.
v Tier 2 CRU: You may install a Tier 2 CRU yourself or request IBM to install it, at
no additional charge, under the type of warranty service that is designated for
your server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service
technicians.
For information about the terms of the warranty and getting service and assistance,
see Warranty and Support Information.
Note: The illustrations in this document might differ slightly from the hardware.
Related documentation
In addition to this document, the following documentation also comes with the
server:
v Installation and User’s Guide
This printed document contains general information about the blade server,
including how to install supported options and how to configure the blade server.
v Safety Information
This document is in Portable Document Format (PDF) on the Documentation CD.
It contains translated caution and danger statements. Each caution and danger
statement that appears in the documentation has a number that you can use to
locate the corresponding statement in your language in the Safety Information
document.
v Warranty and Support Information
This document is in PDF on the Documentation CD. It contains information about
the terms of the warranty and about service and assistance.
v IBM Software Development Kit for Multicore Acceleration Version 3.0.0
Installation Guide
This document is in PDF and can be downloaded from
http://www.ibm.com/support/us/en/. It contains information about how to install
the operating system and how to program applications for the blade server.
Depending on the server model, additional documentation might be included on the
Documentation CD.
The blade server might have features that are not described in the documentation
that comes with the server. The documentation might be updated occasionally to
include information about those features, or technical updates might be available to
provide additional information that is not included in the blade server
documentation. The most recent versions of all BladeCenter documentation are at
http://www.ibm.com/support/us/en/.
In addition to the documentation in this library, be sure to review the planning and
installation documents for your BladeCenter hardware available at
http://www.ibm.com/support/us/en/.
Updates might be available for this document. You can check for the most recent
version at http://www.ibm.com/support/us/en/.
Notices and statements used in this document
The caution and danger statements that appear in this document are also in the
multilingual Safety Information document, which is on the Documentation CD. Each
statement is numbered for reference to the corresponding statement in the Safety
Information document.
The following notices and statements are used in this document:
v Notes: These notices provide important tips, guidance, or advice.
v Important: These notices provide information or advice that might help you avoid
inconvenient or problem situations.
v Attention: These notices indicate potential damage to programs, devices, or
data. An attention notice is placed just before the instruction or situation in which
damage could occur.
v Caution: These statements indicate situations that can be potentially hazardous
to you. A caution statement is placed just before the description of a potentially
hazardous procedure step or situation.
v Danger: These statements indicate situations that can be potentially lethal or
extremely hazardous to you. A danger statement is placed just before the
description of a potentially lethal or extremely hazardous procedure step or
situation.
Features and specifications
The following table provides a summary of the features and specifications of the
BladeCenter QS21.
Through the BladeCenter Advanced Management Module, you can view the blade
server firmware code and other hardware configuration information.
The BladeCenter QS21 is an accessory for the BladeCenter H Type 8852 unit and
the BladeCenter HT Type 8740 and 8750 (enterprise environment only).
Provided that the BladeCenter unit supports it, you can install and operate any
other model of blade server in the same BladeCenter unit as a BladeCenter QS21.
Note: Power, cooling, removable-media drives, external ports, and advanced
system management are provided by the IBM BladeCenter H and HT units.
For more information, see the documentation that comes with your
BladeCenter unit.
Table 1. Blade server features and specifications
Microprocessor:
Two IBM Cell/B.E. PowerPC 64-bit architecture processors w/VMX with 8
Synergistic Processor Units (SPU), 512 KB L2 cache, 256 KB on each
Synergistic Processing Engine (SPE)
Memory:
Fixed system memory configuration of 2 GB XDR memory, 1 GB per Cell
Broadband Engine™ (Cell/B.E.) processor. Extra memory cannot be added.
Integrated functions:
v Two 1 Gigabit Ethernet controllers
v Local service processor
v 2 Cell/B.E. companion chips each providing a PCIe and a single PCI-X interface
v RS-485 interface for communication with BladeCenter Management Module
v USB Controller
Supported options:
v Serial attached SCSI (SAS) expansion card
v High-Speed InfiniBand Card, IB-4x
v I/O Buffer DIMM VLP DDR2 512 MB, total 1 GB per channel
Environment:
v Ambient temperature:
– Operating temperature: 25°C to 35°C (77°F to 95°F). Altitude: 0 to 2133 m (0 to 7000 ft)
v Humidity:
– Operating: 8% to 80%
Size:
v Height: 24.5 cm (9.7 inches)
v Depth: 44.6 cm (17.6 inches)
v Width: 2.9 cm (1.14 inches)
v Maximum weight: 5 kg (13.2 lb)
Electrical input:
v Power supply: 12 V dc
Support for local storage
The BladeCenter provides a SAS solution for local storage. This comprises a SAS
expansion card attached to the blade server, a SAS switch in the rear of the
chassis, and various options to attach storage to that integrated SAS switch. An
optional SAS expansion card is available for the BladeCenter QS21.
Storage can be attached via the external SAS host controller. The BladeCenter
QS21 supports the SAS drives of the IBM System Storage™ DS3200 and the IBM
System Storage EXP3000 expansion unit. Check the IBM BladeCenter support Web
site for details of supported SAS drives at http://www.ibm.com/support/us/en/.
Turning on the blade server
The BladeCenter QS21 is hot-swappable and can be inserted into the BladeCenter
unit when the unit is already powered up. However, it can only be powered on by
one of the methods described in this section. While the blade server is powering up,
the power-on LED on the front of the server is lit. See “Blade server controls and
LEDs” on page 6 for the power-on LED states.
After you have installed the BladeCenter QS21 into a powered-up BladeCenter unit,
wait until the power-on LED on the blade server flashes slowly before turning on the
blade server.
You can turn on the blade server in any of the following ways:
Using the power control button
You can press the power-control button (see Figure 1 on page 4), which is
behind the control-panel door on the front of the blade server, if local power
control is enabled for the blade server. Local power control is enabled and
disabled through the BladeCenter Management Module Web interface.
Figure 1. Blade server power button
Using the BladeCenter Advanced Management Module
You can use the BladeCenter Management Module Web interface to turn on
the blade server remotely.
Using the Wake on LAN® feature:
If you want to use the Wake on LAN feature, the feature must be enabled in
the installed operating system and it must not have been disabled through
the Advanced Management Module.
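As an illustration of what a Wake on LAN request looks like on the network, the following Python sketch builds and optionally broadcasts a standard magic packet (six 0xFF bytes followed by the target MAC address repeated 16 times). The MAC address, broadcast address, and UDP port 9 shown here are assumptions made for this example and do not come from this guide; the feature must still be enabled as described above.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return a standard Wake on LAN magic packet for the given MAC address."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes, then the MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC address; replace with the blade server's Ethernet MAC.
    packet = build_magic_packet("00:11:22:33:44:55")
    print(len(packet))
```

Building the packet separately from sending it makes it possible to inspect the payload without generating any network traffic.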
In the event of a power failure, the BladeCenter unit and then the blade server can
start automatically when power is restored. You must configure this through the
BladeCenter Advanced Management Module. See the BladeCenter Management
Module User's Guide for further information about this feature.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the
BladeCenter unit and can continue to respond to requests from the service
processor, including remote requests to turn the blade server on. To remove all
power from the blade server, you must physically remove it from the BladeCenter
unit or power off the BladeCenter unit.
To avoid loss of data, shut down the Linux® operating system before you turn off the
blade server. Shut down the operating system by entering the shutdown -h now
command at the command prompt or by choosing shutdown if you are using a
graphical user interface (GUI). See your operating system documentation for
additional information about shutting down the operating system.
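The shutdown step can also be scripted. The following Python sketch wraps the shutdown -h now command named above. The function name and the dry-run default are assumptions made for this example, and actually running the command requires root privileges on the blade server.

```python
import subprocess

SHUTDOWN_COMMAND = ["shutdown", "-h", "now"]  # command named in this section


def shut_down(dry_run: bool = True) -> list:
    """Halt the operating system, or only report the command when dry_run is True."""
    if not dry_run:
        # Requires root privileges; raises CalledProcessError on failure.
        subprocess.run(SHUTDOWN_COMMAND, check=True)
    return list(SHUTDOWN_COMMAND)


if __name__ == "__main__":
    # Dry run only: prints the command without executing it.
    print(" ".join(shut_down(dry_run=True)))
```

The dry-run default is a deliberate safeguard so that importing or testing the helper never halts the machine by accident.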
If the BladeCenter unit has not been turned off, the blade server can be turned off
in any of the following ways:
Using the power control button
You can press the power control button behind the control-panel door on
the front panel of the blade server. This starts an orderly shutdown of the
operating system, providing your operating system supports this feature,
before turning off the BladeCenter QS21. If the operating system stops
functioning, pressing and holding the power control button for more than 4
seconds turns off the blade server.
Using the BladeCenter Advanced Management Module
You can use the Advanced Management Module Web interface to turn off
the blade server remotely. You can also configure the Advanced
Management Module to turn off the blade server automatically if the system
is not operating correctly.
Note: After turning off the blade server, wait at least 5 seconds before turning it on
again.
Blade server controls and LEDs
This section describes the controls and LEDs on the front panel of the blade server.
For further information about the LEDs and how they can be used to assist in
troubleshooting, see “Problems indicated by the front panel LEDs” on page 52.
Figure 2. Power-control button and LEDs (callouts: information LED, location LED, activity LED, power-on LED, media-tray select button, power-control button, blade-error LED, NMI reset button)
Note: The control panel door which normally covers the LEDs and power-control
button is omitted for reasons of clarity.
Activity LED:
This green LED lights when there is network activity.
Location LED:
This blue LED is turned on remotely by the system administrator to assist in
locating the blade server. The location LED on the BladeCenter unit lights
at the same time.
Information LED:
This amber LED lights to indicate that information about a system error has
been placed in the Advanced Management Module Event Log. The
information LED remains on until turned off by Advanced Management
Module or through IBM Director Console.
Blade error LED:
This amber LED lights when a system error has occurred in the blade
server.
Power control button:
Press this button to turn the blade server on or off. The power control
button only has effect if local power control is enabled for the blade server.
Local power control is enabled and disabled through the BladeCenter
Advanced Management Module Web interface.
Media tray select button:
This button associates the shared BladeCenter unit media tray (DVD/CD
drive and USB ports) with the blade server. The LED on the button flashes
while the request is being processed, then lights when the ownership of the
media tray has been transferred to the blade server.
It can take approximately 20 seconds for the operating system on the blade
server to recognize the media tray.
Power on LED:
This green LED indicates the power status of the blade server as follows:
v Flashing rapidly - The service processor on the blade server is
communicating with the BladeCenter Advanced Management Module.
v Flashing slowly - The blade server has power but is not turned on.
v Lit continuously (steady) - The blade server has power and is turned on.
v Not lit - Either the BladeCenter unit is powered off, or a power failure has
occurred on the blade server or the BladeCenter unit.
NMI reset button:
If the operating system has been installed, pressing this with a paper clip or
pin causes the operating system to call the system debugger.
Note: The blade error LED, information LED, and location LED can be turned off
through the BladeCenter Management Module Web interface.
System board LEDs
The BladeCenter QS21 has status LEDs on the system board to indicate the health
of various components. Some are within the light box while others are in different
locations. A lit LED indicates an error condition. Complete information about the
LEDs can be found in “Troubleshooting charts” on page 52.
To find out what errors, if any, have occurred on the system board, you must:
1. Remove the blade server from the BladeCenter unit.
2. Open the cover.
3. Press the light path diagnostics switch.
This lights any error LEDs that were turned on during processing. It also lights a
green LED to indicate the capacitor is charged and the light path diagnostics
system is operating.
Figure 3 on page 8 shows the location of the light path LEDs and the diagnostics
switch.
Figure 3. System-board LEDs (callouts: temperature fault LED, system board LED, CPU fail LED, NMI error LED, light path diagnostics LED, light path diagnostics switch, and an error LED for each of the JDIM00, JDIM01, JDIM10, and JDIM11 slots; light box labels: TEMP, S BRD, CPU, NMI, LP)
Pressing the light path diagnostics switch lights the LED(s) to indicate where an
error has occurred.
System board internal and expansion card connectors
The following illustration shows the location of the connectors for user-installable
options.
Figure 4. Locations of the expansion option connectors on the system board (callouts: connectors at J201, J22, JFC_18, and J200)
Chapter 2. Configuring the blade server
This chapter describes how to:
v Communicate with a blade server.
v Use System Management Services (SMS) to view and update the system
firmware revision number. This does not require the operating system to be
installed.
v Update the baseboard management controller (BMC) firmware using the
Advanced Management Module.
v Update the system firmware using the command-line utility.
v Configure the Gigabit Ethernet controllers in preparation for a network
installation of the operating system.
Note: You can update the BMC firmware through the Advanced Management
Module Web interface without booting the operating system. However, to
update the system firmware you must boot the operating system first.
Communicating with the blade server
The operating system does not have to be booted before you can communicate
with the BladeCenter QS21. You can access it through:
Advanced Management Module
The Web-based management and configuration program. This is your main
access method to the blade server.
The command-line interface
See “Using the command-line interface” on page 10 for further information.
Serial over LAN (SOL)
This is similar to the serial interface, but allows you to connect to the blade
server over the network. See “Using Serial over LAN” on page 10 for further
information.
The serial interface
You can connect a PC or compatible terminal directly to the BladeCenter H
or HT unit using a special cable. See “Using the serial interface” on page
10 for further information.
Note: The BladeCenter H and HT Serial Breakout cables are not supplied
with the unit and must be ordered separately.
System Management Services (SMS)
The SMS utility allows you to view and update the VPD, change the boot
device and set network parameters. See “Using the SMS utility program” on
page 11 for further information.
Using the Advanced Management Module
The Advanced Management Module is the main means of administering the
BladeCenter system. Use the Advanced Management Module Web-based
management and configuration program to:
v Configure the BladeCenter unit
v Update and configure BladeCenter components including the BladeCenter QS21
v Monitor the current system status
v Check the event log for system and other errors
Using the Web interface
Complete the following steps to start the Web-based management and configuration
program:
1. Open a Web browser. In the address or URL field, type the Internet protocol (IP)
address or host name that is assigned for the Management Module remote
connection. The default IP address is:
192.168.70.125
The Enter Network Password window opens.
2. Type your user name and password. Before you log in to the Advanced
Management Module for the first time, contact your system administrator
regarding whether your organization has assigned a user name and password
to you. Use the initial (default) user name and password the first time that you
log in to the Advanced Management Module. If you have an assigned user
name and password, use them for all subsequent logins. All login attempts are
documented in the event log.
The initial user ID and password for the Advanced Management Module are:
User ID
USERID (all capital letters)
Password
PASSW0RD (note the number zero, not the letter O, in PASSW0RD)
3. Follow the instructions that appear on the screen. Be sure to set the timeout
value that you want for your Web session.
The BladeCenter management and configuration window opens.
For additional information, see the IBM BladeCenter Advanced Management
Module User's Guide.
Using the command-line interface
The IBM BladeCenter Advanced Management Module also provides a
command-line interface to provide direct access to BladeCenter management
functions. You can use this as an alternative to using the BladeCenter Management
Module Web interface.
Through the command-line interface, you can issue commands to control the power
and configuration of the blade server and other components in the BladeCenter
unit. For information and instructions, see the IBM BladeCenter Management
Module Command-Line Interface Reference Guide.
Using Serial over LAN
To establish a Serial over LAN (SOL) connection to the blade server, you must
configure the SOL feature for the blade server and start an SOL session as
described in the IBM BladeCenter Serial over LAN Setup Guide. In addition, the
Advanced Management Module must be configured as described in the IBM
BladeCenter Management Module User’s Guide, and the BladeCenter unit must be
configured as described in the IBM BladeCenter Serial over LAN Setup Guide.
Using the serial interface
Use the serial interface to:
v Observe firmware progress.
v Access the Linux terminal in order to configure Linux.
You can connect a PC serially through the BladeCenter unit using a specific UART
cable. To connect to the serial console, plug the serial cable into the BladeCenter
unit and connect the other end to a serial device or computer with a serial port. For
more information, see the documentation that comes with your BladeCenter unit.
Set the following parameters for the serial connection on the terminal client:
v 115200 baud
v 8 data bits
v No parity
v One stop bit
v No flow control
By default, the blade server sends output over SOL and to the serial port on the
BladeCenter unit. However, the default for input is to use SOL. If you wish to use a
device connected to the serial port for input you must press any key on that device
while the blade server boots.
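The serial parameters listed above can be applied from a Linux terminal client roughly as follows. This is a sketch only: the device name /dev/ttyS0 and the use of the stty and screen utilities are assumptions, and the serial_settings helper is hypothetical. Substitute the serial device and terminal program that your workstation actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print the stty settings that correspond to the
# parameters listed above (115200 baud, 8 data bits, no parity,
# one stop bit, no hardware or software flow control).
serial_settings() {
    echo "115200 cs8 -parenb -cstopb -crtscts -ixon -ixoff"
}

# Example use (the device name /dev/ttyS0 is an assumption):
#   stty -F /dev/ttyS0 $(serial_settings)
#   screen /dev/ttyS0 115200
```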
Using the SMS utility program
The Advanced Management Module is the main means of administering the
BladeCenter unit and the BladeCenter servers. However, another utility is provided
which in some cases can give more information than that displayed in the Advanced
Management Module. This is the System Management Services (SMS) utility
program.
The SMS utility program allows you to view and update the VPD, change the boot
list and set network parameters.
Starting SMS
Complete the following steps to start SMS:
1. Using a Telnet or SSH client, connect to the Advanced Management Module
external Ethernet interface IP address.
2. When prompted, enter a valid user ID and password. The default management
module user ID is USERID, and the default password is PASSW0RD, where the
0 is a zero.
Note: The user ID and password may have been changed. If so, check with the
system administrator for a valid user ID and password.
3. Power cycle the blade server and start an SOL console session by using the
power -cycle -c command.
For example, to power cycle and start an SOL remote text console with a blade
server that is in the first bay of the BladeCenter unit, issue the command:
power -cycle -c -T system:blade[1]
To open a console session with a blade server that is already powered on, use
the command:
console -T system:blade[1]
4. After approximately 30 seconds, you see a sequence of checkpoint codes
displayed on the console. These codes are generated by the Power On Self
Test (POST).
5. Wait until the POST menu and indicators display a screen similar to:
QS21 Firmware Starting
Check ROM = OK
Build Date = Apr 24 2007 09:32:34
FW Version = "QB-1.6.0-0"
Press "F1" to enter Boot Configuration (SMS)
Initializing memory configuration...
MEMORY
Modules = Elpida 512MB, 3200 MHz
XDRlibrary = v0.32, Bin A/C, RevB, DualDD
Calibrate = Done
Test = Done
SYSTEM INFORMATION
Processor = Cell/B.E.(TM) DD3.2 @ 3200 MHz
I/O Bridge = Cell BE companion chip DD2.x
Timebase = 26666 kHz (internal)
SMP Size = 2 (4 threads)
Boot-Date = 2007-06-08 11:20
Memory = 2048MB (CPU0: 1024MB, CPU1: 1024MB)
Press F1 to display the SMS menu.
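The management-module commands in steps 3 and 4 differ only in the bay number, so they can be generated rather than typed by hand. The following sketch prints the commands for a given bay; the helper names are hypothetical, and the commands still have to be entered in an interactive management-module session. The address in the usage comment is the default IP address given earlier in this chapter and might differ on your system.

```shell
#!/bin/sh
# Hypothetical helpers that print the management-module commands shown
# in the steps above for a given blade bay number.
amm_target() {
    # Reject anything that is not a positive integer bay number.
    case "$1" in
        ''|*[!0-9]*|0) echo "bay must be a positive integer" >&2; return 1 ;;
    esac
    printf 'system:blade[%s]\n' "$1"
}

amm_power_cycle_cmd() {
    target=$(amm_target "$1") || return 1
    printf 'power -cycle -c -T %s\n' "$target"
}

amm_console_cmd() {
    target=$(amm_target "$1") || return 1
    printf 'console -T %s\n' "$target"
}

# Example use: log in interactively, then type the printed command.
#   ssh USERID@192.168.70.125
#   amm_power_cycle_cmd 1    # prints: power -cycle -c -T system:blade[1]
```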
Viewing FRU information
The VPD on each blade server contains details about the machine type or model,
serial number and the universal unique ID.
Complete the following steps to see this information:
1. Start SMS by completing the above steps. The SMS menu appears:
PowerPC Firmware
Version HEAD
SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------
Main Menu
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. Change SCSI Settings
4. Select Console
5. Select Boot Options
6. Firmware Boot Side Options
7. Progress Indicator History
8. FRU Information
9. Change SAS Boot Device
---------------------------------------------------------------------------
Navigation Keys:
X = eXit System Management Services
---------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:
---------------------------------------------------------------------------
2. Type 8 to select FRU Information. A screen similar to the following appears:
PowerPC Firmware
Version HEAD
SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
FRU Information
Machine Type and Model: 079232x
Machine Serial Number: ABCDEFG
Universal Unique ID: 12345678-1234-1234-1234-123456789ABC
--------------------------------------------------------------------------------
Navigation Keys:
M = return to Main Menu
ESC key = return to previous screen X = eXit System Management Services
--------------------------------------------------------------------------------
Select Navigation key :
Note: You cannot change the FRU information from this screen, only view it.
Adding FRU information: When you replace a FRU, details are not recorded in
the VPD. You must enter them manually through SMS.
When the system firmware detects a FRU replacement part during boot, the
process stops to allow you to enter the machine type or model and serial number.
Boot does not continue until the information is provided.
To enter new FRU information, complete the following steps:
1. Using a Telnet or SSH client, connect to the Advanced Management Module
external Ethernet interface IP address.
2. When prompted, enter a valid user ID and password. The default management
module user ID is USERID, and the default password is PASSW0RD, where the
0 is a zero.
Note: The user ID and password may have been changed. If so, check with the
system administrator for a valid user ID and password.
3. Power cycle the blade server and start an SOL console by using the power
-cycle -c command. See “Using the SMS utility program” on page 11 for
further information.
4. The following screen appears:
PowerPC Firmware
Version HEAD
SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Enter Type Model Number
(Must be 7 characters, only A-Z, a-z, 0-9 allowed. Press Esc to skip)
Enter Type Model Number :
Type the model number according to the instructions on the screen and press
Enter to continue.
5. You must confirm the model number:
PowerPC Firmware
Version HEAD
SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Number entered is: 1234567
Accept number?
(Enter ’y’ or ’Y’ to accept or ’n’ or ’N’ to decline)
Select Navigation key :
Type y or Y and press Enter to confirm the number.
6. At the following screen, type the serial number:
PowerPC Firmware
Version HEAD
SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Enter Serial Number
(Must be 7 characters, only A-Z, a-z, 0-9 allowed)
Enter Serial Number :
---------------------------------------------------------------------------------
Press Enter to continue.
7. You must now confirm the serial number:
PowerPC Firmware
Version HEAD
SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Number entered is: ABCDEFG
Accept number?
(Enter ’y’ or ’Y’ to accept or ’n’ or ’N’ to decline)
Select Navigation key :
---------------------------------------------------------------------------------
Type y or Y and press Enter to confirm the number.
This completes the process and the blade server continues to boot as normal.
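Because the boot process waits until both values are accepted, it can help to check them before starting. The following sketch tests a value against the rule shown on the SMS screens (exactly seven characters, only A-Z, a-z, and 0-9). The helper name is hypothetical and the check runs on your workstation, not in SMS.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is exactly seven
# characters long and contains only A-Z, a-z, and 0-9.
is_valid_fru_field() {
    [ "${#1}" -eq 7 ] || return 1
    case "$1" in
        *[!A-Za-z0-9]*) return 1 ;;
    esac
    return 0
}

# Example use:
#   is_valid_fru_field 079232x && echo "type model number looks valid"
#   is_valid_fru_field ABCDEFG && echo "serial number looks valid"
```

Character ranges in shell patterns can depend on the locale, so running the check with LC_ALL=C is a reasonable precaution.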
Updating the system and BMC firmware
The firmware consists of two distinct packages:
v A firmware package for the baseboard management controller (BMC). This is
referred to as the BMC firmware.
v A firmware package for the basic input/output system (BIOS) which runs on the
Cell/B.E. processor. This is referred to as system firmware.
Note: The user and operating system interfaces of the system firmware are
based on the Open Firmware standard. Detailed system information is
provided through the Open Firmware device tree. You can use the client
interface and Run-Time Abstraction Services (RTAS) to run management
functions.
BMC firmware
v Communicates with advanced management module
v Controls power on
v Initializes the board, including the Cell/B.E. processors and clock chips
v Monitors the physical board environment
System firmware
v Takes over when the BMC has successfully initialized the board
v Acts as the basic input/output system (BIOS)
v Includes boot-time diagnostics and power-on self test
v Prepares the system for the operating system boot
The packages are delivered separately and do not follow the same versioning
scheme.
Updating steps
IBM provides two basic update options for updating or "flashing" the firmware:
online and offline. The offline method requires you to use an alternate bootable
media to restart the server and perform the firmware update. For greater
convenience and flexibility, IBM now also provides online updates that you can
install while the operating system is running. The online method allows you to run
the update at any time, with the flexibility to restart the server at a time when it is
most convenient to do so. As a best practice, use the online update packages to
perform all of your basic update functions.
IBM periodically makes updates to both BMC and system firmware. These may be
downloaded from http://www.ibm.com/support/us/en/.
Note: To avoid problems and to maintain proper system performance, always make
sure that both the BMC firmware and the system firmware are at the same
level for all BladeCenter QS21 servers within the BladeCenter unit.
Complete the following steps to update the BMC and system firmware images:
1. Check the revision level of the firmware on the blade server and the level of the
updates on http://www.ibm.com/support/us/en/. If the level on the Web site is
higher than the version currently installed, continue with the updating steps.
2. Download the firmware updates.
3. Boot the operating system if it is not running already.
4. Update the BMC firmware using the update package or the Management
Module. See “Updating the BMC firmware” on page 18 for further information.
5. Restart the blade server. This boots the blade server with the new BMC
firmware.
6. Update the system firmware image. See “Installing the system firmware” on
page 20 for further information.
7. The system reboots. This boots the blade server with the new system firmware.
8. Shut down the blade server.
Note: There may be instances where you must update the BMC firmware before
updating the system firmware. Check the readme file that comes with each
firmware package for more information.
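The comparison in step 1 can be sketched as a small shell check. This assumes that both levels are written in the dotted form used in the package names (for example QB-1.9.1-2) and that GNU sort with the -V option is available; the fw_is_newer helper is hypothetical. Levels shown in other formats need the cross-reference information in the readme file first.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first version string sorts after
# the second under GNU "sort -V" (version sort). Equal strings fail.
fw_is_newer() {
    [ "$1" != "$2" ] || return 1
    newest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)
    [ "$newest" = "$1" ]
}

# Example use (candidate level first, installed level second):
#   fw_is_newer QB-1.9.1-2 QB-1.6.0-0 && echo "update available"
```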
Determining current blade server firmware levels
Complete the following steps to view the current firmware code levels for both the
BMC and the system firmware:
1. Access and log on to the Advanced Management Module Web interface as
described in the Management Module User's Guide.
2. From the Monitors menu section, select Firmware VPD.
The Blade Server Firmware Vital Product Data (VPD) window shows the build
identifier, release, and revision level of both the system firmware/BIOS and the
BMC firmware. In the example above, the system firmware or BIOS version is
QB01020000 and the BMC firmware is BNBT06b.
Compare this information to the firmware information provided at
http://www.ibm.com/support/us/en/. If the two match, then the blade server has the
latest firmware. If not, download the firmware package from the IBM Support Web
site. See “Updating the BMC firmware” on page 18 or the IBM Support Web site for
installation instructions.
You can also view the firmware level from within the operating system by using the
following command:
xxd /proc/device-tree/openprom/ibm,fw-vernum_encoded
Output is similar to:
0000000: 5142 3031 3031 3030 3000 00 QB0101000..
where QB0101000 is the system firmware version.
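If only the version string is needed, the trailing zero bytes that xxd shows as dots can be stripped instead of reading the hexadecimal dump. The following is a sketch under the assumption that the file holds a plain ASCII string padded with NUL bytes, as in the output above; the read_fw_version helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the firmware version string stored in the
# given file, dropping the NUL padding that xxd displays as "..".
# With no argument it reads the device-tree file named in this section.
read_fw_version() {
    tr -d '\000' < "${1:-/proc/device-tree/openprom/ibm,fw-vernum_encoded}"
    echo
}

# Example use on the blade server:
#   read_fw_version
```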
Note: The system firmware version displayed by the BladeCenter Advanced
Management Module might be different from the version displayed by your
operating system. Cross-reference information is given in the firmware
information at http://www.ibm.com/support/us/en/, and in the readme file
which comes with the firmware image.
Updating the BMC firmware
You can update the BMC firmware from the Linux prompt using the update
package, if you have installed RHEL 5.2, or from Advanced Management Module.
The Linux executable package allows you to run the firmware update without exiting
the Linux environment. In addition, when you run it with the -x (extract) option, the
package allows you to extract the Linux update files to a specified location.
Using the BMC update package
If you have not done so already you must install RHEL 5.2 or later before you can
update the BMC firmware from the Linux command prompt.
Complete the following steps to update the BMC firmware from the Linux command
prompt:
1. Check the README that comes with the BMC firmware as it contains specific
information about that particular firmware release.
2. Boot the blade server and the operating system.
3. Download the package from the IBM support site at
http://www.ibm.com/support/us/en/. The update package has a .sh extension.
4. Change to the directory where you have downloaded the package.
5. Run the package using the -s option.
6. Reboot the blade server.
Using the Advanced Management Module
Complete the following steps to update the BMC firmware:
1. Download the BMC firmware image file from http://www.ibm.com/support/us/en/
to a suitable location on a server that is accessible on the network. The BMC
firmware image file name has the format BNBT<version number>.pkt.
2. Power off the blade server you want to update.
3. Log in to the Advanced Management Module Web interface.
4. Click Firmware Update from the Blade Tasks submenu at the left of your
screen. The following screen appears:
5. Choose the blade server you want to update (target) and browse to the
firmware image file.
6. Click Update.
7. The validity of the image is checked, then the following screen appears:
Click Continue.
8. The next screen shows the firmware update progress:
When the update is finished, a confirmation message appears and an entry is
placed in the Advanced Management Module log.
9. Power up and boot the blade server.
Note: BladeCenter QS21 firmware contains a proprietary implementation of
Cell/B.E. hardware initialization code.
Installing the system firmware
System firmware can only be installed after the operating system has booted. If the
operating system is not installed or cannot boot, then no upgrade or recovery is
possible. See Chapter 5, “Diagnostics and troubleshooting,” on page 51 for further
information about troubleshooting the BladeCenter QS21 blade server.
You can update the system firmware:
v Through IBM Director. See the IBM Director documentation on the IBM Director
CD for further information.
v Using the update package available from http://www.ibm.com/support/us/en/. See
“Updating the system firmware automatically” on page 22 for further information
on how to perform an update.
v Using the update_flash script available on supported Linux operating systems.
This requires the system firmware image file. See “The firmware update
package” on page 21 for information about how to extract the file.
v Updating the firmware manually. See “Installing the firmware manually” on page
22 for further information.
For all of the above options, Linux must have a current version of the rtas_flash
device driver installed. This is normally installed with the operating system. If it is
not, see the installation guide for the Software Development Kit for Multicore
Acceleration for instructions about how to get this device driver and install it.
Note: You may have to update the BMC before updating the system firmware. See
the README file that comes with the package.
The firmware update package
You can update firmware using the update packages available from
http://www.ibm.com/support/us/en/. These can be installed either through IBM
Director or by executing the .sh file contained in the package. This section
describes how to use the update package to install the firmware update or extract
the firmware image for manual installation.
To install the firmware package using IBM Director, see the documentation on the
IBM Director CD.
Note: The blade server must be configured and have a running Linux operating
system before the package can be extracted or installed.
The update package consists of 4 files:
v A file containing the change history for the BladeCenter QS21 system firmware.
This has a .chg extension.
v A file containing the update package. This has an .sh extension.
v A readme file for the update package. This contains specific installation and
configuration information.
v An XML file. This file is for use by IBM Systems Management tools, including
IBM Director Update Manager, UpdateXpress CD, and UpdateXpress System
Pack Installer.
Using the package
The package consists of a file with a .sh extension that runs from the Linux prompt.
It has a number of options. To see what options are available, run the package
without any options or with the -h switch:
# ./ibm_fw_bios_qb-1.9.1-2_linux_cell.sh
In this example, ibm_fw_bios_qb-1.9.1-2_linux_cell.sh is the name of the
firmware update package. The file name changes according to the version of the
firmware.
A screen similar to the following appears:
Usage:
-x /someDirectory - Extract the payload to <some directory>
-xr /someDirectory - Extract the payload plus PkgSdk files to <some directory>
-xd /dev/fd0 - Create a DOS bootable diskette - Internel floppy drive
-xd /dev/sda - Create a DOS bootable diskette - External USB floppy drive
-u - Perform update unattended
-h - Display this help screen
++debug - Display helpful debug information
Note:
All other command line arguments are passed to the
payload executable
The -xd options are not supported on the BladeCenter QS21 blade server.
The -x option
This option extracts another executable file (in this example,
ibm_fw_bios_qb-1.9.1-2.sh), which in turn can be run to create the .bin file
required if you want to update the firmware manually. See “Installing the
firmware manually” for further information.
The -u option
This performs an unattended and automatic update of the system firmware.
The blade server reboots automatically as part of the update process.
Updating the system firmware automatically
Complete the following steps to update the firmware automatically using the update
package:
1. Check the README before attempting to update the system firmware as it
contains specific information about the particular firmware release.
2. Download the update package from http://www.ibm.com/support/us/en/. The
update package has a .sh extension.
3. Change to the directory where you have downloaded the package.
4. Run the package with the -u option. Using the example from above, at the
command prompt enter:
./ibm_fw_bios_qb-1.9.1-2_linux_cell.sh -u
5. Check the system firmware images to confirm the update has succeeded. See
“Determining current blade server firmware levels” on page 17 for instructions.
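Steps 3 and 4 can be wrapped in a small helper that checks the package name before running it. This is a sketch, not part of the update package: the run_update_package function and the DRY_RUN variable are hypothetical, and the chmod step assumes the downloaded file might not yet be executable. With DRY_RUN left at its default the helper only prints the command, which is useful because the -u option restarts the blade server.

```shell
#!/bin/sh
# Hypothetical helper: run an update package with one option.
# With DRY_RUN=1 (the default here) it only prints the command.
run_update_package() {
    pkg=$1
    opt=$2
    case "$pkg" in
        *.sh) ;;
        *) echo "expected a package with a .sh extension: $pkg" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "./$pkg $opt"
        return 0
    fi
    [ -r "./$pkg" ] || { echo "cannot read ./$pkg" >&2; return 1; }
    # Assumption: the download may lack the execute permission bit.
    [ -x "./$pkg" ] || chmod +x "./$pkg" || return 1
    "./$pkg" "$opt"
}

# Example use, from the directory that contains the package:
#   run_update_package ibm_fw_bios_qb-1.9.1-2_linux_cell.sh -u
#   DRY_RUN=0 run_update_package ibm_fw_bios_qb-1.9.1-2_linux_cell.sh -u
```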
Installing the firmware manually
If you cannot update the firmware using the update_flash script, it is possible to
update the firmware manually. You can use rtas_flash over /proc.
Complete the following steps to install the firmware manually:
1. Download the update package from http://www.ibm.com/support/us/en/.
2. Extract the system firmware image package. At the command prompt enter:
./<update package> -x <target directory>
For example, to extract the image package ibm_fw_bios_qb-1.9.1-2.sh from
ibm_fw_bios_qb-1.9.1-2_linux_cell.sh in the directory /temp/fwimage enter:
./ibm_fw_bios_qb-1.9.1-2_linux_cell.sh -x /temp/fwimage
If the directory does not exist the firmware package creates it.
3. Change to the directory containing the firmware image package.
4. Extract the firmware image. At the command prompt enter:
./<image package> -x
For example, to extract the image file QB-1.9.1-2-boot_rom.bin from
ibm_fw_bios_qb-1.9.1-2.sh enter:
./ibm_fw_bios_qb-1.9.1-2.sh -x
5. Ensure the rtas_flash driver is loaded. To do this, run lsmod.
6. If the module is not yet in the kernel, invoke the following to load it:
modprobe rtas_flash
7. To update your current firmware, copy the image file to /proc/ppc64/rtas/
firmware_update and reboot manually:
cp <image-file> /proc/ppc64/rtas/firmware_update
shutdown -r now
For example, to copy the image file QB-1.9.1-2-boot_rom.bin to
/proc/ppc64/rtas/firmware_update enter:
cp QB-1.9.1-2-boot_rom.bin /proc/ppc64/rtas/firmware_update
shutdown -r now
8. Once the system reboots, update the system firmware images. See “Updating
the system firmware images” for instructions.
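Steps 5 to 7 above can be combined into one script, sketched here under stated assumptions. The flash_image and run helpers and the DRY_RUN variable are hypothetical; the commands themselves are the ones given in the steps. With DRY_RUN left at its default the script only prints what it would do, so the sequence can be reviewed before anything is written to the firmware.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 5 to 7. With DRY_RUN=1 (the default
# here) it prints the commands instead of running them.
FLASH_NODE=/proc/ppc64/rtas/firmware_update

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

flash_image() {
    image=$1
    [ -n "$image" ] || { echo "usage: flash_image <image-file>" >&2; return 1; }
    # Load the rtas_flash driver only if lsmod does not already list it.
    if ! lsmod 2>/dev/null | grep -q '^rtas_flash'; then
        run modprobe rtas_flash || return 1
    fi
    run cp "$image" "$FLASH_NODE" || return 1
    run shutdown -r now
}

# Example use:
#   flash_image QB-1.9.1-2-boot_rom.bin            # dry run, prints commands
#   DRY_RUN=0 flash_image QB-1.9.1-2-boot_rom.bin  # performs the update
```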
Updating the system firmware images
Once the system firmware is updated, the BladeCenter QS21 boots from the new
firmware. However, there are always two copies of the system firmware image on
the blade server:
TEMP This is the firmware image normally used in the boot process. When the
firmware is updated, it is the TEMP image that is replaced.
PERM This is a backup copy of the system firmware boot image. The blade server
only boots from this image if the TEMP image is corrupt. See “Recovering
the system firmware code” on page 59 for further information about how to
recover from a corrupt TEMP image.
Once you have updated the system firmware and booted the blade server, copy the
TEMP image to the PERM image. The TEMP and PERM images should always be at the
same revision level.
There are two commands you can use to update an old image on PERM.
v From the Linux prompt issue the following command:
update_flash -c
Note: The script checks whether the board has booted from the TEMP image. If
not, the script does not complete.
v From the Linux prompt issue the following command:
echo 0 > /proc/rtas/manage_flash
For more information on booting from the TEMP or PERM images, see “Recovering
the system firmware code” on page 59.
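The two commands above can be combined into one fallback sequence. This is a sketch only: the function name commit_temp_to_perm and the MANAGE_FLASH variable are hypothetical, and the choice to prefer update_flash (because it checks which image the blade booted from) is an assumption of this example rather than a documented requirement.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands listed above.
# MANAGE_FLASH defaults to the path given in the text.
MANAGE_FLASH="${MANAGE_FLASH:-/proc/rtas/manage_flash}"

commit_temp_to_perm() {
    if command -v update_flash >/dev/null 2>&1; then
        # Preferred here: the script checks that the blade booted from TEMP.
        update_flash -c
    else
        # Fallback from the text above. This sketch adds no check of its own.
        echo 0 > "$MANAGE_FLASH"
    fi
}
```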
Updating the optional expansion card firmware
If you have installed the SAS optional expansion card or the high-speed InfiniBand
expansion card, you might have to update its firmware. See the documentation that
comes with the components for instructions about how to update the firmware.
IBM periodically makes updates available for both SAS and InfiniBand expansion
cards. These may be downloaded from http://www.ibm.com/support/us/en/.
Integrating the Gigabit Ethernet controller into the BladeCenter
One dual-port Gigabit Ethernet controller is integrated on the blade server system
board. Each controller port provides a 1000-Mbps full-duplex interface that connects
to one of the Ethernet switch modules in I/O bays 1 and 2 of the BladeCenter H unit
or the BladeCenter HT unit. Full-duplex operation enables simultaneous
transmission and reception of data on the Ethernet local area network (LAN).
Each Ethernet-controller port on the system board is routed to a different switch
module in I/O bay 1 or bay 2. The routing from the Ethernet-controller port to the
I/O bay varies according to whether an Ethernet adapter is enabled and the
operating system that is installed. See “Blade server Ethernet controller
enumeration” on page 26 for information about how to determine the routing from
the Ethernet-controller ports to I/O bays for your blade server.
You do not have to set any jumpers or configure the controller for the blade server
operating system. However, you must install a device driver to enable the blade
server operating system to address the Ethernet-controller ports. For device drivers
and information about configuring your Ethernet controller ports, see the Ethernet
software documentation that comes with your blade server, or contact your IBM
marketing representative or authorized reseller. For updated information about
configuring the controllers, go to the Barcelona Supercomputing Center Web site at
http://www.bsc.es/projects/deepcomputing/linuxoncell/.
Note: If your blade server contains a different type of optional Ethernet-compatible
switch module in I/O bay 1 than the switch modules that are mentioned in
this section, see the documentation that comes with the Ethernet switch
module that you are using.
Updating the Ethernet controller firmware
To update the Ethernet controller firmware, you must download an update package
from http://www.ibm.com/support/us/en/. This section describes how to use the
update package to install the firmware update.
The update package consists of four files:
v A file containing the change history for the QS22 Ethernet Controller firmware.
This has a .chg extension.
v A file containing the update package. This has an .sh extension.
v A readme file for the update package. This contains specific installation and
configuration information.
v An XML file. This file is for use by IBM Systems Management tools, including
IBM Director Update Manager, UpdateXpress CD, and UpdateXpress System
Pack Installer.
Using the update package
The package consists of a file with a .sh extension that runs from the Linux
prompt. It has a number of options. To see what options are available, run the
package without any options or with the -h switch:
# ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh
In the example shown above, brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh is the name
of the firmware update package. The file name changes according to the version of
the firmware.
A screen similar to the following appears:
Usage:
-x /someDirectory - Extract the payload to <some directory>
-xr /someDirectory - Extract the payload plus PkgSdk files to <some directory>
-xd /dev/fd0 - Create a DOS bootable diskette - Internel floppy drive
-xd /dev/sda - Create a DOS bootable diskette - External USB floppy drive
-u - Perform update unattended
-h - Display this help screen
++debug - Display helpful debug information
The -xd and -x options are not supported on BladeCenter QS21.
The -u option performs an unattended and automatic update of the firmware. The
blade server reboots automatically as part of the update process.
Firmware update steps
Complete the following steps to update the firmware automatically:
1. Check the README before attempting to update the system firmware as it
contains specific information about the particular firmware release.
2. Download the update package from http://www.ibm.com/support/us/en/. The
update package has a .sh extension.
3. Change to the directory where you have downloaded the package.
4. Run the package with the -u option. Using the example from above, at the
command prompt enter:
./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh -u
During the update process, messages similar to the following appear on the
console:
[root@c4b14 brcm-2.0.3-ppc]# ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh -u
IBM Ethernet Firmware Update Tool, Version 1.0.2
Warning. No Broadcom NetXtreme II adapters found.
ADAPTER MAC          BOOT IPMI ASF PXE UMP
-------------------- ---- ---- --- --- ---
001A640E030C (5704s) 3.21 2.20 NA  NA  NA
001A640E030D (5704s) NA   NA   NA  NA  NA
Updating Broadcom NetXtreme adapters.
Updating 001A640E030C using file 16A8bc.bin ---> Update successful
Updating 001A640E030C using file 16A8ipmi.bin ---> Update successful
Error! Firmware not detected on device 001A640E030D.
Warning. No Broadcom NetXtreme II adapters found.
ADAPTER MAC          BOOT IPMI ASF PXE UMP
-------------------- ---- ---- --- --- ---
001A640E030C (5704s) 3.38 2.47 NA  NA  NA
001A640E030D (5704s) NA   NA   NA  NA  NA
One or more errors occurred during the firmware update process. See /var
Note: The error message shown above is correct as it refers to an adapter not
available on BladeCenter QS21.
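To confirm that the BOOT and IPMI versions changed, the adapter rows can be pulled out of saved console output. The filter below is a sketch that assumes the row layout of the sample output above (MAC address, adapter type in parentheses, then the version columns). The function name list_adapter_versions is hypothetical and is not part of the update package.

```shell
#!/bin/sh
# Hypothetical helper: reads the update tool's console output on stdin and
# prints "MAC BOOT IPMI" for each adapter row. The row layout is assumed
# from the sample output in this section.
list_adapter_versions() {
    awk 'length($1) == 12 && $1 ~ /^[0-9A-F]+$/ && $2 ~ /^\(/ {
        print $1, $3, $4
    }'
}
```

If the console output was saved to a file (update.log is a hypothetical name), `list_adapter_versions < update.log` prints one line per adapter row, so the rows from before and after the update can be compared.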
Blade server Ethernet controller enumeration
The enumeration of the Ethernet controller or controller ports in a blade server is
operating system dependent. You can verify the Ethernet controller or controller port
designations that a blade server uses through your operating system settings.
The routing of an Ethernet controller or controller port to a particular BladeCenter
unit I/O bay depends on the type of Ethernet expansion card that is installed. You
can verify which Ethernet-controller port in this blade server is routed to which I/O
bay by using the following test:
1. Install only one Ethernet switch module or pass-thru module, in I/O bay 1.
2. Make sure that the ports on the switch module or pass-thru module are enabled
(Switch Tasks → Management → Advanced Switch Management in the
BladeCenter Management Module Web interface).
3. Enable only one of the Ethernet-controller ports on the blade server. Note the
designation that the blade server operating system has for the controller port.
4. Ping an external computer on the network connected to the Ethernet switch
module. If you can ping the external computer, the Ethernet-controller port that
you enabled is associated with the switch module in I/O bay 1. The other
Ethernet-controller port in the blade server is associated with the switch module
in I/O bay 2.
Communications from optional I/O expansion cards are routed to I/O bays 3 and 4.
If you have installed an I/O expansion card on the blade server you can verify
which controller port on an expansion card is routed to which I/O bay by performing
the same test, using a controller on the expansion card and a compatible switch
module or pass-thru module in I/O bay 3 or 4.
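On a Linux installation, steps 3 and 4 of the test can be sketched as a shell function. Everything named here is an assumption for illustration: the function name probe_bay1_port, the interface names, and the target address are supplied by the caller and do not come from this guide. The sketch also assumes that the port already has a working IP configuration and that it is run from a console that does not depend on the ports being tested, such as Serial over LAN.

```shell
#!/bin/sh
# Hypothetical helper for steps 3 and 4. Interface names and the target host
# are supplied by the caller; none of them come from this guide.
probe_bay1_port() {
    port="$1"; other="$2"; host="$3"
    # Step 3: leave only one Ethernet-controller port enabled.
    ip link set "$other" down || return 2
    ip link set "$port" up || return 2
    # Step 4: ping an external computer through the switch module in bay 1.
    if ping -c 3 "$host" >/dev/null 2>&1; then
        echo "$port -> I/O bay 1, $other -> I/O bay 2"
    else
        echo "$port did not reach $host" >&2
        return 1
    fi
}
```

A failed ping immediately after the port is enabled may only mean that the link has not finished negotiating, so repeat the test before drawing a conclusion.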
Chapter 3. Parts listing
This parts listing supports BladeCenter QS21 replaceable components. To check for
an updated parts list on the Web, do the following:
1. Go to http://www.ibm.com/support/.
2. Under Find resources, select Upgrades, accessories and parts.
Replaceable components
Replaceable components are of three types:
v Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your
responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for
the installation.
v Tier 2 CRU: You may install a Tier 2 CRU yourself or request IBM to install it, at
no additional charge, under the type of warranty service that is designated for
your server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service
technicians.
For information about the terms of the warranty and getting service and assistance,
see Warranty and Support Information.
The following table lists which replaceable components are available for the
BladeCenter QS21.
Description                                              FRU No., Tier 1 CRU No., or Tier 2 CRU No.
DIMM, VLP 512 MB DDR2 I/O Buffer                         39M5860
Cisco 4X Infiniband Expansion Card for IBM BladeCenter   32R1763
InfiniBand 4X DDR Expansion Card (CFFh)                  43W4425
Front bezel                                              60H2963
BladeCenter QS21 blade assembly, base and planar         60H2960
3V lithium battery                                       43W9859
SAS expansion card                                       39Y9188
BladeCenter PCI Express I/O Expansion Unit               43W4390
DIMM filler                                              60H2962
Miscellaneous Parts Kit                                  60H3251
Blade Cover and Warning Label                            46C7201
System Service Label                                     60H2965
FRU List Label                                           60H2966
Part numbers can change and other options can become available. For the latest
information, check the IBM Web site at http://www.ibm.com/support/us/en/.
Chapter 4. Installing and removing replaceable units
This chapter provides instructions for replacing units on the blade server.
Replaceable units are components such as memory modules and I/O expansion
cards. Some removal instructions are provided in case you need to replace one
replaceable unit with another.
You can replace the following items:
v Battery
v Front bezel assembly (control panel)
v Blade server cover
v Impedance air baffles
v DIMM fillers
v System board
You can add or remove the following optional items:
v Cisco 4X Infiniband Expansion Card for IBM BladeCenter
v InfiniBand 4X DDR Expansion Card (CFFh)
v I/O buffer DDR2 memory modules
v SAS expansion card
v BladeCenter Expansion unit
Note: If you wish to install the InfiniBand 4X DDR Expansion Card (CFFh), you
must install Red Hat Enterprise Linux 5.2 or higher.
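The release requirement in the note can be checked from the Linux prompt before the card is installed. The following is a sketch under stated assumptions: the function name rhel_at_least_5_2 is hypothetical, and the file path /etc/redhat-release with its "release X.Y" wording reflects a typical Red Hat installation rather than anything specified in this guide.

```shell
#!/bin/sh
# Hypothetical check; the release-file path and its "release X.Y" wording are
# assumptions about a typical Red Hat installation.
rhel_at_least_5_2() {
    release_file="${1:-/etc/redhat-release}"
    ver=$(sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p' "$release_file" 2>/dev/null | head -n 1)
    # Return 2 when no version number could be read.
    [ -n "$ver" ] || return 2
    major=${ver%%.*}
    case "$ver" in
        *.*) minor=${ver#*.}; minor=${minor%%.*} ;;
        *)   minor=0 ;;
    esac
    minor=${minor:-0}
    # Succeed for 5.2 or higher, fail for anything older.
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 2 ]; }
}
```

The function returns 0 when the installed release is 5.2 or higher, 1 when it is older, and 2 when the release file cannot be parsed.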
Installation guidelines
Before you begin, read the following:
v Read the safety information beginning on page vii and the guidelines in “Handling
static-sensitive devices” on page 30. This information will help you work safely
with the blade server and components.
v You do not have to turn off the blade server or disconnect the BladeCenter unit
from power to install or replace any of the hot-swappable modules on the rear of
the BladeCenter unit.
v Before you remove a hot-swappable blade server from the BladeCenter unit, you
must shut down the operating system on it by typing the shutdown -h now
command or choosing the shut down option from your GUI. See “Turning off the
blade server” on page 4 for details. You do not have to shut down the
BladeCenter unit itself.
v Blue on a component indicates touch points, where you can grip the component
to remove it from or install it in the blade server or BladeCenter unit, open or
close a latch, and so on.
v Orange on a component or an orange label on or near a component indicates
that the component can be hot-swapped. You can remove or install the
component while the blade server or BladeCenter unit is running, provided that the
blade server or BladeCenter unit and operating system support the
hot-swappable capability. Orange can also indicate touch points on
hot-swappable components. See the instructions for removing or installing a
specific hot-swappable component for any additional procedures that you might
have to perform before you remove or install the component.
System reliability guidelines
To help ensure proper cooling and system reliability, make sure that:
v The ventilation holes on the blade server are not blocked.
v Each of the blade bays on the front of the BladeCenter unit has a blade server or
filler blade installed. Do not operate the BladeCenter unit for more than 1 minute
without a blade server or filler blade installed in each blade bay.
v You have followed the reliability guidelines in the documentation that comes with
the BladeCenter unit.
Handling static-sensitive devices
Attention: Static electricity can damage electronic devices and your system. To
avoid damage, keep static-sensitive devices in their static-protective packages until
you are ready to install them.
To reduce the possibility of electrostatic discharge, observe the following
precautions:
v Limit your movement. Movement can cause static electricity to build up around
you.
v Handle the device carefully, holding it by its edges or its frame.
v Do not touch solder joints, pins, or exposed printed circuitry.
v Do not leave the device where others can handle and damage it.
v While the device is still in its static-protective package, touch it to an unpainted
metal part of the BladeCenter chassis for at least 2 seconds. This drains static
electricity from the package and from your body.
v Remove the device from its package and install it directly into the blade server or
BladeCenter unit without setting the device down. If it is necessary to set down
the device, put it back into its static-protective package. Do not place the device
on the blade server cover or on a metal surface.
v Take additional care when handling devices during cold weather. Heating reduces
indoor humidity and increases static electricity.
v Wear an electrostatic-discharge wrist strap, if one is available.
Removing the blade server from the BladeCenter unit
Attention:
v To maintain proper system cooling, do not operate the BladeCenter unit for more
than 1 minute without a blade server or filler blades installed in each blade bay.
v Note the number of the bay that contains the blade server before you remove it.
You must reinstall the blade server in the same bay from which it was removed.
Reinstalling a blade server into a different bay than the one from which it was
removed could have unexpected consequences, such as incorrect reconfiguration
of the blade server. Some blade server configuration information and update
options are established according to bay number.
If you reinstall the blade server into a different bay, you might have to reconfigure
the blade server.
Removing the blade server
The blade server is a hot-swappable device, and the blade bays in the BladeCenter
unit are hot-swappable bays. Therefore, you can install or remove the blade server
without removing power from the BladeCenter unit. However, you must turn off the
blade server before removing it from the BladeCenter unit.
Complete the following steps to remove the blade server:
Figure 5. Removing the blade server (callouts: Release handles, open)
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. If the blade server is operating, the power on LED is lit continuously (steady).
Before you remove a blade server from the BladeCenter unit, you must shut
down the operating system on it by typing the shutdown -h now command or
choosing the shut down option from your GUI. See “Turning off the blade
server” on page 4 for details. You do not have to shut down the BladeCenter
unit itself.
3. Open the two release levers as shown in the illustration. The blade server
moves out of the bay approximately 0.6 cm (0.25 inch).
4. Pull the blade server out of the bay.
5. Place either a filler blade or a new blade server in the bay within 1 minute.
Opening and removing the blade server cover
You must open the blade server cover to access, install or remove any of the
replaceable items except the front bezel assembly.
Figure 6. Opening the blade server cover (callouts: Cover pins, Cover release)
Complete the following steps to open the blade server cover:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. Carefully place the blade server on a flat, static-protective surface, with the
cover side up.
3. Press the blue blade cover release on each side of the blade server and lift the
outer cover open (see Figure 6).
4. If you want to remove the cover, carefully lift it from the cover pins and set it
aside (see Figure 6).
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the power
source. Always replace the blade cover before installing the blade server.
Removing the BladeCenter PCI Express I/O Expansion Unit
You must remove the BladeCenter PCI Express I/O Expansion Unit, if installed, to
access, install, or remove any of the replaceable items except the front bezel
assembly.
Figure 7. Removing the expansion unit (callouts: Cover pins, Cover release)
Complete the following steps to remove the BladeCenter PCI Express I/O Expansion
Unit:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. Carefully place the blade server on a flat, static-protective surface, with the
expansion unit side facing up.
3. Press the blue blade cover release on each side of the blade server and lift the
expansion unit (see Figure 7).
4. To remove the expansion unit, carefully lift it from the cover pins and set it
aside.
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the power
source. Always replace the blade cover before installing the blade server.
Installing the optional InfiniBand card
The InfiniBand card connects to the high-speed connector on the system board
using the two expansion card locator pins to assist with fitting and locking in place.
Use the blue handling areas to handle the card, and, when it has been placed in
position, to lock it into place.
Note: If you wish to install the InfiniBand 4X DDR Expansion Card (CFFh) you
must install Red Hat Enterprise Linux 5.2 or higher.
Figure 8. InfiniBand card handling areas (callouts: Locking clip, Locator pin holes, Handling areas)
Complete the following steps to install the InfiniBand card:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Remove the top cover.
4. Locate the high-speed connector at location J200 on the system board.
Figure 9. Expansion card connector, locator pins, and ball stud (callouts: Ball stud, High-speed connector, Expansion card standoffs with locator pins)
5. Remove the connector cover.
6. Locate the expansion card locator pins at the back of the system board.
7. Locate the connector and ball socket on the InfiniBand card.
Figure 10. InfiniBand card reverse view (callouts: Locking clip, Connector, Locator pin holes, Ball socket)
8. Slide the InfiniBand card locator pin holes over the expansion card locator
pins. The card rests on the locator pins.
Figure 11. Positioning the InfiniBand card (callouts: Locator pin, Expansion card, Expansion connector cover, Expansion card standoff)
9. Check that the ball socket on the card is over the corresponding ball stud on
the system board, then carefully press the InfiniBand card into position. Press
on the blue areas only, to avoid damage to the card.
10. Check that the blue locking clip has locked into position.
11. If you do not want to install any other options, replace the cover and insert the
BladeCenter QS21 into the BladeCenter unit.
Attention: The connectors on the system board and the InfiniBand card are not
designed for repeated removal or replacement of components. Avoid removing the
InfiniBand card once it is in position.
Adding I/O DDR2 memory modules
This section describes how to add extra I/O DDR2 memory. There are two slots per
Cell/B.E. companion chip allowing up to 1 GB of memory for each Cell/B.E.
companion chip for I/O buffering.
Figure 12. DIMM slot location (callouts: DIMM filler, DIMM slot at JDIM11, DIMM slot at JDIM10, DIMM slot at JDIM00, DIMM slot at JDIM01)
You must add memory as pairs of dual inline memory modules (DIMMs). You may
fit one or more memory modules for each buffer, but each I/O buffer must use the
same type of memory module and have the same amount of memory. The minimum
amount of memory you can add is 512 MB per buffer, or one module per buffer. If
you fit a single pair of DIMMs, you must use slots JDIM00 and JDIM11.
The BladeCenter QS21 supports VLP DDR2 512 MB DIMMs only.
Note: The DIMMs are used as memory for the I/O buffers only. You cannot
increase the size of system memory, which is fixed at 1 GB for each Cell/B.E.
processor.
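The pairing rules above can be expressed as a short check over the names of the populated slots. This is an illustrative sketch only: the function name valid_dimm_population is hypothetical, and treating a blade with no optional DIMMs (fillers in every slot) as valid is an assumption based on the DIMMs being optional.

```shell
#!/bin/sh
# Hypothetical checker: pass the names of the populated DIMM slots.
# Returns 0 for a population that follows the pairing rules, 1 otherwise.
valid_dimm_population() {
    # Sort the slot names so that the order of the arguments does not matter.
    slots=$(printf '%s\n' "$@" | sort | paste -s -d ' ' -)
    case "$slots" in
        "")                             return 0 ;;  # no optional DIMMs, fillers only
        "JDIM00 JDIM11")                return 0 ;;  # single pair in the required slots
        "JDIM00 JDIM01 JDIM10 JDIM11")  return 0 ;;  # both pairs installed
        *)                              return 1 ;;
    esac
}
```

For example, `valid_dimm_population JDIM00 JDIM11` succeeds, while `valid_dimm_population JDIM01 JDIM10` fails because a single pair must use slots JDIM00 and JDIM11.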
To install extra I/O buffer memory, complete the following steps:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open the top cover.
4. Locate the DIMM slots in which you want to insert the I/O DDR2 memory
modules.
Figure 13. DIMM slot location (callouts: Slot at JDIM01, Slot at JDIM00, Slot at JDIM11, Slot at JDIM10)
There are four DIMM slots, two for each Cell/B.E. companion chip. If this is the
first pair of DIMMs you are installing, use slots 00 and 11. Slots 00 and 11 are
the two outer slots as shown in Figure 13. For a second pair of DIMMs, use
the remaining slots 01 and 10.
5. Remove the DIMM fillers from the slots where you want to insert the DIMMs.
Retain the DIMM fillers. You need them if you remove any DIMMs from the
blade server as they are an important part of the blade server cooling system.
6. Place the DIMM in the slot, contact side down. Check the orientation of the
module. The central locating pin in the slot should match the corresponding
cut-out on the module.
7. Carefully press the module into place until the retaining clips snap into
position. Make sure that the clips are locked properly.
Figure 14. DIMM retaining clips
8. Repeat steps 6 and 7 until you have installed all the optional DIMMs.
9. Ensure that all unused DIMM slots are fitted with DIMM fillers.
10. If you do not want to install any other options, replace the cover and insert the
BladeCenter QS21 into the BladeCenter unit.
Replacing DIMM fillers
For the BladeCenter QS21 cooling system to work properly there must be no empty
DIMM slots. Unused slots must be fitted with DIMM fillers. Replace faulty DIMM
fillers and, if you remove memory modules, fit empty slots with DIMM fillers.
To install or replace DIMM fillers, complete the following steps:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open the top cover.
4. Remove any faulty DIMM fillers.
a. Open the retaining clips on either end of the DIMM slot.
b. Pull the filler out of the slot.
5. If you remove memory modules, be sure to remove them in pairs. If you keep a
single pair of memory modules, they must be in the outermost slots, JDIM00 and
JDIM11. See Figure 13 on page 37 for further information.
a. Open the retaining clips on either end of the DIMM slot.
b. Pull the module out of the slot.
6. Carefully press the DIMM filler into the empty DIMM slot until the retaining clips
snap into position.
7. Repeat step 6 until all unused slots are fitted with DIMM fillers.
8. Replace the cover and insert the BladeCenter QS21 into the BladeCenter unit.
Installing the SAS expansion card
The BladeCenter QS21 does not have any built-in disk storage. The SAS expansion
card allows you to connect storage to the BladeCenter QS21. Use the blue handling
areas to handle the card.
Figure 15. SAS expansion card handling areas (callout: Handling areas)
Complete the following steps to install the SAS expansion card:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open the top cover.
4. Locate the two SAS expansion card connectors at locations J22 and JFC_18
and the ball stud on the system board.
Figure 16. SAS expansion card connector and ball stud location (callouts: Connectors for SAS expansion card, Ball stud)
5. Locate the connectors and the ball socket on the SAS card.
Figure 17. SAS expansion card reverse side (callouts: Connectors, Ball socket)
6. Align the connectors on the system board with the connectors on the SAS card.
Figure 18. SAS expansion card location (callout: Expansion card)
7. Using the blue handling areas, carefully push the card down to insert it into the
connectors. Ensure that the ball stud on the system board engages with the ball
socket on the SAS expansion card.
8. If you do not want to install any other options, replace the cover and insert the
BladeCenter QS21 into the BladeCenter unit.
Installing the BladeCenter PCI Express I/O Expansion Unit
Important:
v A BladeCenter QS21 with the BladeCenter PCI Express I/O Expansion Unit
installed takes up two contiguous slots in the BladeCenter chassis.
v You must remove any expansion card using the high-speed connector before
installing the expansion unit.
Figure 19. Installing the expansion unit (callouts: Cover pins, Cover release)
Complete the following steps to install the BladeCenter PCI Express I/O Expansion
Unit:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. Remove the blade server cover and set it aside. See “Opening and removing
the blade server cover” on page 32 for further information.
3. Remove the connector cover or any optional card from the high-speed
connector. Figure 9 on page 34 shows the location of the high-speed connector.
4. Lower the expansion unit so that the slots at the rear slide down onto the cover
pins at the rear of the blade server, as shown in Figure 19.
5. Carefully close the expansion unit as shown in Figure 19 until it clicks into
place.
Removing the blade-server front bezel assembly
Before you can replace a defective system board assembly or blade server front
bezel assembly, you must first remove the blade server front bezel assembly.
Figure 20 shows how to remove the front bezel assembly from a blade server.
Figure 20. Removing the front bezel assembly (callouts: Blade Cover, Blade-Cover Release, Bezel-Assembly Release, Control-Panel Cable, Control-Panel Connector)
Complete the following steps to remove the front bezel assembly:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. Open the blade server cover.
3. Carefully disconnect the control panel cable from the control panel connector.
4. Press the front bezel release on both sides of the system board and pull the
front bezel assembly away from the blade server.
5. Store the front bezel assembly in a safe place.
Replacing the system board base and planar
Figure 21. System board assembly (callout: Bezel)
Complete the following steps to replace the system board base and planar:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open and remove the top cover, and set it aside. See “Opening and removing
the blade server cover” on page 32 for detailed instructions.
4. Remove the front bezel from the defective system board and set it aside. See
“Removing the blade-server front bezel assembly” on page 41 for detailed
instructions.
5. Remove any optional components from the defective system board and set
them aside.
6. Note down the serial number of the defective system board. You need this
later to update the VPD information.
7. On the replacement system board, install the front bezel assembly. See
“Installing the front bezel assembly” on page 47 for detailed instructions.
8. On the replacement system board, reinstall any options you removed from the
defective system board. See “Installing the optional InfiniBand card” on page
33, “Installing the SAS expansion card” on page 38 and “Adding I/O DDR2
memory modules” on page 36 for detailed instructions.
9. Replace the cover and close. See “Closing the blade server cover” on page 49
for details.
10. Reinstall the blade server in the BladeCenter unit.
11. Update the BMC, system and optional expansion card firmware as described in
Chapter 2, “Configuring the blade server,” on page 9.
12. Using SMS, update the VPD information by entering the serial number of the
defective system board. See “Adding FRU information” on page 13 for details.
13. Configure the replacement blade server to boot from the same device as the
original defective unit. See the QS21 Installation and User's Guide for details.
Note: Provided that the options on the new blade server are the same as on the
old one, you do not have to reinstall or reconfigure the operating system.
Simply configure the boot options to boot from the boot device.
Replacing the battery
IBM has designed this product with your safety in mind. The lithium battery must be
handled correctly to avoid possible danger. If you replace the battery, you must
adhere to the following instructions.
Note: In the U.S., call 1-800-IBM-4333 for information about battery disposal.
If you replace the original lithium battery with a heavy-metal battery or a battery with
heavy-metal components, be aware of the following environmental consideration.
Batteries and accumulators that contain heavy metals must not be disposed of with
normal domestic waste. They will be taken back free of charge by the manufacturer,
distributor, or representative, to be recycled or disposed of in a proper manner.
To order replacement batteries, call 1-800-IBM-SERV within the United States, and
1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada,
call your IBM authorized reseller or IBM marketing representative.
Note: After you replace the battery, the blade server is automatically reconfigured.
However, you must reset the system date and time through the operating
system that you installed.
Statement 2:
CAUTION:
When replacing the lithium battery, use only IBM Part Number 43W9859 or
03N2449 or an equivalent type battery recommended by the manufacturer. If
your system has a module containing a lithium battery, replace it only with
the same module type made by the same manufacturer. The battery contains
lithium and can explode if not properly used, handled, or disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Note: See “Battery return program” on page 116 for more information about battery
disposal.
Complete the following steps to replace the battery:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. Follow any special handling and installation instructions that come with the
battery.
3. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command or by choosing shut down from the GUI. If the
blade server was not powered off, press the power control button (behind the
blade server control panel door) to turn off the blade server. See “Blade server
controls and LEDs” on page 6 for more information about the location of the
power control button.
4. Remove the blade server from the BladeCenter unit (see “Removing the blade
server from the BladeCenter unit” on page 31 for information).
5. Carefully place the blade server on a flat, static-protective surface.
6. Open the blade server cover (see “Opening and removing the blade server
cover” on page 32 for instructions).
7. Locate the battery (connector BH1) on the system board.
Figure 22. Battery location
8. Remove the battery:
a. Use one finger to press the top of the battery clip away from the battery.
The battery pops up when released.
b. Use your thumb and index finger to lift the battery from the socket.
c. Dispose of the battery as required by local ordinances or regulations.
9. Insert the new battery:
a. Tilt the battery so that you can insert it into the socket, under the battery
clip.
b. Press the battery down into the socket until it clicks into place. Make sure
the battery clip holds the battery securely.
10. Close the blade server cover (see “Closing the blade server cover” on page
49).
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the
power source. Always replace the blade cover before installing the blade
server.
11. Reinstall the blade server into the BladeCenter unit.
12. Turn on the blade server (see “Turning on the blade server” on page 3).
13. Reset the system date and time through the operating system that you
installed. For additional information, see your operating-system documentation.
Using the miscellaneous parts kit
The miscellaneous parts kit contains replacement parts and screws to be used if
the original item is damaged. It contains the following items:
Kit, Miscellaneous Parts                                      Quantity
Socket, alignment                                             4
Cover Connector Plug, 200 position                            4
Pin, InfiniBand expansion card support, pivot point blocks    4
Ball stud, InfiniBand expansion card support                  4
Tray, InfiniBand expansion card support end bracket           2
Pin, alignment                                                2
Screw, Plastite 4-20x6.35                                     8
Screw, 3.5 x 6 Pan Head, Phillips, Planar                     6
QS21 Planar Light box with transparency assembly              1
Impedance Air Baffle Top, Foam                                4
Impedance Air Baffle DIMM Sides                               4
To replace a support or bracket, you need a Phillips head screwdriver.
Replacing the ball studs
The ball studs help support the optional expansion cards and should be replaced if
damaged.
To remove and replace a ball stud, complete the following steps:
1. Using a Phillips head screwdriver, pierce the label at the red circle corresponding
to the ball stud you wish to replace.
Screw locations (illustration: Blade Service Information label)
2. Carefully unscrew the ball stud and remove it.
3. Position the replacement ball stud over the hole and screw into position, taking
care not to over-tighten as this might damage the system board.
Finishing the installation
To complete the installation you must:
1. Reinstall the front bezel assembly on the blade server if removed. See
“Installing the front bezel assembly” for further information.
2. Ensure there is a DIMM filler or a DIMM in each of the I/O buffer DIMM slots.
3. Replace and close the blade server cover. See “Closing the blade server cover”
on page 49 for further information.
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the
power source. Always replace the blade cover before installing the blade
server.
4. Reinstall the blade server into the BladeCenter unit.
5. Turn on the blade server. See “Turning on the blade server” on page 3 for
further information.
6. If you have replaced the battery or the system board assembly, reset the
system date and time through the operating system that you installed. For
additional information, see your operating system documentation.
Note: If you have just powered on the BladeCenter unit, wait until the power-on
LED on the blade server flashes slowly before powering on the blade server.
Installing the front bezel assembly
The following illustration shows how to reinstall the front bezel assembly on the
blade server.
Figure 23. Reinstalling the front bezel assembly
Complete the following steps to install the blade server front bezel assembly:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. Connect the control panel cable to the control panel connector on the system
board assembly.
3. Carefully slide the front bezel assembly onto the blade server, as shown in
Figure 23, until it clicks into place.
Note: Make sure that you do not pinch any cables when you reinstall the front
bezel assembly.
Closing the blade server cover
Important: The blade server cannot be inserted into the BladeCenter unit until the
cover is installed and closed. Do not attempt to override this protection.
Figure 24. Closing the blade server cover
Complete the following steps to close the blade server cover:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. If you removed the front bezel assembly, replace it now. See “Installing the front
bezel assembly” on page 47 for instructions, and Figure 24.
3. Lower the cover so that the slots at the rear slide down onto the pins at the rear
of the blade server, as shown in Figure 24. Before closing the cover, make sure
that all components are installed and seated correctly and that you have not left
loose tools or parts inside the blade server.
4. Carefully close the cover as shown in Figure 24 until it clicks into place.
Input/output connectors and devices
The BladeCenter unit contains the input/output connectors that are available to the
blade server. See the documentation that comes with the BladeCenter unit for
information about the input/output connectors.
Chapter 5. Diagnostics and troubleshooting
This chapter provides basic troubleshooting information to help you solve some
common problems that might occur while setting up your blade server.
A problem with the BladeCenter QS21 can relate either to the BladeCenter QS21 or
the BladeCenter unit.
A problem with the blade server exists if the BladeCenter unit contains more than
one blade server and only one of the blade servers has the symptom. If all of the
blade servers have the same symptom, then the problem relates to the BladeCenter
unit. For more information, see the documentation that comes with your
BladeCenter unit.
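The isolation rule above can be written down as a small decision function. This is only an illustrative sketch: the function name, the dictionary input, and the "undetermined" fallback for cases the guide does not describe (for example, a unit with a single blade server, or a symptom seen on some but not all blade servers) are assumptions, not part of the guide.

```python
from typing import Dict


def likely_fault_domain(has_symptom: Dict[str, bool]) -> str:
    """Apply the isolation rule from the text.

    has_symptom maps a blade identifier (hypothetical, e.g. a bay name)
    to True if that blade server shows the symptom.
    """
    total = len(has_symptom)
    affected = sum(1 for flag in has_symptom.values() if flag)

    # More than one blade server in the unit, and only one shows the symptom.
    if total > 1 and affected == 1:
        return "blade server"
    # All blade servers show the same symptom.
    if total > 1 and affected == total:
        return "BladeCenter unit"
    # Cases the guide does not cover are left open (assumption).
    return "undetermined"


if __name__ == "__main__":
    print(likely_fault_domain({"bay1": True, "bay2": False, "bay3": False}))
```

The result is only a starting point for diagnosis; the basic checks that follow still apply in every case.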
Note: The BladeCenter QS21 is supported in the BladeCenter H Type 8852 unit
and the BladeCenter HT Type 8740 and 8750 (enterprise environment only)
unit. However, you can put other blade servers compatible with the
BladeCenter units in the same unit as a BladeCenter QS21.
Prerequisites
Basic checks
Before you start problem determination or servicing, check that:
• The BladeCenter QS21 is inserted correctly into the BladeCenter unit
• All components are connected correctly
• The BladeCenter QS21 has the latest firmware updates. These include:
  – BMC
  – System
  – Gigabit Ethernet controller
  – SAS expansion card (if installed)
  – InfiniBand high-speed expansion card (if installed)
If you install the blade server in the BladeCenter unit and the blade server does not
start, always perform the following basic checks before continuing with more
advanced troubleshooting:
• Make sure that the BladeCenter unit is correctly connected to a power source.
• Reseat the blade server in the BladeCenter unit.
• If the power-on LED is flashing slowly, the blade server may be turned off. To
  turn on the blade server, see “Turning on the blade server” on page 3 for further
  information.
• If you have just added a new optional device or component, make sure that it is
  correctly installed and compatible with the blade server and its components. If
  the device or component is not compatible, remove it from the blade server,
  reinstall the blade server in the BladeCenter unit, and then restart the blade
  server.
• Use Advanced Management Module to check that the blade server appears in
  the list of blade servers available.
Finding troubleshooting information
Table 2 describes where to find troubleshooting information in this section.
Note: Many components, including the CPU, RAM and power supplies cannot be
exchanged in the field. The only replaceable parts are the optional SAS
daughter card, battery, front bezel assembly, I/O buffer DIMM memory, and
the optional InfiniBand card.
Table 2. Where to find troubleshooting information
Component Where to find information
SAS expansion card
Front bezel
High-speed InfiniBand expansion card
Memory “Boot errors and handling” on page 72
LEDs
Power
Network connections
Service processor
Software problems
For troubleshooting information about other BladeCenter components, see the
appropriate Problem Determination and Service Guide , and other product-specific
documentation. See “Related documentation” on page 1 for additional information.
For the latest editions of the IBM BladeCenter documentation, go to
http://www.ibm.com/support/us/en/ on the World Wide Web.
Troubleshooting charts
The following tables list problem symptoms and suggested solutions. If you cannot
find the problem in the troubleshooting charts, or if carrying out the suggested steps
does not solve the problem, have the blade server serviced.
“Solving undetermined problems” on page 95
“Troubleshooting charts” on page 52
If you have problems with an adapter, monitor, keyboard, mouse, or power module,
see the Problem Determination and Service Guide that comes with your
BladeCenter unit for more information.
If you have problems with an Ethernet switch module, I/O adapter, or other optional
device that can be installed in the BladeCenter unit, see the Problem Determination
and Service Guide or other documentation that comes with the device for more
information.
Problems indicated by the front panel LEDs
The state of the LEDs on the front of the blade can help in isolating problems.
Figure 25. Power-control button and LEDs
The table below gives an explanation and a suggested action, if required, for each
LED.
Table 3. Explanation of LEDs and their states

LED: Blade error LED
State: Amber
Explanation: A system error has occurred on the blade server.
Suggested action: Check the BladeCenter error log; see “Problem reporting” on
page 94.

LED: Information LED
State: Amber
Explanation: Information about a system error has been placed in the
Advanced Management Module Event Log. The information LED remains on until
turned off by Advanced Management Module or through IBM Director Console.
Suggested action: Check Advanced Management Module to see what the problem
is. See the BladeCenter Management Module User's Guide for further
information about the error.

LED: Activity LED
State: Green
Explanation: There is network activity.
Suggested action: No action required. For further information about
troubleshooting networks, see “Network connection problems” on page 57.

LED: Power-on LED
State: Flashing rapidly
Explanation: The service processor on the blade server is communicating
with the BladeCenter Management Module.
Suggested action: No action required.

LED: Power-on LED
State: Flashing slowly
Explanation: The blade server has power but is not turned on.
Suggested action: Turn on if required.

LED: Power-on LED
State: Lit continuously (steady)
Explanation: The blade server has power and is turned on.
Suggested action: No action required.

LED: Power-on LED
State: Not lit
Explanation: Blade server not powered.
Suggested action:
1. Reseat blade server.
2. Check if BladeCenter power supplies numbers 3 and 4 are installed and
   powered. If they are not, install and power them or use slots 1-5.
3. Go to “Power problems” on page 57.
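
For scripted checklists, the power-on LED rows of Table 3 can be held in a simple lookup. The sketch below is a hypothetical helper rather than any IBM tool; the dictionary keys and the function name are assumptions, and the action for "not lit" is an abbreviated summary of the three steps in the table.

```python
from typing import Tuple

# Explanations and actions transcribed from the power-on LED rows of Table 3.
# The "not lit" action is a shortened summary of the numbered steps.
POWER_ON_LED_STATES = {
    "flashing rapidly": (
        "The service processor on the blade server is communicating with "
        "the BladeCenter Management Module.",
        "No action required.",
    ),
    "flashing slowly": (
        "The blade server has power but is not turned on.",
        "Turn on if required.",
    ),
    "lit continuously": (
        "The blade server has power and is turned on.",
        "No action required.",
    ),
    "not lit": (
        "Blade server not powered.",
        "Reseat the blade server, check the BladeCenter power supplies, "
        "then see the power problems chart.",
    ),
}


def power_on_led(state: str) -> Tuple[str, str]:
    """Return (explanation, suggested action) for an observed LED state."""
    # Normalize case and repeated whitespace before the lookup.
    key = " ".join(state.lower().split())
    try:
        return POWER_ON_LED_STATES[key]
    except KeyError:
        raise ValueError(f"unknown power-on LED state: {state!r}") from None


if __name__ == "__main__":
    explanation, action = power_on_led("Flashing slowly")
    print(explanation, "->", action)
```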
Problems indicated by the system board LEDs
The blade server must be removed from the BladeCenter unit and the cover
removed before you can use the light path LEDs for diagnostics. To activate the
light box and the other light path LEDs, press the light path diagnostics switch.
The location of each LED on the system board is shown in the illustration below.
Figure 26. Light box and system board LEDs
Table 4. System board LEDs

Status LEDs
Comments: The status LEDs are listed for reasons of completeness since they
are for use by IBM service only and are not normally visible. They are not
activated by the light path diagnostics switch.

LED                  Color   Board location  Explanation
Heartbeat            Green   D16             Indicates the BMC is functional.
Alert                Yellow  D15             Indicates an error condition has occurred on the system board.
Ethernet 1 activity  Green   D12             Indicates Ethernet 1 is active and sending or receiving packets.
Ethernet 0 activity  Green   D11             Indicates Ethernet 0 is active and sending or receiving packets.
BE0_PLL_LOCK         Green   D8              Indicates the phase-locked loop of Cell/B.E.-0 is working.
BE1_PLL_LOCK         Green   D13             Indicates the phase-locked loop of Cell/B.E.-1 is working.
MM_SELECT_A          Green   D19             Indicates Advanced Management Module A is active.
MM_SELECT_B          Green   D18             Indicates Advanced Management Module B is active.

Light path LEDs
Explanation: There has been a failure in the I/O DIMM module.
Comments: See Figure 26 on page 54 for the location of each DIMM and its
associated LED. Either remove or replace the DIMM and reboot.

LED                   Color   Board location
DIMM at JDIM11 error  Yellow  D21
DIMM at JDIM10 error  Yellow  D20
DIMM at JDIM01 error  Yellow  D10
DIMM at JDIM00 error  Yellow  D7

Light box LEDs

LED: Temperature fault
Color: Yellow
Board location: Light box
Explanation: The blade server has exceeded the operational temperature
range.
Comments:
• Using the Advanced Management Module, check that the BladeCenter unit
  cooling system is operating correctly.
• Replace any missing filler blades in the BladeCenter unit.
• Replace any missing filler blades in the BladeCenter QS21 DIMM sockets.
• Check that other blade servers are operating within the recommended
  temperature range.
• Replace the blade server, power on and boot. Check Advanced Management
  Module for errors.
If the problem persists, contact your IBM service representative as the
system board may need servicing.

LED: NMI error
Color: Yellow
Explanation: The NMI pinhole reset on the front panel has been pressed.
Comments: Pressing the reset causes the operating system to call the
system debugger.

LED: CPU fail
Color: Yellow
Explanation: One of the Cell BE processors has failed.
Comments: Contact your IBM service representative as the system board
needs replacement.

LED: System board
Color: Yellow
Explanation: A critical error has occurred in a component on the system
board.
Comments: Contact your IBM service representative as the system board may
need replacing.

LED: Light path diagnostics
Color: Green
Explanation: Lights when the light path diagnostics switch is pressed.
Indicates that the capacitor is charged and the light path LEDs can light
to show any errors.
Comments: If this LED does not light then the light path LEDs cannot
function. Reinstall the blade server in the BladeCenter unit and power on
to recharge. If this fails to resolve the problem, there is a problem with
the system board and it may need replacement.
Power problems
Power symptom: The blade server does not turn on.
Suggested action:
1. Make sure that:
   a. The power-on LED on the front of the BladeCenter unit is lit.
   b. The LEDs on all the BladeCenter power modules are lit.
   c. The power-on LED on the blade-server control panel is flashing slowly.
      • The power-on LED flashes rapidly for a short period to indicate it is
        communicating with Advanced Management Module. If the power-on LED
        starts to flash rapidly and continues to do so, the blade server is not
        communicating with the management module; reseat the blade server and
        reboot.
      • If the power LED is off, either the blade bay is not receiving power, the
        blade server is defective, the Advanced Management Module firmware is an
        earlier version and does not support this function, or the LED information
        panel is loose or defective.
   d. Local power control for the blade server is enabled. Check using the
      Advanced Management Module Web interface. The blade server might have
      been instructed through the Advanced Management Module to turn on.
2. If you have just installed a new option in the blade server, remove it, and restart
   the blade server. If the blade server now powers on, troubleshoot the option. See
   the documentation that comes with the option for further information.
3. Try another blade server in the blade bay. If it works, you may need to have a
   trained service technician replace the system blade assembly.
Power throttling
Be aware that the BladeCenter unit automatically reduces the BladeCenter QS21
processor speed if certain conditions are met. One such condition is temperature
thresholds being exceeded, for example, when the blade server is running in
acoustic mode. This throttling occurs independent of your power configuration. Full
processor speed is restored automatically when the conditions that have caused the
throttling have been resolved.
Network connection problems
Network connection symptom: One or more blade servers are unable to communicate
with the network.
Suggested action:
Make sure that:
• The switch modules for the network interface being used are installed in the
  correct BladeCenter bays and are configured and operating correctly.
• The settings in the switch module are correct for the blade server (settings in the
  switch module are blade server specific).
For additional information, see:
• Chapter 2, “Configuring the blade server,” on page 9
• The Problem Determination and Service Guide that comes with your BladeCenter
  unit
• Other product-specific documentation that comes with the switch module
Note: For the latest editions of the IBM BladeCenter documentation, go to
http://www.ibm.com/support/us/en/.
If the problem remains, see “Solving undetermined problems” on page 95.
If all the blades cannot communicate with the network, check the network itself for
problems.
Service processor problems
Service processor symptom: Service processor reports a general monitor failure.
Suggested action:
1. If the blade server is operating, shut down the operating system.
2. If the blade server was not turned off, press the power-control button (behind the
   blade server control-panel door) to turn off the server.
3. Remove the blade server from the BladeCenter unit.
4. Wait 30 seconds and reinstall the blade server into the BladeCenter unit.
5. Restart the blade server.
If the problem remains, see “Solving undetermined problems” on page 95.
Software problems
Symptom: You suspect a software problem.
Suggested action:
1. To determine whether the problem is caused by the software, make sure that:
   • The blade server has the minimum memory that is needed to use the software.
     For memory requirements, see the software documentation.
   • The software is designed to operate on the blade server.
   • Other software works on the blade server.
   • The software works on another server.
2. If you received any error messages when using the software, see the software
   documentation for a description of the messages and suggested solutions to the
   problem.
3. Contact the software vendor.
Recovering the system firmware code
The system firmware is contained in two separate images in the flash memory of
the blade server: temporary and permanent. These images are referred to as TEMP
and PERM, respectively. The system normally starts from the TEMP image, and the
PERM image serves as a backup. If the TEMP image becomes damaged, such as
from a power failure during a firmware update, the system automatically starts from
the PERM image.
If the TEMP image is damaged, you can recover the TEMP image from the PERM
image. See “Recovering the TEMP image from the PERM image” for further
information.
Checking the boot image
To check whether the system has started from the PERM image, enter:
cat /proc/device-tree/openprom/ibm,fw-bank
A P is returned if the system has started from the PERM image.
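The same check can be scripted, for example as part of a health report. The following is a minimal sketch that reads the file named in the command above. The guide only states that P means the PERM image, so treating every other value as "not PERM" is an assumption, as are the function names and the stripping of a trailing NUL byte or newline.

```python
from pathlib import Path

# Path taken from the cat command shown above.
FW_BANK_PATH = Path("/proc/device-tree/openprom/ibm,fw-bank")


def booted_from_perm(raw: bytes) -> bool:
    """Return True if the fw-bank value indicates the PERM image.

    Only "P" is documented; any other value is treated as
    "not PERM" here (an assumption).
    """
    # Strip a possible trailing NUL terminator and surrounding whitespace.
    value = raw.decode("ascii", errors="replace").strip("\x00 \t\r\n")
    return value == "P"


def read_fw_bank(path: Path = FW_BANK_PATH) -> bytes:
    """Read the raw property value; raises OSError if the file is missing."""
    return path.read_bytes()


if __name__ == "__main__":
    try:
        raw = read_fw_bank()
    except OSError as exc:
        print(f"Cannot read {FW_BANK_PATH}: {exc}")
    else:
        if booted_from_perm(raw):
            print("System started from the PERM image.")
        else:
            print("System did not report the PERM image.")
```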
Booting from the TEMP image
To initiate a boot from the TEMP image after the system has booted from the PERM
side, complete the following steps:
1. Turn off the blade server.
2. Restart the blade system management processor from the Advanced
Management Module.
3. Turn on the blade server.
Note: If the TEMP side is corrupted, the boot times out, and an automatic reboot
occurs after switching to the PERM side.
If the blade server does not restart, you must replace the system board assembly.
Contact a service support representative for assistance.
Recovering the TEMP image from the PERM image
To recover the TEMP image from the PERM image, you must copy the PERM
image into the TEMP image. To perform the copy, complete the following steps:
1. Copy the PERM image to the TEMP image. Using the Linux operating system,
type the following command:
update_flash -r
2. Shut down the blade server using the operating system.
3. Restart the blade system management processor from the management
module.
4. Turn on the blade server.
You might need to update the firmware code to the latest version. See “Updating
the system and BMC firmware” on page 15 for more information on updating the
firmware code.
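
The recovery procedure can be wrapped in a small helper that runs the copy only when the system reports the PERM image. This is a sketch under stated assumptions: `update_flash -r` is the command given in step 1, but the function names, the dry-run default, and the idea of gating the copy on the fw-bank value are illustrative choices, not part of the guide. The remaining steps are manual and are returned as text.

```python
import subprocess
from typing import List, Optional

# Manual follow-up steps, taken from steps 2 to 4 of the procedure above.
FOLLOW_UP_STEPS = [
    "Shut down the blade server using the operating system.",
    "Restart the blade system management processor from the management module.",
    "Turn on the blade server.",
]


def recovery_command(fw_bank: str) -> Optional[List[str]]:
    """Return the copy command if fw_bank is "P" (PERM), otherwise None."""
    if fw_bank.strip("\x00 \t\r\n") == "P":
        return ["update_flash", "-r"]
    return None


def recover_temp_image(fw_bank: str, dry_run: bool = True) -> List[str]:
    """Run the copy (or only print it in dry-run mode).

    Returns the steps that still have to be carried out by hand.
    """
    command = recovery_command(fw_bank)
    if command is None:
        return []  # not running from PERM, nothing to recover
    if dry_run:
        print("Would run:", " ".join(command))
    else:
        # Likely needs root privileges on the blade server (assumption).
        subprocess.run(command, check=True)
    return list(FOLLOW_UP_STEPS)


if __name__ == "__main__":
    for step in recover_temp_image("P"):
        print("Next:", step)
```

Keeping dry-run as the default means the helper only prints the command unless the caller explicitly asks for the flash copy to run.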
Supported boot media
The BladeCenter QS21 can boot from the operating system installation CDs or
DVDs to allow the operating system to be installed.
Once the operating system is installed, the BladeCenter QS21 can also boot either
from attached SAS storage, if you have installed the optional SAS Expansion
Card, or from the network.
If you wish to perform a standard Bootp/TFTP network boot, please note the
following restrictions:
• Only the built-in Gigabit Ethernet Controller of the I/O Bridge is supported.
• Booting is possible only through the Ethernet switch on the top side of the BladeCenter unit.
• No fallback or configurable change to the bottom switch is possible.
• In the Advanced Management Module, you need to set the boot list to Network.
• There is no support for a router between the blade and the TFTP server. Only local
  TFTP is supported.
Use Advanced Management Module to configure the required boot mode. See IBM
BladeCenter Management Module Installation Guide for more information.
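
The network boot restrictions lend themselves to a pre-flight checklist. The sketch below is purely illustrative: the `NetbootSetup` class and all of its field names are hypothetical, and the values must be filled in by hand from what you know about your installation; nothing here queries the hardware or the Advanced Management Module.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NetbootSetup:
    """Hypothetical description of a planned Bootp/TFTP network boot."""
    uses_onboard_gigabit_ethernet: bool
    uses_top_ethernet_switch: bool
    amm_boot_list_is_network: bool
    router_between_blade_and_tftp: bool


def netboot_problems(setup: NetbootSetup) -> List[str]:
    """Return the restrictions from the list above that the setup violates."""
    problems = []
    if not setup.uses_onboard_gigabit_ethernet:
        problems.append("Only the built-in Gigabit Ethernet controller is supported.")
    if not setup.uses_top_ethernet_switch:
        problems.append("Boot is only possible through the top Ethernet switch.")
    if not setup.amm_boot_list_is_network:
        problems.append("Set the boot list to Network in the Advanced Management Module.")
    if setup.router_between_blade_and_tftp:
        problems.append("A router between the blade and the TFTP server is not supported.")
    return problems


if __name__ == "__main__":
    plan = NetbootSetup(True, True, False, False)
    for problem in netboot_problems(plan):
        print(problem)
```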
Booting the system
This section provides an overview on how to interpret the console output of the host
firmware. The output is grouped into several parts, which are detailed below.
1. The first part of the boot process shows the system name and build date. You
see an error at this point if the firmware image is corrupted.
***************************************************************************
QS21 Firmware Starting
Check ROM = OK
Build Date = Apr 24 2007 13:43:46
FW Version = QB-1.6.0-0
Press "F1" to enter Boot Configuration (SMS)
2. Memory initialization follows next.
Note: It can take several seconds to initialize the RAMBUS memory.
3. The memory is initialized. The screen displays details of the vendor and the
speed of memory modules.
Initializing memory configuration...
MEMORY
Modules = Elpida 512MB, 3200 Mhz
XDRlibrary = v0.32, Bin A/C, RevB, DualDD
Calibrate = Done
Test = Done
The next screens show the open firmware section of the boot process and
provide checkpoints and an overview of which adapters are available in the
system. The details in the adapter list are not meaningful.
Note: The warning(!) Permanent Boot ROM is displayed if there is a problem
with the TEMP image and the system firmware is running from the PERM
image. You should correct this problem as soon as possible. See
“Recovering the TEMP image from the PERM image” on page 59 for
further information.
OPEN FIRMWARE
Adapters on 000001460ec0000000
00 0800 (D) : 14e4 16a8 network [ ethernet ]
00 0900 (D) : 14e4 16a8 network [ ethernet ]
Adapters on 000003460ec00000
00 0800 (D) : 1033 0035 usb-ohci ( NEC uPD720101 )
00 0900 (D) : 1033 0035 usb-ohci ( NEC uPD720101 )
00 0a00 (D) : 1033 00e0 usb-ehci*
Welcome to Open Firmware
Licensed Internal Code - Property of IBM
(c) Copyright IBM Corp. 2005, 2007 All Rights Reserved.
Cell BE is a trademark of SONY Computer Entertainment Inc.
Type ’boot’ and press return to continue booting the system.
Type ’reset-all’ and press enter to reboot the system.
disable nvram logging .. done
4. The next screen displays system information. It shows revision information
about the chip set, SMP size, boot date/time, and the available memory.
SYSTEM INFORMATION
Processor = Cell/B.E.(TM) DD3.2 @ 3200 MHz
I/O Bridge = Cell BE companion chip DD2.x
Timebase = 26666 kHz (internal)
SMP Size = 2 (4 threads)
Boot-Date = 2007-06-08 11:20
Memory = 2048MB (CPU0: 1024MB, CPU1: 1024MB)
The Operating System now boots unless you press F1 in which case the SMS
menu starts. See “Using the SMS utility program” on page 11 for further information.
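
When console output is captured over Serial over LAN, the `name = value` lines shown in the screens above can be picked out with a few lines of code. This is a minimal sketch that assumes each field appears on its own line with ` = ` as the separator, as in the sample screens; real output may be spaced differently. The function names are illustrative, and the PERM check simply searches for the words "Permanent Boot ROM" from the warning described earlier.

```python
from typing import Dict

# Sample copied from the SYSTEM INFORMATION screen shown above.
SAMPLE = """\
SYSTEM INFORMATION
Processor = Cell/B.E.(TM) DD3.2 @ 3200 MHz
I/O Bridge = Cell BE companion chip DD2.x
Timebase = 26666 kHz (internal)
SMP Size = 2 (4 threads)
Boot-Date = 2007-06-08 11:20
Memory = 2048MB (CPU0: 1024MB, CPU1: 1024MB)
"""


def parse_console_fields(text: str) -> Dict[str, str]:
    """Collect 'name = value' pairs; lines without ' = ' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def warns_about_perm_boot(text: str) -> bool:
    """True if the captured output contains the Permanent Boot ROM warning."""
    return "Permanent Boot ROM" in text


if __name__ == "__main__":
    for name, value in parse_console_fields(SAMPLE).items():
        print(f"{name}: {value}")
```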
Diagnostic programs and messages
The Dynamic System Analysis (DSA) Preboot diagnostic programs are the primary
method of testing the major components of the server. DSA is a system information
collection and analysis tool that you can use to provide information to IBM service and
support to aid in the diagnosis of system problems. The DSA diagnostic
programs come on the IBM Dynamic System Analysis Preboot Diagnostic CD. You
can download the CD from http://www.ibm.com/support/us/en if one did not come
with your server. As you run the diagnostic programs, text messages are displayed
on the screen and are saved in the test log. A diagnostic text message indicates
that a problem has been detected and indicates the action you should take as a
result of the text message.
The DSA diagnostic programs collect information about the following
aspects of the system:
• System configuration
• Network interfaces and settings
• Hardware inventory USB information
• IBM LightPath diagnostics status
• Service processor status and configuration
• Vital product data and system firmware information
• Drive Health Information
• LSI RAID & Controller configuration
The DSA diagnostic programs can also provide diagnostics for the following system
components:
• Baseboard Management Controller
• Memory stress
• CPU stress
Additionally, DSA creates a merged log that includes events from all collected logs.
All collected information can be output as a compressed XML file that can be sent
to IBM Service. Additionally, you can view the information locally through a
generated text report file. Optionally, the generated HTML pages may be copied to
removable media and viewed from a web browser.
Running diagnostics and preboot DSA
To run the diagnostic programs, complete the following steps:
1. If the server is running, turn off the server and all attached devices.
2. Turn on all attached devices, then turn on the server.
3. Ensure that external DSA bootable media is available as a boot device. For boot
   device selection, system firmware works through the boot path as specified in
   the onboard planar VPD and tries to establish communication with the specified
   interfaces in sequential order. These boot devices include the USB-attached
   DVD (BladeCenter media tray), the SAS storage if attached, and
   network-attached storage.
4. If required, use the Advanced Management Module to change the boot order so
   that the blade server boots first from the preboot DSA device. This ensures
   that the blade server boots from the correct device.
5. When the boot prompt appears, press Enter, type dsa, and press Enter again.
   Alternatively, you can wait for the timeout to expire.
6. The command-line interface prompt then appears on the SOL connection.
   The BladeCenter QS21 does not support the graphical user interface.
7. Follow the on-screen directions to run preboot DSA. Diagnostics are run from
   within preboot DSA.
When you are using the CPU or Memory stress tests, call your IBM service
representative if you experience any system instability.
To determine what action you should take as a result of a diagnostic text message,
see “DSA error messages.”
Open firmware memory diagnostic results are output to the SOL connection. They
are also logged in NVRAM. All NVRAM logs (more than just OF diags) are collected
as part of the DSA merged log.
If the diagnostic programs do not detect any hardware errors but the problem
remains during normal server operations, a software error might be the cause. If
you suspect a software problem, see the information that comes with your software.
A single problem might cause more than one error message. When this happens,
correct the cause of the first error message. The other error messages usually will
not occur the next time you run the diagnostic programs.
If there are multiple error codes or light path diagnostics LEDs that indicate a
microprocessor error, the error might be in a microprocessor or in a microprocessor
socket. See Table 20 on page 90 for further information about diagnosing
microprocessor problems.
If the server stops during testing and you cannot continue, restart the server and try
running the diagnostic programs again.
Diagnostic text messages
Diagnostic text messages are displayed while the tests are running. A diagnostic
text message contains one of the following results:
• Passed: The test was completed without any errors.
• Failed: The test detected an error.
• Aborted: The test could not proceed because of the server configuration.
Additional information concerning test failures is available in the extended
diagnostic results for each test.
Viewing the test log
To view the test log when the tests are completed, issue the view command from
the DSA command line interface. DSA collections may also be transferred to an
external USB device using the copy command from the DSA command line
interface.
DSA error messages
The tables below describe the messages that the diagnostic programs might
generate and suggested actions to correct the detected problems. Follow the
suggested actions in the order given.
CPU test results
Table 5. CPU test results

Test: CPU stress test

Number      Status  Extended results
089-901xxx  Fail    Test failure
089-802xxx  Abort   System resource availability error
089-801xxx  Abort   Internal program error

Actions:
1. If the system has stopped responding, turn off and restart the system and
   then run the test again.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Run the test again.
4. Check system firmware level and upgrade if necessary. The installed
   firmware level can be found in the DSA Diagnostic Event Log within the
   Firmware/VPD section for this component. The latest level firmware for this
   component can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
5. Run the test again.
6. If the system has stopped responding, turn off and restart the system and
   then run the test again.
7. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.
BMC test results
Table 6. BMC test results

Test: I2C test
Number: 166-901-xxx
Status: Fail
Extended results: The BMC indicates a failure in the IPMB bus.
Actions:
1. Turn off the system and disconnect it from power. The system must be
   removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware
   level can be found in the DSA Diagnostic Event Log within the Firmware/VPD
   section for this component. The latest level firmware for this component
   can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.

Number: 166-902-xxx
Status: Fail
Extended results: The BMC indicates a failure in the memory card bus.
Actions:
1. Turn off the system and disconnect it from power. The system must be
   removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware
   level can be found in the DSA Diagnostic Event Log within the Firmware/VPD
   section for this component. The latest level firmware for this component
   can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the reported memory size is the same as the installed memory size,
   complete the following steps. Otherwise, go to step 8.
   a. Turn off the system and disconnect it from power.
   b. Reseat all the system DIMMs within the system.
   c. Reconnect the system to power and turn on the system.
   d. Run the test again.
8. Turn off the system and disconnect it from power.
9. Remove all the system memory.
10. Install the minimum memory configuration for the system. See the QS21
    Installation and User's Guide for supported memory configurations.
11. Reconnect the system to power and turn on the system.
12. Make sure that the reported memory size is the same as the installed
    memory size.
13. Run the test again. If the memory passes the test, one of the uninstalled
    memory cards or DIMMs is the failing component.
14. Repeat steps 8 through 13 as necessary, using different memory cards and
    DIMMs, to isolate the failing component. It is important to change only
    one element each time in order to identify the specific cause of the
    error.
15. Replace the failing memory card or DIMM.

Number: 166-903-xxx
Status: Fail
Extended results: The BMC indicates a failure in the Ethernet sideband bus.
Actions:
1. Turn off the system and disconnect it from power. The system must be
   removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware
   level can be found in the DSA Diagnostic Event Log within the Firmware/VPD
   section for this component. The latest level firmware for this component
   can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
6. Check Ethernet device firmware level and upgrade if necessary. The
   installed firmware level can be found in the DSA Diagnostic Event Log
   within the Firmware/VPD section for this component. The latest level
   firmware for this component can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
7. Run the test again.
8. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.

Number       Status  Extended results
166-904-xxx  Fail    The BMC indicates a failure in the main bus.
166-905-xxx  Fail    The BMC indicates a failure in the pecos bus.
166-906-xxx  Fail    The BMC indicates a failure in the BMC private bus.
166-907-xxx  Fail    The BMC indicates a failure in the power backplane bus.
166-908-xxx  Fail    The BMC indicates a failure in the microprocessor bus.
166-910-xxx  Fail    The BMC indicates a failure in the PCIe and Light path
                     diagnostics bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be
   removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware
   level can be found in the DSA Diagnostic Event Log within the Firmware/VPD
   section for this component. The latest level firmware for this component
   can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.
Table 7. BMC test results

Test: BMC

Number       Status  Extended results
166-801-xxx  Abort   BMC I2C test canceled: the BMC returned an incorrect
                     response length.
166-802-xxx  Abort   BMC I2C test canceled: the test cannot be completed for
                     an unknown reason.
166-803-xxx  Abort   BMC I2C test canceled: the node is busy; try later.
166-804xxx   Abort   BMC I2C test canceled: invalid command.
166-805-xxx  Abort   BMC I2C test canceled: invalid command for the given LUN.
166-806-xxx  Abort   BMC I2C test canceled: timeout while processing the
                     command.
166-807-xxx  Abort   BMC I2C test canceled: out of space.
166-808-xxx  Abort   BMC I2C test canceled: reservation canceled or invalid
                     reservation ID.
166-809-xxx  Abort   BMC I2C test canceled: request data was truncated.
166-810-xxx  Abort   BMC I2C test canceled: request data length is invalid.
166-811-xxx  Abort   BMC I2C test canceled: request data field length limit
                     is exceeded.
166-812-xxx  Abort   BMC I2C test canceled: a parameter is out of range.

Actions:
1. Turn off the system and disconnect it from power. The system must be
   removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware
   level can be found in the DSA Diagnostic Event Log within the Firmware/VPD
   section for this component. The latest level firmware for this component
   can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.

Number       Status  Extended results
166-813-xxx  Abort   BMC I2C test canceled: cannot return the number of
                     requested data bytes.
166-814-xxx  Abort   BMC I2C test canceled: requested sensor, data, or record
                     is not present.
166-815-xxx  Abort   BMC I2C test canceled: invalid data field in the request.
166-816-xxx  Abort   BMC I2C test canceled: the command is illegal for the
                     specified sensor or record type.
166-817-xxx  Abort   BMC I2C test canceled: a command response could not be
                     provided.
166-818-xxx  Abort   BMC I2C test canceled: cannot execute a duplicated
                     request.

Actions:
1. Turn off the system and disconnect it from power. The system must be
   removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware
   level can be found in the DSA Diagnostic Event Log within the Firmware/VPD
   section for this component. The latest level firmware for this component
   can be found on the IBM Support Web site at
   http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.

Number       Status  Extended results
166-819-xxx  Abort   BMC I2C test canceled: a command response could not be
                     provided; the SDR repository is in update mode.
166-820xxx   Abort   BMC I2C test canceled: a command response could not be
                     provided; the device is in firmware update mode.
166-821-xxx  Abort   BMC I2C test canceled: a command response could not be
                     provided; BMC initialization is in progress.
166-822-xxx  Abort   BMC I2C test canceled: the destination is unavailable.
166-823-xxx  Abort   BMC I2C test canceled: cannot execute the command;
                     insufficient privilege level.
166-824-xxx  Abort   BMC I2C test canceled: cannot execute the command.
166-000-xxx  Pass
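
When scanning a log for BMC I2C results, the status follows from the middle
number of the code as listed in Tables 6 and 7. The Python sketch below is an
illustration only: the function name is hypothetical, the number ranges are
read off the tables above, and any number not listed there is reported as
unknown rather than guessed.

```python
import re

# Matches codes as printed in the tables, for example "166-901-xxx" or
# "166-804xxx". Only the test prefix (166) and the 3-digit number are parsed.
_BMC_CODE = re.compile(r"^166-(\d{3})")

# Numbers listed with status Fail in Table 6 (909 does not appear there).
_FAIL_NUMBERS = set(range(901, 909)) | {910}
# Numbers 801 through 824, the Abort range covered by Table 7.
_ABORT_NUMBERS = set(range(801, 825))


def classify_bmc_i2c_code(code: str) -> str:
    """Return 'Pass', 'Abort', 'Fail', or 'Unknown' for a 166-nnn code."""
    match = _BMC_CODE.match(code.strip())
    if match is None:
        return "Unknown"
    number = int(match.group(1))
    if number == 0:
        return "Pass"
    if number in _ABORT_NUMBERS:
        return "Abort"
    if number in _FAIL_NUMBERS:
        return "Fail"
    return "Unknown"


for sample in ("166-000-xxx", "166-804xxx", "166-901-xxx"):
    print(sample, classify_bmc_i2c_code(sample))
```

The lookup is deliberately limited to the 166 prefix. Other tests in this
chapter do not share the same numbering pattern, so each would need its own
table.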
Memory tests
Table 8. Memory test results

Test: Memory stress test

Number: 201-000xxx
Status: Pass

Number: 202-802-xx
Status: Fail
Extended results: General error: memory size is insufficient to run the test.
Actions:
1. Ensure all memory is enabled by checking Available System Memory in the
   Resource Utilization section of the DSA Diagnostic Event Log.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Run the test again.
4. Execute the standard DSA memory diagnostic to validate all memory.
5. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.

Number: 202-901-xxx
Status: Fail
Extended results: Test failure.
Actions:
1. Execute the standard DSA memory diagnostic to validate all memory.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest
   level DSA Diagnostic code can be found on the IBM Support Web site at
   http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Turn off the system and disconnect it from power.
4. Reseat the DIMMs.
5. Reconnect the system to power and turn on the system.
6. Run the test again.
7. Execute the standard DSA memory diagnostic to validate all memory.
8. If you cannot reproduce the problem, contact your IBM technical-support
   representative.

Number: 202-801xxx
Status: Abort
Extended results: Internal program error.
Actions:
1. Turn off and restart the system.
2. Make sure that the system firmware code and DSA code are at the latest
   level.
3. Run the test again.
4. Turn off and restart the system if necessary to recover from a hung state.
5. Run the memory diagnostic to identify the specific failing DIMM.
6. If the test continues to fail, refer to the other sections of this chapter
   for diagnosis and corrective action.

Number: 202-000-xxx
Status: Pass
System firmware startup messages
The system firmware displays the progress of the startup process on the serial
console from the time that ac power is connected to the system until the operating
system login prompt is displayed following a successful operating system startup.
If a serial console is not connected, you can use the Advanced Management
Module to monitor the logs and display informational and error messages.
If the firmware encounters an error during the startup process, a message
describing the error together with an error code is displayed on the serial console.
There are two types of error, where xxx represents the number of the error code:
Cxxx This is an internal checkpoint. If the system stops during the startup
process a checkpoint may be displayed.
Exxx This type of error means that there is a failure that does not allow
the firmware to continue the startup process. Check the error codes
in the section “Boot errors and handling” on page 72. If these do
not help resolve the problem, contact a service support
representative.
There are cases where a message that is informational only is displayed on the
serial console.
Wxxx This is a warning message. The firmware allows the startup process
to continue, but indicates there may be a problem. A warning
message can be combined with an error message to give more
complete information about an error.
A complete list of possible messages is given in the section “Boot errors and
handling” on page 72.
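
When filtering a captured serial console log, the message type can be read
from the first letter of the code. The following Python sketch is illustrative
only: it assumes that a console line starts with the code, which this guide
does not specify, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Meaning of each code prefix, as described in this section.
KINDS = {
    "C": "checkpoint",  # internal checkpoint
    "E": "error",       # firmware cannot continue the startup process
    "W": "warning",     # startup continues, but there may be a problem
}

# Assumption: the code is the first token on the line, for example
# "E3402 Aborting boot, internal error."
_MESSAGE = re.compile(r"^([CEW])(\d+)\b")


def classify_firmware_message(line: str) -> Optional[Tuple[str, str]]:
    """Return (code, kind) for a line that starts with a code, else None."""
    match = _MESSAGE.match(line.strip())
    if match is None:
        return None
    prefix, digits = match.groups()
    return prefix + digits, KINDS[prefix]


print(classify_firmware_message("E3402 Aborting boot, internal error."))
print(classify_firmware_message("W3411 Client application returned."))
```

Lines that do not begin with a C, E, or W code return None, so ordinary
console output is ignored.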
Boot errors and handling
The following sections describe boot errors and actions you can take to resolve
these errors.
Boot list
The following table describes boot list errors.
Table 9. System firmware boot list errors

Code: E3400
Message: It was not possible to boot from any device specified in the VPD
Description: The firmware found a valid VPD but was not able to find bootable
  code on any of the devices listed in it.
Action: Use the Advanced Management Module Web interface to specify at least
  one device that contains bootable code. From the Advanced Management Module
  Web interface, choose Blade Tasks > Configuration > Boot Sequence.

Code: E3401
Message: Aborting boot, <details>
Description: Boot aborted due to an error detected by the low level code. The
  <details> string provides the error description.
Action: Based on the <details> string, you may have to take an action on
  faulty hardware or use the Advanced Management Module to correct the system
  configuration. If the problem persists, contact your IBM service
  representative.

Code: E3402
Message: Aborting boot, internal error.
Description: Boot aborted due to an error detected by the low level code.
Action: The exact reason is unknown but could be a firmware problem. If the
  problem persists, contact your IBM service representative.

Code: E3403
Message: Bad executable: <details>
Description: The file loaded from the boot device is not a valid PPC
  executable ELF file. The <details> string provides more details about the
  file type.
Action: Using the Advanced Management Module, correct the boot device
  configuration. Select a valid boot device and executable path.

Code: E3404
Message: Not a bootable device!
Description: The system cannot load an executable file from this device.
Action: Using the Advanced Management Module, correct the boot device
  configuration. Select a valid boot device and executable path.

Code: E3405
Message: No such device
Description: The specified boot device is currently not present or not ready
  for access.
Action: Check the hardware device or use the Advanced Management Module to
  correct the system configuration. If the problem persists, contact your IBM
  service representative.

Code: E3406
Message: Client application returned an error: <details>
Description: The OS or a standalone application returned an error code to the
  system firmware. The <details> string provides the error description.
Action: Based on the <details> string, you may have to take an action on
  faulty hardware or use the Advanced Management Module to correct the system
  configuration. A firmware or OS upgrade may be needed to resolve
  compatibility issues. If the problem persists, contact your IBM service
  representative.

Code: E3407
Message: Load failed
Description: Load or boot failed to load the requested file from the device.
  This is an informational message and may be preceded by one or more other
  error messages.
Action: Based on the preceding error messages, you may have to take an action
  on faulty hardware or use the Advanced Management Module to correct the
  system configuration.

Code: E3408
Message: Failed to claim memory for the executable
Description: An attempt to load the executable file from the boot device
  failed due to insufficient memory or a firmware problem.
Action: Verify that the loaded file was indeed the right executable intended
  to boot this system. If not, use the Advanced Management Module to correct
  the system configuration. Otherwise, contact your IBM service
  representative. You may need to add more memory to the system or to perform
  a firmware upgrade.

Code: E3409
Message: Unknown FORTH Word
Description: Internal code error, or compatibility issue.
Action: Contact your IBM service representative. You may need to perform a
  firmware upgrade.

Code: E3410
Message: Boot list successfully read from VPD but no useful information
  received.
Description: The firmware found a valid VPD but was not able to find bootable
  code on any of the devices listed in it.
Action: Use the Advanced Management Module Web interface to specify at least
  one device that contains bootable code. From the Advanced Management Module
  Web interface, choose Blade Tasks > Configuration > Boot Sequence.

Code: W3411
Message: Client application returned.
Description: The loaded OS or standalone application returned to firmware.
  This may be a normal condition, or firmware could not detect any error
  issued by the client application. Booting from the boot-device list will be
  interrupted at this stage and no further attempts to boot from devices in
  the list will be made.
Action: None needed. If the boot loader (for example, yaboot) exited because
  of the need to boot from a different device in the list, either boot
  manually from the firmware (ok) prompt or, using the Advanced Management
  Module, change the boot device order in the system configuration.

Code: E3420
Message: Boot list could not be read from VPD.
Description: The firmware found an invalid VPD. Possibly it has been
  corrupted by the system software.
Action: The VPD must be rewritten. Use the Advanced Management Module Web
  interface to specify at least one device that contains bootable code. From
  the Advanced Management Module Web interface, choose
  Blade Tasks > Configuration > Boot Sequence. If the problem persists,
  contact your IBM service representative.
System firmware update errors
The following table describes system firmware errors that can occur if there have
been problems after an update.
Table 10. System firmware boot errors

Code: E4000
Message: (RTAS Flash) unknown flash chip version
Description: The flash update code does not support the onboard boot ROM
  flash chip.
Action: Contact your IBM service representative as the system board may need
  replacing.

Code: E4010
Message: Platform check failed for image
Description: The firmware image does not match the hardware platform.
Action: Check the firmware image and ensure you have the right image for the
  BladeCenter QS21. See “Using the SMS utility program” on page 11. If the
  image is incorrect, download and install the correct image from
  http://www.ibm.com/support/us/en/. See “Updating the system and BMC
  firmware” on page 15 for further information.

Code: E4020
Message: (RTAS flash) image corrupted (CRC)
Description: The image for a system firmware update is corrupted.
Action: Download the image again and reapply the update. If this does not
  resolve the problem, apply an image from a different source.
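
Before reapplying an image after error E4020, it can be useful to confirm
whether a second download actually differs from the first. The Python sketch
below computes a checksum of each copy on a workstation so that the two can be
compared. It uses the standard zlib CRC-32 purely for illustration: this guide
does not state which CRC the firmware uses, so a matching value here does not
prove that the firmware will accept the image. The function names are
hypothetical.

```python
import zlib
from typing import Iterable


def crc32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Compute the zlib CRC-32 over a sequence of byte chunks."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def crc32_of_file(path: str, chunk_size: int = 65536) -> int:
    """Compute the zlib CRC-32 of a file without loading it all at once."""
    with open(path, "rb") as handle:
        return crc32_of_chunks(iter(lambda: handle.read(chunk_size), b""))


# Self-check against the well-known CRC-32 check value for b"123456789".
print(hex(crc32_of_chunks([b"1234", b"56789"])))
```

If two downloads of the same image produce different values from
crc32_of_file, at least one copy was altered in transit or in storage.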
Memory initialization errors
The following table describes the memory initialization errors that can occur during
boot.
Table 11. Memory initialization errors

Code: E1006
Message: Memory Incomplete.
Description: Not all the XDR system memory could be initialized.
Action: The blade server can still boot but with reduced system memory. Power
  down then reboot the blade. If this does not resolve the problem, contact
  your IBM service representative as the system board may need replacing.

Code: E1100
Message: System memory init failure. Boot abort.
Description: The system XDR memory could not be initialized. The boot process
  has aborted.
Action: Power down then reboot the blade. If this does not resolve the
  problem, contact your IBM service representative as the system board may
  need replacing.

Code: E1110
Message: System memory test failure. Boot abort.
Description: An error has occurred while testing the XDR memory.
Action: Power down then reboot the blade. If this does not resolve the
  problem, contact your IBM service representative as the system board may
  need replacing.

Code: E1200
Message: System memory init failure during second-pass calibration. CPU
  halted.
Description: Since the first-pass calibration succeeded, either the CPU or
  the system XDR memory could have a defective contact.
Action: Power down then reboot the blade. If this does not resolve the
  problem, contact your IBM service representative as the system board may
  need replacing.

Code: E1210
Message: Memory controller failed. CPU halted.
Description: The built-in memory controller of the CPU encountered an
  unexpected error.
Action: Power down then reboot the blade. If this does not resolve the
  problem, contact your IBM service representative as the system board may
  need replacing.

Code: W1250
Message: Timing Calibration failed: BE... YRAC... DQ... pin...
  Note: <xx...> indicates the number of the pin where the error has occurred.
Description: This warning message accompanies a later memory initialization
  error message and lists the pin number for help in locating the cause of
  the error.
Action: See the accompanying memory initialization error message for further
  information.
USB errors
The following table describes USB boot errors. These may occur when booting
from a bootable CD or DVD.
Table 12. System firmware boot errors

Code: E5000
Message: (USB) Media or drive not ready for this blade.
Description: The media tray is not accessible for boot.
Action: Verify that the media tray is assigned to the blade and that the
  media is configured correctly. If this does not resolve the problem, check
  the other blade servers. If they cannot access the media, there may be a
  problem with the BladeCenter unit. See the documentation that comes with
  the BladeCenter unit for further information.

Code: E5010
Message: (USB) No Media Found! Please check for the drawer/inserted media.
Description: Media is not inserted or the drawer of the media tray is open.
Action: Ensure that there is a bootable CD or DVD in the tray and that the
  drawer is closed.

Code: E5020
Message: (USB) Unknown media format.
Description: The media is not recognized by the firmware.
Action: Insert a suitable bootable CD.

Code: E5030
Message: (USB) Device communication error
Description: Firmware cannot communicate with the BladeCenter USB devices.
Action: This could be a firmware or physical hardware problem. Check:
  • The Advanced Management Module for messages.
  • The system firmware image is not corrupt. See “System firmware update
    errors” on page 74 for more information about possible errors and their
    solution.
  • Other blade servers within the BladeCenter unit to see if they have the
    problem. If they do, the BladeCenter unit itself may be the cause of the
    problem. See the Problem Determination and Service Guide for your
    BladeCenter unit for more information.
  Finally, power down then reboot the blade. If this does not help resolve
  the problem, contact your IBM service representative.

Code: E5040
Message: (USB) Device transaction error. <command>
  Note: <command> indicates the command in progress when the transaction
  error occurred. This information may not always be available.
Description: The drive showed an error during data transfer.
Action:
1. Verify that the media tray is assigned to the blade server.
   Note: A reboot of the blade server to which the media tray was previously
   assigned is required.
2. Check that the correct media is inserted and that the drawer is closed.
3. Inspect the media to see if there is visible damage.
4. Use another CD or DVD drive to check that the media is readable.
5. Check with other blade servers within the BladeCenter unit to see if they
   have the problem. If they do, the BladeCenter unit itself may be the cause
   of the problem. See the Problem Determination and Service Guide for your
   BladeCenter unit for more information.
Network boot errors
The following table describes the network boot errors.
Table 13. Network boot errors

Code: E3000
Message: (net) Could not read MAC address.
Description: The firmware could not establish a communication socket for
  booting over the network due to an error while retrieving the MAC address
  of the network device.
Action: Power down then reboot the blade server. If this does not resolve the
  problem, contact your IBM service representative.

Code: E3001
Message: (net) Could not get IP address.
Description: The DHCP server is not responding, or there could be a MAC
  address conflict in your network.
Action:
  • Check that your DHCP server is available.
  • Check that an IP address has been correctly assigned.
  • Check that your MAC address is valid and is unique across your network.

Code: E3002
Message: (net) ARP request to TFTP server (x.x.x.x) failed.
  Note: (x.x.x.x) represents the address of the TFTP server.
Description: The MAC address resolution failed for the TFTP server with IP
  address (x.x.x.x).
Action:
1. Check that the TFTP server is available and can be reached over the
   network.
2. Check that the DHCP server is correctly assigning IP addresses.

Code: E3003
Message: (net) unknown TFTP error.
Description: The TFTP server encountered an error but is not able to
  determine its cause.
Action: Power down then reboot the blade server. If this does not help
  resolve the problem, contact your IBM service representative.

Code: E3004
Message: (net) TFTP buffer to small for <filename>
  Note: <filename> is the name of the file TFTP has attempted to buffer.
Description: The requested file is too big.
Action: Try to load a smaller file. If this succeeds, check your DHCP server
  configuration.

Code: E3005
Message: (net) ICMP ERROR: <error message>
Description: The TFTP server cannot be reached.
Action: Check that the TFTP server is available and correctly configured.

Code: E3006
Message: (net) Could not initialize network device
Description: The network device could not be activated.
Action: Check that you have connected all network cables and that you have
  enabled the BladeCenter I/O module.

Code: E3008
Message: (net) Can’t obtain TFTP server IP address
Description: The DHCP server has not delivered the IP address of the TFTP
  server.
Action: Check your DHCP server configuration.

Code: E3009
Message: (net) file not found: <filename>
Description: The requested file was not found on the TFTP server.
Action: Check your DHCP server configuration and make sure that you are using
  the proper TFTP server and the right file name.

Code: E3010
Message: (net) TFTP access violation
Description: The TFTP server reported a file access violation.
Action: Check the file name and the permissions of the file that should be
  downloaded.

Code: E3011
Message: (net) illegal TFTP operation
Description: The TFTP server is not able to handle the request.
Action: There may be too many UDP ports open on the TFTP server. Reboot the
  TFTP server and retry the transfer.

Code: E3012
Message: (net) unknown TFTP transfer ID
Description: The TFTP server could not assign the data to a UDP packet based
  on its transfer ID. The transfer ID for this connection may be in use by
  another client.
Action: Reboot and retry the transfer. If the problem persists, check the
  configuration of the UDP ports on your TFTP server.

Code: E3013
Message: (net) no such TFTP user
Description: The TFTP server reported an unknown user.
Action: Change the TFTP server configuration to grant anonymous user access.

Code: E3014
Message: (net) TFTP error occurred after <No> bad packets received
Description: The TFTP client received too many bad packets.
Action: Reboot and retry the transfer. If the error persists, this could
  indicate problems with the network. Check all network connections and
  cables.

Code: E3015
Message: (net) TFTP error occurred after missing <No> responses
Description: The TFTP client has missed too many packets.
Action: Reboot and retry the transfer. If the error persists, this could
  indicate problems with the network. Check all network connections and
  cables.

Code: E3016
Message: (net) TFTP error missing block <No>, expected block was <No>
Description: The TFTP client received a packet that is out of order.
Action: Reboot and retry the transfer.

Code: E3017
Message: (net) TFTP block size negotiation failed
Description: The TFTP server has sent an acknowledgement to the client
  without block size information for subsequent TFTP network traffic.
Action: The TFTP server may not be working properly. Change the TFTP server
  configuration to allow block size negotiation. Reboot the blade and/or the
  TFTP server and try again.

Code: E3018
Message: (net) file exceeds maximum TFTP transfer size
Description: The requested file is too big to transfer via TFTP.
Action: Change the TFTP server configuration to increase the block size to a
  maximum value of 1432 bytes.
Note: Your BladeCenter QS21 has two Ethernet controllers and can be connected
to two Ethernet switches. Because the blade server performs a network boot
from the controller that acquires an IP address first, make sure that your
Linux configuration supports this. If your Linux environment requires a
static IP address for a particular Ethernet port, you must set up your DHCP
environment accordingly.
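
The action for error E3018 in Table 13 suggests a larger block size because
the size of a TFTP transfer depends on it. The Python sketch below works
through the arithmetic under one stated assumption: that the limit comes from
the 16-bit block number in the TFTP protocol, with no block-number wraparound.
This guide does not say how the firmware enforces its limit, so treat the
figures as an estimate. The function name is hypothetical.

```python
# Largest value of the 16-bit TFTP block number.
MAX_BLOCKS = 65535


def max_tftp_file_size(block_size: int) -> int:
    """Largest file, in bytes, that fits without block-number wraparound.

    The last block must be shorter than block_size to signal the end of the
    transfer, hence the subtraction of one byte.
    """
    return MAX_BLOCKS * block_size - 1


# 512 bytes is the TFTP default; 1432 is the maximum named for E3018.
for size in (512, 1432):
    print(size, max_tftp_file_size(size))
```

Under this assumption, the default 512-byte block allows about 32 MB per
file, and a 1432-byte block allows about 89 MB.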
SAS boot errors
These error messages only appear if you have installed the optional SAS daughter
card.
Table 14. SAS boot errors
Code Message Description Action
E4303 LSISAS1064 controller
initialization failed.
The blade server firmware was not
able to initialize the controller. This
could indicate a hardware, blade server
firmware, or SAS expansion card
firmware problem.
Try the following steps, in order, to
fix the problem:
1. Reboot the blade server.
2. Power down the blade server, then
remove and reinstall it in the
BladeCenter unit.
3. Remove and reinstall the SAS
expansion card.
4. Ensure that the SAS expansion card
firmware and the blade server
firmware are at the correct levels.
5. If the error started after a SAS
expansion card firmware upgrade
or a blade server firmware
upgrade, consider a rollback to the
previous firmware versions. Check
the documentation at
http://www.ibm.com/systems/bladecenter/support/
to verify whether rollback is possible.
6. Plug the SAS expansion card into
another blade server. If the problem
persists, the SAS expansion card
may need replacement.
7. Plug a different SAS expansion
card into the blade server. If the
problem persists, the blade server
may need replacement.
E4304 LSISAS1064 controller
operation failed.
The blade firmware was not able to
bring the controller to an operational
state. This could indicate a hardware,
blade server firmware, or SAS
expansion card firmware problem.
Try the following steps, in order, to
fix the problem:
1. Reboot the blade server.
2. Power down the blade server, then
remove and reinstall it in the
BladeCenter unit.
3. Remove and reinstall the SAS
expansion card.
4. Ensure that the SAS expansion card
firmware and the blade server
firmware are at the correct levels.
5. If the error started after a SAS
expansion card firmware upgrade
or a blade server firmware
upgrade, consider a rollback to the
previous firmware versions. Check
the documentation at
http://www.ibm.com/systems/bladecenter/support/
to verify whether rollback is possible.
6. Plug the SAS expansion card into
another blade server. If the problem
persists, the SAS expansion card
may need replacement.
7. Plug a different SAS expansion
card into the blade server. If the
problem persists, the blade server
may need replacement.
E4305 LSISAS1064 port failed. The blade firmware could not enable
the SAS port. This could indicate a
hardware, blade server firmware, or
SAS expansion card firmware problem.
Try the following steps, in order, to
fix the problem:
1. Reboot the blade server.
2. Power down the blade server, then
remove and reinstall it in the
BladeCenter unit.
3. Remove and reinstall the SAS
expansion card.
4. Ensure that the SAS expansion card
firmware and the blade server
firmware are at the correct levels.
5. If the error started after a SAS
expansion card firmware upgrade
or a blade server firmware
upgrade, consider a rollback to the
previous firmware versions. Check
the documentation at
http://www.ibm.com/systems/bladecenter/support/
to verify whether rollback is possible.
6. Plug the SAS expansion card into
another blade server. If the problem
persists, the SAS expansion card
may need replacement.
7. Plug a different SAS expansion
card into the blade server. If the
problem persists, the blade server
may need replacement.
E4307 LSISAS1064 network
topology read failed.
The blade server firmware was not
able to discover the SAS topology. This
could indicate a hardware, blade server
firmware, or SAS expansion card
firmware problem.
Try the following steps, in order, to
fix the problem:
1. Reboot the blade server.
2. Power down the blade server, then
remove and reinstall it in the
BladeCenter unit.
3. Remove and reinstall the SAS
expansion card.
4. Ensure that the SAS expansion card
firmware and the blade server
firmware are at the correct levels.
5. If the error started after a SAS
expansion card firmware upgrade
or a blade server firmware
upgrade, consider a rollback to the
previous firmware versions. Check
the documentation at
http://www.ibm.com/systems/bladecenter/support/
to verify whether rollback is possible.
6. Plug the SAS expansion card into
another blade server. If the problem
persists, the SAS expansion card
may need replacement.
7. Plug a different SAS expansion
card into the blade server. If the
problem persists, the blade server
may need replacement.
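Steps 6 and 7 in the SAS boot error actions amount to a swap test: the suspect card is tried in a second blade server, and a second card is tried in the suspect blade server. As a rough illustration only, the decision could be sketched in Python as follows. The function and parameter names are hypothetical, and the result is a hint for further service action, not a definitive diagnosis.

```python
def suspect_component(fails_with_card_in_other_blade: bool,
                      fails_with_other_card_in_this_blade: bool) -> str:
    """Suggest which part to examine after the swap tests in steps 6 and 7.

    Step 6: the original SAS expansion card is moved to another blade server.
    Step 7: a different SAS expansion card is installed in the original
    blade server.
    """
    if fails_with_card_in_other_blade:
        # The error follows the card, so the card is the likely cause.
        return "SAS expansion card"
    if fails_with_other_card_in_this_blade:
        # The error stays with the blade server even with another card.
        return "blade server"
    # Neither test reproduced the error; recheck the earlier steps.
    return "undetermined"


print(suspect_component(True, False))   # SAS expansion card
print(suspect_component(False, True))   # blade server
```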