IBM QS21, BladeCenter QS21 Type 0792 Service Manual

BladeCenter QS21 Type 0792

Problem Determination and Service Guide
Note
Before using this information and the product it supports, read the general information in Appendix C, “Notices,” on page 113 and the Warranty and Support Information on the Documentation CD.
© Copyright International Business Machines Corporation 2006, 2008.
US Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . .1
Related documentation . . . . . . . . . . . . . . . . . . . . . .1
Notices and statements used in this document . . . . . . . . . . . . . .2
Features and specifications . . . . . . . . . . . . . . . . . . . . .2
Support for local storage . . . . . . . . . . . . . . . . . . . . . .3
Turning on the blade server . . . . . . . . . . . . . . . . . . . . .3
Turning off the blade server . . . . . . . . . . . . . . . . . . . . .4
Blade server controls and LEDs . . . . . . . . . . . . . . . . . . .6
System board LEDs . . . . . . . . . . . . . . . . . . . . . . .7
System board internal and expansion card connectors . . . . . . . . . . .8
Chapter 2. Configuring the blade server . . . . . . . . . . . . . . .9
Communicating with the blade server . . . . . . . . . . . . . . . . .9
Using the Advanced Management Module . . . . . . . . . . . . . .9
Using the Web interface . . . . . . . . . . . . . . . . . . .10
Using the command-line interface . . . . . . . . . . . . . . . .10
Using Serial over LAN . . . . . . . . . . . . . . . . . . . . .10
Using the serial interface . . . . . . . . . . . . . . . . . . . .10
Using the SMS utility program . . . . . . . . . . . . . . . . . .11
Starting SMS . . . . . . . . . . . . . . . . . . . . . . .11
Viewing FRU information . . . . . . . . . . . . . . . . . . .12
Adding FRU information . . . . . . . . . . . . . . . . . .13
Updating the system and BMC firmware . . . . . . . . . . . . . . .15
Updating steps . . . . . . . . . . . . . . . . . . . . . . . .16
Determining current blade server firmware levels . . . . . . . . . . .17
Updating the BMC firmware . . . . . . . . . . . . . . . . . . .18
Using the BMC update package . . . . . . . . . . . . . . . .18
Using the Advanced Management Module . . . . . . . . . . . . .18
Installing the system firmware . . . . . . . . . . . . . . . . . .20
The firmware update package . . . . . . . . . . . . . . . . . .21
Using the package . . . . . . . . . . . . . . . . . . . . .21
Updating the system firmware automatically . . . . . . . . . . . .22
Installing the firmware manually . . . . . . . . . . . . . . . . . .22
Updating the system firmware images . . . . . . . . . . . . . . .23
Updating the optional expansion card firmware . . . . . . . . . . . . .23
Integrating the Gigabit Ethernet controller into the BladeCenter . . . . . . .23
Updating the Ethernet controller firmware . . . . . . . . . . . . . . .24
Using the update package . . . . . . . . . . . . . . . . . . . .24
Firmware update steps . . . . . . . . . . . . . . . . . . . . .25
Blade server Ethernet controller enumeration . . . . . . . . . . . . . .26
Chapter 3. Parts listing . . . . . . . . . . . . . . . . . . . . .27
Replaceable components . . . . . . . . . . . . . . . . . . . . .27
Chapter 4. Installing and removing replaceable units . . . . . . . . .29
Installation guidelines . . . . . . . . . . . . . . . . . . . . . .29
System reliability guidelines . . . . . . . . . . . . . . . . . . .30
Handling static-sensitive devices . . . . . . . . . . . . . . . . .30
Removing the blade server from the BladeCenter unit . . . . . . . . . .31
Removing the blade server . . . . . . . . . . . . . . . . . . .31
Opening and removing the blade server cover . . . . . . . . . . . . .32
Removing the BladeCenter PCI Express I/O Expansion Unit . . . . . . . .32
Installing the optional InfiniBand card . . . . . . . . . . . . . . . . .33
Adding I/O DDR2 memory modules . . . . . . . . . . . . . . . . .36
Replacing DIMM fillers . . . . . . . . . . . . . . . . . . . . . .37
Installing the SAS expansion card . . . . . . . . . . . . . . . . . .38
Installing the BladeCenter PCI Express I/O Expansion Unit . . . . . . . .40
Removing the blade-server front bezel assembly . . . . . . . . . . . .41
Replacing the system board base and planar . . . . . . . . . . . . . .41
Replacing the battery . . . . . . . . . . . . . . . . . . . . . .42
Using the miscellaneous parts kit . . . . . . . . . . . . . . . . . .45
Replacing the ball studs . . . . . . . . . . . . . . . . . . . .45
Finishing the installation . . . . . . . . . . . . . . . . . . . . .47
Installing the front bezel assembly . . . . . . . . . . . . . . . . .47
Closing the blade server cover . . . . . . . . . . . . . . . . . .49
Input/output connectors and devices . . . . . . . . . . . . . . . . .49
Chapter 5. Diagnostics and troubleshooting . . . . . . . . . . . . .51
Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . .51
Basic checks . . . . . . . . . . . . . . . . . . . . . . . . .51
Finding troubleshooting information . . . . . . . . . . . . . . . . .52
Troubleshooting charts . . . . . . . . . . . . . . . . . . . . . .52
Problems indicated by the front panel LEDs . . . . . . . . . . . . .52
Problems indicated by the system board LEDS . . . . . . . . . . . .54
Power problems . . . . . . . . . . . . . . . . . . . . . . .57
Power throttling . . . . . . . . . . . . . . . . . . . . . . .57
Network connection problems . . . . . . . . . . . . . . . . . .57
Service processor problems . . . . . . . . . . . . . . . . . . .58
Software problems . . . . . . . . . . . . . . . . . . . . . .58
Recovering the system firmware code . . . . . . . . . . . . . . . .59
Checking the boot image . . . . . . . . . . . . . . . . . . . .59
Booting from the TEMP image . . . . . . . . . . . . . . . . . .59
Recovering the TEMP image from the PERM image . . . . . . . . . .59
Supported boot media . . . . . . . . . . . . . . . . . . . . . .59
Booting the system . . . . . . . . . . . . . . . . . . . . . . .60
Diagnostic programs and messages . . . . . . . . . . . . . . . . .62
Running diagnostics and preboot DSA . . . . . . . . . . . . . . .62
Diagnostic text messages . . . . . . . . . . . . . . . . . . . .63
Viewing the test log . . . . . . . . . . . . . . . . . . . . . .63
DSA error messages . . . . . . . . . . . . . . . . . . . . . .63
CPU test results . . . . . . . . . . . . . . . . . . . . . . .64
BMC test results . . . . . . . . . . . . . . . . . . . . . . .64
Memory tests . . . . . . . . . . . . . . . . . . . . . . . .70
System firmware startup messages . . . . . . . . . . . . . . . . .71
Boot errors and handling . . . . . . . . . . . . . . . . . . . . .72
Boot list . . . . . . . . . . . . . . . . . . . . . . . . . .72
System firmware update errors . . . . . . . . . . . . . . . . . .74
Memory initialization errors . . . . . . . . . . . . . . . . . . .75
USB errors . . . . . . . . . . . . . . . . . . . . . . . . .75
Network boot errors . . . . . . . . . . . . . . . . . . . . . .77
SAS boot errors . . . . . . . . . . . . . . . . . . . . . . .79
I/O DIMM boot-time errors . . . . . . . . . . . . . . . . . . . .86
Other error messages . . . . . . . . . . . . . . . . . . . . .88
BMC firmware messages . . . . . . . . . . . . . . . . . . . . .89
NMI error messages . . . . . . . . . . . . . . . . . . . . . .92
Problem reporting . . . . . . . . . . . . . . . . . . . . . . . .94
Problem description . . . . . . . . . . . . . . . . . . . . . . .94
Solving undetermined problems . . . . . . . . . . . . . . . . . . .95
Calling IBM for service . . . . . . . . . . . . . . . . . . . . . .96
Appendix A. Using the SMS utility . . . . . . . . . . . . . . . . .97
Starting the SMS utility . . . . . . . . . . . . . . . . . . . . . .97
The SMS utility menu . . . . . . . . . . . . . . . . . . . . . .97
Select Language . . . . . . . . . . . . . . . . . . . . . . .98
Setup Remote IPL (Initial Program Load) . . . . . . . . . . . . . .98
IP Parameters . . . . . . . . . . . . . . . . . . . . . . .99
Adapter Configuration . . . . . . . . . . . . . . . . . . . . . 100
Ping Test . . . . . . . . . . . . . . . . . . . . . . . . . 101
Advanced Setup: DHCP . . . . . . . . . . . . . . . . . . . . 101
Change SCSI Settings . . . . . . . . . . . . . . . . . . . . 101
Select Console . . . . . . . . . . . . . . . . . . . . . . . 101
Select Boot Options . . . . . . . . . . . . . . . . . . . . . 102
Firmware Boot Side Options . . . . . . . . . . . . . . . . . . 104
Progress Indicator History . . . . . . . . . . . . . . . . . . . 104
FRU information . . . . . . . . . . . . . . . . . . . . . . . 105
Adding FRU information . . . . . . . . . . . . . . . . . . . 106
SAS Settings . . . . . . . . . . . . . . . . . . . . . . . . 108
Appendix B. Getting help and technical assistance . . . . . . . . . . 111
Before you call . . . . . . . . . . . . . . . . . . . . . . . . 111
Using the documentation . . . . . . . . . . . . . . . . . . . . . 111
Getting help and information from the World Wide Web . . . . . . . . . 111
Software service and support . . . . . . . . . . . . . . . . . . .112
Hardware service and support . . . . . . . . . . . . . . . . . . .112
Appendix C. Notices . . . . . . . . . . . . . . . . . . . . . .113
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . .114
Important notes . . . . . . . . . . . . . . . . . . . . . . . .114
Product recycling and disposal . . . . . . . . . . . . . . . . . . .115
Battery return program . . . . . . . . . . . . . . . . . . . . . .116
Electronic emission notices . . . . . . . . . . . . . . . . . . . .117
Federal Communications Commission (FCC) statement . . . . . . . .117
Industry Canada Class A emission compliance statement . . . . . . . .118
Avis de conformité à la réglementation d’Industrie Canada . . . . . . .118
Australia and New Zealand Class A statement . . . . . . . . . . . .118
United Kingdom telecommunications safety requirement . . . . . . . .118
Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A EU-Richtlinie zur Elektromagnetischen Verträglichkeit . . . . . . . .118
Deutschland: Einhaltung des Gesetzes über die elektromagnetische Verträglichkeit von Geräten . . . . . . . . . . . . . . . . .118
Zulassungsbescheinigung laut dem Deutschen Gesetz über die elektromagnetische Verträglichkeit von Geräten (EMVG) (bzw. der EMC EG Richtlinie 2004/108/EG) für Geräte der Klasse A . . . . . . . .118
European Union EMC Directive conformance statement . . . . . . . .119
Taiwanese Class A warning statement . . . . . . . . . . . . . . .119
Japanese Voluntary Control Council for Interference (VCCI) statement 120
Korean Class A warning statement . . . . . . . . . . . . . . . . 120
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Safety
Before installing this product, read the Safety Information.
Antes de instalar este produto, leia as Informações de Segurança.
Před instalací tohoto produktu si přečtěte příručku bezpečnostních instrukcí.
Læs sikkerhedsforskrifterne, før du installerer dette produkt.
Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.
Avant d’installer ce produit, lisez les consignes de sécurité.
Vor der Installation dieses Produkts die Sicherheitshinweise lesen.
Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.
Antes de instalar este produto, leia as Informações sobre Segurança.
Antes de instalar este producto, lea la información de seguridad.
Läs säkerhetsinformationen innan du installerar den här produkten.
Guidelines for trained service technicians:
This section contains information for trained service technicians.
Inspecting for unsafe conditions:
Use the information in this section to help you identify potential unsafe conditions in an IBM product that you are working on. Each IBM product, as it was designed and manufactured, has required safety items to protect users and service technicians from injury. The information in this section addresses only those items. Use good judgment to identify potential unsafe conditions that might be caused by non-IBM alterations or attachment of non-IBM features or options that are not addressed in this section. If you identify an unsafe condition, you must determine how serious the hazard is and whether you must correct the problem before you work on the product.
Consider the following conditions and the safety hazards that they present:
v Electrical hazards, especially primary power. Primary voltage on the frame can cause serious or fatal electrical shock.
v Explosive hazards, such as a damaged CRT face or a bulging capacitor.
v Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cord is disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and observe any sharp edges.
3. Check the power cord:
   v Make sure that the third-wire ground connector is in good condition. Use a meter to measure third-wire ground continuity for 0.1 ohm or less between the external ground pin and the frame ground.
   v Make sure that the power cord is the correct type, as specified in the documentation for your BladeCenter unit type.
   v Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety of any non-IBM alterations.
6. Check inside the blade server for any obvious unsafe conditions, such as metal filings, contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not been removed or tampered with.
Guidelines for servicing electrical equipment:
Observe the following guidelines when servicing electrical equipment:
v Check the area for electrical hazards such as moist floors, nongrounded power extension cords, and missing safety grounds.
v Use only approved tools and test equipment. Some hand tools have handles that are covered with a soft material that does not provide insulation from live electrical current.
v Regularly inspect and maintain your electrical hand tools for safe operational condition. Do not use worn or broken tools or testers.
v Do not touch the reflective surface of a dental mirror to a live electrical circuit. The surface is conductive and can cause personal injury or equipment damage if it touches a live electrical circuit.
v Some rubber floor mats contain small conductive fibers to decrease electrostatic discharge. Do not use this type of mat to protect yourself from electrical shock.
v Do not work alone under hazardous conditions or near equipment that has hazardous voltages.
v Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical outlet so that you can turn off the power quickly in the event of an electrical accident.
v Disconnect all power before you perform a mechanical inspection, work near power supplies, or remove or install main units.
v Before you work on the equipment, disconnect the power cord. If you cannot disconnect the power cord, have the customer power-off the wall box that supplies power to the equipment and lock the wall box in the off position.
v Never assume that power has been disconnected from a circuit. Check it to make sure that it has been disconnected.
v If you have to work on equipment that has exposed electrical circuits, observe the following precautions:
   - Make sure that another person who is familiar with the power-off controls is near you and is available to turn off the power if necessary.
   - When you are working with powered-on electrical equipment, use only one hand. Keep the other hand in your pocket or behind your back to avoid creating a complete circuit that could cause an electrical shock.
   - When using a tester, set the controls correctly and use the approved probe leads and accessories for that tester.
   - Stand on a suitable rubber mat to insulate you from grounds such as metal floor strips and equipment frames.
v Use extreme care when measuring high voltages.
v To ensure proper grounding of components such as power supplies, pumps, blowers, fans, and motor generators, do not service these components outside of their normal operating locations.
v If an electrical accident occurs, use caution, turn off the power, and send another person to get medical aid.
Important:
All caution and danger statements in this documentation begin with a number. This number is used to cross reference an English caution or danger statement with translated versions of the caution or danger statement in the IBM Safety Information book.
For example, if a caution statement begins with a number 1, translations for that caution statement appear in the IBM Safety Information book under statement 1.
Be sure to read all caution and danger statements in this documentation before performing the instructions. Read any additional safety information that comes with the blade server or optional device before you install the device.
Statement 1:
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this product during an electrical storm.
v Connect all power cords to a properly wired and grounded electrical outlet.
v Connect to properly wired outlets any equipment that will be attached to this product.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before you open the device covers, unless instructed otherwise in the installation and configuration procedures.
v Connect and disconnect cables as described in the following table when installing, moving, or opening covers on this product or attached devices.
To Connect:                                To Disconnect:
1. Turn everything OFF.                    1. Turn everything OFF.
2. First, attach all cables to devices.    2. First, remove power cords from outlet.
3. Attach signal cables to connectors.     3. Remove signal cables from connectors.
4. Attach power cords to outlet.           4. Remove all cables from devices.
5. Turn device ON.
Statement 2:
CAUTION: When replacing the lithium battery, use only IBM Part Number 43W9859 or 03N2449 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3:
CAUTION: When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters) are installed, note the following:
v Do not remove the covers. Removing the covers of the laser product could result in exposure to hazardous laser radiation. There are no serviceable parts inside the device.
v Use of controls or adjustments or performance of procedures other than those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam.
Class 1 Laser Product / Laser Klasse 1 / Laser Klass 1 / Luokan 1 Laserlaite / Appareil A Laser de Classe 1
Statement 4:
18 kg (39.7 lb) 32 kg (70.5 lb) 55 kg (121.2 lb)
CAUTION: Use safe practices when lifting.
Statement 5:
CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
Statement 8:
CAUTION: Never remove the cover on a power supply or any part that has the following label attached.
Hazardous voltage, current, and energy levels are present inside any component that has this label attached. There are no serviceable parts inside these components. If you suspect a problem with one of these parts, contact a service technician.
Statement 13:
DANGER
Overloading a branch circuit is potentially a fire hazard and a shock hazard under certain conditions. To avoid these hazards, ensure that your system electrical requirements do not exceed branch circuit protection requirements. Refer to the information that is provided with your device for electrical specifications.
Statement 21:
CAUTION: Hazardous energy is present when the blade is connected to the power source. Always replace the blade cover before installing the blade.
WARNING: Handling the cord on this product or cords associated with accessories sold with this product, will expose you to lead, a chemical known to the State of California to cause cancer, and birth defects or other reproductive harm. Wash hands after handling.
ADVERTENCIA: El contacto con el cable de este producto o con cables de accesorios que se venden junto con este producto, pueden exponerle al plomo, un elemento químico que en el estado de California de los Estados Unidos está considerado como un causante de cáncer y de defectos congénitos, además de otros riesgos reproductivos. Lávese las manos después de usar el producto.
Chapter 1. Introduction
This Problem Determination and Service Guide contains information to help you solve problems that might occur when installing and using your IBM® BladeCenter®. It describes the diagnostic tools that come with the BladeCenter QS21, error codes and suggested actions. It also describes how to replace failing components.
Replaceable components are of three types:
v Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
v Tier 2 CRU: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
For information about the terms of the warranty and getting service and assistance, see Warranty and Support Information.
Note: The illustrations in this document might differ slightly from the hardware.
Related documentation
In addition to this document, the following documentation also comes with the server:
v Installation and User’s Guide
This printed document contains general information about the blade server, including how to install supported options and how to configure the blade server.
v Safety Information
This document is in Portable Document Format (PDF) on the Documentation CD. It contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
v Warranty and Support Information
This document is in PDF on the Documentation CD. It contains information about the terms of the warranty and about service and assistance.
v IBM Software Development Kit for Multicore Acceleration Version 3.0.0 Installation Guide
This document is in PDF and can be downloaded from http://www.ibm.com/support/us/en/. It contains information about how to install the operating system and how to program applications for the blade server.
Depending on the server model, additional documentation might be included on the Documentation CD.
The blade server might have features that are not described in the documentation that comes with the server. The documentation might be updated occasionally to include information about those features, or technical updates might be available to provide additional information that is not included in the blade server documentation. The most recent versions of all BladeCenter documentation are at http://www.ibm.com/support/us/en/.
In addition to the documentation in this library, be sure to review the planning and installation documents for your BladeCenter hardware available at http://www.ibm.com/support/us/en/.
Updates might be available for this document. You can check for the most recent version at http://www.ibm.com/support/us/en/.
Notices and statements used in this document
The caution and danger statements that appear in this document are also in the multilingual Safety Information document, which is on the Documentation CD. Each statement is numbered for reference to the corresponding statement in the Safety Information document.
The following notices and statements are used in this document:
v Notes: These notices provide important tips, guidance, or advice.
v Important: These notices provide information or advice that might help you avoid inconvenient or problem situations.
v Attention: These notices indicate potential damage to programs, devices, or data. An attention notice is placed just before the instruction or situation in which damage could occur.
v Caution: These statements indicate situations that can be potentially hazardous to you. A caution statement is placed just before the description of a potentially hazardous procedure step or situation.
v Danger: These statements indicate situations that can be potentially lethal or extremely hazardous to you. A danger statement is placed just before the description of a potentially lethal or extremely hazardous procedure step or situation.
Features and specifications
The following table provides a summary of the features and specifications of the BladeCenter QS21.
Through the BladeCenter Advanced Management Module, you can view the blade server firmware code and other hardware configuration information.
The BladeCenter QS21 is an accessory for the BladeCenter H Type 8852 unit and the BladeCenter HT Type 8740 and 8750 (enterprise environment only).
Providing it is supported by the BladeCenter unit, you can install and operate any other model of blade server in the same BladeCenter unit as a BladeCenter QS21.
Note: Power, cooling, removable-media drives, external ports, and advanced system management are provided by the IBM BladeCenter H and HT units. For more information, see the documentation that comes with your BladeCenter unit.
Table 1. Blade server features and specifications

Microprocessor:
Two IBM Cell/B.E. PowerPC 64-bit architecture processors w/VMX with 8 Synergistic Processor Units (SPU), 512 KB L2 cache, 256 KB on each Synergistic Processing Engine (SPE)

Memory:
Fixed system memory configuration of 2 GB XDR memory, 1 GB per Cell Broadband Engine (Cell/B.E.) processor. Extra memory cannot be added.

Integrated functions:
v Two 1 Gigabit Ethernet controllers
v Local service processor
v 2 Cell/B.E. companion chips each providing a PCIe and a single PCI-X interface
v RS-485 interface for communication with BladeCenter Management Module
v USB Controller

Supported Options:
v Serial attached SCSI (SAS) expansion card
v High-Speed InfiniBand Card, IB-4x
v I/O Buffer DIMM VLP DDR2 512 MB, total 1 GB per channel

Environment:
v Ambient temperature: Operating temperature: 25°C to 35°C (77°F to 95°F). Altitude: 0 to 2133 m (0 to 7000 ft)
v Humidity: Operating: 8% to 80%

Size:
v Height: 24.5 cm (9.7 inches)
v Depth: 44.6 cm (17.6 inches)
v Width: 2.9 cm (1.14 inches)
v Maximum weight: 5 kg (13.2 lb)

Electrical input:
v Power supply: 12 V dc
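The environmental limits in Table 1 can be read as simple range checks. The following Python sketch is illustrative only and is not part of any IBM tooling: the names `Limits` and `environment_violations` are hypothetical, and the numeric limits are copied from the table as printed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Inclusive operating range for one environmental quantity."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Values taken from Table 1 as printed; the names are assumptions for this sketch.
OPERATING_TEMPERATURE_C = Limits(25.0, 35.0)
ALTITUDE_M = Limits(0.0, 2133.0)
HUMIDITY_PERCENT = Limits(8.0, 80.0)


def environment_violations(temp_c: float, altitude_m: float, humidity_pct: float) -> list[str]:
    """Return the names of the quantities that fall outside the Table 1 ranges."""
    checks = [
        ("temperature", OPERATING_TEMPERATURE_C, temp_c),
        ("altitude", ALTITUDE_M, altitude_m),
        ("humidity", HUMIDITY_PERCENT, humidity_pct),
    ]
    return [name for name, limits, value in checks if not limits.contains(value)]
```

An empty list means that all three readings are within the printed ranges. How the readings are obtained (for example, from room sensors) is outside the scope of this sketch.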
Support for local storage
The BladeCenter provides a SAS solution for local storage. This comprises a SAS expansion card attached to the blade server, a SAS switch in the rear of the chassis, and various options to attach storage to that integrated SAS switch. An optional SAS expansion card is available for the BladeCenter QS21.
Storage can be attached via the external SAS host controller. The BladeCenter QS21 supports the SAS drives of the IBM System Storage™ DS3200 and the IBM System Storage EXP3000 expansion unit. Check the IBM BladeCenter support Web site for details of supported SAS drives at http://www.ibm.com/support/us/en/.
Turning on the blade server
The BladeCenter QS21 is hot-swappable and can be inserted into the BladeCenter unit when the unit is already powered up. However, it can only be powered on by one of the methods described in this section. While the blade server is powering up, the power-on LED on the front of the server is lit. See “Blade server controls and LEDs” on page 6 for the power-on LED states.
After you have installed the BladeCenter QS21 into a powered-up BladeCenter unit, wait until the power-on LED on the blade server flashes slowly before turning on the blade server.
You can turn on the blade server in any of the following ways:
Using the power control button
You can press the power-control button (see Figure 1 on page 4), which is behind the control-panel door on the front of the blade server, if local power control is enabled for the blade server. Local power control is enabled and disabled through the BladeCenter Management Module Web interface.
Figure 1. Blade server power button
Using the BladeCenter Advanced Management Module
You can use the BladeCenter Management Module Web interface to turn on the blade server remotely.
Using the Wake on LAN® feature:
If you want to use the Wake on LAN feature, the feature must be enabled in the installed operating system and it must not have been disabled through the Advanced Management Module.
In the event of a power failure, the BladeCenter unit and then the blade server can start automatically when power is restored. You must configure this through the BladeCenter Advanced Management Module. See the BladeCenter Management Module User's Guide for further information about this feature.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the BladeCenter unit and can continue to respond to requests from the service processor, including remote requests to turn the blade server on. To remove all power from the blade server, you must physically remove it from the BladeCenter unit or power off the BladeCenter unit.
To avoid loss of data, shut down the Linux® operating system before you turn off the blade server. Shut down the operating system by entering the shutdown -h now command at the command prompt or by choosing shutdown if you are using a graphical user interface (GUI). See your operating system documentation for additional information about shutting down the operating system.
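If the shutdown is started from a script rather than typed at the prompt, the same command can be issued from Python. The following is a minimal sketch, assuming a Linux system on which `shutdown` is on the path and the script runs with sufficient privileges. The function names and the `dry_run` flag are hypothetical and exist only for illustration.

```python
import subprocess


def build_shutdown_command(when: str = "now") -> list[str]:
    """Return the argument list for the command named in the text: shutdown -h now."""
    return ["shutdown", "-h", when]


def shut_down(dry_run: bool = True) -> list[str]:
    """Halt the operating system, or only report the command when dry_run is True."""
    command = build_shutdown_command()
    if not dry_run:
        # Normally requires root privileges; raises CalledProcessError on failure.
        subprocess.run(command, check=True)
    return command
```

With the default `dry_run=True`, the function returns the command without running it, which makes it safe to try out on a system that must stay up.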
If the BladeCenter unit has not been turned off, the blade server can be turned off in any of the following ways:
Using the power control button
You can press the power control button behind the control-panel door on the front panel of the blade server. This starts an orderly shutdown of the operating system, providing your operating system supports this feature, before turning off the BladeCenter QS21. If the operating system stops functioning, pressing and holding the power control button for more than 4 seconds turns off the blade server.
Using the BladeCenter Advanced Management Module
You can use the Advanced Management Module Web interface to turn off the blade server remotely. You can also configure the Advanced Management Module to turn off the blade server automatically if the system is not operating correctly.
Note: After turning off the blade server, wait at least 5 seconds before turning it on again.
Blade server controls and LEDs
This section describes the controls and LEDs on the front panel of the blade server. For further information about the LEDs and how they can be used to assist in troubleshooting, see “Problems indicated by the front panel LEDs” on page 52.
Figure 2. Power-control button and LEDs (callouts: information LED, location LED, activity LED, power-on LED, media-tray select button, power-control button, blade-error LED, NMI reset button)
Note: The control panel door which normally covers the LEDs and power-control button is omitted for reasons of clarity.
Activity LED:
This green LED lights when there is network activity.
Location LED:
This blue LED is turned on remotely by the system administrator to assist in locating the blade server. The location LED on the BladeCenter unit lights at the same time.
Information LED:
This amber LED lights to indicate that information about a system error has been placed in the Advanced Management Module Event Log. The information LED remains on until turned off by Advanced Management Module or through IBM Director Console.
Blade error LED:
This amber LED lights when a system error has occurred in the blade server.
Power control button:
Press this button to turn the blade server on or off. The power control button only has effect if local power control is enabled for the blade server. Local power control is enabled and disabled through the BladeCenter Advanced Management Module Web interface.
Media tray select button:
This button associates the shared BladeCenter unit media tray (DVD/CD drive and USB ports) with the blade server. The LED on the button flashes while the request is being processed, then lights when the ownership of the media tray has been transferred to the blade server.
It can take approximately 20 seconds for the operating system on the blade server to recognize the media tray.
Power on LED:
This green LED indicates the power status of the blade server as follows:
• Flashing rapidly - The service processor on the blade server is communicating with the BladeCenter Advanced Management Module.
• Flashing slowly - The blade server has power but is not turned on.
• Lit continuously (steady) - The blade server has power and is turned on.
• Not lit - Either the BladeCenter unit is powered off, or a power failure has occurred on the blade server or the BladeCenter unit.
NMI reset button:
If the operating system has been installed, pressing this with a paper clip or pin causes the operating system to call the system debugger.
Note: The blade error LED, information LED, and location LED can be turned off through the BladeCenter Management Module Web interface.
System board LEDs
The BladeCenter QS21 has status LEDs on the system board to indicate the health of various components. Some are within the light box while others are in different locations. A lit LED indicates an error condition. Complete information about the LEDs can be found in “Troubleshooting charts” on page 52.
To find out what, if any, errors have occurred on the system board, you must:
1. Remove the blade server from the BladeCenter unit
2. Open the cover
3. Press the light path diagnostics switch
This lights any error LEDs that were turned on during processing. It also lights a green LED to indicate the capacitor is charged and the light path diagnostics system is operating.
Figure 3 on page 8 shows the location of the light path LEDs and the diagnostics switch.
Figure 3. System-board LEDs (callouts: light box with the TEMP, S BRD, CPU, and NMI LEDs for temperature fault, system board, CPU fail, and NMI error; light path diagnostics LED (LP); light path diagnostics switch; slots JDIM00, JDIM01, JDIM10, and JDIM11, each with its own error LED)
Pressing the light path diagnostics switch lights the LED(s) to indicate where an error has occurred.
System board internal and expansion card connectors
The following illustration shows the location of the connectors for user-installable options.
Figure 4. Locations of the expansion option connectors on the system board (callouts: connectors at J22, J200, J201, and JFC_18)
Chapter 2. Configuring the blade server
This chapter describes how to:
• Communicate with a blade server.
• Use System Management Services (SMS) to view and update the system firmware revision number. This does not require the operating system to be installed.
• Update the baseboard management controller (BMC) firmware using the Advanced Management Module.
• Update the system firmware using the command-line utility.
• Configure the Ethernet gigabit controllers in preparation for a network installation of the operating system.
Note: You can update the BMC firmware through the Advanced Management Module Web interface without booting the operating system. However, to update the system firmware you must boot the operating system first.
Communicating with the blade server
The operating system does not have to be booted before you can communicate with the BladeCenter QS21. You can access it through:
Advanced Management Module
The Web-based management and configuration program. This is your main access method to the blade server.
The command-line interface
See “Using the command-line interface” on page 10 for further information.
Serial over LAN (SOL)
This is similar to the serial interface, but allows you to connect to the blade server over the network. See “Using Serial over LAN” on page 10 for further information.
The serial interface
You can connect a PC or compatible terminal directly to the BladeCenter H or HT unit using a special cable. See “Using the serial interface” on page 10 for further information.
Note: The BladeCenter H and HT Serial Breakout cables are not supplied with the unit and must be ordered separately.
System Management Services (SMS)
The SMS utility allows you to view and update the VPD, change the boot device and set network parameters. See “Using the SMS utility program” on page 11 for further information.
Using the Advanced Management Module
The Advanced Management Module is the main means of administering the BladeCenter system. Use the Advanced Management Module Web-based management and configuration program to:
• Configure the BladeCenter unit
• Update and configure BladeCenter components including the BladeCenter QS21
• Monitor the current system status
• Check the event log for system and other errors
Using the Web interface
Complete the following steps to start the Web-based management and configuration program:
1. Open a Web browser. In the address or URL field, type the Internet protocol (IP) address or host name that is assigned for the Management Module remote connection. The default IP address is:
192.168.70.125
The Enter Network Password window opens.
2. Type your user name and password. Before you log in to the Advanced Management Module for the first time, contact your system administrator regarding whether your organization has assigned a user name and password to you. Use the initial (default) user name and password the first time that you log in to the Advanced Management Module. If you have an assigned user name and password, use them for all subsequent logins. All login attempts are documented in the event log.
The initial user ID and password for the Advanced Management Module are:
User ID
USERID (all capital letters)
Password
PASSW0RD (note the number zero, not the letter O, in PASSW0RD)
3. Follow the instructions that appear on the screen. Be sure to set the timeout value that you want for your Web session.
The BladeCenter management and configuration window opens.
For additional information, see the IBM BladeCenter Advanced Management Module User's Guide.
Using the command-line interface
The IBM BladeCenter Advanced Management Module also provides a command-line interface to provide direct access to BladeCenter management functions. You can use this as an alternative to using the BladeCenter Management Module Web interface.
Through the command-line interface, you can issue commands to control the power and configuration of the blade server and other components in the BladeCenter unit. For information and instructions, see the IBM BladeCenter Management Module Command-Line Interface Reference Guide.
Using Serial over LAN
To establish a Serial over LAN (SOL) connection to the blade server, you must configure the SOL feature for the blade server and start an SOL session as described in the IBM BladeCenter Serial over LAN Setup Guide. In addition, the Advanced Management Module must be configured as described in the IBM BladeCenter Management Module User’s Guide, and the BladeCenter unit must be configured as described in the IBM BladeCenter Serial over LAN Setup Guide.
Using the serial interface
Use the serial interface to:
• Observe firmware progress.
• Access the Linux terminal in order to configure Linux.
You can connect a PC serially through the BladeCenter unit using a specific UART cable. To connect to the serial console, plug the serial cable into the BladeCenter unit and connect the other end to a serial device or computer with a serial port. For more information, see the documentation that comes with your BladeCenter unit.
Set the following parameters for the serial connection on the terminal client:
• 115200 baud
• 8 data bits
• No parity
• One stop bit
• No flow control
By default, the blade server sends output over SOL and to the serial port on the BladeCenter unit. However, the default for input is to use SOL. If you wish to use a device connected to the serial port for input you must press any key on that device while the blade server boots.
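As a sketch of how the serial parameters listed in this section translate to a Linux terminal client, the following shell function prints a matching stty command line. It assumes GNU stty and a hypothetical serial device such as /dev/ttyS0; neither is specified by this guide, so substitute whatever your workstation provides.

```shell
#!/bin/sh
# Sketch only: print the stty invocation for 115200 baud, 8 data bits,
# no parity, one stop bit, and no flow control.
# The default device path is an assumption, not taken from this guide.
serial_stty_command() {
    device="${1:-/dev/ttyS0}"
    # cs8      = 8 data bits
    # -parenb  = no parity
    # -cstopb  = one stop bit
    # -crtscts = no hardware flow control
    # -ixon -ixoff = no software flow control
    printf 'stty -F %s 115200 cs8 -parenb -cstopb -crtscts -ixon -ixoff\n' "$device"
}
```

Run the printed command before opening the port with your terminal program, or set the same values in the program's own connection dialog.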
Using the SMS utility program
The Advanced Management Module is the main means of administering the BladeCenter unit and the BladeCenter servers. However, another utility is provided which in some cases can give more information than that displayed in the Advanced Management Module. This is the System Management Services (SMS) utility program.
The SMS utility program allows you to view and update the VPD, change the boot list and set network parameters.
Starting SMS
Complete the following steps to start SMS:
1. Using a Telnet or SSH client, connect to the Advanced Management Module external Ethernet interface IP address.
2. When prompted, enter a valid user ID and password. The default management module user ID is USERID, and the default password is PASSW0RD, where the 0 is a zero.
Note: The user ID and password may have been changed. If so, check with the system administrator for a valid ID and password.
3. Power cycle the blade server and start an SOL console session by using the power -cycle -c command.
For example, to power cycle and start an SOL remote text console with a blade server that is in the first bay of the BladeCenter unit, issue the command:
power -cycle -c -T system:blade[1]
To open a console session with a blade server that is already powered on, use the command:
console -T system:blade[1]
4. After approximately 30 seconds, you see a sequence of checkpoint codes displayed on the console. These codes are generated by the Power On Self Test (POST).
5. The POST menu and indicators display a screen similar to the following:
QS21 Firmware Starting
Check ROM = OK
Build Date = Apr 24 2007 09:32:34
FW Version = "QB-1.6.0-0"
Press "F1" to enter Boot Configuration (SMS)
Initializing memory configuration...
MEMORY
Modules = Elpida 512MB, 3200 MHz
XDRlibrary = v0.32, Bin A/C, RevB, DualDD
Calibrate = Done
Test = Done
SYSTEM INFORMATION
Processor = Cell/B.E.(TM) DD3.2 @ 3200 MHz
I/O Bridge = Cell BE companion chip DD2.x
Timebase = 26666 kHz (internal)
SMP Size = 2 (4 threads)
Boot-Date = 2007-06-08 11:20
Memory = 2048MB (CPU0: 1024MB, CPU1: 1024MB)
Press F1 to display the SMS menu.
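The two Advanced Management Module commands in step 3 differ only in the bay number, so they can be composed from it. The helper names below are hypothetical, and the functions only print the command text to type at the management module prompt; they do not contact the Advanced Management Module.

```shell
#!/bin/sh
# Sketch only: build the command-line target for a blade bay.
blade_target() {
    bay="$1"
    case "$bay" in
        ''|*[!0-9]*)
            echo "bay must be a number" >&2
            return 1
            ;;
    esac
    printf 'system:blade[%s]\n' "$bay"
}

# Print the command that power cycles a blade and opens an SOL console.
power_cycle_with_console() {
    target="$(blade_target "$1")" || return 1
    printf 'power -cycle -c -T %s\n' "$target"
}

# Print the command that opens a console on a blade that is already on.
open_console() {
    target="$(blade_target "$1")" || return 1
    printf 'console -T %s\n' "$target"
}
```

For a blade server in the first bay, power_cycle_with_console 1 prints the same command as the example in step 3.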
Viewing FRU information
The VPD on each blade server contains details about the machine type or model, serial number and the universal unique ID.
Complete the following steps to see this information:
1. Start SMS by completing the above steps. The SMS menu appears:
PowerPC Firmware
Version HEAD SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------
Main Menu
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. Change SCSI Settings
4. Select Console
5. Select Boot Options
6. Firmware Boot Side Options
7. Progress Indicator History
8. FRU Information
9. Change SAS Boot Device
---------------------------------------------------------------------------
Navigation Keys:
X = eXit System Management Services
---------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:
---------------------------------------------------------------------------
2. Type 8 to select FRU Information. A screen similar to the following appears:
PowerPC Firmware
Version HEAD SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
FRU Information
Machine Type and Model: 079232x
Machine Serial Number: ABCDEFG
Universal Unique ID: 12345678-1234-1234-1234-123456789ABC
--------------------------------------------------------------------------------
Navigation Keys:
M = return to Main Menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------
Select Navigation key :
Note: You cannot change the FRU information from this screen, only view it.
Adding FRU information: When you replace a FRU, its details are not recorded in the VPD. You must enter them manually through SMS.
When the system firmware detects a FRU replacement part during boot, the process stops to allow you to enter the machine type or model and serial number. Boot does not continue until the information is provided.
To enter new FRU information, complete the following steps:
1. Using a Telnet or SSH client, connect to the Advanced Management Module external Ethernet interface IP address.
2. When prompted, enter a valid user ID and password. The default management module user ID is USERID, and the default password is PASSW0RD, where the 0 is a zero.
Note: The user ID and password may have been changed. If so, check with the system administrator for a valid user ID and password.
3. Power cycle the blade server and start an SOL console by using the power -cycle -c command. See “Using the SMS utility program” on page 11 for further information.
4. The following screen appears:
PowerPC Firmware
Version HEAD SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Enter Type Model Number (Must be 7 characters, only A-Z, a-z, 0-9 allowed. Press Esc to skip)
Enter Type Model Number :
Type the model number according to the instructions on the screen and press Enter to continue.
5. You must confirm the model number:
PowerPC Firmware
Version HEAD SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Number entered is: 1234567
Accept number? (Enter ’y’ or ’Y’ to accept or ’n’ or ’N’ to decline)
Select Navigation key :
Type y or Y and press Enter to confirm the number.
6. At the following screen, type the serial number:
PowerPC Firmware
Version HEAD SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Enter Serial Number (Must be 7 characters, only A-Z, a-z, 0-9 allowed)
Enter Serial Number :
---------------------------------------------------------------------------------
Press Enter to continue.
7. You must now confirm the serial number:
PowerPC Firmware
Version HEAD SLOF-SMS 1.6 (c) Copyright IBM Corp. 2000,2005,2007 All rights reserved.
--------------------------------------------------------------------------------
Number entered is: ABCDEFG
Accept number? (Enter ’y’ or ’Y’ to accept or ’n’ or ’N’ to decline)
Select Navigation key :
---------------------------------------------------------------------------------
Type y or Y and press Enter to confirm the number. This completes the process and the blade server continues to boot as normal.
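Both prompts accept exactly seven characters from A-Z, a-z, and 0-9. If you collect FRU data in advance, a check like the following can catch entries that the firmware would reject. The function name is hypothetical, and this is only a sketch of the rule shown on the screens above, not the firmware's own validation.

```shell
#!/bin/sh
# Sketch only: succeed when the argument is exactly seven characters
# long and contains nothing outside A-Z, a-z, and 0-9.
is_valid_fru_field() {
    case "$1" in
        *[!A-Za-z0-9]*) return 1 ;;   # contains a disallowed character
    esac
    [ "${#1}" -eq 7 ]
}
```

For example, is_valid_fru_field 079232x succeeds, while a value that is too short or contains a hyphen fails.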
Updating the system and BMC firmware
The firmware consists of two distinct packages:
• A firmware package for the baseboard management controller (BMC). This is referred to as the BMC firmware.
• A firmware package for the basic input/output system (BIOS) which runs on the Cell/B.E. processor. This is referred to as system firmware.
Note: The user and operating system interfaces of the system firmware are based on the Open Firmware standard. Detailed system information is provided through the Open Firmware device tree. You can use the client interface and Run-Time Abstraction Services (RTAS) to run management functions.
BMC firmware
• Communicates with advanced management module
• Controls power on
• Initializes the board, including the Cell/B.E. processors and clock chips
• Monitors the physical board environment
System firmware
• Takes over when the BMC has successfully initialized the board
• Acts as the basic input/output system (BIOS)
• Includes boot-time diagnostics and power-on self test
• Prepares the system for the operating system boot
The packages are delivered separately and do not follow the same versioning scheme.
Updating steps
IBM provides two basic options for updating (flashing) the firmware: online and offline. The offline method requires you to restart the server from alternate bootable media and then perform the firmware update. For greater convenience and flexibility, IBM also provides online updates that you can install while the operating system is running. The online method allows you to run the update at any time and restart the server when it is most convenient to do so. As a best practice, use the online update packages to perform all of your basic update functions.
IBM periodically makes updates to both BMC and system firmware. These may be downloaded from http://www.ibm.com/support/us/en/.
Note: To avoid problems and to maintain proper system performance, always make sure that both the BMC firmware and the system firmware are at the same level for all BladeCenter QS21 servers within the BladeCenter unit.
Complete the following steps to update the BMC and system firmware images:
1. Check the revision level of the firmware on the blade server and the level of the updates on http://www.ibm.com/support/us/en/. If the level on the Web site is higher than the version currently installed, continue with the updating steps.
2. Download the firmware updates.
3. Boot the operating system if it is not running already.
4. Update the BMC firmware using the update package or the Management Module. See “Updating the BMC firmware” on page 18 for further information.
5. Restart the blade server. This boots the blade server with the new BMC firmware.
6. Update the system firmware image. See “Installing the system firmware” on page 20 for further information.
7. The system reboots. This boots the blade server with the new system firmware.
8. Shut down the blade server.
Note: There may be instances where you must update the BMC firmware before updating the system firmware. Check the readme file that comes with each firmware package for more information.
Determining current blade server firmware levels
Complete the following steps to view the current firmware code levels for both the BMC and the system firmware:
1. Access and log on to the Advanced Management Module Web interface as described in the Management Module User's Guide.
2. From the Monitors menu section, select Firmware VPD:
The Blade Server Firmware Vital Product Data (VPD) window shows the build identifier, release, and revision level of both the system firmware/BIOS and the BMC firmware. In the example above, the system firmware or BIOS version is QB01020000 and the BMC firmware is BNBT06b.
Compare this information to the firmware information provided at http://www.ibm.com/support/us/en/. If the two match, then the blade server has the latest firmware. If not, download the firmware package from the IBM Support Web site. See “Updating the BMC firmware” on page 18 or the IBM Support Web site for installation instructions.
You can also view the firmware level from within the operating system by using the following command:
xxd /proc/device-tree/openprom/ibm,fw-vernum_encoded
Output is similar to:
0000000: 5142 3031 3031 3030 3000 00 QB0101000..
where QB0101000 is the system firmware version.
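The xxd output shows that the property holds the version as ASCII text padded with NUL bytes. A minimal sketch for printing it as a plain string follows; the function name is an assumption, and the default path is the one used in the command above.

```shell
#!/bin/sh
# Sketch only: print the firmware version stored in the device tree
# with the trailing NUL padding removed.
read_fw_version() {
    node="${1:-/proc/device-tree/openprom/ibm,fw-vernum_encoded}"
    tr -d '\000' < "$node"
    echo    # finish the line, since the property has no newline
}
```

Passing a different path as the first argument lets you try the function on a copy of the property taken from another system.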
Note: The system firmware version displayed by the BladeCenter Advanced Management Module might be different from the version displayed by your operating system. Cross-reference information is given in the firmware information at http://www.ibm.com/support/us/en/, and in the readme file which comes with the firmware image.
Updating the BMC firmware
You can update the BMC firmware from the Linux prompt using the update package, if you have installed RHEL 5.2, or from Advanced Management Module.
The Linux executable package allows you to run the firmware update without exiting the Linux environment. In addition, when you run it with the -x (extract) option, the package allows you to extract the Linux update files to a specified location.
Using the BMC update package
If you have not done so already you must install RHEL 5.2 or later before you can update the BMC firmware from the Linux command prompt.
Complete the following steps to update the BMC firmware from the Linux command prompt:
1. Check the README that comes with the BMC firmware as it contains specific information about that particular firmware release.
2. Boot the blade server and the operating system.
3. Download the package from the IBM support site at http://www.ibm.com/support/us/en/. The update package has a .sh extension.
4. Change to the directory where you have downloaded the package.
5. Run the package using the -s option.
6. Reboot the blade server.
Using the Advanced Management Module
Complete the following steps to update the BMC firmware:
1. Download the BMC firmware image file from http://www.ibm.com/support/us/en/ to a suitable location on a server that is accessible on the network. The BMC firmware image file name has the format BNBT<version number>.pkt.
2. Power off the blade server you want to update.
3. Log in to the Advanced Management Module Web interface.
4. Click Firmware Update from the Blade Tasks submenu at the left of your screen. The following screen appears:
5. Choose the blade server you want to update (target) and browse to the firmware image file.
6. Click on Update.
7. The validity of the image is checked, then the following screen appears:
Click Continue.
8. The next screen shows the firmware update progress:
When the update is finished, a confirmation message appears and an entry is placed in the Advanced Management Module log.
9. Power up and boot the blade server.
Note: BladeCenter QS21 firmware contains a proprietary implementation of Cell/B.E. hardware initialization code.
Installing the system firmware
System firmware can only be installed after the operating system has booted. If the operating system is not installed or cannot boot, then no upgrade or recovery is possible. See Chapter 5, “Diagnostics and troubleshooting,” on page 51 for further information about troubleshooting the BladeCenter QS21 blade server.
You can update the system firmware:
• Through IBM Director. See the IBM Director documentation on the IBM Director CD for further information.
• Using the update package available from http://www.ibm.com/support/us/en/. See “Updating the system firmware automatically” on page 22 for further information on how to perform an update.
• Using the update_flash script available on supported Linux operating systems. This requires the system firmware image file. See “The firmware update package” on page 21 for information about how to extract the file.
• Updating the firmware manually. See “Installing the firmware manually” on page 22 for further information.
For all the above options, Linux needs to have a current version of the rtas_flash device driver installed. This is normally installed with the operating system. If it is not, see the installation guide for the Software Development Kit for Multicore Acceleration for instructions about how to get this device driver and install it.
Note: You may have to update the BMC before updating the system firmware. See the README file that comes with the package.
The firmware update package
You can update firmware using the update packages available from http://www.ibm.com/support/us/en/. These can be installed either through IBM Director or by executing the .sh file contained in the package. This section describes how to use the update package to install the firmware update or extract the firmware image for manual installation.
To install the firmware package using IBM Director, see the documentation on the IBM Director CD.
Note: The blade server must be configured and have a running Linux operating system before the package can be extracted or installed.
The update package consists of four files:
• A file containing the change history for the BladeCenter QS21 system firmware. This has a .chg extension.
• A file containing the update package. This has a .sh extension.
• A readme file for the update package. This contains specific installation and configuration information.
• An XML file. This file is for use by IBM Systems Management tools, including IBM Director Update Manager, UpdateXpress CD, and UpdateXpress System Pack Installer.
Using the package
The package consists of a file with a .sh extension that runs from the Linux prompt. It has a number of options. To see what options are available, run the package without any options or with the -h switch:
# ./ibm_fw_bios_qb-1.9.1-2_linux_cell.sh
In this example, ibm_fw_bios_qb-1.9.1-2_linux_cell.sh is the name of the firmware update package. The file name changes according to the version of the firmware.
A screen similar to the following appears:
Usage:
-x /someDirectory - Extract the payload to <some directory>
-xr /someDirectory - Extract the payload plus PkgSdk files to <some directory>
-xd /dev/fd0 - Create a DOS bootable diskette - Internel floppy drive
-xd /dev/sda - Create a DOS bootable diskette - External USB floppy drive
-u - Perform update unattended
-h - Display this help screen
++debug - Display helpful debug information
Note: All other command line arguments are passed to the payload executable
The -xd options are not supported on the BladeCenter QS21 blade server.
The -x option
This option extracts another executable file, in this example ibm_fw_bios_qb-1.9.1-2.sh, which in turn can be run to create the .bin file required if you wish to update the firmware manually. See “Installing the firmware manually” for further information.
The -u option
This performs an unattended and automatic update of the system firmware. The blade server reboots automatically as part of the update process.
Updating the system firmware automatically
Complete the following steps to update the firmware automatically using the update package:
1. Check the README before attempting to update the system firmware as it contains specific information about the particular firmware release.
2. Download the update package from http://www.ibm.com/support/us/en/. The update package has a .sh extension.
3. Change to the directory where you have downloaded the package.
4. Run the package with the -u option. Using the example from above, at the command prompt enter:
./ibm_fw_bios_qb-1.9.1-2_linux_cell.sh -u
5. Check the system firmware images to confirm the update has succeeded. See “Determining current blade server firmware levels” on page 17 for instructions.
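To know which level to look for when confirming the update, you can derive the version from the package file name. This sketch assumes the naming pattern ibm_fw_bios_qb-<version>_linux_cell.sh seen in the example; other releases may be named differently, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the version embedded in an update package name of
# the form ibm_fw_bios_qb-<version>_linux_cell.sh (pattern assumed from
# the example in this chapter). Fails for names that do not match.
package_version() {
    name="${1##*/}"                      # strip any directory part
    case "$name" in
        ibm_fw_bios_qb-*_linux_cell.sh) ;;
        *) return 1 ;;
    esac
    name="${name#ibm_fw_bios_qb-}"       # drop the fixed prefix
    printf '%s\n' "${name%_linux_cell.sh}"
}
```

The level reported by the Advanced Management Module or the operating system may be written differently from the package version, so use the cross-reference information in the readme file when comparing them.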
Installing the firmware manually
If you cannot update the firmware using the update_flash script, it is possible to update the firmware manually. You can use rtas_flash over /proc.
Complete the following steps to install the firmware manually:
1. Download the update package from http://www.ibm.com/support/us/en/.
2. Extract the system firmware image package. At the command prompt enter:
./<update package> -x <target directory>
For example, to extract the image package ibm_fw_bios_qb-1.9.1-2.sh from ibm_fw_bios_qb-1.9.1-2_linux_cell.sh in the directory /temp/fwimage enter:
./ibm_fw_bios_qb-1.9.1-2_linux_cell.sh -x /temp/fwimage
If the directory does not exist the firmware package creates it.
3. Change to the directory containing the firmware image package.
4. Extract the firmware image. At the command prompt enter:
./<image package> -x
For example, to extract the image file QB-1.9.1-2-boot_rom.bin from ibm_fw_bios_qb-1.9.1-2.sh enter:
./ibm_fw_bios_qb-1.9.1-2.sh -x
5. Ensure the rtas_flash driver is loaded. To do this, run lsmod.
6. If the module is not yet in the kernel, invoke the following to load it:
modprobe rtas_flash
7. To update your current firmware, copy the image file to /proc/ppc64/rtas/firmware_update and reboot manually:
cp <image-file> /proc/ppc64/rtas/firmware_update
shutdown -r now
For example, to copy the image file QB-1.9.1-2-boot_rom.bin to /proc/ppc64/rtas/firmware_update enter:
cp QB-1.9.1-2-boot_rom.bin /proc/ppc64/rtas/firmware_update
shutdown -r now
8. Once the system reboots, update the system firmware images. See “Updating the system firmware images” for instructions.
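Steps 5 through 7 can be sketched as two small helpers. The function names are hypothetical, and the default target path is the one given in step 7. Treat this as an illustration of the sequence, not a replacement for the update_flash script.

```shell
#!/bin/sh
# Sketch only: succeed if the named kernel module appears in the first
# column of lsmod-style output read from standard input.
module_listed() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Sketch only: copy a firmware image to the rtas_flash update node.
# The second argument overrides the default node, which is useful for a
# dry run against an ordinary file.
stage_firmware_image() {
    image="$1"
    node="${2:-/proc/ppc64/rtas/firmware_update}"
    if [ ! -r "$image" ]; then
        echo "cannot read image: $image" >&2
        return 1
    fi
    cp "$image" "$node"
}
```

A typical sequence would be lsmod | module_listed rtas_flash || modprobe rtas_flash, then stage_firmware_image QB-1.9.1-2-boot_rom.bin, followed by shutdown -r now once the copy succeeds.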
Updating the system firmware images
Once the system firmware is updated, the BladeCenter QS21 boots from the new firmware. However, there are always two copies of the system firmware image on the blade server:
TEMP This is the firmware image normally used in the boot process. When the firmware is updated, it is the TEMP image that is replaced.
PERM This is a backup copy of the system firmware boot image. The blade server only boots from this image if the TEMP image is corrupt. See “Recovering the system firmware code” on page 59 for further information about how to recover from a corrupt TEMP image.
Once you have updated the system firmware and booted the blade server, you should copy the TEMP image to the PERM image. This ensures that the PERM and TEMP images are at the same revision level. The TEMP and PERM images should always be at the same revision level.
You can use either of the following two commands to update an old image on PERM:
• From the Linux prompt issue the following command:
update_flash -c
Note: The script checks whether the board has booted from the TEMP image. If not, the script does not complete.
• From the Linux prompt issue the following command:
echo 0 > /proc/rtas/manage_flash
For more information on booting from the TEMP or PERM images, see “Recovering the system firmware code” on page 59.
Updating the optional expansion card firmware
If you have installed the SAS optional expansion card or the high-speed InfiniBand expansion card you may have to update the firmware. See the documentation that comes with the components for instructions about how to update the firmware.
IBM periodically makes updates available for both SAS and InfiniBand expansion cards. These may be downloaded from http://www.ibm.com/support/us/en/.
Integrating the Gigabit Ethernet controller into the BladeCenter
One dual-port Gigabit Ethernet controller is integrated on the blade server system board. Each controller port provides a 1000-Mbps full-duplex interface that connects to one of the Ethernet switch modules in I/O bays 1 and 2 of the BladeCenter H unit or the BladeCenter HT unit. Full-duplex operation enables simultaneous transmission and reception of data on the Ethernet local area network (LAN).
Each Ethernet-controller port on the system board is routed to a different switch module in I/O bay 1 or bay 2. The routing from the Ethernet-controller port to the I/O bay varies according to whether an Ethernet adapter is enabled and the operating system that is installed. See “Blade server Ethernet controller enumeration” on page 26 for information about how to determine the routing from the Ethernet-controller ports to I/O bays for your blade server.
You do not have to set any jumpers or configure the controller for the blade server operating system. However, you must install a device driver to enable the blade server operating system to address the Ethernet-controller ports. For device drivers and information about configuring your Ethernet controller ports, see the Ethernet software documentation that comes with your blade server, or contact your IBM marketing representative or authorized reseller. For updated information about configuring the controllers, go to the Barcelona Supercomputing Center Web site at http://www.bsc.es/projects/deepcomputing/linuxoncell/.
Note: If your blade server contains a different type of optional Ethernet-compatible switch module in I/O bay 1 than the switch modules that are mentioned in this section, see the documentation that comes with the Ethernet switch module that you are using.
Updating the Ethernet controller firmware
To update the Ethernet controller firmware, you must download an update package from http://www.ibm.com/support/us/en/. This section describes how to use the update package to install the firmware update.
The update package consists of four files:
• A file containing the change history for the QS22 Ethernet Controller firmware. This has a .chg extension.
• A file containing the update package. This has a .sh extension.
• A readme file for the update package. This contains specific installation and configuration information.
• An XML file. This file is for use by IBM Systems Management tools, including IBM Director Update Manager, UpdateXpress CD, and UpdateXpress System Pack Installer.
Using the update package
The package is a file with a .sh extension that runs from the Linux prompt. It has a number of options. To see what options are available, run the package without any options or with the -h switch:
# ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh
In the example shown above, brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh is the name of the firmware update package. The file name changes according to the version of the firmware.
A screen similar to the following appears:
Usage:
-x /someDirectory - Extract the payload to <some directory>
-xr /someDirectory - Extract the payload plus PkgSdk files to <some directory>
-xd /dev/fd0 - Create a DOS bootable diskette - Internel floppy drive
-xd /dev/sda - Create a DOS bootable diskette - External USB floppy drive
-u - Perform update unattended
-h - Display this help screen
++debug - Display helpful debug information
The -xd and -x options are not supported on BladeCenter QS21.
The -u option performs an unattended and automatic update of the firmware. The blade server reboots automatically as part of the update process.
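Because the package file name changes with the firmware version, a script that drives the update may need to read the version from the name. The following sketch assumes the naming pattern of the example above (a version field between "_fw_nic_" and the next underscore); that pattern is an assumption based on the single file name shown here, and pkg_version is a hypothetical helper, not part of the package.

```shell
#!/bin/sh
# Hypothetical helper: extract the version field from a package file name
# such as brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh (pattern assumed, not documented).
pkg_version() {
    base=${1##*/}            # strip any leading directory
    rest=${base#*_fw_nic_}   # drop everything up to and including "_fw_nic_"
    echo "${rest%%_*}"       # keep the text before the next underscore
}
```

For the example package, pkg_version ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh prints 2.0.3-e-1.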
Firmware update steps
Complete the following steps to update the firmware automatically:
1. Check the README before attempting to update the system firmware as it contains specific information about the particular firmware release.
2. Download the update package from http://www.ibm.com/support/us/en/. The update package has a .sh extension.
3. Change to the directory where you have downloaded the package.
4. Run the package with the -u option. Using the example from above, at the command prompt enter:
./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh -u
During the update process, messages similar to the following appear on the console:
[root@c4b14 brcm-2.0.3-ppc]# ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh -u
IBM Ethernet Firmware Update Tool, Version 1.0.2
Warning. No Broadcom NetXtreme II adapters found.
ADAPTER MAC           BOOT  IPMI  ASF  PXE  UMP
--------------------  ----  ----  ---  ---  ---
001A640E030C (5704s)  3.21  2.20  NA   NA   NA
001A640E030D (5704s)  NA    NA    NA   NA   NA
Updating Broadcom NetXtreme adapters.
Updating 001A640E030C using file 16A8bc.bin ---> Update successful
Updating 001A640E030C using file 16A8ipmi.bin ---> Update successful
Error! Firmware not detected on device 001A640E030D.
Warning. No Broadcom NetXtreme II adapters found.
ADAPTER MAC           BOOT  IPMI  ASF  PXE  UMP
--------------------  ----  ----  ---  ---  ---
001A640E030C (5704s)  3.38  2.47  NA   NA   NA
001A640E030D (5704s)  NA    NA    NA   NA   NA
One or more errors occurred during the firmware update process. See /var
Note: The error message shown above is correct as it refers to an adapter not available on BladeCenter QS21.
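To compare firmware levels before and after the update, the adapter rows can be picked out of saved console output. The following is a minimal sketch that assumes the table layout shown in the sample above (MAC address, adapter type in parentheses, then the BOOT and IPMI levels). The function name adapter_versions is hypothetical, and real output might differ between tool versions.

```shell
#!/bin/sh
# Print "MAC BOOT IPMI" for each adapter row found on stdin.
# A row is recognised by a 12-character hexadecimal MAC address in field 1
# followed by the adapter type in parentheses in field 2.
adapter_versions() {
    awk 'length($1) == 12 && $1 ~ /^[0-9A-F]+$/ && $2 ~ /^\(/ { print $1, $3, $4 }'
}
```

For example, if the console output was saved to a file, adapter_versions < saved-output lists each adapter twice: first with the old levels, then with the new ones.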
Blade server Ethernet controller enumeration
The enumeration of the Ethernet controller or controller ports in a blade server is operating system dependent. You can verify the Ethernet controller or controller port designations that a blade server uses through your operating system settings.
The routing of an Ethernet controller or controller port to a particular BladeCenter unit I/O bay depends on the type of Ethernet expansion card that is installed. You can verify which Ethernet-controller port in this blade server is routed to which I/O bay by using the following test:
1. Install only one Ethernet switch module or pass-thru module, in I/O bay 1.
2. Make sure that the ports on the switch module or pass-thru module are enabled (Switch Tasks → Management → Advanced Switch Management in the BladeCenter Management Module Web interface).
3. Enable only one of the Ethernet-controller ports on the blade server. Note the designation that the blade server operating system has for the controller port.
4. Ping an external computer on the network connected to the Ethernet switch module. If you can ping the external computer, the Ethernet-controller port that you enabled is associated with the switch module in I/O bay 1. The other Ethernet-controller port in the blade server is associated with the switch module in I/O bay 2.
Communications from optional I/O expansion cards are routed to I/O bays 3 and 4. If you have installed an I/O expansion card on the blade server you can verify which controller port on an expansion card is routed to which I/O bay by performing the same test, using a controller on the expansion card and a compatible switch module or pass-thru module in I/O bay 3 or 4.
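On Linux, step 4 of the test above can be sketched as follows. The interface names eth0 and eth1, the address used in the usage note, and the function name check_port_routing are assumptions for illustration only. The PING variable exists so that the ping command can be substituted. The sketch reports a routing only when the ping succeeds, because the procedure above draws no conclusion from a failed ping.

```shell
#!/bin/sh
# Sketch of step 4: ping an external host through the one enabled port.
#   $1 = enabled interface, $2 = the other on-board port, $3 = external host
check_port_routing() {
    iface=$1
    peer=$2
    host=$3
    if ${PING:-ping} -c 3 -I "$iface" "$host" >/dev/null 2>&1; then
        echo "$iface -> I/O bay 1, $peer -> I/O bay 2"
    else
        echo "$iface: no reply through I/O bay 1 (check link and switch ports)"
        return 1
    fi
}
```

A possible usage, after disabling the second port with ip link set dev eth1 down, is check_port_routing eth0 eth1 192.0.2.10.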
Chapter 3. Parts listing
This parts listing supports BladeCenter QS21 replaceable components. To check for an updated parts list on the Web, do the following:
1. Go to http://www.ibm.com/support/.
2. Under Find resources, select Upgrades, accessories and parts.
Replaceable components
Replaceable components are of three types:
• Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
• Tier 2 CRU: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
• Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
For information about the terms of the warranty and getting service and assistance, see Warranty and Support Information.
The following table lists which replaceable components are available for the BladeCenter QS21.
Description                                              Part number (FRU No., Tier 1 CRU No., or Tier 2 CRU No.)
DIMM, VLP 512 MB DDR2 I/O Buffer                         39M5860
Cisco 4X Infiniband Expansion Card for IBM BladeCenter   32R1763
InfiniBand 4X DDR Expansion Card (CFFh)                  43W4425
Front bezel                                              60H2963
BladeCenter QS21 blade assembly, base and planar         60H2960
3V lithium battery                                       43W9859
SAS expansion card                                       39Y9188
BladeCenter PCI Express I/O Expansion Unit               43W4390
DIMM filler                                              60H2962
Miscellaneous Parts Kit                                  60H3251
Blade Cover and Warning Label                            46C7201
System Service Label                                     60H2965
FRU List Label                                           60H2966
Part numbers can change and other options can become available. For the latest information, check the IBM Web site at http://www.ibm.com/support/us/en/.
Chapter 4. Installing and removing replaceable units
This chapter provides instructions for replacing units on the blade server. Replaceable units are components such as memory modules and I/O expansion cards. Some removal instructions are provided in case you need to replace one replaceable unit with another.
You can replace the following items:
• Battery
• Front bezel assembly (control panel)
• Blade server cover
• Impedance air baffles
• DIMM fillers
• System board
You can add or remove the following optional items:
• Cisco 4X Infiniband Expansion Card for IBM BladeCenter
• InfiniBand 4X DDR Expansion Card (CFFh)
• I/O buffer DDR2 memory modules
• SAS expansion card
• BladeCenter Expansion unit
Note: If you wish to install the InfiniBand 4X DDR Expansion Card (CFFh), you must install Red Hat Enterprise Linux 5.2 or higher.
Installation guidelines
Before you begin, read the following:
• Read the safety information beginning on page vii and the guidelines in “Handling static-sensitive devices” on page 30. This information will help you work safely with the blade server and components.
• You do not have to turn off the blade server or disconnect the BladeCenter unit from power to install or replace any of the hot-swappable modules on the rear of the BladeCenter unit.
• Before you remove a hot-swappable blade server from the BladeCenter unit, you must shut down the operating system on it by typing the shutdown -h now command or choosing the shut down option from your GUI. See “Turning off the blade server” on page 4 for details. You do not have to shut down the BladeCenter unit itself.
• Blue on a component indicates touch points, where you can grip the component to remove it from or install it in the blade server or BladeCenter unit, open or close a latch, and so on.
• Orange on a component or an orange label on or near a component indicates that the component can be hot-swapped. You can remove or install the component while the blade server or BladeCenter unit is running, provided that the blade server or BladeCenter unit and operating system support the hot-swappable capability. Orange can also indicate touch points on hot-swappable components. See the instructions for removing or installing a specific hot-swappable component for any additional procedures that you might have to perform before you remove or install the component.
System reliability guidelines
To help ensure proper cooling and system reliability, make sure that:
• The ventilation holes on the blade server are not blocked.
• Each of the blade bays on the front of the BladeCenter unit has a blade server or filler blade installed. Do not operate the BladeCenter unit for more than 1 minute without a blade server or filler blade installed in each blade bay.
• You have followed the reliability guidelines in the documentation that comes with the BladeCenter unit.
Handling static-sensitive devices
Attention: Static electricity can damage electronic devices and your system. To avoid damage, keep static-sensitive devices in their static-protective packages until you are ready to install them.
To reduce the possibility of electrostatic discharge, observe the following precautions:
• Limit your movement. Movement can cause static electricity to build up around you.
• Handle the device carefully, holding it by its edges or its frame.
• Do not touch solder joints, pins, or exposed printed circuitry.
• Do not leave the device where others can handle and damage it.
• While the device is still in its static-protective package, touch it to an unpainted metal part of the BladeCenter chassis for at least 2 seconds. This drains static electricity from the package and from your body.
• Remove the device from its package and install it directly into the blade server or BladeCenter unit without setting the device down. If it is necessary to set down the device, put it back into its static-protective package. Do not place the device on the blade server cover or on a metal surface.
• Take additional care when handling devices during cold weather. Heating reduces indoor humidity and increases static electricity.
• Wear an electrostatic-discharge wrist strap, if one is available.
Removing the blade server from the BladeCenter unit
Attention:
• To maintain proper system cooling, do not operate the BladeCenter unit for more than 1 minute without a blade server or filler blades installed in each blade bay.
• Note the number of the bay that contains the blade server before you remove it. You must reinstall the blade server in the same bay from which it was removed. Reinstalling a blade server into a different bay than the one from which it was removed could have unexpected consequences, such as incorrect reconfiguration of the blade server. Some blade server configuration information and update options are established according to bay number. If you reinstall the blade server into a different bay, you might have to reconfigure the blade server.
Removing the blade server
The blade server is a hot-swappable device, and the blade bays in the BladeCenter unit are hot-swappable bays. Therefore, you can install or remove the blade server without removing power from the BladeCenter unit. However, you must turn off the blade server before removing it from the BladeCenter unit.
Complete the following steps to remove the blade server:
Figure 5. Removing the blade server (callout: release handles open)
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If the blade server is operating, the power on LED is lit continuously (steady). Before you remove a blade server from the BladeCenter unit, you must shut down the operating system on it by typing the shutdown -h now command or choosing the shut down option from your GUI. See “Turning off the blade server” on page 4 for details. You do not have to shut down the BladeCenter unit itself.
3. Open the two release levers as shown in the illustration. The blade server moves out of the bay approximately 0.6 cm (0.25 inch).
4. Pull the blade server out of the bay.
5. Place either a filler blade or a new blade server in the bay within 1 minute.
Opening and removing the blade server cover
You must open the blade server cover to access, install or remove any of the replaceable items except the front bezel assembly.
Figure 6. Opening the blade server cover (callouts: cover pins, cover release on each side)
Complete the following steps to open the blade server cover:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Carefully place the blade server on a flat, static-protective surface, with the cover side up.
3. Press the blue blade cover release on each side of the blade server and lift the outer cover open (see Figure 6).
4. If you want to remove the cover, carefully lift it from the cover pins and set it aside (see Figure 6).
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
Removing the BladeCenter PCI Express I/O Expansion Unit
You must remove the BladeCenter PCI Express I/O Expansion Unit, if installed, to access, install or remove any of the replaceable items except the front bezel assembly.
Figure 7. Removing the expansion unit (callouts: cover pins, cover release on each side)
Complete the following steps to remove the BladeCenter PCI Express I/O Expansion Unit:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Carefully place the blade server on a flat, static-protective surface, with the expansion unit side facing up.
3. Press the blue blade cover release on each side of the blade server and lift the expansion unit (see Figure 7).
4. To remove the expansion unit, carefully lift it from the cover pins and set it aside.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
Installing the optional InfiniBand card
The InfiniBand card connects to the high-speed connector on the system board using the two expansion card locator pins to assist with fitting and locking in place. Use the blue handling areas to handle the card, and, when it has been placed in position, to lock it into place.
Note: If you wish to install the InfiniBand 4X DDR Expansion Card (CFFh), you must install Red Hat Enterprise Linux 5.2 or higher.
Figure 8. InfiniBand card handling areas (callouts: locking clip, locator pin holes, handling areas)
Complete the following steps to install the InfiniBand card:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Remove the top cover.
4. Locate the high-speed connector at location J200 on the system board.
Figure 9. Expansion card connector, locator pins, and ball stud (callouts: ball stud, high-speed connector, expansion card standoffs with locator pins)
5. Remove the connector cover.
6. Locate the expansion card locator pins at the back of the system board.
7. Locate the connector and ball socket on the InfiniBand card.
Figure 10. InfiniBand card reverse view (callouts: locking clip, connector, locator pin holes, ball socket)
8. Slide the InfiniBand card locator pin holes over the expansion card locator pins. The card rests on the locator pins.
Figure 11. Positioning the InfiniBand card (callouts: locator pin, expansion card, expansion connector cover, expansion card standoff)
9. Check that the ball socket on the card is over the corresponding ball stud on the system board, and then carefully press the InfiniBand card into position. Press on the blue areas only, to avoid damage to the card.
10. Check that the blue locking clip has locked into position.
11. If you do not want to install any other options, replace the cover and insert the BladeCenter QS21 into the BladeCenter unit.
Attention: The connectors on the system board and the InfiniBand card are not designed for repeated removal or replacement of components. Avoid removing the InfiniBand card once it is in position.
Adding I/O DDR2 memory modules
This section describes how to add extra I/O DDR2 memory. There are two slots per Cell/B.E. companion chip allowing up to 1 GB of memory for each Cell/B.E. companion chip for I/O buffering.
Figure 12. DIMM slot location (callouts: DIMM fillers, DIMM slots at JDIM11, JDIM10, JDIM00, and JDIM01)
You must add memory as pairs of dual inline memory modules (DIMMs). You may fit one or more memory modules for each buffer, but each I/O buffer must use the same type of memory module and have the same amount of memory. The minimum amount of memory you can add is 512 MB per buffer, that is, one module per buffer. If you fit a single pair of DIMMs you must use slots JDIM00 and JDIM11.
The BladeCenter QS21 supports VLP DDR2 512 MB DIMMs only.
Note: The DIMMs are used as memory for the I/O buffers only. You cannot increase the size of system memory, which is fixed at 1 GB for each Cell/B.E. processor.
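The population rules in this section (DIMMs are added in pairs, the first pair goes in JDIM00 and JDIM11, the second pair in JDIM01 and JDIM10, and each DIMM is 512 MB) can be expressed as a small check. This is an illustrative sketch only. The function names are hypothetical, and it assumes that the slot names are passed exactly as they are labelled in this section.

```shell
#!/bin/sh
# Succeeds if the populated slots form an allowed configuration:
# no DIMMs, the first pair (JDIM00 + JDIM11), or all four slots.
valid_dimm_population() {
    sorted=$(printf '%s\n' "$@" | sort | tr '\n' ' ')
    case "$sorted" in
        ""|" ") return 0 ;;                            # no optional DIMMs fitted
        "JDIM00 JDIM11 ") return 0 ;;                  # first pair: outer slots
        "JDIM00 JDIM01 JDIM10 JDIM11 ") return 0 ;;    # both pairs fitted
        *) return 1 ;;
    esac
}

# I/O buffer memory in MB per Cell/B.E. companion chip for a given total
# number of installed 512 MB DIMMs (half of the DIMMs serve each chip).
io_buffer_mb_per_chip() {
    echo $(( $1 / 2 * 512 ))
}
```

For example, valid_dimm_population JDIM11 JDIM00 succeeds, while valid_dimm_population JDIM00 JDIM01 fails because those two slots do not form an allowed pair.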
To install extra I/O buffer memory, complete the following steps:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open the top cover.
4. Locate the DIMM slots in which you want to insert the I/O DDR2 memory modules.
Figure 13. DIMM slot location (callouts: slots at JDIM01, JDIM00, JDIM11, and JDIM10)
There are four DIMM slots, two for each Cell/B.E. companion chip. If this is the first pair of DIMMs you are installing, use slots 00 and 11. Slots 00 and 11 are the two outer slots as shown in Figure 13. For a second pair of DIMMs, use the remaining slots 01 and 10.
5. Remove the DIMM fillers from the slots where you want to insert the DIMMs. Retain the DIMM fillers. You need them if you remove any DIMMs from the blade server, as they are an important part of the blade server cooling system.
6. Place the DIMM in the slot, contact side down. Check the orientation of the module. The central locating pin in the slot should match the corresponding cut-out on the module.
7. Carefully press the module into place until the retaining clips snap into position. Make sure that the clips are locked properly.
Figure 14. DIMM retaining clips (callouts: DIMM, retaining clip)
8. Repeat steps 6 and 7 until you have installed all the optional DIMMs.
9. Ensure that all unused DIMM slots are fitted with DIMM fillers.
10. If you do not want to install any other options, replace the cover and insert the BladeCenter QS21 into the BladeCenter unit.
Replacing DIMM fillers
For the BladeCenter QS21 cooling system to work properly there must be no empty DIMM slots. Unused slots must be fitted with DIMM fillers. Replace faulty DIMM fillers and, if you remove memory modules, fit empty slots with DIMM fillers.
To install or replace DIMM fillers, complete the following steps:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open the top cover.
4. Remove any faulty DIMM fillers.
a. Open the retaining clips on either end of the DIMM slot.
b. Pull the filler out of the slot.
5. If you remove memory modules, be sure to remove them in pairs. If you keep a single pair of memory modules, they must be in the outermost slots, JDIM00 and JDIM11. See Figure 13 on page 37 for further information.
a. Open the retaining clips on either end of the DIMM slot.
b. Pull the module out of the slot.
6. Carefully press the DIMM filler into the empty DIMM slot until the retaining clips snap into position.
7. Repeat step 6 until all unused slots are fitted with DIMM fillers.
8. Replace the cover and insert the BladeCenter QS21 into the BladeCenter unit.
Installing the SAS expansion card
The BladeCenter QS21 does not have any built-in disk storage. The SAS expansion card allows you to connect storage to the BladeCenter QS21. Use the blue handling areas to handle the card.
Figure 15. SAS expansion card handling areas (callout: handling areas)
Complete the following steps to install the SAS expansion card:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open the top cover.
4. Locate the two SAS expansion card connectors at locations J22 and JFC_18 and the ball stud on the system board.
Figure 16. SAS expansion card connector and ball stud location (callouts: connectors for SAS expansion card, ball stud)
5. Locate the connectors and the ball socket on the SAS card.
Figure 17. SAS expansion card reverse side (callouts: connectors, ball socket)
6. Align the connectors on the system board with the connectors on the SAS card.
Figure 18. SAS expansion card location (callout: expansion card)
7. Using the blue handling areas, carefully push the card down to insert it into the connectors. Ensure that the ball stud on the system board engages with the ball socket on the SAS expansion card.
8. If you do not want to install any other options, replace the cover and insert the BladeCenter QS21 into the BladeCenter unit.
Installing the BladeCenter PCI Express I/O Expansion Unit
Important:
• A BladeCenter QS21 with the BladeCenter PCI Express I/O Expansion Unit installed takes up two contiguous slots in the BladeCenter chassis.
• You must remove any expansion card using the high-speed connector before installing the expansion unit.
Figure 19. Installing the expansion unit (callouts: cover pins, cover release on each side)
Complete the following steps to install the BladeCenter PCI Express I/O Expansion Unit:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Remove the blade server cover and set it aside. See “Opening and removing the blade server cover” on page 32 for further information.
3. Remove the connector cover or any optional card from the high-speed connector. Figure 9 on page 34 shows the location of the high-speed connector.
4. Lower the expansion unit so that the slots at the rear slide down onto the cover pins at the rear of the blade server, as shown in Figure 19.
5. Carefully close the expansion unit as shown in Figure 19 until it clicks into place.
Removing the blade-server front bezel assembly
Before you can replace a defective system board assembly or blade server front bezel assembly, you must first remove the blade server front bezel assembly. Figure 20 shows how to remove the front bezel assembly from a blade server.
Figure 20. Removing the front bezel assembly (callouts: blade cover, blade-cover release on each side, bezel-assembly release on each side, control-panel cable, control-panel connector)
Complete the following steps to remove the front bezel assembly:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Open the blade server cover.
3. Carefully disconnect the control panel cable from the control panel connector.
4. Press the front bezel release on both sides of the system board and pull the front bezel assembly away from the blade server.
5. Store the front bezel assembly in a safe place.
Replacing the system board base and planar
Figure 21. System board assembly (callout: bezel)
Complete the following steps to replace the system board base and planar:
1. Shut down the BladeCenter QS21.
2. Remove the BladeCenter QS21 from the BladeCenter unit.
3. Open and remove the top cover, and set it aside. See “Opening and removing the blade server cover” on page 32 for detailed instructions.
4. Remove the front bezel from the defective system board and set it aside. See “Removing the blade-server front bezel assembly” on page 41 for detailed instructions.
5. Remove any optional components from the defective system board and set them aside.
6. Note down the serial number of the defective system board. You need this later to update the VPD information.
7. On the replacement system board, install the front bezel assembly. See “Installing the front bezel assembly” on page 47 for detailed instructions.
8. On the replacement system board, reinstall any options you removed from the defective system board. See “Installing the optional InfiniBand card” on page 33, “Installing the SAS expansion card” on page 38 and “Adding I/O DDR2 memory modules” on page 36 for detailed instructions.
9. Replace the cover and close. See “Closing the blade server cover” on page 49 for details.
10. Reinstall the blade server in the BladeCenter unit.
11. Update the BMC, system and optional expansion card firmware as described in Chapter 2, “Configuring the blade server,” on page 9.
12. Using SMS, update the VPD information by entering the serial number of the defective system board. See “Adding FRU information” on page 13 for details.
13. Configure the replacement blade server to boot from the same device as the original defective unit. See the QS21 Installation and User's Guide for details.
Note: Provided that the options on the new blade server are the same as on the old one, you do not have to reinstall or reconfigure the operating system; simply configure the boot options to boot from the boot device.
Replacing the battery
IBM has designed this product with your safety in mind. The lithium battery must be handled correctly to avoid possible danger. If you replace the battery, you must adhere to the following instructions.
Note: In the U.S., call 1-800-IBM-4333 for information about battery disposal.
If you replace the original lithium battery with a heavy-metal battery or a battery with heavy-metal components, be aware of the following environmental consideration. Batteries and accumulators that contain heavy metals must not be disposed of with normal domestic waste. They will be taken back free of charge by the manufacturer, distributor, or representative, to be recycled or disposed of in a proper manner.
To order replacement batteries, call 1-800-IBM-SERV within the United States, and 1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada, call your IBM authorized reseller or IBM marketing representative.
Note: After you replace the battery, the blade server is automatically reconfigured. However, you must reset the system date and time through the operating system that you installed.
Statement 2:
CAUTION: When replacing the lithium battery, use only IBM Part Number 43W9859 or 03N2449 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Note: See “Battery return program” on page 116 for more information about battery disposal.
Complete the following steps to replace the battery:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Follow any special handling and installation instructions that come with the battery.
3. If the blade server is operating, shut down the operating system by typing the shutdown -h now command or by choosing shut down from the GUI. If the blade server was not powered off, press the power control button (behind the blade server control panel door) to turn off the blade server. See “Blade server controls and LEDs” on page 6 for more information about the location of the power control button.
4. Remove the blade server from the BladeCenter unit (see “Removing the blade server from the BladeCenter unit” on page 31 for information).
5. Carefully place the blade server on a flat, static-protective surface.
6. Open the blade server cover (see “Opening and removing the blade server cover” on page 32 for instructions).
7. Locate the battery (connector BH1) on the system board.
Figure 22. Battery location
8. Remove the battery:
   a. Use one finger to press the top of the battery clip away from the battery. The battery pops up when released.
   b. Use your thumb and index finger to lift the battery from the socket.
   c. Dispose of the battery as required by local ordinances or regulations.
9. Insert the new battery:
   a. Tilt the battery so that you can insert it into the socket, under the battery clip.
   b. Press the battery down into the socket until it clicks into place. Make sure the battery clip holds the battery securely.
10. Close the blade server cover (see “Closing the blade server cover” on page 49).
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
11. Reinstall the blade server into the BladeCenter unit.
12. Turn on the blade server (see “Turning on the blade server” on page 3).
13. Reset the system date and time through the operating system that you installed. For additional information, see your operating-system documentation.
Using the miscellaneous parts kit
The miscellaneous parts kit contains replacement parts and screws to be used if the original item is damaged. It contains the following items:
Kit, Miscellaneous Parts                                      Quantity
Socket, alignment                                             4
Cover Connector Plug, 200 position                            4
Pin, InfiniBand expansion card support, pivot point blocks    4
Ball stud, InfiniBand expansion card support                  4
Tray, InfiniBand expansion card support end bracket           2
Pin, alignment                                                2
Screw, Plastite 4-20x6.35                                     8
Screw, 3.5 x 6 Pan Head, Phillips, Planar                     6
QS21 Planar Light box with transparency assembly              1
Impedance Air Baffle Top, Foam                                4
Impedance Air Baffle DIMM Sides                               4

To replace a support or bracket, you need a Phillips head screwdriver.
Replacing the ball studs
The ball studs help support the optional expansion cards and should be replaced if damaged.
To remove and replace a ball stud, complete the following steps:
1. Using a Phillips head screwdriver, pierce the label at the red circle that corresponds to the ball stud you want to replace.
(Illustration: blade service information label, with the screw locations marked.)
2. Carefully unscrew the ball stud and remove it.
3. Position the replacement ball stud over the hole and screw into position, taking care not to over-tighten as this might damage the system board.
Finishing the installation
To complete the installation you must:
1. Reinstall the front bezel assembly on the blade server if removed. See “Installing the front bezel assembly” for further information.
2. Ensure there is a DIMM filler or a DIMM in each of the I/O buffer DIMM slots.
3. Replace and close the blade server cover. See “Closing the blade server cover” on page 49 for further information.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
4. Reinstall the blade server into the BladeCenter unit.
5. Turn on the blade server. See “Turning on the blade server” on page 3 for further information.
6. If you have replaced the battery or the system board assembly, reset the system date and time through the operating system that you installed. For additional information, see your operating system documentation.
Note: If you have just powered on the BladeCenter unit, wait until the power-on LED on the blade server flashes slowly before powering on the blade server.
Installing the front bezel assembly
The following illustration shows how to reinstall the front bezel assembly on the blade server.
Figure 23. Reinstalling the front bezel assembly
Complete the following steps to install the blade server front bezel assembly:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Connect the control panel cable to the control panel connector on the system board assembly.
3. Carefully slide the front bezel assembly onto the blade server, as shown in Figure 23, until it clicks into place.
Note: Make sure that you do not pinch any cables when you reinstall the front bezel assembly.
Closing the blade server cover
Important: The blade server cannot be inserted into the BladeCenter unit until the cover is installed and closed. Do not attempt to override this protection.
Figure 24. Closing the blade server cover
Complete the following steps to close the blade server cover:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If you removed the front bezel assembly, replace it now. See “Installing the front bezel assembly” on page 47 for instructions, and Figure 24.
3. Lower the cover so that the slots at the rear slide down onto the pins at the rear of the blade server, as shown in Figure 24. Before closing the cover, make sure that all components are installed and seated correctly and that you have not left loose tools or parts inside the blade server.
4. Carefully close the cover as shown in Figure 24 until it clicks into place.
Input/output connectors and devices
The BladeCenter unit contains the input/output connectors that are available to the blade server. See the documentation that comes with the BladeCenter unit for information about the input/output connectors.
Chapter 5. Diagnostics and troubleshooting
This chapter provides basic troubleshooting information to help you solve some common problems that might occur while setting up your blade server.
A problem with the BladeCenter QS21 can relate either to the BladeCenter QS21 or the BladeCenter unit.
A problem with the blade server exists if the BladeCenter unit contains more than one blade server and only one of the blade servers has the symptom. If all of the blade servers have the same symptom, then the problem relates to the BladeCenter unit. For more information, see the documentation that comes with your BladeCenter unit.
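The rule of thumb above can be sketched as a small function. This is only an illustration, not an IBM tool: the bay numbers, the function name, and the "undetermined" result for cases that the guide does not cover are assumptions.

```python
from typing import Dict


def classify_fault(symptom_by_bay: Dict[int, bool]) -> str:
    """Apply the blade-versus-unit rule of thumb to per-blade symptom reports.

    symptom_by_bay maps a (hypothetical) bay number to True when the blade
    server in that bay shows the symptom.
    """
    total = len(symptom_by_bay)
    affected = sum(1 for has_symptom in symptom_by_bay.values() if has_symptom)

    # More than one blade installed and only one shows the symptom.
    if total > 1 and affected == 1:
        return "blade server"
    # Every installed blade shows the same symptom.
    if total > 1 and affected == total:
        return "BladeCenter unit"
    # Assumption: the guide gives no rule for a single blade or a partial mix.
    return "undetermined"


print(classify_fault({1: False, 2: True, 3: False}))
```

With a single blade installed, or when only some of the blades are affected, the sketch deliberately returns "undetermined" rather than guessing.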
Note: The BladeCenter QS21 is supported in the BladeCenter H Type 8852 unit and the BladeCenter HT Type 8740 and 8750 (enterprise environment only) unit. However, you can put other blade servers compatible with the BladeCenter units in the same unit as a BladeCenter QS21.
Prerequisites
Basic checks
Before you start problem determination or servicing, check that:
• The BladeCenter QS21 is inserted correctly into the BladeCenter unit
• All components are connected correctly
• The BladeCenter QS21 has the latest firmware updates. These include:
  - BMC
  - System
  - Gigabit Ethernet controller
  - SAS expansion card (if installed)
  - InfiniBand high-speed expansion card (if installed)
If you install the blade server in the BladeCenter unit and the blade server does not start, always perform the following basic checks before continuing with more advanced troubleshooting:
• Make sure that the BladeCenter unit is correctly connected to a power source.
• Reseat the blade server in the BladeCenter unit.
• If the power-on LED is flashing slowly, the blade server may be turned off. To turn on the blade server, see “Turning on the blade server” on page 3 for further information.
• If you have just added a new optional device or component, make sure that it is correctly installed and compatible with the blade server and its components. If the device or component is not compatible, remove it from the blade server, reinstall the blade server in the BladeCenter unit, and then restart the blade server.
• Use Advanced Management Module to check that the blade server appears in the list of blade servers available.
Finding troubleshooting information
Table 2 describes where to find troubleshooting information in this section.
Note: Many components, including the CPU, RAM, and power supplies, cannot be exchanged in the field. The only replaceable parts are the optional SAS daughter card, battery, front bezel assembly, I/O buffer DIMM memory, and the optional InfiniBand card.

Table 2. Where to find troubleshooting information
Component                                  Where to find information
SAS expansion card, front bezel,           “Solving undetermined problems” on page 95
high-speed InfiniBand expansion card
Memory                                     “Boot errors and handling” on page 72
LEDs, power, network connections,          “Troubleshooting charts” on page 52
service processor, software problems

For troubleshooting information about other BladeCenter components, see the appropriate Problem Determination and Service Guide, and other product-specific documentation. See “Related documentation” on page 1 for additional information. For the latest editions of the IBM BladeCenter documentation, go to http://www.ibm.com/support/us/en/ on the World Wide Web.
Troubleshooting charts
The following tables list problem symptoms and suggested solutions. If you cannot find the problem in the troubleshooting charts, or if carrying out the suggested steps does not solve the problem, have the blade server serviced.
If you have problems with an adapter, monitor, keyboard, mouse, or power module, see the Problem Determination and Service Guide that comes with your BladeCenter unit for more information.
If you have problems with an Ethernet switch module, I/O adapter, or other optional device that can be installed in the BladeCenter unit, see the Problem Determination and Service Guide or other documentation that comes with the device for more information.
Problems indicated by the front panel LEDs
The state of the LEDs on the front of the blade can help in isolating problems.
Figure 25. Power-control button and LEDs
Callouts: Information LED, Location LED, Activity LED, Power-on LED, Media-tray select button, Power-control button, Blade-error LED, NMI reset button
The table below gives an explanation and a suggested action, if required, for each LED.
Table 3. Explanation of LEDs and their states
Blade error LED
   State: Amber
   Explanation: A system error has occurred on the blade server.
   Suggested action: Check the BladeCenter error log; see “Problem reporting” on page 94.

Information LED
   State: Amber
   Explanation: Information about a system error has been placed in the Advanced Management Module Event Log. The information LED remains on until turned off by Advanced Management Module or through IBM Director Console.
   Suggested action: Check Advanced Management Module to see what the problem is. See the BladeCenter Management Module User's Guide for further information about the error.

Activity LED
   State: Green
   Explanation: There is network activity.
   Suggested action: No action required. For further information about troubleshooting networks, see “Network connection problems” on page 57.

Power-on LED
   State: Flashing rapidly
   Explanation: The service processor on the blade server is communicating with the BladeCenter Management Module.
   Suggested action: No action required.

   State: Flashing slowly
   Explanation: The blade server has power but is not turned on.
   Suggested action: Turn on if required.

   State: Lit continuously (steady)
   Explanation: The blade server has power and is turned on.
   Suggested action: No action required.

   State: Not lit
   Explanation: Blade server not powered.
   Suggested action:
   1. Reseat blade server.
   2. Check if BladeCenter power supplies numbers 3 and 4 are installed and powered. If they are not, install and power them or use slots 1-5.
   3. Go to “Power problems” on page 57.
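For scripted checklists, the power-on LED rows of Table 3 can be captured as a lookup table. The sketch below is illustrative only: the dictionary, the function name, and the normalized state strings are assumptions, and the "not lit" action is a condensed paraphrase of the numbered steps in the table.

```python
# Power-on LED states from Table 3: state -> (explanation, suggested action).
POWER_ON_LED_STATES = {
    "flashing rapidly": (
        "The service processor on the blade server is communicating with "
        "the BladeCenter Management Module.",
        "No action required",
    ),
    "flashing slowly": (
        "The blade server has power but is not turned on.",
        "Turn on if required",
    ),
    "lit continuously": (
        "The blade server has power and is turned on.",
        "No action required",
    ),
    "not lit": (
        "Blade server not powered.",
        "Reseat the blade server, check the BladeCenter power supplies, "
        "then see the power problems chart",
    ),
}


def describe_power_on_led(state: str):
    """Return (explanation, suggested action) for an observed LED state."""
    key = state.strip().lower().rstrip(".")
    if key not in POWER_ON_LED_STATES:
        raise KeyError(f"unknown power-on LED state: {state!r}")
    return POWER_ON_LED_STATES[key]


explanation, action = describe_power_on_led("Flashing slowly")
print(explanation, "->", action)
```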
Problems indicated by the system board LEDs
The blade server must be removed from the BladeCenter unit and the cover removed before you can use the light path LEDs for diagnostics. To activate the light box and the other light path LEDs, press the light path diagnostics switch. The location of each LED on the system board is shown in the table below.
Figure 26. Light box and system board LEDs
Callouts: light box (temperature fault, system board, CPU fail, and NMI error LEDs), light path diagnostics LED (LP1), light path diagnostics switch, and DIMM slots JDIM00, JDIM01, JDIM10, and JDIM11 with their error LEDs
Table 4. System board LEDs

Status LEDs
The status LEDs are listed for reasons of completeness since they are for use by IBM service only and are not normally visible. They are not activated by the light path diagnostics switch.

LED                  Color   Board location  Explanation
Heartbeat            Green   D16             Indicates the BMC is functional.
Alert                Yellow  D15             Indicates an error condition has occurred on the system board.
Ethernet 1 activity  Green   D12             Indicates Ethernet 1 is active and sending or receiving packets.
Ethernet 0 activity  Green   D11             Indicates Ethernet 0 is active and sending or receiving packets.
BE0_PLL_LOCK         Green   D8              Indicates the phase-locked loop of Cell/B.E.-0 is working.
BE1_PLL_LOCK         Green   D13             Indicates the phase-locked loop of Cell/B.E.-1 is working.
MM_SELECT_A          Green   D19             Indicates Advanced Management Module A is active.
MM_SELECT_B          Green   D18             Indicates Advanced Management Module B is active.

Light path LEDs
LED                   Color   Board location
DIMM at JDIM11 error  Yellow  D21
DIMM at JDIM10 error  Yellow  D20
DIMM at JDIM01 error  Yellow  D10
DIMM at JDIM00 error  Yellow  D7
Explanation: There has been a failure in the I/O DIMM module.
Comments: See Figure 26 on page 54 for the location of each DIMM and its associated LED. Either remove or replace the DIMM and reboot.

Light box LEDs
Temperature fault
   Color: Yellow
   Board location: Light box
   Explanation: The blade server has exceeded the operational temperature range.
   Comments:
   • Using the Advanced Management Module, check that the BladeCenter unit cooling system is operating correctly.
   • Replace any missing filler blades in the BladeCenter unit.
   • Replace any missing filler blades in the BladeCenter QS21 DIMM sockets.
   • Check that other blade servers are operating within the recommended temperature range.
   • Replace the blade server, power on and boot. Check Advanced Management Module for errors.
   If the problem persists, contact your IBM service representative as the system board may need servicing.

NMI error
   Color: Yellow
   Explanation: The NMI pinhole reset on the front panel has been pressed.
   Comments: Pressing the reset causes the operating system to call the system debugger.

CPU fail
   Color: Yellow
   Explanation: One of the Cell BE processors has failed.
   Comments: Contact your IBM service representative as the system board needs replacement.

System board
   Color: Yellow
   Explanation: A critical error has occurred in a component on the system board.
   Comments: Contact your IBM service representative as the system board may need replacing.

Light path diagnostics
   Color: Green
   Explanation: Lights when the light path diagnostics switch is pressed. Indicates that the capacitor is charged and the light path LEDs can light to show any errors.
   Comments: If this LED does not light then the light path LEDs cannot function. Reinstall the blade server in the BladeCenter unit and power on to recharge. If this fails to resolve the problem, there is a problem with the system board and it may need replacement.
Power problems
Power symptom: The blade server does not turn on.
Suggested action:
1. Make sure that:
   a. The power-on LED on the front of the BladeCenter unit is lit.
   b. The LEDs on all the BladeCenter power modules are lit.
   c. The power-on LED on the blade-server control panel is flashing slowly.
      • The power-on LED flashes rapidly for a short period to indicate it is communicating with Advanced Management Module. If the power-on LED starts to flash rapidly and continues to do so, the blade server is not communicating with the management module; reseat the blade server and reboot.
      • If the power LED is off, either the blade bay is not receiving power, the blade server is defective, the Advanced Management Module firmware is an earlier version and does not support this function, or the LED information panel is loose or defective.
   d. Local power control for the blade server is enabled. Check using the Advanced Management Module Web interface. The blade server might have been instructed through the Advanced Management Module to turn on.
2. If you have just installed a new option in the blade server, remove it, and restart the blade server. If the blade server now powers on, troubleshoot the option. See the documentation that comes with the option for further information.
3. Try another blade server in the blade bay. If it works, you may need to have a trained service technician replace the system blade assembly.
Power throttling
Be aware that the BladeCenter unit automatically reduces the BladeCenter QS21 processor speed if certain conditions are met. One such condition is that temperature thresholds are exceeded, for example, when the blade server is running in acoustic mode. This throttling occurs independently of your power configuration. Full processor speed is restored automatically when the conditions that caused the throttling have been resolved.
Network connection problems
Network connection symptom: One or more blade servers are unable to communicate with the network.
Suggested action:
Make sure that:
• The switch modules for the network interface being used are installed in the correct BladeCenter bays and are configured and operating correctly.
• The settings in the switch module are correct for the blade server (settings in the switch module are blade server specific).
For additional information, see:
• Chapter 2, “Configuring the blade server,” on page 9
• The Problem Determination and Service Guide that comes with your BladeCenter unit
• Other product-specific documentation that comes with the switch module
Note: For the latest editions of the IBM BladeCenter documentation, go to http://www.ibm.com/support/us/en/.
If the problem remains, see “Solving undetermined problems” on page 95.
If all the blades cannot communicate with the network, check the network itself for problems.
Service processor problems
Service processor symptom: Service processor reports a general monitor failure.
Suggested action:
1. If the blade server is operating, shut down the operating system.
2. If the blade server was not turned off, press the power-control button (behind the blade server control-panel door) to turn off the server.
3. Remove the blade server from the BladeCenter unit.
4. Wait 30 seconds and reinstall the blade server into the BladeCenter unit.
5. Restart the blade server.
If the problem remains, see “Solving undetermined problems” on page 95.
Software problems
Symptom: You suspect a software problem.
Suggested action:
1. To determine whether the problem is caused by the software, make sure that:
   • The blade server has the minimum memory that is needed to use the software. For memory requirements, see the software documentation.
   • The software is designed to operate on the blade server.
   • Other software works on the blade server.
   • The software works on another server.
2. If you received any error messages when using the software, see the software documentation for a description of the messages and suggested solutions to the problem.
3. Contact the software vendor.
Recovering the system firmware code
The system firmware is contained in two separate images in the flash memory of the blade server: temporary and permanent. These images are referred to as TEMP and PERM, respectively. The system normally starts from the TEMP image, and the PERM image serves as a backup. If the TEMP image becomes damaged, such as from a power failure during a firmware update, the system automatically starts from the PERM image.
If the TEMP image is damaged, you can recover the TEMP image from the PERM image. See “Recovering the TEMP image from the PERM image” for further information.
Checking the boot image
To check whether the system has started from the PERM image, enter:
cat /proc/device-tree/openprom/ibm,fw-bank
A P is returned if the system has started from the PERM image.
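The same check can be scripted. The following is a minimal sketch, assuming the property file holds a short ASCII string that may carry a trailing NUL or newline; the function names are hypothetical, and the "T" value used in the usage note is only a stand-in for "anything other than P", since the guide documents only the P case.

```python
from pathlib import Path

# Path taken from the command shown above.
FW_BANK_PATH = Path("/proc/device-tree/openprom/ibm,fw-bank")


def booted_from_perm(raw: bytes) -> bool:
    """Return True when the fw-bank property reports the PERM image ("P")."""
    # Assumption: the property is ASCII, possibly NUL- or newline-terminated.
    text = raw.decode("ascii", errors="replace").strip("\x00 \t\r\n")
    return text.startswith("P")


def read_fw_bank(path: Path = FW_BANK_PATH):
    """Read the raw property, or return None if it is not available."""
    try:
        return path.read_bytes()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_fw_bank()
    if raw is None:
        print("fw-bank property is not available on this system")
    elif booted_from_perm(raw):
        print("System started from the PERM image")
    else:
        print("System did not start from the PERM image")
```

For example, `booted_from_perm(b"P\x00")` is true, while a hypothetical non-PERM value such as `b"T\x00"` yields false.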
Booting from the TEMP image
To initiate a boot from the TEMP image after the system has booted from the PERM side, complete the following steps:
1. Turn off the blade server.
2. Restart the blade system management processor from the Advanced Management Module.
3. Turn on the blade server.
Note: If the TEMP side is corrupted, the boot times out, and an automatic reboot occurs after switching to the PERM side.
If the blade server does not restart, you must replace the system board assembly. Contact a service support representative for assistance.
Recovering the TEMP image from the PERM image
To recover the TEMP image from the PERM image, you must copy the PERM image into the TEMP image. To perform the copy, complete the following steps:
1. Copy the PERM image to the TEMP image. Using the Linux operating system, type the following command:
update_flash -r
2. Shut down the blade server using the operating system.
3. Restart the blade system management processor from the management module.
4. Turn on the blade server.
You might need to update the firmware code to the latest version. See “Updating the system and BMC firmware” on page 15 for more information on updating the firmware code.
Supported boot media
The BladeCenter QS21 can boot from the operating system installation CDs or DVDs to allow the operating system to be installed.
Once the operating system is installed, the BladeCenter QS21 can also boot either from attached SAS storage, if you have installed the optional SAS Expansion Card, or from the network.
If you wish to perform a standard Bootp/TFTP network boot, please note the following restrictions:
• Only the built-in Gigabit Ethernet Controller of I/O Bridge is supported
• Only boot through the Ethernet switch on the top side of BladeCenter
• No fall back or configurable change to the bottom switch is possible
• In the Advanced Management Module you need to set boot list to Network
• There is no support for a router between the blade and TFTP server. Only local TFTP is supported.
Use Advanced Management Module to configure the required boot mode. See IBM BladeCenter Management Module Installation Guide for more information.
Booting the system
This section provides an overview on how to interpret the console output of the host firmware. The output is grouped into several parts, which are detailed below.
1. The first part of the boot process shows the system name and build date. You see an error at this point if the firmware image is corrupted.
   ***************************************************************************
   QS21 Firmware Starting
    Check ROM = OK
    Build Date = Apr 24 2007 13:43:46
    FW Version = QB-1.6.0-0
   Press "F1" to enter Boot Configuration (SMS)
2. Memory initialization follows next.
Note: It can take several seconds to initialize the RAMBUS memory.
3. The memory is initialized. The screen displays details of the vendor and the speed of memory modules.
   Initializing memory configuration...
   MEMORY
    Modules = Elpida 512MB, 3200 Mhz
    XDRlibrary = v0.32, Bin A/C, RevB, DualDD
    Calibrate = Done
    Test = Done
The next screens show the Open Firmware section of the boot process and provide checkpoints and an overview of which adapters are available in the system. The details in the adapter list are not meaningful.
Note: The warning (!) Permanent Boot ROM is displayed if there is a problem with the TEMP image and the system firmware is running from the PERM image. You should correct this problem as soon as possible. See “Recovering the TEMP image from the PERM image” on page 59 for further information.
   OPEN FIRMWARE
    Adapters on 000001460ec0000000
     00 0800 (D) : 14e4 16a8 network [ ethernet ]
     00 0900 (D) : 14e4 16a8 network [ ethernet ]
    Adapters on 000003460ec00000
     00 0800 (D) : 1033 0035 usb-ohci ( NEC uPD720101 )
     00 0900 (D) : 1033 0035 usb-ohci ( NEC uPD720101 )
     00 0a00 (D) : 1033 00e0 usb-ehci*

   Welcome to Open Firmware

   Licensed Internal Code - Property of IBM
   (c) Copyright IBM Corp. 2005, 2007 All Rights Reserved.
   Cell BE is a trademark of SONY Computer Entertainment Inc.

   Type ’boot’ and press return to continue booting the system.
   Type ’reset-all’ and press enter to reboot the system.

   disable nvram logging .. done
4. The next screen displays system information. It shows revision information about the chip set, SMP size, boot date/time, and the available memory.
   SYSTEM INFORMATION
    Processor = Cell/B.E.(TM) DD3.2 @ 3200 MHz
    I/O Bridge = Cell BE companion chip DD2.x
    Timebase = 26666 kHz (internal)
    SMP Size = 2 (4 threads)
    Boot-Date = 2007-06-08 11:20
    Memory = 2048MB (CPU0: 1024MB, CPU1: 1024MB)
The Operating System now boots unless you press F1 in which case the SMS menu starts. See “Using the SMS utility program” on page 11 for further information.
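If you capture the console output (for example, from a Serial over LAN session log), the "name = value" lines can be collected into a dictionary for comparison between blades. The sketch below is not part of the firmware or of any IBM tool: the function names are hypothetical, and it assumes each field arrives on its own line, as in the sample taken from the SYSTEM INFORMATION screen above.

```python
import re

# Sample lines copied from the SYSTEM INFORMATION screen shown above.
SAMPLE = """\
SYSTEM INFORMATION
 Processor = Cell/B.E.(TM) DD3.2 @ 3200 MHz
 I/O Bridge = Cell BE companion chip DD2.x
 Timebase = 26666 kHz (internal)
 SMP Size = 2 (4 threads)
 Boot-Date = 2007-06-08 11:20
 Memory = 2048MB (CPU0: 1024MB, CPU1: 1024MB)
"""


def parse_fields(text):
    """Collect 'name = value' console lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def memory_summary(memory_field):
    """Split the Memory field into a total and per-CPU amounts, in MB."""
    match = re.match(r"(\d+)MB", memory_field)
    if match is None:
        raise ValueError(f"unexpected Memory field: {memory_field!r}")
    per_cpu = {
        name: int(size)
        for name, size in re.findall(r"(CPU\d+): (\d+)MB", memory_field)
    }
    return int(match.group(1)), per_cpu


fields = parse_fields(SAMPLE)
total_mb, per_cpu_mb = memory_summary(fields["Memory"])
print(fields["Processor"])
print(total_mb, per_cpu_mb)
```

A quick sanity check on a captured log is that the per-CPU amounts add up to the reported total; a mismatch would be worth a closer look at the memory configuration.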
Diagnostic programs and messages
The Dynamic System Analysis (DSA) Preboot diagnostic programs are the primary method of testing the major components of the server. DSA is a system information collection and analysis tool that you can use to provide information to IBM service and support to aid in the diagnosis of system problems. The DSA diagnostic programs come on the IBM Dynamic System Analysis Preboot Diagnostic CD. You can download the CD from http://www.ibm.com/support/us/en if one did not come with your server. As you run the diagnostic programs, text messages are displayed on the screen and are saved in the test log. A diagnostic text message indicates that a problem has been detected and indicates the action you should take as a result of the text message.
The DSA diagnostic programs collect information about the following aspects of the system:
• System configuration
• Network interfaces and settings
• Hardware inventory USB information
• IBM LightPath diagnostics status
• Service processor status and configuration
• Vital product data and system firmware information
• Drive Health Information
• LSI RAID & Controller configuration
The DSA diagnostic programs can also provide diagnostics for the following system components:
• Baseboard Management Controller
• Memory stress
• CPU stress
Additionally, DSA creates a merged log that includes events from all collected logs.
All collected information can be output as a compressed XML file that can be sent to IBM Service. Additionally, you can view the information locally through a generated text report file. Optionally, the generated HTML pages may be copied to removable media and viewed from a web browser.
Running diagnostics and preboot DSA
To run the diagnostic programs, complete the following steps:
1. If the server is running, turn off the server and all attached devices.
2. Turn on all attached devices then turn on the server.
3. Ensure that external DSA bootable media is available as a boot device. For boot device selection, system firmware will work through the boot path as specified in the onboard planar VPD and try to establish communication with the specified interfaces in sequential order. These boot devices include the USB attached DVD (BladeCenter media tray), the SAS storage if attached, and network attached storage.
4. If required, :
   Note: To ensure the blade server boots from the correct device, use the Advanced Management Module to change the boot order so the blade server boots first from the preboot DSA device.
5. When the boot prompt appears, press enter, type dsa and press enter again. Alternatively you can wait for the timeout to expire.
6. The command line interface prompt will then appear on the SOL connection. The BladeCenter QS21 does not support the graphical user interface.
7. Follow the on screen directions to run preboot DSA. Diagnostics are run from within preboot DSA.
When you are using the CPU or Memory stress tests, call your IBM service representative if you experience any system instability.
To determine what action you should take as a result of a diagnostic text message, see “DSA error messages.”
Open firmware memory diagnostic results are output to the SOL connection. They are also logged in NVRAM. All NVRAM logs (more than just OF diags) are collected as part of the DSA merged log.
If the diagnostic programs do not detect any hardware errors but the problem remains during normal server operations, a software error might be the cause. If you suspect a software problem, see the information that comes with your software.
A single problem might cause more than one error message. When this happens, correct the cause of the first error message. The other error messages usually will not occur the next time you run the diagnostic programs.
If there are multiple error codes or light path diagnostics LEDs that indicate a microprocessor error, the error might be in a microprocessor or in a microprocessor socket. See Table 20 on page 90 for further information about diagnosing microprocessor problems.
If the server stops during testing and you cannot continue, restart the server and try running the diagnostic programs again.
Diagnostic text messages
Diagnostic text messages are displayed while the tests are running. A diagnostic text message contains one of the following results:
- Passed: The test was completed without any errors.
- Failed: The test detected an error.
- Aborted: The test could not proceed because of the server configuration.
Additional information concerning test failures is available in the extended diagnostic results for each test.
Viewing the test log
To view the test log when the tests are completed, issue the view command from the DSA command line interface. DSA collections may also be transferred to an external USB device using the copy command from the DSA command line interface.
DSA error messages
The tables below describe the messages that the diagnostic programs might generate and suggested actions to correct the detected problems. Follow the suggested actions in the order given.
CPU test results
Table 5. CPU test results

Test: CPU stress test

Number       Status  Extended results
089-901-xxx  Fail    Test failure
089-802-xxx  Abort   System resource availability error
089-801-xxx  Abort   Internal program error

Actions:
1. If the system has stopped responding, turn off and restart the system and then run the test again.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Run the test again.
4. Check system firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
5. Run the test again.
6. If the system has stopped responding, turn off and restart the system and then run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
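The rows of Table 5 can be kept as a small lookup when DSA output is post-processed by a script. The following is a minimal sketch only: the function name `lookup_cpu_result` and the assumption that a result number always has the shape `NNN-NNN-xxx` (with a three-character variable suffix) are illustrative and are not part of DSA.

```python
import re

# Rows transcribed from Table 5. The trailing "xxx" part of the number
# varies, so only the first two groups are used as the key.
CPU_RESULTS = {
    ("089", "901"): ("Fail", "Test failure"),
    ("089", "802"): ("Abort", "System resource availability error"),
    ("089", "801"): ("Abort", "Internal program error"),
}

# Assumed shape of a result number: three digits, three digits, three
# further characters (hypothetical; adjust to the real DSA output).
_NUMBER = re.compile(r"^(\d{3})-(\d{3})-(\w{3})$")


def lookup_cpu_result(number):
    """Return (status, extended result) for a Table 5 number, or None."""
    match = _NUMBER.match(number.strip())
    if match is None:
        raise ValueError("not a DSA result number: %r" % number)
    return CPU_RESULTS.get((match.group(1), match.group(2)))


if __name__ == "__main__":
    print(lookup_cpu_result("089-802-000"))
```

A number that is well formed but not listed in Table 5 returns `None`, so the caller can fall back to the other sections of this chapter.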
BMC test results
Table 6. BMC test results

Test: I2C test

Number       Status  Extended results
166-901-xxx  Fail    The BMC indicates a failure in the IPMB bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 6. BMC test results (continued)

Number       Status  Extended results
166-902-xxx  Fail    The BMC indicates a failure in the memory card bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the reported memory size is the same as the installed memory size, complete the following steps. Otherwise, go to step 8.
   a. Turn off the system and disconnect it from power.
   b. Reseat all the system DIMMs within the system.
   c. Reconnect the system to power and turn on the system.
   d. Run the test again.
8. Turn off the system and disconnect it from power.
9. Remove all the system memory.
10. Install the minimum memory configuration for the system. See the QS21 Installation and User's Guide for supported memory configurations.
11. Reconnect the system to power and turn on the system.
12. Make sure that the reported memory size is the same as the installed memory size.
13. Run the test again. If the memory passes the test, one of the uninstalled memory cards or DIMMs is the failing component.
14. Repeat steps 8 through 13 as necessary, using different memory cards and DIMMs, to isolate the failing component. It is important to change only one element each time in order to identify the specific cause of the error.
15. Replace the failing memory card or DIMM.
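The isolation loop in steps 8 through 14 can be pictured as a search that changes one element per test run. The sketch below is an illustration under simplifying assumptions, not a service procedure: `known_good` stands for a minimum configuration that already passes, `passes_test` is a hypothetical callback standing in for "power on and run the test again", and real systems may require memory to be installed in supported combinations rather than one module at a time.

```python
def find_failing_module(known_good, candidates, passes_test):
    """Return the first candidate that makes the test fail, or None.

    known_good  -- modules forming a minimum configuration that passes
    candidates  -- the uninstalled modules under suspicion
    passes_test -- hypothetical callback: takes a list of installed
                   modules and returns True if the test passes
    """
    baseline = list(known_good)
    if not passes_test(baseline):
        # The baseline itself is suspect; nothing can be concluded yet.
        raise ValueError("minimum configuration does not pass the test")

    for candidate in candidates:
        # Change only one element per run: baseline plus one candidate.
        trial = baseline + [candidate]
        if not passes_test(trial):
            return candidate
    return None


if __name__ == "__main__":
    # Simulated hardware in which DIMM3 is the defective part.
    def simulated(cfg):
        return "DIMM3" not in cfg

    print(find_failing_module(["DIMM1"], ["DIMM2", "DIMM3", "DIMM4"], simulated))
```

Because each run differs from the passing baseline by exactly one module, a failing run points at that module and nothing else.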
Table 6. BMC test results (continued)

Number       Status  Extended results
166-903-xxx  Fail    The BMC indicates a failure in the Ethernet sideband bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Check Ethernet device firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
7. Run the test again.
8. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 6. BMC test results (continued)

Number       Status  Extended results
166-904-xxx  Fail    The BMC indicates a failure in the main bus.
166-905-xxx  Fail    The BMC indicates a failure in the pecos bus.
166-906-xxx  Fail    The BMC indicates a failure in the BMC private bus.
166-907-xxx  Fail    The BMC indicates a failure in the power backplane bus.
166-908-xxx  Fail    The BMC indicates a failure in the microprocessor bus.
166-910-xxx  Fail    The BMC indicates a failure in the PCIe and Light path diagnostics bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 7. BMC test results

Number       Status  Extended results
166-801-xxx  Abort   BMC I2C test canceled: the BMC returned an incorrect response length.
166-802-xxx  Abort   BMC I2C test canceled: the test cannot be completed for an unknown reason.
166-803-xxx  Abort   BMC I2C test canceled: the node is busy; try later.
166-804-xxx  Abort   BMC I2C test canceled: invalid command.
166-805-xxx  Abort   BMC I2C test canceled: invalid command for the given LUN.
166-806-xxx  Abort   BMC I2C test canceled: timeout while processing the command.
166-807-xxx  Abort   BMC I2C test canceled: out of space.
166-808-xxx  Abort   BMC I2C test canceled: reservation canceled or invalid reservation ID.
166-809-xxx  Abort   BMC I2C test canceled: request data was truncated.
166-810-xxx  Abort   BMC I2C test canceled: request data length is invalid.
166-811-xxx  Abort   BMC I2C test canceled: request data field length limit is exceeded.
166-812-xxx  Abort   BMC I2C test canceled: a parameter is out of range.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 7. BMC test results (continued)

Number       Status  Extended results
166-813-xxx  Abort   BMC I2C test canceled: cannot return the number of requested data bytes.
166-814-xxx  Abort   BMC I2C test canceled: requested sensor, data, or record is not present.
166-815-xxx  Abort   BMC I2C test canceled: invalid data field in the request.
166-816-xxx  Abort   BMC I2C test canceled: the command is illegal for the specified sensor or record type.
166-817-xxx  Abort   BMC I2C test canceled: a command response could not be provided.
166-818-xxx  Abort   BMC I2C test canceled: cannot execute a duplicated request.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 7. BMC test results (continued)

Number       Status  Extended results
166-819-xxx  Abort   BMC I2C test canceled: a command response could not be provided; the SDR repository is in update mode.
166-820-xxx  Abort   BMC I2C test canceled: a command response could not be provided; the device is in firmware update mode.
166-821-xxx  Abort   BMC I2C test canceled: a command response could not be provided; BMC initialization is in progress.
166-822-xxx  Abort   BMC I2C test canceled: the destination is unavailable.
166-823-xxx  Abort   BMC I2C test canceled: cannot execute the command; insufficient privilege level.
166-824-xxx  Abort   BMC I2C test canceled: cannot execute the command.
166-000-xxx  Pass
Memory tests
Table 8. Memory test results

Test: Memory stress test

Number       Status  Extended results
201-000-xxx  Pass
202-802-xxx  Fail    General error: memory size is insufficient to run the test.
202-901-xxx  Fail    Test failure.
202-801-xxx  Abort   Internal program error.
202-000-xxx  Pass

Actions for 202-802-xxx:
1. Ensure all memory is enabled by checking Available System Memory in the Resource Utilization section of the DSA Diagnostic Event Log.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Run the test again.
4. Execute the standard DSA memory diagnostic to validate all memory.
5. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.

Actions for 202-901-xxx:
1. Execute the standard DSA memory diagnostic to validate all memory.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Turn off the system and disconnect it from power.
4. Reseat the DIMMs.
5. Reconnect the system to power and turn on the system.
6. Run the test again.
7. Execute the standard DSA memory diagnostic to validate all memory.
8. If you cannot reproduce the problem, contact your IBM technical-support representative.

Actions for 202-801-xxx:
1. Turn off and restart the system.
2. Make sure that the system firmware code and DSA code are at the latest level.
3. Run the test again.
4. Turn off and restart the system if necessary to recover from a hung state.
5. Run the memory diagnostic to identify the specific failing DIMM.
6. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
System firmware startup messages
The system firmware displays the progress of the startup process on the serial console from the time that ac power is connected to the system until the operating system login prompt is displayed following a successful operating system startup.
If a serial console is not connected, you can use the Advanced Management Module to monitor the logs and display informational and error messages.
If the firmware encounters an error during the startup process, a message describing the error together with an error code is displayed on the serial console.
There are two types of error, where xxx represents the number of the error code:
Cxxx  This is an internal checkpoint. If the system stops during the startup process, a checkpoint may be displayed.
Exxx  This type of error means that there is a failure that does not allow the firmware to continue the startup process. Check the error codes in the section “Boot errors and handling” on page 72. If these do not help resolve the problem, contact a service support representative.
There are cases where a message that is informational only is displayed on the serial console.
Wxxx  This is a warning message. The firmware allows the startup process to continue, but indicates there may be a problem. A warning message can be combined with an error message to give more complete information about an error.
A complete list of possible messages is given in the section “Boot errors and handling” on page 72.
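When a serial console capture is searched for these messages, the prefix letter is enough to sort them. The sketch below is illustrative only: the function name `classify_codes` is hypothetical, and the pattern assumes a prefix letter followed by three or four decimal digits, which matches the codes listed in this chapter (for example E3400 or W1250) but has not been checked against every firmware level.

```python
import re

# Prefix letters as described above: checkpoint, error, warning.
KINDS = {"C": "checkpoint", "E": "error", "W": "warning"}

# Assumed code shape: one prefix letter plus three or four digits.
_CODE = re.compile(r"\b([CEW])(\d{3,4})\b")


def classify_codes(line):
    """Return a list of (code, kind) pairs found in one console line."""
    return [(m.group(0), KINDS[m.group(1)]) for m in _CODE.finditer(line)]


if __name__ == "__main__":
    sample = [
        "W1250 Timing Calibration failed",
        "E1200 System memory init failure during second-pass calibration.",
    ]
    for text in sample:
        print(classify_codes(text))
```

Collecting warnings and errors together is useful because, as noted above, a warning can accompany an error and carry the detail needed to interpret it.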
Boot errors and handling
The following sections describe boot errors and actions you can take to resolve these errors.
Boot list
The following table describes boot list errors.
Table 9. System firmware boot list errors

Code: E3400
Message: It was not possible to boot from any device specified in the VPD
Description: The firmware found a valid VPD but was not able to find bootable code on any of the devices listed in it.
Action: Use the Advanced Management Module Web interface to specify at least one device that contains bootable code. From the Advanced Management Module Web interface, choose Blade Tasks > Configuration > Boot Sequence.

Code: E3401
Message: Aborting boot, <details>
Description: Boot aborted due to an error detected by the low level code. The <details> string provides the error description.
Action: Based on the <details> string, you may have to take an action on faulty hardware or use the Advanced Management Module to correct the system configuration. If the problem persists, contact your IBM service representative.

Code: E3402
Message: Aborting boot, internal error.
Description: Boot aborted due to an error detected by the low level code.
Action: The exact reason is unknown but could be a firmware problem. If the problem persists, contact your IBM service representative.

Code: E3403
Message: Bad executable: <details>
Description: The file loaded from the boot device is not a valid PPC executable ELF file. The <details> string provides more details about the file type.
Action: Using the Advanced Management Module, correct the boot device configuration. Select a valid boot device and executable path.

Code: E3404
Message: Not a bootable device!
Description: The system cannot load an executable file from this device.
Action: Using the Advanced Management Module, correct the boot device configuration. Select a valid boot device and executable path.

Code: E3405
Message: No such device
Description: The specified boot device is currently not present or not ready for access.
Action: Check the hardware device or use the Advanced Management Module to correct the system configuration. If the problem persists, contact your IBM service representative.

Code: E3406
Message: Client application returned an error: <details>
Description: The OS or a standalone application returned an error code to the system firmware. The <details> string provides the error description.
Action: Based on the <details> string, you may have to take an action on faulty hardware or use the Advanced Management Module to correct the system configuration. A firmware or OS upgrade might be needed to resolve compatibility issues. If the problem persists, contact your IBM service representative.

Code: E3407
Message: Load failed
Description: Load or boot failed to load the requested file from the device. This is an informational message and may be preceded by one or more other error messages.
Action: Based on the preceding error messages, you may have to take an action on faulty hardware or use the Advanced Management Module to correct the system configuration.

Code: E3408
Message: Failed to claim memory for the executable
Description: An attempt to load an executable file from the boot device failed due to insufficient memory or a firmware problem.
Action: Verify that the loaded file was indeed the right executable intended to boot this system. If not, use the Advanced Management Module to correct the system configuration. Otherwise, contact your IBM service representative. You may need to add more memory to the system or to perform a firmware upgrade.

Code: E3409
Message: Unknown FORTH Word
Description: Internal code error, or compatibility issue.
Action: Contact your IBM service representative. You may need to perform a firmware upgrade.

Code: E3410
Message: Boot list successfully read from VPD but no useful information received.
Description: The firmware found a valid VPD but was not able to find bootable code on any of the devices listed in it.
Action: Use the Advanced Management Module Web interface to specify at least one device that contains bootable code. From the Advanced Management Module Web interface, choose Blade Tasks > Configuration > Boot Sequence.

Code: W3411
Message: Client application returned.
Description: The loaded OS or standalone application returned to the firmware. This may be a normal condition, or the firmware could not detect any error issued by the client application. Booting from the boot-device list is interrupted at this stage and no further attempts to boot from devices in the list are made.
Action: None needed. If the boot loader (for example, yaboot) exited because it needs to boot from a different device in the list, either boot manually from the firmware (ok) prompt or use the Advanced Management Module to change the boot device order in the system configuration.

Code: E3420
Message: Boot list could not be read from VPD.
Description: The firmware found an invalid VPD. Possibly it has been corrupted by the system software.
Action: The VPD must be rewritten. Use the Advanced Management Module Web interface to specify at least one device that contains bootable code. From the Advanced Management Module Web interface, choose Blade Tasks > Configuration > Boot Sequence. If the problem persists, contact your IBM service representative.
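For error E3403 in the table above, it can help to inspect the boot file on a workstation before changing the boot configuration. The following is a minimal sketch that reads only the standard ELF header fields (magic number, data encoding, file type, and machine). It does not reproduce the checks that the system firmware performs, which are not documented here; the function name `describe_elf` is hypothetical.

```python
import struct

# Values from the generic ELF specification.
ELF_MAGIC = b"\x7fELF"
ET_EXEC = 2          # e_type: executable file
EM_PPC = 20          # e_machine: 32-bit PowerPC
EM_PPC64 = 21        # e_machine: 64-bit PowerPC


def describe_elf(header):
    """Describe the first bytes of a file in terms of ELF/PowerPC fields."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return "not an ELF file"

    # e_ident[EI_DATA]: 1 = little-endian, 2 = big-endian.
    ei_data = header[5]
    if ei_data == 1:
        order = "<"
    elif ei_data == 2:
        order = ">"
    else:
        return "ELF file with an invalid data encoding"

    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    if e_machine not in (EM_PPC, EM_PPC64):
        return "ELF file for machine %d, not PowerPC" % e_machine
    if e_type != ET_EXEC:
        return "PowerPC ELF file of type %d, not an executable" % e_type
    return "PowerPC executable ELF file"


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        print(describe_elf(handle.read(64)))
```

A file that passes this check can still be rejected by the firmware for other reasons, so the result is a first filter rather than a guarantee.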
System firmware update errors
The following table describes system firmware errors that can occur if there have been problems after an update.
Table 10. System firmware boot errors

Code: E4000
Message: (RTAS Flash) unknown flash chip version
Description: The flash update code does not support the onboard boot ROM flash chip.
Action: Contact your IBM service representative as the system board may need replacing.

Code: E4010
Message: Platform check failed for image
Description: The firmware image does not match the hardware platform.
Action: Check the firmware image and ensure you have the right image for the BladeCenter QS21. See “Using the SMS utility program” on page 11. If the image is incorrect, download and install the correct image from http://www.ibm.com/support/us/en/. See “Updating the system and BMC firmware” on page 15 for further information.

Code: E4020
Message: (RTAS flash) image corrupted (CRC)
Description: The image for a system firmware update is corrupted.
Action: Download the image again and reapply the update. If this does not resolve the problem, apply an image from a different source.
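Before reapplying an update after E4020, two downloaded copies of the image can be compared on a workstation to see whether the download itself is the source of the corruption. The sketch below uses the common CRC-32 from Python's `zlib` module purely as a file fingerprint. This is an assumption for illustration: the checksum algorithm that the flash update code uses is not stated in this guide, so a matching CRC-32 here does not prove that the firmware will accept the image. The function names are hypothetical.

```python
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC-32 of a file, read in chunks to limit memory use."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def same_image(first_path, second_path):
    """True if both files have the same CRC-32 fingerprint."""
    return crc32_of_file(first_path) == crc32_of_file(second_path)


if __name__ == "__main__":
    import sys
    print(same_image(sys.argv[1], sys.argv[2]))
```

If two downloads from the same source differ, the transfer is suspect; if they match and the update still fails, trying an image from a different source, as described above, is the next step.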
Memory initialization errors
The following table describes the memory initialization errors that can occur during boot.
Table 11. Memory initialization errors

Code: E1006
Message: Memory Incomplete.
Description: Not all the XDR system memory could be initialized.
Action: The blade server can still boot but with reduced system memory. Power down then reboot the blade. If this does not resolve the problem, contact your IBM service representative as the system board may need replacing.

Code: E1100
Message: System memory init failure. Boot abort.
Description: The system XDR memory could not be initialized. The boot process has aborted.
Action: Power down then reboot the blade. If this does not resolve the problem, contact your IBM service representative as the system board may need replacing.

Code: E1110
Message: System memory test failure. Boot abort.
Description: An error has occurred while testing the XDR memory.
Action: Power down then reboot the blade. If this does not resolve the problem, contact your IBM service representative as the system board may need replacing.

Code: E1200
Message: System memory init failure during second-pass calibration. CPU halted.
Description: Since the first-pass calibration succeeded, either the CPU or the system XDR memory could have a defective contact.
Action: Power down then reboot the blade. If this does not resolve the problem, contact your IBM service representative as the system board may need replacing.

Code: E1210
Message: Memory controller failed. CPU halted.
Description: The built-in memory controller of the CPU encountered an unexpected error.
Action: Power down then reboot the blade. If this does not resolve the problem, contact your IBM service representative as the system board may need replacing.

Code: W1250
Message: Timing Calibration failed: BE... YRAC... DQ... pin...
Note: <xx...> indicates the number of the pin where the error has occurred.
Description: This warning message accompanies a later memory initialization error message and lists the pin number for help in locating the cause of the error.
Action: See the accompanying memory initialization error message for further information.
USB errors
The following table describes USB errors. These may occur when booting from a bootable CD or DVD.
Table 12. System firmware boot errors

Code: E5000
Message: (USB) Media or drive not ready for this blade.
Description: The media tray is not accessible for boot.
Action: Verify that the media tray is assigned to the blade and that the media is configured correctly. If this does not resolve the problem, check the other blade servers. If they cannot access the media, there may be a problem with the BladeCenter unit. See the documentation that comes with the BladeCenter unit for further information.

Code: E5010
Message: (USB) No Media Found! Please check for the drawer/inserted media.
Description: Media is not inserted or the drawer of the media tray is open.
Action: Ensure that there is a bootable CD or DVD in the tray and that the drawer is closed.

Code: E5020
Message: (USB) Unknown media format.
Description: The media is not recognized by the firmware.
Action: Insert a suitable bootable CD.

Code: E5030
Message: (USB) Device communication error
Description: Firmware cannot communicate with the BladeCenter USB devices.
Action: This could be a firmware or physical hardware problem. Check:
- The Advanced Management Module for messages.
- The system firmware image is not corrupt. See “System firmware update errors” on page 74 for more information about possible errors and their solution.
- Other blade servers within the BladeCenter unit to see if they have the problem. If they do, the BladeCenter unit itself may be the cause of the problem. See the Problem Determination and Service Guide for your BladeCenter unit for more information.
Finally, power down then reboot the blade. If this does not help resolve the problem, contact your IBM service representative.

Code: E5040
Message: (USB) Device transaction error. <command>
Note: <command> indicates the command in progress when the transaction error occurred. This information may not always be available.
Description: The drive showed an error during data transfer.
Action:
1. Verify that the media tray is assigned to the blade server.
   Note: A reboot of the blade server to which the media tray was previously assigned is required.
2. Check that the correct media is inserted and that the drawer is closed.
3. Inspect the media to see if there is visible damage.
4. Use another CD or DVD drive to check that the media is readable.
5. Check with other blade servers within the BladeCenter unit to see if they have the problem. If they do, the BladeCenter unit itself may be the cause of the problem. See the Problem Determination and Service Guide for your BladeCenter unit for more information.
Network boot errors
The following table describes the network boot errors.
Table 13. Network boot errors

Code: E3000
Message: (net) Could not read MAC address.
Description: The firmware could not establish a communication socket for booting over the network due to an error while retrieving the MAC address of the network device.
Action: Power down then reboot the blade server. If this does not resolve the problem, contact your IBM service representative.

Code: E3001
Message: (net) Could not get IP address.
Description: The DHCP server is not responding, or there could be a MAC address conflict in your network.
Action:
- Check that your DHCP server is available.
- Check that an IP address has been correctly assigned.
- Check that your MAC address is valid and is unique across your network.

Code: E3002
Message: (net) ARP request to TFTP server (x.x.x.x) failed.
Note: (x.x.x.x) represents the address of the TFTP server.
Description: The MAC address resolution failed for the TFTP server with IP address (x.x.x.x).
Action:
1. Check that the TFTP server is available and can be reached over the network.
2. Check that the DHCP server is correctly assigning IP addresses.

Code: E3003
Message: (net) unknown TFTP error.
Description: The TFTP server encountered an error but is not able to determine its cause.
Action: Power down then reboot the blade server. If this does not help resolve the problem, contact your IBM service representative.

Code: E3004
Message: (net) TFTP buffer to small for <filename>
Note: <filename> is the name of the file TFTP has attempted to buffer.
Description: The requested file is too big.
Action: Try to load a smaller file. If this succeeds, check your DHCP server configuration.

Code: E3005
Message: (net) ICMP ERROR: <error message>
Description: The TFTP server cannot be reached.
Action: Check that the TFTP server is available and correctly configured.

Code: E3006
Message: (net) Could not initialize network device
Description: The network device could not be activated.
Action: Check that you have connected all network cables and that you have enabled the BladeCenter I/O module.

Code: E3008
Message: (net) Can’t obtain TFTP server IP address
Description: The DHCP server has not delivered the IP address of the TFTP server.
Action: Check your DHCP server configuration.

Code: E3009
Message: (net) file not found: <filename>
Description: The requested file was not found on the TFTP server.
Action: Check your DHCP server configuration and make sure that you are using the proper TFTP server and the right file name.

Code: E3010
Message: (net) TFTP access violation
Description: The TFTP server reported a file access violation.
Action: Check the file name and the permissions of the file that should be downloaded.

Code: E3011
Message: (net) illegal TFTP operation
Description: The TFTP server is not able to handle the request.
Action: There may be too many UDP ports open on the TFTP server. Reboot the TFTP server and retry the transfer.

Code: E3012
Message: (net) unknown TFTP transfer ID
Description: The TFTP server could not assign the data to a UDP packet based on its transfer ID. The transfer ID for this connection may be in use by another client.
Action: Reboot and retry the transfer. If the problem persists, check the configuration of the UDP ports on your TFTP server.

Code: E3013
Message: (net) no such TFTP user
Description: The TFTP server reported an unknown user.
Action: Change the TFTP server configuration to grant anonymous user access.

Code: E3014
Message: (net) TFTP error occurred after <No> bad packets received
Description: The TFTP client received too many bad packets.
Action: Reboot and retry the transfer. If the error persists, this could indicate problems with the network. Check all network connections and cables.

Code: E3015
Message: (net) TFTP error occurred after missing <No> responses
Description: The TFTP client has missed too many packets.
Action: Reboot and retry the transfer. If the error persists, this could indicate problems with the network. Check all network connections and cables.

Code: E3016
Message: (net) TFTP error missing block <No>, expected block was <No>
Description: The TFTP client received a packet that is out of order.
Action: Reboot and retry the transfer.

Code: E3017
Message: (net) TFTP block size negotiation failed
Description: The TFTP server has sent an acknowledgement to the client without block size information for subsequent TFTP network traffic.
Action: The TFTP server may not be working properly. Change the TFTP server configuration to allow block size negotiation. Reboot the blade, the TFTP server, or both, and try again.

Code: E3018
Message: (net) file exceeds maximum TFTP transfer size
Description: The requested file is too big to transfer via TFTP.
Action: Change the TFTP server configuration to increase the block size to a maximum value of 1432 bytes.

Note: Be aware that your BladeCenter QS21 has two Ethernet controllers and can be connected to two Ethernet switches. Because the blade server performs a network boot from the controller that acquires an IP address first, make sure that your Linux configuration supports this. If your Linux environment requires a static IP address for a particular Ethernet port, you must set up your DHCP environment accordingly.
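Errors E3017 and E3018 in the table above both relate to the TFTP block size. The sketch below illustrates two generic facts from the TFTP protocol rather than anything specific to the QS21 firmware: a read request can carry a `blksize` option that the server is expected to acknowledge, and the block number in a data packet is a 16-bit field, so a larger block size raises the largest file that can be sent before the block number would have to wrap. Whether the blade firmware tolerates block number wraparound is not stated in this guide, so the size limit computed here is an assumption. The function names are hypothetical.

```python
import struct

OPCODE_RRQ = 1            # TFTP read request
MAX_BLOCK_NUMBER = 0xFFFF  # block numbers are 16 bits wide


def build_read_request(filename, blksize=None):
    """Build a TFTP read request, optionally asking for a block size."""
    packet = struct.pack("!H", OPCODE_RRQ)
    packet += filename.encode("ascii") + b"\x00"
    packet += b"octet\x00"  # binary transfer mode
    if blksize is not None:
        # Option name and value are both NUL-terminated ASCII strings.
        packet += b"blksize\x00" + str(blksize).encode("ascii") + b"\x00"
    return packet


def max_transfer_bytes(blksize=512):
    """Largest file that fits without block number wraparound.

    The final block must be shorter than blksize to end the transfer,
    hence the subtraction of one byte.
    """
    return MAX_BLOCK_NUMBER * blksize - 1


if __name__ == "__main__":
    print(build_read_request("vmlinux", 1432))
    print(max_transfer_bytes(512), max_transfer_bytes(1432))
```

Under these assumptions, raising the block size from the protocol default of 512 bytes to 1432 bytes increases the limit from roughly 32 MB to roughly 89 MB, which is why the action for E3018 points at the server's block size setting.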
SAS boot errors
These error messages only appear if you have installed the optional SAS daughter card.
Table 14. SAS boot errors
Code Message Description Action
E4303 LSISAS1064 controller initialization failed.
The blade server firmware was not able to initialize the controller. This could indicate a hardware, blade server firmware, or SAS expansion card firmware problem.
Try the following steps, in order, to fix the problem:
1. Reboot the blade.
2. Power down the blade server, then remove and reinstall it in the BladeCenter unit.
3. Remove and reinstall the SAS expansion card.
4. Ensure that the SAS expansion card firmware and the blade server firmware are at the correct levels.
5. If the error started after a SAS expansion card firmware upgrade or a blade server firmware upgrade, consider rolling back to the previous firmware versions. Check the documentation at http://www.ibm.com/systems/bladecenter/support/ to see whether a rollback is possible.
6. Plug the SAS expansion card into another blade server. If the problem persists, the SAS expansion card may need replacement.
7. Plug a different SAS expansion card into the blade server. If the problem persists, the blade server may need replacement.
E4304 LSISAS1064 controller operation failed.
The blade server firmware was not able to bring the controller to an operational state. This could indicate a hardware, blade server firmware, or SAS expansion card firmware problem.
Try the following steps, in order, to fix the problem:
1. Reboot the blade.
2. Power down the blade server, then remove and reinstall it in the BladeCenter unit.
3. Remove and reinstall the SAS expansion card.
4. Ensure that the SAS expansion card firmware and the blade server firmware are at the correct levels.
5. If the error started after a SAS expansion card firmware upgrade or a blade server firmware upgrade, consider rolling back to the previous firmware versions. Check the documentation at http://www.ibm.com/systems/bladecenter/support/ to see whether a rollback is possible.
6. Plug the SAS expansion card into another blade server. If the problem persists, the SAS expansion card may need replacement.
7. Plug a different SAS expansion card into the blade server. If the problem persists, the blade server may need replacement.
E4305 LSISAS1064 port failed.
The blade server firmware could not enable the SAS port. This could indicate a hardware, blade server firmware, or SAS expansion card firmware problem.
Try the following steps, in order, to fix the problem:
1. Reboot the blade.
2. Power down the blade server, then remove and reinstall it in the BladeCenter unit.
3. Remove and reinstall the SAS expansion card.
4. Ensure that the SAS expansion card firmware and the blade server firmware are at the correct levels.
5. If the error started after a SAS expansion card firmware upgrade or a blade server firmware upgrade, consider rolling back to the previous firmware versions. Check the documentation at http://www.ibm.com/systems/bladecenter/support/ to see whether a rollback is possible.
6. Plug the SAS expansion card into another blade server. If the problem persists, the SAS expansion card may need replacement.
7. Plug a different SAS expansion card into the blade server. If the problem persists, the blade server may need replacement.
E4307 LSISAS1064 network topology read failed.
The blade server firmware was not able to discover the SAS topology. This could indicate a hardware, blade server firmware, or SAS expansion card firmware problem.
Try the following steps, in order, to fix the problem:
1. Reboot the blade.
2. Power down the blade server, then remove and reinstall it in the BladeCenter unit.
3. Remove and reinstall the SAS expansion card.
4. Ensure that the SAS expansion card firmware and the blade server firmware are at the correct levels.
5. If the error started after a SAS expansion card firmware upgrade or a blade server firmware upgrade, consider rolling back to the previous firmware versions. Check the documentation at http://www.ibm.com/systems/bladecenter/support/ to see whether a rollback is possible.
6. Plug the SAS expansion card into another blade server. If the problem persists, the SAS expansion card may need replacement.
7. Plug a different SAS expansion card into the blade server. If the problem persists, the blade server may need replacement.
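Steps 6 and 7 of the SAS boot error actions form a swap test: move the suspect card to a known-good blade server, then try a different card in the original blade server. The following Python sketch records that reasoning as a small function. The function and parameter names are hypothetical and are not part of any IBM tool; the sketch only restates the two conditions given in the steps.

```python
from typing import List


def swap_test_suspects(
    card_fails_in_other_blade: bool,
    other_card_fails_in_this_blade: bool,
) -> List[str]:
    """Return the components that the swap test points to.

    card_fails_in_other_blade: result of step 6 (the original card
        still shows the error in another blade server).
    other_card_fails_in_this_blade: result of step 7 (a different card
        still shows the error in the original blade server).
    """
    suspects = []
    if card_fails_in_other_blade:
        suspects.append("SAS expansion card")
    if other_card_fails_in_this_blade:
        suspects.append("blade server")
    # An empty list means the swap test did not isolate a component.
    return suspects


if __name__ == "__main__":
    print(swap_test_suspects(True, False))
```

An empty result means that neither swap reproduced the error, in which case the earlier steps (reseating and firmware levels) are worth repeating before any part is replaced.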