IBM QS22, BladeCenter QS22 Type 0793 Service Manual

BladeCenter QS22 Type 0793
Problem Determination and Service Guide
Note
Before using this information and the product it supports, read the general information in Appendix C, “Notices,” on page 127 and the Warranty and Support Information on the Documentation CD.
© Copyright International Business Machines Corporation 2006, 2008.
US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Safety
Chapter 1. Introduction
  Related documentation
  Notices and statements used in this document
  Features and specifications
  Support for storage
  Turning on the blade server
  Turning off the blade server
  Blade server controls and LEDs
  System board LEDs
  System board internal and expansion card connectors
Chapter 2. Configuring the blade server
  Communicating with the blade server
  Using the Advanced Management Module
  Using the Web interface
  Using the command-line interface
  Using Serial over LAN
  Using the serial interface
  Using the SMS utility program
  Starting SMS
  Viewing FRU information
  Adding FRU information
  Updating the system and BMC firmware
  Updating steps
  Determining current blade server firmware levels
  Updating the BMC firmware
  Using the BMC update package
  Using the Advanced Management Module
  Installing the system firmware
  The firmware update package
  Using the package
  Updating the system firmware automatically
  Installing the firmware manually
  Updating the system firmware images
  Updating the optional expansion card firmware
  Integrating the Gigabit Ethernet controller into the BladeCenter
  Updating the Ethernet controller firmware
  Using the update package
  Firmware update steps
  Blade server Ethernet controller enumeration
Chapter 3. Parts listing
  Replaceable components
  Consumable parts
Chapter 4. Installing and removing replaceable units
  Installation guidelines
  System reliability guidelines
  Handling static-sensitive devices
  Removing the blade server from the BladeCenter unit
  Opening and removing the blade server cover
  Removing the BladeCenter PCI Express I/O Expansion Unit
  Removing the blade-server front bezel assembly
  Installing the optional modular flash drive
  Removing the optional modular flash drive
  Installing an optional high-speed expansion card
  Removing an optional high-speed expansion card
  Adding or changing system memory
  Adding or changing I/O buffer DDR2 memory modules
  Replacing DIMM fillers
  Installing the optional SAS expansion card
  Installing the BladeCenter PCI Express I/O Expansion Unit
  Replacing the system board base and planar
  Replacing the battery
  Replacing the retention clip for the modular flash drive
  Using the miscellaneous parts kit
  Replacing the ball studs
  Finishing the installation
  Installing the front bezel assembly
  Closing the blade server cover
  Input/output connectors and devices
Chapter 5. Diagnostics and troubleshooting
  Prerequisites
  Basic checks
  Finding troubleshooting information
  Troubleshooting charts
  Problems indicated by the front panel LEDs
  Problems indicated by the system board LEDs
  Power problems
  Power throttling
  Network connection problems
  Service processor problems
  Software problems
  Recovering the system firmware code
  Checking the boot image
  Booting from the TEMP image
  Recovering the TEMP image from the PERM image
  Supported boot media
  Booting the system
  Diagnostic programs and messages
  Running diagnostics and preboot DSA
  Diagnostic text messages
  Viewing the test log
  DSA error messages
  CPU test results
  BMC test results
  Memory tests
  System firmware startup messages
  Checkpoints
  Boot errors and handling
  Boot list
  System firmware update errors
  System memory errors
  USB errors
  Network boot errors
  SAS boot errors
  I/O DIMM boot-time errors
  Other error messages
  BMC firmware messages
  NMI error messages
  Problem reporting
  Problem description
  Solving undetermined problems
  Calling IBM for service
Appendix A. Using the SMS utility
  Starting the SMS utility
  The SMS utility menu
  Select Language
  Setup Remote IPL (Initial Program Load)
  IP Parameters
  Adapter Configuration
  Ping Test
  Advanced Setup: DHCP
  Change SCSI Settings
  Select Console
  Select Boot Options
  Firmware Boot Side Options
  Progress Indicator History
  FRU information
  Adding FRU information
  SAS Settings
Appendix B. Getting help and technical assistance
  Before you call
  Using the documentation
  Getting help and information from the World Wide Web
  Software service and support
  Hardware service and support
Appendix C. Notices
  Trademarks
  Important notes
  Product recycling and disposal
  Battery return program
  Electronic emission notices
  Federal Communications Commission (FCC) statement
  Industry Canada Class A emission compliance statement
  Avis de conformité à la réglementation d’Industrie Canada
  Australia and New Zealand Class A statement
  United Kingdom telecommunications safety requirement
  Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A EU-Richtlinie zur Elektromagnetischen Verträglichkeit
  Deutschland: Einhaltung des Gesetzes über die elektromagnetische Verträglichkeit von Geräten
  Zulassungsbescheinigung laut dem Deutschen Gesetz über die elektromagnetische Verträglichkeit von Geräten (EMVG) (bzw. der EMC EG Richtlinie 2004/108/EG) für Geräte der Klasse A
  European Union EMC Directive conformance statement
  Taiwanese Class A warning statement
  Japanese Voluntary Control Council for Interference (VCCI) statement
  Korean Class A warning statement
Index
Safety
Before installing this product, read the Safety Information.
Antes de instalar este produto, leia as Informações de Segurança.
Před instalací tohoto produktu si přečtěte příručku bezpečnostních instrukcí.
Læs sikkerhedsforskrifterne, før du installerer dette produkt.
Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.
Avant d’installer ce produit, lisez les consignes de sécurité.
Vor der Installation dieses Produkts die Sicherheitshinweise lesen.
Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.
Antes de instalar este produto, leia as Informações sobre Segurança.
Antes de instalar este producto, lea la información de seguridad.
Läs säkerhetsinformationen innan du installerar den här produkten.
Guidelines for trained service technicians:
This section contains information for trained service technicians.
Inspecting for unsafe conditions:
Use the information in this section to help you identify potential unsafe conditions in an IBM product that you are working on. Each IBM product, as it was designed and manufactured, has required safety items to protect users and service technicians from injury. The information in this section addresses only those items. Use good judgment to identify potential unsafe conditions that might be caused by non-IBM alterations or attachment of non-IBM features or options that are not addressed in this section. If you identify an unsafe condition, you must determine how serious the hazard is and whether you must correct the problem before you work on the product.
Consider the following conditions and the safety hazards that they present:
- Electrical hazards, especially primary power.
- Primary voltage on the frame can cause serious or fatal electrical shock.
- Explosive hazards, such as a damaged CRT face or a bulging capacitor.
- Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cord is disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and observe any sharp edges.
3. Check the power cord:
  - Make sure that the third-wire ground connector is in good condition. Use a meter to measure third-wire ground continuity for 0.1 ohm or less between the external ground pin and the frame ground.
  - Make sure that the power cord is the correct type, as specified in the documentation for your BladeCenter unit type.
  - Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety of any non-IBM alterations.
6. Check inside the blade server for any obvious unsafe conditions, such as metal filings, contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not been removed or tampered with.
Guidelines for servicing electrical equipment:
Observe the following guidelines when servicing electrical equipment:
- Check the area for electrical hazards such as moist floors, nongrounded power extension cords, and missing safety grounds.
- Use only approved tools and test equipment. Some hand tools have handles that are covered with a soft material that does not provide insulation from live electrical current.
- Regularly inspect and maintain your electrical hand tools for safe operational condition. Do not use worn or broken tools or testers.
- Do not touch the reflective surface of a dental mirror to a live electrical circuit. The surface is conductive and can cause personal injury or equipment damage if it touches a live electrical circuit.
- Some rubber floor mats contain small conductive fibers to decrease electrostatic discharge. Do not use this type of mat to protect yourself from electrical shock.
- Do not work alone under hazardous conditions or near equipment that has hazardous voltages.
- Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical outlet so that you can turn off the power quickly in the event of an electrical accident.
- Disconnect all power before you perform a mechanical inspection, work near power supplies, or remove or install main units.
- Before you work on the equipment, disconnect the power cord. If you cannot disconnect the power cord, have the customer power-off the wall box that supplies power to the equipment and lock the wall box in the off position.
- Never assume that power has been disconnected from a circuit. Check it to make sure that it has been disconnected.
- If you have to work on equipment that has exposed electrical circuits, observe the following precautions:
  - Make sure that another person who is familiar with the power-off controls is near you and is available to turn off the power if necessary.
  - When you are working with powered-on electrical equipment, use only one hand. Keep the other hand in your pocket or behind your back to avoid creating a complete circuit that could cause an electrical shock.
  - When using a tester, set the controls correctly and use the approved probe leads and accessories for that tester.
  - Stand on a suitable rubber mat to insulate you from grounds such as metal floor strips and equipment frames.
- Use extreme care when measuring high voltages.
- To ensure proper grounding of components such as power supplies, pumps, blowers, fans, and motor generators, do not service these components outside of their normal operating locations.
- If an electrical accident occurs, use caution, turn off the power, and send another person to get medical aid.
Important: All caution and danger statements in this documentation begin with a number. This number is used to cross reference an English caution or danger statement with translated versions of the caution or danger statement in the IBM Safety Information book.
For example, if a caution statement begins with a number 1, translations for that caution statement appear in the IBM Safety Information book under statement 1.
Be sure to read all caution and danger statements in this documentation before performing the instructions. Read any additional safety information that comes with the blade server or optional device before you install the device.
Statement 1:
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
- Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this product during an electrical storm.
- Connect all power cords to a properly wired and grounded electrical outlet.
- Connect to properly wired outlets any equipment that will be attached to this product.
- When possible, use one hand only to connect or disconnect signal cables.
- Never turn on any equipment when there is evidence of fire, water, or structural damage.
- Disconnect the attached power cords, telecommunications systems, networks, and modems before you open the device covers, unless instructed otherwise in the installation and configuration procedures.
- Connect and disconnect cables as described in the following table when installing, moving, or opening covers on this product or attached devices.
To Connect:                               To Disconnect:
1. Turn everything OFF.                   1. Turn everything OFF.
2. First, attach all cables to devices.   2. First, remove power cords from outlet.
3. Attach signal cables to connectors.    3. Remove signal cables from connectors.
4. Attach power cords to outlet.          4. Remove all cables from devices.
5. Turn device ON.
Statement 2:
CAUTION: When replacing the lithium battery, use only IBM Part Number 43W9859 or 03N2449 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
- Throw or immerse into water
- Heat to more than 100°C (212°F)
- Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3:
CAUTION: When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters) are installed, note the following:
- Do not remove the covers. Removing the covers of the laser product could result in exposure to hazardous laser radiation. There are no serviceable parts inside the device.
- Use of controls or adjustments or performance of procedures other than those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam.
Class 1 Laser Product Laser Klasse 1 Laser Klass 1 Luokan 1 Laserlaite Appareil A Laser de Classe 1
Statement 4:
18 kg (39.7 lb) 32 kg (70.5 lb) 55 kg (121.2 lb)
CAUTION: Use safe practices when lifting.
Statement 5:
CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
Statement 8:
CAUTION: Never remove the cover on a power supply or any part that has the following label attached.
Hazardous voltage, current, and energy levels are present inside any component that has this label attached. There are no serviceable parts inside these components. If you suspect a problem with one of these parts, contact a service technician.
Statement 13:
DANGER
Overloading a branch circuit is potentially a fire hazard and a shock hazard under certain conditions. To avoid these hazards, ensure that your system electrical requirements do not exceed branch circuit protection requirements. Refer to the information that is provided with your device for electrical specifications.
Statement 21:
CAUTION: Hazardous energy is present when the blade is connected to the power source. Always replace the blade cover before installing the blade.
WARNING: Handling the cord on this product or cords associated with accessories sold with this product, will expose you to lead, a chemical known to the State of California to cause cancer, and birth defects or other reproductive harm. Wash hands after handling.
ADVERTENCIA: El contacto con el cable de este producto o con cables de accesorios que se venden junto con este producto, pueden exponerle al plomo, un elemento químico que en el estado de California de los Estados Unidos está considerado como un causante de cáncer y de defectos congénitos, además de otros riesgos reproductivos. Lávese las manos después de usar el producto.
Chapter 1. Introduction
This Problem Determination and Service Guide contains information to help you solve problems that might occur when installing and using your IBM® BladeCenter®. It describes the diagnostic tools that come with the BladeCenter QS22, error codes and suggested actions. It also describes how to replace failing components.
Replaceable components are of three types:
- Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
- Tier 2 CRU: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
- Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
For information about the terms of the warranty and getting service and assistance, see Warranty and Support Information.
Note: The illustrations in this document might differ slightly from your hardware.
Related documentation
In addition to this document, the following documentation also comes with the server:
- Installation and User’s Guide
  This document is available in Portable Document Format (PDF) on the Documentation CD. It contains general information about the blade server, including how to install supported options and how to configure the blade server.
- Safety Information
  This document is on the Documentation CD. It contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
- Warranty and Support Information
  This document is in PDF on the Documentation CD. It contains information about the terms of the warranty and about service and assistance.
Depending on the server model, additional documentation might be included on the Documentation CD.
The blade server may have features that are not described in the documentation that comes with the server. The documentation might be updated occasionally to include information about those features, or technical updates might be available to provide additional information that is not included in the blade server documentation. The most recent versions of all BladeCenter documentation are at http://www.ibm.com/support/us/en/.
In addition to the documentation in this library, be sure to review the planning and installation documents for your BladeCenter hardware available at http://www.ibm.com/support/us/en/.
The IBM Software Development Kit for Multicore Acceleration documentation can be downloaded from http://www.ibm.com/developerworks/power/cell/. This contains information about how to install the operating system and how to program applications for the blade server.
Updates may be available for this and other BladeCenter documents. You can check for the most recent versions at http://www.ibm.com/support/us/en/ or on the BladeCenter Information center at http://publib.boulder.ibm.com/infocenter/systems/.
Notices and statements used in this document
The caution and danger statements that appear in this document are also in the multilingual Safety Information document, which is on the Documentation CD. Each statement is numbered for reference to the corresponding statement in the Safety Information document.
The following notices and statements are used in this document:
- Notes: These notices provide important tips, guidance, or advice.
- Important: These notices provide information or advice that might help you avoid inconvenient or problem situations.
- Attention: These notices indicate potential damage to programs, devices, or data. An attention notice is placed just before the instruction or situation in which damage could occur.
- Caution: These statements indicate situations that can be potentially hazardous to you. A caution statement is placed just before the description of a potentially hazardous procedure step or situation.
- Danger: These statements indicate situations that can be potentially lethal or extremely hazardous to you. A danger statement is placed just before the description of a potentially lethal or extremely hazardous procedure step or situation.
Features and specifications
The following table provides a summary of the features and specifications of the BladeCenter QS22.
Through the BladeCenter Advanced Management Module, you can view the blade server firmware code and other hardware configuration information.
The QS22 blade server is supported in the IBM BladeCenter H unit, the IBM BladeCenter HT unit, and the IBM BladeCenter S (non RAID type only) unit.
Providing it is supported by the BladeCenter unit, you can install and operate any other model of blade server in the same BladeCenter unit as a BladeCenter QS22.
Note: Power, cooling, removable-media drives, external ports, and advanced system management are provided by the IBM BladeCenter unit. For more information, see the Planning and Installation Guide for your BladeCenter unit.
Table 1. Blade server features and specifications
Microprocessor:
  Two IBM® PowerXCell™ 8i 64-bit architecture processors w/VMX with 8 Synergistic Processor Units (SPU), 512 KB L2 cache, 256 KB on each Synergistic Processing Engine (SPE)
Memory:
  Minimum 4 GB DDR2 memory. 2 GB per IBM PowerXCell 8i processor. Maximum system memory 32 GB.
  Supports 1 GB and 4 GB DDR2-800 VLP DIMMs. 2 GB DIMM support depends on firmware level.
Integrated functions:
- One dual-port 1 Gigabit Ethernet controller
- Local service processor
- 2 IBM PowerXCell 8i companion chips each providing a PCIe and a single PCI-X interface
- RS-485 interface for communication with BladeCenter Management Module
- USB Controller
Supported options:
- Serial attached SCSI (SAS) expansion card
- High-Speed InfiniBand Card, IB-4x
- BladeCenter PCI Express I/O Expansion Unit
- System memory 1 GB DDR2-800 VLP DIMM, total 4 GB per IBM PowerXCell 8i processor
- System memory 2 GB DDR2-800 VLP DIMM, total 8 GB per IBM PowerXCell 8i processor
- System memory 4 GB DDR2-800 VLP DIMM, total 16 GB per IBM PowerXCell 8i processor
- I/O Buffer DIMM VLP DDR2 1 GB, total 1 GB per IBM PowerXCell 8i companion chip
- 8 GB IBM Modular Solid State Disk
- 8 GB Modular Flash Drive
Environment:
- Ambient temperature (operating): 10°C to 35°C (50°F to 95°F). Altitude: 0 to 2133 m (0 to 7000 ft)
- Humidity (operating): 8% to 80%
Size:
- Height: 24.5 cm (9.7 inches)
- Depth: 44.6 cm (17.6 inches)
- Width: 2.9 cm (1.14 inches)
- Maximum weight: 5 kg (13.2 lb)
Electrical input:
- Power provided to the blade server by the BladeCenter unit: 12 V dc
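The memory figures in Table 1 can be cross-checked with a little arithmetic. The following Python sketch is illustrative only. It assumes four system-memory DIMM positions per processor, a value inferred from the per-processor totals in the table (1 GB DIMMs giving 4 GB, 4 GB DIMMs giving 16 GB) rather than stated directly, and the function and constant names are hypothetical.

```python
# Assumption: 4 system-memory DIMM positions per processor, inferred from
# Table 1 (1 GB DIMMs -> 4 GB per processor, 4 GB DIMMs -> 16 GB per processor).
DIMM_POSITIONS_PER_PROCESSOR = 4
PROCESSORS = 2                       # two PowerXCell 8i processors per blade
SUPPORTED_DIMM_SIZES_GB = (1, 2, 4)  # 2 GB support depends on firmware level


def total_system_memory_gb(dimm_size_gb, dimms_per_processor=DIMM_POSITIONS_PER_PROCESSOR):
    """Return total system memory in GB for identically populated processors."""
    if dimm_size_gb not in SUPPORTED_DIMM_SIZES_GB:
        raise ValueError(f"unsupported DIMM size: {dimm_size_gb} GB")
    if not 1 <= dimms_per_processor <= DIMM_POSITIONS_PER_PROCESSOR:
        raise ValueError("dimms_per_processor must be between 1 and 4")
    return dimm_size_gb * dimms_per_processor * PROCESSORS


if __name__ == "__main__":
    for size in SUPPORTED_DIMM_SIZES_GB:
        print(f"{size} GB DIMMs, fully populated: {total_system_memory_gb(size)} GB")
```

Under that assumption, fully populating both processors with 4 GB DIMMs gives the 32 GB maximum listed in the table, and two 1 GB DIMMs per processor give the 4 GB minimum.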
Support for storage
The BladeCenter provides two options for storage:
SAS solution for storage
SAS storage can be available through the following components: a SAS Expansion Card attached to the blade server, one or two SAS connectivity modules in the rear of the BladeCenter unit, and various options to attach the IBM BladeCenter Boot Disk System to the SAS connectivity modules. An optional SAS Expansion Card is available for the BladeCenter QS22.
If your QS22 blade server is installed in an IBM BladeCenter S unit, local SAS drives of the BladeCenter S unit, if present, are also available as storage.
Onboard USB attached modular flash drive
This option provides a modular flash drive for system boot or local storage.
Turning on the blade server
The QS22 blade server is hot-swappable and can be inserted into the BladeCenter unit when the unit is already powered up. However, it can only be powered on by one of the methods described in this section. While the blade server is powering up, the power-on LED on the front of the server is lit. See “Blade server controls and LEDs” on page 5 for the power-on LED states.
After you have installed the blade server into a powered-up BladeCenter unit, wait until the power-on LED on the blade server flashes slowly before turning on the blade server.
You can turn on the blade server in any of the following ways:
Using the power-control button
Provided that local power control is enabled, you can press the power-control button (see Figure 1), which is behind the control-panel door on the front of the blade server. Local power control is enabled or disabled through the Advanced Management Module Web interface.
Figure 1. Blade server power-control button
Using the BladeCenter Advanced Management Module
You can use the Advanced Management Module Web interface to turn on the blade server remotely.
Using the Wake on LAN® feature
If you want to use the Wake on LAN feature, you must enable it through the operating system. Note that Wake on LAN does not operate if it has been disabled through the Advanced Management Module.
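For reference, the following Python sketch shows how a management station might build and send a Wake on LAN request. This guide does not describe the packet format; the sketch assumes the widely used "magic packet" layout (six 0xFF bytes followed by the target MAC address repeated 16 times) sent as a UDP broadcast to port 9. The function names, the broadcast address, and the port are illustrative assumptions, not values taken from this guide.

```python
import socket


def build_magic_packet(mac):
    """Build a magic packet: 6 bytes of 0xFF, then the MAC address 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

The packet is ignored unless Wake on LAN has been enabled through the operating system and has not been disabled through the Advanced Management Module.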
In the event of a power failure, the BladeCenter unit and then the blade server can start automatically when power is restored. You must configure this through the BladeCenter Advanced Management Module. See the BladeCenter Management Module User's Guide for further information about this feature.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the BladeCenter unit and can continue to respond to requests from the service processor, including remote requests to turn the blade server on. To remove all power from the blade server, you must physically remove it from the BladeCenter unit or power off the BladeCenter unit.
To avoid loss of data, shut down the Linux® operating system before you turn off the blade server. Shut down the operating system by entering the shutdown -h now command at the command prompt or by choosing shutdown if you are using a graphical user interface (GUI). See your operating system documentation for additional information about shutting down the operating system.
If the BladeCenter unit has not been turned off, the blade server can be turned off in any of the following ways:
Using the power-control button
Press the power-control button behind the control-panel door on the front panel of the blade server. If your operating system supports this feature, this starts an orderly shutdown of the operating system, if it has not been shut down already, and then turns off the blade server. If the operating system stops functioning, pressing and holding the power-control button for more than 4 seconds turns off the blade server.
Using the BladeCenter Advanced Management Module
You can use the Advanced Management Module Web interface to turn off the blade server remotely. You can also configure the Advanced Management Module to turn off the blade server automatically if the system is not operating correctly.
Note: After turning off the blade server, wait at least 5 seconds before turning it on again.
Blade server controls and LEDs
This section describes the controls and LEDs on the front panel of the blade server. For further information about the LEDs and how they can be used to assist in troubleshooting, see “Problems indicated by the front panel LEDs” on page 57.
Figure 2. Power-control button and LEDs (callouts: Location LED, Activity LED, Information LED, Power-on LED, Media-tray select button, Power-control button, Blade-error LED, NMI reset button)
Note: The control panel door which normally covers the LEDs and power-control button is omitted for reasons of clarity.
Activity LED:
This green LED lights when there is network activity.
Location LED:
This blue LED is turned on remotely by the system administrator to assist in locating the blade server. The location LED on the BladeCenter unit lights at the same time.
Information LED:
This amber LED lights to indicate that information about a system event has been placed in the Advanced Management Module Event Log. The information LED remains on until turned off by Advanced Management Module or through IBM Director Console.
Blade error LED:
This amber LED lights when a system error has occurred in the blade server.
Power-control button:
Press this button to turn the blade server on or off. The power-control button only has effect if local power control is enabled for the blade server. Local power control is enabled and disabled through the BladeCenter Advanced Management Module Web interface.
Media tray select button:
This button associates the shared BladeCenter unit media tray (DVD/CD drive and USB ports) with the blade server. The LED on the button flashes while the request is being processed, then lights when the ownership of the media tray has been transferred to the blade server.
Note: It can take approximately 20 seconds for the operating system on the blade server to recognize the media tray.
Power-on LED:
This green LED indicates the power status of the blade server as follows:
• Flashing rapidly - The service processor on the blade server is communicating with the BladeCenter Advanced Management Module.
• Flashing slowly - The blade server has power but is not turned on.
• Lit continuously (steady) - The blade server has power and is turned on.
• Not lit - Either the BladeCenter unit is powered off, or a power failure has occurred on the blade server or the BladeCenter unit.
NMI reset button:
If the operating system has been installed, pressing the button with a paper clip or pin causes the operating system to call the Linux kernel debugger.
The blade error LED, information LED, and location LED can be turned off through the Advanced Management Module Web interface.
System board LEDs
The QS22 blade server has status LEDs on the system board to indicate the health of various components. Some are within the light box while others are in different locations. A lit LED indicates an error condition. Complete information about the LEDs can be found in “Troubleshooting charts” on page 56.
To find out what errors, if any, have occurred on the system board, you must:
1. Remove the blade server from the BladeCenter unit
2. Open the cover
3. Press the light path diagnostics switch
This lights any error LEDs that were turned on during processing. It also lights a green LED to indicate the capacitor is charged and the light path diagnostics system is operating.
Figure 3 shows the location of the light path LEDs and the diagnostics switch.
Figure 3. System-board LEDs (callouts: Modular flash drive error LED, I/O buffer DIMM error LEDs, System memory DIMM error LEDs, Light path diagnosis switch, Light path diagnosis LED, NMI error LED, CPU fail LED, System board LED, Temperature fault LED)
Pressing the light path diagnostics switch lights the appropriate LED to indicate where an error has occurred.
System board internal and expansion card connectors
The following illustration shows the location of the connectors for user-installable options.
Figure 4. Locations of the expansion option connectors on the system board (callouts: PCIe high-speed connector, Reserved, Flash drive connector, PCI-X expansion connectors, IOBUF 1 and IOBUF 2 DIMM slots, DIMM 1 to DIMM 8 slots, Battery, Control panel connector)
Chapter 2. Configuring the blade server
This chapter describes how to:
• Communicate with a blade server.
• Use System Management Services (SMS) to view and update the system firmware revision number. This does not require the operating system to be installed.
• Update the baseboard management controller (BMC) firmware using the update package and the Advanced Management Module.
• Update the system firmware using the command-line utility.
• Configure the Ethernet Gigabit dual-port controller in preparation for a network installation of the operating system.
Note: You can update the BMC firmware through the Advanced Management Module Web interface without booting the operating system. However, to update the BMC firmware using the update package, or to update the system firmware, you must boot the operating system first.
Communicating with the blade server
You do not have to boot the operating system before you can communicate with the QS22 blade server. You can access it through:
Advanced Management Module
The Web-based management and configuration program. This is your main access method to the blade server.
The command-line interface
See “Using the command-line interface” on page 10 for further information.
Serial over LAN (SOL)
This is similar to the serial interface, but allows you to connect to the blade server over the network. See “Using Serial over LAN” on page 10 for further information.
The serial interface
You can connect a PC or compatible terminal directly to the BladeCenter unit. For BladeCenter H and BladeCenter HT, you make this direct connection using a special cable; for BladeCenter S, you use a special module. See “Using the serial interface” on page 11 for further information.
Note: The BladeCenter unit Serial Breakout cable (or module for BladeCenter S) is not supplied with the unit and must be ordered separately.
System Management Services (SMS)
The SMS utility allows you to view and update the VPD, change the boot device and set network parameters. See “Using the SMS utility program” on page 11 for further information.
Using the Advanced Management Module
The Advanced Management Module is the main means of administering the BladeCenter system. Use the Advanced Management Module Web-based management and configuration program to:
• Configure the BladeCenter unit
• Update and configure BladeCenter components including the QS22 blade server
• Monitor the current system status
• Check the event log for system and other errors
Using the Web interface
Complete the following steps to start the Web-based management and configuration program:
1. Open a Web browser. In the address or URL field, type the Internet protocol (IP) address or host name that is assigned for the Management Module remote connection. The default IP address is:
192.168.70.125
The Enter Network Password window opens.
2. Type your user name and password. Before you log in to the Advanced Management Module for the first time, contact your system administrator regarding whether your organization has assigned a user name and password to you. Use the initial (default) user name and password the first time that you log in to the Advanced Management Module. If you have an assigned user name and password, use them for all subsequent logins. All login attempts are documented in the event log.
The initial user ID and password for the Advanced Management Module are:
User ID: USERID (all capital letters)
Password: PASSW0RD (note the number zero, not the letter O, in PASSW0RD)
3. Follow the instructions that appear on the screen. Be sure to set the timeout value that you want for your Web session.
The BladeCenter management and configuration window opens.
For additional information, see the IBM BladeCenter Advanced Management Module User's Guide.
Using the command-line interface
The IBM BladeCenter Advanced Management Module also provides a command-line interface that gives direct access to BladeCenter management functions. You can use this as an alternative to using the BladeCenter Management Module Web interface.
Through the command-line interface, you can issue commands to control the power and configuration of the blade server and other components in the BladeCenter unit. For information and instructions, see the IBM BladeCenter Management Module Command-Line Interface Reference Guide.
Using Serial over LAN
To establish a Serial over LAN (SOL) connection to the blade server, you must configure the SOL feature for the blade server and start an SOL session as described in the IBM BladeCenter Serial over LAN Setup Guide. In addition, the Advanced Management Module must be configured as described in the IBM BladeCenter Management Module User’s Guide, and the BladeCenter unit must be configured as described in the IBM BladeCenter Serial over LAN Setup Guide.
Using the serial interface
Use the serial interface to:
• Observe firmware progress.
• Run the SMS utility program.
• Access the Linux terminal in order to configure Linux.
You can connect a PC serially through the BladeCenter unit using a specific UART cable. To connect to the serial console, plug the serial cable into the BladeCenter unit and connect the other end to a serial device or computer with a serial port. For more information, see the Installation and User's Guide for your BladeCenter unit.
Set the following parameters for the serial connection on the terminal client:
• 115200 baud
• 8 data bits
• No parity
• One stop bit
• No flow control
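If the terminal client is a Linux workstation, the parameters above map onto stty settings roughly as sketched below. This is only an illustration: the function names are hypothetical, the default device /dev/ttyS0 is an assumption, and the -F option is specific to GNU stty.

```shell
#!/bin/sh
# Sketch only. The device path is an assumption; substitute the serial
# port that the BladeCenter serial cable is actually attached to.

# Print the stty settings matching the parameters listed above:
# 115200 baud, 8 data bits, no parity, one stop bit, no flow control
# (neither hardware RTS/CTS nor software XON/XOFF).
qs22_serial_settings() {
    echo "115200 cs8 -parenb -cstopb -crtscts -ixon -ixoff"
}

# Apply the settings to a serial device (default: /dev/ttyS0, hypothetical).
qs22_serial_configure() {
    dev="${1:-/dev/ttyS0}"
    # Word splitting of the settings string is intentional here.
    # shellcheck disable=SC2046
    stty -F "$dev" $(qs22_serial_settings)
}
```

For example, qs22_serial_configure /dev/ttyUSB0 would prepare a USB serial adapter before you attach a terminal program to it.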
By default, the blade server sends output over SOL and to the serial port on the BladeCenter unit. However, the default for input is to use SOL. If you wish to use a device connected to the serial port for input, you must press any key on that device while the blade server boots.
Using the SMS utility program
The Advanced Management Module is the main means of administering the BladeCenter unit and the blade servers. However, another utility is provided which in some cases can give more information than that displayed in the Advanced Management Module. This is the System Management Services (SMS) utility program.
The SMS utility program allows you to view and update the VPD, change the boot list and set network parameters.
Starting SMS
Complete the following steps to start SMS:
1. Using a Telnet or SSH client, connect to the Advanced Management Module external Ethernet interface IP address.
2. When prompted, enter a valid user ID and password. The default management module user ID is USERID, and the default password is PASSW0RD, where the 0 is a zero.
Note: The user ID and password may have been changed. If so, check with the system administrator for a valid user ID and password.
3. Power cycle the blade server and start an SOL console session by using the power -cycle -c command.
For example, to power cycle and start an SOL remote text console with a blade server that is in the first bay of the BladeCenter unit, issue the command:
power -cycle -c -T system:blade[1]
To open a console session with a blade that is already powered on, use the command:
console -T system:blade[1]
4. After approximately 30 seconds, you see a sequence of checkpoint codes displayed on the console. These codes are generated by the Power On Self Test (POST).
5. When POST displays a screen similar to the following, press F1 to display the SMS menu:
QS22 Firmware Starting
Check ROM = OK
Build Date = Jan 4 2008 11:31:29
FW Version = "QD-1.26.0-0"
Press "F1" to enter Boot Configuration (SMS)
Press "F2" to boot once from CD/DVD
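The two Advanced Management Module commands in step 3 differ only in the bay number of the target blade. The sketch below builds the command strings for a given bay so that they can be pasted at the management module prompt. The function names are hypothetical; the command syntax is the one shown in step 3.

```shell
#!/bin/sh
# Sketch only: these helpers print command strings. They do not connect
# to the Advanced Management Module themselves.

# Build the command target for a blade bay, for example system:blade[1].
amm_blade_target() {
    bay="$1"
    case "$bay" in
        ''|*[!0-9]*) echo "bay must be a decimal number" >&2; return 1 ;;
    esac
    printf 'system:blade[%s]\n' "$bay"
}

# Print the command that power cycles a blade and opens an SOL console.
amm_power_cycle_cmd() {
    target=$(amm_blade_target "$1") || return 1
    printf 'power -cycle -c -T %s\n' "$target"
}

# Print the command that opens a console on a blade that is already on.
amm_console_cmd() {
    target=$(amm_blade_target "$1") || return 1
    printf 'console -T %s\n' "$target"
}
```

For a blade in bay 3, amm_power_cycle_cmd 3 prints power -cycle -c -T system:blade[3].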
Viewing FRU information
The VPD on each blade server contains details about the machine type or model, serial number and the universal unique ID.
Complete the following steps to see this information:
1. Start SMS by completing the steps in “Starting SMS” on page 11. The SMS menu appears:
PowerPC Firmware
Version QD0123000
SLOF-SMS 1.1 (c) Copyright IBM Corp. 2007 All rights reserved.
--------------------------------------------------------------------------------
Main Menu
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. Change SCSI Settings
4. Select Console
5. Select Boot Options
6. Firmware Boot Side Options
7. Progress Indicator History
8. FRU Information
9. Change SAS Boot Device
--------------------------------------------------------------------------------
Navigation Keys:
X = eXit System Management Services
---------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:
---------------------------------------------------------------------------
2. Type 8 to select FRU Information. A screen similar to the following appears:
PowerPC Firmware
Version QD0123000
SLOF-SMS 1.1 (c) Copyright IBM Corp. 2007 All rights reserved.
--------------------------------------------------------------------------------
FRU Information
Machine Type and Model: 079338x
Machine Serial Number: ABCDEFG
Universal Unique ID: 12345678-1234-1234-1234-123456789ABC
--------------------------------------------------------------------------------
Navigation Keys:
M = return to Main Menu
ESC key = return to previous screen
X = eXit System Management Services
--------------------------------------------------------------------------------
Select Navigation key :
Note: You cannot change the FRU information from this screen, only view it.
Adding FRU information: When you replace a FRU, its details are not recorded in the VPD. You must enter them manually through SMS.
When the system firmware detects a replacement FRU during boot, the boot process stops to allow you to enter the machine type or model and serial number. Boot does not continue until the information is provided.
To enter new FRU information, complete the following steps:
1. Using a Telnet or SSH client, connect to the Advanced Management Module external Ethernet interface IP address.
2. When prompted, enter a valid user ID and password. The default management module user ID is USERID, and the default password is PASSW0RD, where the 0 is a zero.
Note: The user ID and password may have been changed. If so, check with the system administrator for a valid user ID and password.
3. Power cycle the blade and start an SOL console by using the power -cycle -c command. See “Using the SMS utility program” on page 11 for further information.
4. The following screen appears:
PowerPC Firmware
Version QD0123000
SLOF-SMS 1.1 (c) Copyright IBM Corp. 2007 All rights reserved.
--------------------------------------------------------------------------------
Enter Type Model Number (Must be 7 characters, only A-Z, a-z, 0-9 allowed. Press Esc to skip)
Enter Type Model Number :
Type the model number according to the instructions on the screen and press Enter to continue.
5. You must confirm the model number:
PowerPC Firmware
Version QD0123000
SLOF-SMS 1.1 (c) Copyright IBM Corp. 2007 All rights reserved.
--------------------------------------------------------------------------------
Number entered is: 1234567
Accept number? (Enter ’y’ or ’Y’ to accept or ’n’ or ’N’ to decline)
Select Navigation key :
Type y or Y and press Enter to confirm the number.
6. At the following screen, type the serial number:
PowerPC Firmware
Version QD0123000
SLOF-SMS 1.1 (c) Copyright IBM Corp. 2007 All rights reserved.
--------------------------------------------------------------------------------
Enter Serial Number (Must be 7 characters, only A-Z, a-z, 0-9 allowed)
Enter Serial Number :
---------------------------------------------------------------------------------
Press Enter to continue.
7. You must now confirm the serial number:
PowerPC Firmware
Version QD0123000
SLOF-SMS 1.1 (c) Copyright IBM Corp. 2007 All rights reserved.
--------------------------------------------------------------------------------
Number entered is: ABCDEFG
Accept number? (Enter ’y’ or ’Y’ to accept or ’n’ or ’N’ to decline)
Select Navigation key :
---------------------------------------------------------------------------------
Type y or Y and press Enter to confirm the number. This completes the process and the blade server continues to boot as normal.
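Both SMS prompts accept only values that are exactly 7 characters long and consist of A-Z, a-z, and 0-9. Before starting the procedure, it can help to check the values you intend to type. The following is a minimal sketch of that check; the function name is hypothetical and the script is not part of the firmware or of SMS.

```shell
#!/bin/sh
# Sketch only: mirrors the rule shown on the SMS prompts
# (7 characters, only A-Z, a-z, 0-9 allowed).

# Return 0 if the value would be accepted by the rule, 1 otherwise.
# The body runs in a subshell so that the LC_ALL setting stays local.
fru_field_valid() (
    LC_ALL=C
    value="$1"
    [ "${#value}" -eq 7 ] || exit 1
    case "$value" in
        *[!A-Za-z0-9]*) exit 1 ;;
    esac
    exit 0
)
```

For example, fru_field_valid 079338x succeeds, while a value that contains a hyphen or has 8 characters is rejected.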
Updating the system and BMC firmware
The firmware consists of two distinct packages:
• A firmware package for the baseboard management controller (BMC). This is referred to as the BMC firmware.
• A firmware package for the basic input/output system (BIOS) which runs on the IBM PowerXCell 8i processor. This is referred to as system firmware.
Note: The user and operating system interfaces of the system firmware are based on the Open Firmware standard. Detailed system information is provided through the Open Firmware device tree. You can use the client interface and Run-Time Abstraction Services (RTAS) to run management functions.
BMC firmware
• Communicates with the Advanced Management Module
• Controls power on
• Initializes the board, including the IBM PowerXCell 8i processors and clock chips
• Monitors the physical board environment
System firmware
• Takes over when the BMC has successfully initialized the board
• Acts as the basic input/output system (BIOS)
• Includes boot-time diagnostics and power-on self test
• Prepares the system for the operating system boot
The packages are delivered separately and do not follow the same versioning scheme.
IBM periodically makes updates to both BMC and system firmware. These may be downloaded from http://www.ibm.com/support/us/en/.
Note: To avoid problems and to maintain proper system performance, always make sure that both the BMC firmware and the system firmware are at the same level for all QS22 blade servers within the BladeCenter unit.
Updating steps
Complete the following steps to update the BMC and system firmware images:
1. Check the revision level of the firmware on the blade server and the level of the updates on http://www.ibm.com/support/us/en/. If the level on the Web site is higher than the version currently installed, continue with the updating steps.
2. Download the firmware updates.
3. Power off the blade server you wish to update.
4. Update the BMC firmware using the BMC update package or the Advanced Management Module. See “Updating the BMC firmware” on page 17 for further information.
5. Power on the blade server. This boots it with the new BMC firmware.
6. Update the system firmware image. See “Installing the system firmware” on page 20 for further information.
7. The system reboots. This boots the blade server with the new system firmware.
8. Shut down the blade server.
Note: There may be instances where you must update the BMC firmware before updating the system firmware. Check the readme file that comes with each firmware package for more information.
Determining current blade server firmware levels
Complete the following steps to view the current firmware code levels for both the BMC and the system firmware:
1. To check the BMC firmware level, access and log on to the Advanced Management Module Web interface as described in the Management Module User's Guide.
2. From the Monitors menu section, select Firmware VPD:
The Blade Server Firmware Vital Product Data (VPD) window shows the build identifier, release, and revision level of both the system firmware/BIOS and the BMC firmware. In the example above, the system firmware or BIOS version is QB01020000 and the BMC firmware is BLBT06b.
Compare this information to the firmware information provided at http://www.ibm.com/support/us/en/. If the two match, then the blade server has the latest firmware. If not, download the firmware package from the IBM Support Web site. See “Updating the BMC firmware” or the IBM Support Web site for installation instructions.
You can also view the system firmware level from within the operating system by using the following command:
xxd /proc/device-tree/openprom/ibm,fw-vernum_encoded
Output is similar to:
0000000: 5142 3031 3031 3030 3000 00 QB0101000..
where QB0101000 is the system firmware version.
Note: The system firmware version displayed by the BladeCenter Advanced Management Module might be different from the version displayed by your operating system. Cross-reference information is given in the firmware information at http://www.ibm.com/support/us/en/, and in the readme file which comes with the firmware image.
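In the xxd output shown earlier, the version string is followed by zero bytes, which xxd prints as dots. If you only need the text of the version, for example in a script, a sketch along the following lines strips that padding. The function and variable names are hypothetical; the property path is the one used in the xxd command.

```shell
#!/bin/sh
# Sketch only: reads a device-tree property file and prints its text.

# Default property path, taken from the xxd command in this section.
FW_PROP=/proc/device-tree/openprom/ibm,fw-vernum_encoded

# Print the contents of a property file with all NUL bytes removed.
fw_version_from_property() {
    tr -d '\000' < "$1"
}
```

On the blade server, fw_version_from_property "$FW_PROP" would print the version string without the trailing padding.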
Updating the BMC firmware
You can update the BMC firmware from the Linux prompt using the update package or from the Advanced Management Module.
Using the BMC update package
Complete the following steps to update the BMC firmware from the Linux command prompt:
1. Check the README that comes with the BMC firmware as it contains specific information about that particular firmware release.
2. Boot the blade server and the operating system.
3. Download the package from the IBM support site at http://www.ibm.com/support/us/en/. The update package has a .sh extension.
4. Change to the directory where you have downloaded the package.
5. Run the package using the -s option.
6. Reboot the blade server.
Using the Advanced Management Module
Complete the following steps to update the BMC firmware:
1. Download the BMC firmware image file from http://www.ibm.com/support/us/en/ to a suitable location on a server that is accessible on the network.
2. Uncompress the .zip file. The BMC firmware image file name has the format BLBT<version number>.zip.
3. Power off the blade you want to update.
4. Log in to the Advanced Management Module Web interface.
5. Click Firmware Update from the Blade Tasks submenu at the left of your screen. The following screen appears:
6. Choose the blade you want to update (target) and browse to the firmware image file.
7. Click on Update.
8. The validity of the image is checked, then the following screen appears:
Click Continue.
9. The next screen shows the firmware update progress:
When the update is finished, a confirmation message appears and an entry is placed in the Advanced Management Module log.
10. Power up and boot the blade server.
Note: QS22 firmware contains a proprietary implementation of Cell Broadband Engine™ hardware initialization code.
Installing the system firmware
System firmware can only be installed after the operating system has booted. If the operating system is not installed or cannot boot, then no upgrade or recovery is possible. See Chapter 5, “Diagnostics and troubleshooting,” on page 55 for further information about troubleshooting the QS22 blade server.
You can update the system firmware:
• Through IBM Director. See the IBM Director documentation on the IBM Director CD for further information.
• Using the update package available from http://www.ibm.com/support/us/en/. See “Updating the system firmware automatically” on page 21 for further information on how to perform an update.
• Using the update_flash script available on supported Linux operating systems. This requires the system firmware image file. See “The firmware update package” for information about how to extract the file.
• Updating the firmware manually. See “Installing the firmware manually” on page 21 for further information.
For all the above options, Linux needs to have a current version of the rtas_flash device driver installed. This is normally installed with the operating system. If it is not, see the installation guide for the Software Development Kit for Multicore Acceleration for instructions about how to get this device driver and install it.
Note: You may have to update the BMC firmware before updating the system firmware. See the README file that comes with the package.
The firmware update package
You can now update firmware using the update packages available from http://www.ibm.com/support/us/en/. These can be installed either through IBM Director or by executing the .sh file contained in the package. This section describes how to use the update package to install the firmware update or extract the firmware image for manual installation.
To install the firmware package using IBM Director, see the documentation on the IBM Director CD.
Note: The blade server must be configured and have a running Linux operating system before the package can be extracted or installed.
The update package consists of 4 files:
• A file containing the change history for the QS22 system firmware. This has a .chg extension.
• A file containing the update package. This has an .sh extension.
• A readme file for the update package. This contains specific installation and configuration information.
• An XML file. This file is for use by IBM Systems Management tools, including IBM Director Update Manager, UpdateXpress CD, and UpdateXpress System Pack Installer.
Using the package
The package consists of a file with a .sh extension that runs from the Linux prompt. It has a number of options. To see what options are available, run the package without any options or with the -h switch:
# ./ibm_fw_bios_qb-1.9.1-2_linux-pq_cell.sh
In this example, ibm_fw_bios_qb-1.9.1-2_linux-pq_cell.sh is the name of the firmware update package. The file name changes according to the version of the firmware.
A screen similar to the following appears:
Usage:
-x /someDirectory - Extract the payload to <some directory>
-xr /someDirectory - Extract the payload plus PkgSdk files to <some directory>
-xd /dev/fd0 - Create a DOS bootable diskette - Internel floppy drive
-xd /dev/sda - Create a DOS bootable diskette - External USB floppy drive
-u - Perform update unattended
-h - Display this help screen
++debug - Display helpful debug information
Note: All other command line arguments are passed to the payload executable
The -xd options are not supported on the QS22 blade server.
The -x option
This option extracts another executable file (in this example, ibm_fw_bios_qb-1.9.1-2.sh), which in turn can be run to create the .bin file that is required if you wish to update the firmware manually. See “Installing the firmware manually” for further information.
The -u option
This performs an unattended and automatic update of the system firmware. The blade server reboots automatically as part of the update process.
Updating the system firmware automatically
Complete the following steps to update the firmware automatically using the update package:
1. Check the README before attempting to update the system firmware as it contains specific information about the particular firmware release.
2. Download the update package from http://www.ibm.com/support/us/en/. The update package has a .sh extension.
3. Change to the directory where you have downloaded the package.
4. Run the package with the -u option. Using the example from above, at the command prompt enter:
./ibm_fw_bios_qb-1.9.1-2_linux-pq_cell.sh -u
5. Check the system firmware images to confirm the update has succeeded. See “Determining current blade server firmware levels” on page 16 for instructions.
Installing the firmware manually
If you cannot update the firmware using the update_flash script, it is possible to update the firmware manually. You can use rtas_flash over /proc.
Complete the following steps to install the firmware manually:
1. Download the update package from http://www.ibm.com/support/us/en/.
2. Extract the system firmware image package. At the command prompt enter:
./<update package> -x <target directory>
For example, to extract the image package ibm_fw_bios_qb-1.9.1-2.sh from ibm_fw_bios_qb-1.9.1-2_linux-pq_cell.sh in the directory /temp/fwimage, enter:
./ibm_fw_bios_qb-1.9.1-2_linux-pq_cell.sh -x /temp/fwimage
If the directory does not exist the firmware package creates it.
3. Change to the directory containing the firmware image package.
4. Extract the firmware image. At the command prompt enter:
./<image package> -x
For example, to extract the image file QB-1.9.1-2-boot_rom.bin from ibm_fw_bios_qb-1.9.1-2.sh enter:
./ibm_fw_bios_qb-1.9.1-2.sh -x
5. Ensure the rtas_flash driver is loaded. To do this, run lsmod.
6. If the module is not yet in the kernel, invoke the following to load it:
modprobe rtas_flash
7. To update your current firmware, copy the image file to /proc/ppc64/rtas/firmware_update and reboot manually:
cp <image-file> /proc/ppc64/rtas/firmware_update
shutdown -r now
For example, to copy the image file QB-1.9.1-2-boot_rom.bin to /proc/ppc64/rtas/firmware_update, enter:
cp QB-1.9.1-2-boot_rom.bin /proc/ppc64/rtas/firmware_update
shutdown -r now
8. Once the system reboots, update the system firmware images. See “Updating the system firmware images” for instructions.
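Steps 5 to 7 can be collected into a small script. The following is a minimal sketch under stated assumptions, not an IBM-supplied tool: the function names and the DRY_RUN and FLASH_NODE variables are hypothetical, and the only commands it issues are the lsmod, modprobe, cp, and shutdown commands from the steps above. With DRY_RUN=1 it prints the commands instead of running them, which is the safer way to try it first.

```shell
#!/bin/sh
# Sketch only. Set DRY_RUN=1 to print the commands without running them.
# FLASH_NODE defaults to the path given in step 7.
FLASH_NODE="${FLASH_NODE:-/proc/ppc64/rtas/firmware_update}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Load rtas_flash if needed, copy the image to the flash node, and reboot.
flash_system_firmware() {
    image="$1"
    [ -f "$image" ] || { echo "image file not found: $image" >&2; return 1; }
    # Steps 5 and 6: load rtas_flash only if lsmod does not list it.
    if ! lsmod 2>/dev/null | grep -q '^rtas_flash'; then
        run modprobe rtas_flash || return 1
    fi
    # Step 7: copy the image file to the flash node, then reboot.
    run cp "$image" "$FLASH_NODE" || return 1
    run shutdown -r now
}
```

For example, DRY_RUN=1 flash_system_firmware QB-1.9.1-2-boot_rom.bin lists the commands that would be issued for the image file extracted in step 4.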
Updating the system firmware images
Once the system firmware is updated, the QS22 blade server boots from the new firmware. However, there are always two copies of the system firmware image on the blade server:
TEMP This is the firmware image normally used in the boot process. When the firmware is updated, it is the TEMP image that is replaced.
PERM This is a backup copy of the system firmware boot image. The blade server only boots from this image if the TEMP image is corrupt. See “Recovering the system firmware code” on page 64 for further information about how to recover from a corrupt TEMP image.
Once you have updated the system firmware and booted the blade server, you should copy the TEMP image to the PERM image. This ensures that the PERM and TEMP images are at the same revision level. The TEMP and PERM images should always be at the same revision level.
There are two commands you can use to update an old image on PERM.
• From the Linux prompt issue the following command:
update_flash -c
Note: The script checks whether the board has booted from the TEMP image. If not, the script does not complete.
• From the Linux prompt issue the following command:
echo 0 > /proc/rtas/manage_flash
For more information on booting from the TEMP or PERM images, see “Recovering the system firmware code” on page 64.
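The second command above can be wrapped in a short guard so that a missing interface is reported instead of silently creating a file. This is a sketch under assumptions: commit_temp_image is a hypothetical name, and the hint that the interface is provided by the rtas_flash driver is an assumption rather than a statement from this guide.

```shell
#!/bin/sh
# Sketch only: commit_temp_image is a hypothetical helper.
# Usage: commit_temp_image [INTERFACE]
# INTERFACE defaults to the path used in this section; another path can be
# passed for a dry run against an ordinary file.
commit_temp_image() {
    iface=${1:-/proc/rtas/manage_flash}

    # Stop if the interface is absent or not writable (for example, when
    # not running as root or when the driver is not loaded).
    if [ ! -w "$iface" ]; then
        echo "error: $iface is not writable (is rtas_flash loaded?)" >&2
        return 1
    fi

    # Same effect as: echo 0 > /proc/rtas/manage_flash
    echo 0 > "$iface"
}

# Example (run as root after booting from the TEMP image):
# commit_temp_image
```

Unlike update_flash -c, this sketch does not check which image the blade server booted from, so that check remains the operator's responsibility.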
Updating the optional expansion card firmware
If you have installed the SAS optional expansion card or a high-speed expansion card, for example the InfiniBand card, you may have to update the firmware. See the documentation that comes with the components for instructions about how to update the firmware.
IBM periodically makes updates available for both SAS and high-speed expansion cards. These may be downloaded from http://www.ibm.com/support/us/en/.
Integrating the Gigabit Ethernet controller into the BladeCenter
One dual-port Gigabit Ethernet controller is integrated on the blade server system board. Each controller port provides a 1000-Mbps full-duplex interface connecting to one of the Ethernet Switch Modules in I/O bays 1 and 2 of the BladeCenter unit, which enables simultaneous transmission and reception of data on the Ethernet local area network (LAN).
Each Ethernet-controller port on the system board is routed to a different switch module in I/O bay 1 or bay 2. The routing from the Ethernet-controller port to the I/O bay varies according to whether an Ethernet adapter is enabled and the operating system that is installed. See “Blade server Ethernet controller enumeration” on page 25 for information about how to determine the routing from the Ethernet-controller ports to I/O bays for your blade server.
You do not have to set any jumpers or configure the controller for the blade server operating system. However, you must install a device driver to enable the blade server operating system to address the Ethernet-controller ports. For device drivers and information about configuring your Ethernet controller ports, see the Ethernet software documentation that comes with your blade server, or contact your IBM marketing representative or authorized reseller. For updated information about configuring the controllers, go to the Barcelona Supercomputing Center Web site at http://www.bsc.es/projects/deepcomputing/linuxoncell/.
Note: If your blade server contains a different type of optional Ethernet-compatible switch module in I/O bay 1 than the switch modules that are mentioned in this section, see the documentation that comes with the Ethernet switch module that you are using.
Updating the Ethernet controller firmware
To update the Ethernet controller firmware, you must download an update package from http://www.ibm.com/support/us/en/. This section describes how to use the update package to install the firmware update.
The update package consists of four files:
• A file containing the change history for the QS22 Ethernet Controller firmware. This has a .chg extension.
• A file containing the update package. This has an .sh extension.
• A readme file for the update package. This contains specific installation and configuration information.
• An XML file. This file is for use by IBM Systems Management tools, including IBM Director Update Manager, UpdateXpress CD, and UpdateXpress System Pack Installer.
Using the update package
The package consists of a file with a .sh extension that runs from the Linux prompt. To see which options are available, run the package without any options or with the -h switch:
# ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh
In the example shown above, brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh is the name of the firmware update package. The file name changes according to the version of the firmware.
A screen similar to the following appears:
Usage:
-x /someDirectory - Extract the payload to <some directory>
-xr /someDirectory - Extract the payload plus PkgSdk files to <some directory>
-xd /dev/fd0 - Create a DOS bootable diskette - Internel floppy drive
-xd /dev/sda - Create a DOS bootable diskette - External USB floppy drive
-u - Perform update unattended
-h - Display this help screen
++debug - Display helpful debug information
The -xd and -x options are not supported on QS22.
The -u option performs an unattended and automatic update of the firmware. The blade server reboots automatically as part of the update process.
Firmware update steps
Complete the following steps to update the firmware automatically:
1. Check the README before attempting to update the system firmware as it contains specific information about the particular firmware release.
2. Download the update package from http://www.ibm.com/support/us/en/. The update package has a .sh extension.
3. Change to the directory where you have downloaded the package.
4. Run the package with the -u option. Using the example from above, at the command prompt enter:
./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh -u
During the update process, messages similar to the following appear on the console:
[root@c4b14 brcm-2.0.3-ppc]# ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh -u
IBM Ethernet Firmware Update Tool, Version 1.0.2

Warning. No Broadcom NetXtreme II adapters found.

ADAPTER MAC           BOOT  IPMI  ASF  PXE  UMP
-------------------   ----  ----  ---  ---  ---
001A640E030C (5704s)  3.21  2.20  NA   NA   NA
001A640E030D (5704s)  NA    NA    NA   NA   NA

Updating Broadcom NetXtreme adapters.
Updating 001A640E030C using file 16A8bc.bin ---> Update successful
Updating 001A640E030C using file 16A8ipmi.bin ---> Update successful
Error! Firmware not detected on device 001A640E030D.

Warning. No Broadcom NetXtreme II adapters found.

ADAPTER MAC           BOOT  IPMI  ASF  PXE  UMP
-------------------   ----  ----  ---  ---  ---
001A640E030C (5704s)  3.38  2.47  NA   NA   NA
001A640E030D (5704s)  NA    NA    NA   NA   NA

One or more errors occurred during the firmware update process. See /var
Note: The error message shown above is correct as it refers to an adapter not
available on QS22.
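Because the update reports an error even when it succeeds on the one available adapter, it can help to summarize the console output per adapter. The following is a minimal sketch: summarize_update is a hypothetical name, and the patterns it matches are taken only from the sample messages shown above, so other firmware versions may print different text.

```shell
#!/bin/sh
# Sketch only: summarize_update is a hypothetical helper.
# It reads the console output of the update on standard input and prints
# one line per adapter that was updated or that reported an error.
summarize_update() {
    awk '
        # "Updating <MAC> using file <name> ---> Update successful"
        /---> Update successful/ { ok[$2]++ }

        # "Error! Firmware not detected on device <MAC>."
        /^Error!/ {
            mac = $NF
            sub(/\.$/, "", mac)   # drop the trailing period
            failed[mac]++
        }

        END {
            for (m in ok)     print "updated " m " (" ok[m] " files)"
            for (m in failed) print "failed " m
        }'
}

# Example:
# ./brcm_fw_nic_2.0.3-e-1_rhel5_cell.sh -u 2>&1 | tee update.log
# summarize_update < update.log
```

For the sample output in this section, a "failed" line for the second adapter is the expected result on QS22.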
Blade server Ethernet controller enumeration
The enumeration of the Ethernet controller or controller ports in a blade server is operating system dependent. You can verify the Ethernet controller or controller port designations that a blade server uses through your operating system settings.
The routing of an Ethernet controller or controller port to a particular BladeCenter unit I/O bay depends on the type of Ethernet controller that is installed. You can verify which Ethernet-controller port is routed to which I/O bay by using the following test:
1. Install only one Ethernet switch module or pass-thru module, in I/O bay 1.
2. Make sure that the ports on the switch module or pass-thru module are enabled (Switch Tasks → Management → Advanced Switch Management in the BladeCenter Management Module Web interface).
3. Enable only one of the Ethernet-controller ports on the blade server. Note the designation that the blade server operating system has for the controller port.
4. Ping an external computer on the network connected to the Ethernet switch module. If you can ping the external computer, the Ethernet-controller port that you enabled is associated with the switch module in I/O bay 1. The other Ethernet-controller port in the blade server is associated with the switch module in I/O bay 2.
Communications from optional I/O expansion cards are routed to I/O bays 3 and 4. If you have installed an I/O expansion card on the blade server you can verify which controller port on an expansion card is routed to which I/O bay by performing the same test, using a controller on the expansion card and a compatible switch module or pass-thru module in I/O bay 3 or 4.
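Step 4 of the routing test can be sketched as a shell function that pings through one named interface. The function name report_bay, the interface name eth0, the address 192.0.2.10, and the PING override variable are all illustrative assumptions. The sketch also assumes a Linux ping that accepts the -c and -I options, and that only I/O bay 1 holds a switch module, as described in step 1.

```shell
#!/bin/sh
# Sketch only: report_bay is a hypothetical helper.
# Usage: report_bay INTERFACE HOST
# PING can be set to another command for a dry run (assumption for testing).
report_bay() {
    iface=$1
    host=$2

    # Send three echo requests through the chosen interface only.
    if ${PING:-ping} -c 3 -I "$iface" "$host" >/dev/null 2>&1; then
        echo "$iface reaches $host: port is routed to the switch module in I/O bay 1"
    else
        echo "$iface cannot reach $host: port is probably routed to I/O bay 2"
    fi
}

# Example, with a hypothetical interface name and external address:
# report_bay eth0 192.0.2.10
```

A failed ping can also indicate a disabled switch port or an addressing problem, so check steps 1 to 3 before you rely on the second message.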
Chapter 3. Parts listing
This parts listing supports BladeCenter QS22 replaceable components. To check for an updated parts list on the Web, do the following:
1. Go to http://www.ibm.com/support/us/en.
2. Under Find resources, select Upgrades, accessories and parts.
Replaceable components
Replaceable components are of three types:
• Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
• Tier 2 CRU: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your server.
• Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
For information about the terms of the warranty and getting service and assistance, see Warranty and Support Information.
The following table lists which replaceable components are available for the BladeCenter QS22.
Description                                               Part number (Tier 1 CRU, Tier 2 CRU, or FRU No.)
DIMM VLP 1 GB DDR2 system memory                          46C0502
DIMM VLP 2 GB DDR2 system memory                          46C0515
DIMM VLP 4 GB DDR2 system memory                          46C0516
DIMM VLP 1 GB DDR2 I/O buffer memory only                 46C0502
Cisco 4X Infiniband Expansion Card for IBM BladeCenter    32R1763
Front bezel assembly                                      43W9932
BladeCenter QS22 blade assembly, base and planar          60H3199
SAS expansion card                                        39Y9188
BladeCenter PCI Express I/O Expansion Unit                43W4390
Air baffle for I/O buffer DIMM connector                  60H2962
Mounting tray for modular flash drive                     42C0593
8 GB Modular Flash Drive                                  43W3932
8 GB IBM Modular Solid State Disk                         60H4322
Miscellaneous Parts Kit                                   60H3473
Blade Cover and Warning Label                             46C7201
System Service Label                                      60H3471
FRU List Label                                            60H3472
Part numbers can change and other options can become available. For the latest information, check the IBM Web site at http://www.ibm.com/support/us/en/.
Consumable parts
Consumable parts are not covered by the IBM Statement of Limited Warranty. The following consumable parts are available for purchase from the retail store:
Description Part Number
3V lithium battery 43W9859
Chapter 4. Installing and removing replaceable units
This chapter provides instructions for installing or replacing units on the blade server. Replaceable units are components such as memory modules and I/O expansion cards. Some removal instructions are provided in case you need to replace one replaceable unit with another.
You can replace the following items:
• Battery
• Front bezel assembly (control panel)
• Blade server cover
• Impedance air baffles
• Air baffle for I/O buffer DIMM connector
• Miscellaneous parts
You can add, remove or replace the following optional items:
• Modular flash drive
• High-speed expansion card
• System memory DDR2 modules
• I/O buffer DDR2 memory modules
• SAS expansion card
• BladeCenter PCI Express I/O Expansion Unit
This chapter also details how to replace the system board. The system board is a field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
Installation guidelines
Before you begin, read the following:
• Read the safety information beginning on page vii and the guidelines in “Handling static-sensitive devices” on page 30. This information will help you work safely with the blade server and components.
• You do not have to turn off the blade server or disconnect the BladeCenter unit from power to install or replace any of the hot-swappable modules on the rear of the BladeCenter unit.
• Before you remove a hot-swappable blade server from the BladeCenter unit, you must shut down the operating system on it by typing the shutdown -h now command or choosing the shut down option from your GUI. See “Turning off the blade server” on page 4 for details. You do not have to shut down the BladeCenter unit itself.
• Blue on a component indicates touch points, where you can grip the component to remove it from or install it in the blade server or BladeCenter unit, open or close a latch, and so on. The DIMM clips and the battery clip are exceptions: they are also touch points but are not colored blue.
• Orange on a component or an orange label on or near a component indicates that the component can be hot-swapped. You can remove or install the component while the blade server or BladeCenter unit is running, provided that the blade server or BladeCenter unit and operating system support the hot-swappable capability. Orange can also indicate touch points on hot-swappable components. See the instructions for removing or installing a specific hot-swappable component for any additional procedures that you might have to perform before you remove or install the component.
Note: There are no hot-swappable components on the QS22 blade server. To replace parts, you must turn off the blade server and remove it from the BladeCenter unit.
System reliability guidelines
To help ensure proper cooling and system reliability, be sure that:
• The ventilation holes on the blade server are not blocked.
• Each of the blade bays on the front of the BladeCenter unit has a blade server or blade filler installed. Do not operate the BladeCenter unit for more than 1 minute without a blade server or blade filler installed in each blade bay.
• You have followed the reliability guidelines in the documentation that comes with the BladeCenter unit.
Handling static-sensitive devices
Attention: Static electricity can damage electronic devices and your system. To avoid damage, keep static-sensitive devices in their static-protective packages until you are ready to install them.
To reduce the possibility of electrostatic discharge, observe the following precautions:
• Limit your movement. Movement can cause static electricity to build up around you.
• Handle the device carefully, holding it by its edges or its frame.
• Do not touch solder joints, pins, or exposed printed circuitry.
• Do not leave the device where others can handle and damage it.
• While the device is still in its static-protective package, touch it to an unpainted metal part of the BladeCenter chassis for at least 2 seconds. This drains static electricity from the package and from your body.
• Remove the device from its package and install it directly into the blade server or BladeCenter unit without setting the device down. If it is necessary to set down the device, put it back into its static-protective package. Do not place the device on the blade server cover or on a metal surface.
• Take additional care when handling devices during cold weather. Heating reduces indoor humidity and increases static electricity.
• Wear an electrostatic-discharge wrist strap, if one is available.
Removing the blade server from the BladeCenter unit
The blade server is a hot-swappable device, and the blade bays in the BladeCenter unit are hot-swappable bays. Therefore, you can install or remove the blade server without removing power from the BladeCenter unit. However, you must turn off the blade server before removing it from the BladeCenter unit.
Attention:
• To maintain proper system cooling, do not operate the BladeCenter unit for more than 1 minute without a blade server or blade fillers installed in each blade bay.
• Note the number of the bay that contains the blade server before you remove it. You must reinstall the blade server in the same bay from which it was removed. Reinstalling a blade server into a different bay than the one from which it was removed could have unexpected consequences, such as incorrect reconfiguration of the blade server. Some blade server configuration information and update options are established according to bay number.
If you reinstall the blade server into a different bay, you might have to reconfigure the blade server.
Figure 5. Removing the blade server (callout: release handles open)
Complete the following steps to remove the blade server:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If the blade server is operating, the power on LED is lit continuously (steady). Before you remove a blade server from the BladeCenter unit, you must shut down the operating system on it by typing the shutdown -h now command or choosing the shut down option from your GUI. See “Turning off the blade server” on page 4 for details. You do not have to shut down the BladeCenter unit itself.
3. Open the two release levers as shown in Figure 5. The blade server moves out of the bay approximately 0.6 cm (0.25 inch).
4. Pull the blade server out of the bay.
5. Place either a blade filler or a new blade server in the bay within 1 minute.
Opening and removing the blade server cover
You must open the blade server cover to access, install or remove any of the replaceable items.
If a BladeCenter PCI Express I/O Expansion Unit has been installed on your blade server, ignore this section and follow the instructions in “Removing the BladeCenter PCI Express I/O Expansion Unit” on page 32 instead.
Figure 6. Opening the blade server cover (callouts: cover pins, cover release)
Complete the following steps to open the blade server cover:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Carefully place the blade server on a flat, static-protective surface, with the cover side up.
4. Press the blue blade cover release on each side of the blade server and lift the outer cover open (see Figure 6).
5. If you want to remove the cover, carefully lift it from the cover pins and set it aside (see Figure 6).
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
Removing the BladeCenter PCI Express I/O Expansion Unit
You must remove the BladeCenter PCI Express I/O Expansion Unit, if installed, to access, install or remove any of the replaceable items.
Figure 7. Removing the expansion unit (callouts: cover pins, cover release)
Complete the following steps to remove the BladeCenter PCI Express I/O Expansion Unit:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Carefully place the blade server on a flat, static-protective surface, with the expansion unit side facing up.
3. Press the blue blade cover release on each side of the blade server and lift the expansion unit (see Figure 7).
4. To remove the expansion unit, carefully lift it from the cover pins and set it aside.
Removing the blade-server front bezel assembly
Before you can add, remove, or replace system memory DIMM modules, replace a defective system board assembly, or replace the blade server front bezel assembly, you must first remove the blade server front bezel assembly.
Figure 8. Removing the front bezel assembly (callouts: blade cover, cover release, bezel release, control-panel cable, control-panel connector, bezel assembly)
Complete the following steps to remove the front bezel assembly:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Open the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Carefully disconnect the control panel cable from the control panel connector (see Figure 8).
5. Press the front bezel release on both sides of the system board and pull the front bezel assembly away from the blade server.
6. Store the front bezel assembly in a safe place.
Installing the optional modular flash drive
The modular flash drive connects to the flash drive connector on the system board and provides non-volatile memory. This may be used, for example, for installing an operating system.
Complete the following steps to install the modular flash drive:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Open the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Locate the flash drive retention clip and connector on the system board.
Figure 9. Fitting the modular flash drive (callouts: tab, flash drive, connector, retention clip)
5. Locate the connector on the back of the modular flash drive.
6. Carefully align the modular flash drive with the retention clip and connector on the system board. Ensure the orientation of the connector on the modular flash drive matches the connector on the system board.
7. Gently press the modular flash drive into position.
8. If applicable, reinstall the high-speed expansion card.
If you have other options to install or remove, do so now. Otherwise, go to “Finishing the installation” on page 50.
Removing the optional modular flash drive
Complete the following steps to remove the modular flash drive:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Open the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Locate the modular flash drive on the system board. The modular flash drive can be underneath a high-speed expansion card.
Figure 10. Removing the modular flash drive (callouts: tab, flash drive, connector, retention clip)
If the modular flash drive is blocked by a high-speed expansion card, you must remove the high-speed expansion card before removing the modular flash drive. See “Removing an optional high-speed expansion card” on page 38.
5. Locate the tab on the modular flash drive. See Figure 10.
6. Using the tab, carefully lift the modular flash drive away from the retention clip and connector on the system board.
7. If applicable, reinstall the high-speed expansion card.
If you have other options to install or remove, do so now. Otherwise, go to “Finishing the installation” on page 50.
Installing an optional high-speed expansion card
You can connect a high-speed expansion card, for example an InfiniBand card, to the high-speed connector on the system board. Use the two expansion card locator pins to assist with fitting the card. If your card has a ball socket, use the socket to lock the card in place.
Figure 11. High-speed expansion card reverse view (callouts: locking clip, connector, locator pin holes, ball socket)
Complete the following steps to install the high-speed expansion card:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Remove the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. If you also want to install a modular flash drive, do this before you proceed with installing the high-speed expansion card. See “Installing the optional modular flash drive” on page 34.
5. Locate the high-speed connector on the system board.
Figure 12. Expansion card connector, locator pins, and ball stud (callouts: ball stud, high-speed connector, expansion card standoffs with locator pins)
6. Remove the connector cover.
7. Locate the expansion card locator pins on the standoffs at the back of the system board.
8. Locate the connector and the ball socket on the high-speed expansion card.
9. Slide the locator pin holes on the expansion card over the locator pins. The card rests on the locator pins.
Figure 13. Positioning the high-speed expansion card (callouts: locator pin, locking clip, expansion card, expansion card standoff, expansion connector cover)
10. Carefully press the expansion card into position. Be sure that the ball socket on the card is over the corresponding ball stud on the main board. Use the blue areas only to avoid damage to the card.
11. Check that the blue locking clip is horizontal and that there is no gap between the card and the connector.
Attention: The connectors on the system board and the high-speed expansion card are not designed for repeated removal or replacement of components. Avoid removing the card once it is in position.
If you have other options to install or remove, do so now. Otherwise, go to “Finishing the installation” on page 50.
Removing an optional high-speed expansion card
If you wish to remove an optional high-speed expansion card, complete the following steps.
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Remove the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Locate the high-speed expansion card on the system board.
5. Lift the locking clip to the vertical position until the card moves upward and disengages from the connector.
6. If the card has a ball socket, hold the card at the handling area near the ball socket and pull it upward until the ball socket disengages from the ball stud.
7. Lift the card off the locator pins and set it aside on a static-protective surface.
For diagrams see “Installing an optional high-speed expansion card” on page 36.
Adding or changing system memory
There are 8 DIMM slots for system memory. Each IBM PowerXCell 8i processor has two memory channels and there are two DIMM slots per memory channel.
You can use VLP DDR2 1 GB, 2 GB, or 4 GB memory modules. The maximum memory configuration has a 4 GB memory module in each DIMM slot which provides 16 GB to each processor and 32 GB in total.
Figure 14. System memory DIMM slot locations (callouts: DIMM slots 1 to 8; processor 0 channels 0 and 1; processor 1 channels 0 and 1)
As shown in Figure 14, each processor has a channel 0 and a channel 1, with a pair of DIMM slots for each channel. To use a channel, you must populate both DIMM slots that belong to the channel.
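The pairing rule above can be checked before you open the blade server. The following sketch is illustrative only: check_dimm_config is a hypothetical name, sizes are given in GB with 0 for an empty slot, and the requirement that both DIMMs of a channel have the same size is an assumption drawn from the supported configurations, in which every populated pair is matched. The sketch does not check a configuration against Table 2.

```shell
#!/bin/sh
# Sketch only: check_dimm_config is a hypothetical helper.
# Usage: check_dimm_config S1 S2 S3 S4 S5 S6 S7 S8
# Each argument is the DIMM size in GB for slots 1 to 8; 0 means empty.
# Slots are treated as channel pairs: (1,2) (3,4) (5,6) (7,8).
check_dimm_config() {
    if [ $# -ne 8 ]; then
        echo "error: expected 8 slot sizes, got $#" >&2
        return 2
    fi

    slot=1
    while [ $# -ge 2 ]; do
        a=$1
        b=$2
        shift 2

        # Only empty slots and 1 GB, 2 GB, or 4 GB modules are accepted.
        for size in "$a" "$b"; do
            case $size in
                0|1|2|4) ;;
                *) echo "error: unsupported DIMM size '$size'" >&2
                   return 1 ;;
            esac
        done

        # Both slots of a channel must be empty or hold matching DIMMs.
        if [ "$a" != "$b" ]; then
            echo "error: slots $slot and $((slot + 1)) must be populated as a matching pair" >&2
            return 1
        fi
        slot=$((slot + 2))
    done
    return 0
}

# Example: the maximum configuration, a 4 GB DIMM in every slot.
# check_dimm_config 4 4 4 4 4 4 4 4 && echo "pairing rule satisfied"
```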
All DIMM configurations listed in Table 2 are supported by the latest firmware level.
Table 2. Supported DIMM configurations
Slots 1 and 2 are on channel 0 and slots 3 and 4 are on channel 1 of IBM PowerXCell 8i-0. Slots 5 and 6 are on channel 0 and slots 7 and 8 are on channel 1 of IBM PowerXCell 8i-1. Each row gives the DIMM size fitted in every populated slot of that processor.
Channels populated per processor   IBM PowerXCell 8i-0   IBM PowerXCell 8i-1
One                                1 GB                  1 GB
One                                1 GB                  2 GB
One                                1 GB                  4 GB
One                                2 GB                  1 GB
One                                2 GB                  2 GB
One                                2 GB                  4 GB
One                                4 GB                  1 GB
One                                4 GB                  2 GB
One                                4 GB                  4 GB
Both                               1 GB                  1 GB
Both                               1 GB                  2 GB
Both                               1 GB                  4 GB
Both                               2 GB                  1 GB
Both                               2 GB                  2 GB
Both                               2 GB                  4 GB
Both                               4 GB                  1 GB
Both                               4 GB                  2 GB
Both                               4 GB                  4 GB
To change the system memory configuration, complete the following steps:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Open the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Remove the front bezel assembly. See “Removing the blade-server front bezel assembly” on page 33 for details.
5. Locate the DIMM slots in which you want to insert the system memory modules. See Table 2 on page 39 and Figure 14 on page 39 for guidance.
6. Remove any modules that are to be replaced or that have become redundant.
a. Open the retaining clips on either end of the DIMM slot. This lifts the DIMM and disengages it from the slot.
b. Pull the DIMM out of the slot.
7. Insert the new DIMMs.
a. Ensure that the retaining clips at both ends of the DIMM slot are in the open position.
b. Place the DIMM in the slot, contact side down. Check the orientation of the module. The locating pin in the slot must match the corresponding cut-out on the module.
c. Carefully press the module into place until the retaining clips snap into position. Make sure that the clips are locked properly.
Figure 15. DIMM retaining clips
Note: Unused system memory slots do not require DIMM fillers.
If you have other options to install or remove, do so now. Otherwise, go to “Finishing the installation” on page 50.
Adding or changing I/O buffer DDR2 memory modules
This section describes how to add I/O buffer memory. For instructions on how to add system memory see “Adding or changing system memory” on page 39.
Each IBM PowerXCell 8i companion chip has one DIMM slot for I/O buffer memory. The QS22 blade server supports VLP DDR2 1 GB DIMMs. You must add memory as a pair of DIMMs, one for each IBM PowerXCell 8i companion chip.
Figure 16. I/O buffer DIMM slot location
To install I/O buffer memory, complete the following steps:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Open the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Locate the DIMM slots for the I/O buffer DDR2 memory modules. There are two DIMM slots, one for each IBM PowerXCell 8i companion chip.
The slots are labelled IOBUF 1 and IOBUF 2. You must install a 1 GB DIMM for both IBM PowerXCell 8i companion chips.
5. Remove the DIMM fillers from the slots. Retain the DIMM fillers. They are an important part of the blade server cooling system and you need them if you ever remove the I/O buffer DIMMs from the blade server.
6. If applicable, remove any modules that are to be replaced or that have become redundant.
a. Open the retaining clips on either end of the DIMM slot. This lifts the DIMM
and disengages it from the slot.
b. Pull the DIMM out of the slot.
7. Place each DIMM in its slot, contact side down. Check the orientation of the modules. The locating pin in each slot must match the corresponding cut-out on the module.
8. Carefully press the modules into place until the retaining clips snap into place. Make sure that the clips are locked properly.
Chapter 4. Installing and removing replaceable units 41
Figure 17. DIMM retaining clips
If you have other options to install or remove, do so now. Otherwise, go to “Finishing the installation” on page 50.
Replacing DIMM fillers
For the QS22 cooling system to work properly there must be no empty I/O buffer DIMM slots. The unused slots must be fitted with DIMM fillers. Replace faulty DIMM fillers and, if you remove the I/O buffer memory modules, fit empty slots with DIMM fillers.
Note: Unused system memory slots do not require DIMM fillers.
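The slot rules above (I/O buffer DIMMs are installed as a pair, and an I/O buffer slot without a DIMM must hold a filler) can be expressed as a small check. The following Python sketch is illustrative only: the function name and the "dimm" / "filler" / "empty" values are assumptions made for this example and are not part of any IBM tool.

```python
from typing import Dict, List

# Slot labels as printed on the system board.
IO_BUFFER_SLOTS = ("IOBUF 1", "IOBUF 2")

# Hypothetical content values used only in this sketch.
VALID_CONTENTS = {"dimm", "filler", "empty"}


def check_io_buffer_slots(slots: Dict[str, str]) -> List[str]:
    """Return a list of rule violations for the two I/O buffer slots."""
    problems = []
    contents = []
    for name in IO_BUFFER_SLOTS:
        content = slots.get(name, "empty")
        if content not in VALID_CONTENTS:
            raise ValueError(f"{name}: unknown content {content!r}")
        contents.append(content)
        # Cooling rule: no I/O buffer slot may be left empty.
        if content == "empty":
            problems.append(f"{name}: empty slot, fit a DIMM filler")
    # Pairing rule: I/O buffer DIMMs are added or removed together.
    if contents.count("dimm") == 1:
        problems.append("I/O buffer DIMMs must be installed as a pair")
    return problems


if __name__ == "__main__":
    print(check_io_buffer_slots({"IOBUF 1": "dimm", "IOBUF 2": "empty"}))
```

System memory slots are deliberately left out of the check, because unused system memory slots do not require fillers.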
To install or replace DIMM fillers, complete the following steps:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Open the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Remove any faulty DIMM fillers.
a. Open the retaining clips on either end of the DIMM slot. This lifts the DIMM
filler and disengages it from the slot.
b. Pull the filler out of the slot.
5. If you wish to remove the I/O buffer memory modules, remove them now. Be sure to remove them both.
a. Open the retaining clips on either end of the DIMM slot. This lifts the
module and disengages it from the slot.
b. Pull the module out of the slot.
6. Carefully press the DIMM filler into the empty I/O buffer DIMM slot until the
retaining clips snap into position.
7. Repeat step 6 for the other I/O buffer slot.
If you have other options to install or remove, do so now. Otherwise, go to
“Finishing the installation” on page 50.
Installing the optional SAS expansion card
The QS22 blade server does not have any built-in disk storage. The SAS expansion card allows you to use SAS attached storage. Use the blue handling areas to handle the card.
Figure 18. SAS expansion card reverse side (callout: ball socket)
Complete the following steps to install the SAS expansion card:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Open the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Locate the two SAS expansion card connectors and the ball stud on the system board.
Figure 19. SAS expansion card connector and ball stud location (callouts: connectors for SAS expansion card, ball stud)
5. Locate the connectors and the ball socket on the SAS expansion card.
6. Align the connectors on the system board with the connectors on the SAS expansion card.
Figure 20. SAS expansion card location (callout: expansion card)
7. Using the blue handling areas, carefully push the card down to insert it into the connectors. Ensure that the ball stud on the system board engages with the ball socket on the SAS expansion card.
If you have other options to install or remove, do so now. Otherwise, go to
“Finishing the installation” on page 50.
Installing the BladeCenter PCI Express I/O Expansion Unit
Important:
• A BladeCenter QS22 with the BladeCenter PCI Express I/O Expansion Unit
installed takes up two contiguous slots in the BladeCenter chassis.
• You must remove any expansion card that uses the high-speed connector before
installing the expansion unit.
Figure 21. Installing the expansion unit (callouts: cover pins, cover release)
Complete the following steps to install the BladeCenter PCI Express I/O Expansion Unit:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Remove the blade server cover and set it aside. See “Opening and removing the blade server cover” on page 31 for further information.
3. Remove the connector cover or any optional card from the high-speed connector. Figure 12 on page 37 shows the location of the high-speed connector.
4. Lower the expansion unit so that the slots at the rear slide down onto the cover pins at the rear of the blade server, as shown in Figure 21 on page 44.
5. Carefully close the expansion unit as shown in Figure 21 on page 44 until it clicks into place.
Replacing the system board base and planar
Important: The system board is a field replaceable unit (FRU). FRUs must be installed only by trained service technicians.
Figure 22. System board assembly
Complete the following steps to replace the system board base and planar:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Remove the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Remove the front bezel from the defective system board and set it aside. See “Removing the blade-server front bezel assembly” on page 33 for detailed instructions.
5. Remove the system memory modules and any optional components, such as I/O buffer memory modules or fillers, expansion cards, or the modular flash drive, from the defective system board and set them aside.
6. Note down the serial number of the defective system board. You need this later to update the VPD information.
7. Install the system memory modules on the replacement system board. See “Adding or changing system memory” on page 39.
8. Reinstall on the replacement system board any options you removed from the defective system board. See “Installing an optional high-speed expansion card” on page 36, “Installing the optional SAS expansion card” on page 43, “Installing the optional modular flash drive” on page 34, and “Adding or changing I/O buffer DDR2 memory modules” on page 41 for detailed instructions.
9. Reinstall the front bezel assembly on the replacement system board. See “Installing the front bezel assembly” on page 51 for detailed instructions.
10. Replace and close the cover. See “Closing the blade server cover” on page 52 for details.
11. Reinstall the blade server in the BladeCenter unit.
12. Update the BMC, system and optional expansion card firmware as described in Chapter 2, “Configuring the blade server,” on page 9.
13. Using SMS, update the VPD information by entering the serial number of the defective system board. See “Adding FRU information” on page 13 for details.
14. Configure the replacement blade server to boot from the same device as the original defective unit. See the QS22 Installation and User's Guide for details.
Note: Provided that the options on the new blade server are the same as on the
old one, you do not have to reinstall or reconfigure the operating system. Simply configure the boot options to boot from the boot device.
Replacing the battery
IBM has designed this product with your safety in mind. The lithium battery must be handled correctly to avoid possible danger. If you replace the battery, you must adhere to the following instructions.
Note: In the U.S., call 1-800-IBM-4333 for information about battery disposal.
If you replace the original lithium battery with a heavy-metal battery or a battery with heavy-metal components, be aware of the following environmental consideration. Batteries and accumulators that contain heavy metals must not be disposed of with normal domestic waste. They will be taken back free of charge by the manufacturer, distributor, or representative, to be recycled or disposed of in a proper manner.
To order replacement batteries, call 1-800-IBM-SERV within the United States, and 1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada, call your IBM authorized reseller or IBM marketing representative.
Note: After you replace the battery, the blade server is automatically reconfigured.
However, you must reset the system date and time through the operating system that you installed.
Statement 2:
CAUTION: When replacing the lithium battery, use only IBM Part Number 43W9859 or 03N2449 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Note: See “Battery return program” on page 130 for more information about battery
disposal.
Complete the following steps to replace the battery:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Follow any special handling and installation instructions that come with the battery.
3. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
4. Remove the blade server cover. See “Opening and removing the blade server cover” on page 31.
5. Locate the battery (Battery holder J12) on the system board.
Figure 23. Battery location (callout: battery)
6. Remove the battery:
a. Use one finger to press the top of the battery clip away from the battery.
The battery pops up when released.
b. Use your thumb and index finger to lift the battery from the socket.
c. Dispose of the battery as required by local ordinances or regulations.
7. Insert the new battery:
a. Make sure the positive (+) side is facing upwards.
b. Tilt the battery so that you can insert it into the socket, under the battery
clip.
c. Press the battery down into the socket until it clicks into place. Make sure
the battery clip holds the battery securely.
8. Close the blade server cover and insert the blade server into the BladeCenter
unit (see “Closing the blade server cover” on page 52).
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
9. Turn on the blade server (see “Turning on the blade server” on page 4).
10. Reset the system date and time through the operating system that you installed. For additional information, see your operating system documentation.
Replacing the retention clip for the modular flash drive
The retention clip supports the modular flash drive and should be replaced if damaged.
To remove and replace the retention clip, complete the following steps:
1. Read the safety information beginning on page vii and “Installation guidelines”
on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and
remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Remove the blade server cover. See “Opening and removing the blade server
cover” on page 31.
4. If applicable, carefully remove the modular flash drive from the retention clip.
5. Using a Phillips head screwdriver, pierce the label at the red circle corresponding
with the retention clip.
(Figure: locations of the retention clip screws and ball stud screws)
6. Carefully unscrew the retention clip and remove it.
7. Position the replacement retention clip over the hole and screw it into position, taking care not to over-tighten as this might damage the system board.
8. If applicable, replace the modular flash drive.
9. Replace the cover and insert the blade server into the BladeCenter unit.
Using the miscellaneous parts kit
The miscellaneous parts kit contains replacement parts and screws to be used if the original item is damaged. It contains the following items:
Kit, Miscellaneous Parts                                 Quantity  Part No
Alignment socket                                         4         26K6003
Cover for blade server expansion connector               4         28R3024
Pivot point blocks for expansion card support            4         31R2232
Ball stud for expansion card support                     4         31R2233
Tray, expansion card support end bracket                 2         31R2248
Alignment pin                                            2         39M6518
Screw, Plastite 4-20x6.35                                8         39R9558
Screw, 3.5 x 6 Pan Head, Philips, Planar                 6         26K5962
QS22 Planar Light box                                    2         43W9782
Impedance Air Baffle top, foam                           2         43W9966
Impedance Air Baffle DIMM Sides                          2         43W9958
Impedance Air Baffle DIMM Sides                          2         43W9959
Impedance Air Baffle VRD heat sink, foam                 2         43W9967
Impedance Air Baffle VRD heat sink, foam                 2         43W9969
Impedance Air Baffle Processor HS, foam gasket           2         43W9968
Screw, Plastite 4-20x9.53 for flash drive memory tray    2         43W9973
To replace a support or bracket, you need a Phillips head screwdriver.
Replacing the ball studs
The ball studs help support the optional expansion cards and should be replaced if damaged.
To remove and replace a ball stud, complete the following steps:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If applicable, shut down the operating system, turn off the blade server, and remove the blade server from the BladeCenter unit. See “Removing the blade server from the BladeCenter unit” on page 30.
3. Remove the blade server cover. See “Opening and removing the blade server cover” on page 31.
4. Using a Phillips head screwdriver, pierce the label at the red circle corresponding with the ball stud you wish to replace.
(Figure: locations of the retention clip screws and ball stud screws)
5. Carefully unscrew the ball stud and remove it.
6. Position the replacement ball stud over the hole and screw it into position, taking care not to over-tighten as this might damage the system board.
7. Replace the cover and insert the blade server into the BladeCenter unit.
Finishing the installation
To complete the installation you must:
1. Reinstall the front bezel assembly on the blade server if you removed it. See “Installing the front bezel assembly” on page 51 for further information.
2. Ensure that the I/O buffer DIMM slots are occupied either by DIMMs or by DIMM fillers. No DIMM fillers are needed for empty system memory DIMM slots.
3. Replace and close the blade server cover, unless you installed an optional expansion unit that has its own cover. See “Closing the blade server cover” on page 52 for further information.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
4. Reinstall the blade server into the BladeCenter unit.
5. Turn on the blade server. See “Turning on the blade server” on page 4 for further information.
6. If you have replaced the battery or the system board assembly, reset the system date and time through the operating system that you installed. For additional information, see your operating system documentation.
Note: If you have just powered on the BladeCenter unit, wait until the power on
LED on the blade server flashes slowly before powering on the blade server.
Installing the front bezel assembly
Figure 24 on page 52 shows how to install the front bezel assembly on the blade server.
Figure 24. Installing the front bezel assembly (callouts: blade cover, cover release, bezel release, bezel assembly, control-panel cable, control-panel connector)
Complete the following steps to install the blade server front bezel assembly:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. Connect the control panel cable to the control panel connector on the system board assembly.
3. Carefully slide the front bezel assembly onto the blade server, as shown in Figure 24, until it clicks into place.
Note: Make sure that you do not pinch any cables when you reinstall the front
bezel assembly.
Closing the blade server cover
Important: The blade server cannot be inserted into the BladeCenter unit until the
cover is installed and closed or an expansion unit is installed. Do not attempt to override this protection.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
Figure 25. Closing the blade server cover (callouts: cover pins, cover release)
Complete the following steps to close the blade server cover:
1. Read the safety information beginning on page vii and “Installation guidelines” on page 29.
2. If you removed the front bezel assembly, replace it now. See “Installing the front bezel assembly” on page 51 for instructions.
3. Lower the cover so that the slots at the rear slide down onto the pins at the rear of the blade server, as shown in Figure 25. Before closing the cover, make sure that all components are installed and seated correctly and that you have not left loose tools or parts inside the blade server.
4. Pivot the cover to the closed position as shown in Figure 25 until it clicks into place.
Input/output connectors and devices
The BladeCenter unit contains the input/output connectors that are available to the blade server. See the documentation that comes with the BladeCenter unit for information about the input/output connectors.
Chapter 5. Diagnostics and troubleshooting
This chapter provides basic troubleshooting information to help you solve some common problems that might occur while setting up your blade server.
A problem with the BladeCenter QS22 can relate either to the BladeCenter QS22 or the BladeCenter unit.
A problem with the blade server exists if the BladeCenter unit contains more than one blade server and only one of the blade servers has the symptom. If all of the blade servers have the same symptom, then the problem relates to the BladeCenter unit. For more information, see the Problem Determination and Service Guide for your BladeCenter unit.
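The isolation rule in the preceding paragraph can be sketched as a short decision function. This Python example is a minimal illustration, not an IBM diagnostic: the function name, the return labels, and the handling of cases the text does not cover (a single blade server in the unit, or several but not all blade servers affected) are assumptions.

```python
from typing import Dict


def isolate_problem(symptoms: Dict[str, bool]) -> str:
    """Classify a fault from per-blade observations.

    `symptoms` maps a blade bay label to True if that blade server
    shows the symptom. The return labels are invented for this sketch.
    """
    total = len(symptoms)
    affected = sum(1 for has_symptom in symptoms.values() if has_symptom)

    if affected == 0:
        return "no-problem"
    # The rule only applies when more than one blade server is installed.
    if total < 2:
        return "undetermined"
    if affected == total:
        return "bladecenter-unit"
    if affected == 1:
        return "blade-server"
    # Several, but not all, blades affected: not covered by the text.
    return "undetermined"


if __name__ == "__main__":
    print(isolate_problem({"bay1": True, "bay2": False, "bay3": False}))
```

When the result points to the BladeCenter unit, continue with the Problem Determination and Service Guide for that unit rather than with this chapter.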
The BladeCenter QS22 blade server is supported in the IBM BladeCenter H unit, the IBM BladeCenter HT unit, and the IBM BladeCenter S (non RAID type only) unit.
You can put other blade server types that are supported in your BladeCenter unit in the same unit as a BladeCenter QS22.
Prerequisites
Basic checks
Before you start problem determination or servicing, check that:
• The BladeCenter QS22 is inserted correctly into the BladeCenter unit.
• All components are connected correctly.
• The BladeCenter QS22 has the latest firmware updates. These should include updates for:
  – The BMC
  – The system firmware
  – Gb Ethernet controller
  – The SAS expansion card
  – The high-speed expansion card (if installed)
• The components in your SAS environment have the latest firmware updates. These include updates for:
  – SAS Connectivity Modules (if installed)
  – SAS Storage Modules (if installed into BladeCenter S chassis)
  – IBM Boot Disk System (if attached)
If you install the blade server in the BladeCenter unit and the blade server does not start, always perform the following basic checks before continuing with more advanced troubleshooting:
• Make sure that the BladeCenter unit is correctly connected to a power source.
• Reseat the blade server in the BladeCenter unit.
• If the power on LED is flashing slowly, the blade server may be turned off. To
turn on the blade server, see “Turning on the blade server” on page 4 for further information.
• If you have just added a new optional device or component, make sure that it is
correctly installed and compatible with the blade server and its components. If the device or component is not compatible, remove it from the blade server, reinstall the blade server in the BladeCenter unit, and then restart the blade server.
• Use Advanced Management Module to check that the blade server appears in
the list of blade servers available.
Finding troubleshooting information
Table 3 describes where to find troubleshooting information in this section.
Note: Many components, including the CPU and power supplies, cannot be
exchanged in the field. The only replaceable parts are the optional SAS expansion card, battery, front bezel assembly, system DIMM memory, I/O buffer DIMM memory, modular flash drive, and the optional InfiniBand card.
Table 3. Where to find troubleshooting information
Component                               Where to find information
SAS expansion card
Front bezel
High-speed InfiniBand expansion card
Modular flash drive
Memory                                  Table 12 on page 80
LEDs
Power
Network connections
Service processor
Software problems
For troubleshooting information about other BladeCenter components, see the appropriate Problem Determination and Service Guide, and other product-specific documentation. See “Related documentation” on page 1 for additional information. For the latest editions of the IBM BladeCenter documentation, go to http://www.ibm.com/support/us/en/ on the World Wide Web.
Troubleshooting charts
The following tables list problem symptoms and suggested solutions. If you cannot find the problem in the troubleshooting charts, or if carrying out the suggested steps does not solve the problem, have the blade server serviced.
“Solving undetermined problems” on page 108
“Troubleshooting charts” on page 56
If you have problems with an adapter, monitor, keyboard, mouse, or power module, see the Problem Determination and Service Guide for your BladeCenter unit for more information.
If you have problems with an Ethernet switch module, I/O adapter, or other optional device that can be installed in the BladeCenter unit, see the Problem Determination
and Service Guide or other documentation that comes with the device for more
information.
Problems indicated by the front panel LEDs
The state of the LEDs on the front of the blade can help in isolating problems.
The table below gives an explanation and a suggested action, if required, for each LED.
Figure 26. Power-control button and LEDs (callouts: information LED, location LED, activity LED, power-on LED, media-tray select button, power-control button, blade-error LED, NMI reset button)
Table 4. Explanation of LEDs and their states

Blade error LED
  State: Yellow
  Explanation: A system error has occurred on the blade server.
  Suggested action: Check the BladeCenter error log, see “Problem reporting” on page 108.

Information LED
  State: Yellow
  Explanation: Information about a system event has been placed in the Advanced Management Module Event Log. The information LED remains on until turned off by Advanced Management Module or through IBM Director Console.
  Suggested action: Check Advanced Management Module to see what the problem is. See the BladeCenter Management Module User's Guide for further information about the error.

Activity LED
  State: Green
  Explanation: There is network activity.
  Suggested action: No action required. For further information about troubleshooting networks, see “Network connection problems” on page 62.

Power-on LED
  State: Flashing rapidly
  Explanation: The service processor on the blade server is communicating with the BladeCenter Management Module.
  Suggested action: No action required

  State: Flashing slowly
  Explanation: The blade server has power but is not turned on.
  Suggested action: Turn on if required

  State: Lit continuously (steady)
  Explanation: The blade server has power and is turned on.
  Suggested action: No action required

  State: Not lit
  Explanation: Blade server not powered.
  Suggested action:
  1. Verify that the BladeCenter unit provides 12V dc to the blade server.
  2. Reseat blade server.
  3. Check if BladeCenter power supplies numbers 3 and 4 are installed and powered. If they are not, install and power them or use slots 1-5.
  4. Go to “Power problems” on page 62
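The power-on LED rows of Table 4 amount to a lookup from an observed state to an explanation and an action. The following Python sketch shows one way to encode them. The names `LedEntry`, `POWER_ON_LED`, and `lookup_power_on_led` are hypothetical, and the action text for the "not lit" state is a condensed paraphrase of the four steps in the table.

```python
from typing import NamedTuple


class LedEntry(NamedTuple):
    explanation: str
    action: str


# Keys are lower-case forms of the states listed in Table 4.
POWER_ON_LED = {
    "flashing rapidly": LedEntry(
        "The service processor on the blade server is communicating "
        "with the BladeCenter Management Module.",
        "No action required.",
    ),
    "flashing slowly": LedEntry(
        "The blade server has power but is not turned on.",
        "Turn on if required.",
    ),
    "lit continuously": LedEntry(
        "The blade server has power and is turned on.",
        "No action required.",
    ),
    "not lit": LedEntry(
        "Blade server not powered.",
        # Condensed paraphrase of steps 1 to 4 in the table.
        "Verify 12 V dc supply, reseat the blade server, check power "
        "supplies 3 and 4, then see 'Power problems'.",
    ),
}


def lookup_power_on_led(state: str) -> LedEntry:
    """Return the table entry for an observed power-on LED state."""
    key = state.strip().lower().rstrip(".")
    if key not in POWER_ON_LED:
        raise ValueError(f"unknown power-on LED state: {state!r}")
    return POWER_ON_LED[key]


if __name__ == "__main__":
    entry = lookup_power_on_led("Flashing slowly")
    print(entry.explanation, "->", entry.action)
```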
Problems indicated by the system board LEDs
The blade server must be removed from the BladeCenter unit and the cover removed before you can use the light path LEDs for diagnostics. To activate the light box and the other light path LEDs, press the light path diagnostics switch.
Figure 27. Light box and system board LEDs (callouts: modular flash drive error LED, light path diagnosis switch, light path diagnosis LED, I/O buffer DIMM error LEDs, system memory DIMM error LEDs, NMI error LED, CPU fail LED, system board LED, temperature fault LED)
The location of each LED on the system board is shown in the table below.
Table 5. System board LEDs

Each entry is listed as: LED (color, board location): explanation.

Status LEDs
Comments: The status LEDs are listed for reasons of completeness since they are for use by IBM service only and are not normally visible. They are not activated by the light path diagnostics switch.
  Heartbeat (Green, D16): Indicates the BMC is functional.
  Alert (Yellow, D15): Indicates an error condition has occurred on the system board.
  Ethernet 1 activity (Green, D12): Indicates on-board Ethernet 1 is active and sending or receiving packets.
  Ethernet 0 activity (Green, D11): Indicates on-board Ethernet 0 is active and sending or receiving packets.
  BE0_PLL_LOCK (Green, D8): Indicates the phased lock loop of Cell BE-0 is working.
  BE1_PLL_LOCK (Green, D13): Indicates the phased lock loop of Cell BE-1 is working.
  MM_SELECT_A (Green, D19): Indicates Advanced Management Module A is active.
  MM_SELECT_B (Green, D18): Indicates Advanced Management Module B is active.

Light path LEDs
  DIMM at I/O BUF 1 error (Yellow, DS9)
  DIMM at I/O BUF 2 error (Yellow, DS10)
    Explanation: There has been a failure in the I/O DIMM module. See Figure 27 on page 59 for the location of each DIMM and its associated LED.
    Comments: Either remove or replace the faulty DIMM. Note that if you remove one I/O buffer DIMM you must remove the other. Reboot.
  System DIMM1 error (Yellow, DS1)
  System DIMM2 error (Yellow, DS2)
  System DIMM3 error (Yellow, DS3)
  System DIMM4 error (Yellow, DS4)
  System DIMM5 error (Yellow, DS1)
  System DIMM6 error (Yellow, DS6)
  System DIMM7 error (Yellow, DS7)
  System DIMM8 error (Yellow, DS8)
    Explanation: There has been a failure in the system DIMM module. See Figure 27 on page 59 for the location of each DIMM and its associated LED.
    Comments: Either remove or replace the faulty DIMM and reboot. See “Adding or changing I/O buffer DDR2 memory modules” on page 41 for supported memory configurations.

Light box LEDs
  Temperature fault (Yellow, Light box)
    Explanation: The blade server has exceeded the operational temperature range.
    Comments:
    • Using the Advanced Management Module, check that the BladeCenter unit cooling system is operating correctly.
    • Replace any missing filler blades in the BladeCenter unit.
    • Replace any missing DIMM fillers in the BladeCenter QS22 I/O buffer DIMM slots.
    • Check that other blade servers are operating within the recommended temperature range.
    • Replace the blade server, power on and boot. Check Advanced Management Module for errors.
    If the problem persists, contact your IBM service representative as the system board may need servicing.
  NMI error (NMI) (Yellow)
    Explanation: The NMI pinhole reset on the front panel has been pressed.
    Comments: Pressing the reset causes the operating system to call the system debugger.
  CPU fail (Yellow)
    Explanation: One of the Cell BE processors has failed.
    Comments: Contact your IBM service representative as the system board needs replacement.
  System board (Yellow)
    Explanation: A critical error has occurred in a component on the system board.
    Comments: Contact your IBM service representative as the system board may need replacing.
  Light path diagnostics (Green)
    Explanation: Lights when the light path diagnostics switch is pressed. Indicates that the capacitor is charged and the light path LEDs can light to show any errors.
    Comments: If this LED does not light then the light path LEDs cannot function. Reinstall the blade server in the BladeCenter unit and power on to recharge. If this fails to resolve the problem, there is a problem with the system board and it may need replacement.
Power problems
Power symptom: The blade server does not turn on.
Suggested action:
1. Make sure that:
a. The power-on LED on the front of the BladeCenter unit is lit.
b. The LEDs on all the BladeCenter power modules are lit.
c. The power-on LED on the blade-server control panel is flashing slowly.
• The power-on LED only flashes rapidly while it is communicating with the management module. If the power-on LED is flashing rapidly and continues to do so for an unduly long time, the blade server is not communicating with the management module. Power off, reseat the blade server and reboot.
• If the power LED is off, either the blade bay is not receiving power, the blade server is defective, the Advanced Management Module firmware is an earlier version and does not support this function, or the LED information panel is loose or defective.
d. Local power control for the blade server is enabled. Check using the Advanced Management Module Web interface. The blade server might have been instructed through the Advanced Management Module to turn on.
2. If you have just installed a new option in the blade server, remove it, and restart the blade server. If the blade server now powers on, troubleshoot the option. See the documentation that comes with the option for further information.
3. Try another blade server in the blade bay. If it works, you may need to have a trained service technician replace the system blade assembly.
Power throttling
Be aware that the BladeCenter unit automatically reduces the BladeCenter QS22 processor speed if certain conditions are met. One such condition is temperature thresholds being exceeded, for example, when the blade server is running in acoustic mode. This throttling occurs independent of your power configuration. Full processor speed is restored automatically when the conditions that have caused the throttling have been resolved.
Network connection problems
Network connection symptom: One or more blade servers are unable to communicate with the network.
Suggested action:
Make sure that:
• The switch modules for the network interface being used are installed in the correct BladeCenter bays and are configured and operating correctly.
• The settings in the switch module are correct for the blade server (settings in the switch module are blade server specific).
For additional information, see:
• Chapter 2, “Configuring the blade server,” on page 9
• The Problem Determination and Service Guide for your BladeCenter unit
• Other product-specific documentation that comes with the switch module
Note: For the latest editions of the IBM BladeCenter documentation, go to http://www.ibm.com/support/us/en/.
If the problem remains, see “Solving undetermined problems” on page 108.
If all the blades cannot communicate with the network, check the network itself for problems.
Service processor problems
Service processor symptom Suggested action
Service processor reports a general monitor failure.
1. If the blade server is operating, shut down the operating system.
2. If the blade server was not turned off, press the power-control button (behind the blade server control-panel door) to turn off the server.
3. Remove the blade server from the BladeCenter unit.
4. Wait 30 seconds and reinstall the blade server into the BladeCenter unit.
5. Restart the blade server.
If the problem remains, see “Solving undetermined problems” on page 108.
Software problems
Symptom Suggested action
You suspect a software problem.
1. To determine whether the problem is caused by the software, make sure that:
• The blade server has at least the minimum memory that is needed to use the software. For memory requirements, see the software documentation.
• The blade server has a supported DIMM configuration (see Table 2 on page 39).
• The software is designed to operate on the blade server.
• Other software works on the blade server.
• The software works on another server.
2. If you received any error messages when using the software, see the software documentation for a description of the messages and suggested solutions to the problem.
3. Contact the software vendor.
Recovering the system firmware code
The system firmware is contained in two separate images in the flash memory of the blade server: temporary and permanent. These images are referred to as TEMP and PERM, respectively. The system normally starts from the TEMP image, and the PERM image serves as a backup. If the TEMP image becomes damaged, such as from a power failure during a firmware update, the system automatically starts from the PERM image.
If the TEMP image is damaged, you can recover the TEMP image from the PERM image. See “Recovering the TEMP image from the PERM image” for further information.
Checking the boot image
To check whether the system has started from the PERM image, enter:
cat /proc/device-tree/openprom/ibm,fw-bank
A P is returned if the system has started from the PERM image.
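The check above can be wrapped in a small shell function for use in scripts. This is a minimal sketch, not an IBM-supplied tool: the function name fw_bank_name and the variable FW_BANK_FILE are hypothetical, and only the value P is interpreted, because that is the only value this guide documents.

```shell
# Sketch: report whether the blade server started from the PERM image.
# FW_BANK_FILE is the device-tree path quoted in this section.
FW_BANK_FILE="/proc/device-tree/openprom/ibm,fw-bank"

# fw_bank_name [FILE]
# Prints "PERM" when the file contains P, "not PERM (value: X)" for any
# other value, and "unknown" (exit status 1) when the file cannot be read.
fw_bank_name() {
    file="${1:-$FW_BANK_FILE}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # Device-tree properties can be NUL-terminated; strip NULs and whitespace.
    bank=$(tr -d '\000\n ' < "$file")
    case "$bank" in
        P) echo "PERM" ;;
        *) echo "not PERM (value: $bank)" ;;
    esac
}
```

Calling fw_bank_name with no argument reads the default path on the running blade server. Passing a file name as the first argument makes the function usable against a saved copy of the property.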
Booting from the TEMP image
To initiate a boot from the TEMP image after the system has booted from the PERM side, complete the following steps:
1. Turn off the blade server.
2. Restart the blade system management processor from the Advanced Management Module.
3. Turn on the blade server.
Note: If the TEMP side is corrupted, the boot times out, and an automatic reboot occurs after switching to the PERM side.
If the blade server does not restart, you must replace the system board assembly. Contact a service support representative for assistance.
Recovering the TEMP image from the PERM image
To recover the TEMP image from the PERM image, you must copy the PERM image into the TEMP image. To perform the copy, complete the following steps:
1. Copy the PERM image to the TEMP image. Using the Linux operating system, type the following command:
update_flash -r
2. Shut down the blade server using the operating system.
3. Restart the blade system management processor from the Advanced Management Module.
4. Turn on the blade server.
You might need to update the firmware code to the latest version. See “Installing the system firmware” on page 20 for more information on updating the firmware code.
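When step 1 of this procedure is scripted, a dry-run guard helps avoid starting the copy by accident. The following is a sketch only: the function name recover_temp_image and the DRY_RUN variable are hypothetical and are not part of update_flash. The only command taken from this guide is update_flash -r.

```shell
# Sketch: guard the documented "update_flash -r" step with a dry-run default.
# DRY_RUN is a hypothetical variable introduced for this example.
# DRY_RUN=1 (default) only prints the command; DRY_RUN=0 runs it.
recover_temp_image() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: update_flash -r"
        return 0
    fi
    if ! command -v update_flash >/dev/null 2>&1; then
        echo "update_flash not found" >&2
        return 127
    fi
    # Copies the PERM image to the TEMP image, as described in step 1.
    update_flash -r
}
```

Steps 2 through 4 (shutting down, restarting the blade system management processor, and turning on the blade server) are still performed as described in the procedure.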
Supported boot media
The BladeCenter QS22 can boot from the operating system installation CDs or DVDs to allow the operating system to be installed.
The BladeCenter QS22 can also boot from:
• The network
• Modular flash drive, if installed
• SAS storage, if attached
• Local BladeCenter unit SAS drives, if the blade server is installed into an IBM BladeCenter S unit that provides local SAS drives
If you wish to perform a standard Bootp/TFTP network boot, note the following restrictions:
• Only the built-in Gigabit Ethernet Controller of the I/O Bridge is supported.
• Booting is possible only through the Ethernet switch on the top side of the BladeCenter unit.
• No fallback or configurable change to the bottom switch is possible.
• In the Advanced Management Module, you need to set the boot list to Network.
• There is no support for a router between the blade server and the TFTP server. Only local TFTP is supported.
Use the Advanced Management Module to configure the required boot mode. See the IBM BladeCenter Management Module Installation Guide for more information.
Booting the system
This section provides an overview on how to interpret the console output of the host firmware. The output is grouped into several parts, which are detailed below.
Note: The firmware console output depends on the configuration of your system. The examples below are indicative only and may not reflect the configuration of your system.
1. The first part of the boot process shows the system name and build date. You see an error at this point if the firmware image is corrupted.
2. Memory initialization follows next. It can take several seconds to initialize the system memory. The screen displays details of the size and the speed of memory modules.
***************************************************************************
QS22 Firmware Starting
Check ROM = OK
Build Date = Jan 4 2008 11:31:29
FW Version = "QD-1.26.0-0"

Press "F1" to enter Boot Configuration (SMS)
Press "F2" to boot once from CD/DVD

DDR2 MEMORY INITIALIZATION
CPU0 DIMMs: DIMM1=1024MB DIMM2=1024MB DIMM3=n/a DIMM4=n/a
CPU0 timings: 800 MHz, CL=6, tRCD=6, tRP=6
CPU0 memory test: ok
CPU1 DIMMs: DIMM5=1024MB DIMM6=1024MB DIMM7=n/a DIMM8=n/a
CPU1 timings: 800 MHz, CL=6, tRCD=6, tRP=6
CPU1 memory test: ok
COMPLETE
3. The next screen displays system information. It shows revision information about the chip set, SMP size, boot date/time, and the available memory.
SYSTEM INFORMATION
Processor = PowerXCell DD1.0 @ 3200 MHz
I/O Bridge = Cell BE companion chip DD3.x
Timebase = 26666 kHz (internal)
SMP Size = 2 (4 threads)
Boot-Date = 2008-01-16 11:25
Memory = 4096MB (CPU0: 2048MB, CPU1: 2048MB)
4. The next screens show the Open Firmware section of the boot process and provide checkpoints and an overview of which adapters are available in the system. The details in the adapter list are not meaningful.
Note: The warning (!) Permanent Boot ROM is displayed if there is a problem with the TEMP image and the system firmware is running from the PERM image. You should correct this problem as soon as possible. See “Recovering the TEMP image from the PERM image” on page 64 for further information.
OPEN FIRMWARE
Adapters on 000001460ec00000
00 0800 (D) : 14e4 16a8 network [ ethernet ]
00 0900 (D) : 14e4 16a8 network [ ethernet ]
Adapters on 000001a040000000
00 0000 (B) : 1014 032c pci
Adapters on 000001a240000000
00 0000 (B) : 1014 032c pci
IOBUF1 initializing... no DIMM.
Adapters on 000003460ec00000
00 0800 (D) : 1033 0035 usb-ohci ( NEC uPD720101 )
00 0900 (D) : 1033 0035 usb-ohci ( NEC uPD720101 )
00 0a00 (D) : 1033 00e0 usb-ehci*
Adapters on 000003a040000000
00 0000 (B) : 1014 032c pci
Adapters on 000003a240000000
00 0000 (B) : 1014 032c pci
IOBUF2 initializing... no DIMM.
SB0 Monitor started
SB1 Monitor started
Scan USB... uDOC not present
Ready
Welcome to Open Firmware
Licensed Internal Code - Property of IBM
(c) Copyright IBM Corp. 2005, 2007 All Rights Reserved.
Cell/B.E. is a trademark of SONY Computer Entertainment Inc.
Type ’boot’ and press return to continue booting the system.
Type ’sms-start’ and press return to enter the configuration menu.
Type ’reset-all’ and press return to reboot the system.
disable nvram logging .. done
The operating system now boots unless you have pressed F1, in which case the SMS menu starts. See “Using the SMS utility program” on page 11 for further information.
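If you capture the console output to a file, a few shell helpers can pull out the values described in this section. This is a sketch under stated assumptions: the function names fw_banner_field and booted_from_perm_warning are hypothetical, each banner field is assumed to appear on its own line as "Name = value", and the PERM warning is assumed to contain the text Permanent Boot ROM exactly as quoted in the note above.

```shell
# fw_banner_field NAME
# Reads console text on stdin and prints the value of the first line of
# the form "NAME = value", with surrounding double quotes removed.
# NAME must not contain "/" because it is used inside a sed expression.
fw_banner_field() {
    tr -d '\r' |
        sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" |
        head -n 1 |
        tr -d '"'
}

# booted_from_perm_warning
# Reads console text on stdin; exit status 0 if the PERM warning text
# quoted in this guide is present, nonzero otherwise.
booted_from_perm_warning() {
    grep -q 'Permanent Boot ROM'
}
```

For example, if the console output was saved to a file named console.log (a hypothetical name), the command fw_banner_field 'FW Version' < console.log prints the firmware version string from the banner.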
Diagnostic programs and messages
The Dynamic System Analysis (DSA) Preboot diagnostic programs are the primary method of testing the major components of the server. DSA is a system information collection and analysis tool that you can use to provide information to IBM service and support to aid in the diagnosis of system problems. The DSA diagnostic programs come on the IBM Dynamic System Analysis Preboot Diagnostic CD. You can download the CD from http://www.ibm.com/support/us/en if one did not come with your server. As you run the diagnostic programs, text messages are displayed on the screen and are saved in the test log. A diagnostic text message indicates that a problem has been detected and indicates the action you should take as a result of the text message.
The DSA diagnostic programs collect information about the following aspects of the system:
• System configuration
• Network interfaces and settings
• Hardware inventory USB information
• IBM LightPath diagnostics status
• Service processor status and configuration
• Vital product data and system firmware information
• Drive Health Information
• LSI RAID & Controller configuration
The DSA diagnostic programs can also provide diagnostics for the following system components:
• Baseboard Management Controller
• Memory stress
• Open Firmware Memory Diagnostics
• CPU stress
Additionally, DSA creates a merged log that includes events from all collected logs.
All collected information can be output as a compressed XML file that can be sent to IBM Service. Additionally, you can view the information locally through a generated text report file. Optionally, the generated HTML pages may be copied to removable media and viewed from a web browser.
Running diagnostics and preboot DSA
To run the diagnostic programs, complete the following steps:
1. If the server is running, turn off the server and all attached devices.
2. Turn on all attached devices, then turn on the server.
3. Ensure that external DSA bootable media is available as a boot device. For boot device selection, the system firmware works through the boot path specified in the onboard planar VPD and tries to establish communication with the specified interfaces in sequential order. These boot devices include the USB-attached DVD (BladeCenter media tray), the SAS storage if attached, network-attached storage, and, if the blade server is installed into an IBM BladeCenter S unit that provides local SAS drives, the local BladeCenter unit SAS drives.
4. Press F2 to enter DSA when the POST menu displays the following screen:
***************************************************************************
QS22 Firmware Starting
Check ROM = OK
Build Date = Jan 4 2008 11:31:29
FW Version = "QD-1.26.0-0"

Press "F1" to enter Boot Configuration (SMS)
Press "F2" to boot once from CD/DVD
5. The command line interface prompt will then appear on the SOL connection. The BladeCenter QS22 does not support the graphical user interface.
6. Follow the on-screen directions to run preboot DSA. Diagnostics are run from within preboot DSA.
When you are using the CPU or Memory stress tests, call your IBM service representative if you experience any system instability.
To determine what action you should take as a result of a diagnostic text message, see “DSA error messages” on page 69.
Open firmware memory diagnostic results are output to the SOL connection. They are also logged in NVRAM. All NVRAM logs (more than just OF diags) are collected as part of the DSA merged log.
If the diagnostic programs do not detect any hardware errors but the problem remains during normal server operations, a software error might be the cause. If you suspect a software problem, see the information that comes with your software.
A single problem might cause more than one error message. When this happens, correct the cause of the first error message. The other error messages usually will not occur the next time you run the diagnostic programs.
If there are multiple error codes or light path diagnostics LEDs that indicate a microprocessor error, the error might be in a microprocessor or in a microprocessor socket. See Table 21 on page 100 for further information about diagnosing microprocessor problems.
If the server stops during testing and you cannot continue, restart the server and try running the diagnostic programs again.
Diagnostic text messages
Diagnostic text messages are displayed while the tests are running. A diagnostic text message contains one of the following results:
• Passed: The test was completed without any errors.
• Failed: The test detected an error.
• Aborted: The test could not proceed because of the server configuration.
Additional information concerning test failures is available in the extended diagnostic results for each test.
Viewing the test log
To view the test log when the tests are completed, issue the view command from the DSA command line interface. DSA collections may also be transferred to an external USB device using the copy command from the DSA command line interface.
DSA error messages
The tables below describe the messages that the diagnostic programs might generate and suggested actions to correct the detected problems. Follow the suggested actions in the order given.
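The test numbers in these tables have three hyphen-separated fields, for example 089-901-xxx. When you process a saved test log, the first two fields are enough to find the matching table row. The helpers below are a sketch only: the function names are hypothetical, and the mapping covers just the first fields that appear in Tables 6 through 9 of this section.

```shell
# dsa_test_area NUMBER
# Maps the first field of a DSA test number to the table that covers it
# in this section. Unknown prefixes print "unknown" with exit status 1.
dsa_test_area() {
    case "${1%%-*}" in
        089) echo "CPU test results" ;;
        166) echo "BMC test results" ;;
        201|202) echo "Memory test results" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# dsa_result_code NUMBER
# Prints the middle field of a DSA test number, e.g. 901 for 089-901-xxx.
dsa_result_code() {
    rest="${1#*-}"
    echo "${rest%%-*}"
}
```

The status (Pass, Fail or Abort) is deliberately not derived from the middle field. Look it up in the table row for the full number.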
CPU test results
Table 6. CPU test results
Test             Number       Status  Extended results
CPU stress test  089-901-xxx  Fail    Test failure
                 089-802-xxx  Abort   System resource availability error
                 089-801-xxx  Abort   Internal program error

Actions:
1. If the system has stopped responding, turn off and restart the system and then run the test again.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Run the test again.
4. Check system firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
5. Run the test again.
6. If the system has stopped responding, turn off and restart the system and then run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
BMC test results
Table 7. BMC test results
Test      Number       Status  Extended results
I2C test  166-901-xxx  Fail    The BMC indicates a failure in the IPMB bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 7. BMC test results (continued)
Test  Number       Status  Extended results
      166-902-xxx  Fail    The BMC indicates a failure in the memory card bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the reported memory size is the same as the installed memory size, complete the following steps. Otherwise, go to step 8.
   a. Turn off the system and disconnect it from power.
   b. Reseat all the system DIMMs within the system.
   c. Reconnect the system to power and turn on the system.
   d. Run the test again.
8. Turn off the system and disconnect it from power.
9. Remove all the system memory.
10. Install the minimum memory configuration for the system. See Table 2 on page 39 for supported memory configurations.
11. Reconnect the system to power and turn on the system.
12. Make sure that the reported memory size is the same as the installed memory size.
13. Run the test again. If the memory passes the test, one of the uninstalled memory cards or DIMMs is the failing component.
14. Repeat steps 8 through 13 as necessary, using different memory cards and DIMMs, to isolate the failing component. It is important to change only one element each time in order to identify the specific cause of the error.
15. Replace the failing memory card or DIMM.
Table 7. BMC test results (continued)
Test  Number       Status  Extended results
      166-903-xxx  Fail    The BMC indicates a failure in the Ethernet sideband bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Check Ethernet device firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
7. Run the test again.
8. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 7. BMC test results (continued)
Test  Number       Status  Extended results
      166-904-xxx  Fail    The BMC indicates a failure in the main bus.
      166-905-xxx  Fail    The BMC indicates a failure in the pecos bus.
      166-906-xxx  Fail    The BMC indicates a failure in the BMC private bus.
      166-907-xxx  Fail    The BMC indicates a failure in the power backplane bus.
      166-908-xxx  Fail    The BMC indicates a failure in the microprocessor bus.
      166-910-xxx  Fail    The BMC indicates a failure in the PCIe and Light path diagnostics bus.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 8. BMC test results
Test  Number       Status  Extended results
      166-801-xxx  Abort   BMC I2C test canceled: the BMC returned an incorrect response length.
      166-802-xxx  Abort   BMC I2C test canceled: the test cannot be completed for an unknown reason.
      166-803-xxx  Abort   BMC I2C test canceled: the node is busy; try later.
      166-804-xxx  Abort   BMC I2C test canceled: invalid command.
      166-805-xxx  Abort   BMC I2C test canceled: invalid command for the given LUN.
      166-806-xxx  Abort   BMC I2C test canceled: timeout while processing the command.
      166-807-xxx  Abort   BMC I2C test canceled: out of space.
      166-808-xxx  Abort   BMC I2C test canceled: reservation canceled or invalid reservation ID.
      166-809-xxx  Abort   BMC I2C test canceled: request data was truncated.
      166-810-xxx  Abort   BMC I2C test canceled: request data length is invalid.
      166-811-xxx  Abort   BMC I2C test canceled: request data field length limit is exceeded.
      166-812-xxx  Abort   BMC I2C test canceled: a parameter is out of range.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 8. BMC test results (continued)
Test  Number       Status  Extended results
      166-813-xxx  Abort   BMC I2C test canceled: cannot return the number of requested data bytes.
      166-814-xxx  Abort   BMC I2C test canceled: requested sensor, data, or record is not present.
      166-815-xxx  Abort   BMC I2C test canceled: invalid data field in the request.
      166-816-xxx  Abort   BMC I2C test canceled: the command is illegal for the specified sensor or record type.
      166-817-xxx  Abort   BMC I2C test canceled: a command response could not be provided.
      166-818-xxx  Abort   BMC I2C test canceled: cannot execute a duplicated request.

Actions:
1. Turn off the system and disconnect it from power. The system must be removed from AC power in order to reset the BMC.
2. After 45 seconds, reconnect the system to power and turn on the system.
3. Run the test again.
4. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
5. Check BMC firmware level and upgrade if necessary. The installed firmware level can be found in the DSA Diagnostic Event Log within the Firmware/VPD section for this component. The latest level firmware for this component can be found on the IBM Support Web site at http://www.ibm.com/support/us/en/.
6. Run the test again.
7. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
Table 8. BMC test results (continued)
Test  Number       Status  Extended results
      166-819-xxx  Abort   BMC I2C test canceled: a command response could not be provided; the SDR repository is in update mode.
      166-820-xxx  Abort   BMC I2C test canceled: a command response could not be provided; the device is in firmware update mode.
      166-821-xxx  Abort   BMC I2C test canceled: a command response could not be provided; BMC initialization is in progress.
      166-822-xxx  Abort   BMC I2C test canceled: the destination is unavailable.
      166-823-xxx  Abort   BMC I2C test canceled: cannot execute the command; insufficient privilege level.
      166-824-xxx  Abort   BMC I2C test canceled: cannot execute the command.
      166-000-xxx  Pass
Memory tests
Table 9. Memory test results
Test                Number       Status  Extended results
Memory stress test  201-000-xxx  Pass
Table 9. Memory test results (continued)
Test  Number       Status  Extended results
      202-802-xxx  Fail    General error: memory size is insufficient to run the test.
      202-901-xxx  Fail    Test failure.
      202-801-xxx  Abort   Internal program error.
      202-000-xxx  Pass

Actions for 202-802-xxx:
1. Ensure all memory is enabled by checking Available System Memory in the Resource Utilization section of the DSA Diagnostic Event Log.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Run the test again.
4. Execute the standard DSA memory diagnostic to validate all memory.
5. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.

Actions for 202-901-xxx:
1. Execute the standard DSA memory diagnostic to validate all memory.
2. Make sure that the DSA Diagnostic code is at the latest level. The latest level DSA Diagnostic code can be found on the IBM Support Web site at http://www.ibm.com/support/docview.wss?uid=psg1SERV-DSA/.
3. Turn off the system and disconnect it from power.
4. Reseat the DIMMs.
5. Reconnect the system to power and turn on the system.
6. Run the test again.
7. Execute the standard DSA memory diagnostic to validate all memory.
8. If you cannot reproduce the problem, contact your IBM technical-support representative.

Actions for 202-801-xxx:
1. Turn off and restart the system.
2. Make sure that the system firmware code and DSA code are at the latest level.
3. Run the test again.
4. Turn off and restart the system if necessary to recover from a hung state.
5. Run the memory diagnostic to identify the specific failing DIMM.
6. If the test continues to fail, refer to the other sections of this chapter for diagnosis and corrective action.
System firmware startup messages
The system firmware displays the progress of the startup process on the serial console from the time that ac power is connected to the system until the operating system login prompt is displayed following a successful operating system startup.
If a serial console is not connected, you can use the Advanced Management Module to monitor the logs and display informational and error messages.
If the firmware encounters an error during the startup process, a message describing the error together with an error code is displayed on the serial console.
There are two types of error, where xxx represents the number of the error code:
Cxxx This is an internal checkpoint. If the system stops during the startup process, a checkpoint may be displayed.
Exxx This type of error means that there is a failure that does not allow the firmware to continue the startup process. Check the error codes in the section “Boot errors and handling” on page 77. If these do not help resolve the problem, contact a service support representative.
There are cases where a message that is informational only is displayed on the serial console.
Wxxx This is a warning message. The firmware allows the startup process to continue, but indicates there may be a problem. A warning message can be combined with an error message to give more complete information about an error.
A complete list of possible messages is given in the section “Boot errors and handling” on page 77.
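The three prefixes described above can be told apart with a short shell function when you scan a saved serial console log. This is a minimal sketch: the function name fw_message_type is hypothetical, and the code after the prefix letter is assumed to consist of decimal digits, as in the codes listed in this chapter.

```shell
# fw_message_type CODE
# Classifies a firmware message code by its prefix letter:
#   Cxxx -> checkpoint, Exxx -> error, Wxxx -> warning.
# Anything else prints "unknown" with exit status 1.
fw_message_type() {
    case "$1" in
        C[0-9]*) echo "checkpoint" ;;
        E[0-9]*) echo "error" ;;
        W[0-9]*) echo "warning" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

For example, fw_message_type E3400 prints error, which means the firmware could not continue the startup process. fw_message_type W3411 prints warning, which means the startup process was allowed to continue.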
Checkpoints
Checkpoints show the progress of the boot. Each checkpoint is overwritten by the next as the boot process continues. If the boot process stops for any reason, a checkpoint may be displayed. Take note of the checkpoint code and any message, then attempt to reboot the blade server.
If the problem persists, contact your IBM service representative with details of the checkpoint and any message associated with it.
Boot errors and handling
The following sections describe boot errors and actions you can take to resolve these errors.
Boot list
The following table describes boot list errors.
Table 10. System firmware boot list errors
E3400
Message: It was not possible to boot from any device specified in the VPD
Description: The firmware found a valid VPD but was not able to find bootable code on any of the devices listed in it.
Action: Use the Advanced Management Module Web interface to specify at least one device that contains bootable code. From the Advanced Management Module Web interface, choose Blade Tasks > Configuration > Boot Sequence.
Table 10. System firmware boot list errors (continued)
E3401
Message: Aborting boot, <details>
Description: Boot aborted due to an error detected by the low level code. The <details> string provides the error description.
Action: Based on the <details> string, you may have to take an action on faulty hardware or use the Advanced Management Module to correct the system configuration. If the problem persists, contact your IBM service representative.

E3402
Message: Aborting boot, internal error.
Description: Boot aborted due to an error detected by the low level code. The exact reason is unknown but could be a firmware problem.
Action: If the problem persists, contact your IBM service representative.

E3403
Message: Bad executable: <details>
Description: The file loaded from the boot device is not a valid PPC executable ELF file. The <details> string provides more details about the file type.
Action: Using the Advanced Management Module, correct the boot device configuration. Select a valid boot device and executable path.

E3404
Message: Not a bootable device!
Description: The system cannot load an executable file from this device.
Action: Using the Advanced Management Module, correct the boot device configuration. Select a valid boot device and executable path.

E3405
Message: No such device
Description: The specified boot device is currently not present or not ready for access.
Action: Check the hardware device or use the Advanced Management Module to correct the system configuration. If the problem persists, contact your IBM service representative.

E3406
Message: Client application returned an error: <details>
Description: The OS or a standalone application returned an error code to the system firmware. The <details> string provides the error description.
Action: Based on the <details> string, you may have to take an action on faulty hardware or use the Advanced Management Module to correct the system configuration. A firmware or OS upgrade may be needed to resolve compatibility issues. If the problem persists, contact your IBM service representative.

E3407
Message: Load failed
Description: Load or boot failed to load the requested file from the device. This is an informational message and may be preceded by one or more other error messages.
Action: Based on the preceding error messages, you may have to take an action on faulty hardware or use the Advanced Management Module to correct the system configuration.

E3408
Message: Failed to claim memory for the executable
Description: An attempt to load an executable file from the boot device failed due to insufficient memory or a firmware problem.
Action: Verify that the loaded file was indeed the right executable intended to boot this system. If not, use the Advanced Management Module to correct the system configuration. Otherwise, contact your IBM service representative. You may need to add more memory to the system or to perform a firmware upgrade.
Table 10. System firmware boot list errors (continued)
Code Message Description Action
E3409 Unknown FORTH Word
Description: Internal code error, or compatibility issue.
Action: Contact your IBM service representative. You may need to perform a firmware upgrade.
E3410 Boot list successfully read from VPD but no useful information received.
Description: The firmware found a valid VPD but was not able to find bootable code on any of the devices listed in it.
Action: Use the Advanced Management Module Web interface to specify at least one device that contains bootable code. From the Advanced Management Module Web interface, choose Blade Tasks > Configuration > Boot Sequence.
W3411 Client application returned.
Description: The loaded OS or standalone application returned to firmware. This may be a normal condition, or the firmware could not detect an error issued by the client application. Booting from the boot-device list is interrupted at this stage, and no further attempts to boot from devices in the list will be made.
Action: None needed. If the boot loader (for example, yaboot) exited because it needs to boot from a different device in the list, either boot manually from the firmware (ok) prompt or use the Advanced Management Module to change the boot device order in the system configuration.
E3420 Boot list could not be read from VPD.
Description: The firmware found an invalid VPD. Possibly it has been corrupted by the system software.
Action: The VPD must be rewritten. Use the Advanced Management Module Web interface to specify at least one device that contains bootable code. From the Advanced Management Module Web interface, choose Blade Tasks > Configuration > Boot Sequence. If the problem persists, contact your IBM service representative.
System firmware update errors
The following table describes system firmware errors that can occur if there have been problems after an update.
Table 11. System firmware boot errors
Code Message Description Action
E4000 (RTAS Flash) unknown flash chip version
Description: The flash update code does not support the onboard boot ROM flash chip.
Action: Contact your IBM service representative as the system board may need replacing.
E4010 Platform check failed for image
Description: The firmware image does not match the hardware platform.
Action: Check the firmware image and ensure you have the right image for the BladeCenter QS22. See “Using the SMS utility program” on page 11. If the image is incorrect, download and install the correct image from http://www.ibm.com/support/us/en/. See “Updating the system and BMC firmware” on page 15 for further information.
E4020 (RTAS flash) image corrupted (CRC)
Description: The image for a system firmware update is corrupted.
Action: Download the image again and reapply the update. If this does not resolve the problem, apply an image from a different source.
System memory errors
The following table describes system memory initialization errors that can occur during boot.
Table 12. DIMM boot errors
Code Message Description Action
E1100 Incompatible DIMM in slot x. Disabling slots x and y.
where x is the slot containing the incompatible DIMM and y is the slot containing its pair.
Description: Incompatible DIMM. The DIMM does not match the technical requirements of the memory controller. Note: Both the offending DIMM and its pair are disabled. System DIMMs must operate in pairs.
Action: Replace the DIMM with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
W1110 Unsupported DIMM in slot x.
where x is the slot containing the unsupported DIMM.
Description: Unsupported DIMM. The DIMM is within the technical requirements of the memory controller but is untested.
Action: Replace the DIMM with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
E1120 Unsupported plugging: No pair in slots x and y. Disabling slots.
where x and y form the pair of slots on a given channel.
Description: Plugging rule violation. No pair is plugged on a given channel.
Action: Check DIMM configuration. See “Adding or changing system memory” on page 39 for details of permitted configurations.
W1130 Unsupported plugging.
Description: Plugging sequence violation. DIMMs are not plugged in a supported sequence.
Action: Check DIMM configuration. See “Adding or changing system memory” on page 39 for details of permitted configurations.
This message appears with one of the following informational messages:
No local memory on CPU1
Description: No local memory on IBM PowerXCell 8i processor 1. DIMMs are expected on IBM PowerXCell 8i processor 1, but none were found.
No local memory on CPU2
Description: No local memory on IBM PowerXCell 8i processor 2. DIMMs are expected on IBM PowerXCell 8i processor 2, but none were found.
Slots 1 and 2 should both have DIMMs plugged
Description: Either slot 1 or slot 2 has no DIMM plugged, but both should be populated.
Slots 5 and 6 should both have DIMMs plugged
Description: Either slot 5 or slot 6 has no DIMM plugged, but both should be populated.
Slots [3,7] and [4,8] should have both DIMMs plugged
Description: Slots 3, 4, 7, and 8 are not completely plugged. Slots 1, 2, 5, and 6 are plugged and additional slots are plugged, but not all of 3, 4, 7, and 8.
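The slot rules behind the W1130 informational messages can be sketched as a small validator. This is an illustrative sketch only, not the firmware's implementation: the function name `check_plugging` and its input (a set of populated slot numbers) are assumptions. The two "No local memory" checks are left out because this table does not state which slots belong to which processor, so a pair with both slots empty is not reported here.

```python
def check_plugging(populated):
    """Return W1130 informational messages for a set of populated slots (1-8).

    Hypothetical helper: it encodes only the three slot rules listed in
    the table and ignores the per-processor "No local memory" checks.
    """
    populated = set(populated)
    messages = []

    # Slots 1 and 2 form a pair: report when exactly one of them is populated.
    if len(populated & {1, 2}) == 1:
        messages.append("Slots 1 and 2 should both have DIMMs plugged")

    # Slots 5 and 6 form a pair: report when exactly one of them is populated.
    if len(populated & {5, 6}) == 1:
        messages.append("Slots 5 and 6 should both have DIMMs plugged")

    # Once slots 1, 2, 5, 6 are populated, slots 3, 4, 7, 8 must be
    # either all populated or all empty.
    extra = populated & {3, 4, 7, 8}
    if {1, 2, 5, 6} <= populated and extra and extra != {3, 4, 7, 8}:
        messages.append("Slots [3,7] and [4,8] should have both DIMMs plugged")

    return messages
```

For example, `check_plugging({1, 2, 5, 6, 3, 7})` reports only the last message, because the base slots are complete but slots 4 and 8 are empty.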
E1140 Slots x and y are plugged with different DIMM types. Disabling slots.
Description: The type or the speed bin of the two DIMMs on a given channel differs.
Action: Replace the DIMM with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
E1150 DIMMs in slots x and y have different amount of ranks. Disabling slots a and b.
Description: DIMM rank count on channel differs. The number of ranks of the two DIMMs on a given channel differs.
Action: Replace the DIMMs with ones of the same type. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
E1160 DIMMs on CPUx have different types on channel 1 and 2. Disabling slots a and b.
Description: DIMM types across channels differ. The type or speed bin of the DIMMs on CH0 is different from that of the DIMMs on CH1.
Action: Replace the DIMMs with ones of the same type. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
W1170 DIMMs on CPUx have different speed bins on channel 1 and 2.
Description: DIMM speed bins across channels differ. Speed bins on CHx are greater than those on CHy.
Action: Replace the DIMMs with ones of the same type. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
W1180 DIMMs on CPUx have different rank count on channel 1 and 2. Truncating slots a and b.
Description: DIMM rank count across channels differs.
Action: Replace the DIMMs with ones of the same type. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
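As a rough aid to reading codes E1140 through W1180, the comparisons they describe can be sketched as follows. This is a simplified reading of the table, not the firmware's logic: the `Dimm` record, its field names, and both function names are assumptions, and mapping a type mismatch across channels to E1160 while a speed-bin mismatch maps to W1170 is one interpretation of the overlapping descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimm:
    # All three fields are hypothetical stand-ins for what the firmware reads.
    dimm_type: str
    speed_bin: int
    ranks: int


def check_channel(first, second):
    """Codes for the two DIMMs that share one channel."""
    codes = []
    # E1140: type or speed bin differs within the channel.
    if first.dimm_type != second.dimm_type or first.speed_bin != second.speed_bin:
        codes.append("E1140")
    # E1150: rank count differs within the channel.
    if first.ranks != second.ranks:
        codes.append("E1150")
    return codes


def check_across_channels(ch_a, ch_b):
    """Codes for one processor, given one representative DIMM per channel."""
    codes = []
    if ch_a.dimm_type != ch_b.dimm_type:
        codes.append("E1160")  # types differ across channels
    if ch_a.speed_bin != ch_b.speed_bin:
        codes.append("W1170")  # speed bins differ across channels
    if ch_a.ranks != ch_b.ranks:
        codes.append("W1180")  # rank counts differ across channels
    return codes
```

The split into a per-channel check and a cross-channel check mirrors the table: the per-channel codes are errors that disable slots, while a speed-bin or rank mismatch across channels is reported as a warning.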
E1200 Error during memtest.
Description: Memory test failed.
This message appears with one of the following informational messages:
Detected MIC SRAM parity error.
Description: MIC SRAM parity failed during memtest. MIC detected a parity error in an internal buffer.
Detected MIC Data write error.
Description: MIC DERR during memtest. MIC received a DERR condition from another on-chip unit upon a write request from that unit.
Detected single-bit ECC error (recoverable) on DIMM x or DIMM y.
Description: A correctable error occurred during memtest.
Action: Replace the DIMM where the error occurred with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
Detected multi-bit ECC uncorrectable error on DIMM x or DIMM y.
Description: An uncorrectable error occurred during memtest.
Action: Replace the DIMM where the error occurred with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
Stuck data bit on DIMM x.
Description: Stuck data bit during memtest. One or more data lanes failed in the stuck bit test.
Action: Replace the DIMM where the error occurred with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
Stuck address bit on DIMM x.
Description: Stuck address bit during memtest. One or more address lanes failed in the stuck bit test.
Action: Replace the DIMM where the error occurred with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.
E1300 No memory available at address zero. Aborting system boot.
Description: No memory is assigned to address 0. Due to error E1100, all memory attached to the CPU with address zero must be disabled.
Action: Replace the DIMM where the error occurred with a supported DIMM. See Chapter 3, “Parts listing,” on page 27 for details of supported DIMMs.