IBM BladeCenter JS12 Problem Determination And Service Manual


BladeCenter JS12 Type 7998
Problem Determination and Service Guide

Note
Before using this information and the product it supports, read the general information in Appendix B, “Notices,” on page 289 and the Warranty and Support Information document for your blade server type on the Documentation CD.
Second Edition (November 2009)
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Safety ...............v
Guidelines for trained service technicians ....vi
Inspecting for unsafe conditions ......vi
Guidelines for servicing electrical equipment . . vii
Safety statements ............viii
Chapter 1. Introduction ........1
Related documentation ...........1
Notices and statements in this documentation . . . 2
Features and specifications..........2
Supported DIMMs ............4
Blade server control panel buttons and LEDs . . . 5
Turning on the blade server .........7
Turning off the blade server .........8
System-board layouts ...........9
System-board connectors .........9
System-board LEDs ...........9
Chapter 2. Diagnostics ........11
Diagnostic tools .............12
Collecting dump data ...........13
Location codes .............14
Reference codes .............15
System reference codes (SRCs) .......16
1xxxyyyy SRCs ...........17
6xxxyyyy SRCs ...........22
A1xxyyyy service processor SRCs .....24
AA00E1A8 to AA260005 Partition firmware
attention codes ...........25
Bxxxxxxx Service processor early termination
SRCs ..............28
B200xxxx Logical partition SRCs .....29
B700xxxx Licensed internal code SRCs . . . 39
BA000010 to BA400002 Partition firmware SRCs ..............48
POST progress codes (checkpoints) .....88
C1001F00 to C1645300 Service processor
checkpoints ............89
C2001000 to C20082FF Virtual service
processor checkpoints .........99
IPL status progress codes .......109
C700xxxx Server firmware IPL status
checkpoints ...........109
CA000000 to CA2799FF Partition firmware
checkpoints ............110
D1001xxx to D1xx3FFF Service processor
dump codes ............131
D1xx3y01 to D1xx3yF2 Service processor
dump codes ...........138
D1xx900C to D1xxC003 Service processor
power-off checkpoints ........140
Service request numbers (SRNs) ......141
Using the SRN tables .........142
101-711 through FFC-725 SRNs .....142
A00-FF0 through A24-xxx SRNs .....159
ssss-102 through ssss-640 SRNs for SCSI
devices .............180
Failing function codes 151 through 2D02 . . 184
Error logs ..............186
Checkout procedure ...........186
About the checkout procedure.......186
Performing the checkout procedure .....187
Verifying the partition configuration......189
Running the diagnostics program ......189
Starting AIX concurrent diagnostics .....189
Starting stand-alone diagnostics from a CD . . 190
Starting stand-alone diagnostics from a NIM
server ...............191
Using the diagnostics program ......192
Boot problem resolution ..........193
Troubleshooting tables ..........194
General problems ...........195
Hard disk drive problems ........195
Intermittent problems .........196
Keyboard problems ..........196
Management module service processor
problems ..............197
Memory problems ...........197
Microprocessor problems ........198
Monitor or video problems ........198
Network connection problems .......200
PCI expansion card (PIOCARD) problem
isolation procedure ..........200
Optional device problems ........201
Power problems ...........202
POWER Hypervisor (PHYP) problems ....203
Service processor problems ........205
Software problems...........217
Universal Serial Bus (USB) port problems . . . 217
Light path diagnostics ..........218
Viewing the light path diagnostic LEDs . . . 218
Light path diagnostics LEDs .......219
Isolating firmware problems ........222
Recovering the system firmware .......222
Starting the PERM image ........222
Starting the TEMP image ........223
Recovering the TEMP image from the PERM
image ...............223
Verifying the system firmware levels ....224
Committing the TEMP system firmware image 224
Solving shared BladeCenter resource problems . . 225
Solving shared keyboard problems .....226
Solving shared media tray problems.....226
Solving shared network connection problems 228
Solving shared power problems ......229
Solving shared video problems ......230
Solving undetermined problems .......231
Calling IBM for service ..........232
Chapter 3. Parts listing, Type 7998 235
© Copyright IBM Corp. 2008, 2009
Chapter 4. Removing and replacing
blade server components ......239
Installation guidelines ..........239
System reliability guidelines .......240
Handling static-sensitive devices ......240
Returning a device or component .....241
Removing the blade server from a BladeCenter
unit ................241
Installing the blade server in a BladeCenter unit 242
Removing and replacing Tier 1 CRUs .....244
Removing the blade server cover ......244
Installing and closing the blade server cover . . 245
Removing the bezel assembly .......246
Installing the bezel assembly .......247
Removing a SAS hard disk drive ......248
Installing a SAS hard disk drive ......249
Removing a memory module .......251
Installing a memory module .......251
Removing the management card ......253
Installing the management card ......254
Entering vital product data ........256
Obtaining a PowerVM Virtualization Engine
system technologies activation code .....257
Removing and installing an I/O expansion card 260
Removing a small-form-factor expansion card 260
Installing a small-form-factor expansion card 261
Removing a standard-form-factor expansion card ..............263
Installing a standard-form-factor expansion
card..............264
Removing a combination-form-factor
expansion card ...........265
Installing a combination-form-factor
expansion card ...........266
Removing the battery .........267
Installing the battery ..........268
Removing the hard disk drive tray .....270
Installing the hard disk drive tray .....271
Removing the expansion bracket ......272
Installing the expansion bracket ......273
Replacing the Tier 2 system-board and chassis
assembly ...............274
Updating the firmware ..........277
Configuring the blade server ........278
Using the SMS utility...........279
Starting the SMS utility .........279
SMS utility menu choices ........280
Creating a CE login ...........280
Configuring the Gigabit Ethernet controllers . . . 281
Blade server Ethernet controller enumeration . . . 282
MAC addresses for host Ethernet adapters . . . 282
Updating IBM Director ..........283
Appendix A. Getting help and
technical assistance ........285
Before you call .............286
Using the documentation .........286
Getting help and information from the Web . . . 287
Software service and support ........287
Hardware service and support .......287
IBM Taiwan product service ........287
Appendix B. Notices ........289
Trademarks ..............290
Important notes ............291
Product recycling and disposal .......291
Battery return program ..........293
Electronic emission notices .........295
Federal Communications Commission (FCC)
statement..............295
Industry Canada Class A emission compliance
statement..............295
Avis de conformité à la réglementation
d’Industrie Canada ..........296
Australia and New Zealand Class A statement 296
United Kingdom telecommunications safety
requirement .............296
European Union EMC Directive conformance
statement..............296
Taiwanese Class A warning statement ....297
Chinese Class A warning statement .....297
Japanese Voluntary Control Council for
Interference (VCCI) statement .......297
Chapter 5. Configuring .......277
Index ...............299

Safety

Before installing this product, read the Safety Information.
Antes de instalar este produto, leia as Informações de Segurança.
Před instalací tohoto produktu si přečtěte příručku bezpečnostních instrukcí.
Læs sikkerhedsforskrifterne, før du installerer dette produkt.
Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.
Avant d’installer ce produit, lisez les consignes de sécurité.
Vor der Installation dieses Produkts die Sicherheitshinweise lesen.
Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.
Antes de instalar este produto, leia as Informações sobre Segurança.
Antes de instalar este producto, lea la información de seguridad.
Läs säkerhetsinformationen innan du installerar den här produkten.

Guidelines for trained service technicians

Inspect the equipment for unsafe conditions and observe the servicing guidelines.

Inspecting for unsafe conditions

Identify potential unsafe conditions in an IBM® product that you are working on.
Each IBM product, as it was designed and manufactured, has required safety items to protect users and service technicians from injury. This information addresses only those items. Use good judgment to identify potential unsafe conditions that might be caused by non-IBM alterations or attachment of non-IBM features or options that are not addressed in this information. If you identify an unsafe condition, you must determine how serious the hazard is and whether you must correct the problem before you work on the product.
Consider the following conditions and the safety hazards that they present:
• Electrical hazards, especially primary power. Primary voltage on the frame can cause serious or fatal electrical shock.
• Explosive hazards, such as a damaged CRT face or a bulging capacitor.
• Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cords are disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and observe any sharp edges.
3. Check the power cords:
   • Make sure that the third-wire ground connector is in good condition. Use a meter to measure third-wire ground continuity for 0.1 ohm or less between the external ground pin and the frame ground.
   • Make sure that the power cords are the correct type.
   • Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety of any non-IBM alterations.
6. Check inside the computer for any obvious unsafe conditions, such as metal filings, contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not been removed or tampered with.

Guidelines for servicing electrical equipment

Observe the guidelines for servicing electrical equipment.
• Check the area for electrical hazards such as moist floors, nongrounded power extension cords, and missing safety grounds.
• Use only approved tools and test equipment. Some hand tools have handles that are covered with a soft material that does not provide insulation from live electrical current.
• Regularly inspect and maintain your electrical hand tools for safe operational condition. Do not use worn or broken tools or testers.
• Do not touch the reflective surface of a dental mirror to a live electrical circuit. The surface is conductive and can cause personal injury or equipment damage if it touches a live electrical circuit.
• Some rubber floor mats contain small conductive fibers to decrease electrostatic discharge. Do not use this type of mat to protect yourself from electrical shock.
• Do not work alone under hazardous conditions or near equipment that has hazardous voltages.
• Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical outlet so that you can turn off the power quickly in the event of an electrical accident.
• Disconnect all power before you perform a mechanical inspection, work near power supplies, or remove or install main units.
• Before you work on the equipment, disconnect the power cord. If you cannot disconnect the power cord, have the customer power-off the wall box that supplies power to the equipment and lock the wall box in the off position.
• Never assume that power has been disconnected from a circuit. Check it to make sure that it has been disconnected.
• If you have to work on equipment that has exposed electrical circuits, observe the following precautions:
  – Make sure that another person who is familiar with the power-off controls is near you and is available to turn off the power if necessary.
  – When you are working with powered-on electrical equipment, use only one hand. Keep the other hand in your pocket or behind your back to avoid creating a complete circuit that could cause an electrical shock.
  – When using a tester, set the controls correctly and use the approved probe leads and accessories for that tester.
  – Stand on a suitable rubber mat to insulate you from grounds such as metal floor strips and equipment frames.
• Use extreme care when measuring high voltages.
• To ensure proper grounding of components such as power supplies, pumps, blowers, fans, and motor generators, do not service these components outside of their normal operating locations.
• If an electrical accident occurs, use caution, turn off the power, and send another person to get medical aid.

Safety statements

Important: Each caution and danger statement in this documentation is labeled with a number. This number is used to cross reference an English-language caution or danger statement with translated versions of the caution or danger statement in the Safety Information document.
For example, if a caution statement is labeled Statement 1, translations for that caution statement are in the Safety Information document under Statement 1. Be sure to read all caution and danger statements in this documentation before you perform the procedures. Read any additional safety information that comes with your blade server or optional device before you install the device.
Statement 1
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
• Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this product during an electrical storm.
• Connect all power cords to a properly wired and grounded electrical outlet.
• Connect to properly wired outlets any equipment that will be attached to this product.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural damage.
• Disconnect the attached power cords, telecommunications systems, networks, and modems before you open the device covers, unless instructed otherwise in the installation and configuration procedures.
• Connect and disconnect cables as described in the following table when installing, moving, or opening covers on this product or attached devices.
To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlet.
5. Turn device ON.

To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2
CAUTION: When replacing the lithium battery, use only IBM Part Number 16G8095 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3
CAUTION: When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters) are installed, note the following:
• Do not remove the covers. Removing the covers of the laser product could result in exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of controls or adjustments or performance of procedures other than those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam.
Statement 4
[Figure: lifting-weight labels for 18 kg (39.7 lb), 32 kg (70.5 lb), and 55 kg (121.2 lb)]
CAUTION: Use safe practices when lifting.
Statement 5
CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
Statement 8
CAUTION: Never remove the cover on a power supply or any part that has the following label attached.
Hazardous voltage, current, and energy levels are present inside any component that has this label attached. There are no serviceable parts inside these components. If you suspect a problem with one of these parts, contact a service technician.
Statement 10
CAUTION: Do not place any object on top of rack-mounted devices.

Chapter 1. Introduction

This problem determination and service information helps you solve problems that might occur in your IBM BladeCenter® JS12 Type 7998 blade server. The information describes the diagnostic tools that come with the blade server, error codes and suggested actions, and instructions for replacing failing components.
Replaceable components are of three types:
• Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation.
• Tier 2 customer replaceable unit: You may install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your blade server.
• Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
For information about the terms of the warranty and getting service and assistance, see the Warranty and Support Information document.

Related documentation

Documentation for the JS12 blade server includes documents in Portable Document Format (PDF) on the IBM BladeCenter Documentation CD and the online information center.
The most recent version of all BladeCenter documentation is in the BladeCenter information center.
The online BladeCenter information center is available in the IBM Systems Information Center.
You can find the following documents in PDF on the IBM BladeCenter Documentation CD and in the online information center:
• Installation and User’s Guide
  This document contains general information about the blade server, including how to install supported options and how to configure the blade server.
• Safety Information
  This document contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
• Warranty and Support Information
  This document contains information about the terms of the warranty and about getting service and assistance.
Additional documents might be included in the online information center and on the IBM BladeCenter Documentation CD.
The blade server might have features that are not described in the documentation that comes with the blade server. The documentation might be updated occasionally to include information about those features, or technical updates might be available to provide additional information that is not included in the documentation that comes with the blade server.
Review the online information or the Planning Guide and the Installation Guide for your IBM BladeCenter unit. The information can help you prepare for system installation and configuration. The most current version of each document is available in the BladeCenter information center.

Notices and statements in this documentation

The caution and danger statements in this document are also in the multilingual Safety Information. Each statement is numbered for reference to the corresponding statement in your language in the Safety Information document.
The following notices and statements are used in this document:
• Note: These notices provide important tips, guidance, or advice.
• Important: These notices provide information or advice that might help you avoid inconvenient or problem situations.
• Attention: These notices indicate potential damage to programs, devices, or data. An attention notice is placed just before the instruction or situation in which damage might occur.
• Caution: These statements indicate situations that can be potentially hazardous to you. A caution statement is placed just before the description of a potentially hazardous procedure step or situation.
• Danger: These statements indicate situations that can be potentially lethal or extremely hazardous to you. A danger statement is placed just before the description of a potentially lethal or extremely hazardous procedure step or situation.

Features and specifications

Features and specifications of the IBM BladeCenter JS12 Type 7998 blade server are summarized in this overview.
The JS12 blade server is used in one of the following IBM BladeCenter units: BladeCenter E (8677), BladeCenter H (8852), BladeCenter HT (8740 and 8750), BladeCenter S (8886), and BladeCenter T (8720 and 8730) units.
Notes:
• Power, cooling, removable-media drives, external ports, and advanced system management are provided by the BladeCenter unit.
• The operating system in the blade server must provide support for the Universal Serial Bus (USB), to enable the blade server to recognize and communicate internally with the removable-media drives and front-panel USB ports.
Microprocessor:
• Support for one dual-core, 64-bit POWER6® microprocessor; 3.8 GHz
• Support for Energy Scale thermal management for power management/oversubscription (throttling) and environmental sensing

Memory:
• Dual-channel (DDR2) with 8 slots for very low profile (18.3 mm) DIMMs
• Supports 1 GB, 2 GB, 4 GB, and 8 GB DDR2 DIMMs for a maximum of 64 GB
• Supports 2-way interleaved, DDR2, PC2-4200 or PC2-5300, ECC SDRAM registered x4, memory scrubbing, Chipkill, and bit steering DIMMs

Virtualization:
PowerVM Standard Edition hardware feature supports Integrated Virtualization Manager and Virtual I/O Server

Integrated functions:
• Two 1 Gigabit Ethernet controllers
• Expansion card interface
• The baseboard management controller (BMC) is a flexible service processor with Intelligent Platform Management Interface (IPMI) firmware and SOL support
• ATI RN 50 ES1000 video controller
• SAS RAID controller
• Light path diagnostics
• RS-485 interface for communication with the management module
• Automatic server restart (ASR)
• Serial over LAN (SOL)
• Support for local keyboard and video
• Four Universal Serial Bus (USB) buses for communication with keyboard and removable-media drives
• Transferable Anchor function (Renesas Technology HD651330 microcontroller) in the management card

Storage:
Support for two internal small-form-factor (SFF) Serial Attached SCSI (SAS) drives

Predictive Failure Analysis (PFA) alerts:
• Microprocessor
• Memory

Electrical input: 12 V dc

Environment:
• Air temperature:
  – Blade server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (3000 ft)
  – Blade server on: 10° to 32°C (50° to 90°F). Altitude: 914 m to 2133 m (3000 ft to 7000 ft)
  – Blade server off: -40° to 60°C (-40° to 140°F)
• Humidity:
  – Blade server on: 8% to 80%
  – Blade server off: 8% to 80%

Size:
• Height: 24.5 cm (9.7 inches)
• Depth: 44.6 cm (17.6 inches)
• Width: 2.9 cm (1.14 inches)
• Maximum weight: 5.0 kg (11 lb)
See the ServerProven Web site for information about supported operating-system versions and all JS12 blade server optional devices.
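The operating limits in the Environment entry can be expressed as a small check, which may be useful when reviewing sensor readings from a site survey. The following Python sketch is illustrative only: the function name and parameters are hypothetical, and because 914 m appears in both altitude bands, the sketch assumes that the lower band (up to 35°C) applies at exactly 914 m.

```python
def within_operating_limits(temp_c, altitude_m, humidity_pct):
    """Return True if the readings fall inside the powered-on limits listed above.

    Assumption: at exactly 914 m the lower-altitude band (10 to 35 C) applies.
    """
    # Humidity limit is the same for every altitude band: 8% to 80%.
    if not 8 <= humidity_pct <= 80:
        return False
    # 0 to 914 m: 10 to 35 C.
    if 0 <= altitude_m <= 914:
        return 10 <= temp_c <= 35
    # 914 m to 2133 m: 10 to 32 C.
    if 914 < altitude_m <= 2133:
        return 10 <= temp_c <= 32
    # Altitudes outside 0 to 2133 m are not covered by the specification.
    return False
```

For example, `within_operating_limits(34, 1500, 50)` returns `False`, because the upper temperature limit drops to 32°C above 914 m.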

Supported DIMMs

The BladeCenter JS12 Type 7998 blade server contains eight memory connectors for industry-standard registered dual inline memory modules (RDIMMs). The DIMMs are very low profile, which means that each DIMM has a height of 18.3 millimeters (mm). Total memory can range from a minimum of 2 gigabytes (GB) to a maximum of 64 GB.
See Chapter 3, “Parts listing, Type 7998,” on page 235 for memory modules that you can order from IBM.
Memory module rules:
• Install DIMMs in pairs in the following connectors to have a supported (tested) configuration:

Table 1. Supported use of DIMMs

                              Number of DIMMs in use
DIMM connectors               Two    Four   Six    Eight
Pair 1 (DIMM 1 and DIMM 3)    Yes    Yes    Yes    Yes
Pair 2 (DIMM 6 and DIMM 8)    No     Yes    Yes    Yes
Pair 3 (DIMM 2 and DIMM 4)    No     No     Yes    Yes
Pair 4 (DIMM 5 and DIMM 7)    No     No     No     Yes

See “System-board connectors” on page 9 for DIMM connector locations.
• Both DIMMs in a pair must be the same size, speed, type, technology, and physical design. You can mix compatible DIMMs from different manufacturers. Each DIMM in each of the following sets of four connectors must be the same size:
  Size 1: DIMM 1 and DIMM 3 (pair 1) and DIMM 2 and DIMM 4 (pair 3) when using 6 or 8 DIMMs
  Size 2: DIMM 5 and DIMM 7 (pair 4) and DIMM 6 and DIMM 8 (pair 2) when using 8 DIMMs
• When using 4 DIMMs in DIMM 1 and DIMM 3 (pair 1) and DIMM 6 and DIMM 8 (pair 2), DIMMs in the second pair can differ in size and speed from the first pair.
• When using 8 GB DIMMs, all of the DIMMs used must be 8 GB.
• Install only supported DIMMs, as described on the ServerProven® Web site. See http://www.ibm.com/servers/eserver/serverproven/compat/us/.
• Installing or removing DIMMs changes the configuration of the blade server. After you install or remove a DIMM, the blade server is automatically reconfigured, and the new configuration information is stored.
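The placement and size rules above can be checked mechanically before you order or install memory. The following Python sketch is a planning aid under stated assumptions, not an IBM tool: the function and constant names are hypothetical, and only connector placement and DIMM size are modeled (speed, type, technology, physical design, and ServerProven support still have to be verified separately).

```python
# Pair population order taken from Table 1: pair 1, then pair 2, pair 3, pair 4.
PAIRS = [(1, 3), (6, 8), (2, 4), (5, 7)]
SUPPORTED_SIZES_GB = {1, 2, 4, 8}


def check_dimm_config(sizes_gb):
    """Return a list of rule violations for a planned DIMM layout.

    sizes_gb maps each populated connector number (1-8) to its DIMM size in GB.
    An empty list means none of the rules modeled here is broken.
    """
    count = len(sizes_gb)
    if count not in (2, 4, 6, 8):
        return ["DIMMs must be installed in pairs: 2, 4, 6, or 8 DIMMs"]

    # The first count/2 pairs of Table 1 are the only supported connectors.
    used_pairs = PAIRS[: count // 2]
    expected = {conn for pair in used_pairs for conn in pair}
    if set(sizes_gb) != expected:
        return [f"{count} DIMMs must occupy connectors {sorted(expected)}"]

    problems = []
    for conn, size in sorted(sizes_gb.items()):
        if size not in SUPPORTED_SIZES_GB:
            problems.append(f"DIMM {conn}: {size} GB is not a supported size")
    # Both DIMMs in a pair must be the same size.
    for a, b in used_pairs:
        if sizes_gb[a] != sizes_gb[b]:
            problems.append(f"DIMM {a} and DIMM {b} must be the same size")
    # Size 1 set applies with 6 or 8 DIMMs; Size 2 set applies with 8 DIMMs.
    if count >= 6 and len({sizes_gb[c] for c in (1, 3, 2, 4)}) > 1:
        problems.append("DIMMs 1, 3, 2, and 4 must all be the same size")
    if count == 8 and len({sizes_gb[c] for c in (5, 7, 6, 8)}) > 1:
        problems.append("DIMMs 5, 7, 6, and 8 must all be the same size")
    # 8 GB DIMMs cannot be combined with any other size.
    sizes = set(sizes_gb.values())
    if 8 in sizes and sizes != {8}:
        problems.append("8 GB DIMMs cannot be mixed with other sizes")
    return problems
```

For example, `check_dimm_config({1: 2, 3: 2, 6: 4, 8: 4})` returns an empty list, because with four DIMMs the second pair may differ in size from the first.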

Blade server control panel buttons and LEDs

Blade server control panel buttons and LEDs provide operational controls and status indicators.
Note: Figure 1 shows the control-panel door in the closed (normal) position. To access the power-control button, you must open the control-panel door.
[Figure 1. Blade server control panel buttons and LEDs. Callouts: keyboard/video select button, media-tray select button, information LED, blade-error LED, power-control button, NMI reset, sleep (not used on blade server), power-on LED, activity LED, location LED.]
Keyboard/video select button: When you use an operating system that supports a local console and keyboard, press this button to associate the shared BladeCenter unit keyboard and video ports with the blade server.
Notes:
• The operating system in the blade server must provide USB support for the blade server to recognize and use the keyboard, even if the keyboard has a PS/2-style connector.
• The keyboard and video are available after partition firmware loads and is running. Power-on self-test (POST) codes and diagnostics are not supported using the keyboard and video. Use the management module to view checkpoints.
The LED on this button flashes while the request is being processed, then is lit when the ownership of the keyboard and video has been transferred to the blade server. It can take approximately 20 seconds to switch control of the keyboard and video to the blade server.
Using a keyboard that is directly attached to the management module, you can press keys in the following sequence to switch keyboard and video control between blade servers:
NumLock NumLock blade_server_number Enter
Where blade_server_number is the two-digit number for the blade bay in which the blade server is installed. When you use some keyboards, such as the 28L3644 (37L0888) keyboard, hold down the Shift key while you enter this key sequence.
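As an illustration of the key sequence, the following Python sketch builds the list of keys for a given blade bay. The function name is hypothetical, and the 1 to 99 check only reflects the two-digit format described above; the number of bays that actually exist depends on the BladeCenter unit.

```python
def kvm_switch_sequence(blade_bay):
    """Return the keys to press, in order, to switch keyboard and video to a bay."""
    # Assumption: any value that fits in two digits is accepted here; the real
    # upper limit is the number of blade bays in the BladeCenter unit.
    if not 1 <= blade_bay <= 99:
        raise ValueError("blade bay number must fit in two digits (1-99)")
    # The bay number is always entered as two digits, for example 03 for bay 3.
    return ["NumLock", "NumLock", f"{blade_bay:02d}", "Enter"]


print(" ".join(kvm_switch_sequence(3)))  # NumLock NumLock 03 Enter
```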
If there is no response when you press the keyboard/video select button, you can use the Web interface of the management module to determine whether local control has been disabled on the blade server.
Media-tray select button: Press this button to associate the shared BladeCenter unit media tray (removable-media drives and front-panel USB ports) with the blade server. The LED on the button flashes while the request is being processed, then is lit when the ownership of the media tray has been transferred to the blade server. It can take approximately 20 seconds for the operating system in the blade server to recognize the media tray.
If there is no response when you press the media-tray select button, use the management module to determine whether local control has been disabled on the blade server.
Note: The operating system in the blade server must provide USB support for the blade server to recognize and use the removable-media drives and USB ports.
Information LED: When this amber LED is lit, it indicates that information about a system error for the blade server has been placed in the management-module event log. The information LED can be turned off through the Web interface of the management module or through IBM Director Console.
Blade-error LED: When this amber LED is lit, it indicates that a system error has occurred in the blade server. The blade-error LED will turn off after one of the following events:
• Correcting the error
• Reseating the blade server in the BladeCenter unit
• Cycling the BladeCenter unit power
Power-control button: This button is behind the control panel door. Press this button to turn on or turn off the blade server.
The power-control button has effect only if local power control is enabled for the blade server. Local power control is enabled and disabled through the Web interface of the management module.
Press the power button for 5 seconds to begin powering down the blade server.
NMI reset (recessed): The nonmaskable interrupt (NMI) reset dumps the partition.
Use this recessed button only as directed by IBM Support.
Power-on LED: This green LED indicates the power status of the blade server in the following manner:
• Flashing rapidly: The service processor (BMC) is initializing the blade server.
• Flashing slowly: The blade server has completed initialization and is waiting for a power-on command.
• Lit continuously: The blade server has power and is turned on.
Note: The enhanced service processor (BMC) can take as long as three minutes to initialize after you install the BladeCenter JS12 blade server, at which point the LED begins to flash slowly.
Activity LED: When this green LED is lit, it indicates that there is activity on the hard disk drive or network.
Location LED: When this blue LED is lit, it has been turned on by the system administrator to aid in visually locating the blade server. The location LED can be turned off through the Web interface of the management module or through IBM Director Console.

Turning on the blade server

After you connect the blade server to power through the BladeCenter unit, you can start the blade server after the discovery and initialization process is complete.
You can start the blade server in any of the following ways.
v Start the blade server by pressing the power-control button on the front of the blade server.
The power-control button is behind the control panel door, as described in “Blade server control panel buttons and LEDs” on page 5.
After you push the power-control button, the power-on LED continues to blink slowly for about 15 seconds, then is lit solidly when the power-on process is complete.
Wait until the power-on LED on the blade server flashes slowly before you press the blade server power-control button. If the power-on LED is flashing rapidly, the service processor is initializing the blade server. The power-control button does not respond during initialization.
Note: The enhanced service processor (BMC) can take as long as three minutes to initialize after you install the BladeCenter JS12 blade server, at which point the LED begins to flash slowly.
v Start the blade server automatically when power is restored after a power failure.
If a power failure occurs, the BladeCenter unit and then the blade server can start automatically when power is restored. You must configure the blade server to restart through the management module.
v Start the blade server remotely using the management module.
After you initiate the power-on process, the power-on LED blinks slowly for about 15 seconds, then is lit solidly when the power-on process is complete.
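The power-on LED states, and the rule that the blade server should be started only while the LED flashes slowly, can be summarized in a small lookup. The following Python sketch is illustrative only; the enum, dictionary, and function names are hypothetical and are not part of any IBM tool.

```python
from enum import Enum


class PowerOnLed(Enum):
    """Power-on LED states described in this guide (names are hypothetical)."""
    FLASHING_RAPIDLY = "flashing rapidly"
    FLASHING_SLOWLY = "flashing slowly"
    LIT_CONTINUOUSLY = "lit continuously"


# Meaning of each state, paraphrased from the control panel description.
LED_MEANING = {
    PowerOnLed.FLASHING_RAPIDLY: "service processor is initializing the blade server",
    PowerOnLed.FLASHING_SLOWLY: "initialization complete; waiting for a power-on command",
    PowerOnLed.LIT_CONTINUOUSLY: "blade server has power and is turned on",
}


def ready_for_power_on(led: PowerOnLed) -> bool:
    """Return True only when the blade server is waiting for a power-on command."""
    return led is PowerOnLed.FLASHING_SLOWLY
```

For example, `ready_for_power_on(PowerOnLed.FLASHING_RAPIDLY)` is false, which matches the statement that the power-control button does not respond during initialization.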

Turning off the blade server

When you turn off the blade server, it is still connected to power through the BladeCenter unit. The blade server can respond to requests from the service processor, such as a remote request to turn on the blade server. To remove all power from the blade server, you must remove it from the BladeCenter unit.
Shut down the operating system before you turn off the blade server. See the operating-system documentation for information about shutting down the operating system.
You can turn off the blade server in one of the following ways.
v Turn off the blade server by pressing the power-control button for at least 5 seconds.
The power-control button is on the blade server behind the control panel door. See “Blade server control panel buttons and LEDs” on page 5 for the location.
Note: The power-on LED can remain on solidly for up to 1 minute after you push the power-control button. After you turn off the blade server, wait until the power-on LED is blinking slowly before you press the power-control button to turn on the blade server again.
If the operating system stops functioning, press and hold the power-control button for more than 5 seconds to force the blade server to turn off.
v Use the management module to turn off the blade server.
The power-on LED can remain on solidly for up to 1 minute after you initiate the power-off process. After you turn off the blade server, wait until the power-on LED is blinking slowly before you initiate the power-on process from the advanced management module to turn on the blade server again.
Use the management-module Web interface to configure the management module to turn off the blade server if the system is not operating correctly.
For additional information, see the online documentation or the User’s Guide for the management module.

System-board layouts

Illustrations show the connectors and LEDs on the system board. The illustrations might differ slightly from your hardware.

System-board connectors

Blade server components attach to the connectors on the system board.
Figure 2 shows the connectors on the system board in the blade server.
Control panel connector
SAS drive (P1-D1)
SAS drive (P1-D2)
DIMM 1 (P1-C1)
DIMM 2 (P1-C2)
DIMM 3 (P1-C3)
DIMM 4 (P1-C4)
DIMM 5 (P1-C5)
DIMM 6 (P1-C6)
DIMM 7 (P1-C7)
DIMM 8 (P1-C8)
Management card (P1-C9)
PCI-X expansion card (P1-C10)
PCIe high-speed expansion card (P1-C11)
Battery (P1-E1)
Figure 2. System-board connectors

System-board LEDs

Use the illustration of the LEDs on the system board to identify a light emitting diode (LED).
Remove the blade server from the BladeCenter unit, open the cover to see any error LEDs that were turned on during error processing, and use Figure 3 to identify the failing component.
Power LED (always on when plugged in)
System board error LED (P1)
Front SAS drive error LED (P1-D1)
Battery error LED (P1-E1)
DIMM 1 error LED (P1-C1)
DIMM 2 error LED (P1-C2)
DIMM 3 error LED (P1-C3)
DIMM 4 error LED (P1-C4)
DIMM 5 error LED (P1-C5)
DIMM 6 error LED (P1-C6)
DIMM 7 error LED (P1-C7)
DIMM 8 error LED (P1-C8)
Management card error LED (P1-C9)
PCI-X expansion card error LED (P1-C10)
PCIe high-speed expansion card error LED (P1-C11)
Figure 3. System-board LEDs

Chapter 2. Diagnostics

Use the available diagnostic tools to help solve any problems that might occur in the blade server.
The first and most crucial component of a solid serviceability strategy is the ability to accurately and effectively detect errors when they occur. While not all errors are a threat to system availability, those that go undetected are dangerous because the system does not have the opportunity to evaluate and act if necessary. POWER6 processor-based systems are specifically designed with error-detection mechanisms that extend from processor cores and memory to power supplies and hard drives.
POWER6 processor-based systems contain specialized hardware detection circuitry for detecting erroneous hardware operations. Error checking hardware ranges from parity error detection coupled with processor instruction retry and bus retry, to ECC correction on caches and system buses.
IBM hardware error checkers have these distinct attributes:
v Continuous monitoring of system operations to detect potential calculation errors
v Attempted isolation of physical faults based on runtime detection of each unique failure
v Initiation of a wide variety of recovery mechanisms designed to correct a problem
POWER6 processor-based systems include extensive hardware and firmware recovery logic.
Machine check handling
Machine checks are handled by firmware. When a machine check occurs, the firmware analyzes the error to identify the failing device and creates an error log entry.
If the system degrades to the point that the service processor cannot reach standby state, the ability to analyze the error does not exist. If the error occurs during POWER® hypervisor (PHYP) activities, the PHYP initiates a system reboot.
In partitioned mode, an error that occurs during partition activity is surfaced to the operating system in the partition.
© Copyright IBM Corp. 2008, 2009 11

Diagnostic tools

What to do if you cannot solve a problem
If you cannot locate and correct the problem using the diagnostics tools and information, see Appendix A, “Getting help and technical assistance,” on page 285.
Tools are available to help you diagnose and solve hardware-related problems.
v Power-on self-test (POST) progress codes (checkpoints), error codes, and isolation procedures
The POST checks out the hardware at system initialization. IPL diagnostic functions test some system components and interconnections. The POST generates eight-digit checkpoints to mark the progress of powering up the blade server.
Use the management module to view progress codes.
The documentation of a progress code includes recovery actions for system hangs. See “POST progress codes (checkpoints)” on page 88 for more information.
If the service processor detects a problem during POST, an error code is logged in the management-module event log. Error codes are also logged in the Linux syslog or AIX® diagnostic log, if possible. See “System reference codes (SRCs)” on page 16.
The service processor can generate codes that point to specific isolation procedures. See “Service processor problems” on page 205.
v Light path diagnostics
Use the light path diagnostic LEDs on the system board to identify failing hardware. If the system error LED on the system LED panel on the front or rear of the BladeCenter unit is lit, one or more error LEDs on the BladeCenter unit components also might be lit.
Light path diagnostics help identify failing customer replaceable units (CRUs). CRU location codes are included in error codes and the event log.
LED locations
See “System-board LEDs” on page 9.
Front panel
See “Blade server control panel buttons and LEDs” on page 5.
v Troubleshooting tables
Use the troubleshooting tables to find solutions to problems that have identifiable symptoms.
See “Troubleshooting tables” on page 194.
v Dump data collection
In some circumstances, an error might require a dump to show more data. The Integrated Virtualization Manager (IVM) sets up a dump area. Specific IVM information is included as part of the information that can optionally be sent to IBM Support for analysis.
See “Collecting dump data” on page 13 for more information.
v Stand-alone diagnostics
The AIX-based stand-alone Diagnostics CD is in the ship package and is also available from the IBM Web site. Boot the CD from a CD drive or from an AIX network installation manager (NIM) server if the blade server cannot boot to an operating system, no matter which operating system is installed.
Functions provided by the stand-alone diagnostics include:
– Analysis of errors reported by platform, such as microprocessor and memory
– Testing of resources, such as I/O adapters and devices
– Service aids, such as firmware update, format disk, and Raid Manager
v Diagnostic utilities for the AIX operating system
If AIX is functioning, run AIX concurrent diagnostics instead of the stand-alone diagnostics. Functions provided by the disk-based AIX diagnostics include:
– Automatic error log analysis
– Analysis of errors reported by platform, such as microprocessor and memory
– Testing of resources, such as I/O adapters and devices
– Service aids, such as firmware update, format disk, and Raid Manager
v Diagnostic utilities for Linux operating systems
Linux on POWER service and productivity tools include hardware diagnostic aids, productivity tools, and installation aids. The installation aids are provided in the IBM Installation Toolkit for Linux on POWER, a set of tools that aids the installation of Linux on IBM servers with POWER architecture. You can also use the tools to update the JS12 blade server firmware.
Diagnostic utilities for the Linux operating system are available from IBM at https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/home.html.
v Diagnostic utilities for other operating systems
You can use the stand-alone Diagnostics CD to perform diagnostics on the JS12 blade server, no matter which operating system is loaded on the blade server. However, other supported operating systems might have diagnostic tools that are available through the operating system. See the documentation for your operating system for more information.
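The choice among the diagnostic utilities listed above depends mainly on whether the operating system is still functioning and which operating system is installed. The following Python sketch restates that choice as a function; the function name, parameters, and returned strings are hypothetical and only paraphrase this section.

```python
def choose_diagnostics(os_name: str, os_is_functioning: bool) -> str:
    """Suggest a diagnostic tool based on the guidance in this section."""
    if not os_is_functioning:
        # The stand-alone CD works no matter which operating system is installed.
        return "stand-alone Diagnostics CD (boot from a CD drive or an AIX NIM server)"
    name = os_name.strip().lower()
    if name == "aix":
        return "AIX concurrent diagnostics"
    if name == "linux":
        return "Linux on POWER service and productivity tools"
    # Other operating systems: see the operating-system documentation.
    return "stand-alone Diagnostics CD, or tools provided by the operating system"
```

For example, `choose_diagnostics("AIX", True)` suggests the concurrent diagnostics, while any blade server that cannot boot is directed to the stand-alone Diagnostics CD.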

Collecting dump data

A dump might be critical for fault isolation when the built-in First Failure Data Capture (FFDC) mechanisms are not capturing sufficient fault data. Even when a fault is identified, dump data can provide additional information that is useful in problem determination.
All hardware state information is part of the dump if a hardware checkstop occurs. When a checkstop occurs, the service processor attempts to dump data that is necessary to analyze the error from appropriate parts of the system.
Note: If you power off the blade through the management module while the service processor is performing a dump, platform dump data is lost.
You might be asked to retrieve a dump to send it to IBM Support for analysis. The location of the dump data varies per operating system platform.
v Collect an AIX dump from the /var/adm/platform directory.
v Collect a Linux dump from the /var/log/dump directory.
v Collect an Integrated Virtualization Manager (IVM) dump from the IVM-managed JS12 blade server through the Manage Dumps task in the IVM console.
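The dump locations above can be kept in a simple lookup, for example in a script that reminds a technician where to collect the data. This is a minimal sketch; the dictionary and function names are hypothetical, and the values only repeat the locations given in this section.

```python
# Dump sources per platform, as listed in this section.
DUMP_SOURCES = {
    "aix": "/var/adm/platform directory",
    "linux": "/var/log/dump directory",
    "ivm": "Manage Dumps task in the IVM console",
}


def dump_source(platform: str) -> str:
    """Return where to collect dump data for the given platform."""
    key = platform.strip().lower()
    if key not in DUMP_SOURCES:
        raise KeyError(f"no dump location documented for {platform!r}")
    return DUMP_SOURCES[key]
```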

Location codes

Location codes identify components of the blade server. Location codes are displayed with some error codes to identify the blade server component that is causing the error.
See “System-board connectors” on page 9 for component locations.
Notes:
1. Location codes do not indicate the location of the blade server within the BladeCenter unit. The codes identify components of the blade server only.
2. For checkpoints with no associated location code, see “Light path diagnostics” on page 218 to identify the failing component when there is a hang condition.
3. For checkpoints with location codes, use the following table to identify the failing component when there is a hang condition.
4. For 8-digit codes not listed in Table 2, see “Checkout procedure” on page 186.
Table 2. Location codes
Location code   Component
Un location codes are for enclosure and VPD locations.
Un = Utttt.mmm.sssssss
    tttt = system machine type
    mmm = system model number
    sssssss = system serial number
Un-P1           System-board and chassis assembly (Planar, FSP, SPCN, CP0, P5IOC2)
Un-P1-C1        DIMM 1 (DIMM1A)
Un-P1-C2        DIMM 2 (DIMM1B)
Un-P1-C3        DIMM 3 (DIMM0A)
Un-P1-C4        DIMM 4 (DIMM0B)
Un-P1-C5        DIMM 5 (DIMM3B)
Un-P1-C6        DIMM 6 (DIMM3A)
Un-P1-C7        DIMM 7 (DIMM2B)
Un-P1-C8        DIMM 8 (DIMM2A)
Un-P1-C9        Management card (MGMT CRD)
Un-P1-C10       PCI-X expansion card (PIOCARD)
Un-P1-C11       PCIe high-speed expansion card (PIOCARD)
Un-P1-D1        Front SAS hard disk drive (SFF0)
Un-P1-D2        Rear SAS hard disk drive (SFF1)
Un-P1-E1        Battery (BATT)
Um codes are for firmware. The format is the same as for a Un location code.
Um = Utttt.mmm.sssssss
Um-Y1           Firmware version
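The Utttt.mmm.sssssss layout and the component suffixes in Table 2 lend themselves to a small parser. The following Python sketch splits a full location code into its parts and looks up the component name. The function and class names are hypothetical, the model and serial values in the usage example are made up, and the assumption that each field contains only digits and uppercase letters is not stated in this guide.

```python
import re
from typing import NamedTuple, Optional

# Suffixes and component names copied from Table 2.
COMPONENTS = {
    "P1": "System-board and chassis assembly",
    "P1-C1": "DIMM 1", "P1-C2": "DIMM 2", "P1-C3": "DIMM 3", "P1-C4": "DIMM 4",
    "P1-C5": "DIMM 5", "P1-C6": "DIMM 6", "P1-C7": "DIMM 7", "P1-C8": "DIMM 8",
    "P1-C9": "Management card",
    "P1-C10": "PCI-X expansion card",
    "P1-C11": "PCIe high-speed expansion card",
    "P1-D1": "Front SAS hard disk drive",
    "P1-D2": "Rear SAS hard disk drive",
    "P1-E1": "Battery",
    "Y1": "Firmware version",  # used with Um (firmware) codes
}


class LocationCode(NamedTuple):
    machine_type: str
    model: str
    serial: str
    suffix: str
    component: Optional[str]  # None when the suffix is not in Table 2


# Assumed field alphabet: digits and uppercase letters only.
_PATTERN = re.compile(
    r"^U(?P<type>[0-9A-Z]{4})\.(?P<model>[0-9A-Z]{3})\.(?P<serial>[0-9A-Z]{7})"
    r"-(?P<suffix>[0-9A-Z]+(?:-[0-9A-Z]+)*)$"
)


def parse_location_code(code: str) -> LocationCode:
    """Split a Utttt.mmm.sssssss-suffix location code into its fields."""
    match = _PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a recognized location code: {code!r}")
    suffix = match["suffix"]
    return LocationCode(match["type"], match["model"], match["serial"],
                        suffix, COMPONENTS.get(suffix))
```

For example, `parse_location_code("U7998.60X.1234567-P1-C3")` (with a made-up model and serial number) yields machine type `7998` and the component `DIMM 3`.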

Reference codes

Reference codes are diagnostic aids that help you determine the source of a hardware or operating system problem. To use reference codes effectively, use them in conjunction with other service and support procedures.
The BladeCenter JS12 Type 7998 blade server produces several types of codes.
Progress codes: The power-on self-test (POST) generates eight-digit status codes that are known as checkpoints or progress codes, which are recorded in the management-module event log. The checkpoints indicate which blade server resource is initializing.
Error codes: The First Failure Data Capture (FFDC) error checkers capture fault data, which the baseboard management controller (BMC) service processor then analyzes. For unrecoverable errors (UEs), for recoverable events that meet or exceed their service thresholds, and for fatal system errors, an unrecoverable checkstop service event triggers the service processor to analyze the error, log the system reference code (SRC), and turn on the system attention LED.
The service processor logs the nine-word error code (eight digits per word) in the BladeCenter management-module event log. Error codes are either system reference codes (SRCs) or service request numbers (SRNs). A location code might also be included.
Isolation procedures: If the fault analysis does not determine a definitive cause, the service processor might indicate a fault isolation procedure that you can use to isolate the failing component.
Viewing the codes
The JS12 blade server does not display checkpoints or error codes on the remote console. The shared BladeCenter unit video also does not display the codes.
If the POST detects a problem, a 9-word, 8-digit error code is logged in the BladeCenter management-module event log. A location code that identifies a component might also be included. See “Error logs” on page 186 for information about viewing the management-module event log.
Service request numbers can be viewed using the AIX diagnostics CD, or various operating system utilities, such as AIX diagnostics or the Linux service aid “diagela”, if it is installed.

System reference codes (SRCs)

System reference codes indicate a server hardware or software problem that can originate in hardware, in firmware, or in the operating system.
A blade server component generates an error code when it detects a problem. An SRC identifies the component that generated the error code and describes the error. Use the SRC information to identify a list of possibly failing items and to find information about any additional isolation procedures.
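When triaging a code, it can help to identify first which family of reference codes it belongs to. The following Python sketch classifies an eight-digit code using the code families named in the section titles of this guide. The function name is hypothetical, and the matching order (explicit ranges first, then the longest prefixes) is an assumption rather than documented behavior.

```python
import re

# Ranges named in this guide's section titles: (label, low, high).
RANGES = [
    ("Partition firmware attention code", 0xAA00E1A8, 0xAA260005),
    ("Partition firmware SRC", 0xBA000010, 0xBA400002),
    ("Service processor checkpoint", 0xC1001F00, 0xC1645300),
    ("Virtual service processor checkpoint", 0xC2001000, 0xC20082FF),
]

# Prefix families, checked from most specific to least specific (assumed order).
PREFIXES = [
    ("B200", "Logical partition SRC"),
    ("B700", "Licensed internal code SRC"),
    ("A1", "Service processor SRC"),
    ("B", "Service processor early termination SRC"),
    ("1", "1xxxyyyy SRC"),
    ("6", "6xxxyyyy SRC"),
]


def classify_reference_code(code: str) -> str:
    """Return the family label for an eight-digit hexadecimal reference code."""
    text = code.strip().upper()
    if not re.fullmatch(r"[0-9A-F]{8}", text):
        raise ValueError(f"expected eight hexadecimal digits, got {code!r}")
    value = int(text, 16)
    for label, low, high in RANGES:
        if low <= value <= high:
            return label
    for prefix, label in PREFIXES:
        if text.startswith(prefix):
            return label
    return "Unclassified"
```

Codes that match none of the listed families, such as the D1513901 entry in the management-module listing, are reported as unclassified rather than guessed.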
The following table shows the syntax of a nine-word B700xxxx SRC as it might be displayed in the event log of the management module.
The first word of the SRC in this example is the message identifier, B7001111. This example numbers each word after the first word to show relative word positions. The seventh word is the direct select address, which is 77777777 in the example.
Table 3. Nine-word system reference code in the management-module event log
Index  Sev  Source    Date/Time             Text
1      E    Blade_05  01/21/2008, 17:15:14  (JS12-BC1BLD5E) SYS F/W: Error. Replace UNKNOWN (5008FECF B7001111 22222222 33333333 44444444 55555555 66666666 77777777 88888888 99999999)
Depending on your operating system and the utilities you have installed, error messages might also be stored in an operating system log. See the documentation that comes with the operating system for more information.
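The word positions in the nine-word SRC can be extracted from the event text programmatically. The following Python sketch pulls the SRC words out of the example event text, then reads the message identifier (first word) and the direct select address (seventh word). The function name is hypothetical, and treating the last nine tokens in the final parenthesized group as the SRC, with any leading token such as 5008FECF regarded as a prefix, is an assumption based only on this one example.

```python
import re
from typing import List

_WORD = re.compile(r"^[0-9A-F]{8}$")


def extract_src_words(event_text: str) -> List[str]:
    """Return the nine SRC words found in a management-module event text."""
    groups = re.findall(r"\(([^()]*)\)", event_text)
    if not groups:
        raise ValueError("no parenthesized code group found")
    tokens = groups[-1].split()
    # Assumption: the SRC is the last nine tokens; anything before is a prefix.
    words = tokens[-9:]
    if len(words) != 9 or not all(_WORD.match(w) for w in words):
        raise ValueError("expected nine 8-digit hexadecimal words")
    return words


sample = ("(JS12-BC1BLD5E) SYS F/W: Error. Replace UNKNOWN "
          "(5008FECF B7001111 22222222 33333333 44444444 "
          "55555555 66666666 77777777 88888888 99999999)")
words = extract_src_words(sample)
message_identifier = words[0]      # first word of the SRC
direct_select_address = words[6]   # seventh word of the SRC
```

With the example text, `message_identifier` is `B7001111` and `direct_select_address` is `77777777`, which agrees with the description of the example.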
The management module can display the most recent 32 SRCs and time stamps. Manually refresh the list to update it.
Select Blade Service Data > blade_name in the management module to see a list of the 32 most recent SRCs.
Table 4. Management module reference code listing
Unique ID System Reference Code Timestamp
00040001 D1513901 2005-11-13 19:30:20
00000016 D1513801 2005-11-13 19:30:16
Any message with more detail is highlighted as a link in the System Reference Code column. Click the message to cause the management module to present the additional message detail:
D1513901
Created at: 2007-11-13 19:30:20
SRC Version: 0x02
Hex Words 2-5: 020110F0 52298910 C1472000 200000FF
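A listing that keeps only the 32 most recent SRCs behaves like a bounded buffer. The following Python sketch parses rows in the Table 4 layout and stores them in a buffer that discards the oldest entry once 32 are held. The class and function names are hypothetical; this models the documented limit and does not describe how the management module is implemented.

```python
from collections import deque
from datetime import datetime
from typing import NamedTuple


class SrcEntry(NamedTuple):
    unique_id: str
    src: str
    timestamp: datetime


def parse_row(row: str) -> SrcEntry:
    """Parse a 'Unique ID, System Reference Code, Timestamp' row."""
    unique_id, src, date, time = row.split()
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return SrcEntry(unique_id, src, stamp)


# Bounded buffer: appending a 33rd entry drops the oldest one.
recent = deque(maxlen=32)
for row in ("00000016 D1513801 2005-11-13 19:30:16",
            "00040001 D1513901 2005-11-13 19:30:20"):
    recent.append(parse_row(row))
```

Because the buffer is bounded, older entries are lost as new ones arrive, so record any code of interest before it scrolls out of the list.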