Before using this information and the product it supports, read the general information in
Appendix B, “Notices,” on page 289 and the Warranty and Support Information document for your
blade server type on the Documentation CD.
Before installing this product, read the safety information.
Guidelines for trained service technicians
Inspect the equipment for unsafe conditions and observe the servicing guidelines.
Inspecting for unsafe conditions
Identify potential unsafe conditions in an IBM® product that you are working on.
Each IBM product, as it was designed and manufactured, has required safety items
to protect users and service technicians from injury. This information addresses
only those items. Use good judgment to identify potential unsafe conditions that
might be caused by non-IBM alterations or attachment of non-IBM features or
options that are not addressed in this information. If you identify an unsafe
condition, you must determine how serious the hazard is and whether you must
correct the problem before you work on the product.
Consider the following conditions and the safety hazards that they present:
v Electrical hazards, especially primary power. Primary voltage on the frame can
cause serious or fatal electrical shock.
v Explosive hazards, such as a damaged CRT face or a bulging capacitor.
v Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following
steps:
1. Make sure that the power is off and the power cords are disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and observe
any sharp edges.
3. Check the power cords:
v Make sure that the third-wire ground connector is in good condition. Use a
meter to measure third-wire ground continuity for 0.1 ohm or less between
the external ground pin and the frame ground.
v Make sure that the power cords are the correct type.
v Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-IBM alterations. Use good judgment as to the safety
of any non-IBM alterations.
6. Check inside the computer for any obvious unsafe conditions, such as metal
filings, contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not
been removed or tampered with.
viJS12 Type 7998: Problem Determination and Service Guide
Guidelines for servicing electrical equipment
Observe the guidelines for servicing electrical equipment.
v Check the area for electrical hazards such as moist floors, nongrounded power
extension cords, and missing safety grounds.
v Use only approved tools and test equipment. Some hand tools have handles that
are covered with a soft material that does not provide insulation from live
electrical current.
v Regularly inspect and maintain your electrical hand tools for safe operational
condition. Do not use worn or broken tools or testers.
v Do not touch the reflective surface of a dental mirror to a live electrical circuit.
The surface is conductive and can cause personal injury or equipment damage if
it touches a live electrical circuit.
v Some rubber floor mats contain small conductive fibers to decrease electrostatic
discharge. Do not use this type of mat to protect yourself from electrical shock.
v Do not work alone under hazardous conditions or near equipment that has
hazardous voltages.
v Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical
outlet so that you can turn off the power quickly in the event of an electrical
accident.
v Disconnect all power before you perform a mechanical inspection, work near
power supplies, or remove or install main units.
v Before you work on the equipment, disconnect the power cord. If you cannot
disconnect the power cord, have the customer power off the wall box that
supplies power to the equipment and lock the wall box in the off position.
v Never assume that power has been disconnected from a circuit. Check it to
make sure that it has been disconnected.
v If you have to work on equipment that has exposed electrical circuits, observe
the following precautions:
– Make sure that another person who is familiar with the power-off controls is
near you and is available to turn off the power if necessary.
– When you are working with powered-on electrical equipment, use only one
hand. Keep the other hand in your pocket or behind your back to avoid
creating a complete circuit that could cause an electrical shock.
– When using a tester, set the controls correctly and use the approved probe
leads and accessories for that tester.
– Stand on a suitable rubber mat to insulate you from grounds such as metal
floor strips and equipment frames.
v Use extreme care when measuring high voltages.
v To ensure proper grounding of components such as power supplies, pumps,
blowers, fans, and motor generators, do not service these components outside of
their normal operating locations.
v If an electrical accident occurs, use caution, turn off the power, and send another
person to get medical aid.
Safety statements
Important: Each caution and danger statement in this documentation is labeled
with a number. This number is used to cross reference an English-language caution
or danger statement with translated versions of the caution or danger statement in
the Safety Information document.
For example, if a caution statement is labeled “Statement 1,” translations for that
caution statement are in the Safety Information document under “Statement 1.” Be
sure to read all caution and danger statements in this documentation before you
perform the procedures. Read any additional safety information that comes with
your blade server or optional device before you install the device.
Statement 1
DANGER
Electrical current from power, telephone, and communication cables is
hazardous.
To avoid a shock hazard:
v Do not connect or disconnect any cables or perform installation,
maintenance, or reconfiguration of this product during an electrical storm.
v Connect all power cords to a properly wired and grounded electrical outlet.
v Connect to properly wired outlets any equipment that will be attached to
this product.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or
structural damage.
v Disconnect the attached power cords, telecommunications systems,
networks, and modems before you open the device covers, unless
instructed otherwise in the installation and configuration procedures.
v Connect and disconnect cables as described in the following table when
installing, moving, or opening covers on this product or attached devices.
To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlet.
5. Turn device ON.
To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2
CAUTION:
When replacing the lithium battery, use only IBM Part Number 16G8095 or an
equivalent type battery recommended by the manufacturer. If your system has a
module containing a lithium battery, replace it only with the same module type
made by the same manufacturer. The battery contains lithium and can explode if
not properly used, handled, or disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or
transmitters) are installed, note the following:
v Do not remove the covers. Removing the covers of the laser product could
result in exposure to hazardous laser radiation. There are no serviceable parts
inside the device.
v Use of controls or adjustments or performance of procedures other than those
specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode.
Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly
with optical instruments, and avoid direct exposure to the beam.
Statement 4
≥ 18 kg (39.7 lb)   ≥ 32 kg (70.5 lb)   ≥ 55 kg (121.2 lb)
CAUTION:
Use safe practices when lifting.
Statement 5
CAUTION:
The power control button on the device and the power switch on the power
supply do not turn off the electrical current supplied to the device. The device
also might have more than one power cord. To remove all electrical current from
the device, ensure that all power cords are disconnected from the power source.
Statement 8
CAUTION:
Never remove the cover on a power supply or any part that has the following
label attached.
Hazardous voltage, current, and energy levels are present inside any component
that has this label attached. There are no serviceable parts inside these
components. If you suspect a problem with one of these parts, contact a service
technician.
Statement 10
CAUTION:
Do not place any object on top of rack-mounted devices.
Chapter 1. Introduction
This problem determination and service information helps you solve problems that
might occur in your IBM BladeCenter® JS12 Type 7998 blade server. The
information describes the diagnostic tools that come with the blade server, error
codes and suggested actions, and instructions for replacing failing components.
Replaceable components are of three types:
v Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your
responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged
for the installation.
v Tier 2 customer replaceable unit: You may install a Tier 2 CRU yourself or
request IBM to install it, at no additional charge, under the type of warranty
service that is designated for your blade server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service
technicians.
For information about the terms of the warranty and getting service and assistance,
see the Warranty and Support Information document.
Related documentation
Documentation for the JS12 blade server includes documents in Portable Document
Format (PDF) on the IBM BladeCenter Documentation CD and the online information
center.
The most recent version of all BladeCenter documentation is in the BladeCenter
information center.
The online BladeCenter information center is available in the IBM Systems
Information Center.
You can find the following documents in PDF on the IBM BladeCenter
Documentation CD and in the online information center:
v Installation and User’s Guide
This document contains general information about the blade server, including
how to install supported options and how to configure the blade server.
v Safety Information
This document contains translated caution and danger statements. Each caution
and danger statement that appears in the documentation has a number that you
can use to locate the corresponding statement in your language in the Safety
Information document.
v Warranty and Support Information
This document contains information about the terms of the warranty and about
getting service and assistance.
Additional documents might be included in the online information center and on
the IBM BladeCenter Documentation CD.
The blade server might have features that are not described in the documentation
that comes with the blade server. The documentation might be updated
occasionally to include information about those features, or technical updates
might be available to provide additional information that is not included in the
documentation that comes with the blade server.
Review the online information or the Planning Guide and the Installation Guide for
your IBM BladeCenter unit. The information can help you prepare for system
installation and configuration. The most current version of each document is
available in the BladeCenter information center.
Notices and statements in this documentation
The caution and danger statements in this document are also in the multilingual
Safety Information. Each statement is numbered for reference to the corresponding
statement in your language in the Safety Information document.
The following notices and statements are used in this document:
v Note: These notices provide important tips, guidance, or advice.
v Important: These notices provide information or advice that might help you
avoid inconvenient or problem situations.
v Attention: These notices indicate potential damage to programs, devices, or data.
An attention notice is placed just before the instruction or situation in which
damage might occur.
v Caution: These statements indicate situations that can be potentially hazardous
to you. A caution statement is placed just before the description of a potentially
hazardous procedure step or situation.
v Danger: These statements indicate situations that can be potentially lethal or
extremely hazardous to you. A danger statement is placed just before the
description of a potentially lethal or extremely hazardous procedure step or
situation.
Features and specifications
Features and specifications of the IBM BladeCenter JS12 Type 7998 blade server are
summarized in this overview.
The JS12 blade server is used in one of the following IBM BladeCenter units:
BladeCenter E (8677), BladeCenter H (8852), BladeCenter HT (8740 and 8750),
BladeCenter S (8886), and BladeCenter T (8720 and 8730) units.
Notes:
v Power, cooling, removable-media drives, external ports, and advanced system
management are provided by the BladeCenter unit.
v The operating system in the blade server must provide support for the Universal
Serial Bus (USB), to enable the blade server to recognize and communicate
internally with the removable-media drives and front-panel USB ports.
Microprocessor:
v Support for one dual-core, 64-bit POWER6® microprocessor; 3.8 GHz
v Support for Energy Scale thermal management for power
management/oversubscription (throttling) and environmental sensing
Memory:
v Dual-channel (DDR2) with 8 slots for very low profile (18.3 mm) DIMMs
v Supports 1 GB, 2 GB, 4 GB, and 8 GB DDR2 DIMMs for a maximum of 64 GB
v Supports 2-way interleaved, DDR2, PC2-4200 or PC2-5300, ECC SDRAM
registered x4, memory scrubbing, Chipkill, and bit steering DIMMs
Virtualization:
PowerVM Standard Edition hardware feature supports Integrated Virtualization
Manager and Virtual I/O Server
Integrated functions:
v Two 1 Gigabit Ethernet controllers
v Expansion card interface
v The baseboard management
controller (BMC) is a flexible
service processor with Intelligent
Platform Management Interface
(IPMI) firmware and SOL support
v ATI RN 50 ES1000 video controller
v SAS RAID controller
v Light path diagnostics
v RS-485 interface for
communication with the
management module
v Automatic server restart (ASR)
v Serial over LAN (SOL)
v Support for local keyboard and
video
v Four Universal Serial Bus (USB)
buses for communication with
keyboard and removable-media
drives
v Transferable Anchor function
(Renesas Technology HD651330
microcontroller) in the
management card
Storage:
Support for two internal small-form-factor (SFF) Serial Attached SCSI (SAS)
drives
Predictive Failure Analysis (PFA) alerts:
v Microprocessor
v Memory
Electrical input: 12 V dc
Environment:
v Air temperature:
– Blade server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (3000 ft)
– Blade server on: 10° to 32°C (50° to 90°F). Altitude: 914 m to 2133 m
(3000 ft to 7000 ft)
– Blade server off: -40° to 60°C (-40° to 140°F)
v Humidity:
– Blade server on: 8% to 80%
– Blade server off: 8% to 80%
Size:
v Height: 24.5 cm (9.7 inches)
v Depth: 44.6 cm (17.6 inches)
v Width: 2.9 cm (1.14 inches)
v Maximum weight: 5.0 kg (11 lb)
See the ServerProven Web site for information about supported operating-system
versions and all JS12 blade server optional devices.
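The powered-on environmental limits listed in the specifications can be expressed as a simple range check. The following Python sketch is illustrative only: the names within_operating_limits and c_to_f are hypothetical and are not part of any IBM tool. Because 914 m appears in both altitude ranges, the sketch assumes that exactly 914 m falls in the lower range.

```python
def within_operating_limits(temp_c, altitude_m, humidity_pct):
    """Return True if a powered-on blade server is inside the listed limits.

    Hypothetical helper; the limits are copied from the specifications.
    """
    if not 8 <= humidity_pct <= 80:
        return False
    if 0 <= altitude_m <= 914:
        max_temp_c = 35  # 10 to 35 degrees C up to 914 m
    elif 914 < altitude_m <= 2133:
        max_temp_c = 32  # 10 to 32 degrees C from 914 m to 2133 m
    else:
        return False  # altitude outside the documented range
    return 10 <= temp_c <= max_temp_c


def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32
```

For example, 34°C is acceptable at 500 m but not at 1500 m, where the upper limit drops to 32°C.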
Supported DIMMs
The BladeCenter JS12 Type 7998 blade server contains eight memory connectors for
industry-standard registered dual inline memory modules (RDIMMs). The DIMMs
are very low profile, which means that each DIMM has a height of 18.3 millimeters
(mm). Total memory can range from a minimum of 2 gigabytes (GB) to a
maximum of 64 GB.
See Chapter 3, “Parts listing, Type 7998,” on page 235 for memory modules that
you can order from IBM.
Memory module rules:
v Install DIMMs in pairs in the following connectors to have a supported (tested)
configuration:
Table 1. Supported use of DIMMs
                                Number of DIMMs in use
DIMM connectors                 Two    Four   Six    Eight
Pair 1 (DIMM 1 and DIMM 3)      Yes    Yes    Yes    Yes
Pair 2 (DIMM 6 and DIMM 8)      No     Yes    Yes    Yes
Pair 3 (DIMM 2 and DIMM 4)      No     No     Yes    Yes
Pair 4 (DIMM 5 and DIMM 7)      No     No     No     Yes
See “System-board connectors” on page 9 for DIMM connector locations.
v Both DIMMs in a pair must be the same size, speed, type, technology, and
physical design. You can mix compatible DIMMs from different manufacturers.
Each DIMM in each of the following sets of four connectors must be the same
size:
Size 1 DIMM 1 and DIMM 3 (pair 1) and DIMM 2 and DIMM 4 (pair 3) when
using 6 or 8 DIMMs
Size 2 DIMM 5 and DIMM 7 (pair 4) and DIMM 6 and DIMM 8 (pair 2) when
using 8 DIMMs
v When using 4 DIMMs in DIMM 1 and DIMM 3 (pair 1) and DIMM 6 and
DIMM 8 (pair 2), DIMMs in the second pair can differ in size and speed from
the first pair.
v When using 8 GB DIMMs, all of the DIMMs used must be 8 GB.
v Install only supported DIMMs, as described on the ServerProven® Web site.
v Installing or removing DIMMs changes the configuration of the blade server.
After you install or remove a DIMM, the blade server is automatically
reconfigured, and the new configuration information is stored.
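The size-related memory module rules above can be checked mechanically. The following Python sketch is a hypothetical helper (check_dimm_layout is not an IBM utility) that takes a mapping of connector number to DIMM size in GB and returns a list of rule violations. It covers only the population order and size rules; it does not check speed, type, technology, or ServerProven support.

```python
# Pair population order from Table 1 (connector numbers per pair).
PAIR_ORDER = [(1, 3), (6, 8), (2, 4), (5, 7)]
SUPPORTED_SIZES_GB = {1, 2, 4, 8}


def check_dimm_layout(sizes_gb):
    """Return a list of rule violations for a DIMM layout.

    sizes_gb maps connector number (1-8) to DIMM size in GB; empty
    connectors are omitted. Hypothetical helper, size rules only.
    """
    problems = []
    if any(conn not in range(1, 9) for conn in sizes_gb):
        problems.append("connector numbers must be 1 through 8")
    if any(size not in SUPPORTED_SIZES_GB for size in sizes_gb.values()):
        problems.append("DIMM sizes must be 1, 2, 4, or 8 GB")

    # Pairs must be filled in Table 1 order, with no gaps.
    pairs_in_use = PAIR_ORDER[: len(sizes_gb) // 2]
    expected = {conn for pair in pairs_in_use for conn in pair}
    if len(sizes_gb) not in (2, 4, 6, 8) or set(sizes_gb) != expected:
        problems.append("populate pairs in the order given in Table 1")
        return problems

    for a, b in pairs_in_use:
        if sizes_gb[a] != sizes_gb[b]:
            problems.append(f"DIMM {a} and DIMM {b} must be the same size")

    # Sets of four connectors that must match once all four are populated.
    for group in ((1, 3, 2, 4), (5, 7, 6, 8)):
        if all(conn in sizes_gb for conn in group):
            if len({sizes_gb[conn] for conn in group}) > 1:
                problems.append(f"DIMMs {group} must all be the same size")

    if 8 in sizes_gb.values() and set(sizes_gb.values()) != {8}:
        problems.append("when 8 GB DIMMs are used, all DIMMs must be 8 GB")
    return problems
```

For example, check_dimm_layout({1: 2, 3: 2, 6: 4, 8: 4}) returns an empty list, because with four DIMMs the second pair can differ in size from the first pair.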
Blade server control panel buttons and LEDs
Blade server control panel buttons and LEDs provide operational controls and
status indicators.
Note: Figure 1 shows the control-panel door in the closed (normal) position. To
access the power-control button, you must open the control-panel door.
Figure 1. Blade server control panel buttons and LEDs
Figure callouts: keyboard/video select button, media-tray select button (MT),
information LED, blade-error LED, power-control button, NMI reset, sleep (not
used on blade server), power-on LED, activity LED, and location LED.
Keyboard/video select button: When you use an operating system that supports a
local console and keyboard, press this button to associate the shared BladeCenter
unit keyboard and video ports with the blade server.
Notes:
v The operating system in the blade server must provide USB support for the
blade server to recognize and use the keyboard, even if the keyboard has a
PS/2-style connector.
v The keyboard and video are available after partition firmware loads and is
running. Power-on self-test (POST) codes and diagnostics are not supported
using the keyboard and video. Use the management module to view
checkpoints.
The LED on this button flashes while the request is being processed, then is lit
when the ownership of the keyboard and video has been transferred to the blade
server. It can take approximately 20 seconds to switch control of the keyboard and
video to the blade server.
Using a keyboard that is directly attached to the management module, you can
press keys in the following sequence to switch keyboard and video control
between blade servers:
NumLock NumLock blade_server_number Enter
where blade_server_number is the two-digit number for the blade bay in which
the blade server is installed. When you use some keyboards, such as the
28L3644 (37L0888) keyboard, hold down the Shift key while you enter this key
sequence.
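As an illustration of the key sequence, the following Python sketch builds the list of keys for a given blade bay. The function name kvm_switch_sequence and the hold_shift parameter are hypothetical, and the accepted range of 1 to 99 is an assumption that reflects only the two-digit format, not the number of bays in any particular BladeCenter unit.

```python
def kvm_switch_sequence(bay_number, hold_shift=False):
    """Return the keys to press, in order, to switch KVM control to a bay.

    Hypothetical helper; it only formats the documented sequence.
    """
    if not 1 <= bay_number <= 99:
        raise ValueError("bay number must fit in two digits")
    # The bay number is always entered as two digits, for example 03.
    keys = ["NumLock", "NumLock", *f"{bay_number:02d}", "Enter"]
    if hold_shift:
        # Some keyboards need Shift held down for the whole sequence.
        keys = [f"Shift+{key}" for key in keys]
    return keys
```

For bay 3, the sketch returns NumLock, NumLock, 0, 3, Enter.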
If there is no response when you press the keyboard/video select button, you can
use the Web interface of the management module to determine whether local
control has been disabled on the blade server.
Media-tray select button: Press this button to associate the shared BladeCenter
unit media tray (removable-media drives and front-panel USB ports) with the
blade server. The LED on the button flashes while the request is being processed,
then is lit when the ownership of the media tray has been transferred to the blade
server. It can take approximately 20 seconds for the operating system in the blade
server to recognize the media tray.
If there is no response when you press the media-tray select button, use the
management module to determine whether local control has been disabled on the
blade server.
Note: The operating system in the blade server must provide USB support for the
blade server to recognize and use the removable-media drives and USB ports.
Information LED: When this amber LED is lit, it indicates that information about a
system error for the blade server has been placed in the management-module
event log. The information LED can be turned off through the Web interface of the
management module or through IBM Director Console.
Blade-error LED: When this amber LED is lit, it indicates that a system error has
occurred in the blade server. The blade-error LED will turn off after one of the
following events:
v Correcting the error
v Reseating the blade server in the BladeCenter unit
v Cycling the BladeCenter unit power
Power-control button: This button is behind the control panel door. Press this
button to turn on or turn off the blade server.
The power-control button has effect only if local power control is enabled for the
blade server. Local power control is enabled and disabled through the Web
interface of the management module.
Press the power button for 5 seconds to begin powering down the blade server.
NMI reset (recessed): The nonmaskable interrupt (NMI) reset dumps the partition.
Use this recessed button only as directed by IBM Support.
Power-on LED: This green LED indicates the power status of the blade server in
the following manner:
v Flashing rapidly: The service processor (BMC) is initializing the blade server.
v Flashing slowly: The blade server has completed initialization and is waiting for
a power-on command.
v Lit continuously: The blade server has power and is turned on.
Note: The enhanced service processor (BMC) can take as long as three minutes to
initialize after you install the BladeCenter JS12 blade server, at which point the
LED begins to flash slowly.
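The three power-on LED states map directly to what the blade server is doing, which a monitoring or checklist script could model as a small lookup. The following Python sketch is illustrative only; the PowerLed enumeration and the ready_to_power_on function are hypothetical names, and the sketch does not read any real hardware.

```python
from enum import Enum


class PowerLed(Enum):
    """Observed state of the green power-on LED (hypothetical model)."""
    FLASHING_RAPIDLY = "flashing rapidly"
    FLASHING_SLOWLY = "flashing slowly"
    LIT_CONTINUOUSLY = "lit continuously"


# Meanings paraphrased from the power-on LED description.
LED_MEANING = {
    PowerLed.FLASHING_RAPIDLY: "service processor is initializing the blade server",
    PowerLed.FLASHING_SLOWLY: "initialization complete; waiting for a power-on command",
    PowerLed.LIT_CONTINUOUSLY: "blade server has power and is turned on",
}


def ready_to_power_on(state):
    """Return True only when the blade server is waiting for a power-on command."""
    return state is PowerLed.FLASHING_SLOWLY
```

A slowly flashing LED is the only state in which the blade server is waiting for a power-on command, so it is the only state for which ready_to_power_on returns True.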
Activity LED: When this green LED is lit, it indicates that there is activity on the
hard disk drive or network.
Location LED: When this blue LED is lit, it has been turned on by the system
administrator to aid in visually locating the blade server. The location LED can be
turned off through the Web interface of the management module or through IBM
Director Console.
Turning on the blade server
After you connect the blade server to power through the BladeCenter unit, you can
start the blade server after the discovery and initialization process is complete.
You can start the blade server in any of the following ways.
v Start the blade server by pressing the power-control button on the front of the
blade server.
The power-control button is behind the control panel door, as described in
“Blade server control panel buttons and LEDs” on page 5.
After you push the power-control button, the power-on LED continues to blink
slowly for about 15 seconds, then is lit solidly when the power-on process is
complete.
Wait until the power-on LED on the blade server flashes slowly before you press
the blade server power-control button. If the power-on LED is flashing rapidly,
the service processor is initializing the blade server. The power-control button
does not respond during initialization.
Note: The enhanced service processor (BMC) can take as long as three minutes
to initialize after you install the BladeCenter JS12 blade server, at which point
the LED begins to flash slowly.
v Start the blade server automatically when power is restored after a power
failure.
If a power failure occurs, the BladeCenter unit and then the blade server can
start automatically when power is restored. You must configure the blade server
to restart through the management module.
v Start the blade server remotely using the management module.
After you initiate the power-on process, the power-on LED blinks slowly for
about 15 seconds, then is lit solidly when the power-on process is complete.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the
BladeCenter unit. The blade server can respond to requests from the service
processor, such as a remote request to turn on the blade server. To remove all
power from the blade server, you must remove it from the BladeCenter unit.
Shut down the operating system before you turn off the blade server. See the
operating-system documentation for information about shutting down the
operating system.
You can turn off the blade server in one of the following ways.
v Turn off the blade server by pressing the power-control button for at least 5
seconds.
The power-control button is on the blade server behind the control panel door.
See “Blade server control panel buttons and LEDs” on page 5 for the location.
Note: The power-control LED can remain on solidly for up to 1 minute after
you push the power-control button. After you turn off the blade server, wait
until the power-control LED is blinking slowly before you press the
power-control button to turn on the blade server again.
If the operating system stops functioning, press and hold the power-control
button for more than 5 seconds to force the blade server to turn off.
v Use the management module to turn off the blade server.
The power-control LED can remain on solidly for up to 1 minute after you
initiate the power-off process. After you turn off the blade server, wait until the
power-control LED is blinking slowly before you initiate the power-on process
from the advanced management module to turn on the blade server again.
Use the management-module Web interface to configure the management
module to turn off the blade server if the system is not operating correctly.
For additional information, see the online documentation or the User’s Guide for
the management module.
System-board layouts
Illustrations show the connectors and LEDs on the system board. The illustrations
might differ slightly from your hardware.
System-board connectors
Blade server components attach to the connectors on the system board.
Figure 2 shows the connectors on the system board in the blade server.
Figure 2. System-board connectors
Connectors labeled in the figure:
v Control panel connector
v SAS drive (P1-D1)
v SAS drive (P1-D2)
v DIMM 1 (P1-C1)
v DIMM 2 (P1-C2)
v DIMM 3 (P1-C3)
v DIMM 4 (P1-C4)
v DIMM 5 (P1-C5)
v DIMM 6 (P1-C6)
v DIMM 7 (P1-C7)
v DIMM 8 (P1-C8)
v Management card (P1-C9)
v PCI-X expansion card (P1-C10)
v PCI-E high-speed expansion card (P1-C11)
v Battery (P1-E1)
System-board LEDs
Use the illustration of the LEDs on the system board to identify a light emitting
diode (LED).
Remove the blade server from the BladeCenter unit, open the cover to see any
error LEDs that were turned on during error processing, and use Figure 3 to
identify the failing component.
Figure 3. System-board LEDs
LEDs labeled in the figure:
v Power LED (always on when plugged in)
v System board error LED (P1)
v Front SAS drive error LED (P1-D1)
v Battery error LED (P1-E1)
v DIMM 1 error LED (P1-C1)
v DIMM 2 error LED (P1-C2)
v DIMM 3 error LED (P1-C3)
v DIMM 4 error LED (P1-C4)
v DIMM 5 error LED (P1-C5)
v DIMM 6 error LED (P1-C6)
v DIMM 7 error LED (P1-C7)
v DIMM 8 error LED (P1-C8)
v Management card error LED (P1-C9)
v PCI-X expansion card error LED (P1-C10)
v PCIe high-speed expansion card error LED (P1-C11)
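The location codes shown in Figure 2 and Figure 3 form a small, fixed mapping from code to component, which can be handy when reading location codes in error codes or event log entries. The following Python sketch is a hypothetical lookup table built only from the codes in those two figures; the names LOCATION_CODES and describe_location are illustrative and are not part of any IBM tool.

```python
# Location codes taken from the Figure 2 and Figure 3 callouts.
LOCATION_CODES = {
    "P1": "System board",
    "P1-C9": "Management card",
    "P1-C10": "PCI-X expansion card",
    "P1-C11": "PCIe high-speed expansion card",
    "P1-D1": "Front SAS drive",
    "P1-D2": "SAS drive",
    "P1-E1": "Battery",
}
# DIMM 1 through DIMM 8 use connectors P1-C1 through P1-C8.
LOCATION_CODES.update({f"P1-C{n}": f"DIMM {n}" for n in range(1, 9)})


def describe_location(code):
    """Return the component name for a location code, ignoring case."""
    return LOCATION_CODES.get(code.strip().upper(), "unknown location code")
```

For example, describe_location("P1-C5") returns "DIMM 5".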
Chapter 2. Diagnostics
Use the available diagnostic tools to help solve any problems that might occur in
the blade server.
The first and most crucial component of a solid serviceability strategy is the ability
to accurately and effectively detect errors when they occur. While not all errors are
a threat to system availability, those that go undetected are dangerous because the
system does not have the opportunity to evaluate and act if necessary. POWER6
processor-based systems are specifically designed with error-detection mechanisms
that extend from processor cores and memory to power supplies and hard drives.
POWER6 processor-based systems contain specialized hardware detection circuitry
for detecting erroneous hardware operations. Error checking hardware ranges from
parity error detection coupled with processor instruction retry and bus retry, to
ECC correction on caches and system buses.
IBM hardware error checkers have these distinct attributes:
v Continuous monitoring of system operations to detect potential calculation
errors
v Attempted isolation of physical faults based on runtime detection of each unique
failure
v Initiation of a wide variety of recovery mechanisms designed to correct a
problem
POWER6 processor-based systems include extensive hardware and firmware
recovery logic.
Machine check handling
Machine checks are handled by firmware. When a machine check occurs, the
firmware analyzes the error to identify the failing device and creates an error log
entry.
If the system degrades to the point that the service processor cannot reach standby
state, the ability to analyze the error does not exist. If the error occurs during
POWER® hypervisor (PHYP) activities, the PHYP initiates a system reboot.
In partitioned mode, an error that occurs during partition activity is surfaced to the
operating system in the partition.
If you cannot locate and correct the problem using the diagnostics tools and
information, see Appendix A, “Getting help and technical assistance,” on page 285.
Tools are available to help you diagnose and solve hardware-related problems.
v Power-on self-test (POST) progress codes (checkpoints), error codes, and
isolation procedures
The POST checks out the hardware at system initialization. IPL diagnostic
functions test some system components and interconnections. The POST
generates eight-digit checkpoints to mark the progress of powering up the blade
server.
Use the management module to view progress codes.
The documentation of a progress code includes recovery actions for system
hangs. See “POST progress codes (checkpoints)” on page 88 for more
information.
If the service processor detects a problem during POST, an error code is logged
in the management module event log. Error codes are also logged in the Linux
syslog or AIX® diagnostic log, if possible. See “System reference codes (SRCs)”
on page 16.
The service processor can generate codes that point to specific isolation
procedures. See “Service processor problems” on page 205.
v Light path diagnostics
Use the light path diagnostic LEDs on the system board to identify failing
hardware. If the system error LED on the system LED panel on the front or rear
of the BladeCenter unit is lit, one or more error LEDs on the BladeCenter unit
components also might be lit.
Light path diagnostics help identify failing customer replaceable units (CRUs).
CRU location codes are included in error codes and the event log.
LED locations
See “System-board LEDs” on page 9.
Front panel
See “Blade server control panel buttons and LEDs” on page 5.
v Troubleshooting tables
Use the troubleshooting tables to find solutions to problems that have
identifiable symptoms.
See “Troubleshooting tables” on page 194.
v Dump data collection
In some circumstances, an error might require a dump to show more data. The
Integrated Virtualization Manager (IVM) sets up a dump area. Specific IVM
information is included as part of the information that can optionally be sent to
IBM support for analysis.
See “Collecting dump data” on page 13 for more information.
v Stand-alone diagnostics
The AIX-based stand-alone Diagnostics CD is in the ship package and is also
available from the IBM Web site. Boot the CD from a CD drive or from an AIX
network installation manager (NIM) server if the blade server cannot boot to an
operating system, no matter which operating system is installed.
JS12 Type 7998: Problem Determination and Service Guide
Functions provided by the stand-alone diagnostics include:
– Analysis of errors reported by the platform, such as microprocessor and memory
– Testing of resources, such as I/O adapters and devices
– Service aids, such as firmware update, format disk, and Raid Manager
v Diagnostic utilities for the AIX operating system
If AIX is functioning, run AIX concurrent diagnostics instead of the stand-alone
diagnostics. Functions provided by the disk-based AIX diagnostics include:
– Automatic error log analysis
– Analysis of errors reported by the platform, such as microprocessor and memory
– Testing of resources, such as I/O adapters and devices
– Service aids, such as firmware update, format disk, and Raid Manager
v Diagnostic utilities for Linux operating systems
Linux on POWER service and productivity tools include hardware diagnostic
aids, productivity tools, and installation aids. The installation aids are
provided in the IBM Installation Toolkit for Linux on POWER, a set of tools that
aids the installation of Linux on IBM servers with POWER architecture. You can
also use the tools to update the JS12 blade server firmware.
Diagnostic utilities for the Linux operating system are available from IBM at
https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/home.html.
v Diagnostic utilities for other operating systems
You can use the stand-alone Diagnostics CD to perform diagnostics on the JS12
blade server, no matter which operating system is loaded on the blade server.
However, other supported operating systems might have diagnostic tools that
are available through the operating system. See the documentation for your
operating system for more information.
Collecting dump data
A dump might be critical for fault isolation when the built-in First Failure Data
Capture (FFDC) mechanisms are not capturing sufficient fault data. Even when a
fault is identified, dump data can provide additional information that is useful in
problem determination.
All hardware state information is part of the dump if a hardware checkstop occurs.
When a checkstop occurs, the service processor attempts to dump data that is
necessary to analyze the error from appropriate parts of the system.
Note: If you power off the blade through the management module while the
service processor is performing a dump, platform dump data is lost.
You might be asked to retrieve a dump to send it to IBM Support for analysis. The
location of the dump data varies per operating system platform.
Chapter 2. Diagnostics
v Collect an AIX dump from the /var/adm/platform directory.
v Collect a Linux dump from the /var/log/dump directory.
v Collect an Integrated Virtualization Manager (IVM) dump from the
IVM-managed JS12 blade server through the Manage Dumps task in the IVM
console.
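The dump locations listed above can be captured in a small lookup, which might be handy in a collection script. This is a minimal sketch in Python: the two directory paths come from the list above, while the function name `dump_directory` and the operating system labels used as keys are illustrative assumptions, not part of any IBM tool.

```python
from pathlib import PurePosixPath

# Directories taken from the list above; the dictionary keys are
# illustrative labels chosen for this sketch.
DUMP_DIRECTORIES = {
    "aix": PurePosixPath("/var/adm/platform"),
    "linux": PurePosixPath("/var/log/dump"),
}


def dump_directory(os_name: str) -> PurePosixPath:
    """Return the documented dump directory for an operating system label."""
    key = os_name.strip().lower()
    if key == "ivm":
        # IVM dumps are not read from a directory; they are collected
        # through the Manage Dumps task in the IVM console.
        raise ValueError("collect IVM dumps through the Manage Dumps task")
    try:
        return DUMP_DIRECTORIES[key]
    except KeyError:
        raise ValueError(f"no dump location documented for {os_name!r}") from None
```

Calling `dump_directory("AIX")` returns the `/var/adm/platform` path; any label without a documented directory raises `ValueError` rather than guessing a location.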
Location codes
Location codes identify components of the blade server. Location codes are
displayed with some error codes to identify the blade server component that is
causing the error.
See “System-board connectors” on page 9 for component locations.
Notes:
1. Location codes do not indicate the location of the blade server within the
BladeCenter unit. The codes identify components of the blade server only.
2. For checkpoints with no associated location code, see “Light path diagnostics”
on page 218 to identify the failing component when there is a hang condition.
3. For checkpoints with location codes, use the following table to identify the
failing component when there is a hang condition.
4. For 8-digit codes not listed in Table 2, see “Checkout procedure” on page 186.
Table 2. Location codes

Location code    Component

Un location codes are for enclosure and VPD locations.
Un = Utttt.mmm.sssssss
  tttt = system machine type
  mmm = system model number
  sssssss = system serial number

Un-P1            System-board and chassis assembly (Planar, FSP, SPCN,
                 CP0, P5IOC2)
Un-P1-C1         DIMM 1 (DIMM1A)
Un-P1-C2         DIMM 2 (DIMM1B)
Un-P1-C3         DIMM 3 (DIMM0A)
Un-P1-C4         DIMM 4 (DIMM0B)
Un-P1-C5         DIMM 5 (DIMM3B)
Un-P1-C6         DIMM 6 (DIMM3A)
Un-P1-C7         DIMM 7 (DIMM2B)
Un-P1-C8         DIMM 8 (DIMM2A)
Un-P1-C9         Management card (MGMT CRD)
Un-P1-C10        PCI-X expansion card (PIOCARD)
Un-P1-C11        PCIe high-speed expansion card (PIOCARD)
Un-P1-D1         Front SAS hard disk drive (SFF0)
Un-P1-D2         Rear SAS hard disk drive (SFF1)
Un-P1-E1         Battery (BATT)

Um codes are for firmware. The format is the same as for a Un location code.
Um = Utttt.mmm.sssssss

Um-Y1            Firmware version
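As an illustration of how a location code from Table 2 breaks down, the following Python sketch splits a code into the Utttt.mmm.sssssss prefix and the component suffix, then looks up the component name. The function name, the regular expression, and the sample code U7998.60X.1234567-P1-C3 (in particular its model and serial number) are assumptions made for this example; only the suffix-to-component mapping comes from the table.

```python
import re

# Suffix-to-component mapping transcribed from Table 2.
COMPONENTS = {
    "P1": "System-board and chassis assembly",
    "P1-C1": "DIMM 1 (DIMM1A)",
    "P1-C2": "DIMM 2 (DIMM1B)",
    "P1-C3": "DIMM 3 (DIMM0A)",
    "P1-C4": "DIMM 4 (DIMM0B)",
    "P1-C5": "DIMM 5 (DIMM3B)",
    "P1-C6": "DIMM 6 (DIMM3A)",
    "P1-C7": "DIMM 7 (DIMM2B)",
    "P1-C8": "DIMM 8 (DIMM2A)",
    "P1-C9": "Management card (MGMT CRD)",
    "P1-C10": "PCI-X expansion card (PIOCARD)",
    "P1-C11": "PCIe high-speed expansion card (PIOCARD)",
    "P1-D1": "Front SAS hard disk drive (SFF0)",
    "P1-D2": "Rear SAS hard disk drive (SFF1)",
    "P1-E1": "Battery (BATT)",
    "Y1": "Firmware version",
}

# Assumed syntax: U + 4-character type, 3-character model, 7-character
# serial number, then an optional "-suffix". Alphanumeric fields are an
# assumption of this sketch.
LOCATION_RE = re.compile(
    r"^U(?P<machine_type>[0-9A-Za-z]{4})"
    r"\.(?P<model>[0-9A-Za-z]{3})"
    r"\.(?P<serial>[0-9A-Za-z]{7})"
    r"(?:-(?P<suffix>.+))?$"
)


def parse_location_code(code: str) -> dict:
    """Split a location code and look up its component, if listed."""
    match = LOCATION_RE.match(code.strip())
    if match is None:
        raise ValueError(f"not a Utttt.mmm.sssssss location code: {code!r}")
    fields = match.groupdict()
    suffix = fields["suffix"]
    # Suffixes that are not in Table 2 yield None instead of a guess.
    fields["component"] = COMPONENTS.get(suffix) if suffix else None
    return fields
```

For the hypothetical code U7998.60X.1234567-P1-C3, the sketch reports machine type 7998 and the component DIMM 3 (DIMM0A). A suffix that is not in the table returns no component, which corresponds to note 4: follow the checkout procedure instead.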
Reference codes
Reference codes are diagnostic aids that help you determine the source of a
hardware or operating system problem. To use reference codes effectively, use them
in conjunction with other service and support procedures.
The BladeCenter JS12 Type 7998 blade server produces several types of codes.
Progress codes: The power-on self-test (POST) generates eight-digit status codes
that are known as checkpoints or progress codes, which are recorded in the
management-module event log. The checkpoints indicate which blade server
resource is initializing.
Error codes: The First Failure Data Capture (FFDC) error checkers capture fault
data, which the baseboard management controller (BMC) service processor then
analyzes. For unrecoverable errors (UEs), for recoverable events that meet or
exceed their service thresholds, and for fatal system errors, an unrecoverable
checkstop service event triggers the service processor to analyze the error, log the
system reference code (SRC), and turn on the system attention LED.
The service processor logs the nine-word error code, with eight digits per word, in
the BladeCenter management-module event log. Error codes are either system reference
codes (SRCs) or service request numbers (SRNs). A location code might also be
included.
Isolation procedures: If the fault analysis does not determine a definitive cause,
the service processor might indicate a fault isolation procedure that you can use to
isolate the failing component.
Viewing the codes
The JS12 blade server does not display checkpoints or error codes on the remote
console. The shared BladeCenter unit video also does not display the codes.
If the POST detects a problem, a 9-word, 8-digit error code is logged in the
BladeCenter management-module event log. A location code that identifies a
component might also be included. See “Error logs” on page 186 for information
about viewing the management-module event log.
Service request numbers can be viewed using the AIX diagnostics CD, or various
operating system utilities, such as AIX diagnostics or the Linux service aid
“diagela”, if it is installed.
System reference codes (SRCs)
System reference codes indicate a server hardware or software problem that can
originate in hardware, in firmware, or in the operating system.
A blade server component generates an error code when it detects a problem. An
SRC identifies the component that generated the error code and describes the error.
Use the SRC information to identify a list of possibly failing items and to find
information about any additional isolation procedures.
The following table shows the syntax of a nine-word B700xxxx SRC as it might be
displayed in the event log of the management module.
The first word of the SRC in this example is the message identifier, B7001111. This
example numbers each word after the first word to show relative word positions.
The seventh word is the direct select address, which is 77777777 in the example.
Table 3. Nine-word system reference code in the management-module event log
Date/Time: 01/21/2008, 17:15:14
Depending on your operating system and the utilities you have installed, error
messages might also be stored in an operating system log. See the documentation
that comes with the operating system for more information.
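The word positions of a nine-word SRC can be illustrated with a short Python sketch. It assumes that the nine words have already been copied out of the event log entry as whitespace-separated, eight-digit hexadecimal strings; the exact layout of the surrounding log text is not assumed. The function name `parse_src` and the result keys are hypothetical. The sample words follow the numbering scheme of the example, in which the first word is B7001111 and the seventh is 77777777.

```python
import re

# One SRC word: exactly eight hexadecimal digits.
WORD_RE = re.compile(r"^[0-9A-Fa-f]{8}$")


def parse_src(text: str) -> dict:
    """Split a nine-word SRC and name the two positions described above."""
    words = text.split()
    if len(words) != 9:
        raise ValueError(f"expected 9 words, found {len(words)}")
    for word in words:
        if not WORD_RE.match(word):
            raise ValueError(f"not an eight-digit hexadecimal word: {word!r}")
    words = [word.upper() for word in words]
    return {
        "message_id": words[0],             # first word
        "direct_select_address": words[6],  # seventh word
        "words": words,
    }


EXAMPLE = (
    "B7001111 22222222 33333333 44444444 55555555 "
    "66666666 77777777 88888888 99999999"
)
```

Calling `parse_src(EXAMPLE)` yields B7001111 as the message identifier and 77777777 as the direct select address. Input with a missing or malformed word is rejected rather than padded.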
The management module can display the most recent 32 SRCs and time stamps.
Manually refresh the list to update it.
Select Blade Service Data → blade_name in the management module to see a list of
the 32 most recent SRCs.
Table 4. Management module reference code listing

Unique ID    System Reference Code    Timestamp
00040001     D1513901                 2005-11-13 19:30:20
00000016     D1513801                 2005-11-13 19:30:16
Any message with more detail is highlighted as a link in the System Reference
Code column. Click the message to cause the management module to present the
additional message detail:
D1513901
Created at: 2007-11-13 19:30:20
SRC Version: 0x02
Hex Words 2-5: 020110F0 52298910 C1472000 200000FF
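If the additional message detail is copied out of the management module as text, it can be turned into structured fields. The following Python sketch assumes the four-line "label: value" layout shown above, with a space between the date and the time. The function name `parse_src_detail` and the result keys are hypothetical, and the management module is not known to offer this output in any other form.

```python
from datetime import datetime

# Sample text reproducing the message detail shown above.
SAMPLE_DETAIL = """\
D1513901
Created at: 2007-11-13 19:30:20
SRC Version: 0x02
Hex Words 2-5: 020110F0 52298910 C1472000 200000FF
"""


def parse_src_detail(text: str) -> dict:
    """Parse the SRC detail text into a dictionary of typed fields."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty detail block")
    detail = {"src": lines[0]}  # the first line is the SRC itself
    for line in lines[1:]:
        label, separator, value = line.partition(":")
        if not separator:
            continue  # skip lines that are not "label: value"
        label = label.strip().lower()
        value = value.strip()
        if label == "created at":
            detail["created_at"] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        elif label == "src version":
            detail["src_version"] = int(value, 16)
        elif label.startswith("hex words"):
            detail["hex_words"] = value.split()
    return detail
```

Splitting on the first colon only keeps the time portion of the "Created at" value intact, because the timestamp itself contains colons.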