Power Systems
Problem Determination and Service Guide for the IBM Power PS700 (8406-70Y)
GI11-9831-00
Note
Before using this information and the product it supports, read the information in “Notices,” on page 271, “Safety notices”
on page v, the IBM Systems Safety Notices manual, G229-9054, and the IBM Environmental Notices and User Guide, Z125–5823.
This edition applies to IBM Power Systems servers that contain the POWER7 processor and to all associated
models.
Installing the blade server in a BladeCenter unit . . . 236
Removing and replacing Tier 1 CRUs . . . 237
Removing the blade server cover . . . 237
Installing and closing the blade server cover . . . 239
Removing the bezel assembly . . . 240
Installing the bezel assembly . . . 240
Removing a drive . . . 241
Installing a drive . . . 242
Removing a memory module . . . 244
Installing a memory module . . . 245
Removing and installing an I/O expansion card . . . 246
Removing a CIOv form-factor expansion card . . . 247
Installing a CIOv form-factor expansion card . . . 247
Removing a combination-form-factor expansion card . . . 249
Installing a combination-form-factor expansion card . . . 249
Removing the battery . . . 250
Installing the battery . . . 251
Removing the disk drive tray . . . 252
Installing the disk drive tray . . . 253
Removing the Tier 2 management card . . . 255
Installing the Tier 2 management card . . . 256
Obtaining a PowerVM Virtualization Engine system technologies activation code . . . 257
Replacing the FRU system-board and chassis assembly . . . 260
Chapter 5. Configuring . . . 263
Updating the firmware . . . 263
Configuring the blade server . . . 264
Using the SMS utility . . . 265
Starting the SMS utility . . . 265
SMS utility menu choices . . . 265
Creating a CE login . . . 265
Configuring the Gigabit Ethernet controllers . . . 266
Blade server Ethernet controller enumeration . . . 267
MAC addresses for host Ethernet adapters . . . 267
Configuring a RAID array . . . 268
Updating IBM Director . . . 268
Appendix. Notices . . . 271
Trademarks . . . 272
Electronic emission notices . . . 273
Class A Notices . . . 273
Class B Notices . . . 277
Terms and conditions . . . 280
ivPower Systems: Problem Determination and Service Guide for the IBM Power PS700 (8406-70Y)
Safety notices
Safety notices may be printed throughout this guide:
• DANGER notices call attention to a situation that is potentially lethal or extremely hazardous to people.
• CAUTION notices call attention to a situation that is potentially hazardous to people because of some existing condition.
• Attention notices call attention to the possibility of damage to a program, device, system, or data.
World Trade safety information
Several countries require the safety information contained in product publications to be presented in their
national languages. If this requirement applies to your country, a safety information booklet is included
in the publications package shipped with the product. The booklet contains the safety information in
your national language with references to the U.S. English source. Before using a U.S. English publication
to install, operate, or service this product, you must first become familiar with the related safety
information in the booklet. You should also refer to the booklet any time you do not clearly understand
any safety information in the U.S. English publications.
German safety information
This product is not suitable for use at visual display workstations as defined in §2 of the German Ordinance for Work with Visual Display Units (Bildschirmarbeitsverordnung).
Laser safety information
IBM® servers can use I/O cards or features that are fiber-optic based and that utilize lasers or LEDs.
Laser compliance
IBM servers may be installed inside or outside of an IT equipment rack.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
• Connect power to this unit only with the IBM-provided power cord. Do not use the IBM-provided power cord for any other product.
• Do not open or service any power supply assembly.
• Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this product during an electrical storm.
• The product might be equipped with multiple power cords. To remove all hazardous voltages, disconnect all power cords.
• Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet supplies proper voltage and phase rotation according to the system rating plate.
• Connect any equipment that will be attached to this product to properly wired outlets.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural damage.
• Disconnect the attached power cords, telecommunications systems, networks, and modems before you open the device covers, unless instructed otherwise in the installation and configuration procedures.
• Connect and disconnect cables as described in the following procedures when installing, moving, or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
DANGER
Observe the following precautions when working on or around your IT rack system:
• Heavy equipment: personal injury or equipment damage might result if mishandled.
• Always lower the leveling pads on the rack cabinet.
• Always install stabilizer brackets on the rack cabinet.
• To avoid hazardous conditions due to uneven mechanical loading, always install the heaviest devices in the bottom of the rack cabinet. Always install servers and optional devices starting from the bottom of the rack cabinet.
• Rack-mounted devices are not to be used as shelves or work spaces. Do not place objects on top of rack-mounted devices.
• Each rack cabinet might have more than one power cord. Be sure to disconnect all power cords in the rack cabinet when directed to disconnect power during servicing.
• Connect all devices installed in a rack cabinet to power devices installed in the same rack cabinet. Do not plug a power cord from a device installed in one rack cabinet into a power device installed in a different rack cabinet.
• An electrical outlet that is not correctly wired could place hazardous voltage on the metal parts of the system or the devices that attach to the system. It is the responsibility of the customer to ensure that the outlet is correctly wired and grounded to prevent an electrical shock.
CAUTION
• Do not install a unit in a rack where the internal rack ambient temperatures will exceed the manufacturer's recommended ambient temperature for all your rack-mounted devices.
• Do not install a unit in a rack where the air flow is compromised. Ensure that air flow is not blocked or reduced on any side, front, or back of a unit used for air flow through the unit.
• Consideration should be given to the connection of the equipment to the supply circuit so that overloading of the circuits does not compromise the supply wiring or overcurrent protection. To provide the correct power connection to a rack, refer to the rating labels located on the equipment in the rack to determine the total power requirement of the supply circuit.
• (For sliding drawers.) Do not pull out or install any drawer or feature if the rack stabilizer brackets are not attached to the rack. Do not pull out more than one drawer at a time. The rack might become unstable if you pull out more than one drawer at a time.
• (For fixed drawers.) This drawer is a fixed drawer and must not be moved for servicing unless specified by the manufacturer. Attempting to move the drawer partially or completely out of the rack might cause the rack to become unstable or cause the drawer to fall out of the rack.
(R001)
CAUTION:
Removing components from the upper positions in the rack cabinet improves rack stability during
relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a
room or building:
• Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack cabinet. When possible, restore the rack cabinet to the configuration of the rack cabinet as you received it. If this configuration is not known, you must observe the following precautions:
  – Remove all devices in the 32U position and above.
  – Ensure that the heaviest devices are installed in the bottom of the rack cabinet.
  – Ensure that there are no empty U-levels between devices installed in the rack cabinet below the 32U level.
• If the rack cabinet you are relocating is part of a suite of rack cabinets, detach the rack cabinet from the suite.
• Inspect the route that you plan to take to eliminate potential hazards.
• Verify that the route that you choose can support the weight of the loaded rack cabinet. Refer to the documentation that comes with your rack cabinet for the weight of a loaded rack cabinet.
• Verify that all door openings are at least 760 x 2030 mm (30 x 80 in.).
• Ensure that all devices, shelves, drawers, doors, and cables are secure.
• Ensure that the four leveling pads are raised to their highest position.
• Ensure that there is no stabilizer bracket installed on the rack cabinet during movement.
• Do not use a ramp inclined at more than 10 degrees.
• When the rack cabinet is in the new location, complete the following steps:
  – Lower the four leveling pads.
  – Install stabilizer brackets on the rack cabinet.
  – If you removed any devices from the rack cabinet, repopulate the rack cabinet from the lowest position to the highest position.
• If a long-distance relocation is required, restore the rack cabinet to the configuration of the rack cabinet as you received it. Pack the rack cabinet in the original packaging material, or equivalent. Also lower the leveling pads to raise the casters off of the pallet and bolt the rack cabinet to the pallet.
(R002)
(L001)
(L002)
(L003)
All lasers are certified in the U.S. to conform to the requirements of DHHS 21 CFR Subchapter J for class
1 laser products. Outside the U.S., they are certified to be in compliance with IEC 60825 as a class 1 laser
product. Consult the label on each part for laser certification numbers and approval information.
CAUTION:
This product might contain one or more of the following devices: CD-ROM drive, DVD-ROM drive,
DVD-RAM drive, or laser module, which are Class 1 laser products. Note the following information:
• Do not remove the covers. Removing the covers of the laser product could result in exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of the controls or adjustments or performance of procedures other than those specified herein might result in hazardous radiation exposure.
(C026)
CAUTION:
Data processing environments can contain equipment transmitting on system links with laser modules
that operate at greater than Class 1 power levels. For this reason, never look into the end of an optical
fiber cable or open receptacle. (C027)
CAUTION:
This product contains a Class 1M laser. Do not view directly with optical instruments. (C028)
CAUTION:
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following information: laser radiation is present when the unit is open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam. (C030)
Power and cabling information for NEBS (Network Equipment-Building System) GR-1089-CORE
The following comments apply to the IBM servers that have been designated as conforming to NEBS
(Network Equipment-Building System) GR-1089-CORE:
The equipment is suitable for installation in the following:
• Network telecommunications facilities
• Locations where the NEC (National Electrical Code) applies
The intrabuilding ports of this equipment are suitable for connection to intrabuilding or unexposed
wiring or cabling only. The intrabuilding ports of this equipment must not be metallically connected to the
interfaces that connect to the OSP (outside plant) or its wiring. These interfaces are designed for use as
intrabuilding interfaces only (Type 2 or Type 4 ports as described in GR-1089-CORE) and require isolation
from the exposed OSP cabling. The addition of primary protectors is not sufficient protection to connect
these interfaces metallically to OSP wiring.
Note: All Ethernet cables must be shielded and grounded at both ends.
The ac-powered system does not require the use of an external surge protection device (SPD).
The dc-powered system employs an isolated DC return (DC-I) design. The DC battery return terminal
shall not be connected to the chassis or frame ground.
Chapter 1. Introduction
This problem determination and service information helps you solve problems that might occur in your
PS700 blade server. The information describes the diagnostic tools that come with the blade server, error
codes and suggested actions, and instructions for replacing failing components.
Replaceable components are of three types:
• Tier 1 customer replaceable unit (CRU): Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you are charged for the installation.
• Tier 2 customer replaceable unit: You can install a Tier 2 CRU yourself or request IBM to install it, at no additional charge, under the type of warranty service that is designated for your blade server.
• Field replaceable unit (FRU): FRUs must be installed only by trained service technicians.
The serial number for the PS700 blade server can be found in the following locations:
• The bottom front of the blade server in the right corner on the 1S label.
• The bottom rear of the blade server in the right corner.
• Under the front cover door.
For information about the terms of the warranty and getting service and assistance, see the information center or the Warranty and Support Information document on the IBM BladeCenter® Documentation CD.
Related documentation
Documentation for the PS700 blade server includes documents in Portable Document Format (PDF) on
the IBM BladeCenter Documentation CD and the online information center.
The most recent version of all BladeCenter documentation is in the online BladeCenter information center at http://publib.boulder.ibm.com/infocenter/bladectr/documentation/index.jsp.
PDF versions of the following documents are on the IBM BladeCenter Documentation CD and in the online
information center:
• Installation and User's Guide
This document contains general information about the blade server, including how to install supported options and how to configure the blade server.
• Safety Information
This document contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
• Warranty and Support Information
This document contains information about the terms of the warranty and about getting service and assistance.
Additional documents might be included in the online information center and on the IBM BladeCenter Documentation CD.
The blade server might have features that are not described in the documentation that comes with the
blade server. Occasional updates to the documentation might include information about those features, or
technical updates might be available to provide additional information that is not included in the
documentation that comes with the blade server.
Review the online information or the Planning Guide and the Installation Guide for your IBM BladeCenter
unit. The information can help you prepare for system installation and configuration. The most current
version of each document is available in the BladeCenter information center.
Notices and statements
The caution and danger statements in this document are also in the multilingual Safety Information. Each
statement is numbered for reference to the corresponding statement in your language in the Safety Information document.
The following notices and statements are used in this document:
• Note: These notices provide important tips, guidance, or advice.
• Important: These notices provide information or advice that might help you avoid inconvenient or problem situations.
• Attention: These notices indicate potential damage to programs, devices, or data. An attention notice is placed just before the instruction or situation in which damage might occur.
• Caution: These statements indicate situations that can be potentially hazardous to you. A caution statement is placed just before the description of a potentially hazardous procedure step or situation.
• Danger: These statements indicate situations that can be potentially lethal or extremely hazardous to you. A danger statement is placed just before the description of a potentially lethal or extremely hazardous procedure step or situation.
Features and specifications
Features and specifications of the IBM BladeCenter PS700 blade server are summarized in this overview.
The PS700 Type 8406 is a single-wide (non-expandable) blade server. The PS700 blade server is used in an
IBM BladeCenter H (8852 and 7989), BladeCenter HT (8740 and 8750), or BladeCenter S (8886 and 7779)
chassis unit.
Notes:
• Power, cooling, removable-media drives, external ports, and advanced system management are provided by the BladeCenter unit.
• The operating system in the blade server must provide support for the Universal Serial Bus (USB) to enable the blade server to recognize and communicate internally with the removable-media drives and front-panel USB ports.
Core electronics:
• 64-bit POWER7 processors (12S technology)
• Four-core, single-socket (4-way) processor at 3.0 GHz
• 64 GB maximum in 8 very low profile (VLP) DIMM slots; supports 4 GB DDR3 at 1066 MHz and 8 GB DDR3 at 800 MHz
• P5IOC2 I/O hub
On-board, integrated features:
• Two 1 Gb Ethernet ports (HEA) (two on each side)
• SAS controller
• USB 2.0
• One Serial over LAN (SOL) console using FSP
FSP1 Service Processor - IPMI and SOL:
• The baseboard management controller (BMC) is a flexible service processor (FSP1) with Intelligent Platform Management Interface (IPMI), Serial over LAN (SOL), and Wake on LAN (WOL) firmware support.
Local storage:
• First DASD bay: zero or one 2.5" SAS HDD
• Second DASD bay: zero or one 2.5" SAS HDD
• SAS HDDs are 300 GB and 600 GB
• Hardware mirroring
Daughter card I/O options:
• One 1Xe expansion card (CIOv)
• SAS pass-through using 1Xe
• One High-Speed expansion card (CFFh)
Integrated functions:
• RS-485 interface for communication with the management module
• Automatic server restart (ASR)
• SOL through FSP
• Two Universal Serial Bus (USB 2.0) buses on base planar for communication with removable-media drives
• Optical media available by shared chassis feature
Environment:
• Air temperature:
  – Blade server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (0 to 3000 ft)
  – Blade server on: 10° to 32°C (50° to 90°F). Altitude: 914 m to 2133 m (3000 ft to 7000 ft)
  – Blade server off: -40° to 60°C (-40° to 140°F)
• Humidity:
  – Blade server on: 8% to 80%
  – Blade server off: 8% to 80%
PS700 size:
• Height: 24.5 cm (9.7 inches)
• Depth: 44.6 cm (17.6 inches)
• Width: 29 mm (1.14 inches)
Systems management:
• Supported by BladeCenter chassis management module
• Front panel LEDs
• IBM Director
• Hardware Management Console (HMC)
• Integrated Virtualization Manager (IVM)
• EnergyScale thermal management for power management/oversubscription (throttling) and environmental sensing
• Active Energy Manager
Clusters support for:
• IBM Director
• xCAT
Virtualization support for:
• PowerVM® Standard Edition hardware feature, which provides the Integrated Virtualization Manager, Virtual I/O Server, and Director Power Systems™ Manager (DPSM)
Reliability and service features:
• Dual alternating current power supply
• BladeCenter chassis redundant and hot plug power and cooling modules
• Boot-time processor deallocation
• Blade server hot plug
• Customer setup and expansion
• Automatic reboot on power loss
• Internal and ambient temperature monitors
• ECC, chipkill memory
• System management alerts
Electrical input: 12 V dc
See the ServerProven Web site for information about supported operating-system versions and all PS700
blade server optional devices.
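The operating limits listed under Environment lend themselves to a quick pre-installation check. The following is a minimal Python sketch, not an IBM tool: `operating_environment_ok` is a hypothetical name, and the function covers only the "blade server on" ranges. The two published altitude ranges meet at exactly 914 m; the sketch assumes the stricter 32°C limit applies at that point.

```python
def operating_environment_ok(temp_c: float, altitude_m: float, humidity_pct: float) -> bool:
    """Hypothetical check of a site against the 'blade server on' limits."""
    # Humidity, blade server on: 8% to 80%
    if not 8 <= humidity_pct <= 80:
        return False
    # Published altitude coverage is 0 m to 2133 m (0 ft to 7000 ft)
    if altitude_m < 0 or altitude_m > 2133:
        return False
    # Below 914 m the limit is 35°C; from 914 m (assumed boundary) up to 2133 m it is 32°C
    max_temp_c = 35.0 if altitude_m < 914 else 32.0
    # The lower temperature limit is 10°C in both altitude bands
    return 10.0 <= temp_c <= max_temp_c
```

For example, `operating_environment_ok(33, 500, 50)` returns True, while the same 33°C ambient at 1500 m returns False because the higher-altitude band caps the temperature at 32°C.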
Supported DIMMs
Each planar in the PS700 blade server contains eight very low profile (VLP) memory connectors for
registered dual inline memory modules (RDIMMs). The maximum size for a single DIMM is 8 GB. The
total memory capacity of the PS700 ranges from a minimum of 4 GB to a maximum of 64 GB.
See Chapter 3, “Parts listing, Type 8406,” on page 229 for memory modules that you can order from IBM.
Memory module rules:
• Install DIMM fillers in unused DIMM slots for proper cooling.
• Install DIMMs in pairs (1 and 3, 6 and 8, 2 and 4, 5 and 7).
• Both DIMMs in a pair must be the same size, speed, type, and technology. You can mix compatible DIMMs from different manufacturers.
• Each DIMM within a processor-support group (1-4 and 5-8) must be the same size and speed.
• Install only supported DIMMs, as described on the ServerProven® Web site. See http://www.ibm.com/servers/eserver/serverproven/compat/us/.
• Installing or removing DIMMs changes the configuration of the blade server. After you install or remove a DIMM, the blade server is automatically reconfigured, and the new configuration information is stored.
• See “System-board connectors” on page 8 for DIMM connector locations.
Table 1 shows allowable placement of DIMM modules:
Table 1. Memory module combinations
DIMM count    PS700 base blade planar (P1) DIMM slots 1-8
2             X X
4             X X X X
6             X X X X X X
8             X X X X X X X X
Figure 1. DIMM connectors. Base unit connectors
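The memory module rules above can be checked mechanically before you install DIMMs. This is a minimal Python sketch under stated assumptions, not an IBM utility: `Dimm` and `check_dimm_layout` are hypothetical names, the supported size/speed pairs are taken from the Core electronics specification and assumed to be exhaustive, and the exact slot positions of Table 1 are not encoded. The sketch checks only pairing, group uniformity, and total capacity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dimm:
    size_gb: int
    speed_mhz: int
    kind: str = "DDR3 VLP RDIMM"  # illustrative type/technology label


# Pairs and processor-support groups from the memory module rules.
PAIRS = [(1, 3), (6, 8), (2, 4), (5, 7)]
GROUPS = [(1, 2, 3, 4), (5, 6, 7, 8)]
# Size/speed combinations listed under "Core electronics" (assumed exhaustive).
SUPPORTED = {(4, 1066), (8, 800)}


def check_dimm_layout(slots: dict[int, Dimm]) -> list[str]:
    """Return rule violations for a slot->DIMM map; an empty list means none found."""
    problems = []
    for slot, dimm in sorted(slots.items()):
        if not 1 <= slot <= 8:
            problems.append(f"slot {slot} does not exist")
        elif (dimm.size_gb, dimm.speed_mhz) not in SUPPORTED:
            problems.append(f"slot {slot}: {dimm.size_gb} GB at {dimm.speed_mhz} MHz is not a listed option")
    # Each pair must be fully populated or fully empty, with identical DIMMs.
    for a, b in PAIRS:
        first, second = slots.get(a), slots.get(b)
        if (first is None) != (second is None):
            problems.append(f"slots {a} and {b} must be populated together")
        elif first is not None and first != second:
            problems.append(f"DIMMs in slots {a} and {b} must match in size, speed, and type")
    # All DIMMs within a processor-support group share one size and speed.
    for group in GROUPS:
        specs = {(slots[s].size_gb, slots[s].speed_mhz) for s in group if s in slots}
        if len(specs) > 1:
            problems.append(f"slots {group[0]}-{group[-1]} must all use the same size and speed")
    total_gb = sum(d.size_gb for d in slots.values())
    if not 4 <= total_gb <= 64:
        problems.append(f"total memory {total_gb} GB is outside the 4-64 GB range")
    return problems


print(check_dimm_layout({1: Dimm(4, 1066), 3: Dimm(4, 1066)}))  # []
```

A single DIMM in slot 1 is reported as an unpaired slot, and mixing 4 GB and 8 GB pairs inside slots 1-4 is reported as a group violation even though each pair is internally consistent.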
Blade server control panel buttons and LEDs
Blade server control panel buttons and LEDs provide operational controls and status indicators.
Note: Figure 2 shows the control-panel door in the closed (normal) position. To access the power-control
button, you must open the control-panel door.
Figure 2. Blade server control panel buttons and LEDs
1 Media-tray select button: Press this button to associate the shared BladeCenter unit media tray
(removable-media drives and front-panel USB ports) with the blade server. The LED on the button flashes
while the request is being processed, then is lit when the ownership of the media tray has been
transferred to the blade server. It can take approximately 20 seconds for the operating system in the blade
server to recognize the media tray.
If there is no response when you press the media-tray select button, use the management module to
determine whether local control has been disabled on the blade server.
Note: The operating system in the blade server must provide USB support for the blade server to
recognize and use the removable-media drives and USB ports.
2 Information LED: When this amber LED is lit, it indicates that information about a system error for
the blade server has been placed in the management-module event log. The information LED can be
turned off through the Web interface of the management module or through IBM Director Console.
3 Blade-error LED: When this amber LED is lit, it indicates that a system error has occurred in the
blade server. The blade-error LED will turn off after one of the following events:
• Correcting the error
• Reseating the blade server in the BladeCenter unit
• Cycling the BladeCenter unit power
4 Power-control button: This button is behind the control panel door. Press this button to turn on or
turn off the blade server.
The power-control button has effect only if local power control is enabled for the blade server. Local
power control is enabled and disabled through the Web interface of the management module.
Press the power button for 5 seconds to begin powering down the blade server.
5 NMI reset (recessed): The nonmaskable interrupt (NMI) reset dumps the partition. Use this recessed
button only as directed by IBM Support.
6 Power-on LED: This green LED indicates the power status of the blade server in the following
manner:
• Flashing rapidly: The service processor is initializing the blade server.
• Flashing slowly: The blade server has completed initialization and is waiting for a power-on command.
• Lit continuously: The blade server has power and is turned on.
Note: The enhanced service processor can take as long as three minutes to initialize after you install the
BladeCenter PS700 blade server, at which point the LED begins to flash slowly.
7 Activity LED: When this green LED is lit, it indicates that there is activity on the hard disk drive or
network.
8 Location LED: When this blue LED is lit, it has been turned on by the system administrator to aid in
visually locating the blade server. The location LED can be turned off through the Web interface of the
management module or through IBM Director Console.
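The power-on LED states (callout 6) act as a small status protocol: the power-control button is ignored during initialization, so a slowly flashing LED is the signal that the button can be pressed. The following Python sketch only models that lookup; `PowerLed`, `describe`, and `ready_for_power_on` are hypothetical names and are not part of any IBM management interface.

```python
from enum import Enum


class PowerLed(Enum):
    """Power-on LED states described for callout 6 (hypothetical model)."""
    FLASHING_RAPIDLY = "flashing rapidly"
    FLASHING_SLOWLY = "flashing slowly"
    LIT = "lit continuously"


MEANINGS = {
    PowerLed.FLASHING_RAPIDLY: "The service processor is initializing the blade server.",
    PowerLed.FLASHING_SLOWLY: "Initialization is complete; the blade server is waiting for a power-on command.",
    PowerLed.LIT: "The blade server has power and is turned on.",
}


def describe(led: PowerLed) -> str:
    """Return the documented meaning of an LED state."""
    return MEANINGS[led]


def ready_for_power_on(led: PowerLed) -> bool:
    # The power-control button does not respond while the service processor
    # is initializing, so only a slowly flashing LED means "press now".
    return led is PowerLed.FLASHING_SLOWLY
```

For example, `ready_for_power_on(PowerLed.FLASHING_RAPIDLY)` returns False. After a new installation, this state can last as long as three minutes before the LED begins to flash slowly.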
Turning on the blade server
After you connect the blade server to power through the BladeCenter unit, you can start the blade server
after the discovery and initialization process is complete.
You can start the blade server in any of the following ways:
• Start the blade server by pressing the power-control button on the front of the blade server.
  The power-control button is behind the control panel door, as described in “Blade server control panel buttons and LEDs” on page 5.
  After you push the power-control button, the power-on LED continues to blink slowly for about 15 seconds, then is lit solidly when the power-on process is complete.
  Wait until the power-on LED on the blade server flashes slowly before you press the blade server power-control button. If the power-on LED is flashing rapidly, the service processor is initializing the blade server. The power-control button does not respond during initialization.
  Note: The enhanced service processor can take as long as three minutes to initialize after you install the BladeCenter PS700 blade server, at which point the LED begins to flash slowly.
• Start the blade server automatically when power is restored after a power failure.
  If a power failure occurs, the BladeCenter unit and then the blade server can start automatically when power is restored. You must configure the blade server to restart through the management module.
• Start the blade server remotely using the management module.
  After you initiate the power-on process, the power-on LED blinks slowly for about 15 seconds, then is lit solidly when the power-on process is complete.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the BladeCenter unit. The blade
server can respond to requests from the service processor, such as a remote request to turn on the blade
server. To remove all power from the blade server, you must remove it from the BladeCenter unit.
Shut down the operating system before you turn off the blade server. See the operating-system
documentation for information about shutting down the operating system.
You can turn off the blade server in one of the following ways:
• Turn off the blade server by pressing the power-control button for at least 5 seconds.
  The power-control button is on the blade server behind the control panel door. See “Blade server control panel buttons and LEDs” on page 5 for the location.
  Note: The power-on LED can remain lit solidly for up to 1 minute after you push the power-control button. After you turn off the blade server, wait until the power-on LED is blinking slowly before you press the power-control button to turn on the blade server again.
  If the operating system stops functioning, press and hold the power-control button for more than 5 seconds to force the blade server to turn off.
• Use the management module to turn off the blade server.
  The power-on LED can remain lit solidly for up to 1 minute after you initiate the power-off process. After you turn off the blade server, wait until the power-on LED is blinking slowly before you initiate the power-on process from the advanced management module (AMM) to turn on the blade server again.
  Use the management-module Web interface to configure the management module to turn off the blade server if the system is not operating correctly.
For additional information, see the online documentation or the User's Guide for the management
module.
System-board layouts
Illustrations show the connectors and LEDs on the system board. The illustrations might differ slightly
from your hardware.
System-board connectors
Blade server components attach to the connectors on the system board.
Figure 3 shows the connectors on the base unit system board in the blade server.
Figure 3. PS700 system-board connectors
Table 2 shows connector descriptions.
Table 2. PS700 connectors
Callout    PS700 blade server connectors
1          Operator panel connector
2          DIMM 1-4 connectors (See Figure 4 on page 9 for individual connectors.)
9          DIMM 5-8 connectors (See Figure 4 on page 9 for individual connectors.)
10         3V lithium battery connector (P1-E1)
Figure 4 on page 9 shows individual DIMM connectors.
Figure 4. DIMM connectors. Base unit connectors
System-board LEDs
Use the illustration of the LEDs on the system board to identify a light emitting diode (LED).
Remove the blade server from the BladeCenter unit, open the cover, press the blue button to see any
error LEDs that were turned on during error processing, and use Figure 5 to identify the failing
component.
Figure 5 shows the locations of LEDs on the system board.
Table 3 shows LED descriptions.
Figure 5. LED locations on the system board of the PS700 blade server
Table 3. PS700 LEDs
Callout    Base unit LEDs
1          3V lithium battery LED
2          DIMM 1-4 LEDs
3          Management card LED
4          Light path power LED
5          System board LED
6          HDD1 LED
7          Interposer LED
8          CIOv (1Xe) expansion card connector LED
9          High-Speed (CFFh) expansion card connector LED
10         HDD2 LED
11         DIMM 5-8 LEDs
Chapter 2. Diagnostics
Use the available diagnostic tools to help solve any problems that might occur in the blade server.
The first and most crucial component of a solid serviceability strategy is the ability to accurately and
effectively detect errors when they occur. While not all errors are a threat to system availability, those that
go undetected are dangerous because the system does not have the opportunity to evaluate and act if
necessary. POWER7® processor-based systems are specifically designed with error-detection mechanisms that extend from processor cores and memory to power supplies and hard drives.
POWER7 processor-based systems contain specialized hardware detection circuitry for detecting erroneous hardware operations. Error-checking hardware ranges from parity error detection coupled with processor instruction retry and bus retry, to ECC correction on caches and system buses.
IBM hardware error checkers have these distinct attributes:
• Continuous monitoring of system operations to detect potential calculation errors
• Attempted isolation of physical faults based on runtime detection of each unique failure
• Initiation of a wide variety of recovery mechanisms designed to correct a problem
POWER7 processor-based systems include extensive hardware and firmware recovery logic.
Machine check handling
Machine checks are handled by firmware. When a machine check occurs, the firmware analyzes the error to identify the failing device and creates an error log entry.
If the system degrades to the point that the service processor cannot reach standby state, the error cannot be analyzed. If the error occurs during POWER® Hypervisor (PHYP) activities, the PHYP initiates a system reboot.
In partitioned mode, an error that occurs during partition activity is reported to the operating system in the partition.
Diagnostic tools
Tools are available to help you diagnose and solve hardware-related problems.
v Power-on self-test (POST) progress codes (checkpoints), error codes, and isolation procedures
The POST checks out the hardware at system initialization. IPL diagnostic functions test some system
components and interconnections. The POST generates eight-digit checkpoints to mark the progress of
powering up the blade server.
Use the management module to view progress codes.
The documentation of a progress code includes recovery actions for system hangs. See “POST progress
codes (checkpoints)” on page 84 for more information.
If the service processor detects a problem during POST, an error code is logged in the management
module event log. Error codes are also logged in the Linux syslog or AIX® diagnostic log, if possible.
See “System reference codes (SRCs)” on page 16.
The service processor can generate codes that point to specific isolation procedures. See “Service
processor problems” on page 200.
v Light path diagnostics
Use the light path diagnostic LEDs on the system board to identify failing hardware. If the system
error LED on the system LED panel on the front or rear of the BladeCenter unit is lit, one or more
error LEDs on the BladeCenter unit components also might be lit.
Light path diagnostics help identify failing customer replaceable units (CRUs). CRU location codes are
included in error codes and the event log.
LED locations
See “System-board LEDs” on page 9.
Front panel
See “Blade server control panel buttons and LEDs” on page 5.
v Troubleshooting tables
Use the troubleshooting tables to find solutions to problems that have identifiable symptoms.
See “Troubleshooting tables” on page 191.
v Dump data collection
In some circumstances, an error might require a dump to show more data. The Integrated
Virtualization Manager (IVM) or Hardware Management Console (HMC) sets up a dump area. Specific
IVM or HMC information is included as part of the information that can optionally be sent to IBM
support for analysis.
See “Collecting dump data” on page 13 for more information.
v Stand-alone diagnostics
The AIX-based stand-alone diagnostics CD is included in the ship package and is also available from the
IBM Web site. If the blade server cannot boot an operating system, boot the stand-alone diagnostics from a
CD drive or from an AIX Network Installation Management (NIM) server, regardless of which operating
system is installed.
Functions provided by the stand-alone diagnostics include:
– Analysis of errors reported by platform, such as microprocessor and memory errors
– Testing of resources, such as I/O adapters and devices
– Service aids, such as firmware update, format disk, and RAID Manager
v Diagnostic utilities for the AIX operating system
If AIX is running, use the AIX concurrent diagnostics instead of the stand-alone diagnostics. Functions
provided by disk-based AIX diagnostics include:
– Automatic error log analysis
– Analysis of errors reported by platform, such as microprocessor and memory errors
– Testing of resources, such as I/O adapters and devices
– Service aids, such as firmware update, format disk, and RAID Manager
v Diagnostic utilities for Linux operating systems
Linux on POWER service and productivity tools include hardware diagnostic aids, productivity tools, and
installation aids. The installation aids are provided in the IBM Installation Toolkit for Linux on POWER, a
set of tools that aids the installation of Linux on IBM servers with POWER architecture. You can also use
the tools to update the PS700 blade server firmware.
Diagnostic utilities for the Linux operating system are available from IBM at
https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/home.html.
v Diagnostic utilities for other operating systems
You can use the stand-alone diagnostics CD to perform diagnostics on the PS700 blade server, no matter
which operating system is loaded on the blade server. However, other supported operating systems
might have diagnostic tools that are available through the operating system. See the documentation for
your operating system for more information.
Collecting dump data
A dump might be critical for fault isolation when the built-in First Failure Data Capture (FFDC)
mechanisms are not capturing sufficient fault data. Even when a fault is identified, dump data can
provide additional information that is useful in problem determination.
All hardware state information is part of the dump if a hardware checkstop occurs. When a checkstop
occurs, the service processor attempts to dump data that is necessary to analyze the error from
appropriate parts of the system.
Note: If you power off the blade server through the management module while the service processor is
performing a dump, the platform dump data is lost.
You might be asked to retrieve a dump to send it to IBM Support for analysis. The location of the dump
data varies by operating system.
v Collect an AIX dump from the /var/adm/platform directory.
v Collect a Linux dump from the /var/log/dump directory.
v Collect an Integrated Virtualization Manager (IVM) dump from the IVM-managed PS700 blade server
through the Manage Dumps task in the IVM console.
v To collect a system dump by using the Hardware Management Console (HMC), complete these steps:
1. Perform a controlled shutdown of all partitions.
Note: A system dump will abnormally terminate any running partitions.
2. In the navigation area, open Systems Management.
3. Select the server and open it.
4. Select Serviceability > Manage Dumps > Action > Initiate System Dump. The dump is
automatically saved to the HMC. For details on how to copy, report, or delete a dump after you
have completed a dump, see Managing dumps.
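When you gather AIX or Linux dumps to send to IBM Support, the directory lookup above can be scripted. The following Python sketch is an illustration only: the helper names dump_dir_for and list_dump_files are hypothetical, the sketch assumes that dump files are stored directly in the directories listed above, and it does not cover IVM or HMC dumps, which are collected through their consoles.

```python
import os
import platform
from typing import List, Optional

# Dump directories listed in this guide, keyed by the name that
# platform.system() reports ("AIX" on AIX, "Linux" on Linux).
# IVM and HMC dumps are collected through their consoles instead.
DUMP_DIRS = {
    "AIX": "/var/adm/platform",
    "Linux": "/var/log/dump",
}


def dump_dir_for(os_name: str) -> Optional[str]:
    """Return the dump directory for an operating system, or None if it is not listed."""
    return DUMP_DIRS.get(os_name)


def list_dump_files(os_name: Optional[str] = None) -> List[str]:
    """List regular files in the dump directory (hypothetical helper).

    Assumes dump files sit directly in the directory. Returns an empty
    list when the operating system is not listed or the directory is missing.
    Reading the directory might require root authority.
    """
    if os_name is None:
        os_name = platform.system()
    directory = dump_dir_for(os_name)
    if directory is None or not os.path.isdir(directory):
        return []
    paths = (os.path.join(directory, name) for name in os.listdir(directory))
    return sorted(path for path in paths if os.path.isfile(path))


if __name__ == "__main__":
    for path in list_dump_files():
        print(path)
```

The sketch only locates candidate files; it does not interpret dump contents, which is left to IBM Support.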
Location codes
Location codes identify components of the blade server. Location codes are displayed with some error
codes to identify the blade server component that is causing the error.
See “System-board connectors” on page 8 for component locations.
Notes:
1. Location codes do not indicate the location of the blade server within the BladeCenter unit. The codes
identify components of the blade server only.
2. For checkpoints with no associated location code, see “Light path diagnostics” on page 214 to identify
the failing component when there is a hang condition.
3. For checkpoints with location codes, use the following table to identify the failing component when
there is a hang condition.
4. For 8-digit codes not listed in Table 4, see the “Checkout procedure” on page 184.
Table 4. Location codes
Components                      Physical Location Code  CRU LED
Un location codes are for enclosure and VPD locations.
  Un = Utttt.mmm.sssssss, where:
  tttt = system machine type
  mmm = system model number
  sssssss = system serial number
DIMM 1                          Un-P1-C1                Yes
DIMM 2                          Un-P1-C2                Yes
DIMM 3                          Un-P1-C3                Yes
DIMM 4                          Un-P1-C4                Yes
DIMM 5                          Un-P1-C5                Yes
DIMM 6                          Un-P1-C6                Yes
DIMM 7                          Un-P1-C7                Yes
DIMM 8                          Un-P1-C8                Yes
2.5" SAS HDD1                   Un-P1-D1                Yes
2.5" SAS HDD2                   Un-P1-D2                Yes
Management Card                 Un-P1-C9                Yes
Battery                         Un-P1-E1                Yes
PCIe High Speed Expansion Card  Un-P1-C12               Yes
1Xe Card                        Un-P1-C11               Yes
USB Port 1 (CDROM/FDD)          Un-P1-T1                No
USB Port 2 (CDROM/FDD)          Un-P1-T2                No
SAS controller                  Un-P1-T3                No
Ethernet HEA0_A                 Un-P1-T4                No
Ethernet HEA0_B                 Un-P1-T5                No
Machine Location Code           Utttt.mmm.sssssss       No
Um location codes are for firmware. The format is the same as for a Un location code.
  Um = Utttt.mmm.sssssss
Firmware version                Um-Y1
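Because an error code can carry a full location code, it can help to split it into the Utttt.mmm.sssssss prefix and the component suffix and then look the suffix up in Table 4. The following Python sketch assumes that the suffix follows the prefix after a hyphen, as in the Un-P1-C1 notation above; the function names and the serial number in the example are hypothetical, and the firmware entry (Um-Y1) is left out.

```python
import re
from typing import NamedTuple, Optional

# Table 4, keyed by the suffix that follows the Un prefix:
# suffix -> (component, whether it has a CRU LED). Um-Y1 is omitted.
TABLE_4 = {
    "P1-C1": ("DIMM 1", True),
    "P1-C2": ("DIMM 2", True),
    "P1-C3": ("DIMM 3", True),
    "P1-C4": ("DIMM 4", True),
    "P1-C5": ("DIMM 5", True),
    "P1-C6": ("DIMM 6", True),
    "P1-C7": ("DIMM 7", True),
    "P1-C8": ("DIMM 8", True),
    "P1-D1": ('2.5" SAS HDD1', True),
    "P1-D2": ('2.5" SAS HDD2', True),
    "P1-C9": ("Management Card", True),
    "P1-E1": ("Battery", True),
    "P1-C12": ("PCIe High Speed Expansion Card", True),
    "P1-C11": ("1Xe Card", True),
    "P1-T1": ("USB Port 1 (CDROM/FDD)", False),
    "P1-T2": ("USB Port 2 (CDROM/FDD)", False),
    "P1-T3": ("SAS controller", False),
    "P1-T4": ("Ethernet HEA0_A", False),
    "P1-T5": ("Ethernet HEA0_B", False),
}

# Un = Utttt.mmm.sssssss, optionally followed by "-" and a component suffix.
LOCATION_RE = re.compile(
    r"^U(?P<mtype>[0-9A-Z]{4})\.(?P<model>[0-9A-Z]{3})\.(?P<serial>[0-9A-Z]{7})"
    r"(?:-(?P<suffix>\S+))?$"
)


class LocationCode(NamedTuple):
    machine_type: str
    model: str
    serial: str
    suffix: Optional[str]


def parse_location_code(code: str) -> LocationCode:
    """Split a location code into machine type, model, serial, and suffix."""
    match = LOCATION_RE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a Un location code: {code!r}")
    return LocationCode(match["mtype"], match["model"], match["serial"], match["suffix"])


def describe(code: str) -> str:
    """Name the component for a location code, using Table 4."""
    loc = parse_location_code(code)
    if loc.suffix is None:
        return "Machine Location Code"
    entry = TABLE_4.get(loc.suffix)
    if entry is None:
        return f"{loc.suffix}: not in Table 4; see the Checkout procedure"
    component, has_led = entry
    return f"{component} (CRU LED: {'yes' if has_led else 'no'})"
```

For example, describe("U8406.70Y.1234567-P1-C3") names DIMM 3 and reports that it has a CRU LED. The serial number 1234567 is made up for illustration.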
Reference codes
Reference codes are diagnostic aids that help you determine the source of a hardware or operating
system problem. To use reference codes effectively, use them in conjunction with other service and
support procedures.
The BladeCenter PS700 Type 8406 blade server produces several types of codes.
Progress codes: The power-on self-test (POST) generates eight-digit status codes that are known as
checkpoints or progress codes, which are recorded in the management-module event log. The checkpoints
indicate which blade server resource is initializing.
Error codes: The First Failure Data Capture (FFDC) error checkers capture fault data, which the service
processor then analyzes. For unrecoverable errors (UEs), for recoverable events that meet or exceed their
service thresholds, and for fatal system errors, an unrecoverable checkstop service event triggers the
service processor to analyze the error, log the system reference code (SRC), and turn on the system
attention LED.
The service processor logs the error code, which consists of nine words of eight digits each, in the
BladeCenter management-module event log. Error codes are either system reference codes (SRCs) or service
request numbers (SRNs). A location code might also be included.
Isolation procedures: If the fault analysis does not determine a definitive cause, the service processor
might indicate a fault isolation procedure that you can use to isolate the failing component.
Viewing the codes
The PS700 blade server does not display checkpoints or error codes on the remote console. The shared
BladeCenter unit video also does not display the codes.
If the POST detects a problem, a 9-word, 8-digit error code is logged in the BladeCenter
management-module event log. A location code that identifies a component might also be included. See
“Error logs” on page 183 for information about viewing the management-module event log.
You can view service request numbers by using the AIX diagnostics CD or operating system utilities such as
AIX diagnostics or, if it is installed, the Linux service aid “diagela”.
System reference codes (SRCs)
System reference codes indicate a server hardware or software problem that can originate in hardware, in
firmware, or in the operating system.
A blade server component generates an error code when it detects a problem. An SRC identifies the
component that generated the error code and describes the error. Use the SRC information to identify a
list of possibly failing items and to find information about any additional isolation procedures.
The following table shows the syntax of a nine-word B700xxxx SRC as it might be displayed in the event
log of the management module.
The first word of the SRC in this example is the message identifier, B7001111. This example numbers each
word after the first word to show relative word positions. The seventh word is the direct select address,
which is 77777777 in the example.
Table 5. Nine-word system reference code in the management-module event log
Index  Sev  Source    Date/Time             Text
1      E    Blade_05  01/21/2008, 17:15:14
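The word numbering described above can be applied mechanically when you copy an SRC from the event log. The following Python sketch assumes the SRC text is nine whitespace-separated words of eight alphanumeric characters; the class name NineWordSrc is hypothetical, and the sample input simply follows the numbering convention of the example (word 1 is B7001111, word 7 is 77777777).

```python
import re
from typing import List

# Each SRC word is eight characters; SRCs use alphanumeric characters.
WORD_RE = re.compile(r"^[0-9A-Z]{8}$")


class NineWordSrc:
    """A nine-word SRC as logged in the management-module event log."""

    def __init__(self, text: str) -> None:
        words = text.upper().split()
        if len(words) != 9:
            raise ValueError(f"expected 9 words, got {len(words)}")
        for position, word in enumerate(words, start=1):
            if not WORD_RE.match(word):
                raise ValueError(f"word {position} is not 8 alphanumeric characters: {word!r}")
        self.words: List[str] = words

    def word(self, position: int) -> str:
        """Return a word by its 1-based position, as numbered in this guide."""
        if not 1 <= position <= 9:
            raise IndexError("word positions run from 1 to 9")
        return self.words[position - 1]

    @property
    def message_id(self) -> str:
        # Word 1 is the message identifier.
        return self.word(1)

    @property
    def direct_select_address(self) -> str:
        # Word 7 is the direct select address.
        return self.word(7)


if __name__ == "__main__":
    src = NineWordSrc("B7001111 22222222 33333333 44444444 55555555 "
                      "66666666 77777777 88888888 99999999")
    print(src.message_id, src.direct_select_address)
```

Using 1-based positions in word() keeps the code aligned with the way this guide numbers SRC words.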
Depending on your operating system and the utilities you have installed, error messages might also be
stored in an operating system log. See the documentation that comes with the operating system for more
information.
The management module can display the most recent 32 SRCs and time stamps. Manually refresh the list
to update it.
Select Blade Service Data > blade_name in the management module to see a list of the 32 most recent
SRCs.
Table 6. Management module reference code listing
Unique ID  System Reference Code  Timestamp
00040001   D1513901               2005-11-13 19:30:20
00000016   D1513801               2005-11-13 19:30:16
Any message with more detail is highlighted as a link in the System Reference Code column. Click the
message to display the additional message detail, for example:
D1513901
Created at: 2007-11-13 19:30:20
SRC Version: 0x02
Hex Words 2-5: 020110F0 52298910 C1472000 200000FF
SRC formats
SRCs are strings of either six or eight alphanumeric characters. The first character usually indicates the
type of error; in a few cases, the first two characters indicate the type of error:
v 1xxxxxxx - System power control network (SPCN) error
v 6xxxxxxx - Virtual optical device error
v A1xxxxxx - Attention required (Service processor)
v AAxxxxxx - Attention required (Partition firmware)
v B1xxxxxx - Service processor error, such as a boot problem
v B6xxxxxx - Licensed Internal Code or hardware event error
v B9xxxxxx - Software installation error or IBM i IPL error. See "Recovering from IPL or system failures"
in the IBM i Information Center at
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/ipha5_p5/iplprocedure.htm.
v BAxxxxxx - Partition firmware error
v Cxxxxxxx - Checkpoint (must hang to indicate an error)
v Dxxxxxxx - Dump checkpoint (must hang to indicate an error)
To find a description of an SRC that is not listed in this PS700 blade server documentation, see the
POWER7 Reference Code Lookup page at
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/ipha8/codefinder.htm.
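The prefix list above can be turned into a small lookup that checks the two-character prefixes first, because they are more specific than the one-character prefixes. The following Python sketch covers only the prefixes listed in this guide; the function name classify_src is hypothetical, and a return value of None means that you should look the code up on the POWER7 Reference Code Lookup page.

```python
from typing import Optional

# Two-character prefixes from the list above, checked first.
TWO_CHAR_TYPES = {
    "A1": "Attention required (service processor)",
    "AA": "Attention required (partition firmware)",
    "B1": "Service processor error, such as a boot problem",
    "B6": "Licensed Internal Code or hardware event error",
    "B9": "Software installation error or IBM i IPL error",
    "BA": "Partition firmware error",
}

# One-character prefixes from the list above.
ONE_CHAR_TYPES = {
    "1": "System power control network (SPCN) error",
    "6": "Virtual optical device error",
    "C": "Checkpoint (must hang to indicate an error)",
    "D": "Dump checkpoint (must hang to indicate an error)",
}


def classify_src(src: str) -> Optional[str]:
    """Return the error type for an SRC, or None if its prefix is not listed."""
    code = src.strip().upper()
    if len(code) not in (6, 8) or not code.isalnum():
        raise ValueError(f"SRCs are 6 or 8 alphanumeric characters: {src!r}")
    if code[:2] in TWO_CHAR_TYPES:
        return TWO_CHAR_TYPES[code[:2]]
    return ONE_CHAR_TYPES.get(code[0])
```

For example, a B7xxxxxx SRC such as the B7001111 message identifier shown earlier is not in the list, so classify_src returns None for it.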
1xxxyyyy SRCs
The 1xxxyyyy system reference codes are system power control network (SPCN) reference codes.
Look for the rightmost 4 characters (yyyy in 1xxxyyyy) in the error code; this is the reference code. Find
the reference code in Table 7.
Perform all actions before exchanging failing items.
Table 7. 1xxxyyyy SRCs
v Follow the suggested actions in the order in which they are listed in the Action column until the problem is
solved. If an action solves the problem, then you can stop performing the remaining actions.
v See Chapter 3, “Parts listing, Type 8406,” on page 229 to determine which components are CRUs and which
components are FRUs.
1xxxyyyy error code  Description
00AC                 Informational message: AC loss was reported
                     Action: No action is required.
00AD                 Informational message: A service processor reset caused the blade server to power off
                     Action: No action is required.
1F02                 Informational message: The trace logs reached 1K of data.
                     Action: No action is required.
1F03                 Informational message: Invalid TMS of location code.
                     Action: No action is required.
2600                 Power good (pGood) master fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
2610                 pGood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
2620                 12V dc pGood input fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
2629                 1.5V reg_pgood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
262B                 1.8V reg_pgood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
262C                 5V reg_pgood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
262D                 3.3V reg_pgood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
262E                 2.5V reg_pgood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
2630                 VRM CP0 core pGood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
2632                 VRM CP0 cache pGood fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
2647                 12V "or-ing" FET short
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
2648                 Blade power latch fault
                     Action:
                     1. Go to “Checkout procedure” on page 184.
                     2. Replace the system-board, as described in “Replacing the FRU system-board and chassis
                        assembly” on page 260.
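The lookup procedure for 1xxxyyyy SRCs (take the rightmost four characters, then find them in Table 7) can be sketched in Python as follows. The entries are transcribed from Table 7 only; the function names are hypothetical, and the xxx digits in the example SRCs are placeholders. A return value of None means that the reference code is not in this table and should be looked up on the POWER7 Reference Code Lookup page.

```python
from typing import List, Optional, Tuple

# Reference codes (yyyy) transcribed from Table 7.
INFORMATIONAL = {
    "00AC": "AC loss was reported",
    "00AD": "A service processor reset caused the blade server to power off",
    "1F02": "The trace logs reached 1K of data",
    "1F03": "Invalid TMS of location code",
}

SYSTEM_BOARD_FAULTS = {
    "2600": "Power good (pGood) master fault",
    "2610": "pGood fault",
    "2620": "12V dc pGood input fault",
    "2629": "1.5V reg_pgood fault",
    "262B": "1.8V reg_pgood fault",
    "262C": "5V reg_pgood fault",
    "262D": "3.3V reg_pgood fault",
    "262E": "2.5V reg_pgood fault",
    "2630": "VRM CP0 core pGood fault",
    "2632": "VRM CP0 cache pGood fault",
    "2647": '12V "or-ing" FET short',
    "2648": "Blade power latch fault",
}

# Ordered actions shared by every system-board fault in Table 7.
SYSTEM_BOARD_ACTIONS = [
    'Go to "Checkout procedure" on page 184.',
    'Replace the system-board, as described in "Replacing the FRU '
    'system-board and chassis assembly" on page 260.',
]


def reference_code(src: str) -> str:
    """Return yyyy, the rightmost four characters of a 1xxxyyyy SRC."""
    code = src.strip().upper()
    if len(code) != 8 or not code.startswith("1") or not code.isalnum():
        raise ValueError(f"not a 1xxxyyyy SRC: {src!r}")
    return code[-4:]


def lookup_table_7(src: str) -> Optional[Tuple[str, List[str]]]:
    """Return (description, ordered actions), or None if yyyy is not in Table 7."""
    ref = reference_code(src)
    if ref in INFORMATIONAL:
        return ("Informational message: " + INFORMATIONAL[ref], ["No action is required."])
    if ref in SYSTEM_BOARD_FAULTS:
        return (SYSTEM_BOARD_FAULTS[ref], list(SYSTEM_BOARD_ACTIONS))
    return None
```

The actions are returned as an ordered list so that they can be performed in the listed order, stopping when the problem is solved.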