BladeCenter JS20
Type 8842
Hardware Maintenance Manual and
Troubleshooting Guide
Notes
v Before using this information and the product it supports, read Appendix B, “Safety information,” on page 163 and
“Notices” on page 197.
v The most recent version of this document is available at http://www.ibm.com/pc/support/.
16th Edition (June 2006)
© Copyright International Business Machines Corporation 2003. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
About this manual
This manual contains diagnostic information, a symptom-to-FRU index, service
information, error codes, error messages, and configuration information for the IBM
BladeCenter® JS20 Type 8842 blade server.
Important safety information
Be sure to read all caution and danger statements in this book before performing
any of the instructions; see Appendix B, “Safety information,” on page 163.
Leia todas as instruções de cuidado e perigo antes de executar qualquer operação.
Prenez connaissance de toutes les consignes de type Attention et Danger avant de
procéder aux opérations décrites par les instructions.
Lesen Sie alle Sicherheitshinweise, bevor Sie eine Anweisung ausführen.
Accertarsi di leggere tutti gli avvisi di attenzione e di pericolo prima di effettuare
qualsiasi operazione.
Lea atentamente todas las declaraciones de precaución y peligro antes de llevar a
cabo cualquier operación.
WARNING: Handling the cord on this product or cords associated with accessories
sold with this product, will expose you to lead, a chemical known to the State of
California to cause cancer, and birth defects or other reproductive harm. Wash
hands after handling.
ADVERTENCIA: El contacto con el cable de este producto o con cables de
accesorios que se venden junto con este producto, pueden exponerle al plomo, un
elemento químico que en el estado de California de los Estados Unidos está
considerado como un causante de cancer y de defectos congénitos, además de
otros riesgos reproductivos. Lávese las manos después de usar el producto.
Online support
You can download the most current firmware update and device driver files from
http://www.ibm.com/pc/support.
© Copyright IBM Corp. 2003 iii
iv BladeCenter JS20 Type 8842: Hardware Maintenance Manual and Troubleshooting Guide
Contents
About this manual . . . . . . . . . . . . . . . . . . . . . . . iii
Important safety information . . . . . . . . . . . . . . . . . . . . iii
Online support . . . . . . . . . . . . . . . . . . . . . . . . . iii
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . .1
Related documentation . . . . . . . . . . . . . . . . . . . . . .3
The IBM BladeCenter Documentation CD . . . . . . . . . . . . . . .4
Hardware and software requirements . . . . . . . . . . . . . . . .4
Using the Documentation Browser . . . . . . . . . . . . . . . . .4
Notices and statements used in this document . . . . . . . . . . . . . .5
Features and specifications . . . . . . . . . . . . . . . . . . . . .6
BladeCenter JS20 specifications for non-NEBS/ETSI environments . . . . .6
BladeCenter JS20 specifications for NEBS/ETSI environments . . . . . . .7
Preinstallation checklist . . . . . . . . . . . . . . . . . . . . . .9
Checking the status of the media tray . . . . . . . . . . . . . . . .10
Chapter 2. Blade server power, controls, and indicators . . . . . . . .13
Turning on the blade server . . . . . . . . . . . . . . . . . . . .13
Turning off the blade server . . . . . . . . . . . . . . . . . . . .14
Blade server controls and LEDs . . . . . . . . . . . . . . . . . .14
Chapter 3. Configuration . . . . . . . . . . . . . . . . . . . . .17
Using the command-line interface . . . . . . . . . . . . . . . . . .18
Configuring the Gigabit Ethernet controller . . . . . . . . . . . . . . .18
Blade server Ethernet controller enumeration . . . . . . . . . . . . . .19
Chapter 4. Problem determination procedures for AIX and Linux . . . . .21
Problem determination . . . . . . . . . . . . . . . . . . . . . .21
Obtaining an SRN/SRC or error code . . . . . . . . . . . . . . . .22
Chapter 5. AIX online, standalone and verification procedures . . . . . .25
Performing AIX online concurrent mode diagnostics for problem determination 25
Running the standalone diagnostics from CD-ROM . . . . . . . . . . .25
Performing AIX online concurrent mode diagnostics for previous diagnostic
results: service aids . . . . . . . . . . . . . . . . . . . . . .28
Performing AIX online concurrent mode diagnostics for system verification . . .29
Verifying the replacement part using AIX diagnostics . . . . . . . . . . .30
Chapter 6. Running a Serial Over LAN session . . . . . . . . . . . .33
Selecting the command target . . . . . . . . . . . . . . . . . . .34
Starting the command-line interface . . . . . . . . . . . . . . . . .34
Establishing a Telnet connection . . . . . . . . . . . . . . . . .35
Establishing a Secure Shell (SSH) connection . . . . . . . . . . . .35
Starting an SOL session . . . . . . . . . . . . . . . . . . . . .35
Ending an SOL session . . . . . . . . . . . . . . . . . . . . . .36
Chapter 7. Diagnostics . . . . . . . . . . . . . . . . . . . . .37
General checkout . . . . . . . . . . . . . . . . . . . . . . . .37
Checkout procedure . . . . . . . . . . . . . . . . . . . . . . .38
Diagnostic tools overview . . . . . . . . . . . . . . . . . . . . .39
POST . . . . . . . . . . . . . . . . . . . . . . . . . . . .40
Checkpoints . . . . . . . . . . . . . . . . . . . . . . . . .40
Accessing the Linux system error log . . . . . . . . . . . . . . . .40
Service aids and the Linux system error log . . . . . . . . . . . . .40
FRU/CRU isolation . . . . . . . . . . . . . . . . . . . . . . .46
Error symptom charts . . . . . . . . . . . . . . . . . . . . . .46
Light path diagnostics . . . . . . . . . . . . . . . . . . . . . .46
Memory errors . . . . . . . . . . . . . . . . . . . . . . . . .47
Recovering the system firmware code . . . . . . . . . . . . . . . .48
Recovery of system firmware code using service aids . . . . . . . . .48
Starting the TEMP image . . . . . . . . . . . . . . . . . . . .48
Recovering the TEMP image from the PERM image . . . . . . . . . .49
Updating the blade server firmware . . . . . . . . . . . . . . . . .50
Determination of current server firmware levels . . . . . . . . . . . .51
Updating the blade server service processor . . . . . . . . . . . . .51
Update and manage system flash using Linux service aids . . . . . . . .51
Updating the system flash using Linux . . . . . . . . . . . . . .51
Verifying the system firmware levels using Linux . . . . . . . . . .52
Update and manage system flash using AIX diagnostics . . . . . . . . .52
Updating the system flash using AIX . . . . . . . . . . . . . . .52
Committing the temporary firmware image using AIX . . . . . . . . .53
Verifying the system firmware levels using AIX . . . . . . . . . . .53
Recovering the system firmware code . . . . . . . . . . . . . . . .54
Recovery of system firmware code using service aids . . . . . . . . .55
Starting the backup image . . . . . . . . . . . . . . . . . . . .55
Recovering the primary image . . . . . . . . . . . . . . . . . .56
Chapter 8. General AIX and xSeries standalone diagnostic information 59
Information for general diagnostic systems running the AIX operating system 59
AIX operating system message files . . . . . . . . . . . . . . . .59
CE login . . . . . . . . . . . . . . . . . . . . . . . . . . .59
Missing resources . . . . . . . . . . . . . . . . . . . . . . . .60
Automatic diagnostic tests . . . . . . . . . . . . . . . . . . . . .60
Configuration program . . . . . . . . . . . . . . . . . . . . .60
Diagnostic programs . . . . . . . . . . . . . . . . . . . . . .61
Error log analysis . . . . . . . . . . . . . . . . . . . . . . . .61
Introducing tasks and service aids . . . . . . . . . . . . . . . . . .62
Task and service aid functions . . . . . . . . . . . . . . . . . .62
AIX automatic error log analysis (diagela) . . . . . . . . . . . . . .62
Error log analysis . . . . . . . . . . . . . . . . . . . . . . .63
Log repair action . . . . . . . . . . . . . . . . . . . . . . .63
Tasks (service aids) . . . . . . . . . . . . . . . . . . . . . .63
Download microcode . . . . . . . . . . . . . . . . . . . . .64
Update and manage system flash . . . . . . . . . . . . . . . .65
Using the standalone CD-ROM and online current diagnostics . . . . . .67
Standalone and online diagnostics operating considerations . . . . . . .67
Running online diagnostics . . . . . . . . . . . . . . . . . . .67
Running the online diagnostics in concurrent mode . . . . . . . . . .68
Running standalone diagnostics from a management (NIM) server . . . . . .68
NIM server configuration . . . . . . . . . . . . . . . . . . . .69
Client configuration and booting Eserver standalone diagnostics from the
NIM server . . . . . . . . . . . . . . . . . . . . . . . .69
Chapter 9. Installing options . . . . . . . . . . . . . . . . . . .71
Installation guidelines . . . . . . . . . . . . . . . . . . . . . .71
System reliability guidelines . . . . . . . . . . . . . . . . . . .71
Handling static-sensitive devices . . . . . . . . . . . . . . . . .71
Removing the blade server from the BladeCenter unit . . . . . . . . . .73
Opening the blade server cover . . . . . . . . . . . . . . . . . . .74
Removing the blade server bezel assembly . . . . . . . . . . . . . .75
Installing IDE hard disk drives . . . . . . . . . . . . . . . . . . .75
Installing memory modules . . . . . . . . . . . . . . . . . . . .77
Installing an I/O expansion card . . . . . . . . . . . . . . . . . . .79
Ethernet controller, switch module, and cabling requirements . . . . . . . .82
Replacing the battery . . . . . . . . . . . . . . . . . . . . . .83
System board . . . . . . . . . . . . . . . . . . . . . . . . .86
System board component locations . . . . . . . . . . . . . . . .86
System-board LED locations . . . . . . . . . . . . . . . . . . .87
Replacing the system board . . . . . . . . . . . . . . . . . . .87
Completing the installation . . . . . . . . . . . . . . . . . . . . .90
Installing the blade-server bezel assembly . . . . . . . . . . . . . .90
Closing the blade server cover . . . . . . . . . . . . . . . . . .92
Input/output connectors and devices . . . . . . . . . . . . . . . . .92
Chapter 10. Symptom-to-FRU index . . . . . . . . . . . . . . . .93
Firmware checkpoint (progress) codes . . . . . . . . . . . . . . . .94
Firmware error codes . . . . . . . . . . . . . . . . . . . . . . 102
Service request numbers . . . . . . . . . . . . . . . . . . . . . 108
Linux service aid “diagela” . . . . . . . . . . . . . . . . . . . 109
Using the SRN list . . . . . . . . . . . . . . . . . . . . . . 109
Service request number . . . . . . . . . . . . . . . . . . . 109
Source of SRN . . . . . . . . . . . . . . . . . . . . . . 109
Failing Function Codes . . . . . . . . . . . . . . . . . . . 109
Description and action . . . . . . . . . . . . . . . . . . . .110
Using the SRN list . . . . . . . . . . . . . . . . . . . . .110
SRN tables . . . . . . . . . . . . . . . . . . . . . . . . .110
AIX SRNs 101-711 through 2D02 . . . . . . . . . . . . . . . .110
SRNs A00-(x)xxx through A1D-(x)xxx . . . . . . . . . . . . . . 121
Failing Function Codes (FFCs) . . . . . . . . . . . . . . . . 141
FFC table . . . . . . . . . . . . . . . . . . . . . . . . 142
Light path diagnostics LEDs . . . . . . . . . . . . . . . . . . . 145
Error symptoms . . . . . . . . . . . . . . . . . . . . . . . . 145
CD drive problems . . . . . . . . . . . . . . . . . . . . . . 146
Diskette drive problems . . . . . . . . . . . . . . . . . . . . 147
General problems . . . . . . . . . . . . . . . . . . . . . . 147
Hard disk drive problems . . . . . . . . . . . . . . . . . . . . 147
Memory problems . . . . . . . . . . . . . . . . . . . . . . 148
Microprocessor problems . . . . . . . . . . . . . . . . . . . . 148
Monitor problems . . . . . . . . . . . . . . . . . . . . . . 148
Mouse problems . . . . . . . . . . . . . . . . . . . . . . . 149
Network connection problems . . . . . . . . . . . . . . . . . . 150
Option problems . . . . . . . . . . . . . . . . . . . . . . . 150
Power problems . . . . . . . . . . . . . . . . . . . . . . . 151
Service processor problems . . . . . . . . . . . . . . . . . . . 151
Software problems . . . . . . . . . . . . . . . . . . . . . . 152
Startup problems . . . . . . . . . . . . . . . . . . . . . . . 152
Service processor error codes . . . . . . . . . . . . . . . . . . . 152
Boot problem resolution . . . . . . . . . . . . . . . . . . . . . 153
Physical location codes . . . . . . . . . . . . . . . . . . . . . 154
Undetermined problems . . . . . . . . . . . . . . . . . . . . . 156
Problem determination tips . . . . . . . . . . . . . . . . . . . . 158
Chapter 11. Parts listing, Type 8842 . . . . . . . . . . . . . . . . 159
Appendix A. Getting help and technical assistance . . . . . . . . . . 161
Before you call . . . . . . . . . . . . . . . . . . . . . . . . 161
Using the documentation . . . . . . . . . . . . . . . . . . . . . 161
Getting help and information from the World Wide Web . . . . . . . . . 162
Software service and support . . . . . . . . . . . . . . . . . . . 162
Hardware service and support . . . . . . . . . . . . . . . . . . . 162
Appendix B. Safety information . . . . . . . . . . . . . . . . . 163
General safety . . . . . . . . . . . . . . . . . . . . . . . . 163
Electrical safety . . . . . . . . . . . . . . . . . . . . . . . . 164
Safety inspection guide . . . . . . . . . . . . . . . . . . . . . 165
Grounding requirements . . . . . . . . . . . . . . . . . . . . . 166
Safety notices (multi-lingual translations) . . . . . . . . . . . . . . . 166
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Edition notice . . . . . . . . . . . . . . . . . . . . . . . . . 197
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . 198
Important notes . . . . . . . . . . . . . . . . . . . . . . . . 198
Product recycling and disposal . . . . . . . . . . . . . . . . . . 199
Battery return program . . . . . . . . . . . . . . . . . . . . . 199
Electronic emission notices . . . . . . . . . . . . . . . . . . . . 200
Federal Communications Commission (FCC) statement . . . . . . . . 200
Industry Canada Class A emission compliance statement . . . . . . . . 200
Australia and New Zealand Class A statement . . . . . . . . . . . . 200
United Kingdom telecommunications safety requirement . . . . . . . . 200
European Union EMC Directive conformance statement . . . . . . . . 201
Taiwanese Class A warning statement . . . . . . . . . . . . . . . 201
Chinese Class A warning statement . . . . . . . . . . . . . . . . 201
Japanese Voluntary Control Council for Interference (VCCI) statement 201
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Chapter 1. Introduction
The IBM BladeCenter JS20 Type 8842, also known as the blade server, is based
on the IBM Power Architecture™ technologies.
The BladeCenter JS20 Type 8842 is compatible with IBM BladeCenter units. This
high-performance blade server is well-suited for networking environments that
require outstanding microprocessor performance, efficient memory management,
flexibility, and reliable data storage.
Notes:
1. In this document, the term BladeCenter unit refers to any IBM BladeCenter,
BladeCenter T, or other BladeCenter-class chassis model, except where
specifically indicated otherwise.
2. The number of blade servers your BladeCenter unit supports depends on the
type of BladeCenter unit. For example, the IBM Eserver BladeCenter Type
8677 supports up to 14 hot-swap blade servers; the BladeCenter T Types 8720
and 8730 support up to 8 hot-swap blade servers. See the documentation that
comes with the BladeCenter unit for more information. For more information
about determining the power requirements for the blade server, see the IBM
Eserver BladeCenter Power Module Upgrade Guidelines Technical Update on
the IBM BladeCenter Documentation CD.
3. The types and capacities of power modules that your BladeCenter unit supports,
which affect the number of blade servers you can install in the BladeCenter
unit, depend on the type of BladeCenter unit. See the documentation that
comes with the BladeCenter unit for more information.
[Figure callouts: release levers, release button]
Notes:
v In a BladeCenter unit that supports multiple types of power modules with different
capacities, such as the BladeCenter Type 8677, the maximum number of blade
servers that the BladeCenter unit supports varies by the wattage of the power
modules that are installed in the BladeCenter unit. For more information about
determining the power requirements for the blade server, see the IBM Eserver
BladeCenter Power Module Upgrade Guidelines Technical Update on the World
Wide Web at http://www.ibm.com/support/.
v Two power modules are required to support the blade servers in power domain A
in the BladeCenter unit. The following blade bays are in power domain A:
– Blade bays 1 through 6 in a BladeCenter Type 8677 or similar unit
– Blade bays 1 through 5 in a BladeCenter T unit
If you install blade servers in these blade bays, you must install power modules
in power-module bays 1 and 2 in the BladeCenter unit.
v Two additional power modules are required to support the blade servers in power
domain B in the BladeCenter unit. The following blade bays are in power domain
B:
– Blade bays 7 through 14 in a Type 8677 or similar BladeCenter unit
– Blade bays 6 through 8 in a BladeCenter T unit
If you install blade servers in these blade bays, you must install power modules
in power-module bays 3 and 4 in the BladeCenter unit.
v Make sure that you review and understand the design of the BladeCenter unit.
Use this information to help you determine your system configuration
requirements and the bays and connectors where you will install or remove
components. For additional information, see the BladeCenter unit Installation and
User’s Guide on the Documentation CD for your BladeCenter unit, or go to
http://www.ibm.com/support/ on the World Wide Web.
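The power-domain rules in the preceding notes can be expressed as a small lookup. The following Python sketch is an illustration only: the chassis labels, dictionary names, and function name are hypothetical, and the bay ranges are transcribed from the notes above rather than taken from any IBM tool.

```python
# Blade-bay ranges per power domain, transcribed from the notes above.
# The dictionary keys are illustrative labels, not identifiers from any IBM tool.
POWER_DOMAINS = {
    "Type 8677": {"A": range(1, 7), "B": range(7, 15)},
    "BladeCenter T": {"A": range(1, 6), "B": range(6, 9)},
}

# Power-module bays that must be populated for each power domain.
MODULE_BAYS = {"A": (1, 2), "B": (3, 4)}


def required_power_module_bays(chassis, blade_bays):
    """Return the sorted power-module bays needed for the occupied blade bays."""
    domains = POWER_DOMAINS[chassis]
    needed = set()
    for bay in blade_bays:
        for domain, bays in domains.items():
            if bay in bays:
                needed.update(MODULE_BAYS[domain])
                break
        else:
            # The bay number matched neither power domain for this chassis.
            raise ValueError(f"blade bay {bay} is not valid for {chassis}")
    return sorted(needed)
```

For example, blade servers in bays 3 and 7 of a Type 8677 unit span both power domains, so the sketch reports that power-module bays 1 through 4 must all be populated.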
Related documentation
This Hardware Maintenance Manual and Troubleshooting Guide is provided in
Portable Document Format (PDF) on the IBM BladeCenter JS20 Documentation CD
that comes with the IBM BladeCenter JS20 Type 8842. It contains information to
help you solve problems yourself or to provide helpful information to a service
technician.
In addition to this Hardware Maintenance Manual and Troubleshooting Guide, the
following information is provided in PDF on the IBM BladeCenter Documentation
CD that comes with the IBM BladeCenter JS20 Type 8842:
v Safety Information: This document contains translated caution and danger
statements. Each caution and danger statement that appears in the
documentation has a number that you can use to locate the corresponding
statement in your language in the Safety Information document.
v BladeCenter JS20 Type 8842 Installation and User’s Guide: This document
contains instructions for setting up the server and basic instructions for
installing some options, and provides general information about the server,
including information about features and how to configure the server.
v BladeCenter and BladeCenter T Management Module User’s Guide: This
document contains instructions for installing, starting, configuring, and using the
BladeCenter unit management module. This document also provides general
information about the management module and contains a description of the
management module features.
v BladeCenter and BladeCenter T Management Module Command-Line Interface
Reference Guide: This document contains instructions for installing, starting,
configuring, and using the IBM Eserver BladeCenter management-module
command-line interface. This document also provides general information about
the BladeCenter management-module command-line interface and contains a
description of its features.
v BladeCenter or BladeCenter T Management Module Installation Guide: This
document contains instructions for installing, setting up, starting, and configuring
the BladeCenter unit management module.
v BladeCenter unit Installation and User’s Guide: This document contains
instructions for setting up and configuring the BladeCenter unit and basic
instructions for installing some options in the BladeCenter unit. It also contains
general information about the BladeCenter unit.
v BladeCenter unit Hardware Maintenance Manual and Troubleshooting Guide:
This document contains the information to help you solve BladeCenter unit
problems yourself, and it contains information for service technicians.
v BladeCenter unit Rack Installation Instructions: This document contains
instructions for installing the BladeCenter unit in a rack.
v IBM 4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s
Guide: This document contains instructions for setting up, installing, and
configuring the IBM 4-Port Gb Ethernet Switch Module for BladeCenter and a
description of the switch-module features.
v Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation
Guide: This document contains instructions for setting up, installing, and
configuring the Nortel Networks Layer 2-7 GbE Switch Module for IBM Eserver
BladeCenter and a description of the switch-module features.
v IBM BladeCenter 2-Port Fibre Channel Switch Module Installation Guide: This
document contains instructions for setting up, installing, and configuring the IBM
Eserver BladeCenter 2-Port Fibre Channel Switch Module, and a description of
the switch module features.
v Technical Update for IBM BladeCenter Fiber Channel Switch Module version
1.00: This document contains updated information about the IBM Eserver
BladeCenter 2-Port Fibre Channel Switch Module.
v IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide:
This document contains instructions for establishing a Serial Over LAN (SOL)
connection, enabling the SOL feature, and configuring the blade server so that
you can run SOL sessions and use the BladeCenter management-module
command-line interface. This document also contains instructions for updating
and configuring BladeCenter components for SOL operation using the
management-module Web-based management and configuration program.
v IBM Eserver BladeCenter Power Module Upgrade Guidelines Technical Update:
This document contains information that helps you determine the power
requirements for the blade server.
Additional documentation might be included on the IBM BladeCenter Documentation
CD.
The IBM BladeCenter Documentation CD
The IBM BladeCenter JS20 blade server Documentation CD contains
documentation for the blade server in Portable Document Format (PDF) and
includes the IBM Documentation Browser to help you find information quickly.
Hardware and software requirements
The IBM Documentation CD requires the following minimum hardware and
software:
v Microsoft® Windows NT® 4.0 (with Service Pack 3 or later), Windows® 2000, or
Red Hat® Linux®
v 100 MHz microprocessor
v 32 MB of RAM
v Adobe Acrobat Reader 3.0 (or later) or xpdf, which comes with Linux operating
systems
Note: Acrobat Reader software is included on the CD, and you can install it
when you run the Documentation Browser.
Using the Documentation Browser
Use the Documentation Browser to browse the contents of the CD, read brief
descriptions of the documents, and view documents using Adobe Acrobat Reader or
xpdf. The Documentation Browser automatically detects the regional settings in use
in the system and displays the documents in the language for that region (if
available). If a document is not available in the language for that region, the English
version is displayed.
Use one of the following procedures to start the Documentation Browser:
v If Autostart is enabled, insert the CD into the CD drive. The Documentation
Browser starts automatically.
v If Autostart is disabled or is not enabled for all users:
– If you are using a Windows operating system, insert the CD into the CD drive
and click Start --> Run. In the Open field, type
x:\win32.bat
(where x is the drive letter of the CD drive), and click OK.
– If you are using a Linux operating system, insert the CD into the CD drive;
then, run the following command from the /mnt/cdrom directory:
sh runlinux.sh
Select the server from the Product menu. The Available Topics list displays all the
documents for the server. Some documents might be in folders. A plus sign (+)
indicates each folder or document that has additional documents under it. Click the
plus sign to display the additional documents.
When you select a document, a description of the document appears under Topic
Description. To select more than one document, press and hold the Ctrl key while
you select the documents. Click View Book to view the selected document or
documents in Acrobat Reader or xpdf. If you selected more than one document, all
the selected documents are opened in Acrobat Reader or xpdf.
To search all the documents, type a word or word string in the Search field and
click Search. The documents in which the word or word string appears are listed in
order of the most occurrences. Click a document to view it, and press Ctrl+F to use
the Acrobat search function or Alt+F to use the xpdf search function within the
document.
Click Help for detailed information about using the Documentation Browser.
Notices and statements used in this document
The caution and danger statements that appear in this document are also in the
multilingual Safety Information document, which is on the IBM BladeCenter unit or
blade server Documentation CD. Each statement is numbered for reference to the
corresponding statement in the Safety Information document.
The following notices and statements are used in the documentation:
v Notes: These notices provide important tips, guidance, or advice.
v Important: These notices provide information or advice that might help you avoid
inconvenient or problem situations.
v Attention: These notices indicate potential damage to programs, devices, or
data. An attention notice is placed just before the instruction or situation in which
damage could occur.
v Caution: These statements indicate situations that can be potentially hazardous
to you. A caution statement is placed just before the description of a potentially
hazardous procedure step or situation.
v Danger: These statements indicate situations that can be potentially lethal or
extremely hazardous to you. A danger statement is placed just before the
description of a potentially lethal or extremely hazardous procedure step or
situation.
Features and specifications
This section provides a summary of the features and specifications of your blade
server. Through the BladeCenter unit management module, you can view the blade
server firmware code and other hardware configuration information.
Note: Power, cooling, removable-media drives, external ports, and advanced
system management are provided by the IBM Eserver BladeCenter unit.
For more information, see the Installation and User’s Guide for your
BladeCenter unit.
BladeCenter JS20 specifications for non-NEBS/ETSI environments
The following table provides a summary of the features and specifications of the
BladeCenter JS20 Type 8842 in a non-NEBS/ETSI environment. This includes
model-specific information.
Microprocessor:
v Two IBM® PowerPC® microprocessors with 512 KB ECC L2 cache
Memory:
v Four double-data rate (DDR) PC2700 sockets
v Minimum: 512 MB
v Maximum: 4 or 8 GB (depends on the blade server model) *
IDE devices:
v Support for up to two internal integrated drive electronics (IDE) 2.5-inch
hard disk drives
or
v Support for one internal IDE 2.5-inch hard disk drive in IDE connector 1 and
one optional I/O expansion card in IDE connector 2
Note: Installing an I/O expansion card increases network connections.
Size:
v Height: 24.5 cm (9.7 inches)
v Depth: 44.6 cm (17.6 inches)
v Width: 2.9 cm (1.14 inches)
v Maximum weight: 5.4 kg (12 lb)
Integrated functions:
v One dual-port Gigabit Ethernet controller
v Light path diagnostics
v Local service processor
v One IDE hard disk drive controller with two channels
v RS-485 interface for communication with BladeCenter management module
v Serial Over LAN (SOL)
Predictive Failure Analysis (PFA) alerts:
v Microprocessors
v Memory
v Hard disk drives
Environment:
v Air temperature:
– Blade server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (0 to 3000 ft)
– Blade server on: 10° to 32°C (50° to 90°F). Altitude: 914 m to 2133 m
(3000 ft to 7000 ft)
– Blade server off: -40° to 60°C (-40° to 140°F)
v Humidity:
– Blade server on: 8% to 80%
– Blade server off: 5% to 80%
Electrical input:
v Input voltage: 12 V dc
* For information about dual inline memory module (DIMM) type and supported DIMM size, see “Installing memory
modules” on page 77.
Note: The operating system in the blade server must provide USB support for the
blade server to recognize and use the CD drive and diskette drive. The
BladeCenter unit uses USB for internal communications with these devices.
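As a worked illustration of the Environment limits in the preceding table, the following Python sketch checks whether an air temperature, altitude, and humidity reading falls inside the powered-on range for a non-NEBS/ETSI environment. The function names are hypothetical. Treating an altitude of exactly 914 m as part of the lower band is an assumption, because the table lists 914 m in both bands.

```python
def max_operating_temp_c(altitude_m):
    """Upper air-temperature limit in degrees C for a powered-on blade server."""
    if not 0 <= altitude_m <= 2133:
        raise ValueError("altitude is outside the specified 0 to 2133 m range")
    # Assumption: 914 m itself is treated as part of the lower (35 degree) band.
    return 35.0 if altitude_m <= 914 else 32.0


def within_operating_range(temp_c, altitude_m, humidity_pct):
    """True if a powered-on blade server is inside the non-NEBS/ETSI limits."""
    temp_ok = 10.0 <= temp_c <= max_operating_temp_c(altitude_m)
    humidity_ok = 8.0 <= humidity_pct <= 80.0
    return temp_ok and humidity_ok


def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32
```

A reading of 35°C at 50% humidity passes at 500 m but fails at 1000 m, where the upper limit drops to 32°C. The `c_to_f` helper can be used to cross-check the paired Celsius and Fahrenheit values.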
BladeCenter JS20 specifications for NEBS/ETSI environments
The following table provides a summary of the features and specifications of the
BladeCenter JS20 Type 8842 in a NEBS/ETSI environment. This includes
model-specific information.
Microprocessor:
v Two IBM PowerPC® microprocessors with 512 KB ECC L2 cache
Memory:
v Four DDR PC2700 sockets
v Minimum: 1 GB
v Maximum: 4 or 8 GB (depends on the blade server model) *
IDE devices:
v NEBS application does not support internal drives
Size:
v Height: 24.5 cm (9.7 inches)
v Depth: 44.6 cm (17.6 inches)
v Width: 2.9 cm (1.14 inches)
v Maximum weight: 5.4 kg (12 lb)
Integrated functions:
v One dual-port Gigabit Ethernet controller
v Light path diagnostics
v Local service processor
v One IDE hard disk drive controller with two channels
v RS-485 interface for communication with BladeCenter management module
v Serial over LAN
Predictive Failure Analysis (PFA) alerts:
v Microprocessors
v Memory
Environment (NEBS):
v Air temperature:
– Blade server on: 5° to 40°C (41° to 104°F). Altitude: -60 to 1800 m
(-197 to 6000 ft)
– Blade server on (short term): -5° to 55°C (23° to 131°F). Altitude:
-60 to 1800 m (-197 to 6000 ft)
– Blade server on: 5° to 30°C (41° to 86°F). Altitude: 1800 to 4000 m
(6000 to 13 000 ft)
– Blade server on (short term): -5° to 45°C (23° to 113°F). Altitude:
1800 to 4000 m (6000 to 13 000 ft)
– Blade server off: -40° to 70°C (-40° to 158°F)
v Humidity:
– Blade server on: 5% to 80%
– Blade server on (short term): 5% to 90% but not to exceed 0.024
kg water/kg of dry air
– Blade server off: uncontrolled
Note: "Short term" refers to a period of not more than 96 consecutive hours
and a total of not more than 15 days in 1 year. (This refers to a total of 360
hours in any year, but no more than 15 occurrences during that 1-year period.)
Electrical input:
v Input voltage: 12 V dc
* For information about DIMM type and supported DIMM size, see “Installing memory modules” on page 77.
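The "short term" definition in the preceding table combines three limits: 96 consecutive hours for each occurrence, 360 hours in total, and 15 occurrences in 1 year. The following Python sketch shows one way to check a list of excursion durations against those limits. The function and constant names are hypothetical, and the sketch assumes that the caller has already selected the excursions that belong to a single 1-year period, because the table does not say how that period is measured.

```python
MAX_CONSECUTIVE_HOURS = 96   # longest single short-term excursion
MAX_TOTAL_HOURS = 15 * 24    # 15 days, that is, 360 hours in 1 year
MAX_OCCURRENCES = 15         # number of excursions in 1 year


def short_term_budget_ok(excursion_hours):
    """Check excursion durations (in hours) for one 1-year period.

    Returns True only if every excursion, the total duration, and the
    number of excursions all stay within the stated limits.
    """
    return (
        len(excursion_hours) <= MAX_OCCURRENCES
        and all(hours <= MAX_CONSECUTIVE_HOURS for hours in excursion_hours)
        and sum(excursion_hours) <= MAX_TOTAL_HOURS
    )
```

For example, three 96-hour excursions plus one 72-hour excursion total exactly 360 hours and pass, but a fourth 96-hour excursion would exceed the yearly total.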
Notes:
1. The operating system in the blade server must provide USB support for the
blade server to recognize and use the CD drive and an external diskette drive.
The BladeCenter T unit uses USB for internal communication with these
devices.
2. BladeCenter JS20 models that are designed for the NEBS environment contain
a power-management capability that provides the maximum possible operating
time for your system. Power management is invoked only when the blade server
is installed in a BladeCenter T unit and only under the short term extended
thermal conditions that are described in the preceding table as "short term" in
the high end of the NEBS extended temperature range, 40° to 55°C (104° to
131°F). Instead of shutting down or failing in short term extended thermal
conditions, the JS20 blade server automatically reduces the frequency of the
processor to maintain acceptable thermal levels. The processor frequency
automatically returns to normal as thermal conditions improve. The BladeCenter
management module is notified when power management starts and again
when it stops.
The following entries are made in the event log:
v Frequency throttling process is now active.
(This message indicates that power reduction is in effect.)
v Frequency throttling process is now idling.
(This message indicates that power reduction was previously invoked but is
no longer in effect.)
Do not restart the blade server when power reduction is in effect.
3. Some applications are sensitive to processor frequency changes. Check with
your application vendors to determine if there are any possible impacts to your
applications from the effects of the JS20 blade server power-management
capability in the short term extended thermal conditions of the NEBS
environment.
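Because the blade server must not be restarted while power reduction is in effect, it can help to determine the current throttling state from the event log before restarting. The following Python sketch assumes that the event log has been exported as plain-text lines in chronological order; the export format and the function name throttling_in_effect are assumptions, and only the two message strings come from this manual.

```python
# Message texts as listed in note 2 above.
THROTTLE_ACTIVE = "Frequency throttling process is now active."
THROTTLE_IDLE = "Frequency throttling process is now idling."


def throttling_in_effect(event_log_lines):
    """Return True if the most recent throttling entry reports 'active'.

    event_log_lines: iterable of log lines, oldest first (assumed format).
    """
    in_effect = False
    for line in event_log_lines:
        if THROTTLE_ACTIVE in line:
            in_effect = True
        elif THROTTLE_IDLE in line:
            in_effect = False
    return in_effect
```

If the function returns True, wait for the "idling" entry before restarting the blade server.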
Preinstallation checklist
Before you can use the BladeCenter unit with the blade server, you must correctly
set up and configure the BladeCenter unit, and install and configure the required
components in the BladeCenter unit. Read Appendix B, “Safety information,” on
page 163, and the information in “Installation guidelines” on page 71, and review
the documentation that comes with each device and any applicable information in
the “Related documentation” on page 3.
If you have not already done so, perform the activities on the following checklist:
__ 1. Set up the rack in which you will install the BladeCenter unit.
__ 2. Install the BladeCenter unit in a rack. For additional information, see the
Rack Installation Instructions that come with the BladeCenter unit.
__ 3. Install and configure the required BladeCenter unit components:
__ a. Make sure that the BladeCenter unit has adequate power to support
all the installed devices. The BladeCenter unit must contain either two
or four power modules. If necessary, on BladeCenter units that
support it, upgrade the power modules in the BladeCenter unit to
higher-capacity power modules. For additional information, see the
IBM Eserver BladeCenter Power Module Upgrade Guidelines
Technical Update.
__ b. Install and configure one or two management modules in the
BladeCenter unit.
__ c. Install and configure one or two Ethernet switch modules in the
BladeCenter unit.
To support the Serial Over LAN (SOL) feature on any blade server
that is installed in the BladeCenter unit:
v A SOL-compatible Ethernet switch module must be installed in I/O
bay 1 of the BladeCenter unit.
v Both the BladeCenter unit and the Ethernet switch module must be
configured so that the SOL feature is enabled and set to operate on
the same virtual local area network (VLAN).
If you plan to install the operating system through the Ethernet
network, you also must install and configure a second Ethernet switch
module in I/O bay 2 of the BladeCenter unit.
Note: If you install other Ethernet switch modules, they do not have
to be the same type that you installed in I/O bay 1 of the
BladeCenter unit.
__ d. Configure the BladeCenter unit for SOL operation as described in the
IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup
Guide.
Verify that the firmware code for the BladeCenter unit, management
module, and Ethernet switch modules supports the SOL feature. If you
are not sure whether these devices come with this feature, see the
IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup
Guide for additional information.
The SOL feature is required and must remain enabled for all
applicable devices, including the BladeCenter unit, management
module, and Ethernet switch modules.
__ 4. If the BladeCenter unit was shipped to you before June 2003, make sure
that:
__ a. The hardware and firmware in the BladeCenter unit are at the
supported levels for the blade server. Go to the IBM Support Web
site, http://www.ibm.com/support/, for additional information.
__ b. The BladeCenter unit has the correct customer interface card (CIC)
(see “Checking the status of the media tray”).
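The SOL requirements in the checklist can be summarized as a simple validation. The following Python sketch is a planning aid only; the SwitchModule and Chassis records and the function sol_prerequisite_problems are hypothetical and do not correspond to any management-module interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwitchModule:
    """Hypothetical record describing one Ethernet switch module."""
    sol_compatible: bool
    sol_enabled: bool
    sol_vlan: int


@dataclass
class Chassis:
    """Hypothetical record describing the BladeCenter unit settings."""
    sol_enabled: bool
    sol_vlan: int
    bay1: Optional[SwitchModule] = None
    bay2: Optional[SwitchModule] = None


def sol_prerequisite_problems(chassis, network_os_install=False):
    """Return a list of unmet SOL requirements (empty if none are found)."""
    problems = []
    switch = chassis.bay1
    if switch is None or not switch.sol_compatible:
        problems.append("no SOL-compatible Ethernet switch module in I/O bay 1")
    else:
        if not (chassis.sol_enabled and switch.sol_enabled):
            problems.append("SOL is not enabled on both the unit and the switch module")
        if chassis.sol_vlan != switch.sol_vlan:
            problems.append("the unit and the switch module use different SOL VLANs")
    if network_os_install and chassis.bay2 is None:
        problems.append("network installation needs a second switch module in I/O bay 2")
    return problems
```

An empty list means that the modeled requirements are met; it does not replace the firmware-level verification described in step 3d.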
For illustrations and additional information, see the following related documentation
on the World Wide Web at http://www.ibm.com/support/:
v BladeCenter Type 8677 Rack Installation Instructions
v BladeCenter Type 8677 Installation and User’s Guide
v BladeCenter T Types 8720 and 8730 Installation and User’s Guide
v BladeCenter T 2-Post Rack Mount Kit Installation Instructions
v BladeCenter T 4-Post and Universal Telco Frame (UTF) Rack Mount Kit
Installation Instructions
v IBM Eserver BladeCenter Power Module Upgrade Guidelines Technical Update
v BladeCenter Management Module Installation Guide
v BladeCenter T Management Module Installation Guide
v BladeCenter and BladeCenter T Management Module Command-Line Interface
Reference Guide
v IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide
v The documentation that comes with the Ethernet switch module that you are
using; for example:
– IBM 4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s
Guide
– Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter
Installation Guide
For more information, see “Related documentation” on page 3.
Checking the status of the media tray
Note: If you received a BladeCenter unit other than a Type 8677, this topic does not
apply.
Important: If you received a Type 8677 BladeCenter unit before June 2003, the
customer interface card (CIC) in the media tray of the BladeCenter unit might need
to be replaced before the CD drive will work correctly with a BladeCenter JS20
Type 8842.
If you received a Type 8677 BladeCenter unit before June 2003, start the
management-module Web interface and perform these steps to determine if the
CIC in your BladeCenter unit needs to be replaced:
1. In the navigation pane on the left side, select Monitors; then, select Hardware
VPD.
2. While looking at the “BladeCenter Hardware VPD” table in the right pane, find
the row for module name “Media Tray”.
3. Check the “FRU Number” column for the “Media Tray”.
4. If you see 59P6629, have the CIC replaced before installing a BladeCenter
JS20 Type 8842 in the BladeCenter unit.
To have the CIC replaced, call the IBM Support Center and report the CIC as a
failed part and request replacement with the latest CIC field replaceable unit (FRU).
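The media tray check above reduces to a single comparison on the FRU number. The following Python sketch assumes that the "BladeCenter Hardware VPD" table has been transcribed by hand into a list of dictionaries; the keys "name" and "fru_number" and the function cic_needs_replacement are hypothetical, and only the FRU number 59P6629 comes from this manual.

```python
# FRU number that indicates a customer interface card (CIC) to be replaced.
AFFECTED_MEDIA_TRAY_FRU = "59P6629"


def cic_needs_replacement(chassis_type, vpd_rows):
    """Return True if the media tray reports the affected FRU number.

    chassis_type: machine type of the BladeCenter unit, for example "8677".
    vpd_rows: list of dicts with the (assumed) keys "name" and "fru_number".
    """
    if chassis_type != "8677":
        return False  # The check applies only to Type 8677 units.
    for row in vpd_rows:
        if row.get("name") == "Media Tray":
            return row.get("fru_number") == AFFECTED_MEDIA_TRAY_FRU
    return False
```

The function does not model the shipment date; apply it only to units received before June 2003, as described above.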
Chapter 2. Blade server power, controls, and indicators
This chapter describes the power features, how to turn on and turn off the blade
server, and what the controls and indicators mean.
Turning on the blade server
Important: To generate faster blade-server startups from the network, connect the
dynamic host configuration protocol (DHCP) server to the Ethernet switch module in
I/O bay 2 in the BladeCenter unit. The system firmware code in the blade server
detects this Ethernet controller first. The Ethernet controller in each blade server is
then associated with the switch module in I/O bay 2.
Notes:
v After you connect the power cords of the BladeCenter unit to the electrical
outlets, wait until the power-on LED on the blade server flashes slowly before
pressing the blade server power-control button. Before the LED flashes, the
service processor in the BladeCenter management module is initializing, and the
power-control button on the blade server will not respond.
v While the blade server is powering up, the power-on LED on the front of the
server is lit. See “Blade server controls and LEDs” on page 14 for the power-on
LED states.
v After an orderly shutdown of the operating system occurs, the Wake on LAN
feature is permanently enabled in the blade server system firmware. Therefore,
Enabled is the default setting. The Wake on LAN setting for each blade server is
stored in the management-module nonvolatile random-access memory (NVRAM).
To disable the Wake on LAN feature for one or more blade servers, use the
BladeCenter management-module Web interface. For more information about the
BladeCenter management-module Web interface, see the BladeCenter and
BladeCenter T Management Module User’s Guide on the IBM BladeCenter
Documentation CD.
v Throughout this document, the management-module Web-based user interface is
also known as the BladeCenter management-module Web interface.
After you connect the BladeCenter unit to power, the blade server can start in any
of the following ways:
v You can press the power-control button on the front of the blade server (behind
the control panel door) to start the server.
v If a power failure occurs, the BladeCenter unit and then the blade server can
start automatically when power is restored (if the blade server is configured
through the BladeCenter management module to do so).
v You can turn on the blade server remotely by means of the service processor in
the BladeCenter management module.
v If the operating system supports the Wake on LAN feature, the feature has not
been disabled through the BladeCenter management-module Web interface, and
the blade server power-on LED is flashing slowly, the Wake on LAN feature can
turn on the blade server.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the
BladeCenter unit. The blade server can respond to requests from the service
processor, such as a remote request to turn on the blade server. To remove all
power from the blade server, you must remove it from the BladeCenter unit.
Shut down the operating system before you turn off the blade server. See the
operating-system documentation for information about shutting down the operating
system.
If the blade server has not been turned off, it can be turned off in any of the
following ways:
v You can press the power-control button on the blade server (behind the control
panel door). This starts an orderly shutdown of the operating system, if this
feature is supported by the operating system.
Note: After turning off the blade server, wait at least 5 seconds before you press
the power-control button to turn on the blade server again.
v If the operating system stops functioning, you can press and hold the
power-control button for more than 4 seconds to turn off the blade server.
v The management module can turn off the blade server.
Note: After turning off the blade server, wait at least 30 seconds for its hard disk
drives to stop spinning before you remove the blade server from the
BladeCenter unit.
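The timing rules in this section can be collected in one place. The following Python sketch only restates the durations given above; the constant and function names are hypothetical, and the sketch does not model whether local power control is enabled or whether the operating system supports an orderly shutdown.

```python
FORCE_OFF_HOLD_S = 4           # hold longer than this to force the blade off
MIN_WAIT_BEFORE_ON_S = 5       # wait after turning off before turning on again
MIN_WAIT_BEFORE_REMOVE_S = 30  # wait for the hard disk drives to stop spinning


def button_press_result(blade_is_on, hold_seconds):
    """Describe the effect of pressing the power-control button."""
    if not blade_is_on:
        return "turn on"
    if hold_seconds > FORCE_OFF_HOLD_S:
        return "forced power-off"
    return "orderly shutdown request"


def safe_to_power_on(seconds_since_off):
    """True once the minimum wait before turning on again has passed."""
    return seconds_since_off >= MIN_WAIT_BEFORE_ON_S


def safe_to_remove(seconds_since_off):
    """True once the minimum wait before removing the blade has passed."""
    return seconds_since_off >= MIN_WAIT_BEFORE_REMOVE_S
```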
Blade server controls and LEDs
This section describes the controls and LEDs on the blade server.
Power-control button: This button is behind the control panel door. Press this
button to manually turn the blade server on or off.
Note: The power-control button has effect only if local power control is enabled for
the blade server. Local power control is enabled and disabled through the
BladeCenter management-module Web interface.
[Figure: blade server control panel, showing the power-control button, blade-error LED, information LED, location LED, activity LED, power-on LED, and CD/diskette/USB select button.]
Notes:
1. The blade-error LED, information LED, and location LED can be turned off
through the BladeCenter management-module Web interface.
2. For additional information about errors, see “Light path diagnostics” on page 46.
3. This blade server does not have a keyboard/mouse/video select button.
Blade-error LED: When this amber LED is lit, it indicates that a system error has
occurred in the blade server.
Information LED: When this amber LED is lit, it indicates that information about a
system error for this blade server has been placed in the BladeCenter system-error
log.
Location LED: When this blue LED is lit, it has been turned on remotely by the
system administrator to aid in visually locating the blade server. The location LED
on the BladeCenter unit will be lit also.
Activity LED: When this green LED is lit, it indicates that there is hard disk drive or
network activity.
Power-on LED: This green LED indicates the power status of the blade server in
the following manner:
v Flashing rapidly – The service processor on the blade server is communicating
with the BladeCenter management module.
v Flashing slowly – The blade server has power but is not turned on.
v Lit continuously (steady) – The blade server has power and is turned on.
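The three power-on LED states map directly to power status, and only one of them means that the power-control button can be used to turn on the blade server. The following Python sketch is a reading aid; the PowerLed enumeration and the function ready_to_turn_on are hypothetical names.

```python
from enum import Enum


class PowerLed(Enum):
    """Power-on LED states and their meaning, as described above."""
    FLASHING_RAPIDLY = "service processor is communicating with the management module"
    FLASHING_SLOWLY = "blade server has power but is not turned on"
    LIT = "blade server has power and is turned on"


def ready_to_turn_on(led):
    """True only when the LED flashes slowly, that is, the blade can be turned on."""
    return led is PowerLed.FLASHING_SLOWLY
```

For example, ready_to_turn_on(PowerLed.FLASHING_RAPIDLY) is False because the service processor is still communicating with the management module.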
Chapter 3. Configuration
The firmware in the blade server uses auto-configuration; therefore, additional
blade-server configuration programs are not required for the blade server. However,
if you have attached other devices to the blade server or the BladeCenter unit, you
must configure those devices as described in the applicable documentation that
comes with those devices or the BladeCenter unit. You do not have to set any
passwords to use the blade server. If you change the battery or replace the system
board, you must reset the date and time through the operating system.
You must establish a Serial Over LAN (SOL) connection and start an SOL session
on the blade server:
v To establish a communications channel between the blade server and a
compatible monitor (or video console), keyboard, and mouse
v To install the operating system on the blade server
v To configure the SOL feature
v To run diagnostics programs
v To have the blade server serviced
For information relating to establishing an SOL connection, enabling the SOL
feature, and configuring the blade server so that you can run SOL sessions and use
the BladeCenter management-module command-line interface, see the following
documents on the IBM BladeCenter Documentation CD:
v IBM Eserver BladeCenter JS20 Installation and User’s Guide
v IBM Eserver BladeCenter and BladeCenter T Management Module
Command-Line Interface Reference Guide
Other documents on the IBM BladeCenter Documentation CD that you might find
useful in the configuration process are:
v IBM 4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s
Guide
v Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation
Guide
For information about setting up the network configuration for remote management,
see the IBM Eserver BladeCenter Planning and Installation Guide or the IBM
Eserver BladeCenter T Planning and Installation Guide. You can obtain the
planning guide from the Web site at http://www.ibm.com/pc/support.
To support the SOL feature and to configure the blade server, you must install a
compatible Ethernet switch module in I/O bay 1 of the BladeCenter unit. Examples
of compatible Ethernet switch modules are the IBM 4-Port Gb Ethernet Switch
Module for BladeCenter and the Nortel Networks Layer 2-7 GbE Switch Module for
IBM BladeCenter. For more information about these switch modules, see the IBM
4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s Guide or
Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation
Guide on the IBM BladeCenter Documentation CD. Information is also available in:
v Chapter 6, “Running a Serial Over LAN session,” on page 33
v The IBM Eserver BladeCenter and BladeCenter T Management Module
Command Line Interface Reference Guide on the IBM BladeCenter
Documentation CD
Note: The BladeCenter unit supports up to four Ethernet switch modules.
The SOL feature is accessed through the Management Module Command-Line
Interface. For information about using the command-line interface, see “Using the
command-line interface” and the IBM Eserver BladeCenter and BladeCenter T
Management Module Command Line Interface Reference Guide on the IBM
BladeCenter Documentation CD.
Using the command-line interface
The IBM Eserver BladeCenter Management Module Command-Line Interface
provides direct access to BladeCenter management functions as an alternative to
using the Web interface. Using the command-line interface, you can issue
commands to control the power and configuration of the blade server and other
components installed in the BladeCenter unit. The command-line interface also
provides access to the text-console command prompt for the blade server through
an SOL connection. See the IBM Eserver BladeCenter and BladeCenter T
Management Module Command Line Interface Reference Guide on the IBM
BladeCenter Documentation CD for information and instructions.
Configuring the Gigabit Ethernet controller
One dual-port Gigabit Ethernet controller is integrated on the blade server system
board. Each controller port provides a 1000-Mbps full-duplex interface for
connecting to one of the Ethernet-compatible switch modules in I/O bays 1 and 2,
which enables simultaneous transmission and reception of data on the Ethernet
local area network (LAN). Each Ethernet controller port on the system board is
routed to a different switch module in I/O bay 1 or bay 2. The routing from the
Ethernet controller port to the I/O bay will vary based on blade server type and the
operating system that is installed. See “Blade server Ethernet controller
enumeration” on page 19 for information about how to determine the routing from
the Ethernet controller ports to I/O bays for the blade server.
Note: Other types of blade servers, such as the BladeCenter HS20 Type 8678, that
are installed in the same BladeCenter unit as this BladeCenter JS20 Type
8842 might have different requirements for Ethernet controller routing. See
the documentation that comes with the other blade servers for detailed
information.
You do not need to set any jumpers or configure the controllers for the blade server
operating system. However, you must install a device driver to enable the blade
server operating system to address the Ethernet controller ports. For device drivers
and information about configuring the Ethernet controller ports, see the Ethernet
software documentation that comes with the blade server, or contact your reseller or
IBM marketing representative. For updated information about configuring the
controllers, go to the IBM Support Web site at http://www.ibm.com/support/.
The Ethernet controller supports failover, which provides automatic redundancy for
the Ethernet controller ports. Without failover you can have only one Ethernet
controller port from each server attached to each virtual LAN or subnet. With
failover you can configure more than one Ethernet controller port from each server
to attach to the same virtual LAN or subnet. Either one of the integrated Ethernet
controller ports can be configured as the primary Ethernet controller port. If you
have configured the controller ports for failover and the primary link fails, the
secondary controller port takes over. When the primary link is restored, the Ethernet
traffic switches back to the primary Ethernet controller port. (See the
operating-system device driver documentation for information about configuring for
failover.)
Important: To support failover on the blade server Ethernet controller, the Ethernet
switch modules in the BladeCenter unit must have identical configurations to each
other.
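The failover behavior described above can be modeled as a choice of the active port from the link states. The following Python sketch is a conceptual model only; the function active_port is hypothetical, and the actual behavior is implemented by the operating-system device driver.

```python
def active_port(primary_link_up, secondary_link_up):
    """Return which controller port carries traffic, or None if both links are down."""
    if primary_link_up:
        return "primary"      # Traffic stays on, or switches back to, the primary port.
    if secondary_link_up:
        return "secondary"    # The secondary port takes over when the primary link fails.
    return None


# Primary link up, then failed, then restored.
link_states = [(True, True), (False, True), (True, True)]
history = [active_port(primary, secondary) for primary, secondary in link_states]
```

In this sequence, history records the primary port, then the secondary port, then the primary port again.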
Blade server Ethernet controller enumeration
The enumeration of the Ethernet controllers or controller ports in a blade server is
operating-system dependent. You can verify the Ethernet controller or controller port
designations that a blade server uses through the operating-system settings.
The routing of an Ethernet controller or controller port to a particular BladeCenter
unit I/O bay depends on the type of blade server. You can verify which Ethernet
controller port in this blade server is routed to which I/O bay by using the following
test:
1. Install only one Ethernet switch module or pass-thru module in I/O bay 1.
2. Make sure that the ports on the switch module or pass-thru module are enabled
(I/O Module Tasks → Management → Advanced Management in the
BladeCenter management-module Web interface).
3. Enable only one of the Ethernet controller ports on the blade server. Note the
designation that the blade server operating system has for the controller port.
4. Ping an external computer on the network connected to the Ethernet switch
module. If you can ping the external computer, the Ethernet controller port that
you enabled is associated with the switch module in I/O bay 1. The other
Ethernet controller port in the blade server is associated with the switch module
in I/O bay 2.
If you have installed an I/O expansion card on a blade server, communications from
the option are routed to I/O bays 3 and 4. You can verify which controller port on
the card is routed to which I/O bay by performing the above test, using a controller
on the I/O expansion card and a compatible switch module or pass-thru module in
I/O bay 3 or 4.
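The routing test above can be scripted once only one controller port is enabled. The following Python sketch is a hedged illustration: the port names passed to infer_routing are hypothetical, ping_once assumes a system ping command that accepts the -c option, and a failed ping is treated as inconclusive because the procedure describes only the successful case.

```python
import subprocess


def ping_once(host):
    """Send one echo request with the system ping command (assumes -c is supported)."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def infer_routing(enabled_port, other_port, ping_succeeded, tested_bay=1, paired_bay=2):
    """Map controller ports to I/O bays from the result of the ping test.

    Returns a dict of port name to bay number, or None if the ping failed
    (a failed ping can also indicate a cabling or switch problem).
    """
    if not ping_succeeded:
        return None
    return {enabled_port: tested_bay, other_port: paired_bay}
```

For an I/O expansion card, the same function can be called with tested_bay=3 and paired_bay=4, or the reverse, depending on the bay in which the module is installed.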
Chapter 4. Problem determination procedures for AIX and
Linux
This chapter outlines the procedure to follow if the server suspends operation
without notice.
Use the following procedure if any of the following is true:
v The console displays
– an SRN/SRC code
– an 8-digit firmware error code
– a 3- or 4-digit firmware checkpoint (progress) code
v The server does not start up after installation
v The server experiences an undetermined error while running, such as if the
server stops running with no error code displayed
Certain errors listed in the SRN/SRC table, Failing Function Code table, and
Symptom-to-FRU index will also direct you to perform the diagnostic procedure
based on the operating system and the type of problem.
Problem determination
Complete the steps in this section to perform problem determination.
Step 001 Check for the following information:
1. If a firmware checkpoint (progress) code (3 or 4 digits) is
displayed, see “Firmware checkpoint (progress) codes” on page
94.
2. If a firmware error code (8 digits) is displayed, see “Firmware
error codes” on page 102.
3. If you have an SRN or SRC, see “SRN tables” on page 110.
4. Check the BladeCenter management module event log. If an
error was recorded by the system, see “SRN tables” on page
110.
5. Check the blade error LED on the information LED panel; if it is
lit, see “Light path diagnostics LEDs” on page 145.
6. If the blade server has stalled, with no error codes and no command
line or login prompt, continue to Step 002.
7. If the login prompt appears and you still suspect a problem,
continue to Step 002.
8. If you have none of the above symptoms, go to “Undetermined
problems” on page 156.
Step 002 Perform the following steps:
1. Turn off the server, making sure to first turn off all external
devices, if attached.
2. Check all cables and power cords.
3. Turn on all external devices; then, turn on the blade server.
4. Start the Serial Over LAN (SOL) console for the blade server to
be tested and check for the following responses:
a. Progress codes are displayed on the console.
b. AIX or Linux login prompt appears.
Step 003 Record any error messages or codes that are displayed on the
screen. If the last error is a POST or firmware error (8-digit) code,
look up the error in “Firmware error codes” on page 102. If the last
error is a firmware checkpoint (progress) code (3 or 4 digits), see
“Firmware checkpoint (progress) codes” on page 94.
Step 004 Check the BladeCenter management module event log. If an error
was recorded by the system, see Chapter 10, “Symptom-to-FRU
index,” on page 93.
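The first two checks in Step 001 depend only on the number of digits in the displayed code. The following Python sketch shows that triage; it assumes that the codes consist of hexadecimal digits, the function name section_for_code is hypothetical, and SRN and SRC formats are deliberately not modeled.

```python
import re


def section_for_code(code):
    """Return the section to consult for a displayed firmware code, or None.

    Assumption: firmware codes are written as hexadecimal digits.
    SRNs and SRCs are not recognized here; for those, see the SRN tables.
    """
    text = code.strip()
    if re.fullmatch(r"[0-9A-Fa-f]{8}", text):
        return "Firmware error codes"
    if re.fullmatch(r"[0-9A-Fa-f]{3,4}", text):
        return "Firmware checkpoint (progress) codes"
    return None
```

A return value of None means that the code does not have the length of a firmware code and that the remaining checks in Step 001 apply.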
Obtaining an SRN/SRC or error code
Perform the steps in this section to get a service request number (SRN), or to
obtain an SRC or error code.
Step 001 Check for the following information:
1. If a firmware checkpoint (progress) code (3 or 4 digits) is
displayed, see “Firmware checkpoint (progress) codes” on page
94.
2. If a firmware error code (8 digits) is displayed, see “Firmware
error codes” on page 102.
3. If you have an SRN or SRC, see “SRN tables” on page 110.
4. Check the BladeCenter management module event log. If an
error was recorded by the system, continue with Step 002;
then, see “SRN tables” on page 110.
5. If the login prompt appears and you still suspect a problem,
continue to Step 002.
6. If you have none of the above symptoms, go to “Undetermined
problems” on page 156.
Step 002 Visually check the system for obvious problems such as unplugged
power cables or external devices that are powered off.
Did you find an obvious problem?
No Go to Step 003.
Yes Fix the problem.
Step 003 Is the operating system AIX?
Yes Record any information or messages that may be provided
on the system console and then go to Step 005 to
perform problem determination procedures.
No Go to Step 004.
Step 004 Is the operating system Linux?
Yes Record any information or messages that may be provided
on the system console; then go to Step 007 to perform
standalone diagnostics. If you cannot load standalone
diagnostics, then answer this question No.
No Go to “Undetermined problems” on page 156.
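The branch on the operating system in Steps 003 and 004 can be written as a small decision function. The following Python sketch mirrors only the text above; the function name next_action and its parameters are hypothetical.

```python
def next_action(operating_system, can_load_standalone=True):
    """Return the next step after the visual check in Step 002.

    operating_system: name of the installed operating system.
    can_load_standalone: for Linux, whether standalone diagnostics can be loaded.
    """
    name = operating_system.strip().lower()
    if name == "aix":
        return "Step 005"
    if name == "linux" and can_load_standalone:
        return "Step 007"
    # Any other operating system, or Linux without standalone diagnostics.
    return "Undetermined problems"
```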
Step 005 Perform the following procedure for problem determination.
Note: When possible, run AIX online diagnostics in concurrent
mode. AIX online diagnostics perform more functions than the
standalone diagnostics CD.
1. Run the AIX online concurrent mode diagnostics for problem
determination; see “Performing AIX online concurrent
mode diagnostics for problem determination” on page 25.
Record any diagnostic results and use the “SRN tables” on
page 110 to identify the failing component.
Note: If you have replaced the failing component, go to
“Verifying the replacement part using AIX diagnostics” on
page 30 to verify the repair.
2. If you cannot run AIX online concurrent mode diagnostics,
continue to Step 006.
Step 006 Perform the following steps:
1. Turn off the system unit power and wait 45 seconds before
proceeding.
2. Turn on the system unit power.
3. Start the Serial Over LAN (SOL) console for the blade server to
be tested and check for the following responses:
a. Progress codes are displayed on the console.
b. Record any messages or diagnostic information that may be
displayed on the system console.
4. Load the standalone diagnostics CD-ROM. Go to “Running the
standalone diagnostics from CD-ROM” on page 25 or “Running
standalone diagnostics from a management (NIM) server” on
page 68.
5. If you have replaced the failing component, go to “Verifying the
replacement part using AIX diagnostics” on page 30 to verify
the repair.
6. If you still have a problem, or think that you still have one,
call the support center.
This ends the AIX procedure.
Step 007 If the operating system is Linux, perform the following steps:
1. Turn off the system unit power and wait 45 seconds before
proceeding.
2. Turn on the system unit power.
3. Start the Serial Over LAN (SOL) console for the blade server to
be tested and check for the following responses:
a. Progress codes are displayed on the console.
b. Record any messages or diagnostic information that may be
displayed on the system console.
Continue with Step 008.
Step 008 Load the Standalone Diagnostics in Service Mode. Refer to
“Running the standalone diagnostics from CD-ROM” on page 25 or
“Running standalone diagnostics from a management (NIM) server”
on page 68.
Can you load the standalone diagnostics?
No Go to “Undetermined problems” on page 156. If you still
have a problem, call for additional support.
Yes Select the resources to be tested and record any diagnostic
information (SRNs or SRC Error codes) and go to “SRN
tables” on page 110.
This ends the Linux procedure.
Chapter 5. AIX online, standalone and verification procedures
This chapter describes the procedures for performing online concurrent and
CD-based diagnostics, and replacement part verification for an AIX operating
system.
Performing AIX online concurrent mode diagnostics for problem
determination
Perform the following steps to run online diagnostics in concurrent mode:
1. Log in to the AIX operating system as root user, or use CE login. If you need
help, contact the system operator.
2. Enter the diag command to load the diagnostic controller, and display the
online diagnostic menus.
3. At the Function Selection menu, select Diagnostic Routines.
4. At the diagnostic mode selection menu, select problem determination.
5. When the Diagnostic Selection menu is displayed, choose a resource or all
resources to be tested.
6. After selecting the resource that you want to test, select Commit (PF7 = F7).
The resources will then be tested.
7. If any SRNs or firmware error codes are displayed, record all information
provided from the diagnostic results; then, go to the SRN tables. If “No trouble
found” is displayed, continue to the next step.
8. When testing is complete, press F3 to return to the Diagnostic Operating
Instructions display.
9. Press Ctrl+D to log off from root user or CE login.
10. When finished, contact your hardware service provider with any information
you received during the diagnostics, including service request numbers (SRNs)
or firmware error codes.
Running the standalone diagnostics from CD-ROM
The AIX diagnostics can be downloaded from the World Wide Web at
http://techsupport.services.ibm.com/server/mdownload/diags/download.
Note: Select the diagnostics that indicate JS20 support.
To run standalone diagnostics in service mode, complete the following steps:
Step 001 Verify with the system administrator and systems users that the
system unit may be shut down. Stop all programs including the
operating system (refer to the AIX operating system documentation
for information on the shutdown command). Make the CD drive
available to the system on which you want to run standalone
diagnostics (see the BladeCenter Management Module Operations
Guide for more information).
Step 002 Before attempting to load the standalone diagnostics CD, make sure
that the firmware is at the latest level (xx.xx).
Step 003 Log in to the management module.
Step 004 Enable SOL for the JS20 Blade to be tested.
Step 005 Select CD-ROM as the first device to be booted from the
configuration menu boot sequence.
Step 006 On the operator panel on the blade to be tested, press the CD
button to assign the CD-ROM to the blade to be tested; then, insert
the diagnostic CD into the CD drive.
Step 007 Turn on the blade to be tested.
Note: It can take from 3 to 5 minutes to load the diagnostics from
the CD.
Progress codes will stop at E1AD; look at the CD drive activity LED
(flashing). Standalone diagnostics are booting. This may take 2 to 3
minutes. During this time the system may reboot and the progress
code will stop at E14D again.
Note: Only firmware progress codes, not AIX progress codes, are
displayed when booting from the diagnostic CD or booting
AIX.
The screen will display “Welcome to AIX”.
Note: Once you have the “Welcome to AIX” screen, another 3 to 5
minutes may be required to get to the next screen; please
be patient.
A console message will display:
Standalone Diagnostic has completed loading. Please remove the
Diagnostic CD from the Tray.
You can leave the CD installed at this point or remove the CD
(256MB RAM required to remove CD at this point). The diagnostic
application is now loaded into memory and the CD is no longer
required.
The screen will display “Please define the system console”. Follow
the instructions on the screen to define the system console. A
choice of vs100 as the system console is recommended.
Notes:
v At this point, you can follow the instructions to run standalone
diagnostics or service aids from CD. Once you are done, you
can press the F10 key to exit diagnostics.
v The operating system will not be available until the system is
rebooted with the diagnostics CD removed from the CD drive, or
with a different startup drive selected.
Step 008 At the Diagnostic Operating Instructions screen, press Enter.
Step 009 From the Function Selection screen, use the up and down arrow
keys to select the function to be performed. Select the type of
terminal to be defined from the list provided at the prompt, for
example, type “vs100”.
Note: Use vs100 as the type of terminal; however, depending on
the terminal emulator selected, the function keys (PF#) may
not function. In this case, use the F# function keys or press
Esc and the number in the screen menus. For example, for
PF3 you can press F3 or you can press the Esc key and the
#3 key.
Step 010
v Select Diagnostic Routine and, if attempting to run diagnostics in
Problem Determination, go to Step 011 .
v Select Diagnostic Routine and, if attempting to run diagnostics in
System Verification, go to Step 012 .
v Select Task Selection if attempting to perform diagnostic service
aids (for example, Display Hardware Error Report); then, go to
Step 013 .
Step 011 Problem determination.
From the Function Selection screen, use the up or down arrow
keys to select Diagnostic Routines ; then, press Enter.
1. Press 1, then press Enter. From the diagnostic selection menu,
use the up or down arrow keys to select Problem
Determination .
2. Select the resource to be tested and press the Commit (PF) key.
3. Record any results provided and go to the “SRN tables” on
page 110 to identify the failure and perform the action(s).
4. When testing is complete, use the F3 or the Esc and #3 keys to
return to the Diagnostic Selection screen. If you want to run another
test, press F3 or the Esc and #3 keys again to return to the
Function Selection screen.
5. If you want to exit standalone diagnostics, select the exit
function (F10 key) from the menu.
Step 012 System Verification
From the Function Selection screen, use the up or down arrow
keys to select Diagnostic Routines and press Enter.
1. Press 1, then press Enter.
2. From the Diagnostic Selection menu, use the up or down
arrow keys to select System Verification .
3. Select the resource to be verified or select All Resources , and
press the Commit (PF) key.
4. Record any results provided; then, go to the “SRN tables” on
page 110 to identify the failure and perform the actions.
5. When testing is complete, use the F3 or the Esc and #3 keys to
return to the Diagnostic Selection screen; then, press F3 or
the Esc and #3 keys again to return to the Function Selection
screen if you want to verify another component.
Step 013 Task selection.
From the Function Selection screen use the up or down arrow
keys to select Task Selection ; then, press Enter.
1. Use the up or down arrow keys to select the task to be run;
then, press Enter.
2. Follow the instructions for the task selected.
3. When the task is completed, press F3 or the Esc and #3 keys
to return to the Task Selection screen.
4. If you want to run another task, select the task to be performed.
From the Task Selection list, select the service aid task you
want to perform; for example, Update and Manage System
Flash (see “Tasks (service aids)” on page 63).
5. After a task is selected, a resource menu might be presented
showing all resources supported by the task.
6. When you are finished with Task Selection , press F3 or the
Esc and #3 keys to return to the Function Selection screen, or
press F10 to exit.
Step 014 Once you have completed the “Please ensure that you reset the
boot list” screen, remove the CD if it is still in the CD drive, and
make sure you set the original boot list that had been defined by
the user.
Step 015 If you are still having a problem or think you still have a problem,
call the support center.
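The function-key fallback described in the note to Step 009 can be sketched in Python as follows. This is only an illustration of the keystrokes, not part of the diagnostics: the name esc_fallback is hypothetical, and the mapping of F10 to Esc followed by 0 is inferred from the “Esc+0=Exit” legend shown on the diagnostic screens.

```python
ESC = b"\x1b"  # the byte a terminal emulator sends for the Esc key


def esc_fallback(key_number: int) -> bytes:
    """Return the Esc-plus-digit sequence to try when PF/F keys do not work.

    F1 to F9 map to Esc followed by the same digit; F10 maps to Esc 0
    (assumption based on the "Esc+0=Exit" screen legend).
    """
    if not 1 <= key_number <= 10:
        raise ValueError("expected a function key number from 1 to 10")
    return ESC + str(key_number % 10).encode("ascii")
```

For example, esc_fallback(3) gives the bytes for Esc followed by 3, the alternative to PF3 or F3.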
Performing AIX online concurrent mode diagnostics for previous
diagnostic results: service aids
Complete the following steps to display previous diagnostic results from online
diagnostics in concurrent mode:
1. Log in to the AIX operating system as root user, or use CE login. If you need
help, contact the system operator.
2. Enter the diag command to load the diagnostic controller, and display the online
diagnostic menus.
3. At the Function Selection menu, select Task Selection .
4. At the Task Selection List menu, select Display Previous Diagnostic
Results .
5. At the Previous Diagnostic Results menu, select Display Diagnostic Log
Summary .
The diagnostic log will be shown with a time-ordered table of events from the
error log. Look in the ’T’ column for the most recent entry that is an ’S’ type of
entry.
Press Enter to select that row in the table; then, choose Commit.
The details of this entry from the table will be displayed; look for the “SRN”
entry shown near the end of the entry and record the information shown.
Example:
IDENTIFIER: DATE
Date/Time: Fri Jul 16 04:06:09
Sequence Number: 287
Event type: SRN Callout
Resource Name: sysplanar0
Resource Description: System Planar
Location: 00-00
Diag Session: 12736
Test Mode: No Console,Non-Advanced,Normal IPL,ELA,Option Checkout
Error Log Sequence Number: 3
Error Log Identifier: BFE4C025
SRN: 2B276422
Description: Refer to the Error Code to FRU Index in the system service guide.
Probable FRUs:
n/a FRU: 09P0406 P1-C1
_______________________________
[BOTTOM]
Use Enter to continue.
Esc+3=Cancel Esc+0=Exit Enter
6. If any SRNs are displayed, record all information provided from the diagnostic
results; then, go to “SRN tables” on page 110 or “Firmware error codes” on page
102.
If “No trouble found” is displayed, continue to the next step.
7. When results are complete, press F3 to return to the Diagnostic Operating
Instructions display.
8. Press Ctrl+D to log off from root user or CE login.
9. When finished, contact your hardware service provider with any information you
received during the diagnostics, including service request numbers (SRNs) or
firmware error codes.
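When many entries have to be recorded, the fields of a diagnostic log entry like the example in step 5 can be pulled out of a saved console capture. The following Python sketch assumes the entry has been saved as plain text in the layout shown above; parse_diag_entry is a hypothetical helper, and reading the two tokens after “FRU:” as a part number and a location code is an assumption, not something the diagnostics document.

```python
import re

# Matches lines such as "n/a FRU: 09P0406 P1-C1" under "Probable FRUs:".
FRU_LINE = re.compile(r"^\s*(\S+)\s+FRU:\s+(\S+)\s+(\S+)\s*$")


def parse_diag_entry(text):
    """Split a saved log entry into "Label: value" fields and probable FRUs."""
    fields, frus = {}, []
    for line in text.splitlines():
        fru = FRU_LINE.match(line)
        if fru:
            # Assumed meaning: second token = part number, third = location.
            frus.append({"part_number": fru.group(2), "location": fru.group(3)})
            continue
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()] = value.strip()
    return fields, frus
```

Applied to the example entry, fields["SRN"] holds the value to look up in the SRN tables, and frus lists each probable FRU line.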
Performing AIX online concurrent mode diagnostics for system
verification
Note: This procedure is for verifying newly-installed components, such as hard disk
drives, options, etc.
Perform the following steps to run online diagnostics in concurrent mode:
1. Log in to the AIX operating system as root user, or use CE login. If you need
help, contact the system operator.
2. Enter the diag command to load the diagnostic controller, and display the online
diagnostic menus.
3. At the Function Selection menu, select Diagnostic Routines .
4. At the Diagnostic Mode Selection menu, select System Verification .
5. When the Diagnostic Selection menu is displayed, choose the resource to be
tested or all resources to be tested.
6. After selecting the resource, select Commit (PF7 or F7). The resources will then
be tested.
7. If any SRNs or firmware error codes are displayed, record all information
provided from the diagnostic results; then, go to the “SRN tables” on page 110 or
“Firmware error codes” on page 102.
If “No trouble found” is displayed, continue to the next step.
8. When testing is complete, press F3 to return to the Diagnostic Operating
Instructions display.
9. Press Ctrl+D to log off from root user or CE login.
10. When finished, contact your hardware service provider with any information you
received during the diagnostics, including service request numbers (SRNs) or
firmware error codes.
Verifying the replacement part using AIX diagnostics
Complete the following steps to verify the replacement part using AIX diagnostics:
1. Did you use an AIX or online diagnostics service aid hot-swap operation to
replace the part?
Note: When you are not sure, answer this question No.
No Go to Step 2.
Yes Go to Step 4 on page 31. Note: Hot plug is currently not supported on
the JS20.
2. Follow these steps:
a. Start the system.
b. Wait until the AIX operating system login prompt displays or until apparent
system activity on the operator panel or display has stopped.
Did the AIX login prompt display?
No If an SRN displays, suspect a loose adapter or cable connection.
Review the procedures for the part that you replaced to ensure that
the new part is installed correctly. If you cannot correct the problem,
collect all SRNs or any other reference code information that you
see.
Contact your service provider for assistance.
Note: If you received an SRN or any other reference code when
you attempted to start the system, you can learn more about
these codes in the “SRN tables” on page 110 or “Firmware
error codes” on page 102.
This ends the procedure.
Yes Go to Step 3.
3. If the Resource Repair Action menu appears, go to Step 6 on page 31. If not,
follow these steps:
a. Log in as root user or use CE login.
b. At the command line, type diag -a and press Enter to check for missing
resources. Follow any instructions that appear. If an SRN displays, suspect
a loose card or connection. If no instructions appear, no resources were
detected as missing; go to Step 4 on page 31.
Note: If you have a resource with a “-M”, this means the resource is no
longer available to this JS20 blade. Check to see if the resource
(such as a USB CD-ROM or diskette drive) is assigned to the JS20
blade against which you are running diag -a. Follow the prompts to
resolve the resource conflict. See “Missing resources” on page 60 for
more information. If an 8-digit error code is displayed, go to
“Firmware error codes” on page 102, find the error, and perform the
listed action. If an SRN is displayed, record it and go to “SRN tables”
on page 110.
c. Then, proceed to the next step.
4. Complete these steps:
a. At the command line, type diag and press Enter to move to the Function
Selection menu.
b. From the Function Selection menu, select Advanced Diagnostics
Routines and press Enter.
c. From the Diagnostic Mode Selection menu, select System Verification
and press Enter.
d. When the Diagnostic Selection menu appears, select All Resources , or
test only the part you replaced, along with any devices that are attached to
the part you replaced, by selecting the diagnostics for the individual part.
Press F7 = commit to run the tests. Did the Resource Repair Action menu
appear?
No Go to Step 5.
Yes Go to Step 6.
5. Did the “Testing complete, no trouble was found” display appear?
No There is still a problem. Contact your service provider. This ends the
procedure.
Yes Complete the following steps:
a. Press F3 or Enter to return to the Advanced Diagnostic selection,
then press F3 or press the Esc key and the #3 key for the Function
Selection menu.
b. Select Task Menu .
c. Select Log Repair Action to update the AIX error log. If the repair
action was reseating a cable or adapter, select the resource
associated with that repair action. If the resource associated with
your action is not displayed on the resource list, select “sysplanar0”
and press F7 = Commit.
Note: This action changes the indicator light for the part from the
Fault state to the Normal state. Go to Step 8 on page 32.
6. When a test is run on a resource in System Verification mode and that resource
has an entry in the AIX error log, then, if the test on the resource was
successful, the Resource Repair Action menu appears. After replacing a part,
you must select the resource for that part from the Resource Repair Action
menu. This updates the AIX error log to indicate that a system-detectable part
has been replaced.
Note: On systems with an indicator light for the failing part, this changes the
indicator light from the Fault state to the Normal state.
Follow these steps:
a. Select the resource that has been replaced from the Resource Repair Action
menu. If the repair action was reseating a cable or adapter, select the
resource associated with that repair action. If the resource associated with
your action does not appear on the resource list, select “sysplanar0” and
press Enter.
b. After you have made your selections, choose F7 Commit. Did another
Resource Repair Action display appear? If RA Complete appears, press
Enter for the NTF screen.
No If the “No trouble found” display appears, go to Step 8.
Yes Go to Step 7.
7. The parent or child of the resource you just replaced may also require that you
run the Resource Repair Action option on it. When a test is run on a resource in
System Verification mode and that resource has an entry in the AIX error log,
then, if the test on the resource was successful, the Resource Repair Action
menu appears. After replacing that part, you must select the resource for that
part from the Resource Repair Action menu. This updates the AIX error log to
indicate that a system-detectable part has been replaced.
Note: This changes the indicator light for the part from the Fault state to the
Normal state.
Complete these steps:
a. From the Resource Repair Action menu, select the parent or child of the
resource that has been replaced. If the repair action was reseating a cable
or adapter, select the resource associated with that repair action. If the
resource associated with your action does not appear on the resource list,
select “sysplanar0” and press Enter.
b. After you have made your selections, choose Commit.
c. If the “No trouble found” display appears, go to Step 8.
8. If the operating system is not started, then start the operating system with the
system or partition in normal mode. Were you able to start the operating
system?
No Contact your service provider. This ends the procedure.
Yes This ends the procedure.
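The branching in steps 1 through 8 above can be summarized as a small transition table. The Python sketch below is only a reading aid under stated assumptions: FLOW and walk are hypothetical names, each step is reduced to a single yes or no answer, and step 7 lists only the “No trouble found” outcome because the procedure gives no other branch for it.

```python
CONTACT = "contact service provider"
DONE = "done"

# step -> {answer -> next step}; the question for each step is in the comment.
FLOW = {
    1: {"no": 2, "yes": 4},           # part replaced with a hot-swap service aid?
    2: {"no": CONTACT, "yes": 3},     # did the AIX login prompt display?
    3: {"yes": 6, "no": 4},           # Resource Repair Action menu already shown?
    4: {"no": 5, "yes": 6},           # Resource Repair Action menu after the tests?
    5: {"no": CONTACT, "yes": 8},     # "Testing complete, no trouble was found"?
    6: {"no": 8, "yes": 7},           # another Resource Repair Action display?
    7: {"yes": 8},                    # "No trouble found" display appeared
    8: {"no": CONTACT, "yes": DONE},  # were you able to start the operating system?
}


def walk(answers, start=1):
    """Return the steps visited for a dict of {step: "yes" or "no"} answers."""
    path, step = [], start
    while step in FLOW:
        path.append(step)
        step = FLOW[step][answers[step]]
    path.append(step)
    return path
```

A repair without a hot-swap operation, with no Resource Repair Action menu and a clean test, visits steps 1, 2, 3, 4, 5, and 8.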
Chapter 6. Running a Serial Over LAN session
The IBM Eserver BladeCenter management-module command-line interface
provides a convenient method for entering commands that manage and monitor
BladeCenter components. The blade server does not support a direct connection to
a monitor, keyboard, or mouse. Therefore, to enable communication between the
blade server and these devices, you must first configure the SOL feature on the
blade server to establish an SOL connection and then start an SOL session as
described in this chapter.
Note: Detailed information about configuring the SOL feature is described in the
BladeCenter JS20 Installation and User’s Guide on the IBM BladeCenter
Documentation CD.
This chapter contains the following information about running an SOL session:
v Starting the command-line interface
v Establishing a Telnet connection
v Establishing a Secure Shell (SSH) connection
v Starting an SOL session
v Ending an SOL session
In the BladeCenter environment, the integrated system management processor
(ISMP) and network interface controller (NIC) on each blade server route the serial
data from the blade server serial communications port to the network infrastructure
of the BladeCenter unit, including an Ethernet compatible I/O module that supports
SOL communication. Configuration of BladeCenter components for SOL operation is
done through the BladeCenter management module (see the BladeCenter JS20
Installation and User’s Guide on the IBM BladeCenter Documentation CD). The
management module also acts as a proxy in the network infrastructure to couple a
client running a Telnet session with the management module to an SOL session
running on a blade server, allowing the Telnet client to interact with the serial port of
the blade server over the network. Because all SOL traffic is controlled by and
routed through the management module, it is possible for administrators to
segregate the management traffic for the BladeCenter unit from the data traffic of
the blade servers.
To start an SOL connection with a blade server, you must first start a Telnet
command-line interface session with the management module. When this Telnet
command-line interface session is running, you can start a remote console SOL
session with any blade server installed in the BladeCenter unit that is set up and
enabled for SOL operation. You can establish as many as 20 separate Telnet
sessions with the BladeCenter management module, giving you the ability to have
14 simultaneous SOL sessions active (one for each of up to 14 blade servers) with
six additional command-line interface sessions available for BladeCenter unit
management. If security is a concern, secure shell (SSH) sessions can be used to
establish secure Telnet command-line interface sessions with the BladeCenter
management module before starting an SOL console redirect session with a blade
server.
The most recent versions of all BladeCenter documentation are available from the
IBM Web site. Complete the following steps to check for updated BladeCenter
documentation and technical updates:
1. Go to http://www.ibm.com/support/.
2. In the Learn section, click Online publications .
3. On the “Online publications” page, in the Brand field, select Servers .
4. In the Family field, select BladeCenter .
5. Click Continue .
Detailed information about SOL setup instructions is available in the BladeCenter
JS20 Installation and User’s Guide on the IBM BladeCenter Documentation CD.
Also, see the documentation for the operating system for information about
commands that you can enter through an SOL connection. See the IBM Eserver
BladeCenter and BladeCenter T Management Module Command Line Interface
Reference Guide for information about:
v Command-line interface guidelines
v Command syntax and descriptions
v Command-line interface error messages
Selecting the command target
You can use the command-line interface to target commands to the management
module or to other devices installed in the BladeCenter unit. The command line
prompt indicates the persistent command environment: the environment where
commands are entered unless otherwise redirected. When a command-line
interface session is started, the persistent command environment is “system”; this
indicates that commands are being directed to the BladeCenter unit. See the IBM
Eserver BladeCenter and BladeCenter T Management Module Command Line
Interface Reference Guide for additional information.
Starting the command-line interface
Start the management-module command-line interface from a client computer by
establishing a Telnet connection to the IP address of the management module or by
establishing an SSH connection. You can establish up to 14 separate Telnet or SSH
sessions to the BladeCenter management module.
Although a remote network administrator can access the management-module
command-line interface through Telnet, this method does not provide a secure
connection. As a secure alternative to using Telnet to access the command-line
interface, SSH ensures that all data that is sent over the network is encrypted and
secure.
The SSH clients listed below are available. Although some SSH clients have been
tested, support or non-support of any particular SSH client is not implied.
v The SSH clients distributed with operating systems such as Linux, AIX®, and
UNIX® (see the operating-system documentation for information). The SSH client
of Red Hat Linux 8.0 Professional was used to test the command-line interface.
v The SSH client of cygwin (see http://www.cygwin.com for information)
v PuTTY (see http://www.chiark.greenend.org.uk/~sgtatham/putty for information)
The following table shows the types of encryption algorithms that are supported,
according to the client software version that is being used.
Algorithm                SSH version 1.5 clients        SSH version 2.0 clients
Public key exchange      SSH 1-key exchange algorithm   Diffie-Hellman-group 1-sha-1
Host key type            RSA (1024-bit)                 DSA (1024-bit)
Bulk cipher algorithms   3-des                          3-des-cbc or blowfish-cbc
MAC algorithms           32-bit crc                     Hmac-sha1
Establishing a Telnet connection
To log on to the management module using Telnet, complete the following steps:
1. Open a command-line window on the network-management workstation, type
telnet 192.168.70.125 , and press Enter. The IP address, 192.168.70.125, is
the default IP address of the management module; if a new IP address has
been assigned to the management module, use that one instead.
A command-prompt window opens.
2. At the login prompt, type the management-module user ID. At the password
prompt, type the management-module password. The user ID and password are
case sensitive and are the same as those that are used for
management-module Web access.
A command prompt is displayed. You can now enter commands for the
management module.
Establishing a Secure Shell (SSH) connection
To log on to the management module using SSH, complete the following steps:
1. Make sure that the SSH service on the network-management workstation is
enabled. See the operating-system documentation for instructions.
2. Make sure that the SSH server on the BladeCenter management module is
enabled. See the IBM Eserver BladeCenter and BladeCenter T Management
Module User’s Guide for instructions.
3. Start an SSH session to the management module using the SSH client of your
choice. For example, if you are using the cygwin client, open a command-line
window on the network-management workstation, type ssh -x 192.168.70.125 ,
and press Enter. The IP address, 192.168.70.125, is the default IP address of
the management module; if a new IP address has been assigned to the
management module, use that one instead.
A command prompt window opens.
4. Type the management-module user ID when prompted. At the password prompt,
type the management-module password. The user ID and password are case
sensitive and are the same as those that are used for management-module
Web access.
A command prompt is displayed. You can now enter commands for the
management module.
Note: For information about installing and configuring SSH, see the
BladeCenter JS20 Installation and User’s Guide on the IBM BladeCenter
Documentation CD.
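If the login commands from the two procedures above are launched from a script on the network-management workstation, the command lines can be assembled as in this Python sketch. It assumes that telnet and ssh clients are installed on the workstation; mm_login_command is a hypothetical helper, and the default address is the management-module default given above.

```python
DEFAULT_MM_ADDRESS = "192.168.70.125"  # default management-module IP address


def mm_login_command(secure: bool, address: str = DEFAULT_MM_ADDRESS) -> list[str]:
    """Build the argument list for a Telnet or SSH login to the management module."""
    if secure:
        # Same form as the cygwin example: ssh -x <address>
        return ["ssh", "-x", address]
    return ["telnet", address]
```

The returned list can be passed to subprocess.run to open the session interactively; the user ID and password are still typed at the prompts.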
Starting an SOL session
Notes:
1. The SOL feature must be enabled for both the BladeCenter unit and the blade
server before you can start an SOL session with the blade server. See the IBM
Eserver BladeCenter and BladeCenter T Management Module Command Line
Interface Reference Guide for information about SOL commands. See the
operating-system documentation for information about SOL commands that you
can enter using the command-line interface. Additional information about setting
up and enabling SOL, and configuring a blade server for SOL, is available in the
BladeCenter JS20 Installation and User’s Guide on the IBM BladeCenter
Documentation CD.
2. The BladeCenter management module automatically stores the previous 8 KB
of serial data that was transmitted by each blade server, even when SOL
sessions are not active. When an SOL session is established, all of the previous
serial data, up to 8 KB, is automatically displayed. If no previous data is
available when the SOL session starts, the cursor will remain on the command
line until new serial data is transmitted.
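The buffering behavior in note 2 can be pictured with a bounded buffer that keeps only the most recent bytes. The Python sketch below illustrates that behavior and is not the management-module implementation; SerialHistory is a hypothetical name, and 8 KB is taken to mean 8192 bytes.

```python
from collections import deque


class SerialHistory:
    """Keep only the most recent serial bytes, discarding the oldest first."""

    def __init__(self, capacity: int = 8 * 1024):  # assumption: 8 KB = 8192 bytes
        self._buf = deque(maxlen=capacity)

    def record(self, data: bytes) -> None:
        # Bytes beyond the capacity push the oldest bytes out of the buffer.
        self._buf.extend(data)

    def replay(self) -> bytes:
        # What a newly started SOL session would show first.
        return bytes(self._buf)
```

An empty replay corresponds to the case where the cursor stays on the command line until new serial data arrives.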
After you start a Telnet or SSH session to the BladeCenter management module,
you can start an SOL session to any individual blade server that supports SOL
using the console command. Therefore, you can have simultaneous SOL sessions
active for each blade server installed in the BladeCenter unit.
Use the console command from the command line, indicating the target blade
server, where x is the corresponding blade server bay number:
console -T system:blade[x]
For example, to start an SOL connection to the blade server in blade bay 14, type
console -T system:blade[14]
A blade server that occupies more than one blade server bay is identified by the
lowest bay number that it occupies.
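The targeting rule, including the lowest-bay rule for a blade server that occupies more than one bay, can be sketched in Python as follows. console_command is a hypothetical helper, and the limit of 14 bays is taken from the session counts given earlier in this chapter.

```python
MAX_BAYS = 14  # blade bays in the BladeCenter unit, per the SOL session limits


def console_command(bays_occupied) -> str:
    """Build the console command for a blade occupying the given bay numbers."""
    bays = sorted(set(bays_occupied))
    if not bays or bays[0] < 1 or bays[-1] > MAX_BAYS:
        raise ValueError(f"bay numbers must be between 1 and {MAX_BAYS}")
    # A blade in several bays is identified by the lowest bay it occupies.
    return f"console -T system:blade[{bays[0]}]"
```

For a blade server that occupies bays 5 and 6, the command targets blade[5].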
After an SOL session is started, all commands are sent to the blade server that is
specified by the console command until the SOL session is ended, regardless of
the persistent command target that was in effect before the SOL session.
To restart the blade server through an SOL session, use the following key
sequence: Esc R Esc r Esc R
Complete the following steps to start a BladeCenter management module Telnet
CLI session:
1. From a command prompt, type telnet location
Where location is the host name or IP address of the BladeCenter management
module
2. Log on to the BladeCenter management module. The default user name is
USERID, and the default password is PASSW0RD (note the number zero, not
the letter O, in PASSW0RD).
Ending an SOL session
To end an SOL session, use the following key sequence: Press the Esc and shift-9
keys. The command-line interface will return to the persistent command target that
was in effect before the SOL session.
To exit a BladeCenter management-module Telnet CLI session, type exit at the
BladeCenter management-module Telnet CLI prompt after ending an SOL session.
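When an SOL session is driven from a script, the restart and end key sequences can be written as byte strings. This Python sketch assumes that Esc is sent as byte 0x1B and that Shift+9 produces “(” as on a US keyboard layout; the constant and function names are hypothetical.

```python
ESC = b"\x1b"

# Esc R Esc r Esc R: restart the blade server through the SOL session.
RESTART_BLADE = ESC + b"R" + ESC + b"r" + ESC + b"R"

# Esc followed by Shift+9, which is "(" on a US keyboard (assumption).
END_SOL_SESSION = ESC + b"("


def readable(sequence: bytes) -> str:
    """Render a key sequence the way this chapter writes it."""
    return " ".join("Esc" if byte == 0x1B else chr(byte) for byte in sequence)
```

On other keyboard layouts, check which character Shift+9 produces before relying on END_SOL_SESSION.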
Chapter 7. Diagnostics
This chapter provides basic troubleshooting information to help you solve some
common problems that might occur with the blade server.
Note: Linux service aids for hardware diagnostics (separate from the operating
system installation) are available for download from the following Web site:
http://techsupport.services.ibm.com/server/lopdiags/.
Diagnostic utilities for the Linux operating system are available from IBM. For more
information, go to http://www.ibm.com/servers/eservers/support/bladecenter/; in the
Hardware field select BladeCenter JS20, in the Software field select Linux on
POWER environment, then click Go.
For information about standalone AIX diagnostics, see “Running the standalone
diagnostics from CD-ROM” on page 25.
Other supported operating systems might have diagnostic tools available through
the operating system. Consult your operating system documentation for more
information.
If you cannot locate and correct the problem using the information in this section,
see Appendix A, “Getting help and technical assistance,” on page 161 for more
information.
Note: A problem with the BladeCenter JS20 Type 8842 blade server may relate to
either the BladeCenter JS20 Type 8842 blade server or the BladeCenter
unit.
v A blade-server problem exists if the BladeCenter unit contains more than
one blade server and only one of the blade servers has the symptom.
v If all of the blade servers have the same symptom, then the problem
relates to the BladeCenter unit. For more information, see the Hardware
Maintenance Manual and Troubleshooting Guide for your BladeCenter
unit.
General checkout
Follow the checkout procedure for diagnosing hardware problems.
Note: Before performing the checkout procedure, read Appendix B, “Safety
information,” on page 163.
The firmware diagnostics program tests the major components of the blade server
during startup and while the operating system is running. For Linux or AIX, there
are automatic error log analysis routines that provide failure information during
runtime.
There are also AIX concurrent online diagnostics (from disk) and standalone
diagnostics (from CD) to assist you in performing problem determination.
The firmware diagnostic tests are run automatically when the JS20 server is started.
v If your operating system is AIX, then you have AIX diagnostics available
concurrently to check out your hardware. See “Performing AIX online concurrent
mode diagnostics for system verification” on page 29.
Note: If your system will not start, you can use the “Running the standalone
diagnostics from CD-ROM” on page 25 procedure to isolate a hard disk
drive failure that may be preventing the system from starting.
v If your operating system is Linux, then you have the eServer Standalone
Diagnostics CD available to check out your hardware. See “Running the
standalone diagnostics from CD-ROM” on page 25.
v The console must be open for error codes to be visible. Make sure that SOL is
enabled.
v A single problem might cause several error messages. When this occurs, correct
the cause of the first error message.
When the cause of the first error message is corrected, the other error messages
might not occur the next time you run the test.
Important:
1. If multiple error codes are displayed, diagnose the first code that is displayed.
2. If the server stops and a POST (3- or 4-digit) error code is displayed, see
“Firmware checkpoint (progress) codes” on page 94.
3. If the server is suspended and no error message is displayed, see
“Undetermined problems” on page 156.
4. For intermittent problems, go to Chapter 4, “Problem determination procedures
for AIX and Linux,” on page 21 and check the BladeCenter management
module event log.
If the operating system is Linux, the Linux Syslog (platform log) may have more
information to help isolate the problem.
5. If the blade front panel shows no lit LEDs, verify blade status and errors in
the BladeCenter management-module Web interface; also, see “Undetermined
problems” on page 156.
6. If device errors occur, go to Chapter 4, “Problem determination procedures for
AIX and Linux,” on page 21 or see Chapter 10, “Symptom-to-FRU index,” on
page 93.
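The length rule used in this list (a 3- or 4-digit code is a firmware checkpoint, an 8-digit code is a firmware error code) can be sketched in Python as follows. reference_for is a hypothetical helper; treating the digits as hexadecimal is an assumption based on progress codes such as E1AD, and SRNs reported by the AIX diagnostics are labeled as such and go to the SRN tables instead.

```python
import string


def reference_for(code: str) -> str:
    """Name the section to consult for a code shown on the SOL console."""
    digits = code.strip()
    # Assumption: console codes use hexadecimal digits only (for example E1AD).
    if not digits or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"not a firmware code: {code!r}")
    if len(digits) in (3, 4):
        return "Firmware checkpoint (progress) codes"
    if len(digits) == 8:
        return "Firmware error codes"
    raise ValueError(f"unexpected code length: {code!r}")
```

When several codes are displayed, apply the lookup to the first one, as item 1 directs.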
Checkout procedure
The checkout procedure can be used if the server does not start up after installation
or if it experiences an undetermined error while running, such as if the server stops
running with no error code displayed. Certain errors listed in the Symptom-to-FRU
index will also direct you to perform the checkout procedure. If the operating system
is AIX, you can also verify the system following the procedure “Performing AIX
online concurrent mode diagnostics for system verification” on page 29. If the
operating system is Linux, then you can use the standalone diagnostic CD in
system verification mode to verify the JS20 blade server.
001 PERFORM THE CHECKOUT PROCEDURE:
1. Turn off the server, making sure to first turn off all external devices, if
attached.
2. Check all cables and power cords.
3. Turn on all external devices; then, turn on the blade server.
4. Start the Serial Over LAN (SOL) console for the blade server to be
tested and check for the following responses:
a. Progress codes are displayed on the console.
b. AIX or Linux login prompt appears.
002 DID THE LINUX OR AIX LOGIN PROMPT APPEAR?
YES.
1. If a firmware checkpoint (progress) (3 or 4-digit) code or firmware error
(8-digit) code is displayed on the console, see “Firmware checkpoint
(progress) codes” on page 94 or “Firmware error codes” on page 102.
2. Check the BladeCenter management module event log and if the
operating system is Linux, check the Linux Syslog (platform log). If an
error was recorded by the system, see Chapter 10, “Symptom-to-FRU
index,” on page 93.
3. If an error was recorded or if you believe you have a problem, perform
the procedure in “Performing AIX online concurrent mode diagnostics for
problem determination” on page 25.
4. If the login prompt appears and you still suspect a problem, go to
“Performing AIX online concurrent mode diagnostics for problem
determination” on page 25 or see “Undetermined problems” on page
156.
NO.
1. Check to see if a firmware checkpoint (progress) (3 or 4-digit) code or
firmware error (8-digit) code is displayed on the console; if so, see
“Firmware checkpoint (progress) codes” on page 94 or “Firmware error
codes” on page 102.
2. Check the blade error LED on the information LED panel; if it is lit,
see “Light path diagnostics” on page 46.
3. Record any POST error messages that are displayed on the screen;
then, check the BladeCenter management module event log. If an error
was recorded by the system or if a checkpoint code is displayed on the
console, perform Chapter 4, “Problem determination procedures for AIX
and Linux,” on page 21 or see Chapter 10, “Symptom-to-FRU index,” on
page 93.
4. If you do not have any error codes, perform Chapter 4, “Problem
determination procedures for AIX and Linux,” on page 21.
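The checkout procedure tells codes apart by length: a firmware checkpoint
(progress) code has 3 or 4 digits, and a firmware error code has 8 digits. As an
illustration only, the following Python sketch sorts a code copied from the SOL
console into one of those two groups. The function name classify_console_code
and the assumption that codes consist of hexadecimal digits (as in E105 or
20D00902) are hypothetical and are not part of any IBM service tool.

```python
import string

# Assumption: codes are written with hexadecimal digits only (for example E105).
HEX_DIGITS = set(string.hexdigits)


def classify_console_code(code):
    """Return the kind of code, based only on its length, as the manual does."""
    text = code.strip()
    if not text or any(ch not in HEX_DIGITS for ch in text):
        return "unrecognized"
    if len(text) in (3, 4):
        # See "Firmware checkpoint (progress) codes".
        return "firmware checkpoint (progress) code"
    if len(text) == 8:
        # See "Firmware error codes".
        return "firmware error code"
    return "unrecognized"
```

A result of "unrecognized" only means that the text does not have one of the
two lengths named in the procedure; it does not indicate that no problem exists.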
Diagnostic tools overview
The following tools are available to help you diagnose and solve hardware-related
problems:
v POST firmware checkpoints (progress codes)
The power-on self-test (POST), or firmware checkpoints, tests the major
components of the blade server. These firmware checkpoints (progress codes)
indicate the detection of a problem if the server stops on a checkpoint during the
startup process.
– A four-digit code indicates successful completion of that portion of POST
when the server does not stop on that checkpoint.
– A result other than a four-digit code indicates that POST might have detected
a problem. Error messages also appear during startup if POST detects a
hardware-configuration problem. The last POST firmware checkpoint code
posted is the most likely failure indicator. See “POST” on page 40 for more
information.
v Error symptom charts
These charts list problem symptoms and steps to correct the problems. See
“Error symptoms” on page 145 for more information.
v Light path diagnostics
Use the light path diagnostics feature to diagnose system errors quickly. See
“Light path diagnostics” on page 46 for more information.
POST
Checkpoints
Note: The service processor runs on its own power boundary and continually
monitors hardware attributes and the environmental conditions within the
system. The service processor is controlled by firmware and does not require
the operating system to be operational to perform its tasks.
After power is turned on and before the operating system is loaded, the system
does a power-on self-test (POST). This test performs checks to ensure that the
hardware is functioning correctly before the operating system is started. During
POST, a POST screen displays, and POST indicators appear on the Serial Over
LAN (SOL) console (if one is connected). The next section describes the POST
indicators and functions that can be accessed during POST.
The system firmware uses checkpoints (progress codes and error codes) to indicate
the status of the system. These codes can appear only on the Serial Over LAN
(SOL) console. Firmware error codes and messages indicate that a problem exists;
they are not intended to be used to identify a failing part.
Checkpoints display in the system console from the time ac power is connected to
the system until the operating system login prompt is displayed after a successful
operating system boot. These checkpoints have the following forms:
Exxx
    Exxx checkpoints indicate that a system processor is in control and
    is initializing the system resources. Control is being passed to the
    operating system when E105 displays on the operator panel
    display. Location code information may also display on the operator
    panel during this time (see “Physical location codes” on page 154).
Error codes
    If a fault is detected, an 8-digit error code is displayed in the
    BladeCenter management module event log. A location code might
    be displayed at the same time on the second line.
The management-module log, which can be accessed through the BladeCenter
unit, contains the most recent error codes and messages that the system generated
during POST.
Accessing the Linux system error log
If the system information LED is lit, do one of the following:
1. Check for an entry in the BladeCenter management-module event log. If the
information in this log is either a four-digit or eight-digit error code, go directly to
“Firmware checkpoint (progress) codes” on page 94 or “Firmware error codes”
on page 102.
2. Go to “General checkout” on page 37.
Service aids and the Linux system error log
Linux on pSeries® service aids for hardware diagnostics are available for customers
who have installed and are running Linux. Users can install these free diagnostics
tools for effective diagnosis and repair of their system in the rare instance when a
system error occurs.
This service aid toolkit provides the key tools required to take advantage of the
inherent pSeries hardware reliability, availability, and serviceability (RAS) functions,
such as first failure data capture and error log analysis, as outlined in the Linux on
pSeries RAS Whitepaper, available from
http://techsupport.services.ibm.com/server/Linux_on_pSeries/images/Linux_RAS.pdf.
With the toolkit installed, problem determination and correction are greatly enhanced
and the likelihood of an extended system outage is reduced.
The Linux service aids for hardware diagnostics are separate from the operating
system installation and are available for download from the following Web site:
http://techsupport.services.ibm.com/server/lopdiags/.
Note: The following steps can only be performed if the Linux service tools have
been installed on the JS20 blade server.
Step 001
1. If the blade server is functional, determine your level of Linux by
logging in to the blade server as the root user and entering the
following command:
ls -l /var/log/platform
If the /var/log/platform file exists, go to substep 2.
2. Use the following command to list diagela messages recorded
in the Linux Syslog (platform log):
cat /var/log/platform |grep diagela |more
Linux run-time diagela error messages are logged in the
platform file under /var/log.
The following illustration shows an example of the Linux Syslog
(platform log) error log diagela messages.
Aug 13 09:38:45 larry diagela: 08/13/2003 09:38:44
Aug 13 09:38:45 larry diagela: Automatic Error Log Analysis has detected
a problem.
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: The Service Request Number(s)/Probable Cause(s)
Aug 13 09:38:45 larry diagela: (causes are listed in descending order of
probability):
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: 651-880: The CEC or SPCN reported an error.
Report the SRN and the following reference and physical location codes
to your service provider.
Aug 13 09:38:45 larry diagela: Location: n/a FRU: n/a Ref-Code: B1004699
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: Analysis of Error log sequence number: 3
Aug 29 07:13:04 larry diagela: 08/29/2003 07:13:04
Aug 29 07:13:04 larry diagela: Automatic Error Log Analysis has detected
a problem.
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: The Service Request Number(s)/Probable Cause(s)
Aug 29 07:13:04 larry diagela: (causes are listed in descending order of
probability):
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: 651-880: The CEC or SPCN reported an error.
Report the SRN and the following reference and physical location codes
to your service provider.
Aug 29 07:13:04 larry diagela: Location: U0.1-F4 FRU: 09P5866 Ref-Code:
10117661
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: Analysis of /var/log/platform sequence
number: 24
Sep 4 06:00:55 larry diagela: 09/04/2003 06:00:55
Sep 4 06:00:55 larry diagela: Automatic Error Log Analysis reports the
following:
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: 651204 ANALYZING SYSTEM ERROR LOG
Sep 4 06:00:55 larry diagela: A loss of redundancy on input power was
detected.
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Check for the following:
Sep 4 06:00:55 larry diagela: 1. Loose or disconnected power source
connections.
Sep 4 06:00:55 larry diagela: 2. Loss of the power source.
Sep 4 06:00:55 larry diagela: 3. For multiple enclosure systems, loose or
Sep 4 06:00:55 larry diagela: disconnected power and/or signal connections
Sep 4 06:00:55 larry diagela: between enclosures.
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Supporting data:
Sep 4 06:00:55 larry diagela: Ref. Code: 10111520
Sep 4 06:00:55 larry diagela: Location Codes: P1 P2
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Analysis of /var/log/platform sequence
number: 13
3. Also use the following command to list RTAS messages
recorded in the Linux Syslog (platform log):
cat /var/log/platform |grep RTAS |more
Linux RTAS error messages are logged in the platform file
under /var/log. The following illustration shows an example of
the Linux Syslog (platform error log) RTAS messages.
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event begin -------
Aug 27 12:16:33 larry kernel: RTAS 0: 04440040 000003f8 96008508 19155800
Aug 27 12:16:33 larry kernel: RTAS 1: 20030827 00000001 20000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 2: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 3: 49424d00 55302e31 2d463400 00503034
Aug 27 12:16:33 larry kernel: RTAS 4: 10117661 04a0005d 10110000 00000000
Aug 27 12:16:33 larry kernel: RTAS 5: 00007701 000000e0 00000003 000000e3
Aug 27 12:16:33 larry kernel: RTAS 6: 00000000 01000000 00000000 31303131
Aug 27 12:16:33 larry kernel: RTAS 7: 37363631 20202020 20202020 55302e31
Aug 27 12:16:33 larry kernel: RTAS 8: 2d463420 20202020 20202020 03705a39
Aug 27 12:16:33 larry kernel: RTAS 9: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 10: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 11: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 12: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 13: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 14: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 15: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 16: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 17: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 18: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 19: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 20: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 21: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 22: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 23: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 24: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 25: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 26: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 27: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 28: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 29: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 30: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 31: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 32: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 33: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 34: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 35: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 36: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 37: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 38: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 39: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 40: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 41: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 42: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 43: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 44: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 45: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 46: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 47: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 48: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 49: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 50: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 51: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 52: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 53: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 54: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 55: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 56: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 57: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 58: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 59: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 60: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 61: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 62: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 63: 00000000 00000000 00000000 00020000
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event end ----------
Error codes and location codes might appear as RTAS
messages.
The extended data is also provided in the form of an RTAS
message. The extended data contains other error code words
that help in isolating the correct FRUs. The start of the extended
data is marked, for example, by the line
Aug 27 12:16:33 larry kernel: RTAS: 15 ------ RTAS event begin ------.
The number after the colon is a sequence number that
correlates this data with any diagela data with the same
sequence number. The end of the extended data is marked by
the line
Aug 27 12:16:33 larry kernel: RTAS: 15 ----- RTAS event end -------
with the same sequence number.
Word 13 and word 19 are found in the RTAS messages. For
example, to find word 13, first find the error code in the left
column of words of the extended data, 10117661. In this
example, we find the error code to the right of RTAS 4:. This is
also word 11. To get word 13, 10110000, count the words left to
right, beginning at word 11.
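The word-counting rule described above can be sketched in Python. This is a
minimal illustration, not a supported tool: the function names are
hypothetical, and the sketch assumes that each RTAS row of the platform log
has the form "RTAS n:" followed by 8-digit hexadecimal words, as in the
example. It finds the error code, treats it as word 11, and counts left to
right from there.

```python
import re

# Assumption: a data row looks like "... kernel: RTAS 4: 10117661 04a0005d ...".
RTAS_ROW = re.compile(r"RTAS\s+\d+:((?:\s+[0-9a-fA-F]{8})+)\s*$")


def extended_data_words(log_lines):
    """Collect every 8-digit word from the RTAS rows, in log order."""
    words = []
    for line in log_lines:
        match = RTAS_ROW.search(line.strip())
        if match:
            words.extend(word.lower() for word in match.group(1).split())
    return words


def word_relative_to_error_code(log_lines, error_code, word_number):
    """Return word `word_number`, counting the error code itself as word 11."""
    if word_number < 11:
        raise ValueError("counting starts at word 11 (the error code)")
    words = extended_data_words(log_lines)
    start = words.index(error_code.lower())  # ValueError if the code is absent
    return words[start + (word_number - 11)]
```

Given the RTAS 4 row from the example and the error code 10117661, asking for
word 13 returns 10110000, which matches the value that is found by counting
manually.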
Step 002 If you performed substep 2 on page 41 of Step 001, record any
RTAS messages found in the Linux Syslog (platform log) in Step
001. If you performed substep 2 on page 41 of Step 001, record
any RTAS and diagela messages found in the Linux Syslog
(platform log) in Step 001, and also record any extended data
found in the RTAS messages, especially word 13 and word 19.
Ignore all other messages in the Linux Syslog (platform log).
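When many diagela entries must be recorded, the location, FRU, and reference
code values can be collected with a short script. The following Python sketch
is one possible approach under stated assumptions: the function names are
hypothetical, each message is assumed to occupy a single line in
/var/log/platform (the examples in this manual are wrapped for printing), and
only the "Location: ... FRU: ... Ref-Code: ..." form shown in the examples is
recognized.

```python
import re

# Assumption: summary lines follow the form shown in the diagela examples.
SUMMARY = re.compile(r"Location:\s*(\S+)\s+FRU:\s*(\S+)\s+Ref-Code:\s*(\S+)")


def diagela_messages(log_lines):
    """Keep only the message text of diagela entries, similar to grep diagela."""
    marker = "diagela:"
    return [line.split(marker, 1)[1].strip()
            for line in log_lines if marker in line]


def reference_codes(log_lines):
    """Return one dictionary per diagela summary line that was found."""
    found = []
    for message in diagela_messages(log_lines):
        match = SUMMARY.search(message)
        if match:
            location, fru, ref_code = match.groups()
            found.append({"location": location, "fru": fru, "ref_code": ref_code})
    return found
```

The sketch only gathers values for recording; the actions for each reference
code are still taken from the firmware error code tables.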
Step 003 Examine the Linux boot (IPL) log by logging in to the system as the
root user and entering the following command:
cat /var/log/boot.msg |grep RTAS |more
Linux boot (IPL) error messages are logged into the boot.msg file
under /var/log. The following illustration shows an example of the
Linux boot error log.
RTAS daemon started
RTAS: -------- event-scan begin -------
RTAS: Location Code: U0.1-F3
RTAS: WARNING: (FULLY RECOVERED) type: SENSOR
RTAS: initiator: UNKNOWN target: UNKNOWN
RTAS: Status: bypassed new
RTAS: Date/Time: 20020830 14404000
RTAS: Environment and Power Warning
RTAS: EPOW Sensor Value: 0x00000001
RTAS: EPOW caused by fan failure
RTAS: -------- event-scan end ----------
Step 004 Record any RTAS messages found in the Linux boot (IPL) log in
Step 003. Ignore all other messages in the Linux boot (IPL) log.
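The RTAS messages in the boot log are grouped between "event-scan begin" and
"event-scan end" marker lines. As a rough sketch, assuming that each message
is on its own line and starts with "RTAS:" as in the example, the following
Python function (a hypothetical name, not part of the service aids) returns
the messages of each event as a separate list so that they can be recorded
together.

```python
def event_scan_blocks(log_lines):
    """Group RTAS messages by event, using the begin and end marker lines."""
    blocks = []
    current = None  # None means "not inside an event-scan block"
    for line in log_lines:
        text = line.strip()
        if "event-scan begin" in text:
            current = []
        elif "event-scan end" in text:
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and text.startswith("RTAS:"):
            # Drop the "RTAS:" prefix and keep the message text.
            current.append(text[len("RTAS:"):].strip())
    return blocks
```

Lines outside a begin and end pair, such as "RTAS daemon started", are ignored,
which matches the instruction to ignore all other messages in the log.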
Step 005 If you performed substep 3 on page 42 of Step 001 for the
current Linux partition, go to Step 006 on page 45, and when
asked in Step 006, do not record any additional extended data
from Step 004 for the current Linux partition. Examine the
extended data in both logs.
The following is an example of the Linux extended data.
RTAS daemon started
RTAS: -------- event-scan begin -------
RTAS: Location Code: U0.1-P1-C2
RTAS: Log Debug: 04
4b2726fb04a00011702c0014000000000000000000000000f1800001001801d3ffffffff0100000
00000000042343138 20202020383030343236464238454134303030303 030303030303030
RTAS: Log Debug: D2
5046413405020d0a000001000271400100000033434d502044415441000001000000000000010000
f180000153595320444154410000000000000000200216271501050920021627150105092002063
7150105095352432044415441702c001400000000000000020018820201d3820000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000280048ea400000000000000000000000000000000000000004350
542044415441702cff08000000001c000000702cf0080000000080000000702cf100702cf200702
c000400000800702c01040bf2002e702c02040c1fffbf702c0300702c1000702c11040bf2002e70
2c12040c1fffbf702c1300702ca000702ca108000000000000a03c702ca208000000000000effc7
02cb000702cb108000000000000a03c702cb208000000000000effc702cc000702cc10800000000
0000a03c702cc208000000000000effc702c3000702c31080000000000000003702c32080000000
00000007b702c8000702c81080000000020e27a39702c820800000000fffeffff702cd000702cd1
080000000010004010702cd208000000007777f3fffffffffffffffffffffffffffffffffffffff
fffffffffffffffffffffffffff
RTAS: WARNING: (FULLY RECOVERED) type: INTERN_DEV_FAIL
RTAS: initiator: UNKNOWN target: UNKNOWN
RTAS: Status: unrecoverable new
RTAS: Date/Time: 20020905 15372200
RTAS: CPU Failure
RTAS: Internal error (not cache)
RTAS: CPU id: 0
RTAS: Failing element: 0x0000
RTAS: -------- event-scan end --------
Step 006 Record any extended data found in the Linux Syslog (platform log)
in Step 001 or the Linux boot (IPL) log in Step 003. Be sure to
record word 13.
Note: The lines in the Linux extended data that begin with
RTAS: Log Debug: 04
contain the error code listed in the next 8 hex characters. In
the previous example, 4b27 26fb is an error code. The error
code is also known as word 11. Each 4 bytes after the error
code in the Linux extended data is another word (for
example, 04a0 0011 is word 12, and 702c 0014 is word 13,
and so on).
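The numbering in the preceding note can be sketched as follows. This Python
fragment is illustrative only: the function name debug_words is hypothetical,
and it assumes that the hexadecimal string that follows "RTAS: Log Debug: 04"
has already been copied out of the log. It splits the string into 4-byte
(8-character) words and numbers them starting at word 11, the error code. The
input is not validated, and any incomplete trailing word is dropped.

```python
def debug_words(hex_text, first_word_number=11):
    """Map word numbers to 8-character words; the first word is word 11."""
    # Remove spaces or line breaks introduced when the log was copied.
    digits = "".join(hex_text.split()).lower()
    usable = len(digits) - len(digits) % 8  # ignore an incomplete last word
    return {
        first_word_number + index // 8: digits[index:index + 8]
        for index in range(0, usable, 8)
    }
```

For the start of the example data, 4b2726fb04a00011702c0014, the sketch gives
word 11 as 4b2726fb, word 12 as 04a00011, and word 13 as 702c0014, in
agreement with the note.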
Step 007 Were any error codes or checkpoints recorded in Steps 001,
002, 003, 004, 005, or 006?
NO. Go to Step 008.
YES. Go to “Firmware checkpoint (progress) codes” on page 94
and “Firmware error codes” on page 102 for each recorded error
code or symptom. Perform the indicated actions one at a time for
each error code until the problem has been corrected. If all
recorded error codes have been processed and the problem has
not been corrected, go to Step 008.
Step 008 If no additional error information is available and the problem has
not been corrected, shut down the blade server.
1. Are there any event-logged entries in the BladeCenter
management-module event log?
2. Replace the system board.
FRU/CRU isolation
Error codes and the recommended actions for each code are provided in
Chapter 10, “Symptom-to-FRU index,” on page 93. These actions can provide you
with informational messages and directions or can refer you to Chapter 11, “Parts
listing, Type 8842,” on page 159.
If a replacement part is indicated, the part is referred to by name. The physical
location codes are listed for each occurrence as required (see “Physical location
codes” on page 154). Chapter 11, “Parts listing, Type 8842,” on page 159 provides
a parts index with the predominant field replaceable units (FRUs) or customer
replaceable units (CRUs) listed by name, and provides illustrations of the various
assemblies and components that make up the blade server.
Error symptom charts
You can use the error symptom charts to find solutions to problems that have
definite symptoms (see “Error symptoms” on page 145).
If you cannot find the problem in the error symptom charts, go to “Checkout
procedure” on page 38 and “Undetermined problems” on page 156.
If you encounter problems with an Ethernet or Fibre Channel switch module, IBM
Eserver BladeCenter Optical Pass-Thru Module, I/O expansion card, or other
optional device that can be installed in the BladeCenter unit, see the applicable
Hardware Maintenance Manual and Troubleshooting Guide on the IBM BladeCenter
Documentation CD or other documentation that comes with the device for more
information.
Light path diagnostics
Many errors are first indicated when the blade-error LED on the blade server is lit
(see “Blade server controls and LEDs” on page 14). If this LED is lit, one or more
error LEDs elsewhere in the blade server might also be lit and can direct you to the
source of the error.
This section describes how to use the light path diagnostics to identify problems
that might arise. To locate the actual component that caused the error, you must
locate the lit error LED for that component.
Note: Read Appendix B, “Safety information,” on page 163 and “Handling
static-sensitive devices” on page 71.
For example, if a blade error has occurred and the blade-error LED is lit on the
blade server, complete the following steps:
1. Turn off the blade server and remove it from the BladeCenter unit.
2. Place the blade server on a flat, static-protective surface.
3. Remove the cover from the blade server (see “Opening the blade server cover”
on page 74).
[Figure: locations of the light path diagnostics LEDs in the blade server.
Callouts: DIMM 1 error LED (CR40), DIMM 2 error LED (CR45), DIMM 3 error LED
(CR46), DIMM 4 error LED (CR53), Microprocessor 0 error LED (CR19),
Microprocessor 1 error LED (CR58), Temperature error LED (CR16), Service
processor error LED (CR27), NMI error LED (CR17), System board error LED
(CR20), Reserved (CR29), Light path diagnostics button (SW1).]
Memory errors
4. Press and hold the light path diagnostics button (SW1) to light the LEDs that
were lit before you removed the blade server from the BladeCenter unit. The
LEDs will remain lit for as long as you press the button, up to a maximum of 25
seconds.
Notes:
a. Power is available to relight the light path diagnostics LEDs for a short
period of time after the blade server is removed from the BladeCenter unit.
During that time, you can relight the light path diagnostics LEDs for a
maximum of 25 seconds (or less, depending on the number of LEDs that
are lit and the length of time the blade server is removed from the
BladeCenter unit) by pressing the light path diagnostics button (SW1).
b. Error LED CR29 is reserved.
5. Use the table at “Light path diagnostics LEDs” on page 145 to help determine
the cause of the error and the action that should be taken.
If a memory problem occurs, complete the following steps before replacing a DIMM:
1. Reseat both DIMMs in the bank.
2. Turn off the blade server and wait 30 seconds; then, turn on the blade server.
3. Check for a memory type mismatch in the bank.
For more information about memory, see “Installing memory modules” on page 77.
Recovering the system firmware code
The system firmware is contained in two separate images in the flash memory of
the blade server: temporary and permanent. These images are referred to as TEMP
and PERM, respectively. The system normally starts from the TEMP image, and the
PERM image serves as a backup. If the TEMP image becomes damaged, such as
from a power failure during a flash update, you can recover the TEMP image from
the PERM image.
If the TEMP image becomes damaged, you can see one of two symptoms:
v The system automatically starts from the PERM image. This is indicated by the
error code 20D00902.
v The system hangs or is non-responsive after the system is started with no
checkpoints.
If your system hangs, you can force the system to start from the PERM image by
using the code page jumper (J14).
v Setting jumper J14 to pins 2 and 3 will force the blade server to start (boot) from
the PERM image.
v Setting jumper J14 to pins 1 and 2 will enable the blade server to start (boot)
from either the TEMP or PERM image.
Recovery of system firmware code using service aids
Linux on pSeries service aids for hardware diagnostics are available for customers
who have installed and are running the Linux operating system. Users can install
these free diagnostics tools for effective diagnosis and repair of the system in the
rare instance when a system error occurs.
This service aid toolkit provides the key tools required to take advantage of the
inherent pSeries hardware RAS functions as outlined in the Linux on pSeries RAS
White paper available from
http://techsupport.services.ibm.com/server/Linux_on_pSeries/images/Linux_RAS.pdf.
These functions include first failure data capture and error log analysis. With the
toolkit installed, problem determination and correction are greatly enhanced and the
likelihood of an extended system outage is reduced.
The Linux service aids for hardware diagnostics are separate from the operating
system installation and are available for download from the following Web site:
http://techsupport.services.ibm.com/server/lopdiags/.
Note: The update_flash command can only be performed if the Linux service
tools have been installed on the blade server.
Starting the TEMP image
To force the system to start the TEMP image, complete the following steps:
Note: Do not perform these steps if the system error code 20D00902 has already
occurred on your system.
1. Turn off the blade server.
2. Remove the blade server (see “Removing the blade server from the
BladeCenter unit” on page 73).
3. Open the blade-server cover (see “Opening the blade server cover” on page 74
for instructions).
4. Remove the blade-server bezel assembly (see “Removing the blade server
bezel assembly” on page 75).
5. Locate jumper J14 (system firmware code page jumper) on the system board.
[Figure: location of the system firmware code page jumper (J14) on the system
board, with pins numbered 3, 2, and 1.]
6. Move jumper J14 to pins 2 and 3 to enable system firmware recovery mode.
7. Replace the cover and reinstall the blade server in the BladeCenter unit, making
sure that the blade server controls all relevant components; then, restart the
blade server.
8. If the system starts up and boots to the operating-system prompt, see
“Recovering the TEMP image from the PERM image.” If the system does not
boot to the operating-system prompt, replace the system-board assembly.
Contact a service support representative for assistance.
Note: If the blade server does not restart, you must replace the system-board
assembly. Contact a service support representative for assistance.
Recovering the TEMP image from the PERM image
To recover the TEMP image from the PERM image, you must perform the reject
function. The reject function copies the PERM image into the TEMP image. To
perform the reject function, complete the following steps:
1. If you have not started the system from the TEMP image, do so now. For
additional information, see “Starting the TEMP image” on page 48.
2. If you have not installed the ppc64 Linux utilities, perform the installation now.
For instructions, go to the Linux on POWER Web site at
http://techsupport.services.ibm.com/server/lopdiags/.
3. Reject the TEMP image.
v If you are using the Red Hat Linux or SUSE LINUX operating system, type
the following command:
update_flash -r
v If you are using the AIX operating system, type the following command:
/usr/lpp/diagnostics/bin/update_flash -r
4. Shut down the blade server using the operating system.
5. If you have not moved jumper J14 as described in “Starting the TEMP image”
on page 48, restart the system.
6. If you moved jumper J14, complete the following steps:
a. Turn off the blade server.
b. Remove the blade server (see “Removing the blade server from the
BladeCenter unit” on page 73).
c. Open the blade-server cover (see “Opening the blade server cover” on page
74 for instructions).
d. Locate jumper J14 (system firmware code page jumper) on the system
board.
[Figure: location of the system firmware code page jumper (J14) on the system
board, with pins numbered 3, 2, and 1.]
e. Move jumper J14 to pins 1 and 2 so that the blade server can start (boot)
from either the TEMP or PERM image.
f. Replace the cover and reinstall the blade server in the BladeCenter unit,
making sure that the blade server controls all relevant components.
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the
power source. Always replace the blade cover before installing the
blade server.
g. Restart the blade server.
h. Verify that the system starts from the TEMP image.
v If you are using one of the Linux operating systems, go to “Verifying the
system firmware levels using Linux” on page 52.
v If you are using the AIX operating system, go to “Committing the
temporary firmware image using AIX” on page 53.
i. Update the flash again, if you are updating the system firmware code.
You might need to update the firmware code to the latest version. See
http://www.ibm.com/pc/support/ for more information about how to update the
firmware code.
Updating the blade server firmware
This section describes how to determine the current code levels for the blade server
(system) firmware and Integrated Systems Management Processor (service
processor). Information on how to validate, update and commit the system firmware
is included.
The blade server contains firmware code for the system and service processor. IBM
will periodically make firmware updates available for the server system and the
service processor. You can maintain the latest levels of firmware code for the blade
server system and service processor by installing firmware updates as they become
available. Be sure to follow the instructions in this section.
Determination of current server firmware levels
Complete the following steps to view the current firmware code levels for the blade
server and the service processor:
1. Access and log on to the BladeCenter management-module Web interface. For
more information, see the Installation and User’s Guide for your BladeCenter
unit.
2. From the Blade Tasks section, select Firmware VPD.
The Blade Server Firmware VPD window contains the build identifier, release, and
revision for the system firmware and the service processor. Compare this
information to the firmware
information on the IBM Support Web site at http://www.ibm.com/support/. If these
two types of information match, then your blade server has the latest firmware
code. If these two types of information differ, download the latest firmware code
from the IBM Support Web site. Follow the update instructions on the IBM Support
Web site.
Updating the blade server service processor
To apply the latest firmware code for your blade server service processor, flash the
service processor. Download the latest firmware code for your integrated systems
management processor from the IBM Support Web site at
http://www.ibm.com/support/. Use the BladeCenter management-module Web
interface to flash the service processor. Follow the update instructions on the IBM
Support Web site.
Important: To avoid problems and to maintain proper system performance, always
make sure that the blade server firmware code and service processor
code levels are consistent for all blade servers within the BladeCenter
unit.
Update and manage system flash using Linux service aids
This section describes how to update and verify the system flash using Linux
service aids.
Updating the system flash using Linux
The Linux service aid for managing the system flash is separate from the operating
system installation and is available for download from the following Web site:
http://techsupport.services.ibm.com/server/lopdiags/.
Note: The update_flash command can only be performed if the update_flash
service aid has been installed on the JS20 blade server for the appropriate
version of Linux.
Complete the following steps to update the system firmware code:
1. Obtain the flash image you want to update and place it in the /etc/microcode
directory (create the directory if it does not exist).
2. Issue the following command:
update_flash -f /etc/microcode/<update_image_name>
After the system reboots successfully and once you are satisfied with the
functionality of the new image, commit the update using the following Linux
command:
update_flash -c
This copies the new image from the TEMP side to the PERM side of flash.
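The staging step and the update command above can be sketched as a small shell helper. This is an illustration only, not an IBM-supplied tool: the function names and the MICROCODE_DIR variable are assumptions introduced here, and the sketch only prints the update_flash command rather than running it.

```sh
#!/bin/sh
# Sketch only. MICROCODE_DIR, stage_image, and flash_command are
# hypothetical names; only update_flash -f comes from the steps above.
MICROCODE_DIR="${MICROCODE_DIR:-/etc/microcode}"

# Copy the image into the microcode directory, creating the directory
# if it does not exist. Prints the staged path on success.
stage_image() {
    image="$1"
    [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }
    mkdir -p "$MICROCODE_DIR" || return 1
    cp "$image" "$MICROCODE_DIR/" || return 1
    echo "$MICROCODE_DIR/$(basename "$image")"
}

# Print (do not run) the update command for a staged image.
flash_command() {
    echo "update_flash -f $1"
}
```

For example, `flash_command "$(stage_image ./fw.img)"` prints the command to review before you run it; `./fw.img` is a placeholder file name. The commit step (update_flash -c) is deliberately left as a manual action after the new image has been checked.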
Verifying the system firmware levels using Linux
To verify the system firmware levels on the Perm and Temp side, enter the following
command at the Linux prompt (the entire command must be entered on one line):
for file in `ls /proc/device-tree/openprom/*bank*`; do echo $file; cat $file; echo; echo; done
For example, this command returns information similar to the following:
/proc/device-tree/openprom/ibm,fw-bank
P
/proc/device-tree/openprom/ibm,fw-perm-bank
FW04310120, 17:16:09, 07/26/2004
/proc/device-tree/openprom/ibm,fw-temp-bank
FW04310120, 17:16:09, 07/26/2004
v The value for ibm,fw-bank indicates what side you booted from (T
for TEMP, P for PERM).
v The value for ibm,fw-perm-bank identifies the firmware version,
date and time stamp of firmware on the PERM side.
v The value for ibm,fw-temp-bank indicates the firmware version,
date and time stamp of firmware on the TEMP side.
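The interpretation rules above can be expressed as a few shell functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the bank strings are assumed to have the comma-separated layout shown in the example output (version, time, date).

```sh
#!/bin/sh
# Sketch only: boot_side, bank_version, and banks_match are hypothetical
# helper names, not part of the Linux service aids.

# Map the ibm,fw-bank value to a side name (T = TEMP, P = PERM).
boot_side() {
    case "$1" in
        T) echo TEMP ;;
        P) echo PERM ;;
        *) echo UNKNOWN; return 1 ;;
    esac
}

# Extract the firmware version (first comma-separated field) from a
# bank string such as "FW04310120, 17:16:09, 07/26/2004".
bank_version() {
    echo "$1" | cut -d, -f1
}

# Succeed if two bank strings carry the same firmware version.
banks_match() {
    [ "$(bank_version "$1")" = "$(bank_version "$2")" ]
}
```

A possible use is `boot_side "$(tr -d '\000' < /proc/device-tree/openprom/ibm,fw-bank)"`. The `tr` call is an assumption that the property value ends in a NUL byte, as device-tree string properties usually do.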
Notes:
1. If you have to recover the system firmware code, see “Recovering the system
firmware code” on page 54.
2. The IBM Remote Deployment Manager (RDM) program does not support the
BladeCenter JS20 Type 8842.
3. A reboot of the system must be done after using update_flash -c for the
firmware level shown in ibm,fw-perm-bank to be current.
Update and manage system flash using AIX diagnostics
This section describes how to update, commit and verify the system flash using AIX
diagnostics.
Updating the system flash using AIX
Attention: Do not power off the system while performing this task!
Complete the following steps:
1. Obtain the flash image you want to update from the IBM Support Web site at
http://www.ibm.com/support/ (look for the flash image for Type 8842 under ″AIX
Diagnostics Version Number″; this is the version used by the AIX diagnostics
service aid).
v If you want to update the image from the local file system, put the image into
the /etc/microcode directory on the system prior to running this service aid.
v If you want to update the image from media (diskette or optical media), put
the image on the media of choice prior to running this service aid.
2. Run diagnostics.
v If you have booted AIX, log in as ″root″ or use the CE login; then, at the
command line, enter:
diag
v Otherwise, boot standalone diagnostics; then press Enter.
3. From the ″Function Selection″ menu, choose ″Task Selection″.
4. From the ″Tasks Selection List″ choose ″Update and Manage System Flash″.
5. From the ″Update and Manage System Flash″ list:
v If, in Step 1 above, you have put the image in the /etc/microcode directory,
then choose the ″File System″ selection. At the ″flash update image file″
prompt, specify the directory that contains the image (normally
″/etc/microcode″), and then Commit (PF7).
v If, in Step 1 above, you have put the image on optical media or diskette,
place the media containing the image into the drive and choose ″Removable
Media″ and then ″Commit″ (PF7).
Note: If you have booted standalone diagnostics from CD-ROM, you may
remove the standalone diagnostics CD-ROM media from the drive and
replace it with the optical media containing the image you want to
update.
6. Follow instructions displayed on the screen, for example:
Choose ″Yes″ to proceed with the flash operation.
7. After the system flash completes and the system reboots, you may remove the
media containing the image from the diskette or optical drive, or delete the image
from the /etc/microcode directory on the file system. After the reboot occurs and
you log in to the operating system, you may also remove the temporary file created
in the ″/var/update_flash_image″ directory.
Committing the temporary firmware image using AIX
After the system reboots successfully and once you are satisfied with the
functionality of the new image, commit the update using the following AIX diagnostic
commands:
1. Run diagnostics.
v If you have booted AIX, log in as ″root″ or use the CE login; then, at the
command line, enter:
diag
v Otherwise, boot standalone diagnostics; then press Enter.
2. From the ″Function Selection″ menu, choose ″Task Selection″.
3. From the ″Tasks Selection List″ choose ″Update and Manage System Flash″.
4. Choose ″Commit the Temporary Image″.
5. Choose ″Yes″ to commit the image.
6. Press F10 to exit diagnostics.
Note: This selection commits the Temporary system firmware image to the
Permanent image when booted from the Temporary image.
Verifying the system firmware levels using AIX
To verify levels of system firmware on the Permanent and Temporary sides, use the
AIX diagnostics “Update and Manage System Flash” (see “Updating the system
flash using AIX” on page 52). The diagnostics function displays the system firmware
image level for both the Permanent and Temporary sides as well as an indication as
to which side was used for the current boot cycle. Following is an example screen:
UPDATE AND MANAGE FLASH 802810
The current permanent system firmware image is 2b204_310
The current temporary system firmware image is 2b204_310
The system is currently booted from the temporary firmware image.
Move cursor to selection, then press ’Enter’.
Validate and Update System Firmware
Validate System Firmware
Commit the Temporary Image
F1=Help F10=Exit F3=Previous Menu
If the system was booted using the permanent image instead of the temporary
image (as shown in the above example), the screen example would show:
The system is currently booted from the permanent firmware image.
and the last selection option is changed to:
Reject the Temporary Image
The firmware version information displayed by the AIX diagnostics (2b204_310 in
the example shown above) may be different than the information displayed by the
management module (see “Determination of current server firmware levels” on page
51). Cross reference information is given in the firmware information (“Blade Server
Firmware - IBM BladeCenter JS20”) on the IBM Support Web site at
http://www.ibm.com/support/, as well as in the README file for the firmware image.
Recovering the system firmware code
If the system firmware code has become damaged, such as from a power failure
during a flash update, the blade server might appear to be nonfunctional (no
progress codes or firmware codes). You can recover the system firmware code by
using the system firmware code page jumper (J14).
Note: To obtain a system firmware flash image, download a system firmware file
from http://www.ibm.com/support/.
The system firmware is contained in two separate images in the flash memory of
the blade server (primary and backup).
Note: The primary image is also known as the TEMP side of the flash module. The
backup image is also known as the PERM side of the flash module.
If the primary image becomes damaged, such as from a power failure during a flash
update, you can recover the primary image from the backup image. If this occurs,
you might see one of the following symptoms:
v The system automatically starts up from backup. This is indicated by the error
code 20D00902.
v The system automatically starts up from backup. This is indicated by an event
log message “OS Watchdog Triggered”.
v The system hangs or is non-responsive after the system is started with no
checkpoints.
Note: If the system hangs or is non-responsive after starting the system, go to
“Startup problems” on page 152.
If the blade server hangs, you can force the blade server to start the backup image
by using the code page jumper (J14).
Recovery of system firmware code using service aids
Linux on pSeries service aids for hardware diagnostics are available for customers
who have installed and are running Linux. Users can install these free diagnostics
tools for effective diagnosis and repair of the system in the rare instance when a
system error occurs.
This service aid toolkit provides the key tools required to take advantage of the
inherent pSeries hardware RAS functions, such as first failure data capture and error
log analysis, as outlined in the Linux on pSeries RAS Whitepaper available from
http://techsupport.services.ibm.com/server/Linux_on_pSeries/images/Linux_RAS.pdf.
With the toolkit installed, problem determination and correction is
greatly enhanced and the likelihood of an extended system outage is reduced.
The Linux service aids for hardware diagnostics are separate from the operating
system installation and are available for download from the following Web site:
http://techsupport.services.ibm.com/server/lopdiags/.
Note: The update_flash command can only be performed if the Linux service tools
have been installed on the JS20 blade server.
Starting the backup image
To force the blade server to start the backup image, complete the following steps:
Note: Do not perform these steps if the system error code 20D00902 has already
occurred on the blade server.
1. Turn off the blade server and remove it from the BladeCenter unit (see
“Removing the blade server from the BladeCenter unit” on page 73).
2. Open the blade server cover (see “Opening the blade server cover” on page
74).
3. Locate jumper J14 (system firmware code page jumper) on the system board.
[Figure: System firmware code page jumper (J14), showing pins 3, 2, and 1]
4. Remove jumper J14 from pins 1 and 2, and reinstall it on pins 2 and 3 of J14 to
enable system firmware recovery mode.
5. Replace the cover and reinstall the blade server in the BladeCenter unit; then,
restart the blade server.
6. If the blade server starts with the operating-system prompt, see “Recovering the
primary image.” If the blade server does not start with the operating-system
prompt, replace the system board (see “Replacing the system board” on page
87).
Note: If the blade server does not restart, you must replace the system board (see
“Replacing the system board” on page 87).
Recovering the primary image
To recover the primary image, you must perform the reject function. The reject
function copies the backup image into the primary image.
To perform the reject function, complete the following steps:
Note: If the operating system is Linux, begin with Step 001. If the operating
system is AIX, begin with Step 002.
Step 001 Complete the following steps:
1. If you have not installed the ppc64 Linux utilities, perform the
installation now. See http://techsupport.services.ibm.com/server/lopdiags/.
2. Reject the primary image. From the command line, type
update_flash -r
then, go to Step 003.
Step 002 Complete the following steps:
1. Reject the primary image. From the command line, type
/usr/lpp/diagnostics/bin/update_flash -r
then, go to Step 003.
or
2. From diagnostics, either concurrent or standalone:
Note: Menu items that do not pertain are not displayed. If there is no
temporary image, the “Reject the Temporary Image” menu entry
does not appear as a possible selection.
a. Run diagnostics.
v If you have booted AIX, log in as ″root″ or use the CE
login; then, at the command line, enter:
diag
v Otherwise, boot standalone diagnostics; then press Enter.
b. From the Function Selection menu, choose Task Selection.
c. From the Tasks Selection List choose Update and
Manage System Flash.
d. Choose Reject the Temporary Image.
e. Choose Yes to reject the image.
f. Press F10 to exit diagnostics.
Note: This selection rejects the temporary system firmware
image when booted from the permanent image. This
results in the temporary image being overwritten by
the permanent image.
3. Continue with Step 003.
Step 003 Shut down the blade server using the operating system.
Step 004 If you have not moved jumper J14, go to Step 006.
Step 005 If you moved jumper J14, complete the following steps:
1. Turn off the blade server.
2. Remove the blade server from the BladeCenter unit (see
“Removing the blade server from the BladeCenter unit” on page
73).
3. Open the blade server cover (see “Opening the blade server
cover” on page 74).
[Figure: System firmware code page jumper (J14), showing pins 3, 2, and 1]
4. Locate jumper J14 (system firmware code page jumper) on the
system board.
5. Remove jumper J14 from pins 2 and 3, and reinstall it on pins 1
and 2 of J14 to take the blade server out of system firmware recovery mode.
6. Replace the cover and reinstall the blade server in the
BladeCenter unit.
Step 006 Restart the blade server.
Note: You might need to update the firmware code to the latest
version. See http://www.ibm.com/pc/support for more
information about how to update the firmware code.
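The choice between Step 001 and Step 002 depends only on the operating system, so it can be sketched as a small lookup. The function name is hypothetical, and the sketch assumes the operating system name is supplied in the form reported by `uname -s`; it prints the reject command from the steps above without running it.

```sh
#!/bin/sh
# Sketch only: reject_command is a hypothetical helper name. The two
# commands it prints are the ones given in Step 001 and Step 002.
reject_command() {
    case "$1" in
        Linux) echo "update_flash -r" ;;
        AIX)   echo "/usr/lpp/diagnostics/bin/update_flash -r" ;;
        *)     echo "unsupported operating system: $1" >&2; return 1 ;;
    esac
}
```

For example, `reject_command "$(uname -s)"` prints the command that applies to the running system. The jumper handling in Steps 004 through 006 is a manual procedure and is not covered by the sketch.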
Statement 21
CAUTION:
Hazardous energy is present when the blade server is connected to the power source.
Always replace the blade cover before installing the blade server.
Chapter 8. General AIX and xSeries standalone diagnostic
information
This chapter describes standalone diagnostics for AIX and running the standalone
diagnostics from CD-ROM.
Information for general diagnostic systems running the AIX operating
system
Information for general diagnostic systems running the AIX operating system is
provided in this section.
For information about standalone CD AIX diagnostics, see “Running the standalone
diagnostics from CD-ROM” on page 25.
Information in this section is common to all JS20 system units.
Any service information or diagnostic procedure that is specific to a system unit or
device is a separate procedure for that system unit or device.
AIX operating system message files
English is the default language displayed by the diagnostic programs when run from
disk. If you want to run the diagnostic programs in a language other than English,
you must install the AIX operating system message locale file set for that
language on the system.
AIX diagnostic and the standalone diagnostic tasks provide the capability to display
device and adapter microcode levels as well as update device and adapter
microcode. AIX diagnostic tasks also provide the capability to update firmware.
Use the Update and Manage System Flash task to update a system’s firmware.
When the flash update is complete, the system automatically reboots. Microcode
images can be installed from disk, diskette, or NIM server. For additional
information, refer to “Update and manage system flash using AIX diagnostics” on
page 52.
On systems using AIX 5.2.0.30 or later, use the Download Microcode task to install
microcode onto devices and adapters. This task presents a list of resources that are
currently installed and supported by this task. Microcode images can be installed
from disk, diskette, or NIM server. For additional information, refer to “Download
microcode” on page 64. For adapters and devices with microcode that can be
updated but are not supported by this task, refer to the manufacturer’s instructions.
For systems not using AIX, these tasks can be run from standalone diagnostics (see
“Running standalone diagnostics from a management (NIM) server” on page 68).
Otherwise, refer to the corresponding documentation for the operating system on
installing microcode.
CE login
CE login enables a user to perform operating system commands that are required
to service the system without being logged in as a root user. CE login must have a
role of Run Diagnostics and a primary group of System. This enables the user to:
v Run the diagnostics including the service aids, certify, format, and so forth.
v Run all the operating system commands run by system group users.
v Configure and unconfigure devices that are not busy.
In addition, CE login can have Shutdown Group enabled to allow:
v Use of the Update System Microcode service aid.
v Use of shutdown and reboot operations.
To use CE login, ask the customer to create a user as described in the AIX 5L Version
5.3 System Management Guide: Operating System and Devices. After this is set up,
you will need to obtain the user name and password from the customer to log in
with these capabilities. The recommended CE login user name is qserv.
Missing resources
In diagnostics version 5.2.0 and later, missing devices are identified on the
Diagnostic Selection screen by an uppercase ″M″ preceding the name of the device
that is missing. The Diagnostic Selection menu is displayed any time you run the
diagnostic routines or the advanced diagnostics routines. The Diagnostic Selection
menu can also be entered by running diag -a when there are missing devices or
missing paths to a device.
When a missing device is selected for processing, the Missing Resource menu will
ask whether the device has been turned off, removed from the system, moved to a
different physical location, or if it is still present. When a single device is missing,
the fault is probably with that device. When multiple devices with a common parent
are missing, the fault is most likely related to a problem with the parent device. The
diagnostic procedure may include testing the device’s parent, analyzing which
devices are missing, and any manual procedures that are required to isolate the
problem.
v Symptom response:
The Missing Resource menu is displayed or the letter ″M″ is displayed alongside
a resource in the resource list.
v Action:
If the Missing Resource menu is displayed, follow the displayed instructions until
either the Diagnostic Selection menu or an SRN is displayed. If an ″M″ is
displayed in front of a resource (indicating that it is missing), select that resource;
then, choose Commit (F7 key).
– If an 8-digit error code is displayed, go to “Firmware error codes” on page
102, find the error, and perform the listed action.
– If an SRN is displayed, record it and go to “SRN tables” on page 110.
Automatic diagnostic tests
All automatic diagnostic tests are run after the system unit is turned on and before
the AIX operating system is loaded. The automatic diagnostic tests display progress
indicators (or checkpoints) to track test progress. If a test stops or hangs, the
checkpoint for that test remains on the console to identify the unsuccessful test.
Configuration program
The configuration program determines what features, adapters, and devices are
present on the system. The configuration program, which is part of the AIX
operating system, builds a configuration list that is used by the diagnostic programs
to control which tests are run during system checkout.
Diagnostic programs
This section provides an overview of the various diagnostic programs.
The diagnostic controller runs as an application program on the AIX operating
system and carries out the following functions:
v Displays diagnostic menus
v Checks availability of needed resources
v Checks error log entries under certain conditions
v Loads diagnostic application programs
v Loads task and service aid programs
v Displays test results
To test an adapter or device, select the device or adapter from the Diagnostic
Selection menu. The diagnostic controller then loads the diagnostic application
program for the selected device or adapter. The diagnostic application program
loads and runs test units to check the functions of the device or adapter.
The diagnostic controller checks the results of the tests done by the diagnostic
application and determines the action needed to continue the testing.
The amount of testing that the diagnostic application does depends on the mode
(concurrent/standalone) under which the diagnostic programs are running.
Error log analysis
When you select Diagnostics, the Diagnostic Selection menu displays.
Note: Other menus may display before the Diagnostic Selection menu appears.
This menu allows you to select the purpose for running diagnostics. When you
select the Problem Determination option, the diagnostic programs read and analyze
the contents of the error log.
Note: Most hardware errors in the operating system error log contain “sysplanar0”
as the resource name. The resource name identifies the resource that
detected the error; it does not indicate that the resource is faulty or should
be replaced. Use the resource name to determine the appropriate diagnostic
to analyze the error.
If the error log contains recent errors (approximately the last 7 days), the diagnostic
programs automatically select the diagnostic application program to test the logged
function.
If there are no recent errors logged or the diagnostic application program runs
without detecting an error, the Diagnostic Selection menu is displayed.
This menu allows you to select a resource for testing. If an error is detected while
the diagnostic application program is running, the ″A problem was detected″ screen
displays a Service Request Number (SRN).
Introducing tasks and service aids
The AIX diagnostic package contains programs that are called Tasks. Tasks can be
thought of as performing a specific function on a resource; for example, running
diagnostics or performing a service aid on a resource. This chapter describes the
tasks available in AIX Diagnostics version 5.2 and later.
Notes:
1. Some programs are only accessible from Online Diagnostics in Service or
Concurrent mode, while others might be accessible only from Standalone
Diagnostics.
2. The specific tasks available will be dependent on the hardware attributes or
capabilities of the system you are servicing. Not all service aids or tasks will be
available on all systems.
To perform one of these tasks, use the Task Selection option from the Function
Selection menu.
After a task is selected, a resource menu may be presented showing all resources
supported by the task.
A fast-path method is also available to perform a task by using the diag command
and the -T flag. By using the fast path, the user can bypass most of the introductory
menus to access a particular task. The user is presented with a list of resources
available to support the specified task.
The fast-path tasks are as follows:
v Certify – Certifies media
v Chkspares – Checks for the availability of spare sectors
v Download – Downloads microcode to an adapter or device
v Disp_mcode – Displays current level of microcode
v Format – Formats media
v Identify Remove – Identifies and removes devices (hot-plug)
To run these tasks directly from the command line, specify the resource and other
task-unique flags. Use the descriptions in this chapter to understand which flags
are needed for a given task.
Task and service aid functions
If a device does not show in the test list or you suspect that a device’s diagnostic
package is not loaded, check by using the Display Configuration and Resource List
task. If the device you want to test has a plus (+) sign or a minus (-) sign preceding
its name, the diagnostic package is loaded. If the device has an asterisk (*)
preceding its name, the diagnostic package for the device is not loaded or is not
available. Tasks and service aids provide a means to display data, check media,
and check functions without being directed by the hardware problem determination
procedure. Refer to “Tasks (service aids)” on page 63 for a list of tasks and service
aids.
AIX automatic error log analysis (diagela)
Automatic error log analysis (diagela) provides the capability to perform error log
analysis when a permanent hardware error is logged by enabling the diagela
program on all RPA platforms. The diagela program determines if the error should
be analyzed by the diagnostics.
If the error should be analyzed, a diagnostic application is invoked and the error is
analyzed. No testing is done if the diagnostics determine that the error requires a
service action. Instead, it sends a message to your console, or to all system
groups. The message contains the SRN. Running diagnostics in this mode is similar
to using the diag -c -e -d device command.
To activate the automatic error log analysis feature on systems running AIX as the
operating system, log in as root user (or use CE login) and type the following
command:
/usr/lpp/diagnostics/bin/diagela ENABLE
To disable the automatic error log analysis feature on systems running AIX, log in
as root user (or use CE login) and type the following command:
/usr/lpp/diagnostics/bin/diagela DISABLE
The diagela program can also be enabled and disabled using the Periodic
Diagnostic Service Aid.
Error log analysis
This section provides information on error log analysis.
v Error log analysis is the analysis of the AIX error log entries.
v Error log analysis is part of the diagnostic applications. The analysis is started by
selecting a device from the Diagnostic Selection menu and then using the diag
command or selecting the Run Error Log Analysis task.
v Error log analysis is only performed when running online diagnostics.
v Error log analysis is not performed when running standalone diagnostics.
v Error log analysis only reports problems if the errors have reached defined
thresholds. Thresholds can be from 1 to 100, depending on the error.
v Permanent errors do not necessarily mean a part should be replaced.
v Automatic Error Log Analysis (diagela) provides the capability to do error log
analysis whenever a permanent hardware error is logged.
Log repair action
The diagnostics perform error log analysis on most resources. The default time for
error log analysis is seven days; however, this time can be changed from 1 to 60
days using the Display or Change Diagnostic Run Time Options task.
To prevent false problems from being reported when error log analysis is run, repair
actions need to be logged whenever a FRU is replaced. A repair action can be
logged by using the Log Repair Action task or by running diagnostics in system
verification mode.
The Log Repair Action task lists all resources. Replaced resources can be selected
from the list, and when Commit (F7 key) is selected, a repair action is logged for
each selected resource.
Tasks (service aids)
These are tasks that might be available to the JS20 blade server:
v Add Resource to Resource List
v AIX Shell Prompt
v Analyze Adapter Internal Log
v Automatic Error Log Analysis and Notification
v Backup and Restore Media
v Certify Media
v Change Hardware Vital Product Data
v Configure Reboot Policy
v Configure Surveillance Policy
v Delete Resource from Resource List
v Disk Maintenance
v Display Configuration and Resource List
v Display Firmware Device Node Information
v Display Hardware Error Report
v Display Hardware Vital Product Data
v Display Machine Check Error Log
v Display Microcode Level
v Display Multipath I/O (MPIO) Device Configuration
v Display or Change Bootlist
v Display or Change Diagnostic Run Time Options
v Display Previous Diagnostic Results
v Display Service Hints
v Display Software Product Data
v Display System Environmental Sensors
v Display USB Devices
v Download Microcode
v Fibre Channel RAID Service Aids
v Format Media
v Gather System Information
v Generic Microcode Download
v Hot-Plug Task
v Identify Indicators
v Identify and System Attention Indicators
v Local Area Network Analyzer
v Log Repair Action
v Periodic Diagnostics
v RAID Array Manager
v Run Diagnostics
v Run Error Log Analysis
v Run Exercisers
v Save or Restore Hardware Management Policies
v Save or Restore Service Processor Configuration (RSPC)
v Spare Sector Availability
v System Fault Indicator
v System Identify Indicator
v Update Disk-Based Diagnostics
v Update and Manage System Flash
Download microcode
This service aid provides a way to copy microcode to an adapter or device. The
service aid presents a list of adapters and devices that use microcode. After the
adapter or device is selected, the service aid provides menus to guide you in
checking the current level and installing the needed microcode.
This task can be run directly from the AIX command line. Most adapters and
devices use a common syntax as identified in this section.
For many adapters and devices, microcode installation occurs and becomes
effective while the adapters and devices are in use. It is recommended that a
current backup be available and the installation be scheduled during a non-peak
production period.
Notes:
1. If the source is /etc/microcode, the image must be stored in the /etc/microcode
directory on the system. If the system is booted from a NIM server, the image
must be stored in the usr/lib/microcode directory of the SPOT the client is
booted from.
2. If the source is CD (cdX), the CD must be in ISO 9660 format. There are no
restrictions on the directory in which the image is stored.
3. If the source is diskette (fdX), the diskette must be in backup format and the
image stored in the /etc/microcode directory.
The following is the common syntax command:
diag [-c] -d <device> -T "download [-s {/etc/microcode|<source>}] [-l {latest|previous}] [-f]"
Flag descriptions are as follows:
Flag               Description
-c                 No console mode. Run without user interaction.
-d <device>        Run the task on the device or adapter specified.
-T download        Install microcode.
-s /etc/microcode  Microcode image is in /etc/microcode.
-s <source>        Microcode image is on specified source. For example, fd0, cd0.
-l latest          Install latest level of microcode. This is the default.
-l previous        Install previous level of microcode.
-f                 Install microcode even if the current level is not on the source.
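As an illustration of the common syntax, the following sketch assembles a download command line from its parts and prints it without running it. The function name is hypothetical, and the device name ent0 used in the example is an assumption chosen for illustration; substitute a resource that exists on your system.

```sh
#!/bin/sh
# Sketch only: download_command is a hypothetical helper name.
# $1 = device, $2 = source (default /etc/microcode),
# $3 = level, latest or previous (default latest)
download_command() {
    device="$1"
    source_loc="${2:-/etc/microcode}"
    level="${3:-latest}"
    [ -n "$device" ] || { echo "device is required" >&2; return 1; }
    case "$level" in
        latest|previous) ;;
        *) echo "level must be latest or previous" >&2; return 1 ;;
    esac
    # Print the command in no-console mode (-c) for review.
    echo "diag -c -d $device -T \"download -s $source_loc -l $level\""
}
```

For example, `download_command ent0 cd0 previous` prints a command that would install the previous microcode level from cd0. The -f flag is left out of the sketch on purpose, so that forcing an installation remains an explicit decision.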
Update and manage system flash
Note: The firmware update can be done using the service aid or the AIX command
line.
This selection validates a new system firmware flash image and uses it to update
the system temporary flash image. This selection can also be used to validate a
new system firmware flash image without performing an update, commit the
temporary flash image, and reject the temporary flash image.
Look for additional update and recovery instructions with the update kit. You need to
know the fully-qualified path and file name of the flash update image file provided in
the kit. If the update image file is on a diskette or optical media, the service aid can
list the files on the diskette or optical media for selection. The diskette must be a
valid backup-format diskette.
Refer to the update instructions with the kit to determine the current level of the
system unit or service processor flash memory.
When this service aid is run from online diagnostics, the flash update image file is
copied to the /var file system. It is recommended that the source of the microcode
that you want to download be put into the /etc/microcode directory on the system. If
there is not enough space in the /var file system for the new flash update image
file, an error is reported. If this error occurs, exit the service aid, increase the size of
the /var file system, and retry the service aid. After the file is copied, a screen
requests confirmation before continuing with the flash update. When you continue
with the flash update, the system reboots using the shutdown -u command. The system
does not return to the diagnostics, and the current flash image is not saved. After
the reboot, you can remove the /var/update_flash_image file.
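
Because the service aid reports an error when /var is too small, it can help to compare the image size with the free space beforehand. The following Python sketch shows one way to do that with the standard library; the function name has_room_for_image is hypothetical, and the check ignores any additional working space the service aid might need.

```python
import os
import shutil


def has_room_for_image(image_path, target_dir="/var"):
    """Return True if target_dir has at least as much free space as the image.

    Hypothetical pre-check; the service aid performs its own check and
    reports an error if the copy cannot be made.
    """
    image_size = os.path.getsize(image_path)
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= image_size


# Usage (the path is a placeholder, not a real image name):
# has_room_for_image("/etc/microcode/flash_image_file")
```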
When this service aid is run from standalone diagnostics, the flash update image
file is copied to the file system from diskette, optical media, or from the NIM server.
When using a diskette, the user must provide the image on a backup-format diskette
because the user does not have access to remote file systems or any other files
that are on the system. When using the NIM server, the microcode image must first be
copied onto the NIM server in the /usr/lib/microcode directory pointed to by the NIM
SPOT (from which you plan to have the NIM client boot standalone diagnostics)
prior to performing the NIM boot of diagnostics. Next, a NIM check operation must
be run on the SPOT containing the microcode image on the NIM server. After
performing the NIM boot of diagnostics, one can use this service aid to update the
microcode from the NIM server by choosing the /usr/lib/microcode directory when
prompted for the source of the microcode that you want to update. If not enough
space is available, an error is reported stating that additional system memory is
needed. After the file is copied, a prompt requests confirmation before continuing
with the flash update. When you continue with the update, the system reboots using
the reboot -u command. You might receive a “Caution: Some process(es) wouldn’t
die” message during the reboot process; you can ignore this message. The current
flash image is not saved.
You can use the update_flash command in place of this service aid. The command
is located in the /usr/lpp/diagnostics/bin directory. The command syntax is as
follows:
update_flash [-q | -v] -f file_name
update_flash [-q | -v] -D device_name -f file_name
update_flash [-q | -v] -D
update_flash [-l]
update_flash -c
update_flash -r
Important: The update_flash command reboots the entire system. Do not use this
command if more than one user is logged in to the system.
Flag descriptions are as follows:
Flag Description
-D Specifies that the flash update image file is on diskette. The device_name
variable specifies the device. The default device_name is /dev/fd0.
-f Flash update image file source. The file_name variable specifies the fully
qualified path of the flash update image file.
-l Lists the files on a diskette, from which the user can choose a flash update
image file.
-q Forces the update_flash command to update the flash EPROM and reboot the
system without asking for confirmation.
-v Validates the flash update image. No update will occur.
-c Commits the temporary flash image when booted from the temporary image.
This overwrites the permanent image with the temporary image.
-r Rejects the temporary image when booted from the permanent image. This
overwrites the temporary image with the permanent image.
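
The -c and -r flags are each tied to the image the system was booted from. The following Python sketch encodes that rule as a lookup; the names REQUIRED_BOOT_IMAGE and flash_action_allowed are hypothetical and serve only to restate the flag descriptions above.

```python
# Which image the system must be booted from for each action,
# as stated in the flag descriptions.
REQUIRED_BOOT_IMAGE = {
    "commit": "temporary",   # update_flash -c
    "reject": "permanent",   # update_flash -r
}


def flash_action_allowed(action, booted_from):
    """Return True if the action is valid for the image currently booted."""
    if action not in REQUIRED_BOOT_IMAGE:
        raise ValueError(f"unknown action: {action!r}")
    return REQUIRED_BOOT_IMAGE[action] == booted_from


print(flash_action_allowed("commit", "temporary"))  # True
print(flash_action_allowed("reject", "temporary"))  # False
```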
Using the standalone CD-ROM and online current diagnostics
The diagnostics consist of standalone diagnostics and online diagnostics.
v The standalone diagnostics must be booted before they are run. If booted, they
have no access to the AIX error log or the AIX configuration data.
v Online diagnostics are resident with AIX on the disk or server. They can be
booted and run concurrently (called concurrent mode) with other applications.
They have access to the AIX error log and the AIX configuration data.
Notes:
v If this system unit is attached to another system, be sure you isolate this system unit
before stopping the operating system or running diagnostic programs.
v The AIX operating system must be installed in order to run online diagnostics. If the AIX
operating system is not installed, use the standalone diagnostic procedures.
Standalone and online diagnostics operating considerations
Before you use the diagnostics, consider the following information:
v Run online diagnostics in concurrent mode whenever possible, unless otherwise
directed. The online diagnostics perform additional functions as compared to
standalone diagnostics. The AIX error log functions are only available when
diagnostics are run from the disk (concurrent diagnostic) drive.
v When running online diagnostics, device support for some devices may not have
been installed. If this is the case, that device does not appear in the resource list.
v When running standalone diagnostics, device support for some devices may be
contained on supplemental diagnostic media. If this is the case, the device does
not appear in the resource list when running diagnostics unless the supplemental
media has been processed.
Running online diagnostics
Consider the following information when you run the online diagnostics from a
server or a disk:
v The diagnostics cannot be loaded and run from a disk until the AIX operating
system has been installed and configured.
v When the system is running in a full machine partition and the diagnostics
were loaded from a disk or a server, you must shut down the AIX operating system
before powering off the system unit to prevent possible damage to disk data. Do
this in one of two ways:
– If the diagnostic programs were loaded in Standalone mode, press the F3 key
until Diagnostic Operating Instructions displays; then follow the displayed
instructions to shut down the AIX operating system.
– If the diagnostic programs were loaded in maintenance or concurrent mode,
enter the shutdown -F command.
v Under some conditions the system may stop, with instructions displayed on
attached displays and terminals.
Follow the instructions to select a console display.
Running the online diagnostics in concurrent mode
Use concurrent mode to run online diagnostics on some of the system resources
while the system is running normal system activity. Because the system is running
in normal operation, the following resources cannot be tested in concurrent mode:
v Adapters connected to paging devices, or disk drive used for paging
v Memory
v Microprocessor
Three levels of testing exist in concurrent mode:
v The share-test level tests a resource while the resource is being shared by
programs running in the normal operation. This testing is mostly limited to normal
commands that test for the presence of a device or adapter.
v The sub-test level tests a portion of a resource while the remaining part of the
resource is being used in normal operation. For example, this test could test one
port of a multiport device while the other ports are being used in normal
operation.
v The full-test level requires that the device not be assigned to or used by any other
operation. This level of testing on a disk drive might require the use of the vary
command. The diagnostics display menus to allow you to vary off the needed
resource.
Error log analysis is done in concurrent mode when you select the Problem
Determination option on the Diagnostic Mode Selection menu.
Running standalone diagnostics from a management (NIM) server
A client system connected to a network with a Network Installation Management
(NIM) server is capable of booting the standalone diagnostics from the NIM server if
the client system is registered on the NIM server, and if the NIM boot settings on
both the NIM server and the client system are correct.
Consider the following information when running standalone diagnostics from a NIM
server:
1. For NIM clients that have adapters that would normally require that
supplemental media be loaded when standalone diagnostics are run from
CD-ROM, the support code for these adapters must be loaded into the directory
pointed to by the NIM SPOT from which you wish to boot that client. Before
running standalone diagnostics on these clients from the NIM server, the NIM
server system administrator must ensure that any needed support for these
devices is loaded onto the server.
2. Use one of the following methods to determine the amount of available system
memory:
v Run the Display Resource Attributes task for resource.
v Use the Config option under System Management Services (see the system
unit service guide).
v Use the following AIX command: lsattr -E -l mem0
3. All operations to configure the NIM server require root authority.
4. If you replace the network adapter in the client, the network adapter hardware
address for the client must be updated on the NIM server.
5. The Control state (Cstate) for standalone clients on the NIM server should be
kept in the Diagnostic Boot has been Enabled state.
6. On the client system, the NIM server network adapter should be put in the boot
list after the boot disk drive. This allows the system to boot up in standalone
diagnostics from the NIM server if there is a problem booting from the disk
drive. Refer to the “Multiboot” section under the SMS chapter in the service
guide for the client system to obtain information about setting the boot list.
NIM server configuration
Refer to the Network Installation Management Guide and Reference for information
on the following:
v Register a client on the NIM server
v Enable a client to run diagnostics from the NIM server
To verify that the client system is registered on the NIM server and diagnostic boot
is enabled, run the following command from the command line on the NIM server:
lsnim -a Cstate -Z ClientName
Refer to the following table for system responses.
Note: The ClientName is the name of the system on which you want to run the
standalone diagnostics.
System response: #name:Cstate: ClientName:diagnostic boot has been enabled:
Client status: The client system is registered on the NIM server and enabled to run
diagnostics from the NIM server.
System response: #name:Cstate: ClientName:ready for a NIM operation: or
#name:Cstate: ClientName:BOS installation has been enabled:
Client status: The client system is registered on the NIM server but not enabled to
run standalone diagnostics from the NIM server.
Note: If the client system is registered on the NIM server but Cstate
has not been enabled, no data will be returned.
System response: 0042-053 lsnim: there is no NIM object named “ClientName”
Client status: The client is not registered on the NIM server.
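
The responses listed above can be told apart by simple string matching. The following Python sketch classifies a captured lsnim response into the three client states; the function name classify_lsnim_response and the returned labels are hypothetical, and the sample strings are taken from the responses listed above rather than from a live NIM server.

```python
def classify_lsnim_response(output):
    """Map the text returned by 'lsnim -a Cstate -Z ClientName' to a client status."""
    text = output.strip()
    if "0042-053" in text:
        # lsnim reports that no NIM object with this name exists.
        return "not registered"
    if "diagnostic boot has been enabled" in text:
        return "registered, diagnostics enabled"
    # Any other Cstate, or no data at all, means the client is registered
    # but not enabled to run standalone diagnostics from the NIM server.
    return "registered, diagnostics not enabled"


sample = "#name:Cstate: ClientName:ready for a NIM operation:"
print(classify_lsnim_response(sample))
```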
Client configuration and booting standalone diagnostics from
the NIM server
To run standalone diagnostics on a client system from the NIM server, complete the
following steps:
1. Remove all removable media (tape or CD-ROM disc).
2. Stop all programs, including the AIX operating system (get help if needed).
3. If you are running standalone diagnostics in a full machine partition, verify with
the system administrator and system users that the system unit can be
shut down. Stop all programs, including the operating system (refer to the
operating system documentation). Verify with the system administrator and
the system users of that partition that all applications on that partition will be
stopped and that the partition will be rebooted. Stop all programs on that
partition, including the operating system.
4. If the system is running in a full-machine partition, turn on the system unit
power. Restart the AIX operating system on the system on which you want to run
online diagnostics.
5. Enter any requested passwords.
6. Select Utilities.
7. Depending on the console type, select RIPL or Remote Initial Program
Load Setup.
8. Depending on the console type, select Set Address or IP Parameters.
9. Enter the client address, server address, gateway address (if applicable), and
subnet mask into the Remote Initial Program Load (RIPL) fields. If there is no
gateway between the NIM server and the client, set the gateway address to
0.0.0.0. To determine whether there is a gateway, either ask the system network
administrator or compare the first three octets of the NIM server address and the
client address. If they are the same (for example, the NIM server address
9.3.126.16 and the client address 9.3.126.42 share the first three octets, 9.3.126),
set the gateway address in the RIPL field to 0.0.0.0.
10. If the NIM server is set up to allow pinging of the client system, use the
ping option in the RIPL utility to verify that the client system can ping the NIM
server. Under the Ping utility, choose the network adapter that provides the
attachment to the NIM server to do the ping operation. If the ping returns
with an OK prompt, the client is prepared to boot from the NIM server. If the ping
returns with a FAILED prompt, the client cannot proceed with the boot.
Note: If the ping fails, refer to “Boot problem resolution” on page 153; then,
follow the steps for network boot problems.
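
The octet comparison described in step 9 can be sketched in Python as follows. The function name ripl_gateway and the gateway address used in the example are hypothetical; the sketch applies only the first-three-octets rule from step 9 and does not consult the subnet mask.

```python
import ipaddress


def ripl_gateway(server_ip, client_ip, gateway_ip):
    """Return the gateway value to enter in the RIPL field.

    If the first three octets of the NIM server and client addresses match,
    no gateway is assumed and 0.0.0.0 is returned; otherwise the supplied
    gateway address is returned unchanged.
    """
    # IPv4Address raises ValueError for malformed addresses.
    server = ipaddress.IPv4Address(server_ip)
    client = ipaddress.IPv4Address(client_ip)
    same_first_three = server.packed[:3] == client.packed[:3]
    return "0.0.0.0" if same_first_three else gateway_ip


# Addresses from step 9; the gateway address is a made-up example.
print(ripl_gateway("9.3.126.16", "9.3.126.42", "9.3.126.1"))  # 0.0.0.0
```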
Use the following procedure to temporarily change the system boot list so that the
network adapter attached to the NIM server network is first in the boot list.
The system should start loading packets while doing a bootp from the network.
Follow the instructions on the screen to select the system console. If Diagnostics
Operating Instructions Version x.x.x is displayed, standalone diagnostics has loaded
successfully. If the AIX login prompt displays, standalone diagnostics did not load.
Check the following items:
v The boot list on the client might be incorrect.
v Cstate on the NIM server might be incorrect.
v There might be network problems preventing you from connecting to the NIM
server.
Verify the settings and the status of the network. If you continue to have
problems, refer to “Boot problem resolution” on page 153; then, follow the steps
for network boot problems.
After running diagnostics from the NIM server, reboot the system and use the
BladeCenter management screens to change the boot list sequence back to the
original settings.
Chapter 9. Installing options
This chapter provides instructions for adding options or customer-replaceable units
(CRUs) to the blade server. CRUs are easily replaceable components, such as
memory modules, hard disk drives, and I/O expansion cards. (Some removal
instructions are provided in case you need to remove one option or CRU to install
another.)
Installation guidelines
Before you begin, read the following information:
v Read Appendix B, “Safety information,” on page 163, and the guidelines in
“Handling static-sensitive devices.” This information will help you work safely with
the blade server and options.
v Read the information in “Preinstallation checklist” on page 9.
v Back up all important data before you make changes to disk drives.
v For a list of supported options for the blade server, go to http://www.ibm.com/pc/
us/compat/.
v Before you remove a hot-swap blade server from the BladeCenter unit, you must
shut down the operating system by typing shutdown -h now. If the blade server
was not turned off, press the power-control button (behind the blade-server
control panel door) to turn off the blade server. You do not have to shut down the
BladeCenter unit itself.
System reliability guidelines
To help ensure proper cooling and system reliability, make sure that:
v The ventilation holes on the blade server are not blocked.
v Each of the blade bays on the front of the BladeCenter unit has a blade server or
filler blade installed. Do not operate the BladeCenter unit for more than 1 minute
without a blade server or filler blade installed in each blade bay.
v You have followed the reliability guidelines in the documentation that comes with
the BladeCenter unit.
v You have not installed any small computer system interface (SCSI) devices. The
blade server does not support SCSI devices. If you attach SCSI devices to the
blade server, these devices will not be recognized or configured, and they will not
operate.
Handling static-sensitive devices
Attention: Static electricity can damage the blade server, the BladeCenter unit,
and other electronic devices. To avoid damage, keep static-sensitive devices in their
static-protective packages until you are ready to install them.
To reduce the possibility of damage from electrostatic discharge, observe the
following precautions:
v When working on the BladeCenter T unit, use an electrostatic discharge (ESD)
wrist strap, especially when you will be handling modules, options, and blade
servers. To work properly, the wrist strap must have a good contact at both ends
(touching your skin at one end and firmly connected to the ESD connector on the
front or back of the BladeCenter T unit).
v Limit your movement. Movement can cause static electricity to build up around
you.
v Handle the device carefully, holding it by its edges or its frame.
v Do not touch solder joints, pins, or exposed printed circuitry.
v Do not leave the device where others can handle and damage it.
v While the device is still in its static-protective package, touch it to any unpainted
metal surface of the BladeCenter chassis or any unpainted metal surface on any
other grounded rack component in the rack in which you are installing the device
for at least 2 seconds. (This drains static electricity from the package and from
your body.)
v Remove the device from its package and install it directly into the blade server or
BladeCenter unit without setting down the device. If it is necessary to set down
the device, put it in its static-protective package. Do not place the device on your
BladeCenter chassis or on a metal surface.
v Take additional care when handling devices during cold weather. Heating reduces
indoor humidity and increases static electricity.
Removing the blade server from the BladeCenter unit
The following illustration shows an example of how to remove the blade server from
a typical BladeCenter unit; the orientation of the blade server depends on the type
of BladeCenter unit you have.
Note: The illustrations in this document might differ slightly from your hardware.
Attention:
v To maintain proper system cooling, do not operate the BladeCenter unit for more
than 1 minute without a blade server or filler blade installed in each blade bay.
v Note the number of the bay that contains the blade server that you will remove.
You will need this information if you decide to reinstall the blade server in the
BladeCenter unit. If you reinstall the blade server, be sure to reinstall it in the
same bay from which it was removed. Reinstalling a blade server into a different
bay than the one from which it was removed could have unintended
consequences, such as incorrectly reconfiguring the blade server. Some blade
server configuration information and update options are established according to
bay number. If you reinstall the blade server into a different bay, you might have
to reconfigure the blade server.
Note: The blade server is a hot-swap device, and the blade bays in the
BladeCenter unit are hot-swap bays. Therefore, you can install or remove
the blade server without removing power from the BladeCenter unit.
However, you must turn off the blade server before removing it from the
BladeCenter unit.
Complete the following steps to remove the blade server:
1. Read the safety information beginning on page iii and “Installation guidelines” on
page 71.
2. If the blade server is operating, the power-on LED is lit continuously (steady).
Shut down the operating system by typing the shutdown -h now command.
Refer to your operating system documentation. If the blade server was not
turned off, press the power-control button (behind the blade-server control-panel
door) to turn off the blade server. See “Blade server controls and LEDs” on
page 14 for more information about the location of the power-control button.
Attention: Wait at least 30 seconds for the hard disk drives to stop spinning,
before proceeding to the next step.
3. Open the two release levers as shown in the illustration. The blade server
moves out of the bay approximately 0.6 cm (0.25 inch).
4. Pull the blade server out of the bay.
5. Place either a filler blade or a new blade server in the bay within 1 minute.
Opening the blade server cover
The following illustration shows how to open the cover on the blade server.
[Illustration: blade server cover, showing the cover pins and the blue blade-cover release on each side]
Complete the following steps to open the blade server cover:
1. Read “Important safety information” on page iii and “Installation guidelines” on
page 71.
2. Carefully place the blade server on a flat, static-protective surface, with the
cover side up.
3. Press the blue blade-cover release on each side of the blade server and lift the
cover open, as shown in the illustration.
4. Lift the cover from the blade server and set it aside.
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the power
source. Always replace the blade cover before installing the blade server.
Removing the blade server bezel assembly
Before you can replace a defective system-board assembly or blade-server bezel
assembly, you must first remove the blade-server bezel assembly. The following
illustration shows how to remove the bezel assembly from a blade server.
[Illustration: bezel-assembly release and control-panel connector]
Complete the following steps to remove the blade-server bezel assembly:
1. Read the safety information beginning on page iii and “Installation guidelines” on
page 71.
2. Open the blade server cover.
3. Press the bezel-assembly release and pull the bezel assembly away from the
blade server approximately 1.2 cm (0.5 inch).
4. Disconnect the control-panel cable from the control-panel connector.
5. Pull the bezel assembly away from the blade server.
6. Store the bezel assembly in a safe place.
[Illustration: control-panel cable and bezel-assembly release]
Installing IDE hard disk drives
The blade server has two connectors on the system board for installing optional
2.5-inch integrated drive electronics (IDE) hard disk drives. Each IDE connector is
on a separate channel. Some models come with at least one IDE hard disk drive
already installed.
Note: Some hard disk drives have Phillips screws; therefore, make sure that a
Phillips screwdriver is available.
Attention: To maintain proper system cooling, do not operate the BladeCenter
unit for more than 1 minute without a blade server or filler blade installed in each
blade bay.
[Illustration: IDE hard disk drive installation, showing the IDE drive, tray, riser card, short screws, IDE connector 1 (J1), and IDE connector 2 (J2)]
Attention:
v Drives must be installed in the following order: IDE connector 1 (J1) first, then
IDE connector 2 (J2).
v Do not install a hard disk drive in IDE connector 2 if you intend to also install an
optional I/O expansion card. The I/O expansion card occupies the same area as
the second IDE hard disk drive.
v Do not press on the top of the hard disk drive when installing it. Pressing the top
could damage the hard disk drive.
v IDE hard disk drives must be set to primary (master). See the documentation that
came with your hard disk drive for instructions.
Complete the following steps to install a 2.5-inch IDE hard disk drive:
1. Read the safety information beginning on page iii and “Installation guidelines”
on page 71.
2. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command. Refer to your operating system documentation. If
the blade server was not turned off, press the power-control button (behind the
blade-server control-panel door) to turn off the blade server. See “Blade server
controls and LEDs” on page 14 for more information about the location of the
power-control button.
3. Remove the blade server from the BladeCenter unit. (See “Removing the
blade server from the BladeCenter unit” on page 73 for instructions.) Carefully
place the blade server on a flat, static-protective surface.
4. Open the blade server cover. See “Opening the blade server cover” on page
74 for instructions.
5. Insert the riser card from the option kit into an IDE connector on the blade
server system board.
6. Place the tray from the option kit over the riser card as shown in the preceding
illustration, aligning the tray with the screws on the system board. Note the
four screws that are under the four screw holes in the tray. Set the tray aside
and remove the four screws.
7. Replace the tray and secure the tray to the system board with screws from the
hardware kit.
8. Set any jumpers or switches on the hard disk drive, if this requirement is
specified on the drive label or in the documentation that comes with the drive.
9. Place the hard disk drive into the tray and, from the rear edge of the hard disk
drive, push it into the connector on the riser card until the hard disk drive
moves past the lever at the back of the tray. The hard disk drive clicks into
place.
10. If you have other options to install or remove, do so now; otherwise, go to
“Completing the installation” on page 90.
Installing memory modules
You can increase the amount of memory in the blade server by installing additional
memory-module options. The following items describe the types of dual inline
memory modules (DIMMs) that the blade server supports and other information that
you must consider when installing DIMMs:
v The system board contains four DIMM connectors and supports two-way memory
interleaving.
v As of the date of this publication, the blade server supports a minimum of 512
MB and a maximum of 8 GB of system memory (depending on the blade server
model). The DIMM options available are 256 MB, 512 MB, 1 GB, and 2 GB, with
the following exceptions:
– 256 MB DIMMs are not supported by the 8842-4Tx model.
– 2 GB DIMMs are not supported by the 8842-21x, 8842-E1x, 8842-E2x,
8842-41x, and 8842-4Ax models.
v Install only 2.5 V, 184-pin, double-data-rate (DDR), PC2700, registered
synchronous dynamic random-access memory (SDRAM) with error correcting
code (ECC) DIMMs. For a current list of supported DIMMs for the blade server,
go to http://www.ibm.com/pc/us/compat/.
v Install DIMMs in a matched pair. Each pair must be the same size, speed, type,
and technology. You can mix compatible DIMMs from various manufacturers. The
second pair of DIMMs does not have to be the same size as the first pair.
v After you install or remove a DIMM, the new configuration information is
automatically saved in the blade server firmware code.
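
The size rules in the list above can be checked mechanically before ordering or installing DIMMs. The following Python sketch is a minimal illustration under stated assumptions: the function name check_dimm_config is hypothetical, sizes are given in MB, and only the pairing, size-option, and total-memory rules are modeled (speed, type, technology, and the per-model exceptions are not).

```python
ALLOWED_DIMM_MB = {256, 512, 1024, 2048}   # DIMM options listed above
MIN_TOTAL_MB = 512                         # minimum system memory
MAX_TOTAL_MB = 8 * 1024                    # maximum system memory (8 GB)


def check_dimm_config(first_pair, second_pair=None):
    """Return a list of problems found in a proposed DIMM configuration.

    Each pair is a sequence of two DIMM sizes in MB. An empty list means
    the configuration passes the modeled rules.
    """
    problems = []
    pairs = [first_pair] if second_pair is None else [first_pair, second_pair]
    for number, pair in enumerate(pairs, start=1):
        if len(pair) != 2:
            problems.append(f"pair {number}: DIMMs must be installed two at a time")
            continue
        if pair[0] != pair[1]:
            problems.append(f"pair {number}: DIMM sizes differ")
        for size in sorted(set(pair)):
            if size not in ALLOWED_DIMM_MB:
                problems.append(f"pair {number}: {size} MB is not a listed DIMM option")
    total = sum(sum(pair) for pair in pairs)
    if not MIN_TOTAL_MB <= total <= MAX_TOTAL_MB:
        problems.append(f"total of {total} MB is outside 512 MB to 8 GB")
    return problems


print(check_dimm_config((256, 256), (1024, 1024)))  # [] : pairs may differ from each other
```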
The following illustration shows how to install DIMMs on the system board.
[Illustration: system board, showing DIMM socket 1 (J25), DIMM socket 2 (J28), DIMM socket 3 (J32), and DIMM socket 4 (J40)]
Before you begin, read the documentation that comes with the option.
Complete the following steps to install a DIMM:
1. Read the safety information beginning on page iii and “Installation guidelines” on
page 71.
2. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command. Refer to your operating system documentation. If
the blade server was not turned off, press the power-control button (behind the
blade-server control-panel door) to turn off the blade server. See “Blade server
controls and LEDs” on page 14 for more information about the location of the
power-control button.
3. Remove the blade server from the BladeCenter unit. See “Removing the blade
server from the BladeCenter unit” on page 73 for instructions.
4. Carefully place the blade server on a flat, static-protective surface.
5. Open the blade server cover. See “Opening the blade server cover” on page 74
for instructions.
6. Locate the DIMM connectors on the system board. Determine the connectors
into which you will install the DIMMs.
The blade server comes with two 256 MB DIMMs installed in the DIMM 3 (J32)
and DIMM 4 (J40) memory connectors. When you install additional DIMMs, be
sure to install them as a pair, in DIMM connectors 1 and 2 (J25 and J28).
Install the DIMMs in the following order:
Pair      DIMM connectors
First     3 and 4 (J32 and J40)
Second    1 and 2 (J25 and J28)
7. Touch the static-protective package that contains the DIMM option to any
unpainted metal surface on the BladeCenter chassis or any unpainted metal surface
on any other grounded rack component. Then, remove the DIMM from the
package.
8. To install the DIMMs, repeat the following steps for each DIMM that you install:
a. Turn the DIMM so that the DIMM keys align correctly with the connector on
the system board. Ensure that the retaining clips are open.
Attention: To avoid breaking the retaining clips or damaging the DIMM
connectors, handle the clips gently.
b. Insert the DIMM by pressing the DIMM along the guides into the connector.
Make sure that the retaining clips snap into the closed positions.
Important: If there is a gap between the DIMM and the retaining clips, the
DIMM has not been correctly installed. In this case, open the retaining clips
and remove the DIMM; then, reinsert the DIMM.
9. If you have other options to install or remove, do so now; otherwise, go to
“Completing the installation” on page 90.
Installing an I/O expansion card
You can add an optional I/O expansion card (adapter) to the blade server to give
the blade server additional network connections for communicating on a network.
When you add an I/O expansion card, you must make sure that the switch modules
in I/O bays 3 and 4 on the BladeCenter unit both support the I/O expansion card
network-interface type. For example, if you add an Ethernet expansion card to the
blade server, the modules in I/O bays 3 and 4 on the BladeCenter unit must both
be compatible with the Ethernet expansion card. All other I/O expansion cards
installed on other blade servers in the BladeCenter unit must also be compatible
with these switch modules. In this example, you could then install two Ethernet
switch modules, two pass-thru modules, or one Ethernet switch module and one
pass-thru module. Because pass-thru modules are compatible with a variety of I/O
expansion cards, installing two pass-thru modules would allow use of several
different types of compatible I/O expansion cards within the same BladeCenter unit.
Important:
v Installation of an I/O expansion card requires removal of the hard disk drive that
is installed in IDE connector 2. The I/O expansion card occupies the same space
as this hard disk drive and replaces it. You cannot install a hard disk drive in IDE
connector 2 while an I/O expansion card is installed in the blade server.
v The Myrinet Cluster Expansion Card for IBM Eserver BladeCenter comes with a
cable for connection to the system board of a compatible device. However, the
cable is not used in the BladeCenter JS20 Type 8842. Therefore, when you
install a Myrinet Cluster Expansion Card for IBM Eserver BladeCenter into a
BladeCenter JS20 Type 8842, do not connect the cable from the I/O expansion
card to the system board.
v If you plan to install a Fibre Channel expansion card and use it for remote startup
(boot) operations, call the IBM Support Center for additional information. In the
U.S. and Canada, call 1-800-IBM-SERV (1-800-426-7378). In other countries, go
to http://www.ibm.com/planetwide/ to locate your support telephone numbers.
Attention: If the hard disk drive installed in IDE connector 2 contains any
information that you want to keep, back it up to another storage device.
The following illustration shows how to install an I/O expansion card on the blade
server. The card is installed near IDE connector 2.
[Illustration: installing the I/O expansion card. Callouts: I/O expansion tray,
I/O expansion card, I/O expansion card connectors (two), raised hook, short screws.]
Complete the following steps to install an I/O expansion card:
1. Read the safety information beginning on page iii and “Installation guidelines” on
page 71.
2. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command. Refer to your operating system documentation. If
the blade server was not turned off, press the power-control button (behind the
blade-server control-panel door) to turn off the blade server. See “Blade server
controls and LEDs” on page 14 for more information about the location of the
power-control button.
3. Remove the blade server from the BladeCenter unit (see “Removing the blade
server from the BladeCenter unit” on page 73 for information).
4. Carefully place the blade server on a flat, static-protective surface.
5. Open the cover (see “Opening the blade server cover” on page 74 for
instructions).
6. Install the I/O expansion card tray:
a. If there is no IDE hard disk drive in IDE connector 2, remove the four
screws as shown in the previous illustration. Then, continue with step 6c.
b. If an IDE hard disk drive is in IDE connector 2, remove the hard disk drive
and tray. Save the four long screws that secured the tray to the system
board. Remove the riser card that connected the IDE hard disk drive to the
blade server system board.
c. Secure the tray to the system board with the screws from the option kit, as
shown in the previous illustration.
7. Install the I/O expansion card:
a. Orient the I/O expansion card as shown in the previous illustration.
b. Slide the notch in the narrow end of the card into the raised hook on the
tray; then, gently pivot the wide end of the card into the I/O expansion card
connectors, as shown in the previous illustration.
Note: For device driver and configuration information to complete the
installation of the I/O expansion card, see the documentation that comes
with the card. Some documentation might also be on the IBM
BladeCenter Documentation CD that comes with the BladeCenter unit.
For the latest editions of the IBM BladeCenter documentation, go to
http://www.ibm.com/support/ on the World Wide Web.
8. If you have other options to install or remove, do so now; otherwise, go to
“Completing the installation” on page 90.
Ethernet controller, switch module, and cabling requirements
One dual-port Gigabit Ethernet controller is integrated on the BladeCenter JS20
Type 8842 system board. To support Ethernet connections and the Serial Over LAN
(SOL) feature and to configure the blade server, you must install an optional
Ethernet-compatible switch module, such as the Nortel Networks Layer 2-7 GbE
Switch Module for IBM Eserver BladeCenter or IBM 4-Port Gb Ethernet Switch
Module for BladeCenter, in I/O bay 1 of the BladeCenter unit.
Each controller port provides a 1000-Mbps full-duplex interface for connecting to
one of the Ethernet-compatible switch modules in I/O bays 1 and 2. If you plan to
attach additional Ethernet devices to the blade server or the BladeCenter unit, you
must install an optional Ethernet-compatible switch module, such as the Nortel
Networks Layer 2-7 GbE Switch Module for IBM Eserver BladeCenter or IBM
4-Port Gb Ethernet Switch Module for BladeCenter, in I/O bay 3 or 4 of the
BladeCenter unit, to support these additional Ethernet connections.
The optional Ethernet switch modules contain four ports with RJ-45 connectors.
These connectors provide a 10/100/1000 Base-T interface (either at half-duplex or
full duplex) for connecting twisted-pair cable to the Ethernet network. You must
purchase and install a compatible cable to connect these devices. To connect an
Ethernet controller port to a repeater or switch module, use an unshielded twisted
pair (UTP) cable with RJ-45 connectors at both ends. For 100 Mbps or higher
operation, Category 5 cabling is required. For 10 Mbps operation, Category 3 or
Category 5 cabling is required.
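The cabling rule above reduces to a lookup by link speed. The following Python sketch is a minimal illustration; the function name is hypothetical, and it encodes only the categories named in this section.

```python
def allowed_cable_categories(speed_mbps):
    """Return the UTP cable categories this section lists for a link speed in Mbps."""
    if speed_mbps not in (10, 100, 1000):
        # The switch-module ports are described as 10/100/1000 Base-T.
        raise ValueError("unsupported speed: expected 10, 100, or 1000 Mbps")
    if speed_mbps >= 100:
        return {5}      # 100 Mbps or higher: Category 5 is required
    return {3, 5}       # 10 Mbps: Category 3 or Category 5
```

A Category 3 cable therefore passes the check only for a 10 Mbps link: 3 in allowed_cable_categories(10) is true, and 3 in allowed_cable_categories(1000) is false.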
Notes:
v For more information about Ethernet requirements, see the documentation that
comes with the Ethernet devices and the BladeCenter Type 8677 Installation and
User’s Guide.
v For more information about installing, configuring, and using the Ethernet switch
modules, see the documentation that comes with the Ethernet switch module that
you are using, such as the IBM 4-Port Gb Ethernet Switch Module for
BladeCenter Installation and User’s Guide or Nortel Networks Layer 2-7 GbE
Switch Module for IBM BladeCenter Installation Guide.
v For more information about the SOL feature, see Chapter 3, “Configuration,” on
page 17, the IBM Eserver BladeCenter and BladeCenter T Serial Over LAN
Setup Guide, and the BladeCenter and BladeCenter T Management Module
Command-Line Interface Reference Guide.
Replacing the battery
IBM has designed this product with your safety in mind. The lithium battery must be
handled correctly to avoid possible danger. If you replace the battery, you must
adhere to the following instructions.
Note: In the U.S., call 1-800-IBM-4333 for information about battery disposal.
If you replace the original lithium battery with a heavy-metal battery or a battery with
heavy-metal components, be aware of the following environmental consideration.
Batteries and accumulators that contain heavy metals must not be disposed of with
normal domestic waste. They will be taken back free of charge by the manufacturer,
distributor, or representative, to be recycled or disposed of in a proper manner.
To order replacement batteries, call 1-800-IBM-SERV within the United States, and
1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada,
call your IBM authorized reseller or IBM marketing representative.
Note: After you replace the battery, the blade server is automatically reconfigured;
however, you must reset the system date and time through the operating
system that you installed.
Statement 2:
CAUTION:
When replacing the lithium battery, use only IBM Part Number 33F8354 or an
equivalent type battery recommended by the manufacturer. If your system has
a module containing a lithium battery, replace it only with the same module
type made by the same manufacturer. The battery contains lithium and can
explode if not properly used, handled, or disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Note: See “Battery return program” on page 199 for more information about battery
disposal.
Complete the following steps to replace the battery:
1. Read the safety information beginning on page iii and “Installation guidelines”
on page 71.
2. Follow any special handling and installation instructions that come with the
battery.
3. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command. Refer to your operating system documentation. If
the blade server was not turned off, press the power-control button (behind the
blade-server control-panel door) to turn off the blade server. See “Blade server
controls and LEDs” on page 14 for more information about the location of the
power-control button.
4. Remove the blade server from the BladeCenter unit (see “Removing the blade
server from the BladeCenter unit” on page 73 for information).
5. Carefully place the blade server on a flat, static-protective surface.
6. Open the blade server cover (see “Opening the blade server cover” on page
74 for instructions).
7. Locate the battery (connector BH1) on the system board.
Battery (BH1)
8. Remove the battery:
a. Use your finger to press down on one side of the battery; then, slide the
battery out from its socket. The spring mechanism will push the battery out
toward you as you slide it from the socket.
Note: You might need to lift the battery clip slightly with your fingernail to
make it easier to slide the battery.
b. Use your thumb and index finger to pull the battery from under the battery
clip.
Note: After you remove the battery, press gently on the clip to make sure
that the battery clip is touching the base of the battery socket.
9. Insert the new battery:
a. Tilt the battery so that you can insert it into the socket, under the battery
clip. Make sure that the side with the positive (+) symbol is facing up.
b. As you slide it under the battery clip, press the battery down into the
socket.
10. Close the blade server cover (see “Closing the blade server cover” on page
92).
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the
power source. Always replace the blade cover before installing the blade
server.
11. Reinstall the blade server into the BladeCenter unit.
12. Turn on the blade server (see “Turning on the blade server” on page 13).
13. Reset the system date and time through the operating system that you
installed. For additional information, see your operating-system documentation.
System board
Two operational microprocessors and heat sinks are required on the system board
in the blade server at all times. The microprocessors and heat sinks are not
replaceable. Do not attempt to remove these components or any components that
secure the microprocessors and heat sinks to the system board. You must replace
the system board if any of these conditions exists:
v A microprocessor or heat sink becomes defective.
v Certain errors occur as described in “Firmware error codes” on page 102.
v The blade server does not restart after you recover the system firmware code as
described in “Recovering the system firmware code” on page 54.
To obtain a new system board, you must order a new blade server. The
replacement system board comes attached to the new blade server. To order a
blade server, contact your IBM authorized reseller or IBM marketing representative.
Important: After you replace the system board, you must either update the new
blade server with the latest firmware or restore the pre-existing firmware from a
diskette or CD image. You must also reconfigure the new blade server and reset
the system date and time.
System board component locations
The following illustration shows the location of the system-board components,
including connectors for user-installable options.
Note: All jumpers not specifically mentioned are reserved.
System-board LED locations
The following illustration shows the location of the LEDs on the system board.
Illustration callouts:
DIMM 1 error LED (CR40)
DIMM 2 error LED (CR45)
DIMM 3 error LED (CR46)
DIMM 4 error LED (CR53)
Microprocessor 0 error LED (CR19)
Microprocessor 1 error LED (CR58)
Temperature error LED (CR16)
NMI error LED (CR17)
System board error LED (CR20)
Service processor error LED (CR27)
Light Path Diagnostics (SW1)
Reserved (CR29)
Note: Error LED CR29 is reserved.
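If you record lit LEDs during service, the callouts in the illustration can be kept as a simple lookup table. The Python sketch below is only an illustration: the designators are transcribed from the illustration callouts, while the function and variable names are hypothetical. SW1 is omitted because it is the Light Path Diagnostics switch rather than an LED.

```python
# Designators transcribed from the system-board LED illustration callouts.
SYSTEM_BOARD_LEDS = {
    "CR16": "Temperature error",
    "CR17": "NMI error",
    "CR19": "Microprocessor 0 error",
    "CR20": "System board error",
    "CR27": "Service processor error",
    "CR40": "DIMM 1 error",
    "CR45": "DIMM 2 error",
    "CR46": "DIMM 3 error",
    "CR53": "DIMM 4 error",
    "CR58": "Microprocessor 1 error",
}
RESERVED_LEDS = {"CR29"}


def describe_led(designator):
    """Return the meaning of a system-board LED designator such as 'CR40'."""
    key = designator.strip().upper()
    if key in RESERVED_LEDS:
        return "Reserved"
    return SYSTEM_BOARD_LEDS.get(key, "Unknown designator")
```

For example, describe_led("cr45") returns "DIMM 2 error".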
Replacing the system board
Complete the following steps to replace the system-board assembly:
1. Read the safety information beginning on page iii and “Installation guidelines”
on page 71.
2. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command. Refer to your operating system documentation. If
the blade server was not turned off, press the power-control button (behind the
blade-server control-panel door) to turn off the blade server. See “Blade server
controls and LEDs” on page 14 for more information about the location of the
power-control button.
3. Remove the blade server from the BladeCenter unit (see “Removing the blade
server from the BladeCenter unit” on page 73 for information). The faulty
system-board assembly is attached to the blade server.
4. Open the blade server cover (see “Opening the blade server cover” on page
74 for instructions).
5. Remove the blade-server bezel assembly (see “Removing the blade server
bezel assembly” on page 75).
6. Remove the following components from the faulty system-board assembly (see
the applicable installation instructions in this chapter and reverse the steps),
and place them on a flat, static-protective surface. Note the locations where
these components were installed on the faulty system-board assembly. You will
need this information when you install these components on the replacement
system-board assembly. Make sure that these components are accessible for
reinstallation.
v IDE hard disk drives, drive trays, and riser cards (see “Installing IDE hard
disk drives” on page 75)
v DIMMs (see “Installing memory modules” on page 77)
v I/O expansion cards and expansion card trays (see “Installing an I/O
expansion card” on page 79)
v Jumper J14, between jumpers J16 and J20 (for location, see the illustration
in “Recovering the system firmware code” on page 54)
7. While the new system-board assembly is still in its static-protective package,
touch it to an unpainted metal part of the system unit for at least 2 seconds.
8. Remove the new system-board assembly from its package and place it on a
flat, static-protective surface.
9. Install the components that you removed from the faulty system-board
assembly in step 6 into the corresponding locations on the replacement
system-board assembly.
v IDE hard disk drives, drive trays, and riser cards (see “Installing IDE hard
disk drives” on page 75)
v DIMMs (see “Installing memory modules” on page 77)
v I/O expansion cards and expansion card trays (see “Installing an I/O
expansion card” on page 79)
v Jumper J14, between jumpers J16 and J20 (for location, see the illustration
in “Recovering the system firmware code” on page 54)
If you plan to increase the amount of memory in the blade server, install the
new DIMMs on the new system-board assembly now. For additional
information, see “Installing memory modules” on page 77.
10. Note the machine type, model number, and serial number on the identification
label that is behind the control-panel door on the front of the blade server. You
will need this information to complete this step.
The replacement system-board assembly comes with a repair identification
(RID) tag label. To ensure future entitlement for service, you must write the
serial number of the blade server (with the original system-board assembly)
onto the RID tag label in this step. The part number for the RID tag is
13N0477.
Use the RID tag label to transfer entitlement (machine type, model number,
and serial number) from the original system-board assembly to the new
system-board assembly. Do not use a pencil or felt-tip pen to complete the RID
tag label.
Important:
v The serial number of the blade server (with the original system-board
assembly) must match the serial number that you reported when you called
IBM for service.
v Because the new system-board assembly is not associated with a
blade-server serial number, you must transfer the serial number from the
original system-board assembly to the new system-board assembly. The first
time that you turn on the blade server that contains the new system-board
assembly, the firmware code will request that you enter the serial number,
as described in step 16. You must enter the blade-server serial number. If
you enter a different serial number, the operating system that you installed
might interpret this information as an incorrect serial number, and you might
have to change your software-licensing agreement.
v To maintain proper airflow, do not place the new label on the blade-server
bezel assembly.
Also, be sure to place the RID tag label on the bottom of the blade server
chassis.
11. Install the blade-server bezel assembly on the blade server (see “Installing the
blade-server bezel assembly” on page 90).
12. Close the blade server cover (see “Closing the blade server cover” on page
92).
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the
power source. Always replace the blade cover before installing the blade
server.
13. Install the blade server into the same BladeCenter unit I/O bay from which you
removed the blade server when it contained the faulty system-board assembly.
14. Turn on the blade server (see “Turning on the blade server” on page 13).
Note: If you have just connected the power cords of the BladeCenter unit to
electrical outlets, you will have to wait until the power-on LED on the
blade server slowly flashes before you press the power-control button
on the blade server.
15. Configure an SOL connection and attach it to this blade server.
For additional information, see the IBM Eserver BladeCenter and
BladeCenter T Serial Over LAN Setup Guide .
16. The blade server starts to an Open Firmware prompt, where you enter the serial
number of the blade server (with the original system-board assembly).
The blade server will not start until the serial number and other relevant
information have been entered and verified at the prompts when the following
checkpoint codes are displayed, as shown in the following example window.
Depending on the blade server configuration, the text that is displayed in your
system window might be slightly different.
E1F0
E1F1
D099
D100 > xxxxxxx (The serial number of the blade server with the original
system-board assembly)
D101 > xxxxxxx (Re-enter the serial number to verify)
D102 > 8842 (The type number from the blade server)
D103 > 8842 (Re-enter the type number to verify)
D104 > xxxx (The model number from the blade server)
D105 > xxxx (Re-enter the model number to verify)
Note: These checkpoint codes are described in Chapter 7, “Diagnostics,” on
page 37.
17. Reset the system date and time through the operating system that you
installed. For additional information, see your operating-system documentation.
The system-board assembly replacement procedure is now complete. Continue
with “Input/output connectors and devices” on page 92.
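The prompts in step 16 ask for each value twice: once to enter it and once to verify it. Before typing at the firmware prompts, you can check your recorded values with a sketch like the following Python example. It is not the firmware's own logic, which this manual does not describe beyond the prompts shown. The function name and the sample values are hypothetical; only the checkpoint codes come from the example window.

```python
# Checkpoint pairs from the example window: (entry code, verify code, field).
PROMPT_PAIRS = [
    ("D100", "D101", "serial number"),
    ("D102", "D103", "machine type"),
    ("D104", "D105", "model number"),
]


def mismatched_fields(entries):
    """Return the fields that are empty or whose entry and re-entry differ.

    `entries` maps a checkpoint code to the text prepared for that prompt.
    """
    problems = []
    for first, second, field in PROMPT_PAIRS:
        value = entries.get(first, "").strip()
        repeat = entries.get(second, "").strip()
        if not value or value != repeat:
            problems.append(field)
    return problems
```

With hypothetical values such as {"D100": "23A1234", "D101": "23A1234", "D102": "8842", "D103": "8842", "D104": "21X", "D105": "21x"}, the function reports only "model number", because the comparison is case-sensitive.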
Completing the installation
To complete the installation, perform the following tasks, if you have not already
done so.
1. Install the blade-server bezel assembly on the blade server (see “Installing the
blade-server bezel assembly”).
2. Close the blade server cover (see “Closing the blade server cover” on page 92).
Statement 21:
CAUTION:
Hazardous energy is present when the blade server is connected to the
power source. Always replace the blade cover before installing the blade
server.
3. Reinstall the blade server into the BladeCenter unit.
4. Turn on the blade server (see “Turning on the blade server” on page 13).
5. After you replace the battery or the system-board assembly, reset the system
date and time through the operating system that you installed. For additional
information, see your operating-system documentation.
Note: If you have just connected the power cords of the BladeCenter unit to
electrical outlets, you will have to wait until the power-on LED on the blade
server flashes slowly before pressing the power-control button on a blade
server.
Installing the blade-server bezel assembly
The following illustration shows how to install the bezel assembly on the blade
server.