IBM JS20, BladeCenter JS20 Type 8842 Maintenance and Troubleshooting Manual

BladeCenter JS20 Type 8842
Hardware Maintenance Manual and Troubleshooting Guide

Notes
• See “Notices” on page 197.
• The most recent version of this document is available at http://www.ibm.com/pc/support/.
16th Edition (June 2006) © Copyright International Business Machines Corporation 2003. All rights reserved.
US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
About this manual
This manual contains diagnostic information, a symptom-to-FRU index, service information, error codes, error messages, and configuration information for the IBM BladeCenter® JS20 Type 8842 blade server.
Important safety information
Be sure to read all caution and danger statements in this book before performing any of the instructions; see Appendix B, “Safety information,” on page 163.
Leia todas as instruções de cuidado e perigo antes de executar qualquer operação.
Prenez connaissance de toutes les consignes de type Attention et Danger avant de procéder aux opérations décrites par les instructions.
Lesen Sie alle Sicherheitshinweise, bevor Sie eine Anweisung ausführen.
Accertarsi di leggere tutti gli avvisi di attenzione e di pericolo prima di effettuare qualsiasi operazione.
Lea atentamente todas las declaraciones de precaución y peligro antes de llevar a cabo cualquier operación.
WARNING: Handling the cord on this product or cords associated with accessories sold with this product will expose you to lead, a chemical known to the State of California to cause cancer, and birth defects or other reproductive harm. Wash hands after handling.
ADVERTENCIA: El contacto con el cable de este producto o con cables de accesorios que se venden junto con este producto, pueden exponerle al plomo, un elemento químico que en el estado de California de los Estados Unidos está considerado como un causante de cáncer y de defectos congénitos, además de otros riesgos reproductivos. Lávese las manos después de usar el producto.
Online support
You can download the most current firmware update and device driver files from http://www.ibm.com/pc/support.
Contents
About this manual . . . . . . . . . . . . . . . . . . . . . . . iii
Important safety information . . . . . . . . . . . . . . . . . . . . iii
Online support . . . . . . . . . . . . . . . . . . . . . . . . . iii
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . .1
Related documentation . . . . . . . . . . . . . . . . . . . . . .3
The IBM BladeCenter Documentation CD . . . . . . . . . . . . . . .4
Hardware and software requirements . . . . . . . . . . . . . . . .4
Using the Documentation Browser . . . . . . . . . . . . . . . . .4
Notices and statements used in this document . . . . . . . . . . . . . .5
Features and specifications . . . . . . . . . . . . . . . . . . . . .6
BladeCenter JS20 specifications for non-NEBS/ETSI environments . . . . .6
BladeCenter JS20 specifications for NEBS/ETSI environments . . . . . . .7
Preinstallation checklist . . . . . . . . . . . . . . . . . . . . . .9
Checking the status of the media tray . . . . . . . . . . . . . . . .10
Chapter 2. Blade server power, controls, and indicators . . . . . . . .13
Turning on the blade server . . . . . . . . . . . . . . . . . . . .13
Turning off the blade server . . . . . . . . . . . . . . . . . . . .14
Blade server controls and LEDs . . . . . . . . . . . . . . . . . .14
Chapter 3. Configuration . . . . . . . . . . . . . . . . . . . . .17
Using the command-line interface . . . . . . . . . . . . . . . . . .18
Configuring the Gigabit Ethernet controller . . . . . . . . . . . . . . .18
Blade server Ethernet controller enumeration . . . . . . . . . . . . . .19
Chapter 4. Problem determination procedures for AIX and Linux . . . . .21
Problem determination . . . . . . . . . . . . . . . . . . . . . .21
Obtaining an SRN/SRC or error code . . . . . . . . . . . . . . . .22
Chapter 5. AIX online, standalone and verification procedures . . . . . .25
Performing AIX online concurrent mode diagnostics for problem determination 25
Running the standalone diagnostics from CD-ROM . . . . . . . . . . .25
Performing AIX online concurrent mode diagnostics for previous diagnostic results: service aids . . . . . . . . . . . . . . . . . . . . . .28
Performing AIX online concurrent mode diagnostics for system verification . . .29
Verifying the replacement part using AIX diagnostics . . . . . . . . . . .30
Chapter 6. Running a Serial Over LAN session . . . . . . . . . . . .33
Selecting the command target . . . . . . . . . . . . . . . . . . .34
Starting the command-line interface . . . . . . . . . . . . . . . . .34
Establishing a Telnet connection . . . . . . . . . . . . . . . . .35
Establishing a Secure Shell (SSH) connection . . . . . . . . . . . .35
Starting an SOL session . . . . . . . . . . . . . . . . . . . . .35
Ending an SOL session . . . . . . . . . . . . . . . . . . . . . .36
Chapter 7. Diagnostics . . . . . . . . . . . . . . . . . . . . .37
General checkout . . . . . . . . . . . . . . . . . . . . . . . .37
Checkout procedure . . . . . . . . . . . . . . . . . . . . . . .38
Diagnostic tools overview . . . . . . . . . . . . . . . . . . . . .39
POST . . . . . . . . . . . . . . . . . . . . . . . . . . . .40
Checkpoints . . . . . . . . . . . . . . . . . . . . . . . . .40
Accessing the Linux system error log . . . . . . . . . . . . . . . .40
Service aids and the Linux system error log . . . . . . . . . . . . .40
FRU/CRU isolation . . . . . . . . . . . . . . . . . . . . . . .46
Error symptom charts . . . . . . . . . . . . . . . . . . . . . .46
Light path diagnostics . . . . . . . . . . . . . . . . . . . . . .46
Memory errors . . . . . . . . . . . . . . . . . . . . . . . . .47
Recovering the system firmware code . . . . . . . . . . . . . . . .48
Recovery of system firmware code using service aids . . . . . . . . .48
Starting the TEMP image . . . . . . . . . . . . . . . . . . . .48
Recovering the TEMP image from the PERM image . . . . . . . . . .49
Updating the blade server firmware . . . . . . . . . . . . . . . . .50
Determination of current server firmware levels . . . . . . . . . . . .51
Updating the blade server service processor . . . . . . . . . . . . .51
Update and manage system flash using Linux service aids . . . . . . . .51
Updating the system flash using Linux . . . . . . . . . . . . . .51
Verifying the system firmware levels using Linux . . . . . . . . . .52
Update and manage system flash using AIX diagnostics . . . . . . . . .52
Updating the system flash using AIX . . . . . . . . . . . . . . .52
Committing the temporary firmware image using AIX . . . . . . . . .53
Verifying the system firmware levels using AIX . . . . . . . . . . .53
Recovering the system firmware code . . . . . . . . . . . . . . . .54
Recovery of system firmware code using service aids . . . . . . . . .55
Starting the backup image . . . . . . . . . . . . . . . . . . . .55
Recovering the primary image . . . . . . . . . . . . . . . . . .56
Chapter 8. General AIX and xSeries standalone diagnostic information 59
Information for general diagnostic systems running the AIX operating system 59
AIX operating system message files . . . . . . . . . . . . . . . .59
CE login . . . . . . . . . . . . . . . . . . . . . . . . . . .59
Missing resources . . . . . . . . . . . . . . . . . . . . . . . .60
Automatic diagnostic tests . . . . . . . . . . . . . . . . . . . . .60
Configuration program . . . . . . . . . . . . . . . . . . . . .60
Diagnostic programs . . . . . . . . . . . . . . . . . . . . . .61
Error log analysis . . . . . . . . . . . . . . . . . . . . . . . .61
Introducing tasks and service aids . . . . . . . . . . . . . . . . . .62
Task and service aid functions . . . . . . . . . . . . . . . . . .62
AIX automatic error log analysis (diagela) . . . . . . . . . . . . . .62
Error log analysis . . . . . . . . . . . . . . . . . . . . . . .63
Log repair action . . . . . . . . . . . . . . . . . . . . . . .63
Tasks (service aids) . . . . . . . . . . . . . . . . . . . . . .63
Download microcode . . . . . . . . . . . . . . . . . . . . .64
Update and manage system flash . . . . . . . . . . . . . . . .65
Using the standalone CD-ROM and online current diagnostics . . . . . .67
Standalone and online diagnostics operating considerations . . . . . . .67
Running online diagnostics . . . . . . . . . . . . . . . . . . .67
Running the online diagnostics in concurrent mode . . . . . . . . . .68
Running standalone diagnostics from a management (NIM) server . . . . . .68
NIM server configuration . . . . . . . . . . . . . . . . . . . .69
Client configuration and booting ERserver standalone diagnostics from the NIM server . . . . . . . . . . . . . . . . . . . . . . . .69
Chapter 9. Installing options . . . . . . . . . . . . . . . . . . .71
Installation guidelines . . . . . . . . . . . . . . . . . . . . . .71
System reliability guidelines . . . . . . . . . . . . . . . . . . .71
Handling static-sensitive devices . . . . . . . . . . . . . . . . .71
Removing the blade server from the BladeCenter unit . . . . . . . . . .73
Opening the blade server cover . . . . . . . . . . . . . . . . . . .74
Removing the blade server bezel assembly . . . . . . . . . . . . . .75
Installing IDE hard disk drives . . . . . . . . . . . . . . . . . . .75
Installing memory modules . . . . . . . . . . . . . . . . . . . .77
Installing an I/O expansion card . . . . . . . . . . . . . . . . . . .79
Ethernet controller, switch module, and cabling requirements . . . . . . . .82
Replacing the battery . . . . . . . . . . . . . . . . . . . . . .83
System board . . . . . . . . . . . . . . . . . . . . . . . . .86
System board component locations . . . . . . . . . . . . . . . .86
System-board LED locations . . . . . . . . . . . . . . . . . . .87
Replacing the system board . . . . . . . . . . . . . . . . . . .87
Completing the installation . . . . . . . . . . . . . . . . . . . . .90
Installing the blade-server bezel assembly . . . . . . . . . . . . . .90
Closing the blade server cover . . . . . . . . . . . . . . . . . .92
Input/output connectors and devices . . . . . . . . . . . . . . . . .92
Chapter 10. Symptom-to-FRU index . . . . . . . . . . . . . . . .93
Firmware checkpoint (progress) codes . . . . . . . . . . . . . . . .94
Firmware error codes . . . . . . . . . . . . . . . . . . . . . . 102
Service request numbers . . . . . . . . . . . . . . . . . . . . . 108
Linux service aid diagela . . . . . . . . . . . . . . . . . . . 109
Using the SRN list . . . . . . . . . . . . . . . . . . . . . . 109
Service request number . . . . . . . . . . . . . . . . . . . 109
Source of SRN . . . . . . . . . . . . . . . . . . . . . . 109
Failing Function Codes . . . . . . . . . . . . . . . . . . . 109
Description and action . . . . . . . . . . . . . . . . . . . .110
Using the SRN list . . . . . . . . . . . . . . . . . . . . .110
SRN tables . . . . . . . . . . . . . . . . . . . . . . . . .110
AIX SRNs 101-711 through 2D02 . . . . . . . . . . . . . . . .110
SRNs A00-(x)xxx through A1D-(x)xxx . . . . . . . . . . . . . . 121
Failing Function Codes (FFCs) . . . . . . . . . . . . . . . . 141
FFC table . . . . . . . . . . . . . . . . . . . . . . . . 142
Light path diagnostics LEDs . . . . . . . . . . . . . . . . . . . 145
Error symptoms . . . . . . . . . . . . . . . . . . . . . . . . 145
CD drive problems . . . . . . . . . . . . . . . . . . . . . . 146
Diskette drive problems . . . . . . . . . . . . . . . . . . . . 147
General problems . . . . . . . . . . . . . . . . . . . . . . 147
Hard disk drive problems . . . . . . . . . . . . . . . . . . . . 147
Memory problems . . . . . . . . . . . . . . . . . . . . . . 148
Microprocessor problems . . . . . . . . . . . . . . . . . . . . 148
Monitor problems . . . . . . . . . . . . . . . . . . . . . . 148
Mouse problems . . . . . . . . . . . . . . . . . . . . . . . 149
Network connection problems . . . . . . . . . . . . . . . . . . 150
Option problems . . . . . . . . . . . . . . . . . . . . . . . 150
Power problems . . . . . . . . . . . . . . . . . . . . . . . 151
Service processor problems . . . . . . . . . . . . . . . . . . . 151
Software problems . . . . . . . . . . . . . . . . . . . . . . 152
Startup problems . . . . . . . . . . . . . . . . . . . . . . . 152
Service processor error codes . . . . . . . . . . . . . . . . . . . 152
Boot problem resolution . . . . . . . . . . . . . . . . . . . . . 153
Physical location codes . . . . . . . . . . . . . . . . . . . . . 154
Undetermined problems . . . . . . . . . . . . . . . . . . . . . 156
Problem determination tips . . . . . . . . . . . . . . . . . . . . 158
Chapter 11. Parts listing, Type 8842 . . . . . . . . . . . . . . . . 159
Appendix A. Getting help and technical assistance . . . . . . . . . . 161
Before you call . . . . . . . . . . . . . . . . . . . . . . . . 161
Using the documentation . . . . . . . . . . . . . . . . . . . . . 161
Getting help and information from the World Wide Web . . . . . . . . . 162
Software service and support . . . . . . . . . . . . . . . . . . . 162
Hardware service and support . . . . . . . . . . . . . . . . . . . 162
Appendix B. Safety information . . . . . . . . . . . . . . . . . 163
General safety . . . . . . . . . . . . . . . . . . . . . . . . 163
Electrical safety . . . . . . . . . . . . . . . . . . . . . . . . 164
Safety inspection guide . . . . . . . . . . . . . . . . . . . . . 165
Grounding requirements . . . . . . . . . . . . . . . . . . . . . 166
Safety notices (multi-lingual translations) . . . . . . . . . . . . . . . 166
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Edition notice . . . . . . . . . . . . . . . . . . . . . . . . . 197
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . 198
Important notes . . . . . . . . . . . . . . . . . . . . . . . . 198
Product recycling and disposal . . . . . . . . . . . . . . . . . . 199
Battery return program . . . . . . . . . . . . . . . . . . . . . 199
Electronic emission notices . . . . . . . . . . . . . . . . . . . . 200
Federal Communications Commission (FCC) statement . . . . . . . . 200
Industry Canada Class A emission compliance statement . . . . . . . . 200
Australia and New Zealand Class A statement . . . . . . . . . . . . 200
United Kingdom telecommunications safety requirement . . . . . . . . 200
European Union EMC Directive conformance statement . . . . . . . . 201
Taiwanese Class A warning statement . . . . . . . . . . . . . . . 201
Chinese Class A warning statement . . . . . . . . . . . . . . . . 201
Japanese Voluntary Control Council for Interference (VCCI) statement 201
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Chapter 1. Introduction
The IBM BladeCenter JS20 Type 8842, also known as the blade server, is based on the IBM Power Architecture technologies.
The BladeCenter JS20 Type 8842 is compatible with IBM BladeCenter units. This high-performance blade server is well-suited for networking environments that require outstanding microprocessor performance, efficient memory management, flexibility, and reliable data storage.
Notes:
1. In this document, the term BladeCenter unit refers to any IBM BladeCenter, BladeCenter T, or other BladeCenter-class chassis model, except where specifically indicated otherwise.
2. The number of blade servers your BladeCenter unit supports depends on the type of BladeCenter unit. For example, the IBM Eserver BladeCenter Type 8677 supports up to 14 hot-swap blade servers; the BladeCenter T Types 8720 and 8730 support up to 8 hot-swap blade servers. See the documentation that comes with the BladeCenter unit for more information. For more information about determining the power requirements for the blade server, see the IBM Eserver BladeCenter Power Module Upgrade Guidelines Technical Update on the IBM BladeCenter Documentation CD.
3. The types and capacities of power modules your BladeCenter unit supports, which affect the number of blade servers you can install in the BladeCenter unit, depend on the type of BladeCenter unit. See the documentation that comes with the BladeCenter unit for more information.
[Figure: callouts for the release levers and the release button]
Notes:
• In a BladeCenter unit that supports multiple types of power modules with different capacities, such as the BladeCenter Type 8677, the maximum number of blade servers that the BladeCenter unit supports varies by the wattage of the power modules that are installed in the BladeCenter unit. For more information about determining the power requirements for the blade server, see the IBM Eserver BladeCenter Power Module Upgrade Guidelines Technical Update on the World Wide Web at http://www.ibm.com/support/.
• Two power modules are required to support the blade servers in power domain A in the BladeCenter unit. The following blade bays are in power domain A:
  - Blade bays 1 through 6 in a BladeCenter Type 8677 or similar unit
  - Blade bays 1 through 5 in a BladeCenter T unit
  If you install blade servers in these blade bays, you must install power modules in power-module bays 1 and 2 in the BladeCenter unit.
• Two additional power modules are required to support the blade servers in power domain B in the BladeCenter unit. The following blade bays are in power domain B:
  - Blade bays 7 through 14 in a Type 8677 or similar BladeCenter unit
  - Blade bays 6 through 8 in a BladeCenter T unit
  If you install blade servers in these blade bays, you must install power modules in power-module bays 3 and 4 in the BladeCenter unit.
• Make sure that you review and understand the design of the BladeCenter unit. Use this information to help you determine your system configuration requirements and the bays and connectors where you will install or remove components. For additional information, see the BladeCenter unit Installation and User’s Guide on the Documentation CD for your BladeCenter unit, or go to http://www.ibm.com/support/ on the World Wide Web.
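The relationship between blade bays, power domains, and power-module bays described in the preceding notes can be sketched in a few lines of Python. This is an illustration only, not an IBM tool: the chassis labels "8677" and "T" and the function name are assumptions made for this sketch, and the bay ranges are copied from the notes above.

```python
# Blade-bay ranges per power domain, as listed in the notes above.
# The dictionary keys "8677" and "T" are labels invented for this sketch.
POWER_DOMAINS = {
    "8677": {"A": range(1, 7), "B": range(7, 15)},
    "T": {"A": range(1, 6), "B": range(6, 9)},
}

# Power-module bays that must be populated for each power domain.
POWER_MODULE_BAYS = {"A": (1, 2), "B": (3, 4)}


def required_power_module_bays(chassis, occupied_blade_bays):
    """Return the sorted power-module bays needed for the occupied blade bays."""
    domains = POWER_DOMAINS[chassis]
    needed = set()
    for bay in occupied_blade_bays:
        for name, bays in domains.items():
            if bay in bays:
                needed.update(POWER_MODULE_BAYS[name])
                break
        else:
            # No domain contains this bay number for the given chassis.
            raise ValueError(f"blade bay {bay} does not exist in chassis {chassis!r}")
    return sorted(needed)
```

For example, calling required_power_module_bays("8677", [1, 2, 9]) reports that all four power-module bays are needed, because bay 9 is in power domain B.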
Related documentation
This Hardware Maintenance Manual and Troubleshooting Guide is provided in Portable Document Format (PDF) on the IBM BladeCenter JS20 Documentation CD that comes with the IBM BladeCenter JS20 Type 8842. It contains information to help you solve problems yourself or to provide helpful information to a service technician.
In addition to this Hardware Maintenance Manual and Troubleshooting Guide, the following information is provided in PDF on the IBM BladeCenter Documentation CD that comes with the IBM BladeCenter JS20 Type 8842:
• Safety Information: This document contains translated caution and danger statements. Each caution and danger statement that appears in the documentation has a number that you can use to locate the corresponding statement in your language in the Safety Information document.
• BladeCenter JS20 Type 8842 Installation and User’s Guide: This document contains instructions for setting up the server and basic instructions for installing some options, and provides general information about the server, including information about features and how to configure the server.
• BladeCenter and BladeCenter T Management Module User’s Guide: This document contains instructions for installing, starting, configuring, and using the BladeCenter unit management module. This document also provides general information about the management module and contains a description of the management module features.
• BladeCenter and BladeCenter T Management Module Command-Line Interface Reference Guide: This document contains instructions for installing, starting, configuring, and using the IBM Eserver BladeCenter management-module command-line interface. This document also provides general information about the BladeCenter management-module command-line interface and contains a description of its features.
• BladeCenter or BladeCenter T Management Module Installation Guide: This document contains instructions for installing, setting up, starting, and configuring the BladeCenter unit management module.
• BladeCenter unit Installation and User’s Guide: This document contains instructions for setting up and configuring the BladeCenter unit and basic instructions for installing some options in the BladeCenter unit. It also contains general information about the BladeCenter unit.
• BladeCenter unit Hardware Maintenance Manual and Troubleshooting Guide: This document contains the information to help you solve BladeCenter unit problems yourself, and it contains information for service technicians.
• BladeCenter unit Rack Installation Instructions: This document contains instructions for installing the BladeCenter unit in a rack.
• IBM 4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s Guide: This document contains instructions for setting up, installing, and configuring the IBM 4-Port Gb Ethernet Switch Module for BladeCenter and a description of the switch-module features.
• Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation Guide: This document contains instructions for setting up, installing, and configuring the Nortel Networks Layer 2-7 GbE Switch Module for IBM Eserver BladeCenter and a description of the switch-module features.
• IBM BladeCenter 2-Port Fibre Channel Switch Module Installation Guide: This document contains instructions for setting up, installing, and configuring the IBM Eserver BladeCenter 2-Port Fibre Channel Switch Module, and a description of the switch module features.
• Technical Update for IBM BladeCenter Fiber Channel Switch Module version 1.00: This document contains updated information about the IBM Eserver BladeCenter 2-Port Fibre Channel Switch Module.
• IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide: This document contains instructions for establishing a Serial Over LAN (SOL) connection, enabling the SOL feature, and configuring the blade server so that you can run SOL sessions and use the BladeCenter management-module command-line interface. This document also contains instructions for updating and configuring BladeCenter components for SOL operation using the management-module Web-based management and configuration program.
• IBM Eserver BladeCenter Power Module Upgrade Guidelines Technical Update: This document contains information that helps you determine the power requirements for the blade server.
Additional documentation might be included on the IBM BladeCenter Documentation CD.
The IBM BladeCenter Documentation CD
The IBM BladeCenter JS20 blade server Documentation CD contains documentation for the blade server in Portable Document Format (PDF) and includes the IBM Documentation Browser to help you find information quickly.
Hardware and software requirements
The IBM Documentation CD requires the following minimum hardware and software:
• Microsoft® Windows NT® 4.0 (with Service Pack 3 or later), Windows® 2000, or Red Hat® Linux®
• 100 MHz microprocessor
• 32 MB of RAM
• Adobe Acrobat Reader 3.0 (or later) or xpdf, which comes with Linux operating systems
  Note: Acrobat Reader software is included on the CD, and you can install it when you run the Documentation Browser.
Using the Documentation Browser
Use the Documentation Browser to browse the contents of the CD, read brief descriptions of the documents, and view documents using Adobe Acrobat Reader or xpdf. The Documentation Browser automatically detects the regional settings in use in the system and displays the documents in the language for that region (if available). If a document is not available in the language for that region, the English version is displayed.
Use one of the following procedures to start the Documentation Browser:
• If Autostart is enabled, insert the CD into the CD drive. The Documentation Browser starts automatically.
• If Autostart is disabled or is not enabled for all users:
  - If you are using a Windows operating system, insert the CD into the CD drive and click Start --> Run. In the Open field, type
      x:\win32.bat
    (where x is the drive letter of the CD drive), and click OK.
  - If you are using a Linux operating system, insert the CD into the CD drive; then, run the following command from the /mnt/cdrom directory:
      sh runlinux.sh
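For readers who script their workstation setup, the manual-start procedure above can be summarized as a small Python helper that picks the documented command for the current operating system. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, the default drive letter x is the placeholder used in the procedure, and /mnt/cdrom is the mount point named above. The helper only builds the command; it does not run it.

```python
import sys


def browser_command(platform=sys.platform, cd_drive="x", mount_point="/mnt/cdrom"):
    """Return (argv, working_directory) for starting the Documentation Browser.

    The commands are the ones given in the procedure above; everything else
    (function name, parameters, return shape) is an assumption of this sketch.
    """
    if platform.startswith("win"):
        # Windows: run win32.bat from the root of the CD drive.
        return ([f"{cd_drive}:\\win32.bat"], None)
    if platform.startswith("linux"):
        # Linux: run the script from the directory where the CD is mounted.
        return (["sh", "runlinux.sh"], mount_point)
    raise ValueError(f"no documented start procedure for platform {platform!r}")
```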
Select the server from the Product menu. The Available Topics list displays all the documents for the server. Some documents might be in folders. A plus sign (+) indicates each folder or document that has additional documents under it. Click the plus sign to display the additional documents.
When you select a document, a description of the document appears under Topic Description. To select more than one document, press and hold the Ctrl key while you select the documents. Click View Book to view the selected document or documents in Acrobat Reader or xpdf. If you selected more than one document, all the selected documents are opened in Acrobat Reader or xpdf.
To search all the documents, type a word or word string in the Search field and click Search. The documents in which the word or word string appears are listed in order of the most occurrences. Click a document to view it, and press Ctrl+F to use the Acrobat search function or Alt+F to use the xpdf search function within the document.
Click Help for detailed information about using the Documentation Browser.
Notices and statements used in this document
The caution and danger statements that appear in this document are also in the multilingual Safety Information document, which is on the IBM BladeCenter unit or blade server Documentation CD. Each statement is numbered for reference to the corresponding statement in the Safety Information document.
The following notices and statements are used in the documentation:
• Notes: These notices provide important tips, guidance, or advice.
• Important: These notices provide information or advice that might help you avoid inconvenient or problem situations.
• Attention: These notices indicate potential damage to programs, devices, or data. An attention notice is placed just before the instruction or situation in which damage could occur.
• Caution: These statements indicate situations that can be potentially hazardous to you. A caution statement is placed just before the description of a potentially hazardous procedure step or situation.
• Danger: These statements indicate situations that can be potentially lethal or extremely hazardous to you. A danger statement is placed just before the description of a potentially lethal or extremely hazardous procedure step or situation.
Features and specifications
This section provides a summary of the features and specifications of your blade server. Through the BladeCenter unit management module, you can view the blade server firmware code and other hardware configuration information.
Note: Power, cooling, removable-media drives, external ports, and advanced system management are provided by the IBM Eserver BladeCenter unit. For more information, see the Installation and User’s Guide for your BladeCenter unit.
BladeCenter JS20 specifications for non-NEBS/ETSI environments
The following table provides a summary of the features and specifications of the BladeCenter JS20 Type 8842 in a non-NEBS/ETSI environment. This includes model-specific information.
Microprocessor:
• Two IBM PowerPC® microprocessors with 512 KB ECC L2 cache
Memory:
• Four double-data rate (DDR) PC2700 sockets
• Minimum: 512 MB
• Maximum: 4 or 8 GB (depends on the blade server model) *
IDE devices:
• Support for up to two internal integrated drive electronics (IDE) 2.5-inch hard disk drives, or
• Support for one internal IDE 2.5-inch hard disk drive in IDE connector 1 and one optional I/O expansion card in IDE connector 2
  Note: Installing an I/O expansion card increases network connections.
Size:
• Height: 24.5 cm (9.7 inches)
• Depth: 44.6 cm (17.6 inches)
• Width: 2.9 cm (1.14 inches)
• Maximum weight: 5.4 kg (12 lb)
Integrated functions:
• One dual-port Gigabit Ethernet controller
• Light path diagnostics
• Local service processor
• One IDE hard disk drive controller with two channels
• RS-485 interface for communication with BladeCenter management module
• Serial Over LAN (SOL)
Predictive Failure Analysis (PFA) alerts:
• Microprocessors
• Memory
• Hard disk drives
Environment:
• Air temperature:
  - Blade server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (0 to 3000 ft)
  - Blade server on: 10° to 32°C (50° to 90°F). Altitude: 914 m to 2133 m (3000 ft to 7000 ft)
  - Blade server off: -40° to 60°C (-40° to 140°F)
• Humidity:
  - Blade server on: 8% to 80%
  - Blade server off: 5% to 80%
Electrical input:
• Input voltage: 12 V dc
* For information about dual inline memory module (DIMM) type and supported DIMM size, see “Installing memory modules” on page 77.
Note: The operating system in the blade server must provide USB support for the blade server to recognize and use the CD drive and diskette drive. The BladeCenter unit uses USB for internal communications with these devices.
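As an illustration only, the powered-on environmental limits in the preceding table can be expressed as a simple range check. The following Python sketch is not part of any IBM utility; the function name is hypothetical, and treating an altitude of exactly 914 m as part of the lower altitude band is an assumption, because the table lists 914 m in both bands.

```python
def within_operating_environment(temp_c, altitude_m, humidity_pct):
    """Return True if a powered-on blade server is inside the non-NEBS limits.

    Limits are taken from the table above:
      0 to 914 m:      10 to 35 degrees C
      914 to 2133 m:   10 to 32 degrees C
      humidity (on):   8% to 80%
    """
    if not 8 <= humidity_pct <= 80:
        return False
    if 0 <= altitude_m <= 914:
        # Assumption: 914 m itself is counted in the lower band.
        return 10 <= temp_c <= 35
    if 914 < altitude_m <= 2133:
        return 10 <= temp_c <= 32
    # Altitudes outside 0 to 2133 m are not covered by the table.
    return False
```

A reading of 34°C is therefore acceptable at 500 m but not at 1500 m, where the upper limit drops to 32°C.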
BladeCenter JS20 specifications for NEBS/ETSI environments
The following table provides a summary of the features and specifications of the BladeCenter JS20 Type 8842 in a NEBS/ETSI environment. This includes model-specific information.
Microprocessor:
• Two IBM PowerPC® microprocessors with 512 KB ECC L2 cache
Memory:
• Four DDR PC2700 sockets
• Minimum: 1 GB
• Maximum: 4 or 8 GB (depends on the blade server model) *
IDE devices:
• NEBS application does not support internal drives
Size:
• Height: 24.5 cm (9.7 inches)
• Depth: 44.6 cm (17.6 inches)
• Width: 2.9 cm (1.14 inches)
• Maximum weight: 5.4 kg (12 lb)
Integrated functions:
• One dual-port Gigabit Ethernet controller
• Light path diagnostics
• Local service processor
• One IDE hard disk drive controller with two channels
• RS-485 interface for communication with BladeCenter management module
• Serial over LAN
Predictive Failure Analysis (PFA) alerts:
• Microprocessors
• Memory
Environment (NEBS):
• Air temperature:
  - Blade server on: 5° to 40°C (41° to 104°F). Altitude: -60 to 1800 m (-197 to 6000 ft)
  - Blade server on (short term): -5° to 55°C (23° to 131°F). Altitude: -60 to 1800 m (-197 to 6000 ft)
  - Blade server on: 5° to 30°C (41° to 86°F). Altitude: 1800 to 4000 m (6000 to 13 000 ft)
  - Blade server on (short term): -5° to 45°C (23° to 113°F). Altitude: 1800 to 4000 m (6000 to 13 000 ft)
  - Blade server off: -40° to 70°C (-40° to 158°F)
• Humidity:
  - Blade server on: 5% to 80%
  - Blade server on (short term): 5% to 90% but not to exceed 0.024 kg water/kg of dry air
  - Blade server off: uncontrolled
  Note: "Short term" refers to a period of not more than 96 consecutive hours and a total of not more than 15 days in 1 year. (This refers to a total of 360 hours in any year, but no more than 15 occurrences during that 1-year period.)
Electrical input:
• Input voltage: 12 V dc
* For information about DIMM type and supported DIMM size, see “Installing memory modules” on page 77.
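The "short term" definition in the preceding table combines three limits: no single excursion longer than 96 consecutive hours, no more than 360 hours in total per year, and no more than 15 occurrences per year. The following Python sketch shows one way to check a year of recorded excursions against those limits. The function name and the idea of keeping a list of excursion durations are assumptions made for illustration; the blade server does not provide such a record in this form.

```python
MAX_CONSECUTIVE_HOURS = 96   # longest single short-term excursion
MAX_HOURS_PER_YEAR = 360     # 15 days x 24 hours
MAX_OCCURRENCES_PER_YEAR = 15


def short_term_budget_ok(excursion_hours):
    """Check one year of short-term excursions against the NEBS definition.

    excursion_hours: list of durations, in hours, one entry per excursion.
    """
    return (
        all(hours <= MAX_CONSECUTIVE_HOURS for hours in excursion_hours)
        and sum(excursion_hours) <= MAX_HOURS_PER_YEAR
        and len(excursion_hours) <= MAX_OCCURRENCES_PER_YEAR
    )
```

Under this reading, four 96-hour excursions in one year exceed the 360-hour total even though each one is individually allowed.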
Notes:
1. The operating system in the blade server must provide USB support for the
blade server to recognize and use the CD drive and an external diskette drive. The BladeCenter T unit uses USB for internal communication with these devices.
Chapter 1. Introduction 7
2. BladeCenter JS20 models that are designed for the NEBS environment contain a power-management capability that provides the maximum possible operating time for your system. Power management is invoked only when the blade server is installed in a BladeCenter T unit and only under the short term extended thermal conditions that are described in the preceding table as "short term" in the high end of the NEBS extended temperature range, 40° to 55°C (104° to 131°F). Instead of shutting down or failing in short term extended thermal conditions, the JS20 blade server automatically reduces the frequency of the processor to maintain acceptable thermal levels. The processor frequency automatically returns to normal as thermal conditions improve. The BladeCenter management module is notified when power management starts and again when it stops.
The following entries are made in the event log:
v Frequency throttling process is now active.
(This message indicates that power reduction is in effect.)
v Frequency throttling process is now idling.
(This message indicates that power reduction was previously invoked but is no longer in effect.)
Do not restart the blade server when power reduction is in effect.
3. Some applications are sensitive to processor frequency changes. Check with your application vendors to determine if there are any possible impacts to your applications from the effects of the JS20 blade server power-management capability in the short term extended thermal conditions of the NEBS environment.
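The two event-log messages in note 2 can be used to decide whether power reduction is currently in effect before a restart is attempted. The following Python sketch assumes that the event log has been exported as lines of text in oldest-first order; that export format and the function names are assumptions, and only the two message strings come from this manual.

```python
from typing import Iterable, Optional

# Message text as listed in the event-log description in note 2.
THROTTLE_ACTIVE = "Frequency throttling process is now active."
THROTTLE_IDLE = "Frequency throttling process is now idling."


def power_reduction_in_effect(event_log: Iterable[str]) -> Optional[bool]:
    """Return the most recent throttling state found in the log.

    event_log is assumed to be ordered oldest-first. The result is True if
    the latest throttling entry is "active", False if it is "idling", and
    None if the log contains no throttling entries at all.
    """
    state: Optional[bool] = None
    for entry in event_log:
        if THROTTLE_ACTIVE in entry:
            state = True
        elif THROTTLE_IDLE in entry:
            state = False
    return state


def safe_to_restart(event_log: Iterable[str]) -> bool:
    """Apply the rule: do not restart while power reduction is in effect."""
    return power_reduction_in_effect(event_log) is not True
```

A log that ends with the "idling" message, or that has no throttling entries, is treated as safe for a restart.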
Preinstallation checklist
Before you can use the BladeCenter unit with the blade server, you must correctly set up and configure the BladeCenter unit, and install and configure the required components in the BladeCenter unit. Read Appendix B, “Safety information,” on page 163, and the information in “Installation guidelines” on page 71, and review the documentation that comes with each device and any applicable information in the “Related documentation” on page 3.
If you have not already done so, perform the activities on the following checklist:
__ 1. Set up the rack in which you will install the BladeCenter unit.
__ 2. Install the BladeCenter unit in a rack. For additional information, see the Rack Installation Instructions that come with the BladeCenter unit.
__ 3. Install and configure the required BladeCenter unit components:
  __ a. Make sure that the BladeCenter unit has adequate power to support all the installed devices. The BladeCenter unit must contain either two or four power modules. If necessary, on BladeCenter units that support it, upgrade the power modules in the BladeCenter unit to higher-capacity power modules. For additional information, see the IBM Eserver BladeCenter Power Module Upgrade Guidelines Technical Update.
  __ b. Install and configure one or two management modules in the BladeCenter unit.
  __ c. Install and configure one or two Ethernet switch modules in the BladeCenter unit. To support the Serial Over LAN (SOL) feature on any blade server that is installed in the BladeCenter unit:
    v A SOL-compatible Ethernet switch module must be installed in I/O bay 1 of the BladeCenter unit.
    v Both the BladeCenter unit and the Ethernet switch module must be configured so that the SOL feature is enabled and set to operate on the same virtual local area network (VLAN).
    If you plan to install the operating system through the Ethernet network, you also must install and configure a second Ethernet switch module in I/O bay 2 of the BladeCenter unit.
    Note: If you install other Ethernet switch modules, they do not have to be the same type that you installed in I/O bay 1 of the BladeCenter unit.
  __ d. Configure the BladeCenter unit for SOL operation as described in the IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide. Verify that the firmware code for the BladeCenter unit, management module, and Ethernet switch modules supports the SOL feature. If you are not sure whether these devices come with this feature, see the IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide for additional information.
    The SOL feature is required and must remain enabled for all applicable devices, including the BladeCenter unit, management module, and Ethernet switch modules.
__ 4. If the BladeCenter unit was shipped to you before June 2003, make sure that:
  __ a. The hardware and firmware in the BladeCenter unit are at the supported levels for the blade server. Go to the IBM Support Web site, http://www.ibm.com/support/, for additional information.
  __ b. The BladeCenter unit has the correct customer interface card (CIC) (see “Checking the status of the media tray”).
For illustrations and additional information, see the following related documentation on the World Wide Web at http://www.ibm.com/support/:
v BladeCenter Type 8677 Rack Installation Instructions
v BladeCenter Type 8677 Installation and User’s Guide
v BladeCenter T Types 8720 and 8730 Installation and User’s Guide
v BladeCenter T 2-Post Rack Mount Kit Installation Instructions
v BladeCenter T 4-Post and Universal Telco Frame (UTF) Rack Mount Kit Installation Instructions
v IBM Eserver BladeCenter Power Module Upgrade Guidelines Technical Update
v BladeCenter Management Module Installation Guide
v BladeCenter T Management Module Installation Guide
v BladeCenter and BladeCenter T Management Module Command-Line Interface Reference Guide
v IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide
v The documentation that comes with the Ethernet switch module that you are using; for example:
  - IBM 4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s Guide
  - Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation Guide
For more information, see “Related documentation” on page 3.
Checking the status of the media tray
Note: If you received a BladeCenter unit other than a Type 8677, this topic does not apply.
Important: If you received a Type 8677 BladeCenter unit before June 2003, the customer interface card (CIC) in the media tray of the BladeCenter unit might need to be replaced before the CD drive will work correctly with a BladeCenter JS20 Type 8842.
If you received a Type 8677 BladeCenter unit before June 2003, start the management-module Web interface and perform these steps to determine if the CIC in your BladeCenter unit needs to be replaced:
1. In the navigation pane on the left side, select Monitors; then, select Hardware VPD.
2. While looking at the “BladeCenter Hardware VPD” table in the right pane, find the row for module name “Media Tray”.
3. Check the “FRU Number” column for the “Media Tray”.
4. If you see 59P6629, have the CIC replaced before installing a BladeCenter JS20 Type 8842 in the BladeCenter unit.
To have the CIC replaced, call the IBM Support Center and report the CIC as a failed part and request replacement with the latest CIC field replaceable unit (FRU).
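The FRU-number check in the preceding steps can be expressed as a short test. The following Python sketch assumes that the rows of the “BladeCenter Hardware VPD” table have been copied into dictionaries; the key names "Module Name" and "FRU Number" and the function name are assumptions for illustration, and only the FRU number 59P6629 comes from this procedure.

```python
from typing import Dict, List

# Media tray FRU number that indicates the CIC must be replaced (from step 4).
OLD_CIC_MEDIA_TRAY_FRU = "59P6629"


def cic_needs_replacement(vpd_rows: List[Dict[str, str]]) -> bool:
    """Return True if the Media Tray row shows the FRU number 59P6629.

    vpd_rows is a hypothetical, hand-copied representation of the hardware
    VPD table, one dictionary per table row.
    """
    for row in vpd_rows:
        if row.get("Module Name") == "Media Tray":
            return row.get("FRU Number", "").strip() == OLD_CIC_MEDIA_TRAY_FRU
    raise LookupError("no Media Tray row found in the hardware VPD table")
```

If the function returns True, have the CIC replaced before installing the blade server, as described above.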
Chapter 2. Blade server power, controls, and indicators
This chapter describes the power features, how to turn on and turn off the blade server, and what the controls and indicators mean.
Turning on the blade server
Important: To generate faster blade-server startups from the network, connect the
dynamic host configuration protocol (DHCP) server to the Ethernet switch module in I/O bay 2 in the BladeCenter unit. The system firmware code in the blade server detects this Ethernet controller first. The Ethernet controller in each blade server is then associated with the switch module in I/O bay 2.
Notes:
v After you connect the power cords of the BladeCenter unit to the electrical
outlets, wait until the power-on LED on the blade server flashes slowly before pressing the blade server power-control button. Before the LED flashes, the service processor in the BladeCenter management module is initializing, and the power-control button on the blade server will not respond.
v While the blade server is powering up, the power-on LED on the front of the
server is lit. See “Blade server controls and LEDs” on page 14 for the power-on LED states.
v After an orderly shutdown of the operating system occurs, the Wake on LAN
feature is permanently enabled in the blade server system firmware. Therefore,
Enabled is the default setting. The Wake on LAN setting for each blade server is
stored in the management-module nonvolatile random-access memory (NVRAM). To disable the Wake on LAN feature for one or more blade servers, use the BladeCenter management-module Web interface. For more information about the BladeCenter management-module Web interface, see the BladeCenter and
BladeCenter T Management Module User’s Guide on the IBM BladeCenter Documentation CD.
v Throughout this document, the management-module Web-based user interface is
also known as the BladeCenter management-module Web interface.
After you connect the BladeCenter unit to power, the blade server can start in any of the following ways:
v You can press the power-control button on the front of the blade server (behind
the control panel door) to start the server.
v If a power failure occurs, the BladeCenter unit and then the blade server can
start automatically when power is restored (if the blade server is configured through the BladeCenter management module to do so).
v You can turn on the blade server remotely by means of the service processor in
the BladeCenter management module.
v If the operating system supports the Wake on LAN feature, the feature has not been disabled through the BladeCenter management-module Web interface, and the blade server power-on LED is flashing slowly, the Wake on LAN feature can turn on the blade server.
Turning off the blade server
When you turn off the blade server, it is still connected to power through the BladeCenter unit. The blade server can respond to requests from the service processor, such as a remote request to turn on the blade server. To remove all power from the blade server, you must remove it from the BladeCenter unit.
Shut down the operating system before you turn off the blade server. See the operating-system documentation for information about shutting down the operating system.
If the blade server has not been turned off, it can be turned off in any of the following ways:
v You can press the power-control button on the blade server (behind the control
panel door). This starts an orderly shutdown of the operating system, if this feature is supported by the operating system.
Note: After turning off the blade server, wait at least 5 seconds before you press
the power-control button to turn on the blade server again.
v If the operating system stops functioning, you can press and hold the
power-control button for more than 4 seconds to turn off the blade server.
v The management module can turn off the blade server.
Note: After turning off the blade server, wait at least 30 seconds for its hard disk drives to stop spinning before you remove the blade server from the BladeCenter unit.
Blade server controls and LEDs
This section describes the controls and LEDs on the blade server.
Power-control button: This button is behind the control panel door. Press this
button to manually turn the blade server on or off.
Note: The power-control button has effect only if local power control is enabled for
the blade server. Local power control is enabled and disabled through the BladeCenter management-module Web interface.
[Figure: blade server control panel, showing the power-control button, blade-error LED, information LED, location LED, activity LED, power-on LED, and CD/diskette/USB select button]
Notes:
1. The blade-error LED, information LED, and location LED can be turned off through the BladeCenter management-module Web interface.
2. For additional information about errors, see “Light path diagnostics” on page 46.
3. This blade server does not have a keyboard/mouse/video select button.
Blade-error LED: When this amber LED is lit, it indicates that a system error has
occurred in the blade server.
Information LED: When this amber LED is lit, it indicates that information about a
system error for this blade server has been placed in the BladeCenter system-error log.
Location LED: When this blue LED is lit, it has been turned on remotely by the
system administrator to aid in visually locating the blade server. The location LED on the BladeCenter unit will be lit also.
Activity LED: When this green LED is lit, it indicates that there is hard disk drive or
network activity.
Power-on LED: This green LED indicates the power status of the blade server in the following manner:
v Flashing rapidly: The service processor on the blade server is communicating with the BladeCenter management module.
v Flashing slowly: The blade server has power but is not turned on.
v Lit continuously (steady): The blade server has power and is turned on.
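The power-on LED states can be summarized as a lookup table. The following Python sketch is an illustration only; the class and function names are hypothetical, and the meanings are copied from the list above.

```python
from enum import Enum


class PowerOnLed(Enum):
    """Observed states of the green power-on LED."""
    FLASHING_RAPIDLY = "flashing rapidly"
    FLASHING_SLOWLY = "flashing slowly"
    STEADY = "lit continuously (steady)"


POWER_ON_LED_MEANING = {
    PowerOnLed.FLASHING_RAPIDLY: (
        "The service processor on the blade server is communicating "
        "with the BladeCenter management module."
    ),
    PowerOnLed.FLASHING_SLOWLY: "The blade server has power but is not turned on.",
    PowerOnLed.STEADY: "The blade server has power and is turned on.",
}


def power_button_can_turn_on(state: PowerOnLed) -> bool:
    """Return True only when the LED flashes slowly.

    This reflects the note in "Turning on the blade server": wait until the
    LED flashes slowly before pressing the power-control button. Local power
    control must also be enabled, which this sketch does not check.
    """
    return state is PowerOnLed.FLASHING_SLOWLY
```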
Chapter 3. Configuration
The firmware in the blade server uses auto-configuration; therefore, additional blade-server configuration programs are not required for the blade server. However, if you have attached other devices to the blade server or the BladeCenter unit, you must configure those devices as described in the applicable documentation that comes with those devices or the BladeCenter unit. You do not have to set any passwords to use the blade server. If you change the battery or replace the system board, you must reset the date and time through the operating system.
You must establish a Serial Over LAN (SOL) connection and start an SOL session on the blade server:
v To establish a communications channel between the blade server and a compatible monitor (or video console), keyboard, and mouse
v To install the operating system on the blade server
v To configure the SOL feature
v To run diagnostics programs
v To have the blade server serviced
For information relating to establishing an SOL connection, enabling the SOL feature, and configuring the blade server so that you can run SOL sessions and use the BladeCenter management-module command-line interface, see the following documents on the IBM BladeCenter Documentation CD:
v IBM Eserver BladeCenter JS20 Installation and User’s Guide
v IBM Eserver BladeCenter and BladeCenter T Management Module Command-Line Interface Reference Guide
Other documents on the IBM BladeCenter Documentation CD that you might find useful in the configuration process are:
v IBM 4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s Guide
v Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation Guide
For information about setting up the network configuration for remote management, see the IBM Eserver BladeCenter Planning and Installation Guide or the IBM Eserver BladeCenter T Planning and Installation Guide. You can obtain the planning guide from the Web site at http://www.ibm.com/pc/support.
To support the SOL feature and to configure the blade server, you must install a compatible Ethernet switch module in I/O bay 1 of the BladeCenter unit. Examples of compatible Ethernet switch modules are the IBM 4-Port Gb Ethernet Switch Module for BladeCenter and the Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter. For more information about these switch modules, see the IBM
4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s Guide or Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation Guide on the IBM BladeCenter Documentation CD. Information is also available in:
v Chapter 6, “Running a Serial Over LAN session,” on page 33
v The IBM Eserver BladeCenter and BladeCenter T Management Module Command Line Interface Reference Guide on the IBM BladeCenter Documentation CD
Note: The BladeCenter unit supports up to four Ethernet switch modules.
The SOL feature is accessed through the Management Module Command-Line Interface. For information about using the command-line interface, see “Using the command-line interface” and the IBM Eserver BladeCenter and BladeCenter T
Management Module Command Line Interface Reference Guide on the IBM BladeCenter Documentation CD.
Using the command-line interface
The IBM Eserver BladeCenter Management Module Command-Line Interface provides direct access to BladeCenter management functions as an alternative to using the Web interface. Using the command-line interface, you can issue commands to control the power and configuration of the blade server and other components installed in the BladeCenter unit. The command-line interface also provides access to the text-console command prompt for the blade server through an SOL connection. See the IBM Eserver BladeCenter and BladeCenter T
Management Module Command Line Interface Reference Guide on the IBM BladeCenter Documentation CD for information and instructions.
Configuring the Gigabit Ethernet controller
One dual-port Gigabit Ethernet controller is integrated on the blade server system board. Each controller port provides a 1000-Mbps full-duplex interface for connecting to one of the Ethernet-compatible switch modules in I/O bays 1 and 2, which enables simultaneous transmission and reception of data on the Ethernet local area network (LAN). Each Ethernet controller port on the system board is routed to a different switch module in I/O bay 1 or bay 2. The routing from the Ethernet controller port to the I/O bay will vary based on blade server type and the operating system that is installed. See “Blade server Ethernet controller enumeration” on page 19 for information about how to determine the routing from the Ethernet controller ports to I/O bays for the blade server.
Note: Other types of blade servers, such as the BladeCenter HS20 Type 8678, that
are installed in the same BladeCenter unit as this BladeCenter JS20 Type 8842 might have different requirements for Ethernet controller routing. See the documentation that comes with the other blade servers for detailed information.
You do not need to set any jumpers or configure the controllers for the blade server operating system. However, you must install a device driver to enable the blade server operating system to address the Ethernet controller ports. For device drivers and information about configuring the Ethernet controller ports, see the Ethernet software documentation that comes with the blade server, or contact your reseller or IBM marketing representative. For updated information about configuring the controllers, go to the IBM Support Web site at http://www.ibm.com/support/.
The Ethernet controller supports failover, which provides automatic redundancy for the Ethernet controller ports. Without failover you can have only one Ethernet controller port from each server attached to each virtual LAN or subnet. With failover you can configure more than one Ethernet controller port from each server to attach to the same virtual LAN or subnet. Either one of the integrated Ethernet controller ports can be configured as the primary Ethernet controller port. If you have configured the controller ports for failover and the primary link fails, the secondary controller port takes over. When the primary link is restored, the Ethernet
traffic switches back to the primary Ethernet controller port. (See the operating-system device driver documentation for information about configuring for failover.)
Important: To support failover on the blade server Ethernet controller, the Ethernet
switch modules in the BladeCenter unit must have identical configurations to each other.
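The failover behavior described in this section (the secondary controller port takes over when the primary link fails, and traffic switches back when the primary link is restored) can be modeled in a few lines. The following Python sketch is a simplified model for illustration only; it is not the device-driver implementation, the class name is hypothetical, and the result when both links are down is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverPair:
    """Link state of the two integrated Ethernet controller ports."""
    primary_link_up: bool = True
    secondary_link_up: bool = True

    def active_port(self) -> Optional[str]:
        """Return which port carries traffic under the failover rule.

        The primary port is always preferred while its link is up, so
        traffic switches back as soon as the primary link is restored.
        Returning None when both links are down is an assumption.
        """
        if self.primary_link_up:
            return "primary"
        if self.secondary_link_up:
            return "secondary"
        return None
```

For example, setting primary_link_up to False moves traffic to the secondary port, and setting it back to True returns traffic to the primary port.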
Blade server Ethernet controller enumeration
The enumeration of the Ethernet controllers or controller ports in a blade server is operating-system dependent. You can verify the Ethernet controller or controller port designations that a blade server uses through the operating-system settings.
The routing of an Ethernet controller or controller port to a particular BladeCenter unit I/O bay depends on the type of blade server. You can verify which Ethernet controller port in this blade server is routed to which I/O bay by using the following test:
1. Install only one Ethernet switch module or pass-thru module in I/O bay 1.
2. Make sure that the ports on the switch module or pass-thru module are enabled (I/O Module Tasks → Management → Advanced Management in the BladeCenter management-module Web interface).
3. Enable only one of the Ethernet controller ports on the blade server. Note the designation that the blade server operating system has for the controller port.
4. Ping an external computer on the network connected to the Ethernet switch module. If you can ping the external computer, the Ethernet controller port that you enabled is associated with the switch module in I/O bay 1. The other Ethernet controller port in the blade server is associated with the switch module in I/O bay 2.
If you have installed an I/O expansion card on a blade server, communications from the option are routed to I/O bays 3 and 4. You can verify which controller port on the card is routed to which I/O bay by performing the above test, using a controller on the I/O expansion card and a compatible switch module or pass-thru module in I/O bay 3 or 4.
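The ping test for the integrated Ethernet controller ports can be scripted from the blade server operating system. The following Python sketch is an illustration only: the function names are hypothetical, the port names in the usage note are examples, and the sketch assumes that a ping command accepting the -c count option is available (as on Linux and AIX). The manual describes only the successful case, so a failed ping is treated here as inconclusive.

```python
import subprocess
from typing import Dict, Optional


def ping_once(host: str) -> bool:
    """Send one echo request to host and report whether it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def infer_port_routing(
    enabled_port: str, other_port: str, ping_succeeded: bool
) -> Optional[Dict[str, int]]:
    """Map each integrated controller port to an I/O bay number.

    Preconditions from the test procedure: only I/O bay 1 holds a switch or
    pass-thru module, and only enabled_port is enabled on the blade server.
    Returns None when the ping failed, because the result is inconclusive.
    """
    if not ping_succeeded:
        return None
    return {enabled_port: 1, other_port: 2}
```

For example, infer_port_routing("ent0", "ent1", ping_once(external_host)) returns the port-to-bay mapping when the external computer at external_host answers.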
Chapter 4. Problem determination procedures for AIX and Linux
This chapter outlines the procedure to follow if the server suspends operation without notice.
Use the following procedure if any of the following is true:
v The console displays:
  - an SRN/SRC code
  - an 8-digit firmware error code
  - a 3- or 4-digit firmware checkpoint (progress) code
v The server does not start up after installation
v The server experiences an undetermined error while running, such as if the server stops running with no error code displayed
Certain errors listed in the SRN/SRC table, Failing Function Code table, and Symptom-to-FRU index will also direct you to perform the diagnostic procedure based on the operating system and the type of problem.
Problem determination
Complete the steps in this section to perform problem determination.
Step 001 Check for the following information:
1. If a firmware checkpoint (progress) code (3 or 4 digits) is displayed, see “Firmware checkpoint (progress) codes” on page 94.
2. If a firmware error code (8 digits) is displayed, see “Firmware error codes” on page 102.
3. If you have an SRN or SRC, see “SRN tables” on page 110.
4. Check the BladeCenter management module event log. If an error was recorded by the system, see “SRN tables” on page 110.
5. Check the blade error LED on the information LED panel; if it is lit, see “Light path diagnostics LEDs” on page 145.
6. If the blade server has stalled, with no error codes and no command line or login prompt, continue to Step 002.
7. If the login prompt appears and you still suspect a problem, continue to Step 002.
8. If you have none of the above symptoms, go to “Undetermined problems” on page 156.
Step 002 Perform the following steps:
1. Turn off the server, making sure to first turn off all external devices, if attached.
2. Check all cables and power cords.
3. Turn on all external devices; then, turn on the blade server.
4. Start the Serial Over LAN (SOL) console for the blade server to be tested and check for the following responses:
a. Progress codes are displayed on the console.
b. AIX or Linux login prompt appears.
Step 003 Record any error messages or codes that are displayed on the screen. If the last error is a POST or firmware error (8-digit) code, look up the error in “Firmware error codes” on page 102. If the last error is a firmware checkpoint (progress) code (3 or 4 digits), see “Firmware checkpoint (progress) codes” on page 94.
Step 004 Check the BladeCenter management module event log. If an error was recorded by the system, see Chapter 10, “Symptom-to-FRU index,” on page 93.
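The first three checks in Step 001 sort a displayed code by its length. The following Python sketch shows that sorting as an illustration only; the function name is hypothetical. It assumes that the "digits" are hexadecimal (progress codes such as E14D appear later in this manual), and it treats any other non-empty value as an SRN or SRC, which is a simplification because the SRN format is not described in this section.

```python
import re


def section_for_code(displayed: str) -> str:
    """Return the manual section to consult for a displayed code."""
    code = displayed.strip()
    if not code:
        # No code at all: none of the code-based checks apply.
        return "Undetermined problems"
    if re.fullmatch(r"[0-9A-Fa-f]{3,4}", code):
        return "Firmware checkpoint (progress) codes"
    if re.fullmatch(r"[0-9A-Fa-f]{8}", code):
        return "Firmware error codes"
    # Simplification: anything else is handled as an SRN or SRC.
    return "SRN tables"
```

A 4-digit value such as E14D is directed to the checkpoint codes, and an 8-digit value is directed to the firmware error codes.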
Obtaining an SRN/SRC or error code
Perform the steps in this section to get a service request number (SRN), or to obtain an SRC or error code.
Step 001 Check for the following information:
1. If a firmware checkpoint (progress) code (3 or 4 digits) is displayed, see “Firmware checkpoint (progress) codes” on page
94.
2. If a firmware error code (8 digits) is displayed, see “Firmware error codes” on page 102.
3. If you have an SRN or SRC, see “SRN tables” on page 110.
4. Check the BladeCenter management module event log. If an error was recorded by the system, continue with Step 002; then, see “SRN tables” on page 110.
5. If the login prompt appears and you still suspect a problem, continue to Step 002.
6. If you have none of the above symptoms, go to “Undetermined problems” on page 156.
002 Visually check the system for obvious problems such as unplugged
Step
power cables or external devices that are powered off. Did you find an obvious problem?
No Go to Step 003. Yes Fix the problem.
003 Is the operating system AIX?
Step
Yes Record any information or messages that may be provided
on the system console and then go to Step 005 to perform problem determination procedures.
No Go to Step 004.
Step 004 Is the operating system Linux?
Yes Record any information or messages that may be provided
on the system console; then go to Step 007, to perform standalone diagnostics. If you cannot load standalone diagnostics, then answer this question No.
No Go to “Undetermined problems” on page 156.
Step 005 Perform the following procedure for problem determination.
Note: When possible, run AIX Online Diagnostics in Concurrent
Mode. AIX Online diagnostics perform additional functions, compared to standalone diagnostics CD.
1. Perform the AIX online concurrent mode diagnostics for Problem Determination, see “Performing AIX online concurrent mode diagnostics for problem determination” on page 25. Record any diagnostic results and utilize the “SRN tables” on page 110 to identify the failing component.
Note: If you have replaced the failing component then to verify
the repair go to “Verifying the replacement part using AIX diagnostics” on page 30.
2. If you cannot perform AIX concurrent online diagnostics then continue to Step 006.
Step 006 Perform the following steps:
1. Turn off the system unit power and wait 45 seconds before proceeding.
2. Turn on the system unit power.
3. Start the Serial Over Lan (SOL) console for the blade server to be tested and check for the following responses:
a. Progress codes are displayed on the console. b. Record any messages or diagnostic information that may be
displayed on the system console.
Load the standalone diagnostics CD-ROM. Go to “Running the
4. standalone diagnostics from CD-ROM” on page 25 or “Running standalone diagnostics from a management (NIM) server” on page 68.
5. If you have replaced the failing component then to verify the repair go to “Verifying the replacement part using AIX diagnostics” on page 30.
6. If you are still having a problem or think you still have a problem call the support center.
Step 007 If the operating system is Linux then perform the following:
This ends the AIX procedure.
1. Turn off the system unit power and wait 45 seconds before proceeding.
2. Turn on the system unit power.
3. Start the Serial Over Lan (SOL) console for the blade server to be tested and check for the following responses:
a. Progress codes are displayed on the console. b. Record any messages or diagnostic information that may be
displayed on the system console.
Continue with step 008.
Chapter 4. Problem determination procedures for AIX and Linux 23
Step 008 Load the Standalone Diagnostics in Service Mode. Refer to
“Running the standalone diagnostics from CD-ROM” on page 25 or “Running standalone diagnostics from a management (NIM) server” on page 68.
Can you load the standalone diagnostics?
No Go to “Undetermined problems” on page 156. If you still
have a problem then Call to get additional support.
Yes Select the resources to be tested and record any diagnostic
information (SRNs or SRC Error codes) and go to “SRN tables” on page 110.
This ends the Linux procedure.
Chapter 5. AIX online, standalone and verification procedures
This chapter describes the procedures for performing online concurrent and CD-based diagnostics, and replacement part verification for an AIX operating system.
Performing AIX online concurrent mode diagnostics for problem determination
Perform the following steps to run online diagnostics in concurrent mode:
1. Log in to the AIX operating system as root user, or use CE login. If you need help, contact the system operator.
2. Enter the diag command to load the diagnostic controller, and display the online diagnostic menus.
3. At the Function Selection menu, select Diagnostic Routines.
4. At the diagnostic mode selection menu, select problem determination.
5. When the Diagnostic Selection menu is displayed, choose a resource or all resources to be tested.
6. After selecting the resource that you want to test, select Commit (PF7 = F7). The resources will then be tested.
7. If any SRNs or firmware error codes are displayed, record all information provided from the diagnostic results; then, go to the SRN tables. If No trouble found is displayed, continue to the next step.
8. When testing is complete, press F3 to return to the Diagnostic Operating Instructions display.
9. Press Ctrl+D to log off from root user or CE login.
10. When finished, contact your hardware service provider with any information you received during the diagnostics, including service request numbers (SRNs) or firmware error codes.
Running the standalone diagnostics from CD-ROM
The AIX diagnostics can be downloaded from the World Wide Web at http://techsupport.services.ibm.com/server/mdownload/diags/download.
Note: Select the diagnostics that indicate JS20 support.
To run standalone diagnostics in service mode, complete the following steps:
Step 001 Verify with the system administrator and systems users that the
system unit may be shut down. Stop all programs including the operating system (refer to the AIX operating system documentation for information on the shutdown command). Make the CD drive available to the system on which you want to run standalone diagnostics (see the BladeCenter Management Module Operations Guide for more information).
Step 002 Before attempting to load standalone CD diagnostic, make sure you
are at the latest firmware level (xx.xx) before continuing.
Step 003 Log in to the management module. Step 004 Enable SOL for the JS20 Blade to be tested.
© Copyright IBM Corp. 2003 25
Step 005 Select CD-ROM as the first device to be booted from the
configuration menu boot sequence.
Step 006 On the operator panel on the blade to be tested, press the CD
button to assign the CD-ROM to the blade to be tested; then, insert the diagnostic CD into the CD drive.
Step 007 Turn on the blade to be tested.
Note: This could take from 3 to 5 minute to load diagnostic from
CD. Please be patient.
Progress codes will stop at E1AD; look at the CD drive activity LED (flashing). Standalone diagnostics are booting. This may take 2 to 3 minutes. During this time the system may reboot and the progress code will stop at E14D again.
Note: Only firmware progress codes, not AIX progress codes, are
displayed when booting from the diagnostic CD or booting AIX.
The screen will display “Welcome to AIX”.
Note: Once you have the “Welcome to AIX” screen, another 3 to 5
minutes may be required to get to the next screen; please be patient.
A console message will display:
Standalone Diagnostic has completed loading. Please remove the Diagnostic CD from the Tray.
You can leave the CD installed at this point or remove the CD (256MB RAM required to remove CD at this point). The diagnostic application is now loaded into memory and the CD is no longer required.
The screen will display “Please define the system console”. Follow the instructions on the screen to define the system console. A choice of vs100 as the system console is recommended.
Notes:
v At this point, you can follow the instructions to run standalone
diagnostics or service aids from CD. Once you are done, you can press the F10 key to exit diagnostics.
v The operating system will not be available until the system is
rebooted with the diagnostics CD removed from the CD drive, or with a different startup drive selected.
Step 008 At the Diagnostic Operating Instructions screen, press Enter.
Step 009 From the Function Selection screen, use the up and down arrow
keys to select the function to be performed. Select the type of terminal to be defined from the list provided at the prompt, for example, type vs100.
Note: Use vs100 as the type of terminal; however, depending on
the terminal emulator selected, the function keys (PF#) may
not function. In this case, use the F# function keys or press Esc and the number in the screen menus. For example, for PF3 you can press F3 or you can press the Esc key and the #3 key.
010
Step
v Select Diagnostic Routine and, if attempting to run diagnostics in
Problem Determination, go to Step 011.
v Select Diagnostic Routine and, if attempting to run diagnostics in
System Verification, go to Step 012.
v Select Task Selection if attempting to perform diagnostic service
aids (for example, Display Hardware Error Report); then, go to Step 013.
Step 011 Problem determination.
From the Function Selection screen, use the up or down arrow keys to select Diagnostic Routines; then, press Enter.
1. Press 1, then press Enter. From the diagnostic selection menu, use the up or down arrow keys to select Problem Determination.
2. Select the resource to be tested and press the Commit (PF) key.
3. Record any results provided and go to the “SRN tables” on page 110 to identify the failure and perform the action(s).
4. When testing is complete, use the F3 or the Esc and #3 keys to return to the Diagnostic Selection screen. If you want to run another test, press F3 or the Esc and #3 keys again to return to the Function Selection screen.
5. If you want to exit standalone diagnostics, select the exit function (F10 key) from the menu.
Step 012 System Verification.
From the Function Selection screen, use the up or down arrow keys to select Diagnostic Routines and press Enter.
1. Press 1, then press Enter.
2. From the Diagnostic Selection menu, use the up or down arrow keys to select System Verification.
3. Select the resource to be verified or select All Resources, and press the Commit (PF) key.
4. Record any results provided; then, go to the “SRN tables” on page 110 to identify the failure and perform the actions.
5. When testing is complete, use the F3 or the Esc and #3 keys to return to the Diagnostic Selection screen; then, press F3 or the Esc and #3 keys again to return to the Function Selection screen if you want to verify another component.
Step 013 Task selection.
From the Function Selection screen use the up or down arrow keys to select Task Selection; then, press Enter.
1. Use the up or down arrow keys to select the task to be run; then, press Enter.
2. Follow the instructions for the task selected.
3. When the task is completed, press F3 or the Esc and #3 keys to return to the Task Selection screen.
4. If you want to run another task, select the task to be performed.
From the Task Selection list, select the service aid task you want to perform; for example, Update and Manage System
Flash (see “Tasks (service aids)” on page 63).
5. After a task is selected, a resource menu might be presented showing all resources supported by the task.
6. When you are finished with Task Selection, press F3 or the Esc and #3 keys to return to the Function Selection screen, or press F10 to exit.
Step 014 Once you have completed the “Please ensure that you reset the boot list” screen, remove the CD if it is still in the CD drive, and make sure that you restore the original boot list that had been defined by the user.
Step 015 If you are still having a problem or think you still have a problem,
call the support center.
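The note in Step 009 says that when a terminal emulator does not pass PF keys through, you can press Esc followed by a digit instead. As a minimal sketch for anyone scripting a terminal session, the substitution can be written as below. The function name is hypothetical, and the mapping of key 10 to Esc+0 is an assumption based on the F10 exit key and the “Esc+0=Exit” legend shown on the diagnostic screens.

```python
ESC = b"\x1b"  # ASCII escape character


def esc_key_sequence(pf_number: int) -> bytes:
    """Return the Esc+digit byte sequence that stands in for a PF/F key.

    Assumption: keys 1-9 map to Esc+1 .. Esc+9 and key 10 maps to Esc+0.
    """
    if not 1 <= pf_number <= 10:
        raise ValueError("this sketch only covers PF1 through PF10")
    return ESC + str(pf_number % 10).encode("ascii")


print(esc_key_sequence(3))   # stand-in for PF3 / F3
print(esc_key_sequence(10))  # stand-in for F10
```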
Performing AIX online concurrent mode diagnostics for previous diagnostic results: service aids
Complete the following steps to display previous diagnostic results from online diagnostics in concurrent mode:
1. Log in to the AIX operating system as root user, or use CE login. If you need help, contact the system operator.
2. Enter the diag command to load the diagnostic controller, and display the online diagnostic menus.
3. At the Function Selection menu, select Task Selection.
4. At the Task Selection List menu, select Display Previous Diagnostic Results.
5. At the Previous Diagnostic Results menu, select Display Diagnostic Log Summary.
The diagnostic log will be shown with a time-ordered table of events from the error log. Look in the ’T’ column for the most recent entry that is an ’S’ type of entry.
Press Enter to select that row in the table; then, choose Commit. The details of this entry from the table will be displayed; look for the SRN entry shown near the end of the entry and record the information shown. Example:
IDENTIFIER: DATE
Date/Time: Fri Jul 16 04:06:09
Sequence Number: 287
Event type: SRN Callout
Resource Name: sysplanar0
Resource Description: System Planar
Location: 00-00
Diag Session: 12736
Test Mode: No Console,Non-Advanced,Normal IPL,ELA,Option Checkout
Error Log Sequence Number: 3
Error Log Identifier: BFE4C025
SRN: 2B276422
Description: Refer to the Error Code to FRU Index in the system service guide.
Probable FRUs:
n/a FRU: 09P0406 P1-C1
_______________________________
[BOTTOM]
Use Enter to continue.
Esc+3=Cancel Esc+0=Exit Enter
6. If any SRNs are displayed, record all information provided from the diagnostic results; then, go to “SRN tables” on page 110 or “Firmware error codes” on page 102. If No trouble found is displayed, continue to the next step.
7. When results are complete, press F3 to return to the Diagnostic Operating
Instructions display.
8. Press Ctrl+D to log off from root user or CE login.
9. When finished, contact your hardware service provider with any information you received during the diagnostics, including service request numbers (SRNs) or firmware error codes.
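If you capture the text of a diagnostic log entry (for example, from a terminal log), the fields to record in step 5 can be pulled out mechanically. The following is a minimal sketch, assuming the captured entry uses the “Name: value” layout of the example in this section. The function name and the choice of fields are illustrative only and are not part of AIX diagnostics.

```python
import re

# Excerpt of the example entry shown in this section.
SAMPLE_ENTRY = """\
Event type: SRN Callout
Resource Name: sysplanar0
Error Log Identifier: BFE4C025
SRN: 2B276422
Description: Refer to the Error Code to FRU Index in the system service guide.
"""


def extract_fields(entry_text, names=("Resource Name", "SRN")):
    """Collect the first value following each 'Name:' label in the text."""
    found = {}
    for name in names:
        match = re.search(rf"{re.escape(name)}:\s*(\S+)", entry_text)
        if match:
            found[name] = match.group(1)
    return found


print(extract_fields(SAMPLE_ENTRY))
```

The search looks for the label followed by a colon, so the “SRN Callout” event type is not mistaken for the SRN value.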
Performing AIX online concurrent mode diagnostics for system verification
Note: This procedure is for verifying newly-installed components, such as hard disk
drives, options, etc.
Perform the following steps to run online diagnostics in concurrent mode:
1. Log in to the AIX operating system as root user, or use CE login. If you need help, contact the system operator.
2. Enter the diag command to load the diagnostic controller, and display the online diagnostic menus.
3. At the Function Selection menu, select Diagnostic Routines.
4. At the Diagnostic Mode Selection menu, select System Verification.
5. When the Diagnostic Selection menu is displayed, choose the resource to be tested or all resources to be tested.
6. After selecting the resource, select Commit (PF7 or F7). The resources will then be tested.
7. If any SRNs or firmware error codes are displayed, record all information provided from the diagnostic results; then, go to the “SRN tables” on page 110 or “Firmware error codes” on page 102.
If No trouble found is displayed, continue to the next step.
8. When testing is complete, press F3 to return to the Diagnostic Operating
Instructions display.
9. Press Ctrl+D to log off from root user or CE login.
When finished, contact your hardware service provider with any information you received during the diagnostics, including service request numbers (SRNs) or firmware error codes.
Verifying the replacement part using AIX diagnostics
Complete the following steps to verify the replacement part using AIX diagnostics:
1. Did you use an AIX or online diagnostics service aid hot-swap operation to replace the part?
Note: When you are not sure, answer this question No.
No Go to Step 2.
Yes Go to Step 4 on page 31.
Note: Hot plug is currently not supported on the JS20.
2. Follow these steps:
a. Start the system.
b. Wait until the AIX operating system login prompt displays or until apparent system activity on the operator panel or display has stopped.
Did the AIX login prompt display?
No If an SRN displays, suspect a loose adapter or cable connection. Review the procedures for the part that you replaced to ensure that the new part is installed correctly. If you cannot correct the problem, collect all SRNs or any other reference code information that you see. Contact your service provider for assistance.
Note: If you received an SRN or any other reference code when you attempted to start the system, you can learn more about these codes in the “SRN tables” on page 110 or “Firmware error codes” on page 102.
This ends the procedure.
Yes Go to Step 3.
3. If the Resource Repair Action menu appears, go to Step 6 on page 31. If not, follow these steps:
a. Log in as root user or use CE login.
b. At the command line, type diag -a and press Enter to check for missing resources. Follow any instructions that appear. If an SRN displays, suspect a loose card or connection. If no instructions appear, no resources were detected as missing; go to Step 4 on page 31.
Note: If you have a resource with a “-M”, this means the resource is no longer available to this JS20 blade. Check to see if the resource (such as a USB CD-ROM or diskette drive) is assigned to the JS20 blade against which you are running diag -a. Follow the prompts to resolve the resource conflict. See “Missing resources” on page 60 for more information. If an 8-digit error code is displayed, go to “Firmware error codes” on page 102, find the error and perform the listed action. If an SRN is displayed, record it and go to “SRN tables” on page 110.
c. Then, proceed to the next step.
4. Complete these steps:
a. At the command line, type diag and press Enter to move to the Function Selection menu.
b. From the Function Selection menu, select Advanced Diagnostics Routines and press Enter.
c. From the Diagnostic Mode Selection menu, select System Verification and press Enter.
d. When the Diagnostic Selection menu appears, select All Resources, or test only the part you replaced, along with any devices that are attached to the part you replaced, by selecting the diagnostics for the individual part. Press F7 = commit to run the tests.
Did the Resource Repair Action menu appear?
No Go to Step 5.
Yes Go to Step 6.
5. Did the “Testing complete, no trouble was found” display appear?
No There is still a problem. Contact your service provider. This ends the procedure.
Yes Complete the following steps:
a. Press F3 or Enter to return to the Advanced Diagnostic selection, then press F3 or press the Esc key and the #3 key for the Function Selection menu.
b. Select Task Menu.
c. Select Log Repair Action to update the AIX error log. If the repair action was reseating a cable or adapter, select the resource associated with that repair action. If the resource associated with your action is not displayed on the resource list, select sysplanar0 and press F7 = Commit.
Note: This action changes the indicator light for the part from the Fault state to the Normal state.
Go to Step 8 on page 32.
6. When a test is run on a resource in System Verification mode and that resource has an entry in the AIX error log, then, if the test on the resource was successful, the Resource Repair Action menu appears. After replacing a part, you must select the resource for that part from the Resource Repair Action menu. This updates the AIX error log to indicate that a system-detectable part has been replaced.
Note: On systems with an indicator light for the failing part, this changes the indicator light from the Fault state to the Normal state.
Follow these steps:
a. Select the resource that has been replaced from the Resource Repair Action menu. If the repair action was reseating a cable or adapter, select the resource associated with that repair action. If the resource associated with your action does not appear on the resource list, select sysplanar0 and press Enter.
b. After you have made your selections, choose F7 Commit.
Did another Resource Repair Action display appear? If RA Complete appears, press Enter for the NTF screen.
No If the “No trouble found” display appears, go to Step 8.
Yes Go to Step 7.
7. The parent or child of the resource you just replaced may also require that you run the Resource Repair Action option on it. When a test is run on a resource in System Verification mode and that resource has an entry in the AIX error log, then, if the test on the resource was successful, the Resource Repair Action menu appears. After replacing that part, you must select the resource for that part from the Resource Repair Action menu. This updates the AIX error log to indicate that a system-detectable part has been replaced.
Note: This changes the indicator light for the part from the Fault state to the Normal state.
Complete these steps:
a. From the Resource Repair Action menu, select the parent or child of the resource that has been replaced. If the repair action was reseating a cable or adapter, select the resource associated with that repair action. If the resource associated with your action does not appear on the resource list, select sysplanar0 and press Enter.
b. After you have made your selections, choose Commit.
c. If the “No trouble found” display appears, go to Step 8.
8. If the operating system is not started, then start the operating system with the system or partition in normal mode. Were you able to start the operating system?
No Contact your service provider. This ends the procedure.
Yes This ends the procedure.
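The branching in the verification procedure above can be easier to follow when written as a table. The sketch below is a simplified illustration only: it records which step follows each answer and omits the actions performed inside each step. The names FLOW and walk are hypothetical.

```python
# Each step maps an answer to the next step; None means the procedure ends.
FLOW = {
    1: {"no": 2, "yes": 4},        # hot-swap service aid used?
    2: {"no": None, "yes": 3},     # did the AIX login prompt display?
    3: {"yes": 6, "no": 4},        # Resource Repair Action menu already shown?
    4: {"no": 5, "yes": 6},        # menu shown after System Verification?
    5: {"no": None, "yes": 8},     # "no trouble was found" displayed?
    6: {"no": 8, "yes": 7},        # another Resource Repair Action display?
    7: {"done": 8},                # parent/child resource committed
    8: {"no": None, "yes": None},  # operating system started?
}


def walk(answers, start=1):
    """Return the list of steps visited for a sequence of answers."""
    step, path = start, [start]
    for answer in answers:
        step = FLOW[step][answer]
        if step is None:
            break
        path.append(step)
    return path


# A part replaced with the blade powered off, with no repair menu shown:
print(walk(["no", "yes", "no", "no", "yes", "yes"]))
```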
Chapter 6. Running a Serial Over LAN session
The IBM Eserver BladeCenter management-module command-line interface provides a convenient method for entering commands that manage and monitor BladeCenter components. The blade server does not support a direct connection to a monitor, keyboard, or mouse. Therefore, to enable communication between the blade server and these devices, you must first configure the SOL feature on the blade server to establish an SOL connection and then start an SOL session as described in this chapter.
Note: Detailed information about configuring the SOL feature is described in the
BladeCenter JS20 Installation and User’s Guide on the IBM BladeCenter Documentation CD.
This chapter contains the following information about running an SOL session:
v Starting the command-line interface
v Establishing a Telnet connection
v Establishing a Secure Shell (SSH) connection
v Starting an SOL session
v Ending an SOL session
In the BladeCenter environment, the integrated system management processor (ISMP) and network interface controller (NIC) on each blade server route the serial data from the blade server serial communications port to the network infrastructure of the BladeCenter unit, including an Ethernet compatible I/O module that supports SOL communication. Configuration of BladeCenter components for SOL operation is done through the BladeCenter management module (see the BladeCenter JS20
Installation and User’s Guide on the IBM BladeCenter Documentation CD). The
management module also acts as a proxy in the network infrastructure to couple a client running a Telnet session with the management module to an SOL session running on a blade server, allowing the Telnet client to interact with the serial port of the blade server over the network. Because all SOL traffic is controlled by and routed through the management module, it is possible for administrators to segregate the management traffic for the BladeCenter unit from the data traffic of the blade servers.
To start an SOL connection with a blade server, you must first start a Telnet command-line interface session with the management module. When this Telnet command-line interface session is running, you can start a remote console SOL session with any blade server installed in the BladeCenter unit that is set up and enabled for SOL operation. You can establish as many as 20 separate Telnet sessions with the BladeCenter management module, giving you the ability to have 14 simultaneous SOL sessions active (one for each of up to 14 blade servers) with six additional command-line interface sessions available for BladeCenter unit management. If security is a concern, secure shell (SSH) sessions can be used to establish secure Telnet command-line interface sessions with the BladeCenter management module before starting an SOL console redirect session with a blade server.
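The session figures in the preceding paragraph amount to simple arithmetic: 20 sessions in total, of which up to 14 can be SOL sessions, leaves 6 for unit management. A minimal sketch follows; the constant and function names are illustrative, and the limits are taken from the text above rather than queried from a management module.

```python
MAX_CLI_SESSIONS = 20  # Telnet sessions to the management module (from the text)
MAX_BLADES = 14        # blade servers in one BladeCenter unit (from the text)


def sessions_left_for_management(active_sol_sessions: int) -> int:
    """Return how many CLI sessions remain for BladeCenter unit management."""
    if not 0 <= active_sol_sessions <= MAX_BLADES:
        raise ValueError("expected between 0 and 14 SOL sessions")
    return MAX_CLI_SESSIONS - active_sol_sessions


print(sessions_left_for_management(14))  # 6
```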
The most recent versions of all BladeCenter documentation are available from the IBM Web site. Complete the following steps to check for updated BladeCenter documentation and technical updates:
1. Go to http://www.ibm.com/support/.
2. In the Learn section, click Online publications.
3. On the “Online publications” page, in the Brand field, select Servers.
4. In the Family field, select BladeCenter.
5. Click Continue.
Detailed information about SOL setup instructions is available in the BladeCenter JS20 Installation and User’s Guide on the IBM BladeCenter Documentation CD. Also, see the documentation for the operating system for information about commands that you can enter through an SOL connection. See the IBM Eserver BladeCenter and BladeCenter T Management Module Command Line Interface Reference Guide for information about:
v Command-line interface guidelines
v Command syntax and descriptions
v Command-line interface error messages
Selecting the command target
You can use the command-line interface to target commands to the management module or to other devices installed in the BladeCenter unit. The command line prompt indicates the persistent command environment: the environment where commands are entered unless otherwise redirected. When a command-line interface session is started, the persistent command environment is “system”; this indicates that commands are being directed to the BladeCenter unit. See the IBM
Eserver BladeCenter and BladeCenter T Management Module Command Line Interface Reference Guide for additional information.
Starting the command-line interface
Start the management-module command-line interface from a client computer by establishing a Telnet connection to the IP address of the management module or by establishing an SSH connection. You can establish up to 20 separate Telnet or SSH sessions to the BladeCenter management module.
Although a remote network administrator can access the management-module command-line interface through Telnet, this method does not provide a secure connection. As a secure alternative to using Telnet to access the command-line interface, SSH ensures that all data that is sent over the network is encrypted and secure.
The SSH clients listed below are available. Although some SSH clients have been tested, support or non-support of any particular SSH client is not implied.
v The SSH clients distributed with operating systems such as Linux, AIX®, and UNIX® (see the operating-system documentation for information). The SSH client of Red Hat Linux 8.0 Professional was used to test the command-line interface.
v The SSH client of cygwin (see http://www.cygwin.com for information)
v PuTTY (see http://www.chiark.greenend.org.uk/~sgtatham/putty for information)
The following table shows the types of encryption algorithms that are supported, according to the client software version that is being used.
Algorithm                SSH version 1.5 clients         SSH version 2.0 clients
Public key exchange      SSH 1-key exchange algorithm    Diffie-Hellman-group 1-sha-1
Host key type            RSA (1024-bit)                  DSA (1024-bit)
Bulk cipher algorithms   3-des                           3-des-cbc or blowfish-cbc
MAC algorithms           32-bit crc                      Hmac-sha1
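Some newer SSH clients no longer offer these older algorithms by default, so a connection might need them requested explicitly. The sketch below builds an argument list for an OpenSSH-style client using the standard SSH-2 identifiers that correspond to the SSH version 2.0 column. It is an illustration under assumptions: the function name is hypothetical, nothing is executed, and a given client build may have disabled or removed some of these algorithms entirely.

```python
# SSH-2 identifiers matching the "SSH version 2.0 clients" column above.
# Assumption: an OpenSSH-style client that accepts these -o option names.
SSH2_ALGORITHMS = {
    "KexAlgorithms": "diffie-hellman-group1-sha1",
    "HostKeyAlgorithms": "ssh-dss",
    "Ciphers": "3des-cbc",
    "MACs": "hmac-sha1",
}


def ssh_argv(host, user=None):
    """Build (but do not run) an ssh command line for the management module."""
    argv = ["ssh", "-x"]  # -x disables X11 forwarding
    for option, value in SSH2_ALGORITHMS.items():
        argv += ["-o", f"{option}={value}"]
    argv.append(f"{user}@{host}" if user else host)
    return argv


print(" ".join(ssh_argv("192.168.70.125")))
```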
Establishing a Telnet connection
To log on to the management module using Telnet, complete the following steps:
1. Open a command-line window on the network-management workstation, type telnet 192.168.70.125, and press Enter. The IP address, 192.168.70.125, is the default IP address of the management module; if a new IP address has been assigned to the management module, use that one instead.
A command-prompt window opens.
2. At the login prompt, type the management-module user ID. At the password prompt, type the management-module password. The user ID and password are case sensitive and are the same as those that are used for management-module Web access.
A command prompt is displayed. You can now enter commands for the management module.
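Before attempting the Telnet login, it can help to confirm that the management module is reachable on the network. The following is a minimal sketch, assuming the Telnet service listens on the standard TCP port 23 (the port on a particular management module might be configured differently). The function name is illustrative.

```python
import socket

DEFAULT_MM_ADDRESS = "192.168.70.125"  # default management-module IP (from the text)
TELNET_PORT = 23                       # assumption: standard Telnet port


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open(DEFAULT_MM_ADDRESS, TELNET_PORT) returns False when the workstation cannot reach the default address, which suggests checking cabling or the assigned IP address before troubleshooting the login itself.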
Establishing a Secure Shell (SSH) connection
To log on to the management module using SSH, complete the following steps:
1. Make sure that the SSH service on the network-management workstation is enabled. See the operating-system documentation for instructions.
2. Make sure that the SSH server on the BladeCenter management module is enabled. See the IBM Eserver BladeCenter and BladeCenter T Management
Module User’s Guide for instructions.
3. Start an SSH session to the management module using the SSH client of your choice. For example, if you are using the cygwin client, open a command-line window on the network-management workstation, type ssh -x 192.168.70.125, and press Enter. The IP address, 192.168.70.125, is the default IP address of the management module; if a new IP address has been assigned to the management module, use that one instead.
A command prompt window opens.
4. Type the management-module user ID when prompted. At the password prompt, type the management-module password. The user ID and password are case sensitive and are the same as those that are used for management-module Web access.
A command prompt is displayed. You can now enter commands for the management module.
Note: For information about installing and configuring SSH, see the BladeCenter JS20 Installation and User’s Guide on the IBM BladeCenter Documentation CD.
Starting an SOL session
Notes:
1. The SOL feature must be enabled for both the BladeCenter unit and the blade server before you can start an SOL session with the blade server. See the IBM Eserver BladeCenter and BladeCenter T Management Module Command Line Interface Reference Guide for information about SOL commands. See the operating-system documentation for information about SOL commands that you can enter using the command-line interface. Additional information about setting up and enabling SOL, and configuring a blade server for SOL, is available in the BladeCenter JS20 Installation and User’s Guide on the IBM BladeCenter Documentation CD.
2. The BladeCenter management module automatically stores the previous 8 KB of serial data that was transmitted by each blade server, even when SOL sessions are not active. When an SOL session is established, all of the previous serial data, up to 8 KB, is automatically displayed. If no previous data is available when the SOL session starts, the cursor will remain on the command line until new serial data is transmitted.
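The behavior described in note 2 is that of a bounded history buffer: new serial data pushes out the oldest data once 8 KB is held, and the retained data is replayed when a session starts. The sketch below illustrates the idea only; it is not the management module's implementation, and the class and method names are hypothetical.

```python
from collections import deque

BUFFER_SIZE = 8 * 1024  # 8 KB, per note 2


class SerialHistory:
    """Keep only the most recent bytes of serial output, up to a fixed size."""

    def __init__(self, size=BUFFER_SIZE):
        self._bytes = deque(maxlen=size)

    def record(self, data: bytes) -> None:
        # When the deque is full, the oldest bytes are discarded automatically.
        self._bytes.extend(data)

    def replay(self) -> bytes:
        """Return everything currently retained, oldest byte first."""
        return bytes(self._bytes)


history = SerialHistory(size=4)
history.record(b"abcdef")
print(history.replay())  # b'cdef'
```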
After you start a Telnet or SSH session to the BladeCenter management module, you can start an SOL session to any individual blade server that supports SOL using the console command. Therefore, you can have simultaneous SOL sessions active for each blade server installed in the BladeCenter unit.
Use the console command from the command line, indicating the target blade server, where x is the corresponding blade server bay number:
console -T system:blade[x]
For example, to start an SOL connection to the blade server in blade bay 14, type
console -T system:blade[14]
A blade server that occupies more than one blade server bay is identified by the lowest bay number that it occupies.
After an SOL session is started, all commands are sent to the blade server that is specified by the console command until the SOL session is ended, regardless of the persistent command target that was in effect before the SOL session.
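If you generate console commands from a script, the bay-number rules above can be captured in a small helper. This is a minimal sketch: the function name is hypothetical, and the upper limit of 14 bays is taken from the examples in this chapter.

```python
MAX_BAY = 14  # assumption: a BladeCenter unit with 14 blade bays, as in the text


def console_command(bays):
    """Build the console command for a blade occupying the given bay(s).

    A blade that occupies more than one bay is addressed by its lowest bay.
    """
    bays = list(bays)
    if not bays or any(not 1 <= bay <= MAX_BAY for bay in bays):
        raise ValueError("bay numbers must be between 1 and 14")
    return f"console -T system:blade[{min(bays)}]"


print(console_command([14]))    # console -T system:blade[14]
print(console_command([3, 4]))  # console -T system:blade[3]
```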
To restart the blade server through an SOL session, use the following key sequence: Esc R Esc r Esc R
Complete the following steps to start a BladeCenter management module Telnet CLI session:
1. From a command prompt, type telnet location, where location is the host name or IP address of the BladeCenter management module.
2. Log on to the BladeCenter management module. The default user name is USERID, and the default password is PASSW0RD (note the number zero, not the letter O, in PASSW0RD).
Ending an SOL session
To end an SOL session, use the following key sequence: Press the Esc and shift-9 keys. The command-line interface will return to the persistent command target that was in effect before the SOL session.
To exit a BladeCenter management-module Telnet CLI session, type exit at the BladeCenter management-module Telnet CLI prompt after ending an SOL session.
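For reference when scripting a terminal session, the two SOL key sequences in this chapter can be written as character strings. This is a sketch under one stated assumption: on a US keyboard layout Shift-9 produces the “(” character, so the end-session sequence is Esc followed by “(”. The constant names are illustrative.

```python
ESC = "\x1b"  # ASCII escape character

# Esc R Esc r Esc R: restarts the blade server through the SOL session.
RESTART_BLADE = ESC + "R" + ESC + "r" + ESC + "R"

# Esc then Shift-9: ends the SOL session.
# Assumption: US keyboard layout, where Shift-9 is "(".
END_SOL_SESSION = ESC + "("

print(repr(RESTART_BLADE))
print(repr(END_SOL_SESSION))
```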
Chapter 7. Diagnostics
This chapter provides basic troubleshooting information to help you solve some common problems that might occur with the blade server.
Note: Linux service aids for hardware diagnostics (separate from the operating
system installation) are available for download from the following Web site: http://techsupport.services.ibm.com/server/lopdiags/.
Diagnostic utilities for the Linux operating system are available from IBM. For more information, go to http://www.ibm.com/servers/eservers/support/bladecenter/; in the
Hardware field select BladeCenter JS20, in the Software field select Linux on POWER environment, then click Go.
For information about standalone AIX diagnostics, see “Running the standalone diagnostics from CD-ROM” on page 25.
Other supported operating systems might have diagnostic tools available through the operating system. Consult your operating system documentation for more information.
If you cannot locate and correct the problem using the information in this section, see Appendix A, “Getting help and technical assistance,” on page 161 for more information.
Note: A problem with the BladeCenter JS20 Type 8842 blade server may relate to either the BladeCenter JS20 Type 8842 blade server or the BladeCenter unit.
v A blade-server problem exists if the BladeCenter unit contains more than one blade server and only one of the blade servers has the symptom.
v If all of the blade servers have the same symptom, then the problem relates to the BladeCenter unit. For more information, see the Hardware Maintenance Manual and Troubleshooting Guide for your BladeCenter unit.
General checkout
Follow the checkout procedure for diagnosing hardware problems.
Note: Before performing the checkout procedure, read Appendix B, “Safety information,” on page 163.
The firmware diagnostics program tests the major components of the blade server during startup and while the operating system is running. For Linux or AIX, there are automatic error log analysis routines that provide failure information during runtime.
There are also AIX concurrent online diagnostics (from disk) and standalone diagnostics (from CD) to assist you in performing problem determination.
The firmware diagnostic tests are run automatically when the JS20 server is started.
v If your operating system is AIX, then you have AIX diagnostics available
concurrently to check out your hardware. See “Performing AIX online concurrent mode diagnostics for system verification” on page 29.
Note: If your system will not start, you can use the “Running the standalone
diagnostics from CD-ROM” on page 25 procedure to isolate a hard disk drive failure that may be preventing the system from starting.
v If your operating system is Linux, then you have the eServer Standalone Diagnostics CD available to check out your hardware. See “Running the standalone diagnostics from CD-ROM” on page 25.
v The console must be open for error codes to be visible. Make sure that SOL is
enabled.
v A single problem might cause several error messages. When this occurs, correct
the cause of the first error message. When the cause of the first error message is corrected, the other error messages
might not occur the next time you run the test.
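The rule of thumb in the note at the start of this chapter (one affected blade among several points to the blade server; all blades affected points to the BladeCenter unit) can be expressed as a short check. The sketch below is illustrative only. The function name and the “undetermined” result for cases the note does not cover are assumptions.

```python
def likely_problem_source(symptom_by_bay):
    """Classify a symptom from a mapping of bay number -> symptom present."""
    total = len(symptom_by_bay)
    affected = [bay for bay, has_symptom in symptom_by_bay.items() if has_symptom]
    if not affected:
        return "no symptom"
    if total > 1 and len(affected) == 1:
        return "blade server"
    if total > 1 and len(affected) == total:
        return "BladeCenter unit"
    # A single installed blade, or only some blades affected: not covered by the note.
    return "undetermined"


print(likely_problem_source({1: False, 2: True, 3: False}))  # blade server
print(likely_problem_source({1: True, 2: True, 3: True}))    # BladeCenter unit
```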
Important:
1. If multiple error codes are displayed, diagnose the first code that is displayed.
2. If the server stops and a POST (3- or 4-digit) error code is displayed, see “Firmware checkpoint (progress) codes” on page 94.
3. If the server is suspended and no error message is displayed, see “Undetermined problems” on page 156.
4. For intermittent problems, go to Chapter 4, “Problem determination procedures for AIX and Linux,” on page 21 and check the BladeCenter management module event log.
If the operating system is Linux, the Linux Syslog (platform log) may have more information to help isolate the problem.
5. If the blade front panel shows no lit LEDs, verify the blade status and errors in the BladeCenter management-module Web interface; also, see “Undetermined problems” on page 156.
6. If device errors occur, go to Chapter 4, “Problem determination procedures for AIX and Linux,” on page 21 or see Chapter 10, “Symptom-to-FRU index,” on page 93.
Checkout procedure
The checkout procedure can be used if the server does not start up after installation or if it experiences an undetermined error while running, such as if the server stops running with no error code displayed. Certain errors listed in the Symptom-to-FRU index will also direct you to perform the checkout procedure. If the operating system is AIX, you can also verify the system following the procedure “Performing AIX online concurrent mode diagnostics for system verification” on page 29. If the operating system is Linux, then you can use the standalone diagnostic CD in system verification mode to verify the JS20 blade server.
001 PERFORM THE CHECKOUT PROCEDURE:
1. Turn off the server, making sure to first turn off all external devices, if attached.
2. Check all cables and power cords.
3. Turn on all external devices; then, turn on the blade server.
4. Start the Serial Over LAN (SOL) console for the blade server to be tested and check for the following responses:
a. Progress codes are displayed on the console.
b. AIX or Linux login prompt appears.
002 DID THE LINUX OR AIX LOGIN PROMPT APPEAR?
YES.
1. If a firmware checkpoint (progress) (3 or 4-digit) code or firmware error (8-digit) code is displayed on the console, see “Firmware checkpoint (progress) codes” on page 94 or “Firmware error codes” on page 102.
2. Check the BladeCenter management module event log and if the operating system is Linux, check the Linux Syslog (platform log). If an error was recorded by the system, see Chapter 10, “Symptom-to-FRU index,” on page 93.
3. If an error was recorded or if you believe you have a problem, perform the procedure in “Performing AIX online concurrent mode diagnostics for problem determination” on page 25.
4. If the login prompt appears and you still suspect a problem, go to “Performing AIX online concurrent mode diagnostics for problem determination” on page 25 or see “Undetermined problems” on page 156.
NO.
1. Check to see if a firmware checkpoint (progress) (3 or 4-digit) code or firmware error (8-digit) code is displayed on the console; if so, see “Firmware checkpoint (progress) codes” on page 94 or “Firmware error codes” on page 102.
2. Check the blade error LED on the information LED panel; if it is lit, see “Light path diagnostics” on page 46.
3. Record any POST error messages that are displayed on the screen; then, check the BladeCenter management module event log. If an error was recorded by the system or if a checkpoint code is displayed on the console, perform Chapter 4, “Problem determination procedures for AIX and Linux,” on page 21 or see Chapter 10, “Symptom-to-FRU index,” on page 93.
4. If you do not have any error codes, perform Chapter 4, “Problem determination procedures for AIX and Linux,” on page 21.
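The checkout steps above separate console codes by length: a 3- or 4-digit code is a firmware checkpoint (progress) code, and an 8-digit code is a firmware error code. The following is a minimal sketch of that rule as a shell function. The function name classify_fw_code is hypothetical and is not part of any IBM tool, and the sketch assumes that "digit" means a hexadecimal character, as in E105 or 20D00902.

```shell
# Hypothetical helper: classify a code read from the SOL console by its length.
# Assumption: each "digit" is a hexadecimal character (for example, E105).
classify_fw_code() {
    code=$1
    case $code in
        ''|*[!0-9A-Fa-f]*) echo "unknown"; return 1 ;;
    esac
    case ${#code} in
        3|4) echo "checkpoint" ;;   # look up under firmware checkpoint (progress) codes
        8)   echo "error" ;;        # look up under firmware error codes
        *)   echo "unknown"; return 1 ;;
    esac
}
```

For example, classify_fw_code E105 prints checkpoint, and classify_fw_code 20D00902 prints error.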
Diagnostic tools overview
The following tools are available to help you diagnose and solve hardware-related problems:
• POST firmware checkpoints (progress codes)
  The power-on self-test (POST) tests the major components of the blade server and posts firmware checkpoints (progress codes) as it runs. If the server stops on a checkpoint during the startup process, that checkpoint indicates that a problem was detected.
  A four-digit code indicates successful completion of that portion of POST when the server does not stop on that checkpoint.
  A result other than a four-digit code indicates that POST might have detected a problem. Error messages also appear during startup if POST detects a hardware-configuration problem. The last POST firmware checkpoint code posted is the most likely failure indicator. See “POST” on page 40 for more information.
• Error symptom charts
  These charts list problem symptoms and steps to correct the problems. See “Error symptoms” on page 145 for more information.
• Light path diagnostics
  Use the light path diagnostics feature to diagnose system errors quickly. See “Light path diagnostics” on page 46 for more information.
Chapter 7. Diagnostics 39
POST
After power is turned on and before the operating system is loaded, the system does a power-on self-test (POST). This test performs checks to ensure that the hardware is functioning correctly before the operating system is started. During POST, a POST screen displays, and POST indicators appear on the Serial over LAN (SOL) console (if one is connected). The next section describes the POST indicators and functions that can be accessed during POST.
Note: The service processor runs on its own power boundary and continually monitors hardware attributes and the environmental conditions within the system. The service processor is controlled by firmware and does not require the operating system to be operational to perform its tasks.
Checkpoints
The system firmware uses checkpoints (progress codes and error codes) to indicate the status of the system. These codes can appear only on the Serial over LAN (SOL) console. Firmware error codes and messages indicate that a problem exists; they are not intended to be used to identify a failing part.
Checkpoints display in the system console from the time ac power is connected to the system until the operating system login prompt is displayed after a successful operating system boot. These checkpoints have the following forms:
Exxx
   Exxx checkpoints indicate that a system processor is in control and is initializing the system resources. Control is being passed to the operating system when E105 displays on the operator panel display. Location code information may also display on the operator panel during this time (see “Physical location codes” on page 154).
Error codes
   If a fault is detected, an 8-digit error code is displayed in the BladeCenter management module event log. A location code might be displayed at the same time on the second line.
The management-module log, which can be accessed through the BladeCenter unit, contains the most recent error codes and messages that the system generated during POST.
Accessing the Linux system error log
If the system information LED is lit, do one of the following:
1. Check for an entry in the BladeCenter management-module event log. If the information in this log is either a four-digit or eight-digit error code, go directly to “Firmware checkpoint (progress) codes” on page 94 or “Firmware error codes” on page 102.
2. Go to “General checkout” on page 37.
Service aids and the Linux system error log
Linux on pSeries® service aids for hardware diagnostics are available for customers who have installed and are running Linux. Users can install these free diagnostics tools for effective diagnosis and repair of their system in the rare instance when a system error occurs.
This service aid toolkit provides the key tools required to take advantage of the inherent pSeries hardware reliability, availability, and serviceability (RAS) functions, such as first failure data capture and error log analysis, as outlined in the Linux on pSeries RAS Whitepaper, available from http://techsupport.services.ibm.com/server/Linux_on_pSeries/images/Linux_RAS.pdf. With the toolkit installed, problem determination and correction are greatly enhanced and the likelihood of an extended system outage is reduced.
The Linux service aids for hardware diagnostics are separate from the operating system installation and are available for download from the following Web site: http://techsupport.services.ibm.com/server/lopdiags/.
Note: The following steps can only be performed if the Linux service tools have been installed on the JS20 blade server.
Step 001
1. If the blade server is functional, determine your level of Linux by logging in to the blade server as the root user and entering the following command:
ls -l /var/log/platform
If the /var/log/platform file exists, go to substep 2.
2. Use the following command to list diagela messages recorded in the Linux Syslog (platform log):
cat /var/log/platform |grep diagela |more
Linux run-time diagela error messages are logged in the platform file under /var/log.
The following illustration shows an example of the Linux Syslog (platform log) error log diagela messages.
Aug 13 09:38:45 larry diagela: 08/13/2003 09:38:44
Aug 13 09:38:45 larry diagela: Automatic Error Log Analysis has detected a problem.
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: The Service Request Number(s)/Probable Cause(s)
Aug 13 09:38:45 larry diagela: (causes are listed in descending order of probability):
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: 651-880: The CEC or SPCN reported an error. Report the SRN and the following reference and physical location codes to your service provider.
Aug 13 09:38:45 larry diagela: Location: n/a FRU: n/a Ref-Code: B1004699
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: Analysis of Error log sequence number: 3
Aug 29 07:13:04 larry diagela: 08/29/2003 07:13:04
Aug 29 07:13:04 larry diagela: Automatic Error Log Analysis has detected a problem.
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: The Service Request Number(s)/Probable Cause(s)
Aug 29 07:13:04 larry diagela: (causes are listed in descending order of probability):
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: 651-880: The CEC or SPCN reported an error. Report the SRN and the following reference and physical location codes to your service provider.
Aug 29 07:13:04 larry diagela: Location: U0.1-F4 FRU: 09P5866 Ref-Code: 10117661
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: Analysis of /var/log/platform sequence number: 24
Sep 4 06:00:55 larry diagela: 09/04/2003 06:00:55
Sep 4 06:00:55 larry diagela: Automatic Error Log Analysis reports the following:
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: 651204 ANALYZING SYSTEM ERROR LOG
Sep 4 06:00:55 larry diagela: A loss of redundancy on input power was detected.
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Check for the following:
Sep 4 06:00:55 larry diagela: 1. Loose or disconnected power source connections.
Sep 4 06:00:55 larry diagela: 2. Loss of the power source.
Sep 4 06:00:55 larry diagela: 3. For multiple enclosure systems, loose or
Sep 4 06:00:55 larry diagela: disconnected power and/or signal connections
Sep 4 06:00:55 larry diagela: between enclosures.
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Supporting data:
Sep 4 06:00:55 larry diagela: Ref. Code: 10111520
Sep 4 06:00:55 larry diagela: Location Codes: P1 P2
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Analysis of /var/log/platform sequence number: 13
3. Also use the following command to list RTAS messages recorded in the Linux Syslog (platform log):
cat /var/log/platform |grep RTAS |more
Linux RTAS error messages are logged in the platform file under /var/log. The following illustration shows an example of the Linux Syslog (platform error log) RTAS messages.
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event begin --------
Aug 27 12:16:33 larry kernel: RTAS 0: 04440040 000003f8 96008508 19155800
Aug 27 12:16:33 larry kernel: RTAS 1: 20030827 00000001 20000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 2: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 3: 49424d00 55302e31 2d463400 00503034
Aug 27 12:16:33 larry kernel: RTAS 4: 10117661 04a0005d 10110000 00000000
Aug 27 12:16:33 larry kernel: RTAS 5: 00007701 000000e0 00000003 000000e3
Aug 27 12:16:33 larry kernel: RTAS 6: 00000000 01000000 00000000 31303131
Aug 27 12:16:33 larry kernel: RTAS 7: 37363631 20202020 20202020 55302e31
Aug 27 12:16:33 larry kernel: RTAS 8: 2d463420 20202020 20202020 03705a39
Aug 27 12:16:33 larry kernel: RTAS 9: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 10: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 11: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 12: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 13: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 14: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 15: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 16: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 17: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 18: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 19: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 20: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 21: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 22: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 23: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 24: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 25: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 26: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 27: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 28: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 29: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 30: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 31: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 32: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 33: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 34: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 35: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 36: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 37: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 38: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 39: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 40: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 41: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 42: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 43: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 44: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 45: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 46: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 47: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 48: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 49: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 50: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 51: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 52: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 53: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 54: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 55: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 56: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 57: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 58: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 59: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 60: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 61: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 62: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 63: 00000000 00000000 00000000 00020000
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event end ----------
Error codes and location codes might appear as RTAS messages.
The extended data is also provided in the form of an RTAS message. The extended data contains other error code words that help in isolating the correct FRUs. The start of the extended data is marked, for example, by the line
Aug 27 12:16:33 larry kernel: RTAS: 15 ------ RTAS event begin ------.
The number after the colon is a sequence number that correlates this data with any diagela data with the same sequence number. The end of the extended data is marked by the line
Aug 27 12:16:33 larry kernel: RTAS: 15 ----- RTAS event end -------
with the same sequence number. Word 13 and word 19 are found in the RTAS messages. For example, to find word 13, first find the error code (10117661) in the left column of words of the extended data. In this example, the error code is to the right of RTAS 4:. This is also word 11. To get word 13 (10110000), count the words left to right, beginning at word 11.
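The word-counting rule described above can be sketched as a small shell function. This is a minimal sketch and not part of the Linux service aids: the function name rtas_word and its arguments are hypothetical, and it assumes that each RTAS data line carries its words after an "RTAS n:" marker, as in the example. It scans the whole file and uses the first row whose left-column word equals the error code, so for a log that holds several events, first isolate the lines between the matching "RTAS event begin" and "RTAS event end" markers.

```shell
# Hypothetical helper: print word N of the RTAS extended data.
# $1 = log file, $2 = error code (word 11), $3 = wanted word number (11 or higher)
rtas_word() {
    awk -v code="$2" -v want="$3" '
    {
        # Find the "RTAS <row>:" marker; the data words follow it.
        start = 0
        for (i = 1; i < NF; i++) {
            if ($i == "RTAS" && $(i + 1) ~ /^[0-9]+:$/) { start = i + 2; break }
        }
        if (start == 0) next
        for (i = start; i <= NF; i++) {
            if (!found) {
                # The error code is in the left column of words; it is word 11.
                if (i != start || tolower($i) != tolower(code)) continue
                found = 1
                n = 11
            } else {
                n++
            }
            if (n == want + 0) { print $i; exit }
        }
    }' "$1"
}
```

With the example RTAS messages saved in a file, rtas_word FILE 10117661 13 prints 10110000. The function prints nothing if the error code is not found in the left column.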
Step 002 If you performed substep 2 on page 41 of Step 001, record any RTAS messages found in the Linux Syslog (platform log) in Step 001. If you performed substep 2 on page 41 of Step 001, record any RTAS and diagela messages found in the Linux Syslog (platform log) in Step 001, and also record any extended data found in the RTAS messages, especially word 13 and word 19. Ignore all other messages in the Linux Syslog (platform log).
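When recording diagela messages, the reference codes are usually the first values needed for lookup. The following is a minimal sketch that lists them from a copy of the platform log. The function name diagela_ref_codes is hypothetical; the sketch assumes that each code is eight hexadecimal characters and follows a "Ref-Code:" or "Ref. Code:" label on the same log line, the two label forms that appear in the example messages.

```shell
# Hypothetical helper: list the reference codes reported by diagela.
# $1 = log file (for example, /var/log/platform)
diagela_ref_codes() {
    grep 'diagela:' "$1" |
        sed -n 's/.*Ref[-.] \{0,1\}Code: *\([0-9A-Fa-f]\{8\}\).*/\1/p'
}
```

Each code is printed on its own line, in the order in which it was logged. Lines that do not come from diagela are ignored.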
Step 003 Examine the Linux boot (IPL) log by logging in to the system as the
root user and entering the following command:
cat /var/log/boot.msg |grep RTAS |more
Linux boot (IPL) error messages are logged into the boot.msg file under /var/log. The following illustration shows an example of the Linux boot error log.
RTAS daemon started
RTAS: -------- event-scan begin --------
RTAS: Location Code: U0.1-F3
RTAS: WARNING: (FULLY RECOVERED) type: SENSOR
RTAS: initiator: UNKNOWN target: UNKNOWN
RTAS: Status: bypassed new
RTAS: Date/Time: 20020830 14404000
RTAS: Environment and Power Warning
RTAS: EPOW Sensor Value: 0x00000001
RTAS: EPOW caused by fan failure
RTAS: -------- event-scan end ----------
Step 004 Record any RTAS messages found in the Linux boot (IPL) log in
Step 003. Ignore all other messages in the Linux boot (IPL) log.
Step 005 If you performed substep 3 on page 42 of Step 001 for the
current Linux partition, go to Step 006 on page 45, and when asked in Step 006, do not record any additional extended data from Step 004 for the current Linux partition. Examine the extended data in both logs.
The following is an example of the Linux extended data.
RTAS daemon started
RTAS: -------- event-scan begin --------
RTAS: Location Code: U0.1-P1-C2
RTAS: Log Debug: 04 4b2726fb04a00011702c0014000000000000000000000000f1800001001801d3ffffffff0100000
00000000042343138
20202020383030343236464238454134303030303
030303030303030
RTAS: Log Debug: D2 5046413405020d0a000001000271400100000033434d502044415441000001000000000000010000
f180000153595320444154410000000000000000200216271501050920021627150105092002063
7150105095352432044415441702c001400000000000000020018820201d3820000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000280048ea400000000000000000000000000000000000000004350
542044415441702cff08000000001c000000702cf0080000000080000000702cf100702cf200702
c000400000800702c01040bf2002e702c02040c1fffbf702c0300702c1000702c11040bf2002e70
2c12040c1fffbf702c1300702ca000702ca108000000000000a03c702ca208000000000000effc7
02cb000702cb108000000000000a03c702cb208000000000000effc702cc000702cc10800000000
0000a03c702cc208000000000000effc702c3000702c31080000000000000003702c32080000000
00000007b702c8000702c81080000000020e27a39702c820800000000fffeffff702cd000702cd1
080000000010004010702cd208000000007777f3fffffffffffffffffffffffffffffffffffffff
fffffffffffffffffffffffffff
RTAS: WARNING: (FULLY RECOVERED) type: INTERN_DEV_FAIL
RTAS: initiator: UNKNOWN target: UNKNOWN
RTAS: Status: unrecoverable new
RTAS: Date/Time: 20020905 15372200
RTAS: CPU Failure
RTAS: Internal error (not cache)
RTAS: CPU id: 0
RTAS: Failing element: 0x0000
RTAS: -------- event-scan end --------
Step 006 Record any extended data found in the Linux Syslog (platform log)
in Step 001 or the Linux boot (IPL) log in Step 003. Be sure to record word 13.
Note: The lines in the Linux extended data that begin with
RTAS: Log Debug: 04
contain the error code listed in the next 8 hex characters. In the previous example, 4b27 26fb is an error code. The error code is also known as word 11. Each 4 bytes after the error code in the Linux extended data is another word (for example, 04a0 0011 is word 12, and 702c 0014 is word 13, and so on).
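The word layout in this note can be sketched as a small shell function: word 11 is the first 8 hex characters after "RTAS: Log Debug: 04", and each following group of 8 hex characters (4 bytes) is the next word. The function name log_debug_word is hypothetical, and the sketch assumes that any spaces in the copied hex string are only line-wrap artifacts and can be removed.

```shell
# Hypothetical helper: print word N from a "Log Debug: 04" hex string.
# $1 = hex string that follows "RTAS: Log Debug: 04", $2 = word number (11 or higher)
log_debug_word() {
    printf '%s\n' "$1" | tr -d ' ' | awk -v n="$2" '{
        # Word 11 starts at character 1; each word is 8 hex characters long.
        off = (n - 11) * 8 + 1
        w = substr($0, off, 8)
        if (n + 0 >= 11 && length(w) == 8) print w
    }'
}
```

For the example data, log_debug_word 4b2726fb04a00011702c0014 13 prints 702c0014, which matches word 13 in the note. Nothing is printed if the string is too short to contain the requested word.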
Step 007 Were any error codes or checkpoints recorded in Steps 001, 002, 003, 004, 005, or 006?
NO. Go to Step 008.
YES. Go to “Firmware checkpoint (progress) codes” on page 94 and “Firmware error codes” on page 102 for each recorded error code or symptom. Perform the indicated actions one at a time for each error code until the problem has been corrected. If all recorded error codes have been processed and the problem has not been corrected, go to Step 008.
Step 008 If no additional error information is available and the problem has
not been corrected, shut down the blade server.
1. Are there any event-logged entries in the BladeCenter management-module event log?
2. Replace the system board.
FRU/CRU isolation
Error codes and the recommended actions for each code are provided in Chapter 10, “Symptom-to-FRU index,” on page 93. These actions can provide you with informational messages and directions or can refer you to Chapter 11, “Parts listing, Type 8842,” on page 159.
If a replacement part is indicated, the part is referred to by name. The physical location codes are listed for each occurrence as required (see “Physical location codes” on page 154). Chapter 11, “Parts listing, Type 8842,” on page 159 provides a parts index with the predominant field replaceable units (FRUs) or customer replaceable units (CRUs) listed by name, and provides illustrations of the various assemblies and components that make up the blade server.
Error symptom charts
You can use the error symptom charts to find solutions to problems that have definite symptoms (see “Error symptoms” on page 145).
If you cannot find the problem in the error symptom charts, go to “Checkout procedure” on page 38 and “Undetermined problems” on page 156.
If you encounter problems with an Ethernet or Fibre Channel switch module, IBM Eserver BladeCenter Optical Pass-Thru Module, I/O expansion card, or other optional device that can be installed in the BladeCenter unit, see the applicable Hardware Maintenance Manual and Troubleshooting Guide on the IBM BladeCenter Documentation CD or other documentation that comes with the device for more information.
Light path diagnostics
Many errors are first indicated when the blade-error LED on the blade server is lit (see “Blade server controls and LEDs” on page 14). If this LED is lit, one or more error LEDs elsewhere in the blade server might also be lit and can direct you to the source of the error.
This section describes how to use the light path diagnostics to identify problems that might arise. To locate the actual component that caused the error, you must locate the lit error LED for that component.
Note: Read Appendix B, “Safety information,” on page 163 and “Handling static-sensitive devices” on page 71.
For example, if a blade error has occurred and the blade-error LED is lit on the blade server, complete the following steps:
1. Turn off the blade server and remove it from the BladeCenter unit.
2. Place the blade server on a flat, static-protective surface.
3. Remove the cover from the blade server (see “Opening the blade server cover” on page 74).
[Figure: Light path diagnostics LEDs and button on the system board. Callouts:]
DIMM 1 error LED (CR40)
DIMM 2 error LED (CR45)
DIMM 3 error LED (CR46)
DIMM 4 error LED (CR53)
Microprocessor 0 error LED (CR19)
Microprocessor 1 error LED (CR58)
Temperature error LED (CR16)
Service processor error LED (CR27)
NMI error LED (CR17)
System board error LED (CR20)
Reserved (CR29)
Light Path Diagnostics (SW1)
4. Press and hold the light path diagnostics button (SW1) to light the LEDs that were lit before you removed the blade server from the BladeCenter unit. The LEDs will remain lit for as long as you press the button, up to a maximum of 25 seconds.
   Notes:
   a. Power is available to relight the light path diagnostics LEDs for a short period of time after the blade server is removed from the BladeCenter unit. During that time, you can relight the light path diagnostics LEDs for a maximum of 25 seconds (or less, depending on the number of LEDs that are lit and the length of time the blade server is removed from the BladeCenter unit) by pressing the light path diagnostics button (SW1).
   b. Error LED CR29 is reserved.
5. Use the table at “Light path diagnostics LEDs” on page 145 to help determine the cause of the error and the action that should be taken.
Memory errors
If a memory problem occurs, complete the following steps before replacing a DIMM:
1. Reseat both DIMMs in the bank.
2. Turn off the blade server and wait 30 seconds; then, turn on the blade server.
3. Check for a memory type mismatch in the bank.
For more information about memory, see “Installing memory modules” on page 77.
Recovering the system firmware code
The system firmware is contained in two separate images in the flash memory of the blade server: temporary and permanent. These images are referred to as TEMP and PERM, respectively. The system normally starts from the TEMP image, and the PERM image serves as a backup. If the TEMP image becomes damaged, such as from a power failure during a flash update, you can recover the TEMP image from the PERM image.
If the TEMP image becomes damaged, you might see one of two symptoms:
• The system automatically starts from the PERM image. This is indicated by the error code 20D00902.
• The system hangs or is non-responsive after it is started, and no checkpoints are displayed.
If your system hangs, you can force the system to start from the PERM image by using the code page jumper (J14).
• Setting jumper J14 to pins 2 and 3 will force the blade server to start (boot) from the PERM image.
• Setting jumper J14 to pins 1 and 2 will enable the blade server to start (boot) from either the TEMP or PERM image.
Recovery of system firmware code using service aids
Linux on pSeries service aids for hardware diagnostics are available for customers who have installed and are running the Linux operating system. Users can install these free diagnostics tools for effective diagnosis and repair of the system in the rare instance when a system error occurs.
This service aid toolkit provides the key tools required to take advantage of the inherent pSeries hardware RAS functions as outlined in the Linux on pSeries RAS White paper available from http://techsupport.services.ibm.com/server/Linux_on_pSeries/images/Linux_RAS.pdf. These functions include first failure data capture and error log analysis. With the toolkit installed, problem determination and correction are greatly enhanced and the likelihood of an extended system outage is reduced.
The Linux service aids for hardware diagnostics are separate from the operating system installation and are available for download from the following Web site: http://techsupport.services.ibm.com/server/lopdiags/.
Note: The update_flash command can only be performed if the Linux service tools have been installed on the blade server.
Starting the TEMP image
To force the system to start the TEMP image, complete the following steps:
Note: Do not perform these steps if the system error code 20D00902 has already occurred on your system.
1. Turn off the blade server.
2. Remove the blade server (see “Removing the blade server from the BladeCenter unit” on page 73).
3. Open the blade-server cover (see “Opening the blade server cover” on page 74 for instructions).
4. Remove the blade-server bezel assembly (see “Removing the blade server bezel assembly” on page 75).
5. Locate jumper J14 (system firmware code page jumper) on the system board.
[Figure: System firmware code page jumper (J14), with pins labeled 3, 2, 1]
6. Move jumper J14 to pins 2 and 3 to enable system firmware recovery mode.
7. Replace the cover and reinstall the blade server in the BladeCenter unit, making sure that the blade server controls all relevant components; then, restart the blade server.
8. If the system starts up and boots to the operating-system prompt, see “Recovering the TEMP image from the PERM image.” If the system does not boot to the operating-system prompt, replace the system-board assembly. Contact a service support representative for assistance.
Note: If the blade server does not restart, you must replace the system-board assembly. Contact a service support representative for assistance.
Recovering the TEMP image from the PERM image
To recover the TEMP image from the PERM image, you must perform the reject function. The reject function copies the PERM image into the TEMP image. To perform the reject function, complete the following steps:
1. If you have not started the system from the TEMP image, do so now. For additional information, see “Starting the TEMP image” on page 48.
2. If you have not installed the ppc64 Linux utilities, perform the installation now. For instructions, go to the Linux on POWER Web site at http://techsupport.services.ibm.com/server/lopdiags/.
3. Reject the TEMP image.
   • If you are using the Red Hat Linux or SUSE LINUX operating system, type the following command:
     update_flash -r
   • If you are using the AIX operating system, type the following command:
     /usr/lpp/diagnostics/bin/update_flash -r
4. Shut down the blade server using the operating system.
5. If you have not moved jumper J14 as described in “Starting the TEMP image” on page 48, restart the system.
6. If you moved jumper J14, complete the following steps:
   a. Turn off the blade server.
   b. Remove the blade server (see “Removing the blade server from the BladeCenter unit” on page 73).
   c. Open the blade-server cover (see “Opening the blade server cover” on page 74 for instructions).
   d. Locate jumper J14 (system firmware code page jumper) on the system board.
      [Figure: System firmware code page jumper (J14), with pins labeled 3, 2, 1]
   e. Move jumper J14 to pins 1 and 2 to return the blade server to normal startup, in which it can start from either the TEMP or PERM image.
   f. Replace the cover and reinstall the blade server in the BladeCenter unit, making sure that the blade server controls all relevant components.
      Statement 21:
      CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
   g. Restart the blade server.
   h. Verify that the system starts from the TEMP image.
      • If you are using one of the Linux operating systems, go to “Verifying the system firmware levels using Linux” on page 52.
      • If you are using the AIX operating system, go to “Committing the temporary firmware image using AIX” on page 53.
   i. Update the flash again, if you are updating the system firmware code.
You might need to update the firmware code to the latest version. See http://www.ibm.com/pc/support/ for more information about how to update the firmware code.
Updating the blade server firmware
This section describes how to determine the current code levels for the blade server (system) firmware and Integrated Systems Management Processor (service processor). Information on how to validate, update and commit the system firmware is included.
The blade server contains firmware code for the system and service processor. IBM will periodically make firmware updates available for the server system and the service processor. You can maintain the latest levels of firmware code for the blade server system and service processor by installing firmware updates as they become available. Be sure to follow the instructions in this section.
Determination of current server firmware levels
Complete the following steps to view the current firmware code levels for the blade server and the service processor:
1. Access and log on to the BladeCenter management-module Web interface. For more information, see the Installation and User’s Guide for your BladeCenter unit.
2. From the Blade Tasks section, select Firmware VPD.
The Blade Server Firmware VPD window contains the build identifier, release, and revision for system and service processor. Compare this information to the firmware information on the IBM Support Web site at http://www.ibm.com/support/. If these two types of information match, then your blade server has the latest firmware code. If these two types of information differ, download the latest firmware code from the IBM Support Web site. Follow the update instructions on the IBM Support Web site.
Updating the blade server service processor
To apply the latest firmware code for your blade server service processor, flash the service processor. Download the latest firmware code for your integrated systems management processor from the IBM Support Web site at http://www.ibm.com/support/. Use the BladeCenter management-module Web interface to flash the service processor. Follow the update instructions on the IBM Support Web site.
Important: To avoid problems and to maintain proper system performance, always make sure that the blade server firmware code and service processor code levels are consistent for all blade servers within the BladeCenter unit.
Update and manage system flash using Linux service aids
This section describes how to update and verify the system flash using Linux service aids.
Updating the system flash using Linux
The Linux service aid for managing the system flash is separate from the operating system installation and is available for download from the following Web site: http://techsupport.services.ibm.com/server/lopdiags/.
Note: The update_flash command can only be performed if the update_flash service aid has been installed on the JS20 blade server for the appropriate version of Linux.
Complete the following steps to update the system firmware code:
1. Obtain the flash image you want to update and place it in the /etc/microcode directory (create the directory if it does not exist).
2. Issue the following command:
update_flash -f /etc/microcode/<update_image_name>
After the system reboots successfully and once you are satisfied with the functionality of the new image, commit the update using the following Linux command:
update_flash -c
This command copies the new image from the TEMP side to the PERM side of the flash.
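The update step above can be preceded by a small pre-flight check. The following is a minimal sketch, not an IBM-supplied tool: the function name flash_update_cmd and the MICROCODE_DIR override are assumptions introduced for illustration. The function only prints the update_flash command after confirming that the image file is present, so nothing is flashed until you run the printed command yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the IBM service aids).
# Prints the update_flash command for an image after checking that the image
# exists in the microcode directory. MICROCODE_DIR is an assumed override that
# defaults to /etc/microcode, the directory named in step 1.
flash_update_cmd() {
    image_name=$1
    dir=${MICROCODE_DIR:-/etc/microcode}

    if [ -z "$image_name" ]; then
        echo "usage: flash_update_cmd <update_image_name>" >&2
        return 2
    fi
    if [ ! -f "$dir/$image_name" ]; then
        echo "image not found: $dir/$image_name" >&2
        return 1
    fi
    # Print only; the caller decides when to run the command.
    echo "update_flash -f $dir/$image_name"
}
```

Calling flash_update_cmd with the image name prints the command from step 2 when the file is in place and returns a nonzero status otherwise.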
Verifying the system firmware levels using Linux
To verify the system firmware levels on the Perm and Temp side, enter the following command at the Linux prompt (the entire command must be entered on one line):
for file in /proc/device-tree/openprom/*bank*; do echo "$file"; cat "$file"; echo; echo; done
For example, this command returns information similar to the following:
/proc/device-tree/openprom/ibm,fw-bank
P

/proc/device-tree/openprom/ibm,fw-perm-bank
FW04310120, 17:16:09, 07/26/2004

/proc/device-tree/openprom/ibm,fw-temp-bank
FW04310120, 17:16:09, 07/26/2004
v The value for ibm,fw-bank indicates which side you booted from (T for TEMP, P for PERM).
v The value for ibm,fw-perm-bank identifies the firmware version, date, and time stamp of the firmware on the PERM side.
v The value for ibm,fw-temp-bank identifies the firmware version, date, and time stamp of the firmware on the TEMP side.
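The three properties described above can also be read and compared in a short script. This is a minimal sketch under stated assumptions: the function names read_fw_prop and fw_bank_summary are hypothetical, and the script assumes that each property is a text string that may end in a NUL byte, as device-tree string properties typically do.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the IBM service aids).

# Reads one firmware bank property from directory $1, property name $2.
# A trailing NUL byte, if present, is stripped before printing.
read_fw_prop() {
    tr -d '\000' < "$1/$2"
}

# Summarizes the three bank properties.
# $1: directory holding the properties (defaults to the path used above).
fw_bank_summary() {
    fw_dir=${1:-/proc/device-tree/openprom}
    side=$(read_fw_prop "$fw_dir" ibm,fw-bank)
    perm=$(read_fw_prop "$fw_dir" ibm,fw-perm-bank)
    temp=$(read_fw_prop "$fw_dir" ibm,fw-temp-bank)

    case $side in
        T) echo "Booted from: TEMP" ;;
        P) echo "Booted from: PERM" ;;
        *) echo "Booted from: unknown ($side)" ;;
    esac
    echo "PERM level: $perm"
    echo "TEMP level: $temp"
    if [ "$perm" = "$temp" ]; then
        echo "PERM and TEMP levels match"
    else
        echo "PERM and TEMP levels differ"
    fi
}
```

Running fw_bank_summary with no argument reads the properties from /proc/device-tree/openprom; passing a directory is useful for trying the script against saved copies of the files.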
Notes:
1. If you have to recover the system firmware code, see “Recovering the system firmware code” on page 54.
2. The IBM Remote Deployment Manager (RDM) program does not support the BladeCenter JS20 Type 8842.
3. Reboot the system after running update_flash -c so that the firmware level shown in ibm,fw-perm-bank is current.
Update and manage system flash using AIX diagnostics
This section describes how to update, commit and verify the system flash using AIX diagnostics.
Updating the system flash using AIX
Attention: Do not power off the system while performing this task!
Complete the following steps:
1. Obtain the flash image you want to update from the IBM Support Web site at http://www.ibm.com/support/. Look for the flash image for Type 8842 under AIX Diagnostics Version Number; this is the version used by the AIX diagnostics service aid.
v If you want to update the image from the local file system, put the image into
the /etc/microcode directory on the system prior to running this service aid.
v If you want to update the image from media (diskette or optical media), put
the image on the media of choice prior to running this service aid.
2. Run diagnostics.
v If you have booted AIX, log in as "root" or use the CE login; then, at the command line, enter:
diag
v Otherwise, boot standalone diagnostics; then press Enter.
3. From the Function Selection menu, choose Task Selection.
4. From the Tasks Selection List choose Update and Manage System Flash.
5. From the Update and Manage System Flash list:
v If, in step 1 above, you put the image in the /etc/microcode directory, choose the File System selection. At the flash update image file prompt, specify the directory that contains the image (normally /etc/microcode), and then select Commit (PF7).
v If, in step 1 above, you put the image on optical media or diskette, place the media containing the image into the drive, choose Removable Media, and then select Commit (PF7).
Note: If you have booted standalone diagnostics from CD-ROM, you may remove the standalone diagnostics CD-ROM media from the drive and replace it with the optical media containing the image you want to update.
6. Follow the instructions displayed on the screen; for example, choose Yes to proceed with the flash operation.
7. After the system flash completes and the system reboots, you may remove the media containing the image from the diskette or optical drive, or delete the image from the /etc/microcode directory on the file system. After the reboot, once you have logged in to the operating system, you may also remove the temporary file created in the /var/update_flash_image directory.
Committing the temporary firmware image using AIX
After the system reboots successfully and once you are satisfied with the functionality of the new image, commit the update by completing the following steps in AIX diagnostics:
1. Run diagnostics.
v If you have booted AIX, log in as "root" or use the CE login; then, at the command line, enter:
diag
v Otherwise, boot standalone diagnostics; then press Enter.
2. From the Function Selection menu, choose Task Selection.
3. From the Tasks Selection List choose Update and Manage System Flash.
4. Choose Commit the Temporary Image.
5. Choose Yes to commit the image.
6. Press F10 to exit diagnostics.
Note: This selection commits the Temporary system firmware image to the Permanent image when booted from the Temporary image.
Verifying the system firmware levels using AIX
To verify levels of system firmware on the Permanent and Temporary sides, use the AIX diagnostics “Update and Manage System Flash” (see “Updating the system flash using AIX” on page 52). The diagnostics function displays the system firmware image level for both the Permanent and Temporary sides as well as an indication as to which side was used for the current boot cycle. Following is an example screen:
UPDATE AND MANAGE FLASH                                    802810

The current permanent system firmware image is 2b204_310
The current temporary system firmware image is 2b204_310
The system is currently booted from the temporary firmware image.

Move cursor to selection, then press 'Enter'.

Validate and Update System Firmware
Validate System Firmware
Commit the Temporary Image

F1=Help    F10=Exit    F3=Previous Menu
If the system had been booted from the permanent image instead of the temporary image shown in the example above, the screen would show:
The system is currently booted from the permanent firmware image.
and the last selection option is changed to:
Reject the Temporary Image
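The relationship between the side used for the current boot and the last menu selection can be summarized in a few lines. This is an illustrative sketch only: the function name flash_action_for_side is hypothetical, and the letters T and P are borrowed from the ibm,fw-bank convention described in the Linux section rather than from anything the AIX menu itself prints.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Maps the side the system booted from to the last selection that the
# Update and Manage System Flash menu offers.
# $1: T (temporary image) or P (permanent image).
flash_action_for_side() {
    case $1 in
        T) echo "Commit the Temporary Image" ;;
        P) echo "Reject the Temporary Image" ;;
        *) echo "unknown side: $1" >&2; return 1 ;;
    esac
}
```

In other words, a commit is offered only when the temporary image is the one currently running, and a reject is offered only when the system is running from the permanent image.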
The firmware version information displayed by the AIX diagnostics (2b204_310 in the example shown above) may be different from the information displayed by the management module (see “Determination of current server firmware levels” on page 51). Cross-reference information is given in the firmware information (“Blade Server Firmware - IBM BladeCenter JS20”) on the IBM Support Web site at http://www.ibm.com/support/, as well as in the README file for the firmware image.
Recovering the system firmware code
If the system firmware code has become damaged, such as from a power failure during a flash update, the blade server might appear to be nonfunctional (no progress codes or firmware codes). You can recover the system firmware code by using the system firmware code page jumper (J14).
Note: To obtain a system firmware flash image, download a system firmware file
from http://www.ibm.com/support/.
The system firmware is contained in two separate images in the flash memory of the blade server (primary and backup).
Note: The primary image is also known as the TEMP side of the flash module. The
backup image is also known as the PERM side of the flash module.
If the primary image becomes damaged, such as from a power failure during a flash update, you can recover the primary image from the backup image. If this occurs, you might see one of the following symptoms:
v The system automatically starts up from backup. This is indicated by the error
code 20D00902.
v The system automatically starts up from backup. This is indicated by an event
log message “OS Watchdog Triggered”.
v The system hangs or is non-responsive after the system is started with no
checkpoints.
Note: If the system hangs or is non-responsive after starting the system, go to
“Startup problems” on page 152.
If the blade server hangs, you can force the blade server to start the backup image by using the code page jumper (J14).
Recovery of system firmware code using service aids
Linux on pSeries service aids for hardware diagnostics are available for customers who have installed and are running Linux. Users can install these free diagnostics tools for effective diagnosis and repair of the system in the rare instance when a system error occurs.
This service aid toolkit provides the key tools required to take advantage of the inherent pSeries hardware RAS functions, such as first failure data capture and error log analysis, as outlined in the Linux on pSeries RAS Whitepaper available from http://techsupport.services.ibm.com/server/Linux_on_pSeries/images/Linux_RAS.pdf. With the toolkit installed, problem determination and correction are greatly enhanced and the likelihood of an extended system outage is reduced.
The Linux service aids for hardware diagnostics are separate from the operating system installation and are available for download from the following Web site: http://techsupport.services.ibm.com/server/lopdiags/.
Note: The update_flash command can be run only if the Linux service tools have been installed on the JS20 blade server.
Starting the backup image
To force the blade server to start the backup image, complete the following steps:
Note: Do not perform these steps if the system error code 20D00902 has already
occurred on the blade server.
1. Turn off the blade server and remove it from the BladeCenter unit (see “Removing the blade server from the BladeCenter unit” on page 73).
2. Open the blade server cover (see “Opening the blade server cover” on page 74).
3. Locate jumper J14 (system firmware code page jumper) on the system board.
4. Remove jumper J14 from pins 1 and 2, and reinstall it on pins 2 and 3 of J14 to enable system firmware recovery mode.
Figure: System firmware code page jumper (J14); pins are labeled 3, 2, 1.
5. Replace the cover and reinstall the blade server in the BladeCenter unit; then, restart the blade server.
6. If the blade server starts with the operating-system prompt, see “Recovering the primary image.” If the blade server does not start with the operating-system prompt, replace the system board (see “Replacing the system board” on page 87).
Note: If the blade server does not restart, you must replace the system board (see “Replacing the system board” on page 87).
Recovering the primary image
To recover the primary image, you must perform the reject function. The reject function copies the backup image into the primary image.
To perform the reject function, complete the following steps:
Note: If the operating system is Linux, begin with Step 001. If the operating
system is AIX, begin with Step 002.
Step 001 Complete the following steps:
1. If you have not installed the ppc64 Linux utilities, perform the installation now. See http://techsupport.services.ibm.com/server/lopdiags/.
2. Reject the primary image. From the command line, type
update_flash -r
then, go to Step 003.
Step 002 Complete the following steps:
1. Reject the primary image. From the command line, type
/usr/lpp/diagnostics/bin/update_flash -r
then, go to Step 003.
or
Note: Menu items that do not pertain are NOT displayed. If there is no temporary image, the “Reject the Temporary Image” menu entry does not appear as a possible selection.
2. From diagnostics, either concurrent or standalone:
a. Run diagnostics.
v If you have booted AIX, log in as "root" or use the CE login; then, at the command line, enter:
diag
v Otherwise, boot standalone diagnostics; then press Enter.
b. From the Function Selection menu, choose Task Selection.
c. From the Tasks Selection List, choose Update and Manage System Flash.
d. Choose Reject the Temporary Image.
e. Choose Yes to reject the image.
f. Press F10 to exit diagnostics.
Note: This selection rejects the temporary system firmware image when booted from the permanent image. This results in the temporary image being overwritten by the permanent image.
3. Continue with Step 003.
Step 003 Shut down the blade server using the operating system.
Step 004 If you have not moved jumper J14, go to Step 006.
Step 005 If you moved jumper J14, complete the following steps:
1. Turn off the blade server.
2. Remove the blade server from the BladeCenter unit (see “Removing the blade server from the BladeCenter unit” on page 73).
3. Open the blade server cover (see “Opening the blade server cover” on page 74).
Figure: System firmware code page jumper (J14); pins are labeled 3, 2, 1.
4. Locate jumper J14 (system firmware code page jumper) on the system board.
5. Remove jumper J14 from pins 2 and 3, and reinstall it on pins 1 and 2 of J14, its original position, to take the blade server out of system firmware recovery mode.
6. Replace the cover and reinstall the blade server in the BladeCenter unit.
Step 006 Restart the blade server.
Note: You might need to update the firmware code to the latest
version. See http://www.ibm.com/pc/support for more information about how to update the firmware code.
Statement 21
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
Chapter 8. General AIX and xSeries standalone diagnostic information
This chapter describes standalone diagnostics for AIX and running the standalone diagnostics from CD-ROM.
Information for general diagnostic systems running the AIX operating system
Information for general diagnostic systems running the AIX operating system is provided in this section.
For information about standalone CD AIX diagnostics, see “Running the standalone diagnostics from CD-ROM” on page 25.
Information in this section is common to all JS20 system units.
Any service information or diagnostic procedure that is specific to a system unit or device is a separate procedure for that system unit or device.
AIX operating system message files
English is the default language displayed by the diagnostic programs when run from disk. If you want to run the diagnostic programs in a language other than English, you must install the AIX operating system message locale file set for the language you want displayed.
AIX diagnostic and the standalone diagnostic tasks provide the capability to display device and adapter microcode levels as well as update device and adapter microcode. AIX diagnostic tasks also provide the capability to update firmware.
Use the Update and Manage System Flash task to update a system’s firmware. When the flash update is complete, the system automatically reboots. Microcode images can be installed from disk, diskette, or NIM server. For additional information, refer to “Update and manage system flash using AIX diagnostics” on page 52.
On systems using AIX 5.2.0.30 or later, use the Download Microcode task to install microcode onto devices and adapters. This task presents a list of resources that are currently installed and supported by this task. Microcode images can be installed from disk, diskette, or NIM server. For additional information, refer to “Download microcode” on page 64. For adapters and devices with microcode that can be updated but are not supported by this task, refer to the manufacturer’s instructions.
For systems not using AIX, these tasks can be run from standalone diagnostics (see “Running standalone diagnostics from a management (NIM) server” on page 68). Otherwise, refer to the corresponding documentation for the operating system on installing microcode.
CE login
CE login enables a user to perform operating system commands that are required to service the system without being logged in as a root user. CE login must have a role of Run Diagnostics and a primary group of System. This enables the user to:
v Run the diagnostics including the service aids, certify, format, and so forth.
v Run all the operating system commands run by system group users.
v Configure and unconfigure devices that are not busy.
In addition, CE login can have Shutdown Group enabled to allow:
v Use of the Update System Microcode service aid.
v Use of shutdown and reboot operations.
To use CE login, ask the customer to create a user with these characteristics; see the AIX 5L Version 5.3 System Management Guide: Operating System and Devices. After this is set up, you will need to obtain the user name and password from the customer to log in with these capabilities. The recommended CE login user name is qserv.
Missing resources
In diagnostics version 5.2.0 and later, missing devices are identified on the Diagnostic Selection screen by an uppercase M preceding the name of the device that is missing. The Diagnostic Selection menu is displayed any time you run the diagnostic routines or the advanced diagnostics routines. The Diagnostic Selection menu can also be entered by running diag -a when there are missing devices or missing paths to a device.
When a missing device is selected for processing, the Missing Resource menu will ask whether the device has been turned off, removed from the system, moved to a different physical location, or if it is still present. When a single device is missing, the fault is probably with that device. When multiple devices with a common parent are missing, the fault is most likely related to a problem with the parent device. The diagnostic procedure may include testing the device’s parent, analyzing which devices are missing, and any manual procedures that are required to isolate the problem.
v Symptom response:
The Missing Resource menu is displayed or the letter M is displayed alongside a resource in the resource list.
v Action:
If the Missing Resource menu is displayed, follow the displayed instructions until either the Diagnostic Selection menu or an SRN is displayed. If an M is displayed in front of a resource (indicating that it is missing), select that resource; then, choose Commit (F7 key).
If an 8-digit error code is displayed, go to “Firmware error codes” on page 102, find the error, and perform the listed action.
If an SRN is displayed, record it and go to “SRN tables” on page 110.
Automatic diagnostic tests
All automatic diagnostic tests are run after the system unit is turned on and before the AIX operating system is loaded. The automatic diagnostic tests display progress indicators (or checkpoints) to track test progress. If a test stops or hangs, the checkpoint for that test remains on the console to identify the unsuccessful test.
Configuration program
The configuration program determines what features, adapters, and devices are present on the system. The configuration program, which is part of the AIX operating system, builds a configuration list that is used by the diagnostic programs to control which tests are run during system checkout.
Diagnostic programs
This section provides an overview of the various diagnostic programs.
The diagnostic controller runs as an application program on the AIX operating system and carries out the following functions:
v Displays diagnostic menus
v Checks availability of needed resources
v Checks error log entries under certain conditions
v Loads diagnostic application programs
v Loads task and service aid programs
v Displays test results
To test an adapter or device, select the device or adapter from the Diagnostic Selection menu. The diagnostic controller then loads the diagnostic application program for the selected device or adapter. The diagnostic application program loads and runs test units to check the functions of the device or adapter.
The diagnostic controller checks the results of the tests done by the diagnostic application and determines the action needed to continue the testing.
The amount of testing that the diagnostic application does depends on the mode (concurrent/standalone) under which the diagnostic programs are running.
Error log analysis
When you select Diagnostics, the Diagnostic Selection menu displays.
Note: Other menus may display before the Diagnostic Selection menu appears.
This menu allows you to select the purpose for running diagnostics. When you select the Problem Determination option, the diagnostic programs read and analyze the contents of the error log.
Note: Most hardware errors in the operating system error log contain “sysplanar0” as the resource name. The resource name identifies the resource that detected the error; it does not indicate that the resource is faulty or should be replaced. Use the resource name to determine the appropriate diagnostic to analyze the error.
If the error log contains recent errors (approximately the last 7 days), the diagnostic programs automatically select the diagnostic application program to test the logged function.
If there are no recent errors logged or the diagnostic application program runs without detecting an error, the Diagnostic Selection menu is displayed.
This menu allows you to select a resource for testing. If an error is detected while the diagnostic application program is running, the A problem was detected screen displays a Service Request Number (SRN).
Introducing tasks and service aids
The AIX diagnostic package contains programs that are called Tasks. Tasks can be thought of as performing a specific function on a resource; for example, running diagnostics or performing a service aid on a resource. This chapter describes the tasks available in AIX Diagnostics version 5.2 and later.
Notes:
1. Some programs are only accessible from Online Diagnostics in Service or Concurrent mode, while others might be accessible only from Standalone Diagnostics.
2. The specific tasks available will be dependent on the hardware attributes or capabilities of the system you are servicing. Not all service aids or tasks will be available on all systems.
To perform one of these tasks, use the Task Selection option from the Function Selection menu.
After a task is selected, a resource menu may be presented showing all resources supported by the task.
A fast-path method is also available to perform a task by using the diag command and the -T flag. By using the fast path, the user can bypass most of the introductory menus to access a particular task. The user is presented with a list of resources available to support the specified task.
The fast-path tasks are as follows:
v Certify: Certifies media
v Chkspares: Checks for the availability of spare sectors
v Download: Downloads microcode to an adapter or device
v Disp_mcode: Displays current level of microcode
v Format: Formats media
v Identify Remove: Identifies and removes devices (hot-plug)
To run these tasks directly from the command line, specify the resource and other task-unique flags. Use the descriptions in this chapter to understand which flags are needed for a given task.
Task and service aid functions
If a device does not show in the test list or you suspect that a device’s diagnostic package is not loaded, check by using the Display Configuration and Resource List task. If the device you want to test has a plus (+) sign or a minus (-) sign preceding its name, the diagnostic package is loaded. If the device has an asterisk (*) preceding its name, the diagnostic package for the device is not loaded or is not available. Tasks and service aids provide a means to display data, check media, and check functions without being directed by the hardware problem determination procedure. Refer to “Tasks (service aids)” on page 63 for a list of tasks and service aids.
AIX automatic error log analysis (diagela)
Automatic error log analysis (diagela) performs error log analysis whenever a permanent hardware error is logged. It is available on all RPA platforms once the diagela program is enabled. The diagela program determines if the error should be analyzed by the diagnostics.
If the error should be analyzed, a diagnostic application is invoked and the error is analyzed. No testing is done if the diagnostics determine that the error requires a service action. Instead, the diagnostics send a message containing the SRN to your console or to all system groups. Running diagnostics in this mode is similar to using the diag -c -e -d device command.
To activate the automatic error log analysis feature on systems running AIX as the operating system, log in as root user (or use CE login) and type the following command:
/usr/lpp/diagnostics/bin/diagela ENABLE
To disable the automatic error log analysis feature on systems running AIX, log in as root user (or use CE login) and type the following command:
/usr/lpp/diagnostics/bin/diagela DISABLE
The diagela program can also be enabled and disabled using the Periodic Diagnostic Service Aid.
Error log analysis
This section provides information on error log analysis.
v Error log analysis is the analysis of the AIX error log entries.
v Error log analysis is part of the diagnostic applications. The analysis is started by selecting a device from the Diagnostic Selection menu and then using the diag command or selecting the Run Error Log Analysis task.
v Error log analysis is only performed when running online diagnostics.
v Error log analysis is not performed when running standalone diagnostics.
v Error log analysis only reports problems if the errors have reached defined thresholds. Thresholds can be from 1 to 100, depending on the error.
v Permanent errors do not necessarily mean a part should be replaced.
v Automatic Error Log Analysis (diagela) provides the capability to do error log analysis whenever a permanent hardware error is logged.
Log repair action
The diagnostics perform error log analysis on most resources. The default time for error log analysis is seven days; however, this time can be changed from 1 to 60 days using the Display or Change Diagnostic Run Time Options task.
To prevent false problems from being reported when error log analysis is run, repair actions need to be logged whenever a FRU is replaced. A repair action can be logged by using the Log Repair Action task or by running diagnostics in system verification mode.
The Log Repair Action task lists all resources. Replaced resources can be selected from the list, and when Commit (F7 key) is selected, a repair action is logged for each selected resource.
Tasks (service aids)
These are tasks that might be available to the JS20 blade server:
v Add Resource to Resource List
v AIX Shell Prompt
v Analyze Adapter Internal Log
v Automatic Error Log Analysis and Notification
v Backup and Restore Media
v Certify Media
v Change Hardware Vital Product Data
v Configure Reboot Policy
v Configure Surveillance Policy
v Delete Resource from Resource List
v Disk Maintenance
v Display Configuration and Resource List
v Display Firmware Device Node Information
v Display Hardware Error Report
v Display Hardware Vital Product Data
v Display Machine Check Error Log
v Display Microcode Level
v Display Multipath I/O (MPIO) Device Configuration
v Display or Change Bootlist
v Display or Change Diagnostic Run Time Options
v Display Previous Diagnostic Results
v Display Service Hints
v Display Software Product Data
v Display System Environmental Sensors
v Display USB Devices
v Download Microcode
v Fibre Channel RAID Service Aids
v Format Media
v Gather System Information
v Generic Microcode Download
v Hot-Plug Task
v Identify Indicators
v Identify and System Attention Indicators
v Local Area Network Analyzer
v Log Repair Action
v Periodic Diagnostics
v RAID Array Manager
v Run Diagnostics
v Run Error Log Analysis
v Run Exercisers
v Save or Restore Hardware Management Policies
v Save or Restore Service Processor Configuration (RSPC)
v Spare Sector Availability
v System Fault Indicator
v System Identify Indicator
v Update Disk-Based Diagnostics
v Update and Manage System Flash
Download microcode
This service aid provides a way to copy microcode to an adapter or device. The service aid presents a list of adapters and devices that use microcode. After the adapter or device is selected, the service aid provides menus to guide you in checking the current level and installing the needed microcode.
This task can be run directly from the AIX command line. Most adapters and devices use a common syntax as identified in this section.
For many adapters and devices, microcode installation occurs and becomes effective while the adapters and devices are in use. It is recommended that a current backup be available and the installation be scheduled during a non-peak production period.
Notes:
1. If the source is /etc/microcode, the image must be stored in the /etc/microcode directory on the system. If the system is booted from a NIM server, the image must be stored in the usr/lib/microcode directory of the SPOT the client is booted from.
2. If the source is CD (cdX), the CD must be in ISO 9660 format. There are no restrictions on which directory the image is stored in.
3. If the source is diskette (fdX), the diskette must be in backup format and the image stored in the /etc/microcode directory.
The common command syntax is as follows:
diag [-c] -d <device> -T "download [-s {/etc/microcode|<source>}] [-l {latest|previous}] [-f]"
Flag descriptions are as follows:
Flag               Description
-c                 No console mode. Run without user interaction.
-d <device>        Run the task on the device or adapter specified.
-T download        Install microcode.
-s /etc/microcode  Microcode image is in /etc/microcode.
-s <source>        Microcode image is on specified source. For example, fd0, cd0.
-l latest          Install latest level of microcode. This is the default.
-l previous        Install previous level of microcode.
-f                 Install microcode even if the current level is not on the source.
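The flags in the table above can be assembled into a complete command line. The following is a minimal sketch, assuming a hypothetical helper named diag_download_cmd; it uses only the -c, -d, -T, -s, and -l flags listed in the table and prints the command rather than running it, so you can review the result before using it on a system.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Builds (but does not run) a no-console microcode download command from the
# flags in the table above.
# $1: device or adapter name (required)
# $2: source, for example /etc/microcode, fd0, or cd0 (default /etc/microcode)
# $3: level, latest or previous (default latest)
diag_download_cmd() {
    device=$1
    src=${2:-/etc/microcode}
    level=${3:-latest}

    if [ -z "$device" ]; then
        echo "usage: diag_download_cmd <device> [source] [latest|previous]" >&2
        return 2
    fi
    case $level in
        latest|previous) ;;
        *) echo "level must be latest or previous" >&2; return 2 ;;
    esac
    printf 'diag -c -d %s -T "download -s %s -l %s"\n' "$device" "$src" "$level"
}
```

The device name passed as the first argument is whatever resource name your system reports; the helper does not check that the device exists.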
Update and manage system flash
Note: The firmware update can be done using the service aid or the AIX command
line.
This selection validates a new system firmware flash image and uses it to update the system temporary flash image. This selection can also be used to validate a new system firmware flash image without performing an update, commit the temporary flash image, and reject the temporary flash image.
Look for additional update and recovery instructions with the update kit. You need to know the fully-qualified path and file name of the flash update image file provided in the kit. If the update image file is on a diskette or optical media, the service aid can list the files on the diskette or optical media for selection. The diskette must be a valid backup-format diskette.
Refer to the update instructions with the kit to determine the current level of the system unit or service processor flash memory.
When this service aid is run from online diagnostics, the flash update image file is copied to the /var file system. It is recommended that the source of the microcode that you want to download be put into the /etc/microcode directory on the system. If there is not enough space in the /var file system for the new flash update image file, an error is reported. If this error occurs, exit the service aid, increase the size of the /var file system, and retry the service aid. After the file is copied, a screen requests confirmation before continuing with the flash update. When you continue with the flash update, the system reboots using the shutdown -u command. The system does not return to the diagnostics, and the current flash image is not saved. After the reboot, you can remove the /var/update_flash_image file.
When this service aid is run from standalone diagnostics, the flash update image file is copied to the file system from diskette, optical media, or from the NIM server. Using a diskette, the user must provide the image on a backup-format diskette because the user does not have access to remote file systems or any other files that are on the system. If using the NIM server, the microcode image must first be copied onto the NIM server in the /usr/lib/microcode directory pointed to by the NIM SPOT (from which you plan to have the NIM client boot standalone diagnostics) prior to performing the NIM boot of diagnostics. Next, a NIM check operation must be run on the SPOT containing the microcode image on the NIM server. After performing the NIM boot of diagnostics, you can use this service aid to update the microcode from the NIM server by choosing the /usr/lib/microcode directory when prompted for the source of the microcode that you want to update. If not enough space is available, an error is reported stating that additional system memory is needed. After the file is copied, a prompt requests confirmation before continuing with the flash update. When you continue with the update, the system reboots using the reboot -u command. You might receive a Caution: Some process(es) wouldn’t die message during the reboot process; you can ignore this message. The current flash image is not saved.
You can use the update_flash command in place of this service aid. The command is located in the /usr/lpp/diagnostics/bin directory. The command syntax is as follows:
update_flash [-q | -v] -f file_name
update_flash [-q | -v] -D device_name -f file_name
update_flash [-q | -v] -D
update_flash [-l]
update_flash -c
update_flash -r
Important: The update_flash command reboots the entire system. Do not use this
command if more than one user is logged in to the system.
Flag descriptions are as follows:
Flag Description
-D Specifies that the flash update image file is on diskette. The device_name variable specifies the device. The default device_name is /dev/fd0.
-f Flash update image file source. The file_name variable specifies the fully qualified path of the flash update image file.
-l Lists the files on a diskette, from which the user can choose a flash update image file.
-q Forces the update_flash command to update the flash EPROM and reboot the system without asking for confirmation.
-v Validates the flash update image. No update will occur.
-c Commits the temporary flash image when booted from the temporary image. This overwrites the permanent image with the temporary image.
-r Rejects the temporary image when booted from the permanent image. This overwrites the temporary image with the permanent image.
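The command forms listed above can be assembled programmatically when a script prepares the update. The following Python sketch is illustrative only: it builds an argument list for the syntax forms shown in this section and does not run the command. The function name update_flash_args and the example file names are hypothetical; the path and flags are the ones documented above.

```python
# Path taken from this section; the command itself exists only on AIX.
UPDATE_FLASH = "/usr/lpp/diagnostics/bin/update_flash"


def update_flash_args(action, file_name=None, from_diskette=False,
                      device_name=None, quiet=False):
    """Build an argument list for one of the update_flash forms listed above.

    action is "update", "validate", "list", "commit", or "reject".
    """
    if action == "commit":
        return [UPDATE_FLASH, "-c"]
    if action == "reject":
        return [UPDATE_FLASH, "-r"]
    if action == "list":
        return [UPDATE_FLASH, "-l"]
    if action not in ("update", "validate"):
        raise ValueError("unknown action: %s" % action)

    args = [UPDATE_FLASH]
    if action == "validate":
        args.append("-v")          # validate the image; no update occurs
    elif quiet:
        args.append("-q")          # update and reboot without confirmation

    # Naming a device implies that the image is on diskette.
    from_diskette = from_diskette or device_name is not None
    if not from_diskette:
        if file_name is None:
            raise ValueError("this form requires -f file_name")
        return args + ["-f", file_name]
    if device_name is None:
        if file_name is not None:
            raise ValueError("the listed -D form without a device takes no -f")
        return args + ["-D"]
    if file_name is None:
        raise ValueError("-D device_name must be followed by -f file_name")
    return args + ["-D", device_name, "-f", file_name]
```

For example, update_flash_args("validate", file_name="/etc/microcode/image.img") returns the validate-only form. Keep in mind that any form that performs an update reboots the entire system.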
Using the standalone CD-ROM and online current diagnostics
The diagnostics consist of standalone diagnostics and online diagnostics.
v The standalone diagnostics must be booted before they are run. If booted, they
have no access to the AIX error log or the AIX configuration data.
v Online diagnostics are resident with AIX on the disk or server. They can be
booted and run concurrently (called concurrent mode) with other applications. They have access to the AIX error log and the AIX configuration data.
Notes:
v If this system unit is attached to another system, be sure you isolate this system unit
before stopping the operating system or running diagnostic programs.
v The AIX operating system must be installed in order to run online diagnostics. If the AIX
operating system is not installed, use the standalone diagnostic procedures.
Standalone and online diagnostics operating considerations
Before you use the diagnostics, consider the following information:
v Run online diagnostics in concurrent mode whenever possible, unless otherwise
directed. The online diagnostics perform additional functions as compared to standalone diagnostics. The AIX error log functions are only available when diagnostics are run from the disk (concurrent diagnostic) drive.
v When running online diagnostics, device support for some devices may not have
been installed. If this is the case, that device does not appear in the resource list.
v When running standalone diagnostics, device support for some devices may be
contained on supplemental diagnostic media. If this is the case, the device does not appear in the resource list when running diagnostics unless the supplemental media has been processed.
Running online diagnostics
Consider the following information when you run the online diagnostics from a server or a disk:
v The diagnostics cannot be loaded and run from a disk until the AIX operating
system has been installed and configured.
v When the system is running in a full machine partition, then, if the diagnostics
were loaded from disk or a server, you must shut down the AIX operating system before powering off the system unit to prevent possible damage to disk data. This is done in one of two ways:
If the diagnostic programs were loaded in Standalone mode, press the F3 key
until Diagnostic Operating Instructions displays; then follow the displayed instructions to shut down the AIX operating system.
If the diagnostic programs were loaded in maintenance or concurrent mode,
enter the shutdown -F command.
v Under some conditions the system may stop, with instructions displayed on
attached displays and terminals.
Follow the instructions to select a console display.
Running the online diagnostics in concurrent mode
Use concurrent mode to run online diagnostics on some of the system resources while the system is running normal system activity. Because the system is running in normal operation, the following resources cannot be tested in concurrent mode:
v Adapters connected to paging devices, or disk drive used for paging
v Memory
v Microprocessor
Three levels of testing exist in concurrent mode:
v The share-test level tests a resource while the resource is being shared by
programs running in the normal operation. This testing is mostly limited to normal commands that test for the presence of a device or adapter.
v The sub-test level tests a portion of a resource while the remaining part of the
resource is being used in normal operation. For example, this test could test one port of a multiport device while the other ports are being used in normal operation.
v The full-test level requires the device not be assigned to or used by any other
operation. This level of testing on a disk drive might require the use of the vary command. The diagnostics display menus to allow you to vary off the needed resource.
Error log analysis is done in concurrent mode when you select the Problem Determination option on the Diagnostic Mode Selection menu.
Running standalone diagnostics from a management (NIM) server
A client system connected to a network with a Network Installation Management (NIM) server is capable of booting the standalone diagnostics from the NIM server if the client system is registered on the NIM server, and if the NIM boot settings on both the NIM server and the client system are correct.
Consider the following information when running standalone diagnostics from a NIM server:
1. For NIM clients that have adapters that would normally require that supplemental media be loaded when standalone diagnostics are run from CD-ROM, the support code for these adapters must be loaded into the directory pointed to by the NIM SPOT from which you wish to boot that client. Before running standalone diagnostics on these clients from the NIM server, the NIM server system administrator must ensure that any needed support for these devices is loaded onto the server.
2. Use one of the following methods to determine the amount of available system memory:
v Run the Display Resource Attributes task for resource.
v Use the Config option under System Management Services (see the system unit service guide).
v Use the following AIX command: lsattr -E -l mem0
3. All operations to configure the NIM server require root authority.
4. If you replace the network adapter in the client, the network adapter hardware address for the client must be updated on the NIM server.
5. The Control state (Cstate) for standalone clients on the NIM server should be kept in the Diagnostic Boot has been Enabled state.
6. On the client system, the NIM server network adapter should be put in the boot list after the boot disk drive. This allows the system to boot up in standalone diagnostics from the NIM server if there is a problem booting from the disk drive. Refer to the Multiboot section under the SMS chapter in the service guide for the client system to obtain information about setting the boot list.
NIM server configuration
Refer to the Network Installation Management Guide and Reference for information on the following:
v Register a client on the NIM server
v Enable a client to run diagnostics from the NIM server
To verify that the client system is registered on the NIM server and diagnostic boot is enabled, run the following command from the command line on the NIM server:
lsnim -a Cstate -Z ClientName
Refer to the following table for system responses.
Note: The ClientName is the name of the system on which you want to run the
standalone diagnostics.
System response: #name:Cstate: ClientName:diagnostic boot has been enabled:
Client status: The client system is registered on the NIM server and enabled to run diagnostics from the NIM server.

System response: #name:Cstate: ClientName:ready for a NIM operation: or #name:Cstate: ClientName:BOS installation has been enabled:
Client status: The client system is registered on the NIM server but not enabled to run standalone diagnostics from the NIM server.
Note: If the client system is registered on the NIM server but Cstate has not been enabled, no data will be returned.

System response: 0042-053 lsnim: there is no NIM object named ClientName
Client status: The client is not registered on the NIM server.
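The response table above can be applied mechanically to captured command output. The following Python sketch is illustrative only: it classifies text that has the colon-separated shape shown in the table, and the function name classify_cstate and the returned status strings are hypothetical. It does not run lsnim itself.

```python
def classify_cstate(lsnim_output):
    """Map captured `lsnim -a Cstate -Z ClientName` output to a client status."""
    lines = [ln.strip() for ln in lsnim_output.splitlines() if ln.strip()]

    # Error 0042-053 means that there is no NIM object with that name.
    if any(ln.startswith("0042-053") for ln in lines):
        return "not registered"

    # Lines that start with "#" are the header (for example, "#name:Cstate:").
    data = [ln for ln in lines if not ln.startswith("#")]
    if not data:
        # Per the note in the table: registered, but Cstate is not enabled.
        return "registered, diagnostics not enabled"

    fields = data[0].split(":")
    cstate = fields[1].strip().lower() if len(fields) > 1 else ""
    if cstate == "diagnostic boot has been enabled":
        return "diagnostics enabled"
    return "registered, diagnostics not enabled"
```

Any Cstate other than the diagnostic boot state, such as ready for a NIM operation, is reported as registered but not enabled.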
Client configuration and booting standalone diagnostics from the NIM server
To run standalone diagnostics on a client system from the NIM server, complete the following steps:
1. Remove all removable media (tape or CD-ROM disc).
2. Stop all programs, including the AIX operating system (get help if needed).
3. If you are running standalone diagnostics in a full machine partition, verify with the system administrator and system users that the system unit can be shut down. Stop all programs, including the operating system (refer to the operating system documentation). Verify with the system administrator and system users using that partition that all applications on that partition must be stopped, and that the partition will be rebooted. Stop all programs on that partition including the operating system.
4. If the system is running in a full-machine partition, turn on the system unit power. Restart the AIX operating system in the system you wish to run online diagnostics.
5. Enter any requested passwords.
6. Select Utilities.
7. Depending on the console type, select RIPL or Remote Initial Program Load Setup.
8. Depending on the console type, select Set Address or IP Parameters.
9. Enter the client address, server address, gateway address (if applicable), and subnet mask into the Remote Initial Program Load (RIPL). If there is no gateway between the NIM server and the client, set the gateway address to 0.0.0.0. To determine if there is a gateway, either ask the system network administrator or compare the first 3 octets of the NIM server address and the client address. If they are the same (for example, if the NIM server address is 9.3.126.16 and the client address is 9.3.126.42, the first 3 octets (9.3.126) are the same), then set the gateway address in the RIPL field to 0.0.0.0.
10. If the NIM server is setup to allow the pinging of the client system, use the ping option in the RIPL utility to verify that the client system can ping the NIM server. Under the Ping utility, choose the network adapter that provides the attachment to the NIM server to do the ping operation. If the ping comes back with an OK prompt, the client is prepared to boot from the NIM server. If ping returns with a FAILED prompt, the client does not proceed with the boot.
Note: If the ping fails, refer to “Boot problem resolution” on page 153; then,
follow the steps for network boot problems.
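The octet comparison in step 9 can be expressed as a short check. The following Python sketch is illustrative only: it applies the rule of thumb from step 9 (identical first 3 octets mean no gateway), which is not a full subnet calculation. The function name ripl_gateway and the gateway address in the second example are hypothetical.

```python
def ripl_gateway(server_ip, client_ip, gateway_ip=None):
    """Return the gateway address to enter in the RIPL field.

    Rule from step 9: if the first 3 octets of the NIM server address and
    the client address match, there is no gateway, so use 0.0.0.0.
    """
    server = server_ip.split(".")
    client = client_ip.split(".")
    if len(server) != 4 or len(client) != 4:
        raise ValueError("expected dotted-decimal addresses with 4 octets")

    if server[:3] == client[:3]:
        return "0.0.0.0"
    if gateway_ip is None:
        raise ValueError("octets differ; ask the network administrator for the gateway")
    return gateway_ip
```

With the addresses from step 9, ripl_gateway("9.3.126.16", "9.3.126.42") returns "0.0.0.0". When the octets differ, the gateway address must still come from the network administrator.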
Use the following procedure to temporarily change the system boot list so that the network adapter attached to the NIM server network is first in the boot list.
The system should start loading packets while doing a bootp from the network. Follow the instructions on the screen to select the system console. If Diagnostics Operating Instructions Version x.x.x is displayed, standalone diagnostics has loaded successfully. If the AIX login prompt displays, standalone diagnostics did not load.
Check the following items:
v The boot list on the client might be incorrect.
v Cstate on the NIM server might be incorrect.
v There might be network problems preventing you from connecting to the NIM server.
Verify the settings and the status of the network. If you continue to have problems, refer to “Boot problem resolution” on page 153; then, follow the steps for network boot problems.
After running diagnostics from the NIM server, reboot the system and use the BladeCenter management screens to change the boot list sequence back to the original settings.
Chapter 9. Installing options
This chapter provides instructions for adding options or customer-replaceable units (CRUs) to the blade server. CRUs are easily replaceable components, such as memory modules, hard disk drives, and I/O expansion cards. (Some removal instructions are provided in case you need to remove one option or CRU to install another.)
Installation guidelines
Before you begin, read the following information:
v Read Appendix B, “Safety information,” on page 163, and the guidelines in
“Handling static-sensitive devices.” This information will help you work safely with the blade server and options.
v Read the information in “Preinstallation checklist” on page 9.
v Back up all important data before you make changes to disk drives.
v For a list of supported options for the blade server, go to http://www.ibm.com/pc/us/compat/.
v Before you remove a hot-swap blade server from the BladeCenter unit, you must
shut down the operating system by typing shutdown -h now. If the blade server was not turned off, press the power-control button (behind the blade-server control panel door) to turn off the blade server. You do not have to shut down the BladeCenter unit itself.
System reliability guidelines
To help ensure proper cooling and system reliability, make sure that:
v The ventilation holes on the blade server are not blocked.
v Each of the blade bays on the front of the BladeCenter unit has a blade server or filler blade installed. Do not operate the BladeCenter unit for more than 1 minute without a blade server or filler blade installed in each blade bay.
v You have followed the reliability guidelines in the documentation that comes with
the BladeCenter unit.
v You have not installed any small computer system interface (SCSI) devices. The
blade server does not support SCSI devices. If you attach SCSI devices to the blade server, these devices will not be recognized or configured, and they will not operate.
Handling static-sensitive devices
Attention: Static electricity can damage the blade server, the BladeCenter unit,
and other electronic devices. To avoid damage, keep static-sensitive devices in their static-protective packages until you are ready to install them.
To reduce the possibility of damage from electrostatic discharge, observe the following precautions:
v When working on the BladeCenter T unit, use an electrostatic discharge (ESD)
wrist strap, especially when you will be handling modules, options, and blade servers. To work properly, the wrist strap must have a good contact at both ends (touching your skin at one end and firmly connected to the ESD connector on the front or back of the BladeCenter T unit).
v Limit your movement. Movement can cause static electricity to build up around
you.
v Handle the device carefully, holding it by its edges or its frame.
v Do not touch solder joints, pins, or exposed printed circuitry.
v Do not leave the device where others can handle and damage it.
v While the device is still in its static-protective package, touch it to any unpainted metal surface of the BladeCenter chassis or any unpainted metal surface on any other grounded rack component in the rack in which you are installing the device for at least 2 seconds. (This drains static electricity from the package and from your body.)
v Remove the device from its package and install it directly into the blade server or
BladeCenter unit without setting down the device. If it is necessary to set down the device, put it in its static-protective package. Do not place the device on your BladeCenter chassis or on a metal surface.
v Take additional care when handling devices during cold weather. Heating reduces
indoor humidity and increases static electricity.
Removing the blade server from the BladeCenter unit
The following illustration shows an example of how to remove the blade server from a typical BladeCenter unit; the orientation of the blade server depends on the type of BladeCenter unit you have.
Note: The illustrations in this document might differ slightly from your hardware.
Attention:
v To maintain proper system cooling, do not operate the BladeCenter unit for more
than 1 minute without a blade server or filler blade installed in each blade bay.
v Note the number of the bay that contains the blade server that you will remove.
You will need this information if you decide to reinstall the blade server in the BladeCenter unit. If you reinstall the blade server, be sure to reinstall it in the same bay from which it was removed. Reinstalling a blade server into a different bay than the one from which it was removed could have unintended consequences, such as incorrectly reconfiguring the blade server. Some blade server configuration information and update options are established according to bay number. If you reinstall the blade server into a different bay, you might have to reconfigure the blade server.
Note: The blade server is a hot-swap device, and the blade bays in the BladeCenter unit are hot-swap bays. Therefore, you can install or remove the blade server without removing power from the BladeCenter unit. However, you must turn off the blade server before removing it from the BladeCenter unit.
Complete the following steps to remove the blade server:
1. Read the safety information beginning on page iii and “Installation guidelines” on page 71.
2. If the blade server is operating, the power-on LED is lit continuously (steady). Shut down the operating system by typing the shutdown -h now command. Refer to your operating system documentation. If the blade server was not turned off, press the power-control button (behind the blade-server control-panel door) to turn off the blade server. See “Blade server controls and LEDs” on page 14 for more information about the location of the power-control button.
Attention: Wait at least 30 seconds for the hard disk drives to stop spinning,
before proceeding to the next step.
3. Open the two release levers as shown in the illustration. The blade server moves out of the bay approximately 0.6 cm (0.25 inch).
4. Pull the blade server out of the bay.
5. Place either a filler blade or a new blade server in the bay within 1 minute.
Opening the blade server cover
The following illustration shows how to open the cover on the blade server.
[Illustration callouts: cover pins; blade-cover release (blue), one on each side of the blade server]
Complete the following steps to open the blade server cover:
1. Read “Important safety information” on page iii and “Installation guidelines” on page 71.
2. Carefully place the blade server on a flat, static-protective surface, with the cover side up.
3. Press the blue blade-cover release on each side of the blade server and lift the cover open, as shown in the illustration.
4. Lift the cover from the blade server and set it aside.
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
Removing the blade server bezel assembly
Before you can replace a defective system-board assembly or blade-server bezel assembly, you must first remove the blade-server bezel assembly. The following illustration shows how to remove the bezel assembly from a blade server.
[Illustration callouts: bezel-assembly release, control-panel connector, control-panel cable]
Complete the following steps to remove the blade-server bezel assembly:
1. Read the safety information beginning on page iii and “Installation guidelines” on page 71.
2. Open the blade server cover.
3. Press the bezel-assembly release and pull the bezel assembly away from the blade server approximately 1.2 cm (0.5 inch).
4. Disconnect the control-panel cable from the control-panel connector.
5. Pull the bezel assembly away from the blade server.
6. Store the bezel assembly in a safe place.
Installing IDE hard disk drives
The blade server has two connectors on the system board for installing optional 2.5-inch integrated drive electronics (IDE) hard disk drives. Each IDE connector is on a separate channel. Some models come with at least one IDE hard disk drive already installed.
Note: Some hard disk drives have Phillips screws; therefore, make sure that a
Phillips screwdriver is available.
Attention: To maintain proper system cooling, do not operate the BladeCenter
unit for more than 1 minute without a blade server or filler blade installed in each blade bay.
[Illustration callouts: IDE drive, tray, riser card, IDE connector 1 (J1), IDE connector 2 (J2), short screws]
Attention:
v Drives must be installed in the following order: IDE connector 1 (J1) first, then
IDE connector 2 (J2).
v Do not install a hard disk drive in IDE connector 2 if you intend to also install an
optional I/O expansion card. The I/O expansion card occupies the same area as the second IDE hard disk drive.
v Do not press on the top of the hard disk drive when installing it. Pressing the top
could damage the hard disk drive.
v IDE hard disk drives must be set to primary (master). See the documentation that
came with your hard disk drive for instructions.
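The placement rules in the preceding list can be checked before you open the blade server. The following Python sketch is illustrative only: it encodes the connector order and the conflict between IDE connector 2 and the I/O expansion card. The function name check_ide_layout and its parameters are hypothetical.

```python
def check_ide_layout(drive_in_j1, drive_in_j2, io_expansion_card):
    """Return a list of problems with a planned drive layout (empty if none).

    Each argument is True if the item is, or will be, installed.
    """
    problems = []
    if drive_in_j2 and not drive_in_j1:
        # Drives go in IDE connector 1 (J1) first, then IDE connector 2 (J2).
        problems.append("install a drive in IDE connector 1 (J1) first")
    if drive_in_j2 and io_expansion_card:
        # The I/O expansion card occupies the same area as the second drive.
        problems.append("IDE connector 2 (J2) conflicts with the I/O expansion card")
    return problems
```

An empty list means that the planned layout is consistent with the rules above; it does not confirm that the drive itself is supported.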
Complete the following steps to install a 2.5-inch IDE hard disk drive:
1. Read the safety information beginning on page iii and “Installation guidelines” on page 71.
2. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command. Refer to your operating system documentation. If
the blade server was not turned off, press the power-control button (behind the blade-server control-panel door) to turn off the blade server. See “Blade server controls and LEDs” on page 14 for more information about the location of the power-control button.
3. Remove the blade server from the BladeCenter unit. (See “Removing the blade server from the BladeCenter unit” on page 73 for instructions.) Carefully place the blade server on a flat, static-protective surface.
4. Open the blade server cover. See “Opening the blade server cover” on page 74 for instructions.
5. Insert the riser card from the option kit into an IDE connector on the blade server system board.
6. Place the tray from the option kit over the riser card as shown in the preceding illustration, aligning the tray with the screws on the system board. Note the four screws that are under the four screw holes in the tray. Set the tray aside and remove the four screws.
7. Replace the tray and secure the tray to the system board with screws from the hardware kit.
8. Set any jumpers or switches on the hard disk drive, if this requirement is specified on the drive label or in the documentation that comes with the drive.
9. Place the hard disk drive into the tray and, from the rear edge of the hard disk drive, push it into the connector on the riser card until the hard disk drive moves past the lever at the back of the tray. The hard disk drive clicks into place.
10. If you have other options to install or remove, do so now; otherwise, go to “Completing the installation” on page 90.
Installing memory modules
You can increase the amount of memory in the blade server by installing additional memory-module options. The following items describe the types of dual inline memory modules (DIMMs) that the blade server supports and other information that you must consider when installing DIMMs:
v The system board contains four DIMM connectors and supports two-way memory
interleaving.
v As of the date of this publication, the blade server supports a minimum of 512
MB and a maximum of 8 GB of system memory (depending on the blade server model). The DIMM options available are 256 MB, 512 MB, 1 GB, and 2 GB, with the following exceptions:
- 256 MB DIMMs are not supported by the 8842-4Tx model.
- 2 GB DIMMs are not supported by the 8842-21x, 8842-E1x, 8842-E2x, 8842-41x, and 8842-4Ax models.
v Install only 2.5 V, 184-pin, double-data-rate (DDR), PC2700, registered synchronous dynamic random-access memory (SDRAM) with error correcting code (ECC) DIMMs. For a current list of supported DIMMs for the blade server, go to http://www.ibm.com/pc/us/compat/.
v Install DIMMs in a matched pair. Each pair must be the same size, speed, type,
and technology. You can mix compatible DIMMs from various manufacturers. The second pair of DIMMs does not have to be the same size as the first pair.
v After you install or remove a DIMM, the new configuration information is
automatically saved in the blade server firmware code.
The following illustration shows how to install DIMMs on the system board.
[Illustration callouts: DIMM socket 1 (J25), DIMM socket 2 (J28), DIMM socket 3 (J32), DIMM socket 4 (J40)]
Before you begin, read the documentation that comes with the option.
Complete the following steps to install a DIMM:
1. Read the safety information beginning on page iii and “Installation guidelines” on page 71.
2. If the blade server is operating, shut down the operating system by typing the
shutdown -h now command. Refer to your operating system documentation. If
the blade server was not turned off, press the power-control button (behind the blade-server control-panel door) to turn off the blade server. See “Blade server controls and LEDs” on page 14 for more information about the location of the power-control button.
3. Remove the blade server from the BladeCenter unit. See “Removing the blade
server from the BladeCenter unit” on page 73 for instructions.
4. Carefully place the blade server on a flat, static-protective surface.
5. Open the blade server cover. See “Opening the blade server cover” on page 74
for instructions.
6. Locate the DIMM connectors on the system board. Determine the connectors
into which you will install the DIMMs. The blade server comes with two 256 MB DIMMs installed in the DIMM 3 (J32)
and DIMM 4 (J40) memory connectors. When you install additional DIMMs, be sure to install them as a pair, in DIMM connectors 1 and 2 (J25 and J28).
Install the DIMMs in the following order:
Pair DIMM connectors
First 3 and 4 (J32 and J40)
Second 1 and 2 (J25 and J28)
7. Touch the static-protective package that contains the DIMM option to any
unpainted metal surface on the BladeCenter chassis or any unpainted surface
on any other grounded rack component. Then, remove the DIMM from the package.
8. To install the DIMMs, repeat the following steps for each DIMM that you install:
a. Turn the DIMM so that the DIMM keys align correctly with the connector on
the system board. Ensure that the retaining clips are open.
Attention: To avoid breaking the retaining clips or damaging the DIMM
connectors, handle the clips gently.
b. Insert the DIMM by pressing the DIMM along the guides into the connector.
Make sure that the retaining clips snap into the closed positions.
Important: If there is a gap between the DIMM and the retaining clips, the
DIMM has not been correctly installed. In this case, open the retaining clips and remove the DIMM; then, reinsert the DIMM.
9. If you have other options to install or remove, do so now; otherwise, go to “Completing the installation” on page 90.
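The memory rules in this section (matched pairs, pair order, supported sizes, and model exceptions) can be checked against a planned configuration. The following Python sketch is illustrative only: it compares DIMM sizes but does not model speed, type, or technology, and it treats the model strings exactly as printed, including the trailing x. The function name check_dimms is hypothetical.

```python
SUPPORTED_SIZES_MB = {256, 512, 1024, 2048}
NO_256MB_MODELS = {"8842-4Tx"}
NO_2GB_MODELS = {"8842-21x", "8842-E1x", "8842-E2x", "8842-41x", "8842-4Ax"}


def check_dimms(model, first_pair, second_pair=None):
    """Return a list of problems with a planned DIMM layout (empty if none).

    first_pair:  sizes in MB of the DIMMs for connectors 3 and 4 (J32, J40).
    second_pair: sizes in MB of the DIMMs for connectors 1 and 2 (J25, J28).
    """
    problems = []
    pairs = [("3 and 4", first_pair)]
    if second_pair is not None:
        pairs.append(("1 and 2", second_pair))

    total_mb = 0
    for connectors, pair in pairs:
        # Only size is compared here; speed, type, and technology are not modeled.
        if len(pair) != 2 or pair[0] != pair[1]:
            problems.append("connectors %s: install a matched pair" % connectors)
        for size in sorted(set(pair)):
            if size not in SUPPORTED_SIZES_MB:
                problems.append("%d MB is not a listed DIMM size" % size)
            elif size == 256 and model in NO_256MB_MODELS:
                problems.append("256 MB DIMMs are not supported by %s" % model)
            elif size == 2048 and model in NO_2GB_MODELS:
                problems.append("2 GB DIMMs are not supported by %s" % model)
        total_mb += sum(pair)

    if not 512 <= total_mb <= 8192:
        problems.append("total memory must be between 512 MB and 8 GB")
    return problems
```

For example, check_dimms("8842-21x", (256, 256), (512, 512)) returns an empty list, because each pair is matched and the second pair may differ in size from the first.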
Installing an I/O expansion card
You can add an optional I/O expansion card (adapter) to the blade server to give the blade server additional network connections for communicating on a network.
When you add an I/O expansion card, you must make sure that the switch modules in I/O bays 3 and 4 on the BladeCenter unit both support the I/O expansion card network-interface type. For example, if you add an Ethernet expansion card to the blade server, the modules in I/O bays 3 and 4 on the BladeCenter unit must both be compatible with the Ethernet expansion card. All other I/O expansion cards installed on other blade servers in the BladeCenter unit must also be compatible
with these switch modules. In this example, you could then install two Ethernet switch modules, two pass-thru modules, or one Ethernet switch module and one pass-thru module. Because pass-thru modules are compatible with a variety of I/O expansion cards, installing two pass-thru modules would allow use of several different types of compatible I/O expansion cards within the same BladeCenter unit.
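The compatibility rule in the preceding paragraph can be sketched as a simple check. The following Python sketch is illustrative only: it assumes that a module supports a card when the module is a pass-thru module or has the same network-interface type as the card. That assumption is a simplification, because pass-thru modules are compatible with a variety of cards, not necessarily all of them. The function names and the type strings are hypothetical.

```python
PASS_THRU = "pass-thru"


def modules_support_card(card_type, bay3_module, bay4_module):
    """Return True if the modules in I/O bays 3 and 4 both support the card type."""
    def supports(module_type):
        # Simplifying assumption: a pass-thru module accepts the card type.
        return module_type == PASS_THRU or module_type == card_type

    return supports(bay3_module) and supports(bay4_module)


def chassis_is_consistent(card_types, bay3_module, bay4_module):
    """Check every I/O expansion card in the BladeCenter unit against both modules."""
    return all(modules_support_card(card, bay3_module, bay4_module)
               for card in card_types)
```

For the Ethernet example above, modules_support_card("ethernet", "ethernet", "pass-thru") returns True. Always confirm actual compatibility against the list of supported options for the blade server.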
Important:
v Installation of an I/O expansion card requires removal of the hard disk drive that
is installed in IDE connector 2. The I/O expansion card occupies the same space as this hard disk drive and replaces it. You cannot install a hard disk drive in IDE connector 2 while an I/O expansion card is installed in the blade server.
v The Myrinet Cluster Expansion Card for IBM Eserver BladeCenter comes with a
cable for connection to the system board of a compatible device. However, the cable is not used in the BladeCenter JS20 Type 8842. Therefore, when you install a Myrinet Cluster Expansion Card for IBM Eserver BladeCenter into a BladeCenter JS20 Type 8842, do not connect the cable from the I/O expansion card to the system board.
v If you plan to install a Fibre Channel expansion card and use it for remote startup
(boot) operations, call the IBM Support Center for additional information. In the U.S. and Canada, call 1-800-IBM-SERV (1-800-426-7378). In other countries, go to http://www.ibm.com/planetwide/ to locate your support telephone numbers.
Attention: If the hard disk drive installed in IDE connector 2 contains any
information that you want to keep, back it up to another storage device.
The following illustration shows how to install an I/O expansion card on the blade server. The card is installed near IDE connector 2.
[Illustration: installing an I/O expansion card. Labeled parts: I/O expansion tray, IBM I/O expansion card, I/O expansion card connectors, raised hook, short screws.]
Complete the following steps to install an I/O expansion card:
1. Read the safety information beginning on page iii and “Installation guidelines” on page 71.
2. If the blade server is operating, shut down the operating system by typing the shutdown -h now command. Refer to your operating system documentation. If the blade server was not turned off, press the power-control button (behind the blade-server control-panel door) to turn off the blade server. See “Blade server controls and LEDs” on page 14 for more information about the location of the power-control button.
3. Remove the blade server from the BladeCenter unit (see “Removing the blade server from the BladeCenter unit” on page 73 for information).
4. Carefully place the blade server on a flat, static-protective surface.
5. Open the cover (see “Opening the blade server cover” on page 74 for instructions).
6. Install the I/O expansion card tray:
a. If there is no IDE hard disk drive in IDE connector 2, remove the four screws as shown in the previous illustration. Then, continue with step 6c.
b. If an IDE hard disk drive is in IDE connector 2, remove the hard disk drive and tray. Save the four long screws that secured the tray to the system board. Remove the riser card that connected the IDE hard disk drive to the blade server system board.
c. Secure the tray to the system board with the screws from the option kit, as shown in the previous illustration.
7. Install the I/O expansion card:
a. Orient the I/O expansion card as shown in the previous illustration.
b. Slide the notch in the narrow end of the card into the raised hook on the tray; then, gently pivot the wide end of the card into the I/O expansion card connectors, as shown in the previous illustration.
Note: For device driver and configuration information to complete the installation of the I/O expansion card, see the documentation that comes with the card. Some documentation might also be on the IBM BladeCenter Documentation CD that comes with the BladeCenter unit. For the latest editions of the IBM BladeCenter documentation, go to http://www.ibm.com/support/ on the World Wide Web.
8. If you have other options to install or remove, do so now; otherwise, go to “Completing the installation” on page 90.
Ethernet controller, switch module, and cabling requirements
One dual-port Gigabit Ethernet controller is integrated on the BladeCenter JS20 Type 8842 system board. To support Ethernet connections and the Serial Over LAN (SOL) feature and to configure the blade server, you must install an optional Ethernet-compatible switch module, such as the Nortel Networks Layer 2-7 GbE Switch Module for IBM Eserver BladeCenter or IBM 4-Port Gb Ethernet Switch Module for BladeCenter, in I/O bay 1 of the BladeCenter unit.
Each controller port provides a 1000-Mbps full-duplex interface for connecting to one of the Ethernet-compatible switch modules in I/O bays 1 and 2. If you plan to attach additional Ethernet devices to the blade server or the BladeCenter unit, you must install an optional Ethernet-compatible switch module, such as the Nortel Networks Layer 2-7 GbE Switch Module for IBM Eserver BladeCenter or IBM 4-Port Gb Ethernet Switch Module for BladeCenter, in I/O bay 3 or 4 of the BladeCenter unit, to support these additional Ethernet connections.
The optional Ethernet switch modules contain four ports with RJ-45 connectors. These connectors provide a 10/100/1000 Base-T interface (either at half-duplex or full duplex) for connecting twisted-pair cable to the Ethernet network. You must purchase and install a compatible cable to connect these devices. To connect an Ethernet controller port to a repeater or switch module, use an unshielded twisted pair (UTP) cable with RJ-45 connectors at both ends. For 100 Mbps or higher operation, Category 5 cabling is required. For 10 Mbps operation, Category 3 or Category 5 cabling is required.
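The cabling rule in the preceding paragraph can be captured as a small lookup. The following Python sketch is illustrative only; the function names are hypothetical, and the sketch encodes nothing beyond the speed and category requirements stated above.

```python
def allowed_cable_categories(speed_mbps):
    """Return the UTP cable categories that the text above permits for a link speed."""
    if speed_mbps not in (10, 100, 1000):
        raise ValueError(f"unsupported link speed: {speed_mbps} Mbps")
    if speed_mbps >= 100:
        # 100 Mbps or higher operation: Category 5 cabling is required.
        return {5}
    # 10 Mbps operation: Category 3 or Category 5 cabling is required.
    return {3, 5}


def cable_ok(speed_mbps, category):
    """Return True if a cable of the given category satisfies the stated requirement."""
    return category in allowed_cable_categories(speed_mbps)
```

For example, `cable_ok(100, 3)` evaluates to `False`, because Category 3 cabling meets the stated requirement only for 10 Mbps operation.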
Notes:
• For more information about Ethernet requirements, see the documentation that comes with the Ethernet devices and the BladeCenter Type 8677 Installation and User’s Guide.
• For more information about installing, configuring, and using the Ethernet switch modules, see the documentation that comes with the Ethernet switch module that you are using, such as the IBM 4-Port Gb Ethernet Switch Module for BladeCenter Installation and User’s Guide or Nortel Networks Layer 2-7 GbE Switch Module for IBM BladeCenter Installation Guide.
• For more information about the SOL feature, see Chapter 3, “Configuration,” on page 17, the IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide, and the BladeCenter and BladeCenter T Management Module Command-Line Interface Reference Guide.
Replacing the battery
IBM has designed this product with your safety in mind. The lithium battery must be handled correctly to avoid possible danger. If you replace the battery, you must adhere to the following instructions.
Note: In the U.S., call 1-800-IBM-4333 for information about battery disposal.
If you replace the original lithium battery with a heavy-metal battery or a battery with heavy-metal components, be aware of the following environmental consideration. Batteries and accumulators that contain heavy metals must not be disposed of with normal domestic waste. They will be taken back free of charge by the manufacturer, distributor, or representative, to be recycled or disposed of in a proper manner.
To order replacement batteries, call 1-800-IBM-SERV within the United States, and 1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada, call your IBM authorized reseller or IBM marketing representative.
Note: After you replace the battery, the blade server is automatically reconfigured;
however, you must reset the system date and time through the operating system that you installed.
Statement 2:
CAUTION: When replacing the lithium battery, use only IBM Part Number 33F8354 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Note: See “Battery return program” on page 199 for more information about battery disposal.
Complete the following steps to replace the battery:
1. Read the safety information beginning on page iii and “Installation guidelines” on page 71.
2. Follow any special handling and installation instructions that come with the battery.
3. If the blade server is operating, shut down the operating system by typing the shutdown -h now command. Refer to your operating system documentation. If the blade server was not turned off, press the power-control button (behind the blade-server control-panel door) to turn off the blade server. See “Blade server controls and LEDs” on page 14 for more information about the location of the power-control button.
4. Remove the blade server from the BladeCenter unit (see “Removing the blade server from the BladeCenter unit” on page 73 for information).
5. Carefully place the blade server on a flat, static-protective surface.
6. Open the blade server cover (see “Opening the blade server cover” on page 74 for instructions).
7. Locate the battery (connector BH1) on the system board.
[Illustration: location of the battery (BH1) on the system board.]
8. Remove the battery:
a. Use your finger to press down on one side of the battery; then, slide the battery out from its socket. The spring mechanism will push the battery out toward you as you slide it from the socket.
Note: You might need to lift the battery clip slightly with your fingernail to make it easier to slide the battery.
b. Use your thumb and index finger to pull the battery from under the battery clip.
Note: After you remove the battery, press gently on the clip to make sure that the battery clip is touching the base of the battery socket.
9. Insert the new battery:
a. Tilt the battery so that you can insert it into the socket, under the battery clip. Make sure that the side with the positive (+) symbol is facing up.
b. As you slide it under the battery clip, press the battery down into the socket.
10. Close the blade server cover (see “Closing the blade server cover” on page 92).
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
11. Reinstall the blade server into the BladeCenter unit.
12. Turn on the blade server (see “Turning on the blade server” on page 13).
13. Reset the system date and time through the operating system that you installed. For additional information, see your operating-system documentation.
System board
Two operational microprocessors and heat sinks are required on the system board in the blade server at all times. The microprocessors and heat sinks are not replaceable. Do not attempt to remove these components or any components that secure the microprocessors and heat sinks to the system board. You must replace the system board if any of these conditions exists:
• A microprocessor or heat sink becomes defective.
• Certain errors occur as described in “Firmware error codes” on page 102.
• The blade server does not restart after you recover the system firmware code as described in “Recovering the system firmware code” on page 54.
To obtain a new system board, you must order a new blade server. The replacement system board comes attached to the new blade server. To order a blade server, contact your IBM authorized reseller or IBM marketing representative.
Important: After you replace the system board, you must either update the new blade server with the latest firmware or restore the pre-existing firmware from a diskette or CD image. You must also reconfigure the new blade server and reset the system date and time.
System board component locations
The following illustration shows the location of the system-board components, including connectors for user-installable options.
Note: All jumpers not specifically mentioned are reserved.
System-board LED locations
The following illustration shows the location of the LEDs on the system board.
DIMM 3 error LED (CR46)
DIMM 1 error LED (CR40)
DIMM 2 error LED (CR45)
Microprocessor 0 error LED (CR19)
DIMM 4 error LED (CR53)
Temperature error LED (CR16)
Light Path Diagnostics (SW1)
Microprocessor 1 error LED (CR58)
Service processor error LED (CR27)
NMI error LED (CR17)
System board error LED (CR20)
Reserved (CR29)
Note: Error LED CR29 is reserved.
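When you record service observations, the LED designators listed above can be kept in a simple lookup table. The following Python sketch is illustrative only; the dictionary and function names are hypothetical, and the entries repeat only the designators shown for the illustration above.

```python
# System-board LED designators and meanings, as listed for the illustration above.
SYSTEM_BOARD_LEDS = {
    "CR16": "Temperature error",
    "CR17": "NMI error",
    "CR19": "Microprocessor 0 error",
    "CR20": "System board error",
    "CR27": "Service processor error",
    "CR29": "Reserved",
    "CR40": "DIMM 1 error",
    "CR45": "DIMM 2 error",
    "CR46": "DIMM 3 error",
    "CR53": "DIMM 4 error",
    "CR58": "Microprocessor 1 error",
}


def describe_led(designator):
    """Return the meaning of a system-board LED designator such as 'CR46'."""
    key = designator.strip().upper()
    return SYSTEM_BOARD_LEDS.get(key, "Unknown designator")
```

For example, `describe_led("cr46")` returns `"DIMM 3 error"`.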
Replacing the system board
Complete the following steps to replace the system-board assembly:
1. Read the safety information beginning on page iii and “Installation guidelines” on page 71.
2. If the blade server is operating, shut down the operating system by typing the shutdown -h now command. Refer to your operating system documentation. If the blade server was not turned off, press the power-control button (behind the blade-server control-panel door) to turn off the blade server. See “Blade server controls and LEDs” on page 14 for more information about the location of the power-control button.
3. Remove the blade server from the BladeCenter unit (see “Removing the blade server from the BladeCenter unit” on page 73 for information). The faulty system-board assembly is attached to the blade server.
4. Open the blade server cover (see “Opening the blade server cover” on page 74 for instructions).
5. Remove the blade-server bezel assembly (see “Removing the blade server bezel assembly” on page 75).
6. Remove the following components from the faulty system-board assembly (see the applicable installation instructions in this chapter and reverse the steps), and place them on a flat, static-protective surface. Note the locations where these components were installed on the faulty system-board assembly. You will need this information when you install these components on the replacement system-board assembly. Make sure that these components are accessible for reinstallation.
• IDE hard disk drives, drive trays, and riser cards (see “Installing IDE hard disk drives” on page 75)
• DIMMs (see “Installing memory modules” on page 77)
• I/O expansion cards and expansion card trays (see “Installing an I/O expansion card” on page 79)
• Jumper J14, between jumpers J16 and J20 (for location, see the illustration in “Recovering the system firmware code” on page 54)
7. While the new system-board assembly is still in its static-protective package, touch it to an unpainted metal part of the system unit for at least 2 seconds.
8. Remove the new system-board assembly from its package and place it on a flat, static-protective surface.
9. Install the components that you removed from the faulty system-board assembly in step 6 into the corresponding locations on the replacement system-board assembly.
• IDE hard disk drives, drive trays, and riser cards (see “Installing IDE hard disk drives” on page 75)
• DIMMs (see “Installing memory modules” on page 77)
• I/O expansion cards and expansion card trays (see “Installing an I/O expansion card” on page 79)
• Jumper J14, between jumpers J16 and J20 (for location, see the illustration in “Recovering the system firmware code” on page 54)
If you plan to increase the amount of memory in the blade server, install the new DIMMs on the new system-board assembly now. For additional information, see “Installing memory modules” on page 77.
10. Note the machine type, model number, and serial number on the identification label that is behind the control-panel door on the front of the blade server. You will need this information to complete this step.
The replacement system-board assembly comes with a repair identification (RID) tag label. To ensure future entitlement for service, you must write the serial number of the blade server (with the original system-board assembly) onto the RID tag label in this step. The part number for the RID tag is 13N0477.
Use the RID tag label to transfer entitlement (machine type, model number, and serial number) from the original system-board assembly to the new system-board assembly. Do not use a pencil or felt-tip pen to complete the RID tag label.
Important:
• The serial number of the blade server (with the original system-board assembly) must match the serial number that you reported when you called IBM for service.
• Because the new system-board assembly is not associated with a blade-server serial number, you must transfer the serial number from the original system-board assembly to the new system-board assembly. The first time that you turn on the blade server that contains the new system-board assembly, the firmware code will request that you enter the serial number, as described in step 16. You must enter the blade-server serial number. If you enter a different serial number, the operating system that you installed might interpret this information as an incorrect serial number, and you might have to change your software-licensing agreement.
• To maintain proper airflow, do not place the new label on the blade-server bezel assembly.
Also, be sure to place the RID tag label on the bottom of the blade server chassis.
11. Install the blade-server bezel assembly on the blade server (see “Installing the blade-server bezel assembly” on page 90).
12. Close the blade server cover (see “Closing the blade server cover” on page 92).
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
13. Install the blade server into the same BladeCenter unit I/O bay from which you removed the blade server when it contained the faulty system-board assembly.
14. Turn on the blade server (see “Turning on the blade server” on page 13).
Note: If you have just connected the power cords of the BladeCenter unit to electrical outlets, you will have to wait until the power-on LED on the blade server slowly flashes before you press the power-control button on the blade server.
15. Configure an SOL connection and attach it to this blade server.
For additional information, see the IBM Eserver BladeCenter and BladeCenter T Serial Over LAN Setup Guide.
16. The blade server starts to the Open Firmware prompt, where you must enter the serial number of the blade server (with the original system-board assembly). The blade server will not start until the serial number and other relevant information have been entered and verified at the prompts that appear with the following checkpoint codes, as shown in the following example window. Depending on the blade server configuration, the text that is displayed in your system window might be slightly different.
E1F0
E1F1
D099
D100 > xxxxxxx (The serial number of the blade server with the original system-board assembly)
D101 > xxxxxxx (Re-enter the serial number to verify)
D102 > 8842 (The type number from the blade server)
D103 > 8842 (Re-enter the type number to verify)
D104 > xxxx (The model number from the blade server)
D105 > xxxx (Re-enter the model number to verify)
Note: These checkpoint codes are described in Chapter 7, “Diagnostics,” on page 37.
17. Reset the system date and time through the operating system that you installed. For additional information, see your operating-system documentation.
The system-board assembly replacement procedure is now complete. Continue with “Input/output connectors and devices” on page 92.
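The enter-and-verify sequence at checkpoint codes D100 through D105 in step 16 follows a double-entry pattern: each value is typed twice and accepted only when both entries agree. The following Python sketch illustrates that pattern so that you can check the values from the RID tag label before you type them. It is not the firmware implementation; the function name is hypothetical, and raising an error on a mismatch is an assumption, because this manual does not describe how the firmware responds to mismatched entries.

```python
def collect_identity(answers):
    """Pair up entries typed at D100..D105 and accept each value only if both match.

    `answers` is the sequence of six strings typed, in order, at the prompts
    for checkpoint codes D100, D101, D102, D103, D104, and D105.
    """
    prompts = [
        ("D100", "D101", "serial_number"),
        ("D102", "D103", "machine_type"),
        ("D104", "D105", "model_number"),
    ]
    entries = iter(answers)
    identity = {}
    for first_code, verify_code, field in prompts:
        first = next(entries).strip()
        second = next(entries).strip()
        if first != second:
            # Assumption: treat a mismatch as an error; the firmware behavior is not documented here.
            raise ValueError(
                f"{field}: entry at {first_code} does not match re-entry at {verify_code}"
            )
        identity[field] = first
    return identity
```

For example, with the placeholder values `["ABC1234", "ABC1234", "8842", "8842", "41X", "41X"]` (a hypothetical serial number and model number), the function returns the three verified values keyed by field name.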
Completing the installation
To complete the installation, perform the following tasks, if you have not already done so.
1. Install the blade-server bezel assembly on the blade server (see “Installing the blade-server bezel assembly”).
2. Close the blade server cover (see “Closing the blade server cover” on page 92).
Statement 21:
CAUTION: Hazardous energy is present when the blade server is connected to the power source. Always replace the blade cover before installing the blade server.
3. Reinstall the blade server into the BladeCenter unit.
4. Turn on the blade server (see “Turning on the blade server” on page 13).
5. After you replace the battery or the system-board assembly, reset the system date and time through the operating system that you installed. For additional information, see your operating-system documentation.
Note: If you have just connected the power cords of the BladeCenter unit to electrical outlets, you will have to wait until the power-on LED on the blade server flashes slowly before pressing the power-control button on a blade server.
Installing the blade-server bezel assembly
The following illustration shows how to install the bezel assembly on the blade server.