IBM 8669, xSeries 342 Type 8669 Hardware Maintenance Manual

Hardware Maintenance Manual
xSeries 342 Type 8669
Note
Before using this information and the product it supports, be sure to read the general information under “Notices” on page 143.
First Edition (June 2001; updated December 2001, March 2002, July 2002)
The following paragraph does not apply to the United Kingdom or any country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.
This publication was developed for products and services offered in the United States of America. IBM may not offer the products, services, or features discussed in this document in other countries, and the information is subject to change without notice. Consult your local IBM representative for information on the products, services, and features available in your area.
Requests for technical information about IBM products should be made to your IBM reseller or IBM marketing representative.
© Copyright International Business Machines Corporation 2000, 2001. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
About this manual
This manual contains diagnostic information, a Symptom-to-FRU index, service information, error codes, error messages, and configuration information for the IBM xSeries 342.
Important: This manual is intended for trained servicers who are familiar with IBM PC Server products.
Important safety information
Be sure to read all caution and danger statements in this book before performing any of the instructions. See “Safety information” on page 109.
Leia todas as instruções de cuidado e perigo antes de executar qualquer operação.
Prenez connaissance de toutes les consignes de type Attention et Danger avant de procéder aux opérations décrites par les instructions.
Lesen Sie alle Sicherheitshinweise, bevor Sie eine Anweisung ausführen.
Accertarsi di leggere tutti gli avvisi di attenzione e di pericolo prima di effettuare qualsiasi operazione.
Lea atentamente todas las declaraciones de precaución y peligro antes de llevar a cabo cualquier operación.
Online support
Use the World Wide Web (WWW) to download Diagnostic, BIOS Flash, and Device Driver files.
File download address is:
http://www.ibm.com/pc/support
Contents
About this manual .......................iii
Important safety information ....................iii
Online support .........................iv
General checkout ........................1
General information .......................3
Features and specifications .....................3
Server features .........................5
Reliability, availability, and serviceability features .............6
Controls and indicators ......................7
Operator information panel ....................8
Powering on the server .....................9
Powering off the server .....................9
Diagnostics ..........................11
Diagnostic tools overview .....................11
Identifying problems using LEDs ..................11
Power supply LEDs ......................11
Light path diagnostics .....................11
Diagnostics panel .......................11
Light path diagnostics .....................12
POST ............................12
POST error messages .....................12
Error logs ..........................12
Log viewing from Configuration/Setup ..............13
Log viewing from diagnostic programs ..............13
SCSI messages (some models)...................13
Diagnostic programs and error messages ...............13
Text messages ........................14
Starting the diagnostic programs .................15
Viewing the test log ......................16
Diagnostic error message tables .................16
Recovering BIOS code ......................16
Troubleshooting the Ethernet controller ................17
Network connection problems ..................17
Ethernet controller troubleshooting chart ..............18
Power checkout ........................19
Replacing the battery ......................19
Configuration .........................23
Using the Configuration/Setup Utility program .............23
Starting the Configuration/Setup Utility program ............23
Main menu of the Configuration/Setup Utility .............24
Additional Configuration/Setup Utility menu choices .........26
Using passwords .......................27
Setting the password override switch ...............27
Using the SCSISelect utility program .................28
Starting the SCSISelect utility program ...............28
Choices available from the SCSISelect menu.............28
Using the PXE boot agent utility program ...............29
Starting the PXE boot agent utility program .............29
Choices available from the PXE boot agent menu ...........30
Installing options .......................33
Major components of the xSeries 342 server ..............33
System board layout .......................34
System board options connectors .................34
System board internal cable connectors...............35
System board external port connectors ...............36
System board switches and jumpers ................36
System board LED locations ...................38
Light Path Diagnostics panel ..................39
Before you begin ........................40
System reliability considerations .................41
Working inside a server with power on ...............41
Handling static-sensitive devices .................41
Locations ...........................42
Removing the cover and bezel ..................42
Adapters ..........................44
Adapter considerations ....................44
Adapter installation .....................45
Internal drives ........................47
Internal drive bays .....................47
SCSI drives ........................48
SCSI IDs .........................48
Hot-swap drive installation...................49
Non-hot-swap drive installation .................50
Memory modules .......................51
Microprocessor installation ...................53
Microprocessor installation in Models 1RX, 2RX, 4RX, 1TG, 2TG, 4TG 54
Microprocessor installation in Model 5RX .............56
Hot-swap power supply installation ................61
3-Pack Ultra 160 Hot-Swap Expansion Kit .............63
Hot-swap fan assembly replacement ................63
Installing the server cover and bezel ................64
Connecting external options ....................65
Cabling requirements .....................65
Setting SCSI IDs for external devices ...............65
Installation procedure .....................65
Input/Output ports .......................65
Video port.........................66
Keyboard port .......................66
Auxiliary-device (pointing device) port ..............67
Ultra 160 SCSI ports.....................67
Serial ports ........................69
Universal Serial Bus ports ...................70
Ethernet port........................71
Integrated System Management Processor ports ..........74
Cabling the server........................75
FRU information (service only) ..................77
LED cover ..........................77
LED board ..........................77
On/off reset board ........................78
Diskette/CD-ROM drive ......................78
SCSI backplane ........................79
Hot-swap hard disk drive backplane assembly .............79
Power supply backplane .....................80
AC distribution box .......................81
System board .........................81
Symptom-to-FRU index .....................83
Beep symptoms ........................83
No Beep symptoms .......................85
Information panel system error LED .................85
Diagnostic error codes ......................87
Error symptoms ........................91
Power supply LED errors .....................92
POST error codes ........................93
Service processor error codes ...................98
SCSI error codes ........................99
Temperature error messages ....................99
Fan error messages ......................100
Power error messages .....................100
System shutdown .......................100
Voltage related system shutdown.................101
Temperature related system shutdown ...............101
DASD checkout ........................101
Host Built-In Self Test (BIST) ...................102
Bus fault messages.......................102
Undetermined problems .....................102
Parts listing, xSeries 342 ....................105
System ...........................106
Keyboards ..........................107
Power cords .........................108
Related service information ...................109
Safety information .......................109
General safety .......................109
Electrical safety .......................110
Safety inspection guide ....................111
Handling static-sensitive devices .................112
Grounding requirements ....................113
Safety notices (multi-lingual translations) ..............113
Send us your comments!.....................142
Problem determination tips ....................143
Notices ...........................143
Trademarks..........................144
General checkout
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server: the system board, Ethernet controller, video controller, RAM, keyboard, mouse (pointing device), diskette drive, serial ports, and hard drives. You can also use them to test some external devices. See “Diagnostic programs and error messages” on page 13.
Also, if you cannot determine whether a problem is caused by the hardware or by the software, you can run the diagnostic programs to confirm that the hardware is working properly.
When you run the diagnostic programs, a single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After the cause of the first error message is corrected, the other error messages might not occur the next time you run the test.
A failed system might be part of a shared DASD cluster (two or more systems sharing the same external storage device(s)). Prior to running diagnostics, verify that the failing system is not part of a shared DASD cluster.
A system might be part of a cluster if:
• The customer identifies the system as part of a cluster.
• One or more external storage units are attached to the system and at least one of the attached storage units is additionally attached to another system or unidentifiable source.
• One or more systems are located near the failing system.
If the failing system is suspected to be part of a shared DASD cluster, all diagnostic tests can be run except those that test the storage unit (the DASD residing in the storage unit) or the storage adapter attached to the storage unit.
Notes:
1. For systems that are part of a shared DASD cluster, run one test at a time in looped mode. Do not run all tests in looped mode, as this could enable the DASD diagnostic tests.
2. If multiple error codes are displayed, diagnose the first error code displayed.
3. If the computer hangs with a POST error, go to “POST error codes” on page 93.
4. If the computer hangs and no error is displayed, go to “Undetermined problems” on page 102.
5. For power supply problems, see “Power supply LED errors” on page 92.
6. For safety information, see “Safety information” on page 109.
7. For intermittent problems, check the error log; see “Error logs” on page 12.
001 IS THE SYSTEM PART OF A CLUSTER?
YES. Schedule maintenance with the customer. Shut down all systems
related to the cluster. Run storage test.
NO. Go to step 002.
002 IF THE SYSTEM IS NOT PART OF A CLUSTER:
• Power-off the computer and all external devices.
• Check all cables and power cords.
• Set all display controls to the middle position.
• Power-on all external devices.
• Power-on the computer.
• Record any POST error messages displayed on the screen. If an error is displayed, look up the first error in the “POST error codes” on page 93.
• Check the information LED panel System Error LED; if on, see “Information panel system error LED” on page 85.
• Check the System Error Log. If an error was recorded by the system, see “Symptom-to-FRU index” on page 83.
• Start the Diagnostic Programs. See “Starting the diagnostic programs” on page 15.
• Check for the following responses:
   1. One beep.
   2. Readable instructions or the Main Menu.
003 DID YOU RECEIVE BOTH OF THE CORRECT RESPONSES?
NO. Find the failure symptom in “Symptom-to-FRU index” on page 83.
YES. Run the Diagnostic programs. If necessary, refer to “Starting the diagnostic programs” on page 15. If you receive an error, go to “Symptom-to-FRU index” on page 83. If the diagnostics completed successfully and you still suspect a problem, see “Undetermined problems” on page 102.
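The three checkout steps can be read as a small decision procedure. The sketch below is a hedged illustration in Python: the `CheckoutObservation` type and its field names are hypothetical, and the final "no further action" result is an assumption, since the procedure itself does not name an outcome for a clean run with no remaining suspicion.

```python
from dataclasses import dataclass


@dataclass
class CheckoutObservation:
    """What the servicer observed (hypothetical field names)."""
    part_of_cluster: bool
    one_beep: bool
    readable_instructions_or_main_menu: bool
    diagnostics_reported_error: bool = False
    still_suspect_problem: bool = False


def next_action(obs: CheckoutObservation) -> str:
    # Step 001: clustered systems need scheduled maintenance first.
    if obs.part_of_cluster:
        return ("Schedule maintenance with the customer, shut down all "
                "systems related to the cluster, run storage test")
    # Step 003: both correct responses are required.
    if not (obs.one_beep and obs.readable_instructions_or_main_menu):
        return "Symptom-to-FRU index (page 83)"
    # YES branch: run the diagnostic programs and act on the result.
    if obs.diagnostics_reported_error:
        return "Symptom-to-FRU index (page 83)"
    if obs.still_suspect_problem:
        return "Undetermined problems (page 102)"
    return "No further action indicated"  # assumption, not stated in the manual
```

The order of the checks mirrors the order of steps 001 to 003, so a clustered system never reaches the diagnostic branch.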
General information
The IBM xSeries 342 server is a high-performance server that supports symmetric multiprocessing (SMP). It is ideally suited for networking environments that require superior microprocessor performance, efficient memory management, flexibility, and large amounts of reliable data storage.
Performance, ease of use, reliability, and expansion capabilities were key considerations during the design of this server. These design features make it possible for you to customize the system hardware to meet your needs today, while providing flexible expansion capabilities for the future.
If you have access to the World Wide Web, you can obtain up-to-date information about the server model and other IBM server products at the following World Wide Web address: http://www.ibm.com/eserver/xseries.
Features and specifications
The following table provides a summary of the features and specifications for the xSeries 342:
Microprocessor:
• Intel Pentium® III
• 256 KB or 512 KB Level-2 cache
• Supports up to two microprocessors
Memory:
• Maximum: 4 GB
• Type: ECC, SDRAM, PC133, Registered DIMMs
• Slots: Four (two-way interleaved)
Drives standard:
• Diskette: 1.44 MB
• CD-ROM: 24X IDE
Expansion bays:
• Hot-swap: Three slim high
• Non-hot-swap: Two 5.25-inch, replaceable with a three slim-high hot-swap drive expansion option
PCI expansion slots:
• One 33 MHz/32-bit
• Two 33 MHz/64-bit
• Two 66 MHz/64-bit
Hot-swap power supplies:
• 270 Watt (115-230 V ac)
• Minimum: One
• Maximum: Two, second power supply provides redundant power
Redundant cooling:
• Three hot-swap fans
Video:
• S3 video controller
• Compatible with SVGA and VGA
• 8 MB video memory
Size (3U):
• Height: 128 mm (5 in.)
• Depth: 695 mm (27.3 in.)
• Width: 440 mm (17.3 in.)
• Weight: 21.3 kg to 29.5 kg (47 lb to 65 lb) depending upon configuration
Integrated functions:
• Dual channel Ultra 160 SCSI controller
• One 10BASE-T/100BASE-TX/100BASE-FX, Intel Ethernet controller with Alert on LAN and Wake on LAN® support
• Two serial ports
• Two Universal Serial Bus ports
• Keyboard port
• Mouse port
• Video port
• Integrated system management processor (ISMP) with two ISM (RS-485) RJ-45 connectors
• Two ISM (RJ-45) connectors
• One system management Serial C port
Acoustical noise emissions:
• Sound power, idling (open bay): 6.6 bel maximum
• Sound power, operating: 6.8 bel maximum
• Sound pressure, operating: 53 dBA maximum
Environment:
• Air temperature:
  – Server on: 10° to 35° C (50° to 95° F). Altitude: 0 to 914 m (2998 ft.)
  – Server on: 10° to 32° C (50° to 89.6° F). Altitude: 914 m (2998 ft.) to 2133 m (6998 ft.)
  – Server off: 10° to 43° C (50° to 109.4° F). Maximum altitude: 2133 m (6998 ft.)
• Humidity:
  – Server on: 8% to 80%
  – Server off: 8% to 80%
Heat output:
• Approximate heat output in British Thermal Units (BTU) per hour
  – Minimum configuration 375 BTU (110 watts)
  – Maximum configuration 1300 BTU (380 watts)
Electrical input:
• Sine-wave input (50-60 Hz) required
• Input voltage low range:
  – Minimum: 100 V ac
  – Maximum: 127 V ac
• Input voltage high range:
  – Minimum: 200 V ac
  – Maximum: 240 V ac
• Input kilovolt-amperes (kVA) approximately:
  – Minimum: 0.08 kVA (0.076 kW)
  – Maximum: 0.38 kVA
* KB equals approximately 1000 bytes. MB equals approximately 1000000 bytes. GB equals approximately 1000000000 bytes.
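The heat-output figures in the table follow from the standard conversion of roughly 3.412 BTU per hour for each watt. The short Python sketch below reproduces them; the function name is illustrative only, and the table values are rounded approximations.

```python
BTU_PER_HOUR_PER_WATT = 3.412142  # standard conversion factor


def watts_to_btu_per_hour(watts: float) -> float:
    """Convert electrical power dissipated as heat to BTU per hour."""
    return watts * BTU_PER_HOUR_PER_WATT


if __name__ == "__main__":
    for label, watts in (("Minimum configuration", 110),
                         ("Maximum configuration", 380)):
        print(f"{label}: {watts} W is about "
              f"{watts_to_btu_per_hour(watts):.0f} BTU per hour")
```

The minimum configuration works out to about 375 BTU per hour, matching the table. The maximum works out to just under 1300, which the table rounds up.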
Server features
The xSeries 342 is designed to be cost-effective, powerful, and flexible. The server offers:
• Impressive performance using an innovative approach to SMP
The server supports up to two Intel Pentium III microprocessors. The server comes with one microprocessor installed; you can install an additional microprocessor to enhance performance and provide SMP capability.
• Large data-storage and hot-swap capabilities
All models of the server support up to three hot-swap hard disk drives. This hot-swap feature enables you to remove and replace hard disk drives without turning off the server. The xSeries 3-Pack Ultra 160 Hot-Swap Expansion Kit option is available to add three additional drives.
• Optional PCI adapters
The server uses peripheral component interconnect (PCI) bus architecture to provide compatibility with a wide range of existing hardware devices and software applications. The server supports up to five PCI adapters in the expansion slots on the system board.
• Redundant cooling capability
The redundant cooling capability of the hot-swap fans in the server allows continued operation if one of the fans fails. You can also replace a failing hot-swap fan without turning off the server.
• Optional redundant power capability
You can install an additional 270-watt power supply in the server to provide redundant power for the server. The Power Non-Redundant (NON) light emitting diode (LED) in the group of diagnostic LEDs on the system board is lit when the power load is 270 watts or greater with two power supplies installed.
• Large system memory
The memory bus in the server supports up to 4 GB (GB equals approximately 1000000000 bytes) of system memory. The memory controller provides error correcting code (ECC) support for up to four industry standard PC133, 3.3 V, 168-pin, 8-byte, registered, synchronous-dynamic-random access memory (SDRAM) dual inline memory modules (DIMMs).
• IBM integrated system management processor (ISMP)
The integrated system management processor provides environmental monitoring for the server. When environmental conditions exceed thresholds or when system components fail, the ISMP lights the Light Path Diagnostics LEDs to indicate the location of the problem. Critical errors, as well as environmental and configuration information, are also included in the error log.
If an optional Remote Supervisor Adapter is installed on the Advanced System Management (ASM) interconnect network, you can view the system health; update the ISMP code; power on, power off, and restart the server; view the error log; view the vital product data; and send alerts over the ASM interconnect network.
• Integrated network environment support
The server comes with an Ethernet controller on the system board. This Ethernet controller has an interface for connecting to 10-Mbps or 100-Mbps networks. The server automatically selects between 10BASE-T and 100BASE-TX. The controller provides full-duplex (FDX) capability, which allows simultaneous transmission and reception of data on the Ethernet local area network (LAN).
• Redundant network-interface card
The addition of an optional, redundant network interface card (NIC) provides a failover capability to a redundant Ethernet connection. If a problem occurs with the primary Ethernet connection, all Ethernet traffic associated with this primary connection is automatically switched to the redundant NIC. This switching occurs without data loss and without user intervention.
• Optional digital linear tape drive
The addition of an optional digital linear tape drive (DLT) allows quick backup of large amounts of data.
• IBM ServerGuide CDs
The ServerGuide CDs included with the server provide programs to help you set up the server and install the network operating system (NOS). The ServerGuide program detects the hardware options installed, and provides the correct configuration programs and device drivers. In addition, the ServerGuide CDs include a variety of application programs for the server.
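The redundant-power rule in the list above (the NON LED is lit when the load is 270 watts or greater with two power supplies installed) can be expressed as a one-line check. This is a minimal Python sketch for the two-supply case only; the function name is hypothetical, and the behavior with a single supply is not covered because the text does not describe it.

```python
SUPPLY_RATING_WATTS = 270  # rating of each hot-swap power supply


def non_led_expected(load_watts: float) -> bool:
    """With two supplies installed, report whether the NON LED would be lit.

    At 270 W or more a single supply could not carry the load alone,
    so power is no longer redundant.
    """
    return load_watts >= SUPPLY_RATING_WATTS
```

Using the approximate figures from the specifications table, a 110-watt minimum configuration would stay redundant with two supplies, while a 380-watt maximum configuration would not.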
Reliability, availability, and serviceability features
Three of the most important features in server design are reliability, availability, and serviceability (RAS). These factors help to ensure the integrity of the data stored on the server; that the server is available when you want to use it; and that should a failure occur, you can easily diagnose and repair the failure with minimal inconvenience.
The following is an abbreviated list of the RAS features that the server supports.
• Menu-driven setup, system configuration, RAID configuration, and diagnostic programs
• Power-on self-test (POST)
• ROM resident diagnostics
• Integrated System Management Processor
• Predictive failure alerts
• Microprocessor built-in self-test (BIST), internal error signal monitoring, configuration checking, CPU/VRM failure identification through Light Path Diagnostics technology
• Diagnostic support of ServeRAID adapters and Ethernet adapters
• Cable detection
• Hot-swap drive bays
• System error logging
• Upgradable BIOS, diagnostics, and system management code
• Automatic restart after a power failure
• Parity checking on the SCSI and PCI buses
• Error checking and correcting (ECC) memory
• Redundant hot-swap power supply option
• Redundant hot-swap cooling
• Redundant Ethernet capabilities (with optional adapter)
• Vital Product Data (VPD) on processor complex, system board, power backplane, SCSI backplane, and each power supply
• Operator information panel and group of diagnostic LEDs on the system board
• Remind button to temporarily turn off LEDs for nonvital alerts
Controls and indicators
The most commonly used controls and status indicators are on the front panel of the server.
[Illustration: front view of the server. Callouts: System power light (green), Power-control-button shield (if installed), Power control button, Reset button, Operator information panel, Serial number, Hard disk drive activity light (green), Hard disk drive status light (amber)]
System Power Light: When this green light is on, system power is present in the server. When this light flashes, the server is in standby mode (the system power supply is turned off and AC current is present). When this light is off, either a power supply, AC power, or a light has failed.
Attention: If this light is off, it does not mean there is no electrical current present in the server. The light might be burned out. To remove all electrical current from the server, you must unplug the server power cords from the electrical outlets or from the UPS.
Power-control button shield: You can install this circular disk over the power-control button to prevent accidental manual power-off. This disk is provided with the server.
Power-control Button: Press this button to manually turn the server on or off.
Reset Button: Press this button to reset the server and run the power-on self-test
(POST).
Operator Information Panel: The lights on this panel give status information for the server. See “Operator information panel” on page 8 for more information.
Hard Disk Drive Status Light: Each of the hot-swap drive bays has a Hard Disk Status light. When this amber light is on continuously, the drive has failed (only if RAID is installed). If a ServeRAID adapter is installed and this light flashes slowly (one flash per second), the drive is being rebuilt. When the light flashes rapidly (three flashes per second), the controller is identifying the drive.
Hard Disk Drive Activity Light: Each of the hot-swap drive bays has a Hard Disk Activity light. When this green light is flashing, the controller is accessing the drive.
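The meanings of the amber Hard Disk Drive Status light can be summarized as a lookup on its flash rate. The Python sketch below is illustrative only. The cutoff of two flashes per second between "slow" and "rapid" is an assumption chosen to separate the one-per-second and three-per-second rates given above, and the reading for a light that is off is also an assumption.

```python
def interpret_status_light(steady_on: bool, flashes_per_second: float = 0.0) -> str:
    """Map the amber status light behavior to the meaning given in the text."""
    if steady_on:
        # Applies only if RAID is installed, per the description above.
        return "drive has failed"
    if flashes_per_second <= 0:
        return "no condition indicated"  # assumption: light off
    if flashes_per_second < 2:  # assumed cutoff; slow is about 1 per second
        return "drive is being rebuilt"
    return "controller is identifying the drive"  # rapid is about 3 per second
```

For example, `interpret_status_light(False, 1)` gives the rebuild reading and `interpret_status_light(False, 3)` gives the identify reading.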
[Illustration: power supply. Callouts: AC power LED (green), DC power LED (green)]
AC Power Light: This light provides status information about the power supply. During normal operation, both the AC and DC Power lights are on. For any other combination of lights, see “Power supply LED errors” on page 92.
DC Power Light: This light provides status information about the power supply. During normal operation, both the AC and DC Power lights are on. For any other combination of lights, see “Power supply LED errors” on page 92.
Operator information panel
[Illustration: operator information panel. Callouts: SCSI Hard Drive Activity Light (green), Ethernet Transmit/Receive Activity Light (green), Ethernet Link Status Light (green), Information Light (amber), System Error Light (amber). Panel labels: TX/RX, LINK OK]
SCSI Hard Disk Drive Activity Light: This green light is on when there is activity on a hard disk drive.
Ethernet Transmit/Receive Activity Light: When this green light is on, there is transmit or receive activity to or from the server. This light stays on even if the server power is turned off.
Ethernet Link Status Light: When this green light is on, there is an active connection on the Ethernet port. The light stays on even if the server power is turned off.
Information Light: This amber light is on when the system error log contains information about certain conditions in the server that might affect performance.
System Error Light: This amber light is lit when a system error occurs. An LED on the diagnostic LED panel may also be on to further isolate the error.
Powering on the server
You can start the server in several ways:
• You can turn on the server by pressing the power-control button on the front of the server.
• If the server is turned on, a power failure occurs, and unattended-start mode is enabled in the Configuration/Setup utility program, the server will start automatically when power is restored.
• If AC power is present, the server is off, and the wake-up feature is enabled in the Configuration/Setup utility program, the wake-up feature will turn on the server at the set time.
• The Integrated System Management Processor can also turn on the server.
Powering off the server
Statement 5:
CAUTION:
The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
The server can be turned off as follows:
• You can turn off the server by pressing the power-control button on the front of the server. Pressing the power-control button starts an orderly shutdown of the operating system, if this feature is supported by the operating system, and places the server in standby mode.
Note: After turning off the server, wait at least 5 seconds before pressing the power-control button to power the server on again.
• You can press and hold the power-control button for more than 4 seconds to cause an immediate shutdown of the server and place the server in standby mode. You can use this feature if the operating system hangs.
• You can disconnect the server power cords from the electrical outlets to shut off all power to the server.
Note: Wait about 15 seconds after disconnecting the power cords for the system to stop running. Watch for the System Power light on the operator information panel to stop blinking.
• If the system was turned on by the wake-up feature or Wake on LAN feature, you can turn it off by either a software routine or by the fail-safe, power-down counter.
• The Integrated System Management Processor can turn off the server.
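The two power-control button behaviors described above differ only in how long the button is held. The Python sketch below illustrates that distinction. The function name and return strings are hypothetical, and the result for an operating system without orderly-shutdown support is an interpretation of the text rather than a documented behavior.

```python
IMMEDIATE_SHUTDOWN_HOLD_SECONDS = 4   # "more than 4 seconds"
MIN_WAIT_BEFORE_POWER_ON_SECONDS = 5  # wait before powering on again


def power_button_result(hold_seconds: float, os_supports_orderly_shutdown: bool) -> str:
    """Describe what pressing the power-control button does to a running server."""
    if hold_seconds > IMMEDIATE_SHUTDOWN_HOLD_SECONDS:
        return "immediate shutdown, server in standby"
    if os_supports_orderly_shutdown:
        return "orderly operating-system shutdown, server in standby"
    return "server in standby without orderly shutdown"  # interpretation


def may_power_on_again(seconds_since_power_off: float) -> bool:
    """Apply the note about waiting at least 5 seconds before powering on."""
    return seconds_since_power_off >= MIN_WAIT_BEFORE_POWER_ON_SECONDS
```

In either case the server ends in standby mode, so current is still present until the power cords are disconnected.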
Diagnostics
This section provides basic troubleshooting information to help you resolve some common problems that might occur with the server.
Diagnostic tools overview
The following tools are available to help you identify and resolve hardware-related problems:
• POST beep codes, error messages, and error logs
The power-on self-test (POST) generates beep codes and messages to indicate successful test completion or the detection of a problem. See “POST” on page 12 for more information.
• Diagnostic programs and error messages
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server. See “Diagnostic programs and error messages” on page 13 for more information.
• Light path diagnostics
The server has light-emitting diodes (LEDs) to help you identify problems with server components. These LEDs are part of the light-path diagnostics that are built into the server. By following the path of lights, you can quickly identify the type of system error that occurred. See “Light path diagnostics” for more information.
Identifying problems using LEDs
The server has LEDs to help you identify problems with some server components. These LEDs are part of the light path diagnostics built into the server. By following the path of lights, you can identify the type of system error that occurred. See the following sections for more information.
Power supply LEDs
The AC and DC Power LEDs on the power supply provide status information about the power supply. See “Power supply LED errors” on page 92.
Light path diagnostics
You can use the light path diagnostics built into the server to quickly identify the type of system error that occurred. The diagnostics panel is under the air baffle. The server is designed so that any LEDs that are illuminated remain illuminated when the server shuts down as long as the AC power source is good and the power supplies can supply +5V DC current to the server. This feature helps you isolate the problem if an error causes the server to shut down. See “Light path diagnostics” on page 12.
Diagnostics panel
The following illustration shows the LEDs on the diagnostics panel on the system board. See “Light path diagnostics” on page 12 for information on identifying problems using these LEDs.
[Illustration: diagnostics panel. LED labels: MEM, CPU, PCI A, PCI B, PCI C, VRM, DASD, SP, PS1, PS2, PS3, NON, OVER, NMI, TEMP, FAN. Other label: REMIND]
Light path diagnostics
The System Error LED on the operator information panel is lit when certain system errors occur. If the System Error LED on the server is lit, see the table in “Information panel system error LED” on page 85 to determine the cause of the error and the action you should take.
POST
When you turn on the server, it performs a series of tests to check the operation of server components and some of the options installed in the server. This series of tests is called the power-on self-test, or POST.
If POST finishes without detecting any problems, a single beep sounds and the first screen of the operating system or application program appears.
If POST detects a problem, more than one beep sounds and an error message appears on the screen. See “Beep symptoms” on page 83 and “POST error messages” for more information.
Notes:
1. If you have a power-on password or administrator password set (with Remote Supervisor Adapter installed), you must type the password and press Enter, when prompted, before POST will continue.
2. A single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After you correct the cause of the first error message, the other error messages usually will not occur the next time you run the test.
POST error messages
The table “POST error codes” on page 93 provides information about the POST error messages that can appear during startup.
Error logs
The POST error log contains the three most recent error codes and messages that the system generated during POST. The system error log contains all messages issued during POST and system status messages from the integrated system management processor.
Note: During power-on, some system error log entries might contain a date of 01/01/1990 and a time of 00:00:00. These entries are recorded before the ISMP clock is initialized and can be disregarded.
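If you export or transcribe system error log entries for review, the note above can be applied mechanically. The following is a minimal sketch, not part of the server firmware or diagnostics: the function name, the tuple layout, and the "MM/DD/YYYY HH:MM:SS" text layout are all assumptions made for illustration.

```python
from datetime import datetime

# Date and time that the manual says can appear before the ISMP clock is set.
UNINITIALIZED = datetime(1990, 1, 1, 0, 0, 0)


def split_log_entries(entries):
    """Separate entries stamped 01/01/1990 00:00:00 from all other entries.

    `entries` is a list of (timestamp_text, message) tuples. The
    "MM/DD/YYYY HH:MM:SS" text layout is an assumption for this sketch.
    Returns (dated, undated), each in the original order.
    """
    dated, undated = [], []
    for stamp, message in entries:
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        if when == UNINITIALIZED:
            undated.append((stamp, message))
        else:
            dated.append((stamp, message))
    return dated, undated
```

Entries in the second list are the ones the note says can be disregarded; the first list keeps the entries with a usable date and time.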
You can view the contents of the error logs from the Configuration/Setup Utility program or from the diagnostic programs.
Log viewing from Configuration/Setup
To view the contents of the error log from the Configuration/Setup Utility, start the Configuration/Setup Utility program; then, select Error Logs from the main menu. See “Starting the Configuration/Setup Utility program” on page 23 for more information.
Log viewing from diagnostic programs
To view the contents of the error log from the diagnostic programs, start the diagnostic programs; select Hardware Info from the top of the diagnostic programs screen; select System Error Log from the list that appears; then, follow the instructions on the screen. See “Starting the diagnostic programs” on page 15 for more information.
SCSI messages (some models)
If you receive a SCSI error message while using the SCSISelect Utility, use the following list to determine the possible cause of the error and what action to take.
Note: If the server does not have a hard disk drive, ignore any message that indicates that the drive is not installed.
One or more of the following might be causing the problem.
• A failing SCSI device (adapter or drive)
• An improper SCSI configuration
• Duplicate SCSI IDs in the same SCSI chain
• An improperly installed SCSI terminator
• A defective SCSI terminator
• An improperly installed cable
• A defective cable
Verify that:
• The external SCSI devices are turned on. External devices must be turned on before the server.
• The cables for all external SCSI devices are connected correctly.
• The last device in each SCSI chain is terminated properly.
• The SCSI devices are configured correctly.
You will get these messages only when running the SCSISelect Utility. See “SCSI error codes” on page 99.
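One of the causes listed above, duplicate SCSI IDs in the same SCSI chain, is easy to check on paper or in a short script once the IDs have been read from the SCSISelect Utility. The sketch below assumes you have noted the IDs by hand; the function name and the dictionary layout are hypothetical and are not part of any IBM utility.

```python
from collections import defaultdict


def find_duplicate_scsi_ids(chain):
    """Return {scsi_id: [device names]} for every ID used more than once.

    `chain` maps a device name to the SCSI ID it uses on one SCSI chain
    (a hand-entered record, assumed for this sketch).
    """
    devices_by_id = defaultdict(list)
    for device, scsi_id in chain.items():
        devices_by_id[scsi_id].append(device)
    return {
        scsi_id: sorted(devices)
        for scsi_id, devices in devices_by_id.items()
        if len(devices) > 1
    }
```

An empty result means that no two devices in the recorded chain share an ID; any other result names the devices to reconfigure.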
Diagnostic programs and error messages
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server.
Diagnostic error messages indicate that a problem exists; they are not intended to be used to identify a failing part. Troubleshooting and servicing of complex problems that are indicated by error messages should be performed by trained service personnel.
Sometimes the first error to occur causes additional errors. In this case, the server displays more than one error message. Always follow the suggested action instructions for the first error message that appears.
The following sections contain the error codes that might appear in the detailed test log and summary log when running the diagnostic programs.
The error code format is as follows:
fff-ttt-iii-date-cc-text message
where:
fff
is the three-digit function code that indicates the function being tested when the error occurred. For example, function code 089 is for the microprocessor.
ttt
is the three-digit failure code that indicates the exact test failure that was encountered.
iii
is the three-digit device ID.
date
is the date that the diagnostic test was run and the error recorded.
cc
is the check digit that is used to verify the validity of the information.
text message
is the diagnostic message that indicates the reason for the problem.
Text messages
The diagnostic text message format is as follows:
Function Name: Result (test specific string)
where:
Function Name
is the name of the function being tested when the error occurred. This corresponds to the function code (fff) given in the previous list.
Result
can be one of the following:
Passed
This result occurs when the diagnostic test completes without any errors.
Failed
This result occurs when the diagnostic test discovers an error.
User Aborted
This result occurs when you stop the diagnostic test before it is complete.
Not Applicable
This result occurs when you specify a diagnostic test for a device that is not present.
Aborted
This result occurs when the test could not proceed because of the system configuration.
Warning
This result occurs when a possible problem is reported during the diagnostic test, such as when a device that is to be tested is not installed.
Test Specific String
This is additional information that you can use to analyze the problem.
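The two formats described above (fff-ttt-iii-date-cc-text message, and Function Name: Result (test specific string)) can be split into fields when you copy entries out of the test log. The following is a minimal sketch under stated assumptions: the date field is kept as opaque text and is assumed to contain no hyphen, because its layout is not specified here, and the class name, function name, and sample values in the usage note are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class DiagnosticError(NamedTuple):
    function_code: str      # fff
    failure_code: str       # ttt
    device_id: str          # iii
    date: str               # kept as text; layout is not specified in the manual
    check_digit: str        # cc
    function_name: str
    result: str
    detail: Optional[str]   # test specific string, when present


# Result values listed in the manual.
RESULTS = ("Passed", "Failed", "User Aborted", "Not Applicable", "Aborted", "Warning")

_TEXT = re.compile(
    r"^(?P<name>[^:]+):\s*(?P<result>" + "|".join(RESULTS) + r")"
    r"(?:\s*\((?P<detail>.*)\))?\s*$"
)


def parse_error(line: str) -> DiagnosticError:
    """Split one diagnostic error line into its documented fields."""
    # Split on the first five hyphens only, so hyphens in the text message survive.
    parts = line.strip().split("-", 5)
    if len(parts) != 6:
        raise ValueError("expected fff-ttt-iii-date-cc-text message")
    fff, ttt, iii, date, cc, text = parts
    for label, value in (("fff", fff), ("ttt", ttt), ("iii", iii)):
        if not (len(value) == 3 and value.isdigit()):
            raise ValueError(f"{label} must be three digits, got {value!r}")
    match = _TEXT.match(text)
    if match is None:
        raise ValueError("text message is not 'Function Name: Result (string)'")
    return DiagnosticError(
        fff, ttt, iii, date, cc,
        match["name"].strip(), match["result"], match["detail"],
    )
```

For example, the made-up line `089-801-000-06152001-42-Microprocessor: Failed (sample detail)` would yield function code 089, result Failed, and the test specific string `sample detail`. Only function code 089 comes from this manual; every other value in that line is invented.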
Starting the diagnostic programs
You can press F1 while running the diagnostic programs to obtain Help information. You also can press F1 from within a help screen to obtain online documentation from which you can select different categories. To exit Help and return to where you left off, press Esc.
To start the diagnostic programs:
1. Turn on the server and watch the screen.
Note: To run the diagnostic programs, you must start the server with the highest level password that is set. That is, if an administrator password is set, you must enter the administrator password, not the power-on password, to run the diagnostic programs.
2. When the message F2 for Diagnostics appears, press F2.
3. Type in the appropriate password; then, press Enter.
4. Select either Extended or Basic from the top of the screen.
5. When the Diagnostic Programs screen appears, select the test you want to run from the list that appears; then, follow the instructions on the screen.
Notes:
a. If the server stops during testing and you cannot continue, restart the server and try running the diagnostic programs again.
b. The keyboard and mouse (pointing device) tests assume that a keyboard and mouse are attached to the server.
c. If you run the diagnostic programs with either no mouse or a USB mouse attached to the server, you will not be able to navigate between test categories using the Next Cat and Prev Cat buttons. All other functions provided by mouse-selectable buttons are also available using the function keys.
d. You can test the USB keyboard by using the regular keyboard test. Also, you can run the USB Interface test only if there are no USB devices attached.
e. You can view server configuration information (such as system configuration, memory contents, interrupt request (IRQ) use, direct memory access (DMA) use, device drivers, and so on) by selecting Hardware Info from the top of the screen.
When the tests have completed, you can view the Test Log by selecting Utility from the top of the screen.
If the hardware checks out OK but the problem persists during normal server operations, a software error might be the cause. If you suspect a software problem, refer to the information that comes with the software package.
Viewing the test log
The test log will not contain any information until after the diagnostic program has run.
Note: If you are already running the diagnostic programs, begin with step 3.
To view the test log:
1. Turn on the server and watch the screen. If the server is on, shut down the operating system and restart the server.
2. When the message F2 for Diagnostics appears, press F2. If a power-on password or administrator password is set, the server prompts you for it. Type in the appropriate password; then, press Enter.
3. When the Diagnostic Programs screen appears, select Utility from the top of the screen.
4. Select View Test Log from the list that appears; then, follow the instructions on the screen.
The system maintains the test-log data while the server is powered on. When you turn off the power to the server, the test log is cleared.
Diagnostic error message tables
For descriptions of the error messages that might appear when you run the diagnostic programs, see “Diagnostic error codes” on page 87.
Attention: If diagnostic error messages appear that are not listed in the tables, make sure that the server has the latest levels of BIOS, Integrated System Management Processor, ServeRAID, and diagnostics microcode installed.
Recovering BIOS code
If the BIOS code has become damaged, such as from a power failure during a flash update, you can recover the BIOS using the recovery boot block and a BIOS flash diskette.
Note: You can obtain a BIOS flash diskette from one of the following sources:
• Use the ServerGuide program to make a BIOS flash diskette.
• Download a BIOS flash diskette from the World Wide Web. Go to http://www.ibm.com/pc/support/ and make the appropriate selections.
The flash memory of the server contains a protected area that cannot be overwritten. The recovery boot block is a section of code in this protected area that enables the server to start up and to read a flash diskette. The flash utility recovers the system BIOS from the BIOS recovery files on the diskette.
To recover the BIOS:
1. Turn off the server and peripheral devices and disconnect all external cables and power cords; then, remove the cover.
2. Locate the boot-block jumper block (J16) on the system board.
3. Place a jumper on pins 2 and 3 to enable the BIOS backup page.
4. Insert the BIOS flash diskette into the diskette drive.
5. Restart the server.
6. The system completes the power-on self-test (POST). Select 1 – Update POST/BIOS from the menu that contains various flash (update) options.
7. When you are asked if you would like to move the current POST/BIOS image to the backup ROM location, type N.
Attention: Typing Y will copy the corrupted BIOS into the secondary page.
8. When you are asked if you would like to save the current code to a diskette, select N.
9. You will be asked to choose which language you wish to use. Select your language (0-7) and press Enter to accept your choice. You will be prompted to remove the diskette and press Enter to restart the system. Remove the flash diskette from the diskette drive.
10. Turn off the server.
11. Remove the jumper on the boot-block jumper block or move it to pins 1 and 2 to return to normal startup mode.
12. Restart the server. The system should start up normally.
Troubleshooting the Ethernet controller
This section provides troubleshooting information for problems that might occur with the 10/100 Mbps Ethernet controller.
Network connection problems
If the Ethernet controller cannot connect to the network, check the following:
• Make sure that the cable is installed correctly.
  The network cable must be securely attached at all connections. If the cable is attached but the problem persists, try a different cable.
  If you set the Ethernet controller to operate at 100 Mbps, you must use Category 5 cabling.
  If you directly connect two workstations (without a hub), or if you are not using a hub with X ports, use a crossover cable.
  Note: To determine whether a hub has an X port, check the port label. If the label contains an X, the hub has an X port.
• Determine if the hub supports auto-negotiation. If not, try configuring the integrated Ethernet controller manually to match the speed and duplex mode of the hub.
• Check the Ethernet controller lights on the operator information panel. These lights indicate whether a problem exists with the connector, cable, or hub.
  – The Ethernet Link Status light illuminates when the Ethernet controller receives a LINK pulse from the hub. If the light is off, there might be a bad connector or cable, or a problem with the hub.
  – The Ethernet Transmit/Receive Activity light illuminates when the Ethernet controller sends or receives data over the Ethernet network. If the Ethernet Transmit/Receive Activity light is off, make sure that the hub and network are operating and that the correct device drivers are loaded.
  – The Ethernet Speed 100 Mbps light illuminates when the Ethernet controller LAN speed is 100 Mbps.
• Make sure that you are using the correct device drivers, supplied with the server.
• Check for operating system-specific causes for the problem.
• Make sure that the device drivers on the client and server are using the same protocol.
• Test the Ethernet controller.
The way you test the Ethernet controller depends on which operating system you are using (see the Ethernet controller device driver README file).
The Ethernet wrap test can be used to determine if a hardware problem is causing the Ethernet connection to fail. To perform the Ethernet wrap test, use the wrap plug (FRU # 60G3981) in conjunction with the diagnostic tests.
If the testing methods above indicate that the hardware is functioning normally but the problem still exists, inform the network administrator.
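The cable and light checks in this section reduce to a few simple rules, which can be written down as a quick reference. The sketch below only restates those rules; the function names and the wording of the hints are assumptions for illustration, and the light states must be read by eye from the operator information panel.

```python
def needs_crossover_cable(uses_hub: bool, hub_has_x_port: bool) -> bool:
    """Apply the cabling rule from this section.

    A crossover cable is called for when two workstations are connected
    directly (no hub) or when the hub in use has no X port.
    """
    return (not uses_hub) or (not hub_has_x_port)


def interpret_ethernet_lights(link: bool, activity: bool) -> list:
    """Turn observed light states into the checks this section suggests.

    `link` and `activity` are True when the Ethernet Link Status light and
    the Ethernet Transmit/Receive Activity light are lit.
    """
    hints = []
    if not link:
        hints.append("Link Status off: check the connector, the cable, and the hub.")
    if not activity:
        hints.append(
            "Activity off: make sure the hub and network are operating "
            "and the correct device drivers are loaded."
        )
    return hints
```

An empty list from `interpret_ethernet_lights` means that neither light points to a connector, cable, hub, or driver problem; the remaining checks in this section still apply.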
Ethernet controller troubleshooting chart
You can use the following troubleshooting chart to find solutions to 10/100 Mbps Ethernet controller problems that have definite symptoms.
Table 1. Ethernet troubleshooting chart

Ethernet controller problem: The server stops running when loading device drivers.
Suggested action: The PCI BIOS interrupt settings are incorrect. Check the following:
• Determine if the interrupt (IRQ) setting assigned to the Ethernet controller is also assigned to another device in the Configuration/Setup Utility program. Although interrupt sharing is allowed for PCI devices, some devices do not function well when they share an interrupt with a dissimilar PCI device. Try changing the IRQ assigned to the Ethernet controller or the other device. For example, for NetWare Versions 3 and 4 it is recommended that disk controllers not share interrupts with LAN controllers.
• Make sure that you are using the most recent device driver available from the World Wide Web.
• Run the network diagnostic program.
If the problem remains, go to “Starting the diagnostic programs” on page 15 to run the diagnostic programs.

Ethernet controller problem: Ethernet Link Status light does not light.
Suggested action: Check the following:
• Make sure that the hub is turned on.
• Check all connections at the Ethernet controller and the hub.
• Check the cable. A crossover cable is required unless the hub has an X designation.
• Use another port on the hub.
• If the hub does not support auto-negotiation, manually configure the Ethernet controller to match the hub.
• If you manually configured the duplex mode, make sure that you also manually configure the speed.
• Run diagnostics on the LEDs.
If the problem remains, go to “Starting the diagnostic programs” on page 15 to run the diagnostic programs.

Ethernet controller problem: The Ethernet Transmit/Receive Activity light does not light.
Suggested action: Check the following:
Note: The Ethernet Transmit/Receive Activity LED illuminates only when data is sent to or by this Ethernet controller.
• Make sure that you have loaded the network device drivers.
• The network might be idle. Try sending data from this workstation.
• Run diagnostics on the LEDs.
• The function of this LED can be changed by device driver load parameters. If necessary, remove any LED parameter settings when you load the device drivers.

Ethernet controller problem: Data is incorrect or sporadic.
Suggested action: Check the following:
• Make sure that you are using Category 5 cabling when operating the server at 100 Mbps.
• Make sure that the cables do not run close to noise-inducing sources like fluorescent lights.

Ethernet controller problem: The Ethernet controller stopped working when another adapter was added to the server.
Suggested action: Check the following:
• Make sure that the cable is connected to the Ethernet controller.
• Make sure that the PCI system BIOS is current.
• Reseat the adapter.
• Determine if the interrupt (IRQ) setting assigned to the Ethernet adapter is also assigned to another device in the Configuration/Setup Utility program. Although interrupt sharing is allowed for PCI devices, some devices do not function well when they share an interrupt with a dissimilar PCI device. Try changing the IRQ assigned to the Ethernet adapter or the other device.
If the problem remains, go to “Starting the diagnostic programs” on page 15 to run the diagnostic programs.

Ethernet controller problem: The Ethernet controller stopped working without apparent cause.
Suggested action: Check the following:
• Run diagnostics for the Ethernet controller.
• Try a different connector on the hub.
• Reinstall the device drivers. Refer to the operating-system documentation and to the ServerGuide information.
If the problem remains, go to “Starting the diagnostic programs” on page 15 to run the diagnostic programs.
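Several entries in the chart above ask whether the IRQ assigned to the Ethernet controller is also assigned to another device. If you write down the assignments shown in the Configuration/Setup Utility program, a check like the following sketch finds the devices that share the interrupt. The function name, the dictionary layout, and the device names are hypothetical; nothing here reads settings from the server.

```python
def devices_sharing_irq(assignments, target="Ethernet controller"):
    """List the devices that use the same IRQ as `target`.

    `assignments` maps a device name to its IRQ number, as transcribed by
    hand from the Configuration/Setup Utility (assumed for this sketch).
    """
    if target not in assignments:
        raise KeyError(f"{target!r} is not in the recorded assignments")
    irq = assignments[target]
    return sorted(
        name
        for name, value in assignments.items()
        if value == irq and name != target
    )
```

A non-empty result does not prove a fault, because interrupt sharing is allowed for PCI devices; it only identifies candidates for the IRQ change that the chart suggests.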
Power checkout
Power problems can be difficult to troubleshoot. For instance, a short circuit can exist anywhere on any of the power distribution busses. Usually a short circuit will cause the power subsystem to shut down because of an overcurrent condition.
A general procedure for troubleshooting power problems is as follows:
1. Power off the system and disconnect the AC cord(s).
2. Check for loose cables in the power subsystem. Also check for short circuits, for instance, a loose screw causing a short circuit on a circuit board.
3. Remove adapters and disconnect the cables and power connectors to all internal and external devices until the system is at the minimum configuration required for power-on (see “Minimum operating requirements” on page 103).
4. Reconnect the AC cord and power on the system. If the system powers up successfully, reinstall adapters and devices one at a time until the problem is isolated. If the system does not power up from the minimum configuration, replace the FRUs of the minimum configuration one at a time until the problem is isolated.
To use this method it is important to know the minimum configuration required for a system to power up (see page 103). For specific problems, see “Power error messages” on page 100.
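The isolation step of the procedure above is a one-at-a-time search, which can be outlined as follows. This is a sketch of the reasoning only: `powers_up` stands for a manual observation (install the listed parts, power on, note the result), and the function and parameter names are assumptions made for illustration.

```python
def isolate_power_fault(components, powers_up):
    """Re-add components one at a time; return the first that prevents power-on.

    `components` lists the removed adapters and devices in the order they
    will be reinstalled. `powers_up(installed)` reports whether the system
    powers on with the minimum configuration plus `installed`.
    Returns None when the system powers up with everything reinstalled.
    """
    installed = []
    if not powers_up(list(installed)):
        # The fault is in the minimum configuration itself.
        raise RuntimeError(
            "System does not power up at minimum configuration; "
            "replace minimum-configuration FRUs one at a time."
        )
    for component in components:
        installed.append(component)
        if not powers_up(list(installed)):
            return component
    return None
```

The sketch assumes a single faulty part; if the problem is intermittent or involves more than one part, repeat the procedure after correcting the first part that is found.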
Replacing the battery
When replacing the battery, you must replace it with a lithium battery of the same type from the same manufacturer. To avoid possible danger, read and follow the safety statement below.
To order replacement batteries, call 1-800-772-2227 within the United States, and 1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada, call your IBM reseller or IBM marketing representative.
Note: After you replace the battery, you must reconfigure the system and reset the system date and time.
• Statement 2
CAUTION:
When replacing the lithium battery, use only IBM Part Number 33F8354 or an equivalent type battery recommended by the manufacturer. If the system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water.
• Heat to more than 100°C (212°F).
• Repair or disassemble.
Dispose of the battery as required by local ordinances or regulations.
Note: Before you begin, be sure to read “Before you begin” on page 40. Follow any special handling and installation instructions supplied with the replacement battery.
To replace the battery:
1. Review the information in “Before you begin” on page 40 and any special handling and installation instructions supplied with the replacement battery.
2. Turn off the server and peripheral devices and disconnect all external cables and power cords; then, remove the server cover.
3. Remove the battery:
   a. Use one finger to lift the battery clip over the battery.
   b. Use one finger to slightly slide the battery from its socket. The spring mechanism behind the battery will push the battery out toward you as you slide it from the socket.
   c. Use your thumb and index finger to pull the battery from under the battery clip.
   d. Ensure that the battery clip is touching the base of the battery socket by pressing gently on the clip.
4. Insert the new battery:
   a. Tilt the battery so that you can insert it into the socket, under the battery clip.
   b. As you slide it under the battery clip, press the battery down into the socket.