IBM 8682 User Manual

eServer

IBM xSeries 350 Type 8682
Hardware Maintenance Manual
Note: Before using this information and the product it supports, be sure to read the general information under “Notices” on page 160.
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.
This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.
This publication was developed for products and services offered in the United States of America. IBM may not offer the products, services, or features discussed in this document in other countries, and the information is subject to change without notice. Consult your local IBM representative for information on the products, services, and features available in your area.
Requests for technical information about IBM products should be made to your IBM reseller or IBM marketing representative.
© Copyright International Business Machines Corporation 2000. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

About this manual

This manual contains diagnostic information, a Symptom-to-FRU index, service information, error codes, error messages, and configuration information for the IBM eServer xSeries 350 Type 8682 server.
Important: This manual is intended for trained servicers who are familiar with IBM xSeries products. Before servicing an IBM product, be sure to review “Safety information” on page 127.

Important safety information

Be sure to read all caution and danger statements in this book before performing any of the instructions.
Leia todas as instruções de cuidado e perigo antes de executar qualquer operação.
Prenez connaissance de toutes les consignes de type Attention et Danger avant de procéder aux opérations décrites par les instructions.
Lesen Sie alle Sicherheitshinweise, bevor Sie eine Anweisung ausführen.
Accertarsi di leggere tutti gli avvisi di attenzione e di pericolo prima di effettuare qualsiasi operazione.
Lea atentamente todas las declaraciones de precaución y peligro antes de llevar a cabo cualquier operación.
WARNING: Handling the cord on this product or cords associated with accessories sold with this product will expose you to lead, a chemical known to the State of California to cause cancer, and birth defects or other reproductive harm. Wash hands after handling.
ADVERTENCIA: El contacto con el cable de este producto o con cables de accesorios que se venden junto con este producto puede exponerle al plomo, un elemento químico que en el estado de California de los Estados Unidos está considerado como un causante de cáncer y de defectos congénitos, además de otros riesgos reproductivos. Lávese las manos después de usar el producto.

Online support

You can download the most current diagnostic, BIOS flash, and device driver files from http://www.ibm.com/pc/support.

Contents

About this manual .......................iii
Important safety information ....................iii
Online support .........................iii
General checkout ........................1
Checkout procedure .......................2
General information .......................3
Features and specifications.....................3
Server features .........................5
Reliability, availability, and serviceability ................6
Start the server .........................8
Controls and indicators ......................9
Information LED panel ......................9
Diagnostics ..........................11
Diagnostic tools overview .....................11
POST ............................11
POST beep codes ......................12
POST error messages .....................12
Event/error logs .......................12
Small computer system interface messages ..............12
ServerGuide error symptoms....................12
Diagnostic programs and error messages ...............13
Text messages ........................14
Starting the diagnostic programs .................14
Viewing the test log ......................15
Diagnostic error message tables .................15
Light path diagnostics ......................16
Power supply LEDs ......................16
Diagnostic panel LEDs ....................18
Light Path Diagnostics ....................19
Power checkout ........................20
Recovering BIOS ........................20
Replacing the battery ......................21
Temperature checkout ......................23
Diagnosing errors ........................23
Troubleshooting the Ethernet controller ...............23
Network connection problems .................23
Ethernet controller troubleshooting chart .............24
Ethernet controller messages ..................25
Novell NetWare or IntraNetWare server ODI driver messages......25
Network driver interface specification 2.01 (OS/2) driver messages....28
NDIS 4.0 (Windows NT) driver messages .............29
UNIX messages ......................30
Configuring the server .....................33
Using the Configuration/Setup Utility program .............33
Starting the Configuration/Setup Utility program ............33
Choices available from the Configuration/Setup main menu .......34
Using passwords .......................37
Power-on password .....................38
Administrator password....................38
Using the SCSISelect utility program .................39
Starting the SCSISelect utility program ...............39
Choices available from the SCSISelect menu.............40
Installing options .......................41
Expanded view of the xSeries 350..................41
Component locations......................42
System board component locations ...............42
System board LED locations ..................46
Diagnostic panel LEDs ....................47
Memory board component locations ...............48
Advanced System Management Interconnect board component locations 49
Before you begin ........................50
System reliability considerations .................50
Working inside the server with the power on .............50
Handling static-sensitive devices .................51
Removing the server top cover and bezel ...............53
Working with adapters ......................54
Installing a hot-plug adapter ...................55
Cabling example for the ServeRAID adapter .............56
Installing internal drives......................59
Internal drive bays ......................59
Hard disk drive component locations ...............60
Installing a hot-swap hard disk drive ................62
Installing memory-module kits ..................63
Installing a microprocessor kit ...................66
Changing jumper positions ...................69
Three-pin jumper blocks ...................69
Installing a hot-swap power supply .................70
Installing a 3-Pack Ultra160 Hot-Swap Expansion Kit ...........71
Replacing a hot-swap fan .....................73
Completing the installation.....................74
Installing the server top cover and bezel ..............74
Updating the server configuration .................75
Connecting external options ....................75
Input/output ports ........................76
Parallel port .........................76
Viewing or changing the parallel-port assignments ..........76
Parallel port connector ....................77
Video port..........................77
Keyboard port ........................78
Auxiliary-device (pointing device) port ...............79
Ultra160 SCSI ports ......................79
SCSI cabling requirements ..................80
Setting SCSI IDs ......................80
SCSI connector pin-number assignments .............80
Serial ports .........................81
Viewing or changing the serial-port assignments ..........82
Serial-port connectors ....................82
Universal Serial Bus ports ....................82
USB cables and hubs ....................82
USB-port connectors.....................83
Ethernet port.........................83
Configuring the Ethernet controller................83
Failover for redundant Ethernet .................83
Ethernet port connector....................86
Advanced System Management ports ...............87
Cabling the server........................88
FRU information (service only) ..................89
LED switch card ........................89
Four-fan housing ........................90
Dual fan mount assembly with cable .................91
Service processor card assembly ..................92
PCI switch card.........................93
Power backplane assembly ....................94
System board assembly with backerplate ...............95
Symptom-to-FRU index .....................97
Beep symptoms ........................97
No beep symptoms.......................100
Diagnostic panel error LEDs ...................101
Diagnostic error codes .....................103
Error symptoms ........................108
Power supply LED errors.....................110
POST error codes .......................110
ServeRAID ..........................116
I2C bus fault messages .....................117
SCSI error codes .......................118
Temperature error messages ...................118
Fan error messages ......................119
Power error messages......................119
System shutdown .......................120
Voltage related system shutdown.................120
Temperature related system shutdown...............121
DASD checkout ........................121
CPU checkout.........................121
Undetermined problems .....................122
Parts listing, Type 8682.....................123
System ...........................123
Keyboards ..........................125
Power cords .........................125
Related service information ...................127
Safety information .......................127
General safety .......................127
Electrical safety .......................128
Safety inspection guide ....................129
Handling electrostatic discharge-sensitive devices ..........130
Grounding requirements ....................130
Safety notices (multi-lingual translations) ..............131
Problem determination tips ....................160
Notices ...........................160
Trademarks..........................161

General checkout

The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server: the system board, Ethernet controller, video controller, RAM, keyboard, mouse (pointing device), diskette drive, serial ports, hard drives, and parallel port. You can also use them to test some external devices. See “Diagnostic programs and error messages” on page 13.
Also, if you cannot determine whether a problem is caused by the hardware or by the software, you can run the diagnostic programs to confirm that the hardware is working properly.
When you run the diagnostic programs, a single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After the cause of the first error message is corrected, the other error messages might not occur the next time you run the test.
A failed system might be part of a shared DASD cluster (two or more systems sharing the same external storage device(s)). Prior to running diagnostics, verify that the failing system is not part of a shared DASD cluster.
A system might be part of a cluster if:
• The customer identifies the system as part of a cluster.
• One or more external storage units are attached to the system and at least one of the attached storage units is additionally attached to another system or unidentifiable source.
• One or more systems are located near the failing system.
If the failing system is suspected to be part of a shared DASD cluster, you can run all diagnostic tests except those that test the storage unit (DASD residing in the storage unit) or the storage adapter attached to the storage unit.
Notes:
1. For systems that are part of a shared DASD cluster, run one test at a time in looped mode. Do not run all tests in looped mode, as this could enable the DASD diagnostic tests.
2. If multiple error codes are displayed, diagnose the first error code displayed.
3. If the computer hangs with a POST error, go to the “Symptom-to-FRU index” on page 97.
4. If the computer hangs and no error is displayed, go to “Undetermined problems” on page 122.
5. For power supply problems, see “Symptom-to-FRU index” on page 97.
6. For safety information, see “Safety information” on page 127.
7. For intermittent problems, check the error log; see “POST error messages” on page 12.
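The cluster criteria and the test restriction described in this section can be sketched in a few lines of Python. This is only an illustration for planning a service call: the class, field, and function names and the test names in the usage example are hypothetical and are not part of the server diagnostic programs.

```python
from dataclasses import dataclass


@dataclass
class SystemObservations:
    """What the servicer observed on site (hypothetical field names)."""
    customer_says_clustered: bool = False
    # An attached storage unit is also attached to another system
    # or to an unidentifiable source.
    shares_external_storage: bool = False
    other_systems_nearby: bool = False


def might_be_clustered(obs):
    """Any single criterion is enough to suspect a shared DASD cluster."""
    return (obs.customer_says_clustered
            or obs.shares_external_storage
            or obs.other_systems_nearby)


def allowed_tests(all_tests, storage_tests, obs):
    """Drop storage-unit and storage-adapter tests for a suspected cluster."""
    if might_be_clustered(obs):
        return [t for t in all_tests if t not in storage_tests]
    return list(all_tests)
```

For example, `allowed_tests(["memory", "scsi_disk"], {"scsi_disk"}, SystemObservations(other_systems_nearby=True))` leaves only the memory test, which would then be run by itself in looped mode.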

Checkout procedure

1. IS THE SYSTEM PART OF A CLUSTER?
YES. Schedule maintenance with the customer. Shut down all systems related to the cluster. Run storage test.
NO. Go to step 2.
2. IF THE SYSTEM IS NOT PART OF A CLUSTER:
• Power-off the computer and all external devices.
• Check all cables and power cords.
• Set all display controls to the middle position.
• Power-on all external devices.
• Power-on the computer.
• Record any POST error messages displayed on the screen. If an error is displayed, look up the first error in the “POST error codes” on page 110.
• Check the information LED panel System Error LED; if on, see “Diagnostic panel error LEDs” on page 101.
• Check the System Error Log. If an error was recorded by the system, see “Symptom-to-FRU index” on page 97.
• Start the Diagnostic Programs. See “Diagnostic programs and error messages” on page 13.
• Check for the following responses:
a. One beep.
b. Readable instructions or the Main Menu.
3. DID YOU RECEIVE BOTH OF THE CORRECT RESPONSES?
NO. Find the failure symptom in “Symptom-to-FRU index” on page 97.
YES. Run the Diagnostic Programs. If necessary, refer to “Diagnostic programs and error messages” on page 13.
If you receive an error, go to “Symptom-to-FRU index” on page 97.
If the diagnostics completed successfully and you still suspect a problem, see “Undetermined problems” on page 122.
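The decision points of the checkout procedure can be summarized as a small Python function. This is a sketch for illustration only; the function name, its parameters, and the returned strings are hypothetical and do not correspond to any IBM tool.

```python
def checkout_next_action(in_cluster, heard_one_beep, saw_menu_or_instructions):
    """Return the next action suggested by the checkout procedure."""
    # Step 1: clustered systems need scheduled maintenance first.
    if in_cluster:
        return ("Schedule maintenance, shut down all systems related to "
                "the cluster, and run the storage test")
    # Step 3: both correct responses are required to continue.
    if heard_one_beep and saw_menu_or_instructions:
        return "Run the Diagnostic Programs"
    return "Find the failure symptom in the Symptom-to-FRU index"
```

Note that a single beep without readable instructions or the Main Menu still counts as a failed response.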

General information

The IBM xSeries 350 server is a high-performance server that can be upgraded to a symmetric multiprocessing (SMP) server by adding microprocessors. It is ideally suited for networking environments that require superior microprocessor performance, efficient memory management, flexibility, and large amounts of reliable data storage.
Performance, ease of use, reliability, and expansion capabilities were key considerations during the design of the server. These design features make it possible for you to customize the system hardware to meet your needs today, while providing flexible expansion capabilities for the future.
The xSeries 350 server comes with a three-year limited warranty and 90-Day IBM Start Up Support. If you have access to the World Wide Web, you can obtain up-to-date information about the server model and other IBM server products at the following World Wide Web address: http://www.ibm.com/eserver/xseries/

Features and specifications

The following provides a summary of the features and specifications for the xSeries 350 server.
• Microprocessor:
– Intel Pentium III Xeon
– 32 KB of level-1 cache
– 1 MB or 2 MB level-2 cache, depending on model
– 100 MHz front-side bus (FSB)
– Supports up to four microprocessors
• Memory:
– Maximum: 16 GB
– Type: ECC, SDRAM, Registered DIMMs
– 16 slots, 4-way interleaved
• Drives standard:
– Diskette: 1.44 MB
– CD-ROM: 40X IDE
• Expansion bays:
– Hot-swap drives: Three standard slim-high, three optional slim-high
• Active PCI expansion slots:
– One 33 MHz/32-bit
– Three 66 MHz/64-bit
– Two 33 MHz/64-bit
• Hot-swap power supplies:
– 270 W (115-230 V ac)
– Minimum: One
– Maximum: Three
• Redundant cooling:
– Six hot-swap fans
• Video:
– S3 video controller
– Compatible with SVGA and VGA
– 8 MB video memory
• Size (4 U):
– Height: 178 mm (7 in.) (4 U)
– Depth: 711.2 mm (28 in.)
– Width: 482.6 mm (19 in.)
– Weight: 34.9 kg (77 lb.) to 50.4 kg (111 lb.), depending on configuration
• Integrated functions:
– Advanced System Management processor with Light Path Diagnostics
– Dual channel Ultra160 SCSI controller (one internal and one external channel) (non-RAID)
– One 10BASE-T/100BASE-TX AMD Ethernet controller
– Two serial ports
– One parallel port
– Two universal serial bus ports
– Keyboard port
– Mouse port
– Video port
• Acoustical noise emissions:
– Sound power, idling: 6.3 bel maximum
– Sound power, operating: 6.3 bel maximum
– Sound pressure, operating: 48 dBA maximum
• Environment:
– Air temperature:
- Server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (3000 ft.)
- Server on: 10° to 32°C (50° to 89.6°F). Altitude: 914 m (3000 ft.) to 2133 m (7000 ft.)
- Server off: 10° to 43°C (50° to 110°F). Maximum altitude: 2133 m (7000 ft.)
– Humidity:
- Server on: 8% to 80%
- Server off: 8% to 80%
• Heat output:
Approximate heat output in British Thermal Units (BTU) per hour:
– Minimum configuration: 461 BTU (0.14 kilowatts)
– Maximum configuration: 1796 BTU (0.53 kilowatts)
• Electrical input:
– Sine-wave input (50-60 Hz) required
– Input voltage low range:
- Minimum: 90 V ac
- Maximum: 137 V ac
– Input voltage high range:
- Minimum: 180 V ac
- Maximum: 265 V ac
– Input kilovolt-amperes (kVA) approximately:
- Minimum: 0.08 kVA
- Maximum: 0.52 kVA
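The heat-output figures in the specifications can be cross-checked by converting BTU per hour to kilowatts. The short Python sketch below uses the standard conversion factor (1 BTU per hour is approximately 0.29307107 watts); the function and constant names are illustrative only.

```python
# Standard conversion: 1 BTU per hour is about 0.29307107 watts.
WATTS_PER_BTU_PER_HOUR = 0.29307107


def btu_per_hour_to_kw(btu_per_hour):
    """Convert a heat output in BTU per hour to kilowatts."""
    return btu_per_hour * WATTS_PER_BTU_PER_HOUR / 1000.0


# Minimum and maximum configurations from the specifications list.
for label, btu in (("minimum", 461), ("maximum", 1796)):
    print(f"{label}: {btu} BTU/hr is about {btu_per_hour_to_kw(btu):.2f} kW")
```

Rounded to two decimal places, the results agree with the listed values of 0.14 and 0.53 kilowatts.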

Server features

The unique design of the server takes advantage of advancements in symmetric multiprocessing (SMP), data storage, and memory management. The server combines:
• Impressive performance using an innovative approach to SMP
The server supports up to four Pentium III Xeon processors. The server comes with at least one processor installed; you can install additional processors to enhance performance and provide SMP capability.
• Large data-storage and hot-swap capabilities
All models of the server support up to three standard and three optional 26 mm (1-inch) slim-high 3.5-inch hot-swap hard disk drives in the hot-swap bays. This hot-swap feature enables you to remove and replace hard disk drives without turning off the server.
• Active PCI (hot-plug) adapter capabilities
The server has six hot-plug slots for PCI adapters. With operating system support, you can replace failing hot-plug PCI adapters without turning off the server. If the hot-add feature is supported by the operating system and the PCI adapter, you can also add PCI adapters in these slots without turning off the server.
• Redundant cooling and power capabilities
The redundant cooling and hot-swap capabilities of the fans in the server enable continued operation if one of the fans fails. You can also replace a failing fan without turning off the server.
The server comes standard with one 270-watt power supply. Install three 270-watt power supplies to ensure redundancy and hot-swap capability for a typical configuration. (See “Installing a hot-swap power supply” on page 70 for instructions.)
• 100 MHz front-side bus (FSB)
The FSB is the processor external bus. This bus is the interface between the processors and the system board. The FSB is also known as the processor/host bus.
• Large system memory
The memory bus in the server supports up to 16 GB of system memory. The memory controller provides error correcting code (ECC) support for up to 16 industry-standard, 3.3 V, 168-pin, 8-byte, PCI, PC100 registered, dual inline memory modules (DIMMs). The memory controller also provides Chipkill memory protection. Chipkill memory protection is a technology that protects the system from a single chip failure on a DIMM.
• System-management capabilities
The server comes with an Advanced System Management Processor on the system board. This processor enables you to manage the functions of the server locally and remotely. The Advanced System Management Processor also provides system monitoring, event recording, and dial-out alert capability.
Note: The Advanced System Management Processor is sometimes referred to as the service processor.
• Integrated network environment support
The server comes with an Ethernet controller on the system board. This Ethernet controller has an interface for connecting to 10-Mbps or 100-Mbps networks. The server automatically selects between 10BASE-T and 100BASE-TX. The controller provides full-duplex (FDX) capability, which enables simultaneous transmission and reception of data on the Ethernet local area network (LAN).
• Redundant network-interface card (NIC)
The addition of an optional, redundant network-interface card (NIC) provides a failover capability to a redundant Ethernet connection. If a problem occurs with the primary Ethernet connection, all Ethernet traffic associated with this primary connection is automatically switched to the redundant NIC. This switching occurs without data loss and without user intervention.
• IBM ServerGuide CDs
The ServerGuide CDs included with xSeries servers provide programs to help you set up the server and install the network operating system (NOS). The ServerGuide program detects the hardware options installed, and provides the correct configuration program and device drivers. In addition, the ServerGuide CDs include a variety of application programs, such as IBM Update Connector, to help keep the server basic input/output system (BIOS) and microcode updated.
Note: The latest level of BIOS for the server is also available through the World Wide Web. Refer to “Recovering BIOS” on page 20 for the appropriate World Wide Web addresses and bulletin-board telephone numbers.
The server is designed to be cost-effective, powerful, and flexible. It uses peripheral component interconnect (PCI) bus architecture to provide compatibility with a wide range of existing hardware devices and software applications.
As always, the IBM server meets stringent worldwide certifications for power, electromagnetic compatibility (EMC), and safety. See “Related service information” on page 127 for additional information.

Reliability, availability, and serviceability

Three of the most important features in server design are reliability, availability, and serviceability (RAS). These factors help to ensure the integrity of the data stored on the server; that the server is available when you want to use it; and that should a failure occur, you can easily diagnose and repair the failure with minimal inconvenience.
The following is an abbreviated list of the RAS features that the server supports.
• Cooling fans with speed-sensing capability (hot-swap)
• Error correcting code (ECC) FSBs
• ECC L2 cache
• ECC memory
• Fast power-on self-test (POST)
• 45°C (113°F) normal operating temperature for hard disk drives
• Parity checking on the small computer system interface (SCSI) bus and PCI buses
• Power Managed - Advanced Configuration and Power Interface (ACPI) level
• System management monitoring via Inter-Integrated Circuit (I2C) bus
• Ambient temperature monitoring
• Automatic error retry/recovery
• Automatic restart after a power failure
• Built-in temperature/fan/voltages monitoring
• Chipkill memory protection
• Fault-resistant startup
• Hot-swap drive bays
• Hot-swap hard disk drives
• Active PCI (hot-plug) adapter slots
• Information and diagnostic LED panels
• Menu-driven setup, system configuration, SCSISelect configuration, and diagnostic programs
• Memory scrubbing and Predictive Failure Analysis® (PFA) (background and real time)
• Microcode and diagnostic levels available
• NIC failover support
• Power and temperature monitoring
• Power-supply redundancy monitoring
• Predictive Failure Analysis (PFA) alerts
• Redundant Ethernet capabilities (with optional adapter)
• Redundant hot-swap cooling
• Redundant and hot-swap power supplies
• Remote Connect
• Remote system problem-determination support
• System auto-configuring from a configuration menu
• Upgradable POST, BIOS, diagnostics, and Advanced System Management Processor microcode
• Wake on LAN® capability
• Windows NT® failover support
• Alert on LAN™ capability
• Backup BIOS switching by jumper
• Error codes and messages
• Integrated service processor subsystem provides control for remote system management
• Processor serial number access
• Standard cables present detection
• System error logging (POST and Advanced System Management Processor)
• Vital Product Data (VPD) on microprocessors, system board, power supplies, hot-swap-drive backplane, and power backplane

Start the server

Use the following procedure to start the server.
1. Turn on all external devices, such as the monitor.
Note: After you plug the power cord into an outlet, wait 20 seconds before pressing the power control button. During this time, the system-management processor is initializing and the power control button does not respond.
2. Press the power control button on the front of the server. The power-on light comes on and the power-on self-test (POST) begins.
• If the server is turned on and a power failure occurs, the server will start automatically when power is restored.
• The server can also be turned on by the Advanced System Management Processor.
When you turn off the server, observe the following precaution:
Statement 5
CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
The server can be turned off as follows:
• You can turn off the server by pressing the power-control button on the front of the server.
Note: After turning off the server, wait at least five seconds before pressing the power-control button to turn on the server again.
• You can disconnect the server power cords from the electrical outlets to shut off all power to the server.
Note: Wait about 15 seconds after disconnecting the power cords for the system to stop running. Watch for the system-power light on the information LED panel to stop blinking.
The following section describes the controls and indicators on the server.

Controls and indicators

1 Power-control button: Press this button to manually turn on or off the server.
2 Reset button: Press this button to reset the server and run the power-on self-test (POST).
3 Hard-disk drive activity light: Each hot-swap drive has a hard-disk drive activity light. When this green light is flashing, the drive is being accessed.
4 Hard-disk drive status light: Each hot-swap drive has a hard-disk drive status light. With a ServeRAID installation, if this amber light is on continuously, it means that the drive has failed.

Information LED panel

The information panel on the front of the server contains status lights. The following illustration shows the server information panel.
(Illustration: server information panel. Labels: POWER, RESET, SCSI ACT, LINK OK, TX/RX, 100 MB, INFO, SYS ERROR.)
1 System power: When this green light is on, system power is present in the server. When this light flashes, the server is in standby mode (the system power supply is turned off and ac current is present). When this light is off, either a power supply, AC power, or a light has failed. The power light is located above and between the power-control button and the reset button.
Attention: If this light is off, it does not mean there is no electrical current present in the server. The light might be burned out. To remove all electrical current from the server, you must unplug the server power cords from the electrical outlets.
2 Hard disk drive activity light: This green light is on when there is activity on a hard disk drive.
3 Ethernet-link status light: When this green light is on, there is an active connection on the Ethernet port. The Ethernet transmit/receive activity light is also located on the Ethernet (RJ-45) connector on the rear of the server.
4 Information light: When this amber light is on, the server power supplies are nonredundant or some other noncritical event has occurred. Check the diagnostic LED panel for more information (see “Diagnostic panel LEDs” on page 18).
5 System error light: This amber light is on when a system error occurs. A light on the diagnostics LED panel will also be on to further isolate the error. (For more information, see “Diagnostic panel LEDs” on page 18.)
6 Ethernet transmit/receive activity light: When this green light is on, there is activity between the server and the network. The Ethernet transmit/receive activity light is also located on the Ethernet (RJ-45) connector on the rear of the server.
7 Ethernet speed 100 Mbps: When this green light is on, the Ethernet speed is 100 Mbps. When the light is off, the Ethernet speed is 10 Mbps.
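The meaning of the system power light and the Ethernet speed light can be expressed as a small lookup, sketched below in Python. The names `LedState`, `describe_system_power`, and `ethernet_speed_mbps` are hypothetical and exist only for this illustration; the server does not expose such an interface.

```python
from enum import Enum


class LedState(Enum):
    OFF = "off"
    ON = "on"
    FLASHING = "flashing"


def describe_system_power(state):
    """Interpret the green system power light on the information panel."""
    if state is LedState.ON:
        return "System power is present"
    if state is LedState.FLASHING:
        return "Standby mode: power supply is off, ac current is present"
    # An unlit light does not prove that current is absent.
    return ("A power supply, AC power, or the light itself has failed; "
            "electrical current might still be present")


def ethernet_speed_mbps(speed_light_on):
    """The 100 Mbps light is on at 100 Mbps and off at 10 Mbps."""
    return 100 if speed_light_on else 10
```

The off case is worded deliberately: as the Attention notice states, only unplugging the power cords removes all electrical current.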

Diagnostics

This section provides basic troubleshooting information to help you resolve some common problems that might occur with the server.
If you cannot locate and correct the problem using the information in this section, refer to “Symptom-to-FRU index” on page 97 for more information.

Diagnostic tools overview

The following tools are available to help you identify and resolve hardware-related problems:
• POST beep codes, error messages, and error logs
The power-on self-test (POST) generates beep codes and messages to indicate successful test completion or the detection of a problem. See “POST” for more information.
• Diagnostic programs and error messages
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server. See “Diagnostic programs and error messages” on page 13 for more information.
• Light Path Diagnostics
The server has light-emitting diodes (LEDs) to help you identify problems with server components. These LEDs are part of the light-path diagnostics that are built into the server. By following the path of lights, you can quickly identify the type of system error that occurred. See “Light path diagnostics” on page 16 for more information.
• Error symptoms
These charts list problem symptoms, along with suggested steps to correct the problems. See “Diagnosing errors” on page 23 for more information.

POST

When you turn on the server, it performs a series of tests to check the operation of server components and some of the options installed in the server. This series of tests is called the power-on self-test or POST.
If POST finishes without detecting any problems, a single beep sounds and the first screen of the operating system or application program appears.
If POST detects a problem, more than one beep sounds and an error message appears on the screen. See “POST beep codes” on page 12 and “POST error messages” on page 12 for more information.
Notes:
1. If you have a power-on password or administrator password set, you must type the password and press Enter, when prompted, before POST will continue.
2. A single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After you correct the cause of the first error message, the other error messages usually will not occur the next time you run the test.

POST beep codes

POST generates beep codes to indicate successful completion or the detection of a problem.
v One beep indicates the successful completion of POST.
v More than one beep indicates that POST detected a problem. For more information, see “Beep symptoms” on page 97.

POST error messages

POST error messages occur during startup when POST finds a problem with the hardware or detects a change in the hardware configuration. For a list of POST errors, see “POST error codes” on page 110.

Event/error logs

The POST error log contains the three most recent error codes and messages that the system generated during POST. The System Event/Error Log contains all error messages issued during POST and all system status messages from the Advanced System Management Processor.
To view the contents of the error logs, start the Configuration/Setup Utility program (see “Starting the Configuration/Setup Utility program” on page 33); then, select Event/Error Logs from the main menu.
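The retention rule for the POST error log (only the three most recent entries are kept) can be pictured with a bounded queue. The following is a minimal sketch for illustration only; it is not the server firmware, and the names `post_error_log` and `record_post_error` and the sample entries are hypothetical.

```python
from collections import deque

# Hypothetical model of the POST error log: a queue that holds at most
# three entries, so recording a fourth entry discards the oldest one.
post_error_log = deque(maxlen=3)


def record_post_error(code, message):
    """Store one (code, message) pair; the oldest pair drops out when full."""
    post_error_log.append((code, message))


# Placeholder entries (not real POST error codes).
for number in range(1, 5):
    record_post_error(f"code-{number}", f"example message {number}")

for code, message in post_error_log:
    print(code, message)
```

After four errors are recorded, only the second, third, and fourth remain, which is why the System Event/Error Log is the place to look for older entries.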

Small computer system interface messages

If you receive a SCSI error message, see “SCSI error codes” on page 118.
Note: If the server does not have a hard disk drive, ignore any message that indicates that the BIOS is not installed.
You will get these messages only when running the SCSI Select Utility.

ServerGuide error symptoms

Look for symptoms in the following chart. Probable solutions appear under Action for each symptom.
Setup
Symptom: Setup and Installation CD won’t start.
Action:
v Be sure the system is a supported eServer with a startable (bootable) CD-ROM drive.
v If the startup (boot) sequence settings have been altered, be sure the CD-ROM is first in the boot sequence.
v If more than one CD-ROM drive is installed, be sure that only one drive is set as the primary drive. Start the CD from the primary drive.
Symptom: ServeRAID program cannot view all installed drives, or cannot install NOS.
Action:
v Be sure there are no duplicate SCSI IDs or IRQ assignments.
v Be sure that the hard disk drive is connected properly.
Symptom: The Operating System Installation program continuously loops.
Action: Free up more space on the hard disk.
Symptom: ServerGuide won’t start your NOS CD.
Action: Be sure the NOS CD you have is supported by ServerGuide. See the Setup and Installation CD label for a list of NOS versions supported.
Symptom: Can’t install NOS; option is grayed out.
Action: Either there is no logical drive defined (ServeRAID systems) or the ServerGuide system partition is not present. Run the setup and configuration program.
TechConnect CD
Symptom: Can’t start TechConnect CD.
Action: Be sure you’re starting the CD on a system with Microsoft Windows® installed.
Symptom: Can’t view publications from TechConnect CD, or text is unreadable.
Action: Be sure you have the Adobe reader installed (available from the TechConnect CD).
Diskette Factory CD
Symptom: Get “time out” or “Unknown host” errors.
Action: Be sure you have access to the Internet through FTP directly.

Diagnostic programs and error messages

The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server.
Diagnostic error messages indicate that a problem exists; they are not intended to be used to identify a failing part. Troubleshooting and servicing of complex problems that are indicated by error messages should be performed by trained service personnel.
Sometimes the first error to occur causes additional errors. In this case, the server displays more than one error message. Always follow the suggested action instructions for the first error message that appears.
The following sections contain the error codes that might appear in the detailed test log and summary log when running the diagnostic programs.
The error code format is as follows:
fff-ttt-iii-date-cc-text message
where:
fff
is the three-digit function code that indicates the function being tested when the error occurred. For example, function code 089 is for the microprocessor.
ttt
is the three-digit failure code that indicates the exact test failure that was encountered.
iii
is the three-digit device ID.
date
is the date that the diagnostic test was run and the error recorded.
cc
is the check digit that is used to verify the validity of the information.
text message
is the diagnostic message that indicates the reason for the problem.
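When error codes are copied from the test log into service notes, it can help to split them into their fields. The following is a minimal sketch, not part of the diagnostic programs. It assumes that the date and check-digit fields contain no hyphens (the manual does not specify their layout), and the `parse_error_code` function, the `DiagnosticError` type, and the sample line are hypothetical.

```python
from typing import NamedTuple


class DiagnosticError(NamedTuple):
    function_code: str  # fff
    failure_code: str   # ttt
    device_id: str      # iii
    date: str           # layout not specified in the manual; kept as text
    check_digit: str    # cc
    text_message: str


def parse_error_code(line: str) -> DiagnosticError:
    """Split 'fff-ttt-iii-date-cc-text message' into its six fields."""
    # Split at most five times so hyphens inside the text message survive.
    parts = line.strip().split("-", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 hyphen-separated fields, got {len(parts)}")
    for label, value in zip(("fff", "ttt", "iii"), parts[:3]):
        if len(value) != 3 or not value.isdigit():
            raise ValueError(f"{label} must be three digits, got {value!r}")
    return DiagnosticError(*parts)


# Made-up sample line; only function code 089 (microprocessor) is from the manual.
sample = "089-001-000-20000101-00-Example text - with hyphen"
error = parse_error_code(sample)
print(error.function_code, error.text_message)
```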

Text messages

The diagnostic text message format is as follows:
Function Name: Result (test specific string)
where:
Function Name
is the name of the function being tested when the error occurred. This corresponds to the function code (fff) given in the previous list.
Result
can be one of the following:
Passed
This result occurs when the diagnostic test completes without any errors.
Failed
This result occurs when the diagnostic test discovers an error.
User Aborted
This result occurs when you stop the diagnostic test before it is complete.
Not Applicable
This result occurs when you specify a diagnostic test for a device that is not present.
Aborted
This result occurs when the test could not proceed because of the system configuration.
Warning
This result occurs when a possible problem is reported during the diagnostic test, such as when a device that is to be tested is not installed.
Test Specific String
This is additional information that you can use to analyze the problem.
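The text message layout described above can be split with a regular expression. The following is a minimal sketch under the assumption that the message appears exactly as `Function Name: Result (test specific string)` with the test-specific string optional; the `parse_text_message` function and the sample strings are hypothetical.

```python
import re

# The six result values listed in this section.
RESULTS = ("Passed", "Failed", "User Aborted", "Not Applicable", "Aborted", "Warning")

_TEXT_MESSAGE = re.compile(
    r"^(?P<function>.+?):\s*"
    r"(?P<result>" + "|".join(re.escape(r) for r in RESULTS) + r")"
    r"\s*(?:\((?P<detail>.*)\))?\s*$"
)


def parse_text_message(message):
    """Return (function name, result, test-specific string or None)."""
    match = _TEXT_MESSAGE.match(message.strip())
    if match is None:
        raise ValueError(f"unrecognized text message: {message!r}")
    return match.group("function"), match.group("result"), match.group("detail")


# Made-up sample messages.
print(parse_text_message("Microprocessor: Failed (example detail)"))
print(parse_text_message("Memory: User Aborted"))
```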

Starting the diagnostic programs

You can press F1 while running the diagnostic programs to obtain Help information. You also can press F1 from within a help screen to obtain online documentation from which you can select different categories. To exit Help and return to where you left off, press Esc.
To start the diagnostic programs:
1. Turn on the server and watch the screen.
Note: To run the diagnostic programs, you must start the server with the highest level password that is set. That is, if an administrator password is set, you must enter the administrator password, not the power-on password, to run the diagnostic programs.
2. When the message F2 for Diagnostics appears, press F2.
3. Type in the appropriate password; then, press Enter.
4. Select either Extended or Basic from the top of the screen.
5. When the Diagnostic Programs screen appears, select the test you want to run from the list that appears; then, follow the instructions on the screen.
Notes:
a. If the server stops during testing and you cannot continue, restart the server and try running the diagnostic programs again.
b. The keyboard and mouse (pointing device) tests assume that a keyboard and mouse are attached to the server.
c. If you run the diagnostic programs with no mouse attached to the server, you will not be able to navigate between test categories using the Next Cat and Prev Cat buttons. All other functions provided by mouse-selectable buttons are also available using the function keys.
d. You can run the USB interface test and the USB external loopback test only if there are no USB devices attached.
e. You can view server configuration information (such as system configuration, memory contents, interrupt request (IRQ) use, direct memory access (DMA) use, device drivers, and so on) by selecting Hardware Info from the top of the screen.
When the tests have completed, you can view the Test Log by selecting Utility from the top of the screen.
If the hardware checks out OK but the problem persists during normal server operations, a software error might be the cause. If you suspect a software problem, refer to the information that comes with the software package.

Viewing the test log

The test log will not contain any information until after the diagnostic program has run.
Note: If you are already running the diagnostic programs, begin with step 3.
To view the test log:
1. Turn on the server and watch the screen. If the server is on, shut down the operating system and restart the server.
2. When the message F2 for Diagnostics appears, press F2. If a power-on password or administrator password is set, the server prompts you for it. Type in the appropriate password; then, press Enter.
3. When the Diagnostic Programs screen appears, select Utility from the top of the screen.
4. Select View Test Log from the list that appears; then, follow the instructions on the screen.
The system maintains the test-log data while the server is powered on. When you turn off the power to the server, the test log is cleared.

Diagnostic error message tables

For descriptions of the error messages that might appear when you run the diagnostic programs, see “Diagnostic error codes” on page 103. If diagnostic error messages appear that are not listed in those tables, make sure that the server has the latest levels of BIOS, Advanced System Management Processor, ServeRAID, and diagnostics microcode installed.

Light path diagnostics

The server has LEDs to help you identify problems with some server components. These LEDs are part of the light path diagnostics built into the server. By following the path you can quickly identify the type of system error that occurred.
Status LEDs are located on the following components:
v Information panel
v Hard disk drive trays
v Power supply
v Diagnostic panel
v System board

Power supply LEDs

The ac and dc power LEDs on the power supply provide status information about the power supply. See “Installing a hot-swap power supply” on page 70 for the location of these LEDs.
Illustration callouts:
1 Filler panel
2 AC power light
3 DC power light
4 Power supply handle
5 Power supply
The following table describes the ac and dc power LEDs.
AC power LED: On. DC power LED: On.
Description and action: The power supply is on and operating correctly.
AC power LED: On. DC power LED: Off.
Description and action: There is a dc power problem.
Possible causes:
1. The server is not turned on (the power LED is blinking on the front of the server). Action: Press the power-control button to start the server.
2. The power supply has failed. Action: Replace the power supply.
AC power LED: Off. DC power LED: Off.
Description and action: There is an ac power problem.
Possible causes:
1. There is no ac power to the power supply. Actions: Verify that:
v The electrical cord is properly connected to the server.
v The electrical outlet functions properly.
2. The power supply has failed. Action: Replace the power supply.
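The LED combinations in the table above amount to a small lookup. The following is a minimal sketch that transcribes the table for quick reference; the names `POWER_SUPPLY_LED_TABLE` and `diagnose_power_supply` are hypothetical, and the combination of AC off with DC on is rejected because the table does not list it.

```python
# Keys are (ac_led_on, dc_led_on).
# Values are (description, [(possible cause, action), ...]).
POWER_SUPPLY_LED_TABLE = {
    (True, True): ("The power supply is on and operating correctly.", []),
    (True, False): (
        "There is a dc power problem.",
        [
            ("The server is not turned on.",
             "Press the power-control button to start the server."),
            ("The power supply has failed.", "Replace the power supply."),
        ],
    ),
    (False, False): (
        "There is an ac power problem.",
        [
            ("There is no ac power to the power supply.",
             "Verify the electrical cord connection and the electrical outlet."),
            ("The power supply has failed.", "Replace the power supply."),
        ],
    ),
}


def diagnose_power_supply(ac_led_on, dc_led_on):
    """Look up the description and possible causes for one LED combination."""
    key = (bool(ac_led_on), bool(dc_led_on))
    if key not in POWER_SUPPLY_LED_TABLE:
        raise ValueError("LED combination is not listed in the table")
    return POWER_SUPPLY_LED_TABLE[key]


description, causes = diagnose_power_supply(ac_led_on=True, dc_led_on=False)
print(description)
for cause, action in causes:
    print(" -", cause, "Action:", action)
```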

Diagnostic panel LEDs

The following illustration shows the LEDs on the diagnostics panel inside the server. See Table 1 on page 19 for information on identifying problems using these LEDs.
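Illustration: diagnostics panel. Panel labels, in the order shown: DASD; NMI; SP Bus; Event Log; Non Red; Over Spec; Temp; PCI Bus; Power Supply; CPU; Memory; A B C D; 1 2 3 4; Fan.
Illustration callouts:
1 System board connector
2 Diskette drive connector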
The meanings of these LEDs are as follows:
CPU: Microprocessor fault
Memory: Memory fault
PCI Bus A (PCIA): PCI bus A fault
PCI Bus B (PCIB): PCI bus B fault
PCI Bus C (PCIC): PCI bus C fault
PCI Bus D (PCID): Not implemented at this time
Power supply 1: Power supply number 1 failure
Power supply 2: Power supply number 2 failure
Power supply 3: Power supply number 3 failure
FAN: Fan failure
DASD: Hard disk drive fault
NMI: Nonmaskable interrupt
SP Bus: Service processor failure
Event Log: Not implemented at this time
NON RED: Nonredundant power mode
OVER SPEC: Over specification
TEMP: System temperature failure
Notes:
1. The server does not support replaceable voltage regulator modules (VRMs).
2. The server supports a maximum of three PCI buses.
3. The server supports a maximum of three power supplies.

Light Path Diagnostics

You can use the light path diagnostics built into the server to quickly identify the type of system error that occurred. The server is designed so that LEDs remain illuminated when the server shuts down, as long as the power supplies are operating properly. This feature helps you to isolate the problem if an error causes the server to shut down.
If the system error LED (on the information LED panel) is not lit and no diagnostics panel LEDs are lit, it means that the light path diagnostics have not detected a system error.
If the system error LED (on the information LED panel) is lit, it means that a system error was detected. Check to see which of the LEDs on the diagnostics panel inside the server are lit and refer to the following table:
Table 1. Light Path Diagnostics
LED on: Cause
None:
1. The system error log is 75% or more full or a PFA alert was logged. (See “Diagnostic panel error LEDs” on page 101)
2. Bad, missing, or mis-installed processor terminator.
CPU: One of the microprocessors has failed. (See “Diagnostic panel error LEDs” on page 101)
Memory: A memory error occurred. (See “Diagnostic panel error LEDs” on page 101)
PCIA: An error occurred on PCI bus A. An adapter in PCI slot 1, or the system board, caused the error. (See “Diagnostic panel error LEDs” on page 101)
PCIB: An error occurred on PCI bus B. An adapter in PCI slot 2, 3, or 4, or the system board, caused the error. (See “Diagnostic panel error LEDs” on page 101)
PCIC: An error occurred on PCI bus C. An adapter in PCI slot 5 or 6, or the system board, caused the error. (See “Diagnostic panel error LEDs” on page 101)
PCID: Not implemented at this time.
PS1: The first power supply has failed. (See “Diagnostic panel error LEDs” on page 101)
PS2: The second power supply has failed. (See “Diagnostic panel error LEDs” on page 101)
PS3: The third power supply has failed. (See “Diagnostic panel error LEDs” on page 101)
Fan: One of the fan assemblies has failed or is operating too slowly. Note: A failing fan can also cause the TEMP and/or DASD LEDs to be on; see “Diagnostic panel error LEDs” on page 101.
DASD: A hot-swap hard disk drive has failed on SCSI channel B (see “Diagnostic panel error LEDs” on page 101).
NMI: A nonmaskable interrupt occurred. (The PCIA, PCIB, PCIC, or Memory LED will probably also be on; see “Diagnostic panel error LEDs” on page 101.) Note: The NMI LED can only be reset by completely removing power from the system.
SP: The service processor has failed. (See “Diagnostic panel error LEDs” on page 101)
Event Log: Not implemented at this time.
Non Red: System is operating in non-redundant power mode. (See “Diagnostic panel error LEDs” on page 101)
Over Spec: The server is drawing more power than the power supplies are rated for. (See “Diagnostic panel error LEDs” on page 101)
Temp: The system temperature has exceeded the maximum rating. (See “Diagnostic panel error LEDs” on page 101)
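The decision flow described before Table 1 (check the system error LED, then the diagnostics panel LEDs) can be sketched as a lookup. The following is an illustrative transcription only, not a tool supplied with the server; the names `LIGHT_PATH_CAUSES`, `NONE_ROW_CAUSES`, and `interpret_light_path` are hypothetical, and the cause text is abbreviated from the table.

```python
# Causes abbreviated from Table 1; keys are the LED labels used in the table.
LIGHT_PATH_CAUSES = {
    "CPU": "One of the microprocessors has failed.",
    "Memory": "A memory error occurred.",
    "PCIA": "Error on PCI bus A (adapter in PCI slot 1, or the system board).",
    "PCIB": "Error on PCI bus B (adapter in PCI slot 2, 3, or 4, or the system board).",
    "PCIC": "Error on PCI bus C (adapter in PCI slot 5 or 6, or the system board).",
    "PCID": "Not implemented at this time.",
    "PS1": "The first power supply has failed.",
    "PS2": "The second power supply has failed.",
    "PS3": "The third power supply has failed.",
    "Fan": "One of the fan assemblies has failed or is operating too slowly.",
    "DASD": "A hot-swap hard disk drive has failed on SCSI channel B.",
    "NMI": "A nonmaskable interrupt occurred.",
    "SP": "The service processor has failed.",
    "Event Log": "Not implemented at this time.",
    "Non Red": "System is operating in non-redundant power mode.",
    "Over Spec": "The server is drawing more power than the power supplies are rated for.",
    "Temp": "The system temperature has exceeded the maximum rating.",
}

# "None" row: the system error LED is lit but no diagnostics panel LED is lit.
NONE_ROW_CAUSES = [
    "The system error log is 75% or more full or a PFA alert was logged.",
    "Bad, missing, or mis-installed processor terminator.",
]


def interpret_light_path(system_error_led_lit, lit_panel_leds):
    """Return the Table 1 causes that match the observed LEDs."""
    lit = list(lit_panel_leds)
    unknown = [led for led in lit if led not in LIGHT_PATH_CAUSES]
    if unknown:
        raise ValueError(f"LED labels not in Table 1: {unknown}")
    if lit:
        return [f"{led}: {LIGHT_PATH_CAUSES[led]}" for led in lit]
    if system_error_led_lit:
        return list(NONE_ROW_CAUSES)
    return ["Light path diagnostics have not detected a system error."]


for line in interpret_light_path(True, ["Fan", "Temp"]):
    print(line)
```

In every case, the entry in “Diagnostic panel error LEDs” on page 101 remains the reference for the corrective action.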

Power checkout

Power problems can be difficult to troubleshoot. For instance, a short circuit can exist anywhere on any of the power distribution busses. Usually a short circuit will cause the power subsystem to shut down because of an overcurrent condition.
A general procedure for troubleshooting power problems is as follows:
1. Power off the system and disconnect the AC cord(s).
2. Check for loose cables in the power subsystem. Also check for short circuits, for example, a loose screw that is causing a short circuit on a circuit board.
3. Remove adapters and disconnect the cables and power connectors to all internal and external devices until the system is at the minimum configuration required for power-on (see “Minimum operating requirements” on page 122).
4. Reconnect the AC cord and power on the system. If the system powers up successfully, reinstall adapters and devices one at a time until the problem is isolated. If the system does not power up from the minimum configuration, replace the FRUs of the minimum configuration one at a time until the problem is isolated.
To use this method, it is important to know the minimum configuration required for a system to power up (see page 122). For specific problems, see “Power error messages” on page 119.
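The isolation logic in the procedure above is a one-at-a-time search. The following is a minimal sketch of that logic for planning purposes; it does not control any hardware. The `powers_on` callback is hypothetical and stands for the servicer's observation after physically configuring and powering on the server, and the part names in the example are made up.

```python
def isolate_power_fault(minimum_config, removed_parts, powers_on):
    """Reinstall removed parts one at a time until power-on fails.

    Returns a (status, part) tuple:
      ("minimum-configuration", None): the minimum configuration itself
          fails, so its FRUs must be replaced one at a time.
      ("part", name): power-on failed right after this part was reinstalled.
      ("not-reproduced", None): every part was reinstalled without a failure.
    """
    installed = list(minimum_config)
    if not powers_on(installed):
        return ("minimum-configuration", None)
    for part in removed_parts:
        installed.append(part)
        if not powers_on(installed):
            return ("part", part)
    return ("not-reproduced", None)


# Example with a simulated fault in one adapter (all names are made up).
def simulated_powers_on(installed):
    return "adapter B" not in installed


result = isolate_power_fault(
    minimum_config=["system board", "power supply"],
    removed_parts=["adapter A", "adapter B", "hard disk drive"],
    powers_on=simulated_powers_on,
)
print(result)
```

Adding parts back one at a time keeps each power-on attempt tied to a single change, which is what lets the failing part be identified.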

Recovering BIOS

If the BIOS code in the server has become corrupted, such as from a power failure during a flash update, you can recover the BIOS using the recovery boot block and a BIOS flash diskette.
Note: You can obtain a BIOS flash diskette from one of the following sources:
v Use the ServerGuide program to make a BIOS flash diskette.
v Download a BIOS flash diskette from the World Wide Web. Go to http://www.pc.ibm.com/support/, select IBM Server Support, and make the selections for the server.
The flash memory of the server consists of a primary page and a backup page. The J14 jumper controls which page is used to start the server. If the BIOS in the primary page is corrupted, you can use the backup page to start the server; then boot the BIOS Flash Diskette to restore the BIOS to the primary page.
To recover the BIOS:
1. Turn off the server and peripheral devices and disconnect all external cables and power cords; then, remove the cover.
2. Locate jumper J14 on the processor board (see “System board jumpers” on page 44).
3. Move J14 to pins 1 and 2 to enable the secondary boot block page.