IBM 8646, eServer xSeries 220 Type 8646 Hardware Maintenance Manual

Hardware Maintenance Manual
xSeries 220 Type 8646
Note:
Before using this information and the product it supports, be sure to read the general information under “Notices” on page 145.
First Edition (August, November 2001)
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.
This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.
This publication was developed for products and services offered in the United States of America. IBM may not offer the products, services, or features discussed in this document in other countries, and the information is subject to change without notice. Consult your local IBM representative for information on the products, services, and features available in your area.
Requests for technical information about IBM products should be made to your IBM reseller or IBM marketing representative.
© Copyright International Business Machines Corporation 2001. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
About this manual
This manual contains diagnostic information, a Symptom-to-FRU index, service information, error codes, error messages, and configuration information for the xSeries 220.
Important: This manual is intended for trained servicers who are familiar with IBM PC Server products. Before servicing an IBM product, be sure to review “Safety information” on page 111.
Important safety information
Be sure to read all caution and danger statements in this book before performing any of the instructions.
Leia todas as instruções de cuidado e perigo antes de executar qualquer operação.
Prenez connaissance de toutes les consignes de type Attention et
Danger avant de procéder aux opérations décrites par les instructions.
Lesen Sie alle Sicherheitshinweise, bevor Sie eine Anweisung ausführen.
Accertarsi di leggere tutti gli avvisi di attenzione e di pericolo prima di effettuare qualsiasi operazione.
Lea atentamente todas las declaraciones de precaución y peligro antes de llevar a cabo cualquier operación.
Online support
Use the World Wide Web (WWW) to download Diagnostic, BIOS Flash, and Device Driver files.
File download address is:
http://www.ibm.com/pc/support
iv Hardware Maintenance Manual: xSeries 220 Type 8646
Contents
About this manual ..........iii
Important safety information ........iii
Online support .............iv
General checkout ..........1
General information .........3
Features and specifications..........4
Server features ..............5
Reliability, availability, and serviceability .....6
Server controls and indicators ........7
Turning on the server ..........8
Turning off the server ..........9
Diagnostics.............11
Diagnostic tools overview .........11
POST ................11
POST beep codes ...........11
POST error messages ..........12
POST error log ............12
Small computer system interface messages ....12
Diagnostic programs and error messages ....12
Text messages ............13
Starting the diagnostic programs ......14
Viewing the test log ..........14
Diagnostic error messages ........15
Power checkout .............15
Recovering BIOS ............15
Identifying problems using status LEDs .....17
Front panel and system board LEDs .....17
Diagnostic LEDs............17
Replacing the battery ...........19
Temperature checkout ...........21
Diagnosing errors ............21
Troubleshooting the Ethernet controller ....21
Ethernet controller messages........23
Configuration ..........29
Using the Configuration/Setup Utility program ..........29
Starting the Configuration/Setup Utility program ..........29
Choices available from the Configuration/Setup main menu ..........30
Using passwords ..........33
Using the SCSISelect utility program ..........35
Starting the SCSISelect utility program ..........35
Choices available from the SCSISelect menu ..........36
Using the PXE boot agent utility program ..........37
Starting the PXE boot agent utility program ..........37
Choices available from the PXE boot agent menu ..........37
Installing options ..........39
Major components of the xSeries 220 server ..........39
System board ..........40
System board options connectors ..........40
System board internal cable connectors ..........41
System board external port connectors ..........41
System board jumpers and switches ..........42
Before you begin ..........44
System reliability considerations ..........44
Rotating the stabilizing feet ..........44
Removing the side cover ..........45
Removing the bezel ..........46
Removing the support bracket assembly ..........47
Working with adapters ..........48
Adapter considerations ..........48
Installing an adapter ..........49
Installing internal drives ..........51
Internal drive bays ..........51
Preinstallation steps (all bays) ..........53
Installing a drive in bay 1, 2, 3, or 4 ..........53
Installing a non-hot-swap hard disk drive in bay 5, 6, or 7 ..........55
Installing a hot-swap hard disk drive in bay 5, 6, or 7 ..........56
Installing memory modules ..........58
Installing and removing a microprocessor ..........61
Installing a microprocessor ..........62
Removing a microprocessor ..........64
Installing the bezel ..........65
Installing the cover ..........65
Connecting external options ..........66
I/O connector locations ..........66
Input/output ports ..........67
Parallel port ..........67
Serial ports ..........69
Universal Serial Bus ports ..........69
Keyboard port ..........70
Auxiliary-device (pointing device) port ..........71
Video port ..........71
SCSI port ..........72
Ethernet port ..........74
FRU information (service only) ....79
Button kit ...............79
Hot-swap hard disk drive cage ........80
Hot-swap backplane ...........81
Power supply .............82
Rear fan ...............83
System board ..............84
CD-ROM drive .............85
Floppy disk drive ............85
Bezel release latch ............86
Top/side cover .............87
Handle assembly ............88
Adapter retainer.............89
Symptom-to-FRU index .......91
Beep symptoms .............91
No beep symptoms ............94
Diagnostic error codes ...........94
Error symptoms .............97
POST error codes ............99
System board LEDs ...........105
ServeRAID..............105
Undetermined problems..........105
Parts listing, Type 8646 .......107
System ...............108
Keyboards ..............109
Power cords ..............110
Related service information .....111
Safety information ............111
General safety ............111
Electrical safety............112
Safety inspection guide .........113
Handling static-sensitive devices ......114
Grounding requirements.........115
Safety notices (multi-lingual translations) . . . 115
Send us your comments! .........144
Problem determination tips.........145
Notices ...............145
Trademarks..............146
General checkout
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server: the system board, Ethernet controller, video controller, RAM, keyboard, mouse (pointing device), diskette drive, serial ports, hard drives, and parallel port. You can also use them to test some external devices. See “Diagnostic programs and error messages” on page 12.
Also, if you cannot determine whether a problem is caused by the hardware or by the software, you can run the diagnostic programs to confirm that the hardware is working properly.
When you run the diagnostic programs, a single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After the cause of the first error message is corrected, the other error messages might not occur the next time you run the test.
A failed system might be part of a shared DASD cluster (two or more systems sharing the same external storage device(s)). Prior to running diagnostics, verify that the failing system is not part of a shared DASD cluster.
A system might be part of a cluster if:
• The customer identifies the system as part of a cluster.
• One or more external storage units are attached to the system and at least one of the attached storage units is additionally attached to another system or unidentifiable source.
• One or more systems are located near the failing system.
If the failing system is suspected to be part of a shared DASD cluster, all diagnostic tests can be run except diagnostic tests which test the storage unit (DASD residing in the storage unit) or the storage adapter attached to the storage unit.
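The cluster criteria and the resulting test restriction can be sketched as a small check. This is only an illustration: the class, field, and test names below are hypothetical and are not part of the diagnostic programs.

```python
from dataclasses import dataclass


@dataclass
class SystemContext:
    """Observations a servicer gathers before running diagnostics (hypothetical names)."""
    customer_says_clustered: bool = False
    # An attached storage unit is also attached to another system or an unidentifiable source.
    shares_external_storage: bool = False
    other_systems_nearby: bool = False


def might_be_clustered(ctx: SystemContext) -> bool:
    """Any one of the three criteria is enough to suspect a shared DASD cluster."""
    return (ctx.customer_says_clustered
            or ctx.shares_external_storage
            or ctx.other_systems_nearby)


def allowed_tests(all_tests, storage_tests, ctx: SystemContext):
    """Exclude storage-unit and storage-adapter tests when a cluster is suspected."""
    if might_be_clustered(ctx):
        return [t for t in all_tests if t not in storage_tests]
    return list(all_tests)
```

For example, with the hypothetical test names `["memory", "dasd", "scsi-adapter"]` and storage tests `{"dasd", "scsi-adapter"}`, a system with another server nearby would be limited to `["memory"]`.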
Notes:
1. For systems that are part of a shared DASD cluster, run one test at a time in looped mode. Do not run all tests in looped mode, as this could enable the DASD diagnostic tests.
2. If multiple error codes are displayed, diagnose the first error code displayed.
3. If the computer hangs with a POST error, go to the “Symptom-to-FRU index” on page 91.
4. If the computer hangs and no error is displayed, go to “Undetermined problems” on page 105.
5. For power supply problems, see “Power checkout” on page 15 and “Symptom-to-FRU index” on page 91.
6. For safety information, see “Safety information” on page 111.
7. For intermittent problems, check the error log; see “POST error messages” on page 12.
1. IS THE SYSTEM PART OF A CLUSTER?
   YES. Schedule maintenance with the customer. Shut down all systems related to the cluster. Run storage test.
   NO. Go to step 2.
2. IF THE SYSTEM IS NOT PART OF A CLUSTER:
   • Power-off the computer and all external devices.
   • Check all cables and power cords.
   • Set all display controls to the middle position.
   • Power-on all external devices.
   • Power-on the computer.
   • Record any POST error messages displayed on the screen. If an error is displayed, look up the first error in the “POST error codes” on page 99.
   • Check the diagnostic LED panel system error LED; if on, see “Front panel and system board LEDs” on page 17.
   • Check the System Error Log. If an error was recorded by the system, see “Symptom-to-FRU index” on page 91.
   • Start the Diagnostic Programs. See “Diagnostic programs and error messages” on page 12.
   • Check for the following responses:
     a. One beep.
     b. Readable instructions or the Main Menu.
3. DID YOU RECEIVE BOTH OF THE CORRECT RESPONSES?
   NO. Find the failure symptom in “Symptom-to-FRU index” on page 91.
   YES. Run the Diagnostic programs. If necessary, refer to “Diagnostic programs and error messages” on page 12.
   If you receive an error, go to “Symptom-to-FRU index” on page 91.
   If the diagnostics completed successfully and you still suspect a problem, see “Undetermined problems” on page 105.
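The branching in step 3 can be summarized as a short decision function. This is a sketch only; the function name, its parameters, and the "No further action" outcome are illustrative assumptions rather than wording from the procedure.

```python
def next_checkout_step(beep_count, menu_readable,
                       diagnostics_passed=None, still_suspect_problem=False):
    """Return the section to go to next, following step 3 of the general checkout.

    diagnostics_passed is None until the diagnostic programs have been run.
    """
    # Both correct responses are required: one beep AND readable instructions/Main Menu.
    if not (beep_count == 1 and menu_readable):
        return "Symptom-to-FRU index"
    if diagnostics_passed is None:
        return "Run the diagnostic programs"
    if not diagnostics_passed:
        return "Symptom-to-FRU index"
    if still_suspect_problem:
        return "Undetermined problems"
    return "No further action"  # assumed outcome; the procedure does not name it
```

For instance, a server that beeps twice at startup is routed to the Symptom-to-FRU index regardless of what appears on the screen.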
General information
The IBM® xSeries 220 is an affordable solution for entry-level server applications. It is ideally suited for networking environments that require superior microprocessor performance, efficient memory management, flexibility, and large amounts of reliable data storage.
The IBM xSeries 220 server comes with a limited warranty and IBM Server Start Up Support. If you have access to the World Wide Web, you can obtain up-to-date information about the server model and other IBM server products at the following World Wide Web address: http://www.ibm.com/eserver/xseries/
[Figure: the xSeries 220 server, with callouts for the cover-release latch and the key lock]
Features and specifications
This section provides a summary of the features and specifications of the xSeries 220 server.
Microprocessor:
• Intel® Pentium® III microprocessor with MMX™ technology and SIMD extensions
• 256 or 512 KB ECC, Level-2 cache
• 133 MHz front-side bus (FSB)
• Support for up to two microprocessors
Memory:
• Standard: 128 or 256 MB
• Maximum: 4 GB
• Type: 133 MHz, ECC, SDRAM, registered DIMMs
• Slots: 4 DIMM connectors, noninterleaved
Drives standard:
• Diskette: 1.44 MB
• CD-ROM: 20x-48X IDE
• Supports hot-swap SCSI hard disk drives (some models)
Expansion bays:
• Two 5.25-in. bays (one CD-ROM drive installed)
• Two 3.5-in. bays (one diskette drive installed)
• Three 3.5-in. slim bays available in drive cage (some models have a hard disk drive installed)
PCI expansion slots:
• Three 33 MHz/64-bit
• Two 33 MHz/32-bit
Power supply:
• One 330 watt autosensing (115-230 V ac)
Video:
• S3 video controller (integrated on system board)
• Compatible with SVGA and VGA
• 8 MB SDRAM video memory
Size:
• Height: 470 mm (18.5 in.)
• Depth: 508 mm (20 in.)
• Width: 165 mm (6.5 in.)
• Weight: approximately 19.5 kg (43 lb.) when fully configured or 15.9 kg (35 lb.) minimum
Integrated functions:
• Ultra160 SCSI low voltage differential (LVD) controller
• One 10BASE-T/100BASE-TX Intel Ethernet controller on the system board with Wake on LAN® support
• Automatic BIOS Recovery (ABR)
• Automatic Server Restart (ASR)
• Two serial ports
• Parallel port
• Two Universal Serial Bus (USB) ports
• Keyboard port
• Mouse port
• IDE controller port
• Video port
Acoustical noise emissions:
• Sound power, idling: 5.1 bel maximum
• Sound power, operating: 5.5 bel maximum
Environment:
• Air temperature:
  – Server on: 10° to 35° C (50.0° to 95.0° F). Altitude: 0 to 914 m (2998.7 ft)
  – Server on: 10° to 32° C (50.0° to 89.6° F). Altitude: 914 m (2998.7 ft) to 2133 m (6998.0 ft)
  – Server off: 10° to 43° C (50.0° to 109.4° F). Maximum altitude: 2133 m (6998.0 ft)
• Humidity:
  – Server on: 8% to 80%
  – Server off: 8% to 80%
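The unit conversions and the altitude-dependent operating limit in the environment figures above can be checked with a short sketch. The function names are illustrative, and treating exactly 914 m as belonging to the lower altitude band is an assumption, because the two ranges share that boundary.

```python
def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def meters_to_feet(meters):
    """Convert meters to feet (1 ft = 0.3048 m exactly)."""
    return meters / 0.3048


def max_operating_temp_c(altitude_m):
    """Maximum air temperature (deg C) with the server on, by altitude band."""
    if not 0 <= altitude_m <= 2133:
        raise ValueError("altitude outside the specified 0 to 2133 m range")
    # Assumption: exactly 914 m is treated as part of the lower band.
    return 35 if altitude_m <= 914 else 32


# Reproduce the parenthetical values in the specification.
print(round(celsius_to_fahrenheit(32), 1))   # 89.6
print(round(meters_to_feet(914), 1))         # 2998.7
print(round(meters_to_feet(2133), 1))        # 6998.0
```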
Heat output:
Approximate heat output in British thermal units (Btu) per hour:
• Minimum configuration: 341 Btu (100 watts)
• Maximum configuration: 1604 Btu (470 watts)
Electrical input:
• Sine-wave input (50-60 Hz) required
• Input voltage low range:
  – Minimum: 100 V ac
  – Maximum: 127 V ac
• Input voltage high range:
  – Minimum: 200 V ac
  – Maximum: 240 V ac
• Input kilovolt-amperes (kVA), approximately:
  – Minimum: 0.08 kVA
  – Maximum: 0.52 kVA
Server features
The design of the server takes advantage of advancements in symmetric multiprocessing (SMP), data storage, and memory management. The server combines:
• Impressive performance using an innovative approach to SMP
  The server supports up to two Pentium III microprocessors. The server comes with one microprocessor installed; you can install an additional microprocessor to enhance performance and provide SMP capability.
• Large system memory
  The memory bus in the server supports up to 4 GB of non-interleaved system memory. The memory controller provides error correcting code (ECC) support for up to four industry-standard PC133, 3.3 V, 168-pin, 8-byte, registered, synchronous-dynamic-random access memory (SDRAM) dual inline memory modules (DIMMs).
• System-management capabilities
  System-management software is included with the server to manage the functions of the server locally and remotely. Refer to the documentation that comes with the system-management software for more information.
• Integrated network environment support
  The server comes with an Ethernet controller on the system board. This Ethernet controller supports the Wake on LAN function and has an interface for connecting to 10-Mbps or 100-Mbps networks. The server automatically selects between 10BASE-T and 100BASE-TX environments. The controller provides full-duplex (FDX) capability, which allows simultaneous transmission and reception of data on the Ethernet local area network (LAN).
• IBM ServerGuide™ CDs
  The ServerGuide CDs that are included with the server provide programs to help you set up the server and install the network operating system (NOS). The ServerGuide program detects the hardware options that are installed, and provides the correct configuration programs and device drivers. In addition, the ServerGuide CDs include a variety of application programs for the server.
Reliability, availability, and serviceability
Three of the most important considerations in server design are reliability, availability, and serviceability (RAS). The RAS factors help to ensure the integrity of the data that is stored on the server, the availability of the server when it is needed, and the ease with which problems can be diagnosed and repaired.
The following is an abbreviated list of the RAS features that the server supports:
• Automatic Server Restart (ASR) after a power failure or system hang
• Automatic BIOS Recovery (ABR), which enables the server to recover if the BIOS is defective or becomes corrupt
• Cyclic redundancy check (CRC) checking on the small computer system interface (SCSI) bus, the diskette interface, and the Universal Serial Bus (USB)
• Parity checking on the keyboard, serial, and memory interface
• Microprocessor Built In Self Test (BIST) with internal error checking
• Diagnostic light-emitting diodes (LEDs)
• Error checking and correcting (ECC) memory
• Error codes, messages, and logs
• Hard disk drive Predictive Failure Analysis (PFA)
• Menu-driven setup, system configuration, optional redundant array of independent disks (RAID) configuration, and diagnostic programs
• Optional IBM Remote Supervisor Adapter subsystem to provide control for remote system management
• Wake on LAN function through onboard Ethernet controller
• Power and temperature monitoring
• Power-on self-test (POST)
• Processor serial number access
• System error logging (POST)
• Upgradeable basic input/output system (BIOS) code and diagnostics
• Vital product data (VPD) on memory, system board, and hot-swap drive backplane
Server controls and indicators
This section identifies the controls and indicators on the front of the server.
[Figure: front view of the server. Callouts: Ethernet speed 100 Mbps, cover-release latch, CD-ROM eject button, CD-ROM drive activity light, diskette-eject button, diskette drive activity light, SCSI activity light, power-on light, power-control button, system error light, key lock, Ethernet transmit/receive activity]
CD-ROM eject button: Press this button to release a CD from the drive.
CD-ROM drive activity light: When this light is on, it indicates that the CD-ROM drive is in use.
Diskette-eject button: Press this button to release a diskette from the drive.
Diskette-drive activity light: When this light is on, it indicates that the diskette drive is in use.
SCSI activity light: When this green light is flashing, the controller is accessing a SCSI device, for example, a hard disk drive.
Note: Hot-swap hard disk drives also have an activity light. This light is also known as the SCSI hard disk drive activity light. If the server has a ServeRAID™ controller installed and this light flashes slowly (one flash per second), the drive is being rebuilt. When the light flashes rapidly (three flashes per second), the controller is identifying the drive.
Power-on light: When this green light is on, system power is present in the server.
Power-control button: Press this button to manually turn the server on or off.
System error light: When this amber light is on, it indicates that a system error has occurred. An amber error light on the interior of the server, adjacent to the faulty component, will also be on to further isolate the error. (For more information, see “Diagnostics” on page 11.)
Cover-release latch: Slide this lever to release the cover.
Key lock: Use the key that comes with your server to unlock the cover.
Turning on the server
Turning on the server refers to the act of plugging the power cord of your server into the power source and starting the operating system.
After you plug the power cord of your server into the power supply and an electrical outlet, the server can start in any of the following ways:
• You can press the power-control button on the front of the server to start the server.
  Note: After you plug the power cord of your server into an electrical outlet, wait approximately 20 seconds before pressing the power-control button. During this time, the system is initializing; therefore, the power-control button does not respond.
• If the server is turned on and a power failure occurs, the server will start automatically when power is restored.
• The Wake on LAN feature will turn on the server at the set time (when a Magic Packet is received), provided that all of the following conditions are met:
  – AC power is present.
  – The server is either off or shut down from an Advanced Configuration and Power Interface (ACPI) operating system.
  – The Wake on LAN feature is enabled in the Configuration/Setup Utility program.
  Notes:
  1. For additional information on the Wake on LAN function, adapters, and cables, refer to the documentation that comes with the adapters.
  2. See “Ethernet port” on page 74 for information on Ethernet controllers and adapters, and Wake on LAN PCI adapters.
• If the IBM Remote Supervisor Adapter is installed in your server, the Remote Supervisor Adapter can turn on the server.
• The server has an Automatic BIOS Recovery (ABR) feature which allows recovery from defective BIOS in some cases.
• The server provides Automatic Server Restart (ASR) logic that supports restarting the system when there is an operating system hang.
Notes:
1. You can install a circular disk over the power-control button to prevent accidental manual power-off. The disk, known as the power-control button shield, comes with your server.
2. See “Choices available from the Configuration/Setup main menu” on page 30 for a description of the Configuration/Setup Utility program.
3. See “System board options connectors” on page 40 for connector locations.
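The Wake on LAN start method above depends on the server receiving a Magic Packet. This manual does not describe the packet layout; the sketch below uses the widely documented format of six 0xFF bytes followed by 16 repetitions of the target MAC address. The function names, the UDP port (9), and the broadcast address are assumptions chosen for illustration, and the MAC address in the usage note is a placeholder.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake on LAN Magic Packet for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hexadecimal digits")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Synchronization stream (6 x 0xFF) followed by the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the packet over UDP (port and address are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

A call such as `send_magic_packet("00:11:22:33:44:55")` would only wake the server if the three conditions listed above are met.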
Turning off the server
Turning off the server refers to the act of disconnecting the server from the power source.
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
You can turn off the server in any of the following ways:
• You can press the power-control button on the front of the server. This starts an orderly shutdown of the operating system, if this feature is supported by your operating system.
  Note: After turning off the server, wait at least five seconds before you press the power-control button to turn on the server again.
• You might need to press and hold the power-control button for more than four seconds to cause an immediate shutdown of the server and to force the power off. You can use this feature if the operating system stops functioning.
• You can disconnect the server power cords from the electrical outlets to shut off all power to the server.
  Note: After disconnecting the power cords, wait approximately 15 seconds for your system to stop running. Watch for the power-on light to stop blinking.
Diagnostics
This section provides basic troubleshooting information to help you resolve some common problems that might occur with the server.
If you cannot locate and correct the problem using the information in this section, refer to “Symptom-to-FRU index” on page 91 for more information.
Diagnostic tools overview
The following tools are available to help you identify and resolve hardware-related problems:
• POST beep codes, error messages, and error logs
  The power-on self-test (POST) generates beep codes and messages to indicate successful test completion or the detection of a problem. See “POST” for more information.
• Diagnostic programs and error messages
  The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server. See “Diagnostic programs and error messages” on page 12 for more information.
• Diagnostic LEDs
  The server has light-emitting diodes (LEDs) to help you identify problems with server components. These LEDs are part of the diagnostics that are built into the server. See “Diagnostic LEDs” on page 17 for more information.
POST
When you turn on the server, it performs a series of tests to check the operation of server components and some of the options installed in the server. This series of tests is called the power-on self-test or POST.
If POST finishes without detecting any problems, a single beep sounds and the first screen of the operating system or application program appears.
If POST detects a problem, more than one beep sounds and an error message appears on the screen. See “POST beep codes” and “POST error messages” on page 12 for more information.
Notes:
1. If you have a power-on password set, you must type the password and press Enter, when prompted, before POST will continue.
2. A single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After you correct the cause of the first error message, the other error messages usually will not occur the next time you run the test.
POST beep codes
Beep codes are sounded in a series of beeps. For example, a 1-2-4 beep code sounds like one beep, a pause, two consecutive beeps, another pause, and four more consecutive beeps.
• One beep indicates the successful completion of POST.
• More than one beep indicates that POST detected a problem. For more information, see “Beep symptoms” on page 91.
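The beep-code notation can be illustrated with a small parser that turns a code such as 1-2-4 into the sequence a servicer would hear. The function names are hypothetical; only the notation and the one-beep rule come from the text above.

```python
def parse_beep_code(code):
    """Split a beep code such as '1-2-4' into its groups of consecutive beeps."""
    groups = [int(part) for part in code.split("-")]
    if any(count < 1 for count in groups):
        raise ValueError("each group must contain at least one beep")
    return groups


def describe_beep_code(code):
    """Describe the audible pattern: groups of beeps separated by pauses."""
    parts = []
    for count in parse_beep_code(code):
        parts.append("1 beep" if count == 1 else f"{count} beeps")
    return ", pause, ".join(parts)


def post_completed_successfully(code):
    """One single beep means POST completed; anything more indicates a problem."""
    return parse_beep_code(code) == [1]


print(describe_beep_code("1-2-4"))  # 1 beep, pause, 2 beeps, pause, 4 beeps
```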
POST error messages
POST error messages occur during startup when POST finds a problem with the hardware or detects a change in the hardware configuration. For a list of POST errors, see “POST error codes” on page 99.
POST error log
The POST error log contains the three most recent error codes and messages that the system generated during POST.
To view the contents of the error logs, start the Configuration/Setup Utility program (see “Starting the Configuration/Setup Utility program” on page 29); then, select Error Logs from the main menu.
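Because the POST error log holds only the three most recent entries, older entries are overwritten as new ones arrive. A bounded queue models that behavior; this is a conceptual sketch, not a description of how the BIOS stores the log, and the class name and entry values are placeholders.

```python
from collections import deque


class PostErrorLog:
    """Keep only the most recent POST error entries (three, per the manual)."""

    def __init__(self, capacity=3):
        # A deque with maxlen silently discards the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def record(self, code, message):
        self._entries.append((code, message))

    def entries(self):
        """Return the retained entries, oldest first."""
        return list(self._entries)


log = PostErrorLog()
for n in range(1, 5):
    # Placeholder codes and messages, not real POST error codes.
    log.record(f"code-{n}", f"example message {n}")
print(log.entries())  # entries for code-2, code-3, and code-4 remain
```

This is one reason to check the log promptly when diagnosing intermittent problems: a fourth error pushes the earliest one out.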
Small computer system interface messages
The following table lists actions to take if you receive a SCSI error message.
Note: If your server does not have a hard disk drive, ignore any message that
indicates that the BIOS is not installed.
You will get these messages only when running the SCSISelect Utility.
Table 1. SCSI messages
SCSI Messages: All
Description: One or more of the following might be causing the problem.
  • A failing SCSI device (adapter, drive, controller)
  • An improper SCSI configuration
  • Duplicate SCSI IDs in the same SCSI chain
  • An improperly installed SCSI terminator
  • A defective SCSI terminator
  • An improperly installed cable
  • A defective cable
Action: Verify that:
  • The external SCSI devices are turned on. External SCSI devices must be turned on before the server.
  • The cables for all external SCSI devices are connected correctly.
  • The last device in each SCSI chain is terminated properly.
  • The SCSI devices are configured correctly.
If the above items are correct, run the diagnostic programs to obtain additional information about the failing device.
Diagnostic programs and error messages
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server.
Diagnostic error messages indicate that a problem exists; they are not necessarily intended to be used to identify a failing part. Troubleshooting and servicing of complex problems that are indicated by error messages should be performed by trained service personnel.
Sometimes the first error to occur causes additional errors. In this case, the server displays more than one error message. Always follow the suggested action instructions for the first error message that appears.
The following sections contain the error codes that might appear in the detailed test log and summary log when running the diagnostic programs.
The error code format is as follows:
fff-ttt-iii-date-cc-text message
where:
fff    is the three-digit function code that indicates the function being tested when the error occurred. For example, function code 089 is for the microprocessor.
ttt    is the three-digit failure code that indicates the exact test failure that was encountered.
iii    is the three-digit device ID.
date   is the date that the diagnostic test was run and the error recorded.
cc     is the check digit that is used to verify the validity of the information.
text message
       is the diagnostic message that indicates the reason for the problem.
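As an illustration of the error code layout, the following sketch splits a logged line into its six fields. It assumes that the date field contains no hyphens (the manual does not specify the date format) and that the text message may contain hyphens of its own. The names and the sample line are hypothetical; only function code 089 comes from the text above.

```python
from typing import NamedTuple


class DiagnosticError(NamedTuple):
    function_code: str   # fff
    failure_code: str    # ttt
    device_id: str       # iii
    date: str            # format not specified in the manual
    check: str           # cc
    message: str         # text message


def parse_error_code(line):
    """Parse 'fff-ttt-iii-date-cc-text message' into named fields."""
    # Split on the first five hyphens only, so hyphens in the message survive.
    parts = line.strip().split("-", 5)
    if len(parts) != 6:
        raise ValueError("expected six hyphen-separated fields")
    for field in parts[:3]:
        if len(field) != 3 or not field.isdigit():
            raise ValueError("fff, ttt, and iii must each be three digits")
    return DiagnosticError(*parts)


# Made-up sample line; every field except function code 089 is a placeholder.
sample = "089-001-000-20010815-42-Example message - with a hyphen"
print(parse_error_code(sample).function_code)  # 089
```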
Text messages
The diagnostic text message format is as follows:
Function Name: Result (test specific string)
where:
Function Name
    is the name of the function being tested when the error occurred. This corresponds to the function code (fff) given in the previous list.
Result
    can be one of the following:
    Passed
        This result occurs when the diagnostic test completes without any errors.
    Failed
        This result occurs when the diagnostic test discovers an error.
    User Aborted
        This result occurs when you stop the diagnostic test before it is complete.
    Not Applicable
        This result occurs when you specify a diagnostic test for a device that is not present.
    Aborted
        This result occurs when the test could not proceed because of the system configuration.
    Warning
        This result occurs when a possible problem is reported during the diagnostic test, such as when a device that is to be tested is not installed.
Test Specific String
    This is additional information that you can use to analyze the problem.
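The text message format lends itself to a simple pattern match. The sketch below is a hedged illustration: the regular expression, the function name, and the sample lines are assumptions, and it treats the parenthesized test-specific string as optional, which the manual does not state.

```python
import re

# The six result values listed above.
RESULTS = ("Passed", "Failed", "User Aborted", "Not Applicable", "Aborted", "Warning")

_PATTERN = re.compile(
    r"^(?P<function>.+?):\s*"
    r"(?P<result>" + "|".join(RESULTS) + r")"
    r"\s*(?:\((?P<detail>.*)\))?\s*$"
)


def parse_text_message(line):
    """Parse 'Function Name: Result (test specific string)' into a dict."""
    match = _PATTERN.match(line.strip())
    if match is None:
        raise ValueError("line does not match the diagnostic text message format")
    return match.groupdict()


# Hypothetical sample lines; the function names and detail text are placeholders.
print(parse_text_message("Memory: Failed (example detail)"))
print(parse_text_message("Keyboard: User Aborted"))
```

Filtering a test log for entries whose result is Failed or Warning is one practical use of such a parser.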
Starting the diagnostic programs
You can press F1 while running the diagnostic programs to obtain Help information. You also can press F1 from within a help screen to obtain online documentation from which you can select different categories. To exit Help and return to where you left off, press Esc.
To start the diagnostic programs:
1. Turn on the server and watch the screen.
2. When the message F2 for Diagnostics appears, press F2.
3. Type in the appropriate password; then, press Enter.
4. Select either Extended or Basic from the top of the screen.
5. When the Diagnostic Programs screen appears, select the test you want to run
from the list that appears; then, follow the instructions on the screen.
Notes:
a. If the server stops during testing and you cannot continue, restart the server
and try running the diagnostic programs again.
b. The keyboard and mouse (pointing device) tests assume that a keyboard
and mouse are attached to the server.
c. If you run the diagnostic programs with no mouse attached to the server,
you will not be able to navigate between test categories using the Next Cat and Prev Cat buttons. All other functions provided by mouse-selectable buttons are also available using the function keys.
d. You can test the USB keyboard by using the regular keyboard test. The regular mouse test can test a USB mouse. Also, you can run the USB interface test only if there are no USB devices attached.
e. You can view server configuration information (such as system
configuration, memory contents, interrupt request (IRQ) use, direct memory access (DMA) use, device drivers, and so on) by selecting Hardware Info from the top of the screen.
When the tests have completed, you can view the Test Log by selecting Utility from the top of the screen.
If the hardware checks out OK but the problem persists during normal server operations, a software error might be the cause. If you suspect a software problem, refer to the information that comes with the software package.
Viewing the test log
The test log will not contain any information until after the diagnostic program has run.
Note: If you already are running the diagnostic programs, begin with step 3.
To view the test log:
1. Turn on the server and watch the screen. If the server is on, shut down the operating system and restart the server.
2. When the message F2 for Diagnostics appears, press F2. If a power-on password is set, the server prompts you for it. Type in the appropriate password; then, press Enter.
3. When the Diagnostic Programs screen appears, select Utility from the top of the screen.
4. Select View Test Log from the list that appears; then, follow the instructions on the screen.
The system maintains the test-log data while the server is powered on. When you turn off the power to the server, the test log is cleared.
Diagnostic error messages
For descriptions of the error messages that might appear when you run the diagnostic programs, see “Diagnostic error codes” on page 94. If diagnostic error messages appear that are not listed in those tables, make sure that the server has the latest levels of BIOS, ServeRAID, and diagnostics microcode installed.
Power checkout
Power problems can be difficult to troubleshoot. For instance, a short circuit can exist anywhere on any of the power distribution busses. Usually a short circuit will cause the power subsystem to shut down because of an overcurrent condition.
A general procedure for troubleshooting power problems is as follows:
1. Power off the system and disconnect the AC cord(s).
2. Check for loose cables in the power subsystem. Also check for short circuits, for instance if there is a loose screw causing a short circuit on a circuit board.
3. Remove adapters and disconnect the cables and power connectors to all internal and external devices until the system is at the minimum configuration required for power on (see “Minimum operating requirements” on page 106).
4. Reconnect the AC cord and power on the system. If the system powers up successfully, reinstall the adapters and devices one at a time until the problem is isolated. If the system does not power up from the minimum configuration, replace the FRUs of the minimum configuration one at a time until the problem is isolated.
To use this method it is important to know the minimum configuration required for a system to power up (see page 106).
Recovering BIOS
The flash memory of the server consists of a primary BIOS page and a backup BIOS page. The server has an Automatic BIOS Recovery (ABR) feature that allows recovery when the BIOS is damaged (for example, when the server loses power during a flash update). When the ABR feature finds the primary BIOS damaged, circuitry switches to the backup BIOS and restarts the system; the server then starts from the backup BIOS and functions normally. However, the primary BIOS should be updated using a BIOS flash diskette as soon as possible. The flash diskette always updates the primary BIOS. The backup BIOS can be updated by moving the primary BIOS to the backup BIOS page during the diskette flash update process. The ABR feature can be disabled with the jumper J20; this action is not normally required.
Note: You can obtain a BIOS flash diskette from one of the following sources:
• Use the ServerGuide program to make a BIOS flash diskette.
• Download a BIOS flash diskette from the World Wide Web. Go to http://www.ibm.com/pc/support/, select IBM Server Support, and make the selections for your server.
To recover the BIOS, do the following:
1. Shut down the server.
2. Insert the BIOS flash diskette into the diskette drive.
3. Restart the server. The system begins the power-on self-test (POST).
4. Follow the on-screen instructions to update the BIOS.
5. Restart the server when the flash update is complete.
Another jumper, J38, can be used to force the system to boot from the backup POST/BIOS (manually boot to backup BIOS). Normally this action is not required, but if the primary BIOS is damaged, and ABR is not functioning properly, the jumper can be moved for troubleshooting purposes. If jumper J38 is in the backup page position, the ABR feature is disabled, regardless of the jumper J20 position, and the system will boot from the backup BIOS page.
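The interaction between ABR and the two jumpers can be restated as a small decision function. The following Python sketch is illustrative only: the function name, its parameters, and the returned strings are assumptions made for this example and do not correspond to any firmware interface. It also assumes that, with J38 in the normal position and ABR disabled by J20, the server attempts to start from the primary page even when that page is damaged; this manual does not describe that case.

```python
def select_bios_page(j38_backup_position, abr_disabled_by_j20, primary_damaged):
    """Return the BIOS page ("primary" or "backup") the server is expected to use.

    All names are hypothetical; the logic only restates the jumper
    description in the text above.
    """
    # J38 in the backup page position forces the backup page and
    # disables ABR, regardless of the J20 position.
    if j38_backup_position:
        return "backup"
    # With ABR active, a damaged primary page causes a switch to backup.
    if primary_damaged and not abr_disabled_by_j20:
        return "backup"
    # Otherwise the primary page is used. For a damaged primary page with
    # ABR disabled this is an assumption, not a documented behavior.
    return "primary"
```

For example, `select_bios_page(False, False, True)` returns `"backup"`, which corresponds to ABR recovering from a damaged primary page with both jumpers in their normal positions.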
To manually boot to backup BIOS and recover primary BIOS:
1. Turn off the server and peripheral devices and disconnect power cords and all external cables; then, remove the cover.
2. Locate jumper J38 on the system board.
[Figure not reproduced. Labels: switch block; flash ROM page-swap jumper (J38), pins 1, 2, and 3.]
3. Move J38 to the “lo” setting (pins 1 and 2) to enable BIOS recovery mode (manual boot).
4. Reconnect the power cord.
5. Insert the BIOS flash diskette in the diskette drive.
6. Restart the server. The system begins the power-on self-test (POST).
7. Follow the on-screen instructions to update the BIOS.
8. Turn off the server and unplug the power cord when the flash update is complete.
9. Move jumper J38 to the “hi” setting (pins 2 and 3) to return to normal startup mode.
10. Reconnect the power cord and all external cables, and reinstall the cover.
11. Restart the server.
Identifying problems using status LEDs
The server has LEDs to help you identify problems with some server components. These LEDs are part of the diagnostics that are built into the server. Use the illuminated LEDs to identify the failing or incorrectly installed components.
Front panel and system board LEDs
The system error LED is on the front panel inside the server. All of the remaining error LEDs are on the system board, adjacent to the failing components. See “Diagnostic LEDs” for information on identifying problems using these LEDs.
The meanings of these LEDs are as follows:
CPU 1    Microprocessor number 1 (connector U12) fault
CPU 2    Microprocessor number 2 (connector U11) fault
Fan 1    Fan number 1 (connector J10) failure (see note 1)
Fan 2    Fan number 2 (connector J18) failure (see note 1)
Fan 3    Fan number 3 (connector J3) failure (see note 1)
Fan 4    Fan number 4 (connector J2) failure (see note 1)
DIMM 1   DIMM number 1 (connector DIMM 1) fault
DIMM 2   DIMM number 2 (connector DIMM 2) fault
DIMM 3   DIMM number 3 (connector DIMM 3) fault
DIMM 4   DIMM number 4 (connector DIMM 4) fault
VRM 1    Microprocessor VRM number 1 (connector J42) fault (see note 1)
VRM 2    Microprocessor VRM number 2 (connector J12) fault (see note 1)
Notes:
1. The fan and VRM LEDs will illuminate only if the IBM Remote Supervisor Adapter is installed in the server.
2. The server does not support user-replaceable power supplies or fans.
Diagnostic LEDs
You can use the diagnostic LEDs built into the server to quickly identify the type of system error that occurred. The server is designed so that LEDs remain illuminated when the server shuts down, as long as the power supply is operating properly. This feature helps you to isolate the problem if an error causes the server to shut down. To correct specific problems, see “Symptom-to-FRU index” on page 91.
Table 2. Diagnostic LEDs

When the system error LED on the front panel is on, a system error was detected. Check to see which of the LEDs on the system board are on.

System error LED (on front panel): On
System board LED: None
Cause: The system error log is 75% or more full or a Predictive Failure Analysis (PFA) alert was logged. This could also be caused by a Processor terminal problem.
Action: Check the system error log and correct any problems. See “Choices available from the Configuration/Setup main menu” on page 30 for more information about clearing the error log. Disconnecting the server from all power sources for at least 20 seconds will turn off the system error LED. Check mounting of the processor sink fans.

System error LED (on front panel): On
System board LED: DIMM1, DIMM2, DIMM3, or DIMM4
Cause: A memory error occurred.
Action:
1. Check the DIMM error LEDs on the system board.
2. Replace the DIMM indicated by the lit DIMM error LED.

System error LED (on front panel): On
System board LED: CPU1 or CPU2
Cause: One of the microprocessors has failed, or a microprocessor is installed incorrectly.
Action:
1. Check the microprocessor error LEDs on the system board. If a microprocessor error LED is on for a microprocessor connector that has a terminator card installed instead of a microprocessor, the microprocessors are not installed in the correct order. See “Installing and removing a microprocessor” on page 61 for information about the correct order for installing microprocessors. Otherwise, continue with the next step.
2. Turn off the server, reseat the microprocessor indicated by the lit microprocessor LED, and restart the server.
3. If the problem persists, replace the microprocessor.

System error LED (on front panel): On
System board LED: Fan 1, Fan 2, Fan 3, or Fan 4
Cause: One of the fans has failed or is operating too slowly.
Action: The LED on the failing fan will be lit. Replace the fan with the lit LED.

System error LED (on front panel): On
System board LED: VRM1 or VRM2
Cause: One of the microprocessor VRMs has failed, or a microprocessor VRM is installed in the wrong connector.
Action:
1. Check the microprocessor VRM error LEDs on the system board. If a microprocessor VRM error LED is on for a microprocessor VRM connector that has a terminator card installed instead of a microprocessor, the microprocessor VRMs are not installed in the correct order. See “Installing and removing a microprocessor” on page 61 for information about the correct order for installing microprocessor VRMs. Otherwise, continue with the next step.
2. Turn off the server, reseat the microprocessor VRM indicated by the lit microprocessor VRM error LED, and restart the server.
3. If the problem persists, replace the microprocessor VRM.

System error LED (on front panel): Off
System board LED: None
Cause: The diagnostic LEDs have not detected a system error.
Action: None
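For servicers who keep their own notes or scripts, the cause column of Table 2 can be expressed as a simple lookup. The Python sketch below is only an illustration: the names `LED_CAUSES` and `diagnose` are hypothetical, the LED states must be read by eye and entered by hand, and the cause strings are shortened transcriptions of the table rather than output from the server.

```python
# Hypothetical lookup keyed by LED family, transcribed from Table 2.
LED_CAUSES = {
    "DIMM": "A memory error occurred.",
    "CPU": "A microprocessor has failed or is installed incorrectly.",
    "FAN": "A fan has failed or is operating too slowly.",
    "VRM": "A microprocessor VRM has failed or is in the wrong connector.",
}


def diagnose(system_error_led_on, lit_board_leds):
    """Return the list of likely causes for the observed LED states."""
    if not system_error_led_on:
        return ["The diagnostic LEDs have not detected a system error."]
    if not lit_board_leds:
        # System error LED on, but no system board LED lit.
        return ["The system error log is 75% or more full or a PFA alert was logged."]
    causes = []
    for led in lit_board_leds:
        # Reduce names such as "DIMM 2", "dimm2", or "Fan 3" to a family key.
        family = "".join(ch for ch in led if ch.isalpha()).upper()
        cause = LED_CAUSES.get(family, "Unknown LED: " + led)
        if cause not in causes:
            causes.append(cause)
    return causes
```

Calling `diagnose(True, ["DIMM 2"])` returns a one-element list containing the memory error cause; the corresponding actions still come from the table.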
Replacing the battery
When replacing the battery, you must replace it with a lithium battery of the same type from the same manufacturer. To avoid possible danger, read and follow the safety statement below.
To order replacement batteries, call 1-800-772-2227 within the United States, and 1-800-465-7999 or 1-800-465-6666 within Canada. Outside the U.S. and Canada, call your IBM reseller or IBM marketing representative.
Note: After you replace the battery, you must reconfigure the server and reset the
system date and time.
CAUTION: When replacing the battery, use only IBM Part Number 33F8354 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer. The battery contains lithium and can explode if not properly used, handled, or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Do the following to replace the battery:
1. Read “System reliability considerations” on page 44, and follow any special handling and installation instructions supplied with the replacement battery.
2. Turn off the server and peripheral devices and disconnect power cords and all external cables; then, remove the server cover.
3. Do the following to remove the battery:
a. Use one fingernail to press the top of the battery clip away from the battery.
b. Use your thumb and index finger to lift the battery from the socket.
c. Ensure that the battery clip is touching the base of the battery socket by pressing gently on the clip.
4. Do the following to insert the new battery:
a. Tilt the battery so that you can insert it into the socket on the side opposite the battery clip.
b. Press the battery down into the socket until it clicks into place. Make sure the battery clip holds the battery securely.
5. Reinstall the server cover and connect all device and signal cables, and then power cords.
6. Turn on the server.
7. Start the BIOS Setup Utility program and set configuration parameters.
• Set the system date and time.
• Set passwords if necessary.
• Save the configuration.
Temperature checkout
Proper cooling of the system is important for proper operation and system reliability. For a typical IBM xSeries server, you should make sure:
• Each of the drive bays has either a drive or a filler panel installed
• The cover is in place during normal operation
• There is at least 50 mm (2 inches) of ventilated space at the sides of the server and 100 mm (4 inches) at the rear of the server
• The cover is removed for no longer than 30 minutes while the server is operating
• A removed hot-swap drive is replaced within two minutes of removal
• Cables for optional adapters are routed according to the instructions provided with the adapters (ensure that cables are not restricting air flow)
• The fans are operating correctly and the air flow is good
• A failed fan is replaced within 48 hours
In addition, ensure that the environmental specifications for the system are met. See “Features and specifications” on page 4.
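The numeric limits in the cooling checklist lend themselves to a quick site-survey check. The Python sketch below is a hypothetical helper, not an IBM tool: the function name and parameters are assumptions, and all measurements must be taken and entered manually. It covers only the items that have a numeric limit in the list above.

```python
def cooling_violations(side_clearance_mm, rear_clearance_mm,
                       cover_off_minutes, hot_swap_bay_empty_minutes,
                       failed_fan_hours):
    """Return a list of cooling guidelines that the given values violate."""
    problems = []
    if side_clearance_mm < 50:
        problems.append("less than 50 mm of ventilated space at the sides")
    if rear_clearance_mm < 100:
        problems.append("less than 100 mm of ventilated space at the rear")
    if cover_off_minutes > 30:
        problems.append("cover removed for longer than 30 minutes")
    if hot_swap_bay_empty_minutes > 2:
        problems.append("hot-swap drive not replaced within two minutes")
    if failed_fan_hours > 48:
        problems.append("failed fan not replaced within 48 hours")
    return problems
```

An empty list means that none of the numeric limits is exceeded; the remaining checklist items (filler panels, cable routing, air flow) still need a visual inspection.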
Diagnosing errors
To find solutions to problems that have definite symptoms, see “Error symptoms” on page 97.
If you cannot find the problem there, go to “Starting the diagnostic programs” on page 14 to test the server.
If you have just added new software or a new option and the server is not working, do the following before using the error symptoms table:
• Remove the software or device that you just added.
• Run the diagnostic tests to determine if the server is running correctly.
• Reinstall the new software or new device.
Troubleshooting the Ethernet controller
This section provides troubleshooting information for problems that might occur with the 10/100 Mbps Ethernet controller.
Network connection problems
If the Ethernet controller cannot connect to the network, check the following:
• Make sure that the cable is installed correctly.
  The network cable must be securely attached at all connections. If the cable is attached but the problem persists, try a different cable.
  If you set the Ethernet controller to operate at 100 Mbps, you must use Category 5 cabling.
  If you directly connect two workstations (without a hub), or if you are not using a hub with X ports, use a crossover cable.
  Note: To determine whether a hub has an X port, check the port label. If the label contains an X, the hub has an X port.
• Determine if the hub supports auto-negotiation. If not, try configuring the integrated Ethernet controller manually to match the speed and duplex mode of the hub.
• Check the LAN activity light on the rear of the server. The LAN activity light illuminates when the Ethernet controller sends or receives data over the Ethernet network. If the LAN activity light is off, make sure that the hub and network are operating and that the correct device drivers are loaded.
• Make sure that you are using the correct device drivers, supplied with your server.
• Check for operating system-specific causes for the problem.
• Make sure that the device drivers on the client and server are using the same protocol.
• Test the Ethernet controller.
  How you test the Ethernet controller depends on which operating system you are using (see the Ethernet controller device driver README file).
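The cabling rules in the first item of the list can be written down as a short helper. This Python sketch only restates those two rules; the function name, parameter names, and dictionary keys are assumptions made for illustration.

```python
def cable_requirements(speed_mbps, uses_hub, hub_has_x_port=False):
    """Return the cabling requirements stated in the connection checklist.

    hub_has_x_port is ignored when uses_hub is False.
    """
    # Category 5 cabling is required when the controller runs at 100 Mbps.
    category_5_required = speed_mbps == 100
    # A crossover cable is called for when two workstations are connected
    # directly, or when the hub in use does not have X ports.
    crossover_required = (not uses_hub) or (not hub_has_x_port)
    return {
        "category_5_required": category_5_required,
        "crossover_required": crossover_required,
    }
```

For a server running at 100 Mbps through a hub whose port label contains an X, `cable_requirements(100, True, True)` reports that Category 5 cabling is required and a crossover cable is not.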
Ethernet controller troubleshooting chart
You can use the following troubleshooting chart to find solutions to 10/100 Mbps Ethernet controller problems that have definite symptoms.
Table 3. Ethernet troubleshooting chart

Ethernet controller problem: The server stops running when loading device drivers.
Suggested action: The PCI BIOS interrupt settings are incorrect. Check the following:
• Determine if the interrupt (IRQ) setting assigned to the Ethernet controller is also assigned to another device in the Configuration/Setup Utility program.
  Although interrupt sharing is allowed for PCI devices, some devices do not function well when they share an interrupt with a dissimilar PCI device. Try changing the IRQ assigned to the Ethernet controller or the other device. For example, for NetWare Versions 3 and 4 it is recommended that disk controllers not share interrupts with LAN controllers.
• Make sure that you are using the most recent device driver available from the World Wide Web.
• Run the network diagnostic program.

Ethernet controller problem: The LAN activity light does not light.
Suggested action: Check the following:
• Make sure that you have loaded the network device drivers.
• The network might be idle. Try sending data from this workstation.
• Run diagnostics on the LEDs.
• The function of this LED can be changed by device driver load parameters. If necessary, remove any LED parameter settings when you load the device drivers.

Ethernet controller problem: Data is incorrect or sporadic.
Suggested action: Check the following:
• Make sure that you are using Category 5 cabling when operating the server at 100 Mbps.
• Make sure that the cables do not run close to noise-inducing sources like fluorescent lights.
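The IRQ check in the first row of the chart amounts to looking for other devices that have the same interrupt as the Ethernet controller. The Python sketch below assumes that the IRQ assignments have been copied by hand from the Configuration/Setup Utility program into a dictionary; the function name, the device names, and the IRQ numbers in the example are all hypothetical.

```python
def devices_sharing_irq(irq_assignments, device="Ethernet controller"):
    """Return the names of other devices assigned the same IRQ as `device`.

    irq_assignments maps a device name to its IRQ number, as noted by hand
    from the setup utility (hypothetical data, not read from hardware).
    """
    irq = irq_assignments[device]
    return sorted(
        name
        for name, assigned in irq_assignments.items()
        if assigned == irq and name != device
    )


# Example with made-up assignments: the disk controller shares IRQ 11.
example = {"Ethernet controller": 11, "SCSI controller": 11, "Video": 9}
shared = devices_sharing_irq(example)
```

A non-empty result is not necessarily a fault, because PCI devices are allowed to share interrupts; it only identifies the devices whose IRQ assignment is worth changing first.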