IBM 866631Y, Netfinity 7100 8666 Hardware Maintenance Manual

Hardware Maintenance Manual
Netfinity 7100 – Type 8666
IBM
Note
Before using this information and the product it supports, be sure to read the general information under “Notices” on page 185.
First Edition (February 2000)
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.
This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.
This publication was developed for products and services offered in the United States of America. IBM may not offer the products, services, or features discussed in this document in other countries, and the information is subject to change without notice. Consult your local IBM representative for information on the products, services, and features available in your area.
Requests for technical information about IBM products should be made to your IBM reseller or IBM marketing representative.
© Copyright International Business Machines Corporation 1999. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
About this manual
This manual contains diagnostic information, a Symptom-to-FRU index, service information, error codes, error messages, and configuration information for the Netfinity 7100 – Type 8666, Models 1RY, 2RY, 11Y, 21Y.
Important: This manual is intended for trained servicers who are familiar with IBM PC Server products.
Important safety information
Be sure to read all caution and danger statements in this book before performing any of the instructions.
Leia todas as instruções de cuidado e perigo antes de executar qualquer operação.
Prenez connaissance de toutes les consignes de type Attention et Danger avant de procéder aux opérations décrites par les instructions.
Lesen Sie alle Sicherheitshinweise, bevor Sie eine Anweisung ausführen.
Accertarsi di leggere tutti gli avvisi di attenzione e di pericolo prima di effettuare qualsiasi operazione.
Lea atentamente todas las declaraciones de precaución y peligro antes de llevar a cabo cualquier operación.
Online support
IBM online addresses
Use the World Wide Web (WWW) to download Diagnostic, BIOS Flash, and Device Driver files.
File download address is:
http://www.us.pc.ibm.com/files.html
The HMM manuals online address is:
http://www.us.pc.ibm.com/cdt/hmm.html
The IBM PC Company Support Page is:
http://www.us.pc.ibm.com/support/index.html
The IBM PC Company Home Page is:
http://www.pc.ibm.com
Contents
About this manual ..........iii
Important safety information ........iii
Online support .............iv
IBM online addresses ..........iv
General checkout ..........1
General information .........3
Features and specifications..........3
Server features..............5
Reliability, availability, and serviceability .....6
Controls and indicators ...........7
Information LED panel ...........9
Diagnostics.............11
Diagnostic tools overview .........11
POST ................11
POST beep codes ...........12
POST error messages ..........12
Event/error logs............12
Small computer system interface messages ....12
Diagnostic programs and error messages ....12
Text messages ............13
Starting the diagnostic programs ......14
Viewing the test log ..........15
Diagnostic error message tables.......15
Light path diagnostics ...........15
Power supply LEDs ..........15
Diagnostic panel LEDs ........16
Light path diagnostics .........17
Power checkout .............19
Temperature checkout ...........20
Recovering BIOS ............20
Replacing the battery ...........21
Diagnosing errors ............22
Troubleshooting the Ethernet controller ....23
Network connection problems ......23
Ethernet controller troubleshooting chart . . 23
Ethernet controller messages........25
Novell NetWare or IntraNetWare server ODI driver messages ...........25
NDIS 4.0 (Windows NT) driver messages . . 27
UNIX messages ...........28
Configuring the server ........31
Using the Configuration/Setup Utility program . . 31
Starting the Configuration/Setup Utility program ...31
Choices available from the Configuration/Setup main menu .............31
Using passwords ...........35
Power-on password .........35
Administrator password ........36
Using the SCSISelect utility program ......36
Starting the SCSISelect utility program ....36
Choices available from the SCSISelect menu . . 37
Installing options ..........39
Major components of the Netfinity 7100 .....39
Component locations ...........40
I/O board component locations.......40
Processor board component locations .....41
Processor board LEDs .........41
Processor board connectors .......42
Processor board jumpers ........42
Memory board component locations .....43
Memory board connectors .......43
Memory board LED locations ......43
Before you begin ............43
Working inside the server with the power on . . 44
Removing the top cover, front door and media-bay bezel ................44
Removing the top cover .........45
Removing the server front door and the media-bay bezel ............46
Working with adapters ..........46
Adapter considerations .........47
Installing an adapter ..........47
Cabling example for the ServeRAID adapter . . . 49
Installing internal drives ..........50
Internal drive bays ...........51
Installing a hot-swap hard disk drive.....51
Installing a 5.25-inch removable-media drive . . 53
Installing memory-module kits ........54
Installing a microprocessor kit ........57
Installing a hot-swap power supply ......60
Replacing a hot-swap fan ..........61
Completing the installation .........62
Installing the server front door and media-bay bezel ...............63
Installing the top cover .........64
Reconfiguring the server .........64
Connecting external options .........65
Input/output ports ............65
Parallel port .............65
Viewing or changing the parallel-port assignments ............66
Parallel port connector.........66
Video port..............67
Keyboard port ............68
Auxiliary-device (pointing device) port ....68
Ultra2 SCSI ports ...........68
SCSI cabling requirements .......69
Setting SCSI IDs ...........69
SCSI connector pin-number assignments. . . 69
Serial ports .............70
Viewing or changing the serial-port assignments ............70
Serial-port connectors .........71
Universal Serial Bus ports ........71
USB cables and hubs .........71
USB-port connectors .........71
Ethernet port .............72
Configuring the Ethernet controller ....72
Failover for redundant Ethernet .....72
Ethernet port connector ........74
Advanced System Management ports.....74
Cabling the server ............74
Installing the server in a rack ........75
Netfinity Manager ..........77
Managing your IBM Netfinity server with Netfinity Manager ...............78
Netfinity Manager documentation .......78
Netfinity Manager system requirements .....78
Netfinity Manager for OS/2 system requirements 79
Netfinity Manager for Windows 95 and Windows 98 system requirements .........79
Netfinity Manager for Windows NT system requirements .............80
Starting the Netfinity Manager installation program 81
Netfinity Manager database support ......87
DB2 database support ..........87
System requirements .........87
Installing and configuring the database . . . 87
Activating the database ........88
Granting and revoking database privileges . . 89
Deleting the database .........90
Lotus Notes database support .......91
System requirements .........91
Installing the database .........91
Browsing the Netfinity Manager Lotus Notes database .............92
ODBC database support .........93
System requirements .........93
ODBC database configuration ......93
Creating the Netfinity Manager tables . . . 94
Supported and certified databases .....96
Starting Netfinity Manager .........97
Netfinity Manager Service Manager .....98
Netfinity Manager service descriptions ....98
Advanced System Management .....99
Alert Manager ...........99
Alert on LAN configuration .......99
Capacity Management.........99
Cluster Manager...........99
Critical File Monitor .........100
DMI Browser ...........100
ECC Memory Setup .........100
Event Scheduler ..........100
File Transfer............100
Power-On Error Detect ........100
Predictive Failure Analysis .......100
Process Manager ..........100
RAID Manager ...........100
Remote Session...........101
Remote System Manager .......101
Remote Workstation Control ......101
ScreenView............101
Security Manager ..........101
Serial Connection Control .......101
Service Configuration Manager .....101
Software Inventory .........101
System Diagnostics Manager ......102
System Information Tool........102
System Monitor ..........102
System Partition Access ........102
System Profile ...........102
Update Connector Manager ......102
Web Manager Configuration ......103
Delaying Netfinity Manager startup on OS/2 systems ..............103
Getting more information about Netfinity Manager 103
Installation options ...........104
Automated installation .........104
Customized installation .........105
FRU information (service only) ....109
Diagnostic switch card ..........109
Disconnecting the shuttle .........110
Front LED card assembly .........111
I/O Legacy board ............112
Memory card removal ..........112
Power backplane assembly .........114
Processor/PCI backplane .........114
Removing the shuttle ...........116
SCSI backplane assembly .........116
Symptom-to-FRU index .......119
Beep symptoms ............119
No beep symptoms ...........122
Diagnostic panel LEDs ..........122
Diagnostic error codes ..........125
Error symptoms ............130
Power supply LED errors .........131
POST error codes ............131
ServeRAID ..............137
SCSI error codes ............138
Temperature error messages........138
Fan error messages ...........139
Power error messages ..........139
System shutdown ............140
Power related system shutdown ......140
Temperature related system shutdown ....140
DASD checkout ............141
Host Built-In Self Test (BIST) checkout .....141
I2C bus fault messages ..........141
Undetermined problems..........142
Parts listing (Type 8666).......145
Part A ................145
System ..............146
Part B ................147
System ...............148
Keyboards ..............149
Power cords..............150
Related service information .....151
Safety information............152
General safety ............152
Electrical safety............152
Safety inspection guide .........154
Handling electrostatic discharge-sensitive devices ..............155
Grounding requirements ........155
Safety notices (multi-lingual translations) . . . 155
Send us your comments! .........184
Problem determination tips.........185
Notices ...............185
Trademarks ..............186
General checkout
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server: the system board, Ethernet controller, video controller, RAM, keyboard, mouse (pointing device), diskette drive, serial ports, hard drives, and parallel port. You can also use them to test some external devices. See “Diagnostic programs and error messages” on page 12.
Also, if you cannot determine whether a problem is caused by the hardware or by the software, you can run the diagnostic programs to confirm that the hardware is working properly.
When you run the diagnostic programs, a single problem might cause several error messages. When this occurs, work to correct the cause of the first error message. After the cause of the first error message is corrected, the other error messages might not occur the next time you run the test.
A failed system might be part of a shared DASD cluster (two or more systems sharing the same external storage device(s)). Prior to running diagnostics, verify that the failing system is not part of a shared DASD cluster.
A system might be part of a cluster if:
• The customer identifies the system as part of a cluster.
• One or more external storage units are attached to the system and at least one of the attached storage units is additionally attached to another system or unidentifiable source.
• One or more systems are located near the failing system.
If the failing system is suspected to be part of a shared DASD cluster, all diagnostic tests can be run except diagnostic tests which test the storage unit (DASD residing in the storage unit) or the storage adapter attached to the storage unit.
Notes:
1. For systems that are part of a shared DASD cluster, run one test at a time in looped mode. Do not run all tests in looped mode, as this could enable the DASD diagnostic tests.
2. If multiple error codes are displayed, diagnose the first error code displayed.
3. If the computer hangs with a POST error, go to the “Symptom-to-FRU index” on page 119.
4. If the computer hangs and no error is displayed, go to “Undetermined problems” on page 142.
5. For power supply problems, see “Symptom-to-FRU index” on page 119.
6. For safety information, see “Safety information” on page 152.
7. For intermittent problems, check the error log; see “POST error messages” on page 12.
1. IS THE SYSTEM PART OF A CLUSTER?
YES. Schedule maintenance with the customer. Shut down all systems related to the cluster. Run storage test.
NO. Go to step 2.
2. THE SYSTEM IS NOT PART OF A CLUSTER.
• Power-off the computer and all external devices.
• Check all cables and power cords.
• Set all display controls to the middle position.
• Power-on all external devices.
• Power-on the computer.
• Record any POST error messages displayed on the screen. If an error is displayed, look up the first error in the “POST error codes” on page 131.
• Check the information LED panel System Error LED; if on, see “Diagnostic panel LEDs” on page 122.
• Check the System Error Log. If an error was recorded by the system, see “Symptom-to-FRU index” on page 119.
• Start the Diagnostic Programs. See “Diagnostic programs and error messages” on page 12.
• Check for the following responses:
  a. One beep.
  b. Readable instructions or the Main Menu.
3. DID YOU RECEIVE BOTH OF THE CORRECT RESPONSES?
NO. Find the failure symptom in “Symptom-to-FRU index” on page 119.
YES. Run the Diagnostic Programs. If necessary, refer to “Diagnostic programs and error messages” on page 12.
If you receive an error, go to “Symptom-to-FRU index” on page 119.
If the diagnostics completed successfully and you still suspect a problem, see “Undetermined problems” on page 142.
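The decision points of the general checkout can be summarized as a small routine. The following Python sketch is illustrative only: the CheckoutObservation record, its field names, and the "No further action" result are assumptions made for this example and are not part of any IBM tool. It returns only the first action or reference section that the procedure above calls for.

```python
from dataclasses import dataclass

SYMPTOM_TO_FRU = "Symptom-to-FRU index"
UNDETERMINED = "Undetermined problems"
CLUSTER_ACTION = "Schedule maintenance, shut down all cluster systems, run storage test"
NO_ACTION = "No further action"  # assumption: the manual does not name this outcome


@dataclass
class CheckoutObservation:
    """Hypothetical record of what the servicer observed during checkout."""
    part_of_cluster: bool
    heard_one_beep: bool
    saw_menu_or_instructions: bool
    diagnostics_reported_error: bool
    problem_still_suspected: bool


def next_action(obs: CheckoutObservation) -> str:
    # Step 1: a clustered system needs scheduled maintenance first.
    if obs.part_of_cluster:
        return CLUSTER_ACTION
    # Step 3: both correct responses (one beep, readable menu) are required.
    if not (obs.heard_one_beep and obs.saw_menu_or_instructions):
        return SYMPTOM_TO_FRU
    # Diagnostic programs were run and reported an error.
    if obs.diagnostics_reported_error:
        return SYMPTOM_TO_FRU
    # Diagnostics passed but a problem is still suspected.
    if obs.problem_still_suspected:
        return UNDETERMINED
    return NO_ACTION


print(next_action(CheckoutObservation(False, True, True, False, True)))
```

The sketch does not model the individual checks in step 2 (cables, display controls, logs); it only captures where each outcome sends the servicer.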
General information
The IBM® Netfinity® 7100 server is a high-performance server that can be upgraded with additional microprocessors to become a symmetric multiprocessing (SMP) server. It is ideally suited for networking environments that require superior microprocessor performance, efficient memory management, flexibility, and large amounts of reliable data storage.
Performance, ease of use, reliability, and expansion capabilities were key considerations during the design of the server. These design features make it possible for you to customize the system hardware to meet your needs today, while providing flexible expansion capabilities for the future.
The IBM Netfinity 7100 server comes with a three-year limited warranty and 90-Day IBM Start Up Support. If you have access to the World Wide Web, you can obtain up-to-date information about the server model and other IBM server products at the following World Wide Web address: http://www.ibm.com/pc/us/netfinity/
Features and specifications
The following provides a summary of the features and specifications for the Netfinity 7100 server.
• Microprocessor:
  – Intel® Pentium® III Xeon
  – 32 KB of level-1 cache
  – 512 KB of level-2 cache (min.)
  – Expandable to four microprocessors
• Memory:
  – Maximum: 16 GB
  – Type: ECC, SDRAM, Registered DIMMs
  – Slots: 4-way interleaved, 16 slots
• Drives standard:
  – Diskette: 1.44 MB
  – CD-ROM: 40X IDE
• Expansion bays:
  – Hot-swap: 10 slim high or 7 half high
  – Non-hot-swap: Two 5.25-inch
• PCI expansion slots:
  – Four 33 MHz / 64-bit
  – Two 66 MHz / 64-bit
• Hot-swap power supplies:
  250 W (115-230 V ac)
  – Minimum: Two
  – Maximum: Four
• Cooling:
  – Four hot-swap fan assemblies
• Video:
  – S3 video controller
  – Compatible with SVGA and VGA
  – 4 MB video memory
• Size (Rack Model) (8U)
  – Height: 356 mm (14 in.)
  – Depth: 650 mm (25.6 in.)
  – Width: 440 mm (17.3 in.)
  – Weight: 34.4 kg (76 lb.) to 61 kg (134 lb.) depending upon configuration
• Size (Tower Model)
  – Height: 356 mm (14 in.)
  – Depth: 700 mm (27.6 in.)
  – Width: 483 mm (19 in.)
  – Weight: 39 kg (86 lb.) to 55 kg (121 lb.) depending upon configuration
• Integrated functions:
  – Netfinity Advanced System Management processor
  – Dual Ultra-2 (LVD) SCSI controller (one external port, one internal port)
  – One 10BASE-T/100BASE-TX AMD Ethernet controller
  – Three serial ports (one reserved for system management)
  – Two RS 485 ports
  – One parallel port
  – Two universal serial bus ports
  – Keyboard port
  – Mouse port
  – Video port
• Acoustical noise emissions:
  – Sound power, idling: 6.3 bel maximum
  – Sound power, operating: 6.3 bel maximum
  – Sound pressure, operating: 48 dBA maximum
• Environment:
  – Air temperature:
    - Server on: 10° to 35°C (50° to 95°F). Altitude: 0 to 914 m (3000 ft.)
    - Server on: 10° to 32°C (50° to 89.6°F). Altitude: 914 m (3000 ft.) to 2133 m (7000 ft.)
    - Server off: 10° to 43°C (50° to 110°F). Maximum altitude: 2133 m (7000 ft.)
  – Humidity:
    - Server on: 8% to 80%
    - Server off: 8% to 80%
• Heat output:
  Approximate heat output in British Thermal Units (BTU) per hour
  – Minimum configuration: 1023.9 BTU (0.3 kilowatts)
  – Maximum configuration: 2764.6 BTU (0.81 kilowatts)
• Electrical input:
  – Sine-wave input (50-60 Hz) required
  – Input voltage low range:
    - Minimum: 90 V ac
    - Maximum: 137 V ac
  – Input voltage high range:
    - Minimum: 180 V ac
    - Maximum: 265 V ac
  – Input kilovolt-amperes (kVA) approximately:
    - Minimum: 0.08 kVA
    - Maximum: 0.52 kVA
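The heat output figures above pair a BTU-per-hour value with a kilowatt value. As a rough cross-check, the following Python sketch converts the kilowatt figures back to BTU per hour. The conversion factor (about 3412.142 BTU per hour per kilowatt) is the standard one and is not taken from this manual; the function name is hypothetical.

```python
# Standard conversion factor; an assumption of this sketch, not a value from the manual.
BTU_PER_HOUR_PER_KW = 3412.142


def kw_to_btu_per_hour(kilowatts: float) -> float:
    """Convert a power figure in kilowatts to BTU per hour."""
    return kilowatts * BTU_PER_HOUR_PER_KW


# Kilowatt figures as listed for the minimum and maximum configurations.
for label, kw in (("minimum", 0.3), ("maximum", 0.81)):
    print(f"{label} configuration: {kw} kW is about {kw_to_btu_per_hour(kw):.0f} BTU per hour")
```

With this factor the results agree with the listed 1023.9 and 2764.6 BTU figures to within about 1 BTU per hour; the small difference suggests the listed values were computed with a slightly different rounded factor.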
Server features
The unique design of the server takes advantage of advancements in symmetric multiprocessing (SMP), data storage, and memory management. The server combines:
• Impressive performance using an innovative approach to SMP
The server supports up to four Pentium III Xeon microprocessors. The server comes with one microprocessor installed; you can install additional microprocessors to enhance performance and provide SMP capability.
• Large data-storage and hot-swap capabilities
All models of the server support up to 10 slim-high or 7 half-high hot-swap hard disk drives. This hot-swap feature enables you to remove and replace hard disk drives without turning off the server.
• Large system memory
The memory bus in the server supports up to 16 GB of system memory. The memory controller provides error correcting code (ECC) support for up to 16 industry-standard, 3.3 V, 168-pin, 8-byte, registered, dual inline memory modules (DIMMs).
• System-management capabilities
The server comes with a Netfinity Advanced System Management Processor on the system board. This processor, in conjunction with the Netfinity Manager software provided on the ServerGuide CDs, enables you to manage the functions of the server locally and remotely. The Netfinity Advanced System Management Processor also provides system monitoring, event recording, and dial-out alert capability.
Note: The Netfinity Advanced System Management Processor is sometimes referred to as the service processor.
• Integrated network environment support
The server comes with an Ethernet controller. This Ethernet controller on the system board has an interface for connecting to 10-Mbps or 100-Mbps networks. The server automatically selects between 10BASE-T and 100BASE-TX. The controller provides full-duplex (FDX) capability, which enables simultaneous transmission and reception of data on the Ethernet local area network (LAN).
• Redundant network-interface card
The addition of an optional, redundant network interface card (NIC) provides a failover capability to a redundant Ethernet connection. If a problem occurs with the primary Ethernet connection, all Ethernet traffic associated with this primary connection is automatically switched to the redundant NIC. This switching occurs without data loss and without user intervention.
• IBM ServerGuide CDs
The ServerGuide CDs included with the Netfinity server provide programs to help you set up the server and install the network operating system (NOS). The ServerGuide program detects the hardware options that are installed, and provides the correct configuration programs and device drivers. In addition, the ServerGuide CDs include a variety of application programs for the server.
Reliability, availability, and serviceability
Three of the most important features in server design are reliability, availability, and serviceability (RAS). These factors help to ensure the integrity of the data stored on the server; that the server is available when you want to use it; and that should a failure occur, you can easily diagnose and repair the failure with minimal inconvenience.
The following is an abbreviated list of the RAS features that the server supports.
• Menu-driven setup, system configuration, SCSISelect configuration, and diagnostic programs
• Power-on self-test (POST)
• Integrated Netfinity Advanced System Management Processor
• Predictive Failure Analysis (PFA) alerts
• Remote system problem-determination support
• Power and temperature monitoring
• Power-supply redundancy monitoring
• Fault-resistant startup
• Hot-swap drive bays
• Error codes and messages
• System error logging
• Upgradable BIOS, diagnostics, and Netfinity Advanced System Management Processor code
• Automatic restart after a power failure
• Parity checking on the SCSI bus and the PCI bus
• Error correcting code (ECC) memory
• Redundant hot-swap power supplies and fans
• Hot-swap cooling
• Chipkill™ memory protection (optional)
• Support for hot-plug PCI adapters (optional)
• Redundant Ethernet capabilities (with optional adapter)
• Vital Product Data (VPD) on processors, processor board, I/O board, power supplies, hard disk backplane, power backplane and VRMs
• Information and diagnostic LED panels
Controls and indicators
The following illustration shows the controls and indicators on the server.
[Illustration: front of the server, with callouts for the Information LED panel, power-control button, diskette drive in-use light, diskette-eject button, CD-ROM drive in-use light, CD-ROM eject/load button, reset button, hard-disk activity light, and hard-disk status light.]
Hard-disk drive status light: Each of the hot-swap drives has a status light. When this amber light is on continuously, the drive has failed. When the light flashes slowly (one flash per second), the drive is being rebuilt. When the light flashes rapidly (three flashes per second) the controller is identifying the drive.
Hard-disk activity light: Each of the hot-swap drives has a hard-disk activity light. When this green light is flashing, the drive is being accessed.
CD-ROM eject/load button: Press this button to eject or retract the CD-ROM tray.
CD-ROM drive in-use light: When this light is on, the CD-ROM drive is being accessed.
Diskette-eject button: Press this button to eject a diskette from the drive.
Diskette drive in-use light: When this light is on, the diskette drive is being accessed.
Reset button: Press this button to reset the server and run the power-on self-test (POST).
Power control button: Press this button to manually turn the server on or off.
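The hard-disk drive status light described above encodes three conditions in how the amber light behaves. The following Python sketch shows one way to tabulate them. The pattern names and the 2-flashes-per-second threshold used to separate slow from rapid flashing are assumptions chosen for this example; the manual specifies only the nominal rates of one and three flashes per second.

```python
# Meanings taken from the status-light description; the key names are hypothetical.
STATUS_LIGHT_MEANING = {
    "on continuously": "the drive has failed",
    "slow flash": "the drive is being rebuilt",
    "rapid flash": "the controller is identifying the drive",
}


def classify_flash_rate(flashes_per_second: float) -> str:
    """Map an observed flash rate to a pattern name.

    Assumption: rates below 2 per second count as slow (nominally 1 per
    second) and anything faster counts as rapid (nominally 3 per second).
    """
    return "slow flash" if flashes_per_second < 2 else "rapid flash"


observed = classify_flash_rate(1.0)
print(f"{observed}: {STATUS_LIGHT_MEANING[observed]}")
```

The manual does not state what an unlit status light means, so that case is deliberately left out of the table.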
CAUTION:
The power control button on the device and/or the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
You can start the server in several ways:
• You can turn on the server by pressing the Power Control button on the front of the server.
  Note: After you plug the power cords of your server into electrical outlets, wait 20 seconds before pressing the Power Control button. During this time the system-management processor is initializing and the Power Control button does not respond.
• If the server is turned on, a power failure occurs, and unattended-start mode is enabled in the Configuration/Setup utility program, the server will start automatically when power is restored.
• If AC power is present, the server is off, and the wake-up feature is enabled in the Configuration/Setup utility program, the wake-up feature will turn on the server at the set time.
• If AC power is present, the server is off, and ring signal detect is enabled in the Configuration/Setup utility program, you can turn on the server by telephone input.
• The Netfinity Advanced System Management Processor can also turn on the server.
You can turn off the server in several ways:
• You can turn off the server by pressing the Power Control button on the front of the server. Pressing the Power Control button starts an orderly shutdown of the operating system, if this feature is supported by your operating system, and places the server in standby mode.
  Note: After turning off the server, wait at least 5 seconds before pressing the Power Control button to power the server on again.
• You can press and hold the Power Control button for more than 4 seconds to cause an immediate shutdown of the server and place the server in standby mode. You can use this feature if the operating system stalls.
• You can disconnect the server power cords from the electrical outlets to shut off all power to the server.
  Note: Wait about 15 seconds after disconnecting the power cords for your system to stop running. Watch for the System Power light on the operator information panel to stop blinking.
Information LED panel: The lights on this panel give status information for the server. See “Information LED panel”.
Information LED panel
The following illustration shows the status lights on the Information LED panel.
[Illustration: Information LED panel, with callouts for the POST-complete (OK), system power, system error, information, hard disk drive activity, processor activity (1 to 4), Ethernet speed (100 MB), Ethernet-link status (LINK OK), and Ethernet transmit/receive activity (TX/RX) lights.]
System power light: When this green light is on, power is present in the server. When this light flashes, the server is in standby mode (the system power supply is turned off and ac current is present). When this light is off, the power subsystem, the ac power, or a light has failed.
Attention: If the system power light is off, it does not mean there is no electrical current present in the server. The light might be burned out. To remove all electrical current from the server, you must unplug the server power cords from the electrical outlets or from the uninterruptible power supply.
POST-complete light: This green light is on when the power-on self-test (POST) completes without any errors.
Hard disk drive activity light: This green light flickers when there is activity on a hard disk drive.
Information light: When this amber light is on, the server power supplies are nonredundant or some other noncritical event has occurred. A light on the diagnostic panel may also be on. The event is recorded in the Event log. See “Choices available from the Configuration/Setup main menu” on page 31 for information on viewing the Event log.
System error light: This amber light is on when a system error occurs. A light on the diagnostics LED panel will also be on to further isolate the error. (For more information, see “Diagnostic panel LEDs” on page 16.)
Ethernet transmit/receive activity light: When this green light is on, there is activity between the server and the network.
Ethernet-link status light: When this green light is on, there is an active connection on the Ethernet port.
Ethernet speed 100 Mbps: When this green light is on, the Ethernet speed is 100 Mbps. When the light is off, the Ethernet speed is 10 Mbps.
Processor activity lights: One or more of these green lights are on when there is microprocessor activity. The number of lights that are on indicates the number of microprocessors with activity.
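Several of the Information LED panel lights described above map directly to a state or a count. The following Python sketch tabulates three of them for illustration. All names are hypothetical, and the Ethernet speed helper assumes the link light is already on, because the speed light alone cannot distinguish a 10 Mbps link from no link.

```python
from typing import Sequence

# Meanings taken from the system power light description above.
SYSTEM_POWER_LIGHT = {
    "on": "power is present in the server",
    "flashing": "standby mode (system power supply off, ac current present)",
    "off": "the power subsystem, the ac power, or the light has failed",
}


def ethernet_speed_mbps(speed_light_on: bool) -> int:
    """Speed implied by the 100 Mbps light, assuming an active link."""
    return 100 if speed_light_on else 10


def processors_with_activity(lights: Sequence[bool]) -> int:
    """Count the processor activity lights (up to four) that are on."""
    return sum(1 for lit in lights if lit)


print(SYSTEM_POWER_LIGHT["flashing"])
print(ethernet_speed_mbps(True), "Mbps")
print(processors_with_activity([True, True, False, False]), "processors active")
```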
Diagnostics
This section provides basic troubleshooting information to help you resolve some common problems that might occur with the server.
If you cannot locate and correct the problem using the information in this section, refer to “Symptom-to-FRU index” on page 119 for more information.
Diagnostic tools overview
The following tools are available to help you identify and resolve hardware-related problems:
• POST beep codes, error messages, and error logs
The power-on self-test (POST) generates beep codes and messages to indicate successful test completion or the detection of a problem. See “POST” for more information.
• Diagnostic programs and error messages
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server. See “Diagnostic programs and error messages” on page 12 for more information.
• Light path diagnostics
Your server has light-emitting diodes (LEDs) to help you identify problems with server components. These LEDs are part of the light-path diagnostics that are built into the server. By following the path of lights, you can quickly identify the type of system error that occurred. See “Light path diagnostics” on page 15 for more information.
• Error symptoms
These charts list problem symptoms, along with suggested steps to correct the problems. See “Diagnosing errors” on page 22 for more information.
POST
When you turn on the server, it performs a series of tests to check the operation of server components and some of the options installed in the server. This series of tests is called the power-on self-test or POST.
If POST finishes without detecting any problems, a single beep sounds, the first screen of the operating system or application program appears, and the System POST Complete (OK) light is illuminated on the operator information panel.
If POST detects a problem, more than one beep sounds and an error message appears on the screen. See “POST beep codes” and “POST error messages” for more information.
Notes:
1. If you have a power-on password or administrator password set, you must
type the password and press Enter, when prompted, before POST will continue.
2. A single problem might cause several error messages. When this occurs, work
to correct the cause of the first error message. After you correct the cause of the first error message, the other error messages usually will not occur the next time you run the test.
POST beep codes
POST generates beep codes to indicate successful completion or the detection of a problem.
v One beep indicates the successful completion of POST.
v More than one beep indicates that POST detected a problem. For more information, see “Beep symptoms” on page 119.
POST error messages
POST error messages occur during startup when POST finds a problem with the hardware or detects a change in the hardware configuration. For a list of POST errors, see “POST error codes” on page 131.
Event/error logs
The POST error log contains the three most recent error codes and messages that the system generated during POST. The System Event/Error Log contains all error messages issued during POST and all system status messages from the Netfinity Advanced System Management Processor.
To view the contents of the error logs, start the Configuration/Setup Utility program; then, select Event/Error Logs from the main menu.
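The difference between the two logs can be sketched in a few lines of Python. This is only an illustrative model, not the server firmware: the names post_error_log, system_event_log, and record_post_error are hypothetical, the codes are placeholders, and the finite capacity of the System Event/Error Log is not modeled.

```python
from collections import deque

# Hypothetical model: the POST error log keeps only the three most recent entries.
post_error_log = deque(maxlen=3)
# Hypothetical model: the System Event/Error Log keeps every entry (capacity not modeled).
system_event_log = []


def record_post_error(code, message):
    """Record one POST error in both logs."""
    entry = (code, message)
    post_error_log.append(entry)    # the oldest entry is dropped once three are stored
    system_event_log.append(entry)


# Placeholder codes, not real POST error codes.
for code in ("E1", "E2", "E3", "E4"):
    record_post_error(code, "placeholder message")
```

After the fourth error is recorded, the POST error log in this model holds only the last three entries, while the event log still holds all four.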
Small computer system interface messages
If you receive a SCSI error message, see “SCSI error codes” on page 138.
Note: If the server does not have a hard disk drive, ignore any message that
indicates that the BIOS is not installed.
You will get these messages only when running the SCSISelect Utility.
Diagnostic programs and error messages
The server diagnostic programs are stored in upgradable read-only memory (ROM) on the system board. These programs are the primary method of testing the major components of the server.
Diagnostic error messages indicate that a problem exists; they are not intended to be used to identify a failing part. Troubleshooting and servicing of complex problems that are indicated by error messages should be performed by trained service personnel.
Sometimes the first error to occur causes additional errors. In this case, the server displays more than one error message. Always follow the suggested action instructions for the first error message that appears.
The following sections contain the error codes that might appear in the detailed test log and summary log when running the diagnostic programs.
The error code format is as follows:
fff-ttt-iii-date-cc-text message
where:
fff
is the three-digit function code that indicates the function being tested when the error occurred. For example, function code 089 is for the microprocessor.
ttt
is the three-digit failure code that indicates the exact test failure that was encountered.
iii
is the three-digit device ID.
date
is the date that the diagnostic test was run and the error recorded.
cc
is the check digit that is used to verify the validity of the information.
text message
is the diagnostic message that indicates the reason for the problem.
Text messages
The diagnostic text message format is as follows:
Function Name: Result (test specific string)
where:
Function Name
is the name of the function being tested when the error occurred. This corresponds to the function code (fff) given in the previous list.
Result can be one of the following:
Passed
This result occurs when the diagnostic test completes without any errors.
Failed
This result occurs when the diagnostic test discovers an error.
User Aborted
This result occurs when you stop the diagnostic test before it is complete.
Not Applicable
This result occurs when you specify a diagnostic test for a device that is not present.
Aborted
This result occurs when the test could not proceed because of the system configuration.
Warning
This result occurs when a possible problem is reported during the diagnostic test, such as when a device that is to be tested is not installed.
Test Specific String
This is additional information that you can use to analyze the problem.
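When reading many entries from a test log, it can help to split each error code into its fields. The following Python sketch shows one way to do that under stated assumptions: the date and check-digit fields are assumed to contain no hyphens, and the function names, class names, and sample line are hypothetical and are not produced by the diagnostic programs.

```python
import re
from typing import NamedTuple, Optional


class DiagErrorCode(NamedTuple):
    function_code: str   # fff
    failure_code: str    # ttt
    device_id: str       # iii
    date: str            # date the test was run (format assumed to contain no hyphens)
    check: str           # cc
    text: str            # text message


class DiagText(NamedTuple):
    function_name: str
    result: str
    detail: Optional[str]  # test specific string, if present


# Result values listed in this manual.
RESULTS = ("Passed", "Failed", "User Aborted", "Not Applicable", "Aborted", "Warning")
_TEXT_RE = re.compile(
    r"^(?P<name>[^:]+):\s*(?P<result>" + "|".join(RESULTS) + r")\s*(?:\((?P<detail>.*)\))?\s*$"
)


def parse_error_code(line):
    """Split 'fff-ttt-iii-date-cc-text message' into its six fields."""
    parts = [p.strip() for p in line.split("-", 5)]
    if len(parts) != 6:
        raise ValueError("expected six hyphen-separated fields")
    for field in parts[:3]:
        if not re.fullmatch(r"\d{3}", field):
            raise ValueError("fff, ttt, and iii must each be three digits")
    return DiagErrorCode(*parts)


def parse_text_message(text):
    """Split 'Function Name: Result (test specific string)' into its parts."""
    match = _TEXT_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized text message format")
    return DiagText(match["name"].strip(), match["result"], match["detail"])


# Hypothetical sample line; only the function code 089 comes from this manual.
sample = "089-123-000-20000215-AB-Microprocessor: Failed (example detail)"
code = parse_error_code(sample)
message = parse_text_message(code.text)
```

The split is limited to five separators so that any hyphen inside the text message stays with the message.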
Starting the diagnostic programs
You can press F1 while running the diagnostic programs to obtain Help information. You also can press F1 from within a help screen to obtain online documentation from which you can select different categories. To exit Help and return to where you left off, press Esc.
To start the diagnostic programs:
1. Turn on the server and watch the screen.
Note: To run the diagnostic programs, you must start the server with the
highest level password that is set. That is, if an administrator password is set, you must enter the administrator password, not the power-on password, to run the diagnostic programs.
2. When the message F2 for Diagnostics appears, press F2.
3. Type in the appropriate password when prompted; then, press Enter.
4. Select either Extended or Basic from the top of the screen.
5. When the Diagnostic Programs screen appears, select the test you want to run
from the list that appears; then, follow the instructions on the screen.
Notes:
a. If the server stops during testing and you cannot continue, restart the server
and try running the diagnostic programs again. If the problem persists, flash the server with the latest diagnostics code and run the test again.
b. The keyboard and mouse (pointing device) tests assume that a keyboard
and mouse are attached to the server.
c. If you run the diagnostic programs with no mouse attached to the server,
you will not be able to navigate between test categories using the Next Cat and Prev Cat buttons. All other functions provided by mouse-selectable buttons are also available using the function keys.
d. You can run the USB interface test and the USB external loopback test only
if there are no USB devices attached.
e. You can view server configuration information (such as system
configuration, memory contents, interrupt request (IRQ) use, direct memory access (DMA) use, device drivers, and so on) by selecting Hardware Info from the top of the screen.
When the tests have completed, you can view the Test Log by selecting Utility from the top of the screen.
If the hardware checks out OK but the problem persists during normal server operations, a software error might be the cause. If you suspect a software problem, refer to the information that comes with the software package.
Viewing the test log
The test log will not contain any information until after the diagnostic program has run.
Note: If you already are running the diagnostic programs, begin with step 3.
To view the test log:
1. Turn on the server and watch the screen.
If the server is on, shut down the operating system and restart the server.
2. When the message F2 for Diagnostics appears, press F2.
If a power-on password or administrator password is set, the server prompts you for it. Type in the appropriate password; then, press Enter.
3. When the Diagnostic Programs screen appears, select Utility from the top of
the screen.
4. Select View Test Log from the list that appears; then, follow the instructions on the screen.
The system maintains the test-log data while the server is powered on. When you turn off the power to the server, the test log is cleared.
Diagnostic error message tables
For descriptions of the error messages that might appear when you run the diagnostic programs, see “Diagnostic error codes” on page 125. If diagnostic error messages appear that are not listed in those tables, make sure that the server has the latest levels of BIOS, Advanced System Management Processor, ServeRAID, and diagnostics microcode installed.
Light path diagnostics
The server has LEDs to help you identify problems with some server components. These LEDs are part of the light path diagnostics built into the server. By following the path of lights you can quickly identify the type of system error that occurred.
Power supply LEDs
The AC and DC power LEDs on the power supply provide status information about the power supply. See “Installing a hot-swap power supply” on page 60 for the location of these LEDs.
[Illustration: power supply, showing the handle, filler panel, AC power light, and DC power light]
The following table describes the AC and DC power LEDs. For more information see “Power checkout” on page 19.
AC power LED: On
DC power LED: On
Description and action: The power supply is on and operating correctly.

AC power LED: On
DC power LED: Off
Description and action: There is a dc power problem.
Possible causes:
1. The server is not turned on (the power LED is blinking on the front of the server).
Action: Press the power-control button to start the server.
2. The power supply has failed.
Action: Replace the power supply.

AC power LED: Off
DC power LED: Off
Description and action: There is an ac power problem.
Possible causes:
1. There is no ac power to the power supply.
Actions: Verify that:
v The electrical cord is properly connected to the server.
v The electrical outlet functions properly.
2. The power supply has failed.
Action: Replace the power supply.
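The power supply LED table can be read as a simple lookup keyed on the two LED states. The Python sketch below is only an illustration of that reading; the names POWER_LED_STATES and describe_power_supply are hypothetical, and the combination of AC off with DC on is not covered by the table.

```python
# Keys are (ac_led_on, dc_led_on); the descriptions paraphrase the table above.
POWER_LED_STATES = {
    (True, True): "The power supply is on and operating correctly.",
    (True, False): (
        "There is a dc power problem. Either the server is not turned on "
        "(press the power-control button) or the power supply has failed (replace it)."
    ),
    (False, False): (
        "There is an ac power problem. Verify the electrical cord and outlet; "
        "if both are good, the power supply has failed (replace it)."
    ),
}


def describe_power_supply(ac_led_on, dc_led_on):
    """Return the table entry for the observed LED states."""
    return POWER_LED_STATES.get(
        (ac_led_on, dc_led_on),
        "This combination is not listed in the table.",
    )
```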
Diagnostic panel LEDs
The following illustration shows the LEDs on the diagnostics panel inside the server. See Table 1 on page 17 for information on identifying problems using these LEDs.
[Illustration: diagnostics panel LEDs, labeled SMI, NMI, SP, PCIA, PCIB, PCIC, DASD, MEM, CPU, VRM, FAN, TEMP, NON RED, OVER SPEC, PS1, PS2, PS3, and PS4]
Light path diagnostics
You can use the light path diagnostics built into the server to quickly identify the type of system error that occurred. Your server is designed so that LEDs remain illuminated when the server shuts down, as long as the power supplies are operating properly. This feature helps you to isolate the problem if an error causes the server to shut down.
If the system error LED (on the information LED panel) is not lit and no diagnostics panel LEDs are lit, it means that the light path diagnostics have not detected a system error.
If the system error LED (on the information LED panel) is lit, it means that a system error was detected. Check to see which of the LEDs on the diagnostics panel inside the server are lit and refer to the following table:
Table 1. Light path diagnostics
LED: None
Cause: The system error log is 75% or more full; a PFA alert was logged; or a failure occurred on the I2C bus.
Action: Check the system error log and correct any problems. See “Choices available from the Configuration/Setup main menu” on page 31 for information about clearing the error log. Disconnecting the server from all power sources for at least 20 seconds will turn off the system error LED.

LED: SMI
Cause: A systems management event occurred.
Action: Restart the server.

LED: NMI
Cause: A nonmaskable interrupt occurred. The PCIA, PCIB, or PCIC LED will probably also be on.
Action: If the PCIA, PCIB, or PCIC LED is not on, restart the server. If the problem persists, try to determine the failing adapter by removing one adapter at a time and restarting the server after each adapter is removed.

LED: SP
Cause: The service processor has failed.
Action:
1. Run service processor diagnostics.
2. Replace Legacy I/O board.

LED: PCIA
Cause: An error occurred on PCI bus A. An adapter in PCI slot 1 or 2, or the processor board caused the error.
Action:
1. Check the error log for additional information.
2. If you cannot correct the problem from the information in the error log, try to determine the failing adapter by removing one adapter at a time from PCI bus A (PCI slots 1–2) and restarting the server after each adapter is removed.

LED: PCIB
Cause: An error occurred on PCI bus B. An adapter in PCI slot 3, 4, 5, or 6 or the processor board caused the error.
Action:
1. Check the error log for additional information.
2. If you cannot correct the problem from the information in the error log, try to determine the failing adapter by removing one adapter at a time from PCI bus B (PCI slots 3–6) and restarting the server after each adapter is removed.

LED: PCIC
Cause: An error occurred on PCI bus C. An error on the processor or I/O board caused the problem.
Action: Check the error log for additional information. If the error log indicates a problem with the integrated SCSI controller, the Ethernet controller or video controller, see “Starting the diagnostic programs” on page 14.

LED: DASD
Cause: A hot-swap hard disk drive has failed on SCSI channel B.
Action:
1. If the TEMP LED is also on, take the actions listed for that LED.
2. If the amber status LED on one of the hot-swap hard disk drives is on, replace the drive.

LED: MEM
Cause: A memory error occurred.
Action:
1. Check the DIMM error LEDs on the memory board.
2. Replace the DIMM indicated by the lit DIMM error LEDs.

LED: CPU
Cause: One of the microprocessors has failed or a microprocessor is installed in the wrong connector.
Action:
1. Check the microprocessor error LEDs on the memory board. If a microprocessor error LED is on for a microprocessor connector that has a terminator card installed instead of a microprocessor, the microprocessors are not installed in the correct order. See “Installing a microprocessor kit” on page 57 for information about the correct order for installing microprocessors and VRMs. Otherwise, continue with the next step.
2. Turn off the server, reseat the microprocessor indicated by the lit microprocessor error LED, and restart the server.
3. If the problem persists, replace the microprocessor.

LED: VRM
Cause: One of the voltage regulator modules on the processor board has failed.
Action:
1. Check the VRM error LEDs on the processor board.
2. Turn off the server, reseat the VRM indicated by the lit VRM error LED, and restart the server.
3. If the problem persists, replace the VRM.

LED: FAN
Cause: One of the fan assemblies has failed or is operating too slowly.
Note: A failing fan can also cause the TEMP and DASD LEDs to be on.
Action: The LED on the failing fan assembly will be lit. Replace the fan assembly.

LED: TEMP
Cause: The system temperature has exceeded the maximum rating.
Action:
1. Check to see if a fan has failed. If it has, replace the fan.
2. Make sure the room temperature is not too high. (See “Temperature checkout” on page 20.)
If the problem persists, see “Diagnostic panel LEDs” on page 122.

LED: NON RED
Cause: The server is drawing too much power to operate in a redundant power mode.
Action: The system can continue to operate in a nonredundant power mode. To operate in a redundant mode, add a power supply or remove the most recently installed options.

LED: OVER SPEC
Cause: The server is drawing more power than the power supplies are rated for.
Action: Either add a power supply or remove a device from the server.

LED: PS1
Cause: The first power supply has failed.
Action: Replace the first power supply.

LED: PS2
Cause: The second power supply has failed.
Action: Replace the second power supply.

LED: PS3
Cause: The third power supply has failed.
Action: Replace the third power supply.

LED: PS4
Cause: The fourth power supply has failed.
Action: Replace the fourth power supply.
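Table 1 implies an order for some LEDs: the DASD entry defers to the TEMP entry when both are lit, and the TEMP entry starts by checking for a failed fan. The following Python sketch is one possible interpretation of that ordering, not a procedure stated in this manual; the names PRIORITY and triage_order are hypothetical, and the remaining LEDs are simply listed alphabetically.

```python
# Assumed ordering derived from Table 1: handle FAN first, then TEMP, then DASD.
PRIORITY = ["FAN", "TEMP", "DASD"]


def triage_order(lit_leds):
    """Return the lit LEDs in a suggested order of investigation."""
    lit = set(lit_leds)
    first = [led for led in PRIORITY if led in lit]
    rest = sorted(lit - set(PRIORITY))  # no ordering is implied for these; alphabetical
    return first + rest
```

For example, with the FAN, TEMP, DASD, and PS1 LEDs lit, this sketch suggests looking at the fan entry before the temperature and drive entries.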
Power checkout
Power problems can be difficult to troubleshoot. For instance, a short circuit can exist anywhere on any of the power distribution busses. Usually a short circuit will cause the power subsystem to shut down because of an overcurrent condition.
A general procedure for troubleshooting power problems is as follows:
1. Power off the system and disconnect the AC cord(s).
2. Check for loose cables in the power subsystem. Also check for short circuits, for example, a loose screw causing a short circuit on a circuit board.
3. Remove adapters and disconnect the cables and power connectors to all internal and external devices until the system is at the minimum configuration required for power on (see “Minimum operating requirements” on page 131).
4. Reconnect the AC cord and power on the system. If the system powers up successfully, replace adapters and devices one at a time until the problem is isolated. If the system does not power up from the minimum configuration, replace the FRUs of the minimum configuration one at a time until the problem is isolated.
To use this method it is important to know the minimum configuration required for a system to power up (see page 131). For specific problems, see “Power error messages” on page 139.
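The isolation logic of this procedure can be sketched as a loop that starts from the minimum configuration and adds parts back one at a time. The Python below is a simplified model only: the function isolate_power_fault, the powers_on callback, and the part names are hypothetical stand-ins for the physical steps of installing a part and pressing the power-control button.

```python
def isolate_power_fault(minimum_config, removed_parts, powers_on):
    """Add removed parts back one at a time until power-on fails.

    powers_on is a callable that takes the list of installed parts and
    returns True if the system powers up with that configuration.
    """
    installed = list(minimum_config)
    if not powers_on(installed):
        # The fault is within the minimum configuration; replace its FRUs one at a time.
        return ("minimum_config_fault", None)
    for part in removed_parts:
        installed.append(part)
        if not powers_on(installed):
            return ("isolated", part)
    return ("no_fault_found", None)


# Simulated example: the system fails to power on whenever "adapter B" is installed.
result = isolate_power_fault(
    ["system board", "power supply"],
    ["adapter A", "adapter B", "hard disk drive"],
    lambda parts: "adapter B" not in parts,
)
```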
Temperature checkout
Proper cooling of the system is important for proper operation and system reliability. For a typical Netfinity server, you should make sure:
v Each of the drive bays has either a drive or a filler panel installed
v Each of the power supply bays has either a power supply or a filler panel installed
v The top cover is in place during normal operation
v There is at least 50 mm (2 inches) of ventilated space at the sides of the server and 100 mm (4 inches) at the rear of the server
v The top cover is removed for no longer than 30 minutes while the server is operating
v The processor housing cover covering the processor and memory area is removed for no longer than ten minutes while the server is operating
v A removed hot-swap drive is replaced within two minutes of removal
v Cables for optional adapters are routed according to the instructions provided with the adapters (ensure that cables are not restricting air flow)
v The fans are operating correctly and the air flow is good
v A failed fan is replaced within 48 hours
In addition, ensure that the environmental specifications for the system are met. See “Features and specifications” on page 3.
For more information on specific temperature error messages, see “Temperature error messages” on page 138.
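The time limits in the checklist above can be collected into a small table for quick reference during service. The Python sketch below does this with hypothetical names (LIMITS, exceeded_limits, and the condition keys); the durations are taken from the list above.

```python
from datetime import timedelta

# Maximum durations from the temperature checkout list; the key names are hypothetical.
LIMITS = {
    "top_cover_removed": timedelta(minutes=30),
    "processor_housing_cover_removed": timedelta(minutes=10),
    "hot_swap_drive_removed": timedelta(minutes=2),
    "failed_fan_unreplaced": timedelta(hours=48),
}


def exceeded_limits(elapsed):
    """Return the names of conditions whose elapsed time is over the limit."""
    return sorted(name for name, duration in elapsed.items() if duration > LIMITS[name])


# Example: the top cover has been off for 45 minutes, a drive has been out for 1 minute.
over = exceeded_limits({
    "top_cover_removed": timedelta(minutes=45),
    "hot_swap_drive_removed": timedelta(minutes=1),
})
```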
Recovering BIOS
If the BIOS code in the server has become corrupted, such as from a power failure during a flash update, you can recover the BIOS using the recovery boot block and a BIOS flash diskette.
Note: You can obtain a BIOS flash diskette from one of the following sources:
v Use the ServerGuide program to make a BIOS flash diskette.
v Download a BIOS flash diskette from the World Wide Web. Go to http://www.pc.ibm.com/support/, select IBM Server Support, and make the selections for the server.
The flash memory of the server consists of a primary page and a backup page. The J56 jumper controls which page is used to start the server. If the BIOS in the primary page is corrupted, you can use the backup page to start the server; then boot the BIOS Flash Diskette to restore the BIOS to the primary page.
To recover the BIOS:
1. Turn off the server and peripheral devices and disconnect all external cables
and power cords; then, remove the cover.
2. Locate jumper J56 on the processor board (see “Processor board jumpers” on
page 42).
3. Move J56 to pins 1 and 2 to enable the secondary boot block page.
4. Insert the BIOS flash diskette into the diskette drive.
5. Restart the server.