ThinkServer TD200x
Machine Types: 3719, 3821, 3822, and 3823
ThinkServer TD200x Types 3719, 3821, 3822, and 3823
Hardware Maintenance Manual
Note: Before using this information and the product it supports, read the general information in Appendix B, “Notices,” on page 279
and the Warranty and Support Information document on the Lenovo® ThinkServer Documentation DVD.
LENOVO products, data, computer software, and services have been developed exclusively at private expense and
are sold to governmental entities as commercial items as defined by 48 C.F.R. 2.101 with limited and restricted rights
to use, reproduction and disclosure.
LIMITED AND RESTRICTED RIGHTS NOTICE: If products, data, computer software, or services are delivered
pursuant to a General Services Administration "GSA" contract, use, reproduction, or disclosure is subject to restrictions
set forth in Contract No. GS-35F-05925.
Contents
Chapter 1. About this manual ...................1
Important Safety Information ....................1
Important information about replacing RoHS compliant FRUs ........2
Turkish statement of compliance ...................3
Chapter 2. Safety information ...................5
Guidelines for trained service technicians...............6
Inspecting for unsafe conditions ..................6
Guidelines for servicing electrical equipment.............6
Safety statements ........................8
Chapter 3. General information ..................15
Features and technologies....................15
Specifications .........................17
Software ...........................18
EasyStartup .........................19
EasyManage.........................19
Chapter 4. General Checkout ...................21
Checkout procedure .......................21
About the checkout procedure ..................21
Performing the checkout procedure ................22
Diagnosing a problem ......................22
Undocumented problems .....................25
Chapter 5. Diagnostics .....................27
Diagnostic tools........................27
Event logs..........................27
Viewing event logs through the Setup utility .............28
Viewing event logs without restarting the server ............28
This Hardware Maintenance Manual contains information to help you solve
problems that might occur in your server. It describes the diagnostic tools that come
with the server, error codes and suggested actions, and instructions for replacing
failing components.
Replaceable components are of three types:
v Self-service customer replaceable unit (CRU): Replacement of self-service
CRUs is your responsibility. If Lenovo installs a self-service CRU at your request,
you will be charged for the installation.
v Optional-service customer replaceable unit: You may install an
optional-service CRU yourself or request Lenovo to install it, at no additional
charge, under the type of warranty service that is designated for the server.
v Field replaceable unit (FRU): FRUs must be installed only by trained service
technicians.
The most recent version of this document is available at
http://www.lenovo.com/support.
Before servicing a Lenovo product, be sure to read the Safety Information. See
Chapter 2, “Safety information,” on page 5.
For information about the terms of the warranty and getting service and assistance,
see the Warranty and Support Information document.
Important Safety Information
Be sure to read all caution and danger statements in this book before performing
any of the instructions.
Veuillez lire toutes les consignes de type DANGER et ATTENTION du présent
document avant d’exécuter les instructions.
Lesen Sie unbedingt alle Hinweise vom Typ ″ACHTUNG″ oder ″VORSICHT″ in
dieser Dokumentation, bevor Sie irgendwelche Vorgänge durchführen
Leggere le istruzioni introdotte da ATTENZIONE e PERICOLO presenti nel manuale
prima di eseguire una qualsiasi delle istruzioni
Certifique-se de ler todas as instruções de cuidado e perigo neste manual antes de
executar qualquer uma das instruções
Es importante que lea todas las declaraciones de precaución y de peligro de este
manual antes de seguir las instrucciones.
Important information about replacing RoHS compliant FRUs
RoHS, The Restriction of Hazardous Substances in Electrical and Electronic
Equipment Directive (2002/95/EC) is a European Union legal requirement
affecting the global electronics industry. RoHS requirements must be
implemented on Lenovo products placed on the market and sold in the
European Union after June 2006. Products on the market before June 2006
are not required to have RoHS compliant parts. If the parts are not compliant
originally, replacement parts can also be noncompliant, but in all cases, if the
parts are compliant, the replacement parts must also be compliant.
Note: RoHS and non-RoHS FRU part numbers with the same fit and function are
identified with unique FRU part numbers.
Lenovo plans to transition to RoHS compliance well before the implementation date
and expects its suppliers to be ready to support Lenovo’s requirements and
schedule in the EU. Products sold in 2005 will contain some RoHS compliant
FRUs. The following statement pertains to these products and any product Lenovo
produces containing RoHS compliant parts.
RoHS compliant ThinkCentre parts have unique FRU part numbers. Before or after
June, 2006, failed RoHS compliant parts must always be replaced using RoHS
compliant FRUs, so only the FRUs identified as compliant in the system HMM or
direct substitutions for those FRUs can be used.
Products marketed before June 2006           Products marketed after June 2006
Current or original part  Replacement FRU    Current or original part  Replacement FRU
Non-RoHS                  Can be Non-RoHS    Must be RoHS              Must be RoHS
Non-RoHS                  Can be RoHS
Non-RoHS                  Can sub to RoHS
RoHS                      Must be RoHS
Note: A direct substitution is a part with a different FRU part number that is
automatically shipped by the distribution center at the time of order.
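The replacement rule stated in this section can be expressed as a small decision function. The following is a minimal sketch only; the function name and parameters are hypothetical and are not part of any Lenovo tool. It assumes the rule exactly as written above: products marketed after June 2006 always require RoHS compliant replacements, and for earlier products a compliant part must be replaced with a compliant part.

```python
def replacement_must_be_rohs(original_is_rohs: bool,
                             marketed_after_june_2006: bool) -> bool:
    """Return True when the replacement FRU must be RoHS compliant.

    Hypothetical helper that mirrors the rule described in this section.
    """
    # Products marketed after June 2006 must use RoHS compliant parts.
    if marketed_after_june_2006:
        return True
    # For earlier products, a compliant part must be replaced by a
    # compliant part; a noncompliant part may be replaced by either kind.
    return original_is_rohs


if __name__ == "__main__":
    for after in (False, True):
        for rohs in (False, True):
            print(after, rohs, replacement_must_be_rohs(rohs, after))
```

A result of False means that either a non-RoHS FRU or a RoHS FRU (including a direct substitution) is acceptable.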
The Lenovo product meets the requirements of the Republic of Turkey Directive on
the Restriction of the Use of Certain Hazardous Substances in Electrical and
Electronic Equipment (EEE).
Türkiye EEE Yönetmeliğine Uygunluk Beyanı
Bu Lenovo ürünü,
“Elektrik ve Elektronik Eşyalarda Bazı Zararlı Maddelerin
Kullanımının Sınırlandırılmasına Dair Yönetmelik (EEE)”
direktiflerine uygundur.
This section contains information for trained service technicians.
Inspecting for unsafe conditions
Use the information in this section to help you identify potential unsafe conditions in
a Lenovo product that you are working on. Each Lenovo product, as it was
designed and manufactured, has required safety items to protect users and service
technicians from injury. The information in this section addresses only those items.
Use good judgment to identify potential unsafe conditions that might be caused by
non-Lenovo alterations or attachment of non-Lenovo features or options that are not
addressed in this section. If you identify an unsafe condition, you must determine
how serious the hazard is and whether you must correct the problem before you
work on the product.
Consider the following conditions and the safety hazards that they present:
v Electrical hazards, especially primary power. Primary voltage on the frame can
cause serious or fatal electrical shock.
v Explosive hazards, such as a damaged CRT face or a bulging capacitor.
v Mechanical hazards, such as loose or missing hardware.
To inspect the product for potential unsafe conditions, complete the following steps:
1. Make sure that the power is off and the power cord is disconnected.
2. Make sure that the exterior cover is not damaged, loose, or broken, and
observe any sharp edges.
3. Check the power cord:
v Make sure that the third-wire ground connector is in good condition. Use a
meter to measure third-wire ground continuity for 0.1 ohm or less between
the external ground pin and the frame ground.
v Make sure that the power cord is the correct type.
v Make sure that the insulation is not frayed or worn.
4. Remove the cover.
5. Check for any obvious non-Lenovo alterations. Use good judgment as to the
safety of any non-Lenovo alterations.
6. Check inside the server for any obvious unsafe conditions, such as metal filings,
contamination, water or other liquid, or signs of fire or smoke damage.
7. Check for worn, frayed, or pinched cables.
8. Make sure that the power-supply cover fasteners (screws or rivets) have not
been removed or tampered with.
Guidelines for servicing electrical equipment
Observe the following guidelines when servicing electrical equipment:
v Check the area for electrical hazards such as moist floors, nongrounded power
extension cords, power surges, and missing safety grounds.
v Use only approved tools and test equipment. Some hand tools have handles that
are covered with a soft material that does not provide insulation from live
electrical currents.
v Regularly inspect and maintain your electrical hand tools for safe operational
condition. Do not use worn or broken tools or testers.
v Do not touch the reflective surface of a dental mirror to a live electrical circuit.
The surface is conductive and can cause personal injury or equipment damage if
it touches a live electrical circuit.
v Some rubber floor mats contain small conductive fibers to decrease electrostatic
discharge. Do not use this type of mat to protect yourself from electrical shock.
v Do not work alone under hazardous conditions or near equipment that has
hazardous voltages.
v Locate the emergency power-off (EPO) switch, disconnecting switch, or electrical
outlet so that you can turn off the power quickly in the event of an electrical
accident.
v Disconnect all power before you perform a mechanical inspection, work near
power supplies, or remove or install main units.
v Before you work on the equipment, disconnect the power cord. If you cannot
disconnect the power cord, have the customer power off the wall box that
supplies power to the equipment and lock the wall box in the off position.
v Never assume that power has been disconnected from a circuit. Check it to
make sure that it has been disconnected.
v If you have to work on equipment that has exposed electrical circuits, observe
the following precautions:
– Make sure that another person who is familiar with the power-off controls is
near you and is available to turn off the power if necessary.
– When you are working with powered-on electrical equipment, use only one
hand. Keep the other hand in your pocket or behind your back to avoid
creating a complete circuit that could cause an electrical shock.
– When you use a tester, set the controls correctly and use the approved probe
leads and accessories for that tester.
– Stand on a suitable rubber mat to insulate you from grounds such as metal
floor strips and equipment frames.
v Use extreme care when you measure high voltages.
v To ensure proper grounding of components such as power supplies, pumps,
blowers, fans, and motor generators, do not service these components outside of
their normal operating locations.
v If an electrical accident occurs, use caution, turn off the power, and send another
person to get medical aid.
Safety statements
Important:
Each caution and danger statement in this document is labeled with a number. This
number is used to cross reference an English-language caution or danger
statement with translated versions of the caution or danger statement in the Safety Information document.
For example, if a caution statement is labeled "Statement 1," translations for that
caution statement are in the Safety Information document under "Statement 1."
Be sure to read all caution and danger statements in this document before you
perform the procedures. Read any additional safety information that comes with the
server or optional device before you install the device.
Attention: Use No. 26 AWG or larger UL-listed or CSA certified
telecommunication line cord.
Electrical current from power, telephone, and communication cables is
hazardous.
To avoid a shock hazard:
v Do not connect or disconnect any cables or perform installation,
maintenance, or reconfiguration of this product during an electrical
storm.
v Connect all power cords to a properly wired and grounded electrical
outlet.
v Connect to properly wired outlets any equipment that will be attached to
this product.
v When possible, use one hand only to connect or disconnect signal
cables.
v Never turn on any equipment when there is evidence of fire, water, or
structural damage.
v Disconnect the attached power cords, telecommunications systems,
networks, and modems before you open the device covers, unless
instructed otherwise in the installation and configuration procedures.
v Connect and disconnect cables as described in the following table when
installing, moving, or opening covers on this product or attached
devices.
To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlet.
5. Turn device ON.
To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2:
CAUTION:
When replacing the lithium battery, use only a battery type recommended by
the manufacturer. If your system has a module containing a lithium battery,
replace it only with the same module type made by the same manufacturer.
The battery contains lithium and can explode if not properly used, handled, or
disposed of.
Do not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or
transmitters) are installed, note the following:
v Do not remove the covers. Removing the covers of the laser product could
result in exposure to hazardous laser radiation. There are no serviceable
parts inside the device.
v Use of controls or adjustments or performance of procedures other than
those specified herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser
diode. Note the following.
Laser radiation when open. Do not stare into the beam, do not view directly
with optical instruments, and avoid direct exposure to the beam.
Class 1 Laser Product
Laser Klasse 1
Laser Klass 1
Luokan 1 Laserlaite
Appareil A Laser de Classe 1
Statement 4:
≥ 18 kg (39.7 lb)    ≥ 32 kg (70.5 lb)    ≥ 55 kg (121.2 lb)
CAUTION:
Use safe practices when lifting.
Statement 5:
CAUTION:
The power control button on the device and the power switch on the power
supply do not turn off the electrical current supplied to the device. The device
also might have more than one power cord. To remove all electrical current
from the device, ensure that all power cords are disconnected from the power
source.
CAUTION:
Never remove the cover on a power supply or any part that has the following
label attached.
Hazardous voltage, current, and energy levels are present inside any
component that has this label attached. There are no serviceable parts inside
these components. If you suspect a problem with one of these parts, contact
a service technician.
Statement 26:
CAUTION:
Do not place any object on top of rack-mounted devices.
Attention: This server is suitable for use on an IT power distribution system
whose maximum phase-to-phase voltage is 240 V under any distribution fault
condition.
Important: This product is not suitable for use with visual display workplace devices
according to Clause 2 of the German Ordinance for Work with Visual Display Units.
This chapter provides general information that applies to all machine types
supported by this publication.
Features and technologies
The TD200x server offers the following features and technologies:
v UEFI-compliant server firmware
The server firmware offers several features, including Unified Extensible
Firmware Interface (UEFI) 2.1 compliance, enhanced RAS capabilities, and BIOS
compatibility support. UEFI replaces the basic input/output system (BIOS) and
defines a standard interface between the operating system, platform firmware,
and external devices. UEFI-compliant servers are capable of starting
UEFI-compliant operating systems, BIOS-based operating systems, and
BIOS-based adapters as well as UEFI-compliant adapters.
Note: The server does not support DOS.
v Integrated Management Module
The integrated management module (IMM) combines service processor
functions, video controller, and remote presence function in a single chip. The
IMM provides advanced service-processor control, monitoring, and alerting
function. If an environmental condition exceeds a threshold or if a system
component fails, the IMM lights LEDs to help you diagnose the problem, records
the error in the event log, and alerts you to the problem. The IMM also provides
a virtual presence capability for remote server management capabilities. The IMM
provides remote server management through industry-standard interfaces:
– Intelligent Platform Management Interface (IPMI) version 2.0
– Simple Network Management Protocol (SNMP) version 3
– Common Information Model (CIM)
– Web browser
v Remote presence capability and blue-screen capture
The remote presence feature provides the following functions:
– Remotely viewing video with graphics resolutions up to 1600 x 1200 at 85 Hz,
regardless of the system state
– Remotely accessing the server, using the keyboard and mouse from a remote
client
– Mapping the CD or DVD drive, diskette drive, and USB flash drive on a
remote client, and mapping ISO and diskette image files as virtual drives that
are available for use by the server
– Uploading a diskette image to the IMM memory and mapping it to the server
as a virtual drive
The blue-screen capture feature captures the video display contents before the
IMM restarts the server when the IMM detects an operating-system hang
condition. A system administrator can use the blue-screen capture to assist in
determining the cause of the hang condition.
v Preboot diagnostics programs
The preboot diagnostics programs are stored on the integrated USB memory. They
collect and analyze system information to aid in diagnosing server problems.
The diagnostics programs collect the following information about the server:
– System configuration
– Network interfaces and settings
– Installed hardware
– EasyLED diagnostics status
– Service processor status and configuration
– Vital product data, firmware, and UEFI (formerly BIOS) configuration
– Hard disk drive health
– RAID controller configuration
– Event logs for service processors
The diagnostic programs create a merged log that includes events from all
collected logs. The information is collected into a file that you can send to
Lenovo service and support. Additionally, you can view the information locally
through a generated text report file. You can also copy the log to removable
media and view the log from a Web browser.
For additional information about preboot diagnostics, see “Running the diagnostic
programs” on page 90.
v EasyStartup DVD
The ThinkServer EasyStartup program guides you through the configuration of
the hardware, the RAID controller, and the installation of the operating system
and device drivers.
v EasyManage DVD
The ThinkServer EasyManage program helps you manage and administer your
servers and clients through remote problem notification as well as monitoring and
alerting.
v Integrated network support
The server comes with one integrated Broadcom 5709C series Gigabit Ethernet
controller, which supports connection to a 10 Mbps, 100 Mbps, or 1000 Mbps
network. For more information, see “Enabling the Broadcom Gigabit Ethernet
Utility program” on page 266.
v Intelligent Platform Management Interface (IPMI) 2.0
IPMI 2.0 support provides secure remote power-on/power-off and several
standard alerts for components such as fans, voltage, and temperature.
v Large data-storage capacity and hot-swap capability
The server supports up to eight or 16 (depending on your model) 2.5-inch
hot-swap hard disk drives in the hot-swap bays. With the hot-swap feature, you
can add, remove, or replace hard disk drives without turning off the server.
v Large system-memory capacity
The server supports up to 64 GB of system memory. The memory controller
supports error correcting code (ECC) for up to 16 single-sided industry-standard
third-generation double-data-rate 3 (DDR3) 800, 1066, and 1333, 240-pin,
registered, synchronous dynamic random access memory (SDRAM) dual inline
memory modules (DIMMs).
v EasyLED diagnostics
EasyLED diagnostics provides LEDs to help you diagnose problems. For more
information, see “EasyLED diagnostics panel” on page 129.
v Memory mirroring
Memory mirroring improves the availability of memory by writing information to
the main memory and redundant locations in a mirrored pair of DIMMs.
v PCI Express x8 adapter capabilities
The server has five slots for PCI Express x8 adapters. Three of these slots
accept x8 adapters, but the adapters will operate as x4 adapters.
v PCI Express x16 adapter capabilities
The server has one slot for PCI Express x16 adapter, which will operate as an x8
adapter.
v Redundant cooling and power capabilities
The server supports up to two 920-watt hot-swap power supplies. If the server
came with only one power supply, you can install an additional power supply with
three redundant hot-swap cooling fans to add redundant power and cooling
capabilities. If the maximum load on the server is less than 920 watts and a
problem occurs with one of the power supplies, the other power supply can meet
the power requirements. The redundant cooling of the fans enables continued
operation if one of the fans fails.
v RAID support
The server supports an internal RAID SAS Controller, which is required for you to
use the hot-swap hard disk drives and to create redundant array of independent
disks (RAID) configurations.
v Symmetric multiprocessing (SMP)
The server supports up to two Intel® Xeon® quad-core microprocessors. If the
server comes with only one microprocessor, you can install an additional
microprocessor to enhance performance and provide SMP capability.
v Systems-management capabilities
The server contains an Integrated Management Module (IMM) which enables you
to manage the functions of the server locally and remotely and provides remote
presence and blue-screen capture capability. The IMM also provides system
monitoring and event recording.
v TCP/IP offload engine (TOE) support
The Ethernet controllers in the server support TOE, which is a technology that
offloads the TCP/IP flow from the microprocessors and I/O subsystem to increase
the speed of the TCP/IP flow. When an operating system that supports TOE is
running on the server and TOE is enabled, the server supports TOE operation.
See the operating-system documentation for information about enabling TOE.
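The redundant power condition described in the feature list above can be checked with a short calculation. This is an illustrative sketch only: the function name and parameters are hypothetical, and the only figure taken from this document is the 920-watt rating of each hot-swap power supply.

```python
PSU_CAPACITY_WATTS = 920  # rating of each hot-swap power supply (from the text)


def is_power_redundant(installed_supplies: int, max_load_watts: float) -> bool:
    """Return True if one remaining supply could carry the full load.

    Hypothetical helper; it assumes the rule as stated in the feature list:
    redundancy needs two supplies and a maximum load below 920 watts.
    """
    if installed_supplies < 2:
        # A single power supply has no backup.
        return False
    # With two supplies, the survivor must be able to carry the whole load.
    return max_load_watts < PSU_CAPACITY_WATTS


if __name__ == "__main__":
    print(is_power_redundant(2, 800))   # two supplies, load under the rating
    print(is_power_redundant(2, 1000))  # load exceeds one supply's rating
```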
Note: As of the date of this document, the Linux® operating system does not
support TOE.
Specifications
The following information is a summary of the features and specifications of the
server. Depending on the server model, some features might not be available, or
some specifications might not apply.
Table 1. Features and specifications
Microprocessor:
v Intel Xeon dual-core or quad-core with
integrated memory controller and Quick Path
Interconnect (QPI) architecture
v Designed for LGA 1366 socket
v Scalable up to four cores
v 32 KB instruction cache, 32 KB data cache,
and 8 MB cache that is shared among the
cores
v Support for up to two microprocessors, second
microprocessor with pluggable VRM
v Support for Intel Extended Memory 64
Technology (EM64T)
Note: Use the Setup Utility to determine the type
and speed of the microprocessors. For a list of
supported microprocessors, see
http://www.lenovo.com/thinkserver and click
Options.
Memory:
v 16 DIMM connectors (eight per
microprocessor)
v Minimum: 2 GB DIMM per microprocessor
v Maximum: 64 GB
v Type: Registered ECC DDR3 800, 1066, and
1333 MHz DIMMs only
v Sizes: 1 GB single-rank, 2 GB single-rank or
dual-rank, 4 GB dual-rank (PC3-10600R-999)
Drives:
v SATA:
– DVD (standard)
– DVD/CD-RW (optional)
– Maximum of two devices can be installed
v Diskette (optional): External USB 1.44 MB
v Supported hard disk drives:
– Serial Attached SCSI (SAS)
Expansion bays:
v 16 hot-swap SAS 2.5-inch bays
v Three half-high 5.25-inch bays (one DVD drive
installed)
Note: Full-high devices such as an optional
tape drive will occupy two half-high
5.25-inch bays.
PCI and PCI-X expansion slots:
v Six PCI expansion slots on system board
– Two PCI Express x8 (x4 link)
– Two PCI Express x8 (x8 link)
– One PCI Express x16 (x8 link)
– One PCI 32-bit
v One PCI Express x8 (x4 link) on the extender
card
Power supply:
Note: To upgrade to two 920-watt hot-swap
power supplies, install the redundant power and
cooling option kit. Kit includes one hot-swap
920-watt power-supply and three hot-swap fans.
v Standard: One 920-watt 110 V or 240 V ac
input dual-rated power supply
v Upgradeable to two 920-watt hot-swap power
supplies
Hot-swap fans:
v Three (standard)
v Upgradeable to six fans (for redundant
cooling)
Note: To upgrade to redundant cooling, install
the redundant power and cooling option kit. Kit
includes one 920-watt hot-swap power-supply
and three hot-swap fans.
Size:
v Tower
– Height: 440 mm (17.3 inches)
– Depth: 767 mm (30.2 inches)
– Width: 218 mm (8.6 inches)
– Weight: approximately 38 kg (84 lb.) when
fully configured or 20 kg (42 lb.) minimum
Integrated functions:
v Integrated management module (IMM), which
provides service processor control and
monitoring functions, video controller, remote
keyboard, video, mouse, and remote hard
disk drive capabilities
v Dedicated or shared management network
connections
v Six-port Serial ATA (SATA) controller
v Serial over LAN (SOL) and serial redirection
over Telnet or Secure Shell (SSH)
v Support for remote management presence
v One systems-management RJ-45 for
connection to a dedicated
systems-management network
v EasyLED diagnostics
v Six Universal Serial Bus (USB) ports
standard (v2.0 supporting v1.1)
– Four on rear of server
– Two on front of server
v One internal USB tape connector
v One Broadcom dual-port 10/100/1000
Ethernet controller with Wake on LAN
support and TCP/IP Offload Engine (TOE)
support
v One serial connector, shared with the IMM
Note: In messages and documentation, the
term service processor refers to the integrated
management module (IMM).
Video controller:
v Matrox G200 video on system board
v Compatible with SVGA and VGA
v 8 MB DDR2 SDRAM video memory
Note: Maximum video resolution 1600 x
1200 at 85 Hz
RAID controllers:
v ServeRAID-BR10i SAS/SATA Controller that
supports RAID levels 0, 1, 1E (standard)
v Upgradeable to ServeRAID-MR10i SAS/SATA
Controller, which supports RAID levels 0, 1, 5,
6, 10
v Upgradeable to ServeRAID-MR10is SAS/SATA
Controller, which supports RAID levels 0, 1, 5,
6, 10
Acoustical noise emissions:
v Sound power, idle: 5.5 bel declared
v Sound power, operating: 6.0 bel declared
Environment:
v Air temperature:
– Server on: 10° to 35° C (50.0° to 95.0° F);
altitude: 0 to 914.4 m (3000 ft.)
– Server off: -40° to 60° C (-40.0° to 140.0° F);
maximum altitude: 2133.6 m (7000 ft.)
v Humidity:
– Server on: 8% to 80%
– Server off: 8% to 80%
Heat output:
Approximate heat output in British thermal units
(Btu) per hour:
v Minimum configuration: 2013 Btu per hour (590
watts)
v Maximum configuration: 3610 Btu per hour
(1058 watts)
Electrical input:
v Sine-wave input (50-60 Hz) required
v Input voltage low range:
– Minimum: 100 V ac
– Maximum: 127 V ac
v Input voltage high range:
– Minimum: 200 V ac
– Maximum: 240 V ac
v Approximate input kilovolt-amperes (kVA):
– Minimum: 0.60 kVA
– Maximum: 1.10 kVA
Notes:
1. Power consumption and heat output vary
depending on the number and type of optional
features that are installed and the
power-management optional features that are
in use.
2. These levels were measured in controlled
acoustical environments according to the
procedures that are specified by the American
National Standards Institute (ANSI) S12.10 and
ISO 7779 and are reported in accordance with
ISO 9296. Actual sound-pressure levels in a
given location might exceed the average stated
values because of room reflections and other
nearby noise sources. The declared
sound-power levels indicate an upper limit,
below which a large number of computers will
operate.
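The heat output figures in Table 1 follow from the wattage values by a unit conversion. The sketch below shows that arithmetic; the function name is hypothetical, and the factor of about 3.412 Btu per hour per watt is the standard conversion rather than a value taken from this document.

```python
BTU_PER_HOUR_PER_WATT = 3.412142  # standard conversion factor (not from this manual)


def watts_to_btu_per_hour(watts: float) -> int:
    """Convert electrical power in watts to heat output in Btu per hour."""
    return round(watts * BTU_PER_HOUR_PER_WATT)


if __name__ == "__main__":
    # Wattage values listed under "Heat output" in Table 1.
    for label, watts in (("Minimum configuration", 590),
                         ("Maximum configuration", 1058)):
        print(f"{label}: {watts} W = {watts_to_btu_per_hour(watts)} Btu per hour")
```

Rounded to whole units, the results match the 2013 and 3610 Btu per hour values listed in the table.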
Software
Lenovo provides software to help get your server up and running.
The ThinkServer EasyStartup program simplifies the process of configuring your
RAID controller and installing supported Microsoft operating systems and device
drivers on your server. The EasyStartup program is provided
with your server on DVD. The DVD is self starting (bootable). The user guide for the
EasyStartup program is on the DVD and can be accessed directly from the program
interface. For additional information, see “Using the ThinkServer EasyStartup DVD”
on page 263.
The ThinkServer EasyManage Core Server provides centralized hardware and
software inventory management and secure automated system management
through a centralized console. The ThinkServer EasyManage Agent enables other
clients on the network to be managed by the centralized console. The ThinkServer
EasyManage Core Server is supported on Microsoft Windows Server 2003 and
Microsoft Windows Server 2008 (32-bit) products. The ThinkServer EasyManage
Agent is supported on 32-bit and 64-bit Windows, Red Hat, and SUSE operating
systems.
You can solve many problems without outside assistance by following the
troubleshooting procedures in this Hardware Maintenance Manual and on the
Lenovo Web site. This document describes the diagnostic tests that you can
perform, troubleshooting procedures, and explanations of error messages and error
codes. The documentation that comes with your operating system and software
also contains troubleshooting information.
Checkout procedure
The checkout procedure is the sequence of tasks that you should follow to
diagnose a problem in the server.
About the checkout procedure
Before you perform the checkout procedure for diagnosing hardware problems,
review the following information:
v Read the safety information that begins on page vii.
v The diagnostic programs provide the primary methods of testing the major
components of the server, such as the system board, Ethernet controller,
keyboard, mouse (pointing device), serial ports, and hard disk drives. You can
also use them to test some external devices. If you are not sure whether a
problem is caused by the hardware or by the software, you can use the
diagnostic programs to confirm that the hardware is working correctly.
v When you run the diagnostic programs, a single problem might cause more than
one error message. When this happens, correct the cause of the first error
message. The other error messages usually will not occur the next time you run
the diagnostic programs.
Exception: If multiple error codes or EasyLED diagnostics LEDs indicate a
microprocessor error, the error might be in a microprocessor or in a
microprocessor socket. See “Microprocessor problems” on page 69 for
information about diagnosing microprocessor problems.
v Before you run the diagnostic programs, you must determine whether the failing
server is part of a shared hard disk drive cluster (two or more servers sharing
external storage devices). If it is part of a cluster, you can run all diagnostic
programs except the ones that test the storage unit (that is, a hard disk drive in
the storage unit) or the storage adapter that is attached to the storage unit. The
failing server might be part of a cluster if any of the following conditions is true:
– You have identified the failing server as part of a cluster (two or more servers
sharing external storage devices).
– One or more external storage units are attached to the failing server and at
least one of the attached storage units is also attached to another server or
unidentifiable device.
– One or more servers are located near the failing server.
Important: If the server is part of a shared hard disk drive cluster, run one test
at a time. Do not run any suite of tests, such as “quick” or “normal” tests,
because this might enable the hard disk drive diagnostic tests.
v If the server is halted and a POST error code is displayed, see “POST error
codes” on page 30. If the server is halted and no error message is displayed,
see “Troubleshooting tables” on page 64 and “Solving undetermined problems”
on page 124.
v For information about power-supply problems, see “Solving power problems” on
page 123 and “Power-supply LEDs” on page 88.
v For intermittent problems, check the system-event log; see “Event logs” on page
27, “System-event log” on page 38, and “Diagnostic programs, messages, and
error codes” on page 90.
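The cluster guidance above can be summarized as a small decision helper. The following Python sketch is illustrative only: the class, the field names, and the test names are assumptions made for this example and are not part of the diagnostic programs.

```python
from dataclasses import dataclass


@dataclass
class ServerContext:
    """Observations about the failing server (hypothetical field names)."""
    identified_as_cluster_member: bool = False
    # A storage unit is also attached to another server or unidentifiable device.
    shares_external_storage: bool = False
    other_servers_nearby: bool = False


def might_be_clustered(ctx: ServerContext) -> bool:
    """Return True if any of the three listed conditions is true."""
    return (ctx.identified_as_cluster_member
            or ctx.shares_external_storage
            or ctx.other_servers_nearby)


def allowed_tests(ctx: ServerContext, tests: dict[str, bool]) -> list[str]:
    """Filter a {test_name: touches_shared_storage} map.

    On a possible cluster member, tests of the storage unit or its
    adapter are excluded; the remaining tests are run one at a time.
    """
    if not might_be_clustered(ctx):
        return list(tests)
    return [name for name, touches_storage in tests.items()
            if not touches_storage]
```

For example, with `ServerContext(other_servers_nearby=True)` and the hypothetical map `{"memory": False, "hard disk": True}`, only the memory test remains in the list.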
Performing the checkout procedure
To perform the checkout procedure, complete the following steps:
1. Is the server part of a cluster?
v No: Go to step 2.
v Yes: Shut down all failing servers that are related to the cluster. Go to step 2.
2. Complete the following steps:
a. Turn off the server and all external devices.
b. Check all cables and power cords.
c. Check all internal and external devices for compatibility: go to
http://www.lenovo.com/thinkserver, click Options, and open the Server
Options Guide.pdf.
d. Set all display controls to the middle positions.
e. Turn on all external devices.
f. Turn on the server. If the server does not start, see “Troubleshooting tables”
on page 64.
g. Check the system-error LED on the operator information panel (see
Chapter 6, “Locating Server Controls and connectors,” on page 127). If it is
flashing, check the EasyLED diagnostics LEDs (see “EasyLED diagnostics”
on page 76).
h. Check for the following results:
v Successful completion of POST
v Successful completion of startup, indicated by a readable display of the
operating-system desktop
3. Are there readable instructions on the main menu?
v No: Find the failure symptom in “Troubleshooting tables” on page 64; if
necessary, see “Solving undetermined problems” on page 124.
v Yes: Run the diagnostic programs (see “Running the diagnostic programs” on
page 90).
– If you receive an error, see “Diagnostic messages” on page 91.
– If the diagnostic programs were completed successfully and you still
suspect a problem, see “Solving undetermined problems” on page 124.
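The branching in steps 2f and 3 above can be sketched as a lookup of the section to consult next. This is a reading aid only; the function name, its parameters, and the final "No further action" result are assumptions made for this example.

```python
def next_checkout_action(server_starts: bool,
                         menu_readable: bool,
                         diagnostics_error: bool = False,
                         still_suspect_problem: bool = False) -> str:
    """Return the section to consult next, following steps 2f and 3."""
    if not server_starts:
        return "Troubleshooting tables"
    if not menu_readable:
        return ("Troubleshooting tables; if necessary, "
                "Solving undetermined problems")
    # The instructions are readable, so the diagnostic programs are run.
    if diagnostics_error:
        return "Diagnostic messages"
    if still_suspect_problem:
        return "Solving undetermined problems"
    # Assumption for this sketch: nothing else is required at this point.
    return "No further action"
```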
Diagnosing a problem
Before you contact Lenovo or an approved warranty service provider, follow these
procedures in the order in which they are presented to diagnose a problem with
your server:
1. Determine whether any of the following items were added, removed, replaced,
or updated before the problem occurred:
v Lenovo ThinkServer Server Firmware (server firmware)
v Device drivers
v Firmware
v Hardware components
v Software
If possible, return the server to the condition it was in before the problem
occurred.
2. Collect data.
Thorough data collection is necessary for diagnosing hardware and software
problems.
a. Document error codes and system-board LEDs.
v System error codes: See “Viewing the test log” on page 91 for
information about error codes.
v Software or operating-system error codes: See the documentation for
the software or operating system for information about a specific error
code. See the manufacturer's Web site for documentation.
v EasyLED diagnostics LEDs: See “EasyLED diagnostics” on page 76 for
information about EasyLED diagnostics LEDs that are lit.
v System-board LEDs: See “System-board LEDs” on page 135 for
information about system-board LEDs that are lit.
b. Collect system data.
Run Dynamic System Analysis (DSA) to collect information about the
hardware, firmware, software, and operating system. Have this information
available when you contact Lenovo or an approved warranty service
provider. For instructions for running the DSA program, see “Running the
diagnostic programs” on page 90.
If you have to download the latest version of DSA, complete the following
steps.
Note: Changes are made periodically to the Lenovo Web site. The actual
procedure might vary slightly from what is described in this document.
1) Go to: http://www.lenovo.com/support.
2) Enter your product number (machine type and model number) or select
Servers and Storage from the Select your product list.
3) Select Servers and Storage from the Brand list.
4) From the Family list, select ThinkServer TD200x, and click Continue.
5) Click Downloads and drivers and look at the list for the Preboot DSA
CD image.
3. Follow the problem-resolution procedures.
The four problem-resolution procedures are presented in the order in which they
are most likely to solve your problem. Follow these procedures in the order in
which they are presented:
a. Check for and apply code updates.
Most problems that appear to be caused by faulty hardware are actually
caused by Lenovo ThinkServer Server Firmware (server firmware), system
firmware, device firmware, or device drivers that are not at the latest levels.
Important: Some cluster solutions require specific code levels or
coordinated code updates. If the device is part of a cluster solution, verify
that the latest level of code is supported for the cluster solution before you
update the code.
1) Determine the existing code levels.
In DSA, click Firmware/VPD to view system firmware levels, or click
Software to view operating-system levels.
2) Download and install updates of code that is not at the latest level.
To display a list of available updates for your server, complete the
following steps.
Note: Changes are made periodically to the Lenovo Web site. The
actual procedure might vary slightly from what is described in this
document.
a) Go to: http://www.lenovo.com/support.
b) Enter your product number (machine type and model number) or
select Servers and Storage from the Select your product list.
c) Select Servers and Storage from the Brand list.
d) From the Family list, select ThinkServer TD200x, and click Continue.
e) Click System TD200x to display the list of downloadable files for the
server.
b. Check for and correct an incorrect configuration.
If the server is incorrectly configured, a system function can fail to work
when you enable it; if you make an incorrect change to the server
configuration, a system function that has been enabled can stop working.
1) Make sure that all installed hardware and software are supported.
See http://www.lenovo.com/thinkserver to verify that the server supports
the installed operating system, optional devices, and software levels. If
any hardware or software component is not supported, uninstall it to
determine whether it is causing the problem. You must remove
nonsupported hardware before you contact Lenovo or an approved
warranty service provider for support.
2) Make sure that the server, operating system, and software are
installed and configured correctly.
Many configuration problems are caused by loose power or signal
cables or incorrectly seated adapters. You might be able to solve the
problem by turning off the server, reconnecting cables, reseating
adapters, and turning the server back on. For information about
performing the checkout procedure, see “Checkout procedure” on page
21.
If the problem is associated with a specific function (for example, if a
RAID hard disk drive is marked offline in the RAID array), see the
documentation for the associated controller and management or
controlling software to verify that the controller is correctly configured.
Problem determination information is available for many devices such as
RAID and network adapters.
For problems with operating systems or Lenovo software or devices,
complete the following steps.
Note: Changes are made periodically to the Lenovo Web site. The
actual procedure might vary slightly from what is described in this
document.
a) Go to: http://www.lenovo.com/support.
b) Enter your product number (machine type and model number) or
select Servers and Storage from the Select your product list.
c) Select Servers and Storage from the Brand list.
d) From the Family list, select ThinkServer TD200x, and click Continue.
e) Under Support & downloads, click Documentation, Install, and
Use to search for related documentation.
c. Check for troubleshooting procedures, and hints and tips.
Troubleshooting procedures, and hints and tips document known problems
and suggested solutions. To search for troubleshooting procedures, and
hints and tips, complete the following steps.
Note: Changes are made periodically to the Lenovo Web site. The actual
procedure might vary slightly from what is described in this document.
1) Go to: http://www.lenovo.com/support.
2) Enter your product number (machine type and model number) or select
Servers and Storage from the Select your product list.
3) Select Servers and Storage from the Brand list.
4) From the Family list, select ThinkServer TD200x, and click Continue.
5) Under Support & downloads, click Troubleshoot.
6) Select the troubleshooting procedure or hints and tips that applies to
your problem:
v Troubleshooting procedures are under Diagnostic.
v Hints and tips are under Troubleshoot.
d. Check for and replace defective hardware.
If a hardware component is not operating within specifications, it can cause
unpredictable results. Most hardware failures are reported as error codes in
a system or operating-system log. For more information, see
“Troubleshooting tables” on page 64 and Chapter 7, “Installing optional
devices and replacing customer replaceable units,” on page 149. Hardware
errors are also indicated by EasyLED diagnostics LEDs.
A single problem might cause multiple symptoms. Follow the troubleshooting
procedure for the most obvious symptom. If that procedure does not
diagnose the problem, use the procedure for another symptom, if possible.
If the problem remains, contact Lenovo or an approved warranty service
provider for assistance with additional problem determination and possible
hardware replacement. Be prepared to provide information about any error
codes and collected data.
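The first step of this procedure, determining what was added, removed, replaced, or updated, amounts to comparing two inventories. The following Python sketch shows one way to do that comparison; the function name and the inventory entries are hypothetical and are not output from DSA or any Lenovo tool.

```python
def changed_items(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple]:
    """Map each item name to (old_level, new_level) where the two differ.

    A value of None means that the item was absent in that snapshot,
    that is, it was added or removed.
    """
    changes = {}
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots taken before and after the problem appeared.
before = {"server firmware": "1.00", "network driver": "2.1"}
after = {"server firmware": "1.02", "network driver": "2.1",
         "RAID adapter": "installed"}
for item, (old, new) in changed_items(before, after).items():
    print(f"{item}: {old} -> {new}")
```

Items that appear in the result are the first candidates to return to their earlier condition.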
Undocumented problems
If you have completed the diagnostic procedure and the problem remains, the
problem might not have been previously identified by Lenovo. After you have
verified that all code is at the latest level, all hardware and software configurations
are valid, and no EasyLED diagnostics LEDs or log entries indicate a hardware
component failure, contact Lenovo or an approved warranty service provider for
assistance. Be prepared to provide information about any error codes and collected
data and the problem determination procedures that you have used.
Chapter 5. Diagnostics
This chapter describes the diagnostic tools that are available to help you solve
problems that might occur in the server.
If you cannot diagnose and correct a problem by using the information in this
chapter, see Appendix A, “Getting help and technical assistance,” on page 275 for
more information.
Diagnostic tools
The following tools are available to help you diagnose and solve hardware-related
problems:
v POST error messages
The power-on self-test (POST) generates messages to indicate successful test
completion or the detection of a problem. See “POST error codes” on page 30
for more information.
v Event logs
For information about the POST event log, the system-event log, the integrated
management module (IMM) event log, and the DSA log, see “Event logs” and
“System-event log” on page 38.
v Troubleshooting tables
These tables list problem symptoms and actions to correct the problems. See
“Troubleshooting tables” on page 64.
v EasyLED diagnostics
Use the EasyLED diagnostics to diagnose system errors quickly. See “EasyLED
diagnostics” on page 76 for more information.
v Diagnostic programs, messages, and error codes
The diagnostic programs are the primary method of testing the major
components of the server. See “Diagnostic programs, messages, and error
codes” on page 90 for more information.
Event logs
Error codes and messages are displayed in the following types of event logs:
v POST event log: This log contains the three most recent error codes and
messages that were generated during POST. You can view the POST event log
through the Setup utility.
v System-event log: This log contains all IMM, POST, and system management
interrupt (SMI) events. You can view the system-event log through the Setup
utility and through the Dynamic System Analysis (DSA) program (as the IPMI
event log).
The system-event log is limited in size. When it is full, new entries will not
overwrite existing entries; therefore, you must periodically save and then clear
the system-event log through the Setup utility when the IMM logs an event that
indicates that the log is more than 75% full. When you are troubleshooting, you
might have to save and then clear the system-event log to make the most recent
events available for analysis.
Messages are listed on the left side of the screen, and details about the selected
message are displayed on the right side of the screen. To move from one entry
to the next, use the Up Arrow (↑) and Down Arrow (↓) keys.
Some IMM sensors cause assertion events to be logged when their setpoints are
reached. When a setpoint condition no longer exists, a corresponding
deassertion event is logged. However, not all events are assertion-type events.
v Integrated management module (IMM) event log: This log contains a filtered
subset of all IMM, POST, and system management interrupt (SMI) events. You
can view the IMM event log through the IMM Web interface and through the
Dynamic System Analysis (DSA) program (as the ASM event log).
v DSA log: This log is generated by the Dynamic System Analysis (DSA) program,
and it is a chronologically ordered merge of the system-event log (as the IPMI
event log), the IMM event log (as the ASM event log), and the operating-system
event logs. You can view the DSA log through the DSA program.
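The DSA log is described above as a chronologically ordered merge of several logs. The following Python sketch illustrates that idea with the standard `heapq.merge` function. It is not the DSA implementation; the `Event` type, the source labels, and the sample entries are assumptions made for this example.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str    # for example "IPMI", "ASM", or "OS" (illustrative labels)
    message: str


def merge_logs(*logs: Iterable[Event]) -> Iterator[Event]:
    """Merge logs that are each already in chronological order."""
    return heapq.merge(*logs, key=lambda event: event.timestamp)


# Hypothetical entries from two logs.
ipmi_log = [Event(datetime(2010, 1, 1, 9, 0), "IPMI", "first"),
            Event(datetime(2010, 1, 1, 11, 0), "IPMI", "third")]
asm_log = [Event(datetime(2010, 1, 1, 10, 0), "ASM", "second")]
for event in merge_logs(ipmi_log, asm_log):
    print(event.timestamp, event.source, event.message)
```

Each input log must already be sorted by time; the merge only interleaves entries and does not sort them.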
Viewing event logs through the Setup utility
To view the POST event log or system-event log, complete the following steps:
1. Turn on the server.
2. When the prompt <F1> Setup is displayed, press F1. If you have set both a
power-on password and an administrator password, you must type the
administrator password to view the event logs.
3. Select System Event Logs and use one of the following procedures:
v To view the POST event log, select POST Event Viewer.
v To view the system-event log, select System Event Log.
Viewing event logs without restarting the server
If the server is not hung, methods are available for you to view one or more event
logs without having to restart the server.
You can use the DSA Preboot to view the system event log (as the IPMI event log),
the IMM event log (as the ASM event log), or the merged DSA log. You must restart
the server to use DSA Preboot to view those logs. To install a DSA Preboot CD
image, complete the following steps:
Note: Changes are made periodically to the Lenovo Web site. The actual
procedure might vary slightly from what is described in this document.
1. Go to: http://www.lenovo.com/support.
2. Enter your product number (machine type and model number) or select Servers
and Storage from the Select your product list.
3. Select Servers and Storage from the Brand list.
4. From the Family list, select ThinkServer TD200x, and click Continue.
5. Click Downloads and drivers and look at the list for the Preboot DSA CD
image.
You can view the IMM event log through the Event Log link in the integrated
management module (IMM) Web interface.
The following table describes the methods that you can use to view the event logs,
depending on the condition of the server. The first two conditions generally do not
require that you restart the server.
Condition: The server is not hung and is connected to a network.
Action: Use any of the following methods:
v Run Portable or Installable DSA to view the event logs or create an output
file that you can send to Lenovo service and support.
v Type the IP address of the IMM and go to the Event Log page.
v Use IPMItool to view the system-event log.
Condition: The server is not hung and is not connected to a network.
Action: Use IPMItool locally to view the system-event log.
Condition: The server is hung.
Action:
v If DSA Preboot is installed, restart the server and press F2 to start DSA
Preboot and view the event logs.
v If DSA Preboot is not installed, insert the DSA Preboot CD and restart the
server to start DSA Preboot and view the event logs.
v Alternatively, you can restart the server and press F1 to start the Setup
utility and view the POST event log or system-event log. For more
information, see “Viewing event logs through the Setup utility” on page 28.
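The table above can also be read as a function of two conditions: whether the server is hung and whether it is connected to a network. The following Python sketch restates the table in that form; the function name and the wording of the returned strings are assumptions made for this example.

```python
def log_viewing_methods(hung: bool, networked: bool) -> list[str]:
    """Return the ways to view the event logs for a given server condition."""
    if hung:
        # Every method for a hung server involves restarting it.
        return [
            "Restart and start DSA Preboot (press F2 if it is installed, "
            "otherwise start from the DSA Preboot CD)",
            "Restart and press F1 to view the POST event log or "
            "system-event log in the Setup utility",
        ]
    if networked:
        return [
            "Run Portable or Installable DSA",
            "Open the Event Log page of the IMM Web interface",
            "Use IPMItool",
        ]
    return ["Use IPMItool locally"]
```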
POST error codes
When you turn on the server, it performs a series of tests to check the operation of
the server components and some optional devices in the server. This series of tests
is called the power-on self-test, or POST.
If a power-on password is set, you must type the password and press Enter, when
you are prompted, for POST to run.
If POST is completed without detecting any problems, the server startup is
completed.
If POST detects a problem, an error message is sent to the POST event log.
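As noted under “Event logs,” the POST event log holds only the three most recent error codes and messages. The following Python sketch models that retention behavior with a bounded queue. It illustrates the behavior only and is not the firmware implementation; the names used are assumptions made for this example.

```python
from collections import deque

POST_LOG_CAPACITY = 3  # "the three most recent error codes and messages"

post_event_log = deque(maxlen=POST_LOG_CAPACITY)


def record_post_error(code: str, message: str) -> None:
    """Append an entry; the oldest entry is dropped once three are stored."""
    post_event_log.append((code, message))
```

After a fourth error is recorded, the first one is no longer available in this log, which is one reason to also check the system-event log.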
The following table describes the POST error codes and suggested actions to
correct the detected problems. These errors can appear as severe, warning, or
informational.
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 8, “Parts Listing, TD200x Machine Types 3719, 3821, 3822, and 3823,” on page 237 to
determine which components are customer replaceable units (CRU) and which components are field
replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Error code  Description  Action
0010002 Microprocessor not supported
1. Reseat the following components one at a time,
in the order shown, restarting the server each
time:
a. (Trained service technician only)
Microprocessor 1
b. (Trained service technician only)
Microprocessor 2 (if one is installed)
2. (Trained service technician only) Remove
microprocessor 2 and restart the server.
3. (Trained service technician only) Remove
microprocessor 1 and install microprocessor 2 in
the microprocessor 1 connector. Restart the
server. If the error is corrected, microprocessor 1
is bad and must be replaced.
4. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. (Trained service technician only)
Microprocessor 1
b. (Trained service technician only)
Microprocessor 2
c. (Trained service technician only) System
board
0011000 Invalid microprocessor type
1. Update the firmware (see “Updating the firmware”
on page 267).
2. (Trained service technician only) Remove and
replace the affected microprocessor (error LED is
lit) with a supported type.
0011002 Microprocessor mismatch
1. Run the Setup utility and view the microprocessor
information to compare the installed
microprocessor specifications.
2. (Trained service technician only) Remove and
replace one of the microprocessors so that they
both match.
0011004 Microprocessor failed BIST
1. Update the firmware (see “Updating the firmware”
on page 267).
2. (Trained service technician only) Reseat
microprocessor 2.
3. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. (Trained service technician only)
Microprocessor
b. (Trained service technician only) System
board
001100A Microcode update failed
1. Update the server firmware (see “Updating the
firmware” on page 267).
2. (Trained service technician only) Replace the
microprocessor.
0050001 DIMM disabled
1. If the server fails the POST memory test, reseat
the DIMMs.
2. Remove and replace any DIMM for which the
associated error LED is lit (see “Removing a
memory module” on page 210 and “Installing a
memory module” on page 211).
3. Run the Setup utility to enable all the DIMMs.
4. Run the DSA memory test.
0051003 Uncorrectable DIMM error
1. If the server failed the POST memory test, reseat
the DIMMs.
2. Remove and replace any DIMM for which the
associated error LED is lit (see “Removing a
memory module” on page 210 and “Installing a
memory module” on page 211).
3. Run the Setup utility to enable all the DIMMs.
4. Run the DSA memory test.
0051006 DIMM mismatch detected
Make sure that the DIMMs match and are installed in the correct sequence (see
“Installing a memory module” on page 211).
0051009 No memory detected
1. Make sure that the server contains DIMMs.
2. Reseat the DIMMs.
3. Install DIMMs in the correct sequence (see
“Installing a memory module” on page 211).
005100A No usable memory detected
1. Make sure that the server contains DIMMs.
2. Reseat the DIMMs.
3. Install DIMMs in the correct sequence (see
“Installing a memory module” on page 211).
4. Clear CMOS memory to re-enable all the memory
connectors.
0058001 PFA threshold exceeded
1. Update the firmware (see “Updating the firmware”
on page 267).
2. Reseat the DIMMs and run the memory test.
3. Replace the failing DIMM, which is indicated by a
lit LED on the system board.
0058007 DIMM population is unsupported
1. Reseat the DIMMs, and then restart the server.
2. Remove the lowest-numbered DIMM pair of those
that are identified, replace it with an identical pair
of known good DIMMs, and then restart the
server. Repeat as necessary. If the failures
continue, go to step 4.
3. Return the removed DIMMs, one pair at a time, to
their original connectors, restarting the server
after each pair, until a pair fails. Replace the
DIMMs in the failed pair with identical known
good DIMMs, restarting the server after each
DIMM is installed. Replace the failed DIMM.
Repeat this step until you have tested all
removed DIMMs.
4. (Trained service technician only) Replace the
system board.
0058008 DIMM failed memory test
1. Reseat the DIMMs, and then restart the server.
2. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. DIMMs
b. (Trained service technician only) System
board
00580A1 Invalid DIMM population for mirroring mode
1. If a fault LED is lit, resolve the failure.
2. Install the DIMMs in the correct sequence (see
“Installing a memory module” on page 211).
00580A4 Memory population changed
Information only. Memory has been added, moved, or changed.
00580A5 Mirror failover complete
Information only. Memory redundancy has been lost. Check the event log for
uncorrected DIMM failure events.
0068002 CMOS battery cleared
1. Reseat the battery.
2. Clear the CMOS memory (see “System-board
switches and jumpers” on page 144).
3. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. Battery
b. (Trained service technician only) System
board
2011000 PCI-X PERR
1. Check the extender-card LEDs.
2. Reseat all affected adapters and extender cards.
3. Update the PCI device firmware.
4. Remove the adapters from the extender card.
5. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. Extender card
b. (Trained service technician only) System
board
2011001 PCI-X SERR
1. Check the extender-card LEDs.
2. Reseat all affected adapters and extender cards.
3. Update the PCI device firmware.
4. Remove the adapters from the extender card.
5. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. Extender card
b. (Trained service technician only) System
board
2018001 PCI Express uncorrected or uncorrectable error
1. Check the extender-card LEDs.
2. Reseat all affected adapters and extender cards.
3. Update the PCI device firmware.
4. Remove both adapters from the extender card.
5. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. Extender card
b. (Trained service technician only) System
board
2018002 Option ROM resource allocation failure
Informational message that some devices might not be initialized.
1. If possible, rearrange the order of the adapters in
the PCI slots to change the load order of the
optional-device ROM code.
2. Run the Setup utility, select Start Options, and
change the boot priority to change the load order
of the optional-device ROM code.
3. Run the Setup utility and disable some other
resources, if their functions are not being used, to
make more space available. Select Devices and
I/O Ports to disable any of the integrated devices.
4. Replace the following components one at a time,
in the order shown, restarting the server each
time:
a. Each adapter
b. (Trained service technician only) System
board
3xx0007 (xx can be 00 - 19) Firmware fault detected, system halted
1. Recover the server firmware to the latest level.
2. Undo any recent configuration changes, or clear
CMOS memory to restore the settings to the
default values.
3. Remove any recently installed hardware.
3038003 Firmware corrupted
1. Run the Setup utility, select Load Default
Settings, and save the settings to recover the
server firmware.
2. (Trained service technician only) Replace the
system board.
3048005 Booted secondary (backup) server firmware image
Information only. The backup switch was used to boot
the secondary bank.
3818001 Core Root of Trust Measurement (CRTM) update failed
1. Run the Setup utility, select Load Default Settings, and save the settings.
2. (Trained service technician only) Replace the
system board.
3818002 Core Root of Trust Measurement (CRTM) update aborted
1. Run the Setup utility, select Load Default Settings, and save the settings.
2. (Trained service technician only) Replace the
system board.
3818003 Core Root of Trust Measurement (CRTM) flash lock failed
1. Run the Setup utility, select Load Default Settings, and save the settings.
2. (Trained service technician only) Replace the
system board.
3818004 Core Root of Trust Measurement (CRTM) system error
1. Run the Setup utility, select Load Default Settings, and save the settings.
2. (Trained service technician only) Replace the
system board.
3818005 Current Bank Core Root of Trust Measurement (CRTM) capsule signature
invalid
1. Run the Setup utility, select Load Default Settings, and save the settings.
2. (Trained service technician only) Replace the
system board.
3818006 Opposite bank CRTM capsule signature invalid
1. Switch the firmware bank to the backup bank.
2. Run the Setup utility, select Load Default Settings, and save the settings.
3. Switch the bank back to the current bank.
4. (Trained service technician only) Replace the
system board.
3818007 CRTM update capsule signature invalid
1. Run the Setup utility, select Load Default Settings, and save the settings.
2. (Trained service technician only) Replace the
system board.
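When you look up a code from the POST event log, note that one table entry, 3xx0007, stands for a range of codes. The following Python sketch shows a lookup that handles both exact codes and that range. It contains only a few entries copied from the table, and it assumes that "xx can be 00 - 19" means two decimal digits; both the function name and that interpretation are assumptions made for this example.

```python
import re
from typing import Optional

# A few entries copied from the table above; this is not a complete list.
POST_CODES = {
    "0011002": "Microprocessor mismatch",
    "0050001": "DIMM disabled",
    "0068002": "CMOS battery cleared",
    "3038003": "Firmware corrupted",
}

# 3xx0007: assumed here to mean two decimal digits from 00 through 19.
FIRMWARE_FAULT = re.compile(r"3[01][0-9]0007")


def describe_post_code(code: str) -> Optional[str]:
    """Return the description for a POST error code, or None if unknown."""
    code = code.strip().upper()
    if code in POST_CODES:
        return POST_CODES[code]
    if FIRMWARE_FAULT.fullmatch(code):
        return "Firmware fault detected, system halted"
    return None
```

A result of `None` means only that the code is not in this short example dictionary; see the full table for the suggested actions.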
System-event log
The system-event log contains messages of three types:
Information
Information messages do not require action; they record significant
system-level events, such as when the server is started.
Warning
Warning messages do not require immediate action; they indicate possible
problems, such as when the recommended maximum ambient temperature
is exceeded.
Error
Error messages might require action; they indicate system errors, such as
when a fan is not detected.
Each message contains date and time information, and it indicates the source of
the message (POST or the IMM).
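The three message types and the fields that each message carries can be modeled as follows. This Python sketch is illustrative only: the class names, the field names, and the sample messages are assumptions made for this example and do not reflect the stored format of the system-event log.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Severity(Enum):
    INFORMATION = "Information"  # no action required
    WARNING = "Warning"          # no immediate action required
    ERROR = "Error"              # might require action


@dataclass
class LogMessage:
    timestamp: datetime
    source: str                  # "POST" or "IMM"
    severity: Severity
    text: str


def needing_attention(messages):
    """Return Error messages first, then Warning messages, oldest first.

    Information messages are left out because they do not require action.
    """
    order = {Severity.ERROR: 0, Severity.WARNING: 1}
    selected = [m for m in messages if m.severity in order]
    return sorted(selected, key=lambda m: (order[m.severity], m.timestamp))


# Hypothetical log contents.
log = [
    LogMessage(datetime(2010, 1, 1, 8, 0), "POST", Severity.INFORMATION,
               "Server started"),
    LogMessage(datetime(2010, 1, 1, 8, 5), "IMM", Severity.WARNING,
               "Ambient temperature above recommended maximum"),
    LogMessage(datetime(2010, 1, 1, 8, 9), "IMM", Severity.ERROR,
               "Fan not detected"),
]
for message in needing_attention(log):
    print(message.severity.value, message.source, message.text)
```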
Integrated management module error messages
The following table describes the IMM error messages and suggested actions to
correct the detected problems. For more information about IMM, see the IMM
User’s Guide on the Web.
v Follow the suggested actions in the order in which they are listed in the Action column until the problem
is solved.
v See Chapter 8, “Parts Listing, TD200x Machine Types 3719, 3821, 3822, and 3823,” on page 237 to
determine which components are customer replaceable units (CRU) and which components are field
replaceable units (FRU).
v If an action step is preceded by “(Trained service technician only),” that step must be performed only by a
trained service technician.
Message: Numeric sensor Ambient Temp going high (upper critical) has asserted.
Severity: Error
Description: An upper critical sensor going high has asserted.
Action: Reduce the ambient temperature.
Message: Numeric sensor Ambient Temp going high (upper non-recoverable) has
asserted.
Severity: Error
Description: An upper nonrecoverable sensor going high has asserted.
Action: Reduce the ambient temperature.
Message: Numeric sensor Planar 3.3V going low (lower critical) has asserted.
Severity: Error
Description: A lower critical sensor going low has asserted.
Action: (Trained service technician only) Replace the system board.
Message: Numeric sensor Planar 3.3V going high (upper critical) has asserted.
Severity: Error
Description: An upper critical sensor going high has asserted.
Action: (Trained service technician only) Replace the system board.
Message: Numeric sensor Planar 5V going low (lower critical) has asserted.
Severity: Error
Description: A lower critical sensor going low has asserted.
Action: (Trained service technician only) Replace the system board.
Message: Numeric sensor Planar 5V going high (upper critical) has asserted.
Severity: Error
Description: An upper critical sensor going high has asserted.
Action: (Trained service technician only) Replace the system board.
Message: Numeric sensor Planar 12V going low (lower critical) has asserted.
Severity: Error
Description: A lower critical sensor going low has asserted.
Action: Check the power-supply LED on the EasyLED panel (see “EasyLED
diagnostics” on page 76).
Message: Numeric sensor Planar 12V going high (upper critical) has asserted.
Severity: Error
Description: An upper critical sensor going high has asserted.
Action: Check the power-supply LED on the EasyLED panel (see “EasyLED
diagnostics” on page 76).
Message: Numeric sensor Planar VBAT going low (lower critical) has asserted.
Severity: Error
Description: A lower critical sensor going low has asserted.
Action: Replace the 3 V battery.

Message: Numeric sensor Fan n Tach going low (lower critical) has asserted. (n = fan number)
Severity: Error
Description: A lower critical sensor going low has asserted.
Action:
1. Reseat the failing fan n, which is indicated by a lit LED on the fan.
2. Replace the failing fan.
(n = fan number)

Message: The Processor CPU n Status has Failed with IERR. (n = microprocessor number)
Severity: Error
Description: A processor failed - IERR condition has occurred.
Action:
1. Make sure that the latest levels of firmware and device drivers are installed for all adapters and standard devices, such as Ethernet, SCSI, and SAS.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
2. Run the DSA program for the hard disk drives and other I/O devices.
3. (Trained service technician only) Replace microprocessor n.
(n = microprocessor number)

Message: An Over-Temperature Condition has been detected on the Processor CPU n Status. (n = microprocessor number)
Severity: Error
Description: An overtemperature condition has occurred for microprocessor n. (n = microprocessor number)
Action:
1. Make sure that the fans are operating, that there are no obstructions to the airflow, that the air baffle is in place and correctly installed, and that the server cover is installed and completely closed.
2. Make sure that the heat sink for microprocessor n is installed correctly.
3. (Trained service technician only) Replace microprocessor n.
(n = microprocessor number)
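The numeric sensor messages above name the threshold that was crossed (lower critical, upper critical, or upper non-recoverable). The following Python sketch shows one way a reading could be mapped to such a message. The function names and the threshold values are assumptions made for illustration; they are not the server's actual limits or the IMM's implementation.

```python
from typing import Optional


def classify_reading(value: float,
                     lower_critical: Optional[float] = None,
                     upper_critical: Optional[float] = None,
                     upper_non_recoverable: Optional[float] = None) -> Optional[str]:
    """Return the most severe threshold crossed, or None if the reading is in range."""
    if upper_non_recoverable is not None and value >= upper_non_recoverable:
        return "going high (upper non-recoverable)"
    if upper_critical is not None and value >= upper_critical:
        return "going high (upper critical)"
    if lower_critical is not None and value <= lower_critical:
        return "going low (lower critical)"
    return None


def format_event(sensor: str, value: float, **thresholds: float) -> Optional[str]:
    """Build a message in the style of the table, or None when nothing has asserted."""
    state = classify_reading(value, **thresholds)
    if state is None:
        return None
    return f"Numeric sensor {sensor} {state} has asserted."


# Illustrative limits only (assumed, not taken from the manual).
print(format_event("Planar 3.3V", 2.9, lower_critical=3.0, upper_critical=3.6))
```

Checking the non-recoverable limit before the critical limit makes the sketch report only the most severe condition for a single reading.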
Message: The Processor CPU n Status has Failed with FRB1/BIST condition. (n = microprocessor number)
Severity: Error
Description: A processor failed - FRB1/BIST condition has occurred.
Action:
1. Check for a server firmware update.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
2. Make sure that the installed microprocessors are compatible with each other (see “Installing a microprocessor and heat sink” on page 220 for information about microprocessor requirements).
3. (Trained service technician only) Reseat microprocessor n.
4. (Trained service technician only) Replace microprocessor n.
(n = microprocessor number)

Message: The Processor CPU n Status has a Configuration Mismatch. (n = microprocessor number)
Severity: Error
Description: A processor configuration mismatch has occurred.
Action:
1. Make sure that the installed microprocessors are compatible with each other (see “Installing a microprocessor and heat sink” on page 220 for information about microprocessor requirements).
2. (Trained service technician only) Replace the incompatible microprocessor.
Message: An SM BIOS Uncorrectable CPU complex error for Processor CPU n Status has asserted. (n = microprocessor number)
Severity: Error
Description: An SMBIOS uncorrectable CPU complex error has asserted.
Action:
1. Check for a server firmware update.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
2. Make sure that the installed microprocessors are compatible with each other (see “Installing a microprocessor and heat sink” on page 220 for information about microprocessor requirements).
3. (Trained service technician only) Reseat microprocessor n.
4. (Trained service technician only) Replace microprocessor n.
(n = microprocessor number)

Message: Sensor CPU n OverTemp has transitioned to critical from a less severe state. (n = microprocessor number)
Severity: Error
Description: A sensor has changed to Critical state from a less severe state.
Action:
1. Make sure that the fans are operating, that there are no obstructions to the airflow, that the air baffle is in place and correctly installed, and that the server cover is installed and completely closed.
2. Make sure that the heat sink for microprocessor n is installed correctly.
3. (Trained service technician only) Replace microprocessor n.
(n = microprocessor number)
Message: Sensor CPU n OverTemp has transitioned to non-recoverable from a less severe state. (n = microprocessor number)
Severity: Error
Description: A sensor has changed to Nonrecoverable state from a less severe state.
Action:
1. Make sure that the fans are operating, that there are no obstructions to the airflow, that the air baffle is in place and correctly installed, and that the server cover is installed and completely closed.
2. Make sure that the heat sink for microprocessor n is installed correctly.
3. (Trained service technician only) Replace microprocessor n.
(n = microprocessor number)

Message: Sensor CPU n OverTemp has transitioned to critical from a non-recoverable state. (n = microprocessor number)
Severity: Error
Description: A sensor has changed to Critical state from Nonrecoverable state.
Action:
1. Make sure that the fans are operating, that there are no obstructions to the airflow, that the air baffle is in place and correctly installed, and that the server cover is installed and completely closed.
2. Make sure that the heat sink for microprocessor n is installed correctly.
3. (Trained service technician only) Replace microprocessor n.
(n = microprocessor number)

Message: Sensor CPU n OverTemp has transitioned to non-recoverable. (n = microprocessor number)
Severity: Error
Description: A sensor has changed to Nonrecoverable state.
Action:
1. Make sure that the fans are operating, that there are no obstructions to the airflow, that the air baffle is in place and correctly installed, and that the server cover is installed and completely closed.
2. Make sure that the heat sink for microprocessor n is installed correctly.
3. (Trained service technician only) Replace microprocessor n.
Message: A diagnostic interrupt has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: An operator information panel NMI/diagnostic interrupt has occurred.
Action: If the NMI button on the system board has not been pressed, complete the following steps:
1. Make sure that the NMI button is not pressed.
2. Replace the operator information panel cable.
3. Replace the operator information panel.

Message: A bus timeout has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A bus timeout has occurred.
Action:
1. Remove the adapter from the PCI slot that is indicated by a lit LED.
2. Replace the extender card.
3. Remove all PCI adapters.
4. (Trained service technician only) Replace the system board.

Message: A software NMI has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A software NMI has occurred.
Action:
1. Check the device driver.
2. Reinstall the device driver.

Message: The System %1 encountered a POST Error. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A POST error has occurred. (Sensor = ABR Status)
Action:
1. Recover the server firmware from the backup page (see “Recovering from a Lenovo ThinkServer Server Firmware update failure” on page 122).
2. Update the server firmware to the latest level.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
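Several messages in this table contain positional placeholders such as %1 and %2, which stand for values like CIM_ComputerSystem.ElementName. The following Python sketch shows how such a template could be filled in when reading or testing log text. The function name `fill_message` and the system name "TD200x-lab" are hypothetical; this is not how the IMM firmware is known to do it.

```python
import re


def fill_message(template: str, *values: str) -> str:
    """Replace %1, %2, ... with the corresponding positional value."""
    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # Leave the placeholder untouched if no value was supplied for it.
        return values[index] if 0 <= index < len(values) else match.group(0)

    return re.sub(r"%(\d+)", substitute, template)


# "TD200x-lab" is an invented system name used only for this example.
print(fill_message("A bus timeout has occurred on system %1.", "TD200x-lab"))
```

Leaving unmatched placeholders in place, rather than raising an error, makes it easy to spot which value was missing.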
Message: The System %1 encountered a POST Error. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A POST error has occurred. (Sensor = Firmware Error)
Action:
1. Update the server firmware on the primary page.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
2. (Trained service technician only) Replace the system board.

Message: A Uncorrectable Bus Error has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A bus uncorrectable error has occurred. (Sensor = Critical Int PCI)
Action:
1. Check the system-event log.
2. Check the PCI error LEDs.
3. Remove the adapter from the indicated PCI slot.
4. Check for a server firmware update.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
5. (Trained service technician only) Replace the system board.
Message: A Uncorrectable Bus Error has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A bus uncorrectable error has occurred. (Sensor = Critical Int CPU)
Action:
1. Check the system-event log.
2. Check the microprocessor error LEDs.
3. Remove the failing microprocessor from the system board.
4. Check for a server firmware update.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
5. Make sure that the two microprocessors are matching.
6. (Trained service technician only) Replace the system board.

Message: A Uncorrectable Bus Error has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A bus uncorrectable error has occurred. (Sensor = Critical Int DIM)
Action:
1. Check the system-event log.
2. Check the DIMM error LEDs.
3. Remove the failing DIMM from the system board.
4. Check for a server firmware update.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
5. Make sure that the installed DIMMs are supported and configured correctly.
6. (Trained service technician only) Replace the system board.
Message: Sensor Sys Board Fault has transitioned to critical from a less severe state.
Severity: Error
Description: A sensor has changed to Critical state from a less severe state.
Action:
1. Check the system-event log.
2. Check for an error LED on the system board.
3. Replace any failing device.
4. Check for a server firmware update.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
5. (Trained service technician only) Replace the system board.

Message: The Power Supply (Power Supply: n) has Failed. (n = power supply number)
Severity: Error
Description: Power supply n has failed. (n = power supply number)
Action:
1. If the power-on LED is lit, complete the following steps:
   a. Reduce the server to the minimum configuration.
   b. Reinstall the components one at a time, restarting the server each time.
   c. If the error recurs, replace the component that you just reinstalled.
2. Reseat power supply n.
3. Replace power supply n.
(n = power supply number)

Message: Sensor PS n Fan Fault has transitioned to critical from a less severe state. (n = power supply number)
Severity: Error
Description: A sensor has changed to Critical state from a less severe state.
Action:
1. Make sure that there are no obstructions, such as bundled cables, to the airflow from the power-supply fan.
Message: Redundancy Cooling Zone 3 has been reduced.
Severity: Error
Description: Redundancy has been lost and is insufficient to continue operation.
Action:
1. Make sure that the connector on fan 3 and fan 6 (if installed) is not damaged.
2. Make sure that the fan connectors on the system board are not damaged.
3. Make sure that the fan cage is correctly installed.
4. Reseat the fan.
5. Replace the fan.

Message: Sensor RAID Error has transitioned to critical from a less severe state.
Severity: Error
Description: A sensor has changed to Critical state from a less severe state.
Action:
1. Check the hard disk drive LEDs.
2. Reseat the hard disk drive for which the status LED is lit.
3. Replace the defective hard disk drive.

Message: The Drive n Status has been removed from unit Drive 0 Status. (n = hard disk drive number)
Severity: Error
Description: A drive has been removed.
Action: Reseat hard disk drive n. (n = hard disk drive number)

Message: The Drive n Status has been disabled due to a detected fault. (n = hard disk drive number)
Severity: Error
Description: A drive has been disabled because of a fault.
Action:
1. Run the hard disk drive diagnostic test on drive n.
2. Reseat the following components:
   a. Hard disk drive
   b. Cable from the system board to the backplane
3. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. Hard disk drive
   b. Cable from the system board to the backplane
   c. Hard disk drive backplane
(n = hard disk drive number)

Message: Array %1 is in critical condition. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: An array is in Critical state. (Sensor = Drive n Status) (n = hard disk drive number)
Action: Replace the hard disk drive that is indicated by a lit status LED.

Message: Array %1 has failed. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: An array is in Failed state. (Sensor = Drive n Status) (n = hard disk drive number)
Action: Replace the hard disk drive that is indicated by a lit status LED.
Message: Memory uncorrectable error detected for DIMM All DIMMs on Memory Subsystem All DIMMs.
Severity: Error
Description: A memory uncorrectable error has occurred.
Action:
1. If the server failed the POST memory test, reseat the DIMMs.
2. Replace any DIMM that is indicated by a lit error LED.
   Note: You do not have to replace DIMMs by pairs.
3. Run the Setup utility to enable all the DIMMs.
4. Run the DSA memory test.

Message: Memory Logging Limit Reached for DIMM All DIMMs on Memory Subsystem All DIMMs.
Severity: Error
Description: The memory logging limit has been reached.
Action:
1. Update the server firmware to the latest level.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
2. Reseat the DIMMs and run the DSA memory test.
3. Replace any DIMM that is indicated by a lit error LED.

Message: Memory DIMM Configuration Error for All DIMMs on Memory Subsystem All DIMMs.
Severity: Error
Description: A DIMM configuration error has occurred.
Action: Make sure that DIMMs are installed in the correct sequence and have the same size, type, speed, and technology.

Message: Memory uncorrectable error detected for DIMM One of the DIMMs on Memory Subsystem One of the DIMMs.
Severity: Error
Description: A memory uncorrectable error has occurred.
Action:
1. If the server failed the POST memory test, reseat the DIMMs.
2. Replace any DIMM that is indicated by a lit error LED.
   Note: You do not have to replace DIMMs by pairs.
Message: Memory Logging Limit Reached for DIMM One of the DIMMs on Memory Subsystem One of the DIMMs.
Severity: Error
Description: The memory logging limit has been reached.
Action:
1. Update the server firmware to the latest level.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
2. Reseat the DIMMs and run the DSA memory test.
3. Replace any DIMM that is indicated by a lit error LED.

Message: Memory DIMM Configuration Error for One of the DIMMs on Memory Subsystem One of the DIMMs.
Severity: Error
Description: A DIMM configuration error has occurred.
Action: Make sure that DIMMs are installed in the correct sequence and have the same size, type, speed, and technology.

Message: Memory uncorrectable error detected for DIMM n Status on Memory Subsystem DIMM n Status. (n = DIMM number)
Severity: Error
Description: A memory uncorrectable error has occurred.
Action:
1. If the server failed the POST memory test, reseat the DIMMs.
2. Replace any DIMM that is indicated by a lit error LED.
   Note: You do not have to replace DIMMs by pairs.
3. Run the Setup utility to enable all the DIMMs.
4. Run the DSA memory test.
5. (Trained service technician only) Replace the system board.

Message: Memory Logging Limit Reached for DIMM n Status on Memory Subsystem DIMM n Status. (n = DIMM number)
Severity: Error
Description: The memory logging limit has been reached.
Action:
1. Update the server firmware to the latest level.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
2. Reseat the DIMMs and run the DSA memory test.
3. Replace any DIMM that is indicated by a lit error LED.
Message: Memory DIMM Configuration Error for DIMM n Status on Memory Subsystem DIMM n Status. (n = DIMM number)
Severity: Error
Description: A DIMM configuration error has occurred.
Action: Make sure that DIMMs are installed in the correct sequence and have the same size, type, speed, and technology.

Message: Sensor DIMM n Temp has transitioned to critical from a less severe state. (n = DIMM number)
Severity: Error
Description: A sensor has changed to Critical state from a less severe state.
Action:
1. Make sure that the fans are operating, that there are no obstructions to the airflow, that the air baffles are in place and correctly installed, and that the server cover is installed and completely closed.
2. If a fan has failed, complete the action for a fan failure.
3. Replace DIMM n.
(n = DIMM number)

Message: A PCI PERR has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A PCI PERR has occurred. (Sensor = PCI Slot n; n = PCI slot number)
Action:
1. Check the extender-card LEDs.
2. Reseat the affected adapters and extender card.
3. Update the server and adapter firmware (UEFI and IMM).
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
Message: A PCI SERR has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A PCI SERR has occurred. (Sensor = PCI Slot n; n = PCI slot number)
Action:
1. Check the extender-card LEDs.
2. Reseat the affected adapters and extender card.
3. Update the server and adapter firmware (UEFI and IMM).
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
4. Remove the adapter from slot n.
5. Replace the PCIe adapter.
6. Replace extender card n.
(n = PCI slot number)

Message: A PCI PERR has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A PCI PERR has occurred. (Sensor = One of PCI Err)
Action:
1. Check the extender-card LEDs.
2. Reseat the affected adapters and riser card.
3. Update the server and adapter firmware (UEFI and IMM).
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
4. Remove both adapters.
5. Replace the PCIe adapter.
6. Replace the extender card.
7. (Trained service technician only) Replace the system board.
Message: A PCI SERR has occurred on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Description: A PCI SERR has occurred. (Sensor = One of PCI Err)
Action:
1. Check the extender-card LEDs.
2. Reseat the affected adapters and extender card.
3. Update the server and adapter firmware (UEFI and IMM).
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
4. Remove both adapters.
5. Replace the PCIe adapter.
6. Replace the extender card.
7. (Trained service technician only) Replace the system board.

Message: Fault in slot System board on system %1. (%1 = CIM_ComputerSystem.ElementName)
Severity: Error
Action:
1. Check the extender-card LEDs.
2. Reseat the affected adapters and extender card.
3. Update the server and adapter firmware (UEFI and IMM).
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.
4. Remove both adapters.
5. Replace the PCIe adapter.
6. Replace the extender card.
7. (Trained service technician only) Replace the system board.

Message: Ethernet Duplex setting modified from %1 to %2 by user %3. (%1 = CIM_EthernetPort.NetworkAddresses; %2 = CIM_EthernetPort.NetworkAddresses; %3 = user ID)
Severity: Info
Description: A user has modified the Ethernet port MAC address setting.
Action: No action; information only.

Message: Ethernet interface %1 by user %2. (%1 = CIM_EthernetPort.EnabledState; %2 = user ID)
Severity: Info
Description: A user has enabled or disabled the Ethernet interface.
Action: No action; information only.
Message: Hostname set to %1 by user %2. (%1 = CIM_DNSProtocolEndpoint.Hostname; %2 = user ID)
Severity: Info
Description: A user has modified the host name of the IMM.
Action: No action; information only.

Message: IP address of network interface modified from %1 to %2 by user %3. (%1 = CIM_IPProtocolEndpoint.IPv4Address; %2 = CIM_StaticIPAssignmentSettingData.IPAddress; %3 = user ID)
Severity: Info
Description: A user has modified the IP address of the IMM.
Action: No action; information only.

Message: IP subnet mask of network interface modified from %1 to %2 by user %3s. (%1 = CIM_IPProtocolEndpoint.SubnetMask; %2 = CIM_StaticIPAssignmentSettingData.SubnetMask; %3 = user ID)
Severity: Info
Description: A user has modified the IP subnet mask of the IMM.
Action: No action; information only.

Message: IP address of default gateway modified from %1 to %2 by user %3s. (%1 = CIM_IPProtocolEndpoint.GatewayIPv4Address; %2 = CIM_StaticIPAssignmentSettingData.DefaultGatewayAddress; %3 = user ID)
Severity: Info
Description: A user has modified the default gateway IP address of the IMM.
Action: No action; information only.

Message: OS Watchdog response %1 by %2. (%1 = Enabled or Disabled; %2 = user ID)
Severity: Info
Description: A user has enabled or disabled an OS Watchdog.
Action: No action; information only.

Message: DHCP[%1] failure, no IP address assigned. (%1 = IP address, xxx.xxx.xxx.xxx)
Severity: Info
Description: A DHCP server has failed to assign an IP address to the IMM.
Action:
1. Make sure that the network cable is connected.
2. Make sure that there is a DHCP server on the network that can assign an IP address to the IMM.

Message: Remote Login Successful. Login ID: %1 from %2 at IP address %3. (%1 = user ID; %2 = ValueMap(CIM_ProtocolEndpoint.ProtocolIFType; %3 = IP address, xxx.xxx.xxx.xxx)
Severity: Info
Description: A user has successfully logged in to the IMM.
Message: Attempting to %1 server %2 by user %3.
(%1 = Power Up, Power Down, Power Cycle, or Reset; %2 = Lenovo_ComputerSystem.ElementName; %3 = user ID)
Severity: Info
Description: A user has used the IMM to perform a power function on the server.
Action: No action; information only.

Message: Security: Userid: '%1' had %2 login failures from WEB client at IP address %3.
(%1 = user ID; %2 = MaximumSuccessiveLoginFailures (currently set to 5 in the firmware); %3 = IP address, xxx.xxx.xxx.xxx)
Severity: Error
Description: A user has exceeded the maximum number of unsuccessful login attempts from a Web browser and has been prevented from logging in for the lockout period.
Action:
1. Make sure that the correct login ID and password are being used.
2. Have the system administrator reset the login ID or password.

Message: Security: Login ID: '%1' had %2 login failures from CLI at %3.
(%1 = user ID; %2 = MaximumSuccessiveLoginFailures (currently set to 5 in the firmware); %3 = IP address, xxx.xxx.xxx.xxx)
Severity: Error
Description: A user has exceeded the maximum number of unsuccessful login attempts from the command-line interface and has been prevented from logging in for the lockout period.
Action:
1. Make sure that the correct login ID and password are being used.
2. Have the system administrator reset the login ID or password.

Message: Remote access attempt failed. Invalid userid or password received. Userid is '%1' from WEB browser at IP address %2.
(%1 = user ID; %2 = IP address, xxx.xxx.xxx.xxx)
Severity: Error
Description: A user has attempted to log in from a Web browser by using an invalid login ID or password.
Action:
1. Make sure that the correct login ID and password are being used.
2. Have the system administrator reset the login ID or password.

Message: Remote access attempt failed. Invalid userid or password received. Userid is '%1' from TELNET client at IP address %2.
(%1 = user ID; %2 = IP address, xxx.xxx.xxx.xxx)
Severity: Error
Description: A user has attempted to log in from a Telnet session by using an invalid login ID or password.
Action:
1. Make sure that the correct login ID and password are being used.
2. Have the system administrator reset the login ID or password.

Message: The Chassis Event Log (CEL) on system %1 cleared by user %2.
(%1 = CIM_ComputerSystem.ElementName; %2 = user ID)
Severity: Info
Description: A user has cleared the IMM event log.
Action: No action; information only.

Message: IMM reset was initiated by user %1.
(%1 = user ID)
Severity: Info
Description: A user has initiated a reset of the IMM.
Action: No action; information only.
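
The messages in this table are templates in which %1, %2, and %3 stand for values that the IMM fills in when it writes the event. As a minimal sketch of that substitution, assuming nothing about how the IMM firmware does it internally, the following Python fragment expands a template. The function name fill_template and the sample values are hypothetical and are used for illustration only.

```python
import re


def fill_template(template, *values):
    """Replace %1, %2, ... in a message template with the given values.

    Placeholders that have no matching value are left unchanged.
    """
    def substitute(match):
        index = int(match.group(1)) - 1  # %1 maps to values[0]
        if 0 <= index < len(values):
            return str(values[index])
        return match.group(0)

    return re.sub(r"%(\d+)", substitute, template)


# Hypothetical values for illustration only.
message = fill_template(
    "Attempting to %1 server %2 by user %3.",
    "Power Cycle", "SN#0000000", "USERID",
)
print(message)
```

A percent sign that is not followed by a digit, as in "75% full", is not treated as a placeholder and is left as it is.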
Message: Watchdog %1 Failed to Capture Screen.
(%1 = OS Watchdog or Loader Watchdog)
Severity: Error
Description: An operating-system error has occurred, and the screen capture failed.
Action:
1. Reconfigure the watchdog timer to a higher value.
2. Make sure that the IMM Ethernet over USB interface is enabled.
3. Reinstall the RNDIS or cdc_ether device driver for the operating system.
4. Disable the watchdog.
5. Check the integrity of the installed operating system.
6. Update the IMM firmware.
   Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.

Message: Running the backup IMM main application.
Severity: Error
Description: The IMM has resorted to running the backup main application.
Action: Update the IMM firmware.
Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.

Message: Please ensure that the IMM is flashed with the correct firmware. The IMM is unable to match its firmware to the server.
Severity: Error
Description: The server does not support the installed IMM firmware version.
Action: Update the IMM firmware to a version that the server supports.
Important: Some cluster solutions require specific code levels or coordinated code updates. If the device is part of a cluster solution, verify that the latest level of code is supported for the cluster solution before you update the code.

Message: IMM reset was caused by restoring default values.
Severity: Info
Description: The IMM has been reset because a user has restored the configuration to its default settings.
Action: No action; information only.

Message: IMM clock has been set from NTP server %1.
(%1 = Lenovo_NTPService.ElementName)
Severity: Info
Description: The IMM clock has been set to the date and time that is provided by the Network Time Protocol server.
Action: No action; information only.
Message: SSL data in the IMM configuration data is invalid. Clearing configuration data region and disabling SSL+H25.
Severity: Error
Description: There is a problem with the certificate that has been imported into the IMM. The imported certificate must contain a public key that corresponds to the key pair that was previously generated through the Generate a New Key and Certificate Signing Request link.
Action:
1. Make sure that the certificate that you are importing is correct.
2. Try to import the certificate again.

Message: Flash of %1 from %2 succeeded for user %3.
(%1 = CIM_ManagedElement.ElementName; %2 = Web or LegacyCLI; %3 = user ID)
Severity: Info
Description: A user has successfully updated one of the following firmware components:
- IMM main application
- IMM boot ROM
- Server firmware
- Diagnostics
- Integrated service processor
Action: No action; information only.

Message: Flash of %1 from %2 failed for user %3.
(%1 = CIM_ManagedElement.ElementName; %2 = Web or LegacyCLI; %3 = user ID)
Severity: Info
Description: An attempt to update a firmware component from the interface and IP address has failed.
Action: Try to update the firmware again.

Message: The Chassis Event Log (CEL) on system %1 is 75% full.
(%1 = CIM_ComputerSystem.ElementName)
Severity: Info
Description: The IMM event log is 75% full. When the log is full, older log entries are replaced by newer ones.
Action: To avoid losing older log entries, save the log as a text file and clear the log.

Message: The Chassis Event Log (CEL) on system %1 is 100% full.
(%1 = CIM_ComputerSystem.ElementName)
Severity: Info
Description: The IMM event log is full. When the log is full, older log entries are replaced by newer ones.
Action: To avoid losing older log entries, save the log as a text file and clear the log.

Message: IMM Test Alert Generated by %1.
(%1 = user ID)
Severity: Info
Description: A user has generated a test alert from the IMM.

Message: %1 Platform Watchdog Timer expired for %2.
(%1 = OS Watchdog or Loader Watchdog; %2 = OS Watchdog or Loader Watchdog)
Severity: Error
Description: A Platform Watchdog Timer Expired event has occurred.
Action:
1. Reconfigure the watchdog timer to a higher value.
2. Make sure that the IMM Ethernet over USB interface is enabled.
3. Reinstall the RNDIS or cdc_ether device driver for the operating system.
4. Disable the watchdog.
5. Check the integrity of the installed operating system.
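
The event-log entries in this table say that when the log is full, older entries are replaced by newer ones. The following Python sketch models that behavior with a fixed-size buffer so that the effect of not saving the log is easy to see. It is an illustration only: the class name BoundedEventLog and the capacity of four entries are hypothetical, and the actual size and internal layout of the IMM event log are not described here.

```python
from collections import deque


class BoundedEventLog:
    """Toy model of a fixed-size event log that overwrites its oldest entries."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical size, not the real IMM limit
        self.entries = deque(maxlen=capacity)  # a full deque drops its oldest item

    def add(self, entry):
        self.entries.append(entry)
        return self.fill_state()

    def fill_state(self):
        used = len(self.entries)
        if used >= self.capacity:
            return "100% full"
        if used * 4 >= self.capacity * 3:
            return "75% full"
        return "ok"


log = BoundedEventLog(capacity=4)
states = [log.add(f"event {n}") for n in range(1, 6)]
print(states)
print(list(log.entries))
```

In this model the fifth entry pushes out the first one, which is why the suggested action is to save the log as a text file before clearing it.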
Message: Security: Userid: '%1' had %2 login failures from an SSH client at IP address %3.
(%1 = user ID; %2 = MaximumSuccessiveLoginFailures (currently set to 5 in the firmware); %3 = IP address, xxx.xxx.xxx.xxx)
Severity: Error
Description: A user has exceeded the maximum number of unsuccessful login attempts from SSH and has been prevented from logging in for the lockout period.
Action:
1. Make sure that the correct login ID and password are being used.
2. Have the system administrator reset the login ID or password.
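
When an event log has been saved as a text file, the login-failure messages can be picked out by pattern. The following Python sketch is one way to do that, assuming that each saved line contains exactly the message text shown in these tables. The function name parse_login_failure and the sample addresses are hypothetical, and the saved-log format of a particular IMM firmware level might differ.

```python
import re

# Matches the WEB client, CLI, and SSH client login-failure messages.
LOGIN_FAILURE = re.compile(
    r"Security: (?:Userid|Login ID): '(?P<user>[^']*)' had (?P<count>\d+) "
    r"login failures from (?P<source>.+?) at (?:IP address )?(?P<address>\S+?)\.$"
)


def parse_login_failure(line):
    """Return the fields of a login-failure message, or None if it is another message."""
    match = LOGIN_FAILURE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["count"] = int(fields["count"])
    return fields


# Sample lines built from the message templates; the values are made up.
ssh_event = parse_login_failure(
    "Security: Userid: 'USERID' had 5 login failures from an SSH client "
    "at IP address 192.0.2.10."
)
cli_event = parse_login_failure(
    "Security: Login ID: 'USERID' had 5 login failures from CLI at 192.0.2.11."
)
print(ssh_event)
print(cli_event)
```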
Troubleshooting tables
Use the troubleshooting tables to find solutions to problems that have identifiable
symptoms.
If you cannot find a problem in these tables, see “Running the diagnostic programs”
on page 90 for information about testing the server.
If you have just added new software or a new optional device and the server is not
working, complete the following steps before you use the troubleshooting tables:
1. Check the operator information panel and the EasyLED diagnostics LEDs (see
“EasyLED diagnostics” on page 76).
2. Remove the software or device that you just added.
3. Run the diagnostic tests to determine whether the server is running correctly.
4. Reinstall the new software or new device.
DVD drive problems
Symptom: The DVD drive is not recognized.
Action:
1. Make sure that:
   - The SATA channel to which the DVD drive is attached (primary or secondary) is enabled in the Setup utility.
   - All cables and jumpers are installed correctly.
   - The signal cable and connector are not damaged and the connector pins are not bent.
   - The correct device driver is installed for the DVD drive.
2. Run the DVD drive diagnostic programs.
3. Reseat the following components:
   a. DVD drive
   b. DVD drive cables
4. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. DVD drive
   b. DVD drive and cables
   c. (Trained service technician only) System board

Symptom: A DVD is not working correctly.
Symptom: The DVD drive tray is not working.
Action:
1. Make sure that the server is turned on.
2. Insert the end of a straightened paper clip into the manual tray-release opening.
3. Reseat the DVD drive.
4. Replace the DVD drive.
General problems
Symptom: A cover lock is broken, an LED is not working, or a similar problem has occurred.
Action: If the part is a CRU, replace it. If the part is a FRU, the part must be replaced by a trained service technician.
Hard disk drive problems
Symptom: Not all drives are recognized by the hard disk drive diagnostic tests.
Action: Remove the drive that is indicated by the diagnostic tests; then, run the hard disk drive diagnostic tests again. If the remaining drives are recognized, replace the drive that you removed with a new one.

Symptom: The server stops responding during the hard disk drive diagnostic test.
Action: Remove the hard disk drive that was being tested when the server stopped responding, and run the diagnostic test again. If the hard disk drive diagnostic test runs successfully, replace the drive that you removed with a new one.

Symptom: A hard disk drive was not detected while the operating system was being started.
Action: Reseat all hard disk drives and cables; then, run the hard disk drive diagnostic tests again.
Symptom: A hard disk drive passes the diagnostic Fixed Disk Test, but the problem remains.
Action: Run the diagnostic SCSI Fixed Disk Test (see “Running the diagnostic programs” on page 90).
Note: This test is not available on servers that have RAID arrays or servers that have SATA hard disk drives.
Intermittent problems
Symptom: A problem occurs only occasionally and is difficult to diagnose.
Action:
1. Make sure that:
   - All cables and cords are connected securely to the rear of the server and attached devices.
   - When the server is turned on, air is flowing from the fan grille. If there is no airflow, the fan is not working. This can cause the server to overheat and shut down.
2. Check the system-event log or IMM log (see “Event logs” on page 27).
3. See “Solving undetermined problems” on page 124.
Keyboard, mouse, or pointing-device problems
Symptom: All or some keys on the keyboard do not work.
Action:
1. Make sure that:
   - The keyboard cable is securely connected.
   - The server and the monitor are turned on.
2. See http://www.lenovo.com/thinkserver and then click Options. Open the Server Options Guide.pdf for keyboard compatibility.
3. If you are using a USB keyboard, run the Setup utility and enable keyboardless operation to prevent the 301 POST error message from being displayed during startup.
4. If you are using a USB keyboard and it is connected to a USB hub, disconnect the keyboard from the hub and connect it directly to the server.
5. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. Keyboard
   b. (Trained service technician only) System board

Symptom: The mouse or pointing device does not work.
Action:
1. Make sure that:
   - The mouse or pointing device is compatible with the server. See http://www.lenovo.com/thinkserver and then click Options. Open the Server Options Guide.pdf.
   - The mouse or pointing-device cable is securely connected to the server.
   - The mouse or pointing-device device drivers are installed correctly.
   - The server and the monitor are turned on.
   - The mouse is enabled in the Setup utility.
2. If you are using a USB mouse or pointing device and it is connected to a USB hub, disconnect the mouse or pointing device from the hub and connect it directly to the server.
3. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. Mouse or pointing device
   b. (Trained service technician only) System board
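
Many of the action lists in these tables end with the same instruction: replace components one at a time, in the order shown, restarting the server each time. The following Python sketch spells out that loop. It is only a model of the manual procedure: the function replace_in_order and the callback problem_persists_after are hypothetical names, and the callback stands in for the technician replacing the part, restarting the server, and checking the symptom again.

```python
def replace_in_order(components, problem_persists_after):
    """Return the first component whose replacement clears the problem, or None.

    components is the ordered list from the Action column.
    problem_persists_after(component) represents replacing that one component,
    restarting the server, and reporting whether the symptom is still present.
    """
    for component in components:
        if not problem_persists_after(component):
            return component  # stop here; later parts do not need replacing
    return None  # every listed part was replaced and the symptom remains


# Simulated example: pretend that the keyboard is the faulty part.
order = ["Keyboard", "System board (trained service technician only)"]
resolved_by = replace_in_order(order, lambda part: part != "Keyboard")
print(resolved_by)
```

Stopping at the first replacement that clears the symptom is what keeps the more expensive, technician-only parts at the end of each list from being replaced unnecessarily.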
Memory problems
Symptom: The amount of system memory that is displayed is less than the amount of installed physical memory.
Action:
1. Make sure that:
   - No error LEDs are lit on the operator information panel or on the DIMM.
   - Memory mirroring does not account for the discrepancy.
   - The memory modules are seated correctly.
   - You have installed the correct type of memory.
   - If you changed the memory, you updated the memory configuration in the Setup utility.
   - All banks of memory are enabled. The server might have automatically disabled a memory bank when it detected a problem, or a memory bank might have been manually disabled.
2. Check the POST event log for DIMM error messages:
   - If a DIMM was disabled by a system-management interrupt (SMI), replace the DIMM.
   - If a DIMM was disabled by the user or by POST, run the Setup utility and enable the DIMM.
3. Run memory diagnostics (see “Running the diagnostic programs” on page 90).
4. Make sure that there is no memory mismatch when the server is at the minimum memory configuration (two 512 MB DIMMs; see the information about the minimum required configuration in “Solving undetermined problems” on page 124).
5. Add one pair of DIMMs at a time, making sure that the DIMMs in each pair are matching.
6. Reseat the DIMMs.
7. Replace the components in step 6, one at a time, in the order shown, restarting the server each time.

Symptom: Multiple rows of DIMMs in a branch are identified as failing.
Action:
1. Reseat the DIMMs; then, restart the server.
2. Replace the lowest-numbered DIMMs with identical known good DIMMs; then, restart the server. Repeat as necessary. If the failures continue after all identified pairs are replaced, go to step 4.
3. Return the removed DIMMs, one pair at a time, to their original connectors, restarting the server after each pair, until a pair fails. Replace each DIMM in the failed pair with an identical known good DIMM, restarting the server after you reinstall each DIMM. Replace the failed DIMM. Repeat step 3 until you have tested all removed DIMMs.
4. (Trained service technician only) Replace the system board.
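
For the first symptom in this table, one of the checks is whether memory mirroring accounts for the difference between installed and displayed memory. The following Python sketch shows that arithmetic under a stated assumption: that mirroring keeps a second copy of all data and therefore makes half of the installed capacity usable. This manual does not state that figure, and the function names are hypothetical. The sketch also ignores any memory that the firmware or operating system reserves.

```python
def expected_usable_mb(dimm_sizes_mb, mirroring_enabled=False):
    """Return the usable memory in MB, assuming that mirroring halves capacity."""
    total = sum(dimm_sizes_mb)
    return total // 2 if mirroring_enabled else total


def unexplained_shortfall_mb(dimm_sizes_mb, displayed_mb, mirroring_enabled=False):
    """Return how much installed memory is missing after allowing for mirroring."""
    expected = expected_usable_mb(dimm_sizes_mb, mirroring_enabled)
    return max(0, expected - displayed_mb)


# Two 2048 MB DIMMs with 2048 MB displayed.
with_mirroring = unexplained_shortfall_mb([2048, 2048], 2048, mirroring_enabled=True)
without_mirroring = unexplained_shortfall_mb([2048, 2048], 2048, mirroring_enabled=False)
print(with_mirroring, without_mirroring)
```

A shortfall of zero with mirroring enabled means that mirroring alone explains the displayed amount; a nonzero shortfall points to the remaining checks in the list.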
Microprocessor problems
Symptom: The server emits a continuous beep during POST, indicating that the startup (boot) microprocessor is not working correctly.
Action:
1. Correct any errors that are indicated by the EasyLED diagnostics LEDs (see “EasyLED diagnostics” on page 76).
2. Make sure that the server supports all the microprocessors and that the microprocessors match in speed and cache size.
3. (Trained service technician only) Reseat microprocessor 1.
4. (Trained service technician only) If there is no indication of which microprocessor has failed, isolate the error by testing with one microprocessor at a time.
5. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. (Trained service technician only) Microprocessor 2
   b. VRM 2
   c. (Trained service technician only) System board
6. (Trained service technician only) If multiple error codes or EasyLED diagnostics LEDs indicate a microprocessor error, reverse the locations of two microprocessors to determine whether the error is associated with a microprocessor or with a microprocessor socket.
   - If the error is associated with a microprocessor, replace the microprocessor.
   - If the error is associated with a VRM, replace the VRM.
   - If the error is associated with a microprocessor socket, replace the system board.
Monitor problems
Some Lenovo monitors have their own self-tests. If you suspect a problem with your
monitor, see the documentation that comes with the monitor for instructions for
testing and adjusting the monitor.
Symptom: Testing the monitor
Action:
1. Make sure that the monitor cables are firmly connected.
2. Try using a different monitor on the server, or try using the monitor that is being tested on a different server.
3. Run the diagnostic programs. If the monitor passes the diagnostic programs, the problem might be a video device driver.
4. (Trained service technician only) Replace the system board.

Symptom: The screen is blank.
Action:
1. If the server is attached to a KVM switch, bypass the KVM switch to eliminate it as a possible cause of the problem: connect the monitor cable directly to the correct connector on the rear of the server.
2. Make sure that:
   - The server is turned on. If there is no power to the server, see “Power problems” on page 73.
   - The monitor cables are connected correctly.
   - The monitor is turned on and the brightness and contrast controls are adjusted correctly.
   - No POST errors are generated when the server is turned on.
3. Make sure that the correct server is controlling the monitor, if applicable.
4. See “Solving undetermined problems” on page 124.

Symptom: The monitor works when you turn on the server, but the screen goes blank when you start some application programs.
Action:
1. Make sure that:
   - The application program is not setting a display mode that is higher than the capability of the monitor.
   - You installed the necessary device drivers for the application.
2. Run video diagnostics (see “Running the diagnostic programs” on page 90).
   - If the server passes the video diagnostics, the video is good; see “Solving undetermined problems” on page 124.
   - (Trained service technician only) If the server fails the video diagnostics,
Symptom: The monitor has screen jitter, or the screen image is wavy, unreadable, rolling, or distorted.
Action:
1. If the monitor self-tests show that the monitor is working correctly, consider the location of the monitor. Magnetic fields around other devices (such as transformers, appliances, fluorescent lights, and other monitors) can cause screen jitter or wavy, unreadable, rolling, or distorted screen images. If this happens, turn off the monitor.
   Attention: Moving a color monitor while it is turned on might cause screen discoloration.
   Move the device and the monitor at least 305 mm (12 in.) apart, and turn on the monitor.
   Notes:
   a. To prevent diskette drive read/write errors, make sure that the distance between the monitor and any external diskette drive is at least 76 mm (3 in.).
   b. Non-Lenovo monitor cables might cause unpredictable problems.
2. Reseat the monitor.
3. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. Monitor
   b. (Trained service technician only) System board

Symptom: Wrong characters appear on the screen.
Action:
1. If the wrong language is displayed, update the server firmware with the correct language (see “Updating the firmware” on page 267).
2. Reseat the monitor.
3. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. Monitor
   b. (Trained service technician only) System board
Optional-device problems
Symptom: An optional device that was just installed does not work.
Action:
1. Make sure that:
   - The device is designed for the server (see http://www.lenovo.com/thinkserver and then click Options. Open the Server Options Guide.pdf).
   - You followed the installation instructions that came with the device and the device is installed correctly.
   - You have not loosened any other installed devices or cables.
   - You updated the configuration information in the Setup utility. Whenever memory or any other device is changed, you must update the configuration.
2. Reseat the device that you just installed.
3. Replace the device that you just installed.

Symptom: An optional device that used to work does not work now.
Action:
1. Make sure that all of the hardware and cable connections for the device are secure.
2. If the device comes with test instructions, use those instructions to test the device.
3. If the failing device is a SCSI device, make sure that:
   - The cables for all external SCSI devices are connected correctly.
   - The last device in each SCSI chain, or the end of the SCSI cable, is terminated correctly.
   - Any external SCSI device is turned on. You must turn on an external SCSI
Power problems
Symptom: The power-control button does not work (the server does not start).
Note: The power-control button will not function until 3 minutes after the server has been connected to ac power.
Action:
1. Make sure that the power-control button is working correctly:
   a. Disconnect the server power cords.
   b. Reconnect the power cords.
   c. (Trained service technician only) Reseat the operator information panel cables, and then repeat steps 1a and 1b. If the server starts, reseat the operator information panel. If the problem remains, replace the operator information panel.
2. Make sure that:
   - The power cords are correctly connected to the server and to a working electrical outlet.
   - The type of memory that is installed is correct.
   - The DIMM is fully seated.
   - The LEDs on the power supply do not indicate a problem.
   - The microprocessors are installed in the correct sequence.
3. Reseat the following components:
   a. DIMMs
   b. (Trained service technician only) Power switch connector
   c. (Trained service technician only) Power backplane
4. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. DIMMs
   b. (Trained service technician only) Power switch connector
   c. (Trained service technician only) Power backplane
   d. (Trained service technician only) System board
5. If you just installed an optional device, remove it, and restart the server. If the server now starts, you might have installed more devices than the power supply supports.
6. See “Power-supply LEDs” on page 88.
7. See “Solving undetermined problems” on page 124.

Symptom: The server does not turn off.
Action:
1. Determine whether you are using an Advanced Configuration and Power Interface (ACPI) or a non-ACPI operating system. If you are using a non-ACPI operating system, complete the following steps:
   a. Press Ctrl+Alt+Delete.
   b. Turn off the server by pressing the power-control button for 5 seconds.
   c. Restart the server.
   d. If the server fails POST and the power-control button does not work, disconnect the power cord for 20 seconds; then, reconnect the power cord and restart the server.
2. If the problem remains or if you are using an ACPI-aware operating system, suspect the system board.
Symptom: The server unexpectedly shuts down, and the LEDs on the operator information panel are not lit.
Action: See “Solving undetermined problems” on page 124.
Serial port problems
Symptom: The number of serial ports that are identified by the operating system is less than the number of installed serial ports.
Action:
1. Make sure that:
   - Each port is assigned a unique address in the Setup utility and none of the serial ports is disabled.
   - The serial port adapter (if one is present) is seated correctly.
2. Reseat the serial port adapter.
3. Replace the serial port adapter.

Symptom: A serial device does not work.
Action:
1. Make sure that:
   - The device is compatible with the server.
   - The serial port is enabled and is assigned a unique address.
   - The device is connected to the correct connector.
2. Reseat the following components:
   a. Failing serial device
   b. Serial cable
3. Replace the following components one at a time, in the order shown, restarting the server each time:
   a. Failing serial device
   b. Serial cable
   c. (Trained service technician only) System board
Software problems
Symptom: You suspect a software problem.
Action:
1. To determine whether the problem is caused by the software, make sure that:
   - The server has the minimum memory that is needed to use the software. For memory requirements, see the information that comes with the software. If you have just installed an adapter or memory, the server might have a memory-address conflict.
   - The software is designed to operate on the server.
   - Other software works on the server.
   - The software works on another server.
2. If you receive any error messages while you use the software, see the information that comes with the software for a description of the messages and suggested solutions to the problem.
3. Contact the software vendor.
Universal Serial Bus (USB) port problems
• Follow the suggested actions in the order in which they are listed in the Action column until the
  problem is solved.
• See Chapter 8, “Parts Listing, TD200x Machine Types 3719, 3821, 3822, and 3823,” on page 237 to
  determine which components are customer replaceable units (CRU) and which components are field
  replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed
  only by a trained service technician.

Symptom: A USB device does not work.
Action:
1. Run USB diagnostics (see “Running the diagnostic programs” on page 90).
2. Make sure that:
   • The correct USB device driver is installed.
   • The operating system supports USB devices.
   • A standard PS/2 keyboard or mouse is not connected to the server. If it is, a USB keyboard or
     mouse will not work during POST.
3. Make sure that the USB configuration options are set correctly in the Setup utility (see “Setup
   Utility menu choices” on page 252 for more information).
4. If you are using a USB hub, disconnect the USB device from the hub and connect it directly to
   the server.
EasyLED diagnostics
EasyLED diagnostics is a system of LEDs on various external and internal
components of the server. When an error occurs, LEDs are lit throughout the
server. By viewing the LEDs in a particular order, you can often identify the source
of the error.
When LEDs are lit to indicate an error, they remain lit when the server is turned off,
provided that the server is still connected to power and the power supply is
operating correctly.
Before you work inside the server to view the EasyLED diagnostics LEDs, read the
safety information that begins on page 5.
If an error occurs, view the EasyLED diagnostics LEDs in the following order:
1. Look at the operator information panel LEDs on the front of the server.
   • If an operator information panel LED is lit, it indicates that information about a
     suboptimal condition in the server is available in the system-event log.
   • If the system-error LED is lit, it indicates that an error has occurred; go to
     step 2 on page 77.
The following illustration shows the operator information panel LEDs that are
visible through the bezel.
1 System power-on LED
2 Hard disk drive activity LED
3 System-locator LED
4 System-information LED
5 System-error LED
The following table lists the operator information panel LEDs, the problems that
they indicate, and actions to solve the problems.
• Follow the suggested actions in the order in which they are listed in the Action column until the
  problem is solved.
• See Chapter 8, “Parts Listing, TD200x Machine Types 3719, 3821, 3822, and 3823,” on page 237 to
  determine which components are customer replaceable units (CRU) and which components are field
  replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed
  only by a trained service technician.
Lit EasyLED diagnostics LEDs with the system-error or information LED also lit; Description

LED: System power (green)
Description:
• Off: AC power is not present, or the power supply or the LED itself has failed.
• Flashing rapidly (4 times per second): The server is turned off and is not ready to be turned
  on. The power-control button is disabled. Approximately 3 minutes after the server is connected
  to ac power, the power-control button becomes active.
• Flashing slowly (once per second): The server is turned off and is ready to be turned on. You
  can press the power-control button to turn on the server.
• Lit: The server is turned on.
• Fading on and off: The server is in a reduced-power state. To wake the server, press the
  power-control button or use the IMM Web interface.

LED: Hard disk drive activity (green)
Description: When this LED is flashing rapidly, it indicates that there is activity on a hard disk
drive.

LED: System information (amber)
Description: When this amber LED is lit, it indicates that information about a suboptimal
condition in the server is available in the IMM event log or in the system-event log. Check the
EasyLED panel for more information.

LED: System error (amber)
Description: When this LED is lit, it indicates that a system error has occurred. Use the EasyLED
panel and the system service label to further isolate the error.
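The system power LED states described above can be captured as a small lookup, which may be handy in a service checklist script. The following is only a sketch: the names PowerLed, POWER_LED_MEANING, and power_button_turns_on are illustrative and are not part of any Lenovo tool, and the meanings are paraphrased from the table.

```python
from enum import Enum


class PowerLed(Enum):
    """Observed states of the system power LED (illustrative names)."""
    OFF = "off"
    FLASHING_RAPIDLY = "flashing rapidly"  # about 4 times per second
    FLASHING_SLOWLY = "flashing slowly"    # about once per second
    LIT = "lit"
    FADING = "fading on and off"


# Meanings paraphrased from the system power LED description.
POWER_LED_MEANING = {
    PowerLed.OFF: "ac power is not present, or the power supply or the LED has failed",
    PowerLed.FLASHING_RAPIDLY: "server is off and not ready; power-control button is disabled",
    PowerLed.FLASHING_SLOWLY: "server is off and ready to be turned on",
    PowerLed.LIT: "server is turned on",
    PowerLed.FADING: "server is in a reduced-power state",
}


def power_button_turns_on(state):
    """Return True only for the state in which pressing the button turns on the server."""
    return state is PowerLed.FLASHING_SLOWLY
```

For example, power_button_turns_on(PowerLed.FLASHING_RAPIDLY) is False, which matches the note that the power-control button stays disabled for approximately 3 minutes after ac power is connected.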
2. Look at the EasyLED panel on the front of the server. Lit LEDs on the EasyLED
panel indicate the type of error that has occurred.
The following illustration shows the EasyLED panel LEDs that are visible
through the bezel.
1 Service processor bus
2 Microprocessor
3 VRM
4 Microprocessor/memory configuration
5 Memory
6 NMI
7 Hard disk drive/RAID
8 Power supply
9 Fan
10 PCI bus
11 System board
12 Temperature
13 System-event log
14 USB ports
The following table lists the EasyLED diagnostics LEDs, the problems that they
indicate, and actions to solve the problems.
• Follow the suggested actions in the order in which they are listed in the Action column until the
  problem is solved.
• See Chapter 8, “Parts Listing, TD200x Machine Types 3719, 3821, 3822, and 3823,” on page 237 to
  determine which components are customer replaceable units (CRU) and which components are field
  replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed
  only by a trained service technician.
Lit EasyLED diagnostics LED with the system-error or information LED also lit; Description; Action

LED: System-event log (LOG)
Description: A system error occurred.
Action: View the contents of the system-event log (see “Event logs” on page 27).

LED: Temperature
Description: The system temperature has exceeded a threshold level.
Action:
1. See the system-event log for the source of the fault (see “System-event log” on page 38).
2. Make sure that the airflow in the server is not blocked.
3. Make sure that the room temperature is neither too hot nor too cold (see “Environment” in
   “Specifications” on page 17).
LED: System board (BRD)
Description: An error occurred on the system board.
Action:
1. Check the LEDs on the system board to identify the component that is causing the error. The
   BRD LED can be lit for the following conditions:
   • Failed or missing battery
   • Failed voltage regulator
2. Check the system-event log for information about the error.
3. Replace any failed or missing replaceable components, such as the battery.
4. (Trained service technician only) If a voltage regulator has failed, replace the system board.

LED: PCI bus
Description: A PCI adapter has failed.
Action:
1. See the system-event log (see “System-event log” on page 38).
2. Check the LEDs on the PCI slots to identify the component that is causing the error, and
   reseat the failing adapter.
3. Replace the following components one at a time, in the order shown, restarting the server
   each time:
   a. Failing adapter
   b. (Trained service technician only) System board

LED: Fan
Description: A fan has failed or is operating too slowly.
Action:
1. Reinstall the removed fan.
2. If an individual fan LED is lit, replace the fan.
3. (Trained service technician only) Replace the system board.

LED: Power supply
Description: A power supply has failed or has been removed.
Note: In a redundant power configuration, the dc power LED on one power supply might be off.
Action:
1. Check the individual power-supply LEDs.
2. Reseat the following components:
   a. Power supply
   b. (Trained service technician only) Power-supply cage cables
3. Replace the following components one at a time, in the order shown, restarting the server
   each time:
   a. Power supply
   b. (Trained service technician only) Power-supply cage
LED: DASD/RAID
Description: A hard disk drive, SAS controller, or RAID adapter error has occurred.
Notes:
1. This LED is also lit when a hard disk drive is removed from the server.
2. The error LED on the failing hard disk drive is also lit.
3. Check the system-event log for a RAID error.
Action:
1. Reinstall the removed drive.
2. Reseat the following components:
   a. Failing hard disk drive
   b. SAS hard disk drive backplane
   c. SAS signal and power cables
   d. System board
   e. ServeRAID adapter
3. Replace the components listed in step 2 one at a time, in the order shown, restarting the
   server each time.

LED: NMI
Description: A hardware error has been reported to the operating system.
Action:
1. See the system-event log (see “System-event log” on page 38).
2. If the PCI LED is lit, follow the instructions for that LED.
3. If the MEM LED is lit, follow the instructions for that LED.
LED: Memory (MEM)
Description: A memory error has occurred.
Note: The error LED on the DIMM is also lit.
Action:
1. Determine whether the CNFG LED is also lit, which indicates that the memory configuration is
   invalid. Reinstall the DIMMs in a supported configuration.
2. If the CNFG LED is not lit, one of the following conditions might be present:
   • The server did not start and a failing DIMM LED is lit:
     a. Check for a PFA log event in the system-event log.
     b. Reseat the DIMM.
     c. Move the DIMM to a different slot or replace the DIMM.
     d. (Trained service technician only) Replace the system board.
   • The server started, the failing DIMM is disabled, and the LED is lit:
     a. If the LEDs are lit by two DIMMs, check the system-event log for a PFA event on one of
        the DIMMs, and then replace that DIMM. Otherwise, replace both DIMMs.
     b. If the LED is lit by only one DIMM, replace that DIMM.
     c. Re-enable the DIMM, using the Setup utility.

LED: Microprocessor/Memory Configuration (CNFG)
Description: A hardware configuration error has occurred. (This LED is used with the MEM, VRM,
and CPU LEDs.)
Action:
1. (The system error LED, CPU LED, and this LED are lit when POST detects a microprocessor
   mismatch.) Remove and install two microprocessors of the same cache size, type, and clock
   speed.
2. (The system error LED, MEM LED, and this LED are lit when POST detects an invalid memory
   configuration.) Remove and install supported DIMMs (see “Installing a memory module” on
   page 211).
3. (The system error LED, VRM LED, and this LED are lit when POST detects a missing VRM.)
   Install a VRM for microprocessor 2 (see “Installing a voltage regulator module” on page 180).
4. Check the system error log for information indicating incompatible components.
LED: VRM
Description: A VRM has failed.
Action:
1. Check the system-event log to determine the reason for the lit LED (for a VRM).
2. Determine whether the CNFG LED is also lit. If the CNFG LED is lit, the memory configuration
   is invalid. Reseat the VRM.
3. If the CNFG LED is not lit, reseat the following components:
   a. Failing VRM
   b. (Trained service technician only) Microprocessor associated with the VRM
4. Replace the following components one at a time, in the order shown, restarting the server
   each time:
   a. Failing VRM
   b. (Trained service technician only) Microprocessor associated with the VRM
   c. (Trained service technician only) System board

LED: Microprocessor (CPU)
Description: A microprocessor has failed, or an invalid microprocessor configuration is installed.
Note: (Trained service technician only) Make sure that the microprocessors are installed in the
correct sequence.
Action:
1. Check the system-event log to determine the reason for the lit LED.
2. Determine whether the CNFG LED is also lit. If the CNFG LED is not lit, a microprocessor has
   failed.
   a. Make sure that the failing microprocessor, which is indicated by the CPU1 or CPU2 error
      LED on the system board, is installed correctly.
   b. Replace the following components one at a time, in the order shown, restarting the server
      each time:
      1) (Trained service technician only) Failing microprocessor
      2) (Trained service technician only) System board
3. If the CNFG LED is lit and the CPU mismatch LED on the system board is also lit, an invalid
   microprocessor configuration is installed:
   a. Make sure that the microprocessors are compatible with each other. They must match in
      speed and cache size. Use the Setup utility to compare the microprocessor information.
   b. (Trained service technician only) Replace the incompatible microprocessor.
LED: Service processor bus (SP BUS)
Description: The IMM detects an internal error.
Action:
1. Disconnect the server from ac power; then, reconnect the server to power and restart the
   server.
2. Update the IMM firmware.
Look at the system service label on the top of the server, which gives an
overview of internal components that correspond to the LEDs on the EasyLED
panel. This label often provides enough information to diagnose the error.
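The CNFG entry in the EasyLED table earlier in this step is read together with the CPU, MEM, and VRM LEDs. The following sketch shows one way to express that pairing; the function name diagnose_cnfg and the short LED labels used as input are assumptions made for illustration, and the returned strings paraphrase the table.

```python
# Conditions that POST reports when CNFG is lit together with another LED,
# paraphrased from the CNFG entry in the EasyLED table.
CNFG_COMPANIONS = {
    "CPU": "microprocessor mismatch detected by POST",
    "MEM": "invalid memory configuration detected by POST",
    "VRM": "missing VRM detected by POST",
}


def diagnose_cnfg(lit_leds):
    """Return the configuration findings implied by the lit EasyLED panel LEDs.

    lit_leds is any iterable of LED labels such as "CNFG", "CPU", "MEM", "VRM".
    An empty list means the CNFG LED is not lit, so this helper has nothing to say.
    """
    lit = {name.upper() for name in lit_leds}
    if "CNFG" not in lit:
        return []
    findings = [meaning for led, meaning in CNFG_COMPANIONS.items() if led in lit]
    # With CNFG lit alone, the table's last step is to consult the error log.
    return findings or ["check the system error log for incompatible components"]
```

For example, diagnose_cnfg(["CNFG", "MEM"]) points to an invalid memory configuration, while diagnose_cnfg(["MEM"]) returns an empty list because the MEM entry, not the CNFG entry, applies in that case.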
3. Remove the server cover and look inside the server for lit LEDs. Certain
components inside the server have LEDs that are lit to indicate the location of a
problem.
The following illustration shows the LEDs on the system board.
1 PCI slot 1 error LED
2 PCI slot 2 error LED
3 PCI slot 3 error LED
4 H8 heartbeat LED
5 PCI slot 4 error LED
9 Battery error LED
10 System-board error LED
11 VRM fail LED
12 CPU 1 error LED
13 DIMMs 1-8 error LEDs (starting
The system board is equipped with a PCI extender card that provides either one
or two additional expansion slots. The following illustration shows the LEDs on
the PCI Express extender card, if one is installed.
The following illustration shows the LEDs on the PCI-X extender card, if one is
installed.
The following table describes the LEDs on the system board and extender card
and suggested actions to correct the detected problems.
• Follow the suggested actions in the order in which they are listed in the Action column until the
  problem is solved.
• See Chapter 8, “Parts Listing, TD200x Machine Types 3719, 3821, 3822, and 3823,” on page 237 to
  determine which components are customer replaceable units (CRU) and which components are field
  replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed
  only by a trained service technician.
Lit EasyLED diagnostics LED with the system-error or information LED also lit; Description; Action

LED: DIMM 1 to DIMM 16 error LEDs
Description: A DIMM has failed or is incorrectly installed.
Action:
1. Remove the DIMM that is indicated by a lit error LED.
2. Reseat the DIMM.
3. Replace the following components one at a time, in the order shown, restarting the server
   each time:
   a. DIMM
   b. (Trained service technician only) System board

LED: CPU 1 error LED
Description: Microprocessor 1 has failed, is missing, or has been incorrectly installed.
Note: (Trained service technician only) Make sure that the microprocessors are installed in the
correct sequence; see “Installing a microprocessor and heat sink” on page 220.
Action:
1. Check the system-event log to determine the reason for the lit LED.
2. (Trained service technician only) Reseat the failing microprocessor.
3. Replace the following components one at a time, in the order shown, restarting the server
   each time:
   a. (Trained service technician only) Failing microprocessor
   b. (Trained service technician only) System board
LED: CPU 2 error LED
Description: Microprocessor 2 has failed, is missing, or has been incorrectly installed.
Note: (Trained service technician only) Make sure that the microprocessors are installed in the
correct sequence; see “Installing a microprocessor and heat sink” on page 220.
Action:
1. Check the system-event log to determine the reason for the lit LED.
2. Find the failing, missing, or mismatched microprocessor by checking the LEDs on the system
   board.
3. (Trained service technician only) Reseat the failing microprocessor.
4. Replace the following components one at a time, in the order shown, restarting the server
   each time:
   a. (Trained service technician only) Failing microprocessor
   b. (Trained service technician only) System board

LED: CPU mismatch LED
Description: A mismatched microprocessor has been installed.
Note: All microprocessors must have the same speed and cache size.
Action:
1. Run the Setup utility and view the microprocessor information to compare the installed
   microprocessor specifications.
2. (Trained service technician only) Remove and replace one of the microprocessors so that they
   both match.

LED: VRM failure LED
Description: Microprocessor 2 VRM has failed or is incorrectly installed.
Action:
1. Reseat the VRM.
2. Replace the following components one at a time, in the order shown, restarting the server
   each time:
   a. VRM
   b. (Trained service technician only) System board
3. Replace the VRM.

LED: System-board error LED
Description: System-board CPU VRD, power voltage regulators, or both have failed.
Action: (Trained service technician only) Replace the system board.

LED: Battery failure LED
Description: Battery low.
Action:
1. Replace the CMOS lithium battery, if necessary.
2. (Trained service technician only) Replace the system board.
LED: IMM heartbeat LED
Description: Indicates the status of the boot process of the IMM. When the server is connected to
power, this LED flashes quickly to indicate that the IMM code is loading. When the loading is
complete, the LED stops flashing briefly and then flashes slowly to indicate that the IMM is fully
operational and you can press the power-control button to start the server.
Action: If the LED does not begin flashing within 30 seconds of when the server is connected to
power, complete the following steps:
1. (Trained service technician only) Use the IMM recovery switch to recover the firmware (see
   Table 10 on page 144).
2. (Trained service technician only) Replace the system board.

LED: PCI slot 1 to PCI slot 8 error LEDs
Description: An error has occurred on a PCI bus or on the system board. An additional LED is lit
next to a failing PCI slot.
Action:
1. Check the system-event log for information about the error.
2. If you cannot isolate the failing adapter through the LEDs and the information in the
   system-event log, remove one adapter at a time, and restart the server after each adapter is
   removed.
3. If the failure remains, go to http://www.lenovo.com/support for additional troubleshooting
   information.

LED: H8 heartbeat LED
Description: Indicates the status of power-on and power-off sequencing.
Action:
1. If the H8 heartbeat LED is blinking at a 1 Hz rate, no action is necessary.
2. (Trained service technician only) If the H8 heartbeat LED is not blinking, replace the system
   board.
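The H8 heartbeat check above depends on judging a 1 Hz blink rate. As a rough sketch, the rate can be estimated from the times at which the LED is seen to turn on. The function names, the timestamp input, and the 0.2 Hz tolerance are all assumptions for illustration; the manual does not define a measurement method or a tolerance.

```python
def blink_rate_hz(on_times):
    """Estimate the blink rate from timestamps (in seconds) of successive LED-on events."""
    if len(on_times) < 2:
        return 0.0  # fewer than two observations: treat the LED as not blinking
    span = on_times[-1] - on_times[0]
    return (len(on_times) - 1) / span if span > 0 else 0.0


def h8_heartbeat_action(on_times, tolerance=0.2):
    """Map an observed H8 heartbeat pattern to the action listed in the table."""
    rate = blink_rate_hz(on_times)
    if abs(rate - 1.0) <= tolerance:
        return "no action is necessary"
    if rate == 0.0:
        return "(Trained service technician only) replace the system board"
    # The table only covers "1 Hz" and "not blinking".
    return "blink rate is not 1 Hz; the table does not specify an action"
```

Three on-events observed at 0, 1, and 2 seconds give a rate of 1 Hz, so h8_heartbeat_action([0.0, 1.0, 2.0]) reports that no action is necessary.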
Remind button
You can use the remind button on the EasyLED panel to put the system-error LED
on the operator information panel into Remind mode. When you press the remind
button, you acknowledge the error but indicate that you will not take immediate
action. The system-error LED flashes while it is in Remind mode and stays in
Remind mode until one of the following conditions occurs:
• All known errors are corrected.
• The server is restarted.
• A new error occurs, causing the system-error LED to be lit again.
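The Remind mode behavior can be pictured as a small state machine. The sketch below is illustrative only: the class SystemErrorLed and its method names are invented for this example, and the behavior after a restart (the LED is lit again if uncorrected errors remain) is an assumption, because the text states only that a restart ends Remind mode.

```python
class SystemErrorLed:
    """Toy model of the system-error LED: "off", "lit", or "flashing" (Remind mode)."""

    def __init__(self):
        self.state = "off"
        self.open_errors = set()

    def report_error(self, name):
        # A new error lights the LED again, which also ends Remind mode.
        self.open_errors.add(name)
        self.state = "lit"

    def press_remind(self):
        # Acknowledge the error without acting on it: the LED flashes.
        if self.state == "lit":
            self.state = "flashing"

    def correct_error(self, name):
        # Remind mode ends only when all known errors are corrected.
        self.open_errors.discard(name)
        if not self.open_errors:
            self.state = "off"

    def restart(self):
        # Assumption: after a restart, any remaining error is detected and shown again.
        self.state = "lit" if self.open_errors else "off"
```

In this model, pressing the remind button after a fan error makes the LED flash; a subsequent DIMM error returns it to the lit state, and it goes off only after both errors are corrected.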
Power-supply LEDs
The following illustration shows the power-supply LEDs on the rear of the server.
1 ac power LED
2 dc power LED
3 Power error LED
The following table describes the problems that are indicated by various
combinations of the power-supply LEDs and the system power LED on the operator
information panel and suggested actions to correct the detected problems.
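ac power LED; dc power LED; Power error LED; Description; Action; Notes

ac power LED: Off; dc power LED: Off; Power error LED: Off
Description: No ac power to the server or a problem with the ac power source
Action:
1. Check the ac power to the server.
2. Make sure that the power cord is connected to a functioning power source.
3. Turn the server off and then turn the server back on.
4. If the problem remains, replace the power supply.
Notes: This is a normal condition when no ac power is present.

ac power LED: Off; dc power LED: Off; Power error LED: On
Description: No ac power to the server or a problem with the ac power source and the power supply
had detected an internal problem
Action:
1. Replace the power supply.
2. Make sure that the power cord is connected to a functioning power source.
Notes: This happens only when a second power supply is providing power to the server.

ac power LED: Off; dc power LED: On; Power error LED: Off
Description: Faulty power supply
Action: Replace the power supply.

ac power LED: Off; dc power LED: On; Power error LED: On
Description: Faulty power supply
Action: Replace the power supply.

ac power LED: On; dc power LED: Off; Power error LED: Off
Description: Power supply not fully seated, faulty system board, or faulty power supply
Action:
1. Reseat the power supply.
2. If the system-board error LED is not lit, replace the power supply.
3. (Trained service technician only) If the system-board error LED is lit, replace the system
   board.
Notes: Typically indicates that a power supply is not fully seated.

ac power LED: On; dc power LED: Off or Flashing; Power error LED: On
Description: Faulty power supply
Action: Replace the power supply.

ac power LED: On; dc power LED: On; Power error LED: Off
Description: Normal operation

ac power LED: On; dc power LED: On; Power error LED: On
Description: Power supply is faulty but still operational
Action: Replace the power supply.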
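The power-supply LED combinations lend themselves to a lookup keyed on the three LED states. The following is a minimal sketch with invented names (POWER_SUPPLY_LED_TABLE, diagnose_power_supply). It assumes that the first two rows of the table correspond to the Off/Off/Off and Off/Off/On combinations, and it folds the "Off or Flashing" dc state into a single key.

```python
# Keys are (ac power LED, dc power LED, power error LED); values are the table descriptions.
# Assumption: the first two entries are the Off/Off/Off and Off/Off/On rows.
POWER_SUPPLY_LED_TABLE = {
    ("off", "off", "off"): "No ac power to the server or a problem with the ac power source",
    ("off", "off", "on"): ("No ac power to the server or a problem with the ac power source "
                           "and the power supply had detected an internal problem"),
    ("off", "on", "off"): "Faulty power supply",
    ("off", "on", "on"): "Faulty power supply",
    ("on", "off", "off"): ("Power supply not fully seated, faulty system board, "
                           "or faulty power supply"),
    ("on", "off", "on"): "Faulty power supply",
    ("on", "on", "off"): "Normal operation",
    ("on", "on", "on"): "Power supply is faulty but still operational",
}


def diagnose_power_supply(ac, dc, error):
    """Look up the description for one combination of power-supply LED states."""
    key = (ac.lower(), dc.lower(), error.lower())
    if key == ("on", "flashing", "on"):
        key = ("on", "off", "on")  # the table groups "Off or Flashing" for the dc LED
    return POWER_SUPPLY_LED_TABLE.get(key, "combination not listed in the table")
```

For example, diagnose_power_supply("On", "On", "Off") returns "Normal operation". The lookup returns only the description; the corresponding action and notes still come from the table.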
Diagnostic programs, messages, and error codes
The diagnostic programs are the primary method of testing the major components
of the server. As you run the diagnostic programs, text messages and error codes
are displayed on the screen and are saved in the test log. A diagnostic text
message or error code indicates that a problem has been detected; to determine
what action you should take as a result of a message or error code, see the table in
“Diagnostic messages” on page 91.
Running the diagnostic programs
To run the diagnostic programs, complete the following steps:
1. If the server is running, turn off the server and all attached devices.
2. Turn on all attached devices; then, turn on the server.
3. When the prompt Press F2 for Dynamic System Analysis (DSA) is displayed,
press F2.
Note: The DSA Preboot diagnostic program might appear to be unresponsive
for an unusual length of time when you start the program. This is normal
operation while the program loads.
4. Optionally, select Quit to DSA to exit from the stand-alone memory diagnostic
program.
Note: After you exit from the stand-alone memory diagnostic environment, you
must restart the server to access the stand-alone memory diagnostic
environment again.
5. Select gui to display the graphical user interface, or select cmd to display the
DSA interactive menu.
6. Follow the instructions on the screen to select the diagnostic test to run.
If the diagnostic programs do not detect any hardware errors but the problem
remains during normal server operations, a software error might be the cause. If
you suspect a software problem, see the information that comes with your software.
A single problem might cause more than one error message. When this happens,
correct the cause of the first error message. The other error messages usually will
not occur the next time you run the diagnostic programs.
Exception: If multiple error codes or EasyLED diagnostics LEDs indicate a
microprocessor error, the error might be in a microprocessor or in a microprocessor
socket. See “Microprocessor problems” on page 69 for information about diagnosing
microprocessor problems.
If the server stops during testing and you cannot continue, restart the server and try
running the diagnostic programs again. If the problem remains, replace the
component that was being tested when the server stopped.
Diagnostic text messages
Diagnostic text messages are displayed while the tests are running. A diagnostic
text message contains one of the following results:
Passed: The test was completed without any errors.
User Aborted: You stopped the test before it was completed.
Not Applicable: You attempted to test a device that is not present in the server.
Aborted: The test could not proceed because of the server configuration.
Warning: The test could not be run. There was no failure of the hardware that was
being tested, but there might be a hardware failure elsewhere, or another problem
prevented the test from running; for example, there might be a configuration
problem, or the hardware might be missing or is not being recognized.
The result is followed by an error code or other additional information about the
error.
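When test output is reviewed in bulk, it can help to separate the result keyword from the information that follows it. The sketch below assumes a hypothetical one-line layout in which the message starts with one of the results listed above; the actual on-screen layout of DSA messages is not specified here, and the names DIAGNOSTIC_RESULTS and split_result are illustrative.

```python
# Result keywords and meanings, paraphrased from the list of diagnostic text message results.
DIAGNOSTIC_RESULTS = {
    "Passed": "the test was completed without any errors",
    "User Aborted": "the test was stopped before it was completed",
    "Not Applicable": "the tested device is not present in the server",
    "Aborted": "the test could not proceed because of the server configuration",
    "Warning": "the test could not be run",
}


def split_result(message):
    """Split a message into (result keyword, trailing information).

    Returns (None, message) when the message does not start with a known result.
    Assumes the hypothetical layout "<result>[:] <error code or other information>".
    """
    text = message.strip()
    for result in DIAGNOSTIC_RESULTS:
        if text.startswith(result):
            return result, text[len(result):].lstrip(" :")
    return None, text
```

For example, split_result("Aborted: 089-801-xxx") yields the result "Aborted" and the trailing code "089-801-xxx".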
Viewing the test log
To view the DSA log when the tests are completed, select Utility from the top of the
screen and then select View Test Log. To view a detailed test log, press Tab while
you view the DSA log. The DSA log data is maintained only while you are running
the diagnostic programs. When you exit from the diagnostic programs, the DSA log
is cleared.
To save the DSA log to a file on a diskette or to the hard disk, click Save Log on
the diagnostic programs screen and specify a location and name for the saved log
file.
Notes:
1. To create and use a diskette, you must add an optional external diskette drive to
the server.
2. To save the test log to a diskette, you must use a diskette that you have
formatted yourself; this function does not work with preformatted diskettes. If the
diskette has sufficient space for the test log, the diskette can contain other data.
Diagnostic messages
The following table describes the messages that the diagnostic programs might
generate and suggested actions to correct the detected problems. Follow the
suggested actions in the order in which they are listed in the Action column.
Table 4. DSA messages
• Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved.
• See Chapter 8, “Parts Listing, TD200x Machine Types 3719, 3821, 3822, and 3823,” on page 237 to determine which
  components are customer replaceable units (CRU) and which components are field replaceable units (FRU).
• If an action step is preceded by “(Trained service technician only),” that step must be performed only by a trained
  service technician.

Message number; Component; Test; State; Description; Action

Message number: 089-000-xxx
Component: CPU
Test: CPU Stress test
State: Pass
Description: CPU passed stress test
Action: No action required.
Message number: 089-801-xxx
Component: CPU
Test: CPU Stress Test
State: Aborted
Description: Internal program error.
Action:
1. Turn off and restart the system.
2. Make sure that the DSA code is at the latest level.
3. Run the test again.
4. Make sure that the system firmware is at the latest level. The installed firmware level is
   shown in the DSA log in the Firmware/VPD section for this component. For more information, see
   “Updating the firmware” on page 267.
5. Run the test again.
6. Turn off and restart the system if necessary to recover from a hung state.
7. Run the test again.
8. Replace the following components one at a time, in the order shown, and run this test again
   to determine whether the problem has been solved:
   a. (Trained service technician only) Microprocessor board
   b. (Trained service technician only) Microprocessor
9. If the failure remains, go to the Lenovo Web site for more troubleshooting information at
   http://www.lenovo.com/support.

Message number: 089-802-xxx
Component: CPU
Test: CPU Stress Test
State: Aborted
Description: System resource availability error.
Action:
1. Turn off and restart the system.
2. Make sure that the DSA code is at the latest level.
3. Run the test again.
4. Make sure that the system firmware is at the latest level. The installed firmware level is
   shown in the DSA log in the Firmware/VPD section for this component. For more information, see
   “Updating the firmware” on page 267.
5. Run the test again.
6. Turn off and restart the system if necessary to recover from a hung state.
7. Run the test again.
8. Replace the following components one at a time, in the order shown, and run this test again
   to determine whether the problem has been solved:
   a. (Trained service technician only) Microprocessor board
   b. (Trained service technician only) Microprocessor
9. If the failure remains, go to the Lenovo Web site for more troubleshooting information at
   http://www.lenovo.com/support.