Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without
limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or
more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and
decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Java, OpenBoot, SunSolve, SunVTS, Sun Fire, and Solaris are trademarks or
registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other
countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
U.S. Government Rights – Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Sun Fire T2000 Server Service Manual • October 2005
Contents
Preface
1. Sun Fire T2000 Server Overview
Sun Fire T2000 Server Features
Chip-Multithreaded (CMT) Multicore Processor and Memory Technology
Performance Enhancements
Remote Manageability With ALOM
System Reliability, Availability, and Serviceability
Hot-Pluggable and Hot-Swappable Components
Power Supply Redundancy
Fan Redundancy
Environmental Monitoring
Error Correction and Parity Checking
Predictive Self Healing
Chassis Identification
Additional Service Related Information
2. Sun Fire T2000 Server Diagnostics
Overview of Sun Fire T2000 Server Diagnostics
Using LEDs to Identify the State of Devices
Front and Rear Panel LEDs
Hard Drive LEDs
Power Supply LEDs
Fan LEDs
Blower Unit LED
Using ALOM For Diagnosis and Repair Verification
Running ALOM Service-Related Commands
Connecting to ALOM
Switching Between the System Console and ALOM
Service-Related ALOM Commands
▼ To Run the showfaults Command
▼ To Run the showenvironment Command
▼ To Run the showfru Command
Running POST
Controlling How POST Runs
▼ To Change POST Parameters
Reasons to Run POST
Routine Sanity Check of the Hardware
Diagnosing the System Hardware
▼ To Run POST
Using the Solaris Predictive Self-Healing Feature
▼ To Use the fmdump Command to Identify Faults
Collecting Information From Solaris OS Files and Commands
▼ To Check the Message Buffer
▼ To View System Message Log Files
Managing Components with Automatic System Recovery (ASR) Commands
▼ To Run the showcomponent Command
▼ To Run the disablecomponent Command
▼ To Run the enablecomponent Command
Exercising the System With SunVTS
Checking Whether SunVTS Software Is Installed
▼ To Check Whether SunVTS Software Is Installed
Exercising the System Using SunVTS Software
▼ To Exercise the System Using SunVTS Software
3. Replacing Hot-Swappable and Hot-Pluggable FRUs
Devices That Are Hot-Swappable and Hot-Pluggable
Hot-Swapping a Fan
▼ To Remove a Fan
▼ To Replace a Fan
Hot-Swapping a Power Supply
▼ To Remove a Power Supply
▼ To Replace a Power Supply
Hot-Swapping the Rear Blower
▼ To Remove the Rear Blower
▼ To Replace the Rear Blower
Hot-Plugging a Hard Drive
▼ To Remove a Hard Drive
▼ To Replace a Hard Drive
4. Replacing Cold Swappable FRUs
Safety Information
Safety Symbols
Electrostatic Discharge Safety
Use an Antistatic Wrist Strap
Use an Antistatic Mat
Common Procedures for Parts Replacement
Required Tools
▼ To Shut the System Down
▼ To Extend the Server to the Maintenance Position
▼ To Remove the SAS Disk Backplane
▼ To Replace the SAS Disk Backplane
▼ To Remove the Battery on the System Controller
▼ To Replace the Battery on the System Controller
Common Procedures for Finishing Up
▼ To Replace the Top Front Cover and Front Bezel
▼ To Replace the Top Cover
▼ To Reinstall Server Chassis in the Rack
▼ To Return the Server to the Normal Rack Position
▼ To Apply Power to the Server
5. Adding New Components and Devices
Adding Hot-Pluggable and Hot-Swappable Devices
▼ To Add a Hard Drive to the Server
▼ To Add a USB Device
Adding Components Inside the Chassis
▼ To Add DIMMs
▼ To Add a PCI-E or PCI-X Card
A. Field-Replaceable Units
Preface
The Sun Fire T2000 Service Manual provides information to aid in diagnosing
hardware problems and describes how to replace components within the Sun Fire™
T2000 server. This guide also describes how to add components such as hard drives
and memory to the server.
This manual is written for technicians, service personnel, and system administrators
who service and repair computer systems. The person qualified to use this manual:
■ Can open a system chassis and identify and replace internal components.
■ Understands the Solaris Operating System and the command-line interface.
■ Has superuser privileges for the system being serviced.
This guide is organized into the following chapters:
Chapter 1 describes the main features of the Sun Fire T2000 server.
Chapter 2 describes the diagnostics that are available for monitoring and diagnosing
the Sun Fire T2000 server.
Chapter 3 explains how to remove and replace hot-swappable and hot-pluggable
field replaceable units (FRUs).
Chapter 4 describes how to remove and replace the FRUs that cannot be hot-
swapped.
Chapter 5 explains how to add new components such as hard drives, memory, and
PCI cards to the Sun Fire T2000 server.
Appendix A provides an illustrated breakdown of parts and lists the field
replaceable units (FRUs).
Sun Fire T2000 Server Documentation
You can view and print the following manuals from the Sun documentation web site
at: http://www.sun.com/documentation
Title                                                           Part Number   Description
Sun Fire T2000 Server Site Planning Guide                       819-2545      Site planning information for the Sun Fire T2000 server
Sun Fire T2000 Server Product Notes                             819-2544      Late-breaking information about the server
Sun Fire T2000 Server Overview                                  819-2543      Overview of the features of this server
Sun Fire T2000 Server Getting Started Guide                     819-2542      Information about where to find documentation to get your system installed and running quickly
Sun Fire T2000 Server Installation Guide                        819-2546      Detailed rackmounting, cabling, power-on, and configuration information
Sun Fire T2000 Server Administration Guide                      819-2549      How to perform administrative tasks that are specific to the Sun Fire T2000 server
Sun Fire T2000 Server Advanced Lights Out Manager (ALOM) Guide  819-2550      How to use the Advanced Lights Out Manager (ALOM) software on the Sun Fire T2000 server
Typographic Conventions
Typeface*    Meaning                             Examples
AaBbCc123    The names of commands, files,       Edit your .login file.
             and directories; on-screen          Use ls -a to list all files.
             computer output                     % You have mail.
AaBbCc123    What you type, when contrasted      % su
             with on-screen computer output      Password:
AaBbCc123    Book titles, new words or terms,    Read Chapter 6 in the User’s Guide.
             words to be emphasized.             These are called class options.
             Replace command-line variables      You must be superuser to do this.
             with real names or values.          To delete a file, type rm filename.
* The settings on your browser might differ from these settings.
Shell Prompts
Shell                                    Prompt
C shell                                  machine-name%
C shell superuser                        machine-name#
Bourne shell and Korn shell              $
Bourne shell and Korn shell superuser    #
Accessing Sun Documentation
You can view, print, or purchase a broad selection of Sun documentation, including
localized versions, at:
http://www.sun.com/documentation
Third-Party Web Sites
Sun is not responsible for the availability of third-party web sites mentioned in this
document. Sun does not endorse and is not responsible or liable for any content,
advertising, products, or other materials that are available on or through such sites
or resources. Sun will not be responsible or liable for any actual or alleged damage
or loss caused by or in connection with the use of or reliance on any such content,
goods, or services that are available on or through such sites or resources.
Contacting Sun Technical Support
If you have technical questions about this product that are not answered in this
document, go to:
http://www.sun.com/service/contacting
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:
Sun Fire T2000 Server Service Manual, part number 819-2548-10
CHAPTER 1
Sun Fire T2000 Server Overview
This chapter provides an overview of the features of the Sun Fire T2000 server.
The following topics are covered:
■ “Sun Fire T2000 Server Features” on page 2
■ “Chassis Identification” on page 9
Sun Fire T2000 Server Features
The Sun Fire T2000 server is a high-performance entry-level server that is highly
scalable and extremely reliable.
FIGURE 1-1 Sun Fire T2000 Server
Chip-Multithreaded (CMT) Multicore Processor
and Memory Technology
The UltraSPARC®T1 multicore processor is the basis of the Sun Fire T2000 server.
The UltraSPARC T1 processor is based on chip multithreading (CMT) technology
that is optimized for highly threaded transactional processing. The UltraSPARC T1
processor improves throughput while using less power and dissipating less heat
than conventional processor designs.
Depending on the model purchased, the processor has four, six, or eight
UltraSPARC cores. Each core equates to a 64-bit execution pipeline capable of
running four threads. The result is that the 8-core processor handles up to 32 active
threads concurrently.
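The relationship between core count and thread count can be checked with simple arithmetic. The following Python sketch is illustrative only; the function name max_active_threads is an assumption made for this example and is not part of any Sun software.

```python
# Illustrative arithmetic only; max_active_threads is a hypothetical helper.
THREADS_PER_CORE = 4               # each UltraSPARC T1 core runs four threads
SUPPORTED_CORE_COUNTS = (4, 6, 8)  # processor models described above


def max_active_threads(cores: int) -> int:
    """Return the number of concurrently active threads for a core count."""
    if cores not in SUPPORTED_CORE_COUNTS:
        raise ValueError(f"unsupported core count: {cores}")
    return cores * THREADS_PER_CORE


for cores in SUPPORTED_CORE_COUNTS:
    print(cores, "cores ->", max_active_threads(cores), "threads")
```

For the 8-core model this gives 8 x 4 = 32 threads, which matches the figure stated above. By the same arithmetic, the 4-core and 6-core models handle 16 and 24 threads.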
Additional processor components, such as L1 cache, L2 cache, memory access
crossbar, DDR2 memory controllers, and a JBus I/O interface have been carefully
tuned for optimal performance.
FIGURE 1-2 Motherboard and UltraSPARC T1 Multicore Processor
Performance Enhancements
The Sun Fire T2000 server introduces several new technologies with its sun4v
architecture and multithreaded UltraSPARC T1 multicore processor.
Some of these enhancements are:
■ Large page optimization
■ Reduction in TLB misses
■ Optimized block copy
TABLE 1-1 lists feature specifications for the Sun Fire T2000 server.
TABLE 1-1 Sun Fire T2000 System Features at a Glance
Feature            Description
Processor          1 UltraSPARC T1 multicore processor (4, 6, or 8 cores)
Memory             16 slots that can be populated with one of the following types of

                   and 8x width cards)
                   2 PCI-X slots for 64-bit 133 MHz low-profile cards
                   Note: One PCI-X slot is occupied by a SAS disk controller card.
Power              2 hot-swappable and redundant power supplies
Remote management  ALOM management controller with a serial and 10/100 Mb Ethernet port
Firmware           OpenBoot PROM (OBP) for reset and POST support
                   ALOM for remote management administration
Operating system*  Solaris 10 3/05 HW2 Operating System preinstalled on disk 0
Other software     Java™ Enterprise System with a 90-day trial license
* Check the Sun Fire T2000 Product Notes for the latest information about supported releases of the Solaris OS.
Remote Manageability With ALOM
The Sun Advanced Lights Out Manager (ALOM) feature is a system controller (SC)
that enables you to remotely manage and administer the Sun Fire T2000 server.
The ALOM software is preinstalled as firmware, and it initializes as soon as you
apply power to the system. You can customize ALOM to work with your particular
installation.
ALOM enables you to monitor and control your server over a network, or by using
a dedicated serial port for connection to a terminal or terminal server. ALOM
provides a command-line interface that you can use to remotely administer
geographically distributed or physically inaccessible machines. In addition, ALOM
enables you to run diagnostics (such as POST) remotely that would otherwise
require physical proximity to the server’s serial port.
You can configure ALOM to send email alerts of hardware failures, hardware
warnings, and other events related to the server or to ALOM. The ALOM circuitry
runs independently of the server, using the server’s standby power. Therefore,
ALOM firmware and software continue to function when the server operating
system goes offline or when the server is powered off. ALOM monitors the
following Sun Fire T2000 server components:
■ CPU temperature conditions
■ Hard drive status
■ Enclosure thermal conditions
■ Fan speed and status
■ Power supply status
■ Voltage levels
■ Faults detected by POST (Power-On Self-Test)
■ Solaris Predictive Self Healing (PSH) diagnostic facilities
For information about configuring and using the ALOM system controller, refer to
the Sun Fire T2000 Server Advanced Lights Out Manager (ALOM) Guide.
System Reliability, Availability, and Serviceability
Reliability, availability, and serviceability (RAS) are aspects of a system’s design that
affect its ability to operate continuously and to minimize the time necessary to
service the system. Reliability refers to a system’s ability to operate continuously
without failures and to maintain data integrity. System availability refers to the
ability of a system to recover to an operational state after a failure, with minimal
impact. Serviceability relates to the time it takes to restore a system to service
following a system failure. Together, reliability, availability, and serviceability
features provide for near continuous system operation.
To deliver high levels of reliability, availability, and serviceability, the Sun Fire
T2000 server offers the following features:
■ Hot-pluggable hard drives
■ Redundant, hot-swappable power supplies (two)
■ Redundant hot-swappable fan units (three)
■ Environmental monitoring
■ Error detection and correction for improved data integrity
■ Easy access for most component replacements
■ Extensive POST tests that automatically remove faulty components from the
configuration
■ PSH automated runtime diagnosis capability that takes faulty components
offline
For more information about using RAS features, refer to the Sun Fire T2000 Server System Administration Guide.
Hot-Pluggable and Hot-Swappable Components
Sun Fire T2000 hardware supports hot-plugging or hot-swapping of the chassis-mounted hard drives, fans, power supplies, and the rear blower. Using the proper
software commands, you can install or remove these components while the system is
running. Hot-plug and hot-swap technology significantly increases the system’s
serviceability and availability by providing the ability to replace hard drives, fan
units, rear blower, and power supplies without service disruption.
Power Supply Redundancy
The Sun Fire T2000 server features two hot-swappable power supplies which enable
the system to continue operating should one of the power supplies fail or if one
power source fails.
The Sun Fire T2000 server also has a single hot-swappable blower unit that works in
conjunction with the power supply fans to provide cooling for the internal disk
drives. If the blower unit fails, the two power supply fan units provide enough
cooling for the disk drive bay to keep the system running.
Fan Redundancy
The Sun Fire T2000 server features three hot-swappable system fans. Multiple fans
enable the system to continue operating with adequate cooling in the event that one
of the fans fails.
Environmental Monitoring
The Sun Fire T2000 server features an environmental monitoring subsystem
designed to protect the server and its components against:
■ Extreme temperatures
■ Lack of adequate airflow through the system
■ Power supply failures
■ Hardware faults
Temperature sensors located throughout the system monitor the ambient
temperature of the system and internal components. The software and hardware
ensure that the temperatures within the enclosure do not exceed predetermined safe
operating ranges. If the temperature observed by a sensor falls below a low-temperature threshold or rises above a high-temperature threshold, the monitoring
subsystem software lights the amber Service Required LEDs on the front and back
panel. If the temperature condition persists and reaches a critical threshold, the
system initiates a graceful system shutdown.
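The threshold behavior described above can be sketched as a small classification function. This Python example is a simplified illustration, not the monitoring subsystem's implementation: the Thresholds type, the classify function, and all numeric values are hypothetical, a critical threshold is assumed on both the low and high side, and the persistence of a condition over time is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All values are hypothetical placeholders in degrees Celsius.
    low_critical: float
    low_warning: float
    high_warning: float
    high_critical: float


def classify(temp_c: float, t: Thresholds) -> str:
    """Map one sensor reading to the response described in the text."""
    if temp_c <= t.low_critical or temp_c >= t.high_critical:
        return "graceful shutdown"
    if temp_c < t.low_warning or temp_c > t.high_warning:
        return "service required LED"
    return "ok"


# Hypothetical thresholds chosen only to exercise each branch.
example = Thresholds(low_critical=-10.0, low_warning=0.0,
                     high_warning=50.0, high_critical=60.0)
for reading in (25.0, 55.0, 61.0):
    print(reading, "->", classify(reading, example))
```

Checking the critical thresholds first keeps the more severe response from being masked by the warning range.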
All error and warning messages are sent to the system controller (SC) and the console, and
are logged in the ALOM log file. Additionally, some FRUs such as power supplies
provide LEDs that indicate a failure within the FRU.
Error Correction and Parity Checking
The UltraSPARC T1 multicore processor provides parity protection on its internal
cache memories, including tag parity and data parity on the D-cache and I-cache.
The internal 3 MB L2 cache has parity protection on the tags, and ECC protection of
the data.
Advanced ECC, also called chipkill, corrects up to 4 bits in error on nibble
boundaries, as long as they are all in the same DRAM. If a DRAM fails, the DIMM
continues to function.
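The nibble-boundary condition can be illustrated with a small check. The Python sketch below assumes that each DRAM device supplies one aligned 4-bit nibble of the protected word, so that errors are confined to a single DRAM exactly when all failing bit positions share a nibble index. That mapping and the function name are assumptions made for illustration; the sketch does not implement the ECC code itself and does not describe the actual memory controller.

```python
NIBBLE_BITS = 4  # assumed: one DRAM device contributes one aligned nibble


def errors_within_one_nibble(error_bits):
    """Return True if every failing bit position is in the same aligned nibble."""
    nibbles = {bit // NIBBLE_BITS for bit in error_bits}
    return len(nibbles) <= 1


print(errors_within_one_nibble([8, 9, 10, 11]))  # four bits, one nibble
print(errors_within_one_nibble([3, 4]))          # adjacent bits, two nibbles
```

Under this assumption, four failing bits in one nibble fall within the stated correction limit, while two adjacent bits that straddle a nibble boundary do not.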
Predictive Self Healing
The Sun Fire T2000 server features the latest fault management technologies. With
the Solaris 10 Operating System (OS), Sun is introducing a new architecture for
building and deploying systems and services capable of predictive self-healing. Self-healing technology enables Sun systems to accurately predict component failures
and mitigate many serious problems before they actually occur. This technology is
incorporated into both the hardware and software of the Sun Fire T2000 server.
At the heart of the predictive self-healing capabilities is the Solaris Fault Manager, a
service that receives data relating to hardware and software errors, and
automatically and silently diagnoses the underlying problem. Once a problem is
diagnosed, a set of agents automatically responds by logging the event, and if
necessary, takes the faulty component offline. Because problems are diagnosed
automatically, business-critical applications and essential system services can
continue uninterrupted in the event of software failures or major hardware
component failures.
Chassis Identification
FIGURE 1-3 and FIGURE 1-4 show the physical characteristics of the Sun Fire T2000
server.
FIGURE 1-3 Sun Fire T2000 Server Front Panel (callouts: indicators and buttons, USB ports, hard drives 0 through 3, DVD drive)
FIGURE 1-4 Sun Fire T2000 Server Rear Panel (callouts: power supplies 0 and 1, PCI-E slots, PCI-X slots, SC serial mgt port, SC net mgt port, TTYA serial port, GBE ports, USB ports, indicators)
Additional Service Related Information
In addition to this service manual, the following resources are available to help you
keep your server running optimally:
■ Product Notes – The Sun Fire T2000 Server Product Notes (819-2544) contain
late-breaking information about the system, including required software patches,
updated hardware and compatibility information, and solutions to known issues.
The product notes are available online at:
http://www.sun.com/documentation
■ Release Notes – The Solaris OS Release Notes contain important information
about the Solaris OS. The release notes are available online at:
http://www.sun.com/documentation
■ SunSolve Online – Provides a collection of support resources. Depending on the
level of your service contract, you have access to Sun patches, the Sun System
Handbook, the SunSolve™ knowledge base, the Sun Support Forum, and
additional documents, bulletins, and related links. Access this site at:
http://sunsolve.sun.com
■ Predictive Self-Healing Knowledge Database – You can access the knowledge
article corresponding to a self-healing message by taking the Sun Message
Identifier (SUNW-MSG-ID) and entering it into the field on this page:
http://www.sun.com/msg
CHAPTER 2
Sun Fire T2000 Server Diagnostics
This chapter describes the diagnostics that are available for monitoring and
troubleshooting the Sun Fire T2000 server. This chapter does not provide
troubleshooting methods, but instead describes the Sun Fire T2000 server
diagnostics facilities and describes how to use them.
This chapter is intended for technicians, service personnel, and system
administrators who service and repair computer systems.
The following topics are covered:
■ “Overview of Sun Fire T2000 Server Diagnostics” on page 12
■ “Using LEDs to Identify the State of Devices” on page 16
■ “Using ALOM For Diagnosis and Repair Verification” on page 22
■ “Running POST” on page 31
■ “Using the Solaris Predictive Self-Healing Feature” on page 40
■ “Collecting Information From Solaris OS Files and Commands” on page 43
■ “Managing Components with Automatic System Recovery (ASR) Commands” on
page 44
■ “Exercising the System With SunVTS” on page 48
Overview of Sun Fire T2000 Server
Diagnostics
There are a variety of diagnostic tools, commands, and indicators you can use to
monitor and troubleshoot a Sun Fire T2000 server:
■ LEDs – Provide a quick visual notification of the status of the server and of some
of the FRUs.
■ ALOM firmware – This system firmware runs on the system controller. In
addition to providing the interface between the hardware and OS, ALOM also
tracks and reports the health of key server components. ALOM works closely
with POST and Solaris predictive self-healing technology to keep the system up
and running even when there is a faulty component.
■ Power-on self-test (POST) – POST performs diagnostics on system components
upon system reset to ensure the integrity of those components. POST is
configurable and works with ALOM to take faulty components offline if needed.
■ Solaris OS predictive self healing (PSH) – This technology continuously monitors
the health of the CPU and memory, and works with ALOM to take a faulty
component offline if needed. The predictive self-healing technology enables Sun
systems to accurately predict component failures and mitigate many serious
problems before they occur.
■ Log files and console messages – Provide the standard Solaris OS log files and
investigative commands that can be accessed and displayed on the device of your
choice.
■ SunVTS – An application that exercises the system, provides hardware validation,
and discloses possible faulty components with recommendations for repair.
The LEDs, ALOM, Solaris OS PSH, and many of the log files and console messages
are integrated. For example, when the Solaris software detects a fault, it displays the
fault, logs it, and passes information to ALOM, where the fault is also logged.
Depending on the fault, one or more LEDs might also light.
The diagnostic flowchart in FIGURE 2-1 and TABLE 2-1 describe an approach for using
the server diagnostics to identify a faulty field replaceable unit (FRU). The
diagnostics you use, and the order in which you use them, depend on the nature of
the problem you are troubleshooting, so you might perform some actions and not
others.
The flowchart assumes that you have already performed some rudimentary
troubleshooting such as verification of proper installation, visual inspection of cables
and power, and possibly performed a reset of the server (refer to the Sun Fire T2000 Server Installation Guide and Sun Fire T2000 Server Administration Guide for details).
Use this flow chart to understand what diagnostics are available to troubleshoot
faulty hardware, and use TABLE 2-1 to find more information about each diagnostic in
this chapter.
FIGURE 2-1 Diagnostic Flow Chart
Numbers in this flow chart correspond to the Action numbers in TABLE 2-1.
Start: Faulty hardware suspected.
1. Are any faults reported by the showfaults command? Yes: go to 2. No: go to 6.
2. Is a fault message ID (MSG-ID) displayed? Yes: go to 3. No: go to 5.
3. Enter the message ID into the Sun Knowledge Article web site for recommended actions. Go to 4.
4. Did the article recommend a FRU replacement? Yes: go to 9. No: go to 11.
5. Do fault LEDs indicate a faulty FRU? Yes: go to 9. No: go to 6.
6. Do the Solaris logs indicate a faulty FRU? Yes: go to 9. No: go to 7.
7. Does POST report any faulty devices? Yes: go to 9. No: go to 8.
8. Does SunVTS report any faulty devices? Yes: go to 9. No: go to 11.
9. Replace faulty FRU. Go to 10.
10. Verify the repair.
11. Perform recommended corrective actions. If needed, contact Sun for support.
TABLE 2-1Diagnostic Flow Chart Actions
Action
No.Diagnostic ActionResulting Action
1.
Run the ALOM
showfaults
command.
The showfaults command displays faults
detected by the system firmware.
• If faults are displayed, go to Action 2.
• If no faults are displayed, go to Action 6.
2.
Check fault
message for a Sun
Message ID.
Sun Message IDs (SUNW-MSG-ID) indicate that
information is available from Sun’s knowledge
article database.
• If you have a message ID number, go to Action 3.
• If you do not have a message ID number, go to
Action 5.
3.
Enter the Sun
Message ID into
the Sun
Enter the Sun Message ID number into the
knowledge article web site at:
http:www.sun.com/msg and go to Action 4.
Knowledge
Article web site.
4.
Analyze the
suggested actions.
In some cases, fault related messages are identified
with suggested actions.
• If the suggested action recommends replacing a
FRU, go to Action 9.
If the suggested action does not recommend
replacing a FRU, perform the suggested action.
Contact Sun for additional support, if needed
For more information, see
these sections
“To Run the showfaults
Command” on page 26
“Using the Solaris
Predictive Self-Healing
Feature” on page 40
Sun Support information:
http://www.sun.com/
service/contacting
5.
Do any of the
fault LEDs
indicate a faulty
FRU?
The first LED to check is the Service Required LED.
Additional LEDs on specific FRUs (fans, blower,
power supplies, and hard disk drives) can pinpoint
the faulty FRU.
• If an LED indicates a faulty FRU, go to Action 9.
• If FRU LEDs do not indicate a fault, go to
Action 6.
6.
Check the Solaris
log files for fault
information.
The Solaris message buffer and log files record
system events and can provide information about
faults.
• If system messages indicate a faulty device,
replace the FRU (Action 9).
• To obtain more diagnostic information, got to
Action 7.
14Sun Fire T2000 Server Service Manual • October 2005
For more information about Actions 5 and 6, see these sections:
“Using LEDs to Identify the State of Devices” on page 16
“Collecting Information From Solaris OS Files and Commands” on page 43
7. Run POST.
POST performs basic tests of the server components and reports faulty FRUs.
• If POST indicates a faulty FRU, replace the FRU (Action 9).
• If POST does not indicate a faulty FRU, go to Action 8.
8. Run SunVTS.
SunVTS provides tests used to exercise and diagnose FRUs. To run SunVTS, the server must be running the Solaris OS.
• If SunVTS reports a faulty device, replace the FRU (Action 9).
• If SunVTS does not report a faulty device, go to Action 11.
9. Replace faulty FRU.
The fans, blower, power supplies, and hard drives are hot-swappable. The other FRUs require that you shut down the server to perform a cold-swap. After replacing the faulty FRU, go to Action 10.
For more information about Actions 7 through 9, see these sections:
“Running POST” on page 31
“Exercising the System With SunVTS” on page 48
“Replacing Hot-Swappable and Hot-Pluggable FRUs” on page 55
“Replacing Cold-Swappable FRUs” on page 65
10. Verify the repair.
Various commands and utilities can be used to verify the functionality of the system components. Two useful commands are:
• The ALOM showfaults command
• The ASR showcomponent command
If the FRU is blacklisted, you can manually remove it from the blacklist with the enablecomponent command.
If the fault is cleared, and the component is not blacklisted, the repair is verified well enough to boot the server. For added assurance, you can run the SunVTS diagnostic software.
11. Contact Sun for Support.
The majority of hardware faults are detected by the server’s diagnostics. In rare cases, a problem might require additional troubleshooting. If you are unable to determine the cause of the problem, contact Sun for support.
For more information about Actions 10 and 11, see these sections:
“To Run the showfaults Command” on page 26
“Managing Components with Automatic System Recovery (ASR) Commands” on page 44
“Exercising the System With SunVTS” on page 48
Sun Support information: http://www.sun.com/service/contacting
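The branching in Actions 5 through 11 above can be summarized as a small lookup. The following Python sketch is illustrative only: the names NEXT_ACTION and next_action are hypothetical and are not part of any Sun tool, and the sketch covers only the actions shown in this part of the table.

```python
# Hypothetical encoding of Actions 5-11 from the diagnostic action table.
# Each entry maps an action number to a pair:
# (next action if a fault is found, next action if no fault is found).
NEXT_ACTION = {
    5: (9, 6),   # check the fault LEDs
    6: (9, 7),   # check the Solaris log files
    7: (9, 8),   # run POST
    8: (9, 11),  # run SunVTS
}


def next_action(action, fault_found):
    """Return the next action number, or None when the flow ends."""
    if action in NEXT_ACTION:
        on_fault, on_clear = NEXT_ACTION[action]
        return on_fault if fault_found else on_clear
    if action == 9:
        return 10  # after replacing the FRU, verify the repair
    return None  # Actions 10 and 11 end the flow in this sketch
```

For example, next_action(7, False) moves from POST to SunVTS (Action 8), and next_action(8, False) leads to Action 11, contacting Sun for support.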
Chapter 2 Sun Fire T2000 Server Diagnostics15
Using LEDs to Identify the State of Devices
The Sun Fire T2000 server provides the following groups of LEDs:
■ Front and Rear Panel LEDs (TABLE 2-2)
■ Power Supply LEDs (TABLE 2-4)
■ Fan LEDs (TABLE 2-5)
■ Hard Drive LEDs (TABLE 2-3)
These LEDs provide a quick visual check of the state of the system.
Front and Rear Panel LEDs
The six front panel LEDs (FIGURE 2-2) are located in the upper left corner of the
server chassis. Three of these LEDs are also provided on the rear panel (FIGURE 2-3).
FIGURE 2-2 Front Panel LEDs (callouts: Locator LED/button, Service Required LED, Power OK LED, Power On/Off button, Top Fan LED, Rear-FRU Fault LED, Over Temp LED)
FIGURE 2-3 Rear Panel LEDs (callouts: Locator LED/button, Service Required LED, Power OK LED)
TABLE 2-2 lists and describes the front and rear panel LEDs.
TABLE 2-2 Front and Rear Panel LEDs
LED | Color | Description
Locator LED and button* | White | Enables you to identify a particular server. The LED is activated using one of the following methods:
• Issuing the setlocator on or off command.
• Pressing the button to toggle the indicator on or off.
This LED provides the following indications:
• Off – Normal operating state.
• Fast blink – The server received a signal as a result of one of the preceding methods and is indicating “here I am.”
Service Required LED* | Amber | If on, indicates that service is required. The ALOM showfaults command provides details about any faults that cause this indicator to be lit.
Power OK LED* | Green | The LED provides the following indications:
• Off – The system is unavailable. Either it has no power or ALOM is not running.
• Steady on – Indicates that the system is powered on and is running in its normal operating state.
• Standby blink – Indicates that the service processor is running while the system is running at a minimum level in standby mode and ready to be returned to its normal operating state.
• Slow blink – Indicates that a normal transitory activity is taking place. It might mean that the system diagnostics are running, or that the system is booting.
Power on/off button | n/a | Turns the host system on and off. This button is recessed to prevent accidental server power-off. Use the tip of a pen to operate this button.
Top Fan LED | Amber | Provides the following operational fan indications:
• Off – Indicates a steady state; no service action is required.
• Steady on – Indicates a fan failure event has been acknowledged and a service action is required on at least one of the three fans. Use the fan LEDs to determine which fan requires service.
Rear-FRU Fault LED | Amber | Provides the following indications:
• Off – Indicates a steady state; no service action is required.
• Steady on – Indicates a failure of a rear-access FRU (a power supply or the rear blower). Use the FRU LEDs to determine which FRU requires service.
Over Temp LED | Amber | Provides the following operational temperature indications:
• Off – Indicates a steady state; no service action is required.
• Steady on – Indicates a temperature failure event has been acknowledged and a service action is required. View the ALOM reports for further information on this event.
* Provided on both the front and rear panels. All other LEDs are located on the front panel only.
Hard Drive LEDs
The hard drive LEDs (FIGURE 2-4 and TABLE 2-3) are located on the front of each hard
drive that is installed in the Sun Fire T2000 server chassis.
FIGURE 2-4 Hard Drive LEDs (callouts: OK to Remove, Unused, Activity)
TABLE 2-3 Hard Drive LEDs
LED | Color | Description
OK to Remove | Blue | On – The drive is ready for hot-plug removal.
Off – Normal operation.
Unused | Amber | n/a
Activity | Green | On – Drive is receiving power. Solidly lit if drive is idle. Flashes while the drive processes a command.
Off – Power is off.
Power Supply LEDs
The power supply LEDs (FIGURE 2-5 and TABLE 2-4) are located on the back of each
power supply.
FIGURE 2-5 Power Supply LEDs (callouts: Power OK, Failure, AC OK)
TABLE 2-4 Power Supply LEDs
LED | Color | Description
Power OK | Green | On – Normal operation. DC output voltage is within normal limits.
Off – Power is off.
Failure | Amber | On – Power supply has detected a failure.
Off – Normal operation.
AC OK | Green | On – Normal operation. Input power is within normal limits.
Off – No input voltage, or input voltage is below limits.
Fan LEDs
The fan LEDs are located on the top of each fan unit (TABLE 2-5). These LEDs are
visible when you open the top fan door.
TABLE 2-5 Fan LEDs
LED | Color | Description
Fan LEDs | Amber | On – This fan is faulty.
Off – Normal operation.
Note: When a fan fault is detected, the front panel Top Fan LED is lit.
Blower Unit LED
The blower unit LED is located on the back of the blower unit and is visible from the
rear of the server (TABLE 2-6).
TABLE 2-6 Blower Unit LED
LED | Color | Description
Blower Unit LED | Amber | On – The blower unit is faulty.
Off – Normal operation.
Note: When a blower fault is detected, the Rear-FRU Fault LED is lit.
Using ALOM For Diagnosis and Repair Verification
The Sun Advanced Lights Out Manager (ALOM) is a system controller in the Sun
Fire T2000 server that enables you to remotely manage and administer your server.
ALOM enables you to remotely run diagnostics, such as power-on self test (POST),
that would otherwise require physical proximity to the server’s serial port. You can
also configure ALOM to send email alerts of hardware failures, hardware warnings,
and other events related to the server or to ALOM.
The ALOM circuitry runs independently of the server, using the server’s standby
power. Therefore, ALOM firmware and software continue to function when the
server operating system goes offline or when the server is powered off.
Note – Refer to the Sun Fire T2000 Server Advanced Lights Out Manager (ALOM)
Guide for comprehensive ALOM information.
Faults detected by ALOM, POST, and the Solaris Predictive Self-Healing (PSH)
technology are forwarded to ALOM for fault handling (FIGURE 2-6).
In the event of a system fault, ALOM ensures that the Service Required LED is lit,
FRU ID PROMs are updated, the fault is logged, and alerts are displayed.
FIGURE 2-6 ALOM Fault Management (callouts: Service Required LED, FRU LEDs, FRUID PROMs, Logs, Alerts)
ALOM sends alerts to all ALOM users who are logged in, sends the alert through
email to a configured email address, and writes the event to the ALOM event log.
ALOM can detect when a fault is no longer present and clears the fault in several
ways:
■ Fault recovery – The system automatically detects that the fault condition is no
longer present. ALOM extinguishes the Service Required LED and updates the
FRU’s PROM, indicating that the fault is no longer present.
■ Fault repair – The fault has been repaired by human intervention. In most cases,
ALOM detects the repair and extinguishes the Service Required LED. If ALOM does
not perform these actions, you must perform these tasks manually with the
clearfault or enablecomponent commands.
ALOM can detect the removal of a FRU, in many cases even if the FRU is removed
while ALOM is powered off. This enables ALOM to know that a fault, diagnosed to
a specific FRU, has been repaired. The ALOM clearfault command enables you
to manually clear certain types of faults without a FRU replacement or if ALOM was
unable to automatically detect the FRU replacement.
ALOM does not automatically detect hard drive replacement.
Many environmental faults can recover automatically. For example, a temperature that
exceeds a threshold might return to normal limits, or an unplugged power supply
might be plugged back in. Recovery of environmental faults is automatically
detected. Recovery events are reported using one of two forms:
■ fru at location is OK.
■ sensor at location is within normal range.
Environmental faults can be repaired through hot removal of the faulty FRU. FRU
removal is automatically detected by the environmental monitoring and all faults
associated with the removed FRU are cleared. The message for that case, and the
alert sent for all FRU removals is:
fru at location has been removed.
There is no ALOM command to manually repair an environmental fault.
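The three message forms above lend themselves to simple pattern matching, for example when scanning saved alert text. The following Python sketch is a minimal illustration, not an ALOM feature. It assumes that the fru or sensor name is a single token without spaces, and the sample strings in the usage note are hypothetical.

```python
import re

# One pattern per message form quoted in the text. The group names and the
# event labels are hypothetical and chosen only for this sketch.
PATTERNS = [
    ("fru_recovered",
     re.compile(r"^(?P<item>\S+) at (?P<location>.+) is OK\.$")),
    ("sensor_recovered",
     re.compile(r"^(?P<item>\S+) at (?P<location>.+) is within normal range\.$")),
    ("fru_removed",
     re.compile(r"^(?P<item>\S+) at (?P<location>.+) has been removed\.$")),
]


def classify_event(message):
    """Return (kind, item, location) for a recognized message, else None."""
    text = message.strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.group("item"), match.group("location")
    return None
```

With a made-up input such as "PS0 at PS0 is OK.", classify_event returns the kind fru_recovered along with the item and location. Text that matches none of the three forms returns None.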
ALOM does not handle hard drive faults. Use the Solaris message files to view hard
drive faults. See “Collecting Information From Solaris OS Files and Commands” on
page 43.
Running ALOM Service-Related Commands
This section describes the ALOM commands that are commonly used for service-related activities.
Connecting to ALOM
Before you can run ALOM commands, you must connect to the ALOM. There are
several ways to connect to the system controller:
■ Connect an ASCII terminal directly to the serial management port.
■ Use the telnet command to connect to ALOM through an Ethernet connection
on the network management port.
■ Connect an external modem to the network management port and dial in to the
modem.
Note – Refer to the Sun Fire T2000 Server Advanced Lights Out Manager (ALOM)
Guide for instructions on configuring and connecting to ALOM.
Switching Between the System Console and ALOM
■ To switch from the console output to the ALOM sc> prompt, type #. (Pound
Period).
■ To switch from the sc> prompt to the console, type console.
Service-Related ALOM Commands
TABLE 2-7 describes the typical ALOM commands for servicing a Sun Fire T2000
server. For descriptions of all ALOM commands, issue the help command or refer
to the Sun Fire T2000 Server Advanced Lights Out Management (ALOM) Guide.
TABLE 2-7 Service-Related ALOM Commands
ALOM Command – Description
help [command] – Displays a list of all ALOM commands with syntax and descriptions. Specifying a command name as an option displays help for that command.
clearfault UUID – Manually clears system faults. UUID is the unique fault ID of the fault to be cleared.
powercycle [-f] – Performs a poweroff followed by poweron. The -f option forces an immediate poweroff; otherwise the command attempts a graceful shutdown.
poweroff [-y][-f] – Removes the main power from the host server. The -y option enables you to skip the confirmation question. The -f option forces an immediate shutdown.
poweron – Applies the main power to the host server.
removefru – Indicates if it is OK to perform a hot-swap of a power supply.
reset [-y] – Generates a hardware reset on the host server. The -y option enables you to skip the confirmation question.
resetsc [-y] – Reboots the system controller. The -y option enables you to skip the confirmation question.
setkeyswitch [normal | stby | diag | locked] – Sets the virtual keyswitch.
setlocator [on | off] – Turns the Locator LED on the server on or off.
showenvironment – Displays the environmental status of the host server. This information includes system temperatures, power supply, front panel LED, hard drive, fan, voltage, and current sensor status. See “To Run the showenvironment Command” on page 27.
showfaults [-v] – Displays current system faults. See “To Run the showfaults Command” on page 26.
showfru [-g lines][-s | -d] [FRU] – Displays information about the FRUs in the server.
• The -g lines option specifies the number of lines to display before pausing the output to the screen.
• The -s option displays static information about system FRUs (defaults to all FRUs, unless one is specified).
• The -d option displays dynamic information about system FRUs (defaults to all FRUs, unless one is specified). See “To Run the showfaults Command”
2. Use the Sun message ID to obtain more information about the fault.
In a browser, go to the Predictive Self-Healing Knowledge Article web site:
http://www.sun.com/msg and enter the Sun message ID in the lookup field.
▼ To Run the showenvironment Command
The showenvironment command displays a snapshot of the server’s
environmental status. This command displays system temperatures, hard disk drive
status, power supply and fan status, front panel LED status, voltage and current
sensors. The output uses a format similar to the Solaris OS command prtdiag(1M).
● At the sc> prompt, type the showenvironment command.
The output differs according to your system’s model and configuration.
Example:
sc> showenvironment
=============== Environmental Status ===============
MB/V_+1V5 OK 1.48 1.36 1.39 1.60 1.63
MB/V_VMEML OK 1.78 1.69 1.72 1.87 1.90
MB/V_VMEMR OK 1.78 1.69 1.72 1.87 1.90
MB/V_VTTL OK 0.87 0.84 0.86 0.93 0.95
MB/V_VTTR OK 0.87 0.84 0.86 0.93 0.95
MB/V_+3V3STBY OK 3.33 3.13 3.16 3.53 3.59
MB/V_VCORE OK 1.30 1.20 1.24 1.36 1.39
IOBD/V_+1V5 OK 1.48 1.27 1.35 1.65 1.72
IOBD/V_+1V8 OK 1.78 1.53 1.62 1.98 2.07
IOBD/V_+3V3MAIN OK 3.38 2.80 2.97 3.63 3.79
IOBD/V_+3V3STBY OK 3.33 2.80 2.97 3.63 3.79
IOBD/V_+1V OK 1.11 0.93 0.99 1.21 1.26
IOBD/V_+1V2 OK 1.17 1.02 1.08 1.32 1.38
IOBD/V_+5V OK 5.09 4.25 4.50 5.50 5.75
IOBD/V_-12V OK -12.11 -13.80 -13.20 -10.80 -10.20
IOBD/V_+12V OK 12.18 10.20 10.80 13.20 13.80
SC/BAT/V_BAT OK 3.03 -- 2.69 -- --
----------------------------------------------------------
System Load (in amps):
----------------------------------------------------------
Sensor Status Load Warn Shutdown
----------------------------------------------------------
MB/I_VCORE OK 25.280 80.000 88.000
MB/I_VMEML OK 4.680 60.000 66.000
MB/I_VMEMR OK 4.680 60.000 66.000
-----------------------------------------------------------------------------
Supply Status Underspeed Overtemp Overvolt Undervolt Overcurrent
-----------------------------------------------------------------------------
PS0 OK OFF OFF OFF OFF OFF
PS1 OK OFF OFF OFF OFF OFF
sc>
Note – Some environmental information might not be available when the server is
in standby mode.
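If you capture showenvironment output to a file, the System Load rows can be checked with a few lines of script. The following Python sketch is a minimal illustration that assumes the five-column layout shown in the example (Sensor, Status, Load, Warn, Shutdown). The function name parse_load_rows and the headroom field are hypothetical and are not part of ALOM.

```python
def parse_load_rows(lines):
    """Parse 'Sensor Status Load Warn Shutdown' rows into dictionaries.

    Lines that do not have exactly five fields, or whose last three
    fields are not numbers (headers, separators), are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        sensor, status = fields[0], fields[1]
        try:
            load, warn, shutdown = (float(value) for value in fields[2:])
        except ValueError:
            continue  # header row such as 'Sensor Status Load Warn Shutdown'
        rows.append({
            "sensor": sensor,
            "status": status,
            "load": load,
            "warn": warn,
            "shutdown": shutdown,
            # Amps remaining before the warning threshold is reached.
            "headroom": warn - load,
        })
    return rows
```

Applied to the three MB/I_ rows in the example, the sketch returns one dictionary per sensor. A small or negative headroom value would be a reason to look more closely at that sensor.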
▼ To Run the showfru Command
The showfru command displays information about the FRUs in the server. Use this
command to see information about an individual FRU, or for all the FRUs.
Note – By default, the output of the showfru command for all FRUs is very long.
● At the sc> prompt, enter the showfru command.
In the following example, the showfru command is used to get information about
the motherboard (MB).
Running POST
Power-on self test (POST) is a group of PROM-based tests that run when the server
is powered on or reset. POST checks the basic integrity of the critical hardware
components in the server (CPU, memory, and I/O buses).
If POST detects a faulty component, the component is disabled automatically, preventing faulty
hardware from potentially harming any software. If the system is capable of running
without the disabled component, the system will boot when POST is complete. For
example, if one of the processor cores is deemed faulty by POST, the core will be
disabled, and the system will boot and run using the remaining cores.
POST faults are automatically repaired if the fault is not detected on subsequent
POST runs. Any devices that pass POST are enabled, even if they were previously
disabled. Devices can be manually enabled or disabled using ASR commands (see
“Managing Components with Automatic System Recovery (ASR) Commands” on
page 44).
Controlling How POST Runs
The server can be configured for normal, extensive, or no POST execution. You can
also control the level of tests that run, the amount of POST output that is displayed,
and which reset events trigger POST by using ALOM variables.
TABLE 2-8 lists the ALOM variables used to configure POST and FIGURE 2-7 shows
how the variables work together.
TABLE 2-8 ALOM Parameters Used For POST Configuration
Parameter | Values | Description
setkeyswitch* | normal | The system can power on and run POST (based on the other parameter settings). For details see FIGURE 2-7. This parameter overrides all other commands.
setkeyswitch* | diag | The system runs POST based on predetermined settings.
setkeyswitch* | stby | The system cannot power on.
setkeyswitch* | locked | The system can power on and run POST, but no flash updates can be made.
diag_mode | off | POST does not run.
diag_mode | normal | Runs POST according to diag_level value.
diag_mode | service | Runs POST with preset values for diag_level and diag_verbosity.
diag_level | min | If diag_mode = normal, runs the minimum set of tests.
diag_level | max | If diag_mode = normal, runs all the minimum tests plus extensive CPU and memory tests.
diag_trigger | none | Does not run POST on reset.
diag_trigger | user_reset | Runs POST upon user-initiated resets.
diag_trigger | power_on_reset | Runs POST only for the first power on. This is the default.
diag_trigger | error_reset | Runs POST if fatal errors are detected.
diag_trigger | all_reset | Runs POST after any reset.
diag_verbosity | none | No POST output is displayed.
diag_verbosity | min | POST output displays functional tests with a banner and pinwheel.
diag_verbosity | normal | POST output displays all test and informational messages.
diag_verbosity | max | POST displays all test, informational, and some debugging messages.
* All of these parameters are set using the ALOM setsc command except for the setkeyswitch command.
FIGURE 2-7 Flowchart of ALOM Variables for POST Configuration
TABLE 2-9 shows typical combinations of ALOM variables and associated POST
modes.
TABLE 2-9 ALOM Parameters and POST Modes
Parameter | Normal Diagnostic Mode (default settings) | No POST Execution | Diagnostic Service Mode | Keyswitch Diagnostic Preset Values
diag_mode | normal | off | service | normal
setkeyswitch* | normal | normal | normal | diag
diag_level | min | n/a | max | max
diag_trigger | power-on-reset error-reset | none | all-resets | all-resets
diag_verbosity | normal | n/a | max | max
Description of POST execution | This is the default POST configuration and provides a reasonable compromise between testing thoroughness and quick server initialization. | POST does not run, resulting in quick system initialization, but this is not a suggested configuration. | POST runs the full spectrum of tests with the maximum output displayed. | POST runs the full spectrum of tests with the maximum output displayed.
* The setkeyswitch parameter, when set to diag, overrides all the other ALOM POST variables.
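One way to read TABLE 2-8 and TABLE 2-9 together is as a precedence order among the variables. The following Python sketch is an interpretation of those two tables only, not a description of the ALOM firmware. It assumes that the service and keyswitch diag presets correspond to the max level and max verbosity shown in TABLE 2-9, and it deliberately ignores diag_trigger. The function name effective_post is hypothetical.

```python
def effective_post(keyswitch, diag_mode, diag_level="min",
                   diag_verbosity="normal"):
    """Return the POST level and verbosity implied by the tables.

    Returns None when POST does not run. This is a sketch based on
    TABLE 2-8 and TABLE 2-9; diag_trigger is not modeled.
    """
    if keyswitch == "stby":
        return None  # the system cannot power on, so POST cannot run
    if keyswitch == "diag":
        # Per the TABLE 2-9 footnote, diag overrides the other variables.
        return {"level": "max", "verbosity": "max"}
    if diag_mode == "off":
        return None  # POST does not run
    if diag_mode == "service":
        # Preset values, assumed here to be max/max as in TABLE 2-9.
        return {"level": "max", "verbosity": "max"}
    # diag_mode == "normal": POST follows the configured variables.
    return {"level": diag_level, "verbosity": diag_verbosity}
```

For the default settings, effective_post("normal", "normal") yields the min level with normal verbosity, matching the first column of TABLE 2-9.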
▼ To Change POST Parameters
1. Access the ALOM sc> prompt:
At the console, issue the #. key sequence:
#.
2. At the ALOM sc> prompt, use the setsc command to set the POST parameter:
Example:
sc> setsc diag_mode service
The setkeyswitch parameter is a command that sets the virtual keyswitch, so it
does not use the setsc command. Example:
sc> setkeyswitch diag
Reasons to Run POST
You can use POST for basic sanity checking of the server hardware and for
troubleshooting as described in the following sections.
Routine Sanity Check of the Hardware
POST tests critical hardware components to verify functionality before the system
boots and accesses software. If POST detects an error, the faulty component is
disabled automatically, preventing faulty hardware from potentially harming
software.
Under normal operating conditions, the server is usually configured to run POST in
minimum mode for all power-on or error-generated resets. This enables the system
to initialize quickly, and still have hardware checkups to ensure a healthy system.
Diagnosing the System Hardware
You can use POST as an initial diagnostic tool for the system hardware. In this case,
configure POST to run in diagnostic service mode for maximum test coverage and
verbose output.
▼ To Run POST
This procedure describes how to run POST when you want maximum testing, as in
the case when you are troubleshooting a system.
1. Switch from the system console prompt to the SC console prompt by issuing the #.
escape sequence.
ok #.
sc>
2. Set the virtual keyswitch to diag so that POST will run in service mode.
sc> setkeyswitch diag
3. Reset the system so that POST runs.
There are several ways to initiate a reset. The following example uses the
powercycle command. For other methods, refer to the Sun Fire T2000 Server
Administration Guide.
sc> powercycle
Are you sure you want to powercycle the system [y/n]? y
Powering host off at MON JAN 10 02:52:02 2000
Waiting for host to Power Off; hit any key to abort.
SC Alert: SC Request to Power Off Host.
SC Alert: Host system has shut down.
Powering host on at MON JAN 10 02:52:13 2000
SC Alert: SC Request to Power On Host.
4. Switch to the system console to view the POST output:
sc> console
Example of POST output:
SC Alert: Host System has Reset
Note: some output omitted.
0:0>Config port A, bus 2 dev 0 func 0, tag IOBD/PCI-SWITCH0
0:0>Config port A, bus 3 dev 1 func 0, tag IOBD/GBE0
0:0>INFO:Master Abort for probe, device IOBD/PCIE1 looks like it
is not present!
0:0>INFO:Master Abort for probe, device IOBD/PCIE2 looks like it
is not present!
0:0>INFO:
0:0>POST Passed all devices.
0:0>
0:0>DEMON: (Diagnostics Engineering MONitor)
0:0>Select one of the following functions
0:0>POST:Return to OBP.
0:0>INFO:
0:0>POST Passed all devices.
0:0>Master set ACK for vbsc runpost command and spin...
5. Perform further investigation if needed.
When POST is finished running, and if no faults were detected, the system will boot.
If POST detects a faulty device, the fault is displayed and the fault information is
passed to ALOM for fault handling.
a. Interpret the POST messages:
POST error messages use the following syntax:
c:s > ERROR: TEST = failing_test
c:s > H/W under test = FRU
c:s > Repair Instructions: Replace items in order listed by H/W
under test above
c:s > MSG = test_error_message
c:s > END_ERROR
In this syntax, c = the core number, s = the strand number.
Warning and informational messages use the following syntax:
INFO or WARNING: message
The following example shows a POST error message.
7:2>
7:2>ERROR: TEST = Data Bitwalk
7:2>H/W under test = MB/CMP0/CH2/R0/D0/S0 (MB/CMP0/CH2/R0/D0)
7:2>Repair Instructions: Replace items in order listed by 'H/W
under test' above.
7:2>MSG = Pin 149 failed on MB/CMP0/CH2/R0/D0 (J1601)
7:2>END_ERROR
7:2>Decode of Dram Error Log Reg Channel 2 bits
60000000.0000108c
7:2> 1 MEC 62 R/W1C Multiple corrected
errors, one or more CE not logged
7:2> 1 DAC 61 R/W1C Set to 1 if the error
was a DRAM access CE
7:2> 108c SYND 15:0 RW ECC syndrome.
7:2>
7:2> Dram Error AFAR channel 2 = 00000000.00000000
7:2> L2 AFAR channel 2 = 00000000.00000000
In this example, POST is reporting a memory error at DIMM location
MB/CMP0/CH2/R0/D0. It was detected by POST running on core 7, strand 2.
b. Run the showfaults command to obtain additional fault information.
The fault is captured by ALOM, where the fault is logged, the Service Required
LED is lit, and the faulty component is disabled.
Example:
ok #.
sc> showfaults -v
ID Time FRU Fault
1 APR 24 12:47:27 MB/CMP0/CH2/R0/D0 MB/CMP0/CH2/R0/D0 deemed faulty and disabled
In this example, MB/CMP0/CH2/R0/D0 (DIMM 13) is disabled. Until the faulty component
is replaced, the system can boot using the memory that was not disabled.
Note – You can use ASR commands to display and control disabled components.
See “Managing Components with Automatic System Recovery (ASR) Commands”
on page 44.
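The POST error syntax described in Step 5a is regular enough to extract with a short script, for example from a saved console log. The following Python sketch is a minimal illustration under that assumption. The function name parse_post_errors and the dictionary keys are hypothetical, and wrapped continuation lines (lines without a c:s> prefix) are ignored rather than rejoined.

```python
import re

# Matches the 'c:s>' prefix, where c is the core and s is the strand.
PREFIX = re.compile(r"^\s*(\d+):(\d+)\s*>\s*(.*)$")


def parse_post_errors(lines):
    """Collect ERROR ... END_ERROR blocks from POST console output."""
    errors = []
    current = None
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue  # wrapped continuation line; ignored in this sketch
        core, strand = int(match.group(1)), int(match.group(2))
        body = match.group(3)
        if body.startswith("ERROR: TEST ="):
            current = {"core": core, "strand": strand,
                       "test": body.split("=", 1)[1].strip()}
        elif current is not None and body.startswith("H/W under test ="):
            current["hw"] = body.split("=", 1)[1].strip()
        elif current is not None and body.startswith("MSG ="):
            current["msg"] = body.split("=", 1)[1].strip()
        elif current is not None and body.startswith("END_ERROR"):
            errors.append(current)
            current = None
    return errors
```

Run against the error example in Step 5a, the sketch yields a single record for core 7, strand 2, with the test name Data Bitwalk and the hardware path that begins with MB/CMP0/CH2/R0/D0.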
Using the Solaris Predictive Self-Healing Feature
The Solaris predictive self-healing (PSH) technology enables the Sun Fire T2000 server to
diagnose problems while the Solaris OS is running, and mitigate many problems
before they occur.
The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time
and runs in the background to monitor the system. If a component generates an
error, the daemon handles the error by correlating the error with data from previous
errors and other related information to diagnose the problem. Once diagnosed, the
fault manager daemon assigns the problem a unique identifier (UUID) that
distinguishes the problem across any set of systems. When possible, the fault
manager daemon initiates steps to self-heal the failed component and take the
component offline. The daemon also logs the fault to the syslogd daemon and
provides a fault notification with a message ID (MSGID). You can use the message ID to
get additional information about the problem from Sun’s knowledge article
database.
The predictive self-healing technology covers the following Sun Fire T2000 server
components:
■ UltraSPARC T1 multicore processor
■ Memory
■ I/O bus
The PSH console message provides the following information:
■ Type
■ Severity
■ Description
■ Automated Response
■ Impact
■ Suggested Action for System Administrator
■ Details
If the Solaris PSH facility has detected a faulty component, use the fmdump
command to identify the fault.
Note – Additional predictive self-healing information is available at:
http://www.sun.com/msg
▼ To Use the fmdump Command to Identify Faults
The fmdump command displays the list of faults detected by the Solaris PSH facility.
Use this command for the following reasons:
■ To see if any faults have been detected by the Solaris PSH facility.
■ To obtain the fault message ID (SUNW-MSG-ID) for detected faults.
■ To verify that the replacement of a FRU has cleared the fault and not generated
any additional faults.
If you already have a fault message ID, go to Step 2 to obtain more information
about the fault from Sun’s Predictive Self-Healing Knowledge Article web site.
1. Check the event log using the fmdump command with -v for verbose output:
In this example, a fault is displayed, indicating the following details:
■ Date and time of the fault (Apr 24 06:54:08.2005)
■ Universal Unique Identifier (UUID) that is unique for every fault
(1ce22523-1c80-6062-e61d-f3b39290ae2c)
■ Sun message identifier (SUN4U-8000-6H) that can be used to obtain additional
fault information
■ Faulted FRU (FRU:hc:///component=MB), that in this example is identified as
MB, indicating that the motherboard requires replacement.
2. Use the Sun message ID to obtain more information about this type of fault.
a. In a browser, go to the Predictive Self-Healing Knowledge Article web site:
http://www.sun.com/msg
b. Enter the message ID in the SUNW-MSG-ID field, and press Lookup.
In this example, the message ID SUN4U-8000-6H returns the following
information for corrective action:
CPU errors exceeded acceptable levels
Type
Fault
Severity
Major
Description
The number of errors associated with this CPU has exceeded
acceptable levels.
Automated Response
The fault manager will attempt to remove the affected CPU from
service.
Impact
System performance may be affected.
Suggested Action for System Administrator
Schedule a repair procedure to replace the affected CPU, the
identity of which can be determined using fmdump -v -u <EVENT_ID>.
Details
The Message ID: SUN4U-8000-6H indicates diagnosis has
determined that a CPU is faulty. The Solaris fault manager arranged
an automated attempt to disable this CPU. The recommended action
for the system administrator is to contact Sun support so a Sun
service technician can replace the affected component.
c. Follow the suggested actions to repair the fault.
Collecting Information From Solaris OS Files and Commands
With the Solaris OS running on the Sun Fire T2000 server, you have the full
complement of Solaris OS files and commands available for collecting information
and for troubleshooting.
In the event that POST, ALOM, or the Solaris PSH features did not indicate the
source of a fault, check the message buffer and log files for fault notifications.
Hard drive faults are usually captured by the Solaris message files.
Use the dmesg command to view the most recent system messages. To view the
system messages log file, view the contents of the /var/adm/messages file.
▼ To Check the Message Buffer
1. Log in as superuser.
2. Issue the dmesg command:
# dmesg
The dmesg command displays the most recent messages generated by the system.
▼ To View System Message Log Files
The error logging daemon, syslogd, automatically records various system
warnings, errors, and faults in message files. These messages can alert you to system
problems such as a device that is about to fail.
The /var/adm directory contains several message files. The most recent messages
are in the /var/adm/messages file. After a period of time (usually every ten days),
a new messages file is automatically created. The original contents of the
messages file are rotated to a file named messages.1. Over a period of time, the
messages are further rotated to messages.2 and messages.3, and then deleted.
1. Log in as superuser.
2. Issue the following command:
# more /var/adm/messages
3. If you want to view all logged messages, issue the following command:
# more /var/adm/messages*
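Paging through several rotated message files by hand can be slow when you are looking for one device. The following Python sketch shows one way to search the current and rotated files for a keyword. It is an illustration only: the function name find_messages is hypothetical, the default pattern simply mirrors the /var/adm/messages* path used above, and reading those files typically requires superuser privileges.

```python
import glob


def find_messages(keyword, pattern="/var/adm/messages*"):
    """Return (path, line) pairs for lines that contain keyword.

    The match is case-insensitive. Files are visited in sorted name
    order, so the current messages file comes first.
    """
    needle = keyword.lower()
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" avoids failing on unexpected byte sequences.
        with open(path, errors="replace") as handle:
            for line in handle:
                if needle in line.lower():
                    hits.append((path, line.rstrip("\n")))
    return hits
```

For example, find_messages("WARNING") lists every line that contains that word across the message files, together with the file in which each line was found.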
Managing Components with Automatic System Recovery (ASR) Commands
The Automatic System Recovery (ASR) feature enables the server to automatically
configure failed components out of operation until they can be replaced. In the Sun
Fire T2000 server, the following components are managed by the ASR feature:
■ UltraSPARC T1 processor strands
■ Memory DIMMs
■ I/O bus
The database that contains the list of disabled components is called the ASR blacklist
(asr-db).
In most cases, POST and ALOM automatically manage the disabling of faulty
components and automatically enable them when the faulty FRU is replaced. In some
situations, it is necessary to manually manage the blacklist.
Example: A component appears faulty and is automatically disabled. The problem is
due to a loose connector, and no FRU replacement is required to fix the problem.
ALOM, which would normally detect a FRU replacement and enable the FRU, does
not do so. In this case, after the loose cable is reseated, the disabled component must
be manually enabled.
The Automatic System Recovery (ASR) commands (TABLE 2-10) enable you to view,
and manually add or remove components from the ASR blacklist. These commands
are run from the ALOM sc> prompt.
TABLE 2-10 ASR Commands
Command – Description
showcomponent* – Displays system components and their current state.
enablecomponent asrkey – Removes a component from the asr-db blacklist, where asrkey is the component to enable.
disablecomponent asrkey – Adds a component to the asr-db blacklist, where asrkey is the component to disable.
clearasrdb – Removes all entries from the asr-db blacklist.
* The showcomponent command may not report all blacklisted DIMMs.
Note – The components (asrkeys) vary from system to system, depending on how
many cores and how much memory are present. Use the showcomponent command to see
the asrkeys on a given system.
Note – A reset or power cycle is required after disabling or enabling a
component. If the status of a component is changed while the power is on, the change
has no effect on the system until the next reset or power cycle.
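The deferred effect described in the preceding note can be illustrated with a small model. The following Python sketch is a toy representation only, not the ALOM implementation: the class name, its attributes, and the reset method are assumptions made for illustration. It shows that a blacklist change is recorded immediately but is honored by the running system only after a reset.

```python
class AsrBlacklistModel:
    """Toy model of the asr-db blacklist; names here are hypothetical."""

    def __init__(self, asrkeys):
        self.asrkeys = set(asrkeys)    # keys as showcomponent would list them
        self.blacklist = set()         # recorded contents of the blacklist
        self.active_disabled = set()   # what the running system currently honors

    def _check(self, asrkey):
        if asrkey not in self.asrkeys:
            raise KeyError(f"unknown asrkey: {asrkey}")

    def disablecomponent(self, asrkey):
        self._check(asrkey)
        self.blacklist.add(asrkey)

    def enablecomponent(self, asrkey):
        self._check(asrkey)
        self.blacklist.discard(asrkey)

    def clearasrdb(self):
        self.blacklist.clear()

    def reset(self):
        # Blacklist changes take effect only at the next reset or power cycle.
        self.active_disabled = set(self.blacklist)
```

In this model, calling disablecomponent followed by reset moves a key into active_disabled; calling disablecomponent alone leaves active_disabled unchanged.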
▼ To Run the showcomponent Command
The showcomponent command displays the system components (asrkeys) and
reports their status.
1. At the sc> prompt, enter the showcomponent command.
▼ To Run the disablecomponent Command
The disablecomponent command disables a component by adding it to the ASR
blacklist.
1. At the sc> prompt, enter the disablecomponent command.
sc> disablecomponent MB/CMP0/CH3/R1/D1
sc>SC Alert:MB/CMP0/CH3/R1/D1 disabled
2. After receiving confirmation that the disablecomponent command is complete,
reset the server so that the ASR command takes effect.
sc> reset
▼ To Run the enablecomponent Command
The enablecomponent command enables a disabled component by removing it
from the ASR blacklist.
1. At the sc> prompt, enter the enablecomponent command.
sc> enablecomponent MB/CMP0/CH3/R1/D1
sc>SC Alert:MB/CMP0/CH3/R1/D1 reenabled
2. After receiving confirmation that the enablecomponent command is complete,
reset the server so that the ASR command takes effect.
sc> reset
Exercising the System With SunVTS
Sometimes a server exhibits a problem that cannot be isolated definitively to a
particular hardware or software component. In such cases, it may be useful to run a
diagnostic tool that stresses the system by continuously running a comprehensive
battery of tests. Sun provides the SunVTS software for this purpose.
This chapter describes the tasks necessary to use SunVTS software to exercise your
Sun Fire T2000 server:
■ “Checking Whether SunVTS Software Is Installed” on page 48
■ “Exercising the System Using SunVTS Software” on page 50
Checking Whether SunVTS Software Is Installed
This procedure assumes that the Solaris OS is running on the Sun Fire T2000 server,
and that you have access to the Solaris command line.
▼ To Check Whether SunVTS Software Is Installed
1. Check for the presence of SunVTS packages using the pkginfo command.
% pkginfo -l SUNWvts SUNWvtsr SUNWvtsts SUNWvtsmn
■ If SunVTS software is loaded, information about the packages is displayed.
■ If SunVTS software is not loaded, you see an error message for each missing
package.
ERROR: information for "SUNWvts" was not found
ERROR: information for "SUNWvtsr" was not found
...
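If you check several systems, you might want to extract the missing package names from the pkginfo output automatically. The following Python sketch parses the error lines shown above. It assumes that the error text matches the sample exactly; the function name missing_packages is hypothetical. The pkginfo command may write these messages to standard error, so capture both output streams if you feed real output to the function.

```python
import re

# Pattern based on the sample error lines shown in this procedure.
MISSING = re.compile(r'^ERROR: information for "([^"]+)" was not found')

def missing_packages(pkginfo_output):
    """Return the package names that pkginfo reported as not found."""
    names = []
    for line in pkginfo_output.splitlines():
        match = MISSING.match(line.strip())
        if match:
            names.append(match.group(1))
    return names
```

An empty result suggests that all four SunVTS packages were found, provided that the command itself ran successfully.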
The following table lists SunVTS packages:
Package      Description
SUNWvts      SunVTS framework
SUNWvtsr     SunVTS Framework (root)
SUNWvtsts    SunVTS for tests
SUNWvtsmn    SunVTS man pages
If SunVTS is not installed, you can obtain the installation packages from the
following sources:
■ Solaris Operating System DVDs
■ Sun Download Center: http://www.sun.com/oem/products/vts
The SunVTS 6.0 PS3 software, and future compatible versions, are supported on the
Sun Fire T2000 server.
SunVTS installation instructions are described in the SunVTS User’s Guide.
Exercising the System Using SunVTS Software
Before you begin, the Solaris OS must be running. You also need to ensure that
SunVTS validation test software is installed on your system. See “Checking Whether
SunVTS Software Is Installed” on page 48.
The SunVTS installation process requires that you specify one of two security
schemes to use when running SunVTS. The security scheme you choose
must be properly configured in the Solaris OS for you to run SunVTS. For details,
refer to the SunVTS User’s Guide.
SunVTS software features both character-based and graphics-based interfaces. This
procedure assumes that you are using the graphical user interface (GUI) on a system
running the Common Desktop Environment (CDE). For more information about the
character-based SunVTS TTY interface, and specifically for instructions on accessing
it by TIP or telnet commands, refer to the SunVTS User’s Guide.
SunVTS software can be run in several modes. This procedure assumes that you are
using the default mode.
This procedure also assumes that the Sun Fire T2000 server is headless, that is, it is
not equipped with a monitor capable of displaying bit-mapped graphics. In this case,
you access the SunVTS GUI by logging in remotely from a machine that has a
graphics display.
Finally, this procedure describes how to run SunVTS tests in general. Individual
tests may presume the presence of specific hardware, or may require specific
drivers, cables, or loopback connectors. For information about test options and
prerequisites, refer to the following documentation:
■ SunVTS Test Reference Manual
■ SunVTS 6.0 PS3 Doc Supplement (SPARC)
▼ To Exercise the System Using SunVTS Software
1. Log in as superuser to a system with a graphics display.
The display system should be one with a frame buffer and monitor capable of
displaying bit-mapped graphics such as those produced by the SunVTS GUI.
2. Enable the remote display.
On the display system, type:
# /usr/openwin/bin/xhost + test-system
where test-system is the name of the Sun Fire T2000 server you plan to test.
3. Remotely log in to the Sun Fire T2000 server as superuser.
Use a command such as rlogin or telnet.
4. Start SunVTS software.
If you have installed SunVTS software in a location other than the default /opt
directory, alter the path in the preceding command accordingly.
where display-system is the name of the machine through which you are remotely
logged in to the Sun Fire T2000 server.
The SunVTS GUI is displayed (FIGURE 2-8).
FIGURE 2-8 SunVTS GUI
5. Expand the test lists to see the individual tests.
The test selection area lists tests in categories, such as Network, as shown in
FIGURE 2-9. To expand a category, left-click the + icon (expand category icon) to the
left of the category name.
FIGURE 2-9 SunVTS Test Selection Panel
6. (Optional) Select the tests you want to run.
Certain tests are enabled by default, and you can choose to accept these.
Alternatively, you can enable and disable individual tests or blocks of tests by
clicking the checkbox next to the test name or test category name. Tests are enabled
when checked, and disabled when not checked.
TABLE 2-11 lists tests that are especially useful to run on a Sun Fire T2000 server.
TABLE 2-11 Useful SunVTS Tests to Run on a Sun Fire T2000 Server
SunVTS Tests           FRUs Exercised
(test names missing)   Memory DIMMs, CPU motherboard
(test names missing)   Motherboard
usbkbtest, disktest    USB devices, cable, CPU motherboard (USB controller)
hsclbtest              Motherboard, system controller (Host to System Controller interface)
7. (Optional) Customize individual tests.
You can customize individual tests by right-clicking on the name of the test. For
example, in FIGURE 2-9, right-clicking on the text string ce0(nettest) brings up a
menu that enables you to configure this Ethernet test.
8. Start testing.
Click the Start button that is located at the top left of the SunVTS window. Status
and error messages appear in the test messages area located across the bottom of the
window. You can stop testing at any time by clicking the Stop button.
During testing, SunVTS software logs all status and error messages. To view these,
click the Log button or select Log Files from the Reports menu. This opens a log
window from which you can choose to view the following logs:
■ Information – Detailed versions of all the status and error messages that appear
in the test messages area.
■ Test Error – Detailed error messages from individual tests.
■ VTS Kernel Error – Error messages pertaining to SunVTS software itself. You
should look here if SunVTS software appears to be acting strangely, especially
when it starts up.
■ Solaris OS Messages (/var/adm/messages) – A file containing messages
generated by the operating system and various applications.
■ Log Files (/var/opt/SUNWvts/logs) – A directory containing the log files.
For further information, refer to the manuals that accompany the SunVTS software.
CHAPTER 3
Replacing Hot-Swappable and Hot-Pluggable FRUs
This chapter describes how to remove and replace the hot-swappable and
hot-pluggable field-replaceable units (FRUs) in the Sun Fire T2000 server.
The following topics are covered:
■ “Devices That Are Hot-Swappable and Hot-Pluggable” on page 56
■ “Hot-Swapping a Fan” on page 56
■ “Hot-Swapping a Power Supply” on page 58
■ “Hot-Swapping the Rear Blower” on page 61
■ “Hot-Plugging a Hard Drive” on page 63
Devices That Are Hot-Swappable and
Hot-Pluggable
Hot-swappable devices are those devices that you can remove and install while the
system is running without affecting the rest of the system’s capabilities. In a Sun Fire
T2000 server, the following devices are hot-swappable:
■ Fans
■ Power supplies
■ Rear blower
Hot-pluggable devices are those devices that can be removed and installed while the
system is running, but you must perform administrative tasks beforehand. In a Sun
Fire T2000 server, the chassis-mounted hard drives can be hot-pluggable
(depending on how they are configured).
Hot-Swapping a Fan
Three hot-swappable fans are located under the fan door.
Two working fans are required to provide adequate cooling for the Sun Fire T2000
server. If a fan fails, replace it as soon as possible to ensure system availability.
The following LEDs are lit when a fan fault is detected:
■ Front and rear Service Required LEDs
■ Top Fan LED on the front of the server
■ LED on the faulty fan
If an overtemperature condition occurs, the front panel OverTemp LED lights.
A message is displayed on the console and logged by ALOM. Use the showfaults
command at the sc> prompt to view the current faults.
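The cooling rule stated above (three fans installed, two working fans required) can be expressed as a simple check. The following Python sketch is illustrative only: the function name and the dictionary layout are assumptions, and the fan names FN0, FN1, and FN2 are taken from the callouts in FIGURE 3-1.

```python
REQUIRED_WORKING_FANS = 2  # from the text: two working fans give adequate cooling

def cooling_status(fan_ok):
    """fan_ok maps fan names (for example FN0, FN1, FN2) to True when the fan works."""
    working = sum(1 for ok in fan_ok.values() if ok)
    failed = sorted(name for name, ok in fan_ok.items() if not ok)
    return {
        "adequate": working >= REQUIRED_WORKING_FANS,
        "replace": failed,
    }
```

A single failed fan still leaves cooling adequate in this model, but the fan appears in the replace list because the text recommends replacing it as soon as possible.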
▼ To Remove a Fan
1. Gain access to the top of the server where the fan door is located (FIGURE 3-1).
You might need to extend the server to a maintenance position. See “To Extend the
Server to the Maintenance Position” on page 69.
FIGURE 3-1 Removing a Fan (callouts: FN0, FN1, FN2, LED, latch, fan door)
2. Unpackage the replacement fan and place it near the server.
3. Lift the latch on the top of the fan door (“Removing a Fan” on page 57), and lift
the fan door open.
The fan door is spring loaded, and you must hold it in the open position.
4. Identify the faulty fan.
A lighted LED on the top of a fan (FIGURE 3-1) indicates that the fan is faulty.
5. Pull up on the fan strap handle until the fan is removed from the fan bay.
▼ To Replace a Fan
1. With the fan door held open, slide the replacement fan into the fan bay.
2. Apply firm pressure to fully seat the fan.
3. Verify that the LED on the replaced fan and the Top Fan, Service Required, and
Locator LEDs are not lit.
4. Close the fan door.
5. If necessary, return the server to its normal position in the rack.
Hot-Swapping a Power Supply
The Sun Fire T2000 server’s redundant hot-swappable power supplies enable you to
remove and replace a power supply without shutting the server down, provided that
the other power supply is online and working.
The following LEDs are lit when a power supply fault is detected:
■ Front and rear Service Required LEDs
■ Rear-FRU Fault LED on the front of the server
■ Amber Failure LED on the faulty power supply
If a power supply fails and you do not have a replacement available, leave the failed
power supply installed to ensure proper air flow in the server.
▼ To Remove a Power Supply
1. Identify which power supply (0 or 1) requires replacement (FIGURE 3-2).
A lighted amber LED on a power supply indicates that a failure was detected. You
can also use the showfaults command at the sc> prompt.
FIGURE 3-2 Locating Power Supplies and Release Latch (callouts: PS0, PS1, latches)
2. At the sc> prompt, issue the removefru command.
The removefru command prepares the server for the hot swap operation.
For instructions on how to access the sc> prompt, refer to the Sun Fire T2000 Server
Advanced Lights Out Manager (ALOM) Guide.
Example:
sc> removefru -y PSn
Are you sure you want to remove PS0 [y/n]? y
<PS0> is safe to remove.
where PSn is the power supply identifier for the power supply you plan to remove,
either PS0 or PS1.
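Because a power supply can be hot-swapped only while the other supply is online and working, a script that prepares the removefru command might check that condition first. The following Python sketch is a hypothetical helper, not an ALOM interface: the function name and the status dictionary are assumptions, and the command string simply reproduces the form shown in the example above.

```python
VALID_POWER_SUPPLIES = ("PS0", "PS1")

def removefru_command(target, supply_ok):
    """Return the removefru command for target, or raise if the swap is unsafe.

    supply_ok maps "PS0" and "PS1" to True when that supply is online and working.
    """
    if target not in VALID_POWER_SUPPLIES:
        raise ValueError(f"expected one of {VALID_POWER_SUPPLIES}, got {target!r}")
    other = "PS1" if target == "PS0" else "PS0"
    # The redundant supply must carry the load while the target is removed.
    if not supply_ok.get(other, False):
        raise RuntimeError(f"{other} is not working; do not hot-swap {target}")
    return f"removefru -y {target}"
```

You would still enter the returned command yourself at the sc> prompt; the sketch only encodes the precondition from the text.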
3. Gain access to the rear of the server where the faulty power supply is located.
4. At the rear of the server, release the cable management arm (CMA) tab (FIGURE 3-3)
and swing the CMA out of the way so you can access the power supply.
FIGURE 3-3 Rotating the Cable Management Arm
5. Disconnect the power cord from the faulty power supply.
6. Grasp the power supply handle and push the power supply latch to the right.
7. Pull the power supply out of the chassis.
▼ To Replace a Power Supply
1. Align the replacement power supply with the empty power supply bay.
2. Slide the power supply into the bay until it is fully seated.
3. Reconnect the power cord to the power supply.
4. Close the CMA, inserting the end of the CMA into the rear left rail bracket.
5. Verify that the amber LED on the replaced power supply and the Service Required
and Rear-FRU Fault LEDs are not lit.
6. At the sc> prompt, issue the showenvironment command to verify the status of
the power supplies.
Hot-Swapping the Rear Blower
The rear blower on the Sun Fire T2000 server is hot-swappable.
The following LEDs are lit when a blower unit fault is detected:
■ Front and rear Service Required LEDs
■ LED on the blower
▼ To Remove the Rear Blower
1. Gain access to the rear of the server where the faulty blower unit is located.
2. Release the cable management arm tab (FIGURE 3-3) and swing the cable management
arm out of the way so you can access the rear blower.
3. Unscrew the two thumbscrews (FIGURE 3-4) that secure the rear blower to the
chassis.
FIGURE 3-4 Removing the Rear Blower (callout: LED)
4. Grasp the thumbscrews and slowly slide the blower out of the chassis, keeping
the blower level as you remove it.
▼ To Replace the Rear Blower
1. Unpackage the replacement blower.
2. Slide the blower into the chassis until it locks into the power connector at the
front of the blower compartment (FIGURE 3-5).
FIGURE 3-5 Replacing the Blower Unit (callout: FN2)
3. Tighten the two thumbscrews to secure the blower (FN2) to the chassis.
4. Verify that the Rear Blower and Service Required LEDs are not lit.
5. Close the CMA, inserting the end of the CMA into the rear left rail bracket.
Hot-Plugging a Hard Drive
The hard disk drives in the Sun Fire T2000 server are hot-pluggable, but this
capability depends on how the hard drives are configured. To hot-plug a drive, you
must first take the drive offline (prevent any applications from accessing it,
and remove the logical software links to it) so that you can safely remove it.
The following situations prevent you from hot-plugging a drive:
■ The hard drive provides the operating system, and the operating system is not
mirrored on another drive.
■ The hard drive cannot be logically isolated from the online operations of the
server.
If either condition applies to your drive, you must shut the system down before you
replace the hard drive. See “To Shut the System Down” on page 68.
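The two conditions above amount to a short decision rule. The following Python sketch restates that rule; the function and parameter names are hypothetical, and you must determine the three inputs yourself from your own disk configuration.

```python
def can_hot_plug(provides_os, os_mirrored, can_isolate):
    """Return True when neither inhibiting condition from the list applies.

    provides_os: the drive holds the operating system.
    os_mirrored: the operating system is mirrored on another drive.
    can_isolate: the drive can be logically isolated from online operations.
    """
    if provides_os and not os_mirrored:
        return False
    if not can_isolate:
        return False
    return True
```

When the function returns False, the text calls for shutting the system down before you replace the drive.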
▼ To Remove a Hard Drive
1. Identify the location of the hard drive that you want to replace (FIGURE 3-6).
FIGURE 3-6 Locating the Hard Drive Release Button and Latch (callouts: HDD0, HDD1, HDD2, HDD3, latch, latch release button)
2. Issue the Solaris OS commands required to stop using the hard drive.
Exact commands required depend on the configuration of your hard drives. You
might need to unmount file systems or perform RAID commands.
Example:
cfgadm -c unconfigure c0t0d0s0
3. On the drive you plan to remove, push the latch release button (FIGURE 3-6).
The latch opens.
Caution – The latch is not an ejector. Do not bend it too far to the left. Doing so can
damage the latch.
4. Grasp the latch and pull the drive out of the drive slot.
▼ To Replace a Hard Drive
1. Align the replacement drive to the drive slot.
The hard drive is physically addressed according to the slot in which it is installed.
See FIGURE 3-6. It is important to install a replacement drive in the same slot as the
drive that was removed.
2. Slide the drive into the bay until it is fully seated.
3. Close the latch to lock the drive in place.
4. Perform administrative tasks to reconfigure the hard disk drive.
The procedures that you perform at this point depend on how your data is
configured. You might need to partition the drive, create file systems, load data from
backups, or have it updated from a RAID configuration.
Example:
cfgadm -c configure c0t0d0s0
CHAPTER 4
Replacing Cold-Swappable FRUs
This chapter describes how to remove and replace field replaceable units (FRUs) in
the Sun Fire T2000 server that must be cold swapped.
The following topics are covered:
■ “Safety Information” on page 66
■ “Common Procedures for Parts Replacement” on page 67
■ “Removing and Replacing FRUs” on page 74
■ “Common Procedures for Finishing Up” on page 103
For a list of FRUs, see Appendix A, “Field-Replaceable Units” on page 119.
Note – Never attempt to run the system with the cover removed. The cover must be
in place for proper air flow. The cover interlock switch immediately shuts the system
down when the cover is removed.
Safety Information
This section describes important safety information you need to know prior to
removing or installing parts in the Sun Fire T2000 server.
For your protection, observe the following safety precautions when setting up your
equipment:
■ Follow all Sun standard cautions, warnings, and instructions marked on the
equipment and described in Important Safety Information for Sun Hardware Systems,
816-7190.
■ Make sure that the voltage and frequency of your power source match the voltage
and frequency inscribed on the equipment’s electrical rating label.
■ Follow the electrostatic discharge safety practices as described in this section.
Safety Symbols
The following symbols might appear in this book. Note their meanings:
Caution – There is a risk of personal injury and equipment damage. To avoid
personal injury and equipment damage, follow the instructions.
Caution – Hot surface. Avoid contact. Surfaces are hot and might cause personal
injury if touched.
Caution – Hazardous voltages are present. To reduce the risk of electric shock and
danger to personal health, follow the instructions.
Electrostatic Discharge Safety
Electrostatic discharge (ESD) sensitive devices, such as the motherboard, PCI cards,
hard drives, and memory cards, require special handling.
Caution – The boards and hard drives contain electronic components that are
extremely sensitive to static electricity. Ordinary amounts of static electricity from
clothing or the work environment can destroy components. Do not touch the
components along their connector edges.
Use an Antistatic Wrist Strap
Wear an antistatic wrist strap and use an antistatic mat when handling components
such as drive assemblies, boards, or cards. When servicing or removing server
components, attach an antistatic strap to your wrist and then to a metal area on the
chassis. Do this after you disconnect the power cords from the server. Following this
practice equalizes the electrical potentials between you and the server.
Use an Antistatic Mat
Place ESD-sensitive components such as the motherboard, memory, and other PCB
cards on an antistatic mat.
Common Procedures for Parts
Replacement
Before you can remove and replace parts that are inside the Sun Fire T2000 server,
you must perform the following procedures:
■ “To Shut the System Down” on page 68
■ “To Extend the Server to the Maintenance Position” on page 69
■ “To Disconnect Power From the Server” on page 72
■ “To Remove the Top Cover” on page 72
■ “To Remove the Front Bezel and Top Front Cover” on page 73
Note – These procedures do not apply to the hot-pluggable and hot-swappable
devices (fans, power supplies, hard drives and rear blower) described in the
preceding chapter.
The corresponding procedures that you perform when maintenance is complete are
described in “Common Procedures for Finishing Up” on page 103.
Required Tools
The Sun Fire T2000 server can be serviced with the following tools:
■ Antistatic wrist strap
■ Antistatic mat
■ No. 2 Phillips screwdriver
▼ To Shut the System Down
Performing a graceful shutdown makes sure all of your data is saved and the system
is ready for restart.
1. Log in as superuser or equivalent.
Depending on the nature of the problem, you might want to view the system status
or the log files, or run diagnostics, before you shut down the system. Refer to the
Sun Fire T2000 Server Administration Guide for log file information.
2. Notify affected users.
Refer to your Solaris system administration documentation for additional
information.
3. Save any open files and quit all running programs.
Refer to your application documentation for specific information on these processes.
4. Shut down the Solaris OS.
Refer to the Solaris system administration documentation for additional information.
5. Switch from the system console to the ALOM sc> prompt by typing the #. (Pound
Period) key sequence.
6. At the ALOM sc> prompt, issue the poweroff command.
sc> poweroff -fy
SC Alert: SC Request to Power Off Host Immediately.
Note – You can also use the Power ON/OFF button on the front of the server to
initiate a graceful system shutdown. This button is recessed to
prevent accidental server power-off. Use the tip of a pen to operate this button.
Refer to the Sun Fire T2000 Server Advanced Lights Out Management (ALOM) Guide for
more information about the ALOM poweroff command.
▼ To Extend the Server to the Maintenance
Position
If the server is installed in a rack with the extendable slide rails that were supplied
with it, use this procedure to extend the server to the maintenance position.
Note – Removing the server from the rack is recommended for all cold-swappable
FRU replacement procedures except the DIMMs, PCI cards, and the system
controller.
1. (Optional) Issue the following command from the ALOM sc> prompt to locate the
system that requires maintenance.
sc> setlocator on
Locator LED is on.
Once you have located the server, press the Locator LED button to turn it off.
2. Check to see that no cables will be damaged or interfere when the server is
extended.
Although the cable management arm (CMA) that is supplied with the server is
hinged to accommodate extending the server, you should make sure that all cables
and cords are capable of extending.
3. From the front of the server, release the slide rail latches on each side.
Pinch the green latches as shown in FIGURE 4-1.
FIGURE 4-1 Slide Release Latches
4. While pinching the release latches, slowly pull the server forward until the slide
rails latch.
▼ To Remove the Server From the Rack
Removing the server from the rack is recommended for all cold-swappable FRU
replacement procedures except the DIMMs, PCI cards, and the system controller.
Caution – The server weighs approximately 40 lb. (18 kg). Two people are required
to dismount and carry the chassis.
1. Disconnect all the cables and power cords from the server.
2. Extend the server to the maintenance position as described in “To Extend the
Server to the Maintenance Position” on page 69.
3. Press the metal lever (FIGURE 4-2) that is located on the inner side of the rail to
disconnect the CMA from the rail assembly (on the right side from the back of the
rack).
This leaves the CMA still attached to the cabinet, but the server chassis is now
disconnected from the CMA.
FIGURE 4-2 Locating the Metal Lever
Caution – The server weighs approximately 40 lb. (18 kg). The next step requires
two people to dismount and carry the chassis.
4. From the front of the server, pull the release tabs forward and pull the server
forward until it is free of the rack rails.
The release tabs are located on each rail, about midway on the server.
5. Set the server on a sturdy work surface.
▼ To Disconnect Power From the Server
Caution – The system supplies standby power to the circuit boards even when the
system is powered off.
● Disconnect both power cords from the power supplies.
Note – The following FRU replacements do not require that power be removed:
DIMMs and PCI cards.
▼ To Perform Electrostatic Discharge (ESD)
Prevention Measures
1. Prepare an antistatic surface on which to set parts during removal and installation.
Place ESD-sensitive components such as the printed circuit boards on an antistatic
mat. The following items can be used as an antistatic mat:
■ Antistatic bag used to wrap a Sun replacement part
■ Sun ESD mat, part number 250-1088
■ Disposable ESD mat (shipped with some replacement parts or optional system
components)
2. Attach an antistatic wrist strap.
When servicing or removing server components, attach an antistatic strap to your
wrist and then to a metal area on the chassis. Do this after you disconnect the power
cords from the server.
▼ To Remove the Top Cover
All field replaceable units (FRUs) that are not hot swappable require the removal of
the top cover.
1. Press the top cover release button (FIGURE 4-3).
FIGURE 4-3 Top Cover and Release Button (callouts: top cover, top front cover, fan cover, fan cover latch, top cover release button)
2. While pressing the top cover release button, slide the cover toward the rear of the
server about half an inch.
3. Lift the cover off the chassis.
▼ To Remove the Front Bezel and Top Front Cover
The following field-replaceable units (FRUs) require the removal of the top front
cover and front bezel:
■ Motherboard
■ SAS disk backplane
■ LED board
■ Front I/O board
■ Fan power board
■ DVD
1. Remove the top cover as described in the previous procedure.
2. Lift the fan cover latch (FIGURE 4-3) and open the fan cover.
3. Loosen the captive screw (near the right-most fan) that secures the bezel to the
chassis (FIGURE 4-4).
FIGURE 4-4 Removing the Front Bezel from the Server Chassis
4. Remove the front bezel from the chassis (FIGURE 4-4).
The bezel is held in place by a mounting tab and four fasteners that clamp the bezel
to the chassis.
5. While holding the fan cover open, slide the top front cover forward to disengage
it from the chassis.
6. Lift the top front cover from the chassis.
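The common procedures above apply differently depending on the FRU. The following Python sketch collects the exceptions stated in this section into one lookup. It is an interpretation of the notes, not an official checklist: the function name, the FRU labels, and the step wording are assumptions, and the ESD prevention measures, which apply to every FRU, are not listed.

```python
# FRUs for which removing the server from the rack is not recommended as necessary.
RACK_REMOVAL_EXCEPTIONS = {"DIMM", "PCI card", "System controller"}
# FRUs whose replacement does not require that power be removed.
POWER_DISCONNECT_EXCEPTIONS = {"DIMM", "PCI card"}
# FRUs that require removal of the front bezel and top front cover.
NEEDS_BEZEL_REMOVAL = {
    "Motherboard", "SAS disk backplane", "LED board",
    "Front I/O board", "Fan power board", "DVD",
}

def preparation_steps(fru):
    """Return the preparation steps for a cold-swappable FRU, in order."""
    steps = ["Shut the system down"]
    if fru in RACK_REMOVAL_EXCEPTIONS:
        steps.append("Extend the server to the maintenance position")
    else:
        steps.append("Remove the server from the rack (recommended)")
    if fru not in POWER_DISCONNECT_EXCEPTIONS:
        steps.append("Disconnect power from the server")
    steps.append("Remove the top cover")
    if fru in NEEDS_BEZEL_REMOVAL:
        steps.append("Remove the front bezel and top front cover")
    return steps
```

Treat the result as a reminder list only, and follow the individual FRU procedure for the authoritative sequence.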
Removing and Replacing FRUs
This section provides procedures for replacing the following field-replaceable units
(FRUs) inside the server chassis:
■ “To Remove PCI-E and PCI-X Cards” on page 75 and “To Replace PCI Cards” on
page 77
■ “To Remove DIMMs” on page 77 and “To Replace DIMMs” on page 79
■ “To Remove the System Controller” on page 82 and “To Replace the System
Controller Board” on page 83
■ “To Remove the Motherboard Assembly” on page 84 and “To Replace the
Motherboard Assembly” on page 88
■ “To Remove the Power Distribution Board” on page 90 and “To Replace the
Power Distribution Board” on page 92
■ “To Remove the LED Board” on page 93 and “To Remove the LED Board” on
page 93
■ “To Remove the Fan Power Board” on page 95 and “To Replace the Fan Power
Board” on page 96
■ “To Remove the DVD Drive” on page 98 and “To Replace the DVD Drive” on
page 99
■ “To Remove the SAS Disk Backplane” on page 99 and “To Replace the SAS Disk
Backplane” on page 100
■ “To Remove the Battery on the System Controller” on page 101 and “To Replace
the Battery on the System Controller” on page 101
To locate these FRUs, refer to Appendix A, “Field-Replaceable Units” on page 119.
▼ To Remove PCI-E and PCI-X Cards
Use this procedure to remove the optional PCI-E and PCI-X cards from the server.
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 67.
2. Locate the PCI card that you want to remove.
To locate the PCI card slots, refer to FIGURE 4-5 and FIGURE 4-6. The PCI card slots
are located on the I/O portion of the motherboard assembly.
FIGURE 4-5 Location of PCI-E and PCI-X Card Slots (callouts: PCI-E slots 0, 1, 2; PCI-X slots 0, 1)
3. Make a note of where the PCI card is installed and note any cables so you know
where to reinstall the card and cables.
FIGURE 4-6 Location of PCI-E and PCI-X Card Slots (callouts: PCI-E slots 0, 1, 2; PCI-X slots 0, 1)
4. Make note of and remove any cables that are attached to the card.
5. Rotate the PCI hold-down bracket 90 degrees so it no longer covers the PCI card
(FIGURE 4-7).
FIGURE 4-7 PCI Card and Hold-down Bracket (callout: PCI hold-down bracket)
6. Carefully work the card out of the socket.
7. Place the card on an antistatic mat.
8. Rotate the hold-down bracket so that it does not protrude into the chassis.
▼ To Replace PCI Cards
Use this procedure to replace PCI-E and PCI-X cards.
1. Unpackage the replacement PCI-E or PCI-X card and place it on an antistatic mat.
2. Locate the proper socket for the card you are replacing.
3. Rotate the PCI hold-down bracket 90 degrees so you can install the card.
4. Insert the card into the socket.
5. Rotate the PCI hold-down bracket 90 degrees to lock the card in place.
6. Perform the procedures described in “Common Procedures for Finishing Up” on
page 103.
▼ To Remove DIMMs
Caution – This procedure requires that you handle components that are sensitive to
static discharges that can cause the component to fail. To avoid this problem, ensure
that you follow antistatic practices as described in “To Perform Electrostatic
Discharge (ESD) Prevention Measures” on page 72.
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 67.
2. Locate the DIMM (FIGURE 4-8) that you want to replace.
Use FIGURE 4-8 and TABLE 4-1 to identify the DIMM you want to remove.
Note – For memory configuration information, see “To Add DIMMs” on page 113.
FIGURE 4-8 DIMM Locations (callout: front of board)
Use FIGURE 4-8 and TABLE 4-1 to map DIMM names that are displayed in faults to
socket numbers that identify the location of the DIMM on the motherboard.
TABLE 4-1 DIMM Names and Socket Numbers
DIMM Name Used in Messages*   Socket No.   DIMM No.
CH0/R1/D1                     J0901        DIMM 1
CH0/R0/D1                     J0701        DIMM 2
CH0/R1/D0                     J0801        DIMM 3
CH0/R0/D0                     J0601        DIMM 4
CH1/R0/D1                     J1401        DIMM 5
CH1/R1/D1                     J1201        DIMM 6
CH1/R1/D0                     J1301        DIMM 7
CH1/R0/D0                     J1101        DIMM 8
CH2/R1/D1                     J1901        DIMM 16
CH2/R0/D1                     J1701        DIMM 15
CH2/R1/D0                     J1801        DIMM 14
CH2/R0/D0                     J1601        DIMM 13
CH3/R1/D1                     J2401        DIMM 12
CH3/R0/D1                     J2201        DIMM 11
CH3/R1/D0                     J2301        DIMM 10
CH3/R0/D0                     J2101        DIMM 9
* DIMM names in messages are displayed with the full name, such as MB/CMP0/CH1/R1/D1, but this table
lists the DIMM name in an abbreviated way (the preceding MB/CMP0 is omitted) for clarity.
3. Make note of the DIMM location so you can install the replacement DIMM in the
same socket.
4. Push down on the ejector levers on each side of the DIMM until the DIMM is
released.
5. Grasp the top corners of the faulty DIMM and remove it from the system.
6. Place the DIMM on an antistatic mat.
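The lookup described in TABLE 4-1 can be sketched in a few lines of Python. This is only an illustration and not part of any Sun tool: the dictionary is transcribed from TABLE 4-1, and the names DIMM_TABLE, PREFIX, and locate_dimm are hypothetical. It accepts either the full name shown in a fault message (with the leading MB/CMP0) or the abbreviated name used in the table.

```python
# Transcribed from TABLE 4-1: abbreviated DIMM name -> (socket number, DIMM number).
DIMM_TABLE = {
    "CH0/R1/D1": ("J0901", 1),
    "CH0/R0/D1": ("J0701", 2),
    "CH0/R1/D0": ("J0801", 3),
    "CH0/R0/D0": ("J0601", 4),
    "CH1/R0/D1": ("J1401", 5),
    "CH1/R1/D1": ("J1201", 6),
    "CH1/R1/D0": ("J1301", 7),
    "CH1/R0/D0": ("J1101", 8),
    "CH2/R1/D1": ("J1901", 16),
    "CH2/R0/D1": ("J1701", 15),
    "CH2/R1/D0": ("J1801", 14),
    "CH2/R0/D0": ("J1601", 13),
    "CH3/R1/D1": ("J2401", 12),
    "CH3/R0/D1": ("J2201", 11),
    "CH3/R1/D0": ("J2301", 10),
    "CH3/R0/D0": ("J2101", 9),
}

# Fault messages show the full name; the table omits this leading part.
PREFIX = "MB/CMP0/"


def locate_dimm(name):
    """Return (socket number, DIMM number) for a DIMM name from a fault message.

    Hypothetical helper: raises KeyError if the name is not listed in TABLE 4-1.
    """
    short = name.strip()
    if short.startswith(PREFIX):
        short = short[len(PREFIX):]
    return DIMM_TABLE[short]
```

For example, locate_dimm("MB/CMP0/CH0/R0/D0") returns the socket and DIMM number to note before removing the faulty DIMM.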
▼ To Replace DIMMs
1. Unpackage the replacement DIMMs and place them on an antistatic mat.
2. Ensure that the connector ejector tabs are in the open position.
3. Line up the replacement DIMM with the connector.
Align the DIMM notch with the key in the connector. This ensures that the DIMM is
oriented correctly.
4. Push the DIMM into the connector until the ejector tabs lock the DIMM in place.
5. Perform the procedures described in “Common Procedures for Finishing Up” on
page 103.
6. Perform the following steps to clear the memory fault.
a. Gain access to the ALOM sc> prompt.
Refer to the Sun Fire T2000 Server Advanced Lights Out Management (ALOM) Guide
for instructions.
b. Run the showfaults -v command to determine how to clear the fault:
■ If the fault is a Host-detected fault (displays a UUID), such as the following:
sc> showfaults -v
ID Time FRU Fault
0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault,
MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
Run the clearfault command with the UUID provided in the showfaults
output:
sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86
Clearing fault from all indicted FRUs...
Fault cleared.
■ If the fault resulted in the DIMM being disabled, such as the following:
sc> showfaults -v
ID Time FRU Fault
1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0 MB/CMP0/CH0/R0/D0
deemed faulty and disabled
Run the enablecomponent command to enable the FRU:
sc> enablecomponent MB/CMP0/CH0/R0/D0
7. Perform the following steps to verify that there are no faults:
a. Set the virtual keyswitch to diag mode so that POST will run in service mode.
sc> setkeyswitch diag
b. Issue the poweron command.
sc> poweron
c. Switch to the system console to view POST output.
sc> console
Watch the POST output for possible fault messages. The following output is a
sign that POST did not detect any faults:
.
.
.
0:0>POST Passed all devices.
0:0>
0:0>DEMON: (Diagnostics Engineering MONitor)
0:0>Select one of the following functions
0:0>POST:Return to OBP.
0:0>INFO:
0:0>POST Passed all devices.
0:0>Master set ACK for vbsc runpost command and spin...
Note – Depending on the configuration of ALOM POST variables (see “Flowchart of
ALOM Variables for POST Configuration” on page 33) and whether POST detected
faults or not, the system might boot, or the system might remain at the ok prompt. If
the system is at the ok prompt, type boot.
d. Issue the Solaris OS fmadm faulty command.
# fmadm faulty
No memory or DIMM faults should be displayed.
If faults are reported, return to the “Diagnostic Flow Chart” on page 13 for an
approach to diagnosing the fault.
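The decision in Step 6b and the POST check in Step 7c can be summarized as a small Python sketch. It is an illustration only, assuming the showfaults -v and POST output look like the samples in this procedure; it does not connect to ALOM, and the names suggest_clear_command and post_passed are hypothetical. Output in any other shape returns None, in which case the output should be read manually.

```python
import re

# A UUID as printed by showfaults -v for a host-detected fault.
UUID_RE = re.compile(
    r"UUID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})", re.IGNORECASE
)
# The FRU name that precedes "deemed faulty and disabled".
DISABLED_RE = re.compile(r"(\S+) deemed faulty and disabled")


def suggest_clear_command(fault_text):
    """Return the sc> command suggested by Step 6b for one fault entry, or None."""
    # Collapse line wrapping so a wrapped entry is matched as one line.
    text = " ".join(fault_text.split())
    uuid_match = UUID_RE.search(text)
    if uuid_match:
        # Host-detected fault: clear it with the UUID from the output.
        return "clearfault " + uuid_match.group(1)
    disabled_match = DISABLED_RE.search(text)
    if disabled_match:
        # DIMM was disabled: re-enable the FRU.
        return "enablecomponent " + disabled_match.group(1)
    return None


def post_passed(console_text):
    """Return True if the console output contains the POST success line."""
    return "POST Passed all devices." in console_text
```

The sketch only suggests a command; run the command itself at the sc> prompt as described in Step 6.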
▼ To Remove the System Controller
Caution – The system controller can be hot. To avoid injury, handle it carefully.
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 67.
2. Locate the system controller card.
See Appendix A for an illustration of the server’s FRUs that shows the system
controller card.
3. Push down on the ejector levers on each side of the system controller until the
card is released from the socket.
FIGURE 4-9 Ejecting and Removing the System Controller Card
4. Grasp the top corners of the card and pull it out of the socket.
5. Place the system controller card on an antistatic mat.
6. Remove the system configuration PROM (FIGURE 4-10) from the system controller
and place it on an antistatic mat.
The system controller contains the persistent storage for the host ID and Ethernet
MAC addresses of the system, as well as the ALOM configuration including the IP
addresses and ALOM user accounts, if configured. This information will be lost
unless the system configuration PROM is removed and installed in the replacement
system controller. The PROM does not hold the fault data, and this data will no
longer be accessible when the system controller is replaced.
FIGURE 4-10 Locating the System Configuration PROM
▼ To Replace the System Controller Board
1. Unpackage the replacement system controller board and place it on an antistatic
mat.
2. Install the system configuration PROM that you removed from the faulty system
controller board.
The PROM is keyed to ensure proper orientation.
3. Locate the system controller slot on the motherboard assembly.
4. Ensure that the ejector levers are open.
5. Holding the bottom edge of the system controller parallel to its socket, carefully
align the system controller so that each of its contacts is centered on a socket pin.
Ensure that the system controller is correctly oriented. A notch along the bottom of
the system controller corresponds to a tab on the socket.
6. Push firmly and evenly on both ends of the system controller until it is firmly
seated in the socket.
You hear a click when the ejector levers lock into place.
7. Perform the procedures described in “Common Procedures for Finishing Up” on
page 103.
▼ To Remove the Motherboard Assembly
Although the CPU and the I/O board are two distinct boards, they must be removed
and replaced as a single motherboard assembly (FIGURE 4-11).
Caution – The flexible cable that connects the motherboard to the I/O board is
fragile. Handle these parts very carefully to prevent damage.
Caution – This procedure requires that you handle components that are sensitive to
static discharges that can cause the component to fail. To avoid this problem, ensure
that you follow antistatic practices as described in “To Perform Electrostatic
Discharge (ESD) Prevention Measures” on page 72.
FIGURE 4-11 Motherboard Assembly (callouts: CPU board, I/O board)
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 67.
2. Remove all cables from the rear of the server.
Ensure that you remove all cables as well as the power cords.
3. Remove any PCI option cards that are installed and then rotate the hold-down
brackets so they do not protrude into the chassis.
4. Remove all DIMMs from the motherboard assembly. See “To Remove DIMMs” on
page 77.
Make note of the memory configuration so you can reinstall the memory in the
replacement board.
5. Remove the system controller board from the motherboard assembly. See “To
Remove the System Controller” on page 82.
6. Disconnect cables from the motherboard assembly:
■ The gray ribbon cable that runs along the left side of the chassis and
motherboard.
■ The cable marked P8 (FIGURE 4-12).
■ The hard drive data cables. Carefully pull them through the interior wall of
the chassis.
The SAS hard drive cables and the cable marked P8 pass through a cutout in the
interior wall of the chassis. Before removing the motherboard assembly by lifting
it over the interior wall, ensure that these cables are out of the way. The SAS
hard drive cables can readily be folded back over the interior wall or passed
through the cutout (FIGURE 4-12). However, the cable marked P8 is large and
contains a number of small wires, so it will not easily pass through the cutout.
While pushing and pulling the cables through the cutout, be careful not to damage
the wires.
FIGURE 4-12 Cable Cutout
7. Remove the screws and nylon washers that secure the motherboard assembly to
the chassis (FIGURE 4-13).
Caution – Do not remove the screws that hold the flexible cable in place. These
screws must be installed at the factory, and they must not be removed.
FIGURE 4-13 Location of the Screws in the Motherboard Assembly (callouts: screws 1
through 10, bus bar screws, and the flexible cable; do not remove flex cable screws)
8. Slide the motherboard assembly forward to disengage the connectors at the rear of
the motherboard assembly from the cutouts in the rear of the chassis.