Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright 2006 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without
limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or
more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and
decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Java, OpenBoot, SunSolve, SunVTS, Sun Fire, and Solaris are trademarks or
registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other
countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun's written license agreements.
U.S. Government Rights - Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface vii
1. Sun Fire T1000 Server Overview 1
Sun Fire T1000 Server Features 1
Chip-Multithreaded (CMT) Multicore Processor and Memory Technology 2
Performance Enhancements 2
Remote Manageability With ALOM 3
System Reliability, Availability, and Serviceability 4
Environmental Monitoring 5
Error Correction and Parity Checking 5
Predictive Self-Healing 6
Chassis Identification 6
Additional Service Related Information 7
2. Sun Fire T1000 Server Diagnostics 9
Overview of Sun Fire T1000 Server Diagnostics 9
Using LEDs to Identify the State of Devices 14
Front and Rear Panel LEDs 16
Power Supply LEDs 17
Using ALOM For Diagnosis and Repair Verification 17
Running ALOM Service-Related Commands 19
Connecting to ALOM 19
Switching Between the System Console and ALOM 20
Service-Related ALOM Commands 20
▼ To Run the showfaults Command 21
▼ To Run the showenvironment Command 22
▼ To Run the showfru Command 24
Running POST 27
Controlling How POST Runs 27
▼ To Change POST Parameters 30
Reasons to Run POST 31
Routine Sanity Check of the Hardware 31
Diagnosing the System Hardware 31
▼ To Run POST 31
Using the Solaris Predictive Self-Healing Feature 35
▼ To Use the fmdump Command to Identify Faults 37
Collecting Information From Solaris OS Files and Commands 39
▼ To Add or Replace the Optional PCI Express Card 60
▼ To Remove the Fan Tray Assembly 60
▼ To Replace the Fan Tray Assembly 61
▼ To Remove the Power Supply 61
▼ To Replace the Power Supply 62
▼ To Remove the Hard Drive 63
▼ To Replace the Hard Drive 64
▼ To Remove DIMMs 65
▼ To Add or Replace DIMMs 66
▼ To Remove the Motherboard and Chassis 68
▼ To Replace the Motherboard and Chassis Assembly 69
▼ To Remove the Clock Battery on the Motherboard 70
▼ To Replace the Clock Battery on the Motherboard 71
Common Procedures for Finishing Up 72
▼ To Replace the Top Cover 72
▼ To Reinstall the Server Chassis in the Rack 73
▼ To Apply Power to the Server 73
A. Field-Replaceable Units (FRUs) 75
Sun Fire T1000 Server Service Manual • January 2006
Preface
The Sun Fire T1000 Server Service Manual provides information to help you troubleshoot
problems with, and replace components in, the Sun Fire™ T1000 server.
This manual is written for technicians, service personnel, and system administrators
who service and repair computer systems. The person qualified to use this manual:
■ Can open a system chassis and identify and replace internal components.
■ Understands the Solaris Operating System and the command-line interface.
■ Has superuser privileges for the system being serviced.
This guide is organized into the following chapters:
Chapter 1 describes the main features of the Sun Fire T1000 server.
Chapter 2 describes the diagnostics that are available for monitoring and
troubleshooting the Sun Fire T1000 server.
Chapter 3 describes how to remove and replace the FRUs.
Appendix A lists the customer-replaceable components in the Sun Fire T1000 server.
Using UNIX Commands
This document might not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices.
See one or more of the following for this information:
■ Solaris Handbook for Sun Peripherals
■ AnswerBook2™ online documentation for the Solaris™ operating environment
■ Other software documentation that you received with your system
Typographic Conventions
Typeface* | Meaning | Examples
AaBbCc123 | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123 | What you type, when contrasted with on-screen computer output | % su Password:
AaBbCc123 | Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values. | Read Chapter 6 in the User's Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.
* The settings on your browser might differ from these settings.
Shell Prompts
Shell | Prompt
C shell | machine-name%
C shell superuser | machine-name#
Bourne shell and Korn shell | $
Bourne shell and Korn shell superuser | #
Sun Fire T1000 Server Documentation
You can view and print the following documents from the Sun documentation web
site at http://www.sun.com/documentation
Title | Description | Part Number
Sun Fire T1000 Server Site Planning Data Guide | Site planning information for the Sun Fire T1000 server | 819-3246
Sun Fire T1000 Server Product Notes | Late-breaking information about the server. The latest notes are posted at: http://www.sun.com/documentation | 819-3244
Sun Fire T1000 Server Product Overview | Provides an overview of the features of this server | 819-3247
Sun Fire T1000 Server Getting Started Guide | Information about where to find documentation to get your system installed and running quickly | 819-3249
Sun Fire T1000 Server Installation Guide | Detailed rack mounting, cabling, power-on, and configuration information | 819-3248
Sun Fire T1000 Server System Administration Guide | How to perform administrative tasks that are specific to the Sun Fire T1000 server | 819-3250
Advanced Lights Out Management (ALOM) CMT v1.1 Guide | How to use the Advanced Lights Out Manager (ALOM) software on the Sun Fire T1000 server | 819-3246
Accessing Sun Documentation
You can view, print, or purchase a broad selection of Sun™ documentation,
including localized versions, at:
http://www.sun.com/documentation
Third-Party Web Sites
Sun is not responsible for the availability of third-party web sites mentioned in this
document. Sun does not endorse and is not responsible or liable for any content,
advertising, products, or other materials that are available on or through such sites
or resources. Sun will not be responsible or liable for any actual or alleged damage
or loss caused by or in connection with the use of or reliance on any such content,
goods, or services that are available on or through such sites or resources.
Contacting Sun Technical Support
If you have technical questions about this product that are not answered in this
document, go to:
http://www.sun.com/service/contacting
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:
Sun Fire T1000 Server Service Manual, part number 819-3248-10
CHAPTER
1
Sun Fire T1000 Server Overview
This chapter provides an overview of the features of the Sun Fire T1000 server.
The following topics are covered:
■ “Sun Fire T1000 Server Features” on page 1
■ “Chassis Identification” on page 6
Sun Fire T1000 Server Features
The Sun Fire T1000 server (FIGURE 1-1) is a high-performance, entry-level server that is
highly scalable and very reliable.
FIGURE 1-1 Sun Fire T1000 Server
Chip-Multithreaded (CMT) Multicore Processor and Memory Technology
The UltraSPARC® T1 multicore processor is the basis of the Sun Fire T1000 server.
The UltraSPARC T1 processor is based on chip multithreading (CMT) technology
that is optimized for highly threaded transactional processing. The UltraSPARC T1
processor improves throughput while using less power and dissipating less heat
than conventional processor designs.
Depending on the model purchased, the processor has six or eight UltraSPARC
cores. Each core equates to a 64-bit execution pipeline capable of running four
threads. The result is that the 8-core processor handles up to 32 active threads
concurrently.
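As a quick check of the arithmetic above: each core runs four threads, so the number of concurrently active threads is the core count multiplied by four. The sketch below computes this for both processor options; the function name is hypothetical and only illustrates the calculation.

```python
THREADS_PER_CORE = 4  # from the text: each core runs four threads


def max_active_threads(cores: int) -> int:
    """Return the number of hardware threads for an UltraSPARC T1 with `cores` cores.

    The Sun Fire T1000 ships with a 6-core or 8-core processor, so other
    values are rejected rather than silently computed.
    """
    if cores not in (6, 8):
        raise ValueError("the Sun Fire T1000 ships with a 6-core or 8-core processor")
    return cores * THREADS_PER_CORE


for cores in (6, 8):
    print(f"{cores}-core processor: up to {max_active_threads(cores)} active threads")
```

The 6-core model therefore handles up to 24 active threads, and the 8-core model up to 32.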
Additional processor components, such as the DDR2 memory controllers, L1 cache, L2
cache, and the JBus I/O interface, have been carefully tuned for optimal
performance.
FIGURE 1-2 shows the major components in the Sun Fire T1000 server.
FIGURE 1-2 Sun Fire T1000 Server Components (callouts: DIMMs, fan tray assembly, hard disk drive, PCI-E socket and slot, motherboard and chassis assembly, UltraSPARC T1 multicore processor, power supply)
Performance Enhancements
The Sun Fire T1000 server introduces several new technologies with its sun4v
architecture and its multicore, multithreaded UltraSPARC T1 processor.
TABLE 1-1 lists feature specifications for the Sun Fire T1000 server.
TABLE 1-1 Sun Fire T1000 System Features
Feature | Description
Processor | 1 UltraSPARC T1 multicore processor (6 or 8 cores)
Memory | 8 slots that can be populated with one of the following types of
Ethernet ports | • A green Link indicator, lit when a link is established at any speed • A yellow Activity indicator, which blinks during packet transfers
DB-9 serial port | 1 DB-9 serial port
Internal hard disk drive | 1 SATA disk drive, 3.5-inch form factor. Support for hardware-embedded RAID 1 (mirroring)
Cooling | 4 fans in a single assembly
PCI interface | 1 PCI-Express (PCI-E) slot for low-profile cards (supports 1x, 4x, and 8x width cards)
Power | 1 power supply (PS)
Remote management | ALOM system controller (integrated on motherboard) with a serial and 10/100 Mbit Ethernet port
Firmware | OpenBoot™ PROM for reset and POST support. ALOM-CMT for remote management administration
Operating system | Solaris 10 1/06 or later Operating System preinstalled on the hard disk drive
Other software | Java™ Enterprise System with a 90-day trial license
For additional information on the Sun Fire T1000 server features, refer to the Sun Fire T1000 Server Product Overview.
Remote Manageability With ALOM
The Sun Advanced Lights Out Manager (ALOM) feature is a system controller (SC)
that enables you to remotely manage and administer the Sun Fire T1000 server.
The ALOM-CMT software is preinstalled as firmware, and therefore, ALOM
initializes as soon as you apply power to the system. You can customize ALOM to
work with your particular installation.
ALOM enables you to monitor and control your server over a network, or by using
a dedicated serial port for connection to a terminal or terminal server. ALOM
provides a command-line interface that you can use to remotely administer
geographically distributed or physically inaccessible machines. In addition, ALOM
enables you to run diagnostics (such as POST) remotely that would otherwise
require physical proximity to the server’s serial port.
You can configure ALOM to send email alerts of hardware failures, hardware
warnings, and other events related to the server or to ALOM. The ALOM circuitry
runs independently of the server, using the server’s standby power. Therefore,
ALOM firmware and software continue to function when the server operating
system goes offline or when the server is powered off. ALOM monitors the
following Sun Fire T1000 server components:
■ Hard disk drive status
■ Enclosure thermal conditions
■ Power supply status
■ Voltage levels
■ Faults detected by POST (Power-On Self-Test)
■ Solaris OS Predictive Self Healing (PSH) diagnostic facilities
For information about configuring and using the ALOM system controller, refer to
the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM) Guide.
System Reliability, Availability, and Serviceability
Reliability, availability, and serviceability (RAS) are aspects of a system’s design that
affect its ability to operate continuously and to minimize the time necessary to
service the system. Reliability refers to a system’s ability to operate continuously
without failures and to maintain data integrity. System availability refers to the
ability of a system to recover to an operational state after a failure, with minimal
impact. Serviceability relates to the time it takes to restore a system to service
following a system failure. Together, reliability, availability, and serviceability
features provide for near continuous system operation.
To deliver high levels of reliability, availability, and serviceability, the Sun Fire T1000
server offers the following features:
■ Environmental monitoring
■ Error detection and correction for improved data integrity
■ Easy access for most component replacements
■ Extensive POST tests that automatically delete faulty components from the
configuration.
■ PSH automated run-time diagnosis capability that takes faulty components
offline.
For more information about using RAS features, refer to the Sun Fire T1000 Server System Administration Guide.
Environmental Monitoring
The Sun Fire T1000 server features an environmental monitoring subsystem
designed to protect the server and its components against:
■ Extreme temperatures
■ Lack of adequate airflow through the system
■ Power supply failure
■ Hardware faults
Temperature sensors throughout the system monitor the ambient temperature of the
system and internal components. The software and hardware ensure that the
temperatures within the enclosure do not exceed predetermined safe operating
ranges. If the temperature observed by a sensor falls below a low-temperature
threshold or rises above a high-temperature threshold, the monitoring subsystem
software lights the amber Service required LEDs on the front and back panels. If the
temperature condition persists and reaches a critical threshold, the system initiates a
graceful system shutdown.
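The threshold behavior described above can be modeled as a small classification function. This is a minimal sketch, not the server's monitoring firmware: the threshold values are hypothetical placeholders (this manual does not publish them), only a single critical threshold on the high side is assumed, and the "condition persists" aspect is ignored because the sketch looks at one reading at a time.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    NORMAL = "within safe operating range"
    SERVICE_REQUIRED = "light amber Service required LEDs"
    GRACEFUL_SHUTDOWN = "initiate graceful system shutdown"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits in degrees Celsius; the real values are not given here.
    low: float = 5.0
    high: float = 40.0
    critical: float = 50.0


def classify(temp_c: float, limits: Thresholds = Thresholds()) -> Response:
    """Map one sensor reading to the monitoring subsystem's response.

    Simplification: the real subsystem acts on conditions that persist over
    time; this sketch evaluates a single reading only.
    """
    if temp_c >= limits.critical:
        return Response.GRACEFUL_SHUTDOWN
    if temp_c < limits.low or temp_c > limits.high:
        return Response.SERVICE_REQUIRED
    return Response.NORMAL


for reading in (25.0, 2.0, 45.0, 55.0):
    print(reading, classify(reading).value)
```

Checking the critical threshold first matters: a reading above the critical limit is also above the high limit, and the more severe response must win.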
All error and warning messages are sent to the ALOM system controller system
console and logged in the ALOM log file. Some FRUs, such as the power supply,
also provide LEDs that indicate a failure within the FRU.
Error Correction and Parity Checking
The UltraSPARC T1 multicore processor provides parity protection on its internal cache
memories, including tag parity and data parity on the D-cache and I-cache. The
internal 3-Mbyte L2 cache has parity protection on the tags and ECC protection of the
data.
Advanced ECC, also called Chipkill, detects up to 4 bits in error.
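Parity protection, as used on the caches described above, detects any single-bit error but cannot correct it, and it misses errors that flip an even number of bits; ECC adds more check bits so that errors can also be corrected. The sketch below shows generic even parity on an 8-bit word to illustrate that limitation. It demonstrates the general technique only and does not model the UltraSPARC T1 cache circuitry.

```python
def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1 bits (word + parity) even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the stored bit."""
    return even_parity_bit(word) == stored_parity


data = 0b1011_0010            # four 1 bits, so the stored parity bit is 0
p = even_parity_bit(data)
single = data ^ 0b0000_1000   # one bit flipped: detected
double = data ^ 0b0000_1100   # two bits flipped: parity still matches, undetected
print(parity_ok(data, p), parity_ok(single, p), parity_ok(double, p))  # True False True
```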
Predictive Self-Healing
The Sun Fire T1000 server features the latest fault management technologies. With
the Solaris 10 Operating System (OS), Sun is introducing a new architecture for
building and deploying systems and services capable of predictive self-healing. Self-healing technology enables Sun systems to accurately predict component failures
and mitigate many serious problems before they actually occur. This technology is
incorporated into both the hardware and software of the Sun Fire T1000 server.
At the heart of the predictive self-healing capabilities is the Solaris Fault Manager, a
new service that receives data relating to hardware and software errors and
automatically and silently diagnoses the underlying problem. Once a problem is
diagnosed, a set of agents automatically responds by logging the event and, if
necessary, taking the faulty component offline. Because problems are diagnosed
automatically, business-critical applications and essential system services can continue
uninterrupted in the event of software failures or major hardware component
failures.
Chassis Identification
FIGURE 1-3 and FIGURE 1-4 show the physical characteristics of the Sun Fire T1000
server.
FIGURE 1-3 Sun Fire T1000 Server Front Panel (callouts: Locator LED/button, Service required LED, Power OK LED and Power On/Off button)
FIGURE 1-4 Sun Fire T1000 Server Rear Panel (callouts: Service required LED, Locator LED/button, Power OK LED, Ethernet ports, PCI-E slot, power supply LEDs)
Additional Service Related Information
In addition to this document, the following resources are available to help you keep
your server running optimally:
■ Product Notes – The Sun Fire T1000 Server Product Notes (819-3244) contain late-breaking
information about the system, including required software patches,
updated hardware and compatibility information, and solutions to known issues.
The product notes are available online at:
http://www.sun.com/documentation
■ Release Notes – The Solaris OS Release Notes contain important information
about the Solaris operating system. The release notes are available online at:
http://www.sun.com/documentation
■ SunSolve™ Online – Provides a collection of support resources. Depending on
the level of your service contract, you have access to Sun patches, the Sun System
Handbook, the SunSolve knowledge base, the Sun Support Forum, and additional
documents, bulletins, and related links. Access this site at:
http://sunsolve.sun.com
■ Predictive Self-Healing Knowledge Database – You can access the knowledge
article corresponding to a self-healing message by taking the Sun Message
Identifier (SUNW-MSG-ID) and entering it into the field on this page:
http://www.sun.com/msg
CHAPTER
2
Sun Fire T1000 Server Diagnostics
This chapter describes the diagnostics that are available for monitoring and
troubleshooting the Sun Fire T1000 server. This chapter does not provide detailed
troubleshooting procedures, but instead describes the Sun Fire T1000 server
diagnostic facilities and how to use them.
This chapter is intended for technicians, service personnel, and system
administrators who service and repair computer systems.
The following topics are covered:
■ “Overview of Sun Fire T1000 Server Diagnostics” on page 9
■ “Using LEDs to Identify the State of Devices” on page 14
■ “Using ALOM For Diagnosis and Repair Verification” on page 17
■ “Running POST” on page 27
■ “Using the Solaris Predictive Self-Healing Feature” on page 35
■ “Collecting Information From Solaris OS Files and Commands” on page 39
■ “Managing System Components with Automatic System Recovery Commands”
on page 40
■ “Exercising the System with SunVTS” on page 43
Overview of Sun Fire T1000 Server Diagnostics
There are a variety of diagnostic tools, commands, and indicators you can use to
troubleshoot a Sun Fire T1000 server.
■ LEDs – provide a quick visual notification of the status of the server and of some
of the FRUs.
■ ALOM-CMT firmware – The system firmware that runs on the system
controller. In addition to providing the interface between the hardware and OS,
ALOM also tracks and reports the health of key server components. ALOM works
closely with POST and Solaris predictive self-healing technology to keep the
system up and running even when there is a faulty component.
■ Power-on self-test (POST) – Performs diagnostics on system components upon
system reset to ensure the integrity of those components. POST is configurable
and works with ALOM to take faulty components offline if needed and blacklist
them in the asr-db.
■ Solaris OS predictive self-healing (PSH) – Continuously monitors the health of
the CPU and memory, and works with ALOM to take a faulty component offline
if needed.
■ Log files and console messages – Provide the standard Solaris OS log files and
investigative commands that can be accessed and displayed on the device of your
choice.
■ SunVTS™ – is an application you can run that exercises the system, provides
hardware validation, and discloses possible faulty components with
recommendations for repair.
The LEDs, ALOM, Solaris OS PSH, and many of the log files and console messages
are integrated. For example, a fault detected by the Solaris PSH software will display
the fault, log it, pass information to ALOM where it is logged, and depending on the
fault, might result in the illumination of one or more LEDs.
The diagnostic flowchart in FIGURE 2-1 and TABLE 2-1 describe an approach for using
the server diagnostics that is likely to identify a faulty field-replaceable unit (FRU).
The diagnostics you use, and the order in which you use them, depend on the nature
of the problem you are troubleshooting, so you might not follow this flow step-by-step.
The flowchart assumes that you have already performed some rudimentary
troubleshooting, such as verifying proper installation, visually inspecting cables
and power, and possibly resetting the server. (For details, refer to the Sun Fire T1000 Server
Installation Guide and the Sun Fire T1000 Server System Administration Guide.)
Use this flowchart to understand what diagnostics are available to troubleshoot
faulty hardware, and use TABLE 2-1 to find more information about each diagnostic
in this chapter.
For many faults, service can be deferred, either because the faulty component has
been taken out of the configuration by automatic system recovery (ASR), the fault
is being corrected, or the fault is predictive.
FIGURE 2-1 Diagnostic Flow Chart (numbers in the flowchart correspond to the Action numbers in TABLE 2-1)
TABLE 2-1 Diagnostic Flow Chart Actions
Action No. | Diagnostic Action | Resulting Action | For More Information
1. | Check the power supply Fault LED. | The amber Fault LED indicates that the power cord is unplugged or the power supply is faulty. • If the Fault LED is lit, go to Action 2. |
2. | Check the power cord. | Connect the power cord. • If the Fault LED is still lit, replace the faulty power supply. • If the green LEDs are lit, go to Action 3. | "To Remove the Power Supply" on page 61 and "To Replace the Power Supply" on page 62
3. | Run the ALOM showfaults command. | The showfaults command displays faults detected by the system firmware. • If faults are displayed, go to Action 4. • If no faults are displayed, go to Action 7. | "To Run the showfaults Command" on page 21
4. | Check the fault message for a Sun Message ID. | Sun Message IDs (SUNW-MSG-ID) indicate that information is available from Sun's knowledge article database. • If you have a message ID number, go to Action 5. • If you do not have a message ID number, go to Action 10. | "Using the Solaris Predictive Self-Healing Feature" on page 35
5. | Enter the Sun Message ID into the Sun Knowledge Article web site. | Enter the Sun Message ID number into the knowledge article web site at http://www.sun.com/msg and go to Action 6. |
6. | Analyze the suggested actions. | In some cases, fault-related messages are identified with suggested actions. • If the suggested action recommends replacing a FRU, go to Action 10. • If the suggested action does not recommend replacing a FRU, perform the suggested action. Contact Sun for additional support, if needed. |
7. | Run the ALOM showenvironment command. | The showenvironment command reports overtemperature conditions when the ambient room temperature exceeds the upper limit. |
10. | Identify and replace the faulty FRU. | The FRUs require that you shut down the server to perform a cold-swap. After replacing the faulty FRU, go to Action 14. | "Removing and Replacing FRUs" on page 51
14. | Verify the repair. | Various commands and utilities can be used to verify the functionality of the system components. Two useful commands are: • The ALOM showfaults command • The ASR showcomponents command. If the FRU is blacklisted, you can manually remove it from the blacklist with the enablecomponent command. If the fault is cleared and the component is not blacklisted, the repair is verified well enough to boot the server. For added assurance, you can run the SunVTS diagnostic software. | "To Run the showfaults Command" on page 21; "Managing System Components with Automatic System Recovery Commands" on page 40; "Exercising the System with SunVTS" on page 43
15. | Contact Sun for support. | The majority of hardware faults are detected by the server's diagnostics. In rare cases, a problem might require additional troubleshooting. If you are unable to determine the cause of the problem, contact Sun for support. | Sun Support information: http://www.sun.com/service/contacting
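Read as a decision procedure, the diagnostic flowchart can be sketched in code. This is one interpretation of FIGURE 2-1, not a Sun tool: the `Observations` fields and the `next_step` function are hypothetical names, and the order of the checks after an empty showfaults result (showenvironment, then Solaris logs, POST, and SunVTS) is inferred from the flowchart's step numbers.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """Hypothetical record of what a technician has observed so far."""
    ps_fault_led_lit: bool = False
    showfaults_reports_fault: bool = False
    fault_has_msg_id: bool = False
    article_recommends_fru: bool = False
    overtemp_reported: bool = False
    solaris_logs_show_faulty_fru: bool = False
    post_reports_fault: bool = False
    sunvts_reports_fault: bool = False


REPLACE_FRU = "Identify and replace the faulty FRU, then verify the repair."
CORRECTIVE = "Perform the recommended corrective actions; contact Sun if needed."


def next_step(obs: Observations) -> str:
    # A lit power supply Fault LED means an unplugged cord or a faulty supply.
    if obs.ps_fault_led_lit:
        return "Connect the power cord or replace the faulty power supply."
    # Faults reported by the ALOM showfaults command.
    if obs.showfaults_reports_fault:
        if not obs.fault_has_msg_id:
            return REPLACE_FRU
        # With a SUNW-MSG-ID, the knowledge article decides the next step.
        return REPLACE_FRU if obs.article_recommends_fru else CORRECTIVE
    # No reported faults: environment first, then logs, POST, and SunVTS.
    if obs.overtemp_reported:
        return "Find the cause of the overtemperature condition."
    if (obs.solaris_logs_show_faulty_fru or obs.post_reports_fault
            or obs.sunvts_reports_fault):
        return REPLACE_FRU
    return CORRECTIVE


print(next_step(Observations(showfaults_reports_fault=True, fault_has_msg_id=True)))
```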
Using LEDs to Identify the State of Devices
The Sun Fire T1000 server provides the following groups of LEDs:
■ Front and rear panel LEDs (FIGURE 2-2, FIGURE 2-3, and TABLE 2-2)
■ Power supply LEDs (FIGURE 2-3 and TABLE 2-3)
These LEDs provide a quick visual check of the state of the system.
FIGURE 2-2 Sun Fire T1000 Server Front Panel (callouts: Power OK LED/Power On/Off button, Service required LED, Locator LED)
FIGURE 2-3 Sun Fire T1000 Server Rear Panel LEDs (callouts: power supply Fault, DC OK, and AC OK LEDs; Power OK LED; Locator LED; Service required LED; Ethernet port Link and Activity LEDs; system console port Link and Activity LEDs; DB9 serial port)
Front and Rear Panel LEDs
Two LEDs and one LED/button are located in the upper left corner of the front
panel (TABLE 2-2). The LEDs are also provided on the rear panel.
TABLE 2-2 Front and Rear Panel LEDs
LED | Color | Description
Locator LED and button* | White | Enables you to identify a particular server. The LED is controlled using one of the following methods: • Issuing the setlocator on or setlocator off command. • Pressing the button to toggle the indicator on or off. This LED provides the following indications: • Off – Normal operating state. • Fast blink – The server received a signal as a result of one of the preceding methods and is indicating "here I am," that is, that it is operational.
Service required LED* | Yellow | If on, indicates that service is required. The ALOM showfaults command will indicate any faults causing this indicator to light.
Power OK LED* and Power On/Off button | Green | The LED provides the following indications: • Off – The system is unavailable. Either it has no power or ALOM is not running. • Steady on – Indicates that the system is powered on and is running in its normal operating state. No service actions are required. • Standby blink – Indicates the system is running at a minimum level in standby and is ready to be quickly returned to full function. The service processor is running. • Slow blink – Indicates that a normal transitory activity is taking place. This could indicate that the system diagnostics are running, or that the system is booting. The Power On/Off button turns the server on and off. There is no Power On/Off button on the rear panel.
Ethernet Activity LEDs | Green | These LEDs indicate that there is activity on the associated net(s).
Ethernet Link LEDs | Yellow | These LEDs indicate that the system is linked to the associated net(s).
System console Activity LED | Green | This LED indicates that there is activity on the associated system console.
System console Link LED | Yellow | This LED indicates that the system is linked to the associated system console.
* Provided on the front and rear panel.
Power Supply LEDs
The power supply LEDs (TABLE 2-3) are located on the back of the power supply.
TABLE 2-3 Power Supply LEDs

Fault – Amber
On – Power supply has detected a failure.
Off – Normal operation.

DC OK – Green
On – Normal operation. DC output voltage is within normal limits.
Off – Power is off.

AC OK – Green
On – Normal operation. Input power is within normal limits.
Off – No input voltage, or input voltage is below limits.
Using ALOM For Diagnosis and Repair Verification
The Sun Advanced Lights Out Manager (ALOM) is a system controller on the Sun
Fire T1000 server motherboard that enables you to remotely manage and administer
your server.
ALOM enables you to remotely run diagnostics, such as power-on self test (POST), that would otherwise require physical proximity to the server’s serial port. You can also configure ALOM to send email alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM.
The ALOM circuitry runs independently of the server, using the server’s standby power. Therefore, ALOM firmware and software continue to function when the server operating system goes offline or when the server is powered off.
Note – For comprehensive ALOM information, refer to the Sun Fire T1000 Server
Advanced Lights Out Manager (ALOM) guide.
Faults detected by ALOM, POST, and the Solaris Predictive Self-Healing (PSH) technology are forwarded to ALOM for fault handling (FIGURE 2-4).
In the event of a system fault, ALOM ensures that the Service required LED is lit,
FRU ID PROMs are updated, the fault is logged, and alerts are displayed.
FIGURE 2-4 ALOM Fault Management
ALOM sends alerts to all logged-in ALOM users, sends the alert through email to a configured email address, and writes the event to the ALOM event log.
■ Fault recovery – The system automatically detects that the fault condition is no longer present. ALOM extinguishes the Service required LED and updates the FRU’s PROM, indicating that the fault is no longer present.
■ Fault repair – The fault has been repaired by human intervention. In most cases, ALOM detects the repair and extinguishes the Service required LED. If ALOM does not perform these actions, you must perform them manually with the clearfault or enablecomponent command.
ALOM can detect the removal of a FRU, in many cases even if the FRU is removed
while ALOM is powered off. This enables ALOM to know that a fault, diagnosed to
a specific FRU, has been repaired. The ALOM clearfault command enables you to
manually clear certain types of faults without a FRU replacement or if ALOM was
unable to automatically detect the FRU replacement. ALOM does not automatically
detect hard drive replacement.
Persistent environmental faults can automatically recover. A temperature that exceeds a threshold may return to normal limits, an unplugged power supply can be plugged back in, and so on. Recovery of environmental faults is automatically detected. Recovery events are reported using one of two forms:
fru at location is OK.
sensor at location is within normal range.
Environmental faults can be repaired through removal and replacement of the faulty
FRU. FRU removal is automatically detected by the environmental monitoring and
all faults associated with the removed FRU are cleared. The message for that case, which is also the alert sent for all FRU removals, is:
fru at location has been removed.
There is no ALOM command to manually repair an environmental fault.
ALOM does not handle hard drive faults. Use the Solaris message files to view hard
drive faults. See “Collecting Information From Solaris OS Files and Commands” on
page 39.
Running ALOM Service-Related Commands
This section describes the ALOM commands that are commonly used for service-related activities.
Connecting to ALOM
Before you can run ALOM commands, you must connect to the ALOM. There are
several ways to connect to the system controller:
■ Connect an ASCII terminal directly to the serial management port.
■ Use the telnet command to connect to ALOM through an Ethernet connection
on the network management port.
■ Connect an external modem to the network management port and dial in to the modem.
Note – Refer to the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM)
Guide for instructions on configuring and connecting to ALOM.
Switching Between the System Console and ALOM
■ To switch from the console output to the ALOM sc> prompt, type #. (Pound
Period).
■ To switch from the sc> prompt to the console, type console.
Service-Related ALOM Commands
TABLE 2-4 describes the typical ALOM commands for servicing a Sun Fire T1000
server. For descriptions of all ALOM commands, issue the help command or refer
to the Sun Fire T1000 Server Advanced Lights Out Management (ALOM) Guide.
TABLE 2-4 Service-Related ALOM Commands

help [command]
Displays a list of all ALOM commands with syntax and descriptions. Specifying a command name as an option displays help for that command.

clearfault UUID
Manually clears system faults. UUID is the unique fault ID of the fault to be cleared.

powercycle [-f]
Performs a poweroff followed by poweron. The -f option forces an immediate poweroff; otherwise, the command attempts a graceful shutdown.

poweroff [-y] [-f]
Removes the main power from the host server. The -y option enables you to skip the confirmation question. The -f option forces an immediate shutdown. CAUTION: Using the -y option to skip the confirmation question could enable you to inadvertently shut down the system.

poweron [-y] [-c] [FRU]
Applies the main power to the host server or FRU. The -y option enables you to skip the confirmation question. The -c option instructs ALOM to connect to the system console after performing the operation.

removefru [-y] [FRU]
Prepares a FRU for removal, and illuminates the host system’s OK to Remove LED.

reset [-y] [-c]
Generates a hardware reset on the host server. The -y option enables you to skip the confirmation question. The -c option instructs ALOM to connect to the system console after performing the operation.

resetsc [-y]
Reboots the ALOM system controller. The -y option enables you to skip the confirmation question.

setkeyswitch [normal | stby | diag | locked]
Sets the virtual keyswitch.

setlocator [on | off]
Turns the Locator LED on the server on or off.

showenvironment
Displays the environmental status of the host server. This information includes system temperatures, power supply, front panel LED, hard drive, fan, voltage, and current sensor status. See “To Run the showenvironment Command” on page 22.

showfaults [-v]
Displays current system faults. See “To Run the showfaults Command” on page 21.

showfru [-g lines] [-s | -d] [FRU]
Displays information about the FRUs in the server.
• The -g lines option specifies the number of lines to display before pausing the output to the screen.
• The -s option displays static information about system FRUs (defaults to all FRUs, unless one is specified).
• The -d option displays dynamic information about system FRUs (defaults to all FRUs, unless one is specified).
See “To Run the showfru Command” on page 24.

showkeyswitch
Displays the status of the virtual keyswitch.

showlocator
Displays the current state of the Locator LED as either on or off.

showlogs [-b lines | -e lines] [-g lines] [-v]
Displays the history of all events logged in the ALOM event buffer.

showplatform [-v]
Displays information about the host system’s hardware configuration, and whether the hardware is providing service.
Note – For the ALOM ASR commands, see TABLE 2-7.
▼ To Run the showfaults Command
The showfaults command displays faults handled by ALOM. Use the
showfaults command for the following reasons:
■ To see if any faults have been passed to, or detected by, ALOM.
■ To obtain the fault message ID (SUNW-MSG-ID).
■ To verify that the replacement of a FRU has cleared the fault and not generated
any additional faults.
● At the sc> prompt, type the showfaults command.
sc> showfaults -v
Last POST run: WED OCT 20 19:32:24 2004
POST status: Passed all devices
ID Time FRU Fault
1 OCT 21 14:32:48 MB/CMP0/CH0/R1/D0 Host detected fault, MSGID:
SUN4U-8000-2S UUID: a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc
In this example, showfaults is reporting a memory error at DIMM location MB/CMP0/CH0/R1/D0 (J0701).
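You may want to pull these fields out of captured showfaults output, for example in a site script that opens service tickets. The following is a minimal Python sketch, assuming only the layout of the sample output above. The function name parse_fault is hypothetical. Reading the location path as processor (CMP), memory channel (CH), rank (R), and DIMM (D) is an assumption inferred from the examples in this chapter, not an ALOM interface.

```python
import re

# Hypothetical parser for one "showfaults -v" entry. The patterns below are
# inferred from the sample output in this manual, not from an ALOM specification.
DIMM_PATH_RE = re.compile(
    r"MB/CMP(?P<cmp>\d+)/CH(?P<channel>\d+)/R(?P<rank>\d+)/D(?P<dimm>\d+)"
)
MSGID_RE = re.compile(r"MSGID:\s*(\S+)")  # the ID may wrap onto the next line
UUID_RE = re.compile(r"UUID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})")


def parse_fault(entry: str) -> dict:
    """Return the FRU path, its parts, the message ID, and the UUID if present."""
    fault = {}
    location = DIMM_PATH_RE.search(entry)
    if location:
        fault["fru"] = location.group(0)
        # Assumed meaning: processor, memory channel, rank, DIMM within rank.
        fault.update({name: int(value) for name, value in location.groupdict().items()})
    msgid = MSGID_RE.search(entry)
    if msgid:
        fault["msgid"] = msgid.group(1)
    uuid_match = UUID_RE.search(entry)
    if uuid_match:
        fault["uuid"] = uuid_match.group(1)
    return fault


if __name__ == "__main__":
    sample = (
        "1 OCT 21 14:32:48 MB/CMP0/CH0/R1/D0 Host detected fault, MSGID:\n"
        "SUN4U-8000-2S UUID: a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc"
    )
    print(parse_fault(sample))
```

The message ID and UUID obtained this way are the values used later with the Sun message lookup and with the clearfault command.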
▼ To Run the showenvironment Command
The showenvironment command displays a snapshot of the server’s
environmental status. The information this command can display includes system
temperatures, hard drive status, power supply and fan status, and voltage and
current sensors.
Note – You do not need user permissions to use this command.
● At the sc> prompt, type the showenvironment command.
sc> showenvironment
=============== Environmental Status ===============
-------------------------------------------------------------------------------
System Temperatures (Temperatures in Celsius):
-------------------------------------------------------------------------------
Sensor Status Temp LowHard LowSoft LowWarn HighWarn HighSoft HighHard
-------------------------------------------------------------------------------
MB/T_AMB OK 28 -10 -5 0 45 50 55
MB/CMP0/T_TCORE OK 50 -10 -5 0 85 90 95
MB/CMP0/T_BCORE OK 51 -10 -5 0 85 90 95
MB/IOB/T_CORE OK 49 -10 -5 0 95 100 105

---------------------------------------------------------
Fans (Speeds Revolution Per Minute):
---------------------------------------------------------
Sensor Status Speed Warn Low
---------------------------------------------------------
FT0/F0 OK 6762 2240 1920
FT0/F1 OK 6762 2240 1920
FT0/F2 OK 6762 2240 1920
FT0/F3 OK 6653 2240 1920

-------------------------------------------------------------------------------
Voltage sensors (in Volts):
-------------------------------------------------------------------------------
Sensor Status Voltage LowSoft LowWarn HighWarn HighSoft
-------------------------------------------------------------------------------
MB/V_VCORE OK 1.30 1.20 1.24 1.36 1.39
MB/V_VMEM OK 1.79 1.69 1.72 1.87 1.90
MB/V_VTT OK 0.89 0.84 0.86 0.93 0.95
MB/V_+1V2 OK 1.18 1.09 1.11 1.28 1.30
MB/V_+1V5 OK 1.49 1.36 1.39 1.60 1.63
MB/V_+2V5 OK 2.51 2.27 2.32 2.67 2.72
MB/V_+3V3 OK 3.29 3.06 3.10 3.49 3.53
MB/V_+5V OK 5.02 4.55 4.65 5.35 5.45
MB/V_+12V OK 12.25 10.92 11.16 12.84 13.08
MB/V_+3V3STBY OK 3.33 3.13 3.16 3.53 3.59

----------------------------------------------------------
System Load (in amps):
----------------------------------------------------------
Sensor Status Load Warn Shutdown
----------------------------------------------------------
MB/I_VCORE OK 20.560 80.000 88.000
MB/I_VMEM OK 8.160 60.000 66.000

-----------------------------------------------------------------------------
Supply Status Underspeed Overtemp Overvolt Undervolt Overcurrent
-----------------------------------------------------------------------------
PS0 OK OFF OFF OFF OFF OFF
sc>
Note – Some information might not be available when the server is in standby
mode.
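In the voltage table above, each sensor row lists a reading followed by four limits. The following minimal Python sketch shows one way to check those columns programmatically. It rests on two assumptions that this manual does not state. First, the limits nest as LowSoft < LowWarn < HighWarn < HighSoft, as they do in every row of the sample. Second, a reading between LowWarn and HighWarn is nominal. The class and function names are hypothetical.

```python
from typing import List, NamedTuple


class VoltageReading(NamedTuple):
    sensor: str
    status: str
    volts: float
    low_soft: float
    low_warn: float
    high_warn: float
    high_soft: float


def parse_voltage_rows(lines: List[str]) -> List[VoltageReading]:
    """Parse rows such as 'MB/V_VCORE OK 1.30 1.20 1.24 1.36 1.39'."""
    readings = []
    for line in lines:
        fields = line.split()
        # Voltage rows have seven fields and sensor names starting with MB/V_.
        if len(fields) != 7 or not fields[0].startswith("MB/V_"):
            continue
        sensor, status, *numbers = fields
        readings.append(VoltageReading(sensor, status, *map(float, numbers)))
    return readings


def classify(reading: VoltageReading) -> str:
    """Assumed interpretation of the threshold columns (not documented by ALOM)."""
    if reading.low_warn <= reading.volts <= reading.high_warn:
        return "nominal"
    if reading.low_soft <= reading.volts <= reading.high_soft:
        return "warning"
    return "outside soft limits"


if __name__ == "__main__":
    rows = [
        "MB/V_VCORE OK 1.30 1.20 1.24 1.36 1.39",
        "MB/V_+12V OK 12.25 10.92 11.16 12.84 13.08",
        "MB/I_VCORE OK 20.560 80.000 88.000",  # load row, skipped by the parser
    ]
    for r in parse_voltage_rows(rows):
        print(r.sensor, classify(r))
```

In this sketch the Status column is passed through unchanged. The threshold check is only a reading aid and does not replace the status that ALOM reports.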
▼ To Run the showfru Command
The showfru command displays information about the FRUs in the server. Use this command to see information about an individual FRU, or for all the FRUs.
Note – By default, the output of the showfru command for all FRUs is very long.
Note – You do not need user permissions to use this command.
● At the sc> prompt, type the showfru command.
FRU_PROM at MB/CMP0/CH0/R0/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03fe27
FRU_PROM at MB/CMP0/CH0/R0/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03f623
FRU_PROM at MB/CMP0/CH0/R1/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03fc26
FRU_PROM at MB/CMP0/CH0/R1/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03eb26
FRU_PROM at MB/CMP0/CH3/R0/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03e620
FRU_PROM at MB/CMP0/CH3/R0/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d040920
FRU_PROM at MB/CMP0/CH3/R1/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03ec27
FRU_PROM at MB/CMP0/CH3/R1/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d040924
sc>
If you do not provide a command-line argument, all FRUs are listed.
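The full showfru listing is long, so it can help to reduce it to a table of locations and SPD fields, for instance to record DIMM serial numbers before a replacement. The following Python sketch assumes only the "FRU_PROM at location" and "/field: value" layout seen in the sample above. The function name is hypothetical.

```python
from typing import Dict, Optional


def parse_showfru(text: str) -> Dict[str, Dict[str, str]]:
    """Group '/SPD/...: value' lines under the preceding 'FRU_PROM at' header."""
    frus: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("FRU_PROM at "):
            location = line[len("FRU_PROM at "):]
            current = frus.setdefault(location, {})
        elif current is not None and line.startswith("/") and ":" in line:
            # Split on the first colon only; timestamps contain more colons.
            field, _, value = line.partition(":")
            current[field] = value.strip()
    return frus


if __name__ == "__main__":
    sample = """FRU_PROM at MB/CMP0/CH0/R0/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor Serial No: d03fe27
FRU_PROM at MB/CMP0/CH0/R0/D1/SEEPROM
/SPD/Vendor Serial No: d03f623
sc>"""
    for location, fields in parse_showfru(sample).items():
        print(location, fields.get("/SPD/Vendor Serial No"))
```

Empty fields, such as /SPD/Manufacture Location in the sample, are kept as empty strings so that a missing value can be distinguished from a missing field.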
Running POST
Power on self test (POST) is a group of PROM-based tests that run when the server
is powered on or reset. POST checks the basic integrity of the critical hardware
components in the server (motherboard, memory, and I/O buses).
If POST detects a faulty component, it is disabled automatically. If the system is
capable of running without the disabled component, the system will boot when
POST is complete. For example, if one of the processor cores is deemed faulty by
POST, the core will be disabled, and the system will boot and run using the
remaining cores.
Devices can be manually enabled or disabled using ASR commands (see “Managing
System Components with Automatic System Recovery Commands” on page 40).
Controlling How POST Runs
The server can be configured for normal, extensive, or no POST execution. You can
also control the level of tests that run, the amount of POST output that is displayed,
and which reset events trigger POST by using ALOM variables.
TABLE 2-5 lists the ALOM variables used to configure POST and FIGURE 2-5 shows
how the variables work together.
TABLE 2-5 ALOM Parameters Used For POST Configuration

setkeyswitch*
• normal – The system can power on and run POST (based on the other parameter settings). For details see FIGURE 2-5. This parameter overrides all other commands.
• diag – The system runs POST based on predetermined settings.
• stby – The system cannot power on.
• locked – The system can power on and run POST, but no flash updates can be made.

diag_mode
• off – POST does not run.
• normal – Runs POST according to the diag_level value.
• service – Runs POST with preset values for diag_level and diag_verbosity.

diag_level
• min – If diag_mode = normal, runs the minimum set of tests.
• max – If diag_mode = normal, runs all the minimum tests plus extensive CPU and memory tests.

diag_trigger
• none – Does not run POST on reset.
• user-reset – Runs POST upon user-initiated resets.
• power-on-reset – Runs POST only for the first power on. This is the default.
• error-reset – Runs POST if fatal errors are detected.
• all-resets – Runs POST after any reset.

diag_verbosity
• none – No POST output is displayed.
• min – POST output displays functional tests with a banner and pinwheel.
• normal – POST output displays all test and informational messages.
• max – POST output displays all test, informational, and some debugging messages.

* All of these parameters are set using the ALOM setsc command except for the setkeyswitch command.
FIGURE 2-5 Flowchart of ALOM Variable for POST Configuration
TABLE 2-6 shows typical combinations of ALOM variables and associated POST
mode.
TABLE 2-6 ALOM Parameters and POST Modes

Parameter         Normal Mode           No POST      Diagnostic      Keyswitch Diagnostic
                  (default settings)    Execution    Service Mode    preset values
diag_mode         normal                off          service         normal
setkeyswitch*     normal                normal       normal          diag
diag_level        max                   n/a          max             max
diag_trigger      power-on-reset        none         all-resets      all-resets
                  error-reset
diag_verbosity    normal                n/a          max             max

Description of POST execution:
• Normal Mode (default settings) – This is the default POST configuration and provides a reasonable compromise between testing thoroughness and quick server initialization.
• No POST Execution – POST does not run, resulting in quick system initialization, but this is not a suggested configuration.
• Diagnostic Service Mode – POST runs the full spectrum of tests with the maximum output displayed.
• Keyswitch Diagnostic preset values – POST runs the full spectrum of tests with the maximum output displayed.

* The setkeyswitch parameter, when set to diag, overrides all the other ALOM POST variables.
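As a reading aid for TABLE 2-5 and TABLE 2-6, the following Python sketch encodes, in simplified form, whether POST runs for a given reset event. It interprets the table text only, because the flowchart in FIGURE 2-5 is not reproduced here. The event names and function name are hypothetical. The sketch ignores diag_level and diag_verbosity, which affect what POST tests and displays rather than whether it runs. It is not the firmware's actual logic.

```python
from typing import Iterable

# Hypothetical event names mapped to the diag_trigger value that enables POST for them.
EVENT_TRIGGER = {
    "power-on": "power-on-reset",
    "user-reset": "user-reset",
    "error-reset": "error-reset",
}

# TABLE 2-6: with the keyswitch set to diag, the preset trigger is all-resets.
KEYSWITCH_DIAG_TRIGGERS = {"all-resets"}


def post_runs(keyswitch: str, diag_mode: str,
              diag_trigger: Iterable[str], event: str) -> bool:
    """Simplified reading of TABLE 2-5 and TABLE 2-6 (not firmware logic)."""
    if keyswitch == "stby":
        return False  # the system cannot power on
    if keyswitch == "diag":
        triggers = KEYSWITCH_DIAG_TRIGGERS  # overrides the other POST variables
    elif diag_mode == "off":
        return False  # POST does not run
    else:
        triggers = set(diag_trigger)
    if "none" in triggers:
        return False
    if "all-resets" in triggers:
        return True
    return EVENT_TRIGGER[event] in triggers


if __name__ == "__main__":
    default = ["power-on-reset", "error-reset"]  # Normal Mode column in TABLE 2-6
    for event in EVENT_TRIGGER:
        print(event, post_runs("normal", "normal", default, event))
```

With the default settings, the sketch reports that POST runs for power-on and error resets but not for user-initiated resets. This matches the Normal Mode column of TABLE 2-6.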
▼ To Change POST Parameters
1. Access the ALOM sc> prompt:
At the console, issue the #. key sequence:
#.
2. At the ALOM sc> prompt, use the setsc command to set the POST parameter:
Example:
sc> setsc diag_mode service
The setkeyswitch parameter is set with the setkeyswitch command, which sets the virtual keyswitch, rather than with the setsc command. Example:
sc> setkeyswitch diag
Reasons to Run POST
You can use POST for basic sanity checking of the server hardware and for
troubleshooting as described in the following sections.
Routine Sanity Check of the Hardware
POST tests critical hardware components to verify functionality before the system
boots and accesses software. If POST detects an error, the faulty component is
disabled automatically, preventing faulty hardware from impacting system
operation.
Under normal operating conditions, the server is usually configured to run POST in maximum mode for all power-on or error-generated resets. This enables the system to initialize quickly while still checking the hardware to ensure a healthy system.
Diagnosing the System Hardware
You can use POST as an initial diagnostic tool for the system hardware. In this case,
configure POST to run in diagnostic service mode for maximum test coverage and
verbose output.
▼ To Run POST
This procedure describes how to run POST when you want maximum testing, as in
the case when you are troubleshooting a system.
1. Switch from the system console prompt to the ALOM sc> prompt by issuing the #. escape sequence, and then type the command setsc diag_mode normal.
ok #.
sc> setsc diag_mode normal
2. Set the virtual keyswitch to diag so that POST will run in service mode.
sc> setkeyswitch diag
3. Reset the system so that POST runs.
The following example uses the powercycle command. For other methods, refer to
the Sun Fire T1000 Server Administration Guide.
sc> powercycle
Are you sure you want to powercycle the system [y/n]? y
Powering host off at MON JAN 10 02:52:02 2000
Waiting for host to Power Off; hit any key to abort.
SC Alert: SC Request to Power Off Host.
SC Alert: Host system has shut down.
Powering host on at MON JAN 10 02:52:13 2000
SC Alert: SC Request to Power On Host.
4. Switch to the system console to view the POST output:
sc> console
Example of POST output (some output omitted):
SC: Alert: Host system has reset
0:0>
0:0>@(#) ERIE Integrated POST 4.x.0.build_17 2005/08/30 11:25
0:0>Config port B, bus 2 dev 0 func 0, tag 5714 BRIDGE
0:0>Config port B, bus 3 dev 8 func 0, tag PCIX BRIDGE
0:0>IO-Bridge unit 1 PCI id test
0:0>INFO:10 count read passed for MB/IOB_PCIEb/BRIDGE! Last read VID:1166|DID:103
0:0>INFO:10 count read passed for MB/IOB_PCIEb/BRIDGE/GBE! Last read VID:14e4|DID:1648
0:0>INFO:10 count read passed for MB/IOB_PCIEb/BRIDGE/HBA! Last read VID:1000|DID:50
0:0>Quick JBI Loopback Block Mem Test
0:0>Quick jbus loopback Test 262144 bytes at 00000000.00600000
0:0>INFO:
0:0>POST Passed all devices.
0:0>POST:Return to VBSC.
0:0>Master set ACK for vbsc runpost command and spin...
5. Perform further investigation if needed.
When POST is finished running, the system continues to boot even if POST detects a faulty FRU, provided that disabling the FRU does not leave the system without memory or a CPU core. Note that certain DIMM failures might not be diagnosable to a single DIMM. These failures are fatal, and result in both logical banks being unconfigured. If POST detects a faulty device, the fault is displayed and the fault information is passed to ALOM for fault handling.
a. Interpret the POST messages:
POST error messages use the following syntax:
c:s > ERROR: TEST = failing-test
c:s > H/W under test = FRU
c:s > Repair Instructions: Replace items in order listed by H/W
under test above
c:s > MSG = test-error-message
c:s > END_ERROR
where c = the core number, s = the strand number.
Warning and informational messages use the following syntax:
INFO or WARNING: message
The following is an example of a POST error message.
.
.
.
0:0>Data Bitwalk
0:0>L2 Scrub Data
0:0>L2 Enable
0:0>Testing Memory Channel 0 Rank 0 Stack 0
0:0>Testing Memory Channel 3 Rank 0 Stack 0
0:0>Testing Memory Channel 0 Rank 1 Stack 0
.
.
.
0:0>ERROR: TEST = Data Bitwalk
0:0>H/W under test = MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>Repair Instructions: Replace items in order listed by ’H/W
under test’ above.
0:0>MSG = Pin 3 failed on MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>END_ERROR
0:0>Testing Memory Channel 3 Rank 1 Stack 0
In this example, POST is reporting a memory error at DIMM location MB/CMP0/CH0/R1/D0 (J0701).
b. Run the showfaults command to obtain additional fault information.
The fault is captured by ALOM, where the fault is logged, the Service required
LED is lit, and the faulty component is disabled.
Example:
ok #.
sc> showfaults -v
ID Time FRU Fault
1 APR 24 12:47:27 MB/CMP0/CH2/R0/D0 MB/CMP0/CH2/R0/D0 deemed
faulty and disabled
In this example, MB/CMP0/CH2/R0/D0 (DIMM 0 at J0701) is disabled. Until the
faulty component is replaced, the system can boot using memory that was not
disabled.
Note – You can use ASR commands to display and control disabled components.
See “Managing System Components with Automatic System Recovery Commands”
on page 40.
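The POST error syntax described in step 5a is regular enough to extract automatically from a captured console log. The following Python sketch is one way to do so. It assumes only the c:s> prefix and the ERROR: TEST, H/W under test, MSG, and END_ERROR keywords shown above. It skips lines without the prefix, such as wrapped continuation lines and the elision dots. The names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Lines look like "0:0>ERROR: TEST = Data Bitwalk" (c = core, s = strand).
PREFIX_RE = re.compile(r"^(?P<core>\d+):(?P<strand>\d+)\s*>\s*(?P<body>.*)$")


def _value(body: str) -> str:
    """Return the text after the first '=' in a POST error line."""
    return body.split("=", 1)[1].strip()


def parse_post_errors(log: str) -> List[Dict[str, object]]:
    """Collect each ERROR ... END_ERROR block from captured POST output."""
    errors: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for raw in log.splitlines():
        match = PREFIX_RE.match(raw.strip())
        if not match:
            continue  # wrapped continuation lines, elision dots, and so on
        body = match.group("body").strip()
        if body.startswith("ERROR: TEST ="):
            current = {
                "core": int(match.group("core")),
                "strand": int(match.group("strand")),
                "test": _value(body),
            }
        elif current is not None:
            if body.startswith("H/W under test ="):
                current["fru"] = _value(body)
            elif body.startswith("MSG ="):
                current["msg"] = _value(body)
            elif body == "END_ERROR":
                errors.append(current)
                current = None
    return errors


if __name__ == "__main__":
    sample = """0:0>ERROR: TEST = Data Bitwalk
0:0>H/W under test = MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>MSG = Pin 3 failed on MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>END_ERROR"""
    for error in parse_post_errors(sample):
        print(error["test"], "->", error["fru"])
```

The fru value is the first item to check against the showfaults output described in step 5b.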
Using the Solaris Predictive Self-Healing Feature
The Solaris OS predictive self-healing technology enables the Sun Fire T1000 server to diagnose problems while the Solaris OS is running, and to mitigate many serious problems before they occur.
The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time
and runs in the background to monitor the system. If a component generates an
error, the daemon handles the error by correlating the error with data from previous
errors and other related information to diagnose the problem. Once diagnosed, the
fault manager daemon assigns the problem a unique identifier (UUID) that
distinguishes the problem across any set of systems. When possible, the fault
manager daemon initiates steps to self-heal the failed component and take the
component offline. The daemon also logs the fault to the syslogd daemon and
provides a fault notification with a message ID (MSGID). You can use the message ID to get additional information about the problem from Sun’s knowledge article database.
The predictive self-healing technology covers the following Sun Fire T1000 server
components:
■ UltraSPARC T1 multicore processor
■ Memory
■ I/O bus
The PSH console message provides the following information:
■ Type
■ Severity
■ Description
■ Automated Response
■ Impact
■ Suggested Action for System Administrator
■ Details
If the Solaris OS PSH facility has detected a faulty component, use the fmdump
command to identify the fault.
Note – Additional predictive self-healing information is available at:
http://www.sun.com/msg.
▼ To Use the fmdump Command to Identify Faults
The fmdump command displays the list of faults detected by the Solaris PSH facility.
Use this command for the following reasons:
■ To see if any faults have been detected by the Solaris PSH facility.
■ To obtain the fault message ID (SUNW-MSG-ID) for detected faults.
■ To verify that the replacement of a FRU has cleared the fault and not generated
any additional faults.
If you already have a fault message ID, go to Step 2 to obtain more information about the fault from Sun’s Predictive Self-Healing Knowledge Article web site.
1. Check the event log using the fmdump command with -v for verbose output:
# fmdump -v
In this example, a fault is displayed, indicating the following details:
■ Date and time of the fault (Oct 21 10:32 EDT 2004)
■ Universally Unique Identifier (UUID) that is unique for every fault (a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc)
■ Sun message identifier (SUN4U-8000-2S) that can be used to obtain additional fault information
■ Faulted FRU (FRU: mem:///component=MB/CMP0/CH0:R1/D0/J0701), which in this example is identified as the DIMM at R1/D0 (J0701).
2. Use the Sun message ID to obtain more information about this type of fault.
a. In a browser, go to the Predictive Self-Healing Knowledge Article web site:
http://www.sun.com/msg
b. Enter the message ID in the SUNW-MSG-ID field, and press Lookup.
In this example, the message ID SUN4U-8000-2S returns the following
information for corrective action:
Memory module errors exceeded acceptable levels
Type
Fault
Severity
Major
Description
The Solaris(TM) Fault Manager has determined that the number
of correctable (single bit) memory errors reported against
a memory DIMM module indicates a fault requiring repair
action is present.
Automated Response
The system will attempt to remove the affected page of
memory from service.
Impact
The system is at increased risk of incurring an uncorrectable
error, which will cause a service interruption, until the
memory DIMM module is replaced.
Suggested Action for System Administrator
For Sun Fire(TM) T1000, T2000 1280, 3800-6800, 2900-6900,
E12K, E15K, F20K, and F25K systems, it is imperative that
the System Controller be checked for evidence of a
faulty system board to ensure that the appropriate
service action is performed.
Use the fmdump(1M) command:
fmdump -vu <event-id>
to view the results of diagnosis and the specific Field
Replaceable Unit (FRU) identified for repair.
The event-id can be found in the EVENT-ID field of the
message. For example:
EVENT-ID:
39b30371-f009-c76c-90ee-b245784d2277
Details
The Message ID: SUN4U-8000-2S indicates the
Solaris Fault Manager has received reports that multiple
correctable (single bit) errors associated with a memory
DIMM module have been detected. Diagnosis applied to
the error reports has determined that a fault requiring
repair action is present.
A service case should be opened and time scheduled to
replace the FRU, identified in the fmdump(1M) output,
on which the suspect DIMM is located.
If Customer Enabled Services apply to the product then
refer to the FRU replacement procedures in the
appropriate service manual.
c. Follow the suggested actions to repair the fault.
Collecting Information From Solaris OS Files and Commands
With the Solaris OS running on the Sun Fire T1000 server, you have the full complement of Solaris OS files and commands available for collecting information and for troubleshooting.
If POST, ALOM, or the Solaris PSH features did not indicate the source of a fault, check the message buffer and log files for fault notifications. Hard drive faults are usually captured by the Solaris message files.
Use the dmesg command to view the most recent system messages. To view the system messages log file, view the contents of the /var/adm/messages file.
▼ To Check the Message Buffer
1. Log in as superuser.
2. Issue the dmesg command:
# dmesg
The dmesg command displays the most recent messages generated by the system.
▼ To View System Message Log Files
The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems such as a device that is about to fail.
The /var/adm directory contains several message files. The most recent messages
are in the /var/adm/messages file. After a period of time (usually every ten days),
a new messages file is automatically created. The original contents of the
messages file are rotated to a file named messages.1. Over a period of time, the
messages are further rotated to messages.2 and messages.3, and then deleted.
1. Log in as superuser.
2. Issue the following command:
# more /var/adm/messages
3. If you want to view all logged messages, issue the following command:
# more /var/adm/messages*
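The shell sorts the names matched by messages* lexically. As a result, the command in step 3 shows the newest file (messages) first and the oldest file last. To read the log chronologically instead, you can list the files oldest first with a small Python sketch such as the following. It assumes the messages, messages.1, messages.2, ... naming scheme described above. The function name is hypothetical.

```python
import re
from pathlib import Path
from typing import List

# "messages" is the current file; "messages.N" files are older as N grows.
ROTATED_RE = re.compile(r"^messages(?:\.(\d+))?$")


def messages_oldest_first(log_dir: str = "/var/adm") -> List[Path]:
    """Return the messages files in log_dir, oldest first."""
    files = []
    for path in Path(log_dir).iterdir():
        match = ROTATED_RE.match(path.name)
        if match:
            age = int(match.group(1)) if match.group(1) else 0
            files.append((age, path))
    # The highest rotation number is the oldest file, so sort by age descending.
    return [path for age, path in sorted(files, key=lambda item: item[0], reverse=True)]
```

For example, in a directory that holds messages through messages.3, the function returns them in the order messages.3, messages.2, messages.1, messages.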
Managing System Components with Automatic System Recovery Commands
The Automatic System Recovery (ASR) feature enables the server to automatically configure failed components out of operation until they can be replaced. In the Sun Fire T1000 server, the following components are managed by the ASR feature:
■ UltraSPARC T1 processor strands
■ Memory DIMMs
■ I/O bus
The database that contains the list of disabled components is called the ASR blacklist
(asr-db).
In most cases, POST and ALOM automatically manage the disabling of faulty
components. When the faulty FRU is replaced, it must be manually enabled.
Example: A component appears faulty and is automatically disabled. The problem is
due to a loose connector, and no FRU replacement is required to fix the problem.
ALOM, which would normally detect a FRU replacement and enable the FRU, does
not do so. In this case, after the loose cable is reseated, the disabled component must
be manually enabled.
The ASR commands (TABLE 2-7) enable you to view system components and their status, and to manually add components to or remove them from the ASR blacklist. These commands are run from the ALOM sc> prompt.
TABLE 2-7 ASR Commands

showcomponent*
Displays system components and their current state.

enablecomponent asrkey
Removes a component from the asr-db blacklist, where asrkey is the component to enable.

disablecomponent asrkey
Adds a component to the asr-db blacklist, where asrkey is the component to disable.

clearasrdb
Removes all entries from the asr-db blacklist.

* The showcomponent command may not report all blacklisted DIMMs.
Note – The components (asrkeys) vary from system to system, depending on how
many cores and memory are present. Use the showcomponent command to see the
asrkeys on a specific system.
Note – A reset or powercycle is required after disabling or enabling a component. If the component status is changed while the system is powered on, the change has no effect on the system until the next reset or powercycle.
The following examples show the output of these commands.
▼ To Run the showcomponent Command
The showcomponent command displays the system components (asrkeys) and
reports their status.
1. At the sc> prompt, enter the showcomponent command.
Chapter 2 Sun Fire T1000 Server Diagnostics41
Example with no disabled components:
sc> showcomponent
Keys:
.
.
.
ASR state: clean
Example showing a disabled component:
sc> showcomponent
Keys:
.
.
.
ASR state: Disabled Devices
MB/CMP0/CH3/R1/D1 : dimm8 deemed faulty
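When several servers are checked, it can help to pull the disabled entries out of captured showcomponent text. The following Python sketch assumes the output has been saved as plain text and that each disabled entry has the "asrkey : reason" shape shown in the example above; the function name disabled_components is hypothetical and is not part of ALOM. Because showcomponent may not report all blacklisted DIMMs, an empty result is not proof that no DIMM is disabled.

```python
from __future__ import annotations

import re

# Assumed entry shape, taken from the showcomponent example:
#   MB/CMP0/CH3/R1/D1 : dimm8 deemed faulty
_ENTRY = re.compile(r"^(?P<key>\S+)\s*:\s*(?P<reason>.+)$")


def disabled_components(output: str) -> list[tuple[str, str]]:
    """Return (asrkey, reason) pairs listed under 'ASR state: Disabled Devices'."""
    entries = []
    in_disabled_list = False
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("ASR state:"):
            # Only the "Disabled Devices" state is followed by a device list;
            # "ASR state: clean" means nothing is blacklisted.
            in_disabled_list = line.endswith("Disabled Devices")
            continue
        if in_disabled_list:
            match = _ENTRY.match(line)
            if match:
                entries.append((match["key"], match["reason"]))
    return entries


sample = """sc> showcomponent
Keys:
ASR state: Disabled Devices
MB/CMP0/CH3/R1/D1 : dimm8 deemed faulty"""

print(disabled_components(sample))
# [('MB/CMP0/CH3/R1/D1', 'dimm8 deemed faulty')]
```

The parser ignores everything before the ASR state line, so the Keys list (elided in the examples above) cannot be mistaken for disabled devices.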
▼ To Run the disablecomponent Command
The disablecomponent command disables a component by adding it to the ASR
blacklist.
1. At the sc> prompt, enter the disablecomponent command.
sc> disablecomponent MB/CMP0/CH3/R1/D1
sc>SC Alert:MB/CMP0/CH3/R1/D1 disabled
2. After receiving confirmation that the disablecomponent command is complete,
reset the server so that the ASR command takes effect.
sc> reset
▼ To Run the enablecomponent Command
The enablecomponent command enables a disabled component by removing it
from the ASR blacklist.
1. At the sc> prompt, enter the enablecomponent command.
sc> enablecomponent MB/CMP0/CH3/R1/D1
sc>SC Alert:MB/CMP0/CH3/R1/D1 reenabled
2. After receiving confirmation that the enablecomponent command is complete,
reset the server so that the ASR command takes effect.
sc> reset
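Because a blacklist change takes effect only at the next reset or power cycle, several components can be re-enabled and then picked up by a single reset. The Python sketch below only builds the ordered list of sc> commands to type; it assumes that batching the changes before one reset is acceptable in your situation, and the function name reenable_plan is hypothetical.

```python
from __future__ import annotations


def reenable_plan(asrkeys: list[str]) -> list[str]:
    """Build the sc> commands that re-enable each asrkey, followed by one reset."""
    # dict.fromkeys drops duplicates while keeping the original order.
    unique_keys = list(dict.fromkeys(asrkeys))
    if not unique_keys:
        return []  # nothing to enable, so no reset is needed either
    commands = [f"enablecomponent {key}" for key in unique_keys]
    # Blacklist changes have no effect until the next reset or power cycle.
    commands.append("reset")
    return commands


for command in reenable_plan(["MB/CMP0/CH3/R1/D1"]):
    print(f"sc> {command}")
```

Keeping the reset as the last command mirrors the Note on TABLE 2-7: status changes made with power on are not applied until the system is reset.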
Exercising the System with SunVTS
Sometimes a server exhibits a problem that cannot be isolated definitively to a
particular hardware or software component. In such cases, it may be useful to run a
diagnostic tool that stresses the system by continuously running a comprehensive
battery of tests. Sun provides the SunVTS software for this purpose.
This section describes the tasks necessary to use SunVTS software to exercise your
Sun Fire T1000 server:
■ “Checking Whether SunVTS Software Is Installed” on page 43
■ “Exercising the System Using SunVTS Software” on page 44
Checking Whether SunVTS Software Is Installed
This procedure assumes that the Solaris OS is running on the Sun Fire T1000 server,
and that you have access to the Solaris OS command line.
▼ To Check Whether SunVTS Software Is Installed
1. Check for the presence of SunVTS packages. Type:
% pkginfo -l SUNWvts SUNWvtsr SUNWvtsts SUNWvtsmn
■ If SunVTS software is loaded, information about the packages is displayed.
■ If SunVTS software is not loaded, you see an error message for each missing
package.
ERROR: information for "SUNWvts" was not found
ERROR: information for "SUNWvtsr" was not found
...
The pertinent packages are as follows.
Package      Description
SUNWvts      SunVTS framework
SUNWvtsr     SunVTS framework (root)
SUNWvtsts    SunVTS for tests
SUNWvtsmn    SunVTS man pages
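The same check can be scripted. The Python sketch below assumes a Python 3.7 or later interpreter on the Solaris host (not something this manual requires) and relies only on the error text shown in step 1; missing_packages and check_sunvts are hypothetical names.

```python
from __future__ import annotations

import re
import subprocess

SUNVTS_PACKAGES = ["SUNWvts", "SUNWvtsr", "SUNWvtsts", "SUNWvtsmn"]

# Matches lines such as: ERROR: information for "SUNWvts" was not found
_MISSING = re.compile(r'ERROR: information for "([^"]+)" was not found')


def missing_packages(pkginfo_output: str) -> list[str]:
    """Return the package names that pkginfo reported as not found."""
    return _MISSING.findall(pkginfo_output)


def check_sunvts() -> list[str]:
    """Run pkginfo -l for the SunVTS packages and return any that are missing."""
    result = subprocess.run(
        ["pkginfo", "-l", *SUNVTS_PACKAGES],
        capture_output=True,
        text=True,
    )
    # Search both streams, since the error lines may be written to stderr.
    return missing_packages(result.stdout + result.stderr)
```

An empty list from check_sunvts() corresponds to the case where information about every package is displayed.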
If SunVTS is not installed, you can obtain the installation packages from the
following sources:
■ Solaris Operating System DVDs
■ Sun Download Center: http://www.sun.com/oem/products/vts
The SunVTS 6.0 PS3 software, and future compatible versions, are supported on the
Sun Fire T1000 server.
SunVTS installation instructions are described in the SunVTS User’s Guide.
Exercising the System Using SunVTS Software
Before you begin, the Solaris OS must be running. You also need to ensure that
SunVTS validation test software is installed on your system. See “Checking Whether
SunVTS Software Is Installed” on page 43.
SunVTS software requires that you use one of two security schemes. The security
scheme you choose must be properly configured in order for you to perform this
procedure. For details, refer to the SunVTS User’s Guide.
SunVTS software features both character-based and graphics-based interfaces. This
procedure assumes that you are using the graphical user interface (GUI) on a system
running the Common Desktop Environment (CDE). For more information about the
character-based SunVTS TTY interface, and specifically for instructions on accessing
it by tip or telnet commands, refer to the SunVTS User’s Guide.
SunVTS software can be run in several modes. This procedure assumes that you are
using the default mode.
This procedure also assumes that the Sun Fire T1000 server is headless—that is, it is
not equipped with a monitor capable of displaying bit-mapped graphics. In this case,
you access the SunVTS GUI by logging in remotely from a machine that has a
graphics display.
Finally, this procedure describes how to run SunVTS tests in general. Individual tests
may presume the presence of specific hardware, or may require specific drivers,
cables, or loopback connectors. For information about test options and prerequisites,
refer to the following documentation:
■ SunVTS Test Reference Manual
■ SunVTS 6.0 PS3 Doc Supplement (SPARC)
▼ To Exercise the System Using SunVTS Software
1. Log in as superuser to a system with a graphics display.
The display system should be one with a frame buffer and monitor capable of
displaying bit-mapped graphics such as those produced by the SunVTS GUI.
2. Enable remote display. On the display system, type:
# /usr/openwin/bin/xhost + test-system
where test-system is the name of the Sun Fire T1000 server you plan to test.
3. Remotely log in to the Sun Fire T1000 server as superuser.
4. Start SunVTS software. Type:
# cd /opt/SUNWvts/bin
# ./sunvts -display display-system:0
where display-system is the name of the machine through which you are remotely
logged in to the Sun Fire T1000 server.
If you have installed SunVTS software in a location other than the default /opt
directory, alter the path in this command accordingly.
The SunVTS GUI appears on the display system’s screen.
FIGURE 2-6 The SunVTS GUI Screen
5. Expand the test lists to see the individual tests.
The test selection area lists tests in categories, such as Network, as shown in
FIGURE 2-7. To expand a category, left-click the expand category icon to the left of
the category name. The icon looks like a plus sign (+).
FIGURE 2-7 SunVTS Test Selection Panel
6. (Optional) Select the tests you want to run.
Certain tests are enabled by default, and you can choose to accept these.
Alternatively, you can enable and disable individual tests or blocks of tests by
clicking the checkbox next to the test name or test category name. Tests are enabled
when checked, and disabled when not checked.
TABLE 2-8 lists tests that are especially useful to run on a Sun Fire T1000 server.
TABLE 2-8 Useful SunVTS Tests to Run on a Sun Fire T1000 Server
SunVTS Tests                                  FRUs Exercised by Tests
cmttest, cputest, fputest, iutest,            DIMMs, motherboard
l1dcachetest, dtlbtest, and l2sramtest;
indirectly: mptest and systest
pmemtest, vmemtest, ramtest                   DIMMs, motherboard
serialtest                                    I/O (serial port interface)
hsclbtest                                     Motherboard, ALOM system controller
                                              (host to system controller interface)
7. (Optional) Customize individual tests.
You can customize individual tests by right-clicking the name of the test. For
example, in FIGURE 2-7, right-clicking the text string bg0(nettest) brings up a menu
that enables you to configure this Ethernet test.
8. Start testing.
Click the Start button that is located at the top left of the SunVTS window. Status
and error messages appear in the test messages area located across the bottom of the
window. You can stop testing at any time by clicking the Stop button.
During testing, SunVTS software logs all status and error messages. To view these,
click the Log button or select Log Files from the Reports menu. This opens a log
window from which you can choose to view the following logs:
■ Information – Detailed versions of all the status and error messages that appear in
the test messages area.
■ Test Error – Detailed error messages from individual tests.
■ VTS Kernel Error – Error messages pertaining to SunVTS software itself. Check this
log if SunVTS software behaves unexpectedly, especially at startup.
■ UNIX Messages (/var/adm/messages) – A file containing messages generated by
the operating system and various applications.
■ Log Files (/var/opt/SUNWvts/logs) – A directory containing the log files.
For further information, refer to the documents that accompany the SunVTS
software.
CHAPTER 3
Removing and Replacing FRUs
This chapter describes how to remove and replace field-replaceable units (FRUs) in
the Sun Fire T1000 server.
The following topics are covered:
■ “Safety Information” on page 51
■ “Common Procedures for Parts Replacement” on page 53
■ “Removing and Replacing CRUs” on page 57
■ “Common Procedures for Finishing Up” on page 72
For a list of CRUs, see Appendix A, “Field-Replaceable Units (FRUs)” on page 75.
Note – Never attempt to run the system with the cover removed. The cover must be
in place for proper air flow. The cover interlock switch immediately shuts the system
down when the cover is removed.
Safety Information
This section describes important safety information you need to know prior to
removing or installing parts in the Sun Fire T1000 server.
For your protection, observe the following safety precautions when setting up your
equipment:
■ Follow all Sun standard cautions, warnings, and instructions marked on the
equipment and described in Important Safety Information for Sun Hardware Systems.
■ Ensure that the voltage and frequency of your power source match the voltage
and frequency inscribed on the equipment’s electrical rating label.
■ Follow the electrostatic discharge safety practices as described in this section.
The document, Important Safety Information for Sun Hardware Systems, 816-7190,
contains a listing of safety precautions for Sun systems. This document is located in
the packing carton of your server.
The Sun Fire T1000 server complies with regulatory requirements for safety and
EMI. Compliance documentation is available online at:
http://www.sun.com/documentation
Safety Symbols
The following symbols might appear in this document. Note their meanings:
Caution – There is a risk of personal injury and equipment damage. To avoid
personal injury and equipment damage, follow the instructions.
Caution – Hot surface. Avoid contact. Surfaces are hot and might cause personal
injury if touched.
Caution – Hazardous voltages are present. To reduce the risk of electric shock and
danger to personal health, follow the instructions.
Electrostatic Discharge Safety
Electrostatic discharge (ESD) sensitive devices, such as the motherboard, PCI cards,
hard drives, and memory cards require special handling.
Caution – The boards and hard drives contain electronic components that are
extremely sensitive to static electricity. Ordinary amounts of static electricity from
clothing or the work environment can destroy components. Do not touch the
components along their connector edges.
Use an Antistatic Wrist Strap
Wear an antistatic wrist strap and use an antistatic mat when handling components
such as drive assemblies, boards, or cards. When servicing or removing server
components, attach an antistatic strap to your wrist and then to a metal area on the
chassis. Do this after you disconnect the power cords from the server. Following this
practice equalizes the electrical potentials between you and the server.
Use an Antistatic Mat
Place ESD-sensitive components such as the motherboard, memory, and other PCB
cards on an antistatic mat.
Common Procedures for Parts Replacement
Before you can remove and replace parts that are inside the Sun Fire T1000 server,
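you must perform the following procedures:
■ “To Shut the System Down” on page 53
■ “To Remove the Server From a Rack” on page 55
■ “To Perform Electrostatic Discharge (ESD) Prevention Measures” on page 56
■ “To Remove the Top Cover” on page 57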
The corresponding procedures that you perform when maintenance is complete are
described in “Common Procedures for Finishing Up” on page 72.
Required Tools
The Sun Fire T1000 server can be serviced with the following tools:
■ Antistatic wrist strap
■ Antistatic mat
■ No. 2 Phillips screwdriver
▼ To Shut the System Down
Performing a graceful shutdown makes sure all of your data is saved and the system
is ready for restart.
Chapter 3 Removing and Replacing FRUs53
1. Log in as superuser or equivalent.
Depending on the nature of the problem, you might want to view the system status
or the log files, or run diagnostics before you shut down the system. Refer to the Sun Fire T1000 Server Administration Guide for log file information.
2. Notify affected users.
Refer to your Solaris system administration documentation for additional
information.
3. Save any open files and quit all running programs.
Refer to your application documentation for specific information on these processes.
4. Shut down the OS:
a. At the Solaris OS prompt, issue the uadmin command to halt the Solaris OS
and to return to the ok prompt.
# uadmin 2 0
WARNING: proc_exit: init exited
syncing file systems... done
Program terminated
ok
This command is described in Solaris system administration documentation.
b. Switch from the system console prompt to the SC console prompt by issuing the #.
(Pound Period) escape sequence.
ok #.
sc>
c. Using the SC console, issue the poweroff command.
sc> poweroff -fy
SC Alert: SC Request to Power Off Host Immediately.
Note – You can also use the Power On/Off button on the front of the server to
initiate a graceful system shutdown.
Refer to the Sun Fire T1000 Server Administration Guide for more information about
the ALOM poweroff command.
▼ To Remove the Server From a Rack
If the server is installed in a rack with the extendable slide rails that were supplied
with the server, use this procedure to remove the server chassis from the rack.
1. (Optional) Issue the following command from the ALOM SC prompt to locate the
system that requires maintenance:
sc> setlocator on
Locator LED is on.
Once you have located the server, press the Locator button to turn it off.
2. Check to see that no cables will be damaged or interfere when the server chassis
is removed from the rack.
3. Disconnect the power cord from the power supply.
4. Disconnect all cables from the server and label them.
5. From the front of the server, unlock both mounting brackets (FIGURE 3-1) and pull
the server chassis out until the brackets lock in the open position.
FIGURE 3-1 Unlocking a Mounting Bracket
6. Press the release buttons on both mounting brackets (FIGURE 3-2) to release the
right and left mounting brackets, then pull the server chassis out of the rails.
The mounting brackets slide approximately 4 in (10 cm) further before disengaging.
FIGURE 3-2 Location of the Mounting Bracket Release Buttons
7. Set the chassis on a sturdy work surface.
▼ To Perform Electrostatic Discharge (ESD) Prevention Measures
1. Prepare an antistatic surface on which to set parts during removal and installation.
Place ESD-sensitive components such as the printed circuit boards on an antistatic
mat. The following items can be used as an antistatic mat:
■ Antistatic bag used to wrap a Sun replacement part
■ Sun ESD mat, part number 250-1088
■ Disposable ESD mat (shipped with some replacement parts or optional system
components)
2. Use an antistatic wrist strap.
▼ To Remove the Top Cover
Access to all customer replaceable units (CRUs) requires the removal of the top
cover.
Note – Never run the system with the top cover removed. The top cover must be in
place for proper air flow. The cover interlock switch immediately shuts the system
down when the cover is removed.
Caution – The system supplies 3.3 Vdc standby power to the circuit boards even
when the system is powered off if the AC power cord is plugged in.
1. Press the cover release button (FIGURE 3-3).
2. While pressing the release button, grasp the rear of the cover and slide the cover
toward the rear of the server about one half inch.
3. Lift the cover off the chassis.
FIGURE 3-3 Location of Top Cover, Release Button
Removing and Replacing CRUs
This section provides procedures for replacing the following customer replaceable
units (CRUs) inside the server chassis:
■ “To Remove the Optional PCI Express Card” on page 58 and “To Add or Replace
the Optional PCI Express Card” on page 60
■ “To Remove the Fan Tray Assembly” on page 60 and “To Replace the Fan Tray
Assembly” on page 61
■ “To Remove the Power Supply” on page 61 and “To Replace the Power Supply”
on page 62
■ “To Remove the Hard Drive” on page 63 and “To Replace the Hard Drive” on
page 64
■ “To Remove DIMMs” on page 65 and “To Add or Replace DIMMs” on page 66
■ “To Remove the Clock Battery on the Motherboard” on page 70 and “To Replace
the Clock Battery on the Motherboard” on page 71
To locate these CRUs, refer to Appendix A, “Field-Replaceable Units (FRUs)” on
page 75.
▼ To Remove the Optional PCI Express Card
Use this procedure to remove the optional low-profile PCI Express card from the
server.
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 53.
2. Remove any cable(s) that are attached to the card.
3. On the rear of the chassis, release the retention latch (FIGURE 3-4) that secures the
PCI Express card to the chassis.
PCI Express card
Retention latch
FIGURE 3-4 Releasing the PCI Express Card Retention Latch
4. Gently work the PCI Express card out of the socket on the PCI Express riser board
(FIGURE 3-5) and the retention bracket.
Socket
FIGURE 3-5 Removing and Replacing the PCI Express Card
Retention bracket
PCI Express card riser board
5. Place the PCI Express card on an antistatic mat.
▼ To Add or Replace the Optional PCI Express Card
Use this procedure to replace the PCI Express card.
1. Unpackage the replacement PCI Express card and place it on an antistatic mat.
Note – Only low-profile PCI Express cards with low-profile brackets fit into the
chassis. There are a variety of PCI Express cards on the market. Read the product
documentation for your device for additional installation requirements and
instructions that are not covered here.
2. Insert the PCI Express card into the connector slot and retention bracket
(FIGURE 3-5) on the PCI Express riser board.
3. On the rear of the chassis, engage the retention latch (FIGURE 3-4) to secure the card
to the chassis.
4. Perform the procedures described in “Common Procedures for Finishing Up” on
page 72.
5. Run the Solaris prtdiag command to verify that the PCI Express card is
recognized by the system.
▼ To Remove the Fan Tray Assembly
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 53.
2. Disconnect the fan power cable from the motherboard.
3. Release the tabs (FIGURE 3-6) on both sides of the fan assembly.
Fan tray
assembly
FIGURE 3-6 Removing the Fan Tray Assembly
4. Remove the fan assembly from the sheet metal mounting brackets.
▼ To Replace the Fan Tray Assembly
1. Unpackage the replacement fan tray assembly and place it on an antistatic mat.
2. Align the fan tray assembly with the sheet metal mounting brackets and slide it
in until the tabs on each side lock it into place.
3. Reconnect the fan power cable to the motherboard.
4. Perform the procedures described in “Common Procedures for Finishing Up” on
page 72.
5. Verify that the Service Required and Locator LEDs are not lit.
▼ To Remove the Power Supply
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 53.
2. Disconnect the power cable from the motherboard and pull it through the
midwall.
3. Loosen the fastener (FIGURE 3-7) on the front of the power supply and slide the
power supply forward to remove it from the chassis.
Fastener
Power supply
FIGURE 3-7 Removing the Power Supply
▼ To Replace the Power Supply
1. Unpackage the replacement power supply.
2. Slide the power supply into the chassis and engage the two alignment pins in the
rear of the chassis that mate with the power supply.
3. Tighten the fastener (FIGURE 3-8) to lock the power supply into place in the chassis.
4. Redress the power cable through the midwall in the chassis and connect the cable
to the motherboard.
5. Perform the procedures described in “Common Procedures for Finishing Up” on
page 72.
6. Verify that the amber Fault LED on the replaced power supply and the Service
Required LED are not lit.
7. At the sc> prompt, issue the showenvironment command to verify the status of
the power supply.
Fastener
Power supply
FIGURE 3-8 Replacing the Power Supply
▼ To Remove the Hard Drive
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 53.
2. Disconnect the cable from the hard drive.
3. Unsnap the catches on the latches (FIGURE 3-9) on the front of the disk drive and
remove the drive and tray assembly from the chassis.
Latches
Hard drive
FIGURE 3-9 Removing the Hard Drive
▼ To Replace the Hard Drive
1. Unpackage the replacement hard drive and tray assembly.
2. Slide the hard drive and tray assembly into the chassis until it mates with the
front of the chassis (FIGURE 3-10).
FIGURE 3-10 Replacing the Hard Drive
3. Snap the catches on the latches to lock the drive and tray assembly into place in
the chassis.
4. Redress the cable through the midwall in the chassis and reconnect the
cable to the rear of the drive.
5. Perform the procedures described in “Common Procedures for Finishing Up” on
page 72.
6. Perform administrative tasks to reconfigure the hard disk drive.
The procedures that you perform at this point depend on how your data is
configured. You might need to partition the drive, create file systems, load data from
backups, or have it updated from a RAID configuration.
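Example (the attachment point name depends on your system; cfgadm -al lists the
available attachment points):
# cfgadm -c configure c0t0d0s0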
▼ To Remove DIMMs
Caution – This procedure requires that you handle components that are sensitive to
static discharges that can cause the component to fail. To avoid this problem, ensure
that you follow antistatic practices as described in “To Perform Electrostatic
Discharge (ESD) Prevention Measures” on page 56.
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 53.
2. Locate the DIMM that you want to replace.
Use FIGURE 3-11 and TABLE 3-1 to identify the DIMM you want to remove.
3. Make note of the DIMM location so you can install the replacement DIMM in the
same socket.
4. Push down on the ejector levers on each side of the DIMM until the DIMM is
released.
FIGURE 3-11 DIMM Locations
TABLE 3-1 maps the DIMM names that are displayed in faults to the socket numbers
that identify the location of the DIMM on the motherboard.
TABLE 3-1 DIMM Names and Socket Numbers
Socket Number    DIMM Name Used in Messages*
J0501
J0601
J0701
J0801
J1001
J1101
J1201
J1301
* DIMM names in messages are displayed with the full name, such as MB/CMP0/CH1/R1/D1,
but this table lists the DIMM name in an abbreviated way (the preceding MB/CMP0 is
omitted) for clarity.
5. Grasp the top corners of the DIMM and remove it from the motherboard.
6. Place the DIMM on an antistatic mat.
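The full DIMM names used in fault messages can be split into their parts programmatically, for example to match a fault message against TABLE 3-1. This Python sketch assumes the MB/CMPn/CHn/Rn/Dn pattern from the footnote to TABLE 3-1, reading CH, R, and D as channel, rank, and DIMM (the Channel/Rank/DIMM scheme referred to in Appendix A); parse_dimm_name and short_dimm_name are hypothetical names.

```python
from __future__ import annotations

import re

# Pattern of full DIMM names such as MB/CMP0/CH1/R1/D1.
_DIMM_NAME = re.compile(
    r"^MB/CMP(?P<cmp>\d+)/CH(?P<channel>\d+)/R(?P<rank>\d+)/D(?P<dimm>\d+)$"
)


def parse_dimm_name(name: str) -> dict[str, int]:
    """Split a full DIMM name into its CMP, channel, rank, and DIMM numbers."""
    match = _DIMM_NAME.match(name)
    if match is None:
        raise ValueError(f"not a DIMM name: {name!r}")
    return {field: int(value) for field, value in match.groupdict().items()}


def short_dimm_name(name: str) -> str:
    """Drop the leading MB/CMPn part, as TABLE 3-1 does for readability."""
    parse_dimm_name(name)  # reject anything that is not a DIMM name
    return name.split("/", 2)[2]


print(parse_dimm_name("MB/CMP0/CH3/R1/D1"))
# {'cmp': 0, 'channel': 3, 'rank': 1, 'dimm': 1}
```

Validating before shortening means a non-DIMM FRU name, such as a fan tray, raises an error instead of being silently truncated.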
▼ To Add or Replace DIMMs
Use the following guidelines and FIGURE 3-11 and TABLE 3-1 to plan the memory
configuration of your server.
■ Eight slots hold industry-standard DDR-2 memory DIMMs (providing a maximum of
16 GB of memory).
■ The Sun Fire T1000 server accepts the following DIMM sizes:
   ■ 512 MB
   ■ 1 GB
   ■ 2 GB
■ All DIMMs installed must be the same size.
■ DIMMs must be added four at a time.
■ Rank 0 memory must be fully populated for the Sun Fire T1000 server to function.
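The guidelines above can be checked before ordering parts. The Python sketch below encodes them under one stated assumption that this section does not give: that rank 0 consists of four of the eight slots, so a valid layout holds either four rank 0 DIMMs or all eight. DIMMs are identified by names in the abbreviated style of TABLE 3-1, such as CH1/R1/D1; check_memory_config and total_memory_gb are hypothetical names.

```python
from __future__ import annotations

ALLOWED_SIZES_MB = {512, 1024, 2048}
TOTAL_SLOTS = 8
RANK0_SLOTS = 4  # assumption: half of the eight slots belong to rank 0


def check_memory_config(dimms: dict[str, int]) -> list[str]:
    """Return guideline violations for a planned layout; an empty list means none.

    dimms maps an abbreviated DIMM name (for example "CH1/R1/D1") to its size in MB.
    """
    problems = []
    if len(dimms) > TOTAL_SLOTS:
        problems.append(f"{len(dimms)} DIMMs but only {TOTAL_SLOTS} slots")
    if len(dimms) % 4 != 0:
        problems.append("DIMMs must be installed four at a time")
    sizes = set(dimms.values())
    if not sizes <= ALLOWED_SIZES_MB:
        problems.append(f"unsupported sizes (MB): {sorted(sizes - ALLOWED_SIZES_MB)}")
    if len(sizes) > 1:
        problems.append("all DIMMs must be the same size")
    # A DIMM is in rank 0 when its name contains an R0 component.
    rank0 = [name for name in dimms if "/R0/" in f"/{name}/"]
    if len(rank0) < RANK0_SLOTS:
        problems.append("rank 0 must be fully populated")
    return problems


def total_memory_gb(dimms: dict[str, int]) -> float:
    """Total installed memory in GB (1 GB = 1024 MB)."""
    return sum(dimms.values()) / 1024
```

Returning every violation, rather than stopping at the first, lets one run report all the problems in a proposed layout; eight 2 GB DIMMs give the 16 GB maximum.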
1. Unpackage the replacement DIMMs and place them on an antistatic mat.
2. Ensure that the socket ejector tabs are in the open position.
3. Line up the replacement DIMM with the connector.
4. Push the DIMM into the socket until the ejector tabs lock the DIMM in place.
5. Perform the procedures described in “Common Procedures for Finishing Up” on
page 72.
6. Perform the following steps to clear the memory fault.
a. Gain access to the ALOM sc> prompt.
Refer to the Sun Fire T2000 Server Advanced Lights Out Management (ALOM)
Guide for instructions.
b. Run the showfaults -v command to determine how to clear the fault:
■ If the fault is a Host-detected fault (displays a UUID), such as the following:
sc> showfaults -v
ID Time FRU Fault
0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault
MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
Run the clearfault command with the UUID reported by showfaults -v to clear the fault:
sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86
Clearing fault from all indicted FRUs...
Fault cleared.
■ If the fault resulted in the DIMM being disabled, such as the following:
sc> showfaults -v
ID Time FRU Fault
1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0 MB/CMP0/CH0/R0/D0 deemed faulty and disabled
Run the enablecomponent command to enable the FRU:
sc> enablecomponent MB/CMP0/CH0/R0/D0
7. Perform the following steps to verify that there are no faults:
a. Set the virtual keyswitch to diag mode so that POST will run in service mode.
sc> setkeyswitch diag
b. Issue the poweron command.
sc> poweron
c. Switch to the system console to view POST output.
sc> console
Watch the POST output for possible fault messages. The following output is an
indication that POST did not detect any faults:
.
.
.
0:0>POST Passed all devices.
0:0>
0:0>DEMON: (Diagnostics Engineering MONitor)
0:0>Select one of the following functions
0:0>POST:Return to OBP.
0:0>INFO:
0:0>POST Passed all devices.
0:0>Master set ACK for vbsc runpost command and spin...
Note – Depending on the configuration of the ALOM POST variables and whether
POST detected faults, the system might boot, or the system might remain at
the ok prompt. If the system is at the ok prompt, type boot.
d. Issue the Solaris OS fmadm faulty command.
# fmadm faulty
No memory or DIMM faults should be displayed.
If any faults are reported, return to the “Diagnostic Flow Chart” on page 11 for an
approach to diagnosing the fault.
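Steps 6 and 7 can be partly automated when the console output is captured to a file. The Python sketch below assumes the showfaults -v text and the POST console output are available as strings; it recognizes only the two fault shapes shown in step 6 and the "POST Passed all devices" line shown in step 7, and the function names fault_actions and post_passed are hypothetical. It only suggests sc> commands; it does not run them.

```python
from __future__ import annotations

import re

# UUID of a host-detected fault, as printed by showfaults -v.
_UUID = re.compile(
    r"UUID:\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)
# FRU name immediately before "deemed faulty and disabled" (may span a line break).
_DISABLED = re.compile(r"(\S+)\s+deemed faulty and disabled")


def fault_actions(showfaults_output: str) -> list[str]:
    """Suggest the sc> command that clears each fault shape shown in step 6."""
    actions = [f"clearfault {uuid}" for uuid in _UUID.findall(showfaults_output)]
    actions += [f"enablecomponent {fru}"
                for fru in _DISABLED.findall(showfaults_output)]
    return actions


def post_passed(console_output: str) -> bool:
    """Return True if the captured POST output reports that all devices passed."""
    return "POST Passed all devices" in console_output
```

A host-detected fault yields a clearfault suggestion, while a disabled DIMM yields an enablecomponent suggestion, matching the two cases in step 6; fmadm faulty on the running OS remains the final check.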
▼ To Remove the Motherboard and Chassis
The motherboard and chassis are replaced as a unit. Therefore, remove all other FRUs
and associated cables from your chassis and install them in the new chassis. The FRUs
to remove and replace, and the procedures for doing so, are as follows:
1. Remove the PCI Express card.
See “To Remove the Optional PCI Express Card” on page 58.
2. Remove the fan tray assembly and cable.
See “To Remove the Fan Tray Assembly” on page 60.
3. Remove the power supply and cable.
See “To Remove the Power Supply” on page 61.
4. Remove the hard drive and cable.
See “To Remove the Hard Drive” on page 63.
5. Remove the memory DIMMs.
See “To Remove DIMMs” on page 65.
6. Remove the socketed system configuration SEEPROM from the motherboard and
place it on an antistatic mat.
The system configuration SEEPROM contains the persistent storage for the host ID
and Ethernet MAC addresses of the system, as well as the ALOM configuration,
including the IP addresses and ALOM user accounts, if configured. This information
will be lost unless the system configuration SEEPROM is removed and installed in
the replacement motherboard. The SEEPROM does not hold the fault data, and this data
will no longer be accessible when the motherboard and chassis assembly is
replaced.
The location of this SEEPROM is shown in Appendix A, “Field-Replaceable Units
(FRUs)” on page 75.
▼ To Replace the Motherboard and Chassis Assembly
1. Reconnect the front panel LED cable.
2. Replace the PCI Express card.
See “To Add or Replace the Optional PCI Express Card” on page 60.
3. Replace the fan tray assembly and cable.
See “To Replace the Fan Tray Assembly” on page 61.
4. Replace the power supply and cable.
See “To Replace the Power Supply” on page 62.
5. Replace the hard disk drive and cable.
See “To Replace the Hard Drive” on page 64.
6. Replace the memory DIMMs.
See “To Add or Replace DIMMs” on page 66.
7. Replace the socketed system configuration SEEPROM.
The location of this SEEPROM is shown in Appendix A, “Field-Replaceable Units
(FRUs)” on page 75.
8. Perform the procedures described in “Common Procedures for Finishing Up” on
page 72.
9. Boot the system and run POST to verify that the system is fully operational. See
“Running POST” on page 27.
▼ To Remove the Clock Battery on the Motherboard
1. Perform the procedures described in “Common Procedures for Parts
Replacement” on page 53.
2. Using a small flat-head screwdriver, carefully pry the battery (FIGURE 3-12) from the
motherboard.
FIGURE 3-12 Removing the Clock Battery from the Motherboard
▼ To Replace the Clock Battery on the Motherboard
1. Unpackage the replacement battery.
2. Press the new battery into the motherboard (FIGURE 3-13) with the + facing upward.
FIGURE 3-13 Replacing the Clock Battery on the Motherboard
3. Perform the procedures described in “Common Procedures for Finishing Up” on
page 72.
4. Use the ALOM setdate command to set the day and time.
Use the setdate command before you power on the host system. For details about this
command, refer to the Sun Fire T1000 Server Advanced Lights Out Management (ALOM) Guide.
Common Procedures for Finishing Up
▼ To Replace the Top Cover
1. Place the top cover on the chassis.
Set the cover down so that the cover hangs over the rear of the server by about an
inch (2.5 cm).
2. Slide the cover forward until it latches into place.
▼ To Reinstall the Server Chassis in the Rack
Refer to the Sun Fire T1000 System Installation Manual for installation instructions.
After you have reinstalled the server chassis in the rack, reconnect all cables that you
disconnected when you removed the chassis from the rack.
▼ To Apply Power to the Server
1. Reconnect the power cord to the power supply.
Note – As soon as the power cord is connected, standby power is applied.
Depending on the configuration of the firmware, the system might boot.
See “Safety Information” on page 51.
APPENDIX A
Field-Replaceable Units (FRUs)
FIGURE A-1 shows the locations of the field-replaceable units (FRUs) in the Sun Fire
T1000 server.
TABLE A-1 lists the FRUs. TABLE A-2 lists the Channel/Rank/DIMM locations of the DIMMs.
FIGURE A-1 Field-Replaceable Units
TABLE A-1 Sun Fire T1000 Server FRU List
1  CRU: Motherboard and chassis assembly
   Replacement instructions: “To Remove the Motherboard and Chassis” on page 68
   Description: The motherboard and chassis are replaced as a single assembly. The
   motherboard is provided in different configurations to accommodate the different
   processor models (6 core and 8 core).
   Location: MB
2  CRU: DIMMs
   Replacement instructions: “To Remove DIMMs” on page 65
   Description: Can be ordered in the following sizes: 512 MB, 1 GB, 2 GB.
   Location: See TABLE A-2 and FIGURE 3-11.
3  CRU: Fan assembly
   Replacement instructions: “To Remove the Fan Tray Assembly” on page 60
   Description: A single assembly containing 4 fans.
   Location: FAN_TRAY
4  CRU: Power supply unit (PS)
   Replacement instructions: “To Remove the Power Supply” on page 61
   Description: The power supply provides 3.3 Vdc standby power at 3 Amps and
   12 Vdc at 25 Amps.
   Location: PS0
5  CRU: Hard drive
   Replacement instructions: “To Remove the Hard Drive” on page 63
   Description: SATA disk drive, 3.5-inch form factor.
   Location: HD0
6  CRU: PCI Express card slot
   Replacement instructions: “To Remove the Optional PCI Express Card” on page 58
   Description: Optional add-on express card.
   Location: PCI0
7  CRU: Clock battery
   Replacement instructions: “To Remove the Clock Battery on the Motherboard” on page 70
   Description: Battery is located on the motherboard.
   Location: SC/BAT
8  CRU: SEEPROM
   Replacement instructions: Remove and replace the socketed SEEPROM.
   Description: The socketed SEEPROM contains the MAC address and system
   configuration information.
   Location: MB/SEEPROM
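As a worked check on the power supply ratings in TABLE A-1, the power an output can deliver is its voltage times its current. Reading the standby output as 3.3 Vdc at 3 A (3.3 Vdc is the standby voltage given in the caution under “To Remove the Top Cover”) and the main output as 12 Vdc at 25 A gives the rated output of each rail. These are output ratings, not measured system consumption.

```latex
\begin{aligned}
P_{\text{standby}} &= 3.3\,\text{V} \times 3\,\text{A} = 9.9\,\text{W} \\
P_{\text{main}}    &= 12\,\text{V} \times 25\,\text{A} = 300\,\text{W}
\end{aligned}
```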