Sun Microsystems Fire T2000 Service Manual

Sun Fire™ T2000 Server
Service Manual
Sun Microsystems, Inc. www.sun.com
Part No. 819-2548-10 October 2005, Revision A
Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Java, OpenBoot, SunSolve, SunVTS, Sun Fire, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
U.S. Government Rights - Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface ix
1. Sun Fire T2000 Server Overview 1
Sun Fire T2000 Server Features 2
Chip-Multithreaded (CMT) Multicore Processor and Memory Technology 2
Performance Enhancements 4
Remote Manageability With ALOM 5
System Reliability, Availability, and Serviceability 5
Hot-Pluggable and Hot-Swappable Components 6
Power Supply Redundancy 6
Fan Redundancy 6
Environmental Monitoring 7
Error Correction and Parity Checking 7
Predictive Self Healing 7
Chassis Identification 9
Additional Service Related Information 10
2. Sun Fire T2000 Server Diagnostics 11
Overview of Sun Fire T2000 Server Diagnostics 12
Using LEDs to Identify the State of Devices 16
Front and Rear Panel LEDs 16
Hard Drive LEDs 19
Power Supply LEDs 20
Fan LEDs 21
Blower Unit LED 21
Using ALOM For Diagnosis and Repair Verification 22
Running ALOM Service-Related Commands 24
Connecting to ALOM 24
Switching Between the System Console and ALOM 24
Service-Related ALOM Commands 25
To Run the showfaults Command 26
To Run the showenvironment Command 27
To Run the showfru Command 29
Running POST 31
Controlling How POST Runs 31
To Change POST Parameters 34
Reasons to Run POST 35
Routine Sanity Check of the Hardware 35
Diagnosing the System Hardware 35
To Run POST 35
Using the Solaris Predictive Self-Healing Feature 40
To Use the fmdump Command to Identify Faults 41
Collecting Information From Solaris OS Files and Commands 43
To Check the Message Buffer 43
To View System Message Log Files 43
Managing Components with Automatic System Recovery (ASR) Commands 44
To Run the showcomponent Command 46
To Run the disablecomponent Command 47
To Run the enablecomponent Command 47
Exercising the System With SunVTS 48
Checking Whether SunVTS Software Is Installed 48
To Check Whether SunVTS Software Is Installed 48
Exercising the System Using SunVTS Software 50
To Exercise the System Using SunVTS Software 50
For further information, refer to the manuals that accompany the SunVTS software 53
3. Replacing Hot-Swappable and Hot-Pluggable FRUs 55
Devices That Are Hot-Swappable and Hot-Pluggable 56
Hot-Swapping a Fan 56
To Remove a Fan 57
To Replace a Fan 58
Hot-Swapping a Power Supply 58
To Remove a Power Supply 58
To Replace a Power Supply 60
Hot-Swapping the Rear Blower 61
To Remove the Rear Blower 61
To Replace the Rear Blower 61
Hot-Plugging a Hard Drive 63
To Remove a Hard Drive 63
To Replace a Hard Drive 64
4. Replacing Cold Swappable FRUs 65
Safety Information 66
Safety Symbols 66
Electrostatic Discharge Safety 67
Use an Antistatic Wrist Strap 67
Use an Antistatic Mat 67
Common Procedures for Parts Replacement 67
Required Tools 68
To Shut the System Down 68
To Extend the Server to the Maintenance Position 69
To Remove the Server From the Rack 70
To Disconnect Power From the Server 72
To Perform Electrostatic Discharge (ESD) Prevention Measures 72
To Remove the Top Cover 72
To Remove the Front Bezel and Top Front Cover 73
Removing and Replacing FRUs 74
To Remove PCI-E and PCI-X Cards 75
To Replace PCI Cards 77
To Remove DIMMs 77
To Replace DIMMs 79
To Remove the System Controller 82
To Replace the System Controller Board 83
To Remove the Motherboard Assembly 84
To Replace the Motherboard Assembly 88
To Remove the Power Distribution Board 90
To Replace the Power Distribution Board 92
To Remove the LED Board 93
To Replace the LED Board 94
To Remove the Fan Power Board 95
To Replace the Fan Power Board 96
To Remove the Front I/O Board 96
To Replace the Front I/O Board 97
To Remove the DVD Drive 98
To Replace the DVD Drive 99
To Remove the SAS Disk Backplane 99
To Replace the SAS Disk Backplane 100
To Remove the Battery on the System Controller 101
To Replace the Battery on the System Controller 101
Common Procedures for Finishing Up 103
To Replace the Top Front Cover and Front Bezel 103
To Replace the Top Cover 104
To Reinstall Server Chassis in the Rack 104
To Return the Server to the Normal Rack Position 105
To Apply Power to the Server 107
5. Adding New Components and Devices 109
Adding Hot-Pluggable and Hot-Swappable Devices 110
To Add a Hard Drive to the Server 110
To Add a USB Device 111
Adding Components Inside the Chassis 113
To Add DIMMs 113
To Add a PCI-E or PCI-X Card 116
A. Field-Replaceable Units 119
Preface
The Sun Fire T2000 Service Manual provides information to aid in diagnosing hardware problems and describes how to replace components within the Sun Fire™ T2000 server. This guide also describes how to add components such as hard drives and memory to the server.
This manual is written for technicians, service personnel, and system administrators who service and repair computer systems. The person qualified to use this manual:
• Can open a system chassis and identify and replace internal components.
• Understands the Solaris Operating System and the command-line interface.
• Has superuser privileges for the system being serviced.
• Understands typical hardware troubleshooting tasks.
How This Book Is Organized
This guide is organized into the following chapters:
Chapter 1 describes the main features of the Sun Fire T2000 server.
Chapter 2 describes the diagnostics that are available for monitoring and diagnosing the Sun Fire T2000 server.
Chapter 3 explains how to remove and replace hot-swappable and hot-pluggable field replaceable units (FRUs).
Chapter 4 describes how to remove and replace the FRUs that cannot be hot-swapped.
Chapter 5 explains how to add new components such as hard drives, memory, and PCI cards to the Sun Fire T2000 server.
Appendix A provides an illustrated breakdown of parts and lists the field replaceable units (FRUs).
Sun Fire T2000 Server Documentation
You can view and print the following manuals from the Sun documentation web site at: http://www.sun.com/documentation
Title | Description | Part Number
Sun Fire T2000 Server Site Planning Guide | Site planning information for the Sun Fire T2000 server | 819-2545
Sun Fire T2000 Server Product Notes | Late-breaking information about the server | 819-2544
Sun Fire T2000 Server Overview | Overview of the features of this server | 819-2543
Sun Fire T2000 Server Getting Started Guide | Information about where to find documentation to get your system installed and running quickly | 819-2542
Sun Fire T2000 Server Installation Guide | Detailed rackmounting, cabling, power-on, and configuration information | 819-2546
Sun Fire T2000 Server Administration Guide | How to perform administrative tasks that are specific to the Sun Fire T2000 server | 819-2549
Sun Fire T2000 Server Advanced Lights Out Manager (ALOM) Guide | How to use the Advanced Lights Out Manager (ALOM) software on the Sun Fire T2000 server | 819-2550
Typographic Conventions
Typeface | Meaning | Examples
AaBbCc123 (monospace) | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123 (bold monospace) | What you type, when contrasted with on-screen computer output | % su Password:
AaBbCc123 (italic) | Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values. | Read Chapter 6 in the User’s Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.
Note: The settings on your browser might differ from these settings.
Shell Prompts
Shell Prompt
C shell machine-name%
C shell superuser machine-name#
Bourne shell and Korn shell $
Bourne shell and Korn shell superuser #
Accessing Sun Documentation
You can view, print, or purchase a broad selection of Sun documentation, including localized versions, at:
http://www.sun.com/documentation
Third-Party Web Sites
Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are available on or through such sites or resources.
Contacting Sun Technical Support
If you have technical questions about this product that are not answered in this document, go to:
http://www.sun.com/service/contacting
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:
Sun Fire T2000 Server Service Manual, part number 819-2548-10
CHAPTER 1
Sun Fire T2000 Server Overview
This chapter provides an overview of the features of the Sun Fire T2000 server.
The following topics are covered:
“Sun Fire T2000 Server Features” on page 2
“Chassis Identification” on page 9
Sun Fire T2000 Server Features
The Sun Fire T2000 server is a high-performance entry-level server that is highly scalable and extremely reliable.
FIGURE 1-1 Sun Fire T2000 Server
Chip-Multithreaded (CMT) Multicore Processor and Memory Technology
The UltraSPARC® T1 multicore processor is the basis of the Sun Fire T2000 server. The UltraSPARC T1 processor is based on chip multithreading (CMT) technology that is optimized for highly threaded transactional processing. The UltraSPARC T1 processor improves throughput while using less power and dissipating less heat than conventional processor designs.
Depending on the model purchased, the processor has four, six, or eight UltraSPARC cores. Each core equates to a 64-bit execution pipeline capable of running four threads. The result is that the 8-core processor handles up to 32 active threads concurrently.
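The thread count follows directly from the core count. As a minimal sketch of that arithmetic (the function name is hypothetical and only illustrates the relationship stated above; it is not part of any Sun tool):

```python
# Illustrative only: concurrent hardware threads = cores x threads per core.
THREADS_PER_CORE = 4          # each core runs four threads, per the text
SUPPORTED_CORE_COUNTS = (4, 6, 8)


def hardware_threads(cores: int) -> int:
    """Return the number of concurrent hardware threads for a core count."""
    if cores not in SUPPORTED_CORE_COUNTS:
        raise ValueError("the text lists only 4-, 6-, and 8-core models")
    return cores * THREADS_PER_CORE


for cores in SUPPORTED_CORE_COUNTS:
    print(f"{cores} cores -> {hardware_threads(cores)} threads")
```

By the same rule, the four-core and six-core models would handle 16 and 24 threads respectively.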
Additional processor components, such as L1 cache, L2 cache, memory access crossbar, DDR2 memory controllers, and a JBus I/O interface have been carefully tuned for optimal performance.
FIGURE 1-2 Motherboard and UltraSPARC T1 Multicore Processor
Performance Enhancements
The Sun Fire T2000 server introduces several new technologies with its sun4v architecture and multithreaded UltraSPARC T1 multicore processor.
Some of these enhancements are:
Large page optimization
Reduction in TLB misses
Optimized block copy
TABLE 1-1 lists feature specifications for the Sun Fire T2000 server.
TABLE 1-1 Sun Fire T2000 System Features at a Glance
Feature | Description
Processor | 1 UltraSPARC T1 multicore processor (4, 6, or 8 cores)
Memory | 16 slots that can be populated with one of the following types of DDR2 DIMMs:
• 512 MB (8 GB maximum)
• 1 GB (16 GB maximum)
• 2 GB (32 GB maximum)
Ethernet ports | 4 ports, 10/100/1000 Mb autonegotiating
Internal hard disk drives | 1-4 SFF SAS drives, 2.5-inch form factor
Other internal peripherals | 1 slimline DVD drive
USB ports | 4 USB 1.1 ports (2 in front and 2 in rear)
Cooling | 3 hot-swappable and redundant system fans and 1 blower unit
PCI interfaces | 3 PCI-Express (PCI-E) slots for low-profile cards (supports 1x, 4x, and 8x width cards); 2 PCI-X slots for 64-bit 133 MHz low-profile cards. Note: One PCI-X slot is occupied by a SAS disk controller card.
Power | 2 hot-swappable and redundant power supplies
Remote management | ALOM management controller with a serial and 10/100 Mb Ethernet port
Firmware | OpenBoot PROM (OBP) for reset and POST support; ALOM for remote management administration
Operating system* | Solaris 10 3/05 HW2 Operating System preinstalled on disk 0
Other software | Java™ Enterprise System with a 90-day trial license
* Check the Sun Fire T2000 Product Notes for the latest information about supported releases of the Solaris OS.
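The memory maximums in TABLE 1-1 are simply the DIMM size multiplied by the 16 slots. A short sketch of that calculation, with a hypothetical helper name, assuming all slots hold the same DIMM size as the table describes:

```python
# Illustrative only: maximum memory = DIMM size x number of slots.
DIMM_SLOTS = 16
SUPPORTED_DIMM_SIZES_MB = (512, 1024, 2048)   # 512 MB, 1 GB, 2 GB


def max_memory_gb(dimm_size_mb: int, slots: int = DIMM_SLOTS) -> int:
    """Return the maximum memory in GB when every slot holds the same DIMM."""
    if dimm_size_mb not in SUPPORTED_DIMM_SIZES_MB:
        raise ValueError("TABLE 1-1 lists only 512 MB, 1 GB, and 2 GB DIMMs")
    return dimm_size_mb * slots // 1024


for size_mb in SUPPORTED_DIMM_SIZES_MB:
    print(f"{size_mb} MB DIMMs -> {max_memory_gb(size_mb)} GB maximum")
```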
Remote Manageability With ALOM
The Sun Advanced Lights Out Manager (ALOM) feature is a system controller (SC) that enables you to remotely manage and administer the Sun Fire T2000 server.
The ALOM software is preinstalled as firmware, and it initializes as soon as you apply power to the system. You can customize ALOM to work with your particular installation.
ALOM enables you to monitor and control your server over a network, or by using a dedicated serial port for connection to a terminal or terminal server. ALOM provides a command-line interface that you can use to remotely administer geographically distributed or physically inaccessible machines. In addition, ALOM enables you to run diagnostics (such as POST) remotely that would otherwise require physical proximity to the server’s serial port.
You can configure ALOM to send email alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM. The ALOM circuitry runs independently of the server, using the server’s standby power. Therefore, ALOM firmware and software continue to function when the server operating system goes offline or when the server is powered off. ALOM monitors the following Sun Fire T2000 server components:
CPU temperature conditions
Hard drive status
Enclosure thermal conditions
Fan speed and status
Power supply status
Voltage levels
Faults detected by POST (Power-On Self-Test)
Solaris Predictive Self Healing (PSH) diagnostic facilities
For information about configuring and using the ALOM system controller, refer to the Sun Fire T2000 Server Advanced Lights Out Manager (ALOM) Guide.
System Reliability, Availability, and Serviceability
Reliability, availability, and serviceability (RAS) are aspects of a system’s design that affect its ability to operate continuously and to minimize the time necessary to service the system. Reliability refers to a system’s ability to operate continuously without failures and to maintain data integrity. System availability refers to the ability of a system to recover to an operational state after a failure, with minimal impact. Serviceability relates to the time it takes to restore a system to service following a system failure. Together, reliability, availability, and serviceability features provide for near continuous system operation.
To deliver high levels of reliability, availability, and serviceability, the Sun Fire T2000 server offers the following features:
Hot-pluggable hard drives
Redundant, hot-swappable power supplies (two)
Redundant hot-swappable fan units (three)
Environmental monitoring
Error detection and correction for improved data integrity
Easy access for most component replacements
Extensive POST tests that automatically delete faulty components from the configuration
PSH automated run-time diagnosis capability that takes faulty components offline
For more information about using RAS features, refer to the Sun Fire T2000 Server Administration Guide.
Hot-Pluggable and Hot-Swappable Components
Sun Fire T2000 hardware supports hot-plugging or hot-swapping of the chassis-mounted hard drives, fans, power supplies, and the rear blower. Using the proper software commands, you can install or remove these components while the system is running. Hot-plug and hot-swap technology significantly increases the system’s serviceability and availability by providing the ability to replace hard drives, fan units, the rear blower, and power supplies without service disruption.
Power Supply Redundancy
The Sun Fire T2000 server features two hot-swappable power supplies, which enable the system to continue operating if one power supply or one power source fails.
The Sun Fire T2000 server also has a single hot-swappable blower unit that works in conjunction with the power supply fans to provide cooling for the internal disk drives. If the blower unit fails, the two power supply fan units provide enough cooling for the disk drive bay to keep the system running.
Fan Redundancy
The Sun Fire T2000 server features three hot-swappable system fans. Multiple fans enable the system to continue operating with adequate cooling in the event that one of the fans fails.
Environmental Monitoring
The Sun Fire T2000 server features an environmental monitoring subsystem designed to protect the server and its components against:
Extreme temperatures
Lack of adequate airflow through the system
Power supply failures
Hardware faults
Temperature sensors located throughout the system monitor the ambient temperature of the system and internal components. The software and hardware ensure that the temperatures within the enclosure do not exceed predetermined safe operating ranges. If the temperature observed by a sensor falls below a low-temperature threshold or rises above a high-temperature threshold, the monitoring subsystem software lights the amber Service Required LEDs on the front and back panels. If the temperature condition persists and reaches a critical threshold, the system initiates a graceful system shutdown.
All error and warning messages are sent to the system controller (SC) and the console, and are logged in the ALOM log file. Additionally, some FRUs, such as power supplies, provide LEDs that indicate a failure within the FRU.
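The threshold behavior described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the monitoring subsystem's actual logic: the class and function names are hypothetical, the threshold values are invented for the example, and the critical threshold is modeled only on the high-temperature side.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    NONE = "no action"
    SERVICE_REQUIRED_LED = "light amber Service Required LEDs"
    GRACEFUL_SHUTDOWN = "initiate graceful system shutdown"


@dataclass(frozen=True)
class Thresholds:
    low_warning_c: float    # below this, a warning is raised
    high_warning_c: float   # above this, a warning is raised
    critical_c: float       # at or above this, the system shuts down


# Hypothetical values for illustration; the manual does not state them.
EXAMPLE_THRESHOLDS = Thresholds(low_warning_c=5.0,
                                high_warning_c=40.0,
                                critical_c=55.0)


def evaluate(temp_c: float, t: Thresholds) -> Response:
    """Map one sensor reading to the response described in the text."""
    if temp_c >= t.critical_c:
        return Response.GRACEFUL_SHUTDOWN
    if temp_c < t.low_warning_c or temp_c > t.high_warning_c:
        return Response.SERVICE_REQUIRED_LED
    return Response.NONE


for reading in (25.0, 2.0, 45.0, 60.0):
    print(f"{reading:5.1f} C -> {evaluate(reading, EXAMPLE_THRESHOLDS).value}")
```

Checking the critical threshold first keeps the most severe response from being masked by the warning check.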
Error Correction and Parity Checking
The UltraSPARC T1 multicore processor provides parity protection on its internal cache memories, including tag parity and data parity on the D-cache and I-cache. The internal 3 MB L2 cache has parity protection on the tags, and ECC protection of the data.
Advanced ECC, also called chipkill, corrects up to 4 bits in error on nibble boundaries, as long as they are all in the same DRAM. If a DRAM fails, the DIMM continues to function.
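To illustrate the general idea behind parity checking, the following sketch computes an even-parity bit for a data word and uses it to detect a single flipped bit. It is a generic textbook example with hypothetical function names, not a model of the processor's cache circuitry. Parity alone can only detect such an error; correcting it requires ECC.

```python
def parity_bit(data: int) -> int:
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(data).count("1") % 2


def parity_ok(data: int, stored_parity: int) -> bool:
    """Return True when the word still matches its stored parity bit."""
    return parity_bit(data) == stored_parity


word = 0b1011_0010                 # example data word
stored = parity_bit(word)          # parity recorded when the word is written
corrupted = word ^ 0b0000_1000     # the same word with one bit flipped

print("original word passes check:", parity_ok(word, stored))
print("corrupted word passes check:", parity_ok(corrupted, stored))
```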
Predictive Self Healing
The Sun Fire T2000 server features the latest fault management technologies. With the Solaris 10 Operating System (OS), Sun is introducing a new architecture for building and deploying systems and services capable of predictive self-healing. Self-healing technology enables Sun systems to accurately predict component failures and mitigate many serious problems before they actually occur. This technology is incorporated into both the hardware and software of the Sun Fire T2000 server.
At the heart of the predictive self-healing capabilities is the Solaris Fault Manager, a service that receives data relating to hardware and software errors, and automatically and silently diagnoses the underlying problem. Once a problem is diagnosed, a set of agents automatically responds by logging the event and, if necessary, taking the faulty component offline. Because problems are diagnosed automatically, business-critical applications and essential system services can continue uninterrupted in the event of software failures or major hardware component failures.
Chassis Identification
FIGURE 1-3 and FIGURE 1-4 show the physical characteristics of the Sun Fire T2000 server.
FIGURE 1-3 Sun Fire T2000 Server Front Panel
(Callouts: indicators and buttons; USB ports 2 and 3; hard drives 0 through 3; DVD drive)
FIGURE 1-4 Sun Fire T2000 Server Rear Panel
(Callouts: power supplies 0 and 1; indicators; SC serial management port; SC network management port; TTYA serial port; GBE ports 0 through 3; USB ports 0 and 1; PCI-E slots 0 through 2; PCI-X slots 0 and 1)
Additional Service Related Information
In addition to this service manual, the following resources are available to help you keep your server running optimally:
• Product Notes – The Sun Fire T2000 Server Product Notes (819-2544) contain late-breaking information about the system, including required software patches, updated hardware and compatibility information, and solutions to known issues. The product notes are available online at:
http://www.sun.com/documentation
• Release Notes – The Solaris OS Release Notes contain important information about the Solaris OS. The release notes are available online at:
http://www.sun.com/documentation
• SunSolve Online – Provides a collection of support resources. Depending on the level of your service contract, you have access to Sun patches, the Sun System Handbook, the SunSolve™ knowledge base, the Sun Support Forum, and additional documents, bulletins, and related links. Access this site at:
http://sunsolve.sun.com
• Predictive Self-Healing Knowledge Database – You can access the knowledge article corresponding to a self-healing message by taking the Sun Message Identifier (SUNW-MSG-ID) and entering it into the field on this page:
http://www.sun.com/msg
CHAPTER 2
Sun Fire T2000 Server Diagnostics
This chapter describes the diagnostics that are available for monitoring and troubleshooting the Sun Fire T2000 server. It does not provide troubleshooting methods; instead, it describes the server’s diagnostic facilities and how to use them.
This chapter is intended for technicians, service personnel, and system administrators who service and repair computer systems.
The following topics are covered:
“Overview of Sun Fire T2000 Server Diagnostics” on page 12
“Using LEDs to Identify the State of Devices” on page 16
“Using ALOM For Diagnosis and Repair Verification” on page 22
“Running POST” on page 31
“Using the Solaris Predictive Self-Healing Feature” on page 40
“Collecting Information From Solaris OS Files and Commands” on page 43
“Managing Components with Automatic System Recovery (ASR) Commands” on page 44
“Exercising the System With SunVTS” on page 48
Overview of Sun Fire T2000 Server Diagnostics
There are a variety of diagnostic tools, commands, and indicators you can use to monitor and troubleshoot a Sun Fire T2000 server:
• LEDs – Provide a quick visual notification of the status of the server and of some of the FRUs.
• ALOM firmware – This system firmware runs on the system controller. In addition to providing the interface between the hardware and OS, ALOM also tracks and reports the health of key server components. ALOM works closely with POST and Solaris predictive self-healing technology to keep the system up and running even when there is a faulty component.
• Power-on self-test (POST) – POST performs diagnostics on system components upon system reset to ensure the integrity of those components. POST is configurable and works with ALOM to take faulty components offline if needed.
• Solaris OS predictive self-healing (PSH) – This technology continuously monitors the health of the CPU and memory, and works with ALOM to take a faulty component offline if needed. The predictive self-healing technology enables Sun systems to accurately predict component failures and mitigate many serious problems before they occur.
• Log files and console messages – Provide the standard Solaris OS log files and investigative commands that can be accessed and displayed on the device of your choice.
• SunVTS – An application that exercises the system, provides hardware validation, and discloses possible faulty components with recommendations for repair.
The LEDs, ALOM, Solaris OS PSH, and many of the log files and console messages are integrated. For example, when the Solaris software detects a fault, it displays the fault, logs it, and passes information to ALOM, where the fault is also logged. Depending on the fault, one or more LEDs might light.
The diagnostic flowchart in FIGURE 2-1 and TABLE 2-1 describe an approach for using the server diagnostics to identify a faulty field replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting, so you might perform some actions and not others.
The flowchart assumes that you have already performed some rudimentary troubleshooting such as verification of proper installation, visual inspection of cables and power, and possibly performed a reset of the server (refer to the Sun Fire T2000 Server Installation Guide and Sun Fire T2000 Server Administration Guide for details).
FIGURE 2-1 Diagnostic Flow Chart
Use this flow chart to understand what diagnostics are available to troubleshoot faulty hardware, and use TABLE 2-1 to find more information about each diagnostic in this chapter.
Numbers in this flow chart correspond to the Action numbers in TABLE 2-1.
TABLE 2-1 Diagnostic Flow Chart Actions

Action 1
Diagnostic action: Run the ALOM showfaults command.
Resulting action: The showfaults command displays faults detected by the system firmware.
• If faults are displayed, go to Action 2.
• If no faults are displayed, go to Action 6.
For more information: “To Run the showfaults Command” on page 26

Action 2
Diagnostic action: Check the fault message for a Sun Message ID.
Resulting action: Sun Message IDs (SUNW-MSG-ID) indicate that information is available from Sun’s knowledge article database.
• If you have a message ID number, go to Action 3.
• If you do not have a message ID number, go to Action 5.
For more information: “Using the Solaris Predictive Self-Healing Feature” on page 40

Action 3
Diagnostic action: Enter the Sun Message ID into the Sun Knowledge Article web site.
Resulting action: Enter the Sun Message ID number into the knowledge article web site at http://www.sun.com/msg and go to Action 4.

Action 4
Diagnostic action: Analyze the suggested actions.
Resulting action: In some cases, fault-related messages are identified with suggested actions.
• If the suggested action recommends replacing a FRU, go to Action 9.
• If the suggested action does not recommend replacing a FRU, perform the suggested action. Contact Sun for additional support, if needed.
For more information: Sun Support information: http://www.sun.com/service/contacting

Action 5
Diagnostic action: Do any of the fault LEDs indicate a faulty FRU?
Resulting action: The first LED to check is the Service Required LED. Additional LEDs on specific FRUs (fans, blower, power supplies, and hard disk drives) can pinpoint the faulty FRU.
• If an LED indicates a faulty FRU, go to Action 9.
• If FRU LEDs do not indicate a fault, go to Action 6.
For more information: “Using LEDs to Identify the State of Devices” on page 16

Action 6
Diagnostic action: Check the Solaris log files for fault information.
Resulting action: The Solaris message buffer and log files record system events and can provide information about faults.
• If system messages indicate a faulty device, replace the FRU (Action 9).
• To obtain more diagnostic information, go to Action 7.
For more information: “Collecting Information From Solaris OS Files and Commands” on page 43

Action 7
Diagnostic action: Run POST.
Resulting action: POST performs basic tests of the server components and reports faulty FRUs.
• If POST indicates a faulty FRU, replace the FRU (Action 9).
• If POST does not indicate a faulty FRU, go to Action 8.
For more information: “Running POST” on page 31

Action 8
Diagnostic action: Run SunVTS.
Resulting action: SunVTS provides tests used to exercise and diagnose FRUs. To run SunVTS, the server must be running the Solaris OS.
• If SunVTS reports a faulty device, replace the FRU (Action 9).
• If SunVTS does not report a faulty device, go to Action 11.
For more information: “Exercising the System With SunVTS” on page 48

Action 9
Diagnostic action: Replace the faulty FRU.
Resulting action: The fans, blower, power supplies, and hard drives are hot-swappable. The other FRUs require that you shut down the server to perform a cold-swap. After replacing the faulty FRU, go to Action 10.
For more information: “Replacing Hot-Swappable and Hot-Pluggable FRUs” on page 55; “Replacing Cold Swappable FRUs” on page 65

Action 10
Diagnostic action: Verify the repair.
Resulting action: Various commands and utilities can be used to verify the functionality of the system components. Two useful commands are:
• The ALOM showfaults command
• The ASR showcomponent command
If the FRU is blacklisted, you can manually remove it from the blacklist with the enablecomponent command.
If the fault is cleared, and the component is not blacklisted, the repair is verified well enough to boot the server. For added assurance, you can run the SunVTS diagnostic software.
For more information: “To Run the showfaults Command” on page 26; “Managing Components with Automatic System Recovery (ASR) Commands” on page 44; “Exercising the System With SunVTS” on page 48

Action 11
Diagnostic action: Contact Sun for support.
Resulting action: The majority of hardware faults are detected by the server’s diagnostics. In rare cases, a problem might require additional troubleshooting. If you are unable to determine the cause of the problem, contact Sun for support.
For more information: Sun Support information: http://www.sun.com/service/contacting
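The branching in TABLE 2-1 can be summarized as a small decision function. The sketch below is one reading of the table, not a Sun-provided tool: the class, field, and function names are hypothetical, the findings are supplied by the technician after running each diagnostic, and the step from Action 4 to Action 11 when no FRU replacement is recommended follows the flow chart in FIGURE 2-1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Findings:
    """Observations gathered while working through TABLE 2-1."""
    showfaults_reports_fault: bool = False
    has_message_id: bool = False
    article_recommends_fru_replacement: bool = False
    led_indicates_faulty_fru: bool = False
    solaris_logs_indicate_faulty_device: bool = False
    post_reports_faulty_fru: bool = False
    sunvts_reports_faulty_device: bool = False


REPLACE_AND_VERIFY = [9, 10]   # replace the faulty FRU, then verify the repair


def diagnostic_path(f: Findings) -> List[int]:
    """Return the sequence of TABLE 2-1 action numbers that would be visited."""
    path = [1]
    if f.showfaults_reports_fault:
        path.append(2)
        if f.has_message_id:
            path += [3, 4]
            if f.article_recommends_fru_replacement:
                return path + REPLACE_AND_VERIFY
            return path + [11]          # perform suggested action, contact Sun
        path.append(5)
        if f.led_indicates_faulty_fru:
            return path + REPLACE_AND_VERIFY
    path.append(6)
    if f.solaris_logs_indicate_faulty_device:
        return path + REPLACE_AND_VERIFY
    path.append(7)
    if f.post_reports_faulty_fru:
        return path + REPLACE_AND_VERIFY
    path.append(8)
    if f.sunvts_reports_faulty_device:
        return path + REPLACE_AND_VERIFY
    return path + [11]


example = Findings(showfaults_reports_fault=True, led_indicates_faulty_fru=True)
print(diagnostic_path(example))
```

As the table notes, the order in which you use the diagnostics depends on the problem, so a fixed path like this is only a starting point.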
Using LEDs to Identify the State of Devices
The Sun Fire T2000 server provides the following groups of LEDs:
Front and Rear Panel LEDs (TABLE 2-2)
Power Supply LEDs (TABLE 2-4)
Fan LEDs (TABLE 2-5)
Hard Drive LEDs (TABLE 2-3)
These LEDs provide a quick visual check of the state of the system.
Front and Rear Panel LEDs
The six front panel LEDs (FIGURE 2-2) are located in the upper left corner of the server chassis. Three of these LEDs are also provided on the rear panel (FIGURE 2-3).
FIGURE 2-2 Front Panel LEDs
(Callouts: Locator LED/button; Service Required LED; Power OK LED; Power On/Off button; Top Fan LED; Rear-FRU Fault LED; Over Temp LED)