This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written authorization of Oracle and/or its affiliates and Fujitsu Limited, and their applicable licensors, if any. The furnishings of this document to you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this document does not contain or represent any commitment of any kind on the part of Oracle or Fujitsu Limited, or any affiliate of either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by and/or licensed from the suppliers to Oracle and/or its affiliates and Fujitsu Limited, including software and font technology.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and
SPARC trademarks are based upon architectures developed by Oracle and/or its affiliates. SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited. Other names may be trademarks of their respective owners.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of Oracle and/or its affiliates and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Oracle and Fujitsu Limited, and/or any affiliate of either of them in connection with this document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided.
EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, ORACLE OR FUJITSU LIMITED, AND/OR THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Oracle or Fujitsu Limited, and/or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface
1. Safety and Tools
1.1 Safety Precautions
1.2 System Precautions
1.2.1 Electrical Safety Precautions
1.2.2 Equipment Rack Safety Precautions
1.2.3 Filler Boards and Filler Panels
1.2.4 Handling Components
2. Fault Isolation
2.1 Determining Which Diagnostics Tools to Use
2.2 Checking the Server and System Configuration
2.2.1 Checking the Hardware Configuration and FRU Status
2.2.1.1 Checking the Hardware Configuration
2.2.2 Checking the Software and Firmware Configuration
2.2.2.1 Checking the Software Configuration
2.2.2.2 Checking the Firmware Configuration
2.2.3 Downloading the Error Log Information
2.3 Operator Panel
2.4 Error Conditions
2.4.1 Predictive Self-Healing Tools
2.4.2 Monitoring Output
2.4.3 Messaging Output
2.5 LED Functions
2.6 Using the Diagnostic Commands
2.6.1 Using the showlogs Command
2.6.2 Using the fmdump Command
2.6.2.1 fmdump -V Command
2.6.2.2 fmdump -e Command
2.6.3 Using the fmadm faulty Command
2.6.3.1 fmadm repair Command
2.6.3.2 fmadm config Command
2.6.4 Using the fmstat Command
2.7 Traditional Oracle Solaris Diagnostic Commands
2.7.1 Using the iostat Command
2.7.1.1 Options
2.7.2 Using the prtdiag Command
2.7.2.1 Options
2.7.3 Using the prtconf Command
2.7.3.1 Options
2.7.4 Using the netstat Command
2.7.4.1 Options
2.7.5 Using the ping Command
2.7.5.1 Options
2.7.6 Using the ps Command
2.7.6.1 Options
2.7.7 Using the prstat Command
SPARC Enterprise M4000/M5000 Servers Service Manual • December 2010
2.7.7.1 Options
2.8 Other Issues
2.8.1 Can’t Locate Boot Device
3. Periodic Maintenance
3.1 Tape Drive Unit
3.1.1 Cleaning the Tape Drive Unit
4. FRU Replacement Preparation
4.1 FRU Replacement Method
4.2 Active Replacement
4.2.1 Removing a FRU From a Domain
4.2.2 Removing and Replacing a FRU
4.2.3 Adding a FRU Into a Domain
4.2.4 Verifying Hardware Operation
4.3 Hot Replacement
4.3.1 Removing and Replacing a FRU
4.3.2 Verifying Hardware Operation
4.4 Cold Replacement (Powering the Server Off and On)
4.4.1 Powering the Server Off Using Software
4.4.2 Powering the Server On Using Software
4.4.3 Powering the Server Off Manually
4.4.4 Powering the Server On Manually
4.4.5 Verifying Hardware Operation
5. Internal Components Access
5.1 Sliding the Server In and Out to the Fan Stop
5.1.1 Sliding the Server Out of the Equipment Rack
5.1.2 Sliding the Server Into the Equipment Rack
5.2 Top Cover Remove and Replace
5.2.1 Removing the Top Cover
5.2.2 Replacing the Top Cover
5.3 Fan Cover Remove and Replace
5.3.1 Removing the Fan Cover
5.3.2 Replacing the Fan Cover
6. Storage Devices Replacement
6.1 Hard Disk Drive Replacement
6.1.1 Accessing the Hard Disk Drive
6.1.2 Removing the Hard Disk Drive
6.1.3 Installing the Hard Disk Drive
6.1.4 Securing the Server
6.1.5 Accessing the Hard Disk Drive Backplane of the M4000 Server
6.1.6 Removing the Hard Disk Drive Backplane of the M4000 Server
6.1.7 Installing the Hard Disk Drive Backplane of the M4000 Server
6.1.8 Securing the Server
6.1.9 Accessing the Hard Disk Drive Backplane of the M5000 Server
6.1.10 Removing the Hard Disk Drive Backplane of the M5000 Server
6.1.11 Installing the Hard Disk Drive Backplane of the M5000 Server
6.1.12 Securing the Server
6.2 CD-RW/DVD-RW Drive Unit (DVDU) Replacement
6.2.1 Identifying the Type of CD-RW/DVD-RW Drive Unit
6.2.2 Accessing the CD-RW/DVD-RW Drive Unit
6.2.3 Removing the CD-RW/DVD-RW Drive Unit
6.2.4 Installing the CD-RW/DVD-RW Drive Unit
6.2.5 Securing the Server
6.2.6 Accessing the CD-RW/DVD-RW Drive Backplane of the M4000 Server
6.2.7 Removing the CD-RW/DVD-RW Drive Backplane of the M4000 Server
6.2.8 Installing the CD-RW/DVD-RW Drive Backplane of the M4000 Server
6.2.9 Securing the Server
6.2.10 Accessing the CD-RW/DVD-RW Drive Backplane of the M5000 Server
6.2.11 Removing the CD-RW/DVD-RW Drive Backplane of the M5000 Server
6.2.12 Installing the CD-RW/DVD-RW Drive Backplane of the M5000 Server
6.2.13 Securing the Server
6.3 Tape Drive Unit Replacement
6.3.1 Accessing the Tape Drive Unit
6.3.2 Removing the Tape Drive Unit
6.3.3 Installing the Tape Drive Unit
6.3.4 Securing the Server
6.3.5 Accessing the Tape Drive Backplane of the M4000 Server
6.3.6 Removing the Tape Drive Backplane of the M4000 Server
6.3.7 Installing the Tape Drive Backplane of the M4000 Server
6.3.8 Securing the Server
6.3.9 Accessing the Tape Drive Backplane of the M5000 Server
6.3.10 Removing the Tape Drive Backplane of the M5000 Server
6.3.11 Installing the Tape Drive Backplane of the M5000 Server
6.3.12 Securing the Server
7. Power Systems Replacement
7.1 Power Supply Unit Replacement
7.1.1 Accessing the Power Supply Unit
7.1.2 Removing the Power Supply Unit
7.1.3 Installing the Power Supply Unit
7.1.4 Securing the Server
8. I/O Unit Replacement
8.1 PCI Cassette Replacement
8.1.1 Accessing the PCI Cassette
8.1.2 Removing the PCI Cassette
8.1.3 Installing the PCI Cassette
8.1.4 Securing the Server
8.2 PCI Card Replacement
8.2.1 Removing the PCI Card
8.2.2 Installing the PCI Card
8.3 I/O Unit Replacement
8.3.1 Accessing the I/O Unit
8.3.2 Removing the I/O Unit
8.3.3 Installing the I/O Unit
8.3.4 Securing the Server
8.4 I/O Unit DC-DC Converter Replacement
8.4.1 Accessing the I/O Unit DC-DC Converter (DDC_A#0 or DDC_B#0)
8.4.2 Removing the I/O Unit DC-DC Converter (DDC_A #0 or DDC_B #0)
8.4.3 Installing the I/O Unit DC-DC Converter (DDC_A #0 or DDC_B #0)
8.4.4 Securing the Server
8.4.5 Accessing the I/O Unit DC-DC Converter Riser
8.4.6 Removing the I/O Unit DC-DC Converter Riser
8.4.7 Replacing the I/O Unit DC-DC Converter Riser
8.4.8 Securing the Server
9. XSCF Unit Replacement
9.1 XSCF Unit Replacement
9.1.1 Accessing the XSCF Unit
9.1.2 Removing the XSCF Unit
9.1.3 Installing the XSCF Unit
9.1.4 Securing the Server
10. Fan Modules Replacement
10.1 Fan Module Replacement
10.1.1 Accessing the 60-mm Fan Module
10.1.2 Removing the 60-mm Fan Module
10.1.3 Installing the 60-mm Fan Module
10.1.4 Securing the Server
10.1.5 Accessing the 172-mm Fan Module
10.1.6 Removing the 172-mm Fan Module
10.1.7 Installing the 172-mm Fan Module
10.1.8 Securing the Server
10.1.9 Accessing the 60-mm Fan Backplane
10.1.10 Removing the 60-mm Fan Backplane
10.1.11 Installing the 60-mm Fan Backplane
10.1.12 Securing the Server
10.1.13 Accessing the SPARC Enterprise M4000 172-mm Fan Backplane
10.1.14 Removing the SPARC Enterprise M4000 172-mm Fan Backplane
10.1.15 Installing the M4000 Server 172-mm Fan Backplane
10.1.16 Securing the Server
10.1.17 Accessing the M5000 Server 172-mm Fan Backplane
10.1.18 Removing the M5000 Server 172-mm Fan Backplane
10.1.19 Installing the M5000 Server 172-mm Fan Backplane
10.1.20 Securing the Server
11. Memory Board Replacement
11.1 Memory Board Replacement
11.1.1 Accessing the Memory Board
11.1.2 Removing the Memory Board
11.1.3 Installing the Memory Board
11.1.4 Securing the Server
11.2 DIMM Replacement
11.2.1 Confirmation of DIMM Information
11.2.2 Memory Installation Configuration Rules
11.2.3 Installing Memory:
11.2.4 Accessing the DIMMs
11.2.5 Removing the DIMMs
11.2.6 Installing the DIMMs
11.2.7 Securing the Server
12. CPU Module Replacement
12.1 CPU Module Replacement
12.1.1 Accessing the CPU Module
12.1.2 Removing the CPU Module
12.1.3 Installing the CPU Module
12.1.4 Securing the Server
12.2 CPU Upgrade
12.2.1 SPARC64 VII/SPARC64 VII+ CPU Modules Added to a New Domain
▼ Adding a SPARC64 VII/SPARC64 VII+ CPU Module to a New Domain
12.2.2 SPARC64 VII/SPARC64 VII+ Processors Added to an Existing Domain
▼ Preparing to Add SPARC64 VII/SPARC64 VII+ Processors to an Existing Domain
▼ Adding a SPARC64 VII/SPARC64 VII+ CPU Module to a Domain Configured With SPARC64 VI
12.2.3 Upgrading a SPARC64 VI CPU Module to SPARC64 VII/SPARC64 VII+ on an Existing Domain
13. Motherboard Unit Replacement
13.1 Motherboard Unit Replacement
13.1.1 Accessing the M4000 Server Motherboard Unit
13.1.2 Removing the M4000 Server Motherboard Unit
13.1.3 Installing the M4000 Server Motherboard Unit
13.1.4 Securing the Server
13.1.5 Accessing the M5000 Server Motherboard Unit
13.1.6 Removing the M5000 Server Motherboard Unit
13.1.7 Installing the M5000 Server Motherboard Unit
13.1.8 Securing the Server
13.2 DC-DC Converter Replacement
13.2.1 Accessing the M4000 Server DC-DC Converter
13.2.2 Removing the M4000 Server DC-DC Converter
13.2.3 Installing the M4000 Server DC-DC Converter
13.2.4 Securing the Server
13.2.5 Accessing the M5000 Server DC-DC Converter
13.2.6 Removing the M5000 Server DC-DC Converter
13.2.7 Installing the M5000 Server DC-DC Converter
13.2.8 Securing the Server
13.3 Motherboard Unit Upgrade
13.3.1 Notes on Upgrading
13.3.2 Replacing a Motherboard Unit as an Upgrade in an Existing Domain
14. Backplane Unit Replacement
14.1 Backplane Unit Replacement
14.1.1 Accessing the M4000 Server Backplane Unit
14.1.2 Removing the M4000 Server Backplane Unit
14.1.3 Installing the M4000 Server Backplane Unit
14.1.4 Securing the Server
14.1.5 Accessing the M5000 Server Backplane Unit
14.1.6 Removing the M5000 Server Backplane Unit
14.1.7 Installing the M5000 Server Backplane Unit
14.1.8 Securing the Server
15. Operator Panel Replacement
15.1 Operator Panel Replacement
15.2 Accessing the Operator Panel
15.2.1 Removing the Operator Panel
15.2.2 Installing the Operator Panel
15.2.3 Securing the Server
A. Components List
B. Rules for System Configuration
B.1 Server Configuration
C. FRU List
C.1 Server Overview
C.2 System Boards
C.2.1 Motherboard Unit
C.2.2 CPU Module
C.2.3 Memory Board
C.3 Backplane Unit
C.4 I/O Unit
C.5 Power
C.6 FAN Module
C.7 eXtended System Control Facility Unit
C.8 Drives
C.8.1 Hard Disk Drive
C.8.2 CD-RW/DVD-RW Drive Unit (DVDU)
C.8.3 Tape Drive Unit (TAPEU)
D. External Interface Specifications
D.1 Serial Port
D.2 UPC (UPS Control) Port
D.3 USB Port
D.4 Connection Diagram for Serial Cable
E. UPS Controller
E.1 Overview
E.2 Signal Cables
E.3 Signal Line Configuration
E.4 Power Supply Conditions
E.4.1 Input circuit
E.4.2 Output circuit
E.5 UPS Cable
E.6 UPC Connector
F. Air Filters
F.1 M4000 and M5000 Servers Air Filter
F.1.1 Command Operations Procedures
F.2 Air Filter Installation for the M4000 Server
F.2.1 Removing the Air Filter From the M4000 Server
F.3 Air Filter Installation for the M5000 Server
F.3.1 Removing the Air Filter from the M5000 Server
F.3.2 Servicing the Air Filter
G. Abbreviations
Index
Preface
This service manual describes how to service the SPARC Enterprise M4000/M5000
servers from Oracle and Fujitsu. This document is intended for authorized service
providers. References herein to the M4000 server or M5000 server are references to
the SPARC Enterprise M4000 or SPARC Enterprise M5000 server.
This document is written for maintenance providers who have received formal
service training. A single engineer service model is used for servicing SPARC
Enterprise M4000/M5000 midrange servers with one exception: When the
motherboard of a SPARC Enterprise M5000 server must be removed and that server
is mounted above waist high in the rack, then two engineers or a platform must be
used for safety.
This section explains:
■ “SPARC Enterprise M4000/M5000 Servers Documentation” on page xviii
For the web location of all SPARC Enterprise M4000/M5000 servers documents, refer
to the SPARC Enterprise M4000/M5000 Servers Getting Started Guide packaged with
your server.
Product notes are available on the website only. Please check for the most recent
update for your product.
Note – For Sun Oracle software-related manuals (Oracle Solaris OS, and so on), go
to: http://docs.sun.com
Book Titles                                                Sun/Oracle   Fujitsu
SPARC Enterprise M4000/M5000 Servers Site Planning Guide   819-2205     C120-H015
CHAPTER 1
Safety and Tools
This chapter describes safety and tools information. The information is organized
into the following topics:
■ Section 1.1, “Safety Precautions” on page 1-1
■ Section 1.2, “System Precautions” on page 1-2
1.1 Safety Precautions
To protect both yourself and the equipment, observe the following safety
precautions.
TABLE 1-1 ESD Precautions

Item                           Problem                         Precaution
ESD jack/wrist or foot strap   Electrostatic Discharge (ESD)   Connect the ESD connector to your server and wear the wrist
                                                               strap or foot strap when handling printed circuit boards.
                                                               There are two antistatic strap attachment points on the
                                                               chassis:
                                                               1. Right side towards the front
                                                               2. Left side towards the rear
ESD mat                        ESD                             An approved ESD mat provides protection from static damage
                                                               when used with a wrist strap or foot strap. The mat also
                                                               cushions and protects small parts that are attached to
                                                               printed circuit boards.
ESD packaging box              ESD                             Place the board or component in the ESD safe packaging box
                                                               after you remove it.
Caution – Attach the cord of the antistatic wrist strap directly to the server. Do not
attach the antistatic wrist strap to the ESD mat connection.
The antistatic wrist strap and any components you remove must be at the same
potential.
1.2 System Precautions
For your protection, observe the following safety precautions when servicing your
equipment:
■ Follow all cautions, warnings, and instructions marked on the equipment.
■ Never push objects of any kind through openings in the equipment, as they might
touch dangerous voltage points or short out components that could result in fire
or electric shock.
■ Refer servicing of equipment to qualified personnel.
1.2.1 Electrical Safety Precautions
Ensure that the voltage and frequency of the power outlet to be used match the
electrical rating labels on the equipment.
Wear antistatic wrist straps when handling any magnetic storage devices,
system boards, or other printed circuit boards.
Use only properly grounded power outlets as described in the SPARC Enterprise M4000/M5000 Servers Installation Guide.
Caution – Do not make mechanical or electrical modifications. The manufacturer is
not responsible for regulatory compliance of modified servers.
1.2.2 Equipment Rack Safety Precautions
All equipment racks should be anchored to the floor, ceiling, or to adjacent frames,
using the manufacturer’s instructions.
Free-standing equipment racks should be supplied with a stabilizer feature, which
must be sufficient to support the weight of the server when extended on its slides.
This prevents instability during installation or service actions.
Where a stabilizer feature is not supplied and the equipment rack is not bolted to the
floor, a safety evaluation must be conducted by the installation or service engineer.
The safety evaluation determines stability when the server is extended on its slides,
prior to any installation or service activity.
Prior to installing the equipment rack on a raised floor, a safety evaluation must be
conducted by the installation or service engineer. The safety evaluation ensures that
the raised floor has sufficient strength to withstand the forces upon it when the
server is extended on its slides. The normal procedure in this case would be to fix the
rack through the raised floor to the concrete floor below, using a proprietary
mounting kit for the purpose.
Caution – If more than one server is installed in an equipment rack, service only one
server at a time.
1.2.3 Filler Boards and Filler Panels
Filler boards and panels, which are physically inserted into the server when a board
or module has been removed, are used for EMI protection and for air flow.
1.2.4 Handling Components
Caution – There is a separate ground located on the rear of the server. It is
important to ensure that the server is properly grounded.
Caution – The server is sensitive to static electricity. To prevent damage to the
board, connect an antistatic wrist strap between you and the server.
Caution – The boards have surface-mount components that can be broken by flexing
the boards.
To minimize the amount of board flexing, observe the following precautions:
■ Hold the board by the handle and finger hold panels, where the board stiffener is
located. Do not hold the board at the ends.
■ When removing the board from the packaging, keep the board vertical until you
lay it on the cushioned ESD mat.
■ Do not place the board on a hard surface. Use a cushioned antistatic mat. The
board connectors and components have very thin pins that bend easily.
■ Be careful of small component parts located on both sides of the board.
■ Do not use an oscilloscope probe on the components. The soldered pins are easily
damaged or shorted by the probe point.
■ Transport the board in its packaging box.
Caution – The heat sinks can be damaged by incorrect handling. Do not touch the
heat sinks while replacing or removing boards. If a heat sink is loose or broken,
obtain a replacement board. When storing or shipping a board, ensure that the heat
sinks have sufficient protection.
Caution – On the PCI cassette, when removing cables such as LAN cable, if your
finger can’t reach the latch lock of the connector, press the latch with a flathead
screwdriver to remove the cable. Forcing your finger into the clearance can cause
damage to the PCI card.
CHAPTER 2
Fault Isolation
This chapter provides overview and fault diagnosis information. The information is
organized into the following topics:
■ Section 2.1, “Determining Which Diagnostics Tools to Use” on page 2-1
■ Section 2.2, “Checking the Server and System Configuration” on page 2-4
■ Section 2.3, “Operator Panel” on page 2-8
■ Section 2.4, “Error Conditions” on page 2-14
■ Section 2.5, “LED Functions” on page 2-18
■ Section 2.6, “Using the Diagnostic Commands” on page 2-21
■ Section 2.7, “Traditional Oracle Solaris Diagnostic Commands” on page 2-25
■ Section 2.8, “Other Issues” on page 2-37
2.1 Determining Which Diagnostics Tools to Use
When a failure occurs, a message is often displayed on the monitor. Use the
flowcharts in FIGURE 2-1 and FIGURE 2-2 to find the correct methods for diagnosing
problems.
FIGURE 2-1 Diagnostic Method Flow Chart
FIGURE 2-2 Diagnostic Method Flow Chart—Traditional Data Collection
2.2 Checking the Server and System Configuration
Before and after maintenance work, the state and configuration of the server and
components should be checked and the information saved. For recovery from a
problem, conditions related to the problem and the repair status must be checked.
The operating conditions must remain the same before and after maintenance.
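One way to confirm that conditions are unchanged is to save the text printed by a status command before maintenance and compare it with the text printed afterward. The following is a minimal sketch in Python, assuming the two captures have already been saved as plain text. The function name and the sample strings are hypothetical and are not real server output.

```python
import difflib


def config_changes(before: str, after: str) -> list[str]:
    """Return the lines that differ between two saved text captures.

    Removed lines are prefixed with '-' and added lines with '+'.
    """
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
        n=0,  # no context lines, only the changes
    )
    # Drop the two file-header lines, then drop the '@@' hunk markers.
    return [line for line in list(diff)[2:] if not line.startswith("@@")]


if __name__ == "__main__":
    # Hypothetical captures, for illustration only.
    saved_before = "component-a ok\ncomponent-b ok"
    saved_after = "component-a ok\ncomponent-b faulted"
    for change in config_changes(saved_before, saved_after):
        print(change)
```

An empty result means the two captures are identical. Any returned line is a difference to investigate before the maintenance work is closed.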
A functioning server without any problems should not display any error conditions.
For example:
■ The syslog file should not display error messages.
■ The XSCF Shell command showhardconf does not display the * mark.
■ The administrative console should not display error messages.
■ The server processor logs should not display any error messages.
■ The Oracle Solaris Operating System message files should not indicate any
additional errors.
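The check for the * mark in showhardconf output can be automated on saved output. The sketch below is illustrative only: it assumes that the mark appears as the first non-blank character of a line, which should be confirmed against the XSCF User’s Guide. The function name and the sample text are hypothetical and are not real command output.

```python
def flagged_lines(showhardconf_output: str) -> list[str]:
    """Return the lines whose first non-blank character is '*'.

    Assumption: the '*' mark is printed at the start of the line.
    """
    return [
        line.strip()
        for line in showhardconf_output.splitlines()
        if line.lstrip().startswith("*")
    ]


if __name__ == "__main__":
    # Hypothetical text standing in for saved output.
    sample = "  unit-a ok\n* unit-b\n  unit-c ok\n"
    print(flagged_lines(sample))
```

An empty list corresponds to the condition described in the list above. Any returned line identifies an entry to examine further.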
2.2.1 Checking the Hardware Configuration and FRU Status
To replace a faulty component and perform maintenance on the server, it is
important to check and understand the hardware configuration of the server and the
state of each hardware component.
The hardware configuration refers to information that indicates to which layer a
component belongs in the hardware configuration.
The status of each hardware component refers to information on the condition of the
standard or optional component in the server: temperature, power supply voltage,
CPU operating conditions, and other items.
The hardware configuration and the status of each hardware component can be
checked from the maintenance terminal using eXtended System Control Facility
(XSCF) Shell commands, as shown in the following table.
TABLE 2-1 Commands for Checking Hardware Configuration

Command        Description
showhardconf   Displays hardware configuration.
showstatus     Displays the status of a component. This command is used when
               only a faulty component is checked.
showboards     Displays the status of devices and resources.
showdcl        Displays the hardware resource configuration information of a
               domain.
showfru        Displays the setting information of a device.
Some conditions can also be checked based on the On or blinking state of the
component LEDs (see TABLE 2-3).
2.2.1.1 Checking the Hardware Configuration
Login authority is required to check the hardware configuration. Perform the
following procedure from the maintenance terminal:
1. Log in with the account of the XSCF hardware maintenance engineer.
2. Type showhardconf.
XSCF> showhardconf
The showhardconf command prints the hardware configuration information to
the screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers
XSCF User’s Guide for more detailed information.
2.2.2 Checking the Software and Firmware Configuration
The software and firmware configurations and versions affect the operation of the
server. To change the configuration or investigate a problem, check the latest
information and check for any problems in the software.
Software and firmware vary according to users:
■ The software configuration and version can be checked in the Oracle Solaris OS.
Refer to the Solaris 10 documentation for more information.
■ The firmware configuration and versions can be checked from the maintenance
terminal using XSCF Shell commands. Refer to the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed
information.
Check the software and firmware configuration information with assistance from the
system administrator. However, if you have received login authority from the system
administrator, the commands shown in the table can be used from the maintenance
terminal for these checks.
TABLE 2-2 Commands for Checking Software and Firmware Configuration

Command           Description
showrev(1M)       System administration command that displays information on system
                  patches.
uname(1)          System administration command that outputs the current system
                  information.
version(8)        XSCF Shell command that outputs the current firmware version
                  information.
showhardconf(8)   XSCF Shell command that indicates information on components
                  mounted on the server.
showstatus(8)     XSCF Shell command that displays the status of a component. This
                  command is used when only a faulty component is to be checked.
showboards(8)     XSCF Shell command that indicates information on the eXtended
                  System Board (XSB). It can indicate information on the XSBs that
                  belong to the specified domain and information on all XSBs
                  mounted. The eXtended System Board (XSB) combines the hardware
                  resources of a physical system board. The SPARC Enterprise servers
                  can generate one (Uni-XSB) or four (Quad-XSB) XSB(s) from one
                  physical system board.
showdcl(8)        XSCF Shell command that displays the configuration information of a
                  domain (hardware resource information).
showfru(8)        XSCF Shell command that displays the setting information of a
                  device.
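The Uni-XSB and Quad-XSB division described for showboards(8) can be pictured with a short sketch. It only illustrates that one physical system board yields either one or four XSBs. The identifier format used here (a two-digit board number, a hyphen, and an index) is an assumption made for the example, and the function name is hypothetical.

```python
def xsb_ids(board_number: int, quad: bool) -> list[str]:
    """List the XSBs generated from one physical system board.

    Uni-XSB mode yields one XSB and Quad-XSB mode yields four.
    The 'NN-n' identifier format is an assumption for illustration.
    """
    count = 4 if quad else 1
    return [f"{board_number:02d}-{index}" for index in range(count)]


if __name__ == "__main__":
    print(xsb_ids(0, quad=False))  # one XSB from the board
    print(xsb_ids(1, quad=True))   # four XSBs from the board
```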
2.2.2.1 Checking the Software Configuration
Perform the following procedure from the domain console:
1. Type showrev.
# showrev
The showrev command prints the system configuration information to the
screen.
2.2.2.2 Checking the Firmware Configuration
Login authority is required to check the firmware configuration. Perform the
following procedure from the maintenance terminal:
1. Log in with the account of the XSCF hardware maintenance engineer.
2. Type version.
XSCF> version
The version(8) command prints the firmware version information to the
screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF
User’s Guide for more detailed information.
Chapter 2 Fault Isolation2-7
2.2.3 Downloading the Error Log Information
To download the error log information, use the XSCF log fetch function.
The eXtended System Control Facility unit (XSCFU) has an interface with external
units so that a maintenance engineer can easily obtain useful maintenance
information such as error logs.
Connect the maintenance terminal, and use the command-line interface (CLI) or
browser user interface (BUI) to issue a download instruction to the maintenance
terminal to download error log information over the XSCF-LAN.
2.3 Operator Panel
When no network connection is available, the operator panel is used to start or stop
the server. The operator panel has three LED status indicators, a Power switch,
and a security keyswitch. The panel is located on the front of the server, in the upper
right.
When the server is running, the Power and XSCF STANDBY LEDs (green) should be
lit and the CHECK LED (amber) should not be lit. If the CHECK LED is lit, search
the system logs to determine what is wrong.
The three LED status indicators on the operator panel provide the following:
■ General system status
■ System problem alerts
■ Location of the system fault
FIGURE 2-3 and FIGURE 2-4 show the operator panel.
FIGURE 2-3 M4000 Server Operator Panel

Location Number   Component
1                 POWER LED
2                 XSCF STANDBY LED
3                 CHECK LED
4                 Power switch
5                 Mode switch (keyswitch)
6                 Antistatic ground socket
FIGURE 2-4 M5000 Server Operator Panel

Location Number   Component
1                 POWER LED
2                 XSCF STANDBY LED
3                 CHECK LED
4                 Power switch
5                 Mode switch (keyswitch)
6                 Antistatic ground socket
Additional LEDs are located in various locations in the server. For more information
about LED indicator locations, see Section 2.5, “LED Functions” on page 2-18.
The operator panel LEDs operate as described in TABLE 2-3.

TABLE 2-3 Operator Panel LEDs and Switches

Name                      Color   Description
POWER LED                 Green   Indicates the server power status.
                                  • On: Server has power.
                                  • Off: Server is without power.
                                  • Blinking: The power-off sequence is in progress.
XSCF STANDBY LED          Green   Indicates the readiness of the XSCF.
                                  • On: XSCF unit is functioning normally.
                                  • Off: XSCF unit is stopped.
                                  • Blinking: Under system initialization after server
                                    power-on, or under system power-on process.
CHECK LED                 Amber   Indicates that server detected a fault.
                                  • On: Error detected that disables the startup.
                                  • Off: Normal, or server power-off (power failure).
                                  • Blinking: Indicates the position of fault.
Power switch                      Switch to direct server power on/power off.
Mode switch (keyswitch)           The Locked setting:
                                  • Normal key position. Power on is available with the
                                    Power switch, but power off is not.
                                  • Disables the Power switch to prevent unauthorized
                                    users from powering the server on or off.
                                  • The Locked position is the recommended setting for
                                    normal day-to-day operations.
                                  The Service setting:
                                  • Service should be provided at this position.
                                  • Power on and off is available with Power switch.
                                  • The key cannot be pulled out at this position.
The state displayed by LED combination is described in TABLE 2-4.

TABLE 2-4 State Display by LED Combination (Operator Panel)

POWER      XSCF STANDBY   CHECK   Description of the state
Off        Off            Off     The circuit breaker is switched off.
Off        Off            On      The circuit breaker is switched on.
Off        Blinking       Off     The XSCF is being initialized.
Off        Blinking       On      An error occurred in the XSCF.
Off        On             Off     The XSCF is on standby.
                                  The system is waiting for power-on of the air
                                  conditioning system.
On         On             Off     Warm-up standby processing is in progress
                                  (power-on is delayed).
                                  The power-on sequence is in progress.
                                  The system is in operation.
Blinking   On             Off     The power-off sequence is in progress.
                                  Fan termination is being delayed.
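The combinations in TABLE 2-4 can be treated as a lookup keyed on the three LED states. The following Python sketch is illustrative only. The names LED_STATES and describe_panel are hypothetical, and the assignment of the wrapped description lines to table rows is an assumption based on the order in which they appear in the table.

```python
# Illustrative lookup for TABLE 2-4, keyed by (POWER, XSCF STANDBY, CHECK).
# Assumption: a row with several descriptions lists every state that the
# table gives for that LED combination.
LED_STATES = {
    ("off", "off", "off"): ["The circuit breaker is switched off."],
    ("off", "off", "on"): ["The circuit breaker is switched on."],
    ("off", "blinking", "off"): ["The XSCF is being initialized."],
    ("off", "blinking", "on"): ["An error occurred in the XSCF."],
    ("off", "on", "off"): [
        "The XSCF is on standby.",
        "The system is waiting for power-on of the air conditioning system.",
    ],
    ("on", "on", "off"): [
        "Warm-up standby processing is in progress (power-on is delayed).",
        "The power-on sequence is in progress.",
        "The system is in operation.",
    ],
    ("blinking", "on", "off"): [
        "The power-off sequence is in progress.",
        "Fan termination is being delayed.",
    ],
}


def describe_panel(power, xscf_standby, check):
    """Return the possible states for one observed LED combination."""
    key = (power.lower(), xscf_standby.lower(), check.lower())
    return LED_STATES.get(key, ["Combination not listed in TABLE 2-4."])


if __name__ == "__main__":
    for state in describe_panel("Off", "Blinking", "On"):
        print(state)
```

A combination that is not listed returns a placeholder instead of raising an error, because an unlisted pattern is itself worth recording during fault isolation.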
The operator panel mode switch is used to set the operation mode. The operator panel
power switch is used to power on and off the server.
TABLE 2-5 describes both switches. TABLE 2-6 lists the settings and corresponding
functions of the mode switch on the operator panel.
TABLE 2-5 Switches (Operator Panel)

Name           Description of Function
Mode switch    Used to set an operation mode for the server. Insert the special key that is
               under the customer’s control, to switch between modes.

               Locked    Normal operation mode.
                         The system can be powered on with the power switch, but it
                         cannot be powered off with the power switch.
                         The key can be pulled out at this key position.
               Service   Mode for maintenance.
                         The system can only be powered on and off with the power
                         switch.
                         The key cannot be pulled out at this key position.
                         Maintenance is performed in Service mode while the server
                         is stopped.
                         Because remote power control and automatic power control
                         of the server are disabled in Service mode, unintentional
                         power on can be prevented.

Power switch   Used to control the server power. Power on and power off are controlled by
               pressing this switch in different patterns, as described below.

               Holding down for a short time (less than 4 seconds)
                         Regardless of the mode switch state, the server (all domains)
                         is powered on.
                         At this time, processing for waiting for facility (air
                         conditioners) power on and warm-up completion is skipped.
               Holding down for a long time in Service mode (4 seconds or longer)
                         If power to the server is on (at least one domain is
                         operating), shutdown processing is executed for all domains
                         before the system is powered off.
                         If the system is being powered on, the power-on processing
                         is cancelled, and the system is powered off.
                         If the system is being powered off, the operation of the
                         Power switch is ignored, and the power-off processing is
                         continued.
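The press patterns in TABLE 2-5 amount to a small decision procedure. The following Python sketch restates it under stated assumptions: the function name, the state labels, and the returned strings are hypothetical and do not correspond to any XSCF or Oracle Solaris interface. The 4-second threshold is the only value taken from the table.

```python
# Hypothetical sketch of the Power switch behavior described in TABLE 2-5.
LONG_PRESS_SECONDS = 4  # threshold stated in TABLE 2-5


def power_switch_action(hold_seconds, mode, system_state):
    """Return the action TABLE 2-5 describes for one press of the Power switch.

    mode: "locked" or "service"
    system_state: "off", "powering_on", "on", or "powering_off"
    """
    if hold_seconds < LONG_PRESS_SECONDS:
        # Short press: power on, regardless of the mode switch position.
        return "power on all domains, skipping facility wait and warm-up"
    if mode != "service":
        # The long-press power-off is documented for Service mode only.
        return "no power-off"
    if system_state == "on":
        return "shut down all domains, then power off"
    if system_state == "powering_on":
        return "cancel power-on, then power off"
    if system_state == "powering_off":
        return "ignored, power-off continues"
    return "not described in TABLE 2-5"


if __name__ == "__main__":
    print(power_switch_action(5, "service", "on"))
```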
TABLE 2-6 Meanings of the Mode Switch

Function                               Mode Switch
State Definition                       Locked                           Service
Inhibition of Break Signal Reception   Enabled. Reception of the        Disabled
                                       break signal can be enabled or
                                       disabled for each domain
                                       using setdomainmode.
Power On/Off by power switch           Only power on is enabled         Enabled
2.4 Error Conditions
Always access the following web site first to interpret faults and obtain information
on FMA messages.
http://www.sun.com/msg
This web site can be used in the event of an Oracle Solaris or domain failure or to
look up specific FMA error messages. It does not provide details on XSCF errors.
The web site directs you to provide the message ID that your software displayed. The
web site then provides knowledge articles about the fault and corrective action to
resolve the fault. The fault information and documentation at this web site is
updated regularly.
Predictive self-healing is an architecture and methodology for automatically
diagnosing, reporting, and handling software and hardware fault conditions. This
new technology lessens the time required to debug a hardware or software problem
and provides the administrator and technical support with detailed data about each
fault.
2.4.1 Predictive Self-Healing Tools
In the Solaris 10 software, the fault manager runs in the background. If a failure
occurs, the system software recognizes the error and attempts to determine what
hardware is faulty. The software also takes steps to prevent that component from
being used until it has been replaced. Some of the specific actions the software takes
include:
■ Receives telemetry information about problems detected by the system software.
■ Diagnoses the problems.
■ Initiates pro-active self-healing activities. For example, the fault manager can
disable faulty components.
■ When possible, causes the faulty FRU to provide an LED indication of a fault in
addition to populating the system console messages with more details.
TABLE 2-7 shows a typical message generated when a fault occurs. The message
appears on your console and is recorded in the /var/adm/messages file.
Note – The message in TABLE 2-7 indicates that the fault has already been diagnosed.
Any corrective action that the system can perform has already taken place. If your
server is still running, it continues to run.
TABLE 2-7 Predictive Self-Healing Message

Output Displayed

Nov 1 16:30:20 dt88-292 EVENT-TIME: Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM: SUNW,A70, CSN: -, HOSTNAME: dt88-292
Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE: One or more device instances may be disabled.
Nov 1 16:30:20 dt88-292 IMPACT: Loss of services provided by the device instances associated with this fault.
Nov 1 16:30:20 dt88-292 REC-ACTION: Schedule a repair procedure to replace the affected device. Use
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support.

Description

EVENT-TIME      The time stamp of the diagnosis.
PLATFORM        A description of the server encountering the problem.
SOURCE          Information on the Diagnosis Engine used to determine the fault.
EVENT-ID        The Universally Unique event ID for this fault.
DESC            A basic description of the failure.
WEB SITE        Where to find specific information and actions for this fault.
AUTO-RESPONSE   What, if anything, the system did to alleviate any follow-on issues.
IMPACT          A description of what that response might have done.
REC-ACTION      A short description of what the system administrator should do.
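The two values most often needed from this message are the event ID, which is passed to fmdump -v -u, and the message ID embedded in the web address. The following Python sketch extracts both from the first lines of the TABLE 2-7 message, reassembled one field per line. The function name and the regular expressions are illustrative assumptions, not part of any Oracle Solaris tool.

```python
import re

# First lines of the TABLE 2-7 message, reassembled one field per line.
SAMPLE = """\
Nov 1 16:30:20 dt88-292 EVENT-TIME: Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM: SUNW,A70, CSN: -, HOSTNAME: dt88-292
Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
"""

# Assumed patterns: a UUID after "EVENT-ID:" and a message ID at the end
# of a sun.com/msg address.
EVENT_ID = re.compile(r"EVENT-ID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})")
MSG_URL = re.compile(r"https?://\S*sun\.com/msg/([A-Z0-9]+-\d+-[0-9A-Z]+)")


def summarize_fault(text):
    """Return the event ID and message ID found in a console message."""
    event = EVENT_ID.search(text)
    msg = MSG_URL.search(text)
    return {
        "event_id": event.group(1) if event else None,
        "msg_id": msg.group(1) if msg else None,
    }


if __name__ == "__main__":
    print(summarize_fault(SAMPLE))
```

Either value is reported as None when it is absent, so the same function can be applied to a /var/adm/messages excerpt that contains only part of a message.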
2.4.2 Monitoring Output
To understand error conditions, collect the monitoring output information by using
the commands shown in TABLE 2-8.

TABLE 2-8 Commands for Checking the Monitoring Output

Command       Operand   Description
showlogs(8)   console   Displays the domain console log.
              monitor   Logs messages that are displayed in the message window.
              panic     Logs output to the console during a panic.
              ipl       Collects console data generated from the power on of a
                        domain to the completion of the operating system start.
2.4.3 Messaging Output
To understand error conditions, collect the messaging output information by using
the commands shown in TABLE 2-9.

TABLE 2-9 Commands for Checking the Messaging Output

Command      Operand   Description
showlogs     env       Displays the temperature history log. The environmental
                       temperature data and power status are indicated in 10-minute
                       intervals. The data is stored for a maximum of six months.
             power     Displays the power and reset information.
             event     Displays information reported to the operating system and
                       stored as event logs.
             error     Displays error logs.
fmdump(1M)             Displays fault management architecture diagnostic results and
fmdump(8)              errors. It is provided as an Oracle Solaris command and XSCF
                       Shell command.
Each error message logged by the predictive self-healing architecture has a code
associated with it as well as a web address that can be followed to get the most
up-to-date course of action for dealing with that error.
Refer to the Oracle Solaris 10 documentation for more information on predictive
self-healing.
2.5 LED Functions
LEDs help the user find the component and provide information on the state of
the component.
This section explains the LEDs of each component that are to be checked when a
component is replaced. Most components are equipped with LEDs that help indicate
which component has the error and an LED to indicate whether the component can
be removed.
Some components, such as DIMMs, do not have LEDs. The state of a component
without LEDs can be checked using the showhardconf and ioxadm XSCF Shell
commands from the maintenance terminal. See the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed
information.
TABLE 2-10 describes the LEDs and their functions.

TABLE 2-10 Component LEDs

LED Name        Display    Meaning
READY (green)              Indicates whether the component is operating.
                On         Indicates that the component is operating. The component
                           cannot be disconnected and removed from the server while
                           the READY LED is On.
                Blinking   Indicates that the component is being configured (or
                           disconnected).
                           For an XSCF unit it indicates that it is being initialized.
                Off        Indicates that the component is stopped. The component can
                           be disconnected and replaced.
CHECK (amber)              Indicates that the component contains an error or that the
                           component is a target for replacement.
                On         Indicates that an error has been detected.
                Blinking   Indicates that the component is ready to be replaced. The
                           blinking LED acts as a locator.
                Off        Indicates no known error exists.
TABLE 2-11 describes the components and their LEDs.

TABLE 2-11 Component LED Descriptions

Component           LED Type     LED Display        Meaning
XSCF unit           ACTIVE       On (green)         Indicates that the XSCF unit is active.
                                 Off                Indicates that the XSCF unit is on standby.
XSCF unit and IO    ACTIVE       On (green)         Indicates that the communication is being
(display part for                                   performed through the LAN port.
LAN)                             Off                Indicates that no communication is being
                                                    performed through the LAN port.
                    LINK SPEED   On (amber)         Indicates that the communication speed for the
                                                    LAN port is 1 Gbps.
                                 On (green)         Indicates that the communication speed for the
                                                    LAN port is 100 Mbps.
                                 Off                Indicates that the communication speed for the
                                                    LAN port is 10 Mbps.
PCI slot            POWER        On (green)         Indicates that the power to the PCI slot is turned
                                                    on. The PCI card cannot be removed.
                                 Off                Indicates that the power to the PCI slot is turned
                                                    off. The PCI card can be removed.
                    ATTENTION    On (amber)         Indicates that an error occurred in the PCI slot.
                                 Blinking (amber)   Indicates that the card in this PCI slot is a target
                                                    device for replacement.
                                 Off                Indicates the normal state of the PCI slot.
Power supply unit   READY        On (green)         Indicates that the power is turned on and being
(PSU)                                               supplied.
                                 Blinking (green)   Indicates that the power is being supplied to the
                                                    power supply unit, but the power supply unit is
                                                    not turned on.
                                 Off                Indicates that power is not being supplied to the
                                                    power supply unit.
                    CHECK        On (amber)         Indicates that an error occurred in the power
                                                    supply unit.
                                 Off                Indicates the normal state of the power supply
                                                    unit.
                    LED_AC       On (green)         Power supply unit has AC applied and is
                                                    supplying 12V.
                                 Off                Indicates that AC is out of the specified
                                                    operating range and 12V is not being supplied
                                                    from the power supply unit.
                    LED_DC       On (green)         Power supply unit has AC applied and is
                                                    supplying 48V. Standby pinhole provides a
                                                    manual backup to turn off 48V power.
                                 Off                Indicates that 48V is not being supplied from
                                                    the power supply unit.
Fan                 ATTENTION    On (amber)         Indicates that an error occurred.
                                 Blinking (amber)   Indicates that the fan is a target device for
                                                    replacement.
2.6 Using the Diagnostic Commands
After the message in TABLE 2-7 is displayed, you might want more information about
the fault. For complete information about troubleshooting commands, refer to the
Oracle Solaris 10 man pages or the XSCF Shell man pages. This section describes
some details of the following commands:
■ showlogs
■ fmdump
■ fmadm
■ fmstat
2.6.1 Using the showlogs Command
The showlogs command displays the contents of a specified log in order of time
stamp starting with the oldest date. The showlogs command displays the following
logs:
■ error log
■ power log
■ event log
■ temperature and humidity record
■ monitoring message log
■ console message log
■ panic message log
■ IPL message log
The following example shows showlogs output for the error log.
XSCF> showlogs error
Date: Oct 03 17:23:11 UTC 2006 Code: 80002000-ccff0000-0104340100000000
Status: Alarm Occurred: Oct 03 17:23:10.868 UTC 2006
FRU: /FAN_A#0
Msg: Abnormal FAN rotation speed. Insufficient rotation
XSCF>
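When many error log entries have to be reviewed, it can help to split each entry into its labeled fields. The following Python sketch does this for the entry shown above. It assumes that every entry uses the same Date, Code, Status, FRU, and Msg labels as this single example; the function name and patterns are illustrative and are not part of the XSCF Shell.

```python
import re

# The showlogs error entry from the example above.
SAMPLE = """\
Date: Oct 03 17:23:11 UTC 2006 Code: 80002000-ccff0000-0104340100000000
Status: Alarm Occurred: Oct 03 17:23:10.868 UTC 2006
FRU: /FAN_A#0
Msg: Abnormal FAN rotation speed. Insufficient rotation
"""

# One pattern per field, based only on the labels visible in the example.
FIELDS = {
    "date": re.compile(r"^Date:\s*(.*?)\s+Code:", re.M),
    "code": re.compile(r"Code:\s*(\S+)"),
    "status": re.compile(r"^Status:\s*(\S+)", re.M),
    "fru": re.compile(r"^FRU:\s*(\S+)", re.M),
    "msg": re.compile(r"^Msg:\s*(.+)$", re.M),
}


def parse_error_entry(text):
    """Split one showlogs error entry into a dictionary of fields."""
    entry = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(text)
        entry[name] = match.group(1).strip() if match else None
    return entry


if __name__ == "__main__":
    for key, value in parse_error_entry(SAMPLE).items():
        print(f"{key}: {value}")
```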
2.6.2 Using the fmdump Command
The fmdump command can be used to display the contents of any log files associated
with the Oracle Solaris fault manager.
The fmdump command produces output similar to EXAMPLE 2-1. This example
assumes there is only one fault.
EXAMPLE 2-1 fmdump Output
# fmdump
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
2.6.2.1 fmdump -V Command
You can obtain more detail by using the -V option.
# fmdump -V -u 0ee65618-2218-4997-c0dc-b5c410ed8ec2
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
At least three lines of new output are delivered to the user with the -V option.
■ The first line is a summary of information you have seen before in the console
message but includes the time stamp, the UUID and the Message-ID.
■ The second line is a declaration of the certainty of the diagnosis. In this case the
diagnosis is 100 percent certain that the failure is in the ASIC described. If the
diagnosis involves multiple components, you might see two lines here with 50% in
each, for example.
■ The FRU line declares the part that needs to be replaced to return the server to a
fully operational state.
■ The rsrc line describes which component was taken out of service as a result of
this fault.
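The certainty, FRU, and rsrc lines described above can be pulled out of the verbose output with a few string tests. The following Python sketch works on the fmdump -V example from this section. The function name is hypothetical, and the parsing assumes the line layout shown in that one example.

```python
import re

# The fmdump -V output from the example in this section.
SAMPLE = """\
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
"""


def parse_fmdump_verbose(text):
    """Collect suspects (certainty, fault class), FRUs, and resources."""
    result = {"suspects": [], "fru": [], "rsrc": []}
    for raw in text.splitlines():
        line = raw.strip()
        certainty = re.match(r"(\d+)%\s+(\S+)", line)
        if certainty:
            result["suspects"].append((int(certainty.group(1)), certainty.group(2)))
        elif line.startswith("FRU:"):
            result["fru"].append(line.split(":", 1)[1].strip())
        elif line.startswith("rsrc:"):
            result["rsrc"].append(line.split(":", 1)[1].strip())
    return result


if __name__ == "__main__":
    print(parse_fmdump_verbose(SAMPLE))
```

Suspects are kept as a list so that a diagnosis split across several components, such as two 50% lines, is preserved.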
2.6.2.2 fmdump -e Command
To get information about the errors that caused this failure, you can use the -e option, as
shown in the following example.
XSCF> fmdump -e
TIME CLASS
Oct 03 13:52:48.9532 ereport.fm.fmd.module
Oct 03 13:52:48.9610 ereport.fm.fmd.module
Oct 03 13:52:48.9674 ereport.fm.fmd.module
Oct 03 13:52:48.9738 ereport.fm.fmd.module
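A long error report listing is easier to read when it is reduced to a count per class. The following Python sketch tallies the CLASS column of the example above. The function name is hypothetical, and the sketch assumes that the class is always the last field on the line and begins with ereport.

```python
from collections import Counter

# The fmdump -e output from the example above.
SAMPLE = """\
TIME CLASS
Oct 03 13:52:48.9532 ereport.fm.fmd.module
Oct 03 13:52:48.9610 ereport.fm.fmd.module
Oct 03 13:52:48.9674 ereport.fm.fmd.module
Oct 03 13:52:48.9738 ereport.fm.fmd.module
"""


def count_ereport_classes(text):
    """Count how often each ereport class appears in fmdump -e output."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and any line whose last field is not a class.
        if fields and fields[-1].startswith("ereport."):
            counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for name, total in count_ereport_classes(SAMPLE).most_common():
        print(total, name)
```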
2.6.3 Using the fmadm faulty Command
The fmadm faulty command can be used by administrators and service personnel
to view and modify system configuration parameters that are maintained by the
Oracle Solaris fault manager. The command is primarily used to determine the status
of a component involved in a fault, as shown in the following example.
The PCI device is degraded and is associated with the same UUID as seen above. You
might also see “faulted” states.
2.6.3.1 fmadm repair Command
If fmadm faulty reports a fault, replace the faulty FRU (CPU, memory, or I/O unit),
and then execute the fmadm repair command to clear the FRU
information on the domain. If the fmadm repair command is not executed, error
messages continue to be output.
If fmadm faulty reports a fault, the FMA resource cache on the OS side can be cleared
without problems; the data in it need not match the hardware failure information
retained on the XSCF side.
The fmadm config command output shows you the version numbers of the
diagnosis engines in use by your server, as well as their current state. You can check
these versions against information on the My Oracle Support web site to determine if
you are running the latest diagnostic engines, as shown in the following example.
XSCF> fmadm config
MODULE VERSION STATUS DESCRIPTION
eft 1.16 active eft diagnosis engine
event-transport 2.0 active Event Transport Module
faultevent-post 1.0 active Gate Reaction Agent for errhandd
fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis
iox_agent 1.0 active IO Box Recovery Agent
reagent 1.1 active Reissue Agent
sysevent-transport 1.0 active SysEvent Transport Agent
syslog-msgs 1.0 active Syslog Messaging Agent
XSCF>
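To compare module versions against a reference list, the fmadm config table can be read into a dictionary. The following Python sketch uses the first rows of the example above. The function name is hypothetical, and the sketch assumes that the module name, version, and status never contain spaces, as in this example.

```python
# First rows of the fmadm config output from the example above.
SAMPLE = """\
XSCF> fmadm config
MODULE VERSION STATUS DESCRIPTION
eft 1.16 active eft diagnosis engine
event-transport 2.0 active Event Transport Module
fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis
XSCF>
"""


def parse_fmadm_config(text):
    """Map each module name to its version, status, and description."""
    modules = {}
    for line in text.splitlines():
        fields = line.split(None, 3)
        # Skip prompts, the header row, and lines that are too short.
        if len(fields) < 3 or fields[0] == "MODULE" or fields[0].startswith("XSCF>"):
            continue
        name, version, status = fields[:3]
        modules[name] = {
            "version": version,
            "status": status,
            "description": fields[3] if len(fields) > 3 else "",
        }
    return modules


if __name__ == "__main__":
    for name, info in parse_fmadm_config(SAMPLE).items():
        print(name, info["version"], info["status"])
```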
2.6.4 Using the fmstat Command
The fmstat command can report statistics associated with the Oracle Solaris fault
manager. The fmstat command shows information about diagnosis engine (DE) performance. In the
example below, the fmd-self-diagnosis DE (also seen in the console output) has
received an event which it accepted. A case is “opened” for that event and a
diagnosis is performed to “solve” the cause for the failure. See the following
example.
These superuser commands can help you determine if you have issues in your
workstation, in the network, or within another server that you are networking with.
The following commands are described in this section:
■ “Using the iostat Command” on page 2-26
■ “Using the prtdiag Command” on page 2-27
■ “Using the prtconf Command” on page 2-30
■ “Using the netstat Command” on page 2-32
■ “Using the ping Command” on page 2-34
■ “Using the ps Command” on page 2-35
■ “Using the prstat Command” on page 2-36
Most of these commands are located in the /usr/bin or /usr/sbin directories.
Note – For additional details, options, examples, and the most up-to-date
information for each command, refer to that command’s man page.
2.7.1 Using the iostat Command
The iostat command iteratively reports terminal, drive, and tape I/O activity, as
well as CPU utilization.
2.7.1.1 Options
TABLE 2-12 describes options for the iostat command and how those options can
help troubleshoot the server.

TABLE 2-12 Options for iostat

Option      Description                                How It Can Help
No option   Reports status of local I/O devices.       A quick three-line output of device status.
-c          Reports the percentage of time the         Quick report of CPU status.
            system has spent in user mode, in system
            mode, waiting for I/O, and idling.
-e          Displays device error summary statistics.  Provides a short table with accumulated errors.
            The total errors, hard errors, soft        Identifies suspect I/O devices.
            errors, and transport errors are
            displayed.
-E          Displays all device error statistics.      Provides information about devices: manufacturer,
                                                       model number, serial number, size, and errors.
-n          Displays names in descriptive format.      Descriptive format helps identify devices.
-x          For each drive, reports extended drive     Similar to the -e option, but provides rate
            statistics. The output is in tabular       information. This helps identify poor performance
            form.                                      of internal devices and other I/O devices across
                                                       the network.
The following example shows output for one iostat command.
# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104T9D
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104V3V
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
#
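On a server with many drives, the summary line for each device is usually the first thing to scan. The following Python sketch lists the devices whose soft, hard, or transport error count is not zero. The function name is hypothetical, and the pattern assumes the summary line layout shown in the example above.

```python
import re

# Device summary lines from the iostat -En example above.
SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Size: 73.40GB <73400057856 bytes>
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Size: 73.40GB <73400057856 bytes>
"""

DEVICE_LINE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)",
    re.M,
)


def devices_with_errors(text):
    """Return the names of devices that report any nonzero error count."""
    suspects = []
    for name, soft, hard, transport in DEVICE_LINE.findall(text):
        if int(soft) or int(hard) or int(transport):
            suspects.append(name)
    return suspects


if __name__ == "__main__":
    print(devices_with_errors(SAMPLE))
```

For the example output, which reports zero errors on both drives, the function returns an empty list.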
2.7.2 Using the prtdiag Command
The prtdiag command displays configuration and diagnostic information. The
diagnostic information identifies any failed component.
The prtdiag command is located in the /usr/platform/platform-name/sbin/
directory.
Note – The prtdiag command might indicate a slot number different than that
identified elsewhere in this document. This is normal.
2.7.2.1 Options
TABLE 2-13 describes options for the prtdiag command and how those options can
help troubleshooting.

TABLE 2-13 Options for prtdiag

Option      Description                     How It Can Help
No option   Lists components.               Identifies CPU timing and PCI cards installed.
-v          Verbose mode. Displays the      Provides the same information as no option. Additionally
            time of the most recent AC      lists fan status, temperatures, ASIC, and PROM revisions.
            power failure and the most
            recent hardware fatal error
            information.
The following example shows output for the prtdiag command in verbose mode.
# prtdiag -v
System Configuration: xxxx Server
System clock frequency: 1012 MHz
Memory size: 262144 Megabytes
==================================== CPUs ====================================
CPU CPU Run L2$ CPU CPU
LSB Chip ID MHz MB Impl. Mask
==================== Hardware Revisions ====================
System PROM revisions:
---------------------
OBP 4.24.13 2010/02/08 13:17
=================== Environmental Status ===================
Mode switch is in LOCK mode
=================== System Processor Mode ===================
SPARC64-VII mode
2.7.3 Using the prtconf Command
Similar to the show-devs command run at the ok prompt, the prtconf command
displays the devices that are configured.
The prtconf command identifies hardware that is recognized by the Oracle Solaris
OS. If hardware is not suspected of being bad yet software applications are having
trouble with the hardware, the prtconf command can indicate if the Oracle Solaris
software recognizes the hardware, and if a driver for the hardware is loaded.
2.7.3.1 Options
TABLE 2-14 describes options for the prtconf command and how those options can
help troubleshooting.

TABLE 2-14 Options for prtconf

Option      Description                        How It Can Help
No option   Displays the device tree of        If a hardware device is recognized, then it is probably
            devices recognized by the OS.      functioning properly. If the message “(driver not
                                               attached)” is displayed for the device or for a
                                               sub-device, then the driver for the device is corrupt or
                                               missing.
-D          Similar to the output of no        Lists the driver needed or used by the OS to enable the
            option, however the device         device.
            driver is listed.
-p          Similar to the output of no        Reports a brief list of the devices.
            option, yet is abbreviated.
-V          Displays the version and date of   Provides a quick check of firmware version.
            the OpenBoot PROM firmware.
The following example shows output for the prtconf command.
# prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 8064 Megabytes
System Peripherals (Software Nodes):
SUNW,SPARC-Enterprise
scsi_vhci, instance #0
packages (driver not attached)
SUNW,builtin-drivers (driver not attached)
deblocker (driver not attached)
disk-label (driver not attached)
terminal-emulator (driver not attached)
obp-tftp (driver not attached)
ufs-file-system (driver not attached)
chosen (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
pseudo-console, instance #0
nvram (driver not attached)
pseudo-mc, instance #0
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
pci, instance #0
ebus, instance #0
flashprom (driver not attached)
serial, instance #0
scfc, instance #0
panel, instance #0
pci, instance #0
pci, instance #0
pci, instance #1
pci, instance #3
scsi, instance #0
tape (driver not attached)
disk (driver not attached)
sd, instance #0 (driver not attached)
sd, instance #2
sd, instance #4
network, instance #0
network, instance #1 (driver not attached)
pci, instance #4
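Because the (driver not attached) marker is the main troubleshooting signal in this output, it can be useful to separate the nodes that carry it from those that do not. The following Python sketch does this for a few lines taken from the end of the example above. The function name is hypothetical, and the indentation of the device tree is ignored.

```python
# A few lines from the end of the prtconf example above.
SAMPLE = """\
scsi, instance #0
tape (driver not attached)
disk (driver not attached)
sd, instance #0 (driver not attached)
sd, instance #2
network, instance #0
network, instance #1 (driver not attached)
"""

MARKER = "(driver not attached)"


def split_by_driver(text):
    """Return (attached, unattached) lists of prtconf node names."""
    attached, unattached = [], []
    for raw in text.splitlines():
        node = raw.strip()
        if not node:
            continue
        if node.endswith(MARKER):
            unattached.append(node[: -len(MARKER)].strip())
        else:
            attached.append(node)
    return attached, unattached


if __name__ == "__main__":
    attached, unattached = split_by_driver(SAMPLE)
    print("No driver attached:", ", ".join(unattached))
```

Many nodes, such as the OpenBoot packages, normally have no driver attached, so the list is a starting point for comparison with a known-good server, not a list of faults.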
2.7.4.1 Options
TABLE 2-15 describes options for the netstat command and how those options can
help troubleshooting.

TABLE 2-15 Options for netstat

Option        Description                       How It Can Help
-i            Displays the interface state,     Provides a quick overview of the network status.
              including packets in/out, error
              in/out, collisions, and queue.
-i interval   Providing a trailing number       Identifies intermittent or long duration network events.
              with the -i option repeats the    By piping netstat output to a file, overnight activity
              netstat command every             can be viewed all at once.
              interval seconds.
-p            Displays the media table.         Provides MAC address for hosts on the subnet.
-r            Displays the routing table.       Provides routing information.
-n            Replaces host names with IP       Used when an address is more useful than a host name.
              addresses.

The following example shows output for the netstat -p command.
# netstat -p
Net to Media Table: IPv4
Device IP Address Mask Flags Phys Addr
------ -------------------- --------------- -------- --------------
bge0 san-ff1-14-a 255.255.255.255 o 00:14:4f:3a:93:61
bge0 san-ff2-40-a 255.255.255.255 o 00:14:4f:3a:93:85
sppp0 224.0.0.22 255.255.255.255
bge0 san-ff2-42-a 255.255.255.255 o 00:14:4f:3a:93:af
bge0 san09-lab-r01-66 255.255.255.255 o 00:e0:52:ec:1a:00
sppp0 192.168.1.1 255.255.255.255
bge0 san-ff2-9-b 255.255.255.255 o 00:03:ba:dc:af:2a
bge0 bizzaro 255.255.255.255 o 00:03:ba:11:b3:c1
bge0 san-ff2-9-a 255.255.255.255 o 00:03:ba:dc:af:29
bge0 racerx-b 255.255.255.255 o 00:0b:5d:dc:08:b0
bge0 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
#
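The media table from netstat -p can be reduced to host and MAC address pairs for comparison with an inventory. The following Python sketch works on a few rows of the netstat -p example. The function name is hypothetical, and rows without a physical address, such as the sppp0 entries, are skipped by design.

```python
import re

# A few rows from the netstat -p example in this section.
SAMPLE = """\
Net to Media Table: IPv4
Device IP Address Mask Flags Phys Addr
bge0 san-ff1-14-a 255.255.255.255 o 00:14:4f:3a:93:61
sppp0 224.0.0.22 255.255.255.255
bge0 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
"""

MAC = re.compile(r"^(?:[0-9a-f]{1,2}:){5}[0-9a-f]{1,2}$", re.I)


def parse_media_table(text):
    """Return device, host, and MAC address for rows that list a MAC."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Headers, separators, and rows without a physical address are skipped.
        if len(fields) < 3 or not MAC.match(fields[-1]):
            continue
        entries.append({"device": fields[0], "host": fields[1], "mac": fields[-1]})
    return entries


if __name__ == "__main__":
    for entry in parse_media_table(SAMPLE):
        print(entry["host"], entry["mac"])
```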
2.7.5 Using the ping Command
The ping command sends ICMP ECHO_REQUEST packets to network hosts.
Depending on how the ping command is configured, the output displayed can
identify troublesome network links or nodes. The destination host is specified in the
variable hostname.
2.7.5.1 Options
TABLE 2-16 describes options for the ping command and how those options can help
troubleshooting.

TABLE 2-16 Options for ping

Option         Description                        How It Can Help
hostname       The probe packet is sent to        Verifies that a host is active on the network.
               hostname and returned.
-g hostname    Forces the probe packet to route   By identifying different routes to the target host, those
               through a specified gateway.       individual routes can be tested for quality.
-i interface   Designates which interface to      Enables a simple check of secondary network interfaces.
               send and receive the probe
               packet through.
-n             Replaces host names with IP        Used when an address is more beneficial than a host name.
               addresses.
-s             Pings continuously in one-second   Helps identify intermittent or long-duration network events.
               intervals. Ctrl-C aborts. Upon     By piping ping output to a file, activity overnight can later
               abort, statistics are displayed.   be viewed at once.
-svR           Displays the route the probe       Indicates probe packet route and number of hops.
               packet followed in one-second      Comparing multiple routes can identify bottlenecks.
               intervals.
The following example shows output for the ping -s command.
# ping -s san-ff2-17-a
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
^C
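When ping -s output has been saved to a file, the round-trip times can be summarized afterward. The following Python sketch extracts the time values from the example above and reports the count, minimum, maximum, and average. The function names are hypothetical, and the pattern assumes the reply line format shown in this example.

```python
import re

# The ping -s output from the example above.
SAMPLE = """\
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
"""

REPLY = re.compile(r"icmp_seq=(\d+)\.\s+time=([\d.]+)\s*ms")


def round_trip_times(text):
    """Return the round-trip time in milliseconds of every reply line."""
    return [float(ms) for _seq, ms in REPLY.findall(text)]


def summarize(times):
    """Return count, minimum, maximum, and average, or None if empty."""
    if not times:
        return None
    return {
        "count": len(times),
        "min": min(times),
        "max": max(times),
        "avg": sum(times) / len(times),
    }


if __name__ == "__main__":
    print(summarize(round_trip_times(SAMPLE)))
```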
2.7.6 Using the ps Command
The ps command lists the status of processes. Using options and rearranging the
command output can assist in determining the resource allocation.
2.7.6.1 Options
TABLE 2-17 describes options for the ps command and how those options can help
troubleshooting.

TABLE 2-17 Options for ps

Option      Description                          How It Can Help
-e          Displays information for every       Identifies the process ID and the executable.
            process.
-f          Generates a full listing.            Provides the following process information: user ID,
                                                 parent process ID, time when executed, and the path to
                                                 the executable.
-o option   Enables configurable output. The     Provides only most important information. Knowing
            pid, pcpu, pmem, and comm options    the percentage of resource consumption helps identify
            display process ID, percent CPU      processes that are affecting performance and might be
            consumption, percent memory          hung.
            consumption, and the responsible
            executable, respectively.
The following example shows output for one ps command.
# ps
PID TTY TIME CMD
101042 pts/3 0:00 ps
101025 pts/3 0:00 sh
#
Note – When using sort with the -r option, the column headings are printed so
that the value in the first column is equal to zero.
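Rearranging ps output by resource consumption, as described for the -o option, can also be done after the output has been captured. The following Python sketch sorts rows by the first column and keeps the heading line at the top. The sample rows are invented for illustration and do not come from a real server; they assume output in the style of ps -e -o pcpu,pid,comm. The function name is hypothetical.

```python
# Hypothetical rows in the style of: ps -e -o pcpu,pid,comm
# The values are invented for illustration only.
SAMPLE = """\
%CPU   PID COMMAND
 0.1   101 sh
 2.5   202 java
 0.0   303 ps
"""


def top_cpu(text, limit=5):
    """Return the heading plus the rows with the highest first-column value."""
    lines = text.strip().splitlines()
    header, rows = lines[0], lines[1:]
    # Sort numerically on the first column, largest value first.
    rows.sort(key=lambda row: float(row.split()[0]), reverse=True)
    return [header] + rows[:limit]


if __name__ == "__main__":
    print("\n".join(top_cpu(SAMPLE, limit=2)))
```

Handling the heading separately avoids having it sorted among the data rows.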
2.7.7 Using the prstat Command
The prstat utility iteratively examines all active processes and reports statistics
based on the selected output mode and sort order. The prstat command provides
output similar to the ps command.
2.7.7.1 Options
TABLE 2-18 describes options for the prstat command and how those options can
help troubleshooting.
TABLE 2-18 Options for prstat
Option | Description | How It Can Help
No option | Displays a sorted list of the top processes that are consuming the most CPU resources. List is limited to the height of the terminal window and the total number of processes. Output is automatically updated every five seconds. Ctrl-C aborts. | Output identifies process ID, user ID, memory used, state, CPU consumption, and command name.
-n number | Limits output to number of lines. | Limits amount of data displayed and identifies primary resource consumers.
-s key | Permits sorting list by key parameter. | Useful keys are cpu (default), time, and size.
-v | Verbose mode. | Displays additional parameters.
The following example shows output for the prstat command.
2.8 Other Issues
2.8.1 Can’t Locate Boot Device
When PCI-X card slot 0 is faulty or the card is not seated properly, the firmware
blacklists the entire PCI-X bridge device (and everything attached downstream from
it), causing the boot disk to disappear. As a result, the showdisk command fails to
display the boot disk and the bootdisk command displays the console message
“Can’t locate boot device”.
When this occurs, remove the PCI/PCI-X card in slot 0 to see if the boot issue is
remedied. If the IO unit is fully stocked and it is not possible to remove the
PCI/PCI-X card, attempt to place another card in slot 0. If this also is not
possible, remove and reinstall the existing card in slot 0.
CHAPTER 3
Periodic Maintenance
This chapter describes the periodic maintenance required to keep the server running
regardless of whether a problem has occurred. The information is organized into the
following topic:
■ Section 3.1, “Tape Drive Unit” on page 3-1
3.1 Tape Drive Unit
It might be necessary to use a cleaning tape when carrying out the cleaning
procedure.
Note – Contact your sales representative for tape drive unit options on M4000 and
M5000 servers.
3.1.1 Cleaning the Tape Drive Unit
To prevent the "Clean Lamp" from illuminating prematurely, follow these
maintenance rules:
■ Clean your tape drive unit once every 5 to 24 hours of continuous use, or once a
week.
■ Clean your tape drive unit once a month, even if it is not in use.
■ Clean your tape drive unit whenever the "Clean Lamp" indicator is lit or blinking.
■ Clean your tape drive unit before inserting a new data cassette.
■ Replace the cleaning cassette when the tape inside of the cassette has completely
wound up onto the right-hand spool or when the three lamps are in the following
states: "Off", "Lit", and "Blinking."
■ Remove the cassette before turning the power "OFF". The tape life might be
shortened or a malfunction might occur during the backup process if the power is
turned "OFF" while the cassette is still inside.
Note – If the "Clean Lamp" starts blinking immediately after completion of a
cleaning operation, the data cassette might have been damaged. In this case, replace
the data cassette.
CHAPTER 4
FRU Replacement Preparation
This chapter describes how to prepare a field-replaceable unit (FRU) for safe
replacement. The information is organized into the following topics:
■ Section 4.1, “FRU Replacement Method” on page 4-1
■ Section 4.2, “Active Replacement” on page 4-4
■ Section 4.3, “Hot Replacement” on page 4-6
■ Section 4.4, “Cold Replacement (Powering the Server Off and On)” on page 4-12
4.1 FRU Replacement Method
There are three basic methods for replacing the FRUs:
Active replacement – To replace a FRU while the domain, to which the FRU belongs,
continues running. Active replacement requires that the FRU be inactivated or
powered down using either an XSCF command or Oracle Solaris OS command.
Because the power supply unit (PSU) and fan unit (FAN) do not belong to any
domain, they are operated by using XSCF commands, regardless of the operating
state of the Oracle Solaris OS.
Note – The procedure for isolating the hard disk drive from the Oracle Solaris OS
varies depending on whether disk mirroring software or other support software is
used. For details, see the relevant software manuals.
Hot replacement – To replace a FRU while the domains are powered off. Depending
on the FRU to be replaced, the FRU can either be directly replaced or be inactivated
or powered down using an XSCF command.
Cold replacement – To replace a FRU while all domains are stopped and the server is
powered off and unplugged.
TABLE 4-1 lists the FRUs, location and access, and the replacement method.
TABLE 4-1 FRU Replacement Information
FRU | FRU Location/Access | Removal Method(s)
CD-RW/DVD-RW Drive Unit (DVDU) | Front | Hot replacement, Cold replacement
Backplane unit (BPU_A, BPU_B) | Top | Cold replacement
CPU module (CPUM_A) | Top | Cold replacement
Memory board (MEMB) | Top | Cold replacement
Motherboard (M4000) (MBU_A) | Rear | Cold replacement
Motherboard DC-DC Converter (M4000) (DDC_A, DDC_B) | Rear | Cold replacement
Motherboard (M5000) (MBU_B) | Top | Cold replacement
Motherboard DC-DC Converter (M5000) (DDC_A, DDC_B) | Top | Cold replacement
eXtended System Control facility unit (XSCFU) | Rear | Cold replacement
Hard disk drive backplane (HDDBP) | Top | Cold replacement
CD-RW/DVD-RW backplane | Top | Cold replacement
Tape drive backplane (TAPEBP) | Top | Cold replacement
Operator panel (OPNL) | Top | Cold replacement
* When using active replacement for a PSU, only one power supply unit should be replaced at a time to ensure redundancy.
† When using active replacement for a 172-mm or 60-mm fan unit, only one fan unit should be replaced at a time to ensure redundancy.
4.2 Active Replacement
In active replacement the Oracle Solaris OS must be configured to allow the
component to be replaced. Active replacement has four stages:
■ Section 4.2.1, “Removing a FRU From a Domain” on page 4-4
■ Section 4.2.2, “Removing and Replacing a FRU” on page 4-5
■ Section 4.2.3, “Adding a FRU Into a Domain” on page 4-5
■ Section 4.2.4, “Verifying Hardware Operation” on page 4-6
Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be
used if the boot disk can be isolated from the Oracle Solaris OS by disk mirroring
software and other software.
4.2.1 Removing a FRU From a Domain
Note – Before you remove a PCI cassette, make sure that there is no I/O activity on
the card in the cassette.
1. From the Oracle Solaris prompt, type the cfgadm command to get the
component status.
# cfgadm -a
Ap_Id          Type         Receptacle    Occupant      Condition
iou#0-pci#0    etherne/hp   connected     configured    ok
iou#0-pci#1    fibre/hp     connected     configured    ok
iou#0-pci#2    pci-pci/hp   connected     configured    ok
Ap_Id includes the IOU number (iou#0 or iou#1) and the PCI cassette slot number
(pci#1, pci#2, pci#3, pci#4).
Caution – If you use the PCI Hot Plug (PHP) function on servers running Oracle
Solaris 10 9/10, or with patch 142909-17 or later, enable the hotplug service as follows:
# svcadm enable hotplug
2. Type the cfgadm command to disconnect the component from the domain:
# cfgadm -c unconfigure Ap_Id
Note – For a PCI cassette, type the cfgadm -c disconnect command to
disconnect the component from the domain.
The Ap_Id is shown in the output of cfgadm, iou#0-pci#0 for example.
3. Type the cfgadm command to confirm the component is now disconnected.
# cfgadm -a
Ap_Id          Type         Receptacle     Occupant       Condition
iou#0-pci#0    etherne/hp   disconnected   unconfigured   unknown
iou#0-pci#1    fibre/hp     connected      configured     ok
iou#0-pci#2    pci-pci/hp   connected      configured     ok
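The check in step 3 can be sketched in Python for anyone who captures cfgadm -a output to a file. This is an illustration only: it assumes five whitespace-separated columns as in the listings above, the function names parse_cfgadm and safe_to_remove are hypothetical, and the rule that "disconnected" plus "unconfigured" means the cassette is released is an interpretation of the example output.

```python
import re

# Ap_Id form described above: IOU number followed by PCI cassette slot number.
AP_ID = re.compile(r"^iou#(\d+)-pci#(\d+)$")

def parse_cfgadm(output):
    """Parse whitespace-separated `cfgadm -a` rows into dictionaries."""
    records = []
    for line in output.strip().splitlines()[1:]:  # skip the heading line
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore rows that do not have the five columns shown above
        ap_id, ftype, receptacle, occupant, condition = fields
        match = AP_ID.match(ap_id)
        records.append({
            "ap_id": ap_id,
            "iou": int(match.group(1)) if match else None,
            "slot": int(match.group(2)) if match else None,
            "type": ftype,
            "receptacle": receptacle,
            "occupant": occupant,
            "condition": condition,
        })
    return records

def safe_to_remove(record):
    """Treat a cassette as released once it is disconnected and unconfigured."""
    return (record["receptacle"] == "disconnected"
            and record["occupant"] == "unconfigured")

# Rows taken from the step 3 example output.
sample = """\
Ap_Id          Type         Receptacle     Occupant       Condition
iou#0-pci#0    etherne/hp   disconnected   unconfigured   unknown
iou#0-pci#1    fibre/hp     connected      configured     ok
iou#0-pci#2    pci-pci/hp   connected      configured     ok
"""

records = parse_cfgadm(sample)
for record in records:
    state = "released" if safe_to_remove(record) else "in use"
    print(record["ap_id"], state)
```

The same parser can be reused after the FRU is added back to the domain to confirm that the receptacle is connected and the occupant is configured.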
4.2.2 Removing and Replacing a FRU
Once the FRU has been removed from the domain, see Section 4.3.1, “Removing and
Replacing a FRU” on page 4-7.
4.2.3 Adding a FRU Into a Domain
1. From the Oracle Solaris prompt, type the cfgadm command to connect the
component to the domain.
# cfgadm -c configure Ap_Id
The Ap_Id is shown in the output of cfgadm, iou#0-pci#0 for example.
2. Type the cfgadm command to confirm the component is now connected.
# cfgadm -a
Ap_Id          Type         Receptacle    Occupant      Condition
iou#0-pci#0    etherne/hp   connected     configured    ok
iou#0-pci#1    fibre/hp     connected     configured    ok
iou#0-pci#2    pci-pci/hp   connected     configured    ok
4.2.4 Verifying Hardware Operation
● Verify the state of the status LEDs.
The POWER LED should be On and the CHECK LED should not be On.
Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be
used if the boot disk can be isolated from the Oracle Solaris OS by disk mirroring
software and other software.
4.3 Hot Replacement
In hot replacement the Oracle Solaris OS does not need to be configured to allow the
component to be replaced. Depending on the FRU to be replaced, the FRU can either
be directly replaced or be inactivated or powered down using an XSCF command.
4.3.1 Removing and Replacing a FRU
1. From the XSCF Shell prompt, type the replacefru command.
EXAMPLE 4-1 replacefru command
XSCF> replacefru
---------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r
Please confirm the CHECK LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
The replacefru command automatically tests the status of the component after the
remove and replace is finished.
EXAMPLE 4-2 replacefru command status
Diagnostic tests of FAN_A#0 is started.
[This operation may take up to 2 minute(s)]
(progress scale reported in seconds)
0..... 30..... 60..... 90.....done
---------------------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced unit.
FRU Status
------------- -------
FAN_A#0 Normal
---------------------------------------------------------------
The replacement of FAN_A#0 has completed, normally.[f:finish] :f
---------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
---------------------------------------------------------------
Select [1,2|c:cancel] : C
XSCF>
Note – The display may vary depending on the XCP version.
When the tests are complete the program returns to the original menu. Select cancel
to return to the XSCF Shell prompt.
Refer to the replacefru man page for more information.
4.3.2 Verifying Hardware Operation
1. Type the showhardconf command to confirm the new component is installed.
Refer to the showhardconf -u man page for more information.
3. Verify the state of the status LEDs on the FRU.
Refer to TABLE 2-11 for LED status.
4.4 Cold Replacement (Powering the Server Off and On)
In cold replacement all business operations are stopped. Cold replacement is the act
of powering off the server and disconnecting input power. This is normally required
for safety when the inside of the server is accessed.
Note – The input power cables are used to ground the server. If the server is not
mounted in a rack use a grounding strap to ground the server.
Note – After a complete chassis power cycle (all power cords removed), make
certain to allow 30 seconds before connecting the power cords back into the chassis.
4.4.1 Powering the Server Off Using Software
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Log in to the XSCF Shell and type the poweroff command.
XSCF> poweroff -a
The following actions occur when the poweroff command is used:
■ The Oracle Solaris OS shuts down cleanly.
■ The server powers off to Standby mode (the XSCF unit and one fan will still have
power).
Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF
User’s Guide for details.
4. Verify the state of the status LED on the XSCF.
The POWER LED should be off.
5. Disconnect all power cables from the input power source.
Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.
4.4.2 Powering the Server On Using Software
1. Make sure that the server has enough power supply units to run the desired
configuration.
2. Connect all power cables to the input power source.
3. Make sure the XSCF STANDBY LED on the operator panel is On.
4. Turn the keyswitch on the operator panel to the desired mode position (Locked
or Service).
5. Log in to the XSCF Shell and type the poweron command.
XSCF> poweron -a
Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF
User’s Guide for details.
6. After a delay the following activities occur:
■ The operator panel POWER LED lights.
■ The system executes the power-on self-test (POST).
Then, the server is completely powered on.
Note – If Oracle Solaris automatic booting is enabled, use the sendbreak -d
domain_id command after the console banner is displayed but before the
system starts booting the operating system to get the ok prompt.
4.4.3 Powering the Server Off Manually
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Place the keyswitch in the Service position.
4. Press and hold the Power switch on the operator panel for four seconds or
longer to initiate the power off.
5. Verify that the POWER LED on the operator panel is off.
6. Disconnect all power cables from the input power source.
Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.
4.4.4 Powering the Server On Manually
1. Make sure that the server has enough power supply units to run the desired
configuration.
2. Connect all power cables to the input power source.
3. Make sure the XSCF STANDBY LED is On.
4. Turn the keyswitch on the operator panel to the desired mode position (Locked
or Service).
5. Press the Power switch on the operator panel.
After a delay the following activities occur:
■ The operator panel POWER LED lights.
■ The system executes the power-on self-test (POST).
Then, the server is completely powered on.
Note – If Oracle Solaris automatic booting is enabled, use the sendbreak -d
domain_id command after the console banner is displayed but before the
system starts booting the operating system to get the ok prompt.
4.4.5 Verifying Hardware Operation
1. From the ok prompt, press the Enter key, and then press the “#.” (number sign and
period) keys to switch from the domain console to the XSCF console.
2. Type the showhardconf command to confirm the new component is installed.
5. Type the probe-scsi-all command to confirm that the storage devices are
mounted.
EXAMPLE 4-9 probe-scsi-all
ok probe-scsi-all
/pci@0,600000/pci@0/pci@8/pci@0/scsi@1
MPT Version 1.05, Firmware Version 1.07.00.00
Target 0
Unit 0 Disk SEAGATE ST973401LSUN72G 0556 143374738 Blocks, 73 GB
SASAddress 5000c5000092beb9 PhyNum 0
Target 1
Unit 0 Disk SEAGATE ST973401LSUN72G 0556 143374738 Blocks, 73 GB
SASAddress 5000c500002eeaf9 PhyNum 1
Target 3
Unit 0 Removable Read Only device TSSTcorpCD/DVDW TS-L532USR01
SATA device PhyNum 3
ok
6. Type the boot command to start the operating system.
ok boot
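The capacity reported by probe-scsi-all can be cross-checked against the block count. The following Python sketch is a worked calculation only. It reads the block count in the example as 143374738 and assumes 512-byte blocks, which the output does not state; the function name capacity_gb is hypothetical.

```python
BLOCK_SIZE = 512  # bytes per block; an assumption, not stated in the output

def capacity_gb(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to decimal gigabytes (10**9 bytes)."""
    return blocks * block_size / 10**9

# 143374738 blocks * 512 bytes = 73,407,865,856 bytes
print(round(capacity_gb(143374738), 1))
```

Under these assumptions the result agrees with the 73 GB shown for each disk, which is a quick way to confirm that the expected drive model is installed.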
CHAPTER 5
Internal Components Access
This chapter describes how to access the internal components. The information is
organized into the following topics:
■ Section 5.1, “Sliding the Server In and Out to the Fan Stop” on page 5-1
■ Section 5.2, “Top Cover Remove and Replace” on page 5-5
■ Section 5.3, “Fan Cover Remove and Replace” on page 5-8
5.1 Sliding the Server In and Out to the Fan Stop
The slide rails have two designated lock points. The first, the fan stop, is for easy
access to the fan units. The fan units are hot, active, or cold replacement components.
When using active replacement, only one fan unit should be replaced at a time to
ensure redundancy.
5.1.1 Sliding the Server Out of the Equipment Rack
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 Server to the front,
release the cable tie holding the PCI cables on the rear of the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Deploy the rack’s antitilt features (if applicable).
Refer to the manual that shipped with the rack for details on antitilt features.
2. If shipping brackets are attached to the back of the server, loosen the four (4)
captive screws (FIGURE 5-1).
FIGURE 5-1 Loosening the Captive Screws on the Shipping Brackets
3. Loosen the four (4) captive screws at the front of the server (FIGURE 5-2).
FIGURE 5-2 Loosening the Captive Screws and Pulling Out the Server
4. Pull the system to the fan stop.
The system automatically locks in place at the fan stop.
5.1.2 Sliding the Server Into the Equipment Rack
1. Push the green plastic releases on each slide rail and push the server back into
the equipment rack.
2. Tighten the four (4) captive screws at the front of the server to secure it in the
rack (FIGURE 5-2).
3. Tighten the four (4) captive screws on the shipping brackets at the rear of the
server (FIGURE 5-1).
4. Restore the rack antitilt features to their original position.
5.2 Top Cover Remove and Replace
You must slide the server out of the equipment rack before removing the top cover.
5.2.1 Removing the Top Cover
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 Server to the front,
release the cable tie holding the PCI cables on the rear of the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Deploy the rack’s antitilt features (if applicable).
Refer to the rack manual for details on the rack’s antitilt features.
2. Loosen the four (4) captive screws at the front of the server (FIGURE 5-2).
3. Loosen the four (4) captive screws on the shipping brackets at the rear of the
system (FIGURE 5-1).
Note – During installation the power cables should have been bundled into a loop
with enough slack to allow the system to slide out on the rails. This is called the
service loop. If this is not the case the power cables will have to be disconnected to
allow the server to pull all the way out of the equipment rack.
4. Pull the server to the fan stop.
The server automatically locks in place at the fan stop.
5. Push the green plastic releases on each slide rail and pull the server until it is
fully extended.
The server automatically locks in place when fully extended.
6. Loosen the captive screw(s) on the center top of the server.
The SPARC Enterprise M4000 server has one (1) captive screw (FIGURE 5-3). The
SPARC Enterprise M5000 server has two (2) captive screws (FIGURE 5-4).
7. Slide the top cover towards the rear and then remove it.
FIGURE 5-3 Removing the M4000 Server Top Cover
FIGURE 5-4 Removing the M5000 Server Top Cover
5.2.2 Replacing the Top Cover
1. Align the top cover and then slide it towards the front of the server.
2. Tighten the captive screws at the center top of the server to secure the top cover
in place.
3. Push the green plastic releases on each slide rail and push the system back into
the equipment rack.
4. Tighten the four (4) captive screws at the front of the system to secure it in the
rack (FIGURE 5-2).
5. Tighten the four (4) captive screws on the shipping brackets at the rear of the
server (FIGURE 5-1).
6. Reconnect the service loop cables to the rear of the server.
7. Restore the rack antitilt features to their original position.
5.3 Fan Cover Remove and Replace
All internal components are cold replacement components. The server must be
powered off and power cables disconnected from the input power source. You must
slide the server out of the equipment rack before removing the fan cover.
5.3.1 Removing the Fan Cover
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 Server to the front,
release the cable tie holding the PCI cables on the rear of the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Deploy the rack’s antitilt features (if applicable).
Refer to the rack manual for details on the rack’s antitilt features.
2. Loosen the four (4) captive screws at the front of the server (FIGURE 5-2).
3. Loosen the four (4) captive screws on the shipping brackets at the rear of the
system (FIGURE 5-1).
Note – During installation the power cables should have been bundled into a loop
with enough slack to allow the system to slide out on the rails. This is called the
service loop. If this is not the case the power cables will have to be disconnected to
allow the server to pull all the way out of the equipment rack.
4. Pull the server to the fan stop.
The server automatically locks in place at the fan stop.
5. Push the green plastic releases on each slide rail and pull the server until it is
fully extended.
The server automatically locks in place when fully extended.
6. Remove the 60-mm fan units and place them on an ESD mat.
See Section 10.1.2, “Removing the 60-mm Fan Module” on page 10-5.
7. Loosen the captive screw on the fan cover.
8. Lift the rear edge of the fan cover and remove it.
FIGURE 5-5 Removing the Fan Cover
5.3.2 Replacing the Fan Cover
1. Align the tabs on the forward section of the fan cover and push the cover down
to secure it in place.
2. Tighten the captive screw on the fan cover.
3. Install the 60-mm fan units.
See Section 10.1.3, “Installing the 60-mm Fan Module” on page 10-6.
4. Push the green plastic releases on each slide rail and push the system back into
the equipment rack.
5. Tighten the four (4) captive screws at the front of the system to secure it in the
rack (FIGURE 5-2).
6. Tighten the four (4) captive screws on the shipping brackets at the rear of the
server (FIGURE 5-1).
7. Reconnect the service loop cables to the rear of the server.
8. Restore the rack antitilt features to their original position.
CHAPTER 6
Storage Devices Replacement
This chapter describes how to remove and install the main storage systems. The
information is organized into the following topics:
■ Section 6.1, “Hard Disk Drive Replacement” on page 6-1
■ Section 6.2, “CD-RW/DVD-RW Drive Unit (DVDU) Replacement” on page 6-12
■ Section 6.3, “Tape Drive Unit Replacement” on page 6-23
6.1 Hard Disk Drive Replacement
Hard disk drives are active, hot, or cold replacement components. Hard disk drive
backplanes are cold replacement components. The hard disk drives are identical on
both midrange servers. Hard disk drives and hard disk drive backplane information
is organized into the following sections:
■ Section 6.1.1, “Accessing the Hard Disk Drive” on page 6-4
■ Section 6.1.2, “Removing the Hard Disk Drive” on page 6-4
■ Section 6.1.3, “Installing the Hard Disk Drive” on page 6-5
■ Section 6.1.4, “Securing the Server” on page 6-5
■ Section 6.1.5, “Accessing the Hard Disk Drive Backplane of the M4000 Server” on
page 6-6
■ Section 6.1.6, “Removing the Hard Disk Drive Backplane of the M4000 Server” on
page 6-6
■ Section 6.1.7, “Installing the Hard Disk Drive Backplane of the M4000 Server” on
page 6-7
■ Section 6.1.8, “Securing the Server” on page 6-8
■ Section 6.1.9, “Accessing the Hard Disk Drive Backplane of the M5000 Server” on
page 6-9
■ Section 6.1.10, “Removing the Hard Disk Drive Backplane of the M5000 Server” on
page 6-10
■ Section 6.1.11, “Installing the Hard Disk Drive Backplane of the M5000 Server” on
page 6-10
■ Section 6.1.12, “Securing the Server” on page 6-11
The following illustration shows the locations of the hard disk drives and the hard
disk backplane on the SPARC Enterprise M4000 server.
FIGURE 6-1 M4000 Server Hard Disk Drives and Hard Disk Drive Backplane Locations
Location Number | Component
1 | Hard disk drive backplane (HDDBP#0 IOU#0)
2 | Hard disk drive (HDD#1)
3 | Hard disk drive (HDD#0)
The following illustration shows the locations of the hard disk drives and the hard
disk drive backplane on the SPARC Enterprise M5000 server.
FIGURE 6-2 M5000 Server Hard Disk Drives and Hard Disk Drive Backplane Locations
Location Number | Component
1 | Hard disk drive backplane (HDDBP#1 IOU#1)
2 | Hard disk drive backplane (HDDBP#0 IOU#0)
3 | Hard disk drive (HDD#1)
4 | Hard disk drive (HDD#0)
5 | Hard disk drive (HDD#3)
6 | Hard disk drive (HDD#2)
6.1.1 Accessing the Hard Disk Drive
Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be
used if the boot disk can be isolated from the Oracle Solaris OS by disk mirroring
software and other software. See Section 4.4, “Cold Replacement (Powering the
Server Off and On)” on page 4-12.
● Remove the hard disk drive from the domain.
This step includes using the cfgadm command to determine the Ap_Id and
disconnecting the hard disk drive. See Section 4.2.1, “Removing a FRU From a
Domain” on page 4-4.
6.1.2 Removing the Hard Disk Drive
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Push the button on the front of the hard disk drive to release the drive latch
(FIGURE 6-3).
2. Pull the latch so that it is straight out from the hard disk drive to unseat the
drive.
3. Remove the hard disk drive and place it on the ESD mat.
FIGURE 6-3 Removing the Hard Disk Drive
6.1.3 Installing the Hard Disk Drive
Caution – Do not force the hard disk drive into the slot. Doing so can cause damage
to the component and server.
1. Pull the latch so that it is straight out from the drive.
2. Align the drive in the slot and push it gently into position until it stops.
3. Secure the latch.
6.1.4 Securing the Server
1. Add the hard disk drive to the domain.
This step includes using the cfgadm command to connect and confirm the hard
disk drive has been added to the domain. See Section 4.2.3, “Adding a FRU Into a
Domain” on page 4-5.
2. Verify the state of the status LEDs on the hard disk drive.
6.1.5 Accessing the Hard Disk Drive Backplane of the M4000 Server
Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Power off the server.
This step includes turning the keyswitch to the Service position, confirming that
the POWER LED is off, and disconnecting power cables. See Section 4.4.1,
“Powering the Server Off Using Software” on page 4-12.
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 server to the front,
release the cable tie holding the PCI cables on the rear of the server.
2. Remove the fan cover.
This step includes deploying the rack’s antitilt features (if applicable), sliding the
server out of the equipment rack, removing the 60-mm fan units and removing the
fan cover. See Section 5.3.1, “Removing the Fan Cover” on page 5-8.
6.1.6 Removing the Hard Disk Drive Backplane of the M4000 Server
1. Remove the CD-RW/DVD-RW Drive Unit and place it on the ESD mat.
See Section 6.2.3, “Removing the CD-RW/DVD-RW Drive Unit” on page 6-16.
2. Remove the power and serial cables from the rear of the CD-RW/DVD-RW
Drive Backplane.
3. Loosen the captive screw that holds the rear of the CD-RW/DVD-RW Drive
Backplane in place.
4. Remove the CD-RW/DVD-RW backplane and place it on the ESD mat.
5. Remove all hard disk drives and place them on the ESD mat.
See Section 6.1.2, “Removing the Hard Disk Drive” on page 6-4.
6. Remove the power cable (p3) from the rear of the hard disk drive backplane.
7. Loosen the captive screw that holds the hard disk drive backplane in place.
8. Lift the hard disk drive backplane from the guide pins.
9. Remove the blue serial cable from the hard disk drive backplane and place the
backplane on the ESD mat.
6.1.7 Installing the Hard Disk Drive Backplane of the M4000 Server
1. Secure the blue serial cable to the hard disk drive backplane.
2. Place the hard disk drive backplane onto the guide pins.
3. Tighten the captive screw that holds the rear of the hard disk drive
backplane in place.
4. Secure the power cable (p3) to the rear of the hard disk drive backplane.
Caution – Do not force any components into server slots. Doing so can cause damage
to the component and server.
5. Install the hard disk drives.
See Section 6.1.3, “Installing the Hard Disk Drive” on page 6-5.
6. Place the CD-RW/DVD-RW backplane onto the guide pin.
7. Tighten the captive screw that holds the rear of the CD-RW/DVD-RW Drive
Backplane in place.
8. Connect the power and serial cables to the rear of the CD-RW/DVD-RW Drive
Backplane.
9. Install the CD-RW/DVD-RW Drive Unit.
See Section 6.2.4, “Installing the CD-RW/DVD-RW Drive Unit” on page 6-17.
6.1.8 Securing the Server
1. Install the fan cover.
This step includes replacing the fan cover, installing the 60-mm fan units, sliding
the server in to the equipment rack and restoring the rack antitilt features to their
original position. See Section 5.3.2, “Replacing the Fan Cover” on page 5-10.
2. Power the server on.
This step includes reconnecting power cables, verifying the state of the LEDs, and
turning the keyswitch to the Locked position. See Section 4.4.2, “Powering the
Server On Using Software” on page 4-13.
Note – If Oracle Solaris automatic booting is enabled, use the sendbreak -d
domain_id command after the console banner is displayed but before the
system starts booting the operating system to get the ok prompt.
3. Confirm the hardware.
This step includes running programs to be certain all components are mounted
again and then booting the operating system.
Refer to Section 4.3.2, “Verifying Hardware Operation” on page 4-9 for more
information.