FUJITSU M5000 User Manual

SPARC Enterprise M4000/M5000 Servers
Service Manual
Part No.: 819-2210-14, Manual Code: C120-E352-07EN December 2010, Revision A
Copyright © 2007, 2010, Oracle and/or its affiliates. All rights reserved. FUJITSU LIMITED provided technical input and review on portions of this material. Oracle and/or its affiliates and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in this document, and such products, technology and this document are protected by copyright laws, patents, and other intellectual property laws and international treaties.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written authorization of Oracle and/or its affiliates and Fujitsu Limited, and their applicable licensors, if any. The furnishings of this document to you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this document does not contain or represent any commitment of any kind on the part of Oracle or Fujitsu Limited, or any affiliate of either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by and/or licensed from the suppliers to Oracle and/or its affiliates and Fujitsu Limited, including software and font technology.
Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End User. Please contact Oracle and/or its affiliates or Fujitsu Limited.
This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California.
UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited. All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon architectures developed by Oracle and/or its affiliates. SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited. Other names may be trademarks of their respective owners.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of Oracle and/or its affiliates and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Oracle and Fujitsu Limited, and/or any affiliate of either of them in connection with this document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, ORACLE OR FUJITSU LIMITED, AND/OR THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Oracle or Fujitsu Limited, and/or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

Preface xvii
1. Safety and Tools 1–1
1.1 Safety Precautions 1–1
1.2 System Precautions 1–2
1.2.1 Electrical Safety Precautions 1–2
1.2.2 Equipment Rack Safety Precautions 1–2
1.2.3 Filler Boards and Filler Panels 1–3
1.2.4 Handling Components 1–3
2. Fault Isolation 2–1
2.1 Determining Which Diagnostics Tools to Use 2–1
2.2 Checking the Server and System Configuration 2–4
2.2.1 Checking the Hardware Configuration and FRU Status 2–4
2.2.1.1 Checking the Hardware Configuration 2–5
2.2.2 Checking the Software and Firmware Configuration 2–6
2.2.2.1 Checking the Software Configuration 2–7
2.2.2.2 Checking the Firmware Configuration 2–7
2.2.3 Downloading the Error Log Information 2–8
2.3 Operator Panel 2–8
2.4 Error Conditions 2–14
2.4.1 Predictive Self-Healing Tools 2–14
2.4.2 Monitoring Output 2–17
2.4.3 Messaging Output 2–17
2.5 LED Functions 2–18
2.6 Using the Diagnostic Commands 2–21
2.6.1 Using the showlogs Command 2–21
2.6.2 Using the fmdump Command 2–22
2.6.2.1 fmdump -V Command 2–22
2.6.2.2 fmdump -e Command 2–23
2.6.3 Using the fmadm faulty Command 2–23
2.6.3.1 fmadm repair Command 2–23
2.6.3.2 fmadm config Command 2–24
2.6.4 Using the fmstat Command 2–24
2.7 Traditional Oracle Solaris Diagnostic Commands 2–25
2.7.1 Using the iostat Command 2–26
2.7.1.1 Options 2–26
2.7.2 Using the prtdiag Command 2–27
2.7.2.1 Options 2–27
2.7.3 Using the prtconf Command 2–30
2.7.3.1 Options 2–30
2.7.4 Using the netstat Command 2–32
2.7.4.1 Options 2–33
2.7.5 Using the ping Command 2–34
2.7.5.1 Options 2–34
2.7.6 Using the ps Command 2–35
2.7.6.1 Options 2–35
2.7.7 Using the prstat Command 2–36
2.7.7.1 Options 2–36
2.8 Other Issues 2–37
2.8.1 Can’t Locate Boot Device 2–37
3. Periodic Maintenance 3–1
3.1 Tape Drive Unit 3–1
3.1.1 Cleaning the Tape Drive Unit 3–1
4. FRU Replacement Preparation 4–1
4.1 FRU Replacement Method 4–1
4.2 Active Replacement 4–4
4.2.1 Removing a FRU From a Domain 4–4
4.2.2 Removing and Replacing a FRU 4–5
4.2.3 Adding a FRU Into a Domain 4–5
4.2.4 Verifying Hardware Operation 4–6
4.3 Hot Replacement 4–6
4.3.1 Removing and Replacing a FRU 4–7
4.3.2 Verifying Hardware Operation 4–9
4.4 Cold Replacement (Powering the Server Off and On) 4–12
4.4.1 Powering the Server Off Using Software 4–12
4.4.2 Powering the Server On Using Software 4–13
4.4.3 Powering the Server Off Manually 4–14
4.4.4 Powering the Server On Manually 4–14
4.4.5 Verifying Hardware Operation 4–15
5. Internal Components Access 5–1
5.1 Sliding the Server In and Out to the Fan Stop 5–1
5.1.1 Sliding the Server Out of the Equipment Rack 5–2
5.1.2 Sliding the Server Into the Equipment Rack 5–4
5.2 Top Cover Remove and Replace 5–5
5.2.1 Removing the Top Cover 5–5
5.2.2 Replacing the Top Cover 5–8
5.3 Fan Cover Remove and Replace 5–8
5.3.1 Removing the Fan Cover 5–8
5.3.2 Replacing the Fan Cover 5–10
6. Storage Devices Replacement 6–1
6.1 Hard Disk Drive Replacement 6–1
6.1.1 Accessing the Hard Disk Drive 6–4
6.1.2 Removing the Hard Disk Drive 6–4
6.1.3 Installing the Hard Disk Drive 6–5
6.1.4 Securing the Server 6–5
6.1.5 Accessing the Hard Disk Drive Backplane of the M4000 Server 6–6
6.1.6 Removing the Hard Disk Drive Backplane of the M4000 Server 6–6
6.1.7 Installing the Hard Disk Drive Backplane of the M4000 Server 6–7
6.1.8 Securing the Server 6–8
6.1.9 Accessing the Hard Disk Drive Backplane of the M5000 Server 6–9
6.1.10 Removing the Hard Disk Drive Backplane of the M5000 Server 6–10
6.1.11 Installing the Hard Disk Drive Backplane of the M5000 Server 6–10
6.1.12 Securing the Server 6–11
6.2 CD-RW/DVD-RW Drive Unit (DVDU) Replacement 6–12
6.2.1 Identifying the Type of CD-RW/DVD-RW Drive Unit 6–15
6.2.2 Accessing the CD-RW/DVD-RW Drive Unit 6–16
6.2.3 Removing the CD-RW/DVD-RW Drive Unit 6–16
6.2.4 Installing the CD-RW/DVD-RW Drive Unit 6–17
6.2.5 Securing the Server 6–17
6.2.6 Accessing the CD-RW/DVD-RW Drive Backplane of the M4000 Server 6–18
6.2.7 Removing the CD-RW/DVD-RW Drive Backplane of the M4000 Server 6–18
6.2.8 Installing the CD-RW/DVD-RW Drive Backplane of the M4000 Server 6–19
6.2.9 Securing the Server 6–19
6.2.10 Accessing the CD-RW/DVD-RW Drive Backplane of the M5000 Server 6–20
6.2.11 Removing the CD-RW/DVD-RW Drive Backplane of the M5000 Server 6–21
6.2.12 Installing the CD-RW/DVD-RW Drive Backplane of the M5000 Server 6–21
6.2.13 Securing the Server 6–22
6.3 Tape Drive Unit Replacement 6–23
6.3.1 Accessing the Tape Drive Unit 6–26
6.3.2 Removing the Tape Drive Unit 6–26
6.3.3 Installing the Tape Drive Unit 6–27
6.3.4 Securing the Server 6–27
6.3.5 Accessing the Tape Drive Backplane of the M4000 Server 6–28
6.3.6 Removing the Tape Drive Backplane of the M4000 Server 6–29
6.3.7 Installing the Tape Drive Backplane of the M4000 Server 6–29
6.3.8 Securing the Server 6–30
6.3.9 Accessing the Tape Drive Backplane of the M5000 Server 6–31
6.3.10 Removing the Tape Drive Backplane of the M5000 Server 6–32
6.3.11 Installing the Tape Drive Backplane of the M5000 Server 6–32
6.3.12 Securing the Server 6–33
7. Power Systems Replacement 7–1
7.1 Power Supply Unit Replacement 7–1
7.1.1 Accessing the Power Supply Unit 7–4
7.1.2 Removing the Power Supply Unit 7–4
7.1.3 Installing the Power Supply Unit 7–5
7.1.4 Securing the Server 7–5
8. I/O Unit Replacement 8–1
8.1 PCI Cassette Replacement 8–4
8.1.1 Accessing the PCI Cassette 8–5
8.1.2 Removing the PCI Cassette 8–5
8.1.3 Installing the PCI Cassette 8–6
8.1.4 Securing the Server 8–7
8.2 PCI Card Replacement 8–7
8.2.1 Removing the PCI Card 8–7
8.2.2 Installing the PCI Card 8–8
8.3 I/O Unit Replacement 8–10
8.3.1 Accessing the I/O Unit 8–10
8.3.2 Removing the I/O Unit 8–10
8.3.3 Installing the I/O Unit 8–11
8.3.4 Securing the Server 8–12
8.4 I/O Unit DC-DC Converter Replacement 8–12
8.4.1 Accessing the I/O Unit DC-DC Converter (DDC_A#0 or DDC_B#0) 8–14
8.4.2 Removing the I/O Unit DC-DC Converter (DDC_A #0 or DDC_B #0) 8–14
8.4.3 Installing the I/O Unit DC-DC Converter (DDC_A #0 or DDC_B #0) 8–17
8.4.4 Securing the Server 8–21
8.4.5 Accessing the I/O Unit DC-DC Converter Riser 8–21
8.4.6 Removing the I/O Unit DC-DC Converter Riser 8–22
8.4.7 Replacing the I/O Unit DC-DC Converter Riser 8–24
8.4.8 Securing the Server 8–24
9. XSCF Unit Replacement 9–1
9.1 XSCF Unit Replacement 9–1
9.1.1 Accessing the XSCF Unit 9–3
9.1.2 Removing the XSCF Unit 9–4
9.1.3 Installing the XSCF Unit 9–5
9.1.4 Securing the Server 9–5
10. Fan Modules Replacement 10–1
10.1 Fan Module Replacement 10–1
10.1.1 Accessing the 60-mm Fan Module 10–4
10.1.2 Removing the 60-mm Fan Module 10–5
10.1.3 Installing the 60-mm Fan Module 10–6
10.1.4 Securing the Server 10–6
10.1.5 Accessing the 172-mm Fan Module 10–7
10.1.6 Removing the 172-mm Fan Module 10–8
10.1.7 Installing the 172-mm Fan Module 10–9
10.1.8 Securing the Server 10–9
10.1.9 Accessing the 60-mm Fan Backplane 10–10
10.1.10 Removing the 60-mm Fan Backplane 10–11
10.1.11 Installing the 60-mm Fan Backplane 10–12
10.1.12 Securing the Server 10–12
10.1.13 Accessing the SPARC Enterprise M4000 172-mm Fan Backplane 10–13
10.1.14 Removing the SPARC Enterprise M4000 172-mm Fan Backplane 10–13
10.1.15 Installing the M4000 Server 172-mm Fan Backplane 10–16
10.1.16 Securing the Server 10–16
10.1.17 Accessing the M5000 Server 172-mm Fan Backplane 10–17
10.1.18 Removing the M5000 Server 172-mm Fan Backplane 10–17
10.1.19 Installing the M5000 Server 172-mm Fan Backplane 10–20
10.1.20 Securing the Server 10–20
11. Memory Board Replacement 11–1
11.1 Memory Board Replacement 11–1
11.1.1 Accessing the Memory Board 11–4
11.1.2 Removing the Memory Board 11–5
11.1.3 Installing the Memory Board 11–6
11.1.4 Securing the Server 11–6
11.2 DIMM Replacement 11–7
11.2.1 Confirmation of DIMM Information 11–8
11.2.2 Memory Installation Configuration Rules 11–9
11.2.3 Installing Memory: 11–10
11.2.4 Accessing the DIMMs 11–10
11.2.5 Removing the DIMMs 11–11
11.2.6 Installing the DIMMs 11–12
11.2.7 Securing the Server 11–12
12. CPU Module Replacement 12–1
12.1 CPU Module Replacement 12–1
12.1.1 Accessing the CPU Module 12–4
12.1.2 Removing the CPU Module 12–5
12.1.3 Installing the CPU Module 12–6
12.1.4 Securing the Server 12–6
12.2 CPU Upgrade 12–7
12.2.1 SPARC64 VII/SPARC64 VII+ CPU Modules Added to a New Domain 12–8
Adding a SPARC64 VII/SPARC64 VII+ CPU Module to a New Domain 12–8
12.2.2 SPARC64 VII/SPARC64 VII+ Processors Added to an Existing Domain 12–11
Preparing to Add SPARC64 VII/SPARC64 VII+ Processors to an Existing Domain 12–11
Adding a SPARC64 VII/SPARC64 VII+ CPU Module to a Domain Configured With SPARC64 VI 12–13
12.2.3 Upgrading a SPARC64 VI CPU Module to SPARC64 VII/SPARC64 VII+ on an Existing Domain 12–15
13. Motherboard Unit Replacement 13–1
13.1 Motherboard Unit Replacement 13–1
13.1.1 Accessing the M4000 Server Motherboard Unit 13–4
13.1.2 Removing the M4000 Server Motherboard Unit 13–5
13.1.3 Installing the M4000 Server Motherboard Unit 13–6
13.1.4 Securing the Server 13–6
13.1.5 Accessing the M5000 Server Motherboard Unit 13–7
13.1.6 Removing the M5000 Server Motherboard Unit 13–8
13.1.7 Installing the M5000 Server Motherboard Unit 13–10
13.1.8 Securing the Server 13–11
13.2 DC-DC Converter Replacement 13–12
13.2.1 Accessing the M4000 Server DC-DC Converter 13–14
13.2.2 Removing the M4000 Server DC-DC Converter 13–15
13.2.3 Installing the M4000 Server DC-DC Converter 13–16
13.2.4 Securing the Server 13–16
13.2.5 Accessing the M5000 Server DC-DC Converter 13–17
13.2.6 Removing the M5000 Server DC-DC Converter 13–18
13.2.7 Installing the M5000 Server DC-DC Converter 13–18
13.2.8 Securing the Server 13–18
13.3 Motherboard Unit Upgrade 13–19
13.3.1 Notes on Upgrading 13–19
13.3.2 Replacing a Motherboard Unit as an Upgrade in an Existing Domain 13–20
14. Backplane Unit Replacement 14–1
14.1 Backplane Unit Replacement 14–1
14.1.1 Accessing the M4000 Server Backplane Unit 14–3
14.1.2 Removing the M4000 Server Backplane Unit 14–5
14.1.3 Installing the M4000 Server Backplane Unit 14–7
14.1.4 Securing the Server 14–8
14.1.5 Accessing the M5000 Server Backplane Unit 14–9
14.1.6 Removing the M5000 Server Backplane Unit 14–10
14.1.7 Installing the M5000 Server Backplane Unit 14–12
14.1.8 Securing the Server 14–12
15. Operator Panel Replacement 15–1
15.1 Operator Panel Replacement 15–1
15.2 Accessing the Operator Panel 15–4
15.2.1 Removing the Operator Panel 15–4
15.2.2 Installing the Operator Panel 15–7
15.2.3 Securing the Server 15–7
A. Components List A–1
B. Rules for System Configuration B–1
B.1 Server Configuration B–1
C. FRU List C–1
C.1 Server Overview C–1
C.2 System Boards C–3
C.2.1 Motherboard Unit C–3
C.2.2 CPU Module C–4
C.2.3 Memory Board C–5
C.3 Backplane Unit C–6
C.4 I/O Unit C–6
C.5 Power C–7
C.6 FAN Module C–8
C.7 eXtended System Control Facility Unit C–9
C.8 Drives C–10
C.8.1 Hard Disk Drive C–10
C.8.2 CD-RW/DVD-RW Drive Unit (DVDU) C–11
C.8.3 Tape Drive Unit (TAPEU) C–11
D. External Interface Specifications D–1
D.1 Serial Port D–2
D.2 UPC (UPS Control) Port D–3
D.3 USB Port D–3
D.4 Connection Diagram for Serial Cable D–4
E. UPS Controller E–1
E.1 Overview E–1
E.2 Signal Cables E–1
E.3 Signal Line Configuration E–2
E.4 Power Supply Conditions E–3
E.4.1 Input circuit E–3
E.4.2 Output circuit E–4
E.5 UPS Cable E–4
E.6 UPC Connector E–5
F. Air Filters F–1
F.1 M4000 and M5000 Servers Air Filter F–2
F.1.1 Command Operations Procedures F–2
F.2 Air Filter Installation for the M4000 Server F–3
F.2.1 Removing the Air Filter From the M4000 Server F–8
F.3 Air Filter Installation for the M5000 Server F–9
F.3.1 Removing the Air Filter from the M5000 Server F–13
F.3.2 Servicing the Air Filter F–14
G. Abbreviations G–1
Index Index–1

Preface

This service manual describes how to service the SPARC Enterprise M4000/M5000 servers from Oracle and Fujitsu. This document is intended for authorized service providers. References herein to the M4000 server or M5000 server are references to the SPARC Enterprise M4000 or SPARC Enterprise M5000 server.
This document is written for maintenance providers who have received formal service training. A single-engineer service model is used for servicing SPARC Enterprise M4000/M5000 midrange servers, with one exception: when the motherboard of a SPARC Enterprise M5000 server must be removed and that server is mounted above waist height in the rack, two engineers or a platform must be used for safety.
This section explains:
“SPARC Enterprise M4000/M5000 Servers Documentation” on page xviii
“Text Conventions” on page xix
“Notes on Safety” on page xx
“Documentation Feedback” on page xx
SPARC Enterprise M4000/M5000 Servers Documentation
For the web location of all SPARC Enterprise M4000/M5000 servers documents, refer to the SPARC Enterprise M4000/M5000 Servers Getting Started Guide packaged with your server.
Product notes are available on the website only. Please check for the most recent update for your product.
Note – For Sun Oracle software-related manuals (Oracle Solaris OS, and so on), go to: http://docs.sun.com
Book Titles  Sun/Oracle  Fujitsu
SPARC Enterprise M4000/M5000 Servers Site Planning Guide  819-2205  C120-H015
SPARC Enterprise Equipment Rack Mounting Guide  819-5367  C120-H016
SPARC Enterprise M4000/M5000 Servers Getting Started Guide *  821-3045  C120-E345
SPARC Enterprise M4000/M5000 Servers Overview Guide  819-2204  C120-E346
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Important Legal and Safety Information  821-2098  C120-E633
SPARC Enterprise M4000/M5000 Servers Safety and Compliance Manual  819-2203  C120-E348
External I/O Expansion Unit Safety and Compliance Guide  819-1143  C120-E457
SPARC Enterprise M4000 Server Unpacking Guide  821-3043  C120-E349
SPARC Enterprise M5000 Server Unpacking Guide  821-3044  C120-E350
SPARC Enterprise M4000/M5000 Servers Installation Guide  819-2211  C120-E351
SPARC Enterprise M4000/M5000 Servers Service Manual  819-2210  C120-E352
External I/O Expansion Unit Installation and Service Manual  819-1141  C120-E329
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Administration Guide  821-2794  C120-E331
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide  821-2797  C120-E332
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual  Varies per release  Varies per release
SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User’s Guide  821-2796  C120-E335
SPARC Enterprise M4000/M5000/M8000/M9000 Servers Capacity on Demand (COD) User’s Guide  821-2795  C120-E336
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Product Notes  Varies per release  Varies per release
SPARC Enterprise M4000/M5000 Servers Product Notes  Varies per release  Varies per release
External I/O Expansion Unit Product Notes  819-5324  C120-E456
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Glossary  821-2800  C120-E514
* All getting started guides are printed documents.
† For XCP version 1100 or later
Text Conventions
This manual uses the following fonts and symbols to express specific types of information.
Fonts/symbols: AaBbCc123
Meaning: What you type, when contrasted with on-screen computer output. This font represents the example of command input in the frame.
Example: XSCF> adduser jsmith

Fonts/symbols: AaBbCc123
Meaning: The names of commands, files, and directories; on-screen computer output. This font represents the example of command output in the frame.
Example:
XSCF> showuser -P
User Name: jsmith
Privileges: useradm
auditadm

Fonts/symbols: Italic
Meaning: Indicates the name of a reference manual.
Example: See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

Fonts/symbols: " "
Meaning: Indicates names of chapters, sections, items, buttons, or menus.
Example: See Chapter 2, "System Features."
Notes on Safety
Read the following documents thoroughly before using or handling any SPARC Enterprise M4000/M5000 server.
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Important Legal and Safety Information
SPARC Enterprise M4000/M5000 Servers Safety and Compliance Guide
Documentation Feedback
If you have any comments or requests regarding this document, go to the following web sites.
For Oracle users:
http://docs.sun.com
For Fujitsu users in U.S.A., Canada, and Mexico:
http://www.computers.us.fujitsu.com/www/support_servers.shtml?support/servers
For Fujitsu users in other countries, refer to this SPARC Enterprise contact:
http://www.fujitsu.com/global/contact/computing/sparce_index.html
CHAPTER 1

Safety and Tools

This chapter describes safety and tools information. The information is organized into the following topics:
Section 1.1, “Safety Precautions” on page 1-1
Section 1.2, “System Precautions” on page 1-2

1.1 Safety Precautions

To protect both yourself and the equipment, observe the following safety precautions.
TABLE 1-1 ESD Precautions
Item: ESD jack/wrist or foot strap
Problem: Electrostatic Discharge (ESD)
Precaution: Connect the ESD connector to your server and wear the wrist strap or foot strap when handling printed circuit boards. There are two antistatic strap attachment points on the chassis:
1. Right side towards the front
2. Left side towards the rear
Item: ESD mat
Problem: ESD
Precaution: An approved ESD mat provides protection from static damage when used with a wrist strap or foot strap. The mat also cushions and protects small parts that are attached to printed circuit boards.
Item: ESD packaging box
Problem: ESD
Precaution: Place the board or component in the ESD safe packaging box after you remove it.
Caution – Attach the cord of the antistatic wrist strap directly to the server. Do not
attach the antistatic wrist strap to the ESD mat connection.
The antistatic wrist strap and any components you remove must be at the same potential.

1.2 System Precautions

For your protection, observe the following safety precautions when servicing your equipment:
Follow all cautions, warnings, and instructions marked on the equipment.
Never push objects of any kind through openings in the equipment, as they might
touch dangerous voltage points or short out components that could result in fire or electric shock.
Refer servicing of equipment to qualified personnel.

1.2.1 Electrical Safety Precautions

Ensure that the voltage and frequency of the power outlet to be used match the electrical rating labels on the equipment.
Wear antistatic wrist straps when handling any magnetic storage devices, system boards, or other printed circuit boards.
Use only properly grounded power outlets as described in the SPARC Enterprise M4000/M5000 Servers Installation Guide.
Caution – Do not make mechanical or electrical modifications. The manufacturer is
not responsible for regulatory compliance of modified servers.

1.2.2 Equipment Rack Safety Precautions

All equipment racks should be anchored to the floor, ceiling, or to adjacent frames, using the manufacturer’s instructions.
Free-standing equipment racks should be supplied with a stabilizer feature, which must be sufficient to support the weight of the server when extended on its slides. This prevents instability during installation or service actions.
Where a stabilizer feature is not supplied and the equipment rack is not bolted to the floor, a safety evaluation must be conducted by the installation or service engineer. The safety evaluation determines stability when the server is extended on its slides, prior to any installation or service activity.
Prior to installing the equipment rack on a raised floor, a safety evaluation must be conducted by the installation or service engineer. The safety evaluation ensures that the raised floor has sufficient strength to withstand the forces upon it when the server is extended on its slides. The normal procedure in this case would be to fix the rack through the raised floor to the concrete floor below, using a proprietary mounting kit for the purpose.
Caution – If more than one server is installed in an equipment rack, service only one
server at a time.

1.2.3 Filler Boards and Filler Panels

Filler boards and filler panels, which are physically inserted into the server when a board or module has been removed, are used for EMI protection and for air flow.

1.2.4 Handling Components

Caution – There is a separate ground located on the rear of the server. It is
important to ensure that the server is properly grounded.
Caution – The server is sensitive to static electricity. To prevent damage to the
board, connect an antistatic wrist strap between you and the server.
Caution – The boards have surface-mount components that can be broken by flexing
the boards.
To minimize the amount of board flexing, observe the following precautions:
Hold the board by the handle and finger hold panels, where the board stiffener is
located. Do not hold the board at the ends.
When removing the board from the packaging, keep the board vertical until you
lay it on the cushioned ESD mat.
Do not place the board on a hard surface. Use a cushioned antistatic mat. The
board connectors and components have very thin pins that bend easily.
Be careful of small component parts located on both sides of the board.
Do not use an oscilloscope probe on the components. The soldered pins are easily
damaged or shorted by the probe point.
Transport the board in its packaging box.
Caution – The heat sinks can be damaged by incorrect handling. Do not touch the
heat sinks while replacing or removing boards. If a heat sink is loose or broken, obtain a replacement board. When storing or shipping a board, ensure that the heat sinks have sufficient protection.
Caution – On the PCI cassette, when removing a cable such as a LAN cable, if your finger cannot reach the latch lock of the connector, press the latch with a flathead screwdriver to remove the cable. Forcing your finger into the clearance can cause damage to the PCI card.
CHAPTER 2

Fault Isolation

This chapter provides an overview of fault isolation and describes fault diagnosis information. The information is organized into the following topics:
Section 2.1, “Determining Which Diagnostics Tools to Use”
Section 2.2, “Checking the Server and System Configuration”
Section 2.3, “Operator Panel”
Section 2.4, “Error Conditions”
Section 2.5, “LED Functions”
Section 2.6, “Using the Diagnostic Commands”
Section 2.7, “Traditional Oracle Solaris Diagnostic Commands”
Section 2.8, “Other Issues”

2.1 Determining Which Diagnostics Tools to Use

When a failure occurs, a message is often displayed on the monitor. Use the flowcharts in FIGURE 2-1 and FIGURE 2-2 to find the correct methods for diagnosing problems.
FIGURE 2-1 Diagnostic Method Flow Chart
FIGURE 2-2 Diagnostic Method Flow Chart—Traditional Data Collection

2.2 Checking the Server and System Configuration

Before and after maintenance work, the state and configuration of the server and components should be checked and the information saved. For recovery from a problem, conditions related to the problem and the repair status must be checked. The operating conditions must remain the same before and after maintenance.
A functioning server without any problems should not display any error conditions. For example:
The syslog file should not display error messages.
The XSCF Shell command showhardconf should not display the * mark.
The administrative console should not display error messages.
The server processor logs should not display any error messages.
The Oracle Solaris Operating System message files should not indicate any additional errors.

2.2.1 Checking the Hardware Configuration and FRU Status

To replace a faulty component and perform maintenance on the server, it is important to check and understand the hardware configuration of the server and the state of each hardware component.
The hardware configuration refers to information that indicates the layer to which each component belongs.
The status of each hardware component refers to information on the condition of the standard or optional component in the server: temperature, power supply voltage, CPU operating conditions, and other items.
The hardware configuration and the status of each hardware component can be checked from the maintenance terminal using eXtended System Control Facility (XSCF) Shell commands, as shown in the following table.
TABLE 2-1 Commands for Checking Hardware Configuration
Command        Description
showhardconf   Displays hardware configuration.
showstatus     Displays the status of a component. This command is used when only a faulty component is checked.
showboards     Displays the status of devices and resources.
showdcl        Displays the hardware resource configuration information of a domain.
showfru        Displays the setting information of a device.
Some conditions can also be checked based on the On or blinking state of the component LEDs (see TABLE 2-3).
2.2.1.1 Checking the Hardware Configuration
Login authority is required to check the hardware configuration. Perform the following procedure from the maintenance terminal:
1. Log in with the account of the XSCF hardware maintenance engineer.
2. Type showhardconf.
XSCF> showhardconf
The showhardconf command prints the hardware configuration information to the screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
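When the showhardconf output has been saved to a file before and after maintenance, the check for the * mark can be automated. The following Python sketch is illustrative only: the function name and the sample text are hypothetical, and it assumes that a problem component is marked by a * at the start of its line. Verify this against the output format described in the XSCF User’s Guide.

```python
def flagged_units(showhardconf_text):
    """Return every line of saved showhardconf output that begins with '*'."""
    flagged = []
    for line in showhardconf_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("*"):
            # Drop the mark itself and keep the rest of the line.
            flagged.append(stripped.lstrip("*").strip())
    return flagged


# Hypothetical sample text, not captured from a real server.
SAMPLE = """\
    MBU_A Status:Normal;
*   MEM#0A Status:Faulted;
    PSU#0 Status:Normal;
"""

if __name__ == "__main__":
    for unit in flagged_units(SAMPLE):
        print("Check:", unit)
```

An empty result corresponds to the condition that showhardconf displays no * mark.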

2.2.2 Checking the Software and Firmware Configuration

The software and firmware configurations and versions affect the operation of the server. To change the configuration or investigate a problem, check the latest information and check for any problems in the software.
Software and firmware vary according to users:
The software configuration and version can be checked in the Oracle Solaris OS. Refer to the Solaris 10 documentation for more information.
The firmware configuration and versions can be checked from the maintenance terminal using XSCF Shell commands. Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
Check the software and firmware configuration information with assistance from the system administrator. However, if you have received login authority from the system administrator, the commands shown in the table can be used from the maintenance terminal for these checks.
TABLE 2-2 Commands for Checking Software and Firmware Configuration
Command           Description
showrev(1M)       System administration command that displays information about system patches.
uname(1)          System administration command that outputs the current system information.
version(8)        XSCF Shell command that outputs the current firmware version information.
showhardconf(8)   XSCF Shell command that indicates information on components mounted on the server.
showstatus(8)     XSCF Shell command that displays the status of a component. This command is used when only a faulty component is to be checked.
showboards(8)     XSCF Shell command that indicates information on the eXtended System Board (XSB). It can indicate information on the XSBs that belong to the specified domain and information on all XSBs mounted. The XSB combines the hardware resources of a physical system board. The SPARC Enterprise servers can generate one (Uni-XSB) or four (Quad-XSB) XSBs from one physical system board.
showdcl(8)        XSCF Shell command that displays the configuration information of a domain (hardware resource information).
showfru(8)        XSCF Shell command that displays the setting information of a device.
2.2.2.1 Checking the Software Configuration
Perform the following procedure from the domain console:
1. Type showrev.
# showrev
The showrev command prints the system configuration information to the screen.
2.2.2.2 Checking the Firmware Configuration
Login authority is required to check the firmware configuration. Perform the following procedure from the maintenance terminal:
1. Log in with the account of the XSCF hardware maintenance engineer.
2. Type the version command. For example, to display the XCP firmware version:
XSCF> version -c xcp
The version(8) command prints the firmware version information to the screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.

2.2.3 Downloading the Error Log Information

To download the error log information, use the XSCF log fetch function. The eXtended System Control Facility unit (XSCFU) has an interface with external units so that a maintenance engineer can easily obtain useful maintenance information such as error logs.
Connect the maintenance terminal, and use the command-line interface (CLI) or browser user interface (BUI) to issue a download instruction to the maintenance terminal to download Error Log information over the XSCF-LAN.

2.3 Operator Panel

When no network connection is available, the operator panel is used to start or stop the server. The operator panel has three LED status indicators, a Power switch, and a security keyswitch. The panel is located on the front of the server, in the upper right.
When the server is running, the Power and XSCF STANDBY LEDs (green) should be lit and the CHECK LED (amber) should not be lit. If the CHECK LED is lit, search the system logs to determine what is wrong.
The three LED status indicators on the operator panel provide the following:
General system status
System problem alerts
Location of the system fault
FIGURE 2-3 and FIGURE 2-4 show the operator panel.
FIGURE 2-3 M4000 Server Operator Panel
Location Number Component
1 POWER LED
2 XSCF STANDBY LED
3 CHECK LED
4 Power switch
5 Mode switch (keyswitch)
6 Antistatic ground socket
FIGURE 2-4 M5000 Server Operator Panel
Location Number Component
1 POWER LED
2 XSCF STANDBY LED
3 CHECK LED
4 Power switch
5 Mode switch (keyswitch)
6 Antistatic ground socket
Additional LEDs are located in various locations in the server. For more information about LED indicator locations, see Section 2.5, “LED Functions” on page 2-18.
The Operator panel LEDs operate as described in TABLE 2-3.
TABLE 2-3 Operator Panel LEDs and Switches
Name                      Color   Description
POWER LED                 Green   Indicates the server power status.
                                  • On: Server has power.
                                  • Off: Server is without power.
                                  • Blinking: The power-off sequence is in progress.
XSCF STANDBY LED          Green   Indicates the readiness of the XSCF.
                                  • On: XSCF unit is functioning normally.
                                  • Off: XSCF unit is stopped.
                                  • Blinking: Under system initialization after server power-on, or under system power-on process.
CHECK LED                 Amber   Indicates that the server detected a fault.
                                  • On: Error detected that disables the startup.
                                  • Off: Normal, or server power-off (power failure).
                                  • Blinking: Indicates the position of the fault.
Power switch                      Switch to direct server power on/power off.
Mode switch (keyswitch)           The Locked setting:
                                  • Normal key position. Power on is available with the Power switch, but power off is not.
                                  • Disables the Power switch to prevent unauthorized users from powering the server on or off.
                                  • The Locked position is the recommended setting for normal day-to-day operations.
                                  The Service setting:
                                  • Service should be provided at this position.
                                  • Power on and off is available with the Power switch.
                                  • The key cannot be pulled out at this position.
The state displayed by LED combination is described in TABLE 2-4.
TABLE 2-4 State Display by LED Combination (Operator Panel)
POWER      XSCF STANDBY   CHECK   Description of the state
Off        Off            Off     The circuit breaker is switched off.
Off        Off            On      The circuit breaker is switched on.
Off        Blinking       Off     The XSCF is being initialized.
Off        Blinking       On      An error occurred in the XSCF.
Off        On             Off     The XSCF is on standby. The system is waiting for power-on of the air conditioning system.
On         On             Off     Warm-up standby processing is in progress (power-on is delayed). The power-on sequence is in progress. The system is in operation.
Blinking   On             Off     The power-off sequence is in progress. Fan termination is being delayed.
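The LED combinations in TABLE 2-4 can be treated as a simple lookup. The following Python sketch is illustrative only: the function and variable names are hypothetical, and the grouping of descriptions per row reflects one reading of TABLE 2-4 rather than an XSCF interface.

```python
# Keys are (POWER, XSCF STANDBY, CHECK) states, as read from TABLE 2-4.
LED_STATES = {
    ("off", "off", "off"): "The circuit breaker is switched off.",
    ("off", "off", "on"): "The circuit breaker is switched on.",
    ("off", "blinking", "off"): "The XSCF is being initialized.",
    ("off", "blinking", "on"): "An error occurred in the XSCF.",
    ("off", "on", "off"): "The XSCF is on standby.",
    ("on", "on", "off"): "Warm-up standby, power-on sequence, or system in operation.",
    ("blinking", "on", "off"): "The power-off sequence is in progress.",
}


def describe_panel(power, xscf_standby, check):
    """Return the TABLE 2-4 description, or None for an unlisted combination."""
    key = tuple(state.strip().lower() for state in (power, xscf_standby, check))
    return LED_STATES.get(key)


if __name__ == "__main__":
    print(describe_panel("Off", "Blinking", "On"))
```

A result of None means that the observed combination is not listed in the table and the system logs should be consulted instead.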
The operator panel mode switch is used to set the operation mode. The operator panel power switch is used to power on and off the server.
TABLE 2-6 lists the settings and corresponding functions of the mode switch on the operator panel.
TABLE 2-5 Switches (Operator Panel)
Name           Description of Function
Mode switch    Used to set an operation mode for the server. Insert the special key that is under the customer’s control to switch between modes.
  Locked       Normal operation mode.
               The system can be powered on with the power switch, but it cannot be powered off with the power switch.
               The key can be pulled out at this key position.
  Service      Mode for maintenance.
               The system can only be powered on and off with the power switch.
               The key cannot be pulled out at this key position.
               Maintenance is performed in Service mode while the server is stopped.
               Because remote power control and automatic power control of the server are disabled in Service mode, unintentional power on can be prevented.
Power switch   Used to control the server power. Power on and power off are controlled by pressing this switch in different patterns, as described below.
  Holding down for a short time (less than 4 seconds)
               Regardless of the mode switch state, the server (all domains) is powered on.
               At this time, processing for waiting for facility (air conditioners) power on and warm-up completion is skipped.
  Holding down for a long time in Service mode (4 seconds or longer)
               If power to the server is on (at least one domain is operating), shutdown processing is executed for all domains before the system is powered off.
               If the system is being powered on, the power-on processing is cancelled, and the system is powered off.
               If the system is being powered off, the operation of the Power switch is ignored, and the power-off processing is continued.
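The Power switch behavior in TABLE 2-5 can be summarized as a small decision function. The following Python sketch is illustrative only: the function name and the state labels ("on", "powering_on", "powering_off") are hypothetical, and any case that the table does not describe, such as a long press in Locked mode, is returned as not described rather than guessed.

```python
NOT_DESCRIBED = "not described in TABLE 2-5"


def power_switch_action(hold_seconds, mode, system_state):
    """Map a Power switch press to the behavior listed in TABLE 2-5.

    mode is "locked" or "service"; system_state is a hypothetical label:
    "on", "powering_on", or "powering_off".
    """
    if hold_seconds < 4:
        # Short press: applies regardless of the mode switch state.
        return "power on all domains; facility wait and warm-up are skipped"
    if mode.lower() != "service":
        return NOT_DESCRIBED
    # Long press (4 seconds or longer) in Service mode.
    if system_state == "on":
        return "shut down all domains, then power off"
    if system_state == "powering_on":
        return "cancel power-on, then power off"
    if system_state == "powering_off":
        return "press ignored; power-off continues"
    return NOT_DESCRIBED


if __name__ == "__main__":
    print(power_switch_action(5, "service", "on"))
```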
TABLE 2-6 Meanings of the Mode Switch
Function                               Mode Switch
State Definition                       Locked                                                                                                    Service
Inhibition of Break Signal Reception   Enabled. Reception of the break signal can be enabled or disabled for each domain using setdomainmode.   Disabled
Power On/Off by power switch           Only power on is enabled                                                                                  Enabled

2.4 Error Conditions

Always access the following web site first to interpret faults and obtain information on FMA messages.
http://www.sun.com/msg
This web site can be used in the event of an Oracle Solaris or domain failure, or to look up specific FMA error messages. It does not provide details on XSCF errors.
The web site directs you to provide the message ID that your software displayed. The web site then provides knowledge articles about the fault and corrective action to resolve the fault. The fault information and documentation at this web site is updated regularly.
Predictive self-healing is an architecture and methodology for automatically diagnosing, reporting, and handling software and hardware fault conditions. This new technology lessens the time required to debug a hardware or software problem and provides the administrator and technical support with detailed data about each fault.

2.4.1 Predictive Self-Healing Tools

In the Solaris 10 software, the fault manager runs in the background. If a failure occurs, the system software recognizes the error and attempts to determine what hardware is faulty. The software also takes steps to prevent that component from being used until it has been replaced. Some of the specific actions the software takes include:
Receives telemetry information about problems detected by the system software.
Diagnoses the problems.
Initiates proactive self-healing activities. For example, the fault manager can disable faulty components.
When possible, causes the faulty FRU to provide an LED indication of a fault in addition to populating the system console messages with more details.
TABLE 2-7 shows a typical message generated when a fault occurs. The message
appears on your console and is recorded in the /var/adm/messages file.
Note – The message in TABLE 2-7 indicates that the fault has already been diagnosed.
Any corrective action that the system can perform has already taken place. If your server is still running, it continues to run.
TABLE 2-7 Predictive Self-Healing Message
Output Displayed
Nov 1 16:30:20 dt88-292 EVENT-TIME: Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM: SUNW,A70, CSN: -, HOSTNAME: dt88-292
Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE: One or more device instances may be disabled.
Nov 1 16:30:20 dt88-292 IMPACT: Loss of services provided by the device instances associated with this fault.
Nov 1 16:30:20 dt88-292 REC-ACTION: Schedule a repair procedure to replace the affected device. Use
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support.
Description
EVENT-TIME: The time stamp of the diagnosis.
PLATFORM: A description of the server encountering the problem.
SOURCE: Information on the Diagnosis Engine used to determine the fault.
EVENT-ID: The Universally Unique event ID for this fault.
DESC: A basic description of the failure.
WEB SITE: Where to find specific information and actions for this fault.
AUTO-RESPONSE: What, if anything, the system did to alleviate any follow-on issues.
IMPACT: A description of what that response might have done.
REC-ACTION: A short description of what the system administrator should do.
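Because the message is also recorded in the /var/adm/messages file, its fields can be extracted with a short script. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it assumes each line carries the "month day time hostname" prefix shown in TABLE 2-7, with unlabeled lines continuing the previous field.

```python
import re

FIELDS = ("EVENT-TIME", "PLATFORM", "SOURCE", "EVENT-ID", "DESC",
          "AUTO-RESPONSE", "IMPACT", "REC-ACTION")
# Syslog prefix such as "Nov 1 16:30:20 dt88-292 ".
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s*")
FIELD = re.compile(r"^(%s):\s*(.*)$" % "|".join(FIELDS))


def parse_fault_message(lines):
    """Collect the labeled fields of one predictive self-healing message."""
    fields = {}
    current = None
    for raw in lines:
        text = PREFIX.sub("", raw.strip())
        match = FIELD.match(text)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None and text:
            # Unlabeled line: continuation of the previous field.
            fields[current] = (fields[current] + " " + text).strip()
    return fields


SAMPLE = [
    "Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13",
    "Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4",
    "Nov 1 16:30:20 dt88-292 DESC:",
    "Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem",
]

if __name__ == "__main__":
    print(parse_fault_message(SAMPLE)["EVENT-ID"])
```

The extracted EVENT-ID is the value that the REC-ACTION text says to pass to fmdump -v -u.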

2.4.2 Monitoring Output

To understand error conditions, collect the monitoring output information. To collect the information, use the commands shown in TABLE 2-8.
TABLE 2-8 Commands for Checking the Monitoring Output
Command       Operand   Description
showlogs(8)   console   Displays the console message log of a domain.
              monitor   Logs messages that are displayed in the message window.
              panic     Logs output to the console during a panic.
              ipl       Collects console data generated from the power on of a domain to the completion of the operating system start.

2.4.3 Messaging Output

To understand error conditions, collect the messaging output information. To collect the information, use the commands shown in TABLE 2-9.
TABLE 2-9 Commands for Checking the Messaging Output
Command                 Operand   Description
showlogs                env       Displays the temperature history log. The environmental temperature data and power status are indicated in 10-minute intervals. The data is stored for a maximum of six months.
                        power     Displays the power and reset information.
                        event     Displays information reported to the operating system and stored as event logs.
                        error     Displays error logs.
fmdump(1M), fmdump(8)             Displays fault management architecture diagnostic results and errors. It is provided as an Oracle Solaris command and an XSCF Shell command.
Each error message logged by the predictive self-healing architecture has a code associated with it as well as a web address that can be followed to get the most up-to-date course of action for dealing with that error.
Refer to the Oracle Solaris 10 documentation for more information on predictive self-healing.

2.5 LED Functions

LED lights help the user find the component and provide information on the state of the component.
This section explains the LEDs of each component that are to be checked when a component is replaced. Most components are equipped with LEDs that help indicate which component has the error and an LED to indicate whether the component can be removed.
Some components, such as DIMMs, do not have LEDs. The state of a component without LEDs can be checked using the showhardconf and ioxadm XSCF Shell commands from the maintenance terminal. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
TABLE 2-10 describes the LEDs and their functions.
TABLE 2-10 Component LEDs
LED Name        Display and Meaning
READY (green)   Indicates whether the component is operating.
  On            Indicates that the component is operating. The component cannot be disconnected and removed from the server while the READY LED is On.
  Blinking      Indicates that the component is being configured (or disconnected). For an XSCF unit it indicates that it is being initialized.
  Off           Indicates that the component is stopped. The component can be disconnected and replaced.
CHECK (amber)   Indicates that the component contains an error or that the component is a target for replacement.
  On            Indicates that an error has been detected.
  Blinking      Indicates that the component is ready to be replaced. The blinking LED acts as a locator.
  Off           Indicates no known error exists.
TABLE 2-11 describes the components and their LEDs.
TABLE 2-11 Component LED Descriptions
Component                                 LED Type     LED Display        Meaning
XSCF unit                                 ACTIVE       On (green)         Indicates that the XSCF unit is active.
                                                       Off                Indicates that the XSCF unit is on standby.
XSCF unit and IO (display part for LAN)   ACTIVE       On (green)         Indicates that communication is being performed through the LAN port.
                                                       Off                Indicates that no communication is being performed through the LAN port.
                                          LINK SPEED   On (amber)         Indicates that the communication speed for the LAN port is 1 Gbps.
                                                       On (green)         Indicates that the communication speed for the LAN port is 100 Mbps.
                                                       Off                Indicates that the communication speed for the LAN port is 10 Mbps.
PCI slot                                  POWER        On (green)         Indicates that the power to the PCI slot is turned on. The PCI card cannot be removed.
                                                       Off                Indicates that the power to the PCI slot is turned off. The PCI card can be removed.
                                          ATTENTION    On (amber)         Indicates that an error occurred in the PCI slot.
                                                       Blinking (amber)   Indicates that the card in this PCI slot is a target device for replacement.
                                                       Off                Indicates the normal state of the PCI slot.
Power supply unit (PSU)                   READY        On (green)         Indicates that the power is turned on and being supplied.
                                                       Blinking (green)   Indicates that the power is being supplied to the power supply unit, but the power supply unit is not turned on.
                                                       Off                Indicates that power is not being supplied to the power supply unit.
                                          CHECK        On (amber)         Indicates that an error occurred in the power supply unit.
                                                       Off                Indicates the normal state of the power supply unit.
                                          LED_AC       On (green)         Power supply unit has AC applied and is supplying 12V.
                                                       Off                Indicates that AC is out of the specified operating range and 12V is not being supplied from the power supply unit.
                                          LED_DC       On (green)         Power supply unit has AC applied and is supplying 48V. Standby pinhole provides a manual backup to turn off 48V power.
                                                       Off                Indicates that 48V is not being supplied from the power supply unit.
Fan                                       ATTENTION    On (amber)         Indicates that an error occurred.
                                                       Blinking (amber)   Indicates that the fan is a target device for replacement.

2.6 Using the Diagnostic Commands

After the message in TABLE 2-7 is displayed, you might want more information about the fault. For complete information about troubleshooting commands, refer to the Oracle Solaris 10 man pages or the XSCF Shell man pages. This section describes some details of the following commands:
showlogs
fmdump
fmadm
fmstat

2.6.1 Using the showlogs Command

The showlogs command displays the contents of a specified log in order of time stamp starting with the oldest date. The showlogs command displays the following logs:
error log
power log
event log
temperature and humidity record
monitoring message log
console message log
panic message log
IPL message log
The following example shows showlogs output.
XSCF> showlogs error
Date: Oct 03 17:23:11 UTC 2006     Code: 80002000-ccff0000-0104340100000000
    Status: Alarm                  Occurred: Oct 03 17:23:10.868 UTC 2006
    FRU: /FAN_A#0
    Msg: Abnormal FAN rotation speed. Insufficient rotation
XSCF>
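When error log entries have been saved to a file, the labeled fields of one entry can be separated for reporting. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes the six labels that appear in the example above (Date, Code, Status, Occurred, FRU, Msg).

```python
import re

KEYS = ("Date", "Code", "Status", "Occurred", "FRU", "Msg")
# Split wherever one of the known labels is followed by a colon.
LABEL = re.compile(r"\b(%s):\s*" % "|".join(KEYS))


def parse_error_entry(entry):
    """Split one showlogs error entry into a {label: value} dictionary."""
    parts = LABEL.split(entry)
    # parts is [text before first label, label, value, label, value, ...].
    return {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}


SAMPLE = (
    "Date: Oct 03 17:23:11 UTC 2006 Code: 80002000-ccff0000-0104340100000000 "
    "Status: Alarm Occurred: Oct 03 17:23:10.868 UTC 2006 FRU: /FAN_A#0 "
    "Msg: Abnormal FAN rotation speed. Insufficient rotation"
)

if __name__ == "__main__":
    entry = parse_error_entry(SAMPLE)
    print(entry["FRU"], "-", entry["Msg"])
```

The same function works whether the entry is on one line or wrapped across several, because it keys on the labels rather than on line breaks.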

2.6.2 Using the fmdump Command

The fmdump command can be used to display the contents of any log files associated with the Oracle Solaris fault manager.
The fmdump command produces output similar to EXAMPLE 2-1. This example assumes there is only one fault.
EXAMPLE 2-1 fmdump Output
# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
2.6.2.1 fmdump -V Command
You can obtain more detail by using the -V option.
# fmdump -V -u 0ee65618-2218-4997-c0dc-b5c410ed8ec2
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
At least three lines of new output are delivered to the user with the -V option.
The first line is a summary of information you have seen before in the console message but includes the time stamp, the UUID, and the Message-ID.
The second line is a declaration of the certainty of the diagnosis. In this case, the diagnosis is 100 percent certain that the failure is in the ASIC described. If the diagnosis might involve multiple components, you might see two lines here with 50% in each (for example).
The FRU line declares the part that needs to be replaced to return the server to a fully operational state.
The rsrc line describes which component was taken out of service as a result of this fault.
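The certainty, FRU, and rsrc lines described above can be pulled out of saved fmdump output with a short script. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes the line layout shown in the example above (a "percent fault-class" line, then lines beginning with FRU: and rsrc:).

```python
import re

# A certainty line such as "100% fault.io.fire.asic".
SUSPECT = re.compile(r"^(\d+)%\s+(\S+)$")


def parse_fmdump_detail(text):
    """Collect suspects, FRUs, and resources from saved fmdump detail output."""
    result = {"suspects": [], "fru": [], "rsrc": []}
    for line in text.splitlines():
        line = line.strip()
        match = SUSPECT.match(line)
        if match:
            result["suspects"].append((int(match.group(1)), match.group(2)))
        elif line.startswith("FRU:"):
            result["fru"].append(line[len("FRU:"):].strip())
        elif line.startswith("rsrc:"):
            result["rsrc"].append(line[len("rsrc:"):].strip())
    return result


SAMPLE = """\
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
"""

if __name__ == "__main__":
    detail = parse_fmdump_detail(SAMPLE)
    for certainty, fault_class in detail["suspects"]:
        print(certainty, fault_class)
```

Lists are used for each field because a diagnosis that involves multiple components can produce more than one certainty line.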
2.6.2.2 fmdump -e Command
To get information about the errors that caused this failure, you can use the -e option, as shown in the following example.
XSCF> fmdump -e
TIME                 CLASS
Oct 03 13:52:48.9532 ereport.fm.fmd.module
Oct 03 13:52:48.9610 ereport.fm.fmd.module
Oct 03 13:52:48.9674 ereport.fm.fmd.module
Oct 03 13:52:48.9738 ereport.fm.fmd.module

2.6.3 Using the fmadm faulty Command

The fmadm faulty command can be used by administrators and service personnel to view and modify system configuration parameters that are maintained by the Oracle Solaris fault manager. The command is primarily used to determine the status of a component involved in a fault, as shown in the following example.
# fmadm faulty
STATE    RESOURCE / UUID
-------- ------------------------------------------------------------
degraded dev:////pci@1e,600000
         0ee65618-2218-4997-c0dc-b5c410ed8ec2
# fmadm repair 0ee65618-2218-4997-c0dc-b5c410ed8ec2
The PCI device is degraded and is associated with the same UUID as seen above. You might also see “faulted” states.
2.6.3.1 fmadm repair Command
If fmadm faulty reports a fault, replace the faulty FRU (CPU, memory, or I/O unit), and then execute the fmadm repair command to clear the FRU information on the domain. If the fmadm repair command is not executed, error messages continue to be output.
When fmadm faulty reports a fault, the FMA resource cache on the OS side can be cleared without problems; the data in it need not match the hardware failure information retained on the XSCF side.
# fmadm repair
STATE    RESOURCE / UUID
-------- ------------------------------------------------------------
degraded dev:////pci@1e,600000
         0ee65618-2218-4997-c0dc-b5c410ed8ec2
2.6.3.2 fmadm config Command
The fmadm config command output shows you the version numbers of the diagnosis engines in use by your server, as well as their current state. You can check these versions against information on the My Oracle Support web site to determine if you are running the latest diagnostic engines, as shown in the following example.
XSCF> fmadm config
MODULE               VERSION  STATUS  DESCRIPTION
eft                  1.16     active  eft diagnosis engine
event-transport      2.0      active  Event Transport Module
faultevent-post      1.0      active  Gate Reaction Agent for errhandd
fmd-self-diagnosis   1.0      active  Fault Manager Self-Diagnosis
iox_agent            1.0      active  IO Box Recovery Agent
reagent              1.1      active  Reissue Agent
sysevent-transport   1.0      active  SysEvent Transport Agent
syslog-msgs          1.0      active  Syslog Messaging Agent
XSCF>

2.6.4 Using the fmstat Command

The fmstat command can report statistics associated with the Oracle Solaris fault manager. The fmstat command shows information about DE performance. In the example below, the fmd-self-diagnosis DE (also seen in the console output) has received an event which it accepted. A case is “opened” for that event and a diagnosis is performed to “solve” the cause for the failure. See the following example.
XSCF> fmstat
module              ev_recv  ev_acpt  wait   svc_t  %w  %b  open  solve  memsz  bufsz
eft                       0        0   0.0     0.0   0   0     0      0   3.3M      0
event-transport           0        0   0.0     0.0   0   0     0      0   6.4K      0
faultevent-post           2        0   0.0     8.9   0   0     0      0      0      0
fmd-self-diagnosis       24       24   0.0   352.1   0   0     1      0    24b      0
iox_agent                 0        0   0.0     0.0   0   0     0      0      0      0
reagent                   0        0   0.0     0.0   0   0     0      0      0      0
sysevent-transport        0        0   0.0  8700.4   0   0     0      0      0      0
syslog-msgs               0        0   0.0     0.0   0   0     0      0    97b      0
XSCF>
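To spot modules that still have open cases, saved fmstat output can be filtered on the open column. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes the whitespace-separated column layout shown in the example above, with a header line that begins with "module".

```python
def modules_with_open_cases(fmstat_text):
    """Return {module: open case count} for rows whose open column is > 0."""
    header = None
    result = {}
    for line in fmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "module":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # prompt lines or anything that is not a data row
        row = dict(zip(header, fields))
        if int(row["open"]) > 0:
            result[row["module"]] = int(row["open"])
    return result


SAMPLE = """\
XSCF> fmstat
module             ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
eft                      0       0  0.0   0.0  0  0    0     0  3.3M     0
fmd-self-diagnosis      24      24  0.0 352.1  0  0    1     0   24b     0
syslog-msgs              0       0  0.0   0.0  0  0    0     0   97b     0
XSCF>
"""

if __name__ == "__main__":
    print(modules_with_open_cases(SAMPLE))
```

For the rows shown, only fmd-self-diagnosis is reported, which matches the case that the text describes as opened for the received event.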

2.7 Traditional Oracle Solaris Diagnostic Commands

These superuser commands can help you determine if you have issues in your workstation, in the network, or within another server that you are networking with.
The following commands are described in this section:
“Using the iostat Command”
“Using the prtdiag Command”
“Using the prtconf Command”
“Using the netstat Command”
“Using the ping Command”
“Using the ps Command”
“Using the prstat Command”
Most of these commands are located in the /usr/bin or /usr/sbin directories.
Note – For additional details, options, examples, and the most up to date
information for each command refer to that command’s man page.

2.7.1 Using the iostat Command

The iostat command iteratively reports terminal, drive, and tape I/O activity, as well as CPU utilization.
2.7.1.1 Options
TABLE 2-12 describes options for the iostat command and how those options can help troubleshoot the server.
TABLE 2-12 Options for iostat
Option      Description                                                                                                                  How It Can Help
No option   Reports status of local I/O devices.                                                                                         A quick three-line output of device status.
-c          Reports the percentage of time the system has spent in user mode, in system mode, waiting for I/O, and idling.               Quick report of CPU status.
-e          Displays device error summary statistics. The total errors, hard errors, soft errors, and transport errors are displayed.    Provides a short table with accumulated errors. Identifies suspect I/O devices.
-E          Displays all device error statistics.                                                                                        Provides information about devices: manufacturer, model number, serial number, size, and errors.
-n          Displays names in descriptive format.                                                                                        Descriptive format helps identify devices.
-x          For each drive, reports extended drive statistics. The output is in tabular form.                                            Similar to the -e option, but provides rate information. This helps identify poor performance of internal devices and other I/O devices across the network.
The following example shows output for one iostat command.
# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104T9D
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104V3V
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
#
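To identify suspect I/O devices from saved iostat -En output, the per-device error counters can be scanned for nonzero values. The following Python sketch is illustrative only: the function name is hypothetical, the second sample is invented to show a nonzero counter, and the pattern assumes the "Soft Errors / Hard Errors / Transport Errors" layout shown in the example above.

```python
import re

# Device name followed by its three error counters.
ERRORS = re.compile(
    r"(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)


def devices_with_errors(iostat_text):
    """Return {device: counters} for devices with any nonzero error counter."""
    flagged = {}
    for match in ERRORS.finditer(iostat_text):
        soft, hard, transport = (int(match.group(i)) for i in (2, 3, 4))
        if soft or hard or transport:
            flagged[match.group(1)] = {
                "soft": soft, "hard": hard, "transport": transport,
            }
    return flagged


# Counters as shown in the example output: all zero.
CLEAN = "c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0\n"
# Hypothetical line with a nonzero counter, for illustration only.
SUSPECT = "c0t1d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 0\n"

if __name__ == "__main__":
    print(devices_with_errors(CLEAN + SUSPECT))
```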

2.7.2 Using the prtdiag Command

The prtdiag command displays configuration and diagnostic information. The diagnostic information identifies any failed component.
The prtdiag command is located in the /usr/platform/platform-name/sbin/ directory.
Note – The prtdiag command might indicate a slot number different than that
identified elsewhere in this document. This is normal.
2.7.2.1 Options
TABLE 2-13 describes options for the prtdiag command and how those options can help troubleshooting.
TABLE 2-13 Options for prtdiag
Option      Description                                                                                                                          How It Can Help
No option   Lists components.                                                                                                                    Identifies CPU timing and PCI cards installed.
-v          Verbose mode. Displays the time of the most recent AC power failure and the most recent hardware fatal error information.           Provides the same information as no option. Additionally lists fan status, temperatures, ASIC, and PROM revisions.
The following example shows output for the prtdiag command in verbose mode.
# prtdiag -v
System Configuration: xxxx Server
System clock frequency: 1012 MHz
Memory size: 262144 Megabytes

==================================== CPUs ====================================

     CPU   CPU                                       Run   L2$   CPU    CPU
LSB  Chip  ID                                        MHz   MB    Impl.  Mask
---  ----  ----------------------------------------  ----  ----  -----  ----
00   0     0, 1, 2, 3, 4, 5, 6, 7                    2660  11.0  7      192
00   1     8, 9, 10, 11, 12, 13, 14, 15              2660  11.0  7      192
00   2     16, 17, 18, 19, 20, 21, 22, 23            2660  11.0  7      192
00   3     24, 25, 26, 27, 28, 29, 30, 31            2660  11.0  7      192
01   0     32, 33, 34, 35, 36, 37, 38, 39            2660  11.0  7      192
01   1     40, 41, 42, 43, 44, 45, 46, 47            2660  11.0  7      192
01   2     48, 49, 50, 51, 52, 53, 54, 55            2660  11.0  7      192
01   3     56, 57, 58, 59, 60, 61, 62, 63            2660  11.0  7      192

============================ Memory Configuration ============================

     Memory  Available           Memory   DIMM    # of   Mirror   Interleave
LSB  Group   Size                Status   Size    DIMMs  Mode     Factor
---  ------  ------------------  -------  ------  -----  -------  ----------
00   A       65536MB             okay     4096MB  16     no       8-way
00   B       65536MB             okay     4096MB  16     no       8-way
01   A       65536MB             okay     4096MB  16     no       8-way
01   B       65536MB             okay     4096MB  16     no       8-way

========================= IO Devices =========================

     IO                                          Lane/Frq
LSB  Type  LPID  RvID,DvID,VnID  BDF      State  Act, Max  Name                   Model
     Logical Path
     ------------
00   PCIe  0     bc, 8532, 10b5  2, 0, 0  okay   8, 8      pci-pciex10b5,8532     NA
     /pci@0,600000/pci@0
00   PCIe  0     bc, 8532, 10b5  3, 8, 0  okay   8, 8      pci-pciex10b5,8532     NA
     /pci@0,600000/pci@0/pci@8
00   PCIe  0     bc, 8532, 10b5  3, 9, 0  okay   1, 8      pci-pciex10b5,8532     NA
     /pci@0,600000/pci@0/pci@9
00   PCIx  0     8, 125, 1033    4, 0, 0  okay   100, 133  pci-pciexclass,060400  NA
     /pci@0,600000/pci@0/pci@8/pci@0
00   PCIx  0     8, 125, 1033    4, 0, 1  okay   --, 133   pci-pciexclass,060400  NA
     /pci@0,600000/pci@0/pci@8/pci@0,1
00   PCIx  0     2, 50, 1000     5, 1, 0  okay   --, 133   scsi-pci1000,50        LSI,1064
     /pci@0,600000/pci@0/pci@8/pci@0/scsi@1
00   PCIx  0     10, 1648, 14e4  5, 2, 0  okay   --, 133   network-pci14e4,1648   NA
     /pci@0,600000/pci@0/pci@8/pci@0/network@2
00   PCIx  0     10, 1648, 14e4  5, 2, 1  okay   --, 133   network-pci14e4,1648   NA
     /pci@0,600000/pci@0/pci@8/pci@0/network@2,1
01   PCIe  16    bc, 8532, 10b5  2, 0, 0  okay   8, 8      pci-pciex10b5,8532     NA
     /pci@10,600000/pci@0
01   PCIe  16    bc, 8532, 10b5  3, 8, 0  okay   8, 8      pci-pciex10b5,8532     NA
     /pci@10,600000/pci@0/pci@8
01   PCIe  16    bc, 8532, 10b5  3, 9, 0  okay   1, 8      pci-pciex10b5,8532     NA
     /pci@10,600000/pci@0/pci@9
01   PCIx  16    8, 125, 1033    4, 0, 0  okay   100, 133  pci-pciexclass,060400  NA
     /pci@10,600000/pci@0/pci@8/pci@0
01   PCIx  16    8, 125, 1033    4, 0, 1  okay   --, 133   pci-pciexclass,060400  NA
     /pci@10,600000/pci@0/pci@8/pci@0,1
01   PCIx  16    2, 50, 1000     5, 1, 0  okay   --, 133   scsi-pci1000,50        LSI,1064
     /pci@10,600000/pci@0/pci@8/pci@0/scsi@1
01   PCIx  16    10, 1648, 14e4  5, 2, 0  okay   --, 133   network-pci14e4,1648   NA
     /pci@10,600000/pci@0/pci@8/pci@0/network@2
01   PCIx  16    10, 1648, 14e4  5, 2, 1  okay   --, 133   network-pci14e4,1648   NA
     /pci@10,600000/pci@0/pci@8/pci@0/network@2,1

==================== Hardware Revisions ====================

System PROM revisions:
----------------------
OBP 4.24.13 2010/02/08 13:17

=================== Environmental Status ===================

Mode switch is in LOCK mode

=================== System Processor Mode ===================

SPARC64-VII mode
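On a server with many memory groups, scanning the Status column by eye is error-prone. The following is a minimal sketch, not part of the documented procedure, that reads prtdiag -v output on standard input and reports any memory group whose Status is not okay. The function name mem_not_okay is hypothetical, and the sketch assumes the Memory Configuration layout shown in the example (one group per line, Status in the fourth column).

```shell
#!/bin/sh
# Hypothetical helper: report memory groups whose Status column is not "okay".
# Assumes the Memory Configuration layout shown in the prtdiag -v example.
mem_not_okay() {
    awk '
        /Memory Configuration/ { in_mem = 1; next }   # section banner found
        in_mem && /^=+ /       { in_mem = 0 }         # next banner ends the section
        in_mem && $1 ~ /^[0-9][0-9]$/ && $4 != "okay" {
            print "LSB " $1 " group " $2 ": " $4
        }
    '
}

# Typical use on the server:
#   prtdiag -v | mem_not_okay
```

No output means every memory group in the listing reported okay.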

2.7.3 Using the prtconf Command

Similar to the show-devs command run at the ok prompt, the prtconf command displays the devices that are configured.
The prtconf command identifies hardware that is recognized by the Oracle Solaris OS. If the hardware itself is not suspected of being faulty but software applications are having trouble with it, the prtconf command can indicate whether the Oracle Solaris software recognizes the hardware and whether a driver for the hardware is loaded.
2.7.3.1 Options
TABLE 2-14 describes options for the prtconf command and how those options can
help troubleshooting.
TABLE 2-14 Options for prtconf
Option | Description | How It Can Help
No option | Displays the device tree of devices recognized by the OS. | If a hardware device is recognized, then it is probably functioning properly. If the message “(driver not attached)” is displayed for the device or for a sub-device, then the driver for the device is corrupt or missing.
-D | Similar to the output of no option, however the device driver is listed. | Lists the driver needed or used by the OS to enable the device.
-p | Similar to the output of no option, yet is abbreviated. | Reports a brief list of the devices.
-V | Displays the version and date of the OpenBoot PROM firmware. | Provides a quick check of firmware version.
The following example shows output for the prtconf command.
# prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 8064 Megabytes
System Peripherals (Software Nodes):

SUNW,SPARC-Enterprise
scsi_vhci, instance #0
packages (driver not attached)
SUNW,builtin-drivers (driver not attached)
deblocker (driver not attached)
disk-label (driver not attached)
terminal-emulator (driver not attached)
obp-tftp (driver not attached)
ufs-file-system (driver not attached)
chosen (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
pseudo-console, instance #0
nvram (driver not attached)
pseudo-mc, instance #0
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
pci, instance #0
ebus, instance #0
flashprom (driver not attached)
serial, instance #0
scfc, instance #0
panel, instance #0
pci, instance #0
pci, instance #0
pci, instance #1
pci, instance #3
scsi, instance #0
tape (driver not attached)
disk (driver not attached)
sd, instance #0 (driver not attached)
sd, instance #2
sd, instance #4
network, instance #0
network, instance #1 (driver not attached)
pci, instance #4
network, instance #0 (driver not attached)
pci, instance #2
SUNW,qlc, instance #0
fp (driver not attached)
disk (driver not attached)
fp, instance #2
SUNW,qlc, instance #1
fp (driver not attached)
disk (driver not attached)
fp, instance #0
pci, instance #1
pci, instance #15
pci, instance #16
pci, instance #25
pci, instance #31
pci, instance #32
pci, instance #33
pci, instance #18
pci, instance #29
pci, instance #34
pci, instance #35
pci, instance #36
pci, instance #2
pci, instance #5
pci, instance #6
pci, instance #7
pci, instance #8
pci, instance #9
pci, instance #10
pci, instance #11
pci, instance #12
pci, instance #13
pci, instance #14
pci, instance #3
os-io (driver not attached)
iscsi, instance #0
pseudo, instance #0
#
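Because the device tree can run to well over a hundred nodes, it can help to extract only the nodes that prtconf marks as (driver not attached). The following is a minimal sketch; the function name unattached_nodes is hypothetical, and it assumes each node occupies one line of prtconf output. The sample output shows many such nodes (for example, packages and chosen), so treat the resulting list as a starting point for investigation rather than as a list of faults.

```shell
#!/bin/sh
# Hypothetical helper: list device nodes that prtconf reports without a driver.
# Reads prtconf output on stdin and prints the node text with the marker removed.
unattached_nodes() {
    sed -n 's/^[[:space:]]*\(.*\) (driver not attached)$/\1/p'
}

# Typical use on the server:
#   prtconf | unattached_nodes
```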

2.7.4 Using the netstat Command

The netstat command displays the network status.
2.7.4.1 Options
TABLE 2-15 describes options for the netstat command and how those options can
help troubleshooting.
TABLE 2-15 Options for netstat
Option | Description | How It Can Help
-i | Displays the interface state, including packets in/out, error in/out, collisions, and queue. | Provides a quick overview of the network status.
-i interval | Providing a trailing number with the -i option repeats the netstat command every interval seconds. | Identifies intermittent or long duration network events. By piping netstat output to a file, overnight activity can be viewed all at once.
-p | Displays the media table. | Provides MAC address for hosts on the subnet.
-r | Displays the routing table. | Provides routing information.
-n | Replaces host names with IP addresses. | Used when an address is more useful than a host name.
The following example shows output for the netstat -p command.
# netstat -p
Net to Media Table: IPv4
Device IP Address           Mask            Flags    Phys Addr
------ -------------------- --------------- -------- ---------------
bge0   san-ff1-14-a         255.255.255.255 o        00:14:4f:3a:93:61
bge0   san-ff2-40-a         255.255.255.255 o        00:14:4f:3a:93:85
sppp0  224.0.0.22           255.255.255.255
bge0   san-ff2-42-a         255.255.255.255 o        00:14:4f:3a:93:af
bge0   san09-lab-r01-66     255.255.255.255 o        00:e0:52:ec:1a:00
sppp0  192.168.1.1          255.255.255.255
bge0   san-ff2-9-b          255.255.255.255 o        00:03:ba:dc:af:2a
bge0   bizzaro              255.255.255.255 o        00:03:ba:11:b3:c1
bge0   san-ff2-9-a          255.255.255.255 o        00:03:ba:dc:af:29
bge0   racerx-b             255.255.255.255 o        00:0b:5d:dc:08:b0
bge0   224.0.0.0            240.0.0.0       SM       01:00:5e:00:00:00
#
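When netstat -i interval output is captured to a file for overnight review, the samples carry no time of day. The following is a minimal sketch of one way to add it: a filter that prefixes every line it reads with the current date and time. The function name stamp_lines, the 60-second interval, and the log path are illustrative assumptions, not values from this manual.

```shell
#!/bin/sh
# Hypothetical helper: prefix each input line with a timestamp.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$line"
    done
}

# Typical use (interval and file name are examples only):
#   netstat -i 60 | stamp_lines >> /var/tmp/netstat-i.log
```

The same filter can be placed after ping -s or any other command that prints one sample per line.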

2.7.5 Using the ping Command

The ping command sends ICMP ECHO_REQUEST packets to network hosts. Depending on how the ping command is configured, the output displayed can identify troublesome network links or nodes. The destination host is specified in the variable hostname.
2.7.5.1 Options
TABLE 2-16 describes options for the ping command and how those options can help
troubleshooting.
TABLE 2-16 Options for ping
Option | Description | How It Can Help
hostname | The probe packet is sent to hostname and returned. | Verifies that a host is active on the network.
-g hostname | Forces the probe packet to route through a specified gateway. | By identifying different routes to the target host, those individual routes can be tested for quality.
-i interface | Designates which interface to send and receive the probe packet through. | Enables a simple check of secondary network interfaces.
-n | Replaces host names with IP addresses. | Used when an address is more beneficial than a host name.
-s | Pings continuously in one-second intervals. Ctrl-C aborts. Upon abort, statistics are displayed. | Helps identify intermittent or long-duration network events. By piping ping output to a file, overnight activity can later be viewed all at once.
-svR | Displays the route the probe packet followed in one-second intervals. | Indicates probe packet route and number of hops. Comparing multiple routes can identify bottlenecks.
The following example shows output for the ping -s command.
# ping -s san-ff2-17-a
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
^C
----san-ff2-17-a PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.172/0.256/0.427/0.102
#
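If ping -s output has been saved to a file, the packet-loss figure in the closing statistics is usually the first thing to check. The following is a minimal sketch that extracts that percentage; the function name ping_loss and the log path are hypothetical, and the pattern assumes the statistics line is worded as in the example.

```shell
#!/bin/sh
# Hypothetical helper: print the packet-loss percentage from ping -s statistics.
# Reads saved ping output on stdin; prints nothing if no statistics line is found.
ping_loss() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Typical use (file name is an example only):
#   ping_loss < /var/tmp/ping-s.log
```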

2.7.6 Using the ps Command

The ps command lists the status of processes. Using options and rearranging the command output can help determine how system resources are being consumed.
2.7.6.1 Options
TABLE 2-17 describes options for the ps command and how those options can help
troubleshooting.
TABLE 2-17 Options for ps
Option | Description | How It Can Help
-e | Displays information for every process. | Identifies the process ID and the executable.
-f | Generates a full listing. | Provides the following process information: user ID, parent process ID, time when executed, and the path to the executable.
-o option | Enables configurable output. The pid, pcpu, pmem, and comm options display process ID, percent CPU consumption, percent memory consumption, and the responsible executable, respectively. | Provides only the most important information. Knowing the percentage of resource consumption helps identify processes that are affecting performance and might be hung.
The following example shows output for the ps command.
# ps
   PID TTY         TIME CMD
101042 pts/3       0:00 ps
101025 pts/3       0:00 sh
#
Note – When using sort with the -r option, the column headings are printed so
that the value in the first column is equal to zero.
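The -o option and sort can be combined to rank processes by resource consumption. The following is a minimal sketch: it sorts ps output numerically on the first column in descending order and keeps the top entries. The function name top_consumers and the default of five lines are assumptions for illustration. As the note describes, the heading line is treated as zero and therefore sorts to the bottom.

```shell
#!/bin/sh
# Hypothetical helper: keep the N lines with the largest value in column 1.
# Reads ps -o output on stdin; N defaults to 5.
top_consumers() {
    # Numeric, descending sort on the first field; the non-numeric
    # heading line counts as zero and sorts last.
    LC_ALL=C sort -k1,1nr | head -n "${1:-5}"
}

# Typical use: the ten processes using the most CPU
#   ps -e -o pcpu,pid,comm | top_consumers 10
```

Putting pmem first in the -o list ranks by memory consumption instead.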

2.7.7 Using the prstat Command

The prstat utility iteratively examines all active processes and reports statistics based on the selected output mode and sort order. The prstat command provides output similar to the ps command.
2.7.7.1 Options
TABLE 2-18 describes options for the prstat command and how those options can
help troubleshooting.
TABLE 2-18 Options for prstat
Option | Description | How It Can Help
No option | Displays a sorted list of the top processes that are consuming the most CPU resources. List is limited to the height of the terminal window and the total number of processes. Output is automatically updated every five seconds. Ctrl-C aborts. | Output identifies process ID, user ID, memory used, state, CPU consumption, and command name.
-n number | Limits output to number of lines. | Limits amount of data displayed and identifies primary resource consumers.
-s key | Permits sorting list by key parameter. | Useful keys are cpu (default), time, and size.
-v | Verbose mode. | Displays additional parameters.
The following example shows output for the prstat command.
# prstat -n 5 -s size
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
100463 root       66M   61M sleep   59    0   0:01:03 0.0% fmd/19
100006 root       11M 9392K sleep   59    0   0:00:09 0.0% svc.configd/16
100004 root       10M 8832K sleep   59    0   0:00:04 0.0% svc.startd/14
100061 root     9440K 6624K sleep   59    0   0:00:01 0.0% snmpd/1
100132 root     8616K 5368K sleep   59    0   0:00:04 0.0% nscd/35
Total: 52 processes, 188 lwps, load averages: 0.00, 0.00, 0.00
#

2.8 Other Issues

2.8.1 Can’t Locate Boot Device

When the PCI-X card in slot 0 is faulty or is not seated properly, the firmware blacklists the entire PCI-X bridge device (and everything attached downstream from it), causing the boot disk to disappear. As a result, the showdisk command fails to display the boot disk and the bootdisk command displays the console message “Can’t locate boot device”.
When this occurs, remove the PCI/PCI-X card in slot 0 to see whether the boot issue is remedied. If the IO unit is fully stocked and it is not possible to remove the PCI/PCI-X card, attempt to place another card in slot 0, if possible. If this also is not possible, remove and reinstall the existing card in slot 0.
CHAPTER
3

Periodic Maintenance

This chapter describes the periodic maintenance required to keep the server running regardless of whether a problem has occurred. The information is organized into the following topic:
Section 3.1, “Tape Drive Unit” on page 3-1

3.1 Tape Drive Unit

It might be necessary to use a cleaning tape when carrying out the cleaning procedure.
Note – Contact your sales representative for tape drive unit options on M4000 and
M5000 servers.

3.1.1 Cleaning the Tape Drive Unit

To prevent the "Clean Lamp" from illuminating prematurely, follow these maintenance rules:
Clean your tape drive unit once every 5 to 24 hours of continuous use, or once a week.
Clean your tape drive unit once a month, even if it is not in use.
Clean your tape drive unit whenever the "Clean Lamp" indicator is lit or blinking.
Clean your tape drive unit before inserting a new data cassette.
Replace the cleaning cassette when the tape inside the cassette has completely wound onto the right-hand spool or when the three lamps are in the following states: "Off", "Lit", and "Blinking".
Remove the cassette before turning the power "OFF". The tape life might be shortened or a malfunction might occur during the backup process if the power is turned "OFF" while the cassette is still inside.
Note – If the "Clean Lamp" starts blinking immediately after completion of a cleaning operation, the data cassette might have been damaged. In this case, replace the data cassette.
CHAPTER
4

FRU Replacement Preparation

This chapter describes how to prepare a field-replaceable unit (FRU) for safe replacement. The information is organized into the following topics:
Section 4.1, “FRU Replacement Method” on page 4-1
Section 4.2, “Active Replacement” on page 4-4
Section 4.3, “Hot Replacement” on page 4-6
Section 4.4, “Cold Replacement (Powering the Server Off and On)” on page 4-12

4.1 FRU Replacement Method

There are three basic methods for replacing the FRUs:
Active replacement – To replace a FRU while the domain to which the FRU belongs continues running. Active replacement requires that the FRU be inactivated or powered down using either an XSCF command or an Oracle Solaris OS command. Because the power supply unit (PSU) and fan unit (FAN) do not belong to any domain, they are operated by using XSCF commands, regardless of the operating state of the Oracle Solaris OS.
Note – The procedure for isolating the hard disk drive from the Oracle Solaris OS varies depending on whether disk mirroring software or other support software is used. For details, see the relevant software manuals.
Hot replacement – To replace a FRU while the domains are powered off. Depending on the FRU to be replaced, the FRU can either be directly replaced or be inactivated or powered down using an XSCF command.
Cold replacement – To replace a FRU while all domains are stopped and the server is powered off and unplugged.
TABLE 4-1 lists the FRUs, location and access, and the replacement method.
TABLE 4-1 FRU Replacement Information
FRU | FRU Location/Access | Removal Method(s)
PCI cassette (PCIe) | Rear | Active replacement (cfgadm), Hot replacement, Cold replacement
Hard disk drive (HDD) | Front | Active replacement (cfgadm), Hot replacement, Cold replacement
Power supply unit (PSU) | Front | Active replacement* (replacefru), Hot replacement (replacefru), Cold replacement
172-mm fans (FAN_A) | Top | Active replacement† (replacefru), Hot replacement (replacefru), Cold replacement
60-mm fans (FAN_B) | Top | Active replacement† (replacefru), Hot replacement (replacefru), Cold replacement
Tape drive unit (TAPEU) | Front | Active replacement, Hot replacement, Cold replacement
I/O unit (IOU) | Rear | Cold replacement
I/O unit DC-DC Converter | Rear | Cold replacement
I/O unit DDC Riser (DDCR) | Rear | Cold replacement
CD-RW/DVD-RW Drive Unit (DVDU) | Front | Hot replacement, Cold replacement
Backplane unit (BPU_A, BPU_B) | Top | Cold replacement
CPU module (CPUM_A) | Top | Cold replacement
Memory board (MEMB) | Top | Cold replacement
Motherboard (M4000) (MBU_A) | Rear | Cold replacement
Motherboard DC-DC Converter (M4000) (DDC_A, DDC_B) | Rear | Cold replacement
Motherboard (M5000) (MBU_B) | Top | Cold replacement
Motherboard DC-DC Converter (M5000) (DDC_A, DDC_B) | Top | Cold replacement
eXtended System Control facility unit (XSCFU) | Rear | Cold replacement
Hard disk drive backplane (HDDBP) | Top | Cold replacement
CD-RW/DVD-RW backplane | Top | Cold replacement
Tape drive backplane (TAPEBP) | Top | Cold replacement
Operator panel (OPNL) | Top | Cold replacement
* When using active replacement for a PSU, only one power supply unit should be replaced at a time to ensure redundancy.
† When using active replacement for a 172-mm or 60-mm fan unit, only one fan unit should be replaced at a time to ensure redundancy.
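For scripted pre-checks, the replacement methods in TABLE 4-1 can be encoded as a small lookup. The following is a minimal sketch, assuming the FRU abbreviations from the table are used as keys; the function name fru_methods and the output words active, hot, and cold are hypothetical conventions, not XSCF or Oracle Solaris commands.

```shell
#!/bin/sh
# Hypothetical lookup built from TABLE 4-1: print the replacement methods
# listed for a FRU abbreviation. Returns nonzero for an unknown abbreviation.
fru_methods() {
    case "$1" in
        PCIe|HDD|TAPEU)
            echo "active hot cold" ;;
        PSU|FAN_A|FAN_B)
            # Active and hot replacement of these units use replacefru.
            echo "active hot cold" ;;
        DVDU)
            echo "hot cold" ;;
        IOU|DDCR|BPU_A|BPU_B|CPUM_A|MEMB|MBU_A|MBU_B|DDC_A|DDC_B|XSCFU|HDDBP|TAPEBP|OPNL)
            echo "cold" ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}

# Example: fru_methods XSCFU   prints "cold"
```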

4.2 Active Replacement

In active replacement the Oracle Solaris OS must be configured to allow the component to be replaced. Active replacement has four stages:
Section 4.2.1, “Removing a FRU From a Domain” on page 4-4
Section 4.2.2, “Removing and Replacing a FRU” on page 4-5
Section 4.2.3, “Adding a FRU Into a Domain” on page 4-5
Section 4.2.4, “Verifying Hardware Operation” on page 4-6
Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be used if the boot disk can be isolated from the Oracle Solaris OS by disk mirroring software and other software.

4.2.1 Removing a FRU From a Domain

Note – Before you remove a PCI cassette, make sure that there is no I/O activity on
the card in the cassette.
1. From the Oracle Solaris prompt, type the cfgadm command to get the component status.
# cfgadm -a
Ap_Id           Type         Receptacle   Occupant     Condition
iou#0-pci#0     etherne/hp   connected    configured   ok
iou#0-pci#1     fibre/hp     connected    configured   ok
iou#0-pci#2     pci-pci/hp   connected    configured   ok
Ap_Id includes the IOU number (iou#0 or iou#1) and the PCI cassette slot number (pci#1, pci#2, pci#3, pci#4).
Caution – If you use the PCI Hot Plug (PHP) function on servers running Oracle Solaris 10 9/10, or with patch 142909-17 or later applied, enable the hotplug service as follows:
# svcadm enable hotplug
2. Type the cfgadm command to disconnect the component from the domain:
# cfgadm -c unconfigure Ap_Id
Note – For a PCI cassette, type the cfgadm -c disconnect command to disconnect the component from the domain.
The Ap_Id is shown in the output of cfgadm, iou#0-pci#0 for example.
3. Type the cfgadm command to confirm the component is now disconnected.
# cfgadm -a
Ap_Id           Type         Receptacle     Occupant       Condition
iou#0-pci#0     etherne/hp   disconnected   unconfigured   unknown
iou#0-pci#1     fibre/hp     connected      configured     ok
iou#0-pci#2     pci-pci/hp   connected      configured     ok
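The confirmation in step 3 can also be checked from a script. The following is a minimal sketch that reads cfgadm -a output on standard input and prints the Receptacle, Occupant, and Condition columns for one Ap_Id. The function name ap_state is hypothetical, and the sketch assumes the five-column layout shown in the examples.

```shell
#!/bin/sh
# Hypothetical helper: print "receptacle occupant condition" for one Ap_Id.
# Reads cfgadm -a output on stdin; prints nothing if the Ap_Id is not listed.
ap_state() {
    awk -v id="$1" '$1 == id { print $3, $4, $5 }'
}

# Typical use on the server:
#   cfgadm -a | ap_state 'iou#0-pci#0'
```

After a successful disconnect of a PCI cassette, the sample output suggests the result would read disconnected unconfigured unknown.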

4.2.2 Removing and Replacing a FRU

Once the FRU has been removed from the domain, see Section 4.3.1, “Removing and Replacing a FRU” on page 4-7.

4.2.3 Adding a FRU Into a Domain

1. From the Oracle Solaris prompt, type the cfgadm command to connect the component to the domain.
# cfgadm -c configure Ap_Id
The Ap_Id is shown in the output of cfgadm, iou#0-pci#0 for example.
2. Type the cfgadm command to confirm the component is now connected.
# cfgadm -a
Ap_Id           Type         Receptacle   Occupant     Condition
iou#0-pci#0     etherne/hp   connected    configured   ok
iou#0-pci#1     fibre/hp     connected    configured   ok
iou#0-pci#2     pci-pci/hp   connected    configured   ok

4.2.4 Verifying Hardware Operation

Verify the state of the status LEDs.
The POWER LED should be On and the CHECK LED should not be On.
Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be used if the boot disk can be isolated from the Oracle Solaris OS by disk mirroring software and other software.

4.3 Hot Replacement

In hot replacement the Oracle Solaris OS does not need to be configured to allow the component to be replaced. Depending on the FRU to be replaced, the FRU can either be directly replaced or be inactivated or powered down using an XSCF command.

4.3.1 Removing and Replacing a FRU

1. From the XSCF Shell prompt, type the replacefru command.
EXAMPLE 4-1 replacefru command
XSCF> replacefru
----------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.

1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
----------------------------------------------------------------
Select [1,2|c:cancel] :1

----------------------------------------------------------------
Maintenance/Replacement Menu
Please select a FAN to be replaced.

No. FRU             Status
--- --------------- ------------------
1.  FAN_A#0         Faulted
2.  FAN_A#1         Normal
3.  FAN_A#2         Normal
4.  FAN_A#3         Normal
----------------------------------------------------------------
Select [1-4|b:back] :1

You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r

Please confirm the CHECK LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
The replacefru command automatically tests the status of the component after the removal and replacement are finished.
EXAMPLE 4-2 replacefru command status
Diagnostic tests of FAN_A#0 is started.
[This operation may take up to 2 minute(s)]
(progress scale reported in seconds)
0..... 30..... 60..... 90.....done

----------------------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced unit.

FRU           Status
------------- --------
FAN_A#0       Normal
----------------------------------------------------------------
The replacement of FAN_A#0 has completed, normally.[f:finish] :f

----------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.

1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
----------------------------------------------------------------
Select [1,2|c:cancel] : C
XSCF>
Note – The display may vary depending on the XCP version.
When the tests are complete the program returns to the original menu. Select cancel to return to the XSCF Shell prompt.
Refer to the replacefru man page for more information.

4.3.2 Verifying Hardware Operation

1. Type the showhardconf command to confirm the new component is installed.
EXAMPLE 4-3 showhardconf
XSCF> showhardconf
SPARC Enterprise M5000;
    + Serial:BCF07500B6; Operator_Panel_Switch:Locked;
    + Power_Supply_System:Dual; SCF-ID:XSCF#0;
    + System_Power:On; System_Phase:Cabinet Power On;
    Domain#0 Domain_Status:Initialization Phase;
    Domain#1 Domain_Status:Initialization Phase;

    MBU_B Status:Normal; Ver:0201h; Serial:BC07490823 ;
        + FRU-Part-Number:CF00541-0478 05 /541-0478-05 ;
        + Memory_Size:64 GB;
        CPUM#0-CHIP#0 Status:Normal; Ver:0501h; Serial:PP0723016Q ;
            + FRU-Part-Number:CA06761-D204 A0 /LGA-JUPP-01 ;
            + Freq:2.530 GHz; Type:32;
            + Core:4; Strand:2;
        :
        CPUM#3-CHIP#1 Status:Normal; Ver:0501h; Serial:PP074804E9 ;
            + FRU-Part-Number:CA06761-D204 A0 /LGA-JUPP-01 ;
            + Freq:2.530 GHz; Type:32;
            + Core:4; Strand:2;
        MEMB#0 Status:Normal; Ver:0101h; Serial:BF09061G0E ;
            + FRU-Part-Number:CF00541-0545 06 /541-0545-06 ;
            MEM#0A Status:Normal;
                + Code:c1000000000000005372T128000HR3.7A 356d-0d016912;
                + Type:1A; Size:1 GB;
            :
            MEM#3B Status:Normal;
                + Code:c1000000000000004572T128000HR3.7A 252b-04123424;
                + Type:1A; Size:1 GB;
        :
        MEMB#7 Status:Normal; Ver:0101h; Serial:BF09061GBA ;
            + FRU-Part-Number:CF00541-0545 06 /541-0545-06 ;
            MEM#0A Status:Normal;
                + Code:2cffffffffffffff0818HTF12872Y-53EB3 0300-d504600c;
                + Type:1A; Size:1 GB;
            :
            MEM#3B Status:Normal;
                + Code:7f7ffe00000000004aEBE10RD4AGFA-5C-E 3020-2229c19c;
                + Type:1A; Size:1 GB;
This sample shows the showhardconf output continued.
EXAMPLE 4-4 showhardconf
        DDC_A#0 Status:Normal;
        DDC_A#1 Status:Normal;
        DDC_A#2 Status:Normal;
        DDC_A#3 Status:Normal;
        DDC_B#0 Status:Normal;
        DDC_B#1 Status:Normal;
    IOU#0 Status:Normal; Ver:0101h; Serial:BF07486TEU ;
        + FRU-Part-Number:CF00541-2240 02 /541-2240-02 ;
        + Type 1;
        DDC_A#0 Status:Normal;
        DDCR Status:Normal;
            DDC_B#0 Status:Normal;
    IOU#1 Status:Normal; Ver:0101h; Serial:BF073226HP ;
        + FRU-Part-Number:CF00541-4361 01 /541-4361-01 ;
        + Type 1;
        DDC_A#0 Status:Normal;
        DDCR Status:Normal;
            DDC_B#0 Status:Normal;
    XSCFU Status:Normal,Active; Ver:0101h; Serial:BF07435D98 ;
        + FRU-Part-Number:CF00541-0481 04 /541-0481-04 ;
    OPNL Status:Normal; Ver:0101h; Serial:BF0747690D ;
        + FRU-Part-Number:CF00541-0850 06 /541-0850-06 ;
    PSU#0 Status:Normal; Serial:0017527-0738063762;
        + FRU-Part-Number:CF00300-1898 0350 /300-1898-03-50;
        + Power_Status:Off; AC:200 V;
    PSU#3 Status:Normal; Serial:0017527-0738063767;
        + FRU-Part-Number:CF00300-1898 0350 /300-1898-03-50;
        + Power_Status:Input fail; AC: - ;
    FANBP_C Status:Normal; Ver:0501h; Serial:FF2#24 ;
        + FRU-Part-Number:CF00541-3099 01 /541-3099-01 ;
        FAN_A#0 Status:Normal;
        FAN_A#1 Status:Normal;
        FAN_A#2 Status:Normal;
        FAN_A#3 Status:Normal;
Refer to the showhardconf man page for more information.
2. Type the showhardconf -u command to display the number of FRUs in each unit.
EXAMPLE 4-5 showhardconf -u
XSCF> showhardconf -u
SPARC Enterprise M5000; Memory_Size:64 GB;
+-----------------------------------+------------+
|                FRU                |  Quantity  |
+-----------------------------------+------------+
| MBU_B                             |      1     |
|     CPUM                          |      4     |
|         Freq:2.530 GHz;           |   (  8)    |
|     MEMB                          |      8     |
|         MEM                       |     64     |
|             Type:1A; Size:1 GB;   |   ( 64)    |
|     DDC_A                         |      4     |
|     DDC_B                         |      2     |
| IOU                               |      2     |
|     DDC_A                         |      2     |
|     DDCR                          |      2     |
|         DDC_B                     |      2     |
| XSCFU                             |      1     |
| OPNL                              |      1     |
| PSU                               |      4     |
| FANBP_C                           |      1     |
|     FAN_A                         |      4     |
+-----------------------------------+------------+
Refer to the showhardconf man page for more information.
3. Verify the state of the status LEDs on the FRU.
Refer to TABLE 2-11 for LED status.
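When showhardconf output has been captured from the XSCF Shell session to a file on a workstation, components that are not in the Normal state can be picked out with a short filter. The following is a minimal sketch under that assumption; the function name fru_not_normal and the file name are hypothetical. It relies on the layout shown in the examples, in which each component line begins with the component name followed by a Status: field. Domain_Status and Power_Status attributes are deliberately ignored.

```shell
#!/bin/sh
# Hypothetical helper: list components whose Status is not Normal.
# Reads saved showhardconf output on stdin.
fru_not_normal() {
    awk '$2 ~ /^Status:/ && $2 !~ /^Status:Normal/ {
        status = $2
        sub(/^Status:/, "", status)   # drop the field label
        sub(/;$/, "", status)         # drop the trailing semicolon
        print $1, status
    }'
}

# Typical use (file name is an example only):
#   fru_not_normal < showhardconf.txt
```

No output means every component line in the capture reported Normal.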

4.4 Cold Replacement (Powering the Server Off and On)

In cold replacement all business operations are stopped. Cold replacement is the act of powering off the server and disconnecting input power. This is normally required for safety when the inside of the server is accessed.
Note – The input power cables are used to ground the server. If the server is not mounted in a rack, use a grounding strap to ground the server.
Note – After a complete chassis power cycle (all power cords removed), wait at least 30 seconds before reconnecting the power cords to the chassis.

4.4.1 Powering the Server Off Using Software

1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Log in to the XSCF Shell and type the poweroff command.
XSCF> poweroff -a
The following actions occur when the poweroff command is used:
The Oracle Solaris OS shuts down cleanly.
The server powers off to Standby mode (the XSCF unit and one fan will still have
power).
Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for details.
4. Verify the state of the status LED on the XSCF.
The POWER LED should be off.
5. Disconnect all power cables from the input power source.
Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.

4.4.2 Powering the Server On Using Software

1. Make sure that the server has enough power supply units to run the desired
configuration.
2. Connect all power cables to the input power source.
3. Make sure the XSCF STANDBY LED on the operator panel is On.
4. Turn the keyswitch on the operator panel to the desired mode position (Locked
or Service).
5. Log in to the XSCF Shell and type the poweron command.
XSCF> poweron -a
Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for details.
6. After a delay the following activities occur:
The operator panel POWER LED lights.
The system executes the power-on self-test (POST).
Then, the server is completely powered on.
Note – If Oracle Solaris automatic booting is set, use the sendbreak -d domain_id command to get the ok prompt. Type the command after the console banner is displayed but before the system starts booting the operating system.

4.4.3 Powering the Server Off Manually

1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Place the keyswitch in the Service position.
4. Press and hold the Power switch on the operator panel for four seconds or
longer to initiate the power off.
5. Verify that the POWER LED on the operator panel is off.
6. Disconnect all power cables from the input power source.
Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.

4.4.4 Powering the Server On Manually

1. Make sure that the server has enough power supply units to run the desired
configuration.
2. Connect all power cables to the input power source.
3. Make sure the XSCF STANDBY LED is On.
4. Turn the keyswitch on the operator panel to the desired mode position (Locked
or Service).
5. Press the Power switch on the operator panel.
After a delay the following activities occur:
The operator panel POWER LED lights.
The system executes the power-on self-test (POST).
Then, the server is completely powered on.
Note – If Oracle Solaris automatic booting is set, use the sendbreak -d domain_id command to get the ok prompt. Type the command after the console banner is displayed but before the system starts booting the operating system.

4.4.5 Verifying Hardware Operation

1. From the ok prompt, press the Enter key, and then press the “#.” (number sign and period) keys to switch from the domain console to the XSCF console.
2. Type the showhardconf command to confirm the new component is installed.
EXAMPLE 4-6 showhardconf
XSCF> showhardconf
SPARC Enterprise 5000;
    + Serial:BE80601000; Operator_Panel_Switch:Service;
    + Power_Supply_System:Single; SCF-ID:XSCF#0;
    + System_Power:On;
    Domain#0 Domain_Status:Powered Off;
    MBU_B Status:Normal; Ver:0101h; Serial:78670002978: ;
        + FRU-Part-Number:CF00541-0478 01 /541-0478-01 ;
        + Memory_Size:64 GB;
        CPUM#0-CHIP#0 Status:Normal; Ver:0201h; Serial:PP0629L068 ;
            + FRU-Part-Number:CF00375-3477 50 /375-3477-50 ;
            + Freq:2.150 GHz; Type:16;
            + Core:2; Strand:2;
        CPUM#0-CHIP#1 Status:Normal; Ver:0201h; Serial:PP0629L068 ;
            + FRU-Part-Number:CF00375-3477 50 /375-3477-500 ;
            + Freq:2.150 GHz; Type:16;
            + Core:2; Strand:2;
        MEMB#0 Status:Normal; Ver:0101h; Serial:01068: ;
            + FRU-Part-Number:CF00541-0545 01 /541-0545-01 ;
            MEM#0A Status:Normal;
                + Code:c1000000000000004572T128000HR3.7A 252b-04123520;
                + Type:1B; Size:1 GB;
            MEM#0B Status:Normal;
                + Code:c1000000000000004572T128000HR3.7A 252b-04123e25;
                + Type:1B; Size:1 GB;
            MEM#1A Status:Normal;
                + Code:c1000000000000004572T128000HR3.7A 252b-04123722;
                + Type:1B; Size:1 GB;
            MEM#1B Status:Normal;
                + Code:c1000000000000004572T128000HR3.7A 252b-04123b25;
                + Type:1B; Size:1 GB;
            MEM#2A Status:Normal;
                + Code:c1000000000000004572T128000HR3.7A 252b-04123e20;
                + Type:1B; Size:1 GB;
            MEM#2B Status:Normal;
                + Code:c1000000000000004572T128000HR3.7A 252b-04123822;
                + Type:1B; Size:1 GB;
This sample shows the showhardconf output continued.
EXAMPLE 4-7 showhardconf
        DDC_A#0 Status:Normal;
        DDC_A#1 Status:Normal;
        DDC_A#2 Status:Normal;
        DDC_A#3 Status:Normal;
        DDC_B#0 Status:Normal;
        DDC_B#1 Status:Normal;
    IOU#0 Status:Normal; Ver:0101h; Serial:7867000395 ;
        + FRU-Part-Number:CF00541-0493 01 /541-0493-01 ;
        DDC_A#0 Status:Normal;
        DDCR Status:Normal;
            DDC_B#0 Status:Normal;
    XSCFU Status:Normal,Active; Ver:0101h; Serial:78670002628 ;
        + FRU-Part-Number:CF00541-0481 01 /541-0481-01 ;
    OPNL Status:Normal; Ver:0101h; Serial:78670000878 ;
        + FRU-Part-Number:CF00541-0850 01 /541-0850-01 ;
    PSU#0 Status:Normal; Serial:XF0345;3
        + FRU-Part-Number:CF00300-1898 50 /300-1898-50;
        + Power_Status:Off; AC:200 V;
    PSU#1 Status:Normal; Serial:XF0346;
        + FRU-Part-Number:CF00300-1898 50 /300-1898-50;
        + Power_Status:Off; AC:200 V;
    PSU#2 Status:Normal; Serial:XF03470;
        + FRU-Part-Number:CF00300-1898 50 /300-1898-50;
        + Power_Status:Off; AC:200 V;
    PSU#3 Status:Normal; Serial:XF0348;
        + FRU-Part-Number:CF00300-1898 50 /300-1898-50;
        + Power_Status:Off; AC:200 V;
    FANBP_C Status:Normal; Ver:0101h; Serial:7867000053 ;
        + FRU-Part-Number:CF00541-0848 01 /541-0848-01 ;
        FAN_A#0 Status:Normal;
        FAN_A#1 Status:Normal;
        FAN_A#2 Status:Normal;
        FAN_A#3 Status:Normal;
XSCF>
Refer to the showhardconf man page for more information.
3. Type the console command to switch from the XSCF console to the ok prompt
(domain console) again:
XSCF> console -d 0
4. From the ok prompt, type the show-devs command to ensure all PCI cards are
mounted:
EXAMPLE 4-8 show-devs
ok show-devs
/pci@41,700000
/pci@40,600000
/pci@48,4000
/cmp@480,0
/pseudo-mc@240,200
/nvram
/pseudo-console
/virtual-memory
/memory@m0
/aliases
/options
/openprom
/chosen
/packages
/pci@40,600000/pci@0
/pci@40,600000/pci@0/pci@9
/pci@40,600000/pci@0/pci@8
/pci@40,600000/pci@0/pci@8/pci@0,1
/pci@40,600000/pci@0/pci@8/pci@0
/pci@40,600000/pci@0/pci@8/pci@0,1/ethernet@1
/pci@40,600000/pci@0/pci@8/pci@0/network@2,1
/pci@40,600000/pci@0/pci@8/pci@0/network@2
/pci@40,600000/pci@0/pci@8/pci@0/scsi@1
/pci@40,600000/pci@0/pci@8/pci@0/scsi@1/disk
/pci@40,600000/pci@0/pci@8/pci@0/scsi@1/tape
/pci@48,4000/ebus@1
/pci@48,4000/ebus@1/panel@14,280030
/pci@48,4000/ebus@1/scfc@14,200000
/pci@48,4000/ebus@1/serial@14,400000
/pci@48,4000/ebus@1/flashprom@10,0
/cmp@480,0/core@1
/cmp@480,0/core@0
/cmp@480,0/core@1/cpu@1
/cmp@480,0/core@1/cpu@0
/cmp@480,0/core@0/cpu@1
/cmp@480,0/core@0/cpu@0
/openprom/client-services
/packages/obp-tftp
/packages/terminal-emulator
/packages/disk-label
/packages/deblocker
/packages/SUNW,builtin-drivers
ok
5. Type the probe-scsi-all command to confirm that the storage devices are
mounted.
EXAMPLE 4-9 probe-scsi-all
ok probe-scsi-all
/pci@0,600000/pci@0/pci@8/pci@0/scsi@1
MPT Version 1.05, Firmware Version 1.07.00.00
Target 0
  Unit 0   Disk   SEAGATE ST973401LSUN72G 0556   143374738 Blocks, 73 GB
  SASAddress 5000c5000092beb9   PhyNum 0
Target 1
  Unit 0   Disk   SEAGATE ST973401LSUN72G 0556   143374738 Blocks, 73 GB
  SASAddress 5000c500002eeaf9   PhyNum 1
Target 3
  Unit 0   Removable Read Only device   TSSTcorpCD/DVDW TS-L532USR01
  SATA device   PhyNum 3
ok
6. Type the boot command to start the operating system.
ok boot
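When the console session is logged to a file, the checks in steps 2 and 5 can be partly automated. The following Python sketch is one possible approach, not a supported tool. It assumes that the showhardconf and probe-scsi-all output has been captured as plain text in the layout shown in the examples above. The function names are hypothetical, and the value Degraded in the sample text is a placeholder chosen only to demonstrate a non-Normal status.

```python
import re

# Matches "<component> Status:<value>;" as printed by showhardconf.
# "Domain_Status:" is not matched because "Status:" must follow whitespace.
STATUS_RE = re.compile(r"(\S+)\s+Status:([^;]+);")

# Matches "Target <n> Unit <m> <first word of the device type>".
TARGET_RE = re.compile(r"Target\s+(\d+)\s+Unit\s+\d+\s+(\w+)")


def abnormal_components(showhardconf_text: str) -> list[tuple[str, str]]:
    """List (component, status) pairs whose status does not begin with Normal."""
    results = []
    for name, status in STATUS_RE.findall(showhardconf_text):
        primary = status.split(",")[0].strip()  # "Normal,Active" -> "Normal"
        if primary != "Normal":
            results.append((name, status.strip()))
    return results


def scsi_targets(probe_text: str) -> list[tuple[int, str]]:
    """List (target number, device kind) pairs found in probe-scsi-all output."""
    return [(int(num), kind) for num, kind in TARGET_RE.findall(probe_text)]


# Sample text: the PSU#0 status is a made-up placeholder for illustration.
SAMPLE_HARDCONF = """\
Domain#0 Domain_Status:Powered Off;
MBU_B Status:Normal; Ver:0101h;
XSCFU Status:Normal,Active; Ver:0101h;
PSU#0 Status:Degraded; Serial:XF0345;
"""

SAMPLE_PROBE = """\
Target 0 Unit 0 Disk SEAGATE ST973401LSUN72G 0556
Target 1 Unit 0 Disk SEAGATE ST973401LSUN72G 0556
Target 3 Unit 0 Removable Read Only device TSSTcorpCD/DVDW TS-L532USR01
"""

if __name__ == "__main__":
    print(abnormal_components(SAMPLE_HARDCONF))
    print(scsi_targets(SAMPLE_PROBE))
```

An empty result from abnormal_components suggests that every listed component reports Normal. It does not confirm that the replaced component appears in the listing, so the output should still be read against the expected configuration.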
CHAPTER 5

Internal Components Access

This chapter describes how to access the internal components. The information is organized into the following topics:
Section 5.1, “Sliding the Server In and Out to the Fan Stop” on page 5-1
Section 5.2, “Top Cover Remove and Replace” on page 5-5
Section 5.3, “Fan Cover Remove and Replace” on page 5-8

5.1 Sliding the Server In and Out to the Fan Stop

The slide rails have two designated lock points. The first, the fan stop, is for easy access to the fan units. The fan units are hot, active, or cold replacement components. When using active replacement, only one fan unit should be replaced at a time to ensure redundancy.

5.1.1 Sliding the Server Out of the Equipment Rack

Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 Server to the front,
release the cable tie holding the PCI cables on the rear of the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Deploy the rack’s antitilt features (if applicable).
Refer to the manual that shipped with the rack for details on antitilt features.
2. If shipping brackets are attached to the back of the server, loosen the four (4)
captive screws (FIGURE 5-1).
FIGURE 5-1 Loosening the Captive Screws on the Shipping Brackets
3. Loosen the four (4) captive screws at the front of the server (FIGURE 5-2).
FIGURE 5-2 Loosening the Captive Screws and Pulling Out the Server
4. Pull the system to the fan stop.
The system automatically locks in place at the fan stop.

5.1.2 Sliding the Server Into the Equipment Rack

1. Push the green plastic releases on each slide rail and push the server back into
the equipment rack.
2. Tighten the four (4) captive screws at the front of the server to secure it in the rack (FIGURE 5-2).
3. Tighten the four (4) captive screws on the shipping brackets at the rear of the server (FIGURE 5-1).
4. Restore the rack antitilt features to their original position.

5.2 Top Cover Remove and Replace

You must slide the server out of the equipment rack before removing the top cover.

5.2.1 Removing the Top Cover

Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 Server to the front,
release the cable tie holding the PCI cables on the rear of the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Deploy the rack’s antitilt features (if applicable).
Refer to the rack manual for details on the rack’s antitilt features.
2. Loosen the four (4) captive screws at the front of the server (FIGURE 5-2).
3. Loosen the four (4) captive screws on the shipping brackets at the rear of the system (FIGURE 5-1).
Note – During installation the power cables should have been bundled into a loop with enough slack to allow the system to slide out on the rails. This is called the service loop. If this is not the case, the power cables will have to be disconnected to allow the server to be pulled all the way out of the equipment rack.
4. Pull the server to the fan stop.
The server automatically locks in place at the fan stop.
5. Push the green plastic releases on each slide rail and pull the server until it is fully extended.
The server automatically locks in place when fully extended.
6. Loosen the captive screw(s) on the center top of the server.
The SPARC Enterprise M4000 server has one (1) captive screw (FIGURE 5-3). The SPARC Enterprise M5000 server has two (2) captive screws (FIGURE 5-4).
7. Slide the top cover towards the rear and then remove it.
FIGURE 5-3 Removing the M4000 Server Top Cover
FIGURE 5-4 Removing the M5000 Server Top Cover

5.2.2 Replacing the Top Cover

1. Align the top cover and then slide it towards the front of the server.
2. Tighten the captive screws at the center top of the server to secure the top cover
in place.
3. Push the green plastic releases on each slide rail and push the system back into
the equipment rack.
4. Tighten the four (4) captive screws at the front of the system to secure it in the rack (FIGURE 5-2).
5. Tighten the four (4) captive screws on the shipping brackets at the rear of the server (FIGURE 5-1).
6. Reconnect the service loop cables to the rear of the server.
7. Restore the rack antitilt features to their original position.

5.3 Fan Cover Remove and Replace

All internal components are cold replacement components. The server must be powered off and power cables disconnected from the input power source. You must slide the server out of the equipment rack before removing the fan cover.

5.3.1 Removing the Fan Cover

Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 Server to the front, release the cable tie holding the PCI cables on the rear of the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Deploy the rack’s antitilt features (if applicable).
Refer to the rack manual for details on the rack’s antitilt features.
2. Loosen the four (4) captive screws at the front of the server (FIGURE 5-2).
3. Loosen the four (4) captive screws on the shipping brackets at the rear of the system (FIGURE 5-1).
Note – During installation the power cables should have been bundled into a loop with enough slack to allow the system to slide out on the rails. This is called the service loop. If this is not the case, the power cables will have to be disconnected to allow the server to be pulled all the way out of the equipment rack.
4. Pull the server to the fan stop.
The server automatically locks in place at the fan stop.
5. Push the green plastic releases on each slide rail and pull the server until it is
fully extended.
The server automatically locks in place when fully extended.
6. Remove the 60-mm fan units and place them on an ESD mat.
See Section 10.1.2, “Removing the 60-mm Fan Module” on page 10-5.
7. Loosen the captive screw on the fan cover.
8. Lift the rear edge of the fan cover and remove it.
FIGURE 5-5 Removing the Fan Cover

5.3.2 Replacing the Fan Cover

1. Align the tabs on the forward section of the fan cover and push the cover down
to secure it in place.
2. Tighten the captive screw on the fan cover.
3. Install the 60-mm fan units.
See Section 10.1.3, “Installing the 60-mm Fan Module” on page 10-6.
4. Push the green plastic releases on each slide rail and push the system back into
the equipment rack.
5. Tighten the four (4) captive screws at the front of the system to secure it in the rack (FIGURE 5-2).
6. Tighten the four (4) captive screws on the shipping brackets at the rear of the server (FIGURE 5-1).
7. Reconnect the service loop cables to the rear of the server.
8. Restore the rack antitilt features to their original position.
CHAPTER 6

Storage Devices Replacement

This chapter describes how to remove and install the main storage systems. The information is organized into the following topics:
Section 6.1, “Hard Disk Drive Replacement” on page 6-1
Section 6.2, “CD-RW/DVD-RW Drive Unit (DVDU) Replacement” on page 6-12
Section 6.3, “Tape Drive Unit Replacement” on page 6-23

6.1 Hard Disk Drive Replacement

Hard disk drives are active, hot, or cold replacement components. Hard disk drive backplanes are cold replacement components. The hard disk drives are identical on both midrange servers. Hard disk drives and hard disk drive backplane information is organized into the following sections:
Section 6.1.1, “Accessing the Hard Disk Drive” on page 6-4
Section 6.1.2, “Removing the Hard Disk Drive” on page 6-4
Section 6.1.3, “Installing the Hard Disk Drive” on page 6-5
Section 6.1.4, “Securing the Server” on page 6-5
Section 6.1.5, “Accessing the Hard Disk Drive Backplane of the M4000 Server” on page 6-6
Section 6.1.6, “Removing the Hard Disk Drive Backplane of the M4000 Server” on page 6-6
Section 6.1.7, “Installing the Hard Disk Drive Backplane of the M4000 Server” on page 6-7
Section 6.1.8, “Securing the Server” on page 6-8
Section 6.1.9, “Accessing the Hard Disk Drive Backplane of the M5000 Server” on page 6-9
Section 6.1.10, “Removing the Hard Disk Drive Backplane of the M5000 Server” on page 6-10
Section 6.1.11, “Installing the Hard Disk Drive Backplane of the M5000 Server” on page 6-10
Section 6.1.12, “Securing the Server” on page 6-11
The following illustration shows the locations of the hard disk drives and the hard disk drive backplane on the SPARC Enterprise M4000 server.
FIGURE 6-1 M4000 Server Hard Disk Drives and Hard Disk Drive Backplane Locations
Location Number    Component
1                  Hard disk drive backplane (HDDBP#0 IOU#0)
2                  Hard disk drive (HDD#1)
3                  Hard disk drive (HDD#0)
The following illustration shows the locations of the hard disk drives and the hard disk drive backplane on the SPARC Enterprise M5000 server.
FIGURE 6-2 M5000 Server Hard Disk Drives and Hard Disk Drive Backplane Locations
Location Number    Component
1                  Hard disk drive backplane (HDDBP#1 IOU#1)
2                  Hard disk drive backplane (HDDBP#0 IOU#0)
3                  Hard disk drive (HDD#1)
4                  Hard disk drive (HDD#0)
5                  Hard disk drive (HDD#3)
6                  Hard disk drive (HDD#2)

6.1.1 Accessing the Hard Disk Drive

Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be used if the boot disk can be isolated from the Oracle Solaris OS by disk mirroring software and other software. See Section 4.4, “Cold Replacement (Powering the
Server Off and On)” on page 4-12.
Remove the hard disk drive from the domain.
This step includes using the cfgadm command to determine the Ap_Id and disconnecting the hard disk drive. See Section 4.2.1, “Removing a FRU From a
Domain” on page 4-4.
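Finding the Ap_Id for a drive amounts to locating its row in the cfgadm listing. The following Python sketch shows one way to do that from captured output. It assumes the common five-column Oracle Solaris layout (Ap_Id, Type, Receptacle, Occupant, Condition). The sample listing and device names are illustrative and are not taken from this manual, and parse_cfgadm and find_disk are hypothetical helper names. Section 4.2.1 remains the authoritative procedure.

```python
from typing import Optional

# Column names assumed for the cfgadm listing; verify against the real output.
COLUMNS = ("ap_id", "type", "receptacle", "occupant", "condition")

# Illustrative listing only; actual attachment points differ per system.
SAMPLE_CFGADM = """\
Ap_Id                Type         Receptacle   Occupant     Condition
c0                   scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0       disk         connected    configured   unknown
c0::dsk/c0t1d0       disk         connected    configured   unknown
"""


def parse_cfgadm(text: str) -> list[dict[str, str]]:
    """Turn each five-field data row into a dict keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a five-column row.
        if len(fields) != len(COLUMNS) or fields[0] == "Ap_Id":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def find_disk(rows: list[dict[str, str]], device: str) -> Optional[dict[str, str]]:
    """Return the row for a disk such as 'c0t1d0', or None if it is not listed."""
    for row in rows:
        if row["type"] == "disk" and row["ap_id"].endswith("/" + device):
            return row
    return None


if __name__ == "__main__":
    row = find_disk(parse_cfgadm(SAMPLE_CFGADM), "c0t1d0")
    print(row["ap_id"] if row else "disk not listed")
```

The same lookup can be repeated after the drive is installed to check that the Receptacle and Occupant columns show the expected state.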

6.1.2 Removing the Hard Disk Drive

Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Push the button on the front of the hard disk drive to release the drive latch (FIGURE 6-3).
2. Pull the latch so that it is straight out from the hard disk drive to unseat the
drive.
3. Remove the hard disk drive and place it on the ESD mat.
FIGURE 6-3 Removing the Hard Disk Drive

6.1.3 Installing the Hard Disk Drive

Caution – Do not force the hard disk drive into the slot. Doing so can cause damage
to the component and server.
1. Pull the latch so that it is straight out from the drive.
2. Align the drive in the slot and push it gently into position until it stops.
3. Secure the latch.

6.1.4 Securing the Server

1. Add the hard disk drive to the domain.
This step includes using the cfgadm command to connect and confirm the hard disk drive has been added to the domain. See Section 4.2.3, “Adding a FRU Into a
Domain” on page 4-5.
2. Verify the state of the status LEDs on the hard disk drive.

6.1.5 Accessing the Hard Disk Drive Backplane of the M4000 Server

Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.
Caution – Use proper ESD grounding techniques when handling components. See
Section 1.1, “Safety Precautions” on page 1-1.
1. Power off the server.
This step includes turning the keyswitch to the Service position, confirming that the POWER LED is off, and disconnecting power cables. See Section 4.4.1,
“Powering the Server Off Using Software” on page 4-12.
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 server to the front,
release the cable tie holding the PCI cables on the rear of the server.
2. Remove the fan cover.
This step includes deploying the rack’s antitilt features (if applicable), sliding the server out of the equipment rack, removing the 60-mm fan units and removing the fan cover. See Section 5.3.1, “Removing the Fan Cover” on page 5-8.

6.1.6 Removing the Hard Disk Drive Backplane of the M4000 Server

1. Remove the CD-RW/DVD-RW Drive Unit and place it on the ESD mat.
See Section 6.2.3, “Removing the CD-RW/DVD-RW Drive Unit” on page 6-16.
2. Remove the power and serial cables from the rear of the CD-RW/DVD-RW
Drive Backplane.
3. Loosen the captive screw that holds the rear of the CD-RW/DVD-RW Drive
Backplane in place.
4. Remove the CD-RW/DVD-RW backplane and place it on the ESD mat.
5. Remove all hard disk drives and place them on the ESD mat.
See Section 6.1.2, “Removing the Hard Disk Drive” on page 6-4.
6. Remove the power cable (p3) from the rear of the hard disk drive backplane.
7. Loosen the captive screw that holds the hard disk drive backplane in place.
8. Lift the hard disk drive backplane from the guide pins.
9. Remove the blue serial cable from the hard disk drive backplane and place the
backplane on the ESD mat.

6.1.7 Installing the Hard Disk Drive Backplane of the M4000 Server

1. Secure the blue serial cable to the hard disk drive backplane.
2. Place the hard disk drive backplane onto the guide pins.
3. Tighten the captive screw that holds the rear of the hard disk drive backplane in place.
4. Secure the power cable (p3) to the rear of the hard disk drive backplane.
Caution – Do not force any components into server slots. Doing so can cause damage
to the component and server.
5. Install the hard disk drives.
See Section 6.1.3, “Installing the Hard Disk Drive” on page 6-5.
6. Place the CD-RW/DVD-RW backplane onto the guide pin.
7. Tighten the captive screw that holds the rear of the CD-RW/DVD-RW Drive
Backplane in place.
8. Connect the power and serial cables to the rear of the CD-RW/DVD-RW Drive
Backplane.
9. Install the CD-RW/DVD-RW Drive Unit.
See Section 6.2.4, “Installing the CD-RW/DVD-RW Drive Unit” on page 6-17.

6.1.8 Securing the Server

1. Install the fan cover.
This step includes replacing the fan cover, installing the 60-mm fan units, sliding the server into the equipment rack, and restoring the rack antitilt features to their original position. See Section 5.3.2, “Replacing the Fan Cover” on page 5-10.
2. Power the server on.
This step includes reconnecting power cables, verifying the state of the LEDs, and turning the keyswitch to the Locked position. See Section 4.4.2, “Powering the
Server On Using Software” on page 4-13.
Note – If Oracle Solaris automatic booting is set, use the sendbreak -d domain_id command after the console banner is displayed but before the system starts booting the operating system to get the ok prompt.
3. Confirm the hardware.
This step includes running programs to be certain all components are mounted again and then booting the operating system.
Refer to Section 4.3.2, “Verifying Hardware Operation” on page 4-9 for more information.