Manual Code C120-E330-08EN
Part No. 819-7877-15
August 2009, Revision A
Copyright 2007-2009 FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa-ken 211-8588, Japan. All rights
reserved.
Sun Microsystems, Inc. provided technical input and review on portions of this material.
Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in
this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws
and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology and this
document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more
additional patents or patent applications in the United States or other countries.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution,
and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior
written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this document to
you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this
document does not contain or represent any commitment of any kind on the part of Fujitsu Limited or Sun Microsystems, Inc., or any affiliate of
either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by
and/or licensed from suppliers to Fujitsu Limited and/or Sun Microsystems, Inc., including software and font technology.
Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End
User. Please contact Fujitsu Limited or Sun Microsystems, Inc.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark
in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, Java, Netra, Solaris, Sun Ray, Answerbook2, docs.sun.com, OpenBoot, and Sun Fire are trademarks or
registered trademarks of Sun Microsystems, Inc., or its subsidiaries, in the U.S. and other countries.
Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited.
All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries.
Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems, Inc.
SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of
Sun Microsystems, Inc. and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Fujitsu Limited, Sun Microsystems, Inc. or any affiliate of either of them in connection with this
document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product
or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, FUJITSU LIMITED, SUN MICROSYSTEMS, INC.
AND THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH
PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED
CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE
EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the
extent allowed by applicable law, in no event shall Fujitsu Limited, Sun Microsystems, Inc. or any of their affiliates have any liability to any
third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special,
incidental or consequential damages, even if advised of the possibility of such damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface
1. Safety and Tools
1.1 Symbols
1.1.1 Text Conventions
1.1.2 Prompt Notations
1.1.2.1 Command syntax
1.1.3 Environmental Requirements for Using This Product
1.1.4 Conventions for Alert Messages
1.1.4.1 Alert Messages in the Text
1.2 Precautions
1.2.1 Operating Environment of the Product
1.2.2 Maintenance
1.2.3 Conversion and Reuse of This Product
1.3 Tools Required for Maintenance
2. Product Overview and Troubleshooting
2.1 System Views
2.1.1 SPARC Enterprise M8000 Server
2.1.2 SPARC Enterprise M9000 Server (Base Cabinet)
2.1.3 SPARC Enterprise M9000 Server (Expansion Cabinet)
2.2 Labels
2.2.1 System Name Plate Label, Rating Label, ID Label (Japan) or EZ Label (besides Japan), and Standard Label
2.2.2 Labels About Handling
2.3 Operator Panel
2.3.1 Operator Panel Location
2.3.2 Appearance and Operations
2.3.3 LED
2.3.4 Switch
2.4 Determining Which Diagnostics Methods To Use
2.5 Checking the Server and System Configuration
2.5.1 Checking the Hardware Configuration and FRU Status
2.5.1.1 Checking the Hardware Configuration
2.5.2 Checking the Software and XSCF Firmware Configurations
2.5.2.1 Checking the Software Configuration
2.5.2.2 Checking the Firmware Configuration
2.5.3 Downloading the Error Log Information
2.6 Error Conditions and Action To Be Taken
2.6.1 Predictive Self-Healing Tools
2.6.2 Monitoring Output
2.6.3 Messaging Output
2.7 LED Error Display
2.7.1 When target FRU is indicated by LEDs
2.7.2 When target FRU is not indicated by LEDs
2.8 Using the Troubleshooting Commands
2.8.1 Using the showlogs Command
2.8.2 Using the fmdump Command
2.8.3 Using the fmadm Command
viSPARC Enterprise M8000/M9000 Servers Service Manual • August 2009
2.8.3.1 fmadm config Command
2.8.3.2 fmadm faulty Command
2.8.3.3 fmadm repair Command
2.8.4 Using the fmstat Command
2.9 Traditional Solaris Troubleshooting Commands
2.9.1 iostat Command
2.9.1.1 Options
2.9.2 prtdiag Command
2.9.2.1 Options
2.9.3 prtconf Command
2.9.3.1 Options
2.9.4 netstat Command
2.9.4.1 Options
2.9.5 ping Command
2.9.5.1 Options
2.9.6 ps Command
2.9.6.1 Options
2.9.7 prstat Command
2.9.7.1 Options
3. Periodic Maintenance
3.1 Cleaning a Tape Drive Unit
3.2 Cleaning an Air Filter (Server)
3.3 Cleaning an Air Filter (I/O Unit)
4. FRU Removal Preparation
4.1 Types of Replacement Procedures
4.1.1 FRU Replacement
4.2 Active Replacement
4.2.1 Disconnecting a FRU from a Domain
4.2.1.1 Disconnecting a CMU/IOU
4.2.1.2 Disconnecting a PCI card
4.2.2 Disconnecting and Replacing a FRU
4.2.3 Configuring a FRU into a Domain
4.2.3.1 Configuring CMU/IOU
4.2.3.2 Configuring a PCI card
4.2.4 Confirming the Hardware
4.3 Hot Replacement
4.3.1 Disconnecting and Replacing a FRU
4.3.2 Confirming the Hardware
4.4 Cold Replacement
4.4.1 Powering the Server Off
4.4.2 Powering the Server On
4.4.2.1 From the Operator Panel
4.4.2.2 From the Maintenance Terminal
4.4.3 Confirming the Hardware
4.5 Power-On/Off of Main Line Switch
4.5.1 Types of Power Supply
4.5.1.1 AC Input Power
4.5.1.2 Power System
4.5.2 Power-On/Off Procedures of Main Line Switch
4.5.2.1 Power-On
4.5.2.2 Power-Off
4.5.3 Main Line Switch Locations
4.5.3.1 SPARC Enterprise M8000 Server Single-Phase Power Feed
4.5.3.2 SPARC Enterprise M8000 Server Single-Phase and Dual Power Feed
4.5.3.3 SPARC Enterprise M8000 Server Three-Phase Power Feed
4.5.3.4 SPARC Enterprise M9000 Server Single-Phase Power Feed
4.5.3.5 SPARC Enterprise M9000 Server Single-Phase and Dual Power Feed
4.5.3.6 SPARC Enterprise M9000 Server Three-Phase Power Feed
4.6 Emergency Switch-Off
4.7 Cable Routing of the SPARC Enterprise M8000 Server
4.7.1 Cable Routing When the External I/O Expansion Unit Mounted
4.7.1.1 Precautions For Cable Routing
4.7.1.2 When Three External I/O Expansion Units Mounted
4.7.1.3 For Server Using Three-phase Power Feed
4.7.1.4 When One External I/O Expansion Unit Mounted
5. Internal Components Access
5.1 How to Open and Close Doors
5.2 Corresponding Components and Doors That Can Be Accessed
5.3 How to Remove a Door
5.4 How to Remove a Side Panel
6. Replacement of CPU/Memory Board Unit (CMU), CPU, and DIMM
6.1 Overview of the CMU
6.2 CPU Upgrade
6.2.1 SPARC64 VII CPU Modules Added to a New Domain
6.2.2 SPARC64 VII Processors Added to an Existing Domain
6.2.2.1 Preparing to Add SPARC64 VII Processors to an Existing Domain
6.2.2.2 Adding a SPARC64 VII CPU Module to a Domain Configured With SPARC64 VI
6.2.2.3 Upgrading a SPARC64 VI CPU Module to SPARC64 VII on an Existing Domain
6.3 Active Replacement and Hot Replacement
6.4 Cold Replacement
6.5 CPU and DIMM Replacement
6.5.1 Replacing a CPU Module
6.5.2 Memory Module Mounting Conditions
6.5.2.1 Confirmation of DIMM Information
6.5.2.2 DIMM Mounting Conditions
6.5.3 DIMM Replacement
7. I/O Unit (IOU) Replacement
7.1 Overview of the IOU
7.2 Active Replacement and Hot Replacement
7.3 Cold Replacement
8. FAN Unit Replacement
8.1 Overview of the FAN Unit
8.2 Active Replacement and Hot Replacement
8.3 Cold Replacement
9. Power Supply Unit (PSU) Replacement
9.1 Overview of the PSU
9.2 Active Replacement and Hot Replacement
9.3 Cold Replacement
10. Operator Panel Replacement
10.1 Overview of the Operator Panel
10.2 Cold Replacement
11. XSCF Unit Replacement
11.1 Overview of the XSCFU
11.2 Active Replacement and Hot Replacement
11.3 Cold Replacement
12. Hard Disk Drive (HDD) Replacement
12.1 Overview of the HDD
12.2 Active Replacement
12.3 Cold Replacement
13. PCI Slot Device Replacement
13.1 Overview of PCI Slot Devices
13.2 Active Replacement
13.3 Hot Replacement
13.4 Cold Replacement
14. CD-RW/DVD-RW Drive Unit Replacement
14.1 Overview of a CD-RW/DVD-RW Drive Unit
14.2 Active Replacement
14.3 Hot Replacement
14.4 Cold Replacement
15. Tape Drive Unit Replacement
15.1 Overview of the Tape Drive Unit
15.2 Active Replacement
15.3 Hot Replacement
15.4 Cold Replacement
16. Clock Control Unit Replacement
16.1 Overview of the CLKU
16.2 Cold Replacement
17. Crossbar Unit Replacement
17.1 Overview of XBUs
17.2 Cold Replacement
18. AC Section Replacement
18.1 Overview of ACSs
18.2 Cold Replacement
19. DDC Replacement
19.1 Overview of the DDC
19.2 Active Replacement and Hot Replacement
19.3 Cold Replacement
20. Backplane Replacement
20.1 Overview of the BP
20.2 Cold Replacement
20.2.1 M8000/M9000 BPs
20.2.2 PSU BP
20.2.3 FAN BP
21. Sensor Unit Replacement
21.1 Overview of the SNSU
21.2 Cold Replacement
22. Media Backplane Replacement
22.1 Overview of the MEDBP
22.2 Cold Replacement
23. Switch Backplane Replacement
23.1 Overview of SWBPs
23.2 Cold Replacement
24. Addition and Deletion of a RDPF Option, Power Cabinet, and M9000 Expansion Cabinet
24.1 Addition and Deletion of Rack-mountable Dual Power Feed
24.1.1 Overview of RDPF
24.1.2 Addition and Deletion
24.1.2.1 Addition
24.1.2.2 Deletion
24.2 Addition and Deletion of Power Cabinet
24.3 Addition and Deletion of M9000 Expansion Cabinet
25. Addition and Deletion of CMU, DIMM, IOU, HDD, PCI Cards and TAPEU
25.1 Addition
25.1.1 Active Addition
25.1.2 Cold Addition
25.2 Deletion
25.2.1 Active Deletion
25.2.2 Cold Deletion
A. System Configuration
A.1 Installation Conditions
A.2 System Configuration
A.2.1 SPARC Enterprise M8000 Server
A.2.2 SPARC Enterprise M9000 Server (Base Cabinet)
A.2.3 SPARC Enterprise M9000 Server (Base Cabinet + Expansion Cabinet)
B. Components
B.1 CPU Memory Board Unit
B.2 CPU Module
B.3 Memory
B.4 I/O Unit
B.5 Hard Disk Drive
B.6 PCI Cassette
B.7 IOU Onboard Device Card
B.8 Link Card (External I/O Expansion Unit Connection Card)
B.9 Crossbar Unit
B.10 Clock Control Unit
B.11 XSCF Unit
B.12 CD-RW/DVD-RW Drive Unit
B.13 Tape Drive Unit
B.14 Operator Panel
B.15 Sensor Unit
B.16 Power Supply Unit
B.17 AC Section
B.18 FAN Unit
B.19 Power Cabinet
B.20 Rack-mountable Dual Power Feed
B.21 Backplane
B.22 DDC
B.23 PSU Backplane
B.24 FAN Backplane
B.25 Media Backplane
B.26 Switch Backplane
C. External Interface Specifications
C.1 Serial Port
C.2 UPC Port
C.3 USB Port
C.4 Connection Diagram for Serial Cable
D. UPS Controller
D.1 Overview
D.2 Signal Cable
D.3 Configuration of Signal Lines
D.4 Power Supply Conditions
D.4.1 Input Circuit
D.4.2 Output Circuit
D.5 UPS Cable
D.6 Connections
D.7 UPC port
E. XSCF Unit Replacement When XCP 1040 or 1041 Is in the Server
Abbreviations
Index
Preface
This manual is a maintenance manual for the SPARC Enterprise™ M8000/M9000
servers. The manual explains basic operations and detailed replacement procedures
for field-replaceable units (FRUs), which are components that can be replaced at the
customer's site.
As a rule, Fujitsu-certified service engineers perform maintenance on the SPARC Enterprise M8000/M9000
servers. However, another person such as the system administrator may perform
some of the simple work, under the direction of one of the service engineers. This
manual is intended for the service engineers and other persons described above.
Notes precede the chapters, sections, and paragraphs that cover the work that only
Fujitsu-certified service engineers are allowed to perform. Follow instructions for all
work.
This section explains:
■ “Glossary”
■ “Structure and Contents of This Manual”
■ “SPARC Enterprise M8000/M9000 Servers Documentation”
■ “Product Handling”
■ “Limitations and Cautions”
■ “Fujitsu Welcomes Your Comments”
Glossary
For the terms used in “SPARC Enterprise M8000/M9000 Servers Documentation”,
refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Glossary.
Structure and Contents of This Manual
This manual is organized as described below:
■ PART I Basic Information for Maintenance and Troubleshooting
Provides notes on handling the SPARC Enterprise servers, explains the rules for
operation and the conventions used in descriptions, and describes the tools
required for maintenance.
■ Chapter 1 Safety and Tools:
Provides notes on handling the SPARC Enterprise servers, explains the rules for
operation and the conventions used in descriptions, and describes the tools
required for maintenance.
■ Chapter 2 Product Overview and Troubleshooting:
Provides information that is required in troubleshooting.
■ Chapter 3 Periodic Maintenance:
Explains the maintenance work that must be performed regularly regardless of
whether a problem has occurred. The work consists of cleaning tasks that keep
dust in the environment from contaminating the equipment.
■ Chapter 4 FRU Removal Preparation:
Explains the required basic operations for replacing components.
■ PART II Maintenance
Explains how to remove and replace FRUs.
■ Chapter 5 Internal Components Access:
Explains how to access each part of the system.
■ Chapter 6 Replacement of CPU/Memory Board Unit (CMU), CPU, and DIMM:
Explains how to replace a CPU/Memory Board Unit (CMU), a CPU module, and a DIMM.
■ Chapter 7 I/O Unit (IOU) Replacement:
Explains the replacement procedures for an I/O unit (IOU).
■ Chapter 8 FAN Unit Replacement:
Explains the replacement procedures for a fan unit.
■ Chapter 9 Power Supply Unit (PSU) Replacement:
Explains the replacement procedures for a power supply unit (PSU).
■ Chapter 10 Operator Panel Replacement:
Explains the replacement procedures for the operator panel.
■ Chapter 11 XSCF Unit Replacement:
Explains the replacement procedures for an XSCF unit.
■ Chapter 12 Hard Disk Drive (HDD) Replacement:
Explains the replacement procedures for a hard disk drive (HDD).
■ Chapter 13 PCI Slot Device Replacement:
Explains the replacement procedures for a device mounted in a PCI slot of an
IOU.
■ Chapter 14 CD-RW/DVD-RW Drive Unit Replacement:
Explains the replacement procedures for the CD-RW/DVD-RW drive unit.
■ Chapter 15 Tape Drive Unit Replacement:
Explains the replacement procedures for the tape drive unit.
■ Chapter 16 Clock Control Unit Replacement:
Explains the replacement procedure for the clock control unit.
■ Chapter 17 Crossbar Unit Replacement:
Explains the replacement procedure for a crossbar unit.
■ Chapter 18 AC Section Replacement:
Explains the replacement procedures for an AC section (ACS).
■ Chapter 19 DDC Replacement:
Explains the replacement procedure for the DDC.
■ Chapter 20 Backplane Replacement:
Explains the replacement procedure for a backplane.
■ Chapter 21 Sensor Unit Replacement:
Explains the replacement procedure for the sensor unit.
■ Chapter 22 Media Backplane Replacement:
Explains the replacement procedure for the media backplane.
■ Chapter 23 Switch Backplane Replacement:
Explains the replacement procedure for the switch backplane.
■ Chapter 24 Addition and Deletion of a Rack-mountable Dual Power Feed Option,
Power Cabinet, and M9000 Expansion Cabinet:
Explains the procedures for adding and deleting a rack-mountable dual power
feed (RDPF) option, a power cabinet, and an M9000 expansion cabinet.
■ Chapter 25 Addition and Deletion of CMU, DIMM, IOU, HDD, PCI Cards and
TAPEU:
Explains the procedures for adding a unit to the SPARC Enterprise
M8000/M9000 servers and deleting a unit from the SPARC Enterprise
M8000/M9000 servers.
■ Appendix A System Configuration:
Describes the installation conditions and configuration of the SPARC
Enterprise server.
■ Appendix B Components:
Provides figures showing the components that compose the SPARC Enterprise
servers.
■ Appendix C External Interface Specifications:
Describes the specifications of the connectors provided on the SPARC
Enterprise server unit.
■ Appendix D UPS Controller:
Describes the connection of the UPC interface, which controls an
uninterruptible power supply (UPS).
■ Appendix E XSCF Unit Replacement When XCP 1040 or 1041 Is in the Server:
Provides a replacement procedure to use when the server uses an older version
of XCP firmware than is present in the replacement XSCFU.
■ Abbreviations
Provides the full spellings of abbreviations used in this manual.
■ Index
Provides keywords and corresponding reference page numbers so that the
reader can easily search for items in this manual as necessary.
Note – Product Notes are available on the website only. Please check for the most
recent update on your product.
2. Documentation CD
For the Documentation CD, please contact your local sales representative.
■ SPARC Enterprise M8000/M9000 Servers Documentation CD (C120-E364)
3. Manual on the Enhanced Support Facility x.x CD-ROM disk
■ Remote maintenance service
Book Title                                          Manual Code
Enhanced Support Facility User's Guide for REMCS    C112-B067
4. Manual (man page) provided in the system
XSCF man page
Note – The man page can be referenced on the XSCF Shell, and it provides the same
content as the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF
Reference Manual.
5. Sun Microsystems Software (for Solaris OS, etc.) Related Manuals
http://docs.sun.com
6. Information on Using the RCI function
This manual does not contain an explanation of the RCI build procedure. For
information on using the RCI function, refer to the RCI Build Procedure and RCI User’s Guide provided on the website.
Product Handling
Maintenance
Caution – Certain tasks in this manual should only be performed by a certified
service engineer. Users must not perform these tasks. Performing these tasks
incorrectly may cause electric shock, injury, or fire.
■ Installation and reinstallation of all components, and initial settings
■ Removal of front, rear, or side covers
■ Mounting/de-mounting of optional internal devices
■ Plugging or unplugging of external interface cards
■ Maintenance and inspections (repairing, and regular diagnosis and maintenance)
Caution – The following tasks regarding this product and the optional products
provided by Fujitsu should only be performed by a certified service engineer.
Users must not perform these tasks. Performing these tasks incorrectly may cause
malfunction.
■ Unpacking optional adapters and such packages delivered to the users
■ Plugging or unplugging of external interface cards
Remodeling/Rebuilding
Caution – Any modification and/or recycling of this product and its components
may be carried out only by a certified service engineer and must not be done by the
customer under any circumstances. Otherwise, electric shock, injury or fire may
result.
Emission of Laser Beam (Invisible)
Caution – The main unit and high-speed optical interconnect cabinet contain
modules that generate invisible laser radiation. Laser beams are generated while the
equipment is operating, even if an optical cable is disconnected or a cover is
removed. Do not look at any light-emitting part directly or through an optical
apparatus (e.g., magnifying glass, microscope).
Limitations and Cautions
Power Control and Operator Panel Mode Switch
When you use the remote power control utilizing the RCI function or the automatic
power control system (referred to below as APCS), you can disable this remote
power control or the APCS by switching to Service mode on the operator panel.
Disabling these features ensures that you do not unintentionally switch the system
power on or off during maintenance. Note that system power-off by the APCS cannot
be disabled with the mode switch. Therefore, be sure to turn off automatic power
control via the APCS before starting maintenance.
If you switch the mode while using the RCI or the automatic power control, the
system power is controlled as follows.
Function                 Mode switch: Locked            Mode switch: Service
RCI                      Remote power-on/power-off      Remote power-on/power-off
                         operations are enabled.        operations are disabled.
Automatic power control  Automatic power-on/power-off   Automatic power-on is
                         operations are enabled.        disabled, but power-off
                                                        remains enabled.
To use the RCI function, see the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers RCI Build Procedure and the SPARC
Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI User’s Guide, which are
available on the manuals website.
To use the APCS, see the Enhanced Support Facility User's Guide for Machine
Administration Automatic Power Control Function (Supplement Edition).
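The mode-switch behavior described in this section can be summarized as a small lookup table. The following Python sketch is illustrative only: the dictionary layout, the key names, and the function name are assumptions made for this example and do not correspond to any Fujitsu or XSCF tool. It shows why the APCS needs to be turned off separately before maintenance.

```python
# Power-control behavior per operator panel mode switch position,
# transcribed from the table in this section. The structure and names
# below are hypothetical and exist only for illustration.
BEHAVIOR = {
    ("RCI", "Locked"): {"power_on": True, "power_off": True},
    ("RCI", "Service"): {"power_on": False, "power_off": False},
    ("APCS", "Locked"): {"power_on": True, "power_off": True},
    ("APCS", "Service"): {"power_on": False, "power_off": True},
}


def must_disable_manually(function: str) -> bool:
    """Return True if Service mode alone leaves power-off enabled."""
    return BEHAVIOR[(function, "Service")]["power_off"]


for name in ("RCI", "APCS"):
    if must_disable_manually(name):
        print(f"{name}: turn off separately before maintenance")
    else:
        print(f"{name}: fully disabled by Service mode")
```

Keying the table on the (function, mode) pair keeps each cell of the original table as one entry, which makes it easy to compare the sketch against the manual.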
Fujitsu Welcomes Your Comments
If you have any comments or requests regarding this document, or if you find any
unclear statements in the document, please state your points specifically on the form
at the following URL.
PART I Basic Information for Maintenance and Troubleshooting
Part I provides maintenance information and explains methods of problem analysis,
troubleshooting, and basic operations for replacing FRUs.
Chapter 1 Safety and Tools
This chapter provides notes on handling the high-end servers, explains the rules for
operation and the conventions used in descriptions, and lists the tools required for maintenance.
This information is explained in the following sections:
■ Section 1.1, “Symbols”
■ Section 1.2, “Precautions”
■ Section 1.3, “Tools Required for Maintenance”
1.1 Symbols
1.1.1 Text Conventions
This manual uses the following fonts and symbols to express specific types of
information.
Fonts/symbols  Meaning                                 Example
AaBbCc123      What you type, when contrasted with     XSCF> adduser jsmith
               on-screen computer output.
               This font represents the example of
               command input in the frame.
AaBbCc123      The names of commands, files, and       XSCF> showuser -P
               directories; on-screen computer         User Name: jsmith
               output.                                 Privileges: useradm
               This font represents the example of                 auditadm
               command output in the frame.
Italic         Indicates the name of a reference       See the SPARC Enterprise
               manual.                                 M3000/M4000/M5000/M8000/M9000
                                                       Servers XSCF User’s Guide.
" "            Indicates names of chapters,            See Chapter 2, "Product
               sections, items, buttons, or menus.     Overview and Troubleshooting."
1.1.2 Prompt Notations
The following prompt notations are used in this manual.
Shell                                     Prompt notation
XSCF                                      XSCF>
C shell                                   machine-name%
C shell super user                        machine-name#
Bourne shell and Korn shell               $
Bourne shell and Korn shell super user    #
OpenBoot™ PROM                            ok
1.1.2.1 Command Syntax
The command syntax is as follows:
■ A variable that requires input of a value is enclosed in <>.
■ An optional element is enclosed in [ ].
■ A group of options for an optional keyword is enclosed in [ ] and delimited by |.
■ A group of options for a mandatory keyword is enclosed in {} and delimited by |.
■ The command syntax is shown in a box.
Example:
XSCF> showuser -a
1.1.3 Environmental Requirements for Using This Product
This product is a computer that is intended to be used in a computer room. For
details on the operational environment, see the SPARC Enterprise M8000/M9000 Servers Site Planning Guide.
1.1.4 Conventions for Alert Messages
This manual uses the following conventions to show alert messages, which are
intended to prevent injury to the user or bystanders as well as property damage, and
important messages that are useful to the user.
This indicates a hazardous situation that could result in death or serious
personal injury (potential hazard) if the user does not perform the
procedure correctly.
This indicates a hazardous situation that could result in minor or moderate
personal injury if the user does not perform the procedure correctly. This
signal also indicates that damage to the product or other property may
occur if the user does not perform the procedure correctly.
This indicates information that could help the user to use the product more
effectively.
1.1.4.1 Alert Messages in the Text
An alert message in the text consists of a signal indicating an alert level followed by
an alert statement. Alert messages are indented to distinguish them from regular text
as shown in the following example. Also, a space of one line precedes and follows an
alert statement.
The tasks listed below for this product and optional product provided by Fujitsu
should be performed only by authorized service personnel.
The user must not perform these tasks. Incorrect operation of these tasks may cause
electric shock, injury, or fire.
■ Installation and reinstallation of all components
■ Removal of front, rear, or side covers
■ Mounting/unmounting of optional internal devices
■ Connecting/disconnecting of external interface cables
■ Maintenance (repair and regular diagnosis and maintenance)
1.2 Precautions
The following notes must be observed in maintenance work:
1.2.1 Operating Environment of the Product
Use the SPARC Enterprise server in the correct operating environment. The
server is intended for use in a computer room. For details of the operating
environment, see the SPARC Enterprise M8000/M9000 Servers Site Planning Guide.
1.2.2 Maintenance
The work listed below is to be performed by authorized service engineers. Persons
who are not authorized service engineers must not perform the work. Otherwise,
electric shock, injury, or fire may result.
■ Installation, transport, and initial setup of each device
■ Removal of the front, rear, or a side cover.
■ Mounting or removing internal optional components
■ Connecting or disconnecting an external interface cable
■ Maintenance (repair, regular diagnosis, and maintenance)
The work listed below is to be performed by authorized service engineers. Persons
who are not authorized service engineers must not perform the work. Otherwise, an
equipment failure may result.
■ Unpacking or installing products, such as an optional adapter, that are delivered
to the customer
■ Connecting or disconnecting an external interface cable
1.2.3 Conversion and Reuse of This Product
If this product is converted, or if a used article of this product is overhauled
for the purpose of reuse, unexpected injury to users and bystanders or damage to
their property may result.
1.3 Tools Required for Maintenance
The maintenance work described in Chapter 6 to Chapter 24 requires maintenance
software to confirm that the SPARC Enterprise server and other components are
operating correctly and to collect status information and log data about the
server and components. The work for mounting, removing, or replacing a specific
component requires screwdrivers and special tools such as an antistatic wrist
strap. These items are listed in TABLE 1-1.
TABLE 1-1 Maintenance Tools
No.  Name                                          Use
1    Torque wrench [8.24 N*m (84 kgf*cm)]          Used to fix the bus bars of the power cabinet.
2    Sockets for 10 mm (M6) torque wrench          Used to replace the BP_A in the SPARC Enterprise M8000 server.
3    Sockets for 13 mm (M8) torque wrench          Used to fix the bus bars of the power cabinet.
4    Torque wrench extension
5    Torque screwdriver [0.2 N*m (2.0 kgf*cm)]     Used to secure the clock cables between the cabinets if the
                                                   expansion cabinet of the SPARC Enterprise M9000 server is mounted.
6    Slotted bit                                   Used to secure the clock cables between the cabinets if the
                                                   expansion cabinet of the SPARC Enterprise M9000 server is mounted.
7    Wrist strap                                   For antistatic purposes
8    Conductive mat                                For antistatic purposes
9    CPU module replacement tool                   For mounting and removing CPU Modules (accessory)
10   SunVTS                                        Test program
Caution – Be sure to wear an antistatic wrist strap when handling components.
Place removed components on an antistatic conductive mat. Failure to do so may
result in serious damage or injury.
CHAPTER 2
Product Overview and Troubleshooting
This chapter provides information that is required in troubleshooting.
This information is explained in the following sections:
■ Section 2.1, “System Views”
■ Section 2.2, “Labels”
■ Section 2.3, “Operator Panel”
■ Section 2.4, “Determining Which Diagnostics Methods To Use”
■ Section 2.5, “Checking the Server and System Configuration”
■ Section 2.6, “Error Conditions and Action To Be Taken”
■ Section 2.7, “LED Error Display”
■ Section 2.8, “Using the Troubleshooting Commands”
■ Section 2.9, “Traditional Solaris Troubleshooting Commands”
2.1 System Views
This section provides views of the high-end server. The figures can be used to locate
the component in the server to be subjected to maintenance.
In terms of its structure, the high-end server consists of a cabinet that includes
various mounted components and a front door, rear door, and side covers that
protect the mounted components. The side covers are removed when cabinets are
connected to each other or when the dual power feed option is connected to the
cabinet. The operator panel, which is mounted on the front door, is always
accessible. Each door can be locked with a key so that only the administrator can
open it.
The front and rear views of FIGURE 2-1, FIGURE 2-2, FIGURE 2-4, FIGURE 2-5,
FIGURE 2-7, and FIGURE 2-8 include names and abbreviations for field-replaceable
units (FRUs). Components that are mounted inside the system are shown in
FIGURE 2-3, FIGURE 2-6, and FIGURE 2-9. The abbreviations are used in messages
and the like. If multiple FRUs of the same type are mounted, the number sign (#)
and a sequential number are added to their names to distinguish them from one
another. Owing to the reduced scale, certain components (FRUs) are difficult to
show in the figures. Accordingly, the layout of these components as viewed from
one side is indicated in the table connected by a lead line to the component
location.
2.1.1 SPARC Enterprise M8000 Server
FIGURE 2-1 Front View - M8000
[Labeled components: PSU, DDC, XSCFU, TAPEU, DVDU, SNSU, FAN_B, CMU, FAN_A, Air Filter]
FIGURE 2-2 Rear View - M8000
[Labeled components: ACS, FAN_B, IOU, Air Filter]
FIGURE 2-3 Internal View - M8000
[Labeled components: PSUBP_B, PSUBP_A, FANBP_C (labeled twice), MEDBP, BP_A, SWBP]
2.1.2 SPARC Enterprise M9000 Server (Base Cabinet)
FIGURE 2-4 Front View - M9000 (Base Cabinet)
[Labeled components: PSU, TAPEU, DVDU, SNSU, FAN_A, ACS, XBU, CLKU, XSCFU, IOU, Air Filter]
FIGURE 2-5 Rear View - M9000 (Base Cabinet)
[Labeled components: FAN_A, CMU, IOU, Air Filter]
FIGURE 2-6 Internal View - M9000 (Base Cabinet)
[Labeled components: PSUBP_A, BP_B, MEDBP, SWBP, FANBP_B, FANBP_A]
2.1.3 SPARC Enterprise M9000 Server (Expansion Cabinet)
FIGURE 2-7 Front View - M9000 (with the Expansion Cabinet)
[Labeled components: PSU, cable support bracket, TAPEU, DVDU, SNSU, FAN_A, ACS, XBU, CLKU, XSCFU, IOU, Air Filter]
FIGURE 2-8 Rear View - M9000 (with the Expansion Cabinet)
[Labeled components: FAN_A, CMU, IOU, Air Filter]
FIGURE 2-9 Internal View - M9000 (with the Expansion Cabinet)
[Labeled components: PSUBP_A, BP_B, MEDBP, SWBP, FANBP_B, FANBP_A]
2.2 Labels
2.2.1 System Name Plate Label, Rating Label, ID Label (Japan) or EZ Label (besides Japan), and Standard Label
The important labels affixed on this server are shown in FIGURE 2-10 and FIGURE 2-11.
The actual description on the labels may differ from FIGURE 2-10 and FIGURE 2-11.
■ The system name plate label includes the model number, serial number, and
hardware version, all of which are required for maintenance and management.
■ The rating label, which is affixed near the AC power supply, includes the power
input rating for the AC power supply.
■ The ID label or EZ label is affixed on the front door of the server, and it includes
the model name and serial number, both of which are written on the system name
plate label.
[Label images: ID label (Japan), EZ label (besides Japan)]
■ The standard label is affixed near the system name plate label, and it includes the
certification standards that apply:
Safety: NRTL/C
Electrical interference: VCCI-A, FCC-A, DOC-A, and MIC
Safety and electrical interference: CE
FIGURE 2-10 M8000 Label Location
[Front and rear views. Labeled items: System Name Plate Label, Standard label]
FIGURE 2-11 M9000 Label Location
[Front and rear views. Labeled items: System Name Plate Label, Standard label]
2.2.2 Labels About Handling
The labels shown below, which are affixed on the high-end server, provide field
engineers with important information on component removal and mounting.
Caution – Never peel off the labels.
■ Removing and installing a CPU/memory board unit (CMU)
■ Removing a crossbar unit (XBU)
■ Removing an I/O unit (IOU)
2.3 Operator Panel
The operator panel controls the high-end server power. The operator panel is usually
locked with a key to prevent the server from being mistakenly powered off through
an operator error during system operation.
Before starting maintenance work, ask the system administrator to unlock the
operator panel.
2.3.1 Operator Panel Location
FIGURE 2-12 indicates the location of the operator panel (OPNL) of the high-end
servers. The expansion cabinet is not equipped with the operator panel.
FIGURE 2-12 Operator Panel Location (at the Front of M8000)
[Labeled component: OPNL]
2.3.2 Appearance and Operations
The operator panel can be used while the front door of the server is closed. Field
engineers and the system administrator use the operator panel to check the
operating state of the server and to perform system power operations. The
operating state of the server is checked by observing the LEDs, and the power
supply is operated with the POWER switch.
FIGURE 2-13 shows the appearance of the operator panel.
FIGURE 2-13 Operator Panel
2.3.3 LED
TABLE 2-1 lists the states of the server that are displayed with the LEDs on the
operator panel.
The blinking period is one second (frequency of 1 Hz).
Besides the states listed in TABLE 2-1, the operator panel also displays various
states of the server using combinations of the three LEDs. TABLE 2-2 indicates
the states that are usually displayed in the course of operation from the
power-on to power-off of the high-end server.
TABLE 2-1 State Display by the LEDs (Operator Panel)
LED name       Light color   Description of function and state
POWER          Green         Indicates whether power to the SPARC Enterprise server is on.
                             Off: Indicates the power-off state.
                             Lit: Indicates the power-on state.
                             Blinking: The power-off sequence is in progress.
XSCF STANDBY   Green         Indicates whether the XSCF can be powered on.
                             Off: Indicates that the system cannot be powered on.
                             Blinking: Indicates that initialization processing of the SPARC
                             Enterprise server is in progress after main line switches were
                             switched on.
                             Lit: Indicates that the system can be powered on.
CHECK          Amber         Indicates the operating status of the SPARC Enterprise server.
                             Off: Normal state. Otherwise, this indicates that the main line
                             switches were switched off or a power failure occurred.
                             Blinking (*1): Indicates that the operator panel is the
                             maintenance target device.
                             Lit: Indicates that the server cannot be started.
Note – *1) If the maintenance target component is indicated by a blinking CHECK
LED, the LED may be called a locator.
TABLE 2-2 State Display by LED Combination (Operator Panel)
POWER      XSCF STANDBY   CHECK   Description of the state
Off        Off            Off     The main line switch is switched off.
Off        Off            On      The main line switch is switched on.
Off        Blinking       Off     The XSCF is being initialized.
Off        Blinking       On      An error occurred in the XSCF.
Off        On             Off     • The XSCF is on standby.
                                  • The system is waiting for power-on of the air conditioning
                                    system.
On         On             Off     • Warm-up standby processing is in progress (power-on is
                                    delayed).
                                  • The power-on sequence is in progress.
                                  • The system is in operation.
Blinking   On             Off     • The power-off sequence is in progress.
                                  • Fan termination is being delayed.
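The LED combinations in TABLE 2-2 can be treated as a simple lookup. The following is a minimal sketch in Python that transcribes the table; the names LED_STATES and describe_leds are illustrative assumptions and are not part of the XSCF firmware or any server tool.

```python
# Lookup of operator panel LED combinations, transcribed from TABLE 2-2.
# Keys are (POWER, XSCF STANDBY, CHECK). The names used here are
# illustrative only and do not correspond to any XSCF command.
LED_STATES = {
    ("Off", "Off", "Off"): ["The main line switch is switched off."],
    ("Off", "Off", "On"): ["The main line switch is switched on."],
    ("Off", "Blinking", "Off"): ["The XSCF is being initialized."],
    ("Off", "Blinking", "On"): ["An error occurred in the XSCF."],
    ("Off", "On", "Off"): [
        "The XSCF is on standby.",
        "The system is waiting for power-on of the air conditioning system.",
    ],
    ("On", "On", "Off"): [
        "Warm-up standby processing is in progress (power-on is delayed).",
        "The power-on sequence is in progress.",
        "The system is in operation.",
    ],
    ("Blinking", "On", "Off"): [
        "The power-off sequence is in progress.",
        "Fan termination is being delayed.",
    ],
}


def describe_leds(power, xscf_standby, check):
    """Return the candidate states for an LED combination.

    An empty list means the combination is not listed in TABLE 2-2.
    """
    return LED_STATES.get((power, xscf_standby, check), [])


if __name__ == "__main__":
    for state in describe_leds("Off", "Blinking", "On"):
        print(state)
```

Several combinations map to more than one state, so the lookup returns a list; the LEDs alone do not distinguish, for example, the power-on sequence from normal operation.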
2.3.4 Switch
The operator panel has the mode switch, which sets the operation mode, and the
POWER switch, which is used to power on and off the system.
TABLE 2-3 Switches (Operator Panel)
Switch name   Description of function
Mode          This key switch is used to set an operation mode for the server.
              Insert the special key, which is under the customer’s control, to
              switch between modes.
  Locked      Normal operation mode
              • The system can be powered on with the POWER switch, but
                it cannot be powered off with the POWER switch.
              • The key can be pulled out at this key position.
  Service     Mode for maintenance
              • The system can be powered on and off with the POWER
                switch.
              • The key cannot be pulled out at this key position.
              • Maintenance is performed in Service mode while the server
                is stopped.
POWER         This switch is used to control the server power.
              Power-on and power-off are controlled by pressing this switch
              in different patterns, as described below.
  Holding down for a short time (less than 4 seconds)
              Regardless of the mode switch state, the server (all domains) is
              powered on.
              At this time, processing for waiting for facility (air
              conditioners) power-on and warm-up completion is
              skipped. (*1)
  Holding down for a long time in Service mode (4 seconds or longer)
              • If power to the server is on (at least one domain is
                operating), shutdown processing is executed for all domains
                before power-off processing.
              • If the system is being powered on, the power-on processing
                is cancelled, and the system is powered off.
              • If the system is being powered off, the operation of the
                POWER switch is ignored, and the power-off processing is
                continued.
Note – *1) In normal operation, the server is powered on only when the computer
room environmental conditions satisfy the specified values. Then, the server
remains in the reset state until the operating system is booted.
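The POWER switch behavior in TABLE 2-3 can be summarized as a small decision function. The sketch below is a hedged model written in Python, not firmware logic: the names Mode, Power, and power_switch_action are assumptions made for illustration, and the cases that TABLE 2-3 does not describe are returned with an explicit "not described" marker rather than guessed.

```python
from enum import Enum

LONG_PRESS_SECONDS = 4  # threshold stated in TABLE 2-3


class Mode(Enum):
    LOCKED = "Locked"
    SERVICE = "Service"


class Power(Enum):
    OFF = "off"
    POWERING_ON = "powering on"
    ON = "on"
    POWERING_OFF = "powering off"


NOT_DESCRIBED = "not described in TABLE 2-3"


def power_switch_action(mode, held_seconds, power):
    """Model the documented result of pressing the POWER switch."""
    if held_seconds < LONG_PRESS_SECONDS:
        # Short press: power-on regardless of the mode switch state.
        if power is Power.OFF:
            return "power on all domains (facility and warm-up waits skipped)"
        return NOT_DESCRIBED
    # Long press: power-off handling is documented for Service mode only.
    if mode is not Mode.SERVICE:
        return "power-off not permitted in Locked mode"
    if power is Power.ON:
        return "shut down all domains, then power off"
    if power is Power.POWERING_ON:
        return "cancel power-on, then power off"
    if power is Power.POWERING_OFF:
        return "ignored; power-off continues"
    return NOT_DESCRIBED


if __name__ == "__main__":
    print(power_switch_action(Mode.SERVICE, 5, Power.ON))
```

Keeping the undocumented cases explicit makes it clear where the table, not the model, is the authority.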
TABLE 2-4 Meanings of the Mode Switch
Function: Inhibition of Break Signal Reception
  Locked:  Enabled. Reception of the break signal can be enabled or disabled
           for each domain using setdomainmode.
  Service: Disabled
Function: Power On/Off by power switch
  Locked:  Only power on is enabled
  Service: Enabled
2.4 Determining Which Diagnostics Methods To Use
When an error occurs, a message is often displayed on the monitor. Use the
flowchart in FIGURE 2-14 to find the correct methods for diagnosing problems.
FIGURE 2-14 Diagnostic Method Flow Chart
[Flow chart. Steps and decisions, listed in the order they appear in the figure;
the YES/NO branch connections are shown only in the figure:
Start: An OS panic occurred or there is a performance error.
Check the OS console and the XSCF console for displayed error information.
Check /var/adm/messages on the Solaris OS.
Decision: FMA message?
Execute fmadm to display fault information.
Decision: Message ID available?
Decision: E-mail sent by the XSCF mail function?
Decision: Is there an error message on the XSCF console?
Execute showlogs or fmadm on the XSCF to display the fault information.
Write down the displayed fault information.
Decision: Use fmadm?
Enter the Message ID at http://sun.com/msg/ to refer to the fault information.
Decision: Trouble resolved?
Collect information about your server.
Contact a service engineer.
End]
2.5 Checking the Server and System Configuration
Before and after maintenance work, the state and configuration of the server and
components should be checked and the information saved. For recovery from a
problem, conditions related to the problem and the repair status must be checked.
The operating conditions must remain the same before and after maintenance.
A functioning server without any problems should not display any error conditions.
For example:
■ The syslog file should not display error messages.
■ The XSCF Shell command showhardconf should not display an asterisk (*) against any component.
■ The administrative console should not display error messages.
■ The server processor logs should not display any error messages.
■ The Solaris™ Operating System (Solaris OS) message files should not indicate any
additional errors.
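The asterisk check on the showhardconf output can be automated once the output has been captured to a text file. The following Python sketch assumes that the asterisk is the first non-blank character of a flagged line; the sample text is invented for illustration only and is not real showhardconf output, and the function name flagged_lines is hypothetical.

```python
def flagged_lines(showhardconf_output):
    """Return the lines whose first non-blank character is an asterisk."""
    return [
        line.strip()
        for line in showhardconf_output.splitlines()
        if line.lstrip().startswith("*")
    ]


# Invented sample for illustration only; the layout of real showhardconf
# output is described in the XSCF Reference Manual and may differ.
SAMPLE = """\
    CMU#0 Status:Normal;
*   CMU#1 Status:Faulted;
    IOU#0 Status:Normal;
"""

if __name__ == "__main__":
    for line in flagged_lines(SAMPLE):
        print(line)
```

An empty result corresponds to the "no asterisk displayed" condition listed above; any returned line identifies a component to investigate.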
2.5.1 Checking the Hardware Configuration and FRU Status
To replace a faulty component and perform maintenance on the server, it is
important to check and understand the hardware configuration of the server and the
state of each hardware component.
The hardware configuration refers to information that indicates to what layer a
component belongs in the hardware configuration.
The status of each hardware component refers to information on the condition of the
standard or optional component in the server: temperature, power supply voltage,
CPU operating conditions, and other items.
The hardware configuration and the status of each hardware component can be
checked from the maintenance terminal using XSCF Shell commands.
TABLE 2-5 lists commands for checking the hardware configuration and status. For
details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
TABLE 2-5 Commands for Checking Hardware Configuration and Status
Command        Description
showhardconf   Displays the system layer that includes a faulty component.
showstatus     Displays the status of a component. This command is used to check
               only a faulty component.
showboards     Displays the use status of individual devices and resources.
showdcl        Displays domain configuration information (hardware resource
               information).
showfru        Displays device setting information.
ioxadm         Displays the FRU status of external I/O expansion unit as normal
               or abnormal.
Also, some conditions can be checked based on the lit or blinking state of the
component LEDs (TABLE 2-11 and TABLE 2-12).
2.5.1.1 Checking the Hardware Configuration
Login authority is required to check the hardware configuration. The following
procedure can be performed from the maintenance terminal. Ask the
system administrator for necessary information, such as a password. For the detailed
procedure, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
1. Log in to the XSCF.
2. Execute the showhardconf command.
XSCF> showhardconf
The showhardconf command will print the hardware configuration information
to the screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
2.5.2 Checking the Software and XSCF Firmware Configurations
The software and firmware configurations and versions affect the operation of the
server. To change the configuration or investigate a problem, check the latest
information and check for any problems in the software.
Software and firmware vary from user to user.
■ The software configuration and version can be checked in the Solaris Operating
System. Refer to the Solaris OS documentation for more information.
■ The firmware configuration and versions can be checked from the maintenance
terminal using XSCF Shell commands. Refer to the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed
information.
Check the software and firmware configuration information with assistance from the
system administrator. However, if you have received login authority from the
system administrator, the following commands can be used from the maintenance
terminal for these checks:
TABLE 2-6 Commands for Checking the Software Configuration
Command       Description
showrev(1M)   Displays information on patches applied to the system.
uname(1)      Outputs current information regarding the system to the standard
              output.
TABLE 2-7 Commands for Checking the XSCF Firmware Configuration
Command           Description
version(8)        XSCF Shell command that outputs the current firmware version
                  information.
showhardconf(8)   XSCF Shell command that displays what layer of the system
                  includes a faulty component.
showstatus(8)     XSCF Shell command that displays the status of a component.
                  This command is used when only a faulty component is to be
                  checked.
showdcl(8)        XSCF Shell command that displays the configuration information
                  of a domain (hardware resource information).
showfru(8)        XSCF Shell command that displays the setting information of a
                  device.
2.5.2.1 Checking the Software Configuration
The following procedure can be performed from any terminal window.
1. Execute the showrev command.
# showrev
The showrev command will print the system configuration information to the
screen.
2.5.2.2 Checking the Firmware Configuration
Login authority is required to check the firmware configuration. The following
procedure can be performed from the maintenance terminal.
1. Log in to the XSCF.
2. Execute the version command.
XSCF> version
The version command will print the firmware version information to the screen.
See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
2.5.3 Downloading the Error Log Information
To download the error log information, use the XSCF log fetch function.
The XSCF Unit has an interface with external units so that a service engineer can
easily obtain useful maintenance information such as error logs.
Connect the maintenance terminal, and use the CLI or BUI to issue a download
instruction to the maintenance terminal to download error log information over the
XSCF-LAN.
Note – When the XSCF unit has a redundant configuration, log in also to the
standby XSCF and obtain the log file in the same manner.
2.6 Error Conditions and Action To Be Taken
This section describes error conditions and relevant corrective actions.
This work is explained in the following sections:
■ Section 2.6.1, “Predictive Self-Healing Tools”
■ Section 2.6.2, “Monitoring Output”
■ Section 2.6.3, “Messaging Output”
For details of the fault information, see the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
You can find more detailed descriptions of Solaris OS Predictive Self-Healing at the
website below:
Predictive self-healing is an architecture and methodology for automatically
diagnosing, reporting, and handling software and hardware fault conditions. This
new technology lessens the time required to debug a hardware or software problem
and provides the administrator and technical support with detailed data about each
fault.
2.6.1 Predictive Self-Healing Tools
In Solaris OS, the fault manager runs in the background. If a failure occurs, the
system software recognizes the error and attempts to determine what hardware is
faulty. The software also takes steps to prevent that component from being used
until it has been replaced. Some of the specific activities the software takes include:
■ Receives telemetry information about problems detected by the system software
■ Diagnoses the problems
■ Initiates pro-active self-healing activities. For example, the fault manager can
disable faulty components.
The state of a FRU, group of FRUs, or part of a FRU, that has been isolated
because a fault was detected. The isolation is usually done to prevent possibly
faulty components from affecting other system components. The part that is
isolated is not always the faulty part alone; a normal part may be degraded to
isolate the faulty part. If a function required for the operation of the system is
degraded, a system failure may result.
■ When possible, causes the faulty FRU to provide an LED indication of a fault in
addition to populating the system console messages with more details
TABLE 2-8 shows a typical message generated when a fault occurs. The message
appears on your console and is recorded in the /var/adm/messages file.
Note – The message in TABLE 2-8 indicates that the fault has already been diagnosed.
Any corrective action that the system can perform has already taken place. If your
server is still running, it continues to run.
TABLE 2-8 Predictive Self Healing Message
Output displayed:
Nov 1 16:30:20 dt88-292 EVENT-TIME: Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM: SUNW,A70, CSN: -, HOSTNAME: dt88-292
Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE: One or more device instances may be disabled
Nov 1 16:30:20 dt88-292 IMPACT: Loss of services provided by the device instances associated with this fault
Nov 1 16:30:20 dt88-292 REC-ACTION: Schedule a repair procedure to replace the affected device. Use
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support.
Description:
EVENT-TIME: The time stamp of the diagnosis.
PLATFORM: A description of the server encountering the problem.
SOURCE: Information on the Diagnosis Engine used to determine the fault.
EVENT-ID: The Universally Unique event ID for this fault.
DESC: A basic description of the failure.
WEBSITE: Where to find specific information and actions for this fault.
AUTO-RESPONSE: What, if anything, the system did to alleviate any follow-on issues.
IMPACT: A description of what that response might have done.
REC-ACTION: A short description of what the system administrator should do.
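When many such messages have to be reviewed, the fields described in TABLE 2-8 can be extracted from /var/adm/messages text with a short script. The sketch below is one possible approach in Python, assuming the syslog prefix format shown in the table (month, day, time, host name); the function name parse_fault_message and the MESSAGE-ID key are hypothetical names introduced for this example.

```python
import re

# Message text as shown in TABLE 2-8.
SAMPLE = """\
Nov 1 16:30:20 dt88-292 EVENT-TIME: Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM: SUNW,A70, CSN: -, HOSTNAME: dt88-292
Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE: One or more device instances may be disabled
Nov 1 16:30:20 dt88-292 IMPACT: Loss of services provided by the device instances associated with this fault
Nov 1 16:30:20 dt88-292 REC-ACTION: Schedule a repair procedure to replace the affected device. Use
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support.
"""

# Assumed syslog prefix: month, day, time, host name.
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
# A field line starts with an upper-case keyword followed by a colon.
FIELD = re.compile(r"^([A-Z][A-Z-]*):\s*(.*)$")
MSG_URL = re.compile(r"http://sun\.com/msg/([A-Z0-9-]+)")


def parse_fault_message(text):
    """Return a dict of the keyword fields; continuation lines are joined."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = PREFIX.sub("", raw).strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None:
            fields[current] = (fields[current] + " " + line).strip()
    url = MSG_URL.search(text)
    if url:
        # Hypothetical key name for the message ID embedded in the URL.
        fields["MESSAGE-ID"] = url.group(1)
    return fields


if __name__ == "__main__":
    info = parse_fault_message(SAMPLE)
    print(info["EVENT-ID"], info["MESSAGE-ID"])
```

The extracted event ID is the value to pass to fmdump -v -u, as the REC-ACTION text in the message indicates.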
2.6.2 Monitoring Output
To understand error conditions, collect monitoring output information by using the
commands shown below.
TABLE 2-9 lists the commands for checking the monitoring output.
TABLE 2-9 Commands for Checking the Monitoring Output
Command       Operand   Description
showlogs(8)   console   XSCF firmware collects console logs of console messages that
                        were output through the XSCF. This command collects all the
                        console messages displayed to users.
              monitor   Logs the messages displayed in the message window of the
                        BUI/CLI.
              panic     Saves as panic logs the console logs that are logged when a
                        reset is received after a panic notification.
              ipl       Collects the console data generated during a period from
                        power-on of a domain to completion of operating system
                        startup (system running).
2.6.3 Messaging Output
To understand error conditions, collect messaging output information by using the
commands shown below.
TABLE 2-10 lists the commands for checking the messaging output.
TABLE 2-10 Commands for Checking the Messaging Output
Command      Operand   Description
showlogs     env       Collects the temperature history log. The SPARC Enterprise
                       server environmental temperature data and power status are
                       collected at a 10-minute interval. The data is stored for a
                       maximum of six months.
             power     Collects the log of power events and reset events. The target
                       range covers the SPARC Enterprise server, External I/O
                       Expansion units, and UPSs.
             event     Collects, as an event log, messages that accompany commands
                       or the progress of operations such as Dynamic Reconfiguration
                       (DR), the status of operations on the operator panel, and
                       events such as a shutdown request to the OS due to a power
                       failure or abnormal temperature. This information is used to
                       analyze faults and investigate the use status of individual
                       devices at a customer's site, and it is kept as a maintenance
                       work history.
             error     Information on the SPARC Enterprise server hardware faults
                       detected by the SCF, POST/OpenBoot PROM, or ESF machine
                       management and software monitoring error information are
                       logged as SCF error logs. The showlogs error command can
                       display with hexadecimal codes the error information stored
                       in the SCF error log and information on faulty components.
fmdump(1M)             Hardware and software are automatically diagnosed according
fmdump(8)              to the fault management architecture (FMA), and the diagnosis
                       results and errors are automatically recorded. The fmdump
                       command can display the recorded information. It is provided
                       as a Solaris OS command and XSCF Shell command.
                       The information can be checked at the site at the specified
                       URL by using a displayed message ID.
Each error message logged by the predictive self-healing architecture has a code
associated with it as well as a web address that can be followed to get the most
up-to-date course of action for dealing with that error.
Refer to the Solaris OS documentation for more information on predictive
self-healing.
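When preparing a log collection session, it can help to generate the showlogs command lines from the operands listed in TABLE 2-9 and TABLE 2-10 so that none is missed. The following Python sketch only builds the command strings; it does not connect to the XSCF, it adds no options beyond the operand, and the names SHOWLOGS_OPERANDS and showlogs_command are illustrative assumptions.

```python
# Operands of the showlogs command as listed in TABLE 2-9 and TABLE 2-10.
# The one-line summaries are abbreviated from those tables.
SHOWLOGS_OPERANDS = {
    "console": "console messages output through the XSCF",
    "monitor": "messages displayed in the BUI/CLI message window",
    "panic": "console logs saved after a panic notification",
    "ipl": "console data from domain power-on to OS startup",
    "env": "temperature history log",
    "power": "power events and reset events",
    "event": "event log (operations, operator panel, shutdown requests)",
    "error": "error log with information on faulty components",
}


def showlogs_command(operand):
    """Return the command line for one operand, or raise ValueError."""
    if operand not in SHOWLOGS_OPERANDS:
        raise ValueError("operand not listed in TABLE 2-9 or TABLE 2-10: " + operand)
    return "showlogs " + operand


if __name__ == "__main__":
    for name, summary in SHOWLOGS_OPERANDS.items():
        print("XSCF> " + showlogs_command(name) + "    # " + summary)
```

Rejecting operands that are not in the tables keeps the generated checklist limited to what this manual documents; see the XSCF Reference Manual for the options each operand accepts.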
2.7LED Error Display
This section explains the LEDs of each FRU that are to be checked when the relevant
FRU is replaced. Each LED can be checked after the door of a cabinet is opened.
Whether the state of the entire system is normal can be learned by checking the
operator panel (outside). When an error occurs in an individual hardware
component in the system, the LEDs of the FRU containing the hardware component
that has caused the error indicate that an error has occurred. The LEDs on the
operator panel (back) indicate the status of the operator panel as a single unit. However,
some FRUs like DIMMs do not have LEDs.
Whether a FRU without LEDs is in the normal state can be checked by executing the
XSCF Shell commands showhardconf and ioxadm from a maintenance terminal.
For details of the commands, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
2.7.1 When target FRU is indicated by LEDs
When an error message is displayed at the system console and the cause of the error
is in hardware, a faulty FRU must be removed and replaced. Each FRU is equipped
with an LED to indicate whether an error has occurred in the FRU and an LED to
indicate whether the FRU can be removed. On most FRUs, these LEDs are named READY
LED and CHECK LED. In some cases the names are not indicated, but the icons are
always printed or icon labels are always affixed. Such FRUs include the back of the
operator panel, XSCFUs, CMUs, XBUs, CLKUs, FANs, and HDDs.
2.7.2 When target FRU is not indicated by LEDs
For some FRUs, the READY LED and CHECK LED are not used as the names of the
LEDs that are checked at replacement. Even in such a case, the same icons as those
for the READY LED and CHECK LED are used so that the meaning of LEDs can be
understood. Even if the names of LEDs are not indicated, the icons are always
printed or icon labels are always affixed.
TABLE 2-11 LED Display That Should Be Checked When a FRU Is Replaced (Common)
LED             Display         Meaning
READY (green)                   Indicates whether the unit is operating (whether it is
                                configured into the system).
                Lit             Indicates that the FRU is operating. The FRU cannot be
                                disconnected and removed from the system. Therefore, the
                                FRU cannot be replaced.
                Blinking        Indicates that the FRU is being configured into the system
                                (or, for an XSCFU, being initialized) or being disconnected
                                from the system. However, for a PSU, it indicates that the
                                main line switch has been switched on.
                Off             Indicates that the FRU is stopped and disconnected from the
                                system. Therefore, the FRU can be replaced.
CHECK (amber)                   Indicates either that the unit contains an error or that the
                                unit is a target device for replacement.
                Lit             Indicates that an error has been detected in the hardware of
                                the FRU. (For an HDD, the LED is lit according to the
                                instruction from the software or middleware.)
                Blinking (*1)   Indicates that the FRU is to be replaced.
                Off             Indicates that the state of the FRU is normal.
Note – *1) If the maintenance target component is indicated by a blinking CHECK
LED, the LED may be called a locator.
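The READY LED states in TABLE 2-11 can be summarized as a simple lookup. The following is a minimal sketch only: the function name ready_led_hint and the state keywords lit, blinking, and off are assumptions made for illustration and are not part of any XSCF Shell or Solaris OS command.

```shell
#!/bin/sh
# Hypothetical helper (not an XSCF or Solaris command): maps a READY LED
# state, as described in TABLE 2-11, to a short replaceability hint.
ready_led_hint() {
    case "$1" in
        lit)
            echo "operating: cannot be replaced" ;;
        blinking)
            # For a PSU, blinking instead means the main line switch is on.
            echo "being configured or disconnected: wait" ;;
        off)
            echo "stopped and disconnected: can be replaced" ;;
        *)
            echo "unknown READY LED state: $1" >&2
            return 1 ;;
    esac
}

# Example: ready_led_hint off
```

The hint is only a reading aid. FRUs without LEDs still need to be checked with showhardconf or ioxadm as described in Section 2.7.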
TABLE 2-12 Status Display of LEDs Defined Individually for Each FRU
LED type        Display             Meaning
FRU: XSCFU
READY           Lit (green)         Indicates that the XSCFU is in use. In this state, the
                                    XSCFU cannot be removed (cannot be replaced).
                Blinking (green)    Indicates that the XSCFU is being initialized.
                Off                 Indicates that the XSCFU can be replaced.
CHECK           Lit (amber)         Indicates that an error was detected in the XSCFU.
                                    However, this LED remains on for a few minutes
                                    immediately after power-on (until the start of
                                    initialization). It does not indicate an error during
                                    that time.
                Blinking (amber)    Indicates that the XSCFU is a replacement target.
                Off                 Indicates that the XSCFU is in the normal state.
ACTIVE          Lit (green)         Indicates that the XSCFU is in use (active).
                Off                 Indicates that the XSCFU is on standby.
TABLE 2-12 Status Display of LEDs Defined Individually for Each FRU (Continued)
LED type        Display             Meaning
FRU: XSCFU and IOU (display part for LAN)
ACT             Lit (green)         Indicates that communication is being performed through
                                    the Ethernet port (LAN port).
                Off                 Indicates that no communication is being performed
                                    through the Ethernet port (LAN port).
LINK SPEED      Lit (amber)         Only for an IOU: Indicates that the communication speed
                                    is 1 Gbps.
                Lit (green)         Indicates that the communication speed is 100 Mbps.
                Off                 Indicates that the communication speed is 10 Mbps.
FRU: HDD
READY OK        Lit (green)         Indicates that the HDD is in operation. In this state,
                                    the HDD cannot be removed (cannot be replaced).
                Blinking (green)    Indicates that the HDD is being connected. In this
                                    state, the HDD cannot be removed (cannot be replaced).
                Off                 Indicates that the HDD can be replaced.
CHECK           Lit (amber)         Indicates that an error was detected in the HDD.
                                    However, this LED remains on for a few minutes
                                    immediately after power-on (until the start of
                                    initialization). It does not indicate an error during
                                    that time.
                Blinking (amber)    Indicates that the HDD is a replacement target.
                Off                 Indicates that the HDD is in the normal state.
FRU: PCI card (inside an external I/O expansion unit)
(Power)         Lit (green)         Indicates that power is being supplied to the PCI slot.
                Off                 Indicates that the PCI card in the PCI slot is stopped.
(Attention)     Lit (amber)         Indicates that an error occurred in the hardware of the
                                    PCI slot.
                Blinking (amber)    Indicates that the PCI card in this PCI slot is a device
                                    to be replaced.
                Off                 Indicates that the hardware of the PCI slot is normal.
TABLE 2-12 Status Display of LEDs Defined Individually for Each FRU (Continued)
LED type        Display             Meaning
FRU: PSU (power supply unit)
POWER           Lit (green)         Indicates that the power to the system is turned on and
                                    being supplied.
                Blinking (green)    Indicates that power is being supplied to the PSU, but
                                    the PSU is not turned on.
                Off                 Indicates that power is not being supplied to the PSU.
FAIL            Lit (amber)         Indicates that an error occurred in the PSU.
                                    Maintenance can be performed.
                Off                 Indicates that the PSU is normal.
PRFL            Lit (amber)         Indicates that the rotational speed of the cooling fan
                                    in the PSU is abnormal.
                Off                 Indicates that the rotational speed of the cooling fan
                                    in the PSU is normal.
2.8 Using the Troubleshooting Commands
After the message in TABLE 2-8 is displayed, you might want more information
about the fault. For complete information about troubleshooting commands, refer to
the Solaris OS man pages or the XSCF Shell man pages. This section describes some
details of the following commands:
■ Section 2.8.1, “Using the showlogs Command” on page 2-34
■ Section 2.8.2, “Using the fmdump Command” on page 2-35
■ Section 2.8.3, “Using the fmadm Command” on page 2-35
■ Section 2.8.4, “Using the fmstat Command” on page 2-37
2.8.1 Using the showlogs Command
The showlogs command will display the contents of a specified log in order of
timestamp starting with the oldest date. The showlogs command will display the
following logs:
■ error log
■ power log
■ event log
■ temperature and humidity record
■ monitoring message log
2.8.2 Using the fmdump Command
The fmdump command can be used to display the contents of any log files associated
with the Solaris Fault Manager.
The fmdump command produces the following output. This example assumes there
is only one fault.
# fmdump
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
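When several faults are recorded, it can help to reduce the fmdump listing to the two columns needed for later steps: the UUID and the message ID. The following is a minimal sketch, assuming the output layout shown above (a heading line followed by one line per fault, with the UUID and message ID as the last two fields). The function name list_faults is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads fmdump-style output on standard input and
# prints "UUID MSG-ID" for each fault line.
# Assumes the layout shown in the example above; the heading line
# (which starts with TIME) is skipped.
list_faults() {
    awk '$1 != "TIME" && NF >= 5 { print $(NF - 1), $NF }'
}

# Possible use on the domain (assumption): fmdump | list_faults
```

The UUID printed in the first column is the value that the fmadm commands described later in this section take as an argument.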
2.8.3 Using the fmadm Command
This section describes the use of the fmadm command.
The administrator and all service personnel can use the fmadm command. This
command can display and change the system configuration parameters managed by
the Solaris Fault Manager.
2.8.3.1 fmadm config Command
The fmadm config command outputs the version and status of the diagnostic
engine used by the server. To determine whether the latest diagnostic engine is
running, compare the version with the information on the SunSolve website.
2.8.3.2 fmadm faulty Command
The fmadm faulty command can be used mainly to identify the status of faulty
components.
In the following example, the PCI card is degraded and associated with the
following UUID 49847040-ce57-e453-9adc-fe66c7c65384. Also, the "faulted"
state may be displayed.
Note – The error information remains in the Solaris OS even when replacement of
the faulty component is completed. Identify the UUID by executing the fmadm
faulty command, and reset the error information by executing the fmadm repair
command with the UUID specified.
2.8.3.3 fmadm repair Command
The fmadm repair command can be used to reset the error information for a faulty
component in the Solaris OS.
# fmadm repair 49847040-ce57-e453-9adc-fe66c7c65384
fmadm: recorded repair to 3de29de5-6332-ec64-9b49-bacc739fe3c3
2.8.4 Using the fmstat Command
The fmstat command can report statistics associated with the Solaris Fault
Manager. The fmstat command shows information about DE performance. In the
example below, the eft DE (also seen in the console output) has received an event
which it accepted. A case is "opened" for that event and a diagnosis is performed to
"solve" the cause for the failure.
These superuser commands can help you determine if you have issues in your
workstation, in the network, or within another server that you are networking with.
The following commands are described in this section:
■ Section 2.9.1, “iostat Command” on page 2-38
■ Section 2.9.2, “prtdiag Command” on page 2-39
■ Section 2.9.3, “prtconf Command” on page 2-44
■ Section 2.9.4, “netstat Command” on page 2-46
■ Section 2.9.5, “ping Command” on page 2-47
■ Section 2.9.6, “ps Command” on page 2-49
■ Section 2.9.7, “prstat Command” on page 2-50
Most of these commands are located in the /usr/bin or /usr/sbin directories.
2.9.1 iostat Command
The iostat command iteratively reports terminal, drive, and tape I/O activity, as
well as CPU utilization.
2.9.1.1 Options
TABLE 2-13 describes options for the iostat command and how those options can
help troubleshoot the server.
TABLE 2-13 Options for iostat
Option: No option
  Description: Reports status of local I/O devices.
  How It Can Help: A quick three-line output of device status.
Option: -c
  Description: Reports the percentage of time the system has spent in user mode, in
    system mode, waiting for I/O, and idling.
  How It Can Help: Quick report of CPU status.
Option: -e
  Description: Displays device error summary statistics. The total errors, hard errors,
    soft errors, and transport errors are displayed.
  How It Can Help: Provides a short table with accumulated errors. Identifies suspect
    I/O devices.
Option: -E
  Description: Displays all device error statistics.
  How It Can Help: Provides information about devices: manufacturer, model number,
    serial number, size, and errors.
Option: -n
  Description: Displays names in descriptive format.
  How It Can Help: Descriptive format helps identify devices.
Option: -x
  Description: For each drive, reports extended drive statistics. The output is in
    tabular form.
  How It Can Help: Similar to the -e option, but provides rate information. This helps
    identify poor performance of internal devices and other I/O devices across the
    network.
The following example shows output for one iostat command.
# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3120026A Revision: 8.01 Serial No: 3JT4H4C2
Size: 120.03GB <120031641600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: LITE-ON Product: COMBO SOHC-4832K Revision: O3K1 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
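On a system with many drives, the per-device summary lines are the quickest place to look for suspect devices. The following is a minimal sketch, assuming each device summary line has the form "device Soft Errors: n Hard Errors: n Transport Errors: n" with the device name separated from the counters by white space. The function name devices_with_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "iostat -En"-style output on standard input
# and prints each device whose soft, hard, and transport error counters
# add up to more than zero, followed by that total.
devices_with_errors() {
    awk '$2 == "Soft" && $3 == "Errors:" {
        total = $4 + $7 + $10      # soft + hard + transport
        if (total > 0) print $1, total
    }'
}

# Possible use (assumption): iostat -En | devices_with_errors
```

No output means that none of the summary lines reported an error. The detailed counters (Media Error, Device Not Ready, and so on) are not examined by this sketch.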
2.9.2 prtdiag Command
The prtdiag command displays configuration and diagnostic information. The
diagnostic information identifies any failed component.
The prtdiag command is located in the
/usr/platform/platform-name/sbin/ directory.
Note – The prtdiag command might indicate a slot number different than that
identified elsewhere in this document. This is normal.
2.9.2.1 Options
TABLE 2-14 describes options for the prtdiag command and how those options can
help troubleshooting.
TABLE 2-14 Options for prtdiag
Option: No option
  Description: Lists components.
  How It Can Help: Identifies CPU timing and PCI cards installed.
Option: -v
  Description: Verbose mode. Displays the time of the most recent AC power failure, the
    most recent hardware fatal error information, and (if applicable) environmental
    condition.
  How It Can Help: Provides the same information as no option. Additionally lists fan
    status, temperatures, ASIC, and PROM revisions.
The following example shows output for the prtdiag command in verbose mode.
# prtdiag -v
bash-3.2# cat /etc/release
Solaris Express Community Edition snv_81 SPARC
Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 15 January 2008
bash-3.2# prtdiag
System Configuration: Sun Microsystems sun4u XXXX SPARC Enterprise M8000 Server
System clock frequency: 960 MHz
Memory size: 32768 Megabytes
=================== Environmental Status ===================
Mode switch is in UNLOCK mode
=================== System Processor Mode ===================
SPARC64-VII mode
bash-3.2#
2.9.3 prtconf Command
Similar to the show-devs command run at the ok prompt, the prtconf command
displays the devices that are configured.
The prtconf command identifies hardware that is recognized by the Solaris OS. If
the hardware itself is not suspected of being faulty but software applications are
having trouble using it, the prtconf command can indicate whether the Solaris OS
recognizes the hardware and whether a driver for the hardware is loaded.
2.9.3.1 Options
TABLE 2-15 describes options for the prtconf command and how those options can
help troubleshooting.
TABLE 2-15 Options for prtconf
Option: No option
  Description: Displays the device tree of devices recognized by the OS.
  How It Can Help: If a hardware device is recognized, then it is probably functioning
    properly. If the message "(driver not attached)" is displayed for the device or for
    a sub-device, then the driver for the device is corrupt or missing.
Option: -D
  Description: Similar to the output of no option, however the device driver is listed.
  How It Can Help: Lists the driver needed or used by the OS to enable the device.
Option: -p
  Description: Similar to the output of no option, yet is abbreviated.
  How It Can Help: Reports a brief list of the devices.
Option: -V
  Description: Displays the version and date of the OpenBoot PROM firmware.
  How It Can Help: Provides a quick check of firmware version.
The following example shows output for the prtconf command.
# prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 32768 Megabytes
System Peripherals (Software Nodes):
SUNW,SPARC-Enterprise
scsi_vhci, instance #0
packages (driver not attached)
SUNW,builtin-drivers (driver not attached)
deblocker (driver not attached)
disk-label (driver not attached)
terminal-emulator (driver not attached)
obp-tftp (driver not attached)
ufs-file-system (driver not attached)
chosen (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
pseudo-console, instance #0
nvram (driver not attached)
pseudo-mc, instance #0
pseudo-mc, instance #1
pseudo-mc, instance #4
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
(The rest is omitted.)
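Because the device tree can be long, it can be useful to list only the nodes marked "(driver not attached)". The following is a minimal sketch, assuming that marker appears at the end of the node line as in the example above. The function name unattached_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads prtconf-style output on standard input and
# prints the name of every node marked "(driver not attached)",
# with leading indentation and the marker itself removed.
unattached_nodes() {
    awk '/\(driver not attached\)/ {
        sub(/[ \t]*\(driver not attached\).*/, "")
        sub(/^[ \t]+/, "")
        print
    }'
}

# Possible use (assumption): prtconf | unattached_nodes
```

As the example output above shows, a number of nodes appear without an attached driver, so the resulting list is a starting point for investigation rather than a list of faults.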
2.9.4 netstat Command
The netstat command displays the network status.
2.9.4.1 Options
TABLE 2-16 describes options for the netstat command and how those options can
help troubleshooting.
TABLE 2-16 Options for netstat
Option: -i
  Description: Displays the interface state, including packets in/out, error in/out,
    collisions, and queue.
  How It Can Help: Provides a quick overview of the network status.
Option: -i interval
  Description: Providing a trailing number with the -i option repeats the netstat
    command every interval seconds.
  How It Can Help: Identifies intermittent or long duration network events. By piping
    netstat output to a file, overnight activity can be viewed all at once.
Option: -p
  Description: Displays the media table.
  How It Can Help: Provides MAC address for hosts on the subnet.
Option: -r
  Description: Displays the routing table.
  How It Can Help: Provides routing information.
Option: -n
  Description: Replaces host names with IP addresses.
  How It Can Help: Used when an address is more useful than a host name.
The following example shows output for the netstat -p command.
# netstat -p
Net to Media Table: IPv4
Device IP Address Mask Flags Phys Addr
2.9.5 ping Command
The ping command sends ICMP ECHO_REQUEST packets to network hosts.
Depending on how the ping command is configured, the output displayed can
identify troublesome network links or nodes. The destination host is specified in the
variable hostname.
2.9.5.1 Options
TABLE 2-17 describes options for the ping command and how those options can help
troubleshooting.
TABLE 2-17 Options for ping
Option: hostname
  Description: The probe packet is sent to hostname and returned.
  How It Can Help: Verifies that a host is active on the network.
Option: -g hostname
  Description: Forces the probe packet to route through a specified gateway.
  How It Can Help: By identifying different routes to the target host, those individual
    routes can be tested for quality.
Option: -i interface
  Description: Designates which interface to send and receive the probe packet through.
  How It Can Help: Enables a simple check of secondary network interfaces.
Option: -n
  Description: Replaces host names with IP addresses.
  How It Can Help: Used when an address is more beneficial than a host name.
Option: -s
  Description: Pings continuously in one-second intervals. Ctrl-C aborts. Upon abort,
    statistics are displayed.
  How It Can Help: Helps identify intermittent or long-duration network events. By
    piping ping output to a file, activity overnight can later be viewed at once.
Option: -svR
  Description: Displays the route the probe packet followed in one second intervals.
  How It Can Help: Indicates probe packet route and number of hops. Comparing multiple
    routes can identify bottlenecks.
The following example shows output for the ping -s command.
# ping -s teddybear
PING teddybear: 56 data bytes
64 bytes from teddybear (192.146.77.140): icmp_seq=0. time=1. ms
64 bytes from teddybear (192.146.77.140): icmp_seq=1. time=0. ms
64 bytes from teddybear (192.146.77.140): icmp_seq=2. time=0. ms
^C
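When ping -s output has been saved to a file for later review, a short summary of the replies can be easier to read than the raw lines. The following is a minimal sketch, assuming reply lines of the form shown above, with an icmp_seq= field and a time= field. The function name ping_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "ping -s"-style output on standard input and
# prints the number of reply lines and the largest round-trip time seen.
# Only lines containing "icmp_seq=" are counted.
ping_summary() {
    awk 'BEGIN { n = 0; max = 0 }
    /icmp_seq=/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^time=/) {
                t = substr($i, 6) + 0   # numeric part after "time="
                if (t > max) max = t
                n++
            }
        }
    }
    END { printf "replies=%d max_ms=%g\n", n, max }'
}

# Possible use (assumption): ping_summary < saved_ping_output.txt
```

A reply count lower than expected for the capture period, or an unusually large maximum, suggests an interval worth examining in the saved output.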
2.9.6 ps Command
The ps command lists the status of processes. Using options and rearranging the
command output can assist in determining the resource allocation.
2.9.6.1 Options
TABLE 2-18 describes options for the ps command and how those options can help
troubleshooting.
TABLE 2-18 Options for ps
Option: -e
  Description: Displays information for every process.
  How It Can Help: Identifies the process ID and the executable.
Option: -f
  Description: Generates a full listing.
  How It Can Help: Provides the following process information: user ID, parent process
    ID, time when executed, and the path to the executable.
Option: -o option
  Description: Enables configurable output. The pid, pcpu, pmem, and comm options
    display process ID, percent CPU consumption, percent memory consumption, and the
    responsible executable, respectively.
  How It Can Help: Provides only most important information. Knowing the percentage of
    resource consumption helps identify processes that are affecting performance and
    might be hung.
The following example shows output for one ps command.
# ps -eo pcpu,pid,comm|sort -rn
1.4 100317 /usr/openwin/bin/Xsun
0.9 100460 dtwm
0.1 100677 ps
0.1 100600 ksh
0.1 100591 /usr/dt/bin/dtterm
0.1 100462 /usr/dt/bin/sdtperfmeter
0.1 100333 mibiisa
%CPU PID COMMAND
0.0 100652 /bin/csh
...
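Note – When the output is sorted with the -r option, the column-heading line is
sorted as if the value in its first column were zero. It therefore appears among the
rows whose first column is 0.0 rather than at the top of the output.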
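If the heading line should stay at the top, it can be read separately before the remaining lines are sorted. The following is a minimal sketch using only the standard read, printf, and sort utilities. The function name sort_keep_header is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads tabular output on standard input, prints the
# first line (the column headings) unchanged, and then prints the remaining
# lines sorted numerically in descending order on the first column.
sort_keep_header() {
    IFS= read -r header || return 1
    printf '%s\n' "$header"
    sort -rn
}

# Possible use (assumption): ps -eo pcpu,pid,comm | sort_keep_header
```

With this arrangement the processes that consume the most CPU follow directly below the headings.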
2.9.7 prstat Command
The prstat utility iteratively examines all active processes and reports statistics based
on the selected output mode and sort order. The prstat command provides output
similar to the ps command.
2.9.7.1 Options
TABLE 2-19 describes options for the prstat command and how those options can
help troubleshooting.
TABLE 2-19 Options for prstat
Option: No option
  Description: Displays a sorted list of the top processes that are consuming the most
    CPU resources. List is limited to the height of the terminal window and the total
    number of processes. Output is automatically updated every five seconds. Ctrl-C
    aborts.
  How It Can Help: Output identifies process ID, user ID, memory used, state, CPU
    consumption, and command name.
Option: -n number
  Description: Limits output to number of lines.
  How It Can Help: Limits amount of data displayed and identifies primary resource
    consumers.
Option: -s key
  Description: Permits sorting list by key parameter.
  How It Can Help: Useful keys are cpu (default), time, and size.
Option: -v
  Description: Verbose mode.
  How It Can Help: Displays additional parameters.
The following example shows output for the prstat command.
CHAPTER 3
Periodic Maintenance
Caution – The information in this chapter and subsequent chapters is intended for
service engineers.
Persons other than the authorized field engineers should use this information only
for reference and should not actually perform the work.
This chapter explains the maintenance work that must be performed regularly
regardless of whether a problem has occurred. The work consists of cleaning tasks
that keep dust in the environment from contaminating the equipment.
This information is explained in the following sections:
■ Section 3.1, “Cleaning a Tape Drive Unit” on page 3-1
■ Section 3.2, “Cleaning an Air Filter (Server)” on page 3-2
■ Section 3.3, “Cleaning an Air Filter (I/O Unit)” on page 3-5
The high-end server is equipped with air filters at the bottom of the cabinet. These
air filters filter out dust particles from the air that the fans suck in from the floor into
the cabinet. If the filters become clogged, the ventilation volume is reduced and the
temperature rises, leading to problems. Although the frequency of cleaning varies
with the operating environment, the air filters must be cleaned on a regular basis to
ensure that they do not become clogged with dust. Each I/O unit also has air filters.
Clean them at the same time that the air filters of the server are cleaned.
When the service life expiration date of an air filter has already passed, replace it by
referring to the air filter cleaning procedure.
3.1 Cleaning a Tape Drive Unit
The head in a tape drive unit must be cleaned regularly.
Each tape drive unit used for operation must be cleaned once every 24 hours of
operation. Even tape drive units not used for operation must be cleaned once every
month.
Although cleaning work can be performed in either hot or cold system maintenance
mode, the SPARC Enterprise server power must be on when a cleaning cassette is
used. The cleaning procedure is as follows.
1. If a tape cassette has been inserted in the tape drive unit, remove it from the
unit.
2. While holding the cleaning cassette with the mark side facing right, insert
it into the tape drive unit slot.
Head cleaning begins automatically.
3. The cleaning cassette is automatically ejected when cleaning is completed.
Remove it from the slot.
4. To use the tape cassette that was removed in Step 1, reinsert it into the tape
drive unit.
5. Confirm that the tape drive unit is in the normal state.
At this point, head cleaning is finished.
If one of the following problems occurs, replace the cleaning cassette immediately:
■ The cleaning cassette is not automatically ejected within one minute after being
inserted.
■ The tape is fully wound on the take-up reel on the right side. (The cassette can no
longer be reused.)
Use only specified cleaning cassettes.
Note – Contact your sales representative for tape drive unit options on SPARC
Enterprise M8000/M9000 servers.
3.2 Cleaning an Air Filter (Server)
An air filter may be cleaned while power to the server is on. Although the air filters
must be cleaned once a year, be sure to clean them if they become visibly dirty, even
if they are not scheduled for cleaning.
A high-end server cabinet is equipped with a total of six air filters: three at the front
and three at the rear at the bottom.
Note – One concern about cleaning the air filters while power to the server is on is
that dislodged dust may be sucked inside the system when the air filters are pulled
out. Therefore, gently and slowly pull them out. Complete the cleaning as quickly as
possible.
Caution – If you must use a vacuum cleaner for this work, use it outside the
computer room. Do not use it inside the computer room. Using a vacuum cleaner
inside the computer room may result in a server failure.
Because the structure and the mounting environment of air filters are the same, the
descriptions in the figures covering filter cleaning refer to, as an example, the air
filters at the front of each model.
1. Unlock and open the front and rear doors of the server. For details, see
Chapter 5.
2. Using a Phillips screwdriver, loosen the screw securing the fixing bracket of an
air filter, and turn the bracket so that it faces downward.
FIGURE 3-1 Removing Air Filters (Example for the M8000)
Fixing bracket (x3)
3. Pull out all of the air filters.
4. Use a vacuum cleaner to remove dust from the air filters. Attach a brush to the
tip of the vacuum cleaner, and clean both sides of the filters.
5. Restore each air filter to its original location and orientation, which means the
knob is on the side closest to you and the arrow on the label points up (the
latticework faces upward).
6. When this restoring work is completed for all the air filters, turn the fixing
brackets of the air filters until they face upward, and then tighten the screws
firmly with the Phillips screwdriver. Finally, close the front and rear doors of
the SPARC Enterprise server.
Removal of Air Filters
This filter cleaning procedure applies to both high-end servers.
FIGURE 3-2 Removing Air Filters (Example Using the M9000 Base Cabinet)
Fixing bracket (x3)
3.3 Cleaning an Air Filter (I/O Unit)
Each I/O unit has two air filters. Clean them at the same time that the air filters of
the server are cleaned.
Caution – If you must use a vacuum cleaner for this work, use it outside the
computer room. Do not use it inside the computer room. Using a vacuum cleaner
inside the computer room may result in a server failure.
The cleaning procedure is as follows.
1. Loosen the screws securing the filter cover, and remove the filter cover.
2. Pull out the air filter from the filter cover.
FIGURE 3-3 Removing Air Filters (I/O Unit)
3. Use a vacuum cleaner to remove dust from the air filter.
4. After the cleaning is completed, follow the removal procedure in reverse order
to mount it.
CHAPTER 4
FRU Removal Preparation
This chapter explains the required basic operations for replacing components, in the
following sections:
■ Section 4.1, “Types of Replacement Procedures” on page 4-2
■ Section 4.2, “Active Replacement” on page 4-3
■ Section 4.3, “Hot Replacement” on page 4-12
■ Section 4.4, “Cold Replacement” on page 4-18
■ Section 4.5, “Power-On/Off of Main Line Switch” on page 4-24
■ Section 4.6, “Emergency Switch-Off” on page 4-35
■ Section 4.7, “Cable Routing of the SPARC Enterprise M8000 Server” on page 4-35
When replacing a component, operate the operator panel and the maintenance
terminal while referring to the operator panel display, the maintenance terminal
display, and the LED display of the component.
Depending on the target component, the server must be powered off or a domain
must be stopped.
For the LED display of each component, see Section 2.7, “LED Error Display” on
page 2-30. Three replacement types are defined according to whether the server must
be powered off or a domain must be stopped: active replacement, hot replacement,
and cold replacement. See Part II, Maintenance. For the replacement types that apply
to each component, see Appendix B.
Note – Some of the XSCF functions have restrictions on their use. Register the
necessary user privileges for each field engineer in advance. Field engineers cannot
use functions that have not been registered for them. The system administrator sets
and changes the users and their privileges. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
Power-on and power-off of the server and emergency power-off are explained in the
last part of this chapter.
■ Section 4.5, “Power-On/Off of Main Line Switch” on page 4-24
■ Section 4.6, “Emergency Switch-Off” on page 4-35
4.1 Types of Replacement Procedures
4.1.1 FRU Replacement
The three types of replacement procedures explained below are supported for FRU
replacement. Choose the most suitable replacement procedure according to the
customer's system environment.
■ Active replacement
A target FRU is operated while the Solaris OS of the domain to which the FRU
belongs is operating. The target FRU is operated by using Solaris OS commands or
XSCF commands. Because the power supply unit (PSU) and fan unit (FAN) do not
belong to any domain, they are operated by using XSCF commands regardless of the
operating state of the Solaris OS.
Note – The procedure for disconnecting a hard disk drive from the domain depends
on whether disk mirroring software or similar support software is active. For
details, see the related individual software manuals.
■ Hot replacement
A target FRU is operated while the domain to which the FRU belongs is stopped.
Depending on the target FRU, there are two cases as follows:
■ Operated with XSCF commands.
■ Operated directly, not by using XSCF commands.
■ Cold replacement
A FRU is operated after all the domains are stopped and the server is powered
off.
Note – Do not operate a target FRU while the OpenBoot PROM is running (the ok
prompt is displayed). After stopping the relevant domain (power-off) or starting the
Solaris OS, operate the target FRU.
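The relationship between the system state and the three replacement types can be sketched as a small decision function. This is an illustration only, under stated assumptions: the function name replacement_type and the keywords solaris, ok_prompt, stopped, on, and off are hypothetical, and the sketch ignores the fact that the types actually supported depend on the individual FRU (see Appendix B).

```shell
#!/bin/sh
# Hypothetical helper: suggests which replacement type matches the current
# state, following the definitions in this section.
#   $1  domain state: solaris | ok_prompt | stopped
#   $2  server power: on | off
replacement_type() {
    domain_state=$1
    server_power=$2
    if [ "$server_power" = "off" ]; then
        echo "cold"                 # all domains stopped, server powered off
        return 0
    fi
    case "$domain_state" in
        solaris) echo "active" ;;   # Solaris OS of the domain is operating
        stopped) echo "hot" ;;      # domain stopped, server still powered
        ok_prompt)
            # The FRU must not be operated while OpenBoot PROM is running.
            echo "do not operate the FRU at the ok prompt" >&2
            return 1 ;;
        *)
            echo "unknown domain state: $domain_state" >&2
            return 2 ;;
    esac
}

# Example: replacement_type stopped on
```

The nonzero return for the ok prompt case reflects the note above: either stop the domain or start the Solaris OS before operating the target FRU.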
4.2 Active Replacement
In active replacement the Solaris OS must be configured to allow the component to
be replaced. Active replacement has four stages:
■ Section 4.2.1, “Disconnecting a FRU from a Domain” on page 4-3
■ Section 4.2.2, “Disconnecting and Replacing a FRU” on page 4-5
■ Section 4.2.3, “Configuring a FRU into a Domain” on page 4-6
■ Section 4.2.4, “Confirming the Hardware” on page 4-7
Note – If the hard disk drive is the boot device, it is necessary to replace it
according to the cold replacement procedure. However, if the disk mirroring
software or other support software can disconnect the relevant boot disk from
Solaris OS, active replacement can be performed.
4.2.1 Disconnecting a FRU from a Domain
4.2.1.1 Disconnecting a CMU/IOU
Perform the following procedure to disconnect a CMU or IOU when the Solaris OS
is operating:
1. Checking resources
Check the resources that are connected to a CMU or IOU to be disconnected, and
verify that the system is not affected when it is disconnected.
2. Disconnecting from the domain
To disconnect the CMU or IOU from the domain, enter the following command
from the terminal that is connected to the XSCF:
XSCF> deleteboard 01-0
The system administrator permission is required for executing this command.
Chapter 4 FRU Removal Preparation4-3
4.2.1.2 Disconnecting a PCI card
Caution – Before you remove the PCI cassette, make sure that there is no activity on
the card in the cassette.
Caution – When removing a cable such as a LAN cable from the PCI cassette, if
your finger cannot reach the latch lock of the connector, press the latch with a
flathead screwdriver to release the cable. Forcing your finger into the clearance
can damage the PCI card.
1. From the Solaris OS use the cfgadm command to get the component status:
You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r
Please confirm the Check LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
The replacefru command automatically tests the status of the component after
it has been disconnected and replaced.
Diagnostic tests for FAN_A#0 have started.
[This operation may take up to 2 minute(s)]
(progress scale reported in seconds)
0.....30.....60.....90.....done
-------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced unit.
FRU       Status
--------------------
FAN_A#0   Normal
-------------------------------------------------
The replacement of FAN_A#0 has completed, normally.[f:finish] :f
-------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. FAN(Fan Unit)
2. PSU(Power Supply Unit)
-------------------------------------------------
Select [1,2|c:cancel] : C
XSCF>
When the tests are complete, the program returns to the original menu. Select
cancel to return to the XSCF Shell prompt.
Note – The display may vary depending on the XCP version.
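If a maintenance session is logged to a file, the status reported after the diagnostic tests can be extracted from the captured text. The following Python sketch is an assumption-laden illustration, not a documented interface: the function name `parse_fru_status` is hypothetical, and it assumes that the FRU and Status columns are separated by whitespace and that the table is followed by a rule line of hyphens. Because the display may vary with the XCP version, treat an empty result as "format not recognized" rather than as a failure of the FRU.

```python
def parse_fru_status(output: str) -> dict:
    """Return {FRU name: status} pairs found under a 'FRU Status' header."""
    statuses = {}
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields == ["FRU", "Status"]:
            # Header row found; the rows that follow belong to the table.
            in_table = True
            continue
        if not in_table:
            continue
        if not fields or set(line.strip()) <= {"-", " "}:
            # Blank or rule line: skip it before the first row,
            # and treat it as the end of the table afterwards.
            if statuses:
                in_table = False
            continue
        if len(fields) == 2:
            statuses[fields[0]] = fields[1]
        else:
            # Anything else means the table has ended.
            in_table = False
    return statuses
```

A caller would typically compare the returned status with the value `Normal` shown in the transcript above before closing the maintenance record.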
4.2.3 Configuring a FRU into a Domain
4.2.3.1 Configuring CMU/IOU
Perform the following procedure to configure a CMU or IOU when the Solaris OS is
operating: