FUJITSU M8000 User Manual

SPARC Enterprise
M8000/M9000 Servers
Service Manual
Manual Code C120-E330-08EN Part No. 819-7877-15 August 2009, Revision A
Copyright 2007-2009 FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa-ken 211-8588, Japan. All rights reserved.
Sun Microsystems, Inc. provided technical input and review on portions of this material. Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology and this document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more additional patents or patent applications in the United States or other countries.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this document to you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this document does not contain or represent any commitment of any kind on the part of Fujitsu Limited or Sun Microsystems, Inc., or any affiliate of either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by and/or licensed from suppliers to Fujitsu Limited and/or Sun Microsystems, Inc., including software and font technology.
Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End User. Please contact Fujitsu Limited or Sun Microsystems, Inc.
This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, Java, Netra, Solaris, Sun Ray, Answerbook2, docs.sun.com, OpenBoot, and Sun Fire are trademarks or registered trademarks of Sun Microsystems, Inc., or its subsidiaries, in the U.S. and other countries. Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited. All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries.
Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems, Inc. SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of Sun Microsystems, Inc. and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Fujitsu Limited, Sun Microsystems, Inc. or any affiliate of either of them in connection with this document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, FUJITSU LIMITED, SUN MICROSYSTEMS, INC. AND THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Fujitsu Limited, Sun Microsystems, Inc. or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

Preface xvii
1. Safety and Tools 1–1
1.1 Symbols 1–1
1.1.1 Text Conventions 1–1
1.1.2 Prompt Notations 1–2
1.1.2.1 Command syntax 1–2
1.1.3 Environmental Requirements for Using This Product 1–3
1.1.4 Conventions for Alert Messages 1–3
1.1.4.1 Alert Messages in the Text 1–3
1.2 Precautions 1–4
1.2.1 Operating Environment of the Product 1–4
1.2.2 Maintenance 1–4
1.2.3 Conversion and Reuse of This Product 1–5
1.3 Tools Required for Maintenance 1–5
2. Product Overview and Troubleshooting 2–1
2.1 System Views 2–1
2.1.1 SPARC Enterprise M8000 Server 2–3
2.1.2 SPARC Enterprise M9000 Server (Base Cabinet) 2–6
2.1.3 SPARC Enterprise M9000 Server (Expansion Cabinet) 2–9
2.2 Labels 2–11
2.2.1 System Name Plate Label, Rating Label, ID Label (Japan) or EZ Label (besides Japan), and Standard Label 2–11
2.2.2 Labels About Handling 2–15
2.3 Operator Panel 2–16
2.3.1 Operator Panel Location 2–16
2.3.2 Appearance and Operations 2–17
2.3.3 LED 2–18
2.3.4 Switch 2–20
2.4 Determining Which Diagnostics Methods To Use 2–21
2.5 Checking the Server and System Configuration 2–23
2.5.1 Checking the Hardware Configuration and FRU Status 2–23
2.5.1.1 Checking the Hardware Configuration 2–24
2.5.2 Checking the Software and XSCF Firmware Configurations 2–24
2.5.2.1 Checking the Software Configuration 2–25
2.5.2.2 Checking the Firmware Configuration 2–26
2.5.3 Downloading the Error Log Information 2–26
2.6 Error Conditions and Action To Be Taken 2–26
2.6.1 Predictive Self-Healing Tools 2–27
2.6.2 Monitoring Output 2–28
2.6.3 Messaging Output 2–29
2.7 LED Error Display 2–30
2.7.1 When target FRU is indicated by LEDs 2–31
2.7.2 When target FRU is not indicated by LEDs 2–31
2.8 Using the Troubleshooting Commands 2–34
2.8.1 Using the showlogs Command 2–34
2.8.2 Using the fmdump Command 2–35
2.8.3 Using the fmadm Command 2–35
2.8.3.1 fmadm config Command 2–35
2.8.3.2 fmadm faulty Command 2–36
2.8.3.3 fmadm repair Command 2–36
2.8.4 Using the fmstat Command 2–37
2.9 Traditional Solaris Troubleshooting Commands 2–37
2.9.1 iostat Command 2–38
2.9.1.1 Options 2–38
2.9.2 prtdiag Command 2–39
2.9.2.1 Options 2–39
2.9.3 prtconf Command 2–44
2.9.3.1 Options 2–45
2.9.4 netstat Command 2–46
2.9.4.1 Options 2–46
2.9.5 ping Command 2–47
2.9.5.1 Options 2–48
2.9.6 ps Command 2–49
2.9.6.1 Options 2–49
2.9.7 prstat Command 2–50
2.9.7.1 Options 2–50
3. Periodic Maintenance 3–1
3.1 Cleaning a Tape Drive Unit 3–1
3.2 Cleaning an Air Filter (Server) 3–2
3.3 Cleaning an Air Filter (I/O Unit) 3–5
4. FRU Removal Preparation 4–1
4.1 Types of Replacement Procedures 4–2
4.1.1 FRU Replacement 4–2
4.2 Active Replacement 4–3
4.2.1 Disconnecting a FRU from a Domain 4–3
4.2.1.1 Disconnecting a CMU/IOU 4–3
4.2.1.2 Disconnecting a PCI card 4–4
4.2.2 Disconnecting and Replacing a FRU 4–5
4.2.3 Configuring a FRU into a Domain 4–6
4.2.3.1 Configuring CMU/IOU 4–6
4.2.3.2 Configuring a PCI card 4–7
4.2.4 Confirming the Hardware 4–7
4.3 Hot Replacement 4–12
4.3.1 Disconnecting and Replacing a FRU 4–12
4.3.2 Confirming the Hardware 4–15
4.4 Cold Replacement 4–18
4.4.1 Powering the Server Off 4–18
4.4.2 Powering the Server On 4–19
4.4.2.1 From the Operator Panel 4–19
4.4.2.2 From the Maintenance Terminal 4–20
4.4.3 Confirming the Hardware 4–20
4.5 Power-On/Off of Main Line Switch 4–24
4.5.1 Types of Power Supply 4–24
4.5.1.1 AC Input Power 4–25
4.5.1.2 Power System 4–27
4.5.2 Power-On/Off Procedures of Main Line Switch 4–27
4.5.2.1 Power-On 4–28
4.5.2.2 Power-Off 4–28
4.5.3 Main Line Switch Locations 4–28
4.5.3.1 SPARC Enterprise M8000 Server Single-Phase Power Feed 4–29
4.5.3.2 SPARC Enterprise M8000 Server Single-Phase and Dual Power Feed 4–30
4.5.3.3 SPARC Enterprise M8000 Server Three-Phase Power Feed 4–31
4.5.3.4 SPARC Enterprise M9000 Server Single-Phase Power Feed 4–32
4.5.3.5 SPARC Enterprise M9000 Server Single-Phase and Dual Power Feed 4–33
4.5.3.6 SPARC Enterprise M9000 Server Three-Phase Power Feed 4–34
4.6 Emergency Switch-Off 4–35
4.7 Cable Routing of the SPARC Enterprise M8000 Server 4–35
4.7.1 Cable Routing When the External I/O Expansion Unit Mounted 4–35
4.7.1.1 Precautions For Cable Routing 4–36
4.7.1.2 When Three External I/O Expansion Units Mounted 4–38
4.7.1.3 For Server Using Three-phase Power Feed 4–42
4.7.1.4 When One External I/O Expansion Unit Mounted 4–46
5. Internal Components Access 5–1
5.1 How to Open and Close Doors 5–1
5.2 Corresponding Components and Doors That Can Be Accessed 5–2
5.3 How to Remove a Door 5–3
5.4 How to Remove a Side Panel 5–4
6. Replacement of CPU/Memory Board Unit (CMU), CPU, and DIMM 6–1
6.1 Overview of the CMU 6–1
6.2 CPU Upgrade 6–5
6.2.1 SPARC64 VII CPU Modules Added to a New Domain 6–5
6.2.2 SPARC64 VII Processors Added to an Existing Domain 6–8
6.2.2.1 Preparing to Add SPARC64 VII Processors to an Existing Domain 6–8
6.2.2.2 Adding a SPARC64 VII CPU Module to a Domain Configured With SPARC64 VI 6–10
6.2.2.3 Upgrading a SPARC64 VI CPU Module to SPARC64 VII on an Existing Domain 6–11
6.3 Active Replacement and Hot Replacement 6–12
6.4 Cold Replacement 6–21
6.5 CPU and DIMM Replacement 6–25
6.5.1 Replacing a CPU Module 6–27
6.5.2 Memory Module Mounting Conditions 6–37
6.5.2.1 Confirmation of DIMM Information 6–38
6.5.2.2 DIMM Mounting Conditions 6–39
6.5.3 DIMM Replacement 6–41
7. I/O Unit (IOU) Replacement 7–1
7.1 Overview of the IOU 7–2
7.2 Active Replacement and Hot Replacement 7–8
7.3 Cold Replacement 7–16
8. FAN Unit Replacement 8–1
8.1 Overview of the FAN Unit 8–2
8.2 Active Replacement and Hot Replacement 8–8
8.3 Cold Replacement 8–13
9. Power Supply Unit (PSU) Replacement 9–1
9.1 Overview of the PSU 9–1
9.2 Active Replacement and Hot Replacement 9–8
9.3 Cold Replacement 9–11
10. Operator Panel Replacement 10–1
10.1 Overview of the Operator Panel 10–1
10.2 Cold Replacement 10–4
11. XSCF Unit Replacement 11–1
11.1 Overview of the XSCFU 11–1
11.2 Active Replacement and Hot Replacement 11–5
11.3 Cold Replacement 11–11
12. Hard Disk Drive (HDD) Replacement 12–1
12.1 Overview of the HDD 12–1
12.2 Active Replacement 12–2
12.3 Cold Replacement 12–5
13. PCI Slot Device Replacement 13–1
13.1 Overview of PCI Slot Devices 13–1
13.2 Active Replacement 13–3
13.3 Hot Replacement 13–16
13.4 Cold Replacement 13–19
14. CD-RW/DVD-RW Drive Unit Replacement 14–1
14.1 Overview of a CD-RW/DVD-RW Drive Unit 14–1
14.2 Active Replacement 14–4
14.3 Hot Replacement 14–9
14.4 Cold Replacement 14–11
15. Tape Drive Unit Replacement 15–1
15.1 Overview of the Tape Drive Unit 15–1
15.2 Active Replacement 15–5
15.3 Hot Replacement 15–9
15.4 Cold Replacement 15–10
16. Clock Control Unit Replacement 16–1
16.1 Overview of the CLKU 16–1
16.2 Cold Replacement 16–3
17. Crossbar Unit Replacement 17–1
17.1 Overview of XBUs 17–1
17.2 Cold Replacement 17–3
18. AC Section Replacement 18–1
18.1 Overview of ACSs 18–1
18.2 Cold Replacement 18–5
19. DDC Replacement 19–1
19.1 Overview of the DDC 19–1
19.2 Active Replacement and Hot Replacement 19–3
19.3 Cold Replacement 19–6
20. Backplane Replacement 20–1
20.1 Overview of the BP 20–1
20.2 Cold Replacement 20–1
20.2.1 M8000/M9000 BPs 20–2
20.2.2 PSU BP 20–14
20.2.3 FAN BP 20–21
21. Sensor Unit Replacement 21–1
21.1 Overview of the SNSU 21–1
21.2 Cold Replacement 21–4
22. Media Backplane Replacement 22–1
22.1 Overview of the MEDBP 22–1
22.2 Cold Replacement 22–5
23. Switch Backplane Replacement 23–1
23.1 Overview of SWBPs 23–1
23.2 Cold Replacement 23–5
24. Addition and Deletion of a RDPF Option, Power Cabinet, and M9000 Expansion Cabinet 24–1
24.1 Addition and Deletion of Rack-mountable Dual Power Feed 24–2
24.1.1 Overview of RDPF 24–2
24.1.2 Addition and Deletion 24–3
24.1.2.1 Addition 24–3
24.1.2.2 Deletion 24–12
24.2 Addition and Deletion of Power Cabinet 24–12
24.3 Addition and Deletion of M9000 Expansion Cabinet 24–15
25. Addition and Deletion of CMU, DIMM, IOU, HDD, PCI Cards and TAPEU 25–1
25.1 Addition 25–2
25.1.1 Active Addition 25–2
25.1.2 Cold Addition 25–3
25.2 Deletion 25–4
25.2.1 Active Deletion 25–5
25.2.2 Cold Deletion 25–5
A. System Configuration A–1
A.1 Installation Conditions A–1
A.2 System Configuration A–2
A.2.1 SPARC Enterprise M8000 Server A–2
A.2.2 SPARC Enterprise M9000 Server (Base Cabinet) A–4
A.2.3 SPARC Enterprise M9000 Server (Base Cabinet + Expansion Cabinet) A–6
B. Components B–1
B.1 CPU Memory Board Unit B–4
B.2 CPU Module B–5
B.3 Memory B–7
B.4 I/O Unit B–8
B.5 Hard Disk Drive B–10
B.6 PCI Cassette B–10
B.7 IOU Onboard Device Card B–12
B.8 Link Card (External I/O Expansion Unit Connection Card) B–13
B.9 Crossbar Unit B–14
B.10 Clock Control Unit B–16
B.11 XSCF Unit B–17
B.12 CD-RW/DVD-RW Drive Unit B–19
B.13 Tape Drive Unit B–20
B.14 Operator Panel B–21
B.15 Sensor Unit B–23
B.16 Power Supply Unit B–24
B.17 AC Section B–25
B.18 FAN Unit B–30
B.19 Power Cabinet B–32
B.20 Rack-mountable Dual Power Feed B–34
B.21 Backplane B–35
B.22 DDC B–37
B.23 PSU Backplane B–38
B.24 FAN Backplane B–39
B.25 Media Backplane B–42
B.26 Switch Backplane B–43
C. External Interface Specifications C–1
C.1 Serial Port C–1
C.2 UPC Port C–2
C.3 USB Port C–2
C.4 Connection Diagram for Serial Cable C–3
D. UPS Controller D–1
D.1 Overview D–1
D.2 Signal Cable D–1
D.3 Configuration of Signal Lines D–2
D.4 Power Supply Conditions D–4
D.4.1 Input Circuit D–4
D.4.2 Output Circuit D–4
D.5 UPS Cable D–5
D.6 Connections D–6
D.7 UPC port D–7
E. XSCF Unit Replacement When XCP 1040 or 1041 Is in the Server E–1
Abbreviations Abbreviations–1
Index Index–1

Preface

This manual is a maintenance manual for the SPARC Enterprise™ M8000/M9000 servers. It explains basic operations and detailed replacement procedures for field-replaceable units (FRUs), which are components that can be replaced at the customer's site.
As a rule, Fujitsu-certified service engineers perform maintenance on the SPARC Enterprise M8000/M9000 servers. However, another person, such as the system administrator, may perform some of the simple work under the direction of one of the service engineers. This manual is intended for the service engineers and the other persons described above.
Notes precede the chapters, sections, and paragraphs that cover the work that only Fujitsu-certified service engineers are allowed to perform. Follow instructions for all work.
This section explains:
“Glossary” on page xvii
“Structure and Contents of This Manual” on page xviii
“SPARC Enterprise M8000/M9000 Servers Documentation” on page xxi
“Product Handling” on page xxiv
“Limitations and Cautions” on page xxvi
“Fujitsu Welcomes Your Comments” on page xxvii
Glossary
For the terms used in the “SPARC Enterprise M8000/M9000 Servers Documentation” on page xxi, refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Glossary.
Structure and Contents of This Manual
This manual is organized as described below:
PART I Basic Information for Maintenance and Troubleshooting
Covers safety notes and required tools, the product overview and troubleshooting, periodic maintenance, and the basic operations required before replacing FRUs.
Chapter 1 Safety and Tools:
Provides notes on handling the SPARC Enterprise servers and rules about operation and descriptions, and it describes the required tools for maintenance.
Chapter 2 Product Overview and Troubleshooting:
Provides information that is required in troubleshooting.
Chapter 3 Periodic Maintenance:
Explains the maintenance work that must be performed regularly regardless of whether a problem has occurred. The work consists of cleaning the tape drive unit and the air filters to keep dust from accumulating.
Chapter 4 FRU Removal Preparation:
Explains the required basic operations for replacing components.
PART II Maintenance
Explains how to remove and replace FRUs.
Chapter 5 Internal Components Access:
Explains how to access each part of the system.
Chapter 6 Replacement of CPU/Memory Board Unit (CMU), CPU, and DIMM:
Explains the replacement procedures for the CPU/memory board unit (CMU), CPU modules, and DIMMs.
Chapter 7 I/O Unit (IOU) Replacement:
Explains the replacement procedures for an I/O unit (IOU).
Chapter 8 FAN Unit Replacement:
Explains the replacement procedures for a fan unit.
Chapter 9 Power Supply Unit (PSU) Replacement:
Explains the replacement procedures for a power supply unit (PSU).
Chapter 10 Operator Panel Replacement:
Explains the replacement procedures for the operator panel.
Chapter 11 XSCF Unit Replacement:
Explains the replacement procedures for an XSCF unit.
Chapter 12 Hard Disk Drive (HDD) Replacement:
Explains the replacement procedures for a hard disk drive (HDD).
Chapter 13 PCI Slot Device Replacement:
Explains the replacement procedures for a device mounted in a PCI slot of an IOU.
Chapter 14 CD-RW/DVD-RW Drive Unit Replacement:
Explains the replacement procedures for the CD-RW/DVD-RW drive unit.
Chapter 15 Tape Drive Unit Replacement:
Explains the replacement procedures for the tape drive unit.
Chapter 16 Clock Control Unit Replacement:
Explains the replacement procedure for the clock control unit.
Chapter 17 Crossbar Unit Replacement:
Explains the replacement procedure for a crossbar unit.
Chapter 18 AC Section Replacement:
Explains the replacement procedures for an AC section (ACS).
Chapter 19 DDC Replacement:
Explains the replacement procedure for the DDC.
Chapter 20 Backplane Replacement:
Explains the replacement procedure for a backplane.
Chapter 21 Sensor Unit Replacement:
Explains the replacement procedure for the sensor unit.
Chapter 22 Media Backplane Replacement:
Explains the replacement procedure for the media backplane.
Chapter 23 Switch Backplane Replacement:
Explains the replacement procedure for the switch backplane.
Chapter 24 Addition and Deletion of a Rack-mountable Dual Power Feed Option, Power Cabinet, and M9000 Expansion Cabinet:
Explains the procedures for adding and deleting the rack-mountable dual power feed (RDPF) option, the power cabinet, and the M9000 expansion cabinet.
Chapter 25 Addition and Deletion of CMU, DIMM, IOU, HDD, PCI Cards and TAPEU:
Explains the procedures for adding a unit to the SPARC Enterprise M8000/M9000 servers and deleting a unit from the SPARC Enterprise M8000/M9000 servers.
Appendix A System Configuration:
Describes the installation conditions and configuration of the SPARC Enterprise server.
Appendix B Components:
Provides figures showing the components that compose the SPARC Enterprise servers.
Appendix C External Interface Specifications:
Describes the specifications of the connectors provided on the SPARC Enterprise server unit.
Appendix D UPS Controller:
Describes the connection of the UPC interface, which controls an uninterruptible power supply (UPS).
Appendix E XSCF Unit Replacement When XCP 1040 or 1041 Is in the Server:
Provides a replacement procedure to use when the server uses an older version of XCP firmware than is present in the replacement XSCFU.
Abbreviations
Provides the full spellings of abbreviations used in this manual.
Index
Provides keywords and corresponding reference page numbers so that the reader can easily search for items in this manual as necessary.
SPARC Enterprise M8000/M9000 Servers Documentation
The manuals listed below are provided for reference.
Book Titles Manual Codes
SPARC Enterprise M8000/M9000 Servers Site Planning Guide C120-H014
SPARC Enterprise Equipment Rack Mounting Guide C120-H016
SPARC Enterprise M8000/M9000 Servers Getting Started Guide C120-E323
SPARC Enterprise M8000/M9000 Servers Overview Guide C120-E324
Important Safety Information for Hardware Systems C120-E391
SPARC Enterprise M8000/M9000 Servers Safety and Compliance Guide C120-E326
External I/O Expansion Unit Safety and Compliance Guide C120-E457
SPARC Enterprise M8000/M9000 Servers Unpacking Guide C120-E327
SPARC Enterprise M8000/M9000 Servers Installation Guide C120-E328
SPARC Enterprise M8000/M9000 Servers Service Manual C120-E330
External I/O Expansion Unit Installation and Service Manual C120-E329
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI Build Procedure C120-E361
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Administration Guide C120-E331
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide C120-E332
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual Go to the Web
SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User’s Guide C120-E335
SPARC Enterprise M4000/M5000/M8000/M9000 Servers Capacity on Demand (COD) User’s Guide C120-E336
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI User’s Guide C120-E360
SPARC Enterprise M8000/M9000 Servers Product Notes Go to the Web
External I/O Expansion Unit Product Notes C120-E456
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Glossary C120-E514
SPARC Enterprise/PRIMEQUEST Common Installation Planning Manual C120-H007
1. Manuals on the Web
The latest versions of all the SPARC Enterprise Series manuals are available at the following websites.
Global Site
http://www.fujitsu.com/sparcenterprise/manual/
Japanese Site
http://primeserver.fujitsu.com/sparcenterprise/manual/
Note – Product Notes are available on the website only. Please check for the most
recent update on your product.
2. Documentation CD
For the Documentation CD, please contact your local sales representative.
SPARC Enterprise M8000/M9000 Servers Documentation CD (C120-E364)
3. Manual on the Enhanced Support Facility x.x CD-ROM disk
Remote maintenance service
Book Title Manual Code
Enhanced Support Facility User's Guide for REMCS C112-B067
4. Manual (man page) provided in the system
XSCF man page
Note – The man page can be referenced on the XSCF Shell, and it provides the same
content as the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
5. Sun Microsystems Software (for Solaris OS, etc.) Related Manuals
http://docs.sun.com
6. Information on Using the RCI function
This manual does not contain an explanation of the RCI build procedure. For information on using the RCI function, refer to the RCI Build Procedure and RCI User’s Guide provided on the website.
Product Handling
Maintenance
Caution – Certain tasks in this manual should only be performed by a certified service engineer. Users must not perform these tasks. Performing these tasks incorrectly may cause electric shock, injury, or fire.
Installation and reinstallation of all components, and initial settings
Removal of front, rear, or side covers
Mounting/de-mounting of optional internal devices
Plugging or unplugging of external interface cards
Maintenance and inspections (repairing, and regular diagnosis and maintenance)
Caution – The following tasks regarding this product and the optional products provided by Fujitsu should only be performed by a certified service engineer. Users must not perform these tasks. Performing these tasks incorrectly may cause malfunction.
Unpacking optional adapters and such packages delivered to the users
Plugging or unplugging of external interface cards
Remodeling/Rebuilding
Caution – Any modification and/or recycling of this product and its components
may be carried out only by a certified service engineer and must not be done by the customer under any circumstances. Otherwise, electric shock, injury or fire may result.
Emission of Laser Beam (Invisible)
Caution – The main unit and high-speed optical interconnect cabinet contain
modules that generate invisible laser radiation. Laser beams are generated while the equipment is operating, even if an optical cable is disconnected or a cover is removed. Do not look at any light-emitting part directly or through an optical apparatus (e.g., magnifying glass, microscope).
Limitations and Cautions
Power Control and Operator Panel Mode Switch
When you use the remote power control utilizing the RCI function or the automatic power control system (referred to below as APCS), you can disable this remote power control or the APCS by switching to Service mode on the operator panel.
Disabling these features ensures that you do not unintentionally switch the system power on or off during maintenance. Note that system power-off by the APCS cannot be disabled with the mode switch. Therefore, be sure to turn off automatic power control via the APCS before starting maintenance.
If you switch the mode while using the RCI or the automatic power control, the system power is controlled as follows.
Function | Mode switch: Locked | Mode switch: Service
RCI | Remote power-on/power-off operations are enabled. | Remote power-on/power-off operations are disabled.
Automatic power control | Automatic power-on/power-off operations are enabled. | Automatic power-on is disabled, but power-off remains enabled.
To use the RCI function, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI Build Procedure and the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI User’s Guide, which are available on the manuals website.
To use the APCS, see the Enhanced Support Facility User's Guide for Machine Administration Automatic Power Control Function (Supplement Edition).
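As a quick reference, the mode switch behavior described above can be captured in a small lookup table. The following Python sketch is illustrative only: the dictionary layout and the function name blocks_all_power_operations are assumptions made for this example and are not part of any Fujitsu tool.

```python
# Illustrative lookup of the mode switch table; all names are hypothetical.
# True means the operation stays enabled in that mode switch position.
POWER_CONTROL = {
    ("RCI", "Locked"): {"power_on": True, "power_off": True},
    ("RCI", "Service"): {"power_on": False, "power_off": False},
    ("APCS", "Locked"): {"power_on": True, "power_off": True},
    ("APCS", "Service"): {"power_on": False, "power_off": True},
}


def blocks_all_power_operations(function, mode):
    """Return True if neither power-on nor power-off can be triggered."""
    state = POWER_CONTROL[(function, mode)]
    return not (state["power_on"] or state["power_off"])


if __name__ == "__main__":
    for key in sorted(POWER_CONTROL):
        print(key, blocks_all_power_operations(*key))
```

The ("APCS", "Service") entry does not block power-off, which matches the instruction to turn off automatic power control separately before starting maintenance.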
Fujitsu Welcomes Your Comments
If you have any comments or requests regarding this document, or if you find any unclear statements in the document, please state your points specifically on the form at the following URL.
For Users in U.S.A., Canada, and Mexico:
http://www.computers.us.fujitsu.com/www/support_servers.shtml?support/servers
For Users in Other Countries:
SPARC Enterprise contact
http://www.fujitsu.com/global/contact/computing/sparce_index.html
PART I
Basic Information for Maintenance and Troubleshooting
Part I provides maintenance information, explains methods of problem analysis, troubleshooting, and basic operations for replacing FRUs.
CHAPTER
1

Safety and Tools

This chapter describes the conventions used in this manual, provides notes and rules on handling and operating the high-end server, and lists the tools required for maintenance.
This information is explained in the following sections:
Section 1.1, “Symbols”
Section 1.2, “Precautions”
Section 1.3, “Tools Required for Maintenance”

1.1 Symbols

1.1.1 Text Conventions
This manual uses the following fonts and symbols to express specific types of information.
Fonts/symbols | Meaning | Example
AaBbCc123 (bold) | What you type, when contrasted with on-screen computer output. This font represents the example of command input in the frame. | XSCF> adduser jsmith
AaBbCc123 | The names of commands, files, and directories; on-screen computer output. This font represents the example of command output in the frame. | XSCF> showuser -P User Name: jsmith Privileges: useradm auditadm
Italic | Indicates the name of a reference manual. | See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
" " | Indicates names of chapters, sections, items, buttons, or menus. | See Chapter 2, "Product Overview and Troubleshooting."
1.1.2 Prompt Notations
The following prompt notations are used in this manual.
Shell | Prompt Notations
XSCF | XSCF>
C shell | machine-name%
C shell super user | machine-name#
Bourne shell and Korn shell | $
Bourne shell and Korn shell super user | #
OpenBoot™ PROM | ok
1.1.2.1 Command syntax
The command syntax is as follows:
A variable that requires input of a value is enclosed in <>.
An optional element is enclosed in [ ].
A group of options for an optional keyword is enclosed in [ ] and delimited by |.
A group of options for a mandatory keyword is enclosed in {} and delimited by |.
The command syntax is shown in a box.
Example:
XSCF> showuser -a
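The bracket conventions above can be read mechanically. The following Python sketch, offered only as an illustration, splits a syntax line into literals, variables (<>), optional elements ([ ]), and mandatory choices ({}). The command name examplecmd and its options are invented for this example and do not correspond to any XSCF command; nested brackets are not handled.

```python
import re

# One token per match: <variable>, [optional], {mandatory choice}, or a literal word.
TOKEN_RE = re.compile(r"<[^<>]+>|\[[^\[\]]+\]|\{[^{}]+\}|\S+")


def classify(token):
    """Map one syntax token to (kind, alternatives)."""
    inner = [part.strip() for part in token[1:-1].split("|")]
    if token.startswith("<") and token.endswith(">"):
        return ("variable", [token[1:-1]])
    if token.startswith("[") and token.endswith("]"):
        return ("optional", inner)
    if token.startswith("{") and token.endswith("}"):
        return ("mandatory", inner)
    return ("literal", [token])


def parse_syntax(syntax):
    """Classify every token of a syntax line, left to right."""
    return [classify(token) for token in TOKEN_RE.findall(syntax)]


if __name__ == "__main__":
    # Hypothetical syntax line, used only to exercise the notation.
    for kind, values in parse_syntax("examplecmd {start|stop} [-v] <target>"):
        print(kind, values)
```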
1.1.3 Environmental Requirements for Using This Product
This product is a computer that is intended to be used in a computer room. For details on the operational environment, see the SPARC Enterprise M8000/M9000 Servers Site Planning Guide.
1.1.4 Conventions for Alert Messages
This manual uses the following conventions to show alert messages, which are intended to prevent injury to the user or bystanders as well as property damage, and important messages that are useful to the user.
This indicates a hazardous situation that could result in death or serious personal injury (potential hazard) if the user does not perform the procedure correctly.
This indicates a hazardous situation that could result in minor or moderate personal injury if the user does not perform the procedure correctly. This signal also indicates that damage to the product or other property may occur if the user does not perform the procedure correctly.
This indicates information that could help the user to use the product more effectively.
1.1.4.1 Alert Messages in the Text
An alert message in the text consists of a signal indicating an alert level followed by an alert statement. Alert messages are indented to distinguish them from regular text as shown in the following example. Also, a space of one line precedes and follows an alert statement.
The tasks listed below for this product and optional product provided by Fujitsu should be performed only by authorized service personnel. The user must not perform these tasks. Incorrect operation of these tasks may cause electric shock, injury, or fire.
Installation and reinstallation of all components
Removal of front, rear, or side covers
Mounting/unmounting of optional internal devices
Connecting/disconnecting of external interface cables
Maintenance (repair and regular diagnosis and maintenance)

1.2 Precautions

The following notes must be observed in maintenance work:
1.2.1 Operating Environment of the Product
Use the SPARC Enterprise server in the correct operating environment. The SPARC Enterprise servers are intended for use in a computer room. For details of the operating environment, see the SPARC Enterprise M8000/M9000 Servers Site Planning Guide.
1.2.2 Maintenance
The work listed below is to be performed by authorized service engineers. Persons who are not authorized service engineers must not perform the work. Otherwise, electric shock, injury, or fire may result.
Installation, transport, and initial setup of each device
Removal of the front, rear, or a side cover.
Mounting or removing internal optional components
Connecting or disconnecting an external interface cable
Maintenance (repair, regular diagnosis, and maintenance)
The work listed below is to be performed by authorized service engineers. Persons who are not authorized service engineers must not perform the work. Otherwise, an equipment failure may result.
Unpacking or installing products, such as an optional adapter, that are delivered to the customer
Connecting or disconnecting an external interface cable
1.2.3 Conversion and Reuse of This Product
If this product is converted, or if a used unit of this product is overhauled for the purpose of reuse, unexpected injury to users or bystanders or damage to their property may result.

1.3 Tools Required for Maintenance

The maintenance work described in Chapter 6 to Chapter 24 requires maintenance software to confirm that the SPARC Enterprise and other components are operating correctly and to collect status information and log data about the server and components. The work for mounting, removing, or replacing a specific component requires screwdrivers and special tools such as an antistatic wrist strap. These items are listed in TABLE 1-1.
TABLE 1-1 Maintenance Tools
No. | Name | Use
1 | Torque wrench [8.24 N·m (84 kgf·cm)] | Used to fix the bus bars of the power cabinet.
2 | Sockets for 10 mm (M6) torque wrench | Used to replace the BP_A in the SPARC Enterprise M8000 server.
3 | Sockets for 13 mm (M8) torque wrench | Used to fix the bus bars of the power cabinet.
4 | Torque wrench extension |
5 | Torque screwdriver [0.2 N·m (2.0 kgf·cm)] | Used to secure the clock cables between the cabinets if the expansion cabinet of the SPARC Enterprise M9000 server is mounted.
6 | Slotted bit | Used to secure the clock cables between the cabinets if the expansion cabinet of the SPARC Enterprise M9000 server is mounted.
7 | Wrist strap | For antistatic purposes
8 | Conductive mat | For antistatic purposes
9 | CPU module replacement tool | For mounting and removing CPU Modules (accessory)
10 | SunVTS | Test program
Caution – Be sure to wear an antistatic wrist strap when handling components.
Place removed components on an antistatic conductive mat. Failure to do so may result in serious damage or injury.
CHAPTER
2

Product Overview and Troubleshooting

This chapter provides information that is required in troubleshooting.
This information is explained in the following sections:
Section 2.1, “System Views”
Section 2.2, “Labels”
Section 2.3, “Operator Panel”
Section 2.4, “Determining Which Diagnostics Methods To Use”
Section 2.5, “Checking the Server and System Configuration”
Section 2.6, “Error Conditions and Action To Be Taken”
Section 2.7, “LED Error Display”
Section 2.8, “Using the Troubleshooting Commands”
Section 2.9, “Traditional Solaris Troubleshooting Commands”

2.1 System Views

This section provides views of the high-end server. The figures can be used to locate the component in the server to be subjected to maintenance.
In terms of its structure, the high-end server consists of a cabinet that includes various mounted components and a front door, rear door, and side covers that protect the mounted components. The side covers are removed when cabinets are connected to each other or when the dual power feed option is connected to the cabinet. The operator panel, which is mounted on the front door, is always accessible. Each door can be locked with a key so that only the administrator can open it.
The front and rear views of FIGURE 2-1, FIGURE 2-2, FIGURE 2-4, FIGURE 2-5, FIGURE 2-7, and FIGURE 2-8 include names and abbreviations for field-replaceable units (FRUs). Components that are mounted inside the system are shown in FIGURE 2-3, FIGURE 2-6, and FIGURE 2-9. The abbreviations are used in messages and the like. If multiple FRUs of the same type are mounted, the number sign # and a sequential number are added to their names to distinguish them from one another. Owing to the reduced scale, certain components (FRUs) are difficult to show in the figures. Accordingly, the layout of these components as viewed from one side is indicated in the table connected by a lead line to the component location.
2.1.1 SPARC Enterprise M8000 Server
FIGURE 2-1 Front View - M8000
Labeled components: PSU, DDC, XSCFU, TAPEU, DVDU, SNSU, FAN_B, CMU, FAN_A, Air Filter
FIGURE 2-2 Rear View - M8000
Labeled components: ACS, FAN_B, IOU, Air Filter
FIGURE 2-3 Internal View - M8000
Labeled components: PSUBP_B, PSUBP_A, FANBP_C, MEDBP, FANBP_C, BP_A, SWBP
2.1.2 SPARC Enterprise M9000 Server (Base Cabinet)
FIGURE 2-4 Front View - M9000 (Base Cabinet)
Labeled components: PSU, TAPEU, DVDU, SNSU, FAN_A, ACS, XBU, CLKU, XSCFU, IOU, Air Filter
FIGURE 2-5 Rear View - M9000 (Base Cabinet)
Labeled components: FAN_A, CMU, IOU, Air Filter
FIGURE 2-6 Internal View - M9000 (Base Cabinet)
Labeled components: PSUBP_A, BP_B, MEDBP, SWBP, FANBP_B, FANBP_A
2.1.3 SPARC Enterprise M9000 Server (Expansion Cabinet)
FIGURE 2-7 Front View - M9000 (with the Expansion Cabinet)
Labeled components: PSU, cable support bracket, TAPEU, DVDU, SNSU, FAN_A, ACS, XBU, CLKU, XSCFU, IOU, Air Filter
FIGURE 2-8 Rear View - M9000 (with the Expansion Cabinet)
Labeled components: FAN_A, CMU, IOU, Air Filter
FIGURE 2-9 Internal View - M9000 (with the Expansion Cabinet)
Labeled components: PSUBP_A, BP_B, MEDBP, SWBP, FANBP_B, FANBP_A

2.2 Labels

2.2.1 System Name Plate Label, Rating Label, ID Label (Japan) or EZ Label (besides Japan), and Standard Label
The important labels affixed on this server are shown in FIGURE 2-10 and FIGURE 2-11. The actual description on the labels may differ from FIGURE 2-10 and FIGURE 2-11.
The system name plate label includes the model number, serial number, and hardware version, all of which are required for maintenance and management.
The rating label, which is affixed near the AC power supply, includes the power input rating for the AC power supply.
The ID label or EZ label is affixed on the front door of the server, and it includes the model name and serial number, both of which are written on the system name plate label.
[Label images: ID label (Japan); EZ label (besides Japan)]
The standard label is affixed near the system name plate label, and it includes the certification standards that apply:
Safety: NRTL/C
Electrical interference: VCCI-A, FCC-A, DOC-A, and MIC
Safety and electrical interference: CE
FIGURE 2-10 M8000 Label Location
Views: Front, Rear. Labeled items: System Name Plate Label, Standard label
FIGURE 2-11 M9000 Label Location
Views: Front, Rear. Labeled items: System Name Plate Label, Standard label
2.2.2 Labels About Handling
The labels shown below, which are affixed on the high-end server, provide field engineers with important information on component removal and mounting.
Caution – Never peel off the labels.
Removing and installing a CPU/memory board unit (CMU)
Removing a crossbar unit (XBU)
Removing an I/O unit (IOU)

2.3 Operator Panel

The operator panel controls the high-end server power. The operator panel is usually locked with a key to prevent the server from being mistakenly powered off through an operator error during system operation.
Before starting maintenance work, ask the system administrator to unlock the operator panel.
2.3.1 Operator Panel Location
FIGURE 2-12 indicates the location of the operator panel (OPNL) of the high-end servers. The expansion cabinet is not equipped with the operator panel.
FIGURE 2-12 Operator Panel Location (at the Front of M8000)
Labeled component: OPNL
2.3.2 Appearance and Operations
The operator panel can be used while the front door of the server is closed. Field engineers and the system administrator use the operator panel to check the operating state of the server and to perform system power operations. The operating state of the server is checked by observing the LEDs, and the power supply is operated with the POWER switch.
FIGURE 2-13 shows the appearance of the operator panel.
FIGURE 2-13 Operator Panel
2.3.3 LED
TABLE 2-1 lists the states of the server that are displayed with the LEDs on the operator panel.
The blinking period is one second (frequency of 1 Hz).
Besides the states listed in TABLE 2-1, the operator panel also displays various states of the server using combinations of the three LEDs. TABLE 2-2 indicates the states that are usually displayed in the course of operation from the power-on to power-off of the high-end server.
TABLE 2-1 State Display by the LEDs (Operator Panel)
LED Name | Light color | State | Description of function and state
POWER | Green | | Indicates whether power to the SPARC Enterprise server is on.
 | | Off | Indicates the power-off state.
 | | Lit | Indicates the power-on state.
 | | Blinking | The power-off sequence is in progress.
XSCF STANDBY | Green | | Indicates whether the XSCF can be powered on.
 | | Off | Indicates that the system cannot be powered on.
 | | Blinking | Indicates that initialization processing of the SPARC Enterprise server is in progress after main line switches were switched on.
 | | Lit | Indicates that the system can be powered on.
CHECK | Amber | | Indicates the operating status of the SPARC Enterprise server.
 | | Off | Normal state. Otherwise, this indicates that the main line switches were switched off or a power failure occurred.
 | | Blinking (*1) | Indicates that the operator panel is the maintenance target device.
 | | Lit | Indicates that the server cannot be started.
Note – *1) If the maintenance target component is indicated by a blinking CHECK LED, the LED may be called a locator.
TABLE 2-2 State Display by LED Combination (Operator Panel)
POWER | XSCF STANDBY | CHECK | Description of the state
Off | Off | Off | The main line switch is switched off.
Off | Off | On | The main line switch is switched on.
Off | Blinking | Off | The XSCF is being initialized.
Off | Blinking | On | An error occurred in the XSCF.
Off | On | Off | • The XSCF is on standby. • The system is waiting for power-on of the air conditioning system.
On | On | Off | • Warm-up standby processing is in progress (power-on is delayed). • The power-on sequence is in progress. • The system is in operation.
Blinking | On | Off | • The power-off sequence is in progress. • Fan termination is being delayed.
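When LED states are recorded during troubleshooting, the combinations in TABLE 2-2 can be looked up programmatically. The following Python sketch is only an illustration of such a lookup: the dictionary name, the function name, and the lowercase state strings are assumptions made for this example.

```python
# Transcription of TABLE 2-2; keys are (POWER, XSCF STANDBY, CHECK).
# The data structure and names are illustrative, not part of any Fujitsu tool.
LED_STATES = {
    ("off", "off", "off"): ["The main line switch is switched off."],
    ("off", "off", "on"): ["The main line switch is switched on."],
    ("off", "blinking", "off"): ["The XSCF is being initialized."],
    ("off", "blinking", "on"): ["An error occurred in the XSCF."],
    ("off", "on", "off"): [
        "The XSCF is on standby.",
        "The system is waiting for power-on of the air conditioning system.",
    ],
    ("on", "on", "off"): [
        "Warm-up standby processing is in progress (power-on is delayed).",
        "The power-on sequence is in progress.",
        "The system is in operation.",
    ],
    ("blinking", "on", "off"): [
        "The power-off sequence is in progress.",
        "Fan termination is being delayed.",
    ],
}


def describe_leds(power, standby, check):
    """Return the possible states for one LED combination."""
    key = (power.lower(), standby.lower(), check.lower())
    return LED_STATES.get(key, ["Combination not listed in TABLE 2-2."])


if __name__ == "__main__":
    for state in describe_leds("Blinking", "On", "Off"):
        print(state)
```

Several combinations map to more than one state, so the lookup narrows the possibilities rather than identifying a single condition.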
2.3.4 Switch
The operator panel has the mode switch, which sets the operation mode, and the POWER switch, which is used to power on and off the system.
TABLE 2-3 Switches (Operator Panel)
Switch Name | Description of function
Mode: This key switch is used to set an operation mode for the server. Insert the special key, which is under the customer’s control, to switch between modes.
  Locked: Normal operation mode
    • The system can be powered on with the POWER switch, but it cannot be powered off with the POWER switch.
    • The key can be pulled out at this key position.
  Service: Mode for maintenance
    • The system can be powered on and off with the POWER switch.
    • The key cannot be pulled out at this key position.
    • Maintenance is performed in Service mode while the server is stopped.
POWER: This switch is used to control the server power. Power-on and power-off are controlled by pressing this switch in different patterns, as described below.
  Holding down for a short time (less than 4 seconds): Regardless of the mode switch state, the server (all domains) is powered on. At this time, processing for waiting for facility (air conditioners) power-on and warm-up completion is skipped. (*1)
  Holding down for a long time in Service mode (4 seconds or longer):
    • If power to the server is on (at least one domain is operating), shutdown processing is executed for all domains before power-off processing.
    • If the system is being powered on, the power-on processing is cancelled, and the system is powered off.
    • If the system is being powered off, the operation of the POWER switch is ignored, and the power-off processing is continued.
Note – *1) In normal operation, the server is powered on only when the computer room environmental conditions satisfy the specified values. Then, the server remains in the reset state until the operating system is booted.
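The POWER switch rules in TABLE 2-3 amount to a small decision procedure. The following Python sketch restates them under stated assumptions: the function name, the state names ("off", "powering_on", "on", "powering_off"), and the returned strings are invented for this example. Cases that the table does not describe are reported as unspecified rather than guessed, and the short-press branch does not consider the current power state because the table does not.

```python
LONG_PRESS_SECONDS = 4  # threshold between short and long press in TABLE 2-3


def power_switch_action(hold_seconds, mode, power_state):
    """Describe the effect of pressing the POWER switch.

    mode is "Locked" or "Service"; power_state is one of
    "off", "powering_on", "on", "powering_off" (hypothetical labels).
    """
    if hold_seconds < LONG_PRESS_SECONDS:
        # Short press: power-on in either mode; facility and warm-up waits are skipped.
        return "power on all domains"
    if mode != "Service":
        # In Locked mode the system cannot be powered off with the POWER switch.
        return "ignored"
    if power_state == "on":
        return "shut down all domains, then power off"
    if power_state == "powering_on":
        return "cancel power-on, then power off"
    if power_state == "powering_off":
        return "ignored, power-off continues"
    return "not specified in TABLE 2-3"


if __name__ == "__main__":
    print(power_switch_action(5, "Service", "on"))
```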
TABLE 2-4 Meanings of the Mode Switch
Function | Mode switch: Locked | Mode switch: Service
Inhibition of Break Signal Reception | Enabled. Reception of the break signal can be enabled or disabled for each domain using setdomainmode. | Disabled
Power On/Off by power switch | Only power on is enabled | Enabled

2.4 Determining Which Diagnostics Methods To Use

When an error occurs, a message is often displayed on the monitor. Use the flowchart in FIGURE 2-14 to find the correct methods for diagnosing problems.
FIGURE 2-14 Diagnostic Method Flow Chart
[Flowchart. The steps and decision points that can be recovered from the figure are listed below in approximate order; the YES/NO branch connections are not recoverable.]
Start: An OS panic occurred or there is a performance error.
Check the OS console and XSCF console for displayed error information.
Check /var/adm/messages on Solaris OS.
Decision: FMA message?
Execute fmadm to display fault information.
Decision: Message ID available?
Decision: E-mail sent or not by XSCF mail function?
Decision: Is there an error message on the XSCF console?
Execute showlogs or fmadm on XSCF to display the fault information.
Write down the displayed fault information.
Decision: Use fmadm?
Enter the Message ID in http://sun.com/msg/ to refer to fault information.
Decision: Trouble resolved?
Collect information about your server.
Contact service engineer.
End

2.5 Checking the Server and System Configuration

Before and after maintenance work, the state and configuration of the server and components should be checked and the information saved. For recovery from a problem, conditions related to the problem and the repair status must be checked. The operating conditions must remain the same before and after maintenance.
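One way to confirm that conditions are the same before and after maintenance is to save the text output of the configuration commands at both points and compare the two files. The following Python sketch shows such a comparison using the standard difflib module. The function name and the sample text are hypothetical; real command output looks different.

```python
import difflib


def config_changes(before_text, after_text):
    """Return lines removed ("- ") or added ("+ ") between two saved outputs."""
    diff = difflib.ndiff(before_text.splitlines(), after_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Hypothetical saved outputs, used only to demonstrate the comparison.
    before = "unit-A Status:Normal\nunit-B Status:Normal"
    after = "unit-A Status:Normal\nunit-B Status:Degraded"
    for change in config_changes(before, after):
        print(change)
```

An empty result means the two saved outputs are identical line for line.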
A functioning server without any problems should not display any error conditions. For example:
The syslog file should not display error messages.
No asterisk (*) mark is displayed in the output of the XSCF Shell command showhardconf.
The administrative console should not display error messages.
The server processor logs should not display any error messages.
The Solaris™ Operating System (Solaris OS) message files should not indicate any additional errors.
2.5.1 Checking the Hardware Configuration and FRU Status
To replace a faulty component and perform maintenance on the server, it is important to check and understand the hardware configuration of the server and the state of each hardware component.
The hardware configuration refers to information that indicates to what layer a component belongs in the hardware configuration.
The status of each hardware component refers to information on the condition of the standard or optional component in the server: temperature, power supply voltage, CPU operating conditions, and other items.
The hardware configuration and the status of each hardware component can be checked from the maintenance terminal using XSCF Shell commands.
TABLE 2-5 lists commands for checking the hardware configuration and status. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
TABLE 2-5 Commands for Checking Hardware Configuration and Status
Command | Description
showhardconf | Displays the system layer that includes a faulty component.
showstatus | Displays the status of a component. This command is used to check only a faulty component.
showboards | Displays the use status of individual devices and resources.
showdcl | Displays domain configuration information (hardware resource information).
showfru | Displays device setting information.
ioxadm | Displays the FRU status of external I/O expansion unit as normal or abnormal.
Some conditions can also be checked based on the lit or blinking state of the component LEDs (TABLE 2-11 and TABLE 2-12).
2.5.1.1 Checking the Hardware Configuration
Login authority is required to check the hardware configuration. The following procedure can be performed from the maintenance terminal. Ask the system administrator for necessary information, such as a password. For the detailed procedure, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
1. Log in to the XSCF.
2. Execute the showhardconf command.
XSCF> showhardconf
The showhardconf command will print the hardware configuration information to the screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
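This manual notes that a functioning server shows no * mark in the showhardconf output. If the output has been saved to a text file, a short script can pick out any marked lines for review. The following Python sketch assumes that the mark appears as the first non-blank character of a line; the function name and the sample text are hypothetical and do not reproduce real showhardconf output.

```python
def flagged_lines(showhardconf_text):
    """Return saved output lines whose first non-blank character is an asterisk."""
    return [
        line.strip()
        for line in showhardconf_text.splitlines()
        if line.lstrip().startswith("*")
    ]


if __name__ == "__main__":
    # Hypothetical sample text, used only to exercise the function.
    sample = "  UNIT-A Status:Normal;\n* UNIT-B Status:Faulted;\n"
    for line in flagged_lines(sample):
        print(line)
```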
2.5.2 Checking the Software and XSCF Firmware Configurations
The software and firmware configurations and versions affect the operation of the server. To change the configuration or investigate a problem, check the latest information and check for any problems in the software.
Software and firmware configurations vary from user to user.
The software configuration and version can be checked in the Solaris Operating System. Refer to the Solaris OS documentation for more information.
The firmware configuration and versions can be checked from the maintenance terminal using XSCF Shell commands. Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
Check the software and firmware configuration information with assistance from the system administrator. However, if you have received login authority from the system administrator, the following commands can be used from the maintenance terminal for these checks:
TABLE 2-6 Commands for Checking the Software Configuration
Command | Description
showrev(1M) | Displays information on patches applied to the system.
uname(1) | Outputs current information regarding the system to the standard output.
TABLE 2-7 Commands for Checking the XSCF Firmware Configuration
Command | Description
version(8) | XSCF Shell command that outputs the current firmware version information.
showhardconf(8) | XSCF Shell command that displays what layer of the system includes a faulty component.
showstatus(8) | XSCF Shell command that displays the status of a component. This command is used when only a faulty component is to be checked.
showdcl(8) | XSCF Shell command that displays the configuration information of a domain (hardware resource information).
showfru(8) | XSCF Shell command that displays the setting information of a device.
2.5.2.1 Checking the Software Configuration
The following procedure can be performed from any terminal window.
1. Execute the showrev command.
# showrev
The showrev command will print the system configuration information to the screen.
2.5.2.2 Checking the Firmware Configuration
Login authority is required to check the firmware configuration. The following procedure can be performed from the maintenance terminal.
1. Log in to the XSCF.
2. Execute the version command.
XSCF> version
The version command will print the firmware version information to the screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed information.
2.5.3 Downloading the Error Log Information
If you want to download the error log information, use the XSCF log fetch function. The XSCF Unit has an interface with external units so that a service engineer can easily obtain useful maintenance information such as error logs.
Connect the maintenance terminal, and use the CLI or BUI to issue a download instruction to the maintenance terminal to download error log information over the XSCF-LAN.
Note – When the XSCF unit has a redundant configuration, log in also to the
standby XSCF and obtain the log file in the same manner.

2.6 Error Conditions and Action To Be Taken

This section describes error conditions and relevant corrective actions.
This work is explained in the following sections:
Section 2.6.1, “Predictive Self-Healing Tools”
Section 2.6.2, “Monitoring Output”
Section 2.6.3, “Messaging Output”
For details of the fault information, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
You can find more detailed descriptions of Solaris OS Predictive Self-Healing at the website below:
http://www.sun.com/bigadmin/features/articles/selfheal.html
Predictive self-healing is an architecture and methodology for automatically diagnosing, reporting, and handling software and hardware fault conditions. This new technology lessens the time required to debug a hardware or software problem and provides the administrator and technical support with detailed data about each fault.
2.6.1 Predictive Self-Healing Tools
In Solaris OS, the fault manager runs in the background. If a failure occurs, the system software recognizes the error and attempts to determine what hardware is faulty. The software also takes steps to prevent that component from being used until it has been replaced. Some of the specific activities the software takes include:
Receives telemetry information about problems detected by the system software
Diagnoses the problems
Initiates proactive self-healing activities. For example, the fault manager can disable faulty components.
Degraded refers to the state of a FRU, group of FRUs, or part of a FRU that has been isolated because a fault was detected. The isolation is usually done to prevent possibly faulty components from affecting other system components. The part that is isolated is not always the faulty part alone; a normal part may be degraded to isolate the faulty part. If a function required for the operation of the system is degraded, a system failure may result.
When possible, causes the faulty FRU to provide an LED indication of a fault in addition to populating the system console messages with more details.
TABLE 2-8 shows a typical message generated when a fault occurs. The message appears on your console and is recorded in the /var/adm/messages file.
Note – The message in TABLE 2-8 indicates that the fault has already been diagnosed.
Any corrective action that the system can perform has already taken place. If your server is still running, it continues to run.
TABLE 2-8 Predictive Self Healing Message
Output displayed:
Nov 1 16:30:20 dt88-292 EVENT-TIME: Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM: SUNW,A70, CSN: -, HOSTNAME: dt88-292
Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE: One or more device instances may be disabled
Nov 1 16:30:20 dt88-292 IMPACT: Loss of services provided by the device instances associated with this fault
Nov 1 16:30:20 dt88-292 REC-ACTION: Schedule a repair procedure to replace the affected device. Use
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support.
Description:
EVENT-TIME: The time stamp of the diagnosis.
PLATFORM: A description of the server encountering the problem.
SOURCE: Information on the Diagnosis Engine used to determine the fault.
EVENT-ID: The Universally Unique event ID for this fault.
DESC: A basic description of the failure.
WEBSITE: Where to find specific information and actions for this fault.
AUTO-RESPONSE: What, if anything, the system did to alleviate any follow-on issues.
IMPACT: A description of what that response might have done.
REC-ACTION: A short description of what the system administrator should do.
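Because each field in the message starts with an upper-case label followed by a colon, the fields can be extracted from /var/adm/messages with a few lines of code. The following Python sketch is one possible approach, not a Sun or Fujitsu tool. It assumes that every line begins with a four-token syslog prefix (month, day, time, host name), as in the TABLE 2-8 example, and it treats unlabeled lines as continuations of the preceding field. The function names are hypothetical.

```python
import re

# A field label is upper-case letters and hyphens followed by a colon.
FIELD_RE = re.compile(r"^([A-Z][A-Z-]+):\s*(.*)$")
MSG_ID_RE = re.compile(r"sun\.com/msg/([A-Z0-9-]+)")


def parse_fault_message(lines):
    """Collect labeled fields from syslog lines into a dictionary."""
    fields = {}
    current = None
    for line in lines:
        parts = line.split(None, 4)  # month, day, time, host, remainder
        if len(parts) < 5:
            continue
        body = parts[4].strip()
        match = FIELD_RE.match(body)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None:
            # Unlabeled line: continuation of the previous field.
            fields[current] = (fields[current] + " " + body).strip()
    return fields


def message_id(fields):
    """Return the message ID from the DESC text, or None if absent."""
    found = MSG_ID_RE.search(fields.get("DESC", ""))
    return found.group(1) if found else None


if __name__ == "__main__":
    sample = [
        "Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13",
        "Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4",
        "Nov 1 16:30:20 dt88-292 DESC:",
        "Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem",
        "Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.",
    ]
    parsed = parse_fault_message(sample)
    print(parsed["EVENT-ID"], message_id(parsed))
```

The extracted EVENT-ID is the value to supply in place of EVENT_ID in the fmdump -v -u command named in the REC-ACTION field.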
2.6.2 Monitoring Output
To understand error conditions, collect monitoring output information, by using the commands shown below.
TABLE 2-9 lists the commands for checking the monitoring output.
TABLE 2-9 Commands for Checking the Monitoring Output
Command      Operand  Description
showlogs(8)  console  XSCF firmware collects console logs of console messages that were output through the XSCF. This command collects all the console messages displayed to users.
             monitor  Logs the messages displayed in the message window of the BUI/CLI.
             panic    Saves as panic logs the console logs that are logged when a reset is received after a panic notification.
             ipl      Collects the console data generated during a period from power-on of a domain to completion of operating system startup (system running).
2.6.3 Messaging Output
To understand error conditions, collect the messaging output by using the commands shown below.
TABLE 2-10 lists the commands for checking the messaging output.
TABLE 2-10 Commands for Checking the Messaging Output
Command                Operand  Description
showlogs               env      Collects the temperature history log. The SPARC Enterprise server environmental temperature data and power status are collected at a 10-minute interval. The data is stored for a maximum of six months.
                       power    Collects the log of power events and reset events. The target range covers the SPARC Enterprise server, External I/O Expansion units, and UPSs.
                       event    Collects, as the event log, the messages that accompany commands or the progress of operations such as Dynamic Reconfiguration (DR), the status of operations on the operator panel, and events such as a shutdown request to the OS caused by a power failure or abnormal temperature. This information is used to analyze faults and investigate the use status of individual devices at a customer's site, and it is kept as a maintenance work history.
                       error    Information on SPARC Enterprise server hardware faults detected by the SCF, POST/OpenBoot PROM, or ESF machine management, together with software monitoring error information, is logged in the SCF error log. The showlogs error command can display, with hexadecimal codes, the error information stored in the SCF error log and information on faulty components.
fmdump(1M), fmdump(8)           Hardware and software are automatically diagnosed according to the fault management architecture (FMA), and the diagnosis results and errors are automatically recorded. The fmdump command can display the recorded information. It is provided as a Solaris OS command and as an XSCF Shell command. The information can be checked at the site at the specified URL by using a displayed message ID.
Each error message logged by the predictive self-healing architecture has a code associated with it as well as a web address that can be followed to get the most up-to-date course of action for dealing with that error.
Refer to the Solaris OS documentation for more information on predictive self-healing.

2.7 LED Error Display

This section explains the LEDs of each FRU that are to be checked when the relevant FRU is replaced. Each LED can be checked after the door of a cabinet is opened.
Whether the state of the entire system is normal can be learned by checking the operator panel (outside). When an error occurs in an individual hardware component in the system, the LEDs of the FRU containing the hardware component that has caused the error indicate that an error has occurred. The LEDs on the operator panel (back) indicate the status of the operator panel as a single unit. However, some FRUs like DIMMs do not have LEDs.
Whether a FRU without LEDs is in the normal state can be checked by executing the XSCF Shell commands showhardconf and ioxadm from a maintenance terminal. For details of the commands, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
2.7.1 When target FRU is indicated by LEDs
When an error message is displayed at the system console and the cause of the error is in hardware, the faulty FRU must be removed and replaced. Each FRU is equipped with one LED that indicates whether an error has occurred in the FRU and another LED that indicates whether the FRU can be removed. On most FRUs, these LEDs are named READY LED and CHECK LED. In some cases, the names are not indicated, but the icons are always printed or icon labels are always affixed. Such FRUs include the back of the operator panel, XSCFUs, CMUs, XBUs, CLKUs, FANs, and HDDs.
2.7.2 When target FRU is not indicated by LEDs
For some FRUs, the READY LED and CHECK LED are not used as the names of the LEDs that are checked at replacement. Even in such a case, the same icons as those for the READY LED and CHECK LED are used so that the meaning of LEDs can be understood. Even if the names of LEDs are not indicated, the icons are always printed or icon labels are always affixed.
TABLE 2-11 LED Display That Should Be Checked When a FRU Is Replaced (Common)
LED            Display        Meaning
READY (green)                 Indicates whether the unit is operating (whether it is configured into the system).
               Lit            Indicates that the FRU is operating. The FRU cannot be disconnected and removed from the system. Therefore, the FRU cannot be replaced.
               Blinking       Indicates that the FRU is being configured into the system (or, for an XSCFU, being initialized) or being disconnected from the system. However, for a PSU, it indicates that the main line switch has been switched on.
               Off            Indicates that the FRU is stopped and disconnected from the system. Therefore, the FRU can be replaced.
CHECK (amber)                 Indicates either that the unit contains an error or that the unit is a target device for replacement.
               Lit            Indicates that an error has been detected in the hardware of the FRU. (For an HDD, the LED is lit according to the instruction from the software or middleware.)
               Blinking (*1)  Indicates that the FRU is to be replaced.
               Off            Indicates that the state of the FRU is normal.
Note – *1) If the maintenance target component is indicated by a blinking CHECK LED, the LED may be called a locator.
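The common LED states in TABLE 2-11 amount to a small lookup table. The sketch below encodes them for use in, for example, a maintenance checklist script. It is only an illustration: the dictionary and function names are hypothetical, and it does not cover the PSU exception or the FRU-specific LEDs defined separately for individual FRUs.

```python
# LED meanings transcribed from TABLE 2-11; all names are illustrative.
READY_MEANING = {
    "lit": "FRU is operating; it cannot be removed",
    "blinking": "FRU is being configured into or disconnected from the system",
    "off": "FRU is stopped and disconnected; it can be replaced",
}
CHECK_MEANING = {
    "lit": "an error was detected in the FRU hardware",
    "blinking": "the FRU is the replacement target",
    "off": "the FRU is in the normal state",
}


def can_replace(ready):
    """Per TABLE 2-11, a FRU can be replaced only when READY is off."""
    return ready.strip().lower() == "off"


def describe(ready, check):
    """Return a one-line summary for an observed READY/CHECK pair."""
    ready_text = READY_MEANING[ready.strip().lower()]
    check_text = CHECK_MEANING[check.strip().lower()]
    return "READY: " + ready_text + "; CHECK: " + check_text


if __name__ == "__main__":
    print(describe("Off", "Blinking"))
    print(can_replace("Lit"))
```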
TABLE 2-12 Status Display of LEDs Defined Individually for Each FRU
FRU
  LED type    Display           Meaning
XSCFU
  READY       Lit (green)       Indicates that the XSCFU is in use. In this state, the XSCFU cannot be removed (cannot be replaced).
              Blinking (green)  Indicates that the XSCFU is being initialized.
              Off               Indicates that the XSCFU can be replaced.
  CHECK       Lit (amber)       Indicates that an error was detected in the XSCFU. However, this LED remains on for a few minutes immediately after power-on (until the start of initialization). It does not indicate an error during that time.
              Blinking (amber)  Indicates that the XSCFU is a replacement target.
              Off               Indicates that the XSCFU is in the normal state.
  ACTIVE      Lit (green)       Indicates that the XSCFU is in use (active).
              Off               Indicates that the XSCFU is on standby.
XSCFU and IOU (display part for LAN)
  ACT         Lit (green)       Indicates that communication is being performed through the Ethernet port (LAN port).
              Off               Indicates that no communication is being performed through the Ethernet port (LAN port).
  LINK SPEED  Lit (amber)       Only for an IOU: Indicates that the communication speed is 1G bps.
              Lit (green)       Indicates that the communication speed is 100M bps.
              Off               Indicates that the communication speed is 10M bps.
HDD
  READY OK    Lit (green)       Indicates that the HDD is in operation. In this state, the HDD cannot be removed (cannot be replaced).
              Blinking (green)  Indicates that the HDD is being connected. In this state, the HDD cannot be removed (cannot be replaced).
              Off               Indicates that the HDD can be replaced.
  CHECK       Lit (amber)       Indicates that an error was detected in the HDD. However, this LED remains on for a few minutes immediately after power-on (until the start of initialization). It does not indicate an error during that time.
              Blinking (amber)  Indicates that the HDD is a replacement target.
              Off               Indicates that the HDD is in the normal state.
PCI card (inside an external I/O expansion unit)
  (Power)     Lit (green)       Indicates that power is being supplied to the PCI slot.
              Off               Indicates that the PCI card in the PCI slot is stopped.
  (Attention) Lit (amber)       Indicates that an error occurred in the hardware of the PCI slot.
              Blinking (amber)  Indicates that the PCI card in this PCI slot is a device to be replaced.
              Off               Indicates that the hardware of the PCI slot is normal.
PSU: power supply unit
  POWER       Lit (green)       Indicates that the power to the system is turned on and being supplied.
              Blinking (green)  Indicates that power is being supplied to the PSU, but the PSU is not turned on.
              Off               Indicates that power is not being supplied to the PSU.
  FAIL        Lit (amber)       Indicates that an error occurred in the PSU. Maintenance can be performed.
              Off               Indicates that the PSU is normal.
  PRFL        Lit (amber)       Indicates that the rotational speed of the cooling fan in the PSU is abnormal.
              Off               Indicates that the rotational speed of the cooling fan in the PSU is normal.

2.8 Using the Troubleshooting Commands

After the message in TABLE 2-8 is displayed, you might desire more information about the fault. For complete information about troubleshooting commands, refer to the Solaris OS man pages or the XSCF Shell man pages. This section describes some details of the following commands:
Section 2.8.1, “Using the showlogs Command”
Section 2.8.2, “Using the fmdump Command”
Section 2.8.3, “Using the fmadm Command”
Section 2.8.4, “Using the fmstat Command”
2.8.1 Using the showlogs Command
The showlogs command will display the contents of a specified log in order of timestamp starting with the oldest date. The showlogs command will display the following logs:
error log
power log
event log
temperature and humidity record
monitoring message log
console message log
panic message log
IPL message log
XSCF> showlogs error
Date: Mar 30 12:45:31 JST 2005
Code: 00112233-44556677-8899aabbcceeff0
Status: Alarm
Component: PSU#1,PSU#2
Msg: ACFAIL occurred (ACS=3)(FEP type = A1)
Date: Mar 30 17:45:31 JST 2005
Code: 00112233-44556677-8899aabbcceeff0
Status: Faulted
Component: PSU#1,PSU#2,*
Msg: ACFAIL occurred (ACS=3)(FEP type = A1)
XSCF>
2.8.2 Using the fmdump Command
The fmdump command can be used to display the contents of any log files associated with the Solaris Fault Manager.
The fmdump command produces the following output. This example assumes there is only one fault.
# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
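The fmdump summary can be turned into follow-up actions for each fault. The sketch below parses the three-column output shown above and, for each fault, builds the fmdump -v -u command recommended in TABLE 2-8 together with a message URL. It is an illustration only: the function name is hypothetical, and the URL pattern is an assumption generalized from the single address http://sun.com/msg/SUN4-8000-0Y printed in TABLE 2-8.

```python
# Sample copied from the fmdump example in this section.
SAMPLE = """\
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
"""


def parse_fmdump(text):
    """Return one dict per fault line; the header line is skipped."""
    events = []
    for line in text.splitlines()[1:]:
        parts = line.split()
        if len(parts) < 5:
            continue  # blank or unexpected line
        *time_parts, uuid, msg_id = parts
        events.append({
            "time": " ".join(time_parts),
            "uuid": uuid,
            "msg_id": msg_id,
            # Command suggested by the REC-ACTION text in TABLE 2-8.
            "detail_cmd": "fmdump -v -u " + uuid,
            # Assumed pattern, based on the one URL shown in TABLE 2-8.
            "url": "http://sun.com/msg/" + msg_id,
        })
    return events


if __name__ == "__main__":
    for event in parse_fmdump(SAMPLE):
        print(event["detail_cmd"])
        print(event["url"])
```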
2.8.3 Using the fmadm Command
This section describes the use of the fmadm command.
The administrator and all service personnel can use the fmadm command. This command can display and change the system configuration parameters managed by the Solaris Fault Manager.
2.8.3.1 fmadm config Command
The fmadm config command outputs the version and status of the diagnostic engine used by the server. To determine whether the latest diagnostic engine is running, compare the version with the information on the SunSolve website.
# fmadm config
MODULE              VERSION STATUS  DESCRIPTION
cpumem-diagnosis    1.5     active  UltraSPARC-III/IV CPU/Memory Diagnosis
cpumem-retire       1.0     active  CPU/Memory Retire Agent
eft                 1.13    active  eft diagnosis engine
fmd-self-diagnosis  1.0     active  Fault Manager Self-Diagnosis
io-retire           1.0     active  I/O Retire Agent
syslog-msgs         1.0     active  Syslog Messaging Agent
2.8.3.2 fmadm faulty Command
The fmadm faulty command can be used mainly to identify the status of faulty components.
In the following example, the PCI card is degraded and associated with the following UUID 49847040-ce57-e453-9adc-fe66c7c65384. Also, the "faulted" state may be displayed.
# fmadm faulty
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded dev:////pci@2,600000
         49847040-ce57-e453-9adc-fe66c7c65384
Note – The error information remains in the Solaris OS even when replacement of
the faulty component is completed. Identify the UUID by executing the fmadm faulty command, and reset the error information by executing the fmadm repair
command with the UUID specified.
2.8.3.3 fmadm repair Command
The fmadm repair command can be used to reset the error information for a faulty component in the Solaris OS.
# fmadm repair 49847040-ce57-e453-9adc-fe66c7c65384
fmadm: recorded repair to 3de29de5-6332-ec64-9b49-bacc739fe3c3
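After a component has been replaced, the UUIDs reported by fmadm faulty are the arguments needed for fmadm repair. The following sketch extracts every UUID from saved fmadm faulty output and prints the matching repair commands for review. It is a hedged illustration: the function name is hypothetical, the sample is the output shown in Section 2.8.3.2, and the commands are only printed, never executed.

```python
import re

# Sample copied from the fmadm faulty example in Section 2.8.3.2.
SAMPLE = """\
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded dev:////pci@2,600000
         49847040-ce57-e453-9adc-fe66c7c65384
"""

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)


def repair_commands(text):
    """Return one 'fmadm repair <UUID>' string per distinct UUID, in order."""
    seen = []
    for uuid in UUID_RE.findall(text):
        if uuid not in seen:
            seen.append(uuid)
    return ["fmadm repair " + uuid for uuid in seen]


if __name__ == "__main__":
    for command in repair_commands(SAMPLE):
        print(command)
```

Printing the commands instead of running them keeps the decision to clear the error information with the service engineer.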
2.8.4 Using the fmstat Command
The fmstat command can report statistics associated with the Solaris Fault Manager. The fmstat command shows information about DE performance. In the example below, the eft DE (also seen in the console output) has received an event which it accepted. A case is "opened" for that event and a diagnosis is performed to "solve" the cause for the failure.
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         0       0  0.0    0.0   0   0     0     0   3.0K      0
cpumem-retire            0       0  0.0    0.0   0   0     0     0      0      0
eft                      1       1  0.0 1191.8   0   0     1     1   3.3M    11K
fmd-self-diagnosis       0       0  0.0    0.0   0   0     0     0      0      0
io-retire                1       0  0.0   32.4   0   0     0     0    37b      0
syslog-msgs              1       0  0.0    0.5   0   0     0     0    32b      0
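To spot which diagnosis engine has opened a case without reading every column, the fmstat output can be parsed by column name. This is a minimal sketch under the assumption that the output is whitespace-separated with the header row shown above; the function names are hypothetical.

```python
# Sample copied from the fmstat example in this section.
SAMPLE = """\
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         0       0  0.0    0.0   0   0     0     0   3.0K      0
cpumem-retire            0       0  0.0    0.0   0   0     0     0      0      0
eft                      1       1  0.0 1191.8   0   0     1     1   3.3M    11K
fmd-self-diagnosis       0       0  0.0    0.0   0   0     0     0      0      0
io-retire                1       0  0.0   32.4   0   0     0     0    37b      0
syslog-msgs              1       0  0.0    0.5   0   0     0     0    32b      0
"""


def parse_fmstat(text):
    """Return a list of dicts keyed by the header names (values stay strings)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def modules_with_open_cases(rows):
    """Names of modules whose 'open' counter is greater than zero."""
    return [row["module"] for row in rows if int(row["open"]) > 0]


if __name__ == "__main__":
    print(modules_with_open_cases(parse_fmstat(SAMPLE)))
```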

2.9 Traditional Solaris Troubleshooting Commands

These superuser commands can help you determine if you have issues in your workstation, in the network, or within another server that you are networking with.
The following commands are described in this section:
Section 2.9.1, “iostat Command”
Section 2.9.2, “prtdiag Command”
Section 2.9.3, “prtconf Command”
Section 2.9.4, “netstat Command”
Section 2.9.5, “ping Command”
Section 2.9.6, “ps Command”
Section 2.9.7, “prstat Command”
Most of these commands are located in the /usr/bin or /usr/sbin directories.
2.9.1 iostat Command
The iostat command iteratively reports terminal, drive, and tape I/O activity, as well as CPU utilization.
2.9.1.1 Options
TABLE 2-13 describes options for the iostat command and how those options can help troubleshoot the server.
TABLE 2-13 Options for iostat
Option: No option
  Description: Reports status of local I/O devices.
  How It Can Help: A quick three-line output of device status.
Option: -c
  Description: Reports the percentage of time the system has spent in user mode, in system mode, waiting for I/O, and idling.
  How It Can Help: Quick report of CPU status.
Option: -e
  Description: Displays device error summary statistics. The total errors, hard errors, soft errors, and transport errors are displayed.
  How It Can Help: Provides a short table with accumulated errors. Identifies suspect I/O devices.
Option: -E
  Description: Displays all device error statistics.
  How It Can Help: Provides information about devices: manufacturer, model number, serial number, size, and errors.
Option: -n
  Description: Displays names in descriptive format.
  How It Can Help: Descriptive format helps identify devices.
Option: -x
  Description: For each drive, reports extended drive statistics. The output is in tabular form.
  How It Can Help: Similar to the -e option, but provides rate information. This helps identify poor performance of internal devices and other I/O devices across the network.
The following example shows output for one iostat command.
# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3120026A Revision: 8.01 Serial No: 3JT4H4C2
Size: 120.03GB <120031641600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: LITE-ON Product: COMBO SOHC-4832K Revision: O3K1 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
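On a server with many drives, the error counters in iostat -En output are easier to review once they are reduced to one entry per device. The sketch below assumes each device block starts with a line of the form shown above (device name followed by the Soft, Hard, and Transport error counts). The function names are hypothetical, and treating any nonzero hard or transport error count as suspect is an illustrative threshold, not a rule from this manual.

```python
import re

# Sample abbreviated from the iostat -En example in this section.
SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3120026A Revision: 8.01 Serial No: 3JT4H4C2
Size: 120.03GB <120031641600 bytes>
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: LITE-ON Product: COMBO SOHC-4832K Revision: O3K1 Serial No:
Size: 0.00GB <0 bytes>
"""

# First line of each device block.
HEAD = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)


def error_counts(text):
    """Return {device: {'soft': n, 'hard': n, 'transport': n}}."""
    counts = {}
    for line in text.splitlines():
        match = HEAD.match(line)
        if match:
            counts[match.group(1)] = {
                "soft": int(match.group(2)),
                "hard": int(match.group(3)),
                "transport": int(match.group(4)),
            }
    return counts


def suspects(counts):
    """Devices with any hard or transport errors (illustrative threshold)."""
    return [dev for dev, c in counts.items() if c["hard"] or c["transport"]]


if __name__ == "__main__":
    print(suspects(error_counts(SAMPLE)))
```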
2.9.2 prtdiag Command
The prtdiag command displays configuration and diagnostic information. The diagnostic information identifies any failed component.
The prtdiag command is located in the /usr/platform/platform-name/sbin/ directory.
Note – The prtdiag command might indicate a slot number different than that
identified elsewhere in this document. This is normal.
2.9.2.1 Options
TABLE 2-14 describes options for the prtdiag command and how those options can help troubleshooting.
TABLE 2-14 Options for prtdiag
Option: No option
  Description: Lists components.
  How It Can Help: Identifies CPU timing and PCI cards installed.
Option: -v
  Description: Verbose mode. Displays the time of the most recent AC power failure, the most recent hardware fatal error information, and (if applicable) environmental condition.
  How It Can Help: Provides the same information as no option. Additionally lists fan status, temperatures, ASIC, and PROM revisions.
The following example shows output for the prtdiag command in verbose mode.
# prtdiag -v
bash-3.2# cat /etc/release
Solaris Express Community Edition snv_81 SPARC
Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 15 January 2008
bash-3.2# prtdiag
System Configuration: Sun Microsystems sun4u XXXX SPARC Enterprise M8000 Server
System clock frequency: 960 MHz
Memory size: 32768 Megabytes
==================================== CPUs ====================================
    CPU  CPU                                      Run  L2$ CPU   CPU
LSB Chip ID                                       MHz  MB  Impl. Mask
--- ---- ---------------------------------------- ---- --- ----- ----
00  0    0, 1, 2, 3, 4, 5, 6, 7                   2640 6.0 7     144
00  1    8, 9, 10, 11, 12, 13, 14, 15             2640 6.0 7     144
00  2    16, 17, 18, 19, 20, 21, 22, 23           2640 6.0 7     144
00  3    24, 25, 26, 27, 28, 29, 30, 31           2640 6.0 7     144
============================ Memory Configuration ============================
    Memory Available          Memory  DIMM   # of  Mirror  Interleave
LSB Group  Size               Status  Size   DIMMs Mode    Factor
--- ------ ------------------ ------- ------ ----- ------- ----------
00  A      16384MB            okay    1024MB 16    no      8-way
00  B      16384MB            okay    1024MB 16    no      8-way
========================= IO Cards =========================
LSB Name           Model
--- -------------- ------------
00  scsi           LSI,1064
00  network        N/A
00  network        N/A
00  scsi           LSI,1064
00  network        N/A
00  network        N/A
==================== Hardware Revisions ====================
System PROM revisions:
----------------------
OBP 4.24.4 2007/11/05 10:27
=================== Environmental Status ===================
Mode switch is in UNLOCK mode
=================== System Processor Mode ===================
SPARC64-VII mode
bash-3.2# prtdiag -v
System Configuration: Sun Microsystems sun4u XXXX SPARC Enterprise M8000 Server
System clock frequency: 960 MHz
Memory size: 32768 Megabytes
==================================== CPUs ====================================
    CPU  CPU                                      Run  L2$ CPU   CPU
LSB Chip ID                                       MHz  MB  Impl. Mask
--- ---- ---------------------------------------- ---- --- ----- ----
00  0    0, 1, 2, 3, 4, 5, 6, 7                   2640 6.0 7     144
00  1    8, 9, 10, 11, 12, 13, 14, 15             2640 6.0 7     144
00  2    16, 17, 18, 19, 20, 21, 22, 23           2640 6.0 7     144
00  3    24, 25, 26, 27, 28, 29, 30, 31           2640 6.0 7     144
============================ Memory Configuration ============================
    Memory Available          Memory  DIMM   # of  Mirror  Interleave
LSB Group  Size               Status  Size   DIMMs Mode    Factor
--- ------ ------------------ ------- ------ ----- ------- ----------
00  A      16384MB            okay    1024MB 16    no      8-way
00  B      16384MB            okay    1024MB 16    no      8-way
========================= IO Devices =========================
    IO                                             Lane/Frq
LSB Type  LPID RvID,DvID,VnID     BDF       State Act,  Max   Name                           Model
--- ----- ---- ------------------ --------- ----- ----------- ------------------------------ --------------------
    Logical Path
    ------------
00  PCIx  0    7, 125, 1033       2, 0, 0   okay  133, 133    pci-pciexclass,060400          N/A
    /pci@0,600000/pci@0
00  PCIx  0    7, 125, 1033       2, 0, 1   okay  133, 133    pci-pciexclass,060400          N/A
    /pci@0,600000/pci@0,1
00  PCI   0    2, 50, 1000        3, 1, 0   okay  --, 133     scsi-pci1000,50                LSI,1064
    /pci@0,600000/pci@0/scsi@1
00  PCI   0    10, 1648, 14e4     4, 1, 0   okay  --, 133     network-pci14e4,1648           N/A
    /pci@0,600000/pci@0,1/network@1
00  PCI   0    10, 1648, 14e4     4, 1, 1   okay  --, 133     network-pci14e4,1648           N/A
    /pci@0,600000/pci@0,1/network@1,1
00  PCIx  4    5, 125, 1033       2, 0, 0   okay  133, 133    pci-pciexclass,060400          N/A
    /pci@4,600000/pci@0
00  PCIx  4    5, 125, 1033       2, 0, 1   okay  133, 133    pci-pciexclass,060400          N/A
    /pci@4,600000/pci@0,1
00  PCI   4    2, 50, 1000        3, 1, 0   okay  --, 133     scsi-pci1000,50                LSI,1064
    /pci@4,600000/pci@0/scsi@1
00  PCI   4    10, 1648, 14e4     4, 1, 0   okay  --, 133     network-pci14e4,1648           N/A
    /pci@4,600000/pci@0,1/network@1
00  PCI   4    10, 1648, 14e4     4, 1, 1   okay  --, 133     network-pci14e4,1648           N/A
    /pci@4,600000/pci@0,1/network@1,1
==================== Hardware Revisions ====================
System PROM revisions:
----------------------
OBP 4.24.4 2007/11/05 10:27
=================== Environmental Status ===================
Mode switch is in UNLOCK mode
=================== System Processor Mode ===================
SPARC64-VII mode
bash-3.2#
2.9.3 prtconf Command
Similar to the show-devs command run at the ok prompt, the prtconf command displays the devices that are configured.
The prtconf command identifies hardware that is recognized by the Solaris OS. If hardware is not suspected of being bad yet software applications are having trouble with the hardware, the prtconf command can indicate if the Solaris OS software recognizes the hardware, and if a driver for the hardware is loaded.
2.9.3.1 Options
TABLE 2-15 describes options for the prtconf command and how those options can help troubleshooting.
TABLE 2-15 Options for prtconf
Option: No option
  Description: Displays the device tree of devices recognized by the OS.
  How It Can Help: If a hardware device is recognized, then it is probably functioning properly. If the message "(driver not attached)" is displayed for the device or for a sub-device, then the driver for the device is corrupt or missing.
Option: -D
  Description: Similar to the output of no option, however the device driver is listed.
  How It Can Help: Lists the driver needed or used by the OS to enable the device.
Option: -p
  Description: Similar to the output of no option, yet is abbreviated.
  How It Can Help: Reports a brief list of the devices.
Option: -V
  Description: Displays the version and date of the OpenBoot PROM firmware.
  How It Can Help: Provides a quick check of firmware version.
The following example shows output for the prtconf command.
# prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 32768 Megabytes
System Peripherals (Software Nodes):
SUNW,SPARC-Enterprise
scsi_vhci, instance #0
packages (driver not attached)
SUNW,builtin-drivers (driver not attached)
deblocker (driver not attached)
disk-label (driver not attached)
terminal-emulator (driver not attached)
obp-tftp (driver not attached)
ufs-file-system (driver not attached)
chosen (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
pseudo-console, instance #0
nvram (driver not attached)
pseudo-mc, instance #0
pseudo-mc, instance #1
pseudo-mc, instance #4
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
(The rest is omitted.)
2.9.4 netstat Command
The netstat command displays the network status.
2.9.4.1 Options
TABLE 2-16 describes options for the netstat command and how those options can help troubleshooting.
TABLE 2-16 Options for netstat
Option: -i
  Description: Displays the interface state, including packets in/out, error in/out, collisions, and queue.
  How It Can Help: Provides a quick overview of the network status.
Option: -i interval
  Description: Providing a trailing number with the -i option repeats the netstat command every interval seconds.
  How It Can Help: Identifies intermittent or long duration network events. By redirecting netstat output to a file, overnight activity can be viewed all at once.
Option: -p
  Description: Displays the media table.
  How It Can Help: Provides MAC address for hosts on the subnet.
Option: -r
  Description: Displays the routing table.
  How It Can Help: Provides routing information.
Option: -n
  Description: Replaces host names with IP addresses.
  How It Can Help: Used when an address is more useful than a host name.
The following example shows output for the netstat -p command.
# netstat -p
Net to Media Table: IPv4
Device IP Address             Mask              Flags Phys Addr
------ ---------------------- ----------------- ----- ---------------
bge0   phatair-46             255.255.255.255         08:00:20:92:4a:47
bge0   ns-umpk27-02-46        255.255.255.255         08:00:20:93:fb:99
bge0   moreair-46             255.255.255.255         08:00:20:8a:e5:03
bge0   fermpk28a-46           255.255.255.255         00:00:0c:07:ac:2e
bge0   fermpk28as-46          255.255.255.255         00:50:e2:61:d8:00
bge0   kayakr                 255.255.255.255         08:00:20:d1:83:c7
bge0   matlock                255.255.255.255   SP    00:03:ba:27:01:48
bge0   toronto2               255.255.255.255         08:00:20:b6:15:b5
bge0   tocknett               255.255.255.255         08:00:20:7c:f5:94
bge0   mpk28-lobby            255.255.255.255         08:00:20:a6:d5:c8
bge0   efyinisedeg            255.255.255.255         08:00:20:8d:6a:80
bge0   froggy                 255.255.255.255         08:00:20:73:70:44
bge0   d-mpk28-46-245         255.255.255.255         00:10:60:24:0e:00
bge0   224.0.0.0              240.0.0.0         SM    01:00:5e:00:00:00
2.9.5 ping Command
The ping command sends ICMP ECHO_REQUEST packets to network hosts. Depending on how the ping command is configured, the output displayed can identify troublesome network links or nodes. The destination host is specified in the variable hostname.
2.9.5.1 Options
TABLE 2-17 describes options for the ping command and how those options can help troubleshooting.
TABLE 2-17 Options for ping
Option: hostname
  Description: The probe packet is sent to hostname and returned.
  How It Can Help: Verifies that a host is active on the network.
Option: -g hostname
  Description: Forces the probe packet to route through a specified gateway.
  How It Can Help: By identifying different routes to the target host, those individual routes can be tested for quality.
Option: -i interface
  Description: Designates which interface to send and receive the probe packet through.
  How It Can Help: Enables a simple check of secondary network interfaces.
Option: -n
  Description: Replaces host names with IP addresses.
  How It Can Help: Used when an address is more beneficial than a host name.
Option: -s
  Description: Pings continuously in one-second intervals. Ctrl-C aborts. Upon abort, statistics are displayed.
  How It Can Help: Helps identify intermittent or long-duration network events. By redirecting ping output to a file, overnight activity can later be viewed all at once.
Option: -svR
  Description: Displays the route the probe packet followed in one-second intervals.
  How It Can Help: Indicates probe packet route and number of hops. Comparing multiple routes can identify bottlenecks.
The following example shows output for the ping -s command.
# ping -s teddybear
PING teddybear: 56 data bytes
64 bytes from teddybear (192.146.77.140): icmp_seq=0. time=1. ms
64 bytes from teddybear (192.146.77.140): icmp_seq=1. time=0. ms
64 bytes from teddybear (192.146.77.140): icmp_seq=2. time=0. ms
^C
----teddybear PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/1
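When ping -s output is saved to a file for later review, the closing statistics block is usually the part worth extracting. The sketch below parses the two summary lines in the format shown in the example above. The function name is hypothetical, and the regular expressions assume exactly that wording.

```python
import re

# Summary block copied from the ping -s example in this section.
SAMPLE = """\
----teddybear PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/1
"""

STATS = re.compile(
    r"(\d+) packets transmitted, (\d+) packets received, (\d+)% packet loss"
)
RTT = re.compile(r"min/avg/max = ([\d.]+)/([\d.]+)/([\d.]+)")


def parse_ping_summary(text):
    """Return packet counts, loss percentage, and round-trip times (ms)."""
    result = {}
    stats = STATS.search(text)
    if stats:
        result["sent"] = int(stats.group(1))
        result["received"] = int(stats.group(2))
        result["loss_pct"] = int(stats.group(3))
    rtt = RTT.search(text)
    if rtt:
        result["rtt_min"] = float(rtt.group(1))
        result["rtt_avg"] = float(rtt.group(2))
        result["rtt_max"] = float(rtt.group(3))
    return result


if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE))
```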
2.9.6 ps Command
The ps command lists the status of processes. Using options and rearranging the command output can assist in determining the resource allocation.
2.9.6.1 Options
TABLE 2-18 describes options for the ps command and how those options can help troubleshooting.
TABLE 2-18 Options for ps
Option: -e
  Description: Displays information for every process.
  How It Can Help: Identifies the process ID and the executable.
Option: -f
  Description: Generates a full listing.
  How It Can Help: Provides the following process information: user ID, parent process ID, time when executed, and the path to the executable.
Option: -o option
  Description: Enables configurable output. The pid, pcpu, pmem, and comm options display process ID, percent CPU consumption, percent memory consumption, and the responsible executable, respectively.
  How It Can Help: Provides only most important information. Knowing the percentage of resource consumption helps identify processes that are affecting performance and might be hung.
The following example shows output for one ps command.
# ps -eo pcpu,pid,comm|sort -rn
1.4 100317 /usr/openwin/bin/Xsun
0.9 100460 dtwm
0.1 100677 ps
0.1 100600 ksh
0.1 100591 /usr/dt/bin/dtterm
0.1 100462 /usr/dt/bin/sdtperfmeter
0.1 100333 mibiisa
%CPU PID COMMAND
0.0 100652 /bin/csh
...
Note – When sort is used with the -r option, the column heading line is sorted along with the data, so it is printed where the value in the first column is equal to zero.
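If the heading line landing in the middle of the listing is inconvenient, the output can be post-processed so that the heading stays on top. The following sketch does this for text in the three-column format of the ps -eo pcpu,pid,comm example. It is an illustration only; the function name is hypothetical and the sample rows are taken from the example above.

```python
# Rows taken from the ps example in this section, in unsorted order.
SAMPLE = """\
%CPU    PID COMMAND
 0.1 100600 ksh
 1.4 100317 /usr/openwin/bin/Xsun
 0.0 100652 /bin/csh
 0.9 100460 dtwm
"""


def sort_by_cpu(text):
    """Keep the heading first and sort the remaining rows by %CPU, descending."""
    header, *rows = text.splitlines()
    rows.sort(key=lambda row: float(row.split()[0]), reverse=True)
    return [header] + rows


if __name__ == "__main__":
    print("\n".join(sort_by_cpu(SAMPLE)))
```

Sorting only the data rows avoids the effect described in the note, because the heading never takes part in the numeric comparison.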
2.9.7 prstat Command
The prstat utility iteratively examines all active processes and reports statistics based on the selected output mode and sort order. The prstat command provides output similar to the ps command.
2.9.7.1 Options
TABLE 2-19 describes options for the prstat command and how those options can help troubleshooting.
TABLE 2-19 Options for prstat
Option: No option
  Description: Displays a sorted list of the top processes that are consuming the most CPU resources. List is limited to the height of the terminal window and the total number of processes. Output is automatically updated every five seconds. Ctrl-C aborts.
  How It Can Help: Output identifies process ID, user ID, memory used, state, CPU consumption, and command name.
Option: -n number
  Description: Limits output to number of lines.
  How It Can Help: Limits amount of data displayed and identifies primary resource consumers.
Option: -s key
  Description: Permits sorting list by key parameter.
  How It Can Help: Useful keys are cpu (default), time, and size.
Option: -v
  Description: Verbose mode.
  How It Can Help: Displays additional parameters.
The following example shows output for the prstat command.
# prstat -n 5 -s size
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
100524 mm39236    28M   21M sleep   48    0   0:00.26 0.3% maker6X.exe/1
100317 root       28M   69M sleep   59    0   0:00.26 0.7% Xsun/1
100460 mm39236    11M 8760K sleep   59    0   0:00.03 0.0% dtwm/8
100453 mm39236  8664K 4928K sleep   48    0   0:00.00 0.0% dtsession/4
100591 mm39236  7616K 5448K sleep   49    0   0:00.02 0.1% dtterm/1
Total: 65 processes, 159 lwps, load averages: 0.03, 0.02, 0.04
CHAPTER 3

Periodic Maintenance

Caution – The information in this chapter and subsequent chapters is intended for
service engineers. Persons other than the authorized field engineers should use this information only for reference and should not actually perform the work.
This chapter explains the maintenance work that must be performed regularly regardless of whether a problem has occurred. The actual work is limited to preventing contamination caused by dust in the environment.
This information is explained in the following sections:
Section 3.1, “Cleaning a Tape Drive Unit”
Section 3.2, “Cleaning an Air Filter (Server)”
Section 3.3, “Cleaning an Air Filter (I/O Unit)”
The high-end server is equipped with air filters at the bottom of the cabinet. These filters remove dust particles from the air that the fans draw from the floor into the cabinet. If the filters become clogged, the ventilation volume is reduced and the temperature rises, which can lead to problems. Although the frequency of cleaning varies with the operating environment, the air filters must be cleaned on a regular basis to ensure that they do not become clogged with dust. Each I/O unit also has air filters. Clean them at the same time that the air filters of the server are cleaned.
If an air filter has passed its service life expiration date, replace it by referring to the air filter cleaning procedure.

3.1 Cleaning a Tape Drive Unit

The head in a tape drive unit must be cleaned regularly.
Each tape drive unit used for operation must be cleaned once every 24 hours of operation. Even tape drive units not used for operation must be cleaned once every month.
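The cleaning interval above can be expressed as a simple check. The following Python sketch is illustrative only: the function name cleaning_due is hypothetical, "once every month" is approximated as 30 days, and applying the monthly limit to lightly used drives as well as unused ones is an assumption rather than something the manual states.

```python
# Thresholds taken from the text above.
OPERATING_HOURS_LIMIT = 24   # clean once every 24 hours of operation
IDLE_DAYS_LIMIT = 30         # "once every month", approximated as 30 days (assumption)


def cleaning_due(hours_operated, days_since_cleaning):
    """Return True when the tape drive head should be cleaned.

    hours_operated: hours the drive has run since its last cleaning.
    days_since_cleaning: calendar days since its last cleaning.
    """
    if hours_operated < 0 or days_since_cleaning < 0:
        raise ValueError("inputs must be non-negative")
    return (hours_operated >= OPERATING_HOURS_LIMIT
            or days_since_cleaning >= IDLE_DAYS_LIMIT)


if __name__ == "__main__":
    print(cleaning_due(hours_operated=26, days_since_cleaning=3))
    print(cleaning_due(hours_operated=0, days_since_cleaning=12))
```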
Although cleaning work can be performed in either hot or cold system maintenance mode, the SPARC Enterprise server power must be on when a cleaning cassette is used. The cleaning procedure is as follows.
1. If a tape cassette has been inserted in the tape drive unit, remove it from the unit.
2. While holding the cleaning cassette with the mark side facing right, insert it into the tape drive unit slot.
Head cleaning begins automatically.
3. The cleaning cassette is automatically ejected when cleaning is completed. Remove it from the slot.
4. To use the tape cassette that was removed in Step 1, reinsert it into the tape drive unit.
5. Confirm that the tape drive unit is in the normal state.
At this point, head cleaning is finished.
If one of the following problems occurs, replace the cleaning cassette immediately:
The cleaning cassette is not automatically ejected within one minute after being inserted.
The tape is fully wound on the take-up reel on the right side. (The cassette can no longer be reused.)
Use only specified cleaning cassettes.
Note – Contact your sales representative for tape drive unit options on SPARC Enterprise M8000/M9000 servers.

3.2 Cleaning an Air Filter (Server)

An air filter may be cleaned while power to the server is on. The air filters must be cleaned once a year. In addition, clean them whenever they become visibly dirty, even if they are not yet scheduled for cleaning.
A high-end server cabinet is equipped with a total of six air filters: three at the front and three at the rear at the bottom.
Note – If the air filters are cleaned while power to the server is on, dislodged dust may be sucked inside the system when the air filters are pulled out. Therefore, pull them out gently and slowly, and complete the cleaning as quickly as possible.
Caution – If you must use a vacuum cleaner for this work, use it outside the computer room. Do not use it inside the computer room. Using a vacuum cleaner inside the computer room may result in a server failure.
Because the front and rear air filters have the same structure and are mounted in the same way, the figures in this procedure show the air filters at the front of each model as an example.
1. Unlock and open the front and rear doors of the server. For details, see Chapter 5.
2. Using a Phillips screwdriver, loosen the screw securing the fixing bracket of an air filter, and turn the bracket so that it faces downward.
FIGURE 3-1 Removing Air Filters (Example for the M8000)
Fixing bracket (x3)
3. Pull out all of the air filters.
4. Use a vacuum cleaner to remove dust from the air filters. Attach a brush to the tip of the vacuum cleaner, and clean both sides of the filters.
5. Restore each air filter to its original location and orientation, which means the knob is on the side closest to you and the arrow on the label points up (the latticework faces upward).
6. After all the air filters have been reinstalled, turn the fixing brackets of the air filters until they face upward, and then tighten the screws firmly with the Phillips screwdriver. Finally, close the front and rear doors of the SPARC Enterprise server.
Removal of Air Filters
This filter cleaning procedure applies to both high-end servers.
FIGURE 3-2 Removing Air Filters (Example Using the M9000 Base Cabinet)
Fixing bracket (x3)

3.3 Cleaning an Air Filter (I/O Unit)

Each I/O unit has two air filters. Clean them at the same time that the air filters of the server are cleaned.
Caution – If you must use a vacuum cleaner for this work, use it outside the computer room. Do not use it inside the computer room. Using a vacuum cleaner inside the computer room may result in a server failure.
The cleaning procedure is as follows.
1. Loosen the screws securing the filter cover, and remove the filter cover.
2. Pull out the air filter from the filter cover.
FIGURE 3-3 Removing Air Filters (I/O Unit)
3. Use a vacuum cleaner to remove dust from the air filter.
4. After the cleaning is completed, reinstall the air filter by following the removal procedure in reverse order.
CHAPTER 4

FRU Removal Preparation

This chapter explains the required basic operations for replacing components, in the following sections:
Section 4.1, “Types of Replacement Procedures”
Section 4.2, “Active Replacement”
Section 4.3, “Hot Replacement”
Section 4.4, “Cold Replacement”
Section 4.5, “Power-On/Off of Main Line Switch”
Section 4.6, “Emergency Switch-Off”
Section 4.7, “Cable Routing of the SPARC Enterprise M8000 Server”
When replacing a component, operate the operator panel and the maintenance terminal while referring to the operator panel display, the maintenance terminal display, and the LED display of the component.
Depending on the target component, the server must be powered off or a domain must be stopped.
For the LED display of each component, see Section 2.7, “LED Error Display” on page 2-30. Three replacement types are defined according to whether the server must be powered off or a domain must be stopped: active replacement, hot replacement, and cold replacement. See Part II, Maintenance. For information on the swapping types of each component, see Appendix B.
Note – Some of the XSCF functions have restrictions on their use. Register the necessary user privileges for each field engineer in advance. Field engineers cannot use functions that have not been registered for them. The system administrator sets and changes the users and their privileges. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
Power-on and power-off of the server and emergency power-off are explained in the last part of this chapter.
Section 4.5, “Power-On/Off of Main Line Switch”
Section 4.6, “Emergency Switch-Off”

4.1 Types of Replacement Procedures

4.1.1 FRU Replacement
The three types of replacement procedures explained below are supported for FRU replacement. Choose the most suitable replacement procedure according to the customer's system environment.
Active replacement
The target FRU is replaced while the Solaris OS of the domain to which the FRU belongs is running. The target FRU is handled by using Solaris OS commands or XSCF commands. Because the power supply unit (PSU) and fan unit (FAN) do not belong to any domain, they are handled by using XSCF commands regardless of the operating state of the Solaris OS.
Note – The procedure for disconnecting a hard disk drive from the domain depends on whether disk mirroring software or similar support software is active. For details, see the related individual software manuals.
Hot replacement
The target FRU is replaced while the domain to which the FRU belongs is stopped. Depending on the target FRU, there are two cases:
The FRU is handled by using XSCF commands.
The FRU is handled directly, without using XSCF commands.
Cold replacement
The FRU is replaced after all the domains are stopped and the server is then powered off.
Note – Do not operate a target FRU while the OpenBoot PROM is running (the ok prompt is displayed). After stopping the relevant domain (power-off) or starting the Solaris OS, operate the target FRU.
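The preconditions for the three replacement types can be summarized as a small decision function. The following Python sketch is an illustration under stated assumptions: the names Replacement and may_start are hypothetical, a single domain_state value stands in for the state of the relevant domain (or of all domains for cold replacement), and the special case of PSU and FAN units, which belong to no domain, is not modeled.

```python
from enum import Enum


class Replacement(Enum):
    ACTIVE = "active"
    HOT = "hot"
    COLD = "cold"


def may_start(kind, server_powered, domain_state):
    """Return True if work on the target FRU may begin.

    domain_state is one of 'solaris' (Solaris OS running),
    'openboot' (ok prompt displayed), or 'stopped'.
    """
    if domain_state not in ("solaris", "openboot", "stopped"):
        raise ValueError("unknown domain state: %r" % domain_state)
    # The Note above forbids operating a FRU while OpenBoot PROM is running.
    if domain_state == "openboot":
        return False
    if kind is Replacement.ACTIVE:
        return server_powered and domain_state == "solaris"
    if kind is Replacement.HOT:
        return server_powered and domain_state == "stopped"
    if kind is Replacement.COLD:
        return (not server_powered) and domain_state == "stopped"
    raise ValueError("unknown replacement type: %r" % kind)


if __name__ == "__main__":
    print(may_start(Replacement.ACTIVE, True, "solaris"))
    print(may_start(Replacement.HOT, True, "openboot"))
```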

4.2 Active Replacement

In active replacement the Solaris OS must be configured to allow the component to be replaced. Active replacement has four stages:
Section 4.2.1, “Disconnecting a FRU from a Domain”
Section 4.2.2, “Disconnecting and Replacing a FRU”
Section 4.2.3, “Configuring a FRU into a Domain”
Section 4.2.4, “Confirming the Hardware”
Note – If the hard disk drive is the boot device, it is necessary to replace it according to the cold replacement procedure. However, if the disk mirroring software or other support software can disconnect the relevant boot disk from Solaris OS, active replacement can be performed.
4.2.1 Disconnecting a FRU from a Domain
4.2.1.1 Disconnecting a CMU/IOU
Perform the following procedure to disconnect a CMU or IOU when the Solaris OS is operating:
1. Checking resources
Check the resources that are connected to a CMU or IOU to be disconnected, and verify that the system is not affected when it is disconnected.
2. Disconnecting from the domain
To disconnect the CMU or IOU from the domain, enter the following command from the terminal that is connected to the XSCF:
XSCF> deleteboard 01-0
System administrator permission is required to execute this command.
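The argument in the example, 01-0, is a board identifier written as a two-digit board number and a one-digit sub-number. The following Python sketch shows one way to build and validate such an identifier before composing the command line shown above. The helper names format_xsb and deleteboard_command are hypothetical, and the accepted ranges (board 00 to 15, sub-number 0 to 3) are an assumption about this server family; confirm them against the XSCF reference for your system. The sketch only builds the command string and does not contact the XSCF.

```python
import re

# Pattern for identifiers such as "01-0" (two digits, hyphen, one digit).
_XSB_RE = re.compile(r"(\d{2})-(\d)")


def format_xsb(board, xsb):
    """Build an identifier such as '01-0' (ranges are assumptions)."""
    if not 0 <= board <= 15:
        raise ValueError("board number out of range: %d" % board)
    if not 0 <= xsb <= 3:
        raise ValueError("sub-number out of range: %d" % xsb)
    return "%02d-%d" % (board, xsb)


def deleteboard_command(xsb_id):
    """Return the command line used in the example, after validating xsb_id."""
    match = _XSB_RE.fullmatch(xsb_id)
    if match is None:
        raise ValueError("malformed identifier: %r" % xsb_id)
    # Reuse format_xsb for the range check.
    format_xsb(int(match.group(1)), int(match.group(2)))
    return "deleteboard " + xsb_id


if __name__ == "__main__":
    print(deleteboard_command(format_xsb(1, 0)))
```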
4.2.1.2 Disconnecting a PCI card
Caution – Before you remove the PCI cassette, make sure that there is no activity on the card in the cassette.
Caution – When removing a cable such as a LAN cable from the PCI cassette, if your finger cannot reach the latch lock of the connector, press the latch with a flathead screwdriver to release the cable. Forcing your finger into the clearance can damage the PCI card.
1. From the Solaris OS, use the cfgadm command to get the component status:
# cfgadm
AP_ID        Type        Receptacle    Occupant      Condition
...
iou#0-pci#1  unknown     empty         unconfigured  unknown
iou#0-pci#2  unknown     empty         unconfigured  unknown
iou#0-pci#3  etherne/hp  connected     configured    ok
iou#0-pci#4  fibre/hp    connected     configured    ok
AP_ID consists of the IOU number (iou#0 or iou#1) and the PCI cassette slot number (pci#1, pci#2, pci#3, pci#4).
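The structure of an AP_ID can be illustrated with a short parser. The following Python sketch assumes only the iou#N-pci#M form shown in the cfgadm output above. The names parse_ap_id and format_ap_id are hypothetical, and no range check is applied to the numbers.

```python
import re

# Matches attachment point IDs of the form shown above, e.g. "iou#0-pci#3".
_AP_ID_RE = re.compile(r"iou#(\d+)-pci#(\d+)")


def parse_ap_id(ap_id):
    """Split an AP_ID into (IOU number, PCI cassette slot number)."""
    match = _AP_ID_RE.fullmatch(ap_id)
    if match is None:
        raise ValueError("not an iou#N-pci#M attachment point: %r" % ap_id)
    return int(match.group(1)), int(match.group(2))


def format_ap_id(iou, slot):
    """Build an AP_ID string from its two components."""
    return "iou#%d-pci#%d" % (iou, slot)


if __name__ == "__main__":
    iou, slot = parse_ap_id("iou#0-pci#3")
    print(iou, slot, format_ap_id(iou, slot))
```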
2. Use the cfgadm command to unconfigure the component from the hardware:
# cfgadm -c unconfigure AP_ID
where AP_ID is the IOU and PCI card as shown in the output of cfgadm.
3. Use the cfgadm command to stop supplying power to the component:
# cfgadm -c disconnect AP_ID
where AP_ID is the IOU and PCI card as shown in the output of cfgadm.
4. Use the cfgadm command to confirm that the component is now disconnected from the domain and unconfigured:
# cfgadm
AP_ID        Type        Receptacle    Occupant      Condition
...
iou#0-pci#1  unknown     empty         unconfigured  unknown
iou#0-pci#2  unknown     empty         unconfigured  unknown
iou#0-pci#3  etherne/hp  disconnected  unconfigured  ok
iou#0-pci#4  fibre/hp    connected     configured    ok
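Steps 2 through 4 can be sketched as a script that composes the cfgadm command lines and checks the resulting status table. The following Python sketch is illustrative only: cfgadm_steps, parse_cfgadm, and is_disconnected are hypothetical helper names, the sample text is copied from the output above, and the five-column layout is assumed. The sketch does not execute any command.

```python
# Sample status text copied from the step 4 example above.
SAMPLE = """\
AP_ID Type Receptacle Occupant Condition
...
iou#0-pci#1 unknown empty unconfigured unknown
iou#0-pci#2 unknown empty unconfigured unknown
iou#0-pci#3 etherne/hp disconnected unconfigured ok
iou#0-pci#4 fibre/hp connected configured ok
"""


def cfgadm_steps(ap_id):
    """Return the command lines of steps 2 to 4 for one attachment point."""
    return [
        "cfgadm -c unconfigure " + ap_id,
        "cfgadm -c disconnect " + ap_id,
        "cfgadm",
    ]


def parse_cfgadm(output):
    """Map each AP_ID to its (type, receptacle, occupant, condition)."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the elided "..." line, and anything malformed.
        if len(fields) != 5 or fields[0].upper() == "AP_ID":
            continue
        table[fields[0]] = tuple(fields[1:])
    return table


def is_disconnected(table, ap_id):
    """True when the receptacle is disconnected and the occupant unconfigured."""
    _type, receptacle, occupant, _condition = table[ap_id]
    return receptacle == "disconnected" and occupant == "unconfigured"


if __name__ == "__main__":
    for command in cfgadm_steps("iou#0-pci#3"):
        print("#", command)
    print(is_disconnected(parse_cfgadm(SAMPLE), "iou#0-pci#3"))
```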
4.2.2 Disconnecting and Replacing a FRU
1. From the XSCF Shell prompt, use the replacefru command:
XSCF> replacefru
----------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. CMU/IOU(CPU Memory Board Unit/IO Unit)
2. FAN(Fan Unit)
3. PSU(Power Supply Unit)
4. XSCFU(Extended System Control Facility Unit)
5. DDC_A(DDC for BP_A)
----------------------------------------------------------------
Select [1-5|c:cancel] :
Note – DDC_A is displayed only for the M8000.
The command is menu-driven. The example continues using a FAN unit.
Select [1,2|c:cancel] :1
---------------------------------------------------
Maintenance/Replacement Menu
Please select a FAN to be replaced.
No. FRU             Status
--- --------------- ------------------
1.  FAN_A#0         Normal
2.  FAN_A#1         Normal
3.  FAN_A#2         Normal
4.  FAN_A#3         Normal
---------------------------------------------------
Select [1-4|b:back] :1
You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r
Please confirm the Check LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
The replacefru command automatically tests the status of the component after the removal and replacement have finished.
Diagnostic tests for FAN_A#0 have started.
[This operation may take up to 2 minute(s)]
(progress scale reported in seconds)
0..... 30..... 60..... 90.....done
--------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced unit.
FRU           Status
------------- --------
FAN_A#0       Normal
--------------------------------------------------
The replacement of FAN_A#0 has completed, normally.[f:finish] :f
--------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
--------------------------------------------------
Select [1,2|c:cancel] : C
XSCF>
When the tests are complete the program will return to the original menu. Select cancel to return to the XSCF Shell prompt.
Note – The display may vary depending on the XCP version.
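When a session log of the replacefru dialog is kept, the FRU status rows can be extracted from it for the maintenance record. The following Python sketch assumes the numbered row layout of the FAN selection menu shown earlier ("1. FAN_A#0 Normal"). The names parse_fru_status and all_normal are hypothetical, and because the display may vary with the XCP version, the pattern may need adjusting. "Normal" is the only status value that appears in this section, so any other value is simply reported as not normal.

```python
import re

# A numbered row such as "1. FAN_A#0 Normal": index, FRU name with #N, status.
_ROW_RE = re.compile(r"^\s*\d+\.\s+(\S+#\d+)\s+(\S+)\s*$")

# Sample text copied from the FAN selection menu in this section.
SAMPLE = """\
No. FRU Status
--- --------------- ------------------
1. FAN_A#0 Normal
2. FAN_A#1 Normal
3. FAN_A#2 Normal
4. FAN_A#3 Normal
Select [1-4|b:back] :1
"""


def parse_fru_status(menu_text):
    """Return a dict mapping FRU name to the status string shown in the menu."""
    status = {}
    for line in menu_text.splitlines():
        match = _ROW_RE.match(line)
        if match:
            status[match.group(1)] = match.group(2)
    return status


def all_normal(status):
    """True when at least one FRU was found and every status is 'Normal'."""
    return bool(status) and all(value == "Normal" for value in status.values())


if __name__ == "__main__":
    found = parse_fru_status(SAMPLE)
    print(sorted(found), all_normal(found))
```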
4.2.3 Configuring a FRU into a Domain
4.2.3.1 Configuring CMU/IOU
Perform the following procedure to configure a CMU or IOU when the Solaris OS is operating: