Fujitsu M3000 User Manual

SPARC Enterprise
M3000 Server
Service Manual
Manual Code C120-E540-03EN Part No. 820-5685-12 November 2009, Revision A
Copyright 2008-2009 FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa-ken 211-8588, Japan. All rights reserved.
Sun Microsystems, Inc. provided technical input and review on portions of this material. Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology and this document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more additional patents or patent applications in the United States or other countries.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this document to you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this document does not contain or represent any commitment of any kind on the part of Fujitsu Limited or Sun Microsystems, Inc., or any affiliate of either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by and/or licensed from suppliers to Fujitsu Limited and/or Sun Microsystems, Inc., including software and font technology.
Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End User. Please contact Fujitsu Limited or Sun Microsystems, Inc.
This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark
in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, Java, Netra, Solaris, Sun Ray, Answerbook2, docs.sun.com, OpenBoot, and Sun Fire are trademarks or
registered trademarks of Sun Microsystems, Inc., or its subsidiaries, in the U.S. and other countries. Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited. All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries.
Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems, Inc. SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of Sun Microsystems, Inc. and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Fujitsu Limited, Sun Microsystems, Inc. or any affiliate of either of them in connection with this document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, FUJITSU LIMITED, SUN MICROSYSTEMS, INC. AND THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Fujitsu Limited, Sun Microsystems, Inc. or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

Preface xiii
1. Safety Precautions for Maintenance 1–1
1.1 ESD Precautions 1–1
1.2 Server Precautions 1–3
1.2.1 Electrical Safety Precautions 1–3
1.2.2 Equipment Rack Safety Precautions 1–3
1.2.3 Component Handling Precautions 1–4
2. Hardware Overview 2–1
2.1 Name of Each Part 2–1
2.2 Operator Panel 2–5
2.2.1 Operator Panel Overview 2–6
2.2.2 Switches on the Operator Panel 2–7
2.2.3 LEDs on the Operator Panel 2–9
2.3 LED Functions of Components 2–11
2.4 External Interface Port on Rear Panel 2–13
2.5 Labels 2–17
3. Troubleshooting 3–1
3.1 Emergency Power Off 3–1
3.2 Failure Diagnostic Method 3–2
3.3 Checking the Server and System Configuration 3–4
3.3.1 Checking the Hardware Configuration and FRU Status 3–4
3.3.1.1 Checking the Hardware Configuration 3–5
3.3.2 Checking the Software and Firmware Configurations 3–6
3.3.2.1 Checking the Software Configuration 3–7
3.3.2.2 Checking the Firmware Configuration 3–7
3.3.2.3 Downloading Error Log Information 3–7
3.4 Error Conditions 3–8
3.4.1 Predictive Self-Healing Tools 3–8
3.4.2 Monitoring Output 3–10
3.4.3 Messaging Output 3–10
3.5 Using Troubleshooting Commands 3–11
3.5.1 Using the showhardconf Command 3–11
3.5.2 Using the showlogs Command 3–14
3.5.3 Using the showstatus Command 3–15
3.5.4 Using the fmdump Command 3–16
3.5.4.1 fmdump -V Command 3–16
3.5.4.2 fmdump -e Command 3–17
3.5.5 Using the fmadm Command 3–17
3.5.5.1 Using the fmadm faulty Command 3–17
3.5.5.2 fmadm repair Command 3–18
3.5.5.3 fmadm config Command 3–18
3.5.6 Using the fmstat Command 3–19
3.6 General Solaris Troubleshooting Commands 3–19
3.6.1 Using the iostat Command 3–20
3.6.1.1 Options 3–20
3.6.2 Using the prtdiag Command 3–21
3.6.2.1 Options 3–21
3.6.3 Using the prtconf Command 3–23
3.6.3.1 Options 3–24
3.6.4 Using the netstat Command 3–26
3.6.4.1 Options 3–26
3.6.5 Using the ping Command 3–27
3.6.5.1 Options 3–27
3.6.6 Using the ps Command 3–28
3.6.6.1 Options 3–29
3.6.7 Using the prstat Command 3–29
3.6.7.1 Options 3–30
4. FRU Replacement Preparation 4–1
4.1 Tools Required for Maintenance 4–1
4.2 FRU Replacement and Installation Methods 4–2
4.2.1 FRU Replacement 4–2
4.2.2 FRU Installation 4–3
4.3 Active Replacement/Active Addition 4–5
4.3.1 Releasing a FRU from a Domain 4–5
4.3.2 FRU Removal and Replacement 4–6
4.3.3 Configuring a FRU in a Domain 4–6
4.3.4 Verifying the Hardware Operation 4–7
4.4 Hot Replacement/Hot Addition 4–7
4.4.1 FRU Removal and Replacement 4–8
4.4.2 Verifying the Hardware Operation 4–10
4.5 Cold Replacement/Cold Addition 4–12
4.5.1 Powering off the Server 4–12
4.5.1.1 Power-off by Using the XSCF Command 4–12
4.5.1.2 Power-off by Using the Operator Panel 4–13
4.5.2 FRU Removal and Replacement 4–13
4.5.3 Powering on the Server 4–13
4.5.3.1 Power-on by Using the XSCF Command 4–13
4.5.3.2 Power-on by Using the Operator Panel 4–14
4.5.4 Verifying the Hardware Operation 4–15
5. Internal Components Access 5–1
5.1 Sliding the Server Into and Out of the Equipment Rack 5–1
5.1.1 Sliding the Server Out from the Equipment Rack 5–1
5.1.2 Sliding the Server into the Equipment Rack 5–3
5.2 Removing and Attaching the Top Cover 5–3
5.2.1 Removing the Top Cover 5–3
5.2.2 Attaching the Top Cover 5–4
5.3 Removing and Attaching the Air Duct 5–4
5.3.1 Accessing the Air Duct 5–5
5.3.2 Removing the Air Duct 5–5
5.3.3 Attaching the Air Duct 5–6
5.4 Removing and Attaching the Fan Cover 5–6
5.4.1 Removing the Fan Cover 5–6
5.4.2 Attaching the Fan Cover 5–7
6. Motherboard Unit Replacement 6–1
6.1 Accessing the Motherboard Unit 6–4
6.2 Removing the Motherboard Unit 6–7
6.3 Mounting the Motherboard Unit 6–8
6.4 Reassembling the Server 6–9
7. Replacement and Installation of Memory 7–1
7.1 Memory Mounting Rules 7–3
7.1.1 Confirmation of DIMM Information 7–3
7.1.2 Memory Mounting Conditions 7–4
7.2 Accessing the DIMMs 7–7
7.3 Removing the DIMMs 7–8
7.4 Installing the DIMMs 7–9
7.5 Reassembling the Server 7–9
8. Replacement and Installation of PCIe Cards 8–1
8.1 Accessing a PCIe Card 8–3
8.2 Removing a PCIe Card 8–3
8.3 Mounting a PCIe Card 8–4
8.4 Reassembling the Server 8–5
9. Replacement and Installation of a Hard Disk Drive (HDD) 9–1
9.1 Accessing a Hard Disk Drive 9–3
9.2 Removing a Hard Disk Drive 9–3
9.3 Installing a Hard Disk Drive 9–5
9.4 Reassembling the Server 9–5
10. Replacing the Hard Disk Drive Backplane 10–1
10.1 Accessing the Hard Disk Drive Backplane 10–2
10.2 Removing the Hard Disk Drive Backplane 10–3
10.3 Mounting the Hard Disk Drive Backplane 10–5
10.4 Reassembling the Server 10–6
11. CD-RW/DVD-RW Drive Unit (DVDU) Replacement 11–1
11.1 Accessing the CD-RW/DVD-RW Drive Unit 11–2
11.2 Removing the CD-RW/DVD-RW Drive Unit 11–3
11.3 Mounting the CD-RW/DVD-RW Drive Unit 11–4
11.4 Reassembling the Server 11–5
12. Power Supply Unit Replacement 12–1
12.1 Accessing a Power Supply Unit 12–3
12.2 Removing the Power Supply Unit 12–3
12.3 Mounting the Power Supply Unit 12–5
12.4 Reassembling the Server 12–5
13. Fan Unit Replacement 13–1
13.1 Accessing a Fan Unit 13–3
13.2 Removing a Fan Unit 13–3
13.3 Mounting a Fan Unit 13–5
13.4 Reassembling the Server 13–5
14. Fan Backplane Replacement 14–1
14.1 Accessing the Fan Backplane 14–2
14.2 Removing the Fan Backplane 14–5
14.3 Mounting the Fan Backplane 14–6
14.4 Reassembling the Server 14–6
15. Operator Panel Replacement 15–1
15.1 Accessing the Operator Panel 15–3
15.2 Removing the Operator Panel 15–4
15.3 Mounting the Operator Panel 15–5
15.4 Reassembling the Server 15–5
A. Components List A–1
B. FRU List B–1
B.1 Server Overview B–1
B.2 Motherboard Unit B–2
B.2.1 Memory (DIMM) B–3
B.2.2 PCIe Slot B–3
B.2.3 CPU B–4
B.2.4 XSCF Unit B–4
B.3 Drive B–5
B.3.1 Hard Disk Drive B–5
B.3.2 CD-RW/DVD-RW Drive Unit (DVDU) B–6
B.4 Power Supply Unit B–6
B.5 Fan Unit B–7
C. External Interface Specifications C–1
C.1 Serial Port C–2
C.2 UPC Port C–2
C.3 USB Port C–3
C.4 SAS Port C–3
C.5 Connection Diagram for Serial Cable C–4
D. UPS Controller D–1
D.1 Overview D–1
D.2 Signal Cable D–2
D.3 Configuration of Signal Lines D–3
D.4 Power Supply Conditions D–4
D.4.1 Input Circuit D–4
D.4.2 Output Circuit D–5
D.5 UPS Cable D–5
D.6 Connections D–6
E. DC Power Supply Model E–1
E.1 The Server Views E–2
E.2 LED Functions of Power Supply Unit E–4
E.3 Electrical Specifications E–5
E.4 Using the showhardconf Command E–6
Abbreviations Abbreviations–1
Index Index–1

Preface

This manual describes how to service the SPARC Enterprise™ M3000 server. It is written for maintenance providers who have received training under a self-maintenance contract.
This section includes:
“Glossary” on page xiii
“Structure and Contents of This Manual” on page xiv
“M3000 Server Documentation” on page xv
“Text Conventions” on page xviii
“Prompt Notations” on page xviii
“Syntax of the Command-Line Interface (CLI)” on page xix
“Environment Requirements for Using This Product” on page xix
“Conventions for Alert Messages” on page xx
“Notes on Safety” on page xxi
“Alert Labels” on page xxiv
“Product Handling” on page xxv
“Limitations and Cautions” on page xxvi
“Fujitsu Welcomes Your Comments” on page xxviii
Glossary
For the terms used in the “M3000 Server Documentation” on page xv, refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Glossary.
Structure and Contents of This Manual
This manual is organized as described below:
CHAPTER 1 Safety Precautions for Maintenance
Provides safety precautions required for maintenance.
CHAPTER 2 Hardware Overview
Explains the names of components and also explains the LEDs on the operator panel and rear panel.
CHAPTER 3 Troubleshooting
Explains fault diagnosis information.
CHAPTER 4 FRU Replacement Preparation
Explains the method of preparing for the safe replacement of FRUs.
CHAPTER 5 Internal Components Access
Explains how to access internal components.
CHAPTER 6 Motherboard Unit Replacement
Explains how to replace the motherboard unit.
CHAPTER 7 Replacement and Installation of Memory
Explains how to replace and install memory (DIMMs).
CHAPTER 8 Replacement and Installation of PCIe Cards
Explains how to replace and install PCIe cards.
CHAPTER 9 Replacement and Installation of a Hard Disk Drive (HDD)
Explains how to replace and install a hard disk drive.
CHAPTER 10 Replacing the Hard Disk Drive Backplane
Explains how to replace the hard disk drive backplane.
CHAPTER 11 CD-RW/DVD-RW Drive Unit (DVDU) Replacement
Explains how to replace the CD-RW/DVD-RW drive unit.
CHAPTER 12 Power Supply Unit Replacement
Explains how to replace a power supply unit.
CHAPTER 13 Fan Unit Replacement
Explains how to replace a fan unit.
CHAPTER 14 Fan Backplane Replacement
Explains how to replace the fan backplane.
CHAPTER 15 Operator Panel Replacement
Explains how to replace the operator panel.
APPENDIX A Components List
Explains the server nomenclature and component numbering.
APPENDIX B FRU List
Explains FRUs.
APPENDIX C External Interface Specifications
Explains connector specifications for external interfaces.
APPENDIX D UPS Controller
Explains the UPS controller (UPC) that controls the uninterruptible power supply (UPS) unit.
APPENDIX E DC Power Supply Model
Describes the requirements specific to the DC power supply model.
Abbreviations
Provides the full spellings of abbreviations used in this manual.
Index
Provides keywords and corresponding reference page numbers so that the reader can easily search for items in this manual as necessary.
M3000 Server Documentation
The manuals listed below are provided for reference.
Book titles                                                                     Manual codes
SPARC Enterprise M3000 Server Site Planning Guide                               C120-H030
SPARC Enterprise Equipment Rack Mounting Guide                                  C120-H016
SPARC Enterprise M3000 Server Getting Started Guide                             C120-E536
SPARC Enterprise M3000 Server Overview Guide                                    C120-E537
Important Safety Information for Hardware Systems                               C120-E391
SPARC Enterprise M3000 Server Safety and Compliance Guide                       C120-E538
SPARC Enterprise M3000 Server Installation Guide                                C120-E539
SPARC Enterprise M3000 Server Service Manual                                    C120-E540
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI Build Procedure      C120-E361
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Administration Guide     C120-E331
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide        C120-E332
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual    Go to the Web
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI User’s Guide         C120-E360
SPARC Enterprise M3000 Server Product Notes                                     Go to the Web
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Glossary                 C120-E514
SPARC Enterprise/PRIMEQUEST Common Installation Planning Manual                 C120-H007
1. Manuals on the web
The latest versions of all the SPARC Enterprise Series manuals are available at the following websites.
Global Site
http://www.fujitsu.com/sparcenterprise/manual/
Japanese Site
http://primeserver.fujitsu.com/sparcenterprise/manual/
Note – SPARC Enterprise M3000 Server Product Notes are available on the website only. Please check for the most recent update on your product.
2. Documentation CD
For the Documentation CD, please contact your local sales representative.
SPARC Enterprise M3000 Server Documentation CD (C120-E541)
3. Manual on the Enhanced Support Facility x.x CD-ROM disk
Remote maintenance service
Book title                                          Manual code
Enhanced Support Facility User's Guide for REMCS    C112-B067
4. Manual (man page) provided in system
XSCF man page
Note – The man page can be referenced on the XSCF Shell, and it provides the same content as the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
5. Sun Microsystems software (for Solaris OS, etc.) related manuals
http://docs.sun.com
6. Information on using the RCI function
This manual does not contain an explanation of the RCI build procedure. For information on using the RCI function, refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI Build Procedure and the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI User's Guide provided on the website.
Text Conventions
This manual uses the following fonts and symbols to express specific types of information.
Fonts/symbols: AaBbCc123 (bold)
Meaning: What you type, when contrasted with on-screen computer output. This font represents the example of command input in the frame.
Example:
XSCF> adduser jsmith
Fonts/symbols: AaBbCc123
Meaning: The names of commands, files, and directories; on-screen computer output. This font represents the example of command output in the frame.
Example:
XSCF> showuser -p
User Name: jsmith
Privileges: useradm
            auditadm
Fonts/symbols: Italic
Meaning: Indicates the name of a reference manual
Example: See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide
Fonts/symbols: " "
Meaning: Indicates names of chapters, sections, items, buttons, or menus
Example: See Chapter 2, "Hardware Overview."
Prompt Notations
The following prompt notations are used in this manual.
Shell                                     Prompt notations
XSCF                                      XSCF>
C shell                                   machine-name%
C shell super user                        machine-name#
Bourne shell and Korn shell               $
Bourne shell and Korn shell super user    #
OpenBoot™ PROM                            ok
Syntax of the Command-Line Interface (CLI)
The command syntax is as follows:
A variable that requires input of a value must be enclosed in <>.
An optional element must be enclosed in [ ].
A group of options for an optional keyword must be enclosed in [ ] and delimited by |.
A group of options for a mandatory keyword must be enclosed in {} and delimited by |.
The command syntax is shown in a box.
Example:
XSCF> showuser -a
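As a rough illustration of the bracket notation described above, the following Python sketch labels each element of a synopsis string by its enclosing delimiters. The function name classify_synopsis and the synopsis "samplecmd [-v] [-a|-b] {-x|-y} <user>" are hypothetical and are used only for illustration; they are not taken from the XSCF command set. Nested brackets are not handled.

```python
import re

# Hypothetical helper, not part of any XSCF tool.
# One token is a [..] group, a {..} group, a <..> variable, or a bare word.
TOKEN = re.compile(r"\[[^\]]*\]|\{[^}]*\}|<[^>]*>|\S+")


def classify_synopsis(synopsis):
    """Return (kind, text) pairs for each element of a synopsis string."""
    result = []
    for tok in TOKEN.findall(synopsis):
        if tok.startswith("<"):
            # <> encloses a variable that requires a value
            result.append(("variable", tok[1:-1]))
        elif tok.startswith("["):
            # [ ] encloses an optional element; | separates alternatives
            inner = tok[1:-1].strip()
            kind = "optional choice" if "|" in inner else "optional"
            result.append((kind, inner))
        elif tok.startswith("{"):
            # {} encloses alternatives for a mandatory keyword
            result.append(("mandatory choice", tok[1:-1].strip()))
        else:
            # anything else is typed literally
            result.append(("literal", tok))
    return result


# Made-up synopsis for illustration only.
example = "samplecmd [-v] [-a|-b] {-x|-y} <user>"
for kind, text in classify_synopsis(example):
    print(f"{kind:17} {text}")
```

Running the sketch prints one line per element, pairing each element with the category that the notation assigns to it.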
Environment Requirements for Using This Product
This product is a computer that is intended to be used in a data center. For details of the system requirements, refer to the SPARC Enterprise M3000 Server Installation Guide.
Conventions for Alert Messages
This manual uses the following conventions to show alert messages, which are intended to prevent injury to the user or bystanders as well as property damage, and important messages that are useful to the user.
WARNING: This indicates a hazardous situation that could result in death or serious personal injury (potential hazard) if the user does not perform the procedure correctly.
CAUTION: This indicates a hazardous situation that could result in minor or moderate personal injury if the user does not perform the procedure correctly. This signal also indicates that damage to the product or other property may occur if the user does not perform the procedure correctly.
IMPORTANT: This indicates information that could help the user to use the product more effectively.
Alert messages in the text
An alert message in the text consists of a signal indicating an alert level followed by an alert statement. Alert messages are indented to distinguish them from regular text. Also, a space of one line precedes and follows an alert statement.
WARNING: The tasks listed below for this product and the optional products provided by Fujitsu Computers should be performed only by a field engineer.
The user must not perform these tasks. Incorrect operation of these tasks may cause electric shock, injury, or fire.
Installation and reinstallation of all components
Removal of front panel and top cover
Mounting/unmounting of optional internal devices
Connecting/disconnecting of external interface cables
Maintenance (repair and regular diagnosis and maintenance)
Also, important alert messages are shown in “Important Alert Messages” on page xxi.
Notes on Safety
Important Alert Messages
This manual provides the following important alert signals:
Caution – The WARNING signal indicates a dangerous situation that could result in death or serious injury if the user does not perform the procedure correctly.
Task: Normal operation
Warning: Electric shock, fire
Do not damage, break, or modify the power cords. Cord damage may cause electric shock or fire.
Task: Emergency
Warning: Smoking or fire
If smoke or fire is generated from the server, press down the power switch for 4 seconds or more, power off the server, and then remove the cord clamp and disconnect the power cord.
Caution – The CAUTION signal indicates a hazardous situation that could result in minor or moderate personal injury if the user does not perform the procedure correctly. This signal also indicates that damage to the product or other property may occur if the user does not perform the procedure correctly.
Task: Normal operation
Warning: Equipment damage
Be sure to follow the precautions below when installing the main unit. Otherwise, the equipment may be damaged.
• Do not block ventilation slits.
• Avoid installing the equipment in a place exposed to direct sunlight or near devices that become extremely hot.
• Avoid installing the equipment in a dusty place or a place directly exposed to corrosive gas or salty air.
• Avoid installing the equipment in a place exposed to strong vibration. Also, install the equipment on a level surface so that it is stable.
• The grounding resistance must not be greater than 10 Ω. The grounding method varies by the building where you install the server. Make sure that the facility administrator or a qualified electrician verifies the grounding method for the building and performs the grounding work.
• Do not run any cable beneath any equipment. Also, prevent cables from becoming taut. Never disconnect any power cord from the equipment while power is being supplied to the equipment.
• Do not place anything on top of the main unit. Do not use the main unit as a workspace.
• Avoid exposing the equipment to rapid changes in the ambient temperature, such as a rapid increase during transport in winter. A rapid increase in the ambient temperature causes moisture to condense in the equipment. Use the equipment only after the difference between its temperature and the ambient temperature is negligible.
• Avoid installing the equipment near a copy machine, air conditioner, welding machine, or any other devices generating electronic noise.
• Take preventive action to minimize static electricity at the installation location. Note that static electricity is easily generated in some carpets and can cause the equipment to malfunction.
• Confirm that the power supply voltage and frequency during operation match the rated values indicated on the equipment.
• Do not insert any object into an opening in the equipment. Components inside the equipment use high voltage. Conductive foreign matter, such as a metal object, inserted into the equipment, may cause a short circuit between components, resulting in fire, electric shock, or equipment damage.
• For maintenance of the equipment, contact your authorized service personnel.
Task: Normal operation
Warning: Data destruction
Confirm the items listed below before turning off the power. Otherwise, data may be destroyed.
• All applications have completed processing.
• No user is using the equipment.
• When the main unit power is turned off, the POWER LED on the operator panel is turned off. Be sure to confirm that the POWER LED is off before turning off the main power (uninterruptible power supply [UPS], power input, etc.).
If necessary, back up files before turning off the system power.
Warning: Data destruction
Do not forcibly stop a domain that is operating normally. Otherwise, data may be destroyed.
Warning: Data destruction
Do not disconnect the power cord from the power input while power is being supplied. Otherwise, data stored on hard disk units may be destroyed.
Task: Maintenance
Warning: Failure
Unpacking and maintenance of this equipment and Fujitsu optional products must always be performed by a certified field engineer. Customers must never perform this work themselves, as this could lead to failures.
Warning: Equipment damage
When handling parts, be sure to wear an antistatic wrist strap and connect the clip to the grounding port of the server. Place removed parts on an antistatic conductive mat. Failure to do so may result in serious damage or injury.
Warning: Electric shock
Before performing maintenance, unplug the power cords. This product uses double pole/neutral fusing, which could create an electric shock hazard.
Alert Labels
The following are labels attached to this product:
Caution – Never peel off the labels.
SPARC Enterprise M3000
(Front View)
Product Handling
Maintenance
Caution – Certain tasks in this manual should only be performed by a certified service engineer. Users must not perform these tasks. Incorrect operation of these tasks may cause electric shock, injury, or fire.
Installation and reinstallation of all components, and initial settings
Removal of front panel and top cover
Mounting/de-mounting of optional internal devices
Plugging or unplugging of external interface cards
Maintenance and inspections (repairing, and regular diagnosis and maintenance)
Caution – The following tasks regarding this product and the optional products
provided from Fujitsu should only be performed by a certified service engineer. Users must not perform these tasks. Incorrect operation of these tasks may cause malfunction.
Unpacking optional adapters and such packages delivered to the users
Plugging or unplugging of external interface cards
Remodeling/Rebuilding
Caution – Any modification and/or recycling of this product and its components
may be carried out only by a certified service engineer and must not be done by the customer under any circumstances. Otherwise, electric shock, injury or fire may result.
Emission of Laser Beam (Invisible)
Caution – The server contains modules that generate invisible laser radiation. Laser
beams are generated while the equipment is operating, even if an optical cable is disconnected or a cover is removed. Do not look at any laser light source directly or through an optical apparatus (e.g., magnifying glass, microscope).
Limitations and Cautions
Power Control and Operator Panel Mode Switch
When you use the remote power control utilizing the RCI function or the automatic power control system (referred to below as APCS), you can disable this remote power control or the APCS by switching to Service mode on the operator panel.
Disabling these features ensures that you do not unintentionally switch the system power on or off during maintenance. Note that system power-off by the APCS cannot be disabled with the mode switch. Therefore, be sure to turn off automatic power control through the APCS before starting maintenance.
If you switch the mode while using the RCI or the automatic power control, the system power is controlled as follows.
Function | Mode switch: Locked | Mode switch: Service
RCI | Remote power-on/power-off operations are enabled. | Remote power-on/power-off operations are disabled.
Automatic power control | Automatic power-on/power-off operations are enabled. | Automatic power-on is disabled, but power-off remains enabled.
To use the RCI function, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI Build Procedure and the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers RCI User’s Guide, which are available on the manuals website.
To use the APCS, see the Enhanced Support Facility User's Guide for Machine Administration Automatic Power Control Function (Supplement Edition).
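The mode switch behavior described above can be expressed as a small lookup table. The following Python sketch is illustrative only: the names POWER_CONTROL and operations_still_enabled are hypothetical and are not part of any XSCF or APCS interface. It simply encodes the table so that a maintenance checklist can ask which power operations remain possible in a given mode.

```python
# Illustrative encoding of the mode switch table above.
# All names here are hypothetical; they do not correspond to any XSCF API.
POWER_CONTROL = {
    ("RCI", "Locked"): {"power-on": True, "power-off": True},
    ("RCI", "Service"): {"power-on": False, "power-off": False},
    ("APCS", "Locked"): {"power-on": True, "power-off": True},
    # In Service mode, automatic power-on is disabled but power-off remains enabled.
    ("APCS", "Service"): {"power-on": False, "power-off": True},
}


def is_allowed(function, mode, operation):
    """Return True if the operation is enabled for the function in this mode."""
    return POWER_CONTROL[(function, mode)][operation]


def operations_still_enabled(mode):
    """List the remote or automatic power operations still enabled in a mode."""
    enabled = []
    for (function, switch_mode), operations in POWER_CONTROL.items():
        if switch_mode != mode:
            continue
        for operation, allowed in operations.items():
            if allowed:
                enabled.append(f"{function} {operation}")
    return sorted(enabled)


if __name__ == "__main__":
    print(operations_still_enabled("Service"))
```

In Service mode the only entry returned is the APCS power-off, which is why automatic power control must be turned off separately before maintenance starts.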
Fujitsu Welcomes Your Comments
If you have any comments or requests regarding this document or if you find any unclear statements in the document, please state your points specifically on the form at the following URL.
For Users in U.S.A., Canada, and Mexico:
http://www.computers.us.fujitsu.com/www/support_servers.shtml?support/servers
For Users in Other Countries: SPARC Enterprise contact
http://www.fujitsu.com/global/contact/computing/sparce_index.html
CHAPTER
1

Safety Precautions for Maintenance

This chapter provides safety precautions required for maintenance.
Section 1.1, “ESD Precautions”
Section 1.2, “Server Precautions”

1.1 ESD Precautions

To ensure that you and bystanders are not exposed to harm and to prevent damage to the system, observe the following safety precautions.
TABLE 1-1 ESD Precautions
Item | Precaution
ESD connector/wrist strap | Connect the ESD connector to your server and wear the antistatic wrist strap when handling printed circuit boards. See FIGURE 1-1 for the wrist strap connection destination.
Conductive mat | An approved conductive mat provides protection from static damage when used with a wrist strap. The mat also cushions and protects small parts that are attached to printed circuit boards.
ESD safe packaging box | Place a printed circuit board or component in the ESD safe packaging box after you remove it.
FIGURE 1-1 Wrist Strap Connection Destination
Hard disk drive or fan unit:
Connect to one of two thumbscrews on the front of the server.
FRU* other than hard disk drive and fan unit
Connect to either upper right on the front or upper left on the rear of the server.
* FRU: Field Replaceable Unit
Caution – Do not connect the wrist strap cable to the conductive mat. Connect it
directly to the server.
The wrist strap and FRU must have the same level of potential.

1.2 Server Precautions

When maintaining the server, observe the following precautions for your protection.
Follow all cautions, warnings, and instructions marked on the server.
Caution – Do not insert any object in an opening of the server. If any object comes
into contact with a high-voltage part or short-circuits a component, fire or electric shock might result.
Refer servicing of the server to the service engineer.

1.2.1 Electrical Safety Precautions

Ensure that the voltage and frequency of the power source to be used matches the
electrical rating labels on the server.
Wear antistatic wrist straps when handling hard disk drives, motherboard units,
or other printed circuit boards.
Use grounded power outlets as described in the SPARC Enterprise M3000 Server
Installation Guide.
Caution – Do not make mechanical or electrical modifications. We are not
responsible for regulatory compliance of modified servers.

1.2.2 Equipment Rack Safety Precautions

The equipment racks must be anchored to the floor, ceiling, or to adjacent frames.
Some equipment racks are supplied with a Quake-Resistant Options Kit or
stabilizer, which supports the weight of the server when it is extended on its slide rails. This prevents the equipment from toppling over during installation or maintenance.
In the following cases, a safety evaluation must be conducted by the service
engineer prior to installation or maintenance work.
When no Quake-Resistant Options Kits or stabilizers are attached and the
equipment rack is not anchored to the floor, ensure safety by confirming that the server does not fall over when it is pulled out from the slide rails.
When the equipment rack is mounted on a raised floor, ensure that the raised
floor has sufficient strength to withstand the weight upon it when the server is extended on its slide rails. Fix the equipment rack through the raised floor to the concrete floor below it, using a proprietary mounting kit for this purpose.
Caution – If more than one server is installed in an equipment rack, maintain the
servers one at a time.
For details of equipment racks, see the SPARC Enterprise Equipment Rack Mounting Guide.

1.2.3 Component Handling Precautions

Caution – The server is easily damaged by static electricity. To prevent damage to
printed circuit boards, wear a wrist strap and connect it to the server prior to starting maintenance.
Caution – Do not bend the motherboard unit (MBU). Otherwise, the components mounted on the circuit board might be damaged.
To prevent the motherboard unit from being bent, observe the following precautions:
Hold the motherboard unit by the handle, where the board stiffener is located.
When removing the motherboard unit from the packaging, keep the motherboard
unit horizontal until you lay it on the cushioned conductive mat.
Connectors and components on the motherboard unit have thin pins that bend
easily. Therefore, do not place the motherboard unit on a hard surface.
Be careful not to damage the small parts located on both sides of the motherboard unit.
Caution – The heat sinks can be damaged by incorrect handling. Do not touch the
heat sinks while replacing or removing motherboard units. If a heat sink is loose or broken, obtain a replacement motherboard unit. When storing or carrying a motherboard unit, ensure that the heat sinks have sufficient protection.
Caution – When removing a cable such as the LAN cable, if your fingers do not reach the latch lock of the connector, use a flat head screwdriver to push the latch to disconnect the cable. If you forcibly insert your fingers into the service clearance, the LAN port of the motherboard unit or the PCI Express (PCIe) cards may be damaged.
CHAPTER
2

Hardware Overview

This chapter explains the names of components and also explains the LEDs on the operator panel and rear panel.
Section 2.1, “Name of Each Part”
Section 2.2, “Operator Panel”
Section 2.3, “LED Functions of Components”
Section 2.4, “External Interface Port on Rear Panel”
Section 2.5, “Labels”

2.1 Name of Each Part

This section explains the names of parts mounted on the M3000 server.
Among these parts, those which can be replaced in the field by a certified field engineer are called Field Replaceable Units (FRU). For information on the actual replacement/expansion procedure for FRUs, see Chapter 6 to Chapter 15.
The server consists of a chassis in which various components are mounted, a top cover to protect the mounted components, a front panel, and a rear panel. An operator panel is located on the front panel, and ports used to connect external interfaces are located on the rear panel. Error and other status information can be checked from the LEDs on the operator panel and rear panel. For details, see Section 2.2, “Operator Panel” to Section 2.4, “External Interface Port on Rear Panel”.
FIGURE 2-1, FIGURE 2-2, and FIGURE 2-3 are the internal view, front view, and rear view of the server, respectively. They indicate the names and abbreviated names of the main components.
FIGURE 2-1 Server (Internal View)
Fan backplane (FANBP_B)
Fan unit (FAN_A)
DC-DC converter (DDC)
Hard disk drive backplane (HDDBP)
CPU
Memory (DIMM)
XSCF unit (XSCFU)
Motherboard unit
PCIe slot
Power supply unit (PSU)
CD-RW/DVD-RW drive unit (DVDU)
PCIe card (PCIe)
FIGURE 2-2 Server (Front View)
Location No. Component
1 Fan unit (FAN_A)
2 Operator panel (OPNL)
3 Hard disk drive (HDD) (2.5-inch SAS disk)
4 CD-RW/DVD-RW drive unit (DVDU)
FIGURE 2-3 Server (Rear View) (AC Power Supply Model)
Location No. Component
1 Power supply unit (PSU)
2 PCIe slot
3 RCI port
4 USB port (for XSCF)
5 Serial port (for XSCF)
6 LAN port (for XSCF)
7 UPC port
8 Serial Attached SCSI (SAS) port
9 Gigabit Ethernet (GbE) port (for OS)

2.2 Operator Panel

The operator panel has the important function of controlling the power of the server. The operator panel is usually locked with a key to prevent the server from being mistakenly powered off during system operation.
Before starting maintenance work, ask the system administrator to unlock the operator panel.

2.2.1 Operator Panel Overview

The system administrator or service engineer checks the operating status of the server with the LEDs or operates the power supply with the power button. FIGURE 2-4 shows the location of the operator panel.
FIGURE 2-4 Operator Panel Location
Location number Component
1 POWER LED
2 XSCF STANDBY LED
3 CHECK LED
4 Power button
5 Mode switch (key switch)

2.2.2 Switches on the Operator Panel

TABLE 2-1 describes the functions of the switches on the operator panel.
The switches on the operator panel include the mode switch for setting the operation mode and the power button for turning the server on and off.
TABLE 2-1 Switches (Operator Panel)
Switch name | Setting or operation | Description of function
Mode switch (key switch) | (all settings) | This switch is used to set the operation mode for the server. Insert the special key, which is under the customer's control, to switch between modes.
Mode switch (key switch) | Locked | Normal operation mode. The system can be powered on with the power button, but it cannot be powered off with the power button. The key can be pulled out at this key position.
Mode switch (key switch) | Service | Mode for maintenance. The system can be powered on and off with the power button. The key cannot be pulled out at this key position. To stop and maintain the server, set the mode to Service.
Power button | (all operations) | This button is used to turn on or turn off the power to the server (a domain). Power-on and power-off are controlled by pressing this button in different patterns, as described below.
Power button | Holding down the button for a short time (less than 4 seconds) | Regardless of the mode switch setting, the server is powered on.* If set in the XSCF, facility (air conditioners) power-on and warm-up processing is skipped.
Power button | Holding down the button for a long time in Service mode (4 seconds or longer) | If power to the server is on, OS shutdown processing is executed for all domains before the system is powered off. If the server is being powered on, the power-on processing is cancelled, and the server is powered off. If the server is being powered off, the operation of the power button is ignored, and the power-off processing is continued.
* In normal operation, the server is powered on only when the data center environmental conditions satisfy the specified values. Then, the server remains in the reset state until the operating system is booted.
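The power button behavior in TABLE 2-1 amounts to a small decision procedure. The Python sketch below restates it under stated assumptions: the function name, the state labels, and the returned strings are hypothetical, and cases that the table does not describe (for example, a long press in Locked mode, or a long press while the server is already off) are reported as unspecified rather than guessed.

```python
# Hypothetical restatement of TABLE 2-1; not an XSCF interface.
LONG_PRESS_SECONDS = 4.0  # "4 seconds or longer" counts as a long press


def power_button_action(mode, hold_seconds, state):
    """Describe the effect of pressing the power button.

    mode:  "Locked" or "Service" (mode switch position)
    state: "off", "powering_on", "on", or "powering_off" (assumed labels)
    """
    if hold_seconds < LONG_PRESS_SECONDS:
        # Short press: the server is powered on regardless of the mode switch.
        return "power on"
    if mode != "Service":
        # In Locked mode the server cannot be powered off with the button.
        return "no power-off (Locked mode)"
    if state == "on":
        return "shut down OS on all domains, then power off"
    if state == "powering_on":
        return "cancel power-on, then power off"
    if state == "powering_off":
        return "ignored; power-off continues"
    return "not specified in TABLE 2-1"


if __name__ == "__main__":
    print(power_button_action("Service", 5.0, "on"))
```

Keeping the unspecified cases explicit avoids implying behavior that the table does not document.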
TABLE 2-2 shows the function of the mode switch.
TABLE 2-2 Mode Switch Function
Function | Mode switch: Locked | Mode switch: Service
Inhibition of Break signal reception | Enabled. Reception of the Break signal can be enabled or disabled for each domain using the setdomainmode command. | Disabled
Power on/off by power button | Only power-on is enabled. | Enabled

2.2.3 LEDs on the Operator Panel

TABLE 2-3 lists the server states displayed with the LEDs on the operator panel.
The three LED indicators on the operator panel indicate the following:
General system status
System error warning
System error location
Besides the states listed in TABLE 2-3, the operator panel also displays various states of the server using combinations of the three LEDs. TABLE 2-4 indicates the states that are displayed in the course of operation from power-on to power-off of the server.
The blinking interval is 1 second (1 Hz).
TABLE 2-3 LEDs on the Operator Panel
Name | Color | Description
POWER LED | Green | Indicates the server power status. On: The power to the server (a domain) is on. Off: The power to the server is off. Blinking: The server is being powered off.
XSCF STANDBY LED | Green | Indicates the XSCF unit status. On: The XSCF unit is functioning normally. Off: The input power source is off or has just been turned on, and the XSCF unit is stopped. Blinking: System initialization is in progress after power was turned on.
CHECK LED | Amber | Indicates that the server has detected an error. This is sometimes called a locator. On: An error that hinders startup was detected. Off: Normal, or power is not being supplied. Blinking: Indicates that the unit is a maintenance target.
In Service mode, inhibition of Break signal reception is disabled (see TABLE 2-2). Break signals can be suppressed for each domain only in Locked mode. If the key position is switched to Service, the server will boot into service mode the next time it reboots. Service is selected by default at the initial power-on.
TABLE 2-4 State Display by Combination of LEDs on the Operator Panel
POWER | XSCF STANDBY* | CHECK | Description
Off | Off | Off | Power is not being supplied.
Off | Off | On | Power has been turned on.
Off | Blinking | Off | The XSCF unit is being initialized.
Off | Blinking | On | An error occurred in the XSCF unit.
Off | On | Off | The XSCF unit is in the standby state. The server is waiting for power-on of the air conditioning facilities in the data center.
On | On | Off | Warm-up standby processing is in progress (power is turned on after the end of processing). The power-on sequence is in progress. The server is in operation.
Blinking | On | Off | The power-off sequence is in progress. (The fan units are stopped after the end of processing.)
* READY LED is referred to when the XSCF unit status is indicated.
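The LED combinations in TABLE 2-4 can be read as a lookup from three LED states to a server state. The following Python sketch is one way to encode that lookup. The names LED_STATES and describe_leds are hypothetical, and the descriptions are abbreviated from the table. Combinations that the table does not list are reported as such.

```python
# Hypothetical lookup built from TABLE 2-4.
# Key order: (POWER, XSCF STANDBY, CHECK)
LED_STATES = {
    ("Off", "Off", "Off"): "Power is not being supplied.",
    ("Off", "Off", "On"): "Power has been turned on.",
    ("Off", "Blinking", "Off"): "The XSCF unit is being initialized.",
    ("Off", "Blinking", "On"): "An error occurred in the XSCF unit.",
    ("Off", "On", "Off"): (
        "The XSCF unit is in the standby state, or the server is waiting "
        "for power-on of the air conditioning facilities."
    ),
    ("On", "On", "Off"): (
        "Warm-up standby processing or the power-on sequence is in "
        "progress, or the server is in operation."
    ),
    ("Blinking", "On", "Off"): "The power-off sequence is in progress.",
}


def describe_leds(power, xscf_standby, check):
    """Return the TABLE 2-4 description for an LED combination."""
    return LED_STATES.get(
        (power, xscf_standby, check),
        "Combination not listed in TABLE 2-4.",
    )


if __name__ == "__main__":
    print(describe_leds("Blinking", "On", "Off"))
```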

2.3 LED Functions of Components

This section explains the LEDs of each component. When replacing a FRU, check the states of its LEDs in advance.
The normal system state can be confirmed by checking the operator panel. If an error occurs in an individual hardware component in the server, the LEDs of the FRU containing that component indicate the error location. However, some FRUs, such as DIMMs, do not have LEDs.
To check the state of a FRU that has no LEDs, use an XSCF Shell command such as showhardconf from the maintenance terminal. For details, see TABLE 3-1.
TABLE 2-5 describes the component LEDs and their functions.
TABLE 2-5 Component LEDs and Their Functions
Component | Name | State (color) | Description
Motherboard unit (MBU) | POWER | (all states) | Indicates whether the MBU is operating.
Motherboard unit (MBU) | POWER | On (green) | Indicates that the motherboard is operating. The motherboard cannot be removed from the server while the POWER LED is on.
Motherboard unit (MBU) | POWER | Blinking (green) | Indicates that the MBU is being incorporated into the system or being disconnected from the system.
Motherboard unit (MBU) | POWER | Off | Indicates that the MBU is stopped. The MBU can be disconnected and replaced.
Motherboard unit (MBU) | CHECK | (all states) | Indicates the motherboard unit status.
Motherboard unit (MBU) | CHECK | On (amber) | Indicates that an error occurred in the MBU.
Motherboard unit (MBU) | CHECK | Off | Indicates that the MBU is in the normal state.
Hard disk drive (HDD) | CHECK | On (amber) | Indicates that an error occurred in the HDD. However, this LED stays on for several minutes (until initialization starts) immediately after power-on. This state does not indicate an error.
Hard disk drive (HDD) | CHECK | Blinking (amber) | Indicates that the HDD is ready to be replaced.
Hard disk drive (HDD) | CHECK | Off | Indicates that the HDD is in the normal state.
Hard disk drive (HDD) | READY | On (green) | Indicates that the HDD is operating. The HDD cannot be removed (cannot be replaced).
Hard disk drive (HDD) | READY | Blinking (green) | Indicates that the HDD is performing communication. The HDD cannot be removed (cannot be replaced).
Hard disk drive (HDD) | READY | Off | The HDD can be replaced.
Hard disk drive (HDD) | OK | (not used) | Indicates that the hard disk drive can be removed. However, this LED is not used.
Power supply unit (PSU) | DC | On (green) | Indicates that power is turned on and being supplied.
Power supply unit (PSU) | AC | On (green) | Indicates that input power is being supplied to the power supply unit.
Power supply unit (PSU) | AC | Off | Indicates that input power is not being supplied to the power supply unit.
Power supply unit (PSU) | CHECK | On (amber) | Indicates that an error occurred in the PSU.
Power supply unit (PSU) | CHECK | Blinking (amber) | Indicates that the power supply unit is ready to be replaced.
Power supply unit (PSU) | CHECK | Off | Indicates that the PSU is in the normal state.
Fan unit (FAN_A) | CHECK | On (amber) | Indicates that an error occurred in the fan unit.
Fan unit (FAN_A) | CHECK | Blinking (amber) | Indicates that the fan unit is ready to be replaced.
Fan unit (FAN_A) | CHECK | Off | Indicates that the fan unit is in the normal state.
LAN port display part | ACTIVE | On (green) | Indicates that communication is being performed through the LAN port.
LAN port display part | ACTIVE | Off | Indicates that communication is not being performed through the LAN port.
LAN port display part | LINK SPEED | On (amber) | Indicates that the communication speed of the LAN port is 1 Gbps.
LAN port display part | LINK SPEED | On (green) | Indicates that the communication speed of the LAN port is 100 Mbps.
LAN port display part | LINK SPEED | Off | Indicates that the communication speed of the LAN port is 10 Mbps.
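As a small worked example of reading TABLE 2-5, the LINK SPEED LED of a LAN port maps directly to a communication speed. The Python sketch below encodes only that mapping. The function name and the lowercase state labels are hypothetical, chosen for illustration.

```python
# Hypothetical helper that encodes the LINK SPEED rows of TABLE 2-5.
LINK_SPEED_MBPS = {
    "amber": 1000,  # On (amber): 1 Gbps
    "green": 100,   # On (green): 100 Mbps
    "off": 10,      # Off: 10 Mbps
}


def lan_link_speed_mbps(link_speed_led):
    """Return the LAN port speed in Mbps for a LINK SPEED LED state."""
    state = link_speed_led.strip().lower()
    if state not in LINK_SPEED_MBPS:
        raise ValueError(f"Unknown LINK SPEED LED state: {link_speed_led!r}")
    return LINK_SPEED_MBPS[state]


if __name__ == "__main__":
    for led in ("amber", "green", "off"):
        print(led, lan_link_speed_mbps(led), "Mbps")
```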

2.4 External Interface Port on Rear Panel

This section shows the location of the external interface ports located on the server rear panel and explains their functions.
FIGURE 2-5 External Interface Port Locations
TABLE 2-6 External Interface Port Functions
Location number | Component | Description
1 | RCI port | Used to connect the server to a peripheral device that has an RCI connector, to enable power interlocking and error monitoring.
2 | USB port (for XSCF) | Exclusive for maintenance personnel. Cannot be connected to general-purpose USB devices.
3 | Serial port (for XSCF) | Connects to the XSCF unit through a serial connection to set up and manage the server.
4 | LAN port 1 (for XSCF) | Accommodates a 100Base-TX LAN cable to set up the server and display status through the XSCF Shell (command-line interface: CLI) or XSCF Web (browser user interface: BUI). Through the CLI or BUI, the user or system administrator monitors the server, displays status, operates domains, and displays information on the console.
5 | LAN port 0 (for XSCF) | Same as LAN port 1 (for XSCF).
6 | UPC port 1 | By connecting an uninterruptible power supply (UPS) unit that has the UPS controller (UPC) interface, stable power supply is provided in the event of a failure in the power supply or even a large-scale power failure. If a single power feed is used, connect a UPS cable to UPC port 0. In a dual power feed, connect UPS cables to UPC ports 0 and 1.
7 | UPC port 0 | Same as UPC port 1.
8 | GbE port 0 (for OS) | Up to four 100Base-TX/1000Base-T cables can be connected to the GbE ports. High-capacity data can be transferred at a high speed.
9 | GbE port 1 (for OS) | Same as GbE port 0 (for OS).
10 | GbE port 2 (for OS) | Same as GbE port 0 (for OS).
11 | GbE port 3 (for OS) | Same as GbE port 0 (for OS).
12 | SAS port | Accommodates external Serial Attached SCSI (SAS) devices such as a tape drive.

2.5 Labels

This section explains the labels and the card affixed to the server.
Note – The contents of the labels shown in this manual might differ from those on the labels actually affixed to the server.
The model number, serial number, and hardware version, all of which are required for maintenance and management, are shown on the system faceplate label.
The standards label is affixed close to the system faceplate label and shows the approval standards.
Safety: NRTL/C
Radio wave: VCCI-A, FCC-A, DOC-A, MIC
Safety and radio wave: CE
A label-affixed card that can be inserted or extracted is provided near the power supply unit at the right side at the rear of the server (see FIGURE 2-6). The card should be inserted in such a way that the standards label faces the outside of the server and the system faceplate label faces the inside of the server.
FIGURE 2-6 Label Locations
Inside: System faceplate label
Outside: Standards label
CHAPTER
3

Troubleshooting

This chapter provides the fault diagnosis information and the actions to take for problems.
Section 3.1, “Emergency Power Off”
Section 3.2, “Failure Diagnostic Method”
Section 3.3, “Checking the Server and System Configuration”
Section 3.4, “Error Conditions”
Section 3.5, “Using Troubleshooting Commands”
Section 3.6, “General Solaris Troubleshooting Commands”

3.1 Emergency Power Off

This section explains how to power off in an emergency.
Caution – In an emergency (such as smoke or flames coming from the server),
immediately stop using the server and turn off the power supply. Regardless of the type of business, give top priority to fire prevention measures.
1. Press the power switch for more than 4 seconds to power off the server.
2. Remove the power cord clamp and disconnect the cable.
FIGURE 3-1 Power-off Method

3.2 Failure Diagnostic Method

When an error occurs, a message is displayed on the maintenance monitor in many cases. Use the flowchart in FIGURE 3-2 to find the correct methods for diagnosing failures.
FIGURE 3-2 Diagnostic Method Flowchart
The flowchart runs from Start to End and contains the following decision points and actions.
Decision points (each answered YES or NO):
Is the power OK or AC OK LED off?
The XSCF mail function sent an E-mail message?
The XSCF console displays an error message?
OS panic or performance error?
FMA message?
Can the message ID be used?
Has the problem been solved?
Actions:
Check the power supply unit and its connection.
Check whether an error message is displayed on the OS console and XSCF console.
Execute showlogs or fmadm in the XSCF to display fault information.
Check /var/adm/messages in the Solaris OS.
Execute fmadm to display fault information.
Enter the message ID in http://sun.com/msg/ to refer to fault information.
Make a memo of the displayed fault information.
Contact your service engineer.

3.3 Checking the Server and System Configuration

The operating conditions must remain the same before and after maintenance. If an error occurs in the server, save the system configuration and component status information. Confirm that the recovered state after maintenance is the same as that before maintenance.
If an error occurs in the server, the error is reported through one of the following:
Solaris™ Operating System message file
XSCF Shell showhardconf(8) command and showstatus(8) command
Management console
Service processor log
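Confirming that the state after maintenance matches the state before maintenance can be done by saving the command output as text at both points and comparing the two. The following Python sketch shows one possible comparison using the standard difflib module. The function name is hypothetical, and the sample strings are placeholders, not actual showhardconf or showstatus output.

```python
import difflib


def config_changes(before, after):
    """Return the lines that differ between two saved text snapshots.

    Lines removed after maintenance are prefixed with "- " and lines
    added are prefixed with "+ ". An empty list means no difference.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Placeholder snapshots for illustration only.
    before = "component-a: status normal\ncomponent-b: status normal"
    after = "component-a: status normal\ncomponent-b: status degraded"
    for change in config_changes(before, after):
        print(change)
```

An empty result suggests that the saved configuration text is unchanged. It does not replace checking the LEDs and logs described in this chapter.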

3.3.1 Checking the Hardware Configuration and FRU Status

To replace a faulty FRU and perform the maintenance on the server, it is important to check and understand the hardware configuration of the server and the state of each hardware component.
The hardware configuration refers to information that indicates to which layer a hardware component belongs.
The status of each hardware component refers to information on the conditions of a standard or optional component in the server: temperature, power supply voltage, CPU operating conditions, and other status information.
To check the hardware configuration and the status of each hardware component, use XSCF Shell commands from the maintenance terminal. See TABLE 3-1 for the commands used.
TABLE 3-1 Commands for Checking Hardware Configuration
Command | Description
showhardconf | Displays hardware configuration.
showstatus | Displays the status of a component. This command is used only when a faulty component is checked.
showboards | Displays information on the system board (XSB).
showdcl | Displays the hardware resource configuration information of a domain.
showfru | Displays the setting information of a device.
The status of each component can be checked based on the On or blinking state of the component LEDs.
For the component types and LED states, see TABLE 2-3 and TABLE 2-5.
For details of commands, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide and the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
3.3.1.1 Checking the Hardware Configuration
To check the hardware configuration, you need an XSCF user account with authority to log in to the XSCF. The following procedure can be used to check the hardware configuration from the maintenance terminal.
Ask the system administrator for the required information, such as the user account and password. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide .
1. Log in to XSCF Shell.
2. Type showhardconf.
XSCF> showhardconf
The showhardconf command displays hardware configuration information. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.

3.3.2 Checking the Software and Firmware Configurations

The software and firmware configurations and versions affect the operation of the server. To change the configuration or investigate a problem, check the latest information and check for any problems in the software.
Software and firmware vary according to user conditions.
The software configuration and version can be checked in the Solaris Operating
System. Refer to the Solaris OS documentation for more information.
The firmware configuration and versions can be checked from the maintenance
terminal using XSCF Shell commands. Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide for more detailed
information.
Check the software and firmware configuration information with assistance from the system administrator. However, if you have received login authority from the system administrator, the following commands can be used from the maintenance terminal for these checks:
TABLE 3-2 Commands for Checking the Software Configuration
Command | Description
showrev(1M) | Displays system configuration information and Solaris OS patch information.
uname(1) | Outputs current system information.
TABLE 3-3 Commands for Checking the XSCF Firmware Configuration
Command | Description
version(8) | Outputs current firmware version information.
showhardconf(8) | Outputs information on the components mounted on the server.
showstatus(8) | Displays the status of a component. This command is used only when a faulty component is checked.
showboards(8) | Displays XSB information. It can display information on an XSB that belongs to the specified domain and information on all XSBs mounted. An XSB combines hardware resources on physical system boards. The M3000 server consists of a single physical system board (Uni-XSB).
showdcl(8) | Displays the configuration information of a domain (hardware resource information).
showfru(8) | Displays the setting information of a device.
3.3.2.1 Checking the Software Configuration
The following procedure can be used to check the software configuration from the domain console.
Type showrev.
# showrev
The showrev command displays system configuration information on the screen.
3.3.2.2 Checking the Firmware Configuration
Login authority is required to check the firmware configuration. The procedure below can be used to check the configuration from the maintenance terminal.
1. Log in with the account of the XSCF hardware maintenance engineer.
2. Type version.
XSCF> version
The version command displays firmware version information on the screen. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
3.3.2.3 Downloading Error Log Information
To download error log information, use the XSCF log fetch function. The XSCF unit has an interface with external units so that the service engineer can easily obtain useful maintenance information such as error logs.
Connect the maintenance terminal, and use the XSCF Shell or XSCF Web to download error log information to the maintenance terminal.

3.4 Error Conditions

This section describes error conditions and relevant corrective actions.
This work is explained in the following sections:
Section 3.4.1, “Predictive Self-Healing Tools”
Section 3.4.2, “Monitoring Output”
Section 3.4.3, “Messaging Output”
For details of the fault information, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
You can find more detailed descriptions of Solaris OS Predictive Self-Healing at the website below:
http://www.sun.com/bigadmin/features/articles/selfheal.html
Predictive self-healing is an architecture and methodology for automatically diagnosing, reporting, and handling software and hardware error conditions. This new technology reduces the time required to debug a hardware or software problem and provides the administrator and service engineer with detailed data about each error.

3.4.1 Predictive Self-Healing Tools

In the Solaris OS, Solaris Fault Manager runs in the background. When an error occurs, the system software recognizes the error and attempts to determine the faulty hardware component. The system software also takes steps to prevent the faulty component from being used until it has been replaced. The system software performs the following activities:
Receives telemetry information about errors detected by the system software.
Diagnoses the errors.
Initiates predictive self-healing activities. For example, Solaris Fault Manager can disable faulty components.
When possible, causes the faulty FRU to provide an LED indication of the error in addition to populating system console messages with more details.
TABLE 3-4 shows typical messages generated when an error occurs. Messages are displayed on your console and are recorded in the /var/adm/messages file.
A message in TABLE 3-4 indicates that the fault has already been diagnosed. If there was any corrective action that the system could take, the system has already taken it. If your server is still running, the corrective action continues to be taken.
TABLE 3-4 Predictive Self-Healing Messages
Output displayed:
Nov 1 16:30:20 dt88-292 EVENT-TIME:Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM:SUNW,A70, CSN:-, HOSTNAME:dt88-292
Nov 1 16:30:20 dt88-292 SOURCE:eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID:afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE:One or more device instances may be disabled.
Nov 1 16:30:20 dt88-292 IMPACT:Loss of services provided by the device instances associated with this fault.
Nov 1 16:30:20 dt88-292 REC-ACTION:Schedule a repair procedure to replace the affected device.Use
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support.
Description:
EVENT-TIME: The time stamp of the diagnosis
PLATFORM: A description of the server encountering the error
SOURCE: Information on the Diagnosis Engine used to determine the error
EVENT-ID: The Universally Unique event ID for this error
DESC: A basic description of the error
WEB SITE: Where to find specific information and actions for this error
AUTO-RESPONSE: What, if anything, the system did to alleviate any follow-on problems
IMPACT: A description of what is considered to be the impact of the fault
REC-ACTION: A brief description of the corrective action the system administrator should take
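When many such messages accumulate in /var/adm/messages, it can help to pull the labeled fields out programmatically. The following Python sketch is not part of the product; it assumes each field arrives on a single syslog line of the form shown in TABLE 3-4, and the names parse_fma_message, FIELDS, and LINE_RE are hypothetical.

```python
import re

# Sample lines in the layout of TABLE 3-4 (syslog prefix, then LABEL:value).
SAMPLE = """\
Nov 1 16:30:20 dt88-292 EVENT-TIME:Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM:SUNW,A70, CSN:-, HOSTNAME:dt88-292
Nov 1 16:30:20 dt88-292 SOURCE:eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID:afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE:One or more device instances may be disabled.
"""

# Labels described in TABLE 3-4; lines with any other label are ignored.
FIELDS = ("EVENT-TIME", "PLATFORM", "SOURCE", "EVENT-ID",
          "AUTO-RESPONSE", "IMPACT", "REC-ACTION")

# Assumed prefix: month, day, hh:mm:ss, host name, then "LABEL:value".
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+([A-Z-]+):\s*(.*)$")


def parse_fma_message(text):
    """Return a dict mapping each recognized label to its value."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1) in FIELDS:
            fields[match.group(1)] = match.group(2).strip()
    return fields


msg = parse_fma_message(SAMPLE)
print(msg["EVENT-ID"])
```

The extracted EVENT-ID value is the identifier that the REC-ACTION text suggests passing to fmdump -v -u.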

3.4.2 Monitoring Output

To understand error conditions, collect monitoring output information. To collect the information, use the commands shown in TABLE 3-5.
TABLE 3-5 XSCF Commands for Checking Monitoring Output
Command | Operand | Description
showlogs(8) | console | Displays the console of a domain.
showlogs(8) | monitor | Logs messages that are displayed in the message window.
showlogs(8) | panic | Logs output to the console during a panic.
showlogs(8) | ipl | Collects console data generated during the period of the power-on of a domain to the completion of the Solaris OS start.

3.4.3 Messaging Output

To understand error conditions, collect messaging output information. To collect the information, use the commands shown in TABLE 3-6.
TABLE 3-6 Commands for Checking Messaging Output
Command | Operand | Description
showlogs(8) | env | Displays the temperature history log. The environmental temperature data and power status are indicated in 10-minute intervals. The data is stored for a maximum of six months.
showlogs(8) | power | Displays power and reset information.
showlogs(8) | event | Displays information reported to the system and stored as event logs.
showlogs(8) | error | Displays error logs.
fmdump(1M), fmdump(8) | | Displays FMA diagnostic results and errors. This command is provided as a Solaris OS command and XSCF Shell command.
Each error message logged by the predictive self-healing architecture has a message ID and Web address associated with the message. From this message ID and Web address, information on the most up-to-date corrective measures can be retrieved.
For details of predictive self-healing, see the Solaris OS documents.

3.5 Using Troubleshooting Commands

When any message listed in TABLE 3-4 is displayed, detailed information on the error may be required. For details on troubleshooting commands, see manual pages of the Solaris OS or XSCF Shell. This section provides detailed explanations of the following commands:
“Using the showhardconf Command” on page 3-11
“Using the showlogs Command” on page 3-14
“Using the showstatus Command” on page 3-15
“Using the fmdump Command” on page 3-16
“Using the fmadm Command” on page 3-17
“Using the fmstat Command” on page 3-19

3.5.1 Using the showhardconf Command

The showhardconf command displays information on each FRU. The following information is displayed:
Current configuration and status
Number of mounted units
Domain information
Name properties of the PCIe card
XSCF> showhardconf
SPARC Enterprise M3000;
    + Serial:IKK0813023; Operator_Panel_Switch:Locked;
    + Power_Supply_System:Single; SCF-ID:XSCF#0;
    + System_Power:On; System_Phase:Cabinet Power On;
    Domain#0 Domain_Status:OpenBoot Execution Completed;
    MBU_A Status:Normal; Ver:0101h; Serial:PP0829045F ;
        + FRU-Part-Number:CA07082-D902 A1 /541-3302-01 ;
        + CPU Status:Normal;
            + Freq:2.520 GHz; Type:32;
            + Core:4; Strand:2;
        + Memory_Size:8 GB;
        MEM#0A Status:Normal;
            + Code:ce0000000000000001M3 93T2950EZA-CE6 4145-473b3c23;
            + Type:1A; Size:1 GB;
        MEM#0B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2918;
            + Type:1A; Size:1 GB;
        MEM#1A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b28af;
            + Type:1A; Size:1 GB;
        MEM#1B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b28ab;
            + Type:1A; Size:1 GB;
        MEM#2A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b283e;
            + Type:1A; Size:1 GB;
        MEM#2B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2829;
            + Type:1A; Size:1 GB;
        MEM#3A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2840;
            + Type:1A; Size:1 GB;
        MEM#3B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2830;
            + Type:1A; Size:1 GB;
        PCI#0 Name_Property:fibre-channel; Card_Type:Other;
        PCI#1 Name_Property:fibre-channel; Card_Type:Other;
        PCI#2 Name_Property:pci; Card_Type:Other;
        PCI#3 Name_Property:pci; Card_Type:Other;
    OPNL Status:Normal; Ver:0101h; Serial:PP0829045Y ;
        + FRU-Part-Number:CA07082-D912 A0 /541-3306-01 ;
    PSU#0 Status:Normal; Serial:EA08260208;
        + FRU-Part-Number:CA01022-0720 03C /300-2193-03 ;
        + Power_Status:On;
        + Type:AC;
    PSU#1 Status:Normal; Serial:EA08260210;
        + FRU-Part-Number:CA01022-0720 03C /300-2193-03 ;
        + Power_Status:On;
        + Type:AC;
    FANBP_B Status:Normal; Ver:0101h; Serial:PP082704TD ;
        + FRU-Part-Number:CA20399-B12X 006AB/541-3304-02 ;
        FAN_A#0 Status:Normal;
        FAN_A#1 Status:Normal;
For details, see the showhardconf manual pages.
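If showhardconf output has been saved to a file, the unit status values can be summarized with a short script. The Python sketch below is an illustration only; it assumes every unit is reported as "NAME Status:VALUE;" as in the example above, and the names unit_status and UNIT_RE are hypothetical.

```python
import re

# A few lines in the style of the showhardconf example.
SAMPLE = """\
    MBU_A Status:Normal; Ver:0101h; Serial:PP0829045F ;
        + CPU Status:Normal;
        MEM#0A Status:Normal;
    PSU#0 Status:Normal; Serial:EA08260208;
        FAN_A#0 Status:Normal;
"""

# Optional leading "+", then the unit name, then "Status:<word>;".
UNIT_RE = re.compile(r"^\s*(?:\+\s+)?(\S+)\s+Status:([A-Za-z]+);")


def unit_status(text):
    """Map each unit name to the status word that follows "Status:"."""
    statuses = {}
    for line in text.splitlines():
        match = UNIT_RE.match(line)
        if match:
            statuses[match.group(1)] = match.group(2)
    return statuses


statuses = unit_status(SAMPLE)
# Units whose status is anything other than Normal deserve a closer look.
not_normal = sorted(u for u, s in statuses.items() if s != "Normal")
print(not_normal)
```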

3.5.2 Using the showlogs Command

The showlogs command displays information of specified logs in the order of time stamps. The information with the oldest time stamp is displayed first. The showlogs command displays the following logs:
Error log
Power log
Event log
Temperature and humidity record
Monitoring message log
Console message log
Panic message log
IPL message log
XSCF> showlogs error
Date: Jun 17 11:05:32 JST 2008     Code: 80000000-c3ff0000-0173000600000000
    Status: Alarm                  Occurred: Jun 17 11:05:32.522 JST 2008
    FRU: /PSU#1
    Msg: PSU shortage
Date: Jun 17 13:41:46 JST 2008     Code: 80002080-7801c201-0130000000000000
    Status: Alarm                  Occurred: Jun 17 13:41:44.861 JST 2008
    FRU: /MBU_A,*
    Msg: Board control error (MBC link error)
Date: Jun 17 13:46:31 JST 2008     Code: 60000000-cd01c701-0164010100000000
    Status: Warning                Occurred: Jun 17 13:46:31.158 JST 2008
    FRU: /OPNL,/FANBP_B
    Msg: TWI access error
XSCF>
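Error log entries downloaded to the maintenance terminal can be split into records for filtering. The following Python sketch is a rough illustration, assuming each entry starts with "Date:" and uses the six labels seen in the example; parse_error_log and FIELD_RE are hypothetical names, not XSCF functions.

```python
import re

SAMPLE = """\
Date: Jun 17 11:05:32 JST 2008     Code: 80000000-c3ff0000-0173000600000000
    Status: Alarm                  Occurred: Jun 17 11:05:32.522 JST 2008
    FRU: /PSU#1
    Msg: PSU shortage
Date: Jun 17 13:46:31 JST 2008     Code: 60000000-cd01c701-0164010100000000
    Status: Warning                Occurred: Jun 17 13:46:31.158 JST 2008
    FRU: /OPNL,/FANBP_B
    Msg: TWI access error
"""

KEYS = "Date|Code|Status|Occurred|FRU|Msg"
# A value runs until the next "Label:" or the end of the text.
FIELD_RE = re.compile(
    r"\b(" + KEYS + r"):\s*(.*?)\s*(?=\b(?:" + KEYS + r"):|\Z)", re.S
)


def parse_error_log(text):
    """Return one dict per log entry; a new entry begins at each Date label."""
    records = []
    for key, value in FIELD_RE.findall(text):
        if key == "Date":
            records.append({})
        if records:
            records[-1][key] = " ".join(value.split())
    return records


records = parse_error_log(SAMPLE)
alarms = [r for r in records if r["Status"] == "Alarm"]
for entry in alarms:
    print(entry["Occurred"], entry["FRU"], entry["Msg"])
```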

3.5.3 Using the showstatus Command

The showstatus command displays information about the faulty or degraded units among the FRUs that compose the server, together with the units on the layer immediately above each of those units. Each faulty or degraded unit is marked with an asterisk (*), and one of the following status indicators is displayed after "Status:".
Normal: Normal state
Faulted: The unit is faulty and is not operating.
Degraded: The unit is operating. The unit is partly faulty or degraded and some error has been detected. Although a faulty state is displayed for the unit, it is operating normally.
Deconfigured: There is no problem with the unit itself, but it is degraded due to a configuration problem, environmental problem, or the degradation of another unit.
Maintenance: Maintenance is being performed. replacefru(8) or addfru(8) is being executed.
XSCF> showstatus
    FANBP_B Status:Normal;
*       FAN_A#0 Status:Faulted;
XSCF>
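The asterisk convention makes the output easy to filter. The Python sketch below is illustrative only and assumes the line layout of the example above; flagged_units is a hypothetical helper name.

```python
SAMPLE = """\
    FANBP_B Status:Normal;
*       FAN_A#0 Status:Faulted;
"""


def flagged_units(text):
    """Return (unit, status) pairs for the lines marked with an asterisk."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("*"):
            continue  # parent units are listed without an asterisk
        body = stripped.lstrip("*").strip()
        unit, _, rest = body.partition(" Status:")
        result.append((unit, rest.split(";")[0]))
    return result


print(flagged_units(SAMPLE))
```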

3.5.4 Using the fmdump Command

The fmdump command displays the contents of the log managed by the module called Fault Manager.
This example assumes that only one error exists.
# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
3.5.4.1 fmdump -V Command
To get more detailed information you can use the -V option, as shown in the following example.
# fmdump -V -u 0ee65618-2218-4997-c0dc-b5c410ed8ec2
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
  100%  fault.io.fire.asic
        FRU: hc://product-id=SUNW,A70/motherboard=0
       rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
The output method using the -V option displays at least three additional lines.
The first line is the same information shown for console messages above, including a time stamp, UUID, and message ID.
The second line is a declaration of the certainty of diagnosis. In this case we are 100 percent sure the failure is in the ASIC described. If the diagnosis may involve multiple components, you may see two lines here with 50% in each of the two lines.
The "FRU" line indicates what component must be replaced to return the server to a fully operational state.
The "rsrc" line indicates the component that has become unusable because of this error.
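The certainty, FRU, and rsrc lines described above can be collected per suspect when the output is post-processed. The Python sketch below is a minimal illustration that assumes the three-line layout of the example; parse_suspects is a hypothetical name.

```python
import re

SAMPLE = """\
TIME                 UUID                                 SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
  100%  fault.io.fire.asic
        FRU: hc://product-id=SUNW,A70/motherboard=0
       rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
"""


def parse_suspects(text):
    """Return one dict per "NN% fault.class" line, with its FRU and rsrc."""
    suspects = []
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"(\d+)%\s+(\S+)$", line)
        if match:
            suspects.append({"certainty": int(match.group(1)),
                             "class": match.group(2)})
        elif suspects and line.startswith(("FRU:", "rsrc:")):
            key, _, value = line.partition(":")
            suspects[-1][key] = value.strip()
    return suspects


for suspect in parse_suspects(SAMPLE):
    print(suspect["certainty"], suspect["class"], suspect["FRU"])
```

A diagnosis that involves two components would yield two entries, each with its own certainty value.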
3.5.4.2 fmdump -e Command
To get information about the errors that caused this failure you can use the -e option, as shown in the following example.
# fmdump -e
TIME                 CLASS
Nov 02 10:04:14.3008 ereport.io.fire.jbc.mb_per

3.5.5 Using the fmadm Command

3.5.5.1 Using the fmadm faulty Command
The fmadm command can be used by administrators and service personnel to view and modify system configuration parameters that are maintained by the Solaris Fault Manager. The fmadm faulty command is primarily used to determine the status of a component involved in a fault, as shown in the following example:
# fmadm faulty
STATE    RESOURCE / UUID
-------- --------------------------------------------------------
degraded dev:////pci@1e,600000
         0ee65618-2218-4997-c0dc-b5c410ed8ec2
The PCIe slot has been degraded and it is associated with the same UUID as above. Also, the "faulted" status may be displayed.
3.5.5.2 fmadm repair Command
When the fmadm faulty command displays a fault, the fmadm repair command must be executed to clear the FRU information in the domain after replacement of the motherboard unit that has encountered the error. If the fmadm repair command is not executed, the error message is not cleared.
If the fmadm faulty command displays a fault, clearing the FMA resource cache on the operating system side causes no problem. Data in the cache does not need to match the hardware fault information held by the XSCF.
# fmadm repair 0ee65618-2218-4997-c0dc-b5c410ed8ec2
3.5.5.3 fmadm config Command
The fmadm config command output displays the version number and current status of the diagnosis engine that is being used by the server. Whether the latest engine is being used can be determined by consulting the SunSolve web site.
# fmadm config
MODULE                   VERSION STATUS  DESCRIPTION
cpumem-diagnosis         1.6     active  CPU/Memory Diagnosis
cpumem-retire            1.1     active  CPU/Memory Retire Agent
disk-transport           1.0     active  Disk Transport Agent
eft                      1.16    active  eft diagnosis engine
event-transport          2.0     active  Event Transport Module
fabric-xlate             1.0     active  Fabric Ereport Translater
fmd-self-diagnosis       1.0     active  Fault Manager Self-Diagnosis
io-retire                1.0     active  I/O Retire Agent
snmp-trapgen             1.0     active  SNMP Trap Generation Agent
sysevent-transport       1.0     active  SysEvent Transport Agent
syslog-msgs              1.0     active  Syslog Messaging Agent
zfs-diagnosis            1.0     active  ZFS Diagnosis Engine
zfs-retire               1.0     active  ZFS Retire Agent

3.5.6 Using the fmstat Command

The fmstat command reports statistical information on the set of modules that are associated with Solaris Fault Manager. By using the fmstat command, statistical information about the diagnosis engines and agents that are currently involved in fault management can be displayed.
The following output example shows that the fmd-self-diagnosis DE module (displayed also on the console output) has received accepted events.
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         0       0  0.0    0.0   0   0     0     0   3.0K      0
cpumem-retire            0       0  0.0    0.0   0   0     0     0      0      0
disk-transport           0       0  0.0 1793.8   0   0     0     0    40b      0
eft                      0       0  0.0    0.0   0   0     0     0   1.2M      0
event-transport          0       0  0.0    0.0   0   0     0     0   210b      0
fabric-xlate             0       0  0.0    0.0   0   0     0     0      0      0
fmd-self-diagnosis       0       0  0.0    0.0   0   0     0     0      0      0
io-retire                0       0  0.0    0.0   0   0     0     0      0      0
snmp-trapgen             0       0  0.0    0.0   0   0     0     0    32b      0
sysevent-transport       0       0  0.0 2395.3   0   0     0     0      0      0
syslog-msgs              0       0  0.0    0.0   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    0.0   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0
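To spot modules that have seen activity, the fmstat table can be read into rows keyed by the column headings. The Python sketch below is an illustration under the assumption that columns are whitespace-separated as in the example; parse_fmstat and busy_modules are hypothetical names.

```python
SAMPLE = """\
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         0       0  0.0    0.0   0   0     0     0   3.0K      0
disk-transport           0       0  0.0 1793.8   0   0     0     0    40b      0
eft                      0       0  0.0    0.0   0   0     0     0   1.2M      0
"""


def parse_fmstat(text):
    """Return one dict per module row, keyed by the header names."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def busy_modules(rows):
    """Modules with a nonzero ev_recv or open column."""
    return [r["module"] for r in rows
            if int(r["ev_recv"]) > 0 or int(r["open"]) > 0]


rows = parse_fmstat(SAMPLE)
print(busy_modules(rows))
```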

3.6 General Solaris Troubleshooting Commands

Superuser commands of this type are useful to determine whether there is a problem with the server, network, or another server connected via the network.
This section explains the following commands:
“Using the iostat Command” on page 3-20
“Using the prtdiag Command” on page 3-21
“Using the prtconf Command” on page 3-23
“Using the netstat Command” on page 3-26
“Using the ping Command” on page 3-27
“Using the ps Command” on page 3-28
“Using the prstat Command” on page 3-29
Most of these commands are located in the /usr/bin directory or /usr/sbin directory.

3.6.1 Using the iostat Command

The iostat command repeatedly reports terminal, drive, and I/O activity, as well as CPU utilization.
3.6.1.1 Options
TABLE 3-7 lists the options of the iostat command and how those options can help troubleshoot the server.
TABLE 3-7 Options for iostat
Option | Description | How it can help
No option | Reports status of local I/O devices. | A quick three-line output of device status information.
-c | Reports the percentages of time the system has spent in user mode, in system mode, waiting for I/O, and idling. | Quick report of CPU status
-e | Displays device error summary statistics. Displays the total number of errors, hardware errors, software errors, and transfer errors. | Provides a short table with accumulated errors. Identifies suspect I/O devices.
-E | Displays all device error statistics. | Provides information about devices: manufacturer, model number, serial number, size, and errors.
-n | Displays names in a descriptive format. | The descriptive format helps identify devices.
-x | Reports extended drive statistics of each drive. The output is in a tabular form. | Similar to the -e option, but provides rate information. This helps identify internal devices with poor performance and other I/O devices with poor performance across the network.
The following example shows output for the iostat command:
# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3120026A Revision: 8.01 Serial No: 3JT4H4C2
Size: 120.03GB <120031641600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: LITE-ON Product: COMBO SOHC-4832K Revision: O3K1 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
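The per-device error counters in this output can be tallied to single out suspect devices. The following Python sketch is illustrative; it assumes each device block begins with a line carrying the device name and the three error counters, as in the example, and error_totals and suspect_devices are hypothetical names.

```python
import re

SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3120026A Revision: 8.01 Serial No: 3JT4H4C2
Size: 120.03GB <120031641600 bytes>
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: LITE-ON Product: COMBO SOHC-4832K Revision: O3K1 Serial No:
Size: 0.00GB <0 bytes>
"""

# First line of each device block: name followed by three counters.
HEAD_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)


def error_totals(text):
    """Map each device name to its soft, hard, and transport error counts."""
    totals = {}
    for line in text.splitlines():
        match = HEAD_RE.match(line)
        if match:
            device, soft, hard, transport = match.groups()
            totals[device] = {"soft": int(soft), "hard": int(hard),
                              "transport": int(transport)}
    return totals


def suspect_devices(totals):
    """Devices with at least one nonzero counter."""
    return sorted(d for d, counts in totals.items() if any(counts.values()))


totals = error_totals(SAMPLE)
print(suspect_devices(totals))
```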

3.6.2 Using the prtdiag Command

The prtdiag command displays system configuration and diagnostic information. The diagnostic information identifies any failed FRU in the system.
The prtdiag command is located in the /usr/platform/platform-name/sbin/ directory.
The prtdiag command may indicate a slot number different from that shown elsewhere in this document. This is normal.
3.6.2.1 Options
TABLE 3-8 lists the options of the prtdiag command and how those options can help troubleshooting.
TABLE 3-8 Options for prtdiag
Option | Description | How it can help
No option | Lists components. | Shows CPU information, memory configuration, PCIe cards installed, OBP version, status of the mode switch, and CPU operation mode.
-v | Verbose mode. | Provides the same information as no option. Additionally, displays the detail information of PCIe cards.
The following example shows output for the prtdiag command in verbose mode:
# prtdiag -v
System Configuration: Sun Microsystems sun4u SPARC Enterprise M3000 Server
System clock frequency: 1064 MHz
Memory size: 7808 Megabytes
================================== CPUs ===========================
      CPU   CPU                      Run   L2$  CPU   CPU
LSB   Chip  ID                       MHz   MB   Impl. Mask
---   ----  -----------------------  ----  ---  ----- ----
 00   0     0, 1, 2, 3, 4, 5, 6, 7   2520  5.0  7     145
=================== Memory Configuration ==========================
      Memory  Available     Memory   DIMM    # of   Mirror   Interleave
LSB   Group   Size          Status   Size    DIMMs  Mode     Factor
---   ------  ------------  -------  ------  -----  -------  ----------
 00   A       4096MB        okay     1024MB  4      no       2-way
 00   B       3712MB        okay     1024MB  4      no       2-way
========================= IO Devices =============================
    IO                                       Lane/Frq
LSB Type LPID RvID,DvID,VnID  BDF      State Act,Max  Name                  Model
    Logical Path
--- ---- ---- --------------- -------- ----- -------- --------------------- ---------
00  PCIe 0    aa, 8533, 10b5  2, 0, 0  okay  8, 8     pci-pciex10b5,8533    N/A
    /pci@0,600000/pci@0
00  PCIe 0    aa, 8533, 10b5  3, 0, 0  okay  4, 8     pci-pciex10b5,8533    N/A
    /pci@0,600000/pci@0/pci@0
00  PCIe 0    aa, 8533, 10b5  3, 1, 0  okay  4, 4     pci-pciex10b5,8533    N/A
    /pci@0,600000/pci@0/pci@1
00  PCIe 0    aa, 8533, 10b5  3, 2, 0  okay  4, 4     pci-pciex10b5,8533    N/A
    /pci@0,600000/pci@0/pci@2
00  PCIe 0    aa, 8533, 10b5  3, 8, 0  okay  0, 8     pci-pciex10b5,8533    N/A
    /pci@0,600000/pci@0/pci@8
00  PCIe 0    8, 58, 1000     4, 0, 0  okay  4, 8     scsi-pciex1000,58     LSI,1068E
    /pci@0,600000/pci@0/pci@0/scsi@0
00  PCIx 0    b5, 103, 1166   5, 0, 0  okay  133,133  pci-pciex1166,103     N/A
    /pci@0,600000/pci@0/pci@1/pci@0
00  PCI  0    a3, 1678, 14e4  6, 4, 0  okay  --,133   network-pci14e4,1678  N/A
    /pci@0,600000/pci@0/pci@1/pci@0/network@4
00  PCI  0    a3, 1678, 14e4  6, 4, 1  okay  --,133   network-pci14e4,1678  N/A
    /pci@0,600000/pci@0/pci@1/pci@0/network@4,1
00  PCIx 0    b5, 103, 1166   7, 0, 0  okay  133,133  pci-pciex1166,103     N/A
    /pci@0,600000/pci@0/pci@2/pci@0
00  PCI  0    a3, 1678, 14e4  8, 4, 0  okay  --,133   network-pci14e4,1678  N/A
    /pci@0,600000/pci@0/pci@2/pci@0/network@4
00  PCI  0    a3, 1678, 14e4  8, 4, 1  okay  --,133   network-pci14e4,1678  N/A
    /pci@0,600000/pci@0/pci@2/pci@0/network@4,1
00  PCIe 1    aa, 8533, 10b5  2, 0, 0  okay  8, 8     pci-pciex10b5,8533    N/A
    /pci@1,700000/pci@0
00  PCIe 1    aa, 8533, 10b5  3, 0, 0  okay  0, 8     pci-pciex10b5,8533    N/A
    /pci@1,700000/pci@0/pci@0
00  PCIe 1    aa, 8533, 10b5  3, 8, 0  okay  0, 8     pci-pciex10b5,8533    N/A
    /pci@1,700000/pci@0/pci@8
00  PCIe 1    aa, 8533, 10b5  3, 9, 0  okay  0, 8     pci-pciex10b5,8533    N/A
    /pci@1,700000/pci@0/pci@9
==================== Hardware Revisions ====================
System PROM revisions:
----------------------
OBP 4.24.8 2008/04/23 15:15
=================== Environmental Status ===================
Mode switch is in LOCK mode
=================== System Processor Mode ===================
SPARC64-VII mode
#

3.6.3 Using the prtconf Command

Similar to the show-devs command executed at the ok prompt, the prtconf command displays the devices that are configured.
The prtconf command identifies hardware that is recognized by the Solaris OS. If software applications are having problems with hardware but the hardware is not suspected of being faulty, the prtconf command can be used to check whether the Solaris software recognizes the hardware and whether a driver for the hardware is loaded.
3.6.3.1 Options
TABLE 3-9 lists the options of the prtconf command and how those options can help troubleshooting.
TABLE 3-9 Options for prtconf
Option | Description | How it can help
No option | Displays the device tree of devices recognized by the operating system. | If a hardware device is recognized, then it is considered to be functioning properly. If the message "(driver not attached)" is displayed for the device or sub-device, then the driver for the device is corrupt or missing.
-D | Similar to the output of no option, but device driver names are listed. | Lists the drivers needed or used by the operating system to enable the device.
-p | Similar to the output of no option, yet is abbreviated. | Provides a brief list of the devices.
-V | Displays the version and date of the OpenBoot™ PROM firmware. | Useful for a quick check of the firmware version.
The following example shows output for the prtconf command:
# prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 7616 Megabytes
System Peripherals (Software Nodes):
SUNW,SPARC-Enterprise
scsi_vhci, instance #0
packages (driver not attached)
SUNW,probe-error-handler (driver not attached)
SUNW,builtin-drivers (driver not attached)
deblocker (driver not attached)
disk-label (driver not attached)
terminal-emulator (driver not attached)
obp-tftp (driver not attached)
ufs-file-system (driver not attached)
chosen (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
pseudo-console, instance #0
nvram (driver not attached)
pseudo-mc, instance #0
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
pci, instance #0
ebus, instance #0
flashprom (driver not attached)
serial, instance #0
scfc, instance #0
panel, instance #0
pci, instance #0
pci, instance #0
pci, instance #1
scsi, instance #0
tape (driver not attached)
disk (driver not attached)
sd, instance #1
sd, instance #0
pci, instance #2
pci, instance #0
network, instance #0
network, instance #1 (driver not attached)
pci, instance #3
pci, instance #1
network, instance #2 (driver not attached)
network, instance #3 (driver not attached)
pci, instance #4
pci, instance #1
pci, instance #5
pci, instance #6
pci, instance #7
pci, instance #8
os-io (driver not attached)
iscsi, instance #0
pseudo, instance #0
#

3.6.4 Using the netstat Command

The netstat command displays the network status and protocol statistics.
3.6.4.1 Options
TABLE 3-10 lists the options of the netstat command and how those options can help troubleshooting.
TABLE 3-10 Options for netstat
Option | Description | How it can help
-i | Displays the interface status. The information includes packets in/out, errors in/out, collisions, and queues. | Provides a quick overview of the network status.
-i interval | Repeats the netstat command at intervals of as many seconds as specified after the -i option. | Identifies intermittent or long duration network events. By piping netstat output to a file, overnight activity can be viewed all at once.
-p | Displays the media table. | Provides the MAC address for hosts on the subnet.
-r | Displays the routing table. | Provides routing information.
-n | Replaces host names with IP addresses and displays them. | Used when an IP address is more useful than a host name.
The following example shows the output for the netstat -p command:
# netstat -p
Net to Media Table: IPv4
Device IP Address           Mask            Flags    Phys Addr
------ -------------------- --------------- -------- ---------------
bge0   san-ff1-14-a         255.255.255.255 o        00:14:4f:3a:93:61
bge0   san-ff2-40-a         255.255.255.255 o        00:14:4f:3a:93:85
sppp0  224.0.0.22           255.255.255.255
bge0   san-ff2-42-a         255.255.255.255 o        00:14:4f:3a:93:af
bge0   san09-lab-r01-66     255.255.255.255 o        00:e0:52:ec:1a:00
sppp0  192.168.1.1          255.255.255.255
bge0   san-ff2-9-b          255.255.255.255 o        00:03:ba:dc:af:2a
bge0   bizzaro              255.255.255.255 o        00:03:ba:11:b3:c1
bge0   san-ff2-9-a          255.255.255.255 o        00:03:ba:dc:af:29
bge0   racerx-b             255.255.255.255 o        00:0b:5d:dc:08:b0
bge0   224.0.0.0            240.0.0.0       SM       01:00:5e:00:00:00
#
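The media table can be turned into a lookup from host to MAC address. The Python sketch below is an illustration only; it assumes that complete rows have exactly five whitespace-separated columns and skips rows, such as the sppp0 entries, that show no flags or physical address. The name mac_table is hypothetical.

```python
SAMPLE = """\
bge0   san-ff1-14-a         255.255.255.255 o        00:14:4f:3a:93:61
sppp0  224.0.0.22           255.255.255.255
bge0   224.0.0.0            240.0.0.0       SM       01:00:5e:00:00:00
"""


def mac_table(text):
    """Map the IP Address column to the Phys Addr column."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumption: rows lacking flags or a physical address are incomplete.
        if len(parts) == 5:
            _device, address, _mask, _flags, mac = parts
            table[address] = mac
    return table


table = mac_table(SAMPLE)
print(table.get("san-ff1-14-a"))
```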

3.6.5 Using the ping Command

The ping command sends an ICMP ECHO_REQUEST packet to a network host. Depending on how the ping command is configured, troublesome network links or nodes can be identified from the displayed output. The destination host is specified in the variable hostname.
3.6.5.1 Options
TABLE 3-11 lists the options of the ping command and how those options can help troubleshooting.
TABLE 3-11 Options for ping
Option | Description | How it can help
hostname | The probe packet is sent to hostname and returned. | Verifies that a host is active on the network.
-g hostname | Forcibly routes the probe packet through a specified gateway. | By sending the probe packet through different routes to the target host, individual routes can be tested for quality.
-i interface | Specifies through which interface to send and receive the probe packet. | Enables a simple check of secondary network interfaces.
-n | Replaces host names with IP addresses and displays them. | Used when an IP address is more useful than a host name.
-s | Continues to repeat ping at intervals of 1 second. Pressing CTRL-C stops the execution. After it is stopped, statistics are displayed. | Helps identify intermittent or long duration network events. By piping ping output to a file, overnight activity can be viewed all at once.
-svR | Displays the route the probe packet followed in 1-second intervals. | Indicates the probe packet route and number of hops. Comparing multiple routes can identify bottlenecks.
The following example shows output for the ping -s command:
# ping -s san-ff2-17-a
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
^C
----san-ff2-17-a PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.172/0.256/0.427/0.102
#

3.6.6 Using the ps Command

The ps command lists the status of processes. If no option is specified, the ps command outputs information about the processes that have the same effective user ID as the user who is executing this command and are controlled from the same control terminal as this command.
If any option is specified, the output information is controlled according to the specified option.
3.6.6.1 Options
TABLE 3-12 lists the options of the ps command and how those options can help troubleshooting.
TABLE 3-12 Options for ps
Option | Description | How it can help
-e | Displays information for every process. | Identifies the process ID and the executable files.
-f | Generates a full listing. | Provides the following process information: user ID, parent process ID, time when executed, and the paths to the executable files.
-o option | Enables configurable output. The pid, pcpu, pmem, and comm options display process ID, percent CPU consumption, percent memory consumption, and the relevant executable file, respectively. | Provides only most important information. Knowing the percentage of resource consumption helps identify processes that are affecting performance and might be hung. When using sort with the -r option, the column headings are output so that the value in the first column is equal to zero.
The following example shows output for the ps command:
# ps
   PID TTY      TIME CMD
101042 pts/3    0:00 ps
101025 pts/3    0:00 sh
#

3.6.7 Using the prstat Command

The prstat utility repeatedly examines all the active processes in the system and reports statistics based on the selected output mode and sort order. The prstat command provides output similar to the ps command.
3.6.7.1 Options
TABLE 3-13 lists the options of the prstat command and how those options can help troubleshooting.
TABLE 3-13 Options for prstat
Option | Description | How it can help
No option | Displays a list of the processes sorted in descending order of consumption amount of CPU resources. The list is limited to the height of the terminal window and the total number of processes. Output is automatically updated every 5 seconds. Pressing CTRL-C stops the execution. | Output identifies the process ID, user ID, used amount of memory, state, CPU consumption, and command name.
-n number | Limits the number of output lines. | Limits the amount of data displayed and displays processes consuming many resources.
-s key | Enables the sorting of list contents by key parameter. | Useful keys are cpu (default), time, and size.
-v | Verbose mode | Displays additional parameters.
The following example shows output for the prstat command:
# prstat -n 5 -s size
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
100463 root       66M   61M sleep   59    0   0:01:03 0.0% fmd/19
100006 root       11M 9392K sleep   59    0   0:00:09 0.0% svc.configd/16
100004 root       10M 8832K sleep   59    0   0:00:04 0.0% svc.startd/14
100061 root     9440K 6624K sleep   59    0   0:00:01 0.0% snmpd/1
100132 root     8616K 5368K sleep   59    0   0:00:04 0.0% nscd/35
Total: 52 processes, 188 lwps, load averages: 0.00, 0.00, 0.00
#
CHAPTER 4

FRU Replacement Preparation

This chapter explains the method of preparing for the safe replacement of FRUs.
Section 4.1, “Tools Required for Maintenance” on page 4-1
Section 4.2, “FRU Replacement and Installation Methods” on page 4-2
Section 4.3, “Active Replacement/Active Addition” on page 4-5
Section 4.4, “Hot Replacement/Hot Addition” on page 4-7
Section 4.5, “Cold Replacement/Cold Addition” on page 4-12

4.1 Tools Required for Maintenance

The maintenance work described in Chapter 5 to Chapter 15 requires maintenance software to confirm that the server and other components are operating correctly and to collect status information and log data on the server and components. Mounting, removing, or replacing a specific component requires special tools, including screwdrivers and an antistatic wrist strap. These items are collectively called maintenance tools and are listed in TABLE 4-1.
TABLE 4-1 Maintenance Tools
Item | Part name | Use
1 | Phillips screwdriver (No. 2) |
2 | Wrist strap | For electrostatic control
3 | Conductive mat | For electrostatic control
4 | SunVTS | Test program

4.2 FRU Replacement and Installation Methods

This section explains how to replace and install FRUs.

4.2.1 FRU Replacement

There are three methods of replacing FRUs, as follows:
Active replacement
A target FRU is operated while the Solaris OS of the domain to which the FRU belongs is operating.
The target FRU is operated by using Solaris OS commands or XSCF commands. Because the power supply unit (PSU) and fan unit (FAN) do not belong to any domain, they are operated by using XSCF commands regardless of the operating state of the Solaris OS.
Note – The hard disk drive has a redundant configuration only when disk mirroring software is used.
Note – If a hard disk drive is an unmirrored boot device, it must be replaced by using the cold replacement procedure. However, if a boot device can be disconnected by means of a Solaris OS function or disk mirroring software function, active replacement can also be performed. The procedure for disconnecting a hard disk drive varies depending on the software being used. For details, see the manuals for the relevant software.
Hot replacement
A target FRU is operated while the domain to which the FRU belongs is stopped.
Depending on the target FRU, there are two cases as follows:
Power supply unit/Fan unit: operated with XSCF commands.
Hard disk drive: operated directly, not by using XSCF commands.
Cold replacement
A target FRU is operated after all the domains are stopped and the server is powered off.
Note – Do not operate a target FRU while the OpenBoot PROM is running (the ok prompt is displayed). Operate the target FRU after stopping the relevant domain (power-off) or after starting the Solaris OS.

4.2.2 FRU Installation

Hard disk drives and PCIe cards can be added to empty slots as required, from one up to the maximum number. An empty slot is fitted with a placeholder component: an HDD dummy in a hard disk drive slot, or a PCIe slot cover in a PCIe card slot. These components are necessary to protect the server from noise and to properly cool the server.
The same methods as those used for replacement are used for installation.
Note – When installing a new component in an empty slot, remove the HDD dummy or PCIe slot cover and then install a new FRU.
TABLE 4-2 lists the access locations and applicable replacement methods for each FRU.
TABLE 4-2 FRU Access Locations and Replacement Methods
FRU                                                  Access location  Cold replacement  Hot replacement  Active replacement  Where to find the procedure
Motherboard unit (MBU_A, MBU_A_2, MBU_A_3, MBU_A_4)  Top              Yes               No               No                  Chapter 6
Memory (DIMM)                                        Top              Yes               No               No                  Chapter 7
PCIe card (PCIe)                                     Top              Yes               No               No                  Chapter 8
Hard disk drive (HDD)                                Front            Yes               Yes*             Yes                 Chapter 9
Hard disk drive backplane (HDDBP)                    Top              Yes               No               No                  Chapter 10
CD-RW/DVD-RW drive unit (DVDU)                       Front/top        Yes               No               No                  Chapter 11
Power supply unit (PSU)                              Rear             Yes               Yes†             Yes†                Chapter 12
Fan unit (FAN_A)                                     Top              Yes               Yes†             Yes†                Chapter 13
Fan backplane (FANBP_B)                              Top              Yes               No               No                  Chapter 14
Operator panel (OPNL)                                Front/top        Yes               No               No                  Chapter 15
* The FRU is operated directly, without using XSCF commands.
† The FRU is operated with XSCF commands.
The hard disk drive has a redundant configuration only when disk mirroring software is used.
If a hard disk drive is an unmirrored boot device, it must be replaced by using the cold replacement procedure. However, if a boot device can be disconnected by means of a Solaris OS function or disk mirroring software function, active replacement can also be performed. The procedure for disconnecting a hard disk drive varies depending on the software being used. For details, see the manuals for the relevant software.
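For planning purposes, the contents of TABLE 4-2 can be held in a small lookup structure. The following Python sketch is illustrative only. The dictionary layout, the short keys, and the function name allowed_methods are assumptions made for this example; the Yes/No values and chapter numbers are copied from TABLE 4-2.

```python
# Illustrative sketch only: TABLE 4-2 as a lookup structure.
# Each entry maps a FRU abbreviation to
# (cold replacement, hot replacement, active replacement, chapter number).
REPLACEMENT_METHODS = {
    "MBU_A":   (True, False, False, 6),
    "DIMM":    (True, False, False, 7),
    "PCIe":    (True, False, False, 8),
    "HDD":     (True, True,  True,  9),
    "HDDBP":   (True, False, False, 10),
    "DVDU":    (True, False, False, 11),
    "PSU":     (True, True,  True,  12),
    "FAN_A":   (True, True,  True,  13),
    "FANBP_B": (True, False, False, 14),
    "OPNL":    (True, False, False, 15),
}


def allowed_methods(fru):
    """Return (list of applicable replacement methods, chapter number) for a FRU."""
    cold, hot, active, chapter = REPLACEMENT_METHODS[fru]
    flags = (("cold", cold), ("hot", hot), ("active", active))
    return [name for name, ok in flags if ok], chapter


if __name__ == "__main__":
    for fru in REPLACEMENT_METHODS:
        methods, chapter = allowed_methods(fru)
        print(f"{fru}: {', '.join(methods)} (Chapter {chapter})")
```

The sketch does not capture the table footnotes. In particular, whether active replacement of a hard disk drive is possible still depends on the mirroring configuration described above.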
TABLE 4-3 lists the access locations and applicable installation methods for each FRU.
TABLE 4-3 FRU Access Locations and Installation Methods
FRU                                                  Access location  Cold addition  Hot addition  Active addition  Where to find the procedure
Motherboard unit (MBU_A, MBU_A_2, MBU_A_3, MBU_A_4)  Top              No             No            No
Memory (DIMM)                                        Top              Yes            No            No               Chapter 7
PCIe card (PCIe)                                     Top              Yes            No            No               Chapter 8
Hard disk drive (HDD)                                Front            Yes            Yes*          Yes              Chapter 9
Hard disk drive backplane (HDDBP)                    Top              No             No            No
CD-RW/DVD-RW drive unit (DVDU)                       Front/top        No             No            No
Power supply unit (PSU)                              Rear             No             No            No
Fan unit (FAN_A)                                     Top              No             No            No
Fan backplane (FANBP_B)                              Top              No             No            No
Operator panel (OPNL)                                Front/top        No             No            No
* The FRU is operated directly, without using XSCF commands.
† The FRU is operated with XSCF commands.

4.3 Active Replacement/Active Addition

In active replacement, the target FRU is operated while the Solaris OS of the domain to which the FRU belongs is operating.
The target FRU is operated using Solaris OS commands or XSCF commands.
Because the power supply unit (PSU) and fan unit (FAN) do not belong to any domain, they are operated by using XSCF commands regardless of the operating state of the Solaris OS.
Active replacement has the following four stages:
Section 4.3.1, “Releasing a FRU from a Domain”
Section 4.3.2, “FRU Removal and Replacement”
Section 4.3.3, “Configuring a FRU in a Domain”
Section 4.3.4, “Verifying the Hardware Operation”
For active addition, see Section 4.3.3, “Configuring a FRU in a Domain” and Section 4.3.4, “Verifying the Hardware Operation”.

4.3.1 Releasing a FRU from a Domain

1. From the Solaris OS, type the cfgadm command to obtain the component status.
# cfgadm -a
2. Stop the application from using the component and disconnect the component from the Solaris OS.
The READY LED (green) of the HDD goes off.
Note – If a hard disk drive is an unmirrored boot device, it must be replaced by using the cold replacement procedure. However, if a boot device can be disconnected by means of a Solaris OS function or disk mirroring software function, active replacement can also be performed.
3. Type the cfgadm -c command to disconnect the component from the Solaris OS.
# cfgadm -c unconfigure Ap_Id
4. Type the cfgadm -x command to make the CHECK LED blink, and confirm that it blinks.
# cfgadm -x led=fault,mode=blink Ap_Id
The Ap_Id is shown in the output of cfgadm (for example, disk#0). The CHECK LED (amber) of the HDD blinks.
5. Type the cfgadm command to verify that the component has been disconnected.
# cfgadm -a
The disconnected component is displayed as being unconfigured.

4.3.2 FRU Removal and Replacement

After a FRU is disconnected from a domain, the same procedure as that for hot replacement/hot addition applies. See Section 4.4, “Hot Replacement/Hot Addition”.

4.3.3 Configuring a FRU in a Domain

This section explains the procedure for active replacement/installation by using Solaris OS commands. For information on using the XSCF command, see Section 4.4, “Hot Replacement/Hot Addition”.
1. Type the cfgadm -c command from the Solaris OS to integrate the component into the Solaris OS.
# cfgadm -c configure Ap_Id
2. Type the cfgadm -x command to turn off the CHECK LED, and confirm that it is off.
# cfgadm -x led=fault,mode=off Ap_Id
The Ap_Id is shown in the output of cfgadm (for example, disk#0). The CHECK LED (amber) of the HDD is turned off.
3. Type the cfgadm command to verify that the component has been configured.
# cfgadm -a
The configured component is displayed as being configured. The READY LED (green) of the HDD goes on.
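The command sequences in Section 4.3.1 and in the steps above differ only in the Ap_Id. The following Python sketch is illustrative only: it builds the command lines as text so that they can be reviewed or pasted into a work log, and it does not run anything. The function names release_commands and configure_commands are hypothetical. The -x option string is written without a space after the comma, which is an assumption about how cfgadm expects hardware-specific options to be passed.

```python
# Illustrative sketch only: build the cfgadm command lines for one Ap_Id.
# The commands are returned as text and are NOT executed.

def release_commands(ap_id):
    """Command lines for releasing a FRU from a domain (Section 4.3.1)."""
    return [
        "cfgadm -a",                                # obtain the component status
        f"cfgadm -c unconfigure {ap_id}",           # disconnect from the Solaris OS
        f"cfgadm -x led=fault,mode=blink {ap_id}",  # CHECK LED blinks
        "cfgadm -a",                                # verify: unconfigured
    ]


def configure_commands(ap_id):
    """Command lines for configuring a FRU in a domain (Section 4.3.3)."""
    return [
        f"cfgadm -c configure {ap_id}",             # integrate into the Solaris OS
        f"cfgadm -x led=fault,mode=off {ap_id}",    # CHECK LED off
        "cfgadm -a",                                # verify: configured
    ]


if __name__ == "__main__":
    # disk#0 is the example Ap_Id given in the procedure text.
    for line in release_commands("disk#0") + configure_commands("disk#0"):
        print("# " + line)
```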

4.3.4 Verifying the Hardware Operation

Confirm the status of the LED indicators.
For information on the LED status, see TABLE 2-3 and TABLE 2-5.

4.4 Hot Replacement/Hot Addition

In hot replacement, the target FRU is operated while the domain to which the FRU belongs is stopped.
Depending on the target FRU, there are two cases as follows:
Power supply unit/Fan unit: operated with XSCF commands.
Hard disk drive: operated directly, not by using XSCF commands.
For hot addition, do the same operation as that for hot replacement.

4.4.1 FRU Removal and Replacement

Type the replacefru command from the XSCF Shell prompt.
XSCF> replacefru
----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.

1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
----------------------------------------------------------------------
Select [1,2|c:cancel] :1

----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a FAN to be replaced.

No. FRU             Status
--- --------------- ------------------
1.  FAN_A#0         Normal
2.  FAN_A#1         Normal
----------------------------------------------------------------------
Select [1,2|b:back] :1

You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r

Please confirm the Check LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
The replacefru command automatically tests the status of the component after the completion of removal and replacement.
Diagnostic tests for FAN_A#0 have started.
[This operation may take up to 3 minute(s)]
(progress scale reported in seconds)
0..... 30..done

----------------------------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced FRU.

FRU           Status
------------- --------
FAN_A#0       Normal
----------------------------------------------------------------------
The replacement of FAN_A#0 has completed normally.[f:finish] :f

----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.

1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
----------------------------------------------------------------------
Select [1,2|c:cancel] :c
The display may vary depending on the XCP version.
When the tests are complete, the program displays the original menu again. To return to the XSCF Shell prompt, select cancel.
For details, see the manual pages of replacefru.

4.4.2 Verifying the Hardware Operation

1. Type the showhardconf command to confirm that the new component has been installed.
XSCF> showhardconf
SPARC Enterprise M3000;
    + Serial:IKK0813023; Operator_Panel_Switch:Locked;
    + Power_Supply_System:Single; SCF-ID:XSCF#0;
    + System_Power:On; System_Phase:Cabinet Power On;
    Domain#0 Domain_Status:OpenBoot Execution Completed;
    MBU_A Status:Normal; Ver:0101h; Serial:PP082202QX ;
        + FRU-Part-Number:CA07082-D901 A1 /541-3302-01 ;
        + CPU Status:Normal;
            + Freq:2.520 GHz; Type:32;
            + Core:4; Strand:2;
        + Memory_Size:8 GB;
        MEM#0A Status:Normal;
            + Code:ce0000000000000001M3 93T2950EZA-CE6 4145-473b3c23;
            + Type:1A; Size:1 GB;
        MEM#0B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2918;
            + Type:1A; Size:1 GB;
        MEM#1A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b28af;
            + Type:1A; Size:1 GB;
        MEM#1B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b28ab;
            + Type:1A; Size:1 GB;
        MEM#2A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b283e;
            + Type:1A; Size:1 GB;
        MEM#2B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2829;
            + Type:1A; Size:1 GB;
        MEM#3A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2840;
            + Type:1A; Size:1 GB;
        MEM#3B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2830;
            + Type:1A; Size:1 GB;
        PCI#0 Name_Property:fibre-channel; Card_Type:Other;
        PCI#1 Name_Property:fibre-channel; Card_Type:Other;
        PCI#2 Name_Property:pci; Card_Type:Other;
        PCI#3 Name_Property:pci; Card_Type:Other;
    OPNL Status:Normal; Ver:0101h; Serial:PP082202R8 ;
        + FRU-Part-Number:CA07082-D911 A1 /541-3306-01 ;
    PSU#0 Status:Normal; Serial:EA08210127;
        + FRU-Part-Number:CA01022-0720 02B /300-2193-02 ;
        + Power_Status:On;
        + Type:AC;
    PSU#1 Status:Normal; Serial:EA08210131;
        + FRU-Part-Number:CA01022-0720 02B /300-2193-02 ;
        + Power_Status:On;
        + Type:AC;
    FANBP_B Status:Normal; Ver:0101h; Serial:PP0821031E ;
        + FRU-Part-Number:CA20399-B12X 004AA/541-3304-01 ;
    FAN_A#0 Status:Normal;
    FAN_A#1 Status:Normal;
XSCF>
For details, see the manual pages of showhardconf.
2. Confirm the state of the status LEDs of the FRU.
For information on the LED status, see TABLE 2-3 and TABLE 2-5.
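When showhardconf output is saved to a log file, the Status field of each component can be checked with a short script instead of by eye. The following Python sketch is illustrative only and is not an XSCF feature. The names component_status and abnormal_components are hypothetical, the sample is an excerpt of the output shown above, and the sketch assumes that each component line has the form "name Status:value;".

```python
# Illustrative sketch only: list components whose Status is not Normal
# in saved showhardconf output (not an XSCF feature).
import re

# Excerpt of the example output shown in this section.
SAMPLE = """\
SPARC Enterprise M3000;
    + Serial:IKK0813023; Operator_Panel_Switch:Locked;
    Domain#0 Domain_Status:OpenBoot Execution Completed;
    MBU_A Status:Normal; Ver:0101h; Serial:PP082202QX ;
        + CPU Status:Normal;
        MEM#0A Status:Normal;
    PSU#0 Status:Normal; Serial:EA08210127;
        + Power_Status:On;
    FAN_A#0 Status:Normal;
    FAN_A#1 Status:Normal;
"""

# Assumption: a component line is "<name> Status:<value>;", optionally
# preceded by "+". Fields such as Domain_Status and Power_Status do not match.
STATUS_RE = re.compile(r"^\s*(?:\+\s+)?(\S+) Status:([^;]+);")


def component_status(output):
    """Map each component name to its Status value."""
    status = {}
    for line in output.splitlines():
        match = STATUS_RE.match(line)
        if match:
            status[match.group(1)] = match.group(2)
    return status


def abnormal_components(output):
    """Return the sorted names of components whose Status is not Normal."""
    return sorted(name for name, value in component_status(output).items()
                  if value != "Normal")


if __name__ == "__main__":
    print(component_status(SAMPLE))
    print(abnormal_components(SAMPLE))
```

An empty list from abnormal_components means that every component line found reports Normal. It does not replace the LED check in step 2.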

4.5 Cold Replacement/Cold Addition

In cold replacement, all business operations must be stopped. When accessing the server, power off the server and disconnect the power cord to ensure safety.
For cold addition, do the same operation as that for cold replacement.

4.5.1 Powering off the Server

This section explains how to power off the server.
4.5.1.1 Power-off by Using the XSCF Command
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Log in to the XSCF Shell as a user with platadm or fieldeng privileges and type the poweroff command.
XSCF> poweroff -a
The following activity is executed when the poweroff command is used:
The Solaris OS shuts down completely.
The server is powered off and enters standby mode. (The power to the XSCF unit remains on.)
For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
4. Verify that the POWER LED on the operator panel is off.
5. Disconnect all the power cords from the power outlets.
Caution – There is a risk of electrical failure if the power cords are not disconnected. All the power cords must be disconnected to completely cut the power to the server.
4.5.1.2 Power-off by Using the Operator Panel
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Turn the mode switch on the operator panel to the Service position.
4. Press the power switch on the operator panel for 4 seconds or more.
5. Verify that the POWER LED on the operator panel is off.
6. Disconnect all the power cords from the power outlets.
Caution – There is a risk of electrical failure if the power cords are not disconnected. All the power cords must be disconnected to completely cut the power to the server.

4.5.2 FRU Removal and Replacement

In cold replacement, a FRU is removed and replaced while the power is turned off. After the FRU replacement, power on the server.

4.5.3 Powering on the Server

This section explains how to power on the server.
4.5.3.1 Power-on by Using the XSCF Command
1. Verify that the server has enough power supply units to operate in the desired configuration.
2. Connect all the power cords to power outlets.
3. Verify that the XSCF STANDBY LED on the operator panel is on.
4. Turn the mode switch on the operator panel to the desired mode position (Locked or Service).
5. Log in to the XSCF Shell as a user with platadm or fieldeng privileges and type the poweron command.
XSCF> poweron -a
Soon, the following activity is executed:
The POWER LED on the operator panel is turned on.
The power-on self-test (POST) is executed.
Then, the server is completely powered on.
Note – If automatic startup of the Solaris OS is specified, use the sendbreak -d domain_id command of the XSCF Shell to display the ok prompt after the display console banner appears but before the system starts booting the Solaris OS.
For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
4.5.3.2 Power-on by Using the Operator Panel
1. Verify that the server has enough power supply units to operate in the desired configuration.
2. Connect all the power cords to power outlets.
3. Verify that the XSCF STANDBY LED on the operator panel is on.
4. Turn the mode switch on the operator panel to the desired mode position (Locked or Service).
5. Press the power button on the operator panel.
Soon, the following activity is executed:
The POWER LED on the operator panel is turned on.
The power-on self-test (POST) is executed.
Then, the server is completely powered on.
Note – If automatic startup of the Solaris OS is specified, use the sendbreak -d domain_id command of the XSCF Shell to display the ok prompt after the display console banner appears but before the system starts booting the Solaris OS.

4.5.4 Verifying the Hardware Operation

1. At the ok prompt, press the ENTER key, type “#” (the default escape character), and then press the “.” (period) key.
The domain console is switched to the XSCF console.
2. Use the showhardconf command to confirm that the new component has been installed.
XSCF> showhardconf
SPARC Enterprise M3000;
    + Serial:IKK0813023; Operator_Panel_Switch:Locked;
    + Power_Supply_System:Single; SCF-ID:XSCF#0;
    + System_Power:On; System_Phase:Cabinet Power On;
    Domain#0 Domain_Status:OpenBoot Execution Completed;
    MBU_A Status:Normal; Ver:0101h; Serial:PP082202QX ;
        + FRU-Part-Number:CA07082-D901 A1 /541-3302-01 ;
        + CPU Status:Normal;
            + Freq:2.520 GHz; Type:32;
            + Core:4; Strand:2;
        + Memory_Size:8 GB;
        MEM#0A Status:Normal;
            + Code:ce0000000000000001M3 93T2950EZA-CE6 4145-473b3c23;
            + Type:1A; Size:1 GB;
        MEM#0B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2918;
            + Type:1A; Size:1 GB;
        MEM#1A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b28af;
            + Type:1A; Size:1 GB;
        MEM#1B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b28ab;
            + Type:1A; Size:1 GB;
        MEM#2A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b283e;
            + Type:1A; Size:1 GB;
        MEM#2B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2829;
            + Type:1A; Size:1 GB;
        MEM#3A Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2840;
            + Type:1A; Size:1 GB;
        MEM#3B Status:Normal;
            + Code:7f7ffe00000000004aEBE10RD4AJFA-5C-E 3020-223b2830;
            + Type:1A; Size:1 GB;
        PCI#0 Name_Property:fibre-channel; Card_Type:Other;
        PCI#1 Name_Property:fibre-channel; Card_Type:Other;
        PCI#2 Name_Property:pci; Card_Type:Other;
        PCI#3 Name_Property:pci; Card_Type:Other;
    OPNL Status:Normal; Ver:0101h; Serial:PP082202R8 ;
        + FRU-Part-Number:CA07082-D911 A1 /541-3306-01 ;
    PSU#0 Status:Normal; Serial:EA08210127;
        + FRU-Part-Number:CA01022-0720 02B /300-2193-02 ;
        + Power_Status:On;
        + Type:AC;
    PSU#1 Status:Normal; Serial:EA08210131;
        + FRU-Part-Number:CA01022-0720 02B /300-2193-02 ;
        + Power_Status:On;
        + Type:AC;
    FANBP_B Status:Normal; Ver:0101h; Serial:PP0821031E ;
        + FRU-Part-Number:CA20399-B12X 004AA/541-3304-01 ;
    FAN_A#0 Status:Normal;
    FAN_A#1 Status:Normal;
XSCF>
For details, see the manual pages of showhardconf.
3. Type the console command to switch from the XSCF console to the ok prompt (domain console) again:
XSCF> console -d 0
4. From the ok prompt, type the show-devs command to confirm that all the PCIe cards are mounted.
{0} ok show-devs
/pci@1,700000
/pci@0,600000
/pci@8,4000
/cmp@400,0
/pseudo-mc@200,200
/nvram
/pseudo-console
/virtual-memory
/memory@m0
/aliases
/options
/openprom
/chosen
/packages
/pci@1,700000/pci@0
/pci@1,700000/pci@0/pci@9
/pci@1,700000/pci@0/pci@8
/pci@1,700000/pci@0/pci@0
/pci@1,700000/pci@0/pci@9/pci@0
/pci@1,700000/pci@0/pci@9/pci@0/FJSV,e2ta@4,1
/pci@1,700000/pci@0/pci@9/pci@0/FJSV,e2ta@4
/pci@1,700000/pci@0/pci@8/pci@0
/pci@1,700000/pci@0/pci@8/pci@0/FJSV,e2ta@4,1
/pci@1,700000/pci@0/pci@8/pci@0/FJSV,e2ta@4
/pci@1,700000/pci@0/pci@0/pci@0
/pci@1,700000/pci@0/pci@0/pci@0/FJSV,e2ta@4,1
/pci@1,700000/pci@0/pci@0/pci@0/FJSV,e2ta@4
/pci@0,600000/pci@0
/pci@0,600000/pci@0/pci@8
/pci@0,600000/pci@0/pci@2
/pci@0,600000/pci@0/pci@1
/pci@0,600000/pci@0/pci@0
/pci@0,600000/pci@0/pci@8/pci@0
/pci@0,600000/pci@0/pci@8/pci@0/FJSV,e2ta@4,1
/pci@0,600000/pci@0/pci@8/pci@0/FJSV,e2ta@4
/pci@0,600000/pci@0/pci@2/pci@0
/pci@0,600000/pci@0/pci@2/pci@0/network@4,1
/pci@0,600000/pci@0/pci@2/pci@0/network@4
/pci@0,600000/pci@0/pci@1/pci@0
/pci@0,600000/pci@0/pci@1/pci@0/network@4,1
/pci@0,600000/pci@0/pci@1/pci@0/network@4
/pci@0,600000/pci@0/pci@0/scsi@0
/pci@0,600000/pci@0/pci@0/scsi@0/disk
/pci@0,600000/pci@0/pci@0/scsi@0/tape
/pci@8,4000/ebus@1
/pci@8,4000/ebus@1/panel@14,280030
/pci@8,4000/ebus@1/scfc@14,200000
/pci@8,4000/ebus@1/serial@14,400000
/pci@8,4000/ebus@1/flashprom@10,0
/cmp@400,0/core@3
/cmp@400,0/core@2
/cmp@400,0/core@1
/cmp@400,0/core@0
/cmp@400,0/core@3/cpu@1
/cmp@400,0/core@3/cpu@0
/cmp@400,0/core@2/cpu@1
/cmp@400,0/core@2/cpu@0
/cmp@400,0/core@1/cpu@1
/cmp@400,0/core@1/cpu@0
/cmp@400,0/core@0/cpu@1
/cmp@400,0/core@0/cpu@0
/openprom/client-services
/packages/obp-tftp
/packages/terminal-emulator
/packages/disk-label
/packages/deblocker
/packages/SUNW,builtin-drivers
/packages/SUNW,probe-error-handler
{0} ok
5. Type the probe-scsi-all command to confirm that the storage devices are mounted.
{0} ok probe-scsi-all
/pci@0,600000/pci@0/pci@0/scsi@0

MPT Version 1.05, Firmware Version 1.24.00.00

Target 0
  Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
  SASAddress 500000e0197292c2 PhyNum 0
Target 1
  Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
  SASAddress 500000e019728f22 PhyNum 1
Target 2
  Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
  SASAddress 500000e019729002 PhyNum 2
Target 3
  Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
  SASAddress 500000e019729302 PhyNum 3
Target 4
  Unit 0 Removable Read Only device MATSHITADVD-RAM UJ875AS 1000
  SATA device PhyNum 4

{0} ok
6. Type the boot command to start the Solaris OS.
ok boot
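If the probe-scsi-all output from step 5 is saved to a log, the detected targets can be counted with a short script and compared with the number of drives that are expected to be mounted. The following Python sketch is illustrative only and is not an OpenBoot PROM feature. The name list_targets is hypothetical, the sample is an excerpt of the step 5 output, and the sketch assumes that each entry reads "Target n", then "Unit n", then the device type.

```python
# Illustrative sketch only: list the targets found in saved probe-scsi-all
# output (not an OpenBoot PROM feature).
import re

# Excerpt of the example output shown in step 5.
SAMPLE = """\
Target 0
  Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
  SASAddress 500000e0197292c2 PhyNum 0
Target 1
  Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
  SASAddress 500000e019728f22 PhyNum 1
Target 4
  Unit 0 Removable Read Only device MATSHITADVD-RAM UJ875AS 1000
  SATA device PhyNum 4
"""

# Assumption: only these two device types appear, as in the example output.
TARGET_RE = re.compile(
    r"Target\s+(\d+)\s+Unit\s+(\d+)\s+(Disk|Removable Read Only device)"
)


def list_targets(output):
    """Return (target, unit, device type) for each entry found."""
    return [(int(target), int(unit), kind)
            for target, unit, kind in TARGET_RE.findall(output)]


if __name__ == "__main__":
    targets = list_targets(SAMPLE)
    disks = [t for t in targets if t[2] == "Disk"]
    print(f"{len(disks)} disk(s), {len(targets) - len(disks)} other device(s)")
```

Because the pattern treats line breaks as ordinary white space, the same function also works on output that was saved as a single line.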