Manual Code C120-E352-06EN
Part No. 819-7903-13
August 2009, Revision A
Copyright 2007-2009 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
FUJITSU LIMITED provided technical input and review on portions of this material.
Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in
this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws
and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology and this
document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more
additional patents or patent applications in the United States or other countries.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution,
and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior
written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this document to
you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this
document does not contain or represent any commitment of any kind on the part of Fujitsu Limited or Sun Microsystems, Inc., or any affiliate of
either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by
and/or licensed from suppliers to Fujitsu Limited and/or Sun Microsystems, Inc., including software and font technology.
Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End
User. Please contact Fujitsu Limited or Sun Microsystems, Inc.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark
in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, Java, Netra, Solaris, Sun Ray, Answerbook2, docs.sun.com, OpenBoot, and Sun Fire are trademarks or
registered trademarks of Sun Microsystems, Inc., or its subsidiaries, in the U.S. and other countries.
Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited.
All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries.
Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems, Inc.
SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of
Sun Microsystems, Inc. and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Fujitsu Limited, Sun Microsystems, Inc. or any affiliate of either of them in connection with this
document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product
or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, FUJITSU LIMITED, SUN MICROSYSTEMS, INC.
AND THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH
PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED
CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE
EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the
extent allowed by applicable law, in no event shall Fujitsu Limited, Sun Microsystems, Inc. or any of their affiliates have any liability to any
third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special,
incidental or consequential damages, even if advised of the possibility of such damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface
1. Safety and Tools
1.1 Safety Precautions
1.2 System Precautions
1.2.1 Electrical Safety Precautions
1.2.2 Equipment Rack Safety Precautions
1.2.3 Filler Boards and Filler Panels
1.2.4 Handling Components
2. Fault Isolation
2.1 Determining Which Diagnostics Tools to Use
2.2 Checking the Server and System Configuration
2.2.1 Checking the Hardware Configuration and FRU Status
2.2.1.1 Checking the Hardware Configuration
2.2.2 Checking the Software and Firmware Configuration
2.2.2.1 Checking the Software Configuration
2.2.2.2 Checking the Firmware Configuration
2.2.3 Downloading the Error Log Information
2.3 Operator Panel
2.4 Error Conditions
2.4.1 Predictive Self-Healing Tools
2.4.2 Monitoring Output
2.4.3 Messaging Output
2.5 LED Functions
2.6 Using the Diagnostic Commands
2.6.1 Using the showlogs Command
2.6.2 Using the fmdump Command
2.6.2.1 fmdump -V Command
2.6.2.2 fmdump -e Command
2.6.3 Using the fmadm faulty Command
2.6.3.1 fmadm repair Command
2.6.3.2 fmadm config Command
2.6.4 Using the fmstat Command
2.7 Traditional Solaris Diagnostic Commands
2.7.1 Using the iostat Command
2.7.1.1 Options
2.7.2 Using the prtdiag Command
2.7.2.1 Options
2.7.3 Using the prtconf Command
2.7.3.1 Options
2.7.4 Using the netstat Command
2.7.4.1 Options
2.7.5 Using the ping Command
2.7.5.1 Options
2.7.6 Using the ps Command
2.7.6.1 Options
2.7.7 Using the prstat Command
SPARC Enterprise M4000/M5000 Servers Service Manual • August 2009
2.7.7.1 Options
2.8 Other Issues
2.8.1 Can’t Locate Boot Device
3. Periodic Maintenance
3.1 Tape Drive Unit
3.1.1 Cleaning the Tape Drive Unit
4. FRU Replacement Preparation
4.1 FRU Replacement Method
4.2 Active Replacement
4.2.1 Removing a FRU From a Domain
4.2.2 Removing and Replacing a FRU
4.2.3 Adding a FRU Into a Domain
4.2.4 Verifying Hardware Operation
4.3 Hot Replacement
4.3.1 Removing and Replacing a FRU
4.3.2 Verifying Hardware Operation
4.4 Cold Replacement (Powering the Server Off and On)
4.4.1 Powering the Server Off Using Software
4.4.2 Powering the Server On Using Software
4.4.3 Powering the Server Off Manually
4.4.4 Powering the Server On Manually
4.4.5 Verifying Hardware Operation
5. Internal Components Access
5.1 Sliding the Server In and Out to the Fan Stop
5.1.1 Sliding the Server Out of the Equipment Rack
5.1.2 Sliding the Server Into the Equipment Rack
5.2 Top Cover Remove and Replace
5.2.1 Removing the Top Cover
5.2.2 Replacing the Top Cover
5.3 Fan Cover Remove and Replace
5.3.1 Removing the Fan Cover
5.3.2 Replacing the Fan Cover
6. Storage Devices Replacement
6.1 Hard Disk Drive Replacement
6.1.1 Accessing the Hard Disk Drive
6.1.2 Removing the Hard Disk Drive
6.1.3 Installing the Hard Disk Drive
6.1.4 Securing the Server
6.1.5 Accessing the Hard Disk Drive Backplane of the SPARC Enterprise M4000 Server
6.1.6 Removing the Hard Disk Drive Backplane of the SPARC Enterprise M4000 Server
6.1.7 Installing the Hard Disk Drive Backplane of the SPARC Enterprise M4000 Server
6.1.8 Securing the Server
6.1.9 Accessing the Hard Disk Drive Backplane of the SPARC Enterprise M5000 Server
6.1.10 Removing the Hard Disk Drive Backplane of the SPARC Enterprise M5000 Server
6.1.11 Installing the Hard Disk Drive Backplane of the SPARC Enterprise M5000 Server
6.1.12 Securing the Server
6.2 CD-RW/DVD-RW Drive Unit (DVDU) Replacement
6.2.1 Accessing the CD-RW/DVD-RW Drive Unit
6.2.2 Removing the CD-RW/DVD-RW Drive Unit
6.2.3 Installing the CD-RW/DVD-RW Drive Unit
6.2.4 Securing the Server
6.2.5 Accessing the CD-RW/DVD-RW Drive Backplane of the SPARC Enterprise M4000 Server
6.2.6 Removing the CD-RW/DVD-RW Drive Backplane of the SPARC Enterprise M4000 Server
6.2.7 Installing the CD-RW/DVD-RW Drive Backplane of the SPARC Enterprise M4000 Server
6.2.8 Securing the Server
6.2.9 Accessing the CD-RW/DVD-RW Drive Backplane of the SPARC Enterprise M5000 Server
6.2.10 Removing the CD-RW/DVD-RW Drive Backplane of the SPARC Enterprise M5000 Server
6.2.11 Installing the CD-RW/DVD-RW Drive Backplane of the SPARC Enterprise M5000 Server
6.2.12 Securing the Server
6.3 Tape Drive Unit Replacement
6.3.1 Accessing the Tape Drive Unit
6.3.2 Removing the Tape Drive Unit
6.3.3 Installing the Tape Drive Unit
6.3.4 Securing the Server
6.3.5 Accessing the Tape Drive Backplane of the SPARC Enterprise M4000 Server
6.3.6 Removing the Tape Drive Backplane of the SPARC Enterprise M4000 Server
6.3.7 Installing the Tape Drive Backplane of the SPARC Enterprise M4000 Server
6.3.8 Securing the Server
6.3.9 Accessing the Tape Drive Backplane of the SPARC Enterprise M5000 Server
6.3.10 Removing the Tape Drive Backplane of the SPARC Enterprise M5000 Server
6.3.11 Installing the Tape Drive Backplane of the SPARC Enterprise M5000 Server
6.3.12 Securing the Server
7. Power Systems Replacement
7.1 Power Supply Unit Replacement
7.1.1 Accessing the Power Supply Unit
7.1.2 Removing the Power Supply Unit
7.1.3 Installing the Power Supply Unit
7.1.4 Securing the Server
8. I/O Unit Replacement
8.1 PCI Cassette Replacement
8.1.1 Accessing the PCI Cassette
8.1.2 Removing the PCI Cassette
8.1.3 Installing the PCI Cassette
8.1.4 Securing the Server
8.2 PCI Card Replacement
8.2.1 Removing the PCI Card
8.2.2 Installing the PCI Card
8.3 I/O Unit Replacement
8.3.1 Accessing the I/O Unit
8.3.2 Removing the I/O Unit
8.3.3 Installing the I/O Unit
8.3.4 Securing the Server
8.4 I/O Unit DC-DC Converter Replacement
8.4.1 Accessing the I/O Unit DC-DC Converter (DDC_A#0 or DDC_B#0)
8.4.2 Removing the I/O Unit DC-DC Converter (DDC_A#0 or DDC_B#0)
8.4.3 Installing the I/O Unit DC-DC Converter (DDC_A#0 or DDC_B#0)
8.4.4 Securing the Server
8.4.5 Accessing the I/O Unit DC-DC Converter Riser
8.4.6 Removing the I/O Unit DC-DC Converter Riser
8.4.7 Replacing the I/O Unit DC-DC Converter Riser
8.4.8 Securing the Server
9. XSCF Unit Replacement
9.1 XSCF Unit Replacement
9.1.1 Accessing the XSCF Unit
9.1.2 Removing the XSCF Unit
9.1.3 Installing the XSCF Unit
9.1.4 Securing the Server
10. Fan Modules Replacement
10.1 Fan Module Replacement
10.1.1 Accessing the 60-mm Fan Module
10.1.2 Removing the 60-mm Fan Module
10.1.3 Installing the 60-mm Fan Module
10.1.4 Securing the Server
10.1.5 Accessing the 172-mm Fan Module
10.1.6 Removing the 172-mm Fan Module
10.1.7 Installing the 172-mm Fan Module
10.1.8 Securing the Server
10.1.9 Accessing the 60-mm Fan Backplane
10.1.10 Removing the 60-mm Fan Backplane
10.1.11 Installing the 60-mm Fan Backplane
10.1.12 Securing the Server
10.1.13 Accessing the SPARC Enterprise M4000 172-mm Fan Backplane
10.1.14 Removing the SPARC Enterprise M4000 172-mm Fan Backplane
10.1.15 Installing the SPARC Enterprise M4000 172-mm Fan Backplane
10.1.16 Securing the Server
10.1.17 Accessing the SPARC Enterprise M5000 172-mm Fan Backplane
10.1.18 Removing the SPARC Enterprise M5000 172-mm Fan Backplane
10.1.19 Installing the SPARC Enterprise M5000 172-mm Fan Backplane
12.1.4 Securing the Server
12.2 CPU Upgrade
12.2.1 SPARC64 VII CPU Modules Added to a New Domain
12.2.2 SPARC64 VII Processors Added to an Existing Domain
13. Motherboard Unit Replacement
13.1 Motherboard Unit Replacement
13.1.1 Accessing the SPARC Enterprise M4000 Motherboard Unit
13.1.2 Removing the SPARC Enterprise M4000 Motherboard Unit
13.1.3 Installing the SPARC Enterprise M4000 Motherboard Unit
13.1.4 Securing the Server
13.1.5 Accessing the SPARC Enterprise M5000 Motherboard Unit
13.1.6 Removing the SPARC Enterprise M5000 Motherboard Unit
13.1.7 Installing the SPARC Enterprise M5000 Motherboard Unit
13.1.8 Securing the Server
13.2 DC-DC Converter Replacement
13.2.1 Accessing the SPARC Enterprise M4000 DC-DC Converter
13.2.2 Removing the SPARC Enterprise M4000 DC-DC Converter
13.2.3 Installing the SPARC Enterprise M4000 DC-DC Converter
13.2.4 Securing the Server
13.2.5 Accessing the SPARC Enterprise M5000 DC-DC Converter
13.2.6 Removing the SPARC Enterprise M5000 DC-DC Converter
13.2.7 Installing the SPARC Enterprise M5000 DC-DC Converter
13.2.8 Securing the Server
14. Backplane Unit Replacement
14.1 Backplane Unit Replacement
14.1.1 Accessing the SPARC Enterprise M4000 Backplane Unit
14.1.2 Removing the SPARC Enterprise M4000 Backplane Unit
14.1.3 Installing the SPARC Enterprise M4000 Backplane Unit
14.1.4 Securing the Server
14.1.5 Accessing the SPARC Enterprise M5000 Backplane Unit
14.1.6 Removing the SPARC Enterprise M5000 Backplane Unit
14.1.7 Installing the SPARC Enterprise M5000 Backplane Unit
14.1.8 Securing the Server
15. Operator Panel Replacement
15.1 Operator Panel Replacement
15.2 Accessing the Operator Panel
15.2.1 Removing the Operator Panel
15.2.2 Installing the Operator Panel
15.2.3 Securing the Server
A. Components List
B. Rules for System Configuration
B.1 Server Configuration
C. FRU List
C.1 Server Overview
C.2 System Boards
C.2.1 Motherboard Unit
C.2.2 CPU Module
C.2.3 Memory Board
C.3 Backplane Unit
C.4 I/O Unit
C.5 Power
C.6 FAN Module
C.7 eXtended System Control Facility Unit
C.8 Drives
C.8.1 Hard Disk Drive
C.8.2 CD-RW/DVD-RW Drive Unit (DVDU)
C.8.3 Tape Drive Unit (TAPEU)
D. External Interface Specifications
D.1 Serial Port
D.2 UPC (UPs Control) Port
D.3 USB Port
D.4 Connection Diagram for Serial Cable
E. UPS Controller
E.1 Overview
E.2 Signal Cables
E.3 Signal Line Configuration
E.4 Power Supply Conditions
E.4.1 Input circuit
E.4.2 Output circuit
E.5 UPS Cable
E.6 UPC Connector
F. Abbreviations
Index
Preface
This manual describes how to service SPARC Enterprise™ M4000/M5000 servers. It
is written for maintenance providers who have received training under a
self-maintenance contract.
This section includes:
■ “Glossary” on page xvii
■ “Structure and Contents of This Manual” on page xviii
■ “SPARC Enterprise M4000/M5000 Servers Documentation” on page xix
■ “Text Conventions” on page xxii
■ “Prompt Notations” on page xxii
■ “Syntax of the Command-Line Interface (CLI)” on page xxiii
■ “Environment Requirements for Using This Product” on page xxiii
■ “Conventions for Alert Messages” on page xxiv
■ “Notes on Safety” on page xxv
■ “Alert Labels” on page xxviii
■ “Product Handling” on page xxix
■ “Limitations and Cautions” on page xxx
■ “Fujitsu Welcomes Your Comments” on page xxxi
Glossary
For the terms used in the “SPARC Enterprise M4000/M5000 Servers
Documentation” on page xix, refer to the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers Glossary.
Structure and Contents of This Manual
This manual is organized as described below:
■ CHAPTER 1 Safety and Tools
Describes safety and tool information.
■ CHAPTER 2 Fault Isolation
Provides an overview of fault isolation and describes fault diagnosis information.
■ CHAPTER 3 Periodic Maintenance
Describes the periodic maintenance required to keep the server running
regardless of whether a problem has occurred.
■ CHAPTER 4 FRU Replacement Preparation
Describes how to prepare a field-replaceable unit (FRU) for safe removal.
■ CHAPTER 5 Internal Components Access
Describes how to access the internal components.
■ CHAPTER 6 Storage Devices Replacement
Describes how to remove and install the main storage systems.
■ CHAPTER 7 Power Systems Replacement
Describes the power supply units and how to remove and replace them.
■ CHAPTER 8 I/O Unit Replacement
Describes how to remove and install the I/O unit and PCI cassettes.
■ CHAPTER 9 XSCF Unit Replacement
Provides an overview of the unit and describes how to remove and replace it.
■ CHAPTER 10 Fan Modules Replacement
Describes how to remove and install the fan modules.
■ CHAPTER 11 Memory Board Replacement
Describes how to remove and replace memory boards and DIMMs.
■ CHAPTER 12 CPU Module Replacement
Describes how to remove and replace the CPU modules.
■ CHAPTER 13 Motherboard Unit Replacement
Describes how to remove and replace the motherboard.
■ CHAPTER 14 Backplane Unit Replacement
Describes how to remove and replace the backplane unit.
■ CHAPTER 15 Operator Panel Replacement
Describes how to remove and install the operator panel.
■ APPENDIX A Components List
Shows the nomenclature and component numbering of the midrange servers.
■ APPENDIX B Rules for System Configuration
Shows the system configurations for the midrange servers.
■ APPENDIX C FRU List
Shows the midrange server FRUs.
■ APPENDIX D External Interface Specifications
Shows the external interface specifications for the midrange servers.
■ APPENDIX E UPS Controller
Shows the connections of the UPC interface, which controls an uninterruptible
power supply (UPS).
■ APPENDIX F Abbreviations
Lists the full names of the abbreviations and acronyms found in this manual.
■ Index
Provides keywords and corresponding reference page numbers so that the
reader can easily search for items in this manual as necessary.
This manual uses the following fonts and symbols to express specific types of
information.
Fonts/symbols | Meaning | Example
AaBbCc123 (bold) | What you type, when contrasted with on-screen computer output. This font represents the example of command input in the frame. | XSCF> adduser jsmith
AaBbCc123 (plain) | The names of commands, files, and directories; on-screen computer output. This font represents the example of command output in the frame. | XSCF> showuser -p
    User Name: jsmith
    Privileges: useradm
                auditadm
Italic | Indicates the name of a reference manual | See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
" " | Indicates names of chapters, sections, items, buttons, or menus | See Chapter 2, "Fault Isolation."
Prompt Notations
The following prompt notations are used in this manual.
Shell | Prompt Notation
XSCF | XSCF>
C shell | machine-name%
C shell super user | machine-name#
Bourne shell and Korn shell | $
Bourne shell and Korn shell super user | #
OpenBoot™ PROM | ok
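As an illustration only, the prompt notations in this table can be expressed as a small lookup that reports which shell a given prompt denotes. This is a minimal Python sketch and not part of the product; the function name classify_prompt and the sample host name are hypothetical.

```python
def classify_prompt(prompt: str) -> str:
    """Return the shell that a prompt string denotes, per the table above."""
    p = prompt.strip()
    # Prompts that appear exactly as written in the manual.
    exact = {
        "XSCF>": "XSCF",
        "$": "Bourne shell and Korn shell",
        "#": "Bourne shell and Korn shell super user",
        "ok": "OpenBoot PROM",
    }
    if p in exact:
        return exact[p]
    # A machine name followed by % or # is the C shell notation.
    if len(p) > 1 and p.endswith("%"):
        return "C shell"
    if len(p) > 1 and p.endswith("#"):
        return "C shell super user"
    return "unknown"


if __name__ == "__main__":
    # "m4000-dom0" is a made-up machine name used only for this example.
    for sample in ["XSCF>", "m4000-dom0%", "m4000-dom0#", "$", "#", "ok"]:
        print(f"{sample!r:16} -> {classify_prompt(sample)}")
```

The bare "#" prompt is checked before the "machine-name#" form so that the two super user notations are not confused.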
Syntax of the Command-Line Interface (CLI)
The command syntax is as follows:
■ A variable that requires input of a value must be enclosed in <>.
■ An optional element must be enclosed in [ ].
■ A group of options for an optional keyword must be enclosed in [ ] and delimited
by |.
■ A group of options for a mandatory keyword must be enclosed in {} and
delimited by |.
■ The command syntax is shown in a box.
Example:
XSCF> showuser -a
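The bracket conventions listed above can be illustrated with a short Python sketch that labels each element of a syntax line. The command name examplecmd and its arguments are invented for this illustration and are not XSCF commands; the function name classify is also hypothetical.

```python
import re

# One bracketed element (<...>, [...], {...}) or one bare word.
TOKEN = re.compile(r"<[^<>]+>|\[[^\[\]]+\]|\{[^{}]+\}|\S+")


def classify(synopsis: str) -> list:
    """Label each element of a syntax line according to the CLI conventions."""
    labeled = []
    for tok in TOKEN.findall(synopsis):
        if tok.startswith("<"):
            kind = "variable"            # a value the user must supply
        elif tok.startswith("["):
            # [a|b] is a group of options for an optional keyword
            kind = "optional choice" if "|" in tok else "optional"
        elif tok.startswith("{"):
            kind = "mandatory choice"    # {a|b}: one of the options is required
        else:
            kind = "literal"             # typed exactly as shown
        labeled.append((tok, kind))
    return labeled


if __name__ == "__main__":
    # Hypothetical syntax line, used only to exercise each convention.
    for tok, kind in classify("examplecmd [-v] [-a|-b] {start|stop} <name>"):
        print(f"{tok:14} {kind}")
```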
Environment Requirements for Using This Product
This product is a computer that is intended to be used in a computer room.
For details on the operational environment, see the SPARC Enterprise M4000/M5000 Servers Site Planning Guide.
Conventions for Alert Messages
This manual uses the following conventions to show alert messages, which are
intended to prevent injury to the user or bystanders as well as property damage, and
important messages that are useful to the user.
WARNING:
This indicates a hazardous situation that could result in death or serious personal
injury (potential hazard) if the user does not perform the procedure correctly.
CAUTION:
This indicates a hazardous situation that could result in minor or moderate personal
injury if the user does not perform the procedure correctly. This signal also indicates
that damage to the product or other property may occur if the user does not perform
the procedure correctly.
IMPORTANT:
This indicates information that could help the user to use the product more
effectively.
Alert messages in the text
An alert message in the text consists of a signal indicating an alert level followed by
an alert statement. Alert messages are indented to distinguish them from regular
text. Also, a space of one line precedes and follows an alert statement.
WARNING:
The tasks listed below for this product and the optional products provided by Fujitsu
should be performed only by authorized service personnel.
The user must not perform these tasks. Incorrect operation of these tasks may cause
electric shock, injury, or fire.
■ Installation and reinstallation of all components
■ Removal of front, rear, or side covers
■ Mounting/unmounting of optional internal devices
■ Connecting/disconnecting of external interface cables
■ Maintenance (repair and regular diagnosis and maintenance)
Also, important alert messages are shown in “Important Alert Messages” on
page xxv.
Notes on Safety
Important Alert Messages
This manual provides the following important alert signals:
Caution – The WARNING signal indicates a dangerous situation that could result in
death or serious injury if the user does not perform the procedure correctly.
Task: Normal operation
Warning: Electric shock, fire
Do not damage, break, or modify the power cords. Cord damage may
cause electric shock or fire.
Caution – The CAUTION signal indicates a hazardous situation that could result in
minor or moderate personal injury if the user does not perform the procedure
correctly. This signal also indicates that damage to the product or other property
may occur if the user does not perform the procedure correctly.
Task: Normal operation
Warning: Equipment damage
Be sure to follow the precautions below when installing the main unit.
Otherwise, the equipment may be damaged.
• Do not block ventilation slits.
• Avoid installing the equipment in a place exposed to direct sunlight or
near equipment that becomes extremely hot.
• Avoid installing the equipment in a dusty place or a place directly
exposed to corrosive gas or salty air.
• Avoid installing the equipment in a place exposed to strong vibration.
Also, install the equipment on a level surface so that it is stable.
• The equipment can be grounded using shared grounding. However, the
grounding method varies with the building where it is installed. Be sure
to confirm the related standards to ground the equipment correctly.
• Do not run any cable beneath any equipment. Also, prevent cables from
becoming taut. Never disconnect any power cord from the equipment
while power is being supplied to the equipment.
• Do not place anything on top of the main unit. Do not use the main unit
as a workspace.
• Avoid exposing the equipment to rapid changes in the ambient
temperature, such as a rapid increase during transport in winter. A rapid
increase in the ambient temperature causes moisture to condense in the
equipment. Use the equipment only after the difference between its
temperature and the ambient temperature is negligible.
• Avoid installing the equipment near a copy machine, air conditioner, or
welding machine, because these machines generate noise.
• Take preventive action to minimize static electricity at the installation
location. Note that static electricity is easily generated in some carpets
and can cause the equipment to malfunction.
• Confirm that the power supply voltage and frequency during operation
match the rated values indicated on the equipment.
• Do not insert any object into an opening in the equipment. Components
inside the equipment use high voltage. Conductive foreign matter, such
as a metal object, inserted into the equipment, may cause a short circuit
between components, resulting in fire, electric shock, or equipment
damage.
• For maintenance of the equipment, contact your authorized service
personnel.
Task: Normal operation
Warning: Data destruction
Confirm the items listed below before turning off the power. Otherwise,
data may be destroyed.
• All applications have completed processing.
• No user is using the equipment.
• When the main unit power is turned off, the POWER LED on the
operation panel is turned off. Be sure to confirm that the POWER LED is
off before turning off the main power (uninterruptible power supply
[UPS], power distribution box, etc.).
If necessary, back up files before turning off the system power.
Data destruction
Do not forcibly stop a domain that is operating normally. Otherwise, data
may be destroyed.
Data destruction
Do not disconnect the power cord from the AC power input while power
is being supplied. Otherwise, data stored on hard disk units may be
destroyed.
Alert Labels
The following labels are attached to this product:
■ Never peel off the labels.
FF2 (Front View)
Product Handling
Maintenance
Caution – Certain tasks in this manual should only be performed by a certified
service engineer. Users must not perform these tasks. Incorrect operation of these
tasks may cause electric shock, injury, or fire.
■ Installation and reinstallation of all components, and initial settings
■ Removal of front, rear, or side covers
■ Mounting/de-mounting of optional internal devices
■ Plugging or unplugging of external interface cards
■ Maintenance and inspections (repairing, and regular diagnosis and maintenance)
Caution – The following tasks regarding this product and the optional products
provided by Fujitsu should only be performed by a certified service engineer.
Users must not perform these tasks. Incorrect operation of these tasks may cause
malfunction.
■ Unpacking optional adapters and similar packages delivered to users
■ Plugging or unplugging of external interface cards
Remodeling/Rebuilding
Caution – Any modification and/or recycling of this product and its components
may be carried out only by a certified service engineer and must not be done by the
customer under any circumstances. Otherwise, electric shock, injury or fire may
result.
Emission of Laser Beam (Invisible)
Caution – The main unit contains modules that generate invisible laser radiation.
Laser beams are generated while the equipment is operating, even if an optical cable
is disconnected or a cover is removed. Do not look at any light-emitting part directly
or through an optical apparatus (e.g., magnifying glass, microscope).
Limitations and Cautions
Power Control and Operator Panel Mode Switch
When you use the remote power control utilizing the RCI function or the automatic power
control system (referred to below as APCS), you can disable this remote power control or the
APCS by switching to Service mode on the operator panel.
Disabling these features ensures that you do not unintentionally switch the system power on
or off during maintenance. Note that system power-off with the APCS cannot be disabled with
the mode switch. Therefore, be sure to turn off automatic power control via APCS before
starting maintenance.
If you switch the mode while using the RCI or the automatic power control, the system
power is controlled as follows.
Function | Mode switch: Locked | Mode switch: Service
RCI | Remote power-on/power-off operations are enabled. | Remote power-on/power-off operations are disabled.
Automatic power control | Automatic power-on/power-off operations are enabled. | Automatic power-on is disabled, but power-off remains enabled.
To use the RCI function, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000
Servers RCI Build Procedure and the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers RCI User’s Guide, which are available on the
manuals website.
To use the APCS, see the Enhanced Support Facility User's Guide for Machine Administration Automatic Power Control Function (Supplement Edition).
Fujitsu Welcomes Your Comments
If you have any comments or requests regarding this document, or if you find any
unclear statements in the document, please state your points specifically on the form
at the following URL.
CHAPTER 1
Safety and Tools
This chapter describes safety and tools information. The information is organized
into the following topics:
■ “Safety Precautions”
■ “System Precautions”
1.1 Safety Precautions
To protect both yourself and the equipment, observe the following safety
precautions.
TABLE 1-1 ESD Precautions
Item | Problem | Precaution
ESD jack/wrist or foot strap | Electrostatic Discharge (ESD) | Connect the ESD connector to your server and wear the wrist strap or foot strap when handling printed circuit boards. There are two antistatic strap attachment points on the chassis: 1. Right side towards the front; 2. Left side towards the rear.
ESD mat | ESD | An approved ESD mat provides protection from static damage when used with a wrist strap or foot strap. The mat also cushions and protects small parts that are attached to printed circuit boards.
ESD packaging box | ESD | Place the board or component in the ESD safe packaging box after you remove it.
Caution – Attach the cord of the antistatic wrist strap directly to the server. Do not
attach the antistatic wrist strap to the ESD mat connection.
The antistatic wrist strap and any components you remove must be at the same
potential.
1.2 System Precautions
For your protection, observe the following safety precautions when servicing your
equipment:
■ Follow all cautions, warnings, and instructions marked on the equipment.
■ Never push objects of any kind through openings in the equipment, as they might
touch dangerous voltage points or short out components that could result in fire
or electric shock.
■ Refer servicing of equipment to qualified personnel.
1.2.1 Electrical Safety Precautions
Ensure that the voltage and frequency of the power outlet to be used match the
electrical rating labels on the equipment.
Wear antistatic wrist straps when handling any magnetic storage devices,
system boards, or other printed circuit boards.
Use only properly grounded power outlets as described in the SPARC Enterprise M4000/M5000 Servers Installation Guide.
Caution – Do not make mechanical or electrical modifications. The manufacturer is
not responsible for regulatory compliance of modified servers.
1.2.2 Equipment Rack Safety Precautions
All equipment racks should be anchored to the floor, ceiling, or to adjacent frames,
using the manufacturer’s instructions.
Free-standing equipment racks should be supplied with a stabilizer feature, which
must be sufficient to support the weight of the server when extended on its slides.
This prevents instability during installation or service actions.
Where a stabilizer feature is not supplied and the equipment rack is not bolted to the
floor, a safety evaluation must be conducted by the installation or service engineer.
The safety evaluation determines stability when the server is extended on its slides,
prior to any installation or service activity.
Prior to installing the equipment rack on a raised floor, a safety evaluation must be
conducted by the installation or service engineer. The safety evaluation ensures that
the raised floor has sufficient strength to withstand the forces upon it when the
server is extended on its slides. The normal procedure in this case would be to fix
the rack through the raised floor to the concrete floor below, using a proprietary
mounting kit for the purpose.
Caution – If more than one server is installed in an equipment rack, service only
one server at a time.
1.2.3 Filler Boards and Filler Panels
Filler boards and panels, which are physically inserted into the server when a board
or module has been removed, are used for EMI protection and for air flow.
1.2.4 Handling Components
Caution – There is a separate ground located on the rear of the server. It is
important to ensure that the server is properly grounded.
Caution – The server is sensitive to static electricity. To prevent damage to the
board, connect an antistatic wrist strap between you and the server.
Caution – The boards have surface-mount components that can be broken by
flexing the boards.
To minimize the amount of board flexing, observe the following precautions:
■ Hold the board by the handle and finger hold panels, where the board stiffener is
located. Do not hold the board at the ends.
■ When removing the board from the packaging, keep the board vertical until you
lay it on the cushioned ESD mat.
■ Do not place the board on a hard surface. Use a cushioned antistatic mat. The
board connectors and components have very thin pins that bend easily.
■ Be careful of small component parts located on both sides of the board.
■ Do not use an oscilloscope probe on the components. The soldered pins are easily
damaged or shorted by the probe point.
■ Transport the board in its packaging box.
Caution – The heat sinks can be damaged by incorrect handling. Do not touch the
heat sinks while replacing or removing boards. If a heat sink is loose or broken,
obtain a replacement board. When storing or shipping a board, ensure that the heat
sinks have sufficient protection.
Caution – On the PCI cassette, when removing cables such as a LAN cable, if your
finger cannot reach the latch lock of the connector, press the latch with a flathead
screwdriver to remove the cable. Forcing your finger into the clearance can cause
damage to the PCI card.
CHAPTER 2
Fault Isolation
This chapter provides an overview of fault diagnosis. The information is
organized into the following topics:
■ “Determining Which Diagnostics Tools to Use”
■ “Checking the Server and System Configuration”
■ “Operator Panel”
■ “Error Conditions”
■ “LED Functions”
■ “Using the Diagnostic Commands”
■ “Traditional Solaris Diagnostic Commands”
■ “Other Issues”
2.1 Determining Which Diagnostics Tools to Use
When a failure occurs, a message is often displayed on the monitor. Use the
flowcharts in FIGURE 2-1 and FIGURE 2-2 to find the correct methods for diagnosing
problems.
FIGURE 2-1 Diagnostic Method Flow Chart
FIGURE 2-2 Diagnostic Method Flow Chart—Traditional Data Collection
2.2 Checking the Server and System Configuration
Before and after maintenance work, the state and configuration of the server and
components should be checked and the information saved. For recovery from a
problem, conditions related to the problem and the repair status must be checked.
The operating conditions must remain the same before and after maintenance.
A functioning server without any problems should not display any error conditions.
For example:
■ The syslog file should not display error messages.
■ The XSCF Shell command showhardconf does not display the * mark.
■ The administrative console should not display error messages.
■ The server processor logs should not display any error messages.
■ The Solaris™ Operating System message files should not indicate any additional
errors.
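The showhardconf check mentioned above can be scripted once the command output has been captured as text. The following Python sketch flags any line whose first non-blank character is an asterisk. The sample text is hypothetical and only imitates the general shape of such output; the function name is an illustration and is not part of any XSCF tool.

```python
def find_marked_lines(showhardconf_text):
    """Return the lines whose first non-blank character is '*'."""
    marked = []
    for line in showhardconf_text.splitlines():
        if line.lstrip().startswith("*"):
            marked.append(line.strip())
    return marked


# Hypothetical captured output, for illustration only.
SAMPLE_OUTPUT = """\
    MBU_A Status:Normal;
*   FAN_A#0 Status:Faulted;
    FAN_A#1 Status:Normal;
"""

for line in find_marked_lines(SAMPLE_OUTPUT):
    print("Needs attention:", line)
```

An empty result means that no component was marked in the captured text. It does not replace the other checks in the list.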
2.2.1 Checking the Hardware Configuration and FRU Status
To replace a faulty component and perform maintenance on the server, it is
important to check and understand the hardware configuration of the server and the
state of each hardware component.
The hardware configuration refers to information that indicates to which layer a
component belongs in the hardware configuration.
The status of each hardware component refers to information on the condition of the
standard or optional component in the server: temperature, power supply voltage,
CPU operating conditions, and other items.
The hardware configuration and the status of each hardware component can be
checked from the maintenance terminal using eXtended System Control Facility
(XSCF) Shell commands, as shown in the following table.
TABLE 2-1 Commands for Checking Hardware Configuration
Command | Description
showhardconf | Displays hardware configuration.
showstatus | Displays the status of a component. This command is used when only a faulty component is checked.
showboards | Displays the status of devices and resources.
showdcl | Displays the hardware resource configuration information of a domain.
showfru | Displays the setting information of a device.
Also some conditions can be checked based on the On or blinking state of the
component LEDs (see TABLE 2-3).
2.2.1.1 Checking the Hardware Configuration
Login authority is required to check the hardware configuration. Perform the
following procedure from the maintenance terminal:
1. Log in with the account of the XSCF hardware maintenance engineer.
2. Type showhardconf.
XSCF> showhardconf
The showhardconf command prints the hardware configuration information to
the screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers
XSCF User’s Guide for more detailed information.
2.2.2 Checking the Software and Firmware Configuration
The software and firmware configurations and versions affect the operation of the
server. To change the configuration or investigate a problem, check the latest
information and check for any problems in the software.
Software and firmware vary according to users:
■ The software configuration and version can be checked in the Solaris OS. Refer to
the Solaris 10 documentation for more information.
■ The firmware configuration and versions can be checked from the maintenance
terminal using XSCF Shell commands. Refer to the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed
information.
Check the software and firmware configuration information with assistance from the
system administrator. However, if you have received login authority from the
system administrator, the commands shown in the table can be used from the
maintenance terminal for these checks.
TABLE 2-2 Commands for Checking Software and Firmware Configuration
Command | Description
showrev(1M) | System administration command that displays system patch information.
uname(1) | System administration command that outputs the current system information.
version(8) | XSCF Shell command that outputs the current firmware version information.
showhardconf(8) | XSCF Shell command that indicates information on components mounted on the server.
showstatus(8) | XSCF Shell command that displays the status of a component. This command is used when only a faulty component is to be checked.
showboards(8) | XSCF Shell command that indicates information on the eXtended System Board (XSB). It can indicate information on the XSBs that belong to the specified domain and information on all XSBs mounted. The XSB combines the hardware resources of a physical system board. The SPARC Enterprise servers can generate one (Uni-XSB) or four (Quad-XSB) XSB(s) from one physical system board.
showdcl(8) | XSCF Shell command that displays the configuration information of a domain (hardware resource information).
showfru(8) | XSCF Shell command that displays the setting information of a device.
2.2.2.1 Checking the Software Configuration
Perform the following procedure from the domain console:
1. Type showrev.
# showrev
The showrev command prints the system configuration information to the
screen.
2.2.2.2 Checking the Firmware Configuration
Login authority is required to check the firmware configuration. Perform the
following procedure from the maintenance terminal:
1. Log in with the account of the XSCF hardware maintenance engineer.
2. Type the version command, for example with the -c xcp option to display the
XCP firmware version.
XSCF> version -c xcp
The version command prints the firmware version information to the
screen. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers
XSCF User’s Guide for more detailed information.
2.2.3 Downloading the Error Log Information
If you want to download the error log information, use the XSCF log fetch function.
The eXtended System Control Facility unit (XSCFU) has an interface with external
units so that a maintenance engineer can easily obtain useful maintenance
information such as error logs.
Connect the maintenance terminal, and use the command-line interface (CLI) or
browser user interface (BUI) to issue a download instruction to the maintenance
terminal to download error log information over the XSCF-LAN.
2.3 Operator Panel
When no network connection is available, the operator panel is used to start or stop
the server. The operator panel displays three LED status indicators, a Power switch,
and a security keyswitch. The panel is located on the front of the server, in the upper
right.
When the server is running, the Power and XSCF STANDBY LEDs (green) should be
lit and the CHECK LED (amber) should not be lit. If the CHECK LED is lit, search
the system logs to determine what is wrong.
The three LED status indicators on the operator panel provide the following:
■ General system status
■ System problem alerts
■ Location of the system fault
FIGURE 2-3 and FIGURE 2-4 show the operator panel.
FIGURE 2-3 SPARC Enterprise M4000 Operator Panel
Location Number | Component
1 | POWER LED
2 | XSCF STANDBY LED
3 | CHECK LED
4 | Power switch
5 | Mode switch (keyswitch)
6 | Antistatic ground socket
FIGURE 2-4 SPARC Enterprise M5000 Operator Panel
Location Number | Component
1 | POWER LED
2 | XSCF STANDBY LED
3 | CHECK LED
4 | Power switch
5 | Mode switch (keyswitch)
6 | Antistatic ground socket
Additional LEDs are located in various locations in the server. For more information
about LED indicator locations, see Section 2.5, “LED Functions” on page 2-18.
The operator panel LEDs operate as described in TABLE 2-3.
TABLE 2-3 Operator Panel LEDs and Switches
Name | Color | Description
POWER LED | Green | Indicates the server power status.
• On: Server has power.
• Off: Server is without power.
• Blinking: The power-off sequence is in progress.
XSCF STANDBY LED | Green | Indicates the readiness of the XSCF.
• On: XSCF unit is functioning normally.
• Off: XSCF unit is stopped.
• Blinking: Under system initialization after server power-on, or under system power-on process.
CHECK LED | Amber | Indicates that server detected a fault.
• On: Error detected that disables the startup.
• Off: Normal, or server power-off (power failure).
• Blinking: Indicates the position of fault.
Power switch | | Switch to direct server power on/power off.
Mode switch (keyswitch) | | The Locked setting:
• Normal key position. Power on is available with the Power switch, but power off is not.
• Disables the Power switch to prevent unauthorized users from powering the server on or off.
• The Locked position is the recommended setting for normal day-to-day operations.
The Service setting:
• Service should be provided at this position.
• Power on and off is available with Power switch.
• The key cannot be pulled out at this position.
The state displayed by LED combination is described in TABLE 2-4.
TABLE 2-4 State Display by LED Combination (Operator Panel)
POWER LED | XSCF STANDBY LED | CHECK LED | Description of the state
Off | Off | Off | The circuit breaker is switched off.
Off | Off | On | The circuit breaker is switched on.
Off | Blinking | Off | The XSCF is being initialized.
Off | Blinking | On | An error occurred in the XSCF.
Off | On | Off | The XSCF is on standby. The system is waiting for power-on of the air conditioning system.
On | On | Off | Warm-up standby processing is in progress (power-on is delayed). The power-on sequence is in progress. The system is in operation.
Blinking | On | Off | The power-off sequence is in progress. Fan termination is being delayed.
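The LED combinations in TABLE 2-4 lend themselves to a simple lookup. The following Python sketch encodes the table as a dictionary keyed by the three LED states. The names are illustrative, and the grouping of the rows that list several states is an assumption based on a reading of the table, so treat it as a sketch rather than a definitive mapping.

```python
# Keys are (POWER, XSCF STANDBY, CHECK) LED states as observed on the panel.
# Values are the possible system states listed for that combination.
# The grouping of multi-state rows is an assumption made when reading TABLE 2-4.
LED_STATES = {
    ("Off", "Off", "Off"): ["The circuit breaker is switched off."],
    ("Off", "Off", "On"): ["The circuit breaker is switched on."],
    ("Off", "Blinking", "Off"): ["The XSCF is being initialized."],
    ("Off", "Blinking", "On"): ["An error occurred in the XSCF."],
    ("Off", "On", "Off"): [
        "The XSCF is on standby.",
        "The system is waiting for power-on of the air conditioning system.",
    ],
    ("On", "On", "Off"): [
        "Warm-up standby processing is in progress (power-on is delayed).",
        "The power-on sequence is in progress.",
        "The system is in operation.",
    ],
    ("Blinking", "On", "Off"): [
        "The power-off sequence is in progress.",
        "Fan termination is being delayed.",
    ],
}


def describe_led_state(power, xscf_standby, check):
    """Return the candidate states for an LED combination, or [] if unlisted."""
    return LED_STATES.get((power, xscf_standby, check), [])


print(describe_led_state("Off", "Blinking", "On"))
```

A combination that is not in the table returns an empty list, which signals that the logs should be consulted instead of guessing.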
The operator panel mode switch is used to set the operation mode. The operator
panel power switch is used to power on and off the server. TABLE 2-6 lists the settings
and corresponding functions of the mode switch on the operator panel.
TABLE 2-5 Switches (Operator Panel)
Name | Description of Function
Mode switch | Used to set an operation mode for the server. Insert the special key that is under the customer’s control, to switch between modes.
Mode switch: Locked | Normal operation mode. The system can be powered on with the power switch, but it cannot be powered off with the power switch. The key can be pulled out at this key position.
Mode switch: Service | Mode for maintenance. The system can only be powered on and off with the power switch. The key cannot be pulled out at this key position. Maintenance is performed in Service mode while the server is stopped. Because remote power control and automatic power control of the server are disabled in Service mode, unintentional power on can be prevented.
Power switch | Used to control the server power. Power on and power off are controlled by pressing this switch in different patterns, as described below.
Power switch: Holding down for a short time (less than 4 seconds) | Regardless of the mode switch state, the server (all domains) is powered on. At this time, processing for waiting for facility (air conditioners) power on and warm-up completion is skipped.
Power switch: Holding down for a long time in Service mode (4 seconds or longer) | If power to the server is on (at least one domain is operating), shutdown processing is executed for all domains before the system is powered off. If the system is being powered on, the power-on processing is cancelled, and the system is powered off. If the system is being powered off, the operation of the Power switch is ignored, and the power-off processing is continued.
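The power switch behavior described in TABLE 2-5 can be summarized as a small decision function. The following Python sketch is only an illustration of the table. The function name, the state labels ('on', 'powering_on', 'powering_off'), and the returned strings are hypothetical, and any case the table does not describe is reported as such rather than guessed.

```python
LONG_PRESS_SECONDS = 4.0  # TABLE 2-5: a long press is 4 seconds or longer


def power_switch_action(mode, held_seconds, system_state):
    """Describe the effect of a power switch press according to TABLE 2-5.

    mode: 'Locked' or 'Service' (mode switch position).
    system_state: 'on', 'powering_on', or 'powering_off' (hypothetical labels).
    """
    if held_seconds < LONG_PRESS_SECONDS:
        # Short press: power-on, regardless of the mode switch state.
        return "power on all domains (facility wait and warm-up are skipped)"
    if mode != "Service":
        # Locked mode: the server cannot be powered off with the power switch.
        return "no power-off (not available in Locked mode)"
    if system_state == "on":
        return "shut down all domains, then power off"
    if system_state == "powering_on":
        return "cancel power-on, then power off"
    if system_state == "powering_off":
        return "ignored (power-off continues)"
    return "not described in TABLE 2-5"


print(power_switch_action("Service", 5.0, "on"))
```

Keeping the 4-second threshold in one named constant makes the boundary between the short and long press explicit.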
TABLE 2-6 Meanings of the Mode Switch
Function (State Definition) | Mode Switch: Locked | Mode Switch: Service
Inhibition of Break Signal Reception | Enabled. Reception of the break signal can be enabled or disabled for each domain using setdomainmode. | Disabled
Power On/Off by power switch | Only power on is enabled | Enabled
2.4 Error Conditions
Always access the following web site first to interpret faults and obtain information
on FMA messages.
http://www.sun.com/msg
This web site can be used in the event of a Solaris or domain failure or to look up
specific FMA error messages. It does not provide details on XSCF errors.
The web site directs you to provide the message ID that your software displayed.
The web site then provides knowledge articles about the fault and corrective action
to resolve the fault. The fault information and documentation at this web site is
updated regularly.
Predictive self-healing is an architecture and methodology for automatically
diagnosing, reporting, and handling software and hardware fault conditions. This
new technology lessens the time required to debug a hardware or software problem
and provides the administrator and technical support with detailed data about each
fault.
2.4.1 Predictive Self-Healing Tools
In the Solaris 10 software, the fault manager runs in the background. If a failure
occurs, the system software recognizes the error and attempts to determine what
hardware is faulty. The software also takes steps to prevent that component from
being used until it has been replaced. Some of the specific actions the software takes
include:
■ Receives telemetry information about problems detected by the system software.
■ Diagnoses the problems.
■ Initiates pro-active self-healing activities. For example, the fault manager can
disable faulty components.
■ When possible, causes the faulty FRU to provide an LED indication of a fault in
addition to populating the system console messages with more details.
TABLE 2-7 shows a typical message generated when a fault occurs. The message
appears on your console and is recorded in the /var/adm/messages file.
Note – The message in TABLE 2-7 indicates that the fault has already been diagnosed.
Any corrective action that the system can perform has already taken place. If your
server is still running, it continues to run.
TABLE 2-7 Predictive Self-Healing Message
Output Displayed | Description
Nov 1 16:30:20 dt88-292 EVENT-TIME: Tue Nov 1 16:30:20 PST 2005 | EVENT-TIME: The time stamp of the diagnosis.
Nov 1 16:30:20 dt88-292 PLATFORM: SUNW,A70, CSN: -, HOSTNAME: dt88-292 | PLATFORM: A description of the server encountering the problem.
Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13 | SOURCE: Information on the Diagnosis Engine used to determine the fault.
Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4 | EVENT-ID: The Universally Unique event ID for this fault.
Nov 1 16:30:20 dt88-292 DESC: | DESC: A basic description of the failure.
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem |
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information. | WEB SITE: Where to find specific information and actions for this fault.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE: One or more device instances may be disabled. | AUTO-RESPONSE: What, if anything, the system did to alleviate any follow-on issues.
Nov 1 16:30:20 dt88-292 IMPACT: Loss of services provided by the device instances associated with this fault. | IMPACT: A description of what that response might have done.
Nov 1 16:30:20 dt88-292 REC-ACTION: Schedule a repair procedure to replace the affected device. Use | REC-ACTION: A short description of what the system administrator should do.
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support. |
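When such a message has been copied out of /var/adm/messages, the labeled fields can be collected into a dictionary for later use, for example to pass the EVENT-ID to fmdump. The following Python sketch assumes that each line starts with a syslog prefix of the form shown in TABLE 2-7 (month, day, time, host name); the function and pattern names are illustrative, and lines without a label are treated as continuations of the previous field.

```python
import re

# Syslog prefix as it appears in the sample: "Nov 1 16:30:20 dt88-292 ".
PREFIX_RE = re.compile(r"^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ ")
FIELD_RE = re.compile(
    r"^(EVENT-TIME|PLATFORM|SOURCE|EVENT-ID|DESC|AUTO-RESPONSE|IMPACT|REC-ACTION):\s*(.*)$"
)


def parse_fma_message(lines):
    """Collect labeled fields; unlabeled lines extend the previous field."""
    fields = {}
    current = None
    for raw in lines:
        body = PREFIX_RE.sub("", raw.strip())
        match = FIELD_RE.match(body)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            fields[current] = (fields[current] + " " + body).strip()
    return fields


SAMPLE_LINES = [
    "Nov 1 16:30:20 dt88-292 SOURCE: eft, REV: 1.13",
    "Nov 1 16:30:20 dt88-292 EVENT-ID: afc7e660-d609-4b2f-86b8-ae7c6b8d50c4",
    "Nov 1 16:30:20 dt88-292 DESC:",
    "Nov 1 16:30:20 dt88-292 A problem was detected in the PCI-Express subsystem",
    "Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.",
]

fields = parse_fma_message(SAMPLE_LINES)
print(fields["EVENT-ID"])
```

Because the web site reference has no label of its own, it ends up inside the DESC field in this sketch.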
2.4.2 Monitoring Output
To understand error conditions, collect the monitoring output information. To
collect the information, use the commands shown in TABLE 2-8.
TABLE 2-8 Commands for Checking the Monitoring Output
Command | Operand | Description
showlogs(8) | console | Displays the console messages of a domain.
showlogs(8) | monitor | Logs messages that are displayed in the message window.
showlogs(8) | panic | Logs output to the console during a panic.
showlogs(8) | ipl | Collects console data generated from the power-on of a domain to the completion of the operating system start.
2.4.3 Messaging Output
To understand error conditions, collect the messaging output information. To
collect the information, use the commands shown in TABLE 2-9.
TABLE 2-9 Commands for Checking the Messaging Output
Command | Operand | Description
showlogs | env | Displays the temperature history log. The environmental temperature data and power status are indicated in 10-minute intervals. The data is stored for a maximum of six months.
showlogs | power | Displays the power and reset information.
showlogs | event | Displays information reported to the operating system and stored as event logs.
showlogs | error | Displays error logs.
fmdump(1M), fmdump(8) | | Displays fault management architecture diagnostic results and errors. It is provided as a Solaris command and XSCF Shell command.
Each error message logged by the predictive self-healing architecture has a code
associated with it as well as a web address that can be followed to get the most
up-to-date course of action for dealing with that error.
Refer to the Solaris 10 documentation for more information on predictive self-healing.
2.5 LED Functions
LED lights help the user find the component and provide information on the state of
the component.
This section explains the LEDs of each component that are to be checked when a
component is replaced. Most components are equipped with LEDs that help indicate
which component has the error and an LED to indicate whether the component can
be removed.
Some components, such as DIMMs, do not have LEDs. The state of a component
without LEDs can be checked using the showhardconf and ioxadm XSCF Shell
commands from the maintenance terminal. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for more detailed
information.
TABLE 2-10 describes the LEDs and their functions.
TABLE 2-10 Component LEDs
LED Name | Display | Meaning
READY (green) | | Indicates whether the component is operating.
READY (green) | On | Indicates that the component is operating. The component cannot be disconnected and removed from the server while the READY LED is On.
READY (green) | Blinking | Indicates that the component is being configured (or disconnected). For an XSCF unit it indicates that it is being initialized.
READY (green) | Off | Indicates that the component is stopped. The component can be disconnected and replaced.
CHECK (amber) | | Indicates that the component contains an error or that the component is a target for replacement.
CHECK (amber) | On | Indicates that an error has been detected.
CHECK (amber) | Blinking | Indicates that the component is ready to be replaced. The blinking LED acts as a locator.
CHECK (amber) | Off | Indicates no known error exists.
TABLE 2-11 describes the components and their LEDs.
TABLE 2-11 Component LED Descriptions
Component | LED Type | LED Display | Meaning
XSCF unit | ACTIVE | On (green) | Indicates that the XSCF unit is active.
XSCF unit | ACTIVE | Off | Indicates that the XSCF unit is on standby.
XSCF unit and IO (display part for LAN) | ACTIVE | On (green) | Indicates that the communication is being performed through the LAN port.
XSCF unit and IO (display part for LAN) | ACTIVE | Off | Indicates that no communication is being performed through the LAN port.
XSCF unit and IO (display part for LAN) | LINK SPEED | On (amber) | Indicates that the communication speed for the LAN port is 1 Gbps.
XSCF unit and IO (display part for LAN) | LINK SPEED | On (green) | Indicates that the communication speed for the LAN port is 100 Mbps.
XSCF unit and IO (display part for LAN) | LINK SPEED | Off | Indicates that the communication speed for the LAN port is 10 Mbps.
PCI slot | POWER | On (green) | Indicates that the power to the PCI slot is turned on. The PCI card cannot be removed.
PCI slot | POWER | Off | Indicates that the power to the PCI slot is turned off. The PCI card can be removed.
PCI slot | ATTENTION | On (amber) | Indicates that an error occurred in the PCI slot.
PCI slot | ATTENTION | Blinking (amber) | Indicates that the card in this PCI slot is a target device for replacement.
PCI slot | ATTENTION | Off | Indicates the normal state of the PCI slot.
Power supply unit (PSU) | READY | On (green) | Indicates that the power is turned on and being supplied.
Power supply unit (PSU) | READY | Blinking (green) | Indicates that the power is being supplied to the power supply unit, but the power supply unit is not turned on.
Power supply unit (PSU) | READY | Off | Indicates that power is not being supplied to the power supply unit.
Power supply unit (PSU) | CHECK | On (amber) | Indicates that an error occurred in the power supply unit.
Power supply unit (PSU) | CHECK | Off | Indicates the normal state of the power supply unit.
Power supply unit (PSU) | LED_AC | On (green) | Power supply unit has AC applied and is supplying 12V.
Power supply unit (PSU) | LED_AC | Off | Indicates that AC is out of the specified operating range and 12V is not being supplied from the power supply unit.
Power supply unit (PSU) | LED_DC | On (green) | Power supply unit has AC applied and is supplying 48V. Standby pinhole provides a manual backup to turn off 48V power.
Power supply unit (PSU) | LED_DC | Off | Indicates that 48V is not being supplied from the power supply unit.
Fan | ATTENTION | On (amber) | Indicates that an error occurred.
Fan | ATTENTION | Blinking (amber) | Indicates that the fan is a target device for replacement.
2.6 Using the Diagnostic Commands
After the message in TABLE 2-7 is displayed, you might want more information
about the fault. For complete information about troubleshooting commands, refer to
the Solaris 10 man pages or the XSCF Shell man pages. This section describes some
details of the following commands:
■ showlogs
■ fmdump
■ fmadm
■ fmstat
2.6.1 Using the showlogs Command
The showlogs command displays the contents of a specified log in order of time
stamp starting with the oldest date. The showlogs command displays the following
logs:
■ error log
■ power log
■ event log
■ temperature and humidity record
■ monitoring message log
■ console message log
■ panic message log
■ IPL message log
The following example shows showlogs output.
XSCF> showlogs error
Date: Oct 03 17:23:11 UTC 2006 Code: 80002000-ccff0000-0104340100000000
Status: Alarm Occurred: Oct 03 17:23:10.868 UTC 2006
FRU: /FAN_A#0
Msg: Abnormal FAN rotation speed. Insufficient rotation
XSCF>
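When many log entries must be reviewed, it can help to split each entry into its labelled fields. The following Python sketch assumes that the showlogs output has been saved as text and that the labels are exactly those seen in the example above (Date, Code, Status, Occurred, FRU, Msg). The function name parse_error_entry is hypothetical and is not part of the XSCF firmware.

```python
import re

# Sample text copied from the showlogs example in this section.
SAMPLE = """\
Date: Oct 03 17:23:11 UTC 2006 Code: 80002000-ccff0000-0104340100000000
Status: Alarm Occurred: Oct 03 17:23:10.868 UTC 2006
FRU: /FAN_A#0
Msg: Abnormal FAN rotation speed. Insufficient rotation
"""

# Assumption: these are the only labels; other XCP versions may print more.
LABELS = ("Date", "Code", "Status", "Occurred", "FRU", "Msg")
LABEL_RE = re.compile(r"\b(%s):\s*" % "|".join(LABELS))


def parse_error_entry(text):
    """Return a dict mapping each label to the text that follows it."""
    entry = {}
    for line in text.splitlines():
        matches = list(LABEL_RE.finditer(line))
        for i, match in enumerate(matches):
            # A value runs up to the next label on the same line, if any.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
            entry[match.group(1)] = line[match.end():end].strip()
    return entry


entry = parse_error_entry(SAMPLE)
print(entry["FRU"], "-", entry["Status"])
```

The FRU field is usually the most useful starting point, because it names the unit to inspect.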
2.6.2 Using the fmdump Command
The fmdump command can be used to display the contents of any log files associated
with the Solaris fault manager.
The fmdump command produces output similar to CODE EXAMPLE 2-1. This example
assumes there is only one fault.
CODE EXAMPLE 2-1 fmdump Output
# fmdump
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
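The UUID in each record is the value passed to fmdump -u for more detail. The following Python sketch, which assumes the three-column layout shown in CODE EXAMPLE 2-1 and uses the hypothetical helper name parse_fmdump, extracts the time stamp, UUID, and message ID from saved fmdump output.

```python
# Sample text copied from CODE EXAMPLE 2-1.
SAMPLE = """\
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
"""


def parse_fmdump(text):
    """Return a list of (time, uuid, msg_id) tuples, skipping the header."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # A record has five whitespace-separated fields: month, day, clock,
        # UUID, and message ID. The header line has only three.
        if len(fields) != 5 or fields[0] == "TIME":
            continue
        month, day, clock, uuid, msg_id = fields
        records.append((f"{month} {day} {clock}", uuid, msg_id))
    return records


faults = parse_fmdump(SAMPLE)
for when, uuid, msg_id in faults:
    print(f"{msg_id} logged {when}, UUID {uuid}")
```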
2.6.2.1 fmdump -V Command
You can obtain more detail by using the -V option.
# fmdump -V -u 0ee65618-2218-4997-c0dc-b5c410ed8ec2
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
At least three lines of new output are delivered to the user with the -V option.
■ The first line is a summary of information you have seen before in the console
message but includes the time stamp, the UUID and the Message-ID.
■ The second line is a declaration of the certainty of the diagnosis. In this case we
are 100 percent sure the failure is in the ASIC described. If the diagnosis might
involve multiple components, you might see two lines here, each with 50%, for
example.
■ The FRU line declares the part that needs to be replaced to return the server to a
fully operational state.
■ The rsrc line describes which component was taken out of service as a result of
this fault.
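The certainty, FRU, and rsrc lines described above can be collected into one record per suspect. The following Python sketch assumes that each suspect begins with a percentage line and is followed by its FRU and rsrc lines, as in the example in this section. The name parse_suspects is hypothetical.

```python
import re

# Sample text copied from the fmdump -V example in this section.
SAMPLE = """\
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
"""

CERTAINTY_RE = re.compile(r"(\d+)%\s+(\S+)$")


def parse_suspects(text):
    """Return one dict per suspect: certainty, fault class, FRU, and rsrc."""
    suspects = []
    for raw in text.splitlines():
        line = raw.strip()
        match = CERTAINTY_RE.match(line)
        if match:
            suspects.append({"certainty": int(match.group(1)),
                             "class": match.group(2)})
        elif line.startswith(("FRU:", "rsrc:")) and suspects:
            # Split on the first colon only; the value contains colons too.
            key, value = line.split(":", 1)
            suspects[-1][key.lower()] = value.strip()
    return suspects


suspects = parse_suspects(SAMPLE)
for s in suspects:
    print(f"{s['certainty']}% {s['class']}: replace {s['fru']}")
```

With a diagnosis split across two components, the list would hold two records whose certainty values add up to 100.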
2.6.2.2 fmdump -e Command
To get information about the errors that caused this failure, you can use the -e option,
as shown in the following example.
XSCF> fmdump -e
TIME CLASS
Oct 03 13:52:48.9532 ereport.fm.fmd.module
Oct 03 13:52:48.9610 ereport.fm.fmd.module
Oct 03 13:52:48.9674 ereport.fm.fmd.module
Oct 03 13:52:48.9738 ereport.fm.fmd.module
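When the error log is long, a count of reports per class gives a quick overview. The following Python sketch assumes the two-column TIME and CLASS layout shown above. The name count_classes is hypothetical.

```python
from collections import Counter

# Sample text copied from the fmdump -e example in this section.
SAMPLE = """\
TIME CLASS
Oct 03 13:52:48.9532 ereport.fm.fmd.module
Oct 03 13:52:48.9610 ereport.fm.fmd.module
Oct 03 13:52:48.9674 ereport.fm.fmd.module
Oct 03 13:52:48.9738 ereport.fm.fmd.module
"""


def count_classes(text):
    """Count error reports per class, skipping the header line."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # A record is: month, day, clock, class.
        if len(fields) == 4 and fields[0] != "TIME":
            counts[fields[3]] += 1
    return counts


counts = count_classes(SAMPLE)
for error_class, total in counts.most_common():
    print(total, error_class)
```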
2.6.3 Using the fmadm faulty Command
The fmadm faulty command can be used by administrators and service personnel
to view and modify system configuration parameters that are maintained by the
Solaris fault manager. The command is primarily used to determine the status of a
component involved in a fault, as shown in the following example.
The PCI device is degraded and is associated with the same UUID as seen above.
You might also see “faulted” states.
2.6.3.1 fmadm repair Command
If fmadm faulty reports a fault, replace the faulty FRU (CPU, memory, or I/O unit),
and then execute the fmadm repair command to clear the FRU
information on the domain. If the fmadm repair command is not executed, error
messages continue to be output.
If fmadm faulty reports a fault, the FMA resource cache on the OS side can be cleared
without problems; the data in it need not match the hardware failure information
retained on the XSCF side.
The fmadm config command output shows you the version numbers of the
diagnosis engines in use by your server, as well as their current state. You can check
these versions against information on the SunSolve web site to determine if you are
running the latest diagnostic engines, as shown in the following example.
XSCF> fmadm config
MODULE VERSION STATUS DESCRIPTION
eft 1.16 active eft diagnosis engine
event-transport 2.0 active Event Transport Module
faultevent-post 1.0 active Gate Reaction Agent for errhandd
fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis
iox_agent 1.0 active IO Box Recovery Agent
reagent 1.1 active Reissue Agent
sysevent-transport 1.0 active SysEvent Transport Agent
syslog-msgs 1.0 active Syslog Messaging Agent
XSCF>
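To compare module versions against a reference list, the table can be read into records first. The following Python sketch assumes the four-column layout shown above, in which the description is the only column that contains spaces. The name parse_modules is hypothetical.

```python
# Sample text copied from the fmadm config example in this section.
SAMPLE = """\
MODULE VERSION STATUS DESCRIPTION
eft 1.16 active eft diagnosis engine
event-transport 2.0 active Event Transport Module
faultevent-post 1.0 active Gate Reaction Agent for errhandd
fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis
iox_agent 1.0 active IO Box Recovery Agent
reagent 1.1 active Reissue Agent
sysevent-transport 1.0 active SysEvent Transport Agent
syslog-msgs 1.0 active Syslog Messaging Agent
"""


def parse_modules(text):
    """Return a list of dicts with name, version, status, and description."""
    modules = []
    for line in text.splitlines():
        # Split into at most four fields so the description stays whole.
        fields = line.split(None, 3)
        if len(fields) < 4 or fields[0] == "MODULE":
            continue
        name, version, status, description = fields
        modules.append({"name": name, "version": version,
                        "status": status, "description": description})
    return modules


modules = parse_modules(SAMPLE)
inactive = [m["name"] for m in modules if m["status"] != "active"]
print(len(modules), "modules,", len(inactive), "not active")
```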
2.6.4 Using the fmstat Command
The fmstat command can report statistics associated with the Solaris fault
manager. The fmstat command shows information about diagnosis engine (DE)
performance. In the
example below, the fmd-self-diagnosis DE (also seen in the console output) has
received an event which it accepted. A case is “opened” for that event and a
diagnosis is performed to “solve” the cause for the failure. See the following
example.
These superuser commands can help you determine if you have issues in your
workstation, in the network, or within another server that you are networking with.
The following commands are described in this section:
■ “Using the iostat Command” on page 2-27
■ “Using the prtdiag Command” on page 2-28
■ “Using the prtconf Command” on page 2-30
■ “Using the netstat Command” on page 2-32
■ “Using the ping Command” on page 2-34
■ “Using the ps Command” on page 2-35
■ “Using the prstat Command” on page 2-36
Most of these commands are located in the /usr/bin or /usr/sbin directories.
Note – For additional details, options, examples, and the most up-to-date
information for each command, refer to that command’s man page.
2.7.1 Using the iostat Command
The iostat command iteratively reports terminal, drive, and tape I/O activity, as
well as CPU utilization.
2.7.1.1 Options
TABLE 2-12 describes options for the iostat command and how those options can
help troubleshoot the server.
TABLE 2-12 Options for iostat
Option      Description                                 How It Can Help
No option   Reports status of local I/O devices.        A quick three-line output of device status.
-c          Reports the percentage of time the          Quick report of CPU status.
            system has spent in user mode, in system
            mode, waiting for I/O, and idling.
-e          Displays device error summary statistics.   Provides a short table with accumulated errors.
            The total errors, hard errors, soft errors, Identifies suspect I/O devices.
            and transport errors are displayed.
-E          Displays all device error statistics.       Provides information about devices: manufacturer,
                                                        model number, serial number, size, and errors.
-n          Displays names in descriptive format.       Descriptive format helps identify devices.
-x          For each drive, reports extended drive      Similar to the -e option, but provides rate
            statistics. The output is in tabular form.  information. This helps identify poor performance of
                                                        internal devices and other I/O devices across the
                                                        network.
The following example shows output for one iostat command.
# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104T9D
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104V3V
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
#
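The per-device error counters in iostat -En output lend themselves to a simple check. The following Python sketch assumes the line layout shown in the example above and flags any device that reports hard or transport errors. The names parse_iostat_en and suspect_devices are hypothetical, and the choice of which counters to flag is an assumption rather than a documented rule.

```python
import re

# Sample text copied (abridged) from the iostat -En example in this section.
SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104T9D
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 0521104V3V
"""

DEVICE_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)")
SERIAL_RE = re.compile(r"Serial No:\s*(\S+)")


def parse_iostat_en(text):
    """Return one dict per device with its error counters and serial number."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            devices.append({"device": match.group(1),
                            "soft": int(match.group(2)),
                            "hard": int(match.group(3)),
                            "transport": int(match.group(4)),
                            "serial": None})
            continue
        serial = SERIAL_RE.search(line)
        if serial and devices:
            devices[-1]["serial"] = serial.group(1)
    return devices


def suspect_devices(devices):
    """Return the names of devices with hard or transport errors."""
    return [d["device"] for d in devices if d["hard"] or d["transport"]]


devices = parse_iostat_en(SAMPLE)
print(suspect_devices(devices) or "no hard or transport errors")
```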
2.7.2 Using the prtdiag Command
The prtdiag command displays configuration and diagnostic information. The
diagnostic information identifies any failed component.
The prtdiag command is located in the /usr/platform/platform-name/sbin/
directory.
Note – The prtdiag command might indicate a slot number different than that
identified elsewhere in this document. This is normal.
2.7.2.1 Options
TABLE 2-13 describes options for the prtdiag command and how those options can
help troubleshooting.
TABLE 2-13 Options for prtdiag
Option      Description                   How It Can Help
No option   Lists components.             Identifies CPU timing and PCI cards installed.
-v          Verbose mode. Displays the    Provides the same information as no option. Additionally
            time of the most recent AC    lists fan status, temperatures, ASIC, and PROM revisions.
            power failure and the most
            recent hardware fatal error
            information.
The following example shows output for the prtdiag command in verbose mode.
# prtdiag -v
System Configuration: Sun Microsystems sun4u Sun SPARC Enterprise M5000 Server
System clock frequency: 1012 MHz
Memory size: 8064 Megabytes
=================== Environmental Status ===================
Mode switch is in UNLOCK mode
#
2.7.3 Using the prtconf Command
Similar to the show-devs command run at the ok prompt, the prtconf command
displays the devices that are configured.
The prtconf command identifies hardware that is recognized by the Solaris OS. If
hardware is not suspected of being bad yet software applications are having trouble
with the hardware, the prtconf command can indicate if the Solaris software
recognizes the hardware, and if a driver for the hardware is loaded.
2.7.3.1 Options
TABLE 2-14 describes options for the prtconf command and how those options can
help troubleshooting.
TABLE 2-14 Options for prtconf
Option      Description                       How It Can Help
No option   Displays the device tree of       If a hardware device is recognized, then it is probably
            devices recognized by the OS.     functioning properly. If the message “(driver not
                                              attached)” is displayed for the device or for a
                                              sub-device, then the driver for the device is corrupt
                                              or missing.
-D          Similar to the output of no       Lists the driver needed or used by the OS to enable the
            option, however the device        device.
            driver is listed.
-p          Similar to the output of no       Reports a brief list of the devices.
            option, yet is abbreviated.
-V          Displays the version and date of  Provides a quick check of firmware version.
            the OpenBoot PROM firmware.
The following example shows output for the prtconf command.
# prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 8064 Megabytes
System Peripherals (Software Nodes):
SUNW,SPARC-Enterprise
scsi_vhci, instance #0
packages (driver not attached)
SUNW,builtin-drivers (driver not attached)
deblocker (driver not attached)
disk-label (driver not attached)
terminal-emulator (driver not attached)
obp-tftp (driver not attached)
ufs-file-system (driver not attached)
chosen (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
pseudo-console, instance #0
nvram (driver not attached)
pseudo-mc, instance #0
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
pci, instance #0
ebus, instance #0
flashprom (driver not attached)
serial, instance #0
scfc, instance #0
panel, instance #0
pci, instance #0
pci, instance #0
pci, instance #1
pci, instance #3
scsi, instance #0
tape (driver not attached)
disk (driver not attached)
sd, instance #0 (driver not attached)
sd, instance #2
sd, instance #4
network, instance #0
network, instance #1 (driver not attached)
pci, instance #4
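Because the prtconf listing is long, it can be convenient to extract only the nodes that report “(driver not attached)”. The following Python sketch assumes saved prtconf output and uses a few lines copied from the example above. The name unattached_nodes is hypothetical. Many firmware nodes, such as packages and aliases in the example, report this state as shown, so the result is a list to review, not a list of faults.

```python
# Sample lines copied from the prtconf example in this section.
SAMPLE = """\
pci, instance #0
ebus, instance #0
flashprom (driver not attached)
serial, instance #0
network, instance #0
network, instance #1 (driver not attached)
"""

MARKER = "(driver not attached)"


def unattached_nodes(text):
    """Return the node names that carry the 'driver not attached' marker."""
    return [line.replace(MARKER, "").strip()
            for line in text.splitlines()
            if MARKER in line]


nodes = unattached_nodes(SAMPLE)
for node in nodes:
    print("no driver attached:", node)
```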
2.7.4.1 Options
TABLE 2-15 describes options for the netstat command and how those options can
help troubleshooting.
TABLE 2-15 Options for netstat
Option        Description                       How It Can Help
-i            Displays the interface state,     Provides a quick overview of the network status.
              including packets in/out, error
              in/out, collisions, and queue.
-i interval   Providing a trailing number       Identifies intermittent or long duration network events.
              with the -i option repeats the    By piping netstat output to a file, overnight activity
              netstat command every             can be viewed all at once.
              interval seconds.
-p            Displays the media table.         Provides MAC address for hosts on the subnet.
-r            Displays the routing table.       Provides routing information.
-n            Replaces host names with IP       Used when an address is more useful than a host name.
              addresses.
The following example shows output for the netstat -p command.
# netstat -p
Net to Media Table: IPv4
Device IP Address Mask Flags Phys Addr
------ -------------------- --------------- -------- --------------
bge0 san-ff1-14-a 255.255.255.255 o 00:14:4f:3a:93:61
bge0 san-ff2-40-a 255.255.255.255 o 00:14:4f:3a:93:85
sppp0 224.0.0.22 255.255.255.255
bge0 san-ff2-42-a 255.255.255.255 o 00:14:4f:3a:93:af
bge0 san09-lab-r01-66 255.255.255.255 o 00:e0:52:ec:1a:00
sppp0 192.168.1.1 255.255.255.255
bge0 san-ff2-9-b 255.255.255.255 o 00:03:ba:dc:af:2a
bge0 bizzaro 255.255.255.255 o 00:03:ba:11:b3:c1
bge0 san-ff2-9-a 255.255.255.255 o 00:03:ba:dc:af:29
bge0 racerx-b 255.255.255.255 o 00:0b:5d:dc:08:b0
bge0 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
#
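To look up the MAC address recorded for a given host, the media table printed by netstat -p can be read into a dictionary. The following Python sketch assumes saved netstat -p output in the layout shown in this section, with the physical address as the last field of each row that has one. The name media_table is hypothetical.

```python
import re

# Sample rows copied (abridged) from the netstat -p example in this section.
SAMPLE = """\
Device IP Address Mask Flags Phys Addr
bge0 san-ff1-14-a 255.255.255.255 o 00:14:4f:3a:93:61
sppp0 224.0.0.22 255.255.255.255
bge0 bizzaro 255.255.255.255 o 00:03:ba:11:b3:c1
bge0 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
"""

# Six colon-separated groups of one or two hexadecimal digits.
MAC_RE = re.compile(r"^[0-9a-fA-F]{1,2}(:[0-9a-fA-F]{1,2}){5}$")


def media_table(text):
    """Map each host or address to (device, physical address)."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows without a physical address (for example sppp0) are skipped.
        if len(fields) >= 4 and MAC_RE.match(fields[-1]):
            table[fields[1]] = (fields[0], fields[-1])
    return table


table = media_table(SAMPLE)
print(table["bizzaro"])
```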
2.7.5 Using the ping Command
The ping command sends ICMP ECHO_REQUEST packets to network hosts.
Depending on how the ping command is configured, the output displayed can
identify troublesome network links or nodes. The destination host is specified in the
variable hostname.
2.7.5.1 Options
TABLE 2-16 describes options for the ping command and how those options can help
troubleshooting.
TABLE 2-16 Options for ping
Option         Description                        How It Can Help
hostname       The probe packet is sent to        Verifies that a host is active on the network.
               hostname and returned.
-g hostname    Forces the probe packet to route   By identifying different routes to the target host, those
               through a specified gateway.       individual routes can be tested for quality.
-i interface   Designates which interface to      Enables a simple check of secondary network interfaces.
               send and receive the probe
               packet through.
-n             Replaces host names with IP        Used when an address is more beneficial than a host name.
               addresses.
-s             Pings continuously in one-second   Helps identify intermittent or long-duration network events.
               intervals. Ctrl-C aborts. Upon     By piping ping output to a file, activity overnight can later
               abort, statistics are displayed.   be viewed at once.
-svR           Displays the route the probe       Indicates probe packet route and number of hops.
               packet followed in one-second      Comparing multiple routes can identify bottlenecks.
               intervals.
The following example shows output for the ping -s command.
# ping -s san-ff2-17-a
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
^C
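When ping -s output has been captured to a file, the sequence numbers and round-trip times can be checked for gaps and slow replies. The following Python sketch assumes the reply format shown in the example above. The names round_trips and missing_sequences are hypothetical.

```python
import re

# Sample text copied from the ping -s example in this section.
SAMPLE = """\
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
"""

REPLY_RE = re.compile(r"icmp_seq=(\d+)\.\s+time=([\d.]+)\s*ms")


def round_trips(text):
    """Return a list of (sequence number, round-trip time in ms)."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in REPLY_RE.finditer(text)]


def missing_sequences(replies):
    """Return sequence numbers below the highest one seen that got no reply."""
    seen = {seq for seq, _ in replies}
    if not seen:
        return []
    return [seq for seq in range(max(seen) + 1) if seq not in seen]


replies = round_trips(SAMPLE)
slowest = max(rtt for _, rtt in replies)
print(len(replies), "replies, slowest", slowest, "ms,",
      "missing:", missing_sequences(replies))
```

A gap in the sequence numbers points to a lost probe, which is the kind of intermittent event that the -s option helps to expose.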
2.7.6 Using the ps Command
The ps command lists the status of processes. Using options and rearranging the
command output can assist in determining the resource allocation.
2.7.6.1 Options
TABLE 2-17 describes options for the ps command and how those options can help
troubleshooting.
TABLE 2-17 Options for ps
Option      Description                         How It Can Help
-e          Displays information for every      Identifies the process ID and the executable.
            process.
-f          Generates a full listing.           Provides the following process information: user ID,
                                                parent process ID, time when executed, and the path to
                                                the executable.
-o option   Enables configurable output. The    Provides only most important information. Knowing
            pid, pcpu, pmem, and comm options   the percentage of resource consumption helps identify
            display process ID, percent CPU     processes that are affecting performance and might be
            consumption, percent memory         hung.
            consumption, and the responsible
            executable, respectively.
The following example shows output for one ps command.
# ps
PID TTY TIME CMD
101042 pts/3 0:00 ps
101025 pts/3 0:00 sh
#
Note – When using sort with the -r option, the column headings are printed so
that the value in the first column is equal to zero.
2.7.7 Using the prstat Command
The prstat utility iteratively examines all active processes and reports statistics
based on the selected output mode and sort order. The prstat command provides
output similar to the ps command.
2.7.7.1 Options
TABLE 2-18 describes options for the prstat command and how those options can
help troubleshooting.
TABLE 2-18 Options for prstat
Option      Description                       How It Can Help
No option   Displays a sorted list of the top Output identifies process ID, user ID, memory used, state,
            processes that are consuming      CPU consumption, and command name.
            the most CPU resources. List is
            limited to the height of the
            terminal window and the total
            number of processes. Output is
            automatically updated every
            five seconds. Ctrl-C aborts.
-n number   Limits output to number of        Limits amount of data displayed and identifies primary
            lines.                            resource consumers.
-s key      Permits sorting list by key       Useful keys are cpu (default), time, and size.
            parameter.
-v          Verbose mode.                     Displays additional parameters.
The following example shows output for the prstat command.
2.8 Other Issues
2.8.1 Can’t Locate Boot Device
When the PCI-X card slot 0 is faulty or it is not seated properly, the firmware will
blacklist the entire PCI-X bridge device (and everything attached downstream from
it) causing the boot disk to disappear. The problem results in the showdisk
command failing to display the boot disk and the bootdisk command displaying
the console message “Can’t locate boot device”.
When this occurs, remove the PCI/PCI-X card in slot 0 to see if the boot issue is
remedied. If the IO unit is fully stocked and it is not possible to remove the
PCI/PCI-X card, then you should attempt to place another card in slot 0, if possible.
If this also is not possible, you should remove and reinstall the existing card in
slot 0.
CHAPTER 3
Periodic Maintenance
This chapter describes the periodic maintenance required to keep the server running
regardless of whether a problem has occurred.
3.1 Tape Drive Unit
It might be necessary to use a cleaning tape when carrying out the cleaning
procedure.
Note – Contact your sales representative for tape drive unit options on SPARC
Enterprise M4000/M5000 servers.
3.1.1 Cleaning the Tape Drive Unit
To prevent the "Clean Lamp" from illuminating prematurely, follow these
maintenance rules:
■ Clean your tape drive unit once every 5 to 24 hours of continuous use, or once a
week.
■ Clean your tape drive unit once a month, even if it is not in use.
■ Clean your tape drive unit whenever the "Clean Lamp" indicator is lit or blinking.
■ Clean your tape drive unit before inserting a new data cassette.
■ Replace the cleaning cassette when the tape inside of the cassette has completely
wound up onto the right-hand spool or when the three lamps are in the following
states: "Off," "Lit," and "Blinking."
■ Remove the cassette before turning the power "OFF". The tape life might be
shortened or a malfunction might occur during the backup process if the power is
turned "OFF" while the cassette is still inside.
Note – If the "cleaning lamp" starts blinking immediately after completion of a
cleaning operation, the data cassette might have been damaged. In this case, replace
the data cassette.
CHAPTER 4
FRU Replacement Preparation
This chapter describes how to prepare a field-replaceable unit (FRU) for safe
replacement. The information is organized into the following topics:
■ “FRU Replacement Method” on page 4-1
■ “Active Replacement” on page 4-4
■ “Hot Replacement” on page 4-6
■ “Cold Replacement (Powering the Server Off and On)” on page 4-12
4.1 FRU Replacement Method
There are three basic methods for replacing the FRUs:
Active replacement – To replace a FRU while the domain, to which the FRU belongs,
continues running. Active replacement requires that the FRU be inactivated or
powered down using either an XSCF command or Solaris OS command. Because the
power supply unit (PSU) and fan unit (FAN) do not belong to any domain, they are
operated by using XSCF commands, regardless of the operating state of the Solaris
OS.
Note – The procedure for isolating the hard disk drive from the Solaris OS varies
depending on whether disk mirroring software or other support software is used.
For details, see the relevant software manuals.
Hot replacement – To replace a FRU while the domains are powered off. Depending
on the FRU to be replaced, the FRU can either be directly replaced or be inactivated
or powered down using an XSCF command.
Cold replacement – To replace a FRU while all domains are stopped and the server
is powered off and unplugged.
TABLE 4-1 lists the FRUs, location and access, and the replacement method.
TABLE 4-1 FRU Replacement Information
FRU                                             FRU Location/Access   Removal Method(s)
CD-RW/DVD-RW Drive Unit (DVDU)                  Front                 Hot replacement
                                                                      Cold replacement
Backplane unit (BPU_A, BPU_B)                   Top                   Cold replacement
CPU module (CPUM_A)                             Top                   Cold replacement
Memory board (MEMB)                             Top                   Cold replacement
Motherboard (SPARC Enterprise M4000)            Rear                  Cold replacement
(MBU_A)
Motherboard DC-DC Converter (SPARC              Rear                  Cold replacement
Enterprise M4000) (DDC_A, DDC_B)
Motherboard (SPARC Enterprise M5000)            Top                   Cold replacement
(MBU_B)
Motherboard DC-DC Converter (SPARC              Top                   Cold replacement
Enterprise M5000) (DDC_A, DDC_B)
eXtended System Control facility unit (XSCFU)   Rear                  Cold replacement
Hard disk drive backplane (HDDBP)               Top                   Cold replacement
CD-RW/DVD-RW backplane                          Top                   Cold replacement
Tape drive backplane (TAPEBP)                   Top                   Cold replacement
Operator panel (OPNL)                           Top                   Cold replacement
* When using active replacement for a PSU, only one power supply unit should be replaced at a time to ensure redundancy.
† When using active replacement for a 172-mm or 60-mm fan unit, only one fan unit should be replaced at a time to ensure redundancy.
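A service checklist script can encode the replacement methods so that the need for a power-off is checked before work starts. The following Python sketch is illustrative only. It covers only the rows of TABLE 4-1 reproduced in this section, keys them by the FRU abbreviations given there, and uses the hypothetical names REPLACEMENT_METHODS and must_power_off.

```python
# Rows taken from the part of TABLE 4-1 reproduced in this section.
# FRUs that are not listed here are deliberately absent from the mapping.
REPLACEMENT_METHODS = {
    "DVDU": {"hot", "cold"},
    "BPU_A": {"cold"},
    "BPU_B": {"cold"},
    "CPUM_A": {"cold"},
    "MEMB": {"cold"},
    "MBU_A": {"cold"},
    "MBU_B": {"cold"},
    "DDC_A": {"cold"},
    "DDC_B": {"cold"},
    "XSCFU": {"cold"},
    "HDDBP": {"cold"},
    "TAPEBP": {"cold"},
    "OPNL": {"cold"},
}


def must_power_off(fru):
    """Return True if cold replacement is the only listed method."""
    # A KeyError for an unknown FRU is intentional: consult the table.
    return REPLACEMENT_METHODS[fru] == {"cold"}


print("XSCFU needs power off:", must_power_off("XSCFU"))
print("DVDU needs power off:", must_power_off("DVDU"))
```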
4.2 Active Replacement
In active replacement the Solaris OS must be configured to allow the component to
be replaced. Active replacement has four stages:
■ “Removing a FRU From a Domain” on page 4-4
■ “Removing and Replacing a FRU” on page 4-5
■ “Adding a FRU Into a Domain” on page 4-5
■ “Verifying Hardware Operation” on page 4-6
Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be
used if the boot disk can be isolated from the Solaris OS by disk mirroring software
and other software.
4.2.1 Removing a FRU From a Domain
Note – Before you remove a PCI cassette, make sure that there is no I/O activity on
the card in the cassette.
1. From the Solaris prompt, type the cfgadm command to get the component
status.
# cfgadm -a
Ap_Id          Type         Receptacle     Occupant       Condition
iou#0-pci#0    etherne/hp   connected      configured     ok
iou#0-pci#1    fibre/hp     connected      configured     ok
iou#0-pci#2    pci-pci/hp   connected      configured     ok
Ap_Id includes the IOU number (iou#0 or iou#1) and the PCI cassette slot number
(pci#1, pci#2, pci#3, pci#4).
2. Type the cfgadm command to disconnect the component from the domain:
# cfgadm -c unconfigure Ap_Id
Note – For a PCI cassette, type the cfgadm -c disconnect command to
disconnect the component from the domain.
The Ap_Id is shown in the output of cfgadm.
3. Type the cfgadm command to confirm the component is now disconnected.
# cfgadm -a
Ap_Id          Type         Receptacle     Occupant       Condition
iou#0-pci#0    etherne/hp   disconnected   unconfigured   unknown
iou#0-pci#1    fibre/hp     connected      configured     ok
iou#0-pci#2    pci-pci/hp   connected      configured     ok
In this example, iou#0-pci#0 is the disconnected component.
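The confirmation in Step 3 can also be scripted. The following Python sketch assumes saved cfgadm -a output with the five columns shown in this section, each row on a single line. It reports whether a given Ap_Id shows the disconnected and unconfigured states seen in the example. The names parse_cfgadm and is_disconnected are hypothetical.

```python
# Sample text based on the cfgadm -a example in this section, with each
# row joined onto one line.
SAMPLE = """\
Ap_Id Type Receptacle Occupant Condition
iou#0-pci#0 etherne/hp disconnected unconfigured unknown
iou#0-pci#1 fibre/hp connected configured ok
iou#0-pci#2 pci-pci/hp connected configured ok
"""


def parse_cfgadm(text):
    """Map each Ap_Id to its type, receptacle, occupant, and condition."""
    points = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "Ap_Id":
            continue
        ap_id, kind, receptacle, occupant, condition = fields
        points[ap_id] = {"type": kind, "receptacle": receptacle,
                         "occupant": occupant, "condition": condition}
    return points


def is_disconnected(points, ap_id):
    """Return True if the attachment point is disconnected and unconfigured."""
    point = points[ap_id]
    return (point["receptacle"] == "disconnected"
            and point["occupant"] == "unconfigured")


points = parse_cfgadm(SAMPLE)
print("iou#0-pci#0 disconnected:", is_disconnected(points, "iou#0-pci#0"))
```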
4.2.2 Removing and Replacing a FRU
Once the FRU has been removed from the domain, see “Removing and Replacing a
FRU” on page 4-7.
4.2.3 Adding a FRU Into a Domain
1. From the Solaris prompt, type the cfgadm command to connect the component
to the domain.
# cfgadm -c configure Ap_Id
The Ap_Id is shown in the output of cfgadm.
2. Type the cfgadm command to confirm the component is now connected.
# cfgadm -a
Ap_Id          Type         Receptacle     Occupant       Condition
iou#0-pci#0    etherne/hp   connected      configured     ok
iou#0-pci#1    fibre/hp     connected      configured     ok
iou#0-pci#2    pci-pci/hp   connected      configured     ok
In this example, iou#0-pci#0 is the connected component.
4.2.4 Verifying Hardware Operation
● Verify the state of the status LEDs.
The POWER LED should be On and the CHECK LED should not be On.
Note – If the hard disk drive is the boot device, the hard disk will have to be
replaced using cold replacement procedures. However, active replacement can be
used if the boot disk can be isolated from the Solaris OS by disk mirroring software
and other software.
4.3 Hot Replacement
In hot replacement the Solaris OS does not need to be configured to allow the
component to be replaced. Depending on the FRU to be replaced, the FRU can either
be directly replaced or be inactivated or powered down using an XSCF command.
4.3.1 Removing and Replacing a FRU
1. From the XSCF Shell prompt, type the replacefru command.
CODE EXAMPLE 4-1 replacefru command
XSCF> replacefru
---------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r
Please confirm the CHECK LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
The replacefru command automatically tests the status of the component after the
remove and replace is finished.
CODE EXAMPLE 4-2 replacefru command status
Diagnostic tests of FAN_A#0 is started.
[This operation may take up to 2 minute(s)]
(progress scale reported in seconds)
0..... 30..... 60..... 90.....done
---------------------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced unit.
FRU           Status
------------- -------
FAN_A#0       Normal
---------------------------------------------------------------
The replacement of FAN_A#0 has completed, normally.[f:finish] :f
---------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
---------------------------------------------------------------
Select [1,2|c:cancel] : C
XSCF>
Note – The display may vary depending on the XCP version.
When the tests are complete the program returns to the original menu. Select cancel
to return to the XSCF Shell prompt.
Refer to the replacefru man page for more information.
4.3.2 Verifying Hardware Operation
1. Type the showhardconf command to confirm the new component is installed.
Refer to the showhardconf -u man page for more information.
3. Verify the state of the status LEDs on the FRU.
Refer to TABLE 2-11 for LED status.
4.4 Cold Replacement (Powering the Server Off and On)
In cold replacement all business operations are stopped. Cold replacement is the act
of powering off the server and disconnecting input power. This is normally required
for safety when the inside of the server is accessed.
Note – The input power cables are used to ground the server. If the server is not
mounted in a rack use a grounding strap to ground the server.
Note – After a complete chassis power cycle (all power cords removed), make
certain to allow 30 seconds before plugging the power cords back into the chassis.
4.4.1 Powering the Server Off Using Software
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Log in to the XSCF Shell and type the poweroff command.
XSCF> poweroff -a
The following actions occur when the poweroff command is used:
■ The Solaris OS shuts down cleanly.
■ The server powers off to Standby mode (the XSCF unit and one fan will still have
power).
Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for details.
4. Verify the state of the status LED on the XSCF.
The POWER LED should be off.
5. Disconnect all power cables from the input power source.
Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.
4.4.2 Powering the Server On Using Software
1. Make sure that the server has enough power supply units to run the desired
configuration.
2. Connect all power cables to the input power source.
3. Make sure the XSCF STANDBY LED on the operator panel is On.
4. Turn the keyswitch on the operator panel to the desired mode position (Locked
or Service).
5. Log into the XSCF Shell and type the poweron command.
XSCF> poweron -a
Refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF
User’s Guide for details.
6. After a delay the following activities occur:
■ The operator panel POWER LED lights.
■ The system executes the power-on self-test (POST).
Then, the server is completely powered on.
Note – If the Solaris automatic booting is set, use the sendbreak -d domain_id
command after the display console banner is displayed but before the system starts
booting the operating system to get the ok prompt.
4.4.3 Powering the Server Off Manually
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Place the keyswitch in the Service position.
4. Press and hold the Power switch on the operator panel for four seconds or
longer to initiate the power off.
5. Verify that the POWER LED on the operator panel is off.
6. Disconnect all power cables from the input power source.
Caution – There is an electrical hazard if the power cords are not disconnected. All
power cords must be disconnected to completely remove power from the server.
4.4.4 Powering the Server On Manually
1. Make sure that the server has enough power supply units to run the desired
configuration.
2. Connect all power cables to the input power source.
3. Make sure the XSCF STANDBY LED is On.
4. Turn the keyswitch on the operator panel to the desired mode position (Locked
or Service).
5. Press the Power switch on the operator panel.
After a delay the following activities occur:
■ The operator panel POWER LED lights.
■ The system executes the power-on self-test (POST).
Then, the server is completely powered on.
Note – If the Solaris automatic booting is set, use the sendbreak -d domain_id
command after the display console banner is displayed but before the system starts
booting the operating system to get the ok prompt.
4.4.5 Verifying Hardware Operation
1. From the ok prompt, press the Enter key, and then press the “#.” (number sign and
period) keys to switch from the domain console to the XSCF console.
2. Type the showhardconf command to confirm the new component is installed.
5. Type the probe-scsi-all command to confirm that the storage devices are
mounted.
CODE EXAMPLE 4-9 probe-scsi-all
ok probe-scsi-all
/pci@0,600000/pci@0/pci@8/pci@0/scsi@1
MPT Version 1.05, Firmware Version 1.07.00.00
Target 0
Unit 0 Disk SEAGATE ST973401LSUN72G 0556 143374738 Blocks,
73 GB
SASAddress 5000c5000092beb9 PhyNum 0
Target 1
Unit 0 Disk SEAGATE ST973401LSUN72G 0556 143374738 Blocks,
73 GB
SASAddress 5000c500002eeaf9 PhyNum 1
Target 3
Unit 0 Removable Read Only device TSSTcorpCD/DVDW TS-L532USR01
SATA device PhyNum 3
ok
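To confirm that every expected drive responded, the probe-scsi-all listing can be reduced to one record per target. The following Python sketch assumes saved output in the layout of CODE EXAMPLE 4-9, where each target is introduced by a Target line followed by a Unit line. The names parse_targets and disk_targets are hypothetical, and target identifiers are kept as text because their number base is not stated here.

```python
# Sample text copied from CODE EXAMPLE 4-9.
SAMPLE = """\
/pci@0,600000/pci@0/pci@8/pci@0/scsi@1
MPT Version 1.05, Firmware Version 1.07.00.00
Target 0
Unit 0 Disk SEAGATE ST973401LSUN72G 0556 143374738 Blocks,
73 GB
SASAddress 5000c5000092beb9 PhyNum 0
Target 1
Unit 0 Disk SEAGATE ST973401LSUN72G 0556 143374738 Blocks,
73 GB
SASAddress 5000c500002eeaf9 PhyNum 1
Target 3
Unit 0 Removable Read Only device TSSTcorpCD/DVDW TS-L532USR01
SATA device PhyNum 3
"""


def parse_targets(text):
    """Return one dict per target with its identifier and unit description."""
    targets = []
    for line in text.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "Target":
            targets.append({"target": words[1], "unit": ""})
        elif words[:1] == ["Unit"] and targets:
            # Drop "Unit <n>" and keep the rest of the description.
            targets[-1]["unit"] = " ".join(words[2:])
    return targets


def disk_targets(targets):
    """Return the identifiers of targets whose unit is reported as a disk."""
    return [t["target"] for t in targets if t["unit"].startswith("Disk")]


targets = parse_targets(SAMPLE)
print("disk targets:", disk_targets(targets))
```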
6. Type the boot command to start the operating system.
ok boot
4-18SPARC Enterprise M4000/M5000 Servers Service Manual • August 2009
CHAPTER 5
Internal Components Access
This chapter describes how to access the internal components. The information is
organized into the following topics:
■ “Sliding the Server In and Out to the Fan Stop” on page 5-1
■ “Top Cover Remove and Replace” on page 5-5
■ “Fan Cover Remove and Replace” on page 5-8
5.1 Sliding the Server In and Out to the Fan Stop
The slide rails have two designated lock points. The first, the fan stop, is for easy
access to the fan units. The fan units are hot, active, or cold replacement
components. When using active replacement, only one fan unit should be replaced at
a time to ensure redundancy.
5.1.1 Sliding the Server Out of the Equipment Rack
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When drawing out the SPARC Enterprise M4000/M5000 Server to the front,
release the cable tie holding the PCI cables on the rear of the server.
Caution – Use proper ESD grounding techniques when handling components. See
“Safety Precautions” on page 1-1.
1. Deploy the rack’s antitilt features (if applicable).
Refer to the manual that shipped with the rack for details on antitilt features.
2. If shipping brackets are attached to the back of the server, loosen the four (4)
captive screws (FIGURE 5-1).
FIGURE 5-1 Loosening the Captive Screws on the Shipping Brackets
3. Loosen the four (4) captive screws at the front of the server (FIGURE 5-2).
FIGURE 5-2 Loosening the Captive Screws and Pulling Out the Server
4. Pull the system to the fan stop.
The system automatically locks in place at the fan stop.
5.1.2 Sliding the Server Into the Equipment Rack
1. Push the green plastic releases on each slide rail and push the server back into
the equipment rack.
2. Tighten the four (4) captive screws at the front of the server to secure it in the
rack (FIGURE 5-2).
3. Tighten the four (4) captive screws on the shipping brackets at the rear of the
server (FIGURE 5-1).
4. Restore the rack antitilt features to their original position.