document, and such products, technology and this document are protected by copyright laws, patents, and other intellectual property laws and
international treaties.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and
decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written
authorization of Oracle and/or its affiliates and Fujitsu Limited, and their applicable licensors, if any. The furnishing of this document to you does not
give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this document does not contain or
represent any commitment of any kind on the part of Oracle or Fujitsu Limited, or any affiliate of either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by and/or
licensed from the suppliers to Oracle and/or its affiliates and Fujitsu Limited, including software and font technology.
Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End User. Please
contact Oracle and/or its affiliates or Fujitsu Limited.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and
in other countries, exclusively licensed through X/Open Company, Ltd.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited.
All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing
SPARC trademarks are based upon architectures developed by Oracle and/or its affiliates. SPARC64 is a trademark of SPARC International, Inc., used
under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited. Other names may be trademarks of their respective owners.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of Oracle
and/or its affiliates and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Oracle and Fujitsu Limited, and/or any affiliate of either of them in connection with this document or any
product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided.
EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, ORACLE OR FUJITSU LIMITED, AND/OR THEIR AFFILIATES MAKE NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS
DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless
otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Oracle or Fujitsu Limited, and/or any of their
affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for
any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE
DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface
1. Safety Precautions for Maintenance
1.1 ESD Precautions
1.2 Server Precautions
1.2.1 Electrical Safety Precautions
1.2.2 Equipment Rack Safety Precautions
1.2.3 Component Handling Precautions
2. Hardware Overview
2.1 Name of Each Part
2.2 Operator Panel
2.2.1 Operator Panel Overview
2.2.2 Switches on the Operator Panel
2.2.3 LEDs on the Operator Panel
2.3 LED Functions of Components
2.4 External Interface Port on Rear Panel
2.5 Labels
3. Troubleshooting
3.1 Emergency Power Off
3.2 Failure Diagnostic Method
3.3 Checking the Server and System Configuration
3.3.1 Checking the Hardware Configuration and FRU Status
3.3.1.1 Checking the Hardware Configuration
3.3.2 Checking the Software and Firmware Configurations
3.3.2.1 Checking the Software Configuration
3.3.2.2 Checking the Firmware Configuration
3.3.2.3 Downloading Error Log Information
3.4 Error Conditions
3.4.1 Predictive Self-Healing Tools
3.4.2 Monitoring Output
3.4.3 Messaging Output
3.5 Using Troubleshooting Commands
3.5.1 Using the showhardconf Command
3.5.2 Using the showlogs Command
3.5.3 Using the showstatus Command
3.5.4 Using the fmdump Command
3.5.4.1 fmdump -V Command
3.5.4.2 fmdump -e Command
3.5.5 Using the fmadm Command
3.5.5.1 Using the fmadm faulty Command
3.5.5.2 fmadm repair Command
3.5.5.3 fmadm config Command
3.5.6 Using the fmstat Command
3.6 General Oracle Solaris Troubleshooting Commands
3.6.1 Using the iostat Command
3.6.1.1 Options
3.6.2 Using the prtdiag Command
SPARC Enterprise M3000 Server Service Manual • March 2012
3.6.2.1 Options
3.6.3 Using the prtconf Command
3.6.3.1 Options
3.6.4 Using the netstat Command
3.6.4.1 Options
3.6.5 Using the ping Command
3.6.5.1 Options
3.6.6 Using the ps Command
3.6.6.1 Options
3.6.7 Using the prstat Command
3.6.7.1 Options
4. FRU Replacement Preparation
4.1 Tools Required for Maintenance
4.2 FRU Replacement and Installation Methods
4.2.1 FRU Replacement
4.2.2 FRU Installation
4.3 Active Replacement/Active Addition
4.3.1 Releasing a FRU from a Domain
4.3.2 FRU Removal and Replacement
4.3.3 Configuring a FRU in a Domain
4.3.4 Verifying the Hardware Operation
4.4 Hot Replacement/Hot Addition
4.4.1 FRU Removal and Replacement (Power supply unit/Fan unit)
4.4.2 Verifying the Hardware Operation (Power supply unit/Fan unit)
4.4.3 Verifying the Hardware Operation (Hard disk drive)
4.5 Cold Replacement/Cold Addition
4.5.1 Powering off the Server
4.5.1.1 Power-off by Using the XSCF Command
4.5.1.2 Power off by Using the Operator Panel
4.5.2 FRU Removal and Replacement
4.5.3 Powering on the Server
4.5.3.1 Power-on by Using the XSCF Command
4.5.3.2 Power-on by Using the Operator Panel
4.5.4 Verifying the Hardware Operation
5. Internal Components Access
5.1 Sliding the Server Into and Out of the Equipment Rack
5.1.1 Sliding the Server Out from the Equipment Rack
5.1.2 Sliding the Server into the Equipment Rack
5.2 Removing and Attaching the Top Cover
5.2.1 Removing the Top Cover
5.2.2 Attaching the Top Cover
5.3 Removing and Attaching the Air Duct
5.3.1 Removing the Air Duct
5.3.2 Attaching the Air Duct
5.4 Removing and Attaching the Fan Cover
5.4.1 Removing the Fan Cover
5.4.2 Attaching the Fan Cover
6. Motherboard Unit Replacement
6.1 Accessing the Motherboard Unit
6.2 Removing the Motherboard Unit
6.3 Mounting the Motherboard Unit
6.4 Reassembling the Server
7. Replacement and Installation of Memory
7.1 Memory Mounting Rules
7.1.1 Confirmation of DIMM Information
7.1.2 Memory Mounting Conditions
7.2 Accessing the DIMMs
7.3 Removing the DIMMs
7.4 Installing the DIMMs
7.5 Reassembling the Server
8. Replacement and Installation of PCIe Cards
8.1 Accessing a PCIe Card
8.2 Removing a PCIe Card
8.3 Mounting a PCIe Card
8.4 Reassembling the Server
9. Replacement and Installation of a Hard Disk Drive (HDD)
9.1 Accessing a Hard Disk Drive
9.1.1 Active Replacement
9.1.2 Hot Replacement
9.1.3 Cold Replacement
9.2 Removing a Hard Disk Drive
9.3 Installing a Hard Disk Drive
9.4 Reassembling the Server
9.4.1 Active Replacement
9.4.2 Hot Replacement
9.4.3 Cold Replacement
10. Replacing the Hard Disk Drive Backplane
10.1 Accessing the Hard Disk Drive Backplane
10.2 Removing the Hard Disk Drive Backplane
10.3 Mounting the Hard Disk Drive Backplane
10.4 Reassembling the Server
11. CD-RW/DVD-RW Drive Unit (DVDU) Replacement
11.1 Identifying the Type of CD-RW/DVD-RW Drive Unit
11.2 Accessing the CD-RW/DVD-RW Drive Unit
11.3 Removing the CD-RW/DVD-RW Drive Unit
11.4 Mounting the CD-RW/DVD-RW Drive Unit
11.5 Reassembling the Server
12. Power Supply Unit Replacement
12.1 Accessing a Power Supply Unit
12.2 Removing the Power Supply Unit
12.3 Mounting the Power Supply Unit
12.4 Reassembling the Server
13. Fan Unit Replacement
13.1 Accessing a Fan Unit
13.2 Removing a Fan Unit
13.3 Mounting a Fan Unit
13.4 Reassembling the Server
14. Fan Backplane Replacement
14.1 Accessing the Fan Backplane
14.2 Removing the Fan Backplane
14.3 Mounting the Fan Backplane
14.4 Reassembling the Server
15. Operator Panel Replacement
15.1 Accessing the Operator Panel
15.2 Removing the Operator Panel
15.3 Mounting the Operator Panel
15.4 Reassembling the Server
A. Components List
B. FRU List
B.1 Server Overview
B.2 Motherboard Unit
B.2.1 Memory (DIMM)
B.2.2 PCIe Slot
B.2.3 CPU
B.2.4 XSCF Unit
B.3 Drive
B.3.1 Hard Disk Drive
B.3.2 CD-RW/DVD-RW Drive Unit (DVDU)
B.4 Power Supply Unit
B.5 Fan Unit
C. External Interface Specifications
C.1 Serial Port
C.2 UPC Port
C.3 USB Port
C.4 SAS Port
C.5 Connection Diagram for Serial Cable
D. UPS Controller
D.1 Overview
D.2 Signal Cable
D.3 Configuration of Signal Lines
D.4 Power Supply Conditions
D.4.1 Input Circuit
D.4.2 Output Circuit
D.5 UPS Cable
D.6 Connections
E. DC Power Supply Model
E.1 The Server Views
E.2 LED Functions of Power Supply Unit
E.3 Electrical Specifications
E.4 Using the showhardconf Command
F. Reactivating a Hardware RAID Boot Volume
Abbreviations
Index
Preface
This manual describes how to service the SPARC Enterprise M3000 server from Oracle
and Fujitsu. This document is written for maintenance providers who have received
formal service training. References herein to the M3000 server are references to the
SPARC Enterprise M3000 server.
This preface includes the following sections:
■ “Audience” on page xiii
■ “Related Documentation” on page xiv
■ “Text Conventions” on page xv
■ “Notes on Safety” on page xv
■ “Syntax of the Command-Line Interface (CLI)” on page xvi
■ “Documentation Feedback” on page xvi
Audience
This guide is written for experienced system administrators with working
knowledge of computer networks and advanced knowledge of the Oracle Solaris
Operating System (Oracle Solaris OS).
Related Documentation
All documents for your server are available online at the following locations.
Documentation                                                       Link
Sun Oracle software-related manuals (Oracle Solaris OS, and so on)
CHAPTER 1
Safety Precautions for Maintenance
This chapter provides safety precautions required for maintenance.
■ Section 1.1, “ESD Precautions” on page 1-1
■ Section 1.2, “Server Precautions” on page 1-3
1.1 ESD Precautions
To ensure that you and bystanders are not exposed to harm and to prevent damage
to the system, observe the following safety precautions.
TABLE 1-1 ESD Precautions
Item                        Precaution
ESD connector/wrist strap   Connect the ESD connector to your server and wear the antistatic wrist strap when handling printed circuit boards. See FIGURE 1-1 for the wrist strap connection destination.
Conductive mat              An approved conductive mat provides protection from static damage when used with a wrist strap. The mat also cushions and protects small parts that are attached to printed circuit boards.
ESD safe packaging box      Place a printed board or component in the ESD safe packaging box after you remove it.
FIGURE 1-1 Wrist Strap Connection Destination
■ Hard disk drive or fan unit:
Connect to one of two thumbscrews
on the front of the server.
■ FRU* other than hard disk drive and fan unit:
Connect to either upper right on the front or upper left
on the rear of the server.
* FRU: Field Replaceable Unit
Caution – Do not connect the wrist strap cable to the conductive mat. Connect it
directly to the server.
The wrist strap and FRU must have the same level of potential.
1.2 Server Precautions
When maintaining the server, observe the following precautions for your protection.
■ Follow all cautions, warnings, and instructions marked on the server.
Caution – Do not insert any object in an opening of the server. If any object comes
into contact with a high-voltage part or short-circuits a component, fire or electric
shock might result.
■ Refer servicing of the server to the service engineer.
1.2.1 Electrical Safety Precautions
■ Ensure that the voltage and frequency of the power source to be used match the
electrical rating labels on the server.
■ Wear antistatic wrist straps when handling hard disk drives, motherboard units,
or other printed circuit boards.
■ Use grounded power outlets as described in the SPARC Enterprise M3000 Server
Installation Guide.
Caution – Do not make mechanical or electrical modifications. We are not
responsible for regulatory compliance of modified servers.
1.2.2 Equipment Rack Safety Precautions
■ The equipment racks must be anchored to the floor, ceiling, or to adjacent frames.
■ Some equipment racks are supplied with a Quake-Resistant Options Kit or
stabilizer, which supports the weight of the server when it is extended on its slide
rails. This prevents the equipment from toppling over during installation or
maintenance.
■ In the following cases, a safety evaluation must be conducted by the service
engineer prior to installation or maintenance work.
■ When no Quake-Resistant Options Kits or stabilizers are attached and the
equipment rack is not anchored to the floor, ensure safety by confirming that
the server does not fall over when it is pulled out from the slide rails.
■ When the equipment rack is mounted on a raised floor, ensure that the raised
floor has sufficient strength to withstand the weight upon it when the server is
extended on its slide rails. Fix the equipment rack through the raised floor to
the concrete floor below it, using a proprietary mounting kit for this purpose.
Caution – If more than one server is installed in an equipment rack, maintain the
servers one at a time.
For details of equipment racks, see the SPARC Enterprise Equipment Rack Mounting Guide.
1.2.3 Component Handling Precautions
Caution – The server is easily damaged by static electricity. To prevent damage to
printed circuit boards, wear a wrist strap and connect it to the server prior to
starting maintenance.
Caution – Do not bend the motherboard unit (MBU). Otherwise, the components
mounted on the circuit board might be damaged.
To prevent the motherboard unit from being bent, observe the following precautions:
■ Hold the motherboard unit by the handle, where the board stiffener is located.
■ When removing the motherboard unit from the packaging, keep the motherboard
unit horizontal until you lay it on the cushioned conductive mat.
■ Connectors and components on the motherboard unit have thin pins that bend
easily. Therefore, do not place the motherboard unit on a hard surface.
■ Be careful not to damage the small parts located on both sides of the motherboard unit.
Caution – The heat sinks can be damaged by incorrect handling. Do not touch the
heat sinks while replacing or removing motherboard units. If a heat sink is loose or
broken, obtain a replacement motherboard unit. When storing or carrying a
motherboard unit, ensure that the heat sinks have sufficient protection.
Caution – When removing a cable such as the LAN cable, if your fingers do not
reach the latch lock of the connector, use a flat head screwdriver to push the latch to
disconnect the cable. If you forcibly insert your fingers into the service clearance, the
LAN port of the motherboard unit or PCI Express (PCIe) cards may be damaged.
CHAPTER 2
Hardware Overview
This chapter explains the names of components and also explains the LEDs on the
operator panel and rear panel.
■ Section 2.1, “Name of Each Part” on page 2-1
■ Section 2.2, “Operator Panel” on page 2-5
■ Section 2.3, “LED Functions of Components” on page 2-11
■ Section 2.4, “External Interface Port on Rear Panel” on page 2-13
■ Section 2.5, “Labels” on page 2-17
2.1 Name of Each Part
This section explains the names of parts mounted on the M3000 server.
Among these parts, those which can be replaced in the field by a certified field
engineer are called Field Replaceable Units (FRUs). For information on the actual
replacement/expansion procedure for FRUs, see Chapter 6 to Chapter 15.
The server consists of a chassis in which various components are mounted, a top cover
to protect the mounted components, a front panel, and a rear panel. An operator panel
is located on the front panel, and ports used to connect external interfaces are
located on the rear panel. From the LEDs on the operator panel and rear panel, error
and other status information can be checked. For details, see Section 2.2, “Operator
Panel” on page 2-5 to Section 2.4, “External Interface Port on Rear Panel” on
page 2-13.
FIGURE 2-1, FIGURE 2-2, and FIGURE 2-3 are the internal view, front view, and rear view
of the server, respectively, and they indicate the names and abbreviated names of
main components.
FIGURE 2-1 Server (Internal View)
[Figure labels: CPU; CD-RW/DVD-RW drive unit (DVDU); Fan unit (FAN_A); Memory (DIMM); XSCF unit (XSCFU); PCIe slot; Power supply unit (PSU); DC-DC converter (DDC); Fan backplane (FANBP_B); PCIe card (PCIe); Hard disk drive backplane (HDDBP); Motherboard unit]
Note – The form of the DC-DC converter may be different depending on the
motherboard unit which is mounted.
FIGURE 2-2 Server (Front View)
Location Number   Component
1                 Fan unit (FAN_A)
2                 Operator panel (OPNL)
3                 Hard disk drive (HDD) (2.5-inch SAS disk)
4                 CD-RW/DVD-RW drive unit (DVDU)
FIGURE 2-3 Server (Rear View) (AC Power Supply Model)
Location Number   Component
1                 Power supply unit (PSU)
2                 PCIe slot
3                 RCI port*
4                 USB port (for XSCF)
5                 Serial port (for XSCF)
6                 LAN port (for XSCF)
7                 UPC port
8                 Serial Attached SCSI (SAS) port
9                 Gigabit Ethernet (GbE) port (for OS)
* For information on whether the RCI function is supported for your server, see the
2.2 Operator Panel
The operator panel has the important function of controlling the power of the server.
The operator panel is usually locked with a key to prevent the server from being
mistakenly powered off during system operation.
Before starting maintenance work, ask the system administrator to unlock the
operator panel.
2.2.1 Operator Panel Overview
The system administrator or service engineer checks the operating status of the
server with LEDs or operates the power supply with the power switch. FIGURE 2-4
shows the location of the operator panel.
FIGURE 2-4 Operator Panel Location
Location Number   Component
1                 POWER LED
2                 XSCF STANDBY LED
3                 CHECK LED
4                 Power button
5                 Mode switch (key switch)
2.2.2 Switches on the Operator Panel
TABLE 2-1 lists the functions of the switches on the operator panel.
The switches on the operator panel include the mode switch for setting the operation
mode and the power switch for turning the server on and off.
TABLE 2-1 Switches (Operator Panel)
Switch / Name / Description of Function
Mode switch (key switch)
  This switch is used to set the operation mode for the server. Insert the special key,
  which is under the customer's control, to switch between modes.
  Locked: Normal operation mode
    • The system can be powered on with the power button, but it cannot be powered
      off with the power button.
    • The key can be pulled out at this key position.
  Service: Mode for maintenance
    • The system can be powered on and off with the power button.
    • The key cannot be pulled out at this key position.
    • To stop and maintain the server, set the mode to Service.
Power button
  This button is used to turn on or turn off the power to the server (a domain).
  Power-on and power-off are controlled by pressing this button in different patterns,
  as described below.
  Holding down the button for a short time (less than 4 seconds):
    Regardless of the mode switch setting, the server is powered on.*
    If set in the XSCF, facility (air conditioners) power-on and warm-up processing
    is skipped.
  Holding down the button for a long time in Service mode (4 seconds or longer):
    • If power to the server is on, OS shutdown processing is executed for all
      domains before the system is powered off.
    • If the server is being powered on, the power-on processing is cancelled, and
      the server is powered off.
    • If the server is being powered off, the operation of the power button is
      ignored, and the power-off processing is continued.
* In normal operation, the server is powered on only when the data center environmental conditions satisfy the specified values. Then,
the server remains in the reset state until the operating system is booted.
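The power button behavior described in TABLE 2-1 can be read as a small decision function. The following Python sketch is only an illustration of that reading, not part of the server firmware or the XSCF software. The names `Mode`, `ServerState`, and `power_button_action` are hypothetical, and the "no action" results for cases that TABLE 2-1 does not describe (a long press in Locked mode, or a long press while the server is off) are assumptions.

```python
from enum import Enum


class Mode(Enum):
    LOCKED = "Locked"
    SERVICE = "Service"


class ServerState(Enum):
    OFF = "off"
    POWERING_ON = "powering on"
    ON = "on"
    POWERING_OFF = "powering off"


LONG_PRESS_SECONDS = 4.0  # threshold stated in TABLE 2-1


def power_button_action(mode: Mode, state: ServerState, held_seconds: float) -> str:
    """Return a short description of the outcome TABLE 2-1 gives for a button press."""
    if held_seconds < LONG_PRESS_SECONDS:
        # Short press: power-on request, regardless of the mode switch setting.
        return "power on"
    if mode is Mode.LOCKED:
        # Assumption: TABLE 2-1 only describes long presses in Service mode,
        # and the Locked mode cannot power off with the button.
        return "no action"
    if state is ServerState.ON:
        return "shut down OS on all domains, then power off"
    if state is ServerState.POWERING_ON:
        return "cancel power-on, then power off"
    if state is ServerState.POWERING_OFF:
        return "ignored; power-off continues"
    # Assumption: a long press while the server is already off does nothing.
    return "no action"


if __name__ == "__main__":
    print(power_button_action(Mode.SERVICE, ServerState.ON, 5.0))
```

The sketch keeps the 4-second threshold in one constant so that the short-press and long-press branches cannot drift apart.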
TABLE 2-2 shows the function of the mode switch.
TABLE 2-2 Mode Switch Function
Function: Inhibition of Break Signal Reception
  Locked: Enabled. Reception of the Break signal can be enabled or disabled for
          each domain using the setdomainmode command.
  Service: Disabled
Function: Power On/Off by power button
  Locked: Only Power On is enabled.
  Service: Enabled
2.2.3 LEDs on the Operator Panel
TABLE 2-3 lists the server states displayed with the LEDs on the operator panel.
The three LED indicators on the operator panel indicate the following:
■ General system status
■ System error warning
■ System error location
Besides the states listed in TABLE 2-3, the operator panel also displays various states
of the server using combinations of the three LEDs. TABLE 2-4 indicates the states that
are displayed in the course of operation from power-on to power-off of the server.
The blinking interval is 1 second (1 Hz).
TABLE 2-3 LEDs on the Operator Panel
Name / Status / Description
POWER LED (Green)
  Indicates the server power status.
  • On: The power to the server (a domain) is on.
  • Off: The power to the server is off.
  • Blinking: The server is being powered off.
XSCF STANDBY LED (Green)
  Indicates the XSCF unit status.
  • On: The XSCF unit is functioning normally.
  • Off: The input power source is off or has just been turned on, and the
    XSCF unit is stopped.
  • Blinking: System initialization is in progress after power was turned on.
CHECK LED (Amber)
  Indicates that the server has detected an error. This is sometimes called a locator.
  • On: An error that hinders startup was detected.
  • Off: Normal, or power is not being supplied.
  • Blinking: Indicates that the unit is a maintenance target.
In service mode, break signals can be suppressed. If the key position is switched to
Service, the server will boot into service mode the next time it reboots. Service is
selected by default at the initial power-on.
TABLE 2-4 State Display by Combination of LEDs on the Operator Panel
POWER      XSCF STANDBY*   CHECK   Description
Off        Off             Off     Power is not being supplied.
Off        Off             On      Power has been turned on.
Off        Blinking        Off     The XSCF unit is being initialized.
Off        Blinking        On      An error occurred in the XSCF unit.
Off        On              Off     The XSCF unit is in the standby state.
                                   The server is waiting for power-on of the air conditioning facilities in the data center.
On         On              Off     Warm-up standby processing is in progress (power is turned on after the end of processing).
                                   The power-on sequence is in progress.
                                   The server is in operation.
Blinking   On              Off     The power-off sequence is in progress. (The fan units are stopped after the end of processing.)
* READY LED is referred to when the XSCF unit status is indicated.
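When recording panel observations during maintenance, the LED combinations in TABLE 2-4 can be treated as a lookup keyed on the three LED states. The Python sketch below is a hypothetical helper for that purpose. The dictionary transcribes the rows of TABLE 2-4 as they read in this copy, and the names `LED_STATES` and `describe_panel` are assumptions made for illustration.

```python
# Keys are (POWER, XSCF STANDBY, CHECK); values are the descriptions listed
# for that combination in TABLE 2-4. Several states share one combination.
LED_STATES = {
    ("off", "off", "off"): ["Power is not being supplied."],
    ("off", "off", "on"): ["Power has been turned on."],
    ("off", "blinking", "off"): ["The XSCF unit is being initialized."],
    ("off", "blinking", "on"): ["An error occurred in the XSCF unit."],
    ("off", "on", "off"): [
        "The XSCF unit is in the standby state.",
        "The server is waiting for power-on of the air conditioning facilities.",
    ],
    ("on", "on", "off"): [
        "Warm-up standby processing is in progress.",
        "The power-on sequence is in progress.",
        "The server is in operation.",
    ],
    ("blinking", "on", "off"): ["The power-off sequence is in progress."],
}


def describe_panel(power: str, xscf_standby: str, check: str) -> list[str]:
    """Return the TABLE 2-4 descriptions that match the observed LED states."""
    key = (power.strip().lower(), xscf_standby.strip().lower(), check.strip().lower())
    # Combinations outside TABLE 2-4 are reported as such rather than guessed.
    return LED_STATES.get(key, ["Combination not listed in TABLE 2-4."])


if __name__ == "__main__":
    for line in describe_panel("Off", "Blinking", "On"):
        print(line)
```

Because one LED combination can correspond to more than one state, the helper returns a list instead of a single string.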
2.3 LED Functions of Components
This section explains the LEDs of each component. When replacing a FRU, check the
states of the LEDs in advance.
The normal system state can be confirmed by checking the operator panel. If an error
occurs in an individual hardware component in the server, the LEDs of the
component containing the hardware component which caused the error will indicate
the error location. However, some components such as DIMMs do not have LEDs.
To check the state of a component that has no LEDs, use an XSCF Shell command
such as showhardconf in the maintenance terminal. For details, see TABLE 3-1.
TABLE 2-5 describes the component LEDs and their functions.
TABLE 2-5 Component LEDs and Their Functions
Component                Name         Status             Description
Motherboard unit (MBU)   POWER                           Indicates whether the MBU is operating.
                                      On (green)         Indicates that the motherboard is operating. The motherboard cannot be removed from the server while the POWER LED is on.
                                      Blinking (green)   Indicates that the MBU is being incorporated into the system or being disconnected from the system.
                                      Off                Indicates that the MBU is stopped. The MBU can be disconnected and replaced.
                         CHECK                           Indicates the motherboard unit status.
                                      On (amber)         Indicates that an error occurred in the MBU.
                                      Off                Indicates that the MBU is in the normal state.
Hard disk drive (HDD)                                    Indicates that the hard disk drive can be removed. However, this LED is not used.
                         CHECK        On (amber)         Indicates that an error occurred in the HDD. However, this LED stays on for several minutes (until initialization starts) immediately after power-on. This state does not indicate an error.
                                      Blinking (amber)   Indicates that the HDD is ready to be replaced.
                                      Off                Indicates that the HDD is in the normal state.
                         READY (OK)   On (green)         Indicates that the HDD is operating. The HDD cannot be removed (cannot be replaced).
                                      Blinking (green)   Indicates that the HDD is performing communication. The HDD cannot be removed (cannot be replaced).
                                      Off                The HDD can be replaced.
Power supply unit (PSU)  DC           On (green)         Indicates that power is turned on and being supplied.
                                      Off                Indicates that power is turned off and not being supplied.
                         AC           On (green)         Indicates that input power is being supplied to the power supply unit.
                                      Off                Indicates that input power is not being supplied to the power supply unit.
                         CHECK        On (amber)         Indicates that an error occurred in the PSU.
                                      Blinking (amber)   Indicates that the power supply unit is ready to be replaced.
                                      Off                Indicates that the PSU is in the normal state.
Fan unit (FAN_A)         CHECK        On (amber)         Indicates that an error occurred in the fan unit.
                                      Blinking (amber)   Indicates that the fan unit is ready to be replaced.
                                      Off                Indicates that the fan unit is in the normal state.
2-12SPARC Enterprise M3000 Server Service Manual • March 2012
TABLE 2-5 Component LEDs and Their Functions (Continued)
Component               Name        Status      Description
LAN port display part   ACTIVE      On (green)  Indicates that communication is being performed through the
                                                LAN port.
                                    Off         Indicates that communication is not being performed through
                                                the LAN port.
                        LINK SPEED  On (amber)  Indicates that the communication speed of the LAN port is
                                                1 Gbps.
                                    On (green)  Indicates that the communication speed of the LAN port is
                                                100 Mbps.
                                    Off         Indicates that the communication speed of the LAN port is
                                                10 Mbps.
2.4 External Interface Port on Rear Panel
This section shows the location of the external interface ports located on the server
rear panel and explains their functions.
FIGURE 2-5 External Interface Port Locations (callout numbers 1 through 12 correspond to the location numbers in TABLE 2-6)
TABLE 2-6 External Interface Port Functions
Location Number  Component               Description
1                RCI port                Used to connect the server to a peripheral device having an
                                         RCI connector to enable power interlocking and error
                                         monitoring.
                                         For information on whether the RCI function is supported
                                         for your server, see the SPARC Enterprise
2                USB port (for XSCF)     Exclusive for maintenance personnel. Cannot be connected to
                                         general-purpose USB devices.
3                Serial port (for XSCF)  Connects to the XSCF unit through serial connection to set
                                         up and manage the server.
4                LAN port 1 (for XSCF)   Accommodates a 100Base-TX LAN cable to set up the server
5                LAN port 0 (for XSCF)   and display status.
                                         • XSCF Shell (command-line interface: CLI):
                                         • XSCF Web (browser user interface: BUI):
                                         Through CLI or BUI, the user or system administrator
                                         monitors the server, displays status, operates domains, and
                                         displays information on the console.
6                UPC port 1              By connecting an uninterruptible power supply (UPS) unit
7                UPC port 0              that has the UPS controller (UPC) interface, stable power
                                         supply is provided in the event of a failure in the power
                                         supply or even a large-scale power failure.
                                         If a single power feed is used, connect a UPS cable to UPC
                                         port 0. In a dual power feed, connect UPS cables to UPC
                                         ports 0 and 1.
TABLE 2-6 External Interface Port Functions (Continued)
Location Number  Component               Description
8                GbE port 0 (for OS)     Up to four 100Base-TX/1000Base-T cables can be connected to
9                GbE port 1 (for OS)     the GbE ports. High-capacity data can be transferred at a
10               GbE port 2 (for OS)     high speed.
11               GbE port 3 (for OS)
12               SAS port                Accommodates external Serial Attached SCSI (SAS) devices
                                         such as a tape drive.
2.5 Labels
This section explains the labels and the card affixed to the server.
Note – The label contents described here might differ from those on the labels
actually affixed to the server.
■ The model number, serial number, and hardware version, all of which are
required for maintenance and management, are shown on the system faceplate
label.
■ The standards label is affixed close to the system faceplate label and shows the
approval standards.
■ Safety: NRTL/C
■ Radio wave: VCCI-A, FCC-A, DOC-A, MIC
■ Safety and radio wave: CE
A label-affixed card that can be inserted or extracted is provided near the power
supply unit at the right side at the rear of the server (see FIGURE 2-6). The card
should be inserted in such a way that the standards label faces the outside of the
server and the system faceplate label faces the inside of the server.
FIGURE 2-6 Label Locations
Inside: System faceplate label
Outside: Standards label
CHAPTER 3 Troubleshooting
This chapter provides the fault diagnosis information and the actions to take for
problems.
■ Section 3.1, “Emergency Power Off” on page 3-1
■ Section 3.2, “Failure Diagnostic Method” on page 3-2
■ Section 3.3, “Checking the Server and System Configuration” on page 3-4
■ Section 3.4, “Error Conditions” on page 3-8
■ Section 3.5, “Using Troubleshooting Commands” on page 3-11
■ Section 3.6, “General Oracle Solaris Troubleshooting Commands” on page 3-19
3.1 Emergency Power Off
This section explains how to power off in an emergency.
Caution – In an emergency (such as smoke or flames coming from the server),
immediately stop using the server and turn off the power supply. Regardless of the
type of business, give top priority to fire prevention measures.
1. Press the power switch for more than 4 seconds to power off the server.
2. Remove the power cord clamp and disconnect the cable.
FIGURE 3-1 Power-off Method
3.2 Failure Diagnostic Method
When an error occurs, a message is displayed on the maintenance monitor in many
cases. Use the flowchart in FIGURE 3-2 to find the correct methods for diagnosing
failures.
FIGURE 3-2 Diagnostic Method Flowchart
Decision points in the flowchart (each answered YES or NO):
■ Is the power OK or AC OK LED off?
■ The XSCF mail function sent an E-mail message?
■ The XSCF console displays an error message?
■ OS panic or performance error?
■ FMA message?
■ Can the message ID be used?
■ Has the problem been solved?
Actions in the flowchart:
■ Check the power supply unit and its connection.
■ Check whether an error message is displayed on the OS console and XSCF console.
■ Check /var/adm/messages in the Oracle Solaris OS.
■ Execute fmadm to display fault information.
■ Enter the message ID in https://support.oracle.com/ to refer to fault information.
■ Execute showlogs or fmadm in the XSCF to display fault information.
■ Make a memo of the displayed fault information.
■ Contact your service engineer.
Chapter 3 Troubleshooting3-3
3.3 Checking the Server and System Configuration
The operating conditions must remain the same before and after maintenance. If an
error occurs in the server, save the system configuration and component status
information. Confirm that the recovered state after maintenance is the same as that
before maintenance.
If an error occurs in the server, the error is reported through one of the following:
■ Oracle Solaris Operating System message file
■ XSCF Shell showhardconf(8) command and showstatus(8) command
■ Management console
■ Service processor log
3.3.1 Checking the Hardware Configuration and FRU Status
To replace a faulty FRU and perform the maintenance on the server, it is important
to check and understand the hardware configuration of the server and the state of
each hardware component.
The hardware configuration refers to information that indicates to which layer a
hardware component belongs.
The status of each hardware component refers to information on the conditions of a
standard or optional component in the server: temperature, power supply voltage,
CPU operating conditions, and other status information.
To check the hardware configuration and the status of each hardware component,
use XSCF Shell commands from the maintenance terminal. See TABLE 3-1 for the
commands used.
TABLE 3-1 Commands for Checking Hardware Configuration
Command       Description
showhardconf  Displays hardware configuration.
showstatus    Displays the status of a component. This command is used only when a
              faulty component is checked.
showboards    Displays information on the system board (XSB).
showdcl       Displays the hardware resource configuration information of a domain.
showfru       Displays the setting information of a device.
The status of each component can be checked based on the On or blinking state of
the component LEDs.
For the component types and LED states, see TABLE 2-3 and TABLE 2-5.
For details of commands, see the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide and the SPARC
Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
3.3.1.1 Checking the Hardware Configuration
Checking the hardware configuration requires an XSCF user account with authority
to log in to the XSCF. The following procedure can be used to check the hardware
configuration from the maintenance terminal.
Ask the system administrator for the required information, such as the user account
and password. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
1. Log in to XSCF Shell.
2. Type showhardconf.
XSCF> showhardconf
The showhardconf command displays hardware configuration information. For
details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
3.3.2 Checking the Software and Firmware Configurations
The software and firmware configurations and versions affect the operation of the
server. To change the configuration or investigate a problem, check the latest
information and check for any problems in the software.
Software and firmware vary according to user conditions.
■ The software configuration and version can be checked in the Oracle Solaris
Operating System. Refer to the Oracle Solaris OS documentation for more
information.
■ The firmware configuration and versions can be checked from the maintenance
terminal using XSCF Shell commands. Refer to the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide for more detailed
information.
Check the software and firmware configuration information with assistance from the
system administrator. However, if you have received login authority from the system
administrator, the following commands can be used from the maintenance terminal
for these checks:
TABLE 3-2 Commands for Checking the Software Configuration
Command          Description
showrev(1M)      Displays system configuration information and Oracle Solaris OS patch
                 information.
uname(1)         Outputs current system information.
TABLE 3-3 Commands for Checking the XSCF Firmware Configuration
Command          Description
version(8)       Outputs current firmware version information.
showhardconf(8)  Outputs information on the components mounted on the server.
showstatus(8)    Displays the status of a component. This command is used only when a
                 faulty component is checked.
showboards(8)    Displays XSB information. It can display information on an XSB that
                 belongs to the specified domain and information on all XSBs mounted.
                 An XSB combines hardware resources on physical system boards. The
                 M3000 server consists of a single physical system board (Uni-XSB).
showdcl(8)       Displays the configuration information of a domain (hardware resource
                 information).
showfru(8)       Displays the setting information of a device.
3.3.2.1 Checking the Software Configuration
The following procedure can be used to check the software configuration from the
domain console.
● Type showrev.
# showrev
The showrev command displays system configuration information on the screen.
3.3.2.2 Checking the Firmware Configuration
Login authority is required to check the firmware configuration. The procedure
below can be used to check the configuration from the maintenance terminal.
1. Log in with the account of the XSCF hardware field engineer.
2. Type version.
XSCF> version
The version command displays firmware version information on the screen. For
details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
3.3.2.3 Downloading Error Log Information
To download error log information, use the XSCF log fetch function. The XSCF unit
has an interface with external units so that the authorized service personnel can
easily obtain useful maintenance information such as error logs.
Connect the maintenance terminal, and use the XSCF Shell or XSCF Web to
download error log information to the maintenance terminal.
3.4 Error Conditions
This section describes error conditions and relevant corrective actions.
This work is explained in the following sections:
■ Section 3.4.1, “Predictive Self-Healing Tools” on page 3-8
■ Section 3.4.2, “Monitoring Output” on page 3-10
■ Section 3.4.3, “Messaging Output” on page 3-10
For details of the fault information, see the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
You can find more detailed descriptions of Oracle Solaris OS Predictive Self-Healing
at the website below:
Predictive self-healing is an architecture and methodology for automatically
diagnosing, reporting, and handling software and hardware error conditions. This
new technology reduces the time required to debug a hardware or software problem
and provides the administrator and service engineer with detailed data about each
error.
3.4.1 Predictive Self-Healing Tools
In the Oracle Solaris OS, Oracle Solaris Fault Manager runs in the background. When
an error occurs, the system software recognizes the error and attempts to determine
the faulty hardware component. The system software also takes steps to prevent the
faulty component from being used until it has been replaced. The system software
performs the following activities:
■ Receives telemetry information about errors detected by the system software.
■ Diagnoses the errors.
■ Initiates predictive self-healing activities. For example, Oracle Solaris Fault
Manager can disable faulty components.
■ When possible, causes the faulty FRU to provide an LED indication of the error in
addition to populating system console messages with more details.
TABLE 3-4 shows typical messages generated when an error occurs. Messages are
displayed on your console and are recorded in the /var/adm/messages file.
A message in TABLE 3-4 indicates that the fault has already been diagnosed. If there
was any corrective action that the system could take, the system has already taken it.
If your server is still running, the corrective action continues to be taken.
TABLE 3-4 Predictive Self-Healing Messages
Output Displayed
Nov 1 16:30:20 dt88-292 EVENT-TIME:Tue Nov 1 16:30:20 PST 2005
Nov 1 16:30:20 dt88-292 PLATFORM:SUNW,A70, CSN:-, HOSTNAME:dt88-292
Nov 1 16:30:20 dt88-292 SOURCE:eft, REV: 1.13
Nov 1 16:30:20 dt88-292 EVENT-ID:afc7e660-d609-4b2f-86b8-ae7c6b8d50c4
Nov 1 16:30:20 dt88-292 DESC:
Nov 1 16:30:20 dt88-292 A problem was detected in the PCI Express subsystem
Nov 1 16:30:20 dt88-292 Refer to http://sun.com/msg/SUN4-8000-0Y for more information.
Nov 1 16:30:20 dt88-292 AUTO-RESPONSE:One or more device instances may be disabled.
Nov 1 16:30:20 dt88-292 IMPACT:Loss of services provided by the device instances associated with this fault.
Nov 1 16:30:20 dt88-292 REC-ACTION:Schedule a repair procedure to replace the affected device. Use
Nov 1 16:30:20 dt88-292 fmdump -v -u EVENT_ID to identify the device or contact Sun for support.
Description
EVENT-TIME: The time stamp of the diagnosis
PLATFORM: A description of the server encountering the error
SOURCE: Information on the Diagnosis Engine used to determine the error
EVENT-ID: The Universally Unique event ID for this error
DESC: A basic description of the error
WEB SITE: Where to find specific information and actions for this error
AUTO-RESPONSE: What, if anything, the system did to alleviate any follow-on problems
IMPACT: A description of what is considered to be the impact of the fault
REC-ACTION: A brief description of the corrective action the system administrator should take
3.4.2 Monitoring Output
To understand error conditions, collect monitoring output information. For the
collection of the information, use the commands shown in TABLE 3-5.
TABLE 3-5 XSCF Commands for Checking Monitoring Output
Command      Operand  Description
showlogs(8)  console  Displays the console of a domain.
             monitor  Logs messages that are displayed in the message window.
             panic    Logs output to the console during a panic.
             ipl      Collects console data generated during the period of the power-on of a
                      domain to the completion of the Oracle Solaris OS start.
3.4.3 Messaging Output
To understand error conditions, collect messaging output information. For the
collection of the information, use the commands shown in TABLE 3-6.
TABLE 3-6 Commands for Checking Messaging Output
Command      Operand  Description
showlogs(8)  env      Displays the temperature history log. The environmental temperature
                      data and power status are indicated in 10-minute intervals. The data is
                      stored for a maximum of six months.
             power    Displays power and reset information.
             event    Displays information reported to the system and stored as event
                      logs.
             error    Displays error logs.
fmdump(1M)            Displays FMA diagnostic results and errors. This command is
fmdump(8)             provided as an Oracle Solaris OS command and XSCF Shell command.
Each error message logged by the predictive self-healing architecture has a message
ID and Web address associated with the message. From this message ID and Web
address, information on the most up-to-date corrective measures can be retrieved.
For details of predictive self-healing, see the Oracle Solaris OS documents.
3.5 Using Troubleshooting Commands
When any message listed in TABLE 3-4 is displayed, detailed information on the error
may be required. For details on troubleshooting commands, see manual pages of the
Oracle Solaris OS or XSCF Shell. This section provides detailed explanations of the
following commands:
■ “Using the showhardconf Command” on page 3-11
■ “Using the showlogs Command” on page 3-14
■ “Using the showstatus Command” on page 3-15
■ “Using the fmdump Command” on page 3-16
■ “Using the fmadm Command” on page 3-17
■ “Using the fmstat Command” on page 3-19
3.5.1 Using the showhardconf Command
The showhardconf command displays information on each FRU. The following
information is displayed:
3.5.2 Using the showlogs Command
The showlogs command displays information of specified logs in the order of time
stamps. The information with the oldest time stamp is displayed first. The
showlogs command displays the following logs:
■ Error log
■ Power log
■ Event log
■ Temperature and humidity record
■ Monitoring message log
■ Console message log
■ Panic message log
■ IPL message log
XSCF> showlogs error
Date: Jun 17 11:05:32 JST 2008 Code: 80000000-c3ff0000-0173000600000000
Status: Alarm Occurred: Jun 17 11:05:32.522 JST 2008
FRU: /PSU#1
Msg: PSU shortage
Date: Jun 17 13:41:46 JST 2008 Code: 80002080-7801c201-0130000000000000
Status: Alarm Occurred: Jun 17 13:41:44.861 JST 2008
FRU: /MBU_A,*
Msg: Board control error (MBC link error)
Date: Jun 17 13:46:31 JST 2008 Code: 60000000-cd01c701-0164010100000000
Status: Warning Occurred: Jun 17 13:46:31.158 JST 2008
FRU: /OPNL,/FANBP_B
Msg: TWI access error
XSCF>
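When an error log is long, it can help to save the showlogs error output to a text file on the maintenance terminal and summarize it offline. The following Python sketch is only an illustration, not part of the XSCF tooling: it assumes the four-line entry layout (Date/Code, Status/Occurred, FRU, Msg) seen in the example above, and the function name parse_error_log and the SAMPLE text are hypothetical.

```python
import re

# Two entries copied from the "showlogs error" example above.
SAMPLE = """\
Date: Jun 17 11:05:32 JST 2008 Code: 80000000-c3ff0000-0173000600000000
Status: Alarm Occurred: Jun 17 11:05:32.522 JST 2008
FRU: /PSU#1
Msg: PSU shortage
Date: Jun 17 13:46:31 JST 2008 Code: 60000000-cd01c701-0164010100000000
Status: Warning Occurred: Jun 17 13:46:31.158 JST 2008
FRU: /OPNL,/FANBP_B
Msg: TWI access error
"""

DATE_RE = re.compile(r"Date:\s*(.*?)\s+Code:\s*(\S+)")
STATUS_RE = re.compile(r"Status:\s*(\S+)\s+Occurred:\s*(.*)")


def parse_error_log(text):
    """Group saved 'showlogs error' text into one dict per log entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        date = DATE_RE.match(line)
        if date:
            # A "Date:" line starts a new entry.
            entries.append({"date": date.group(1), "code": date.group(2)})
            continue
        if not entries:
            continue  # ignore anything before the first entry (prompt, blank lines)
        status = STATUS_RE.match(line)
        if status:
            entries[-1]["status"] = status.group(1)
            entries[-1]["occurred"] = status.group(2)
        elif line.startswith("FRU:"):
            frus = line[len("FRU:"):].split(",")
            entries[-1]["fru"] = [f.strip() for f in frus if f.strip()]
        elif line.startswith("Msg:"):
            entries[-1]["msg"] = line[len("Msg:"):].strip()
    return entries


entries = parse_error_log(SAMPLE)
for entry in entries:
    print(entry.get("status"), entry.get("fru"), entry.get("msg"))
```

Listing the suspect FRUs per entry this way makes it easier to compare them against the showstatus output before any replacement work starts.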
3.5.3 Using the showstatus Command
The showstatus command displays information about the faulty or degraded units
among the FRUs composing the server, together with information on the units on the
layers immediately above them. Each displayed unit is marked with an asterisk (*),
which indicates that the unit is faulty, and one of the following status indicators is
displayed after "Status:".
■ Normal: Normal state
■ Faulted: The unit is faulty and is not operating.
■ Degraded: The unit is operating. The unit is partly faulty or degraded and some
error has been detected. Although a faulty state is displayed for the unit, it is
operating normally.
■ Deconfigured: There is no problem with the unit itself, but it is degraded due to a
configuration problem, environmental problem, or the degradation of another
unit.
■ Maintenance: Maintenance is being performed. replacefru(8) or addfru(8) is
3.5.4 Using the fmdump Command
The fmdump command displays the contents of the log managed by the module
called Fault Manager.
This example assumes that only one error exists.
# fmdump
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
3.5.4.1 fmdump -V Command
To get more detailed information you can use the -V option, as shown in the
following example.
# fmdump -V -u 0ee65618-2218-4997-c0dc-b5c410ed8ec2
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
The output method using the -V option displays at least three additional lines.
■ The first line is the same information shown for console messages above,
including a time stamp, UUID, and message ID.
■ The second line is a declaration of the certainty of diagnosis. In this case we are
100 percent sure the failure is in the ASIC described. If the diagnosis may involve
multiple components, you may see two lines here with 50% in each of the two
lines.
■ The "FRU" line indicates what component must be replaced to return the server to
a fully operational state.
■ The "rsrc" line indicates the component that has become unusable because of this
error.
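The line-by-line reading described above can be sketched in Python for output that has been saved to a file. This is a minimal illustration that assumes the layout shown in the example (a UUID line, a percentage line, then FRU and rsrc lines); real fmdump -V output may contain additional lines, and the names parse_fmdump_verbose and SAMPLE are hypothetical.

```python
import re

# Text copied from the "fmdump -V -u" example above.
SAMPLE = """\
TIME UUID SUNW-MSG-ID
Nov 02 10:04:15.4911 0ee65618-2218-4997-c0dc-b5c410ed8ec2 SUN4-8000-0Y
100% fault.io.fire.asic
FRU: hc://product-id=SUNW,A70/motherboard=0
rsrc: hc:///motherboard=0/hostbridge=0/pciexrc=0
"""

UUID_RE = re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b")
CERTAINTY_RE = re.compile(r"(\d+)%\s+(\S+)")


def parse_fmdump_verbose(text):
    """Extract the UUID, message ID, and suspect list from saved fmdump -V text."""
    result = {"uuid": None, "msg_id": None, "suspects": []}
    for raw in text.splitlines():
        line = raw.strip()
        uuid = UUID_RE.search(line)
        if uuid:
            # First line of the record: time stamp, UUID, message ID.
            result["uuid"] = uuid.group(0)
            result["msg_id"] = line.split()[-1]
            continue
        certainty = CERTAINTY_RE.match(line)
        if certainty:
            # Certainty line: one per suspected fault class.
            result["suspects"].append(
                {"certainty": int(certainty.group(1)), "fault_class": certainty.group(2)}
            )
        elif line.startswith("FRU:") and result["suspects"]:
            result["suspects"][-1]["fru"] = line[len("FRU:"):].strip()
        elif line.startswith("rsrc:") and result["suspects"]:
            result["suspects"][-1]["rsrc"] = line[len("rsrc:"):].strip()
    return result


report = parse_fmdump_verbose(SAMPLE)
for suspect in report["suspects"]:
    print(report["msg_id"], suspect["certainty"], suspect.get("fru"))
```

If a diagnosis involves two components, the suspects list would hold two entries, each with its own certainty value, which matches the 50% and 50% case mentioned above.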
3.5.4.2 fmdump -e Command
To get information about the errors that caused this failure, you can use the -e option, as
shown in the following example.
# fmdump -e
TIME CLASS
Nov 02 10:04:14.3008 ereport.io.fire.jbc.mb_per
3.5.5 Using the fmadm Command
3.5.5.1 Using the fmadm faulty Command
The fmadm command can be used by administrators and service personnel to
view and modify system configuration parameters that are maintained by the Oracle
Solaris Fault Manager. The fmadm faulty command is primarily used to determine
the status of a component involved in a fault, as shown in the following example:
The PCIe slot has been degraded and it is associated with the same UUID as above.
Also, the "faulted" status may be displayed.
3.5.5.2 fmadm repair Command
When the fmadm faulty command displays a fault, the fmadm repair command
must be executed to clear the FRU information in the domain after replacement of
the motherboard unit that has encountered the error. If the fmadm repair
command is not executed, the error message is not cleared.
If the fmadm faulty command displays a fault, clearing the FMA resource cache
on the operating system side causes no problem. Data in the cache does not need to
match the hardware fault information held by the XSCF.
The fmadm config command output displays the version number and current
status of the diagnosis engine that is being used by the server. Whether the latest
engine is being used can be determined by consulting the My Oracle Support web
site.
# fmadm config
MODULE VERSION STATUS DESCRIPTION
cpumem-diagnosis 1.6 active CPU/Memory Diagnosis
cpumem-retire 1.1 active CPU/Memory Retire Agent
disk-transport 1.0 active Disk Transport Agent
eft 1.16 active eft diagnosis engine
event-transport 2.0 active Event Transport Module
fabric-xlate 1.0 active Fabric Ereport Translater
fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis
io-retire 1.0 active I/O Retire Agent
snmp-trapgen 1.0 active SNMP Trap Generation Agent
sysevent-transport 1.0 active SysEvent Transport Agent
syslog-msgs 1.0 active Syslog Messaging Agent
zfs-diagnosis 1.0 active ZFS Diagnosis Engine
zfs-retire 1.0 active ZFS Retire Agent
3.5.6 Using the fmstat Command
The fmstat command reports statistics for the modules associated with Oracle
Solaris Fault Manager. By using the fmstat command, statistical information about
the diagnostic engines and diagnostic agents that are currently involved in fault
management can be displayed.
The following output example shows that the fmd-self-diagnosis DE module
(displayed also on the console output) has received accepted events.
3.6 General Oracle Solaris Troubleshooting Commands
Superuser commands of this type are useful to determine whether there is a problem
with the server, network, or another server connected via the network.
This section explains the following commands:
■ “Using the iostat Command” on page 3-20
■ “Using the prtdiag Command” on page 3-21
■ “Using the prtconf Command” on page 3-23
■ “Using the netstat Command” on page 3-26
■ “Using the ping Command” on page 3-27
■ “Using the ps Command” on page 3-28
■ “Using the prstat Command” on page 3-29
Most of these commands are located in the /usr/bin directory or /usr/sbin
directory.
3.6.1 Using the iostat Command
The iostat command repeatedly reports terminal, drive, and I/O activity, as well
as CPU utilization.
3.6.1.1 Options
TABLE 3-7 lists the options of the iostat command and how those options can help
troubleshoot the server.
TABLE 3-7 Options for iostat
Option     Description                                    How It Can Help
No option  Reports status of local I/O devices.           A quick three-line output of device status
                                                          information.
-c         Reports the percentages of time the system     Quick report of CPU status
           has spent in user mode, in system mode,
           waiting for I/O, and idling.
-e         Displays device error summary statistics.      Provides a short table with accumulated
           Displays the total number of errors,           errors. Identifies suspect I/O devices.
           hardware errors, software errors, and
           transfer errors.
-E         Displays all device error statistics.          Provides information about devices:
                                                          manufacturer, model number, serial number,
                                                          size, and errors.
-n         Displays names in a descriptive format.        The descriptive format helps identify devices.
-x         Reports extended drive statistics of each      Similar to the -e option, but provides rate
           drive. The output is in a tabular form.        information. This helps identify internal
                                                          devices with poor performance and other I/O
                                                          devices with poor performance across the
                                                          network.
The following example shows output for the iostat command:
# iostat -En
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3120026A Revision: 8.01 Serial No: 3JT4H4C2
Size: 120.03GB <120031641600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: LITE-ON Product: COMBO SOHC-4832K Revision: O3K1 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
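As an illustration of how the error counters in this output can be screened, the following Python sketch reads saved iostat -En text and lists devices whose hard or transport error counts are nonzero. It assumes the header-line layout shown above; the choice of which counters to flag is an assumption made for this example, not a rule from this manual, and the names parse_iostat_errors, suspect_devices, and SAMPLE are hypothetical.

```python
import re

# Device header lines and detail lines copied from the "iostat -En" example above.
SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3120026A Revision: 8.01 Serial No: 3JT4H4C2
Size: 120.03GB <120031641600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: LITE-ON Product: COMBO SOHC-4832K Revision: O3K1 Serial No:
Size: 0.00GB <0 bytes>
"""

HEADER_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)\s+Transport Errors:\s*(\d+)"
)


def parse_iostat_errors(text):
    """Map each device name to its soft, hard, and transport error counts."""
    devices = {}
    for raw in text.splitlines():
        header = HEADER_RE.match(raw.strip())
        if header:
            devices[header.group(1)] = {
                "soft": int(header.group(2)),
                "hard": int(header.group(3)),
                "transport": int(header.group(4)),
            }
    return devices


def suspect_devices(devices):
    """Return the names of devices with any hard or transport errors (assumed criterion)."""
    return sorted(name for name, c in devices.items() if c["hard"] or c["transport"])


devices = parse_iostat_errors(SAMPLE)
print("devices seen:", sorted(devices))
print("suspect devices:", suspect_devices(devices))
```

Running the same script on output collected at different times shows whether a counter is still increasing, which is usually more informative than a single snapshot.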
3.6.2 Using the prtdiag Command
The prtdiag command displays system configuration and diagnostic information.
The diagnostic information identifies any failed FRU in the system.
The prtdiag command is located in the /usr/platform/platform-name/sbin/ directory.
The prtdiag command may indicate a slot number different from that shown
elsewhere in this document. This is normal.
3.6.2.1 Options
TABLE 3-8 lists the options of the prtdiag command and how those options can help
troubleshooting.
TABLE 3-8 Options for prtdiag
Option     Description        How it can help
No option  Lists components.  Shows CPU information, memory configuration, PCIe cards
                              installed, OBP version, status of the mode switch, and CPU
                              operation mode.
-v         Verbose mode.      Provides the same information as no option. Additionally,
                              displays the detail information of PCIe cards.
The following example shows output for the prtdiag command in verbose mode:
# prtdiag -v
System Configuration: Sun Microsystems sun4u SPARC Enterprise M3000 Server
System clock frequency: 1064 MHz
Memory size: 7808 Megabytes
=================== Environmental Status ===================
The prtdiag output continued:
Mode switch is in LOCK mode
=================== System Processor Mode ===================
SPARC64-VII mode
#
3.6.3 Using the prtconf Command
Similar to the show-devs command executed at the ok prompt, the prtconf
command displays the devices that are configured.
The prtconf command identifies hardware that is recognized by the Oracle Solaris
OS. If software applications are having problems with hardware but the hardware is
not suspected of being faulty, the prtconf command can be used to check whether
the Oracle Solaris software recognizes the hardware and whether a driver for the
hardware is loaded.
3.6.3.1 Options
TABLE 3-9 lists the options of the prtconf command and how those options can help
troubleshooting.
TABLE 3-9 Options for prtconf
Option     Description                                    How it can help
No option  Displays the device tree of devices            If a hardware device is recognized, then it is
           recognized by the operating system.            considered to be functioning properly. If the
                                                          message "(driver not attached)" is displayed
                                                          for the device or sub-device, then the driver
                                                          for the device is corrupt or missing.
-D         Similar to the output of no option, but        Lists the drivers needed or used by the
           device driver names are listed.                operating system to enable the device.
-p         Similar to the output of no option, yet is     Provides a brief list of the devices.
           abbreviated.
-V         Displays the version and date of the           Useful for a quick check of the firmware
           OpenBoot PROM firmware.                        version.
The following example shows output for the prtconf command:
# prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 7616 Megabytes
System Peripherals (Software Nodes):
SUNW,SPARC-Enterprise
scsi_vhci, instance #0
packages (driver not attached)
SUNW,probe-error-handler (driver not attached)
SUNW,builtin-drivers (driver not attached)
deblocker (driver not attached)
disk-label (driver not attached)
terminal-emulator (driver not attached)
obp-tftp (driver not attached)
ufs-file-system (driver not attached)
chosen (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
pseudo-console, instance #0
The prtconf output continued:
nvram (driver not attached)
pseudo-mc, instance #0
cmp (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
core (driver not attached)
cpu (driver not attached)
cpu (driver not attached)
pci, instance #0
ebus, instance #0
flashprom (driver not attached)
serial, instance #0
scfc, instance #0
panel, instance #0
pci, instance #0
pci, instance #0
pci, instance #1
scsi, instance #0
tape (driver not attached)
disk (driver not attached)
sd, instance #1
sd, instance #0
pci, instance #2
pci, instance #0
network, instance #0
network, instance #1 (driver not attached)
pci, instance #3
pci, instance #1
network, instance #2 (driver not attached)
network, instance #3 (driver not attached)
pci, instance #4
pci, instance #1
pci, instance #5
pci, instance #6
pci, instance #7
pci, instance #8
os-io (driver not attached)
iscsi, instance #0
The prtconf output continued:
pseudo, instance #0
#
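Because the prtconf listing is long, a short script can separate the nodes that report "(driver not attached)" from the rest. The Python sketch below is an illustration that works on saved output and relies only on the suffix text shown in the example; the names split_by_driver_state and SAMPLE are hypothetical. Note that the sample output above lists many such nodes, so the result is best treated as a list of candidates to review, not as a list of confirmed faults.

```python
# A few node lines copied from the prtconf example above.
SAMPLE = """\
scsi_vhci, instance #0
packages (driver not attached)
pseudo-console, instance #0
network, instance #0
network, instance #1 (driver not attached)
"""

SUFFIX = "(driver not attached)"


def split_by_driver_state(text):
    """Return (attached, unattached) lists of node descriptions."""
    attached, unattached = [], []
    for raw in text.splitlines():
        node = raw.strip()
        if not node:
            continue
        if node.endswith(SUFFIX):
            # Drop the suffix so only the node name (and instance) remains.
            unattached.append(node[: -len(SUFFIX)].strip())
        else:
            attached.append(node)
    return attached, unattached


attached, unattached = split_by_driver_state(SAMPLE)
print("attached:", attached)
print("driver not attached:", unattached)
```

Filtering the unattached list for the device of interest (for example, entries starting with network) narrows the check to the hardware that the application is actually having trouble with.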
3.6.4 Using the netstat Command
The netstat command displays the network status and protocol statistics.
3.6.4.1 Options
TABLE 3-10 lists the options of the netstat command and how those options can
help troubleshooting.
TABLE 3-10 Options for netstat
Option       Description                                  How It Can Help
-i           Displays the interface status. The           Provides a quick overview of the network
             information includes packets in/out,         status.
             errors in/out, collisions, and queues.
-i interval  Repeats the netstat command in the           Identifies intermittent or long duration
             intervals of as many seconds as specified    network events. By piping netstat output to
             after the -i option.                         a file, overnight activity can be viewed all at
                                                          once.
-p           Displays the media table.                    Provides the MAC address for hosts on the
                                                          subnet.
-r           Displays the routing table.                  Provides routing information.
-n           Replaces host names with IP addresses and    Used when an IP address is more useful than a
             displays them.                               host name.
The following example shows the output for the netstat -p command:
# netstat -p
Net to Media Table: IPv4
Device IP Address Mask Flags Phys Addr
------ -------------------- --------------- -------- ---------------
bge0 san-ff1-14-a 255.255.255.255 o 00:14:4f:3a:93:61
bge0 san-ff2-40-a 255.255.255.255 o 00:14:4f:3a:93:85
sppp0 224.0.0.22 255.255.255.255
bge0 san-ff2-42-a 255.255.255.255 o 00:14:4f:3a:93:af
bge0 san09-lab-r01-66 255.255.255.255 o 00:e0:52:ec:1a:00
sppp0 192.168.1.1 255.255.255.255
bge0 san-ff2-9-b 255.255.255.255 o 00:03:ba:dc:af:2a
bge0 bizzaro 255.255.255.255 o 00:03:ba:11:b3:c1
bge0 san-ff2-9-a 255.255.255.255 o 00:03:ba:dc:af:29
bge0 racerx-b 255.255.255.255 o 00:0b:5d:dc:08:b0
bge0 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
#
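When the media table is long, it can help to pull out only the rows that carry a physical address. The following is a minimal Python sketch, assuming the netstat -p output has been saved as text. The function name parse_media_table and the SAMPLE string are illustrative and are not part of any Oracle or Fujitsu tool.

```python
import re

# A few rows copied from the netstat -p example above (SAMPLE is illustrative).
SAMPLE = """\
Device IP Address Mask Flags Phys Addr
------ -------------------- --------------- -------- ---------------
bge0 san-ff1-14-a 255.255.255.255 o 00:14:4f:3a:93:61
sppp0 224.0.0.22 255.255.255.255
bge0 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
"""

# Six colon-separated groups of one or two hex digits.
MAC_RE = re.compile(r"^(?:[0-9a-f]{1,2}:){5}[0-9a-f]{1,2}$", re.IGNORECASE)


def parse_media_table(text):
    """Return {(device, host): mac} for every row whose last field is a MAC address."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Heading, separator, and rows without a physical address are skipped.
        if len(fields) >= 4 and MAC_RE.match(fields[-1]):
            entries[(fields[0], fields[1])] = fields[-1].lower()
    return entries


if __name__ == "__main__":
    for (device, host), mac in parse_media_table(SAMPLE).items():
        print(device, host, mac)
```

Rows such as the sppp0 entries, which have no physical address, are left out on purpose.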
3.6.5 Using the ping Command
The ping command sends an ICMP ECHO_REQUEST packet to a network host.
Depending on how the ping command is configured, troublesome network links or
nodes can be identified from the displayed output. The destination host is specified
in the variable hostname.
3.6.5.1 Options
TABLE 3-11 lists the options of the ping command and how those options can help
troubleshooting.
TABLE 3-11 Options for ping
Option | Description | How It Can Help
hostname | The probe packet is sent to hostname and returned. | Verifies that a host is active on the network.
-g hostname | Forcibly routes the probe packet through a specified gateway. | By sending the probe packet through different routes to the target host, individual routes can be tested for quality.
-i interface | Specifies through which interface to send and receive the probe packet. | Enables a simple check of secondary network interfaces.
-n | Replaces host names with IP addresses and displays them. | Used when an IP address is more useful than a host name.
-s | Continues to repeat ping at intervals of 1 second. Pressing CTRL-C stops the execution. After it is stopped, statistics are displayed. | Helps identify intermittent or long-duration network events. By piping ping output to a file, overnight activity can be viewed all at once.
-svR | Displays the route the probe packet followed in 1-second intervals. | Indicates the probe packet route and number of hops. Comparing multiple routes can identify bottlenecks.
The following example shows output for the ping -s command:
# ping -s san-ff2-17-a
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
^C
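If ping -s output is redirected to a file for later review, a short script can summarize it. The following Python sketch assumes the reply lines have the format shown in the example above. The name summarize_ping and the SAMPLE string are illustrative assumptions, not part of the ping command.

```python
import re

# Reply lines copied from the ping -s example above (SAMPLE is illustrative).
SAMPLE = """\
PING san-ff2-17-a: 56 data bytes
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=0. time=0.427 ms
64 bytes from san-ff2-17-a (10.1.67.31): icmp_seq=1. time=0.194 ms
"""

# Captures the sequence number and the round-trip time of each reply line.
REPLY_RE = re.compile(r"icmp_seq=(\d+)\.\s+time=([\d.]+)\s*ms")


def summarize_ping(text):
    """Return reply count, missing sequence numbers, and max/average round-trip time."""
    replies = [(int(m.group(1)), float(m.group(2))) for m in REPLY_RE.finditer(text)]
    if not replies:
        return None
    seqs = [seq for seq, _ in replies]
    times = [rtt for _, rtt in replies]
    # Sequence numbers absent between the first and last reply suggest lost packets.
    missing = sorted(set(range(min(seqs), max(seqs) + 1)) - set(seqs))
    return {
        "replies": len(replies),
        "missing": missing,
        "max_ms": max(times),
        "avg_ms": sum(times) / len(times),
    }


if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

Gaps in the icmp_seq values are reported in the missing list, which is one way to spot intermittent events in an overnight capture.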
3.6.6 Using the ps Command
The ps command lists the status of processes. If no option is specified, the ps
command outputs information about the processes that have the same execution user
ID as the user who is executing this command and are controlled from the same
control terminal as this command.
If any option is specified, the output information is controlled according to the
specified option.
3.6.6.1 Options
TABLE 3-12 lists the options of the ps command and how those options can help
troubleshooting.
TABLE 3-12 Options for ps
Option | Description | How It Can Help
-e | Displays information for every process. | Identifies the process ID and the executable files.
-f | Generates a full listing. | Provides the following process information: user ID, parent process ID, time when executed, and the paths to the executable files.
-o option | Enables configurable output. The pid, pcpu, pmem, and comm options display process ID, percent CPU consumption, percent memory consumption, and the relevant executable file, respectively. | Provides only most important information. Knowing the percentage of resource consumption helps identify processes that are affecting performance and might be hung.
The following example shows output for the ps command:
# ps
PID TTY TIME CMD
101042 pts/3 0:00 ps
101025 pts/3 0:00 sh
#
Note – When ps output is piped to sort with the -r option, the column headings are
sorted as though the value in their first column were zero.
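One way to keep the column headings out of the sort is to rank the rows in a script. The following Python sketch assumes output captured from ps -e -o pid,pcpu,pmem,comm. The process names and values in SAMPLE are invented for illustration, and top_by_cpu is a hypothetical helper name.

```python
# Invented sample in the layout of: ps -e -o pid,pcpu,pmem,comm
SAMPLE = """\
  PID %CPU %MEM COMMAND
    1  0.0  0.1 /sbin/init
  512  3.2  1.4 /usr/lib/example/daemon
  733 12.5  0.8 /opt/example/bin/worker
"""


def top_by_cpu(text, limit=5):
    """Return up to `limit` processes, highest percent CPU first."""
    lines = [line for line in text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # the first line holds the column headings
        # Split into four fields only, so a command containing spaces stays whole.
        pid, pcpu, pmem, comm = line.split(None, 3)
        rows.append({"pid": int(pid), "pcpu": float(pcpu),
                     "pmem": float(pmem), "comm": comm})
    rows.sort(key=lambda row: row["pcpu"], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for row in top_by_cpu(SAMPLE, limit=2):
        print(row["pid"], row["pcpu"], row["comm"])
```

Because the heading line is removed before sorting, it cannot end up mixed in with the data rows.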
3.6.7 Using the prstat Command
The prstat utility repeatedly examines all the active processes in the system and
reports statistics based on the selected output mode and sort order. The prstat
command provides output similar to the ps command.
3.6.7.1 Options
TABLE 3-13 lists the options of the prstat command and how those options can help
troubleshooting.
TABLE 3-13 Options for prstat
Option | Description | How It Can Help
No option | Displays a list of the processes sorted in descending order of consumption amount of CPU resources. The list is limited to the height of the terminal window and the total number of processes. Output is automatically updated every 5 seconds. Pressing CTRL-C stops the execution. | Output identifies the process ID, user ID, used amount of memory, state, CPU consumption, and command name.
-n number | Limits the number of output lines. | Limits the amount of data displayed and displays processes consuming many resources.
-s key | Sorts the list by the key parameter. | Useful keys are cpu (default), time, and size.
The following example shows output for the prstat command:
CHAPTER 4
FRU Replacement Preparation
This chapter explains the method of preparing for the safe replacement of FRUs.
■ Section 4.1, “Tools Required for Maintenance”
■ Section 4.2, “FRU Replacement and Installation Methods”
■ Section 4.3, “Active Replacement/Active Addition”
■ Section 4.4, “Hot Replacement/Hot Addition”
■ Section 4.5, “Cold Replacement/Cold Addition”
4.1 Tools Required for Maintenance
The actual maintenance work described in Chapter 5 to Chapter 15 requires
maintenance software to confirm that the server and other components are operating
correctly and to collect status information and log data on the server and
components. Work for mounting, removing, or replacing a specific component
requires special tools, including screwdrivers and an antistatic wrist strap. These
items are generally named maintenance tools and are listed in TABLE 4-1.
TABLE 4-1 Maintenance Tools
Item | Part Name | Use
1 | Phillips screwdriver (No. 2) |
2 | Wrist strap | For electrostatic control
3 | Conductive mat | For electrostatic control
4 | Oracle VTS | Test program
4.2 FRU Replacement and Installation Methods
This section explains how to replace and install FRUs.
4.2.1 FRU Replacement
There are three methods of replacing FRUs, as follows:
■ Active replacement
A target FRU is operated while the Oracle Solaris OS of the domain to which the
FRU belongs is operating.
The target FRU is operated by using Oracle Solaris OS commands or XSCF
commands.
Because the power supply unit (PSU) and fan unit (FAN) do not belong to any
domain, they are operated by using XSCF commands regardless of the operating
state of the Oracle Solaris OS.
Note – Setting up mirroring gives the hard disk drives a redundant
configuration.
Note – ■ If a hard disk drive is a nonmirrored boot device, it must be replaced
according to the cold replacement procedure.
■ If a hard disk drive is in a mirrored configuration, active replacement can be
performed on the failed drive because the mirrored hard disk drive continues to be
online and functioning. The hard disk replacement procedure varies by the
mirroring configuration method. When it is configured with hardware RAID, see the
SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Administration Guide.
When it is configured with software RAID, see the manuals for the software in use.
Note – Hardware RAID is available only on the M3000 server with the SPARC64
VII+ processors.
Note – To activate a hardware RAID boot volume after replacing the MBU of an
M3000 server with SPARC64 VII+ processors, see Appendix F.
■ Hot replacement
A target FRU is operated while the domain to which the FRU belongs is stopped.
Depending on the target FRU, there are two cases as follows:
■ Power supply unit/Fan unit: operated with XSCF commands.
■ Hard disk drive: operated directly, not by using XSCF commands.
■ Cold replacement
After all the domains are stopped and then the server is powered off, a FRU is
operated.
Note – Do not operate a target FRU while the OpenBoot PROM is running (the ok
prompt is displayed). After stopping the relevant domain (power-off) or starting the
Oracle Solaris OS, operate the target FRU.
TABLE 4-2 lists the access locations and applicable replacement methods for each
FRU.
TABLE 4-2 FRU Access Locations and Replacement Methods
FRU | Access Location | Cold Replacement | Hot Replacement | Active Replacement | Where to Find the Procedure
Hard disk drive backplane (HDDBP) | Top | Yes | No | No | Chapter 10
CD-RW/DVD-RW drive unit (DVDU) | Front/top | Yes | No | No | Chapter 11
Power supply unit (PSU) | Rear | Yes | Yes† | Yes† | Chapter 12
Fan unit (FAN_A) | Top | Yes | Yes† | Yes† | Chapter 13
Fan backplane (FANBP_B) | Top | Yes | No | No | Chapter 14
Operator panel (OPNL) | Front/top | Yes | No | No | Chapter 15
* The FRU is operated directly, without using XSCF commands.
† The FRU is operated with XSCF commands.
‡ ■ Setting up mirroring gives the hard disk drives a redundant configuration.
■ If a hard disk drive is a nonmirrored boot device, it must be replaced according to the cold replacement procedure.
■ If a hard disk drive is in a mirrored configuration, active replacement can be performed on the failed drive because the mirrored
hard disk drive continues to be online and functioning. The hard disk replacement procedure varies by the mirroring configuration
method. When it is configured with hardware RAID, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers
Administration Guide. When it is configured with software RAID, see the manuals for the software in use.
4.2.2 FRU Installation
For empty slots without hard disk drives or PCIe cards, the number of mounted
FRUs can be changed from 1 to the maximum number as required. There are some
components that are tentatively mounted physically in the server. If such a
component is a hard disk drive, it is called an HDD dummy, and if such a
component is a PCIe card, it is called a PCIe slot cover. These components are
necessary to protect the server from noise and to properly cool the server.
The same methods as those used for replacement are used for installation.
Note – When installing a new FRU in an empty slot, remove the HDD dummy or
PCIe slot cover and then install a new FRU.
TABLE 4-3 lists the access location and applicable installation methods for each FRU.
TABLE 4-3 FRU Access Locations and Installation Methods
FRU | Access Location | Cold Addition | Hot Addition | Active Addition | Where to Find the Procedure
Motherboard unit
Operator panel (OPNL) | Front/top | No | No | No
* The FRU is operated directly, without using XSCF commands.
† The FRU is operated with XSCF commands.
4.3 Active Replacement/Active Addition
In active replacement, the target FRU is operated while the Oracle Solaris OS of the
domain to which the FRU belongs is operating.
The target FRU is operated using Oracle Solaris OS commands or XSCF commands.
Because the power supply unit (PSU) and fan unit (FAN) do not belong to any
domain, they are operated by using XSCF commands regardless of the operating
state of the Oracle Solaris OS.
Active replacement has the following four stages:
■ “Releasing a FRU from a Domain” on page 4-5
■ “FRU Removal and Replacement” on page 4-6
■ “Configuring a FRU in a Domain” on page 4-6
■ “Verifying the Hardware Operation” on page 4-7
For active installation, see Section 4.3.3, “Configuring a FRU in a Domain” on
page 4-6 and Section 4.3.4, “Verifying the Hardware Operation” on page 4-7.
4.3.1 Releasing a FRU from a Domain
Note – ■ If a hard disk drive is a nonmirrored boot device, it must be replaced
according to the cold replacement procedure.
■ If a hard disk drive is in a mirrored configuration, active replacement can be
performed on the failed drive because the mirrored hard disk drive continues to be
online and functioning.
1. From the Oracle Solaris OS, type the cfgadm command to obtain the FRU
status.
# cfgadm -a
2. Stop the application from using the FRU and disconnect the FRU from the
Oracle Solaris OS.
The READY LED (green) of the HDD goes off.
3. Type the cfgadm -c unconfigure command to disconnect the FRU from
the Oracle Solaris OS.
# cfgadm -c unconfigure Ap_Id
4. Type the cfgadm -x command to confirm that the CHECK LED blinks.
# cfgadm -x led=fault,mode=blink Ap_Id
The Ap_Id is shown in the output of cfgadm (for example, disk#0).
The CHECK LED (amber) of the HDD blinks.
5. Type the cfgadm command to verify that the FRU has been disconnected.
# cfgadm -a
The disconnected FRU is displayed as being unconfigured.
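The check in step 5 can also be scripted. The following Python sketch assumes that cfgadm -a prints five whitespace-separated columns (Ap_Id, Type, Receptacle, Occupant, Condition). The rows in SAMPLE are hypothetical, because this manual does not show cfgadm output, and the helper names are illustrative.

```python
# Hypothetical cfgadm -a listing; the real Ap_Id values depend on the server.
SAMPLE = """\
Ap_Id                Type         Receptacle   Occupant     Condition
c0                   scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0       disk         connected    configured   unknown
c0::dsk/c0t1d0       disk         connected    unconfigured unknown
"""


def occupant_states(text):
    """Return {Ap_Id: occupant state} for each five-column data row."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the heading and any row that does not have exactly five columns.
        if len(fields) != 5 or fields[0] == "Ap_Id":
            continue
        ap_id, _type, _receptacle, occupant, _condition = fields
        states[ap_id] = occupant
    return states


def is_unconfigured(text, ap_id):
    """True when the given Ap_Id is listed with the occupant state 'unconfigured'."""
    return occupant_states(text).get(ap_id) == "unconfigured"


if __name__ == "__main__":
    print(is_unconfigured(SAMPLE, "c0::dsk/c0t1d0"))
```

The sketch skips rows that do not have exactly five columns, so verify it against the actual output of the server before relying on it.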
4.3.2 FRU Removal and Replacement
After the disconnection of a FRU from a domain, the same procedure as that for Hot
Replacement/Hot Addition applies. See Section 4.4, “Hot Replacement/Hot
Addition” on page 4-7.
4.3.3 Configuring a FRU in a Domain
This section explains the procedure for active replacement/installation by using
Oracle Solaris OS commands. For information on using the XSCF command, see
Section 4.4, “Hot Replacement/Hot Addition” on page 4-7.
1. Type the cfgadm -c configure command from the Oracle Solaris OS
to integrate the FRU into the Oracle Solaris OS.
# cfgadm -c configure Ap_Id
The Ap_Id is shown in the output of cfgadm (for example, disk#0).
2. Type the cfgadm -x command to confirm that the CHECK LED is off.
# cfgadm -x led=fault,mode=off Ap_Id
The Ap_Id is shown in the output of cfgadm (for example, disk#0).
The CHECK LED (amber) of the HDD is turned off.
3. Type the cfgadm command to verify that the FRU has been configured.
# cfgadm -a
The configured FRU is displayed as being configured.
The READY LED (green) of the HDD goes on.
4.3.4 Verifying the Hardware Operation
■ Confirm the status of the LED indicators.
For information on the LED status, see TABLE 2-3 and TABLE 2-5.
4.4 Hot Replacement/Hot Addition
In hot replacement, the target FRU is operated while the domain to which the FRU
belongs is stopped.
Depending on the target FRU, there are two cases as follows:
■ Power supply unit/Fan unit: operated with XSCF commands. See Section 4.4.1,
“FRU Removal and Replacement (Power supply unit/Fan unit)” on page 4-8.
■ Hard disk drive: operated directly, not by using XSCF commands.
For hot addition, do the same operation as that for hot replacement.
4.4.1 FRU Removal and Replacement (Power supply unit/Fan unit)
● Type the replacefru command from the XSCF Shell prompt.
The replacefru command is a menu-driven interactive command.
XSCF> replacefru
----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r
Please confirm the Check LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
The replacefru command automatically tests the status of the FRU after the
completion of removal and replacement.
Diagnostic tests for FAN_A#0 have started.
[This operation may take up to 3 minute(s)]
(progress scale reported in seconds)
0..... 30..done
----------------------------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced FRU.
FRU           Status
------------- --------
FAN_A#0       Normal
----------------------------------------------------------------------
The replacement of FAN_A#0 has completed normally.[f:finish] :f
----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
For details, see the manual pages of showhardconf.
2. Confirm the state of the status LEDs of the FRU.
For information on the LED status, see TABLE 2-3 and TABLE 2-5.
4.4.3 Verifying the Hardware Operation (Hard disk drive)
● Type the probe-scsi-all command to confirm that the new hard disk drive
has been installed.
4.5 Cold Replacement/Cold Addition
In cold replacement, all business operations must be stopped. When accessing the
server, power off the server and disconnect the power cord to ensure safety.
For cold addition, do the same operation as that for cold replacement.
4.5.1 Powering off the Server
This section explains how to power off the server.
4.5.1.1 Power-off by Using the XSCF Command
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. A user with platadm or fieldeng authority must log in to the XSCF Shell and
enter the poweroff command.
XSCF> poweroff -a
The following activity is executed when the poweroff command is used:
■ The Oracle Solaris OS shuts down completely.
■ The server is powered off and the server enters standby mode. (The power to
the XSCF unit remains on.)
For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
4. Verify that the POWER LED on the operator panel is off.
5. Disconnect all the power cords from the power outlets.
Caution – There is a risk of electrical failure if the power cords are not
disconnected. All the power cords must be disconnected to completely cut the power
to the server.
4.5.1.2 Power-off by Using the Operator Panel
1. Notify users that the server is being powered off.
2. Back up the system files and data to tape, if necessary.
3. Turn the mode switch on the operator panel to the Service position.
4. Press the power switch on the operator panel for 4 seconds or more.
5. Verify that the POWER LED on the operator panel is off.
6. Disconnect all the power cords from the power outlets.
Caution – There is a risk of electrical failure if the power cords are not
disconnected. All the power cords must be disconnected to completely cut the power
to the server.
4.5.2 FRU Removal and Replacement
In cold replacement, a FRU is removed and replaced while the power is turned off.
After the FRU replacement, power on the server.
4.5.3 Powering on the Server
This section explains how to power on the server.
4.5.3.1 Power-on by Using the XSCF Command
1. Verify that the server has enough power supply units to operate in the desired
configuration.
2. Connect all the power cords to power outlets.
3. Verify that the XSCF STANDBY LED on the operator panel is on.
4. Turn the mode switch on the operator panel to the desired mode position
(Locked or Service).
5. A user with platadm or fieldeng authority must log in to the XSCF Shell and
type the poweron command.
XSCF> poweron -a
Soon, the following activity is executed:
■ The POWER LED on the operator panel is turned on.
■ The power-on self-test (POST) is executed.
Then, the server is completely powered on.
Note – If automatic startup of the Oracle Solaris OS is specified, use the sendbreak
-d domain_id command of the XSCF Shell to display the ok prompt after the
console banner is displayed but before the system starts booting the Oracle
Solaris OS.
For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
4.5.3.2 Power-on by Using the Operator Panel
1. Verify that the server has enough power supply units to operate in the desired
configuration.
2. Connect all the power cords to power outlets.
3. Verify that the XSCF STANDBY LED on the operator panel is on.
4. Turn the mode switch on the operator panel to the desired mode position
(Locked or Service).
5. Press the power button on the operator panel.
Soon, the following activity is executed:
■ The POWER LED on the operator panel is turned on.
■ The power-on self-test (POST) is executed.
Then, the server is completely powered on.
Note – If automatic startup of the Oracle Solaris OS is specified, use the sendbreak
-d domain_id command of the XSCF Shell to display the ok prompt after the
console banner is displayed but before the system starts booting the Oracle
Solaris OS.
4.5.4 Verifying the Hardware Operation
1. In response to the ok prompt, press the ENTER key and enter ”#” (default value)
and then press the ”.” (period) key.
The domain console is switched to the XSCF console.
2. Use the showhardconf command to confirm that the new FRU has been
installed.
5. Type the probe-scsi-all command to confirm that the storage devices are
mounted.
{0} ok probe-scsi-all
/pci@0,600000/pci@0/pci@0/scsi@0
MPT Version 1.05, Firmware Version 1.24.00.00
Target 0
Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
SASAddress 500000e0197292c2 PhyNum 0
Target 1
Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
SASAddress 500000e019728f22 PhyNum 1
Target 2
Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
SASAddress 500000e019729002 PhyNum 2
Target 3
Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
SASAddress 500000e019729302 PhyNum 3
Target 4
Unit 0 Removable Read Only device MATSHITADVD-RAM UJ875AS 1000
SATA device PhyNum 4
{0} ok
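When the console session is logged, the probe-scsi-all listing can be checked against the expected number of drives. The following Python sketch assumes the layout shown in the example above, with a Target line followed by Unit lines. The name list_targets and the shortened SAMPLE are illustrative only.

```python
import re

# Shortened copy of the probe-scsi-all example above (SAMPLE is illustrative).
SAMPLE = """\
/pci@0,600000/pci@0/pci@0/scsi@0
MPT Version 1.05, Firmware Version 1.24.00.00
Target 0
Unit 0 Disk FUJITSU MAY2073RC 3701 143374738 Blocks, 73 GB
SASAddress 500000e0197292c2 PhyNum 0
Target 4
Unit 0 Removable Read Only device MATSHITADVD-RAM UJ875AS 1000
SATA device PhyNum 4
"""

TARGET_RE = re.compile(r"Target (\w+)$")
UNIT_RE = re.compile(r"Unit \d+\s+(Disk|Removable Read Only device)\s+(.*)$")


def list_targets(text):
    """Return one dict per Target line, with the kind and description of its unit."""
    targets = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        target = TARGET_RE.match(line)
        if target:
            current = {"target": target.group(1), "kind": None, "description": None}
            targets.append(current)
            continue
        unit = UNIT_RE.match(line)
        if unit and current is not None:
            # If a target reports several units, the last one seen is kept.
            current["kind"] = "disk" if unit.group(1) == "Disk" else "removable"
            current["description"] = unit.group(2)
    return targets


if __name__ == "__main__":
    disks = [t for t in list_targets(SAMPLE) if t["kind"] == "disk"]
    print(len(disks), "disk(s) found")
```

Comparing the number of entries whose kind is disk with the number of installed hard disk drives gives a quick confirmation that a new drive is recognized.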
6. Type the boot command to start the Oracle Solaris OS.
ok boot
CHAPTER 5
Internal Components Access
This chapter explains how to access internal components.
■ Section 5.1, “Sliding the Server Into and Out of the Equipment Rack”
■ Section 5.2, “Removing and Attaching the Top Cover”
■ Section 5.3, “Removing and Attaching the Air Duct”
■ Section 5.4, “Removing and Attaching the Fan Cover”
5.1 Sliding the Server Into and Out of the Equipment Rack
This section explains how to slide the server out from the equipment rack and how
to push it into the equipment rack.
For details of equipment racks, see the SPARC Enterprise Equipment Rack Mounting Guide.
5.1.1 Sliding the Server Out from the Equipment Rack
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – If cable management arms are not attached, remove the cable ties securing
the PCI cables to the rear of the server and slide the server out.
Caution – To ensure that you and bystanders are not exposed to harm and to
prevent damage to the system, observe the ESD safety precautions. See Section 1.1,
“ESD Precautions” on page 1-1.
1. If the equipment rack is supplied with a Quake-Resistant Options Kit or
stabilizer, be sure to install it.
2. Slide the server out as far as possible.
When the server is drawn out completely, the server is automatically locked in
the predetermined position.
3. Loosen the two screws at the front of the server (FIGURE 5-1).
FIGURE 5-1 Pulling the Server Out from an Equipment Rack
4. Slide the server out.
5.1.2 Sliding the Server into the Equipment Rack
1. Push the server back into the equipment rack.
2. Tighten the two screws at the front of the server to secure it in the equipment
rack (FIGURE 5-1).
3. If the equipment rack is supplied with a Quake-Resistant Options Kit or
stabilizer, return it to its original position.
5.2 Removing and Attaching the Top Cover
5.2.1 Removing the Top Cover
1. Before removing the top cover, pull out the server from the equipment rack.
For details see Section 5.1, “Sliding the Server Into and Out of the Equipment
Rack” on page 5-1.
2. Loosen the three screws at the top rear of the server.
3. To remove the top cover, slide it toward the rear side and raise it (FIGURE 5-2).
FIGURE 5-2 Removing the Top Cover
5.2.2 Attaching the Top Cover
1. Align the top cover.
2. Tighten the three screws at the top rear of the server to secure the top cover in
the predetermined position.
3. Push the server back into the equipment rack.
For details, see Section 5.1.2, “Sliding the Server into the Equipment Rack” on
page 5-3.
5.3 Removing and Attaching the Air Duct
5.3.1 Removing the Air Duct
Caution – Before the air duct is removed, the top cover must be removed. For
details, see Section 5.2, “Removing and Attaching the Top Cover” on page 5-3.
■ Hold the air duct and lift it upwards.
FIGURE 5-3 Removing the Air Duct
5.3.2 Attaching the Air Duct
1. Set the tab at the front of the air duct in place and then lower the air duct
(FIGURE 5-4).
Make sure that the cables do not interfere with each other.
FIGURE 5-4 Attaching the Air Duct
2. Attach the top cover.
For details, see Section 5.2.2, “Attaching the Top Cover” on page 5-4.
5.4 Removing and Attaching the Fan Cover
5.4.1 Removing the Fan Cover
Caution – Before the fan cover is removed, the server must be pulled out from the
equipment rack. For the procedure for pulling the server out from the equipment
rack, see Section 5.1, “Sliding the Server Into and Out of the Equipment Rack” on
page 5-1.
1. Loosen one screw at the right of the fan cover.
2. Raise the right end of the fan cover and remove it (FIGURE 5-5).
FIGURE 5-5 Removing the Fan Cover
5.4.2 Attaching the Fan Cover
1. Align the tab on the left end of the fan cover in the predetermined position
and then secure the fan cover in position.
2. Tighten the one screw on the right side of the fan cover.
3. Push the server back into the equipment rack.
For details, see Section 5.1.2, “Sliding the Server into the Equipment Rack” on
page 5-3.
CHAPTER 6
Motherboard Unit Replacement
This chapter explains how to replace the motherboard unit.
■ Section 6.1, “Accessing the Motherboard Unit”
■ Section 6.2, “Removing the Motherboard Unit”
■ Section 6.3, “Mounting the Motherboard Unit”
■ Section 6.4, “Reassembling the Server”
The motherboard unit is a cold replacement component. The server must be
completely powered off, the power cords must be disconnected, and all DIMMs and
PCIe cards must be removed, before the motherboard unit is replaced. See Chapter 7,
Replacement and Installation of Memory and Chapter 8, Replacement and
Installation of PCIe Cards.
Note – There are two types of motherboard units for the M3000 server: the
motherboard unit mounted with the CPU consisting of two-core processors and the
motherboard unit mounted with the CPU consisting of four-core processors. When
replacing a motherboard unit, the replacement unit must contain the same type of
processor. For example, do not replace a motherboard unit that has two-core
processors with one that contains four-core processors.
Note – When replacing the motherboard unit, use the same type of FRU as the FRU
mounted on the previous motherboard unit. If you use a different type of FRU, it
may not work properly.
Note – Do not replace the motherboard unit and the operator panel at the same
time. Otherwise, the system may not operate correctly. Use the showhardconf
command or showstatus command to verify that the replacement unit of the first
replaced FRU is fully operational, before replacing the other FRU.
Note – When replacing the motherboard unit, attach connection destination labels
to both the LAN cable and the UPS cable connected to the XSCF unit before removing
these cables.
Note – When mounting the motherboard unit, connect the LAN cable and UPS
cable to the XSCF unit.
Note – After the replacement of the motherboard unit is completed, the system
clock must be reset. For details of the setting method, see the SPARC Enterprise
M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
Note – After the replacement of the motherboard unit is completed, the versions of
the XCP and Oracle Solaris OS must be checked. For details of version number
checking and other such tasks, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.
Note – To activate a hardware RAID boot volume after replacing the MBU of an
M3000 server with SPARC64 VII+ processors, see Appendix F.
Because the XSCF unit is mounted on the motherboard unit, it cannot be replaced
separately. For details of the XSCF unit, see Appendix B.2.4.
FIGURE 6-1 indicates the location of the motherboard unit.
FIGURE 6-1 Motherboard Unit Location
FIGURE 6-2 indicates the locations of DIMMs, PCIe cards, and shutter unit.
FIGURE 6-2 Locations of DIMMs, PCIe Cards, and Shutter Unit
Location Number | Component
1 | Memory (DIMM)
2 | PCIe card
3 | Shutter unit
6.1 Accessing the Motherboard Unit
Caution – There is a risk of electrical failure if the power cords are not
disconnected. All the power cords must be disconnected to completely cut the power
to the server.
Caution – To ensure that you and bystanders are not exposed to harm and to
prevent damage to the system, observe the ESD safety precautions. See Section 1.1,
“ESD Precautions” on page 1-1.
1. Power off the server.
This procedure includes the steps of turning the mode switch on the operator
panel to the Service position, verifying that the POWER LED is off, and
disconnecting the power cord. For details, see Section 4.5.1, “Powering off the
Server” on page 4-12.
2. Remove all the cables from the external interface block on the rear panel.
3. Pull the power supply unit out several centimeters to the rear side.
4. Slide the server out from the equipment rack.
For details, see Section 5.1.1, “Sliding the Server Out from the Equipment Rack”
on page 5-1.
Caution – To prevent the equipment rack from tipping over, you must deploy the
antitilt feature, if applicable, before you slide the server out of the equipment rack.
Note – When the cable management arm is not mounted, remove the cable ties that
secure the PCI cable to the rear of the server, and slide the server out.
5. Remove the top cover.
For details, see Section 5.2.1, “Removing the Top Cover” on page 5-3.
6. Remove the PCIe cards.
For details, see Section 8.2, “Removing a PCIe Card” on page 8-4.
7. Remove the air duct.
For details, see Section 5.3.1, “Removing the Air Duct” on page 5-4.
8. Disconnect all the cables from the motherboard unit.
9. Loosen the two screws securing the shutter unit, and slide the securing bracket
on the power supply unit.
10. Remove the shutter unit.
FIGURE 6-3 Removing the Shutter Unit