intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate,
broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering,
disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us
in writing.
If this is software or related software documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the
following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are
"commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific
supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set
forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR
52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any
inherently dangerous applications, including applications which may create a risk of personal injury. If you use this software or hardware in dangerous
applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle
Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or
registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of
Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle
Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and
services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party
content, products, or services.
Contents
Using This Documentation ix
Identifying Components 1
Illustrated Parts Breakdown 1
Front and Rear Panel Components 3
Detecting and Managing Faults 5
Diagnostics Overview 5
Diagnostics Process 7
Diagnostics LEDs 10
Managing Faults (Oracle ILOM) 11
Oracle ILOM Troubleshooting Overview 12
Fault Management 12
Fault Clearing 13
Oracle Solaris Fault Manager Commands in Oracle ILOM 14
Drive Faults 14
▼ Access the SP (Oracle ILOM) 15
▼ Display FRU Information (show Command) 17
▼ Check for Faults (show faulty Command) 18
▼ Check for Faults (fmadm faulty Command) 20
▼ Clear Faults (clear_fault_action Property) 21
Service-Related Oracle ILOM Commands 22
Interpreting Log Files and System Messages 23
▼ Check the Message Buffer (dmesg Command) 24
▼ View System Message Log Files 24
▼ List FRU Status (prtdiag Command) 25
Checking if Oracle VTS Software Is Installed 27
Oracle VTS Overview 27
▼ Check if Oracle VTS Software Is Installed 28
Managing Faults (POST) 29
POST Overview 29
Oracle ILOM Properties That Affect POST Behavior 30
▼ Configure POST 33
▼ Run POST With Maximum Testing 35
▼ Interpret POST Fault Messages 37
▼ Clear POST-Detected Faults 37
POST Output Reference 39
Managing Faults (PSH) 41
PSH Overview 41
▼ Check for PSH-Detected Faults 42
▼ Clear PSH-Detected Faults 44
Managing Components (ASR) 45
ASR Overview 46
▼ Display System Components 47
▼ Disable System Components 48
▼ Enable System Components 49
Preparing for Service 51
Safety Information 51
Safety Symbols 52
ESD Measures 52
Netra SPARC T4-1B Server Module Service Manual • June 2012
Antistatic Wrist Strap Use 53
Antistatic Mat 53
Handling Precautions 53
Tools Needed for Service 54
▼ Find the Modular System Chassis Serial Number 54
▼ Find the Server Module Serial Number 55
▼ Locate the Server Module 56
Preparing the Server Module for Removal 56
▼ Shut Down the OS and Host (Commands) 57
▼ Shut Down the OS and Host (Power Button – Graceful) 59
▼ Shut Down the OS and Host (Emergency Shutdown) 59
▼ Set the Server Module to a Ready-to-Remove State 60
▼ Remove the Server Module From the Modular System 61
▼ Remove the Cover 63
Servicing Drives 65
Drive Configuration 66
Drive LEDs 67
Drive Hot-Plugging Guidelines 68
▼ Locate a Faulty Drive 68
▼ Remove a Drive 69
▼ Remove a Drive Filler 70
▼ Install a Drive 71
▼ Install a Drive Filler 73
▼ Verify Drive Functionality 74
Servicing Memory 75
Memory Faults 75
DIMM Configuration 77
DIMM Handling Precautions 79
▼ Locate a Faulty DIMM 79
▼ Remove a DIMM 80
▼ Install a DIMM 81
▼ Clear the Fault and Verify the Functionality of the Replacement DIMM 82
▼ Verify DIMM Functionality 86
Servicing the REM 89
▼ Remove a REM 89
▼ Install a REM 90
Servicing the FEM 93
▼ Remove a FEM 93
▼ Install a FEM 94
Servicing the SP Card 97
▼ Remove the SP Card 97
▼ Install the SP Card 98
Servicing the ID PROM 101
▼ Remove the ID PROM 101
▼ Install the ID PROM 102
▼ Verify the ID PROM 103
Servicing a USB Flash Drive 105
▼ Remove a USB Flash Drive 105
▼ Install a USB Flash Drive 106
Servicing the Battery 109
▼ Replace the Battery 109
Replacing the Server Module Enclosure Assembly (Motherboard) 113
▼ Transfer Components to Another Enclosure Assembly 113
Returning the Server Module to Operation 117
▼ Replace the Cover 117
▼ Install the Server Module Into the Modular System 118
▼ Power On the Host (Oracle ILOM) 120
▼ Power On the Host (Power Button) 120
Glossary 123
Index 129
Using This Documentation
This service manual explains how to identify faults, replace parts, and add additional
options in Oracle’s Netra SPARC T4-1B server module.
This document is written for technicians, system administrators, authorized service
providers, and users who have experience troubleshooting and replacing hardware.
■ “Related Documentation” on page ix
■ “Feedback” on page x
■ “Support and Accessibility” on page x
Related Documentation
Documentation          Links
All Oracle products    http://www.oracle.com/documentation
Detecting and Managing Faults
These topics explain how to use various diagnostic tools to monitor server module
status and troubleshoot faults in the server module.
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Diagnostics LEDs” on page 10
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Checking if Oracle VTS Software Is Installed” on page 27
■ “Managing Faults (POST)” on page 29
■ “Managing Faults (PSH)” on page 41
■ “Managing Components (ASR)” on page 45
Related Information
■ “Preparing for Service” on page 51
Diagnostics Overview
You can use a variety of diagnostic tools, commands, and indicators to monitor and
troubleshoot a server module:
■ LEDs – Provide a quick visual notification of the status of the server module and
of some of the FRUs.
■ Oracle ILOM – This firmware runs on the SP. In addition to providing the
interface between the hardware and OS, Oracle ILOM also tracks and reports the
health of key server module components. Oracle ILOM works closely with POST
and PSH technology to keep the server module running even when there is a
faulty component. You can log in to multiple SP accounts simultaneously and
have separate Oracle ILOM shell commands executing concurrently under each
account.
Note – Unless indicated otherwise, all examples of interaction with the SP are
depicted with Oracle ILOM shell commands.
■ POST – Performs diagnostics on server module components upon reset to ensure
the integrity of those components. POST can be configured and works with Oracle
ILOM to take faulty components offline if needed.
■ PSH – This Oracle Solaris OS technology continuously monitors the health of the
CPU, memory, and other components, and works with Oracle ILOM to take a
faulty component offline if needed. The PSH technology enables server modules to
accurately predict component failures and mitigate many serious problems before
they occur.
■ Log files and command interface – Provide the standard Oracle Solaris OS log
files and investigative commands that can be accessed and displayed on the
device of your choice.
■ Oracle VTS (formerly SunVTS) – An application that exercises the server
module, provides hardware validation, and discloses possible faulty components
with recommendations for repair.
The LEDs, Oracle ILOM, PSH, and many of the log files and console messages are
integrated. For example, when the Oracle Solaris OS detects a fault, it displays the
fault, logs it, and passes information to Oracle ILOM, where it is logged. Depending
on the fault, one or more LEDs might also be illuminated.
The diagnostic flowchart in “Diagnostics Process” on page 7 illustrates an approach
for using the server module diagnostics to identify a faulty FRU. The diagnostics you
use, and the order in which you use them, depend on the nature of the problem you
are troubleshooting. Therefore, you might perform some actions and not others.
Related Information
■ SPARC and Netra SPARC T4 Series Servers Administration Guide
■ “Diagnostics Process” on page 7
■ “Diagnostics LEDs” on page 10
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (PSH)” on page 41
■ “Managing Faults (POST)” on page 29
■ “Managing Components (ASR)” on page 45
■ “Checking if Oracle VTS Software Is Installed” on page 27
Diagnostics Process
Use the flowchart to understand how to use the server module’s diagnostic tools to
manage faults. Also see the table that follows this flowchart.
Flowchart No. 1
Diagnostic Action: Check the Power OK LED.
Possible Outcome: If this LED is not lit, check the power source and ensure that
the server module is properly installed in the modular system chassis.
Additional Information:
• “Diagnostics LEDs” on page 10

Flowchart No. 2
Diagnostic Action: Run the Oracle ILOM show faulty command to check for faults.
Possible Outcome: This command displays the following kinds of faults:
• Environmental and configuration
• PSH-detected
• POST-detected
Faulty FRUs are identified in fault messages using the FRU name.
All Oracle ILOM detected fault messages begin with the characters SPT.
Additional Information:
• “Service-Related Oracle ILOM Commands” on page 22
• “Check for Faults (show faulty Command)” on page 18

Flowchart No. 3
Diagnostic Action: Check the Oracle Solaris log files for fault information.
Possible Outcome: The Oracle Solaris message buffer and log files record system
events, and provide information about faults.
• If system messages indicate a faulty device, replace the FRU.
• For more diagnostic information, review the Oracle VTS report. See number 4.
Additional Information:
• “Interpreting Log Files and System Messages” on page 23

Flowchart No. 4
Diagnostic Action: Run the Oracle VTS software.
Possible Outcome:
• If Oracle VTS reports a faulty device, replace it.
• If Oracle VTS does not report a faulty device, run POST. See number 5.
Additional Information:
• “Checking if Oracle VTS Software Is Installed” on page 27

Flowchart No. 5
Diagnostic Action: Run POST.
Possible Outcome: POST performs basic tests of the server module components and
reports faulty FRUs.
Additional Information:
• “Managing Faults (POST)” on page 29
• “Oracle ILOM Properties That Affect POST Behavior” on page 30

Flowchart No. 6
Diagnostic Action: Check if the fault is environmental.
Possible Outcome: Determine if the fault is an environmental fault or a
configuration fault.
If the fault listed by the show faulty command displays a temperature or voltage
fault, then the fault is an environmental fault. Environmental faults can be
caused by faulty FRUs, or by environmental conditions such as when computer room
ambient temperature is too high, or airflow is blocked. When the environmental
condition is corrected, the fault automatically clears.
Additional Information:
• “Check for Faults (show faulty Command)” on page 18

Flowchart No. 7
Diagnostic Action: Determine if the fault was detected by PSH.
Possible Outcome: If the fault message does not begin with the characters SPT,
the fault was detected by the PSH feature.
After the FRU is replaced, perform the procedure to clear PSH detected faults.
Additional Information:
• “Managing Faults (PSH)” on page 41
• “Clear PSH-Detected Faults” on page 44

Flowchart No. 8
Diagnostic Action: Determine if the fault was detected by POST.
Possible Outcome: POST performs basic tests of the server module components and
reports faulty FRUs. When POST detects a faulty FRU, POST logs the fault and if
possible, takes the FRU offline. POST detected FRUs display the following text
in the fault message:
Forced fail reason
where reason is the name of the power-on routine that detected the failure.
Additional Information:
• “Managing Faults (POST)” on page 29
• “Clear POST-Detected Faults” on page 37
• “POST Output Reference” on page 39

Flowchart No. 9
Diagnostic Action: Contact technical support.
Possible Outcome: The majority of hardware faults are detected by the server
module’s diagnostics. In rare cases a problem might require additional
troubleshooting. If you are unable to determine the cause of the problem,
contact Oracle Support or go to: http://support.oracle.com
Additional Information:
• “Support and Accessibility” on page x

Related Information
■ SPARC and Netra SPARC T4 Series Servers Administration Guide
■ “Diagnostics Overview” on page 5
■ “Diagnostics LEDs” on page 10
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (PSH)” on page 41
■ “Managing Faults (POST)” on page 29
■ “Managing Components (ASR)” on page 45
■ “Checking if Oracle VTS Software Is Installed” on page 27
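The textual cues in the diagnostics process table (the SPT prefix and the Forced fail text) can be sketched as a small classifier. This is a minimal illustration, not an Oracle tool: the function name classify_fault_source, the sample message strings, and the order in which the two cues are checked are all assumptions made for this example.

```python
def classify_fault_source(message: str) -> str:
    """Guess which diagnostic reported a fault from the text of its message.

    Cues taken from the diagnostics process table:
      - "Forced fail" in the message  -> detected by POST
      - message begins with "SPT"     -> detected by Oracle ILOM
      - anything else                 -> detected by PSH
    Checking "Forced fail" first is an assumption of this sketch.
    """
    text = message.strip()
    if "Forced fail" in text:
        return "POST"
    if text.startswith("SPT"):
        return "Oracle ILOM"
    return "PSH"


if __name__ == "__main__":
    # Hypothetical message strings, for illustration only.
    for sample in ("SPT-0000-XX", "Forced fail (POST)", "EXAMPLE-0000-YY"):
        print(sample, "->", classify_fault_source(sample))
```

A classifier like this only narrows down which procedure to follow next. The show faulty output remains the authoritative record of the fault.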
Diagnostics LEDs
The server module has LEDs on the front panel and on the drives. The LEDs conform
to ANSI SIS. For the locations of these LEDs, see “Front and Rear Panel
Components” on page 3, and “Drive LEDs” on page 67.
LED or Button: Locator LED and button
Color: White
Description: You can turn on the Locator LED to identify a particular server
module. When on, the LED blinks rapidly. There are two methods for turning a
Locator LED on:
• Issuing the Oracle ILOM command set /SYS/LOCATE value=Fast_Blink.
• Pressing the Locator button.
The Locator LED functions as the physical presence switch.

LED or Button: Ready to Remove LED
Color: Blue
Description: Steady state – If LED is off, it is not safe to remove the server
module from the modular system chassis. You must use Oracle ILOM to shut down
the server module and put the blade into ready to remove state before this LED
is on.

LED or Button: Service Action Required LED
Color: Amber
Description: Indicates that service is required. POST and Oracle ILOM are two
diagnostics tools that can detect a fault or failure resulting in this
indication. Also, faults detected by PSH can result in Oracle ILOM lighting
this LED.
The Oracle ILOM show faulty command provides details about any faults that
cause this indicator to light.
Under some fault conditions, individual component fault LEDs are turned on in
addition to the Service Action Required LED.

LED or Button: Power OK LED
Color: Green
Description: Indicates the following conditions:
• Off – Host is not running in its normal state. Host power might be off. The
SP might be running.
• Steady on – Host is powered on and is running in its normal operating state.
No service actions are required.
• Fast blink – Host is running in standby mode and can be quickly returned to
full function.
• Slow blink – A normal, but transitory activity is taking place. Slow blinking
might indicate that diagnostics are running, or the host is booting.

LED or Button: On/Standby button
Color: n/a
Description: The recessed Power button toggles the host on or off.
• Press once to turn the host on.
• Press once to shut the host down to a standby state.
• Press and hold for 4 seconds to perform an emergency shutdown.

LED or Button: Drive Ready to Remove LED
Color: Blue
Description: Indicates that the drive can be removed during a hot-plug
operation.

LED or Button: Drive Service Action Required LED
Color: Amber
Description: Indicates that the drive has experienced a fault condition.

LED or Button: Drive OK/Activity LED
Color: Green
Description: Indicates the following drive status:
• On – Drive is idle and available for use.
• Off – Read or write activity is in progress.
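
The Power OK LED states listed in the table above lend themselves to a simple lookup, for example in a checklist script used during a site visit. The sketch below is only an illustration: the dictionary name POWER_OK_LED and the function describe_power_ok_led are hypothetical, and the state is assumed to be typed in by the person observing the LED.

```python
# Power OK LED states and meanings, transcribed from the LED table.
POWER_OK_LED = {
    "off": "Host is not running in its normal state. Host power might be off. "
           "The SP might be running.",
    "steady on": "Host is powered on and is running in its normal operating "
                 "state. No service actions are required.",
    "fast blink": "Host is running in standby mode and can be quickly "
                  "returned to full function.",
    "slow blink": "A normal, but transitory activity is taking place. "
                  "Diagnostics might be running, or the host might be booting.",
}


def describe_power_ok_led(state: str) -> str:
    """Return the documented meaning of an observed Power OK LED state."""
    key = " ".join(state.lower().split())  # normalize case and spacing
    if key not in POWER_OK_LED:
        raise ValueError(f"unknown Power OK LED state: {state!r}")
    return POWER_OK_LED[key]


if __name__ == "__main__":
    print(describe_power_ok_led("Fast blink"))
```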
Related Information
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (PSH)” on page 41
■ “Managing Faults (POST)” on page 29
■ “Managing Components (ASR)” on page 45
■ “Checking if Oracle VTS Software Is Installed” on page 27
Managing Faults (Oracle ILOM)
These topics explain how to use Oracle ILOM, the SP firmware, to diagnose faults
and verify successful repairs.
■ “Oracle ILOM Troubleshooting Overview” on page 12
■ “Access the SP (Oracle ILOM)” on page 15
■ “Display FRU Information (show Command)” on page 17
■ “Check for Faults (show faulty Command)” on page 18
■ “Check for Faults (fmadm faulty Command)” on page 20
■ “Clear Faults (clear_fault_action Property)” on page 21
■ “Service-Related Oracle ILOM Commands” on page 22
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
Related Information
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (PSH)” on page 41
■ “Managing Faults (POST)” on page 29
■ “Managing Components (ASR)” on page 45
■ “Checking if Oracle VTS Software Is Installed” on page 27
■ “POST Overview” on page 29
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
Oracle ILOM Troubleshooting Overview
Oracle ILOM enables you to remotely run diagnostics, such as POST, that would
otherwise require physical proximity to the server module. You can also configure
Oracle ILOM to send email alerts of hardware failures, hardware warnings, and other
events related to the server module or Oracle ILOM.
The SP runs independently of the server module, using the server module’s standby
power. Therefore, Oracle ILOM continues to function when the server module OS
goes offline or when the server module is powered off.
Fault Management
Error conditions detected by Oracle ILOM, POST, and PSH are forwarded to Oracle
ILOM for fault handling.
The Oracle ILOM fault manager evaluates error messages it receives to determine
whether the condition being reported should be classified as an alert or a fault.
■ Alerts – When the fault manager determines that an error condition being
reported does not indicate a faulty FRU, the fault manager classifies the error as
an alert.
Alert conditions are often caused by environmental conditions, such as computer
room temperature, which might improve over time. Conditions might also be
caused by a configuration error, such as the wrong DIMM type being installed.
If the conditions responsible for the alert go away, the fault manager will detect
the change and will stop logging alerts for that condition.
■ Faults – When the fault manager determines that a particular FRU has an error
condition that is permanent, that error is classified as a fault. This condition causes
the Service Action Required LEDs to be turned on, the FRUID PROMs updated,
and a fault message logged. If the FRU has status LEDs, the Service Action
Required LED for that FRU will also be turned on.
You must replace a FRU identified as having a fault condition.
In the event of a system fault, Oracle ILOM ensures that the Service Action Required
LED is turned on, FRUID PROMs are updated, the fault is logged, and alerts are
displayed. Faulty FRUs are identified in fault messages using the FRU name.
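
The alert-versus-fault distinction described above can be summarized in a few lines of code. The following is a sketch under stated assumptions, not the Oracle ILOM fault manager's actual logic: the ErrorReport fields, the function names, and the reduction of the decision to two yes/no inputs are all hypothetical simplifications for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorReport:
    """Hypothetical summary of an error condition (not an Oracle ILOM structure)."""
    fru: str                    # FRU name used in messages
    indicates_faulty_fru: bool  # does the condition point at the FRU itself?
    permanent: bool             # is the condition expected to persist?


def classify(report: ErrorReport) -> str:
    """Return "fault" for a permanent error on a specific FRU, otherwise "alert"."""
    if report.indicates_faulty_fru and report.permanent:
        return "fault"
    return "alert"


def responses(report: ErrorReport) -> List[str]:
    """List the responses that this section associates with each classification."""
    if classify(report) == "fault":
        return [
            "turn on the Service Action Required LED",
            "update the FRUID PROM",
            "log a fault message naming " + report.fru,
        ]
    return ["log an alert until the condition goes away"]


if __name__ == "__main__":
    hot_room = ErrorReport("/SYS/MB", indicates_faulty_fru=False, permanent=False)
    bad_dimm = ErrorReport("/SYS/MB/CMP0/BOB0/CH0/D0", True, True)
    print(classify(hot_room), responses(hot_room))
    print(classify(bad_dimm), responses(bad_dimm))
```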
Fault Clearing
The SP can detect when a fault is no longer present. When this happens, it clears the
fault state in the FRU PROM and extinguishes the Service Action Required LED.
A fault condition can be removed in two ways:
■ Unaided recovery – Faults caused by environmental conditions can clear
automatically if the condition responsible for the fault is no longer present.
■ Repaired fault – When a fault is repaired by human intervention, such as a FRU
replacement, the SP will usually detect the repair automatically and extinguish the
Service Action Required LED. If the SP does not perform these actions, you must
perform these tasks manually by setting the Oracle ILOM component_state or
fault_state of the faulted component. The procedure for clearing faults
manually is described in “Clear Faults (clear_fault_action Property)” on page 21.
Many environmental faults can automatically recover. For example, a temporary
condition might cause the computer room temperature to rise above the maximum
threshold, producing an overtemperature fault in the server module. If the
computer room temperature then returns to the normal range and the server
module’s internal temperature also drops back to an acceptable level, the SP will
detect the new fault-free condition. The SP will extinguish the Service Action
Required LED and clear the fault state from the FRU PROM.
The SP can automatically detect when a FRU is removed. In many cases, the SP does
this even if you remove the FRU while the SP is not running. This function enables
Oracle ILOM to sense that a fault, diagnosed to a specific FRU, has been repaired.
Note – Oracle ILOM does not automatically detect drive replacement. Oracle ILOM
does not automatically clear voltage sensor faults.
Oracle Solaris Fault Manager Commands in Oracle ILOM
The Oracle ILOM CLI includes a feature that enables you to access Oracle Solaris
fault manager commands, such as fmadm, fmdump, and fmstat, from within the
Oracle ILOM shell. This feature is referred to as the Oracle ILOM faultmgmt shell.
Drive Faults
PSH does not monitor drives for faults. As a result, the SP does not recognize drive
faults and will not light the fault LEDs on either the server module or the drive itself.
Use the Oracle Solaris message files to view drive faults. See “View System Message
Log Files” on page 24.
Related Information
■ Oracle ILOM 3.0 documentation
■ SPARC and Netra SPARC T4 Series Servers Administration Guide
■ “Oracle ILOM Troubleshooting Overview” on page 12
■ “Access the SP (Oracle ILOM)” on page 15
■ “Display FRU Information (show Command)” on page 17
■ “Check for Faults (show faulty Command)” on page 18
■ “Check for Faults (fmadm faulty Command)” on page 20
■ “Clear Faults (clear_fault_action Property)” on page 21
■ “Service-Related Oracle ILOM Commands” on page 22
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
▼ Access the SP (Oracle ILOM)
You can access the server module’s SP either directly or through the CMM of the
modular system. You can manage the server module through the Oracle ILOM CLI
or through the Oracle ILOM web interface.
Use this procedure to log into the CMM to access the SP and to use the Oracle ILOM
CLI.
For alternative methods to access the server module SP, refer to the Server Module Installation Guide.
1. Establish connectivity to the CMM using one of the following methods:
■ SER MGT port – Connect a terminal device (such as an ASCII terminal or
laptop with terminal emulation) to the CMM SER MGT port.
Set up your terminal device for 9600 baud, 8 bit, no parity, 1 stop bit, and no
handshaking, and use a null-modem configuration (transmit and receive signals
crossed over to enable DTE-to-DTE communication).
■ NET MGT port – Connect this CMM port to your Ethernet network. On the
CMM, this connector is labeled NET MGT. This port requires an IP address. By
default, this port uses DHCP to obtain an IP address, or you can assign a static
IP address.
Note – Alternatively, you can connect directly to the server module SP by using a
dongle cable to connect to the server module SER MGT or NET MGT ports. For more
information, refer to the Netra SPARC T4-1B Server Module Installation Guide.
2. Decide which interface to use.
■ Oracle ILOM CLI (default) – Most of the commands and examples in this
document use this interface. The default login account is root with a password
of changeme.
■ Oracle ILOM web interface – Can be used when you access the SP through the
NET MGT port and have a browser. Refer to the Oracle ILOM 3.0
documentation for details. This interface is not referenced in this document.
3. Open an SSH session to log into Oracle ILOM on the CMM.
The default Oracle ILOM login account is root with a default password of
changeme. The password might be different in your environment.
ssh root@CMM_IP_Address
Password:
Waiting for daemons to initialize...
Daemons ready
Oracle (R) Integrated Lights Out Manager
Version 3.0
Copyright (c) 2011, Oracle and/or its affiliates, Inc. All rights reserved.
Warning: password is set to factory default.
->
The Oracle ILOM prompt (->) indicates that you are accessing the Oracle ILOM
CLI.
4. Navigate to the server module.
-> cd /CH/BLn/SP/cli
Replace n with an integer that identifies the target server module (the slot in
which the server module is installed).
5. Start the server module SP Oracle ILOM CLI.
-> start
Are you sure you want to start /CH/BL0/SP/cli? y
start: Connecting to /CH/BL0/SP/cli using Single Sign On
6. Perform Oracle ILOM commands that provide the diagnostic information you
need.
These commands are commonly used for fault management:
■ show command – Displays information about individual FRUs.
See “Display FRU Information (show Command)” on page 17.
■ show faulty command – Displays environmental, POST-detected, and
PSH-detected faults.
See “Check for Faults (show faulty Command)” on page 18.
Note – You can use fmadm faulty in the Oracle ILOM faultmgmt shell as an
alternative to show faulty.
■ clear_fault_action property of the set command – Manually clears
PSH-detected faults.
See “Clear Faults (clear_fault_action Property)” on page 21.
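
Steps 3 through 5 of this procedure follow a fixed pattern that depends only on the CMM address and the slot number. The sketch below builds that command sequence as text. It is an illustration only: the function names sp_cli_target and access_commands are hypothetical, the address 192.0.2.10 is a reserved documentation address used as a placeholder, and the code does not open a connection or handle the password and confirmation prompts.

```python
from typing import List


def sp_cli_target(slot: int) -> str:
    """Return the CMM path of the SP CLI for the server module in the given slot."""
    if slot < 0:
        raise ValueError("slot must be a non-negative integer")
    return f"/CH/BL{slot}/SP/cli"


def access_commands(cmm_address: str, slot: int) -> List[str]:
    """List the commands from steps 3 through 5, in the order they are typed."""
    return [
        f"ssh root@{cmm_address}",    # step 3: typed in a local terminal
        f"cd {sp_cli_target(slot)}",  # step 4: typed at the CMM -> prompt
        "start",                      # step 5: typed at the CMM -> prompt
    ]


if __name__ == "__main__":
    # Placeholder CMM address and slot number, for illustration only.
    for command in access_commands("192.0.2.10", 0):
        print(command)
```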
Related Information
■ Oracle ILOM 3.0 documentation
■ “Display FRU Information (show Command)” on page 17
■ “Check for Faults (show faulty Command)” on page 18
■ “Check for Faults (fmadm faulty Command)” on page 20
■ “Clear Faults (clear_fault_action Property)” on page 21
■ “Service-Related Oracle ILOM Commands” on page 22
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
▼ Display FRU Information (show Command)
Use the Oracle ILOM show command to display information about individual FRUs.
● At the Oracle ILOM prompt, type the show command.
In the following example, the show command displays information about a
memory module.
■ Example of the show faulty command displaying a fault that was detected by
POST. These kinds of faults are identified by the message Forced fail reason,
where reason is the name of the power-on routine that detected the fault. For
more information, see “Managing Faults (POST)” on page 29.
3. Type the exit command when you are finished using the Oracle ILOM
faultmgmt shell.
faultmgmtsp> exit
Related Information
■ “Diagnostics Process” on page 7
■ “Access the SP (Oracle ILOM)” on page 15
■ “Display FRU Information (show Command)” on page 17
■ “Check for Faults (show faulty Command)” on page 18
■ “Clear Faults (clear_fault_action Property)” on page 21
■ “Service-Related Oracle ILOM Commands” on page 22
▼ Clear Faults (clear_fault_action Property)
Use the clear_fault_action property with the set command to manually clear
PSH-detected faults for a FRU.
If Oracle ILOM detects a FRU replacement, it will automatically clear the fault. For
PSH-diagnosed faults, if the replacement of the FRU is detected by the SP or the fault
is manually cleared on the host, the fault will also be cleared from Oracle ILOM. In
such cases, you typically do not have to clear the fault manually.
Note – This procedure clears the fault from the SP but not from the host. If the fault
persists in the host, clear it manually as described in “Clear PSH-Detected Faults” on
page 44.
● At the Oracle ILOM prompt, use the set command with the
clear_fault_action=True property.
For example:
-> set /SYS/MB/CMP0/BOB0/CH0/D0 clear_fault_action=True
Are you sure you want to clear /SYS/MB/CMP0/BOB0/CH0/D0 (y/n)? y
Set ’clear_fault_action’ to ’true’
Related Information
■ “Diagnostics Process” on page 7
■ “Access the SP (Oracle ILOM)” on page 15
■ “Display FRU Information (show Command)” on page 17
■ “Check for Faults (show faulty Command)” on page 18
■ “Check for Faults (fmadm faulty Command)” on page 20
■ “Service-Related Oracle ILOM Commands” on page 22
Service-Related Oracle ILOM Commands
These are the Oracle ILOM shell commands most frequently used when performing
service-related tasks.
Oracle ILOM Command                          Description
help [command]                               Displays a list of all available commands with syntax
                                             and descriptions. Specifying a command name as an
                                             option displays help for that command.
set /HOST send_break_action=break            Takes the host server module from the OS to either
                                             kmdb or OBP (equivalent to a Stop-A), depending on
                                             the mode Oracle Solaris software was booted.
set /SYS/component clear_fault_action=true   Manually clears host-detected faults. The component is
                                             the unique ID of the device with a fault to be cleared.
start /HOST/console                          Connects to the host.
show /HOST/console/history                   Displays the contents of the host’s console buffer.
set /HOST/bootmode property=value            Controls the host server module OBP firmware
                                             method of booting. property is state, config, or
                                             script.
stop /SYS                                    Powers off the host server module and then powers on
start /SYS                                   the host server module.
stop /SYS                                    Powers off the host server module.
start /SYS                                   Powers on the host server module.
reset /SYS                                   Generates a hardware reset on the host server module.
reset /SP                                    Reboots the SP.
set /SYS keyswitch_state=value               Sets the virtual keyswitch. value is normal, standby,
                                             diag, or locked.
set /SYS/LOCATE value=value                  Turns the Locator LED on the server module on or off.
                                             value is Fast_blink or Off.
show faulty                                  Displays current server module faults. See “Check for
                                             Faults (show faulty Command)” on page 18.
show /SYS keyswitch_state                    Displays the status of the virtual keyswitch.
show /SYS/LOCATE                             Displays the current state of the Locator LED as either
                                             on or off.
show /SP/logs/event/list                     Displays the history of all events logged in the SP
                                             event buffers (in RAM or the persistent buffers).
show /HOST                                   Displays information about the operating state of the
                                             host, whether the hardware is providing service, and
                                             firmware version information.
show /SYS                                    Displays information about the server module,
                                             including the serial number.
Related Information
■ “Oracle ILOM Troubleshooting Overview” on page 12
■ “Access the SP (Oracle ILOM)” on page 15
■ “Display FRU Information (show Command)” on page 17
■ “Check for Faults (show faulty Command)” on page 18
■ “Check for Faults (fmadm faulty Command)” on page 20
■ “Clear Faults (clear_fault_action Property)” on page 21
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
Interpreting Log Files and System Messages
With the Oracle Solaris OS running on the server module, you have the full
complement of Oracle Solaris OS files and commands available for collecting
information and for troubleshooting.
If POST or the PSH features do not indicate the source of a fault, check the message
buffer and log files for notifications for faults. Drive faults are usually captured by
the Oracle Solaris message files.
■ “Check the Message Buffer (dmesg Command)” on page 24
■ “View System Message Log Files” on page 24
■ “List FRU Status (prtdiag Command)” on page 25
Related Information
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Managing Faults (PSH)” on page 41
■ “Managing Faults (POST)” on page 29
■ “Managing Components (ASR)” on page 45
■ “Checking if Oracle VTS Software Is Installed” on page 27
▼ Check the Message Buffer (dmesg Command)
The dmesg command checks the system buffer for recent diagnostic messages and
displays them.
1. Log in as superuser.
2. Type:
# dmesg
Related Information
■ “View System Message Log Files” on page 24
■ “List FRU Status (prtdiag Command)” on page 25
▼ View System Message Log Files
The error logging daemon, syslogd, automatically records various system
warnings, errors, and faults in message files. These messages can alert you to system
problems such as a device that is about to fail.
The /var/adm directory contains several message files. The most recent messages
are in the /var/adm/messages file. After a period of time (usually every week), a
new message file is automatically created. The original contents of the messages
file are rotated to a file named messages.0. Over a period of time, the messages are
further rotated to messages.1 and messages.2, and then deleted.
1. Log in as superuser.
2. Type:
# more /var/adm/messages
Or, if you want to view all logged messages, type:
# more /var/adm/messages*
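The rotation scheme described above implies a reading order from newest to oldest: messages, then messages.0, messages.1, and so on. The following Python sketch is an illustration only, not an Oracle-provided tool; the function names rotation_key and newest_first are hypothetical. It sorts a list of message file names into that order.

```python
import re


def rotation_key(name):
    """Sort key: 'messages' is newest, then messages.0, messages.1, ..."""
    match = re.fullmatch(r"messages(?:\.(\d+))?", name)
    if match is None:
        raise ValueError(f"not a message file name: {name!r}")
    suffix = match.group(1)
    # The unsuffixed file holds the most recent messages, so it sorts first.
    return -1 if suffix is None else int(suffix)


def newest_first(names):
    """Return message file names ordered from newest to oldest."""
    return sorted(names, key=rotation_key)


print(newest_first(["messages.1", "messages", "messages.0"]))
```

On a live system, the input list could come from a directory listing of /var/adm, filtered to names that begin with messages.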
Related Information
■ “Check the Message Buffer (dmesg Command)” on page 24
▼ List FRU Status (prtdiag Command)
● At an Oracle Solaris OS command line, type the prtdiag command.
============================ Environmental Status ============================
Fan sensors:
All fan sensors are OK.
Fan indicators:
All fan indicators are OK.
Temperature sensors:
All temperature sensors are OK.
Temperature indicators:
All temperature indicators are OK.
Current sensors:
All current sensors are OK.
Current indicators:
All current indicators are OK.
Voltage sensors:
All voltage sensors are OK.
Voltage indicators:
All voltage indicators are OK.
============================ FRU Status ============================
All FRUs are enabled.
Related Information
■ “Check the Message Buffer (dmesg Command)” on page 24
■ “View System Message Log Files” on page 24
■ “Display FRU Information (show Command)” on page 17
Checking if Oracle VTS Software Is Installed
Oracle VTS (previously named SunVTS) is a validation test suite that you can use to
test this server module. These topics provide an overview and a way to check if
Oracle VTS is installed. For comprehensive Oracle VTS information, refer to the
Oracle VTS documentation.
■ “Oracle VTS Overview” on page 27
■ “Check if Oracle VTS Software Is Installed” on page 28
Related Information
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (PSH)” on page 41
■ “Managing Faults (POST)” on page 29
■ “Managing Components (ASR)” on page 45
Oracle VTS Overview
Oracle VTS is a validation test suite that you can use to test this server module.
Oracle VTS provides multiple diagnostic hardware tests that verify the connectivity
and functionality of most hardware controllers and devices for this server module.
The software provides these kinds of test categories:
■ Audio
■ Communication (serial and parallel)
■ Graphic and video
■ Memory
■ Network
■ Peripherals (hard drives, CD-DVD devices, and printers)
■ Processor
■ Storage
Use Oracle VTS to validate a server module during development, production,
receiving inspection, troubleshooting, periodic maintenance, and system or
subsystem stressing.
You can run Oracle VTS through a web browser, a terminal, or CLI.
You can run tests in a variety of modes for online and offline testing.
Oracle VTS also provides a choice of security mechanisms.
Oracle VTS software is provided in the preinstalled Oracle Solaris OS that shipped
with the server module.
Related Information
■ Oracle VTS documentation
■ “Check if Oracle VTS Software Is Installed” on page 28
▼ Check if Oracle VTS Software Is Installed
1. Log in as superuser.
2. Check for the presence of Oracle VTS packages.
# pkginfo -l SUNWvts SUNWvtsr SUNWvtsts SUNWvtsmn
■ If information about the packages is displayed, then Oracle VTS software is
installed.
■ If you receive messages reporting ERROR: information for package was
not found, then Oracle VTS is not installed. You must install the software
before you can use it. You can obtain the Oracle VTS software from the
following places:
■ Oracle Solaris OS media kit (DVDs)
■ As a download from the web
Related Information
■ Oracle VTS documentation
Managing Faults (POST)
These topics explain how to use POST as a diagnostic tool.
■ “POST Overview” on page 29
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
■ “Configure POST” on page 33
■ “Run POST With Maximum Testing” on page 35
■ “Interpret POST Fault Messages” on page 37
■ “Clear POST-Detected Faults” on page 37
■ “POST Output Reference” on page 39
Related Information
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (PSH)” on page 41
■ “Managing Components (ASR)” on page 45
■ “Checking if Oracle VTS Software Is Installed” on page 27
POST Overview
POST is a group of PROM-based tests that run when the server module is powered
on or when it is reset. POST checks the basic integrity of the critical hardware
components in the server module (CPU, memory, and I/O subsystem).
You can also run POST as a system-level hardware diagnostic tool. To do this, use the
Oracle ILOM set command to set the parameter keyswitch_state to diag.
You can also set other Oracle ILOM properties to control various other aspects of
POST operations. For example, you can specify the events that cause POST to run,
the level of testing POST performs, and the amount of diagnostic information POST
displays. These properties are listed and described in “Oracle ILOM Properties That
Affect POST Behavior” on page 30.
If POST detects a faulty component, the component is disabled automatically. If the
server module is able to run without the disabled component, it will boot when
POST completes its tests. For example, if POST detects a faulty processor core, the
core will be disabled. After POST completes its test sequence, the server module
boots and uses the remaining cores.
Related Information
■ “Diagnostics Overview” on page 5
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
■ “Configure POST” on page 33
■ “Run POST With Maximum Testing” on page 35
■ “Interpret POST Fault Messages” on page 37
■ “Clear POST-Detected Faults” on page 37
■ “POST Output Reference” on page 39
Oracle ILOM Properties That Affect POST Behavior
These Oracle ILOM properties determine how POST performs its operations. See also
the flowchart that follows the table.
Note – The value of keyswitch_state must be normal when individual POST
parameters are changed.
Parameter             Values          Description
/SYS keyswitch_state  normal          The host can power on and run POST (based on the
                                      other parameter settings). This parameter overrides
                                      all other commands.
                      diag            The host runs POST based on predetermined settings
                                      that perform maximum verbose testing.
                      standby         The host cannot power on.
                      locked          The host can power on and run POST, but no flash
                                      updates can be made.
/HOST/diag mode       off             POST does not run.
                      normal          Runs POST according to diag level value.
                      service         Runs POST with preset values for diag level and
                                      diag verbosity.
/HOST/diag level      max             If diag mode = normal, runs all the minimum tests
                                      plus extensive processor and memory tests.
                      min             If diag mode = normal, runs the minimum set of
                                      tests.
/HOST/diag trigger    none            Does not run POST on reset.
                      hw-change       (Default) Runs POST following an AC power cycle
                                      and when the top cover is removed.
                      power-on-reset  Runs POST only for the first power on.
                      error-reset     (Default) Runs POST if fatal errors are detected.
                      all-resets      Runs POST after any reset.
/HOST/diag verbosity  normal          POST output displays all test and informational
                                      messages.
                      min             POST output displays functional tests with a banner
                                      and pinwheel.
                      max             POST displays all test and informational messages,
                                      and some debugging messages.
                      debug           POST displays extensive debugging output on the
                                      system console, including the devices being tested
                                      and the debug output of each test.
                      none            No POST output is displayed.
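As an illustration of how the values in this table constrain a set command, the following Python sketch checks a property and value against the table before formatting a command string for the /HOST/diag target. It is a sketch, not an Oracle-provided tool: the names DIAG_VALUES and diag_set_command are hypothetical, and it handles one value per property, although the trigger property can hold more than one value (for example, hw-change error-reset).

```python
# Allowed values copied from the table above (one value per property).
DIAG_VALUES = {
    "mode": {"off", "normal", "service"},
    "level": {"max", "min"},
    "trigger": {"none", "hw-change", "power-on-reset", "error-reset", "all-resets"},
    "verbosity": {"normal", "min", "max", "debug", "none"},
}


def diag_set_command(prop, value):
    """Return a 'set /HOST/diag' command string, or raise ValueError."""
    allowed = DIAG_VALUES.get(prop)
    if allowed is None:
        raise ValueError(f"unknown /HOST/diag property: {prop!r}")
    if value not in allowed:
        raise ValueError(f"{value!r} is not a listed value for {prop!r}")
    return f"set /HOST/diag {prop}={value}"


print(diag_set_command("verbosity", "max"))
```

The sketch only builds the command text. Typing the command at the Oracle ILOM prompt remains a manual step.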
The following table shows combinations of Oracle ILOM parameters and associated
POST modes.
Oracle ILOM Parameter   Normal Diagnostic Mode     No POST Execution    Service Mode Using the
                        (Default Settings)                              Keyswitch_state
keyswitch_state*        normal                     normal               diag
/HOST/diag mode         normal                     Off
/HOST/diag level        max
/HOST/diag trigger      hw-change error-reset      none
/HOST/diag verbosity    normal
Description of POST     This is the default POST   POST does not run,   POST runs the full
execution               configuration. This        resulting in quick   spectrum of tests with
                        configuration tests the    initialization. This the maximum output
                        server module thoroughly   configuration is not displayed.
                        and suppresses some of     suggested.
                        the detailed POST output.
* The keyswitch_state parameter, when set to diag, overrides all the other POST variables.
Related Information
■ “POST Overview” on page 29
■ “Configure POST” on page 33
■ “Run POST With Maximum Testing” on page 35
■ “Interpret POST Fault Messages” on page 37
■ “Clear POST-Detected Faults” on page 37
■ “POST Output Reference” on page 39
▼ Configure POST
1. Log in to Oracle ILOM.
See “Access the SP (Oracle ILOM)” on page 15.
2. Set the virtual keyswitch to the value that corresponds to the POST
configuration you want to run.
The following example sets the virtual keyswitch to normal, which will configure
POST to run according to other parameter values.
-> set /SYS keyswitch_state=normal
Set 'keyswitch_state' to 'Normal'
For possible values for the keyswitch_state parameter, see “Oracle ILOM
Properties That Affect POST Behavior” on page 30.
3. If the virtual keyswitch is set to normal, and you want to define the mode,
level, verbosity, or trigger, set the respective parameters.
Syntax:
set /HOST/diag property=value
See “Oracle ILOM Properties That Affect POST Behavior” on page 30 for a list of
parameters and values.
For example:
-> set /HOST/diag mode=normal
or
-> set /HOST/diag verbosity=max
4. To see the current values for settings, use the show command.
For example, showing default values:
-> show /HOST/diag
/HOST/diag
Targets:
Properties:
error_reset_level = max
error_reset_verbosity = normal
hw_change_level = max
hw_change_verbosity = normal
level = max
mode = normal
power_on_level = max
power_on_verbosity = normal
trigger = hw-change error-reset
verbosity = normal
Commands:
cd
set
show
->
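When the show output is captured to a file or a terminal log, the Properties section can be read into a dictionary for comparison against the expected settings. The following Python sketch assumes the output layout shown in the example above (a Properties: heading followed by name = value lines); the function name parse_properties is hypothetical and is not part of Oracle ILOM.

```python
SAMPLE_OUTPUT = """\
/HOST/diag
Targets:
Properties:
level = max
mode = normal
trigger = hw-change error-reset
verbosity = normal
Commands:
cd
set
show
"""


def parse_properties(output):
    """Collect 'name = value' pairs that appear under the Properties: heading."""
    props = {}
    in_props = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # A new section heading starts; only Properties: is of interest.
            in_props = stripped == "Properties:"
            continue
        if in_props and " = " in stripped:
            name, value = stripped.split(" = ", 1)
            props[name.strip()] = value.strip()
    return props


print(parse_properties(SAMPLE_OUTPUT)["trigger"])
```

A multi-valued property such as trigger is kept as a single string; splitting it on spaces is left to the caller.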
Related Information
■ “POST Overview” on page 29
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
■ “Run POST With Maximum Testing” on page 35
■ “Interpret POST Fault Messages” on page 37
■ “Clear POST-Detected Faults” on page 37
▼ Run POST With Maximum Testing
1. Access the Oracle ILOM prompt.
See “Access the SP (Oracle ILOM)” on page 15.
2. Set the virtual keyswitch to diag so that POST will run in service mode.
-> set /SYS keyswitch_state=diag
Set 'keyswitch_state' to 'Diag'
3. Reset the server module so that POST runs.
There are several ways to initiate a reset. The following example shows a reset
using commands that will power cycle the host.
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
Note – The server module takes about one minute to power off. Type the show
/HOST command to determine when the host has been powered off. The console will
display status=Powered Off.
4. Switch to the host console to view the POST output.
-> start /HOST/console
Are you sure you want to start /HOST/console (y/n)? y
The following example shows abridged POST output.
Serial console started. To stop, type #.
[CPU 0:0:0] NOTICE: Checking Flash File System
[CPU 0:0:0] NOTICE: Initializing TOD: 2011/08/30 00:38:11
[CPU 0:0:0] NOTICE: Loaded ASR status DB data. Ver. 2.
[CPU 0:0:0] WARNING: TPM not supported
5. If you receive POST error messages, learn how to interpret them.
See “Interpret POST Fault Messages” on page 37.
Related Information
■ “POST Overview” on page 29
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
■ “Configure POST” on page 33
■ “Interpret POST Fault Messages” on page 37
■ “Clear POST-Detected Faults” on page 37
▼ Interpret POST Fault Messages
1. Run POST.
See “Run POST With Maximum Testing” on page 35.
2. View the output and watch for messages.
See “POST Output Reference” on page 39.
3. To obtain more information on faults, run the show faulty command.
See “Check for Faults (show faulty Command)” on page 18.
Related Information
■ “Clear POST-Detected Faults” on page 37
■ “POST Overview” on page 29
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
■ “Diagnostics Overview” on page 5
■ “Configure POST” on page 33
■ “Run POST With Maximum Testing” on page 35
▼ Clear POST-Detected Faults
Use this procedure if you suspect that a fault was not automatically cleared. This
procedure describes how to identify a POST-detected fault and, if necessary,
manually clear the fault.
In most cases, when POST detects a faulty component, POST logs the fault and
automatically takes the failed component out of operation by placing the component
in the ASR blacklist. See “Managing Components (ASR)” on page 45.
Usually, when a faulty component is replaced, the replacement is detected when the
SP is reset or power cycled. The fault is automatically cleared.
1. Replace the faulty FRU.
2. At the Oracle ILOM prompt, type the show faulty command to identify
POST-detected faults.
POST-detected faults are distinguished from other kinds of faults by the text:
Forced fail. No UUID number is reported. For example:
■ No fault is reported – The server module cleared the fault and you do not need
to manually clear the fault. Do not perform the subsequent steps.
■ Fault reported – Go to Step 4.
4. Use the component_state property of the component to clear the fault and
remove the component from the ASR blacklist.
Use the FRU name that was reported in the fault in Step 2. For example:
-> set /SYS/MB/CMP0/BOB1/CH0/D0 component_state=Enabled
The fault is cleared and should not show up when you run the show faulty
command. Additionally, the front panel Fault (Service Action Required) LED is no
longer on.
5. Reset the server module.
You must reboot the server module for the component_state property to take
effect.
6. At the Oracle ILOM prompt, type the show faulty command to verify that no
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
■ “Configure POST” on page 33
■ “Run POST With Maximum Testing” on page 35
■ “Clear POST-Detected Faults” on page 37
POST Output Reference
POST error messages use the following syntax:
c:s > ERROR: TEST = failing-test
c:s > H/W under test = FRU
c:s > Repair Instructions: Replace items in order listed by H/W
under test above
c:s > MSG = test-error-message
c:s > END_ERROR
In this syntax, c = the core number, s = the strand number.
Warning messages use the following syntax:
WARNING: message
Informational messages use the following syntax:
INFO: message
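The error syntax above can be read mechanically. The following Python sketch parses one ERROR ... END_ERROR block into a dictionary. It assumes the c:s > prefix exactly as documented in the syntax above, with each field on a single line; captured console output can carry additional prefixes such as timestamps, which this sketch does not handle. The names parse_post_error and SAMPLE, and the test name and message text in the sample, are hypothetical.

```python
import re

# Matches the documented "c:s > text" form (core number, strand number, text).
PREFIX_RE = re.compile(r"^(?P<core>\d+):(?P<strand>\d+)\s*>\s*(?P<body>.*)$")

SAMPLE = [
    "0:1 > ERROR: TEST = example-test",
    "0:1 > H/W under test = /SYS/MB/CMP0/BOB0/CH0/D0",
    "0:1 > Repair Instructions: Replace items in order listed by H/W under test above",
    "0:1 > MSG = example message",
    "0:1 > END_ERROR",
]


def parse_post_error(lines):
    """Return the fields of the first complete error block, or None."""
    record = None
    for raw in lines:
        match = PREFIX_RE.match(raw.strip())
        if match is None:
            continue
        body = match.group("body").strip()
        if body.startswith("ERROR: TEST ="):
            record = {
                "core": int(match.group("core")),
                "strand": int(match.group("strand")),
                "test": body.split("=", 1)[1].strip(),
            }
        elif record is None:
            continue  # Ignore lines that precede an ERROR line.
        elif body.startswith("H/W under test ="):
            record["fru"] = body.split("=", 1)[1].strip()
        elif body.startswith("MSG ="):
            record["msg"] = body.split("=", 1)[1].strip()
        elif body == "END_ERROR":
            return record
    return None


print(parse_post_error(SAMPLE))
```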
In the following example, POST reports an uncorrectable memory error affecting
DIMM locations /SYS/MB/CMP0/BOB0/CH1/D0 and
/SYS/MB/CMP0/BOB1/CH1/D0. The error was detected by POST.
00000000.22000000
2011-07-03 18:44:13.958 0:7:2> 1 NESR_MCU1SRE: MCU1 issued
a Software Recoverable Error Request
2011-07-03 18:44:14.095 0:7:2> 1 NESR_MCU1HCCE: MCU1
issued a Hardware Corrected-and-Cleared Error Request
2011-07-03 18:44:14.248 0:7:2>
2011-07-03 18:44:14.296 0:7:2>Decode of Mem Error Status Reg Branch 1
bits 33044000.00000000
2011-07-03 18:44:14.427 0:7:2> 1 MEU 61 R/W1C Set to 1
on an UE if VEU = 1, or VEF = 1, or higher priority error in same cycle.
2011-07-03 18:44:14.614 0:7:2> 1 MEC 60 R/W1C Set to 1
on a CE if VEC = 1, or VEU = 1, or VEF = 1, or another error in same cycle.
2011-07-03 18:44:14.804 0:7:2> 1 VEU 57 R/W1C Set to 1
on an UE, if VEF = 0 and no fatal error is detected in same cycle.
2011-07-03 18:44:14.983 0:7:2> 1 VEC 56 R/W1C Set to 1
on a CE, if VEF = VEU = 0 and no fatal or UE is detected in same cycle.
2011-07-03 18:44:15.169 0:7:2> 1 DAU 50 R/W1C Set to 1
if the error was a DRAM access UE.
2011-07-03 18:44:15.304 0:7:2> 1 DAC 46 R/W1C Set to 1
if the error was a DRAM access CE.
2011-07-03 18:44:15.440 0:7:2>
2011-07-03 18:44:15.486 0:7:2> DRAM Error Address Reg for Branch
1 = 00000034.8647d2e0
2011-07-03 18:44:15.614 0:7:2> Physical Address is
00000005.d21bc0c0
2011-07-03 18:44:15.715 0:7:2> DRAM Error Location Reg for Branch
1 = 00000000.00000800
2011-07-03 18:44:15.842 0:7:2> DRAM Error Syndrome Reg for Branch
1 = dd1676ac.8c18c045
2011-07-03 18:44:15.967 0:7:2> DRAM Error Retry Reg for Branch 1
= 00000000.00000004
2011-07-03 18:44:16.086 0:7:2> DRAM Error Retry Syndrome 1 Reg for
Branch 1 = a8a5f81e.f6411b5a
2011-07-03 18:44:16.218 0:7:2> DRAM Error Retry Syndrome 2 Reg
for Branch 1 = a8a5f81e.f6411b5a
2011-07-03 18:44:16.351 0:7:2> DRAM Failover Location 0 for
Branch 1 = 00000000.00000000
2011-07-03 18:44:16.475 0:7:2> DRAM Failover Location 1 for
Branch 1 = 00000000.00000000
2011-07-03 18:44:16.604 0:7:2>
2011-07-03 18:44:16.648 0:7:2>ERROR: POST terminated prematurely. Not
all system components tested.
2011-07-03 18:44:16.786 0:7:2>POST: Return to VBSC
2011-07-03 18:44:16.795 0:7:2>ERROR:
2011-07-03 18:44:16.839 0:7:2> POST toplevel status has the following
failures:
2011-07-03 18:44:16.952 0:7:2> Node 0 ------------------------------
2011-07-03 18:44:17.051 0:7:2> /SYS/MB/CMP0/BOB0/CH1/D0
2011-07-03 18:44:17.145 0:7:2> /SYS/MB/CMP0/BOB1/CH1/D0
2011-07-03 18:44:17.241 0:7:2>END_ERROR
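The summary at the end of the example output lists the failing FRUs between the POST toplevel status line and END_ERROR. The following Python sketch extracts those FRU names from captured output. It assumes the layout shown in the example above, with one /SYS/... name at the end of each summary line; the function name failed_frus is hypothetical.

```python
import re

# A FRU name is taken to be the /SYS/... token at the end of a line.
FRU_RE = re.compile(r"(/SYS/\S+)\s*$")

SAMPLE_LOG = [
    "2011-07-03 18:44:16.839 0:7:2> POST toplevel status has the following failures:",
    "2011-07-03 18:44:16.952 0:7:2> Node 0 ------------------------------",
    "2011-07-03 18:44:17.051 0:7:2> /SYS/MB/CMP0/BOB0/CH1/D0",
    "2011-07-03 18:44:17.145 0:7:2> /SYS/MB/CMP0/BOB1/CH1/D0",
    "2011-07-03 18:44:17.241 0:7:2>END_ERROR",
]


def failed_frus(log_lines):
    """Return FRU names listed in the POST toplevel status summary."""
    frus = []
    in_summary = False
    for line in log_lines:
        if "POST toplevel status has the following" in line:
            in_summary = True
        elif "END_ERROR" in line:
            in_summary = False
        elif in_summary:
            match = FRU_RE.search(line)
            if match:
                frus.append(match.group(1))
    return frus


print(failed_frus(SAMPLE_LOG))
```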
Related Information
■ “Oracle ILOM Properties That Affect POST Behavior” on page 30
■ “Run POST With Maximum Testing” on page 35
■ “Clear POST-Detected Faults” on page 37
Managing Faults (PSH)
These topics describe the PSH feature:
■ “PSH Overview” on page 41
■ “Check for PSH-Detected Faults” on page 42
■ “Clear PSH-Detected Faults” on page 44
Related Information
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (POST)” on page 29
■ “Managing Components (ASR)” on page 45
■ “Checking if Oracle VTS Software Is Installed” on page 27
■ “POST Overview” on page 29
PSH Overview
The Oracle Solaris PSH technology enables the server module to diagnose problems
while the Oracle Solaris OS is running and to mitigate many problems before they
negatively affect operations.
The Oracle Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot
time and runs in the background to monitor the server module. If a component
generates an error, the daemon correlates the error with data from previous errors
and other relevant information to diagnose the problem. Once diagnosed, the fault
manager daemon assigns a UUID to the error. This value distinguishes this error
across any set of server modules.
When possible, the fault manager daemon initiates steps to self-heal the failed
component and take the component offline. The daemon also logs the fault to the
syslogd daemon and provides a fault notification with a message ID (sometimes
labeled MSG-ID). You can use the message ID to get additional information about the
problem from the knowledge article database.
The PSH technology covers the following server module components:
■ CPU
■ Memory
■ I/O subsystem
The PSH console message provides the following information about each detected
fault:
■ Type
■ Severity
■ Description
■ Automated response
■ Impact
■ Suggested action for a system administrator
If PSH detects a faulty component, use the fmadm faulty command to display
information about the fault. Alternatively, you can use the Oracle ILOM command
show faulty for the same purpose.
Related Information
■ “Check for Faults (show faulty Command)” on page 18
■ “Check for PSH-Detected Faults” on page 42
■ “Clear PSH-Detected Faults” on page 44
▼ Check for PSH-Detected Faults
The fmadm faulty command displays the list of faults detected by PSH. You can run
this command either from the host or through the Oracle ILOM faultmgmt shell.
As an alternative, you can display fault information by running the Oracle ILOM
command show faulty.
1. Check the event log.
# fmadm faulty
TIME            EVENT-ID                             MSG-ID        SEVERITY
Aug 13 11:48:33 21a8b59e-89ff-692a-c4bc-f4c5cccca8c8 SUN4V-8002-6E Major
Platform : sun4v Chassis_id :
Product_sn :
Fault class : fault.cpu.generic-sparc.strand
Affects : cpu:///cpuid=**/serial=*********************
faulted and taken out of service
FRU : "/SYS/MB"
(hc://:product-id=*****:product-sn=**********:server-id=***-******-*****:
chassis-id=********:**************-**********:serial=******:revision=05/
chassis=0/motherboard=0)
faulty
Description : The number of correctable errors associated with this strand has
exceeded acceptable levels.
Response : The fault manager will attempt to remove the affected strand
from service.
Impact : System performance may be affected.
Action: Schedule a repair procedure to replace the affected resource, the
identity of which can be determined using ’fmadm faulty’.
In this example, a fault is displayed, indicating the following details:
■ Date and time of the fault.
■ EVENT-ID, which is unique for every fault
(21a8b59e-89ff-692a-c4bc-f4c5cccca8c8).
■ MSG-ID, which can be used to obtain additional fault information
(SUN4V-8002-6E).
■ Faulted FRU. The information provided in the example includes the part
number of the FRU and the serial number of the FRU. The FRU field provides
the name of the FRU (/SYS/MB for motherboard in this example).
2. Use the message ID to obtain more information about this type of fault.
a. Obtain the message ID from console output or from the Oracle ILOM show
faulty command.
b. Sign in to the Oracle support site, http://support.oracle.com.
c. Select the Knowledge tab.
d. Search for that message ID in the Knowledge Base.
e. Follow the suggested actions to repair the fault.
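To pull the EVENT-ID and MSG-ID out of saved fmadm faulty output, a pattern match on the summary line is usually enough. The following Python sketch assumes the summary line has the four fields shown in the example above (time, EVENT-ID, MSG-ID, severity) and that MSG-IDs follow the same shape as SUN4V-8002-6E; the names EVENT_RE and parse_fault_header are hypothetical.

```python
import re

EVENT_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s*"
    r"(?P<msg_id>[A-Z0-9]+-\d{4}-[0-9A-Z]{2})\s+"
    r"(?P<severity>\w+)\s*$"
)


def parse_fault_header(line):
    """Return the summary fields as a dict, or None if the line does not match."""
    match = EVENT_RE.match(line.strip())
    return match.groupdict() if match else None


header = "Aug 13 11:48:33 21a8b59e-89ff-692a-c4bc-f4c5cccca8c8 SUN4V-8002-6E Major"
fields = parse_fault_header(header)
print(fields["event_id"], fields["msg_id"])
```

The MSG-ID is the value to search for in the Knowledge Base, and the EVENT-ID is the value that fmadm repair expects.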
Related Information
■ “Clear PSH-Detected Faults” on page 44
▼ Clear PSH-Detected Faults
When PSH detects faults, the faults are logged and displayed on the console. In most
cases, after the fault is repaired, the server module detects the corrected state and
repairs the fault condition automatically. However, you should verify this repair. In
cases where the fault condition is not automatically cleared, you must clear the fault
manually.
1. After replacing a faulty FRU, power on the server module.
2. At the host prompt, determine if the replaced FRU still shows a faulty state.
# fmadm faulty
TIME            EVENT-ID                             MSG-ID        SEVERITY
Aug 13 11:48:33 21a8b59e-89ff-692a-c4bc-f4c5cccca8c8 SUN4V-8002-6E Major
Platform : sun4v Chassis_id :
Product_sn :
Fault class : fault.cpu.generic-sparc.strand
Affects : cpu:///cpuid=**/serial=*********************
faulted and taken out of service
FRU : "/SYS/MB"
(hc://:product-id=*****:product-sn=**********:server-id=***-******-*****:
chassis-id=********:**************-**********:serial=******:revision=05/
chassis=0/motherboard=0)
faulty
Description : The number of correctable errors associated with this strand has
exceeded acceptable levels.
Response : The fault manager will attempt to remove the affected strand
from service.
Impact : System performance may be affected.
Action: Schedule a repair procedure to replace the affected resource, the
identity of which can be determined using ’fmadm faulty’.
■ If no fault is reported, you do not need to do anything else. Do not perform the
subsequent steps.
■ If a fault is reported, continue to Step 3.
3. Clear the fault from all persistent fault records.
In some cases, even though the fault is cleared, some persistent fault information
remains and results in erroneous fault messages at boot time. To ensure that these
messages are not displayed, type the following Oracle Solaris command:
# fmadm repair EVENT-ID
For the EVENT-ID in the example shown in Step 2, type:
# fmadm repair 21a8b59e-89ff-692a-c4bc-f4c5cccca8c8
4. Use the Oracle ILOM clear_fault_action property of the FRU to clear the
fault.
-> set /SYS/MB clear_fault_action=True
Are you sure you want to clear /SYS/MB (y/n)? y
Set ’clear_fault_action’ to ’true’
Related Information
■ “PSH Overview” on page 41
■ “Clear PSH-Detected Faults” on page 44
Managing Components (ASR)
These topics explain the role played by ASR and how to manage the components that
ASR controls.
■ “ASR Overview” on page 46
■ “Display System Components” on page 47
■ “Disable System Components” on page 48
■ “Enable System Components” on page 49
Related Information
■ “Diagnostics Overview” on page 5
■ “Diagnostics Process” on page 7
■ “Managing Faults (Oracle ILOM)” on page 11
■ “Interpreting Log Files and System Messages” on page 23
■ “Managing Faults (PSH)” on page 41
■ “Managing Faults (POST)” on page 29
■ “Checking if Oracle VTS Software Is Installed” on page 27
ASR Overview
ASR enables the server module to automatically configure failed components out of
operation until they can be replaced. In the server module, ASR manages the
following components:
■ CPU strands
■ Memory DIMMs
■ I/O subsystem
The database that contains the list of disabled components is the ASR blacklist
(asr-db).
In most cases, POST automatically disables a faulty component. After the cause of
the fault is repaired (FRU replacement, loose connector reseated, and so on), you
might need to remove the component from the ASR blacklist.
The following ASR commands enable you to view, add, or remove components
(asrkeys) from the ASR blacklist. You run these commands from the Oracle ILOM
prompt.
Command                               Description
show components                       Displays system components and their current state.
set asrkey component_state=Enabled    Removes a component from the asr-db blacklist,
                                      where asrkey is the component to enable.
set asrkey component_state=Disabled   Adds a component to the asr-db blacklist, where
                                      asrkey is the component to disable.
Note – The asrkey values vary from system to system, depending on how many
cores and how much memory are present. Use the show components command to see the
asrkey values on a given system.
After you enable or disable a component, you must reset (or power cycle) the server
module for the component’s change of state to take effect. See the SPARC and Netra
SPARC T4 Series Servers Administration Guide
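The enable and disable workflow described above (change component_state, then power cycle the host) can be written down as an ordered command list. The following Python sketch only builds the command text for review or for pasting at the Oracle ILOM prompt. It does not connect to the SP, and the stop and start commands still ask for confirmation when typed. The function name asr_command_sequence is hypothetical.

```python
def asr_command_sequence(asrkey, enable):
    """Return the Oracle ILOM commands to change a component state and power cycle."""
    state = "Enabled" if enable else "Disabled"
    return [
        f"set {asrkey} component_state={state}",
        "stop /SYS",   # Power off so that the change can take effect.
        "start /SYS",  # Power on again; wait until the host is powered off first.
    ]


for command in asr_command_sequence("/SYS/MB/CMP0/BOB1/CH0/D0", enable=False):
    print("->", command)
```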
Related Information
■ “Display System Components” on page 47
■ “Disable System Components” on page 48
■ “Enable System Components” on page 49
▼ Display System Components
The show components command displays the system components (asrkeys) and
reports their status.
● At the Oracle ILOM prompt, type show components.
In the following example, one of the DIMMs (BOB1/CH0/D0) is shown as
disabled.
▼ Disable System Components
You disable a component by setting its component_state property to Disabled.
This action adds the component to the ASR blacklist.
1. At the Oracle ILOM prompt, set the component_state property to Disabled.
-> set /SYS/MB/CMP0/BOB1/CH0/D0 component_state=Disabled
2. Reset the server module so that the ASR command takes effect.
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
Note – In the Oracle ILOM shell, there is no notification when the system is actually
powered off. Powering off takes about a minute. Use the show /HOST command to
determine if the host has powered off.
Related Information
■ “View System Message Log Files” on page 24
■ “Display System Components” on page 47
■ “Enable System Components” on page 49
▼ Enable System Components
You enable a component by setting its component_state property to Enabled.
This action removes the component from the ASR blacklist.
1. At the Oracle ILOM prompt, set the component_state property to Enabled.
-> set /SYS/MB/CMP0/BOB1/CH0/D0 component_state=Enabled
2. Reset the server module so that the ASR command takes effect.
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
Note – In the Oracle ILOM shell, there is no notification when the system is actually
powered off. Powering off takes about a minute. Use the show /HOST command to
determine if the host has powered off.
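Because the Oracle ILOM shell gives no notification when power-off completes, a script that drives the SP might poll show /HOST until the host reports that it is off. The following is a minimal sketch under stated assumptions: the caller supplies run_show_host, a hypothetical function that returns the text of show /HOST, and the output is assumed to contain a line of the form status = Powered Off. Verify the exact property text on your system.

```python
import re
import time


def host_is_powered_off(show_host_output: str) -> bool:
    """Return True if the text contains a 'status = Powered Off' property line.

    The property name and value are assumptions about the show /HOST output.
    """
    match = re.search(r"^\s*status\s*=\s*(.+?)\s*$", show_host_output, re.MULTILINE)
    return bool(match) and match.group(1) == "Powered Off"


def wait_for_power_off(run_show_host, timeout_s=120.0, interval_s=5.0,
                       sleep=time.sleep) -> bool:
    """Poll run_show_host() until the host is off or the timeout expires.

    run_show_host is a caller-supplied (hypothetical) callable that returns
    the output of the ILOM 'show /HOST' command as a string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if host_is_powered_off(run_show_host()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The default timeout of two minutes is an arbitrary margin over the roughly one minute that powering off takes.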
Related Information
■ “View System Message Log Files” on page 24
■ “Display System Components” on page 47
■ “Disable System Components” on page 48
Preparing for Service
The following topics describe how to prepare the server module for servicing.
Step | Description | Links
1 | Review the safety and handling information. | “Safety Information” on page 51; “Handling Precautions” on page 53
2 | Gather the tools for service. | “Tools Needed for Service” on page 54
3 | Find serial numbers for the modular system and the server module. | “Find the Modular System Chassis Serial Number” on page 54; “Find the Server Module Serial Number” on page 55
4 | Identify the server module that you want to service. | “Locate the Server Module” on page 56
5 | Shut down the OS and host, and place the server module in a ready-to-remove state. | “Preparing the Server Module for Removal” on page 56
6 | Remove the server module from the modular system chassis. | “Remove the Server Module From the Modular System” on page 61
7 | Remove the server module cover. | “Remove the Cover” on page 63
Related Information
■ “Returning the Server Module to Operation” on page 117
Safety Information
For your protection, observe the following safety precautions when setting up your
equipment:
■ Follow all cautions and instructions marked on the equipment.
■ Follow all cautions and instructions described in the documentation that shipped
with your server module and in the Netra SPARC T4-1B Server Module Safety and
Compliance Guide.
■ Ensure that the voltage and frequency of your power source match the voltage
and frequency inscribed on the equipment’s electrical rating label.
■ Follow the ESD safety practices as described in this section.
Safety Symbols
You will see the following symbols in various places in the server module
documentation. Note the explanations provided next to each symbol.
Caution – There is a risk of personal injury or equipment damage. To avoid
personal injury and equipment damage, follow the instructions.
Caution – Components inside the server module might be hot. Use caution when
servicing components inside the server module.
Caution – Hazardous voltages are present. To reduce the risk of electric shock and
danger to personal health, follow the instructions.
ESD Measures
ESD-sensitive devices, such as the motherboard, cards, drives, and DIMMs, require
special handling.
Caution – Circuit boards and drives contain electronic components that are
extremely sensitive to static electricity. Ordinary amounts of static electricity from
clothing or the work environment can destroy the components located on these
boards. Do not touch the components along their connector edges.
Antistatic Wrist Strap Use
Wear an antistatic wrist strap and use an antistatic mat when handling components
such as drive assemblies, circuit boards, or PCI cards. When servicing or removing
server module components, attach an antistatic strap to your wrist and then to a
metal area on the chassis. Following this practice equalizes the electrical potentials
between you and the server module.
Antistatic Mat
Place ESD-sensitive components such as cards and DIMMs on an antistatic mat.
Related Information
■ “Handling Precautions” on page 53
■ “Tools Needed for Service” on page 54
Handling Precautions
Review the following cautions.
Caution – A server module can weigh as much as 20 pounds (9.0 kg). During
removal, hold the server module firmly with both hands.
Caution – Do not stack server modules higher than five units tall.
Caution – Insert a filler panel into the empty server module slot within 60 seconds
after removing a server module to ensure proper modular system chassis cooling.
Related Information
■ “Safety Information” on page 51
■ “Tools Needed for Service” on page 54
Tools Needed for Service
The following tools are required for service procedures:
■ Antistatic wrist strap
■ Antistatic mat
■ Stylus or pencil (to operate the power button)
■ UCP-3 dongle (UCP-4 dongle can be used, but see instructions in the Server
Module Installation Guide)
■ Blade filler panel
Related Information
■ “Safety Information” on page 51
■ “Handling Precautions” on page 53
■ “Find the Modular System Chassis Serial Number” on page 54
▼Find the Modular System Chassis
Serial Number
To obtain support for your server module, you need the serial number of the Sun
Netra 6000 modular system in which the server module is located, not the serial
number of the server module. The serial number of the modular system is provided
on a label on the upper left edge of the front bezel.
Use the following procedure to obtain the serial number remotely.
1. Log in to the CMM of the modular system.
See the documentation for the Sun Netra 6000 modular system.
2. Type:
-> show /CH
3. In the output, locate the value for product_serial_number.
That number is the serial number of the modular system.
Related Information
■ “Find the Server Module Serial Number” on page 55
■ “Locate the Server Module” on page 56
▼Find the Server Module Serial Number
Note – To obtain support for your server module, you need the serial number of the
Sun Netra 6000 modular system in which the server module is located, not the serial
number of the server module. See “Find the Modular System Chassis Serial Number”
on page 54.
The serial number of the server module is located on a sticker on the RFID tag that is
mounted in the center of the front panel. However, this label is not present on a
server module that has been moved into a new enclosure assembly. You also can type
the Oracle ILOM show /SYS command to display the number.
Related Information
■ “Find the Modular System Chassis Serial Number” on page 54
▼Locate the Server Module
To identify a specific server module from others in the modular system, perform the
following steps.
1. Log in to Oracle ILOM on the server module you plan to locate.
2. Type:
-> set /SYS/LOCATE value=fast_blink
The Locator LED on the server module blinks.
3. Identify the server module with a blinking white LED.
4. Once you locate the server module, press the Locator LED to turn it off.
Note – Alternatively, you can turn off the Locator LED by typing the Oracle ILOM
set /SYS/LOCATE value=off command.
Related Information
■ “Remove the Server Module From the Modular System” on page 61
Preparing the Server Module for
Removal
There are several ways to shut down the server module before you remove it from
the chassis.
Description | Links
Perform a graceful shutdown using commands. | “Shut Down the OS and Host (Commands)” on page 57; “Set the Server Module to a Ready-to-Remove State” on page 60
Perform a graceful shutdown using the power button and commands. | “Shut Down the OS and Host (Power Button – Graceful)” on page 59; “Set the Server Module to a Ready-to-Remove State” on page 60
Perform a nongraceful shutdown (last resort or emergency situations). | “Shut Down the OS and Host (Emergency Shutdown)” on page 59; “Set the Server Module to a Ready-to-Remove State” on page 60
Related Information
■ “Remove the Server Module From the Modular System” on page 61
▼ Shut Down the OS and Host (Commands)
This topic describes one method for shutting down the Oracle Solaris OS. For
information on other ways to shut down the Oracle Solaris OS, refer to the Oracle
Solaris OS documentation.
1. Log in as superuser or equivalent.
Depending on the type of problem, you might want to view server module status
or log files. You also might want to run diagnostics before you shut down the
server module.
2. Notify affected users that the server module will be shut down.
Refer to the Oracle Solaris system administration documentation for additional
information.
3. Save any open files and quit all running programs.
Refer to the application documentation for specific information on these processes.
4. (If applicable) Shut down all logical domains.
Refer to the Oracle Solaris system administration and Oracle VM Server for
SPARC documentation for additional information.
5. Shut down the Oracle Solaris OS and reach the ok prompt.
Refer to the Oracle Solaris system administration documentation for additional
information.
The following example uses the Oracle Solaris shutdown command:
# shutdown -g0 -i0 -y
Shutdown started. Tue Jun 28 13:06:20 PDT 2011
Changing to init state 0 - please wait
Broadcast Message from root (console) on server1 Tue Jun 28
13:06:20...
THE SYSTEM server1 IS BEING SHUT DOWN NOW ! ! !
Log off now or risk your files being damaged
# svc.startd: The system is coming down. Please wait.
svc.startd: 100 system services are now being stopped.
Jun 28 13:06:34 dt90-366 syslogd: going down on signal 15
svc.startd: The system is down.
syncing file systems... done
Program terminated
Netra SPARC T4-1B, No Keyboard
Copyright (c) 1998, 2011, Oracle and/or its affiliates. All Rights
reserved.
OpenBoot 4.30, 16256 MB memory available, Serial # 87305111.
Ethernet address 0:21:28:34:2b:90, Host ID: 85342b90.
{0} ok
6. Switch from the host console to the Oracle ILOM prompt by typing the #. (Hash
Period) key sequence.
7. At the Oracle ILOM prompt, type:
-> stop /SYS
8. Prepare the server module for removal.
See “Set the Server Module to a Ready-to-Remove State” on page 60.
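The shutdown command in step 5 uses three options: -g sets the grace period in seconds, -i sets the init state to enter, and -y answers the confirmation prompt in advance. As a minimal sketch, a script could assemble the same argument list before running it as superuser. The function name solaris_shutdown_argv is hypothetical, and the sketch only builds the arguments; it does not run the command.

```python
def solaris_shutdown_argv(grace_seconds: int = 0, init_state: str = "0",
                          assume_yes: bool = True) -> list[str]:
    """Build the argument list for the Oracle Solaris shutdown command.

    With the defaults this returns the command shown in this procedure:
    shutdown -g0 -i0 -y
    """
    if grace_seconds < 0:
        raise ValueError("grace_seconds must not be negative")
    if not init_state:
        raise ValueError("init_state must not be empty")
    argv = ["shutdown", f"-g{grace_seconds}", f"-i{init_state}"]
    if assume_yes:
        argv.append("-y")  # pre-answer the confirmation question
    return argv


if __name__ == "__main__":
    print(" ".join(solaris_shutdown_argv()))
```

Init state 0 brings the system down to the ok prompt, which is the state that step 5 requires.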
Related Information
■ “Shut Down the OS and Host (Power Button – Graceful)” on page 59
■ “Shut Down the OS and Host (Emergency Shutdown)” on page 59
■ “Set the Server Module to a Ready-to-Remove State” on page 60
▼ Shut Down the OS and Host (Power Button –
Graceful)
This procedure gracefully shuts down the OS and places the server module in the
power standby mode. In this mode, the Power OK LED blinks rapidly.
● Press and release the recessed Power button.
Use a stylus or the tip of a pen to operate this button. See “Front and Rear Panel
Components” on page 3.
Note – This button is recessed to prevent accidental server module power-off. Use
the tip of a pen or other stylus to operate this button.
Related Information
■ “Shut Down the OS and Host (Commands)” on page 57
■ “Shut Down the OS and Host (Emergency Shutdown)” on page 59
■ “Set the Server Module to a Ready-to-Remove State” on page 60
▼ Shut Down the OS and Host (Emergency
Shutdown)
Caution – All applications and files will be closed abruptly without saving changes.
File system corruption might occur.
● Press and hold the Power button for four seconds.
Use a stylus or the tip of a pen to operate this button. See “Front and Rear Panel
Components” on page 3.
Related Information
■ “Shut Down the OS and Host (Commands)” on page 57
■ “Shut Down the OS and Host (Power Button – Graceful)” on page 59
■ “Set the Server Module to a Ready-to-Remove State” on page 60
▼ Set the Server Module to a Ready-to-Remove
State
1. Log in to Oracle ILOM on the server module you plan to remove.
2. Ensure that the server module is in standby mode, with the host powered off.
-> show /SYS/ power_state
/SYS
properties:
power_state = Off
If you do not see this message, check that you have performed all the steps in
“Shut Down the OS and Host (Commands)” on page 57.
3. Type:
-> set /SYS/ prepare_to_remove_action=true
Set ‘prepare_to_remove_action’ to ‘true’
The server module is in standby mode. Power is removed from the host while
standby power is applied to the SP.
4. Confirm that the server module is in standby mode by viewing the blue Ready
to Remove LED on the front of the server module.
See “Front and Rear Panel Components” on page 3 to locate this LED. If the Ready
to Remove LED is on, the server module is ready for removal from the modular
system chassis.
5. Remove the server module from the chassis.
See “Remove the Server Module From the Modular System” on page 61.
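The check in step 2 can be automated by parsing the property lines that the show command prints. The following is a minimal sketch, assuming that properties always appear as name = value lines as in the step 2 output. The function names are hypothetical, and the sample text is copied from step 2.

```python
def parse_ilom_properties(output: str) -> dict:
    """Collect 'name = value' lines from Oracle ILOM show output."""
    properties = {}
    for line in output.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            properties[name.strip()] = value.strip()
    return properties


def host_power_is_off(show_sys_output: str) -> bool:
    """Return True only if power_state is reported as Off."""
    return parse_ilom_properties(show_sys_output).get("power_state") == "Off"


# Output of 'show /SYS/ power_state' as shown in step 2.
SAMPLE_OUTPUT = """\
/SYS
properties:
power_state = Off
"""

if __name__ == "__main__":
    print(host_power_is_off(SAMPLE_OUTPUT))
```

Issue set /SYS/ prepare_to_remove_action=true only when this check succeeds. Otherwise, repeat the shutdown procedure first.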
Related Information
■ “Remove the Server Module From the Modular System” on page 61
■ “Shut Down the OS and Host (Commands)” on page 57
■ “Shut Down the OS and Host (Power Button – Graceful)” on page 59
■ “Shut Down the OS and Host (Emergency Shutdown)” on page 59
▼Remove the Server Module From the
Modular System
1. Review the safety and handling precautions.
See “Safety Information” on page 51 and “Handling Precautions” on page 53.
2. If a cable is connected to the front of the server module, disconnect it.
Press the buttons on either side of the UCP to release the connector.
3. Open both ejector arms (panel 1).
Squeeze both latches on each of the two ejector arms.
4. Pull the server module out (panel 2 and panel 3).
5. Close the ejector arms.
6. Remove the server module from the modular system (panel 3).
Lift the server module with two hands.
7. Place the server module on an antistatic mat or surface.
8. Insert a filler panel into the empty chassis slot.
Note – When the modular system is operating, you must fill every slot with a filler
panel or a server module within 60 seconds.
9. Remove the server module cover.
See “Remove the Cover” on page 63.
Related Information
■ “Remove the Cover” on page 63
■ “Install the Server Module Into the Modular System” on page 118
▼Remove the Cover
1. (If needed) Remove the server module from the modular system.
See “Remove the Server Module From the Modular System” on page 61.
2. Attach an antistatic strap to your wrist and then to a metal area on the server
module.
3. While pressing the cover release button, slide the cover toward the rear of the
server module about half an inch (1 cm).
4. Lift the cover off the server module chassis.
5. Service the faulty component.
See “Illustrated Parts Breakdown” on page 1.
Related Information
■ “Illustrated Parts Breakdown” on page 1
■ “Replace the Cover” on page 117
Servicing Drives
The following topics apply to hard drives and solid state drives installed in the front
slots of the server module.
Note – The term drive applies to either a hard drive or a solid state drive.
Description | Links
Replace a faulty drive. | “Drive Hot-Plugging Guidelines” on page 68; “Drive Configuration” on page 66; “Locate a Faulty Drive” on page 68; “Remove a Drive” on page 69; “Install a Drive” on page 71; “Verify Drive Functionality” on page 74
Add an additional drive. | “Drive Configuration” on page 66; “Remove a Drive Filler” on page 70; “Install a Drive” on page 71; “Verify Drive Functionality” on page 74
Remove a drive without replacing it. | “Drive Configuration” on page 66; “Locate a Faulty Drive” on page 68; “Install a Drive Filler” on page 73
Identify drive LEDs. | “Drive LEDs” on page 67
Related Information
■ “Detecting and Managing Faults” on page 5
■ “Preparing for Service” on page 51
Drive Configuration
The following figure and table describe the physical address assigned to a drive,
which depends on the slot in which the drive is installed.
Note – The Oracle Solaris OS now uses the WWN syntax in place of the unique tn
(target ID) field in logical device names. This change affects how a target storage
device is identified. Refer to the Server Module Product Notes for details.
No. | Description
1 | Drive slot 0
2 | Drive slot 1
Drive LEDs
No. | LED or Button | Color | Description
1 | Drive OK/Activity LED | Green | Indicates the following drive status: On – Drive is idle and available for use. Off – Read or write activity is in progress.
2 | Drive Ready to Remove LED | Blue | Indicates that a drive can be removed during a hot-plug operation.
3 | Drive Service Action Required LED | Amber | Indicates that the drive has experienced a fault condition.
Drive Hot-Plugging Guidelines
To safely remove a drive, you must:
■ Prevent any applications from accessing the drive.
■ Remove the logical software links.
Drives cannot be hot-plugged if:
■ The drive provides the operating system, and the operating system is not mirrored
on another drive.
■ The drive cannot be logically isolated from the online operations of the server
module.
If either of these conditions applies to your drive, you must shut down the Oracle
Solaris OS before you replace the drive. See “Shut Down the OS and Host
(Commands)” on page 57.
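The two conditions above amount to a simple decision rule, sketched here for illustration. The function and parameter names are hypothetical. The caller must determine the three inputs from the system configuration, because nothing in this sketch inspects the drives.

```python
def can_hot_plug(provides_os: bool, os_is_mirrored: bool,
                 can_be_logically_isolated: bool) -> bool:
    """Apply the hot-plugging guidelines to one drive.

    A drive cannot be hot-plugged if it provides the operating system and
    the OS is not mirrored on another drive, or if the drive cannot be
    logically isolated from the online operations of the server module.
    """
    if provides_os and not os_is_mirrored:
        return False
    if not can_be_logically_isolated:
        return False
    return True


if __name__ == "__main__":
    # Unmirrored boot drive: shut down the OS before replacing the drive.
    print(can_hot_plug(provides_os=True, os_is_mirrored=False,
                       can_be_logically_isolated=True))
```

When the function returns False, shut down the Oracle Solaris OS before you remove the drive.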
Related Information
■ “Remove a Drive” on page 69
■ “Install a Drive” on page 71
▼Locate a Faulty Drive
This procedure describes how to identify a faulty drive using the fault LEDs on the
drive.
You can also use the diskinfo(1M) command to identify the slot in which a
particular drive is installed. Refer to the Administration Guide and to the Product Notes
for more information.
● View the drive LEDs to determine the status of the drive.
When the amber drive Service Required LED on the front of a drive is lit, a fault
has occurred on that drive.
See “Drive LEDs” on page 67.
Related Information
■ “Detecting and Managing Faults” on page 5
■ “Remove a Drive” on page 69
■ “Install a Drive” on page 71
▼Remove a Drive
1. Identify the drive you plan to remove.
See “Locate a Faulty Drive” on page 68.
2. Prepare the drive for removal by performing one of the following steps:
■ Take the drive offline.
The exact commands required to take the drive offline depend on the
configuration of your drives. For example, you might need to unmount file
systems or perform certain RAID commands.
One command that is commonly used to take a drive offline is the cfgadm
command. For more information, refer to the Oracle Solaris cfgadm man page.
■ Shut down the Oracle Solaris OS.
If the drive cannot be taken offline, shut down the Oracle Solaris OS on the
server module. See “Shut Down the OS and Host (Commands)” on page 57.
3. Verify whether the blue Drive Ready to Remove LED is illuminated on the front
of the drive.
See “Drive LEDs” on page 67. The blue LED will be illuminated only if the drive
was taken offline using cfgadm or an equivalent command. The LED will not be
illuminated if Oracle Solaris was shut down.
4. Remove the drive.
a. Push the latch release button on the drive (panels 1 and 2).
b. Grasp the latch and pull the drive out of the drive slot (panel 3).
5. Consider your next step.
■ If you are replacing the drive, see “Install a Drive” on page 71.
■ If you are not replacing the drive, install a drive filler. See “Install a Drive
Filler” on page 73.
Related Information
■ “Install a Drive Filler” on page 73
■ “Install a Drive” on page 71
▼Remove a Drive Filler
All drive bays must be populated by either a drive or a filler.
1. Open the filler lever (panels 1 and 2).
2. Pull to remove the filler (panel 3).
3. Install a drive in this slot.
See “Install a Drive” on page 71.
Related Information
■ “Install a Drive” on page 71
■ “Install a Drive Filler” on page 73
▼Install a Drive
The physical address of a drive is based on the slot in which it is installed. See “Drive
Configuration” on page 66.
1. (If needed) Remove a drive.
See “Remove a Drive” on page 69.
2. Identify the slot in which to install the drive.
■ If you are replacing a drive, ensure that you install the replacement drive in the
same slot as the drive you removed.
■ If you are adding an additional drive, install the drive in the next available
drive slot.
3. (If needed) Remove the drive filler from this slot.
See “Remove a Drive Filler” on page 70.
4. Slide the drive into the bay until it is fully seated (panel 1).
5. Close the latch to lock the drive in place (panels 2 and 3).
6. Verify the functionality of the new drive.
See “Verify Drive Functionality” on page 74.
Related Information
■ “Remove a Drive” on page 69
▼Install a Drive Filler
All drive bays must be populated by either a drive or a filler.
1. Extend the filler handle, then align the filler to the empty drive bay (panel 1).
2. Push the filler into place.
3. Close the filler lever (panels 2 and 3).
Related Information
■ “Remove a Drive” on page 69
■ “Remove a Drive Filler” on page 70
▼Verify Drive Functionality
1. If the OS is shut down, and the drive you replaced was not the boot device, boot
the OS.
Depending on the nature of the replaced drive, you might need to perform
administrative tasks to reinstall software before the server can boot. Refer to the
Oracle Solaris OS administration documentation for more information.
2. Verify that the drive’s blue Ready to Remove LED is no longer lit on the drive
that you installed.
See “Drive LEDs” on page 67.
If the fault LED is not illuminated, the drive is ready to be configured according to
your requirements. Go to Step 3.
If the fault LED is lit, see “Detecting and Managing Faults” on page 5.
3. Perform administrative tasks to reconfigure the drive.
The procedures that you perform at this point depend on how your data is
configured. You might need to partition the drive, create file systems, load data
from backups, or have data updated from a RAID configuration.
The following commands might apply to your circumstances:
■ You can use the Oracle Solaris command cfgadm -al to list all drives in the
device tree, including unconfigured drives.
■ If the drive is not in the list, such as with a newly installed drive, you can use
devfsadm to configure it into the tree. See the devfsadm man page for details.
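To spot unconfigured drives in a long cfgadm -al listing, a script can filter on the Occupant column. The following is a minimal sketch that assumes the usual five-column layout (Ap_Id, Type, Receptacle, Occupant, Condition). The attachment point names in the sample are hypothetical and will differ on your system, particularly where WWN-based device names are in use.

```python
def parse_cfgadm_al(output: str) -> list[dict]:
    """Parse 'cfgadm -al' text into one dictionary per attachment point."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything not in 5 columns.
        if len(fields) != 5 or fields[0] == "Ap_Id":
            continue
        ap_id, dev_type, receptacle, occupant, condition = fields
        rows.append({
            "ap_id": ap_id,
            "type": dev_type,
            "receptacle": receptacle,
            "occupant": occupant,
            "condition": condition,
        })
    return rows


def unconfigured_ap_ids(output: str) -> list[str]:
    """Return the attachment points whose occupant state is unconfigured."""
    return [row["ap_id"] for row in parse_cfgadm_al(output)
            if row["occupant"] == "unconfigured"]


# Hypothetical sample listing, for illustration only.
SAMPLE_CFGADM = """\
Ap_Id                          Type         Receptacle   Occupant     Condition
c2                             scsi-sas     connected    configured   unknown
c2::dsk/c2t0d0                 disk         connected    configured   unknown
c2::dsk/c2t1d0                 disk         connected    unconfigured unknown
"""

if __name__ == "__main__":
    print(unconfigured_ap_ids(SAMPLE_CFGADM))
```

A newly installed drive that does not appear in the listing at all is not reported by this filter. In that case, use devfsadm as described above.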
Related Information
■ “Detecting and Managing Faults” on page 5
■ “Locate a Faulty Drive” on page 68
■ “Remove a Drive” on page 69
■ “Install a Drive” on page 71
Servicing Memory
The following topics describe how to determine which DIMMs are faulty, remove
DIMMs, install DIMMs, and verify DIMM functionality after installation.
Description | Links
Understand memory faults. | “Memory Faults” on page 75
Replace a faulty DIMM. | “DIMM Handling Precautions” on page 79; “Locate a Faulty DIMM” on page 79; “Remove a DIMM” on page 80; “Install a DIMM” on page 81; “Clear the Fault and Verify the Functionality of the Replacement DIMM” on page 82
Add memory to the server module. | “DIMM Configuration” on page 77; “DIMM Handling Precautions” on page 79; “Install a DIMM” on page 81; “Verify DIMM Functionality” on page 86
Related Information
■ “Detecting and Managing Faults” on page 5
■ “Preparing for Service” on page 51
Memory Faults
A variety of features play a role in how the memory subsystem is configured and
how memory faults are handled. Understanding the underlying features helps you
identify and repair memory problems. This topic describes how the server module
deals with memory faults.
The following server module features independently manage memory faults:
■ POST – Based on Oracle ILOM configuration variables, POST runs when the
server module is powered on.
For correctable memory errors (sometimes called CEs), POST forwards the error to
the Oracle Solaris PSH daemon for error handling.
If an uncorrectable memory fault is detected, POST displays the fault with the device
name of the faulty DIMMs, and logs the fault. POST then disables the faulty
DIMMs. Depending on the memory configuration and the location of the faulty
DIMM, POST disables half of physical memory in the server module, or half the
physical memory and half the processor threads. When the offlining process
occurs in normal operation, you must replace the faulty DIMMs based on the fault
message and then enable the disabled DIMMs. See “Clear the Fault and Verify the
Functionality of the Replacement DIMM” on page 82.
■ PSH – A feature of the Oracle Solaris OS, PSH uses the fault manager daemon
(fmd) to watch for various kinds of faults. When a fault occurs, the fault is
assigned a UUID and logged. PSH reports the fault and suggests a replacement for
the DIMMs associated with the fault.
If you suspect that the server module has a memory problem, follow the
“Diagnostics Process” on page 7. The flowchart helps you determine if the memory
problem was detected by POST or by PSH.
Once you identify which DIMMs you want to replace, see “Locate a Faulty DIMM”
on page 79. After replacing a faulty DIMM, you must perform the instructions in
“Clear the Fault and Verify the Functionality of the Replacement DIMM” on page 82.
Related Information
■ “Locate a Faulty DIMM” on page 79
■ “Clear the Fault and Verify the Functionality of the Replacement DIMM” on
page 82
■ “Detecting and Managing Faults” on page 5
DIMM Configuration
No. | Description or Partial FRU Name (full names start with /SYS/MB/CMP0/)
1 | Fault Remind button
2 | Fault Remind Power LED
3 | DIMMs controlled by BOB3: CH0/D1, CH0/D0, CH1/D1, CH1/D0
4 | DIMMs controlled by BOB2: CH0/D1, CH0/D0, CH1/D1, CH1/D0
5 | DIMMs controlled by BOB0: CH0/D1, CH0/D0, CH1/D1, CH1/D0
6 | DIMMs controlled by BOB1: CH1/D0, CH1/D1, CH0/D0, CH0/D1
7 | DIMM Fault LEDs
DIMM configuration guidelines:
■ Use only supported industry-standard DDR-3 DIMMs.
■ Use supported DIMM capacities: 4 Gbyte, 8 Gbyte, and 16 Gbyte.
Refer to the Netra SPARC T4-1B Server Module Product Notes for the latest
information.
■ Install quantities of 4, 8, or 16 DIMMs, in the correct slots:
■ 4 DIMMs: CH1/D0 slots (white sockets)
■ 8 DIMMs: CH1/D0 and CH0/D0 slots
■ 16 DIMMs: All slots
■ Ensure that all DIMMs have the same part number.
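The population rules above can be expressed as a lookup from DIMM count to FRU names. The following sketch is illustrative only. It assumes that the four memory buffers are named BOB0 through BOB3 and that full FRU names start with /SYS/MB/CMP0/. The function name slots_to_populate is hypothetical.

```python
FRU_PREFIX = "/SYS/MB/CMP0"
BOBS = ("BOB0", "BOB1", "BOB2", "BOB3")  # assumed buffer names

# Slots to fill on every BOB for each supported DIMM quantity.
SLOTS_BY_QUANTITY = {
    4: ("CH1/D0",),
    8: ("CH1/D0", "CH0/D0"),
    16: ("CH0/D0", "CH0/D1", "CH1/D0", "CH1/D1"),
}


def slots_to_populate(dimm_count: int) -> list[str]:
    """Return the full FRU names to populate for 4, 8, or 16 DIMMs."""
    if dimm_count not in SLOTS_BY_QUANTITY:
        raise ValueError("supported DIMM quantities are 4, 8, and 16")
    return [f"{FRU_PREFIX}/{bob}/{slot}"
            for bob in BOBS
            for slot in SLOTS_BY_QUANTITY[dimm_count]]


if __name__ == "__main__":
    for name in slots_to_populate(4):
        print(name)
```

The returned names use the same form as the asrkey values that appear in the set commands elsewhere in this manual, such as /SYS/MB/CMP0/BOB0/CH0/D0.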
Related Information
■ “Memory Faults” on page 75
■ “Locate a Faulty DIMM” on page 79
■ “Remove a DIMM” on page 80
■ “Install a DIMM” on page 81
■ “Clear the Fault and Verify the Functionality of the Replacement DIMM” on
page 82
DIMM Handling Precautions
Caution – This procedure involves handling circuit boards that are extremely
sensitive to static electricity. Ensure that you follow ESD preventative practices to
avoid damaging the circuit boards.
Caution – Components inside the chassis might be hot. Use caution when servicing
components inside the chassis.
Related Information
■ “Locate a Faulty DIMM” on page 79
■ “Remove a DIMM” on page 80
■ “Install a DIMM” on page 81
▼Locate a Faulty DIMM
This procedure describes how to use the DIMM LEDs on the motherboard to
pinpoint the physical location of a faulty DIMM.
Note – You can also obtain the location of the faulty DIMM using the Oracle ILOM
show faulty command. This command displays the FRU name (such as
/SYS/MB/CMP0/BOB0/CH0). Use the FRU name and information to locate the faulty
DIMM. See “DIMM Configuration” on page 77.
1. Check the front panel Fault LED.
See “Diagnostics LEDs” on page 10.
When a faulty DIMM is detected, the front panel Fault LED and the motherboard
DIMM Fault LEDs are illuminated. Before opening the server module to check the
DIMM Fault LEDs, verify that the Fault LED is lit.
■ If the Fault LED is not lit, and you suspect there is a problem, see “Diagnostics
Process” on page 7.
■ If the Fault LED is lit, go to the next step.
2. (If needed) Prepare for service.
See “Preparing for Service” on page 51.
3. Press the Remind button on the motherboard.
While the Remind button is pressed, an LED next to the faulty DIMM illuminates,
enabling you to identify the faulty DIMM. See “DIMM Configuration” on page 77.
Tip – The DIMM Fault LEDs are small and difficult to identify when they are not
illuminated. If you do not see any illuminated LEDs in the area of the DIMM LEDs,
assume that the DIMMs are not faulty.
4. Remove the faulty DIMM.
See “Remove a DIMM” on page 80.
Related Information
■ “DIMM Configuration” on page 77
■ “Remove a DIMM” on page 80
▼Remove a DIMM
1. (If needed) Prepare for service.
See “Preparing for Service” on page 51.
2. (If needed) Locate the faulty DIMM.
See “Locate a Faulty DIMM” on page 79.
3. Remove the DIMM from the motherboard.
a. Push down on the ejector tabs on each side of the DIMM until the DIMM is
released (panel 1).
b. Grasp the top corners of the DIMM, and lift and remove it from the server
module (panel 2).
c. Place the DIMM on an antistatic mat.
4. Install a replacement DIMM.
See “Install a DIMM” on page 81.
Related Information
■ “Install a DIMM” on page 81
■ “DIMM Configuration” on page 77
▼Install a DIMM
1. (If needed) Prepare the server module for service and remove the faulty DIMM.
See “Preparing for Service” on page 51 and “Remove a DIMM” on page 80.
2. Unpack the replacement DIMM and set it on an antistatic mat.
See “DIMM Handling Precautions” on page 79.
3. Ensure that the DIMM ejector tabs are in the open position (panel 1).
4. Line up the replacement DIMM with the connector.
Align the DIMM notch with the key in the connector, as in panel 3. This action
ensures that the DIMM is oriented correctly. Panel 2 shows an incorrect alignment.
5. Push the DIMM into the connector until the ejector tabs lock the DIMM in
place.
If the DIMM does not easily seat into the connector, verify that the orientation of
the DIMM is correct. Never apply excessive force.
6. Return the server module to operation.
See “Returning the Server Module to Operation” on page 117.
7. Perform one of the following tasks to verify the DIMM:
■ Verify a replacement DIMM. See “Clear the Fault and Verify the Functionality
of the Replacement DIMM” on page 82.
■ Verify additional memory. See “Verify DIMM Functionality” on page 86.
Related Information
■ “Remove a DIMM” on page 80
■ “DIMM Configuration” on page 77
▼Clear the Fault and Verify the Functionality of the Replacement DIMM
1. Ensure that the following conditions are met:
■ The server module is in Standby mode (installed in a powered modular system,
but the server module’s host is not started).
See “Set the Server Module to a Ready-to-Remove State” on page 60.
■ You have connectivity to the SP.
See “Access the SP (Oracle ILOM)” on page 15.
2. Access the Oracle ILOM prompt.
See “Access the SP (Oracle ILOM)” on page 15.
3. Determine how to clear the fault.
The method you use to clear a fault depends on how the fault is identified by the
show faulty command.
Examples:
■ If the fault is a host-detected fault (displays a UUID), continue to Step 4.
■ If the fault is a POST-detected fault that disabled the DIMM, the replacement
of the faulty DIMM is usually detected when the SP is power cycled, and the
fault is cleared automatically. If the fault is still displayed by the
show faulty command, use the set command to enable the DIMM and clear the
fault.
Example:
-> set /SYS/MB/CMP0/BOB0/CH0/D0 component_state=Enabled
4. Verify the repair.
a. Set the virtual keyswitch to diag so that POST will run in Service mode.
-> set /SYS keyswitch_state=Diag
Set ‘keyswitch_state’ to ‘Diag’
b. Power cycle the server module.
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
Note – The server module takes about one minute to power off. Use the show
/HOST command to determine when the host has been powered off. The console will
display status=Powered Off.
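The wait described in this note can be sketched as a small polling loop. This is only an illustration under stated assumptions: read_host_status is a hypothetical callable that returns the text of a show /HOST query (how that text is obtained from the SP is outside this sketch), and the timeout and interval defaults are arbitrary choices, not values from this manual.

```python
import re
import time
from typing import Callable

# Tolerate optional spaces around "=" because output formatting may vary.
POWERED_OFF_RE = re.compile(r"status\s*=\s*Powered Off")


def wait_for_power_off(
    read_host_status: Callable[[], str],      # hypothetical: returns show /HOST text
    timeout_s: float = 120.0,                 # assumed default, not from the manual
    interval_s: float = 5.0,                  # assumed default, not from the manual
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status text reports Powered Off, or give up at the timeout."""
    waited = 0.0
    while True:
        if POWERED_OFF_RE.search(read_host_status()):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A return value of False means the host did not report Powered Off within the timeout, in which case starting /SYS should be deferred.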
c. Switch to the host console to view POST output.
-> start /HOST/console
Watch the POST output for possible fault messages. The following output is a
sign that POST did not detect any faults:
.
.
.
0:0:0>INFO:
0:0:0> POST Passed all devices.
0:0:0>POST: Return to VBSC.
0:0:0>Master set ACK for vbsc runpost command and spin...
Note – Depending on the configuration of Oracle ILOM variables that affect POST
and whether POST detected faults or not, the server module might boot, or the server
module might remain at the ok prompt. If the server module is at the ok prompt,
type boot.
d. Return the virtual keyswitch to Normal mode.
-> set /SYS keyswitch_state=Normal
Set ‘keyswitch_state’ to ‘Normal’
e. Switch to the host console and type the Oracle Solaris OS fmadm faulty
command.
# fmadm faulty
No memory faults should be displayed.
If faults are reported, refer to the “Diagnostics Process” on page 7 for an
approach to diagnose the fault.
5. Switch to the Oracle ILOM prompt.
6. Type the show faulty command.
■ If the fault was detected by the host and the fault information persists, the
show faulty output still reports the fault with a UUID. Continue to Step 7.
■ If the show faulty command does not report a fault with a UUID, the fault is
cleared. You do not need to proceed with the following steps.
7. (Only if previous steps did not clear the fault) Type the set command.
-> set /SYS/MB/CMP0/BOB0/CH0/D0 clear_fault_action=true
Are you sure you want to clear /SYS/MB/CMP0/BOB0/CH0/D0 (y/n)? y
Set ‘clear_fault_action’ to ‘true’
8. (Only if previous steps did not clear the fault) Switch to the host console and
type the fmadm repair command with the UUID.
Use the same UUID that was displayed from the output of the Oracle ILOM show
faulty command.
Refer to the SPARC T4 Series Servers Administration Guide for instructions.
▼Verify DIMM Functionality
2. Use the show faulty command to determine how to clear the fault.
■ If show faulty indicates a POST-detected fault, go to Step 3.
■ If show faulty output displays a UUID, which indicates a host-detected fault,
go to Step 4.
3. Use the set command to enable the DIMM that was disabled by POST.
In most cases, replacement of a faulty DIMM is detected when the SP is power
cycled. In those cases, the fault is automatically cleared from the server module. If
show faulty still displays the fault, the set command will clear it.
-> set /SYS/MB/CMP0/BOB0/CH0/D0 component_state=Enabled
4. For a host-detected fault, verify the new DIMM.
a. Set the virtual keyswitch to diag so that POST will run in Service mode.
-> set /SYS keyswitch_state=Diag
Set ‘keyswitch_state’ to ‘Diag’
b. Power cycle the server module host.
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
Note – Use the show /HOST command to determine when the host has been
powered off. The console will display status=Powered Off. Allow approximately
one minute before typing this command.
c. Switch to the host console to view POST output.
Watch the POST output for possible fault messages. The following output
indicates that POST did not detect any faults:
-> start /HOST/console
.
.
.
0:7:2>INFO:
0:7:2> POST Passed all devices.
0:7:2>POST: Return to VBSC.
0:7:2>Master set ACK for vbsc runpost command and spin...
Note – The server module might boot automatically at this point. If so, go directly to
Step e. If it remains at the ok prompt, go to Step d.
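The POST check in Step c amounts to looking for the pass message in the console text. The following is a minimal sketch, assuming the console output has been captured to a string by some means not covered here; the function name post_passed is hypothetical. A missing pass message only means that the output needs a closer manual review.

```python
def post_passed(console_text: str) -> bool:
    """Return True if any captured console line reports that POST passed."""
    return any(
        "POST Passed all devices." in line
        for line in console_text.splitlines()
    )
```

The check deliberately ignores the leading prefix (for example 0:7:2>), because that prefix differs between the two sample outputs shown in this chapter.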
d. If the server module remains at the ok prompt, type boot.
e. Return the virtual keyswitch to Normal mode.
-> set /SYS keyswitch_state=Normal
Set ‘keyswitch_state’ to ‘Normal’
f. Switch to the host console and type the Oracle Solaris OS fmadm faulty
command.
# fmadm faulty
If any faults are reported, see the diagnostics instructions in “Oracle ILOM
If the show faulty command reports a fault with a UUID, go to Step 7. If show
faulty does not report a fault with a UUID, you have completed the verification
process.
7. Switch to the host console and type the fmadm repair command with the
UUID.
Use the same UUID that was displayed from the output of the Oracle ILOM show
faulty command.
# fmadm repair 3aa7c854-9667-e176-efe5-e487e520
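Step 7 reuses the UUID from the show faulty output. As a hedged sketch, the lookup can be expressed as follows, assuming the show faulty output is available as plain text. The helper name fmadm_repair_command is hypothetical, and the pattern accepts a final group of 8 to 12 hexadecimal digits because the sample UUID printed above is shorter than a standard UUID.

```python
import re
from typing import Optional

# UUID-shaped token; the last group may be 8 to 12 hex digits (see lead-in).
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{8,12}\b",
    re.IGNORECASE,
)


def fmadm_repair_command(show_faulty_text: str) -> Optional[str]:
    """Build the fmadm repair command line, or return None if no UUID is present."""
    match = UUID_RE.search(show_faulty_text)
    if match is None:
        # No UUID in the output: per the procedure, verification is complete.
        return None
    return f"fmadm repair {match.group(0)}"
```

The function only builds the command string. It does not run anything on the host, and it uses the first UUID it finds, so output that lists several faults still needs to be reviewed by hand.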
Related Information
■ “Remove a DIMM” on page 80
■ “Install a DIMM” on page 81
■ “DIMM Configuration” on page 77
Servicing the REM
The server module supports the installation of one REM. For a list of supported
REMs, refer to the Netra SPARC T4-1B Server Module Product Notes.
Description                    Links
Troubleshoot a REM problem.    Refer to the documentation for the REM.
Replace a REM.                 “Remove a REM” on page 89
                               “Install a REM” on page 90
Install a REM.                 “Install a REM” on page 90
Related Information
■ “Detecting and Managing Faults” on page 5
■ “Preparing for Service” on page 51
▼Remove a REM
1. Prepare for service.
See “Preparing for Service” on page 51.
2. Lift the REM ejector arm (panel 1).
3. Rotate the card up and off the retainer (panels 2 and 3).
4. Set the card on an antistatic surface.
5. Install a REM.
See “Install a REM” on page 90.
Related Information
■ “Install a REM” on page 90
▼Install a REM
For information about specific configuration tasks for your REM, refer to the REM
documentation.
1. (If needed) Prepare for service.
See “Preparing for Service” on page 51.