This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except
as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform,
publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is
prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation,
delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental
regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the
hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous
applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all
appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this
software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of
SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered
trademark of The Open Group.
This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are
not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement
between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content,
products, or services, except as set forth in an applicable agreement between you and Oracle.
Access to Oracle Support
Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
Page 5
Contents
Using This Documentation .......... 11
Returning the Server to Operation .......... 201
▼ Connect Power Cords .......... 201
▼ Power On the Server (Oracle ILOM) .......... 202
Index .......... 203
Page 10
SPARC T7-4 Server Service Manual • May 2017
Page 11
Using This Documentation
■ Overview – Describes how to troubleshoot and maintain the server
■ Audience – Technicians, system administrators, and authorized service providers
■ Required knowledge – Advanced experience troubleshooting and replacing hardware
Product Documentation Library
Documentation and resources for this product and related products are available at
http://www.oracle.com/goto/t7-4/docs.
Feedback
Provide feedback about this documentation at http://www.oracle.com/goto/docfeedback.
Page 12
Page 13
Identifying Components
These topics identify key components of the server, including major boards and internal system
cables, as well as front and rear panel features.
■ “Front Panel Components (Service)” on page 14
■ “Rear Panel Components (Service)” on page 16
■ “Chassis Subassembly Components” on page 18
■ “Processor Module Components” on page 19
■ “Main Module Components” on page 20
■ “Supported Storage and Backup Devices” on page 22
■ “Component Service Task Reference” on page 22
■ “System Schematic” on page 41
Related Information
■ “Detecting and Managing Faults”
■ “Preparing for Service”
■ “Returning the Server to Operation”
Page 14
Front Panel Components (Service)
No. | Description | Links
1 | Processor modules (slots 0 and 1) or processor filler module (slot 1 only) | “Processor Module Components” on page 19; “Servicing Processor Modules” on page 55
2 | Control panel | “Detecting and Managing Faults” on page 25; “Preparing for Service” on page 43; “Returning the Server to Operation” on page 201
3 | Main module | “Main Module Components” on page 20; “Servicing the Main Module” on page 99
4 | Power supplies (4) | “Servicing Power Supplies” on page 147
Page 15
Related Information
■ “Rear Panel Components (Service)” on page 16
■ “Chassis Subassembly Components” on page 18
■ “Processor Module Components” on page 19
■ “Main Module Components” on page 20
■ “Supported Storage and Backup Devices” on page 22
■ “Component Service Task Reference” on page 22
■ “System Schematic” on page 41
Page 16
Rear Panel Components (Service)
No. | Description | Links
1 | Fan modules (5) | “Servicing Fan Modules” on page 157
2 | AC power connectors (4) | “Preparing for Service” on page 43
3 | Rear I/O module | “Servicing the Rear I/O Module” on page 183
4 | PCIe carriers (16) | “Servicing PCIe Cards” on page 165
These components are accessible within the rear chassis subassembly, which you can access
after you have removed all the components from the rear of the server.
Page 17
No. | Description | Links
1 | Chassis |
2 | Midplane assembly | “Servicing the Rear Chassis Subassembly” on page 193
3 | Rear chassis subassembly | “Servicing the Rear Chassis Subassembly” on page 193
Related Information
■ “Front Panel Components (Service)” on page 14
■ “Chassis Subassembly Components” on page 18
■ “Processor Module Components” on page 19
■ “Main Module Components” on page 20
■ “Supported Storage and Backup Devices” on page 22
■ “Component Service Task Reference” on page 22
■ “System Schematic” on page 41
Page 18
Chassis Subassembly Components
No. | Description | Links
1 | Hard drives (8) | “Servicing Hard Drives” on page 87
2 | Front I/O assembly | “Servicing the Front I/O Assembly” on page 143
3 | Main module | “Servicing the Main Module” on page 99
4 | System controls and indicators | “Front Panel Controls and LEDs” on page 29
5 | Processor modules (2) | “Servicing Processor Modules” on page 55
6 | Chassis |
7 | Rear chassis subassembly (RCSA) | “Servicing the Rear Chassis Subassembly” on page 193
8 | Fan modules (5) | “Servicing Fan Modules” on page 157
9 | PCIe carriers (16) | “Servicing PCIe Cards” on page 165
10 | Rear I/O module | “Servicing the Rear I/O Module” on page 183
11 | Power supplies (4) | “Servicing Power Supplies” on page 147
Page 19
Related Information
■ “Front Panel Components (Service)” on page 14
■ “Rear Panel Components (Service)” on page 16
■ “Processor Module Components” on page 19
■ “Main Module Components” on page 20
■ “Supported Storage and Backup Devices” on page 22
■ “Component Service Task Reference” on page 22
■ “System Schematic” on page 41
Processor Module Components
These components are accessible within the processor module when you remove the processor
module from the front of the server.
Page 20
No. | Description | Link
1 | DIMMs | “Servicing DIMMs” on page 69
Related Information
■ “Front Panel Components (Service)” on page 14
■ “Rear Panel Components (Service)” on page 16
■ “Chassis Subassembly Components” on page 18
■ “Main Module Components” on page 20
■ “Supported Storage and Backup Devices” on page 22
■ “Component Service Task Reference” on page 22
■ “System Schematic” on page 41
Main Module Components
These components are accessible after you remove the main module from the front of the
server.
Page 21
No. | Description | Links
1 | Hard drives | “Servicing Hard Drives” on page 87
2 | Front I/O assembly and cables | “Servicing the Front I/O Assembly” on page 143
3 | Storage backplane | “Servicing the Drive Backplane” on page 119
4 | Main module motherboard |
5 | SPM | “Servicing the SPM” on page 125
6 | SCC PROM | “Servicing the SCC PROM” on page 133
7 | Battery | “Servicing the Battery” on page 137
8 | NVMe cards (optional) | “Servicing NVMe Switch Cards” on page 111
Related Information
■ “Front Panel Components (Service)” on page 14
■ “Rear Panel Components (Service)” on page 16
■ “Chassis Subassembly Components” on page 18
■ “Processor Module Components” on page 19
■ “Supported Storage and Backup Devices” on page 22
■ “Component Service Task Reference” on page 22
■ “System Schematic” on page 41
Page 22
Supported Storage and Backup Devices
The server supports the following storage devices:
■ Fibre channel arrays (SATA, FC, flash, and SAS-2)
■ SAS arrays (SAS-2)
■ ZFS appliances (SAS-2)
The server also supports these types of tape backup and restore devices:
■ TCP/IP
■ Fibre channel
■ SAS
■ LVD SCSI
Related Information
■ “Front Panel Components (Service)” on page 14
■ “Rear Panel Components (Service)” on page 16
■ “Chassis Subassembly Components” on page 18
■ “Processor Module Components” on page 19
■ “Main Module Components” on page 20
■ “Component Service Task Reference” on page 22
■ “System Schematic” on page 41
Component Service Task Reference
This table lists the names of serviceable components. It also lists the system names and task
locations for the components.
Detecting and Managing Faults
These topics explain how to use various diagnostic tools to monitor server status and
troubleshoot faults in the server. The examples use the PSH fmadm faulty command.
■ “Understanding Diagnostics” on page 25
■ “Checking for Faults” on page 27
■ “Interpreting Log Files and System Messages” on page 35
■ “Configuring POST” on page 37
■ “Clear a Fault Manually” on page 40
Related Information
■ “Identifying Components” on page 13
■ “Component Service Categories” on page 46
■ “Preparing for Service” on page 43
■ “Returning the Server to Operation” on page 201
Understanding Diagnostics
These topics explain the diagnostic process and tools.
■ “PSH Overview” on page 25
■ “Diagnostics Process” on page 26
PSH Overview
The PSH feature provides problem diagnosis on the SPM and the host. Regardless of where a
fault occurs, you can view and manage the fault diagnosis from the SPM or the host.
Page 26
When possible, PSH initiates steps to take the component offline. PSH also logs the fault to the
syslogd daemon and provides a fault notification with a message ID. You can use the message
ID to get additional information about the problem from the Knowledge Base article database.
A PSH console message provides this information about each detected fault:
■ Type
■ Severity
■ Description
■ Automated response
■ Impact
■ Suggested action for system administrator
If PSH detects a faulty component, use the fmadm faulty command to display information
about the fault. See “Check for Faults” on page 33.
Related Information
■ “Diagnostics Process” on page 26
■ “Checking for Faults” on page 27
Diagnostics Process
This table describes the diagnostics process.
Step | Diagnostic Action | Possible Outcome | Links
1. | Check the server for detected faults using these tools: System LEDs on the front and rear panels; the fmadm faulty command from the Oracle Solaris prompt or through the Oracle ILOM fault management shell. | Determine the faulty component and replace it, or continue to advanced troubleshooting. | “Checking for Faults” on page 27
2. | Check the log files for fault information. | If system messages indicate a faulty component, replace it. | “Interpreting Log Files and System Messages” on page 35
3. | Run POST to provide additional low-level diagnostic information for the server. | If POST indicates a faulty component, replace it. | “Configuring POST” on page 37
4. | Contact technical support if the problem persists. | If you are unable to determine the cause of a fault, contact Oracle Support for help. | https://support.oracle.com
Page 27
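The escalation order in the table above can be sketched in a few lines of Python. This is only an illustration of the decision flow, not an Oracle tool: the function name next_action and its three arguments are hypothetical, and each argument stands for the component (if any) that the corresponding diagnostic step identified.

```python
from typing import Optional


def next_action(fault_tool_component: Optional[str],
                log_component: Optional[str],
                post_component: Optional[str]) -> str:
    """Return the action suggested by the four-step diagnostics process.

    Each argument is the name of the component a step identified as faulty,
    or None if that step found nothing. All names here are hypothetical.
    """
    checks = [
        ("LEDs or fmadm faulty", fault_tool_component),  # step 1
        ("log files", log_component),                    # step 2
        ("POST", post_component),                        # step 3
    ]
    for source, component in checks:
        if component is not None:
            return f"Replace {component} (identified by {source})"
    # Step 4: nothing identified the cause, so escalate.
    return "Contact Oracle Support at https://support.oracle.com"


if __name__ == "__main__":
    print(next_action(None, "hard drive", None))
```

The earliest step that names a component wins, which mirrors the table: later steps are only consulted when the earlier ones are inconclusive.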
Related Information
■ “PSH Overview” on page 25
■ “Checking for Faults” on page 27
Checking for Faults
Use these methods to check for faults:
■ “Interpreting LEDs” on page 27
■ “Log In to Oracle ILOM (Service)” on page 32
■ “Check for Faults” on page 33
Interpreting LEDs
Use these steps to determine if an LED indicates that a component has failed in the server.
Steps | Description | Links
1. | Check the LEDs on the front and rear of the server. | “Front Panel Controls and LEDs” on page 29; “Rear Panel Controls and LEDs” on page 31
2. | Check the LEDs on the individual components. Note - Component LEDs might not be lit even though the component is faulty. Use the instructions in these links to determine if the component has been diagnosed as being faulty. | See the list that follows.
Links for step 2:
■ “Determine if the Main Module Is Faulty” on page 101
■ “Determine Which Processor Module Is Faulty” on page 60
■ “Identifying Faulty DIMMs” on page 74
■ “Determine Which Hard Drive Is Faulty” on page 90
■ “Determine Which Power Supply Is Faulty” on page 151
■ “Determine Which Fan Module Is Faulty” on page 158
■ “Determine Which PCIe Card Is Faulty” on page 170
■ “Determine if the Rear I/O Module Is Faulty” on page 186
Page 28
Related Information
■ “Front Panel Controls and LEDs” on page 29
■ “Rear Panel Controls and LEDs” on page 31
Page 29
Front Panel Controls and LEDs
No. | LED | Icon or Label | Description
1 | Locator LED and button (white) | | You can turn on the Locator LED to identify a particular server. When lit, the LED blinks rapidly. Turn on the Locator LED by pressing the Locator button, or see “Locate the Server” on page 49.
2 | Server Service Required LED (amber) | | The fmadm faulty command provides details about any faults that cause this indicator to light. See “Check for Faults” on page 33. Under some fault conditions, individual component fault LEDs are lit in addition to the Server Service Required LED.
3 | Power OK LED (green) | | Indicates these conditions:
    ■ Off – Server is not running in its normal state. Server power might be off. The SPM might be running.
    ■ Steady on – Server is powered on and is running in its normal operating state. No service actions are required.
    ■ Fast blink – Server is running in standby mode and can be quickly returned to full function.
    ■ Slow blink – A normal but transitory activity is taking place. Slow blinking might indicate that server diagnostics are running or that the server is booting.
4 | Power button | | The recessed Power button toggles the server on or off. See “Power Off the Server (Power Button – Graceful Shutdown)” on page 52.
5 | System Overtemp LED (amber) | | Indicates these conditions:
    ■ Off – Indicates a steady state, no service action is required.
    ■ Steady on – Indicates that a temperature failure event has been acknowledged and a service action is required.
6 | Fan Module Fault LED (amber) | Rear FM | Indicates these conditions:
    ■ Off – Indicates a steady state, no service action is required.
    ■ Steady on – Indicates that a fan module failure event has been acknowledged and a service action is required on at least one of the fan modules.
7 | PCIe Card Fault LED (amber) | Rear PCIe | Indicates these conditions:
    ■ Off – Indicates a steady state, no service action is required.
    ■ Steady on – Indicates that a failure event has been acknowledged and a service action is required on at least one of the PCIe cards.
Page 30
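As an illustration of how the Power OK LED states in this table might be encoded in a monitoring script or checklist tool, here is a minimal Python sketch. The dictionary and function names are hypothetical; the state names and their meanings are taken from the table.

```python
# Power OK LED states and meanings, as listed in the front panel table.
POWER_OK_STATES = {
    "off": "Server is not running in its normal state. Server power might "
           "be off. The SPM might be running.",
    "steady on": "Server is powered on and is running in its normal "
                 "operating state. No service actions are required.",
    "fast blink": "Server is running in standby mode and can be quickly "
                  "returned to full function.",
    "slow blink": "A normal but transitory activity is taking place, such "
                  "as diagnostics running or the server booting.",
}


def describe_power_ok(state: str) -> str:
    """Return the documented meaning of an observed Power OK LED state."""
    key = " ".join(state.lower().split())  # normalize case and spacing
    if key not in POWER_OK_STATES:
        raise ValueError(f"Unknown Power OK LED state: {state!r}")
    return POWER_OK_STATES[key]


if __name__ == "__main__":
    print(describe_power_ok("Fast blink"))
```

Rejecting unknown states with an error, rather than guessing, keeps an operator from misreading an LED pattern that the table does not define.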
Page 31
Rear Panel Controls and LEDs
No. | LED | Icon or Label | Description
1 | AC 0 (left) and AC 1 (right) power LED | | Indicates these conditions:
    ■ Off – No power is applied to the server.
    ■ Green – Power is applied to the server.
2 | Net MGT port link LED | | Indicates these conditions:
    ■ Off – No link is established.
    ■ On or blinking – A link is established.
3 | Net MGT port speed LED | | Indicates these conditions:
    ■ Off – The link is operating as a 10-Mbps connection.
    ■ On or blinking – The link is operating as a 100-Mbps connection.
4 | Network port link LED | | Indicates these conditions:
    ■ Off – No link is established.
    ■ Blinking – A link is established.
5 | Network port speed LED | | Indicates these conditions:
    ■ Off – The link is operating as a 10-Mbps connection or there is no link.
    ■ Amber on – The link is operating as a 100-Mbps connection.
    ■ Green on – The link is operating as a Gigabit connection (1000 Mbps).
6 | AC 2 (left) and AC 3 (right) power LEDs | | Indicates these conditions:
    ■ Off – No power is applied to the server.
    ■ Green – Power is applied to the server.
7 | Locator LED and button (white) | | Turn on the Locator LED by pressing the Locator button, or see “Locate the Server” on page 49. When lit, the LED blinks rapidly.
8 | Server Service Required LED (amber) | | The fmadm faulty command provides details about any faults that cause this indicator to light. See “Check for Faults” on page 33. Under some fault conditions, individual component fault LEDs are lit in addition to the Service Required LED.
9 | Power OK LED (green) | | Indicates these conditions:
    ■ Off – Server is not running in its normal state. System power might be off. The SPM might be running.
    ■ Steady on – Server is powered on and is running in its normal operating state. No service actions are required.
    ■ Fast blink – Server is running in standby mode and can be quickly returned to full function.
    ■ Slow blink – A normal but transitory activity is taking place. Slow blinking might indicate that system diagnostics are running or that the system is booting.
10 | SP LED | SP | Indicates these conditions:
    ■ Off – AC power might have been connected to the power supplies.
    ■ Steady on, green – SPM is running in its normal operating state. No service actions are required.
    ■ Blink, green – SPM is initializing the Oracle ILOM firmware.
    ■ Steady on, amber – An SPM error has occurred and service is required.
11 | Overtemp LED (amber) | | Indicates these conditions:
    ■ Off – Indicates a steady state, no service action is required.
    ■ Steady on – Indicates that a temperature failure event has been acknowledged and a service action is required.
Page 32
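For the network ports, an unlit speed LED is ambiguous on its own: it can mean a 10-Mbps link or no link at all. Reading it together with the link LED resolves the ambiguity. The following Python sketch shows that reasoning; the function name and the lowercase state strings are hypothetical conventions, while the LED meanings come from rows 4 and 5 of the table.

```python
from typing import Optional

# Network port speed LED color -> link speed in Mbps (row 5 of the table).
SPEED_MBPS = {"off": 10, "amber": 100, "green": 1000}


def network_port_speed(link_led: str, speed_led: str) -> Optional[int]:
    """Return the link speed in Mbps, or None if no link is established.

    link_led:  "off" or "blinking" (network port link LED)
    speed_led: "off", "amber", or "green" (network port speed LED)
    """
    if link_led.lower() == "off":
        return None  # no link, so the speed LED carries no information
    return SPEED_MBPS[speed_led.lower()]


if __name__ == "__main__":
    print(network_port_speed("blinking", "amber"))
```

An unrecognized speed LED value raises KeyError in this sketch, which is deliberate: the table defines only three states for that LED.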
Log In to Oracle ILOM (Service)
1. At the terminal prompt, type:
ssh root@IP-address
Password: password
Oracle (R) Integrated Lights Out Manager
Version 3.2.1.2 rXXXXX
Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.
->
Page 33
Note - To enable first-time login and access to Oracle ILOM, a default Administrator account
and its password are provided with the system. To build a secure environment, you must change
the default password (changeme) for the default Administrator account (root) after your initial
login to Oracle ILOM. If this default Administrator account has since been changed, contact
your system administrator for an Oracle ILOM user account with Administrator privileges.
2. Enable the Oracle ILOM 3.0 legacy name spaces.
-> set /SP/cli legacy_targets=enabled
Note - In Oracle ILOM 3.1, the name spaces for /SYS and /STORAGE were replaced with
/System. You can still use the 3.0 legacy names in commands at any time, but to expose the
legacy names in the output, you must enable them. This manual uses the legacy names in the
command examples and shows the names in the output examples. For more information about
the new name spaces, see the Oracle ILOM documentation.
Related Information
■ “Interpreting LEDs” on page 27
■ “Check for Faults” on page 33
Check for Faults
The fmadm faulty command displays the list of faults detected by PSH. You can run this
command from either the host or through the Oracle ILOM fault management shell.
1. Log in to Oracle ILOM.
See “Log In to Oracle ILOM (Service)” on page 32.
2. Check for PSH-diagnosed faults.
This example shows how to check for faults through the Oracle ILOM fault management shell.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
Description : A fault has been diagnosed by the Host Operation System.
Response : The service required LED on the chassis and on the affected
FRU may be illuminated.
Impact : No SPM impact
Action : Refer to the associated reference document at
https://support.oracle.com/msg/PCIEX-8000-8R for the latest
service procedures and policies regarding this diagnosis.
faultmgmtsp>
In this example, a fault is displayed that includes these details:
■ Date and time of the fault (2012-08-27/19:46:26).
■ UUID (4e16c8d-5cdb-c6ca-c949-e24d3637ef27), which is unique to each fault.
■ Message identifier (PCIEX-8000-8R), which can be used to obtain additional fault information from Knowledge Base articles.
Page 35
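If you capture the fault output to a file or a terminal log, the message identifier can be pulled out programmatically. The Python sketch below is one possible approach, not part of any Oracle tool: it looks for the https://support.oracle.com/msg/ reference URL that appears in the Action field of the example above and returns the identifier that follows it. The function name and the sample text constant are hypothetical.

```python
import re
from typing import List

# Excerpt of the Action field from the example output in this procedure.
SAMPLE_OUTPUT = """\
Action : Refer to the associated reference document at
https://support.oracle.com/msg/PCIEX-8000-8R for the latest
service procedures and policies regarding this diagnosis.
"""

# Matches the reference URL and captures the message ID after /msg/.
MSG_URL = re.compile(
    r"https://support\.oracle\.com/msg/([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)"
)


def extract_message_ids(output: str) -> List[str]:
    """Return the unique message IDs found in fault output, in order."""
    return list(dict.fromkeys(MSG_URL.findall(output)))


if __name__ == "__main__":
    for message_id in extract_message_ids(SAMPLE_OUTPUT):
        print(message_id)
```

Matching on the reference URL rather than on a fixed column position is a design choice: it avoids assumptions about the layout of the rest of the fault report.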
3. Consider your next step:
■ If you are checking for faults while adding a second processor module, and no faults were detected, return to “Server Upgrade Process” on page 56.
■ If a fault is detected, proceed to Step 4.
4. Use the message ID to obtain more information about this type of fault.
a. Obtain the message ID from console output.
b. Go to https://support.oracle.com, and search on the message ID in the Knowledge tab.
5. Follow the suggested actions to repair the fault.
6. If necessary, clear the fault manually.
See “Clear a Fault Manually” on page 40.
Related Information
■ “PSH Overview” on page 25
■ “Clear a Fault Manually” on page 40
Interpreting Log Files and System Messages
With the OS running on the server, you have the full complement of Oracle Solaris OS files and
commands available for collecting information and for troubleshooting.
If PSH does not indicate the source of a fault, check the message buffer and log files for
notifications for faults. Drive faults are usually captured by the Oracle Solaris message files.
These topics explain how to view the log files and system messages.
■ “Check the Message Buffer” on page 36
■ “View Log Files (Oracle Solaris)” on page 36
■ “View Log Files (Oracle ILOM)” on page 37
Page 36
Check the Message Buffer
The dmesg command checks the system buffer for recent diagnostic messages and displays
them.
1. Log in as superuser.
2. Type:
# dmesg
Related Information
■ “View Log Files (Oracle Solaris)” on page 36
■ “View Log Files (Oracle ILOM)” on page 37
View Log Files (Oracle Solaris)
The error logging daemon, syslogd, automatically records various system warnings, errors, and
faults in message files. These messages can alert you to system problems such as a device that
is about to fail.
The /var/adm directory contains several message files. The most recent messages are in
the /var/adm/messages file. After a period of time (usually every week), a new messages
file is automatically created. The original contents of the messages file are rotated to a file
named messages.1. Over a period of time, the messages are further rotated to messages.2 and
messages.3, and then deleted.
1. Log in as superuser.
2. Type:
# more /var/adm/messages
3. To view all logged messages, type:
# more /var/adm/messages*
Page 37
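When reviewing the rotated files, it can help to read them from newest to oldest. The following Python sketch orders file names by their rotation number, treating messages as the newest and higher suffixes as older, as described above. The function name is hypothetical, and the sketch works on a list of names so that it does not depend on the contents of /var/adm on any particular system.

```python
import re
from typing import Iterable, List

# Matches "messages" and rotated copies such as "messages.1".
_MESSAGES = re.compile(r"messages(?:\.(\d+))?")


def ordered_message_files(names: Iterable[str]) -> List[str]:
    """Return message file names from newest (messages) to oldest."""
    numbered = []
    for name in names:
        match = _MESSAGES.fullmatch(name)
        if match:
            # The unsuffixed file is the current one, so it sorts first.
            numbered.append((int(match.group(1) or 0), name))
    return [name for _, name in sorted(numbered)]


if __name__ == "__main__":
    print(ordered_message_files(["messages.2", "messages", "messages.1"]))
```

On a live system, the input could come from os.listdir("/var/adm"); unrelated files in that directory are ignored by the pattern.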
Related Information
■ “Check the Message Buffer” on page 36
■ “View Log Files (Oracle ILOM)” on page 37
View Log Files (Oracle ILOM)
1. View the event log.
-> show /SP/logs/event/list
2. View the audit log.
-> show /SP/logs/audit/list
Related Information
■ “Check the Message Buffer” on page 36
■ “View Log Files (Oracle Solaris)” on page 36
Configuring POST
These topics explain how to configure POST as a diagnostic tool.
■ “POST Overview” on page 37
■ “Configure POST” on page 38
POST Overview
POST is a group of PROM-based tests that run when the server is powered on or when it is
reset. POST checks the basic integrity of the critical hardware components in the server.
You can also set other Oracle ILOM properties to control various other aspects of POST
operations. For example, you can specify the events that cause POST to run, the level of testing
POST performs, and the amount of diagnostic information POST displays. These properties are
described in “Configure POST” on page 38.
Page 38
If POST detects a faulty component, the component is disabled automatically. If the server is
able to run without the disabled component, the server boots when POST completes its tests.
For example, if POST detects a faulty processor core, the core is disabled, POST completes its
test sequence, and the server boots using the remaining cores.
Related Information
■ “Configure POST” on page 38
Configure POST
1. Log in to Oracle ILOM.
See “Log In to Oracle ILOM (Service)” on page 32.
2. Set the virtual keyswitch to the value that corresponds to the POST
configuration you want to run.
This example sets the virtual keyswitch default_level to min, which configures POST to run
according to other parameter values.
-> set /HOST keyswitch_state=min
Set default_level to min
For possible values for the keyswitch_state parameter, type:
-> show /HOST diag help
/HOST/diag : Manage Host Power On Self Test Diagnostics
Targets:
Properties:
default_level : Diag level in the default cause (no error or hw change)
default_level : Possible values = off, min, max
default_level : User role required for set = r
default_verbosity : Diag verbosity in the default cause (no error or hw change)
default_verbosity : Possible values = none, min, normal, max
Page 39
default_verbosity : User role required for set = r
error_level : Diag level when running after an error reset
error_level : Possible values = off, min, max
error_level : User role required for set = r
error_verbosity : Diag verbosity when running after an error reset
error_verbosity : Possible values = none, min, normal, max
error_verbosity : User role required for set = r
hw_change_level : Diag level when running after a hw change
hw_change_level : Possible values = off, min, max
hw_change_level : User role required for set = r
hw_change_verbosity : Diag verbosity when running after a hw change
hw_change_verbosity : Possible values = none, min, normal, max
hw_change_verbosity : User role required for set = r
->
Note - Depending on the server configuration, setting the HOST keyswitch_state diagnostics
verbosity to none might result in no POST test status displaying on the console for an extended
period of time.
3.
You can also set the diagnostic level that POST uses after an error reset and after a
hardware change. To set error_level to max and hw_change_level to max, type:
-> set /HOST/diag error_level=max
-> set /HOST/diag hw_change_level=max
4.
View the current values for settings.
Example:
-> show /HOST/diag
/HOST/diag
Targets:
Properties:
error_reset_level = max
error_reset_verbosity = normal
hw_change_level = max
hw_change_verbosity = normal
level = min
mode = normal
power_on_level = max
power_on_verbosity = normal
trigger = hw_change error-reset
verbosity = normal
Commands:
cd
set
show
->
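The property names and values in the help listing for /HOST/diag can be checked before a set command is typed, and the show output can be read back into a lookup table. The following Python sketch is an illustration only and is not part of Oracle ILOM. The helper names build_set_command and parse_show_properties are hypothetical, the allowed values are copied from the help listing in Step 2, and the sample text is the show output from Step 4.

```python
# Allowed values copied from the help listing for /HOST/diag in Step 2.
LEVELS = ("off", "min", "max")
VERBOSITIES = ("none", "min", "normal", "max")
ALLOWED_VALUES = {
    "default_level": LEVELS,
    "default_verbosity": VERBOSITIES,
    "error_level": LEVELS,
    "error_verbosity": VERBOSITIES,
    "hw_change_level": LEVELS,
    "hw_change_verbosity": VERBOSITIES,
}


def build_set_command(prop, value):
    """Return the command text for one /HOST/diag property (hypothetical helper)."""
    if prop not in ALLOWED_VALUES:
        raise ValueError(f"unknown /HOST/diag property: {prop}")
    if value not in ALLOWED_VALUES[prop]:
        allowed = ", ".join(ALLOWED_VALUES[prop])
        raise ValueError(f"{prop} accepts {allowed}; got {value}")
    return f"set /HOST/diag {prop}={value}"


def parse_show_properties(output):
    """Collect the 'name = value' lines listed between Properties: and Commands:."""
    props = {}
    in_properties = False
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line == "Properties:":
            in_properties = True
        elif line == "Commands:":
            in_properties = False
        elif in_properties and "=" in line:
            name, _, value = line.partition("=")
            props[name.strip()] = value.strip()
    return props


# Sample text taken from the show /HOST/diag example in Step 4.
SAMPLE_OUTPUT = """\
/HOST/diag
Targets:
Properties:
error_reset_level = max
error_reset_verbosity = normal
hw_change_level = max
hw_change_verbosity = normal
level = min
mode = normal
power_on_level = max
power_on_verbosity = normal
trigger = hw_change error-reset
verbosity = normal
Commands:
cd
set
show
"""

settings = parse_show_properties(SAMPLE_OUTPUT)
print(build_set_command("error_level", "max"))
print(settings["hw_change_level"])
```

The sketch only builds and reads text. It does not connect to a service processor, so the command it returns still has to be typed at the Oracle ILOM prompt.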
Related Information
■
“POST Overview” on page 37
Clear a Fault Manually
When PSH detects faults, the faults are logged and displayed on the console. In most cases,
after the fault is repaired, the corrected state is detected by the server, and the fault condition
is repaired automatically. However, this repair should be verified. In cases where the fault
condition is not automatically cleared, you must clear the fault manually.
1.
After replacing a faulty FRU, power on the server.
See “Returning the Server to Operation” on page 201.
2.
At the host prompt, determine whether the replaced FRU still shows a faulty
state.
See “Check for Faults” on page 33.
■If no fault is reported, you do not need to do anything else. Do not perform
the subsequent steps.
■If a fault is reported, continue to Step 3.
3.
Clear the fault from all persistent fault records.
In some cases, even though the fault is cleared, some persistent fault information remains
and results in erroneous fault messages at boot time. To ensure that these messages are not
displayed, type this PSH command:
faultmgmtsp> fmadm acquit UUID
4.
If required, reset the server.
In some cases, the output of the fmadm faulty command might include this message for the
faulty component:
Component faulted and taken out of service
If this message appears in the output, you must reset the server after you manually repair the
fault.
faultmgmtsp> exit
-> reset /System
Are you sure you want to reset /System? y
Resetting /System ...
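The decisions in this procedure can be summarized as a short Python sketch. It is an illustration under stated assumptions, not an Oracle tool: the function name manual_clear_plan is hypothetical, the UUID argument is a placeholder for the value reported for the fault, and the function only returns the command lines shown in the steps above for an operator to type.

```python
# Message quoted in Step 4; its presence means the server must be reset.
OUT_OF_SERVICE_MESSAGE = "Component faulted and taken out of service"


def manual_clear_plan(fault_reported, fmadm_faulty_output, uuid):
    """Return the command lines to type, in order (hypothetical helper)."""
    if not fault_reported:
        # Step 2: no fault is reported, so no further steps are needed.
        return []
    # Step 3: clear the fault from all persistent fault records.
    plan = [f"faultmgmtsp> fmadm acquit {uuid}"]
    if OUT_OF_SERVICE_MESSAGE in fmadm_faulty_output:
        # Step 4: leave the fault management shell and reset the server.
        plan += ["faultmgmtsp> exit", "-> reset /System"]
    return plan


# Placeholder input: only the quoted message is taken from this manual.
example_output = "...\nComponent faulted and taken out of service\n..."
for line in manual_clear_plan(True, example_output, "UUID"):
    print(line)
```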
Related Information
■
“PSH Overview” on page 25
■
“Check for Faults” on page 33
System Schematic
This schematic shows the connections between and among specific components and device
slots. You can use this schematic to determine optimum locations for any optional cards or other
peripherals based on system configuration and intended use.
Related Information
■
“Front Panel Components (Service)” on page 14
■
“Rear Panel Components (Service)” on page 16
■
“Chassis Subassembly Components” on page 18
■
“Processor Module Components” on page 19
■
“Main Module Components” on page 20
■
“Supported Storage and Backup Devices” on page 22
■
“Component Service Task Reference” on page 22
Preparing for Service
These topics describe how to prepare the server for servicing.
Step  Description                                      Link
1.    Review safety and handling information.          “Safety Information” on page 43
2.    Gather the tools needed for service.             “Tools Needed for Service” on page 45
3.    Consider filler options.                         “Component Fillers” on page 46
4.    Find the server serial number.                   “Find the Server Serial Number” on page 47
5.    Identify the server to be serviced.              “Locate the Server” on page 49
6.    Locate the component service information.        “Component Service Task Reference” on page 22
7.    For cold-service operations, shut down the OS.   “Removing Power From the Server” on page 50
8.    Gain access to service components.               “Chassis Subassembly Components” on page 18
Safety Information
For your protection, observe the following safety precautions when setting up your equipment:
■
Follow all cautions and instructions marked on the equipment and described in the
documentation shipped with your server.
■
Follow all cautions and instructions marked on the equipment and described in the SPARC T7-4 Server Safety and Compliance Guide.
■
Ensure that the voltage and frequency of your power source match the voltage and
frequency inscribed on the equipment's electrical rating label.
■
Follow the ESD safety practices as described in this section.
This topic includes the following sections:
■
“Safety Symbols” on page 44
■
“ESD Precautions” on page 44
■
“Antistatic Wrist Strap” on page 44
■
“Antistatic Mat” on page 45
Safety Symbols
Note the meanings of the following symbols that might appear in this document:
Caution - There is a risk of personal injury or equipment damage. To avoid personal injury and
equipment damage, follow the instructions.
Caution - Hot surface. Avoid contact. Surfaces are hot and might cause personal injury if
touched.
Caution - Hazardous voltages are present. To reduce the risk of electric shock and danger to
personal health, follow the instructions.
ESD Precautions
ESD-sensitive devices, such as PCIe cards, hard drives, and DIMMs, require special
handling.
Caution - Circuit boards and hard drives contain electronic components that are extremely
sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work
environment can destroy the components located on these boards. Do not touch the components
along their connector edges.
Caution - You must disconnect all power supplies before servicing any of the components that
are inside the chassis.
Antistatic Wrist Strap
Wear an antistatic wrist strap and use an antistatic mat when handling components such as hard
drive assemblies, circuit boards, or PCIe cards. When servicing or removing server components,
attach an antistatic strap to your wrist and then to a metal area on the chassis. Following this
practice equalizes the electrical potentials between you and the server.
Antistatic Mat
Place ESD-sensitive components such as motherboards, memory, and other PCBs on an
antistatic mat.
Related Information
■
“Tools Needed for Service” on page 45
■
“Component Fillers” on page 46
■
“Component Service Categories” on page 46
■
“Find the Server Serial Number” on page 47
■
“Locate the Server” on page 49
■
“Prevent ESD Damage” on page 49
■
“Removing Power From the Server” on page 50
Tools Needed for Service
You will need the following tools for most service operations:
■
Antistatic wrist strap
■
Antistatic mat
■
No. 1 Phillips screwdriver
■
No. 2 Phillips screwdriver
■
No. 1 flat-blade screwdriver (battery removal)
Related Information
■
“Safety Information” on page 43
■
“Tools Needed for Service” on page 45
■
“Component Fillers” on page 46
■
“Component Service Categories” on page 46
■
“Find the Server Serial Number” on page 47
■
“Locate the Server” on page 49
■
“Prevent ESD Damage” on page 49
■
“Removing Power From the Server” on page 50
Component Fillers
Depending on configuration, each server is shipped with replacement fillers for hard drives and
processor modules. A filler is an empty metal or plastic component that does not contain any
functioning system hardware or cable connectors.
The fillers are installed at the factory and must remain in the server until you replace them with
a functional component to ensure proper airflow through the system. If you remove a filler and
continue to operate your system with an empty slot, the server might overheat due to improper
airflow. For instructions on removing or installing a filler for a server component, refer to the
topic in this document about servicing that component.
Related Information
■
“Safety Information” on page 43
■
“Tools Needed for Service” on page 45
■
“Component Service Categories” on page 46
■
“Find the Server Serial Number” on page 47
■
“Locate the Server” on page 49
■
“Prevent ESD Damage” on page 49
■
“Removing Power From the Server” on page 50
Component Service Categories
Replaceable components fall into these categories:
■
Hot-serviceable by the customer – Hot-serviceable components can be removed while
the server is running. Hot-swappable components do not require any preparation prior to
servicing. Hot-pluggable components do require preparation prior to servicing.
■
Cold-serviceable by the customer or exclusively by authorized service personnel –
Cold-serviceable components require that the server is shut down. In addition, some service
procedures require that the power cables be disconnected between the power supplies and
the power source.
The following table identifies the server components that are replaceable.
Component                 Power Status   Authorized Service  Remove and Replace Instructions
                          for Removal    Personnel Only
Processor module          Off                                “Servicing Processor Modules” on page 55
DIMM                      Off                                “Servicing DIMMs” on page 69
Hard drive                Off or On                          “Servicing Hard Drives” on page 87
Main module               Off†                               “Servicing the Main Module” on page 99
NVMe switch card          Off                                “Servicing NVMe Switch Cards” on page 111
Storage backplane         Off†           X                   “Servicing the Drive Backplane” on page 119
SPM                       Off†           X                   “Servicing the SPM” on page 125
SCC PROM                  Off†           X                   “Servicing the SCC PROM” on page 133
System battery            Off†           X                   “Servicing the Battery” on page 137
Front I/O assembly        Off†                               “Servicing the Front I/O Assembly” on page 143
Power supply              Off or On                          “Servicing Power Supplies” on page 147
Fan module                Off or On                          “Servicing Fan Modules” on page 157
PCIe card                 Off or On                          “Servicing PCIe Cards” on page 165
Rear I/O module           Off†           X                   “Servicing the Rear I/O Module” on page 183
Rear chassis subassembly  Off†           X                   “Servicing the Rear Chassis Subassembly” on page 193
† You must disconnect the power cords before accessing this component.
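The component table can also be kept as a small lookup structure, for example in a site checklist script. The following Python sketch is an illustration only: the names ServiceInfo, SERVICE_TABLE, can_service_while_running, and customer_replaceable are hypothetical, and the entries are one reading of the Power Status for Removal and Authorized Service Personnel Only columns of the table above. Always confirm against the servicing topic for the component.

```python
from typing import NamedTuple


class ServiceInfo(NamedTuple):
    power_status: str      # value of the "Power Status for Removal" column
    authorized_only: bool  # True where the table marks the row with an X


# Entries transcribed from the component table (a reading, not an official data set).
SERVICE_TABLE = {
    "Processor module": ServiceInfo("Off", False),
    "DIMM": ServiceInfo("Off", False),
    "Hard drive": ServiceInfo("Off or On", False),
    "Main module": ServiceInfo("Off", False),
    "NVMe switch card": ServiceInfo("Off", False),
    "Storage backplane": ServiceInfo("Off", True),
    "SPM": ServiceInfo("Off", True),
    "SCC PROM": ServiceInfo("Off", True),
    "System battery": ServiceInfo("Off", True),
    "Front I/O assembly": ServiceInfo("Off", False),
    "Power supply": ServiceInfo("Off or On", False),
    "Fan module": ServiceInfo("Off or On", False),
    "PCIe card": ServiceInfo("Off or On", False),
    "Rear I/O module": ServiceInfo("Off", True),
    "Rear chassis subassembly": ServiceInfo("Off", True),
}


def can_service_while_running(component):
    """True when the table lists 'Off or On' for the component."""
    return "On" in SERVICE_TABLE[component].power_status


def customer_replaceable(component):
    """True when the table does not restrict the component to authorized personnel."""
    return not SERVICE_TABLE[component].authorized_only


for name in ("Fan module", "SPM"):
    print(name, can_service_while_running(name), customer_replaceable(name))
```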
Related Information
■
“Safety Information” on page 43
■
“Tools Needed for Service” on page 45
■
“Component Fillers” on page 46
■
“Find the Server Serial Number” on page 47
■
“Locate the Server” on page 49
■
“Prevent ESD Damage” on page 49
■
“Removing Power From the Server” on page 50
Find the Server Serial Number
If you require technical support for your server, you will be asked to provide the server's serial
number.
Use one of the following options to find the serial number.
■Locate the manufacturing sticker on the front of the server or the sticker
on the side of the server.
Locate the Server
You can use the Locator LEDs to identify a particular server.
1.
At the Oracle ILOM prompt, type:
-> set /SYS/LOCATE value=Fast_Blink
The white Locator LEDs (one on the front panel and one on the rear panel) blink.
2.
After locating the server with the blinking Locator LED, turn it off using one of
the following methods.
■Press the Locator button.
■At the Oracle ILOM prompt, type:
-> set /SYS/LOCATE value=Off
Related Information
■
“Safety Information” on page 43
■
“Tools Needed for Service” on page 45
■
“Component Fillers” on page 46
■
“Component Service Categories” on page 46
■
“Find the Server Serial Number” on page 47
■
“Prevent ESD Damage” on page 49
■
“Removing Power From the Server” on page 50
Prevent ESD Damage
Many components contained in the processor modules and main module can be damaged by
ESD. To protect these components from damage, perform the following steps before opening
these modules for service.
1.
Prepare an antistatic surface to set parts on during the removal, installation, or
replacement process.
Place ESD-sensitive components, such as the printed circuit boards, on an antistatic mat. The
following items can be used as an antistatic mat:
■
Antistatic bag used to wrap a replacement part
■
ESD mat
■
A disposable ESD mat (shipped with some replacement parts or optional server components)
2.
Attach an antistatic wrist strap.
When servicing or removing server components, attach an antistatic strap to your wrist and then
to a metal area on the chassis.
Related Information
■
“Safety Information” on page 43
■
“Servicing Processor Modules” on page 55
■
“Servicing DIMMs” on page 69
■
“Servicing the Main Module” on page 99
■
“Servicing the Drive Backplane” on page 119
■
“Servicing the SPM” on page 125
■
“Servicing the SCC PROM” on page 133
■
“Servicing the Battery” on page 137
■
“Servicing the Front I/O Assembly” on page 143
■
“Servicing PCIe Cards” on page 165
■
“Servicing the Rear I/O Module” on page 183
■
“Servicing the Rear Chassis Subassembly” on page 193
Removing Power From the Server
These topics describe different methods for removing power from the chassis.
■
“Prepare to Power Off the Server” on page 51
■
“Power Off the Server (Oracle ILOM)” on page 51
■
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
■
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
■
“Disconnect Power Cords” on page 53
■
“Prevent ESD Damage” on page 49
Prepare to Power Off the Server
1.
Notify affected users that the server will be shut down.
Refer to the Oracle Solaris system administration documentation for additional information.
2.
Save any open files and quit all running programs.
Refer to your application documentation for specific information for these processes.
3.
Shut down all logical domains.
Refer to the Oracle Solaris system administration documentation for additional information.
4.
Shut down the Oracle Solaris OS.
Refer to the Oracle Solaris system administration documentation for additional information.
5.
Power off the server.
See:
■
“Power Off the Server (Oracle ILOM)” on page 51
■
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
■
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
Related Information
■
“Prepare to Power Off the Server” on page 51
■
“Disconnect Power Cords” on page 53
Power Off the Server (Oracle ILOM)
You can use the SPM to perform a graceful shutdown of the server. This type of shutdown
ensures that all of your data is saved and that the server is ready for restart.
1.
Log in as superuser or equivalent.
Depending on the type of problem, you might want to view server status or log files. You also
might want to run diagnostics before you shut down the server.
2.
Switch from the system console to the Oracle ILOM -> prompt by typing the #.
(Hash-Period) key sequence.
3.
At the Oracle ILOM prompt, type:
-> stop /System
Stopping /System
4.
If you are powering off the server in order to add a second processor module,
return to “Server Upgrade Process” on page 56.
Related Information
■
“Prepare to Power Off the Server” on page 51
■
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
■
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
Power Off the Server (Power Button – Graceful Shutdown)
This procedure places the server in the power standby mode.
1.
Press and release the recessed Power button.
The Power OK LED blinks rapidly.
2.
If you are powering off the server in order to add a second processor module,
return to “Server Upgrade Process” on page 56.
Related Information
■
“Power Off the Server (Oracle ILOM)” on page 51
■
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
Power Off the Server (Power Button – Emergency Shutdown)
Caution - All applications and files are closed abruptly without saving changes. File system
corruption might occur.
Press and hold the Power button for four seconds.
Related Information
■
“Power Off the Server (Oracle ILOM)” on page 51
■
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
Disconnect Power Cords
You must disconnect the power cords before accessing the following components:
■
Main module
■
Storage backplanes
■
SPM
■
SCC PROM
■
Battery
■
Front I/O assembly
■
Rear I/O module
■
Rear chassis subassembly
1.
Power off the server.
See:
■
“Power Off the Server (Oracle ILOM)” on page 51
■
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
■
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
2.
Disconnect all power cords from the server.
Caution - Because standby power is always present in the system, you must unplug the power
cords before accessing certain components.
Related Information
■
“Safety Information” on page 43
■
“Tools Needed for Service” on page 45
■
“Component Fillers” on page 46
■
“Component Service Categories” on page 46
■
“Find the Server Serial Number” on page 47
■
“Locate the Server” on page 49
■
“Prevent ESD Damage” on page 49
Attachment of Devices During Service
During service procedures, you might have to connect devices to the server.
■
For OS support, connect an Ethernet cable to one of the Ethernet connectors (NET 0,
NET 1, NET 2, or NET 3).
■
If you plan to interact with the system console directly, you can connect additional external
devices, such as a mouse and keyboard, to the server's USB connectors, and connect a
monitor to the rear DB-15 video connector. For more details on connecting to the video
port, refer to “Connecting Cables” in SPARC T7-4 Server Installation Guide.
■
If you plan to connect to the Oracle ILOM software over the network, connect an Ethernet
cable to the Ethernet port labeled NET MGT.
Note - The SP uses the NET MGT (out-of-band) port by default. You can configure the
SP to share one of the server's four Ethernet ports instead. The SP uses only the configured
Ethernet port.
■
If you plan to access the Oracle ILOM CLI through the management port, connect a serial
null modem cable to the RJ-45 serial port labeled SER MGT.
■
The USB connectors on the front panel support USB 2.0. The USB connectors on the rear
panel support USB 3.0.
Related Information
■
“Front Panel Components (Service)” on page 14
■
“Rear Panel Components (Service)” on page 16
■
“Detecting and Managing Faults” on page 25
■
“Connecting Cables” in SPARC T7-4 Server Installation Guide
Servicing Processor Modules
This topic describes how to service processor modules, and how to upgrade the server from a
single processor module configuration to a dual processor module configuration.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Description                                           Links
Replace a processor module.                           ■ “Determine Which Processor Module Is Faulty” on page 60
                                                      ■ “Preparing for Service” on page 43
                                                      ■ “Remove a Processor Module or Processor Filler Module” on page 60
                                                      ■ “Install a Processor Module or Processor Filler Module” on page 64
                                                      ■ “Verify a Processor Module” on page 67
Learn the process for upgrading the server from a     “Server Upgrade Process” on page 56
single processor module configuration to a two
processor module configuration.
Remove the processor module as part of another        “Remove a Processor Module or Processor Filler Module” on page 60
component's service operation.
Install the processor module as part of another       “Install a Processor Module or Processor Filler Module” on page 64
component's service operation.
Related Information
■
“Identifying Components” on page 13
■
“Processor Module Components” on page 19
■
“Detecting and Managing Faults” on page 25
■
“Preparing for Service” on page 43
■
“Component Service Categories” on page 46
■
“Servicing DIMMs” on page 69
■
“Returning the Server to Operation” on page 201
Server Upgrade Process
The SPARC T7-4 server supports two processor module configurations:
■
Fully-populated — Two processor modules
■
Half-populated— One processor module and one processor filler module
Processor modules are cold-service components that can be replaced only by qualified
service personnel. For the location of the processor modules, see “Front Panel Components
(Service)” on page 14.
Caution - These service procedures require that you handle components that are sensitive to
electrostatic discharge. This discharge can cause failure of server components.
This table contains the steps for upgrading the server to a fully-populated configuration.
Step  Description                                                    Link
1.    Remove the upgrade components from their packaging, and
      place them on an antistatic mat.
2.    Remove the covers from the new processor module.               “Remove a Processor Module or Processor Filler Module” on page 60
3.    Remove all of the DIMM fillers in the processor module. The    “Remove a DIMM” on page 78
      steps to remove the DIMM fillers are the same as the steps
      for removing DIMMs.
4.    Verify that you have the correct DIMMs for your server. All    “Understanding DIMM Configurations” on page 69
      of the DIMMs must be either 16 or 32 GB, and they must match
      the size and capacity of the DIMMs already installed in the
      server.
5.    Install the DIMMs.                                             “Install a DIMM” on page 80
6.    Check the server for faults. If any fault is present, you      “Check for Faults” on page 33
      must correct the fault and clear it from the server before
      you can continue with the upgrade.
7.    Shut down the server.                                          “Removing Power From the Server” on page 50
8.    Remove the processor filler module from Slot 1.                “Remove a Processor Module or Processor Filler Module” on page 60
9.    Install the new processor module in Slot 1.                    “Install a Processor Module or Processor Filler Module” on page 64
10.   Return the server to operation.                                “Returning the Server to Operation” on page 201
11.   Verify the installation. If any fault is present, you must     “Verify a Processor Module” on page 67
      correct the fault and clear it from the server.
12.   Review the root complex changes.                               “Understanding PCIe Root Complex Connections” on page 165
13.   Review the PCIe card load balancing changes. Even though       “PCIe Card Configuration” on page 168
      the load balancing guidelines change with the upgrade, you
      do not need to move any existing PCIe cards.
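The DIMM check in Step 4 can be expressed as a short Python sketch. It is an illustration under stated assumptions, not an Oracle utility: the function name check_upgrade_dimms is hypothetical, DIMM sizes are given as plain integers in GB, and "match" is interpreted as the new DIMMs using exactly the same set of sizes as the DIMMs already installed. The sketch does not check DIMM counts or slot placement.

```python
# Sizes allowed by Step 4 of the upgrade table.
ALLOWED_DIMM_SIZES_GB = (16, 32)


def check_upgrade_dimms(new_dimms_gb, installed_dimms_gb):
    """Return a list of problems; an empty list means the sizes look acceptable."""
    problems = []
    for size in sorted(set(new_dimms_gb)):
        if size not in ALLOWED_DIMM_SIZES_GB:
            problems.append(f"{size} GB is not a supported DIMM size")
    # Assumption: matching means the same set of sizes as the installed DIMMs.
    if set(new_dimms_gb) != set(installed_dimms_gb):
        problems.append("new DIMMs do not match the DIMMs already installed")
    return problems


# Example: 16 GB DIMMs are installed, but 32 GB DIMMs were ordered for the upgrade.
for problem in check_upgrade_dimms([32] * 16, [16] * 16):
    print(problem)
```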
Related Information
■
“Processor Module Components” on page 19
■
“System Schematic” on page 41
■
“Detecting and Managing Faults” on page 25
■
“Removing Power From the Server” on page 50
■
“Servicing DIMMs” on page 69
■
“Processor Module Configuration” on page 57
■
“Remove a Processor Module or Processor Filler Module” on page 60
■
“Install a Processor Module or Processor Filler Module” on page 64
■
“Verify a Processor Module” on page 67
■
“Understanding PCIe Root Complex Connections” on page 165
■
“PCIe Card Configuration” on page 168
■
“Returning the Server to Operation” on page 201
Processor Module Configuration
Processor modules are accessed from the front of the server. In Oracle ILOM, the processor
modules are numbered PM0 and PM1, starting with the lower slot.
No.  Description
1    Processor Module 1 (PM1) or processor filler module
2    Processor Module 0 (PM0)
Note - In servers with two processor modules installed, DIMM configurations
in both processor modules must be identical. See “Understanding DIMM
Configurations” on page 69.
Processor Module LEDs
No.  LED                       Icon  Description
1    (No function.)                  Not supported.
2    Service Required (amber)        Indicates that the processor module has experienced a fault condition.
3    OK (green)                      Indicates if the processor module is available for use.
                                     ■ On – The server is running and the processor module is functioning correctly.
                                     ■ Off – The server is powered down and the processor module is in standby mode.
Related Information
■
“Processor Module Components” on page 19
■
“Server Upgrade Process” on page 56
■
“Determine Which Processor Module Is Faulty” on page 60
■
“Remove a Processor Module or Processor Filler Module” on page 60
■
“Install a Processor Module or Processor Filler Module” on page 64
■
“Verify a Processor Module” on page 67
Determine Which Processor Module Is Faulty
The following LEDs are lit when a processor module fault is detected:
■
Front and rear System Fault (Service Required) LEDs
■
Service Required LED on the faulty processor module
1.
Determine if the Service Required LEDs are illuminated on the front panel or the
rear I/O module.
See “Interpreting LEDs” on page 27.
2.
From the front of the server, check the processor module LEDs to identify which
processor module needs to be replaced.
See “Processor Module LEDs” on page 59. The amber Service Required LED is lit on the
processor module that needs to be replaced.
3.
Remove the faulty processor module.
See “Remove a Processor Module or Processor Filler Module” on page 60.
Related Information
■
“Processor Module Components” on page 19
■
“Processor Module LEDs” on page 59
■
“Remove a Processor Module or Processor Filler Module” on page 60
■
“Install a Processor Module or Processor Filler Module” on page 64
■
“Verify a Processor Module” on page 67
Remove a Processor Module or Processor Filler Module
Processor modules and processor filler modules are cold-service components that can be
replaced only after you power off the system. Processor modules can be replaced only
by qualified service personnel. For the location of the modules, see “Processor Module
Configuration” on page 57.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Prepare the server for service.
See “Preparing for Service” on page 43.
2.
Ensure that the server is powered off.
See “Removing Power From the Server” on page 50.
3.
Disconnect the power cords.
See “Disconnect Power Cords” on page 53.
4.
Locate the processor module in the server that you want to remove.
■If you are replacing a faulty processor module, see “Determine Which
Processor Module Is Faulty” on page 60 to locate a faulty processor
module.
■If you are adding a processor module, remove the processor filler module in
slot 1.
5.
Press the two extraction levers in toward the server and pull the extraction
levers out to disengage the processor module or processor filler module from
the server.
6.
Pull the processor module or processor filler module halfway out of the server,
and close the levers.
This action protects the levers from damage while the module is outside the server.
7.
Using two hands, completely remove the processor module or processor filler
module and place the module on an antistatic mat.
Caution - Do not touch the connectors at the rear of the module.
8.
Determine your next step.
■If you are replacing or installing DIMMs within the processor module, see
“Servicing DIMMs” on page 69.
■If you are replacing a faulty processor module, populate and install the
replacement processor module:
a.
Remove all of the DIMMs from the faulty processor module, and set
them in a safe place.
See “Remove a DIMM” on page 78.
b.
Install the DIMMs into the new processor module.
See “Install a DIMM” on page 80.
c.
Install the processor module.
See “Install a Processor Module or Processor Filler Module” on page 64.
■If you have removed a processor filler module as part of a server upgrade,
return to “Server Upgrade Process” on page 56.
■If you have removed a processor module or processor filler module to
prepare the server for installation, return to “Preparing for Installation” in
SPARC T7-4 Server Installation Guide.
Related Information
■
“Processor Module Components” on page 19
■
“Processor Module LEDs” on page 59
■
“Server Upgrade Process” on page 56
■
“Determine Which Processor Module Is Faulty” on page 60
■
“Servicing DIMMs” on page 69
■
“Install a Processor Module or Processor Filler Module” on page 64
■
“Verify a Processor Module” on page 67
Install a Processor Module or Processor Filler Module
Processor modules are cold-service components that can be replaced only by qualified
service personnel. For the location of the processor modules, see “Front Panel Components
(Service)” on page 14.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Ensure the power cords are disconnected.
See “Disconnect Power Cords” on page 53.
2.
Determine your next step.
■If you are installing a processor module after replacing or installing DIMMs,
go to Step 3.
■If you are installing a new processor module to replace a faulty one, install
all of the DIMMs that you removed from the faulty processor module into the
replacement module. See “Install a DIMM” on page 80.
3.
Open the latches on the processor module or processor filler module, and insert
the module into the empty processor module slot in the server.
Note - A processor filler module can only be installed in slot 1.
4.
Bring the levers together toward the center of the module and press the levers
firmly against the module to fully seat the module back into the server.
The levers should click into place when the module is fully seated in the server.
5.
Power on the server.
See “Returning the Server to Operation” on page 201.
6.
Verify the processor module functionality.
See “Verify a Processor Module” on page 67.
7.
If you are adding a second processor module to the server, return to “Server
Upgrade Process” on page 56.
Related Information
■
“Processor Module Components” on page 19
■
“Server Upgrade Process” on page 56
■
“Processor Module LEDs” on page 59
■
“Determine Which Processor Module Is Faulty” on page 60
■
“Remove a Processor Module or Processor Filler Module” on page 60
■
“Servicing DIMMs” on page 69
■
“Verify a Processor Module” on page 67
Verify a Processor Module
1.
Use the Oracle ILOM fault management shell to determine if the new processor
module is shown as enabled or disabled.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm faulty
a.
If the output from the fmadm faulty command shows the replacement
processor module as enabled, go to Step 2.
b.
If the output from the fmadm faulty command shows the replacement
processor module as disabled, go to “Detecting and Managing Faults” on page 25 to
clear the PSH-detected fault from the server.
2.
Verify that the OK LED is lit on the processor module and that the Fault LED is
not lit.
See “Processor Module LEDs” on page 59.
3.
Verify that the front and rear Service Required LEDs are not lit.
See “Front Panel Controls and LEDs” on page 29 and “Rear Panel Controls and
LEDs” on page 31.
4.
Perform one of the following tasks based on your verification results:
■If the previous steps did not clear the fault, see “Diagnostics
Process” on page 26.
■If Step 2 and Step 3 indicate that no faults have been detected, then the
processor module has been replaced successfully. No further action is
required.
■If you are verifying the server after adding a second processor module,
return to “Server Upgrade Process” on page 56.
Related Information
■
“Processor Module Components” on page 19
■
“Processor Module LEDs” on page 59
■
“Determine Which Processor Module Is Faulty” on page 60
■
“Remove a Processor Module or Processor Filler Module” on page 60
■
“Install a Processor Module or Processor Filler Module” on page 64
Servicing DIMMs
Up to 32 DIMMs can be installed in each processor module, for a total of 64 DIMMs in the
server.
DIMMs are cold-service components that can be replaced by customers. For the location of the
DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
Description                        Links
Understand how to replace DIMMs    ■ “Understanding DIMM Configurations” on page 69
                                   ■ “Identifying DIMMs” on page 71
Locate a faulty DIMM               ■ “DIMM Fault Handling” on page 74
                                   ■ “Determine Which DIMM Is Faulty (PSH)” on page 75
                                   ■ “Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
                                   ■ “DIMM Configuration Errors” on page 72
Replace a DIMM                     ■ “Remove a DIMM” on page 78
                                   ■ “Install a DIMM” on page 80
                                   ■ “Verify a DIMM” on page 83
Understanding DIMM Configurations
These topics describe DIMM configurations:
■
“Supported Memory Configurations” on page 70
■
“Identifying DIMMs” on page 71
■
“DIMM Configuration Errors” on page 72
Supported Memory Configurations
The server supports 16-Gbyte, 32-Gbyte, and 64-Gbyte DIMMs, with up to 4096 Gbytes in a
server fully configured with two processor modules.
Each processor module can be either half populated (16 DIMMs) or fully populated (32
DIMMs).
Consider these population rules when installing, upgrading, or replacing DIMMs in a processor
module:
■
In half-populated configurations, 16 DIMMs must be installed in all CH0 slots.
These slots have black ejector levers.
■
In fully populated configurations (32 DIMMs), DIMMs must be installed in all slots (CH0
and CH1).
Note - The DIMM sparing feature is available only in fully-populated servers.
■
All DIMMs associated with each CMx must be identical (same size, same rank
classification).
■
Mixed configurations are supported (DIMMs associated with CM0 with one size, and
DIMMs associated with CM1 with a different size) as long as all DIMMs in the server have
the same rank classification. For example, 32 Gbyte 4Rx4 DIMMs associated with PM0/
CM0, and 64 Gbyte 4Rx4 DIMMs associated with PM0/CM1.
To identify DIMM architecture, see “Identifying DIMMs” on page 71.
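The population rules above can be expressed as a small checker. The following Python sketch is illustrative only and is not part of the server firmware or any Oracle tool. The names `Dimm`, `check_module`, and the slot tuple layout are assumptions made for this example, and the check covers a single processor module, so the server-wide rank rule still has to be confirmed across both modules.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

CMS = ("CM0", "CM1")
BOBS = ("BOB00", "BOB01", "BOB10", "BOB11", "BOB20", "BOB21", "BOB30", "BOB31")


@dataclass(frozen=True)
class Dimm:
    size_gb: int  # 16, 32, or 64
    rank: str     # rank label, for example "2Rx4" or "4Rx4"


Slot = Tuple[str, str, str]  # hypothetical key layout: (CMx, BOBxx, CHx)


def check_module(slots: Dict[Slot, Dimm]) -> List[str]:
    """Return the rule violations found in one processor module (empty if none)."""
    errors: List[str] = []
    ch0_all = {(cm, bob, "CH0") for cm in CMS for bob in BOBS}
    ch1_all = {(cm, bob, "CH1") for cm in CMS for bob in BOBS}
    populated = set(slots)

    # Half-populated and fully populated modules both need every CH0 slot filled.
    if not ch0_all <= populated:
        errors.append("all 16 CH0 slots must be populated")

    # CH1 slots are either all empty (16 DIMMs) or all filled (32 DIMMs).
    ch1_used = populated & ch1_all
    if ch1_used and ch1_used != ch1_all:
        errors.append("CH1 slots must be all empty or all populated")

    # All DIMMs associated with one CMx must be identical.
    for cm in CMS:
        kinds = {dimm for slot, dimm in slots.items() if slot[0] == cm}
        if len(kinds) > 1:
            errors.append(f"DIMMs associated with {cm} are not identical")

    # Sizes may differ between CM0 and CM1, but the rank classification may not.
    if len({dimm.rank for dimm in slots.values()}) > 1:
        errors.append("DIMMs do not share one rank classification")
    return errors


half = {(cm, bob, "CH0"): Dimm(32, "4Rx4") for cm in CMS for bob in BOBS}
print(check_module(half))  # []
```

A mixed module such as 32-Gbyte 4Rx4 DIMMs on CM0 and 64-Gbyte 4Rx4 DIMMs on CM1 also passes this check, which matches the mixed-configuration example in the rules.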
Related Information
■
“DIMM FRU Names” on page 73
■
“Identifying DIMMs” on page 71
■
“Remove a DIMM” on page 78
■
“Install a DIMM” on page 80
■
“Verify a DIMM” on page 83
■
“Server Upgrade Process” on page 56
■
“Processor Module Configuration” on page 57
Identifying DIMMs
Each DIMM is affixed with an identifying label. The first four characters on the label describe
the DIMM memory capacity; the second four characters describe the rank classification.
Use these labels to identify the DIMMs installed in the server, to verify that any replacement
DIMMs are compatible, or to confirm that upgrade DIMMs may be installed in a supported
configuration.
The following DIMMs are supported:
DIMM Capacity   DRAM Density   Rank Classification   Label
16 Gbyte        4 Gbit         Dual-rank x4          2Rx4
32 Gbyte        4 Gbit         Quad-rank x4          4Rx4
32 Gbyte        8 Gbit         Dual-rank x4          2Rx4
64 Gbyte        8 Gbit         Quad-rank x4          4Rx4
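The supported-DIMM table lends itself to a simple lookup. The Python sketch below is an illustration under stated assumptions: the function names `rank_label` and `is_compatible_replacement` are hypothetical, and a replacement is treated as compatible when it matches the removed DIMM in both size and rank classification, which is how the population rules describe "identical" DIMMs.

```python
from typing import Optional, Tuple

# (capacity in Gbytes, DRAM density in Gbits) -> rank label, copied from the table.
SUPPORTED_DIMMS = {
    (16, 4): "2Rx4",
    (32, 4): "4Rx4",
    (32, 8): "2Rx4",
    (64, 8): "4Rx4",
}

# 2 processor modules x 32 slots x largest supported DIMM.
MAX_SERVER_GBYTES = 2 * 32 * max(capacity for capacity, _ in SUPPORTED_DIMMS)


def rank_label(capacity_gb: int, density_gbit: int) -> Optional[str]:
    """Return the rank label for a supported DIMM, or None if it is not in the table."""
    return SUPPORTED_DIMMS.get((capacity_gb, density_gbit))


def is_compatible_replacement(old: Tuple[int, int], new: Tuple[int, int]) -> bool:
    """True when both DIMMs are supported and share capacity and rank label."""
    old_rank, new_rank = rank_label(*old), rank_label(*new)
    if old_rank is None or new_rank is None:
        return False
    return old[0] == new[0] and old_rank == new_rank


print(MAX_SERVER_GBYTES)                            # 4096
print(is_compatible_replacement((32, 4), (32, 8)))  # False
```

The second call shows why the label matters: the two 32-Gbyte DIMMs have the same capacity but different rank classifications (4Rx4 and 2Rx4), so one cannot stand in for the other.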
Related Information
■
“Understanding DIMM Configurations” on page 69
■
“DIMM FRU Names” on page 73
■
“DIMM Configuration Errors” on page 72
DIMM Configuration Errors
When the server boots, system firmware checks the memory configuration against the rules
described in “Understanding DIMM Configurations” on page 69. If any violations of these
rules are detected, the following general error message is displayed.
Please refer to the service documentation for supported memory
configurations.
In some cases, the server boots in a degraded state, and a message such as the following is
displayed:
WARNING: Running with a nonstandard DIMM configuration. Refer to service document for
details.
In other cases, the configuration error is fatal, and the following message is displayed:
Fatal configuration error - forcing power-down
In addition to these general memory configuration errors, one or more rule-specific messages are
displayed, indicating the type of configuration error detected. To identify the DIMMs affected,
use the fmadm faulty command as described in “Check for Faults” on page 33.
Related Information
■
“Check for Faults” on page 33
■
“Clear a Fault Manually” on page 40
■
“Understanding DIMM Configurations” on page 69
■
“DIMM FRU Names” on page 73
■
“Identifying DIMMs” on page 71
■
“DIMM Fault Handling” on page 74
DIMM FRU Names
The following table illustrates the DIMM addresses on a processor module, with the front of the
processor module oriented toward the left:
DIMM FRU Names
CM1/BOB21/CH1
CM1/BOB21/CH0
CM1/BOB20/CH0
CM1/BOB20/CH1
CM1/BOB30/CH1
CM1/BOB30/CH0
CM1/BOB31/CH0
CM1/BOB31/CH1
CM0/BOB21/CH1
CM0/BOB21/CH0
CM0/BOB20/CH0
CM0/BOB20/CH1
CM0/BOB30/CH1
CM0/BOB30/CH0
CM0/BOB31/CH0
CM0/BOB31/CH1
CM1/BOB01/CH1
CM1/BOB01/CH0
CM1/BOB00/CH0
CM1/BOB00/CH1
CM1/BOB10/CH1
CM1/BOB10/CH0
CM1/BOB11/CH0
CM1/BOB11/CH1
CM0/BOB01/CH1
CM0/BOB01/CH0
CM0/BOB00/CH0
CM0/BOB00/CH1
CM0/BOB10/CH1
CM0/BOB10/CH0
CM0/BOB11/CH0
CM0/BOB11/CH1
CM1
CM0
DIMM NAC names are based both on the location of the DIMM slot on the processor module,
and in which slot the processor module is installed. For example, the full NAC name for the
DIMM installed in the front-left corner on a processor module installed at PM0 is:
/SYS/PM0/CM1/CMP/BOB21/CH1/DIMM
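The naming pattern can be reproduced mechanically. This Python sketch builds NAC names from the pattern shown in the example above. The helper names `dimm_nac_name` and `module_dimm_names` are hypothetical, and the list is generated in sorted order, not in the physical order of the table.

```python
from itertools import product
from typing import List

# BOB identifiers that appear in the DIMM FRU name table.
BOBS = ("00", "01", "10", "11", "20", "21", "30", "31")


def dimm_nac_name(pm: int, cm: int, bob: str, ch: int) -> str:
    """Build the full NAC name for one DIMM slot."""
    return f"/SYS/PM{pm}/CM{cm}/CMP/BOB{bob}/CH{ch}/DIMM"


def module_dimm_names(pm: int) -> List[str]:
    """List the NAC names of all 32 DIMM slots on processor module PM<pm>."""
    return [dimm_nac_name(pm, cm, bob, ch)
            for cm, bob, ch in product((0, 1), BOBS, (0, 1))]


names = module_dimm_names(0)
print(len(names))                                  # 32
print("/SYS/PM0/CM1/CMP/BOB21/CH1/DIMM" in names)  # True
```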
Related Information
■
“Servicing Processor Modules” on page 55
■
“Understanding DIMM Configurations” on page 69
■
“Identifying DIMMs” on page 71
■
“DIMM Fault Handling” on page 74
■
“DIMM Configuration Errors” on page 72
DIMM Fault Handling
A variety of features play a role in how the memory subsystem is configured and how memory
faults are handled. Understanding the underlying features helps you identify and repair memory
problems.
The following server features manage memory faults:
■
POST – By default, POST runs when the server is powered on.
For correctable errors (CEs), POST forwards the error to the PSH daemon for error handling. If an
uncorrectable memory fault is detected, POST displays the fault with the device name of the
faulty DIMMs, and logs the fault. POST then disables the faulty DIMMs. Depending on the
memory configuration and the location of the faulty DIMM, POST disables half of physical
memory in the server, or half the physical memory and half the processor threads. When
this offlining process occurs in normal operation, you must replace the faulty DIMMs based
on the fault message and enable the disabled DIMMs with the Oracle ILOM command
set device component_state=enabled, where device is the name of the DIMM being enabled.
■
PSH technology – The Oracle PSH feature uses the Fault Manager daemon (fmd) to watch
for various kinds of faults. When a fault occurs, the fault is assigned a UUID and logged.
PSH reports the fault and suggests a replacement for the DIMMs associated with the fault.
If you suspect the server has a memory problem, run the Oracle ILOM show faulty command.
This command lists memory faults and identifies the DIMM modules associated with the fault.
Related Information
■
“POST Overview” on page 37
■
“Understanding DIMM Configurations” on page 69
■
“DIMM FRU Names” on page 73
■
“DIMM Configuration Errors” on page 72
Identifying Faulty DIMMs
You can identify faulty DIMMs using the following methods:
■
“Determine Which DIMM Is Faulty (Oracle ILOM)” on page 75
■
“Determine Which DIMM Is Faulty (PSH)” on page 75
■
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
Determine Which DIMM Is Faulty (Oracle ILOM)
If you suspect that the server has a memory problem, run the Oracle ILOM show faulty command.
This command lists memory faults and identifies the DIMM modules associated with the fault.
Related Information
■
“Determine Which DIMM Is Faulty (PSH)” on page 75
■
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
Determine Which DIMM Is Faulty (PSH)
The Oracle Fault Management tool fmadm faulty displays current server faults, including
DIMM failures.
1.
Start the Fault Management Shell:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
2.
Type:
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- -----
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- -----
2014-08-18/21:04:40 7040d859-5b03-4a58-8dfd-e3a80875d62f SPSUN4V-8000-EJ Critical
Problem Status : solved
Diag Engine : fdd 1.0
System
Manufacturer : Oracle Corporation
Name : SPARC T7-4
Part_Number : 7021179
Serial_Number : 1201CTHC01
System Component
Manufacturer : Oracle Corporation
Name : SPARC T7-4
Part_Number : 7021179
Serial_Number : 1201CTHC01
---------------------------------------
Suspect 1 of 1
Fault class : fault.memory.dimm-ue
Certainty : 100%
Affects : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
Status : faulted but still in service
FRU
Status : faulty
Location : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
Manufacturer : Samsung
Name : 16384MB DDR4 SDRAM DIMM
Part_Number : 07042208,M393B1K70DH0-YK0
Revision : 04
Serial_Number : 00CE0212153367DD4B
Chassis
Manufacturer : Oracle Corporation
Name : SPARC T7-4
Part_Number : 7021179
Serial_Number : 1201CTHC01
Description : Uncorrectable errors have occurred while accessing memory.
Response : An attempt will be made to remove the affected memory from
service. Host HW may restart.
Impact : Total system memory capacity has been reduced and some
applications may have been terminated.
Action : Use 'fmadm faulty' to provide a more detailed view of this
event. Please refer to the associated reference document at
http://support.oracle.com/msg/SPSUN4V-8000-EJ for the latest
service procedures and policies regarding this diagnosis.
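When this output is captured to a file or log, the DIMM location can be pulled out with a short script. The following Python sketch assumes the field layout shown in the sample above ("Fault class", "Affects", and "Location" followed by a colon). The function name `suspect_fields` is hypothetical, and the parser has not been checked against other firmware releases.

```python
import re
from typing import Dict, List

# Match "Name : value" lines for the three fields of interest.
FIELD_RE = re.compile(
    r"^[ \t]*(Fault class|Affects|Location)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE,
)


def suspect_fields(report: str) -> Dict[str, List[str]]:
    """Collect every value of the selected fields from fmadm faulty text."""
    found: Dict[str, List[str]] = {}
    for name, value in FIELD_RE.findall(report):
        found.setdefault(name, []).append(value)
    return found


sample = """\
Fault class : fault.memory.dimm-ue
Certainty : 100%
Affects : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
Status : faulted but still in service
FRU
Status : faulty
Location : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
"""
print(suspect_fields(sample)["Location"])  # ['/SYS/PM0/CM1/CMP/BOB10/CH0/DIMM']
```

The values are kept in lists because a report can contain more than one suspect.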
Related Information
■
“Determine Which DIMM Is Faulty (Oracle ILOM)” on page 75
■
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
Determine Which DIMM Is Faulty (DIMM Fault
LEDs)
DIMMs are cold-service components that can be replaced by customers. For the location of the
DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Consider your first steps.
■Familiarize yourself with DIMM configuration rules.
See “Understanding DIMM Configurations” on page 69.
■Prepare the system for service.
See “Preparing for Service” on page 43.
■Remove the processor module containing the faulty DIMM. Place the
processor module on an ESD-protected work surface. Remove the processor
module cover.
See “Remove a Processor Module or Processor Filler Module” on page 60.
2.
Locate the DIMM Fault Remind button on the processor module.
3.
Verify that the Memory Riser Power LED next to the button is illuminated.
An illuminated Memory Riser Power LED indicates that there is power available to illuminate
any Memory DIMM Fault LEDs once you have pressed the DIMM Fault Remind button.
4.
Press the DIMM Fault Remind button on the processor module.
This will cause the Memory DIMM Fault LEDs associated with any faulty DIMMs to
illuminate for a few minutes.
5.
Note the address of the DIMM next to any illuminated Memory DIMM Fault LED.
6.
Ensure that all other DIMMs are seated correctly in their slots.
Related Information
■
“Determine Which DIMM Is Faulty (Oracle ILOM)” on page 75
■
“Determine Which DIMM Is Faulty (PSH)” on page 75
Remove a DIMM
DIMMs are cold-service components that can be replaced by customers. For the location of the
DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Consider your first steps.
■Familiarize yourself with DIMM population rules.
See “Understanding DIMM Configurations” on page 69.
■Prepare the system for service.
See “Preparing for Service” on page 43.
■Remove the processor module. Place the processor module on an
ESD-protected work surface.
See “Remove a Processor Module or Processor Filler Module” on page 60.
2.
Remove the cover from the processor module.
Press the green button near the front edge of the cover and slide the cover back and up off the
main module.
3.
Locate the DIMMs that need to be replaced.
See “Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76.
4.
Push down on the ejector tabs on each side of the DIMM until the DIMM is
released.
Caution - DIMMs and heat sinks on the motherboard might be hot.
5.
Grasp the top corners of the faulty DIMM and lift it out of its slot.
6.
Place the DIMM on an antistatic mat.
7.
Repeat Step 4 through Step 6 for any other DIMMs you intend to remove.
8.
Determine your next step:
■If you are installing replacement DIMMs at this time, go to “Install a
DIMM” on page 80.
■If you are not installing replacement DIMMs at this time, go to Step 9.
9.
Return the server to operation.
See:
■
Install the processor module.
See “Install a Processor Module or Processor Filler Module” on page 64.
■
Power on the server.
See “Power On the Server (Oracle ILOM)” on page 202.
■
Verify DIMM functionality.
See “Verify a DIMM” on page 83.
Related Information
■
“Understanding DIMM Configurations” on page 69
■
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
■
“Determine Which DIMM Is Faulty (PSH)” on page 75
■
“Install a DIMM” on page 80
■
“Verify a DIMM” on page 83
Install a DIMM
DIMMs are cold-service components that can be replaced by customers. For the location of the
DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Consider your first steps.
■Familiarize yourself with DIMM population rules.
See “Understanding DIMM Configurations” on page 69.
■Prepare the system for service.
See “Preparing for Service” on page 43.
■Remove the processor module. Place the processor module on an
ESD-protected work surface.
See “Remove a Processor Module or Processor Filler Module” on page 60.
2.
Consider your next steps.
■If you are replacing a faulty DIMM, ensure that you have removed the faulty
DIMM.
See “Identifying Faulty DIMMs” on page 74.
See “Remove a DIMM” on page 78.
■If you are adding DIMMs to a half-populated processor module:
Ensure you have the correct DIMMs for your server. See “Identifying
DIMMs” on page 71.
■If you are populating a new processor module:
Ensure you have the correct DIMMs for your server. See “Understanding DIMM
Configurations” on page 69.
3.
Unpack the replacement DIMMs and place them on an antistatic mat.
4.
Ensure that the ejector tabs on the connector that will receive the DIMM are in
the open position.
5.
Align the DIMM notch with the key in the connector.
Caution - Ensure that the orientation is correct. The DIMM might be damaged if the orientation
is reversed.
6.
Push the DIMM into the connector until the ejector tabs lock the DIMM in place.
If the DIMM does not easily seat into the connector, check the DIMM's orientation.
7.
Repeat Step 4 through Step 6 until all new DIMMs are installed.
8.
Place the cover onto the processor module and slide the cover forward until the
latch clicks into place.
9.
Consider your next steps.
■If you are adding a second processor module to the server, return to “Server
Upgrade Process” on page 56.
■If you are replacing a processor module after installing replacement DIMMs,
proceed to Step 10.
10.
Finish the installation procedure.
See:
■
Install the processor module.
See “Install a Processor Module or Processor Filler Module” on page 64.
■
Return the server to operation.
See “Returning the Server to Operation” on page 201.
■
Verify DIMM functionality.
See “Verify a DIMM” on page 83.
Related Information
■
“Understanding DIMM Configurations” on page 69
■
“Identifying DIMMs” on page 71
■
“Remove a DIMM” on page 78
■
“Verify a DIMM” on page 83
■
“DIMM Fault Handling” on page 74
■
“DIMM Configuration Errors” on page 72
Verify a DIMM
1.
Access the Oracle ILOM prompt.
Refer to the SPARC T7 Series Server Administration Guide for instructions.
2.
Use the show faulty command to determine how to clear the fault.
■
If show faulty indicates a POST-detected fault, go to Step 3.
■
If show faulty output displays a UUID, which indicates a host-detected fault, skip Step 3
and go directly to Step 4.
3.
Use the set command to enable the DIMM that was disabled by POST.
In most cases, replacement of a faulty DIMM is detected when the service processor is power
cycled. In those cases, the fault is automatically cleared from the server. If show faulty still
displays the fault, the set command will clear it.
-> set /SYS/PM0/CM0/CMP/BOB10/CH0/DIMM requested_config_state=Enabled
Set 'requested_config_state' to 'enabled'
4.
For a host-detected fault, perform the following steps to verify the new DIMM:
a.
Set the virtual keyswitch to diag so that POST will run in Service mode.
-> set /HOST keyswitch_state=diag
Set 'keyswitch_state' to 'diag'
b.
Power cycle the server.
-> stop /System
Are you sure you want to stop /System (y/n)? y
Stopping /System
-> start /System
Are you sure you want to start /System (y/n)? y
Starting /System
c.
Use the show /HOST command to determine when the host has been powered
off.
The console will display status=Powered Off. Allow approximately one minute before
running this command.
d.
Switch to the system console to view POST output.
Watch the POST output for possible fault messages. The following output indicates that
POST did not detect any faults:
-> start /HOST/console
...
0:0:0>INFO:
0:0:0> POST Passed all devices.
0:0:0>POST: Return to VBSC.
0:0:0>Master set ACK for vbsc runpost command and spin...
Note - The server might boot automatically at this point. If so, go directly to Step 4g. If it
remains at the ok prompt go to Step 4e.
e.
If the server remains at the ok prompt, type boot.
f.
Return the virtual keyswitch to normal mode.
-> set /SYS keyswitch_state=normal
Set 'keyswitch_state' to 'normal'
g.
Switch to the system console and check for faults.
# fmadm faulty
If any faults are reported, refer to the diagnostics instructions described in “Check for
Faults” on page 33.
If the show faulty command reports a fault with a UUID, go on to Step 7. If show faulty does
not report a fault with a UUID, you are done with the verification process.
7.
Switch to the system console and use the fmadm repair command with the UUID.
Use the same UUID that was displayed from the output of the Oracle ILOM show faulty
command. For example:
# fmadm repair 3aa7c854-9667-e176-efe5-e487e520
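Step 2 of this procedure branches on whether the show faulty output contains a UUID. The Python sketch below illustrates that decision for text captured from the command. It assumes the UUID appears in the canonical 8-4-4-4-12 hexadecimal form seen in the fmadm faulty sample output, the function name `fault_origin` is hypothetical, and the text passed in is assumed to describe one fault that is already known to exist.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID (an assumption about the output format).
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"
)


def fault_origin(fault_text: str) -> str:
    """Classify one reported fault: a UUID marks it as host-detected."""
    return "host-detected" if UUID_RE.search(fault_text) else "POST-detected"


print(fault_origin("UUID 7040d859-5b03-4a58-8dfd-e3a80875d62f"))  # host-detected
print(fault_origin("/SYS/PM0/CM0/CMP/BOB10/CH0/DIMM disabled"))   # POST-detected
```

A POST-detected result corresponds to continuing with Step 3, and a host-detected result corresponds to skipping to Step 4.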
Related Information
■
“Understanding DIMM Configurations” on page 69
■
“DIMM Fault Handling” on page 74
■
“DIMM Configuration Errors” on page 72
■
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
■
“Determine Which DIMM Is Faulty (PSH)” on page 75
■
“Remove a DIMM” on page 78
■
“Install a DIMM” on page 80
Servicing Hard Drives
Hard drives are hot-service components that can be replaced by customers. For the location of
the hard drives, see “Hard Drive Configuration” on page 87.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
These topics describe service procedures for the hard drives in the server.
■
“Hard Drive Configuration” on page 87
■
“Hard Drive LEDs” on page 89
■
“Determine Which Hard Drive Is Faulty” on page 90
■
“Remove a Hard Drive” on page 90
■
“Install a Hard Drive” on page 94
■
“Verify a Hard Drive” on page 95
Hard Drive Configuration
You can install a mix of hard drives and solid state drives. The server requires at least one hard
drive to be installed and operational.
No.  Description  No.  Description
1    Drive 1      5    Drive 5
2    Drive 0      6    Drive 4
3    Drive 3      7    Drive 7
4    Drive 2      8    Drive 6
The hard drives in the server are hot-serviceable, meaning that the drives can be removed and
inserted while the server is powered on.
Depending on the configuration of the data on a particular drive, the drive might also be
removable while the server is online. However, to hot-service a drive while the server is online
you must take the drive offline before you can safely remove it. Taking a drive offline prevents
any applications from accessing it, and removes logical software links to it.
You cannot hot-service a drive in the following situations:
■
If the drive contains the operating system and the operating system is not mirrored on
another drive.
■
If the drive cannot be logically isolated from the online operations of the server.
If either of these conditions applies to the drive being serviced, you must take the server offline
(shut down the operating system) before you replace the drive.
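The two conditions reduce to a single Boolean test. This Python sketch is only a restatement of the rule above. The function and parameter names are hypothetical, and deciding whether a drive can be logically isolated remains an administrative judgment that the code does not make for you.

```python
def must_shut_down_os(holds_os: bool, os_mirrored: bool, can_isolate: bool) -> bool:
    """Return True when the drive cannot be hot-serviced with the OS running."""
    unmirrored_os_drive = holds_os and not os_mirrored
    return unmirrored_os_drive or not can_isolate


# A data drive that can be taken offline: hot-service is possible.
print(must_shut_down_os(holds_os=False, os_mirrored=False, can_isolate=True))  # False
# The only copy of the operating system: shut down first.
print(must_shut_down_os(holds_os=True, os_mirrored=False, can_isolate=True))   # True
```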
Related Information
■
“Supported Storage and Backup Devices” on page 22
■
“Component Service Task Reference” on page 22
■
“Hard Drive LEDs” on page 89
■
“Determine Which Hard Drive Is Faulty” on page 90
■
“Remove a Hard Drive” on page 90
■
“Install a Hard Drive” on page 94
■
“Verify a Hard Drive” on page 95
Hard Drive LEDs
No.  LED                        Description
1    Ready to Remove (blue)     Indicates that a drive can be removed during a hot-service operation.
2    Service Required (amber)   Indicates that the drive has experienced a fault condition.
3    OK/Activity (green)        Indicates the drive's availability for use.
                                ■ On – Read or write activity is in progress.
                                ■ Off – Drive is idle and available for use.
Related Information
■
“Hard Drive Configuration” on page 87
■
“Determine Which Hard Drive Is Faulty” on page 90
■
“Remove a Hard Drive” on page 90
■
“Install a Hard Drive” on page 94
■
“Verify a Hard Drive” on page 95
Determine Which Hard Drive Is Faulty
The following LEDs are lit when a hard drive fault is detected:
■
System Service Required LEDs on the front panel and rear I/O module
■
Service Required LED on the faulty drive
1.
Determine if the System Service Required LEDs are lit on the front panel or the
rear I/O module.
See “Interpreting LEDs” on page 27.
2.
From the front of the server, check the drive LEDs to identify which drive needs
to be replaced.
See “Hard Drive LEDs” on page 89. The amber Service Required LED is lit on the drive
that needs to be replaced.
3.
Remove the faulty drive.
See “Remove a Hard Drive” on page 90.
Related Information
■
“Hard Drive Configuration” on page 87
■
“Hard Drive LEDs” on page 89
■
“Remove a Hard Drive” on page 90
■
“Install a Hard Drive” on page 94
■
“Verify a Hard Drive” on page 95
Remove a Hard Drive
Hard drives are hot-service components that can be replaced by customers. For the location of
the hard drives, see “Hard Drive Configuration” on page 87.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Locate the drive in the server that you want to remove.
■
See “Hard Drive Configuration” on page 87 for the locations of the drives in the server.
■
See “Determine Which Hard Drive Is Faulty” on page 90 to locate a faulty drive.
2.
Determine if you need to shut down the OS to replace the drive, and perform one
of the following actions:
■If the drive cannot be taken offline without shutting down the OS, follow
instructions in “Power Off the Server (Oracle ILOM)” on page 51, and go to
Step 4.
■If the drive can be taken offline without shutting down the OS, go to Step 3.
3.
Take the drive offline:
■For a standard drive:
a.
At the Oracle Solaris prompt, type the cfgadm -al command to list all
drives in the device tree, including drives that are not configured.
# cfgadm -al
This command lists dynamically reconfigurable hardware resources and shows their
operational status. In this case, look for the status of the drive you plan to remove.
This information is listed in the Occupant column.
You must unconfigure any drive whose status is listed as configured, as described in
Step 3b.
b.
Unconfigure the drive using the cfgadm -c unconfigure command.
Example:
# cfgadm -c unconfigure c2::w5000cca00a76d1f5,0
Replace c2::w5000cca00a76d1f5,0 with the drive name that applies to your
situation.
c.
Verify that the blue Ready to Remove LED on the drive is lit.
■For an NVMe Drive:
a.
Determine the name of the NVMe drive to be removed.
# hotplug list -lc
Locate the name of the drive, such as /SYS/DBP/NVME0 in this example.
You can use this same command to check the state of the drive at other stages of the
removal procedure.
b.
Disable the NVMe drive.
# hotplug disable /SYS/DBP/NVME0
Check that the drive's state has changed from ENABLED to POWERED.
# hotplug list -lc
c.
Power down the NVMe drive.
# hotplug poweroff /SYS/DBP/NVME0
Check that the drive's state has changed from POWERED to PRESENT.
# hotplug list -lc
In this state, the blue OK to Remove LED on the NVMe drive is lit.
Note - Do not remove the drive unless the blue OK to Remove LED is lit.
4.
Press the drive release button to unlock the drive.
5.
Pull on the latch to remove the drive from the server.
Caution - The latch is not an ejector. Do not force the latch too far to the right. Doing so can
damage the latch.
6.
After you remove an NVMe drive, check that the drive slot's state has changed to
EMPTY.
# hotplug list -lc
7.
Install the replacement drive or a filler tray.
See “Install a Hard Drive” on page 94.
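For NVMe drives, the removal procedure walks the slot through the states ENABLED, POWERED, and PRESENT before the drive is pulled. The Python sketch below lists the hotplug commands that remain for a given starting state. It only builds command strings and does not run them. The function name `commands_before_removal` is hypothetical, and the state names are taken from the procedure above.

```python
from typing import List

# Slot states in the order the removal procedure passes through them.
STATES = ("ENABLED", "POWERED", "PRESENT")
# hotplug subcommand that moves the slot out of each state.
NEXT_SUBCOMMAND = {"ENABLED": "disable", "POWERED": "poweroff"}


def commands_before_removal(current_state: str, drive: str) -> List[str]:
    """Return the hotplug commands still needed before the drive may be pulled."""
    if current_state not in STATES:
        raise ValueError(f"unexpected slot state: {current_state}")
    commands = []
    for state in STATES[STATES.index(current_state):]:
        if state in NEXT_SUBCOMMAND:
            commands.append(f"hotplug {NEXT_SUBCOMMAND[state]} {drive}")
    return commands


for command in commands_before_removal("ENABLED", "/SYS/DBP/NVME0"):
    print(command)
# hotplug disable /SYS/DBP/NVME0
# hotplug poweroff /SYS/DBP/NVME0
```

An empty list means the slot is already in the PRESENT state. Even then, confirm that the blue LED on the drive is lit before you remove the drive.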
Related Information
■
“Determine Which Hard Drive Is Faulty” on page 90
■
“Install a Hard Drive” on page 94
■
“Verify a Hard Drive” on page 95
Install a Hard Drive
Hard drives are hot-service components that can be replaced by customers. For the location of
the hard drives, see “Hard Drive Configuration” on page 87.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Align the replacement drive to the drive slot, and slide the drive in until it is
seated.
Drives are physically addressed according to the slot in which they are installed. If you are
replacing a drive, install the replacement drive in the same slot as the drive that was removed.
See “Hard Drive Configuration” on page 87 for drive slot information.
2.
Close the latch to lock the drive in place.
3.
Verify the installation.
See “Verify a Hard Drive” on page 95.
Related Information
■
“Determine Which Hard Drive Is Faulty” on page 90
■
“Remove a Hard Drive” on page 90
■
“Verify a Hard Drive” on page 95
Verify a Hard Drive
1.
Determine if you replaced or installed a hard drive in a running server or not.
■If you replaced or installed a hard drive in a server that is running (if you
hot-serviced the hard drive), then no further action is necessary. The Oracle
Solaris OS auto-configures the hard drive.
■If you replaced or installed a hard drive in a powered-down server, then
continue with these procedures to configure the hard drive.
■If you hot-serviced an NVMe drive, it should automatically power up and
attach. If not, power up and attach the drive manually.
# hotplug enable /SYS/DBP/NVME0
Check that the drive's state has changed to ENABLED.
# hotplug list -lc
2.
If the OS is shut down, and the drive you replaced was not the boot device, boot
the OS.
Depending on the nature of the replaced drive, you might need to perform administrative tasks
to reinstall software before the server can boot. Refer to the Oracle Solaris OS administration
documentation for more information.
3.
At the Oracle Solaris prompt, type the cfgadm -al command to list all drives in the
device tree, including any drives that are not configured.
# cfgadm -al
This command helps you identify the drive you installed. For example:
Perform one of the following tasks based on your verification results.
■If the previous steps did not verify the drive, see “Diagnostics
Process” on page 26.
■If the previous steps indicate that the drive is functioning properly, perform
the tasks required to configure the drive. These tasks are covered in the
Oracle Solaris OS administration documentation.
For additional drive verification, you can run the Oracle VTS software. Refer to the Oracle VTS
documentation for details.
Related Information
■
“Determine Which Hard Drive Is Faulty” on page 90
■
“Remove a Hard Drive” on page 90
■
“Install a Hard Drive” on page 94
Servicing the Main Module
For the location of the main module, see “Front Panel Components (Service)” on page 14.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Step  Description                              Link
1.    Determine if the main module is faulty.  “Main Module LEDs” on page 100
2.    Prepare the server for service.          “Preparing for Service” on page 43
3.    Remove the main module.                  “Remove the Main Module” on page 101
4.    Service main module components.          ■ “Servicing NVMe Switch Cards” on page 111
                                               ■ “Servicing the Drive Backplane” on page 119
                                               ■ “Servicing the SPM” on page 125
                                               ■ “Servicing the SCC PROM” on page 133
                                               ■ “Servicing the Battery” on page 137
                                               ■ “Servicing the Front I/O Assembly” on page 143
5.    Install the main module.                 ■ “Install the Main Module” on page 105
                                               ■ “Verify the Main Module” on page 108
6.    Return the server to operation.          “Returning the Server to Operation” on page 201
Main Module LEDs
No.  LED                            Icon  Description
1    Service Required LED (amber)         Indicates that service is required. POST and Oracle ILOM are two
                                          diagnostic tools that can detect a fault or failure resulting in this
                                          indication.
                                          The Oracle ILOM show faulty command provides details about any
                                          faults that cause this indicator to illuminate.
                                          Under some fault conditions, individual component fault LEDs are lit in
                                          addition to the Service Required LED.
2    Power OK LED (green)                 Indicates these conditions:
                                          ■ Off – System is not running in its normal state. System power might
                                          be off. The SPM might be running.
                                          ■ Steady on – System is powered on and is running in its normal
                                          operating state. No service actions are required.
                                          ■ Fast blink – System is running in standby mode and can be quickly
                                          returned to full function.
                                          ■ Slow blink – A normal but transitory activity is taking place. Slow
                                          blinking might indicate that system diagnostics are running or that
                                          the system is booting.
3    SPM LED (green)                SPM   Indicates these conditions:
                                          ■ Off – AC power might have been connected to the power supplies.