Oracle SPARC T7-4 Service Manual

SPARC T7-4 Server Service Manual

Part No: E54994-07
May 2017
Copyright © 2015, 2017, Oracle and/or its affiliates. All rights reserved.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.
Access to Oracle Support
Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Contents

Using This Documentation ........ 11
Product Documentation Library ........ 11
Feedback ........ 11
Identifying Components ........ 13
Front Panel Components (Service) ........ 14
Rear Panel Components (Service) ........ 16
Chassis Subassembly Components ........ 18
Processor Module Components ........ 19
Main Module Components ........ 20
Supported Storage and Backup Devices ........ 22
Component Service Task Reference ........ 22
Detecting and Managing Faults ........ 25
Understanding Diagnostics ........ 25
PSH Overview ........ 25
Diagnostics Process ........ 26
Checking for Faults ........ 27
Interpreting LEDs ........ 27
▼ Log In to Oracle ILOM (Service) ........ 32
▼ Check for Faults ........ 33
Interpreting Log Files and System Messages ........ 35
▼ Check the Message Buffer ........ 36
▼ View Log Files (Oracle Solaris) ........ 36
▼ View Log Files (Oracle ILOM) ........ 37
Configuring POST ........ 37
POST Overview ........ 37
▼ Configure POST ........ 38
▼ Clear a Fault Manually ........ 40
System Schematic ........ 41
Related Information ........ 42
Preparing for Service ........ 43
Safety Information ........ 43
Safety Symbols ........ 44
ESD Precautions ........ 44
Antistatic Wrist Strap ........ 44
Antistatic Mat ........ 45
Related Information ........ 45
Tools Needed for Service ........ 45
Component Fillers ........ 46
Component Service Categories ........ 46
▼ Find the Server Serial Number ........ 47
▼ Locate the Server ........ 49
▼ Prevent ESD Damage ........ 49
Removing Power From the Server ........ 50
▼ Prepare to Power Off the Server ........ 51
▼ Power Off the Server (Oracle ILOM) ........ 51
▼ Power Off the Server (Power Button – Graceful Shutdown) ........ 52
▼ Power Off the Server (Power Button – Emergency Shutdown) ........ 52
▼ Disconnect Power Cords ........ 53
Attachment of Devices During Service ........ 54
Servicing Processor Modules ........ 55
Server Upgrade Process ........ 56
Processor Module Configuration ........ 57
Processor Module LEDs ........ 59
▼ Determine Which Processor Module Is Faulty ........ 60
▼ Remove a Processor Module or Processor Filler Module ........ 60
▼ Install a Processor Module or Processor Filler Module ........ 64
▼ Verify a Processor Module ........ 67
Servicing DIMMs ........ 69
Understanding DIMM Configurations ........ 69
Supported Memory Configurations ........ 70
Identifying DIMMs ........ 71
DIMM Configuration Errors ........ 72
DIMM FRU Names ........ 73
DIMM Fault Handling ........ 74
Identifying Faulty DIMMs ........ 74
▼ Determine Which DIMM Is Faulty (Oracle ILOM) ........ 75
▼ Determine Which DIMM Is Faulty (PSH) ........ 75
▼ Determine Which DIMM Is Faulty (DIMM Fault LEDs) ........ 76
▼ Remove a DIMM ........ 78
▼ Install a DIMM ........ 80
▼ Verify a DIMM ........ 83
Servicing Hard Drives ........ 87
Hard Drive Configuration ........ 87
Hard Drive LEDs ........ 89
▼ Determine Which Hard Drive Is Faulty ........ 90
▼ Remove a Hard Drive ........ 90
▼ Install a Hard Drive ........ 94
▼ Verify a Hard Drive ........ 95
Servicing the Main Module ........ 99
Main Module LEDs ........ 100
▼ Determine if the Main Module Is Faulty ........ 101
▼ Remove the Main Module ........ 101
▼ Install the Main Module ........ 105
▼ Verify the Main Module ........ 108
Servicing NVMe Switch Cards ........ 111
▼ Disconnect the NVMe Cables ........ 112
▼ Remove a NVMe Switch Card ........ 113
▼ Install a NVMe Switch Card ........ 114
▼ Connect the NVMe Cables ........ 117
▼ Verify a NVMe Switch Card ........ 117
Servicing the Drive Backplane ........ 119
▼ Remove the Drive Backplane ........ 119
▼ Install the Drive Backplane ........ 121
Servicing the SPM ........ 125
▼ Determine if the SPM Is Faulty ........ 125
▼ Remove the SPM ........ 126
▼ Install the SPM ........ 128
▼ Verify the SPM ........ 131
Servicing the SCC PROM ........ 133
▼ Remove the SCC PROM ........ 133
▼ Install the SCC PROM ........ 134
▼ Verify the SCC PROM ........ 136
Servicing the Battery ........ 137
▼ Replace the Battery ........ 137
▼ Verify the Battery ........ 140
Servicing the Front I/O Assembly ........ 143
▼ Remove the Front I/O Assembly ........ 143
▼ Install the Front I/O Assembly ........ 145
Servicing Power Supplies ........ 147
Power Supply Configuration ........ 147
Power Supply and AC Power Connector LEDs ........ 150
▼ Determine Which Power Supply Is Faulty ........ 151
▼ Remove a Power Supply ........ 151
▼ Install a Power Supply ........ 153
▼ Verify a Power Supply ........ 156
Servicing Fan Modules ........ 157
Fan Module Configuration ........ 157
Fan Module LED ........ 158
▼ Determine Which Fan Module Is Faulty ........ 158
▼ Remove a Fan Module ........ 159
▼ Install a Fan Module ........ 161
▼ Verify a Fan Module ........ 162
Servicing PCIe Cards ........ 165
Understanding PCIe Root Complex Connections ........ 165
PCIe Card Configuration ........ 168
PCIe Carrier Handle and LEDs ........ 169
▼ Determine Which PCIe Card Is Faulty ........ 170
▼ Remove a PCIe Card Carrier ........ 171
▼ Remove a PCIe Card ........ 174
▼ Install a PCIe Card ........ 177
▼ Install a PCIe Card Carrier ........ 180
▼ Verify a PCIe Card ........ 181
Servicing the Rear I/O Module ........ 183
Rear I/O Module LEDs ........ 183
▼ Determine if the Rear I/O Module Is Faulty ........ 186
▼ Remove the Rear I/O Module ........ 186
▼ Install the Rear I/O Module ........ 188
▼ Verify the Rear I/O Module ........ 190
Servicing the Rear Chassis Subassembly ........ 193
Rear Chassis Subassembly Components ........ 193
▼ Remove the Rear Chassis Subassembly ........ 194
▼ Install the Rear Chassis Subassembly ........ 197
▼ Verify the Rear Chassis Subassembly ........ 198
Returning the Server to Operation ........ 201
▼ Connect Power Cords ........ 201
▼ Power On the Server (Oracle ILOM) ........ 202
Index ........ 203

Using This Documentation

Overview – Describes how to troubleshoot and maintain the server
Audience – Technicians, system administrators, and authorized service providers
Required knowledge – Advanced experience troubleshooting and replacing hardware

Product Documentation Library

Documentation and resources for this product and related products are available at http://www.oracle.com/goto/t7-4/docs.

Feedback

Provide feedback about this documentation at http://www.oracle.com/goto/docfeedback.

Identifying Components

These topics identify key components of the server, including major boards and internal system cables, as well as front and rear panel features.
“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Chassis Subassembly Components” on page 18
“Processor Module Components” on page 19
“Main Module Components” on page 20
“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22
“System Schematic” on page 41

Related Information

“Detecting and Managing Faults”
“Preparing for Service”
“Returning the Server to Operation”

Front Panel Components (Service)

No. Description Links
1 Processor modules (slots 0 and 1) or processor filler module (slot 1 only) “Processor Module Components” on page 19; “Servicing Processor Modules” on page 55
2 Control panel “Detecting and Managing Faults” on page 25; “Preparing for Service” on page 43; “Returning the Server to Operation” on page 201
3 Main module “Main Module Components” on page 20; “Servicing the Main Module” on page 99
4 Power supplies (4) “Servicing Power Supplies” on page 147

Related Information

“Rear Panel Components (Service)” on page 16
“Chassis Subassembly Components” on page 18
“Processor Module Components” on page 19
“Main Module Components” on page 20
“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22
“System Schematic” on page 41

Rear Panel Components (Service)

No. Description Links
1 Fan modules (5) “Servicing Fan Modules” on page 157
2 AC power connectors (4) “Preparing for Service” on page 43
3 Rear I/O module “Servicing the Rear I/O Module” on page 183
4 PCIe carriers (16) “Servicing PCIe Cards” on page 165
These components are accessible within the rear chassis subassembly, which you can access after you have removed all the components from the rear of the server.
No. Description Links
1 Chassis
2 Midplane assembly “Servicing the Rear Chassis Subassembly” on page 193
3 Rear chassis subassembly “Servicing the Rear Chassis Subassembly” on page 193

Related Information

“Front Panel Components (Service)” on page 14
“Chassis Subassembly Components” on page 18
“Processor Module Components” on page 19
“Main Module Components” on page 20
“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22
“System Schematic” on page 41

Chassis Subassembly Components

No. Description Links
1 Hard drives (8) “Servicing Hard Drives” on page 87
2 Front I/O assembly “Servicing the Front I/O Assembly” on page 143
3 Main module “Servicing the Main Module” on page 99
4 System controls and indicators “Front Panel Controls and LEDs” on page 29
5 Processor modules (2) “Servicing Processor Modules” on page 55
6 Chassis
7 Rear chassis subassembly (RCSA) “Servicing the Rear Chassis Subassembly” on page 193
8 Fan modules (5) “Servicing Fan Modules” on page 157
9 PCIe carriers (16) “Servicing PCIe Cards” on page 165
10 Rear I/O module “Servicing the Rear I/O Module” on page 183
11 Power supplies (4) “Servicing Power Supplies” on page 147

Related Information

“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Processor Module Components” on page 19
“Main Module Components” on page 20
“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22
“System Schematic” on page 41

Processor Module Components

These components are accessible within the processor module when you remove the processor module from the front of the server.
No. Description Link
1 DIMMs “Servicing DIMMs” on page 69

Related Information

“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Chassis Subassembly Components” on page 18
“Main Module Components” on page 20
“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22
“System Schematic” on page 41

Main Module Components

These components are accessible after you remove the main module from the front of the server.
No. Description Links
1 Hard drives “Servicing Hard Drives” on page 87
2 Front I/O assembly and cables “Servicing the Front I/O Assembly” on page 143
3 Storage backplane “Servicing the Drive Backplane” on page 119
4 Main module motherboard
5 SPM “Servicing the SPM” on page 125
6 SCC PROM “Servicing the SCC PROM” on page 133
7 Battery “Servicing the Battery” on page 137
8 NVMe cards (optional) “Servicing NVMe Switch Cards” on page 111

Related Information

“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Chassis Subassembly Components” on page 18
“Processor Module Components” on page 19
“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22
“System Schematic” on page 41

Supported Storage and Backup Devices

The server supports the following storage devices:
Fibre channel arrays (SATA, FC, flash, and SAS-2)
SAS arrays (SAS-2)
ZFS appliances (SAS-2)
The server also supports these types of tape backup and restore devices:
TCP/IP
Fibre channel
SAS
LVD SCSI

Related Information

“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Chassis Subassembly Components” on page 18
“Processor Module Components” on page 19
“Main Module Components” on page 20
“Component Service Task Reference” on page 22
“System Schematic” on page 41

Component Service Task Reference

This table lists the names of serviceable components. It also lists the system names and task locations for the components.
Component                        Max.  NAC Name                         SDM Name                                     Link to Service Procedure
Processor module                 2     /SYS/PMx                         /System/CPU_Modules/CPU_Module_x             “Servicing Processor Modules” on page 55
Processor filler module          1     /SYS/PFMx                                                                     “Servicing Processor Modules” on page 55
DIMM                             64    /SYS/PMx/CMx/CMP/BOBxx/CHx/DIMM  /System/Memory/DIMMs/DIMM_x                  “Servicing DIMMs” on page 69
Main module                      1     /SYS/MB                          None                                         “Servicing the Main Module” on page 99
Disk backplane                   1     /SYS/DBP                         SAS_BACKPLANE                                “Servicing the Drive Backplane” on page 119
Hard drive                       8     /SYS/DBP/HDDx                    /System/Storage/Disks/Disks_x                “Servicing Hard Drives” on page 87
NVMe switch card (optional)      2     /SYS/MB/PCIEx/PCIESW             NVMECARD                                     “Servicing NVMe Switch Cards” on page 111
NVMe drive (optional)            8     /SYS/DBP/NVMEx                   None                                         “Servicing Hard Drives” on page 87
SPM                              1     /SYS/MB/SPM                      /SPM                                         “Servicing the SPM” on page 125
SCC PROM                         1     /SYS/MB/SCC                      None                                         “Servicing the SCC PROM” on page 133
Battery                          1     /SYS/MB/BAT                      None                                         “Servicing the Battery” on page 137
Front I/O assembly               1     /SYS/FIO                         None                                         “Servicing the Front I/O Assembly” on page 143
Power supply                     4     /SYS/PSx                         /System/Power/Power_Supplies/Power_Supply_x  “Servicing Power Supplies” on page 147
Fan module                       5     /SYS/RCSA/FANBD/FMx              /System/Cooling/Fans/Fan_x                   “Servicing Fan Modules” on page 157
PCIe card                        16    /SYS/RCSA/PCIEx/CAR/CAR/CARD     /System/PCI_Devices/Add-on/Device_x          “Servicing PCIe Cards” on page 165
Rear IO module                   1     /SYS/RIO                         /System/Networking/Ethernet_NICs             “Servicing the Rear I/O Module” on page 183
Rear chassis subassembly (RCSA)  1     /SYS/RCSA                        None                                         “Servicing the Rear Chassis Subassembly” on page 193
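Fault reports identify a FRU by its NAC name (for example, a Location of /SYS/PM0), so it can help to map a reported name back to a component in this table. The following is a minimal sketch, not an Oracle tool. It assumes that each "x" placeholder in the NAC Name column stands for one or more decimal digits, and the function name component_for is hypothetical. It covers only a subset of the table rows.

```python
import re

# Patterns taken from the NAC Name column of the table.
# Assumption: each "x" placeholder is one or more decimal digits.
NAC_PATTERNS = [
    (r"/SYS/PM\d+/CM\d+/CMP/BOB\d+/CH\d+/DIMM", "DIMM"),
    (r"/SYS/PM\d+", "Processor module"),
    (r"/SYS/PFM\d+", "Processor filler module"),
    (r"/SYS/MB", "Main module"),
    (r"/SYS/DBP", "Disk backplane"),
    (r"/SYS/DBP/HDD\d+", "Hard drive"),
    (r"/SYS/DBP/NVME\d+", "NVMe drive"),
    (r"/SYS/MB/PCIE\d+/PCIESW", "NVMe switch card"),
    (r"/SYS/MB/SPM", "SPM"),
    (r"/SYS/MB/SCC", "SCC PROM"),
    (r"/SYS/MB/BAT", "Battery"),
    (r"/SYS/FIO", "Front I/O assembly"),
    (r"/SYS/PS\d+", "Power supply"),
    (r"/SYS/RCSA/FANBD/FM\d+", "Fan module"),
    (r"/SYS/RIO", "Rear I/O module"),
    (r"/SYS/RCSA", "Rear chassis subassembly"),
]


def component_for(nac_name):
    """Return the component name for a NAC name, or None if it is not listed."""
    for pattern, component in NAC_PATTERNS:
        # fullmatch requires the whole name to match, so list order is irrelevant.
        if re.fullmatch(pattern, nac_name):
            return component
    return None
```

For example, component_for("/SYS/PM0") returns "Processor module", which points to “Servicing Processor Modules” on page 55.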

Related Information

“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Chassis Subassembly Components” on page 18
“Processor Module Components” on page 19
“Main Module Components” on page 20
“Supported Storage and Backup Devices” on page 22
“System Schematic” on page 41

Detecting and Managing Faults

These topics explain how to use various diagnostic tools to monitor server status and troubleshoot faults in the server. The examples use the PSH fmadm faulty command.
“Understanding Diagnostics” on page 25
“Checking for Faults” on page 27
“Interpreting Log Files and System Messages” on page 35
“Configuring POST” on page 37
“Clear a Fault Manually” on page 40

Related Information

“Identifying Components” on page 13
“Component Service Categories” on page 46
“Preparing for Service” on page 43
“Returning the Server to Operation” on page 201

Understanding Diagnostics

These topics explain the diagnostic process and tools.
“PSH Overview” on page 25
“Diagnostics Process” on page 26

PSH Overview

The PSH feature provides problem diagnosis on the SPM and the host. Regardless of where a fault occurs, you can view and manage the fault diagnosis from the SPM or the host.
When possible, PSH initiates steps to take the component offline. PSH also logs the fault to the syslogd daemon and provides a fault notification with a message ID. You can use the message ID to get additional information about the problem from the Knowledge Base article database.
A PSH console message provides this information about each detected fault:
Type
Severity
Description
Automated response
Impact
Suggested action for system administrator
If PSH detects a faulty component, use the fmadm faulty command to display information about the fault. See “Check for Faults” on page 33.
Related Information
“Diagnostics Process” on page 26
“Checking for Faults” on page 27

Diagnostics Process

This table describes the diagnostics process.
Step Diagnostic Action Possible Outcome Links
1. Check the server for detected faults using these tools:
■ System LEDs on the front and rear panels.
■ fmadm faulty command from the Oracle Solaris prompt or through the Oracle ILOM fault management shell.
Possible outcome: Determine the faulty component and replace it, or continue to advanced troubleshooting.
Link: “Checking for Faults” on page 27
2. Check the log files for fault information.
Possible outcome: If system messages indicate a faulty component, replace it.
Link: “Interpreting Log Files and System Messages” on page 35
3. Run POST to provide additional low-level diagnostic information for the server.
Possible outcome: If POST indicates a faulty component, replace it.
Link: “Configuring POST” on page 37
4. Contact technical support if the problem persists.
Possible outcome: If you are unable to determine the cause of a fault, contact Oracle Support for help.
Link: https://support.oracle.com
Related Information
“PSH Overview” on page 25
“Checking for Faults” on page 27

Checking for Faults

Use these methods to check for faults:
“Interpreting LEDs” on page 27
“Log In to Oracle ILOM (Service)” on page 32
“Check for Faults” on page 33

Interpreting LEDs

Use these steps to determine if an LED indicates that a component has failed in the server.
Steps Description Links
1. Check the LEDs on the front and rear of the server.
“Front Panel Controls and LEDs” on page 29
“Rear Panel Controls and LEDs” on page 31
2. Check the LEDs on the individual components.
Note - Component LEDs might not be lit even though the component is faulty. Use the instructions in these links to determine if the component has been diagnosed as being faulty.
“Determine if the Main Module Is Faulty” on page 101
“Determine Which Processor Module Is Faulty” on page 60
“Identifying Faulty DIMMs” on page 74
“Determine Which Hard Drive Is Faulty” on page 90
“Determine Which Power Supply Is Faulty” on page 151
“Determine Which Fan Module Is Faulty” on page 158
“Determine Which PCIe Card Is Faulty” on page 170
“Determine if the Rear I/O Module Is Faulty” on page 186
Related Information
“Front Panel Controls and LEDs” on page 29
“Rear Panel Controls and LEDs” on page 31
Front Panel Controls and LEDs
No. LED Icon or Label Description
1 Locator LED and button (white)
You can turn on the Locator LED to identify a particular server. When lit, the LED blinks rapidly. Turn on the Locator LED by pressing the Locator button, or see “Locate the Server” on page 49.
2 Server Service Required LED (amber)
The fmadm faulty command provides details about any faults that cause this indicator to light. See “Check for Faults” on page 33.
Under some fault conditions, individual component fault LEDs are lit in addition to the Server Service Required LED.
3 Power OK LED (green)
Indicates these conditions:
Off – Server is not running in its normal state. Server power might be off. The SPM might be running.
Steady on – Server is powered on and is running in its normal operating state. No service actions are required.
Fast blink – Server is running in standby mode and can be quickly returned to full function.
Slow blink – A normal but transitory activity is taking place. Slow blinking might indicate that server diagnostics are running or that the server is booting.
4 Power button
The recessed Power button toggles the server on or off. See “Power Off the Server (Power Button – Graceful Shutdown)” on page 52.
5 System Overtemp LED (amber)
Indicates these conditions:
Off – Indicates a steady state, no service action is required.
Steady on – Indicates that a temperature failure event has been acknowledged and a service action is required.
6 Fan Module Fault LED (amber) Rear FM
Indicates these conditions:
Off – Indicates a steady state, no service action is required.
Steady on – Indicates that a fan module failure event has been acknowledged and a service action is required on at least one of the fan modules.
7 PCIe Card Fault LED (amber) Rear PCIe
Indicates these conditions:
Off – Indicates a steady state, no service action is required.
Steady on – Indicates that a failure event has been acknowledged and a service action is required on at least one of the PCIe cards.
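The Power OK LED states listed in this table can be captured in a small lookup, which might be useful in a site checklist or monitoring note. This is an illustrative sketch only. The dictionary contents paraphrase the table, and the names POWER_OK_STATES and describe_power_ok are hypothetical.

```python
# Power OK LED states, paraphrased from the front panel LED table.
POWER_OK_STATES = {
    "off": "Server is not running in its normal state; power might be off "
           "and the SPM might be running.",
    "steady on": "Server is powered on and running normally; "
                 "no service action is required.",
    "fast blink": "Server is in standby mode and can be quickly returned "
                  "to full function.",
    "slow blink": "A normal but transitory activity is taking place, "
                  "such as diagnostics or booting.",
}


def describe_power_ok(observed):
    """Return the meaning of an observed Power OK LED state."""
    # Normalize case and repeated whitespace before the lookup.
    key = " ".join(observed.lower().split())
    return POWER_OK_STATES.get(key, "Unrecognized state; check the LED table.")
```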
Rear Panel Controls and LEDs
No. LED Icon or Label Description
1 AC 0 (left) and AC 1 (right) power LED
Indicates these conditions:
Off – No power is applied to the server.
Green – Power is applied to the server.
2 Net MGT port link LED
Indicates these conditions:
Off – No link is established.
On or blinking – A link is established.
3 Net MGT port speed LED
Indicates these conditions:
Off – The link is operating as a 10-Mbps connection.
On or blinking – The link is operating as a 100-Mbps connection.
4 Network port link LED
Indicates these conditions:
Off – No link is established.
Blinking – A link is established.
5 Network port speed LED
Indicates these conditions:
Off – The link is operating as a 10-Mbps connection or there is no link.
Amber on – The link is operating as a 100-Mbps connection.
Green on – The link is operating as a Gigabit connection (1000 Mbps).
6 AC 2 (left) and AC 3 (right) power LEDs
Indicates these conditions:
Off – No power is applied to the server.
Green – Power is applied to the server.
7 Locator LED and button (white)
Turn on the Locator LED by pressing the Locator button, or see “Locate the Server” on page 49. When lit, the LED blinks rapidly.
8 Server Service Required LED (amber)
The fmadm faulty command provides details about any faults that cause this indicator to light. See “Check for Faults” on page 33.
Under some fault conditions, individual component fault LEDs are lit in addition to the Service Required LED.
9 Power OK LED (green)
Indicates these conditions:
Off – Server is not running in its normal state. System power might be off. The SPM might be running.
Steady on – Server is powered on and is running in its normal operating state. No service actions are required.
Fast blink – Server is running in standby mode and can be quickly returned to full function.
Slow blink – A normal but transitory activity is taking place. Slow blinking might indicate that system diagnostics are running or that the system is booting.
10 SP LED SP
Indicates these conditions:
Off – AC power might have been connected to the power supplies.
Steady on, green – SPM is running in its normal operating state. No service actions are required.
Blink, green – SPM is initializing the Oracle ILOM firmware.
Steady on, amber – An SPM error has occurred and service is required.
11 Overtemp LED (amber)
Indicates these conditions:
Off – Indicates a steady state, no service action is required.
Steady on – Indicates that a temperature failure event has been acknowledged and a service action is required.

Log In to Oracle ILOM (Service)

1. At the terminal prompt, type:
ssh root@IP-address
Password: password
Oracle (R) Integrated Lights Out Manager Version 3.2.1.2 rXXXXX
Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.
->
Note - To enable first-time login and access to Oracle ILOM, a default Administrator account and its password are provided with the system. To build a secure environment, you must change the default password (changeme) for the default Administrator account (root) after your initial login to Oracle ILOM. If this default Administrator account has since been changed, contact your system administrator for an Oracle ILOM user account with Administrator privileges.
2. Enable the Oracle ILOM 3.0 legacy name spaces.
-> set /SP/cli legacy_targets=enabled
Note - In Oracle ILOM 3.1, the name spaces for /SYS and /STORAGE were replaced with /System. You can still use the 3.0 legacy names in commands at any time, but to expose the legacy names in the output, you must enable them. This manual uses the legacy names in the command examples and shows the names in the output examples. For more information about the new name spaces, see the Oracle ILOM documentation.
Related Information
“Interpreting LEDs” on page 27
“Check for Faults” on page 33

Check for Faults

The fmadm faulty command displays the list of faults detected by PSH. You can run this command from either the host or through the Oracle ILOM fault management shell.
1. Log in to Oracle ILOM.
See “Log In to Oracle ILOM (Service)” on page 32.
2. Check for PSH-diagnosed faults.
This example shows how to check for faults through the Oracle ILOM fault management shell.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ ------------- -------
Time                UUID                                 msgid         Severity
------------------- ------------------------------------ ------------- -------
2014-08-27/19:46:26 4ec16c8d-5cdb-c6ca-c949-e24d3637ef27 PCIEX-8000-8R Major

Problem Status : solved
Diag Engine : [unknown]
System
   Manufacturer : Oracle Corporation
   Name : SPARC T7-4
   Part_Number : 12345678+11+1
   Serial_Number : 1238BDC0DF

----------------------------------------
Suspect 1 of 1
   Fault class : fault.io.pciex.device-interr-corr
   Certainty : 100%
   Affects : hc:///chassis=0/motherboard=0/cpuboard=0/chip=0/hostbridge=0/pciexrc=0
   Status : faulted but still in service
   FRU
      Status : faulty
      Location : /SYS/PM0
      Manufacturer : Oracle Corporation
      Name : TLA,PN,NRM,M7 1.2
      Part_Number : 7061001
      Revision : 01
      Serial_Number : 465769T+12445102WR
      Chassis
         Manufacturer : Oracle Corporation
         Name : SPARC T7-4
         Part_Number : 12345678+13+2
         Serial_Number : 1248DC140

Description : A fault has been diagnosed by the Host Operation System.
Response : The service required LED on the chassis and on the affected FRU may be illuminated.
Impact : No SPM impact
Action : Refer to the associated reference document at https://support.oracle.com/msg/PCIEX-8000-8R for the latest service procedures and policies regarding this diagnosis.
faultmgmtsp>
In this example, a fault is displayed that includes these details:
Date and time of the fault (2014-08-27/19:46:26).
UUID (4ec16c8d-5cdb-c6ca-c949-e24d3637ef27), which is unique to each fault.
Message identifier (PCIEX-8000-8R), which can be used to obtain additional fault information from Knowledge Base articles.
3. Consider your next step:
If you are checking for faults while adding a second processor module, and no faults were detected, return to “Server Upgrade Process” on page 56.
If a fault is detected, proceed to Step 4.
4. Use the message ID to obtain more information about this type of fault.
a. Obtain the message ID from console output.
b. Go to https://support.oracle.com, and search on the message ID in the Knowledge tab.
5. Follow the suggested actions to repair the fault.
6. If necessary, clear the fault manually.
See “Clear a Fault Manually” on page 40.
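When console output is captured to a file, the summary row of the fmadm faulty output can be picked apart to recover the UUID and message ID used in Steps 4 and 6. The following is a sketch under stated assumptions, not part of PSH or Oracle ILOM. It assumes the summary row has the four-column layout shown in the example output, and that message IDs follow the shape of PCIEX-8000-8R. The function name parse_faults is hypothetical, and the article URL pattern is the one printed in the Action field of the example.

```python
import re

# One summary row: time, UUID, message ID, severity (layout assumed
# from the example output in this procedure).
SUMMARY_ROW = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msgid>[A-Z0-9]+-\d{4}-[0-9A-Z]{2})\s+"
    r"(?P<severity>\w+)"
)


def parse_faults(fmadm_output):
    """Return one dict per fault summary row found in the captured output."""
    faults = []
    for match in SUMMARY_ROW.finditer(fmadm_output):
        fault = match.groupdict()
        # URL pattern copied from the Action field of the example output.
        fault["article_url"] = "https://support.oracle.com/msg/" + fault["msgid"]
        faults.append(fault)
    return faults
```

The uuid value is what the fmadm acquit command expects if the fault must be cleared manually.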
Related Information
“PSH Overview” on page 25
“Clear a Fault Manually” on page 40
Interpreting Log Files and System Messages
With the OS running on the server, you have the full complement of Oracle Solaris OS files and commands available for collecting information and for troubleshooting.
If PSH does not indicate the source of a fault, check the message buffer and log files for notifications for faults. Drive faults are usually captured by the Oracle Solaris message files.
These topics explain how to view the log files and system messages.
“Check the Message Buffer” on page 36
“View Log Files (Oracle Solaris)” on page 36
“View Log Files (Oracle ILOM)” on page 37

Check the Message Buffer

The dmesg command checks the system buffer for recent diagnostic messages and displays them.
1. Log in as superuser.
2. Type:
# dmesg
Related Information
“View Log Files (Oracle Solaris)” on page 36
“View Log Files (Oracle ILOM)” on page 37

View Log Files (Oracle Solaris)

The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems such as a device that is about to fail.
The /var/adm directory contains several message files. The most recent messages are in the /var/adm/messages file. After a period of time (usually every week), a new messages file is automatically created. The original contents of the messages file are rotated to a file named messages.1. Over a period of time, the messages are further rotated to messages.2 and messages.3, and then deleted.
1. Log in as superuser.
2. Type:
# more /var/adm/messages
3. To view all logged messages, type:
# more /var/adm/messages*
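Because the messages file is rotated to messages.1, messages.2, and messages.3, it can help to read the files in a known order and to filter for likely problem lines. The sketch below is illustrative only. The function names and the default keyword list are assumptions, not part of Oracle Solaris.

```python
import re


def order_message_files(paths):
    """Sort paths newest first: messages, messages.1, messages.2, messages.3."""
    def rotation(path):
        match = re.search(r"messages(?:\.(\d+))?$", path)
        if match is None:
            return None  # not a messages file
        return int(match.group(1)) if match.group(1) else 0

    known = [p for p in paths if rotation(p) is not None]
    return sorted(known, key=rotation)


def matching_lines(lines, keywords=("warning", "error", "fault")):
    """Keep lines that contain any keyword (case-insensitive).

    The default keywords are an assumption; adjust them for your site.
    """
    lowered = tuple(k.lower() for k in keywords)
    return [line for line in lines if any(k in line.lower() for k in lowered)]
```

A typical use is to pass the result of glob.glob("/var/adm/messages*") to order_message_files, then open each file and pass its lines to matching_lines.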
Related Information
“Check the Message Buffer” on page 36
“View Log Files (Oracle ILOM)” on page 37

View Log Files (Oracle ILOM)

1. View the event log.
-> show /SP/logs/event/list
2. View the audit log.
-> show /SP/logs/audit/list
Related Information
“Check the Message Buffer” on page 36
“View Log Files (Oracle Solaris)” on page 36

Configuring POST

These topics explain how to configure POST as a diagnostic tool.
“POST Overview” on page 37
“Configure POST” on page 38

POST Overview

POST is a group of PROM-based tests that run when the server is powered on or when it is reset. POST checks the basic integrity of the critical hardware components in the server.
You can set Oracle ILOM properties to control various aspects of POST operations. For example, you can specify the events that cause POST to run, the level of testing POST performs, and the amount of diagnostic information POST displays. These properties are described in “Configure POST” on page 38.
If POST detects a faulty component, the component is disabled automatically. If the server is able to run without the disabled component, the server boots when POST completes its tests. For example, if POST detects a faulty processor core, the core is disabled, POST completes its test sequence, and the server boots using the remaining cores.
Related Information
“Configure POST” on page 38

Configure POST

1. Log in to Oracle ILOM.
See “Log In to Oracle ILOM (Service)” on page 32.
2. Set the virtual keyswitch to the value that corresponds to the POST configuration you want to run.
This example sets the virtual keyswitch default_level to min, which configures POST to run according to other parameter values.
-> set /HOST keyswitch_state=min
Set default_level to min
For possible values for the keyswitch_state parameter, type:
-> show /HOST diag help
/HOST/diag : Manage Host Power On Self Test Diagnostics
Targets:
Properties:
    default_level : Diag level in the default cause (no error or hw change)
    default_level : Possible values = off, min, max
    default_level : User role required for set = r

    default_verbosity : Diag verbosity in the default cause (no error or hw change)
    default_verbosity : Possible values = none, min, normal, max
    default_verbosity : User role required for set = r

    error_level : Diag level when running after an error reset
    error_level : Possible values = off, min, max
    error_level : User role required for set = r

    error_verbosity : Diag verbosity when running after an error reset
    error_verbosity : Possible values = none, min, normal, max
    error_verbosity : User role required for set = r

    hw_change_level : Diag level when running after a hw change
    hw_change_level : Possible values = off, min, max
    hw_change_level : User role required for set = r

    hw_change_verbosity : Diag verbosity when running after a hw change
    hw_change_verbosity : Possible values = none, min, normal, max
    hw_change_verbosity : User role required for set = r
->
Note - Depending on the server configuration, setting the HOST keyswitch_state diagnostics verbosity to none might result in no POST test status displaying on the console for an extended period of time.
3. You can also set the virtual keyswitch to determine the diagnostic level after an error reset and after a hardware change. To set error_level to max and hw_change_level to max, type:
-> set /HOST/diag error_level=max
-> set /HOST/diag hw_change_level=max
4. View the current values for settings.
Example:
-> show /HOST/diag
/HOST/diag
    Targets:

    Properties:
        error_reset_level = max
        error_reset_verbosity = normal
        hw_change_level = max
        hw_change_verbosity = normal
        level = min
        mode = normal
        power_on_level = max
        power_on_verbosity = normal
        trigger = hw_change error-reset
        verbosity = normal

    Commands:
        cd
        set
        show
->
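Before typing a set command, a proposed value can be checked against the possible values that the help listing in Step 2 reports. The following sketch is an illustration, not an Oracle ILOM interface. It only builds the command text in the form used in Step 3. The property names and values are copied from the help listing, and they might differ on other firmware versions (the show output in Step 4 uses different property names). The function name diag_set_command is hypothetical.

```python
# Possible values copied from the /HOST/diag help listing in Step 2.
LEVELS = {"off", "min", "max"}
VERBOSITIES = {"none", "min", "normal", "max"}

ALLOWED = {
    "default_level": LEVELS,
    "error_level": LEVELS,
    "hw_change_level": LEVELS,
    "default_verbosity": VERBOSITIES,
    "error_verbosity": VERBOSITIES,
    "hw_change_verbosity": VERBOSITIES,
}


def diag_set_command(prop, value):
    """Return the ILOM command text for a valid property/value pair."""
    if prop not in ALLOWED:
        raise ValueError("unknown /HOST/diag property: " + prop)
    if value not in ALLOWED[prop]:
        choices = ", ".join(sorted(ALLOWED[prop]))
        raise ValueError(prop + " must be one of: " + choices)
    # Same command form as the examples in Step 3.
    return "set /HOST/diag " + prop + "=" + value
```

For example, diag_set_command("error_level", "max") returns the first command shown in Step 3, while a level of "normal" is rejected because normal is only a verbosity value.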
Related Information
“POST Overview” on page 37

Clear a Fault Manually

When PSH detects faults, the faults are logged and displayed on the console. In most cases, after the fault is repaired, the corrected state is detected by the server, and the fault condition is repaired automatically. However, this repair should be verified. In cases where the fault condition is not automatically cleared, you must clear the fault manually.
1. After replacing a faulty FRU, power on the server.
See “Returning the Server to Operation” on page 201.
2. At the host prompt, determine whether the replaced FRU still shows a faulty state.
See “Check for Faults” on page 33.
If no fault is reported, you do not need to do anything else. Do not perform the subsequent steps.
If a fault is reported, continue to Step 3.
3. Clear the fault from all persistent fault records.
In some cases, even though the fault is cleared, some persistent fault information remains and results in erroneous fault messages at boot time. To ensure that these messages are not displayed, type this PSH command:
faultmgmtsp> fmadm acquit UUID
4. If required, reset the server.
In some cases, the output of the fmadm faulty command might include this message for the faulty component:
Component faulted and taken out of service
If this message appears in the output, you must reset the server after you manually repair the fault.
faultmgmtsp> exit
-> reset /System
Are you sure you want to reset /System? y
Resetting /System ...
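The decision in Steps 3 and 4 can be summarized as a small helper that lists the commands to type, given the fault UUID and the captured fmadm faulty output. This is a sketch for illustration only. It assembles text and does not run anything on the SPM, and the function name repair_commands is hypothetical.

```python
# Message text from Step 4 that signals a reset is needed.
OUT_OF_SERVICE = "faulted and taken out of service"


def repair_commands(uuid, fmadm_output):
    """Return the commands from Steps 3 and 4, in the order to type them."""
    commands = ["fmadm acquit " + uuid]
    if OUT_OF_SERVICE in fmadm_output.lower():
        # Leave the fault management shell, then reset from the ILOM prompt.
        commands += ["exit", "reset /System"]
    return commands
```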
Related Information
“PSH Overview” on page 25
“Check for Faults” on page 33

System Schematic

This schematic shows the connections between and among specific components and device slots. You can use this schematic to determine optimum locations for any optional cards or other peripherals based on system configuration and intended use.

Related Information

“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Chassis Subassembly Components” on page 18
“Processor Module Components” on page 19
“Main Module Components” on page 20
“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22

Preparing for Service

These topics describe how to prepare the server for servicing.
Step Description Link
1. Review safety and handling information. “Safety Information” on page 43
2. Gather the tools needed for service. “Tools Needed for Service” on page 45
3. Consider filler options. “Component Fillers” on page 46
4. Find the server serial number. “Find the Server Serial Number” on page 47
5. Identify the server to be serviced. “Locate the Server” on page 49
6. Locate the component service information. “Component Service Task Reference” on page 22
7. For cold-service operations, shut down the OS. “Removing Power From the Server” on page 50
8. Gain access to service components. “Chassis Subassembly Components” on page 18

Safety Information

For your protection, observe the following safety precautions when setting up your equipment:
Follow all cautions and instructions marked on the equipment and described in the documentation shipped with your server.
Follow all cautions and instructions marked on the equipment and described in the SPARC T7-4 Server Safety and Compliance Guide.
Ensure that the voltage and frequency of your power source match the voltage and frequency inscribed on the equipment's electrical rating label.
Follow the ESD safety practices as described in this section.
This topic includes the following sections:
“Safety Symbols” on page 44
“ESD Precautions” on page 44
“Antistatic Wrist Strap” on page 44
“Antistatic Mat” on page 45

Safety Symbols

Note the meanings of the following symbols that might appear in this document:
Caution - There is a risk of personal injury or equipment damage. To avoid personal injury and equipment damage, follow the instructions.
Caution - Hot surface. Avoid contact. Surfaces are hot and might cause personal injury if touched.
Caution - Hazardous voltages are present. To reduce the risk of electric shock and danger to personal health, follow the instructions.

ESD Precautions

ESD-sensitive devices, such as the PCIe cards, hard drives, and DIMMs, require special handling.
Caution - Circuit boards and hard drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
Caution - You must disconnect all power supplies before servicing any of the components that are inside the chassis.

Antistatic Wrist Strap

Wear an antistatic wrist strap and use an antistatic mat when handling components such as hard drive assemblies, circuit boards, or PCIe cards. When servicing or removing server components, attach an antistatic strap to your wrist and then to a metal area on the chassis. Following this practice equalizes the electrical potentials between you and the server.

Antistatic Mat

Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.

Related Information

“Tools Needed for Service” on page 45
“Component Fillers” on page 46
“Component Service Categories” on page 46
“Find the Server Serial Number” on page 47
“Locate the Server” on page 49
“Prevent ESD Damage” on page 49
“Removing Power From the Server” on page 50

Tools Needed for Service

You will need the following tools for most service operations:
Antistatic wrist strap
Antistatic mat
No. 1 Phillips screwdriver
No. 2 Phillips screwdriver
No. 1 flat-blade screwdriver (battery removal)

Related Information

“Safety Information” on page 43
“Tools Needed for Service” on page 45
“Component Fillers” on page 46
“Component Service Categories” on page 46
“Find the Server Serial Number” on page 47
“Locate the Server” on page 49
“Prevent ESD Damage” on page 49
“Removing Power From the Server” on page 50

Component Fillers

Depending on configuration, each server is shipped with replacement fillers for hard drives and processor modules. A filler is an empty metal or plastic component that does not contain any functioning system hardware or cable connectors.
The fillers are installed at the factory. To ensure proper airflow through the system, each filler must remain in the server until you replace it with a functional component. If you remove a filler and continue to operate your system with an empty slot, the server might overheat due to improper airflow. For instructions on removing or installing a filler for a server component, refer to the topic in this document about servicing that component.

Related Information

“Safety Information” on page 43
“Tools Needed for Service” on page 45
“Component Service Categories” on page 46
“Find the Server Serial Number” on page 47
“Locate the Server” on page 49
“Prevent ESD Damage” on page 49
“Removing Power From the Server” on page 50

Component Service Categories

Replaceable components fall into these categories:
Hot-serviceable by the customer – Hot-serviceable components can be removed while the server is running. Hot-swappable components do not require any preparation prior to servicing. Hot-pluggable components do require preparation prior to servicing.
Cold-serviceable by the customer or exclusively by authorized service personnel – Cold-serviceable components require that the server is shut down. In addition, some service procedures require that the power cables be disconnected between the power supplies and the power source.
The following table identifies the server components that are replaceable.
Component                 Power Status for Removal  Authorized Service Personnel Only  Remove and Replace Instructions
Processor module          Off                                                          “Servicing Processor Modules” on page 55
DIMM                      Off                                                          “Servicing DIMMs” on page 69
Hard drive                Off or On                                                    “Servicing Hard Drives” on page 87
Main module               Off                                                          “Servicing the Main Module” on page 99
NVMe switch card          Off                                                          “Servicing NVMe Switch Cards” on page 111
Storage backplane         Off                       X                                  “Servicing the Drive Backplane” on page 119
SPM                       Off                       X                                  “Servicing the SPM” on page 125
SCC PROM                  Off                       X                                  “Servicing the SCC PROM” on page 133
System battery            Off                       X                                  “Servicing the Battery” on page 137
Front I/O assembly        Off                                                          “Servicing the Front I/O Assembly” on page 143
Power supply              Off or On                                                    “Servicing Power Supplies” on page 147
Fan module                Off or On                                                    “Servicing Fan Modules” on page 157
PCIe card                 Off or On                                                    “Servicing PCIe Cards” on page 165
Rear I/O module           Off                       X                                  “Servicing the Rear I/O Module” on page 183
Rear chassis subassembly  Off                       X                                  “Servicing the Rear Chassis Subassembly” on page 193
* You must disconnect the power cords before accessing this component.

Related Information

“Safety Information” on page 43
“Tools Needed for Service” on page 45
“Component Fillers” on page 46
“Find the Server Serial Number” on page 47
“Locate the Server” on page 49
“Prevent ESD Damage” on page 49
“Removing Power From the Server” on page 50

Find the Server Serial Number

If you require technical support for your server, you will be asked to provide the server's serial number.
Use one of the following options to find the serial number.
Locate the serial number on the manufacturing sticker on the front of the server or on the sticker on the side of the server.
At the Oracle ILOM prompt, type:
-> show /SYS
 /SYS
    Targets:
        MB
        MB_ENV
        RIO
        PM0
        PM1
        FM0
        ...
    Properties:
        type = Host System
        ipmi_name = /SYS
        keyswitch_state = Normal
        product_name = T5-4
        product_part_number = 602-1234-01
        product_serial_number = 0723BBC006
        fault_state = OK
        clear_fault_action = (none)
        power_state = On
    Commands:
        cd
        reset
        set
        show
        start
        stop
Related Information
“Safety Information” on page 43
“Tools Needed for Service” on page 45
“Component Fillers” on page 46
“Component Service Categories” on page 46
“Locate the Server” on page 49
“Prevent ESD Damage” on page 49
“Removing Power From the Server” on page 50

Locate the Server

You can use the Locator LEDs to identify a particular server.
1. At the Oracle ILOM prompt, type:
-> set /SYS/LOCATE value=Fast_Blink
The white Locator LEDs (one on the front panel and one on the rear panel) blink.
2. After locating the server with the blinking Locator LED, turn it off using one of the following methods.
Press the Locator button.
At the Oracle ILOM prompt, type:
-> set /SYS/LOCATE value=Off
Related Information
“Safety Information” on page 43
“Tools Needed for Service” on page 45
“Component Fillers” on page 46
“Component Service Categories” on page 46
“Find the Server Serial Number” on page 47
“Prevent ESD Damage” on page 49
“Removing Power From the Server” on page 50

Prevent ESD Damage

Many components contained in the processor modules and main module can be damaged by ESD. To protect these components from damage, perform the following steps before opening these modules for service.
1.
Prepare an antistatic surface to set parts on during the removal, installation, or replacement process.
Place ESD-sensitive components, such as the printed circuit boards, on an antistatic mat. The following items can be used as an antistatic mat:
Antistatic bag used to wrap a replacement part
ESD mat
A disposable ESD mat (shipped with some replacement parts or optional server components)
2.
Attach an antistatic wrist strap.
When servicing or removing server components, attach an antistatic strap to your wrist and then to a metal area on the chassis.
Related Information
“Safety Information” on page 43
“Servicing Processor Modules” on page 55
“Servicing DIMMs” on page 69
“Servicing the Main Module” on page 99
“Servicing the Drive Backplane” on page 119
“Servicing the SPM” on page 125
“Servicing the SCC PROM” on page 133
“Servicing the Battery” on page 137
“Servicing the Front I/O Assembly” on page 143
“Servicing PCIe Cards” on page 165
“Servicing the Rear I/O Module” on page 183
“Servicing the Rear Chassis Subassembly” on page 193

Removing Power From the Server

These topics describe different methods for removing power from the chassis.
“Prepare to Power Off the Server” on page 51
“Power Off the Server (Oracle ILOM)” on page 51
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
“Disconnect Power Cords” on page 53
“Prevent ESD Damage” on page 49

Prepare to Power Off the Server

1.
Notify affected users that the server will be shut down.
Refer to the Oracle Solaris system administration documentation for additional information.
2.
Save any open files and quit all running programs.
Refer to your application documentation for specific information for these processes.
3.
Shut down all logical domains.
Refer to the Oracle Solaris system administration documentation for additional information.
4.
Shut down the Oracle Solaris OS.
Refer to the Oracle Solaris system administration documentation for additional information.
5.
Power off the server.
See:
“Power Off the Server (Oracle ILOM)” on page 51
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
Related Information
“Prepare to Power Off the Server” on page 51
“Disconnect Power Cords” on page 53

Power Off the Server (Oracle ILOM)

You can use the SPM to perform a graceful shutdown of the server. This type of shutdown ensures that all of your data is saved and that the server is ready for restart.
1.
Log in as superuser or equivalent.
Depending on the type of problem, you might want to view server status or log files. You also might want to run diagnostics before you shut down the server.
2.
Switch from the system console to the Oracle ILOM -> prompt by typing the #. (Hash-Period) key sequence.
3.
At the Oracle ILOM prompt, type:
-> stop /System
Stopping /System
4.
If you are powering off the server in order to add a second processor module, return to “Server Upgrade Process” on page 56.
Related Information
“Prepare to Power Off the Server” on page 51
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52

Power Off the Server (Power Button – Graceful Shutdown)

This procedure places the server in the power standby mode.
1.
Press and release the recessed Power button.
The Power OK LED blinks rapidly.
2.
If you are powering off the server in order to add a second processor module, return to “Server Upgrade Process” on page 56.
Related Information
“Power Off the Server (Oracle ILOM)” on page 51
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52

Power Off the Server (Power Button – Emergency Shutdown)

Caution - All applications and files are closed abruptly without saving changes. File system
corruption might occur.
Press and hold the Power button for four seconds.
Related Information
“Power Off the Server (Oracle ILOM)” on page 51
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52

Disconnect Power Cords

You must disconnect the power cords before accessing the following components:
Main module
Storage backplanes
SPM
SCC PROM
Battery
Front I/O assembly
Rear I/O module
Rear chassis subassembly
1.
Power off the server.
See:
“Power Off the Server (Oracle ILOM)” on page 51
“Power Off the Server (Power Button – Graceful Shutdown)” on page 52
“Power Off the Server (Power Button – Emergency Shutdown)” on page 52
2.
Disconnect all power cords from the server.
Caution - Because standby power is always present in the system, you must unplug the power
cords before accessing certain components.
Related Information
“Safety Information” on page 43
“Tools Needed for Service” on page 45
“Component Fillers” on page 46
“Component Service Categories” on page 46
“Find the Server Serial Number” on page 47
“Locate the Server” on page 49
“Prevent ESD Damage” on page 49

Attachment of Devices During Service

During service procedures, you might have to connect devices to the server.
For OS support, connect an Ethernet cable to one of the Ethernet connectors (NET 0, NET 1, NET 2, or NET 3).
If you plan to interact with the system console directly, you can connect additional external devices, such as a mouse and keyboard, to the server's USB connectors, and connect a monitor to the rear DB-15 video connector. For more details on connecting to the video port, refer to “Connecting Cables” in SPARC T7-4 Server Installation Guide.
If you plan to connect to the Oracle ILOM software over the network, connect an Ethernet cable to the Ethernet port labeled NET MGT.
Note - The SP uses the NET MGT (out-of-band) port by default. You can configure the SP to share one of the server's four Ethernet ports instead. The SP uses only the configured Ethernet port.
If you plan to access the Oracle ILOM CLI through the management port, connect a serial null modem cable to the RJ-45 serial port labeled SER MGT.
The USB connectors on the front panel support USB 2.0. The USB connectors on the rear panel support USB 3.0.
Related Information
“Front Panel Components (Service)” on page 14
“Rear Panel Components (Service)” on page 16
“Detecting and Managing Faults” on page 25
“Connecting Cables” in SPARC T7-4 Server Installation Guide

Servicing Processor Modules

This topic describes how to service processor modules, and how to upgrade the server from a single processor module configuration to a dual processor module configuration.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Description                                      Links
Replace a processor module.                      “Determine Which Processor Module Is Faulty” on page 60
                                                 “Preparing for Service” on page 43
                                                 “Remove a Processor Module or Processor Filler Module” on page 60
                                                 “Install a Processor Module or Processor Filler Module” on page 64
                                                 “Verify a Processor Module” on page 67
Learn the process for upgrading the server       “Server Upgrade Process” on page 56
from a single processor module configuration
to a two processor module configuration.
Remove the processor module as part of another   “Remove a Processor Module or Processor Filler Module” on page 60
component's service operation.
Install the processor module as part of another  “Install a Processor Module or Processor Filler Module” on page 64
component's service operation.

Related Information

“Identifying Components” on page 13
“Processor Module Components” on page 19
“Detecting and Managing Faults” on page 25
“Preparing for Service” on page 43
“Component Service Categories” on page 46
“Servicing DIMMs” on page 69
“Server Upgrade Process” on page 56
“Remove a Processor Module or Processor Filler Module” on page 60
“Install a Processor Module or Processor Filler Module” on page 64
“Returning the Server to Operation” on page 201

Server Upgrade Process

The SPARC T7-4 server supports two processor module configurations:
Fully-populated – Two processor modules
Half-populated – One processor module and one processor filler module
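The two configurations can be expressed as a small check on what occupies each processor module slot. The Python sketch below is illustrative only: the function name and the "module" and "filler" labels are hypothetical, and it relies on the rule stated elsewhere in this manual that a processor filler module can be installed only in slot 1.

```python
def classify_configuration(slot0: str, slot1: str) -> str:
    """Classify the server from the contents of processor module slots 0 and 1.

    Each argument is "module" or "filler" (labels chosen for this sketch).
    """
    if slot0 != "module":
        # A filler is allowed only in slot 1, so slot 0 always holds a module.
        raise ValueError("slot 0 must hold a processor module")
    if slot1 == "module":
        return "fully-populated"
    if slot1 == "filler":
        return "half-populated"
    raise ValueError(f"unexpected slot 1 contents: {slot1!r}")


print(classify_configuration("module", "module"))  # fully-populated
print(classify_configuration("module", "filler"))  # half-populated
```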
Processor modules are cold-service components that can be replaced only by qualified service personnel. For the location of the processor modules, see “Front Panel Components
(Service)” on page 14.
Caution - These service procedures require that you handle components that are sensitive to
electrostatic discharge. This discharge can cause failure of server components.
This table contains the steps for upgrading the server to a fully-populated configuration.
Step  Description                                               Link
1.    Remove the upgrade components from their packaging, and
      place them on an antistatic mat.
2.    Remove the covers from the new processor module.          “Remove a Processor Module or Processor Filler Module” on page 60
3.    Remove all of the DIMM fillers in the processor module.   “Remove a DIMM” on page 78
      The steps to remove the DIMM fillers are the same as the
      steps for removing DIMMs.
4.    Verify that you have the correct DIMMs for your server.   “Understanding DIMM Configurations” on page 69
      All of the DIMMs must be either 16 or 32 GB, and they
      must match the size and capacity of the DIMMs already
      installed in the server.
5.    Install the DIMMs.                                        “Install a DIMM” on page 80
6.    Check the server for faults. If any fault is present,     “Check for Faults” on page 33
      you must correct the fault and clear it from the server
      before you can continue with the upgrade.
7.    Shut down the server.                                     “Removing Power From the Server” on page 50
8.    Remove the processor filler module from Slot 1.           “Remove a Processor Module or Processor Filler Module” on page 60
9.    Install the new processor module in Slot 1.               “Install a Processor Module or Processor Filler Module” on page 64
10.   Return the server to operation.                           “Returning the Server to Operation” on page 201
11.   Verify the installation. If any fault is present, you     “Verify a Processor Module” on page 67
      must correct the fault and clear it from the server.
12.   Review the root complex changes.                          “Understanding PCIe Root Complex Connections” on page 165
13.   Review the PCIe card load balancing changes. Even though  “PCIe Card Configuration” on page 168
      the load balancing guidelines change with the upgrade,
      you do not need to move any existing PCIe cards.

Related Information

“Processor Module Components” on page 19
“System Schematic” on page 41
“Detecting and Managing Faults” on page 25
“Removing Power From the Server” on page 50
“Servicing DIMMs” on page 69
“Processor Module Configuration” on page 57
“Remove a Processor Module or Processor Filler Module” on page 60
“Install a Processor Module or Processor Filler Module” on page 64
“Verify a Processor Module” on page 67
“Understanding PCIe Root Complex Connections” on page 165
“PCIe Card Configuration” on page 168
“Returning the Server to Operation” on page 201

Processor Module Configuration

Processor modules are accessed from the front of the server. In Oracle ILOM, the processor modules are numbered PM0 and PM1, starting with the lower slot.
No. Description
1 Processor Module 1 (PM1) or processor filler module
2 Processor Module 0 (PM0)
Note - In servers with two processor modules installed, the DIMM configurations in both processor modules must be identical. See “Understanding DIMM Configurations” on page 69.

Processor Module LEDs

No.  LED                       Description
1    (No function.)            Not supported.
2    Service Required (amber)  Indicates that the processor module has experienced a fault condition.
3    OK (green)                Indicates if the processor module is available for use.
                               ■ On – The server is running and the processor module is functioning correctly.
                               ■ Off – The server is powered down and the processor module is in standby mode.

Related Information

“Processor Module Components” on page 19
“Server Upgrade Process” on page 56
“Determine Which Processor Module Is Faulty” on page 60
“Remove a Processor Module or Processor Filler Module” on page 60
“Install a Processor Module or Processor Filler Module” on page 64
“Verify a Processor Module” on page 67

Determine Which Processor Module Is Faulty

The following LEDs are lit when a processor module fault is detected:
Front and rear System Fault (Service Required) LEDs
Service Required LED on the faulty processor module
1.
Determine if the Service Required LEDs are illuminated on the front panel or the rear I/O module.
See “Interpreting LEDs” on page 27.
2.
From the front of the server, check the processor module LEDs to identify which processor module needs to be replaced.
See “Processor Module LEDs” on page 59. The amber Service Required LED is lit on the processor module that needs to be replaced.
3.
Remove the faulty processor module.
See “Remove a Processor Module or Processor Filler Module” on page 60.
Related Information
“Processor Module Components” on page 19
“Processor Module LEDs” on page 59
“Remove a Processor Module or Processor Filler Module” on page 60
“Install a Processor Module or Processor Filler Module” on page 64
“Verify a Processor Module” on page 67

Remove a Processor Module or Processor Filler Module

Processor modules and processor filler modules are cold-service components that can be replaced only after you power off the system. Processor modules can be replaced only by qualified service personnel. For the location of the modules, see “Processor Module
Configuration” on page 57.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Prepare the server for service.
See “Preparing for Service” on page 43.
2.
Ensure that the server is powered off.
See “Removing Power From the Server” on page 50.
3.
Disconnect the power cords.
See “Disconnect Power Cords” on page 53.
4.
Locate the processor module in the server that you want to remove.
If you are replacing a faulty processor module, see “Determine Which
Processor Module Is Faulty” on page 60 to locate a faulty processor
module.
If you are adding a processor module, remove the processor filler module in slot 1.
5.
Press the two extraction levers in toward the server and pull the extraction levers out to disengage the processor module or processor filler module from the server.
6.
Pull the processor module or processor filler module halfway out of the server, and close the levers.
This action protects the levers from damage while the module is outside the server.
7.
Using two hands, completely remove the processor module or processor filler module and place the module on an antistatic mat.
Caution - Do not touch the connectors at the rear of the module.
8.
Determine your next step.
If you are replacing or installing DIMMs within the processor module, see
“Servicing DIMMs” on page 69.
If you are replacing a faulty processor module, populate and install the replacement processor module:
a.
Remove all of the DIMMs from the faulty processor module, and set them in a safe place.
See “Remove a DIMM” on page 78.
b.
Install the DIMMs into the new processor module.
See “Install a DIMM” on page 80.
c.
Install the processor module.
See “Install a Processor Module or Processor Filler Module” on page 64.
If you have removed a processor filler module as part of a server upgrade,
return to “Server Upgrade Process” on page 56.
If you have removed a processor module or processor filler module to prepare the server for installation, return to “Preparing for Installation” in
SPARC T7-4 Server Installation Guide.
Related Information
“Processor Module Components” on page 19
“Processor Module LEDs” on page 59
“Server Upgrade Process” on page 56
“Determine Which Processor Module Is Faulty” on page 60
“Servicing DIMMs” on page 69
“Install a Processor Module or Processor Filler Module” on page 64
“Verify a Processor Module” on page 67

Install a Processor Module or Processor Filler Module

Processor modules are cold-service components that can be replaced only by qualified service personnel. For the location of the processor modules, see “Front Panel Components
(Service)” on page 14.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Ensure the power cords are disconnected.
See “Disconnect Power Cords” on page 53.
2.
Determine your next step.
If you are installing a processor module after replacing or installing DIMMs, go to Step 3.
If you are installing a new processor module to replace a faulty one, install all of the DIMMs that you removed from the faulty processor module into the replacement module. See “Install a DIMM” on page 80.
3.
Open the latches on the processor module or processor filler module, and insert the module into the empty processor module slot in the server.
Note - A processor filler module can only be installed in slot 1.
4.
Bring the levers together toward the center of the module and press the levers firmly against the module to fully seat the module back into the server.
The levers should click into place when the module is fully seated in the server.
5.
Power on the server.
See “Returning the Server to Operation” on page 201.
6.
Verify the processor module functionality.
See “Verify a Processor Module” on page 67.
7.
If you are adding a second processor module to the server, return to “Server
Upgrade Process” on page 56.
Related Information
“Processor Module Components” on page 19
“Server Upgrade Process” on page 56
“Processor Module LEDs” on page 59
“Determine Which Processor Module Is Faulty” on page 60
“Remove a Processor Module or Processor Filler Module” on page 60
“Servicing DIMMs” on page 69
“Verify a Processor Module” on page 67

Verify a Processor Module

1.
Use the Oracle ILOM fault management shell to determine if the new processor module is shown as enabled or disabled.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm faulty
a.
If the output from the fmadm faulty command shows the replacement processor module as enabled, go to Step 2.
b.
If the output from the fmadm faulty command shows the replacement processor as disabled, go to “Detecting and Managing Faults” on page 25 to clear the PSH-detected fault from the server.
2.
Verify that the OK LED is lit on the processor module and that the Fault LED is not lit.
See “Processor Module LEDs” on page 59.
3.
Verify that the front and rear Service Required LEDs are not lit.
See “Front Panel Controls and LEDs” on page 29 and “Rear Panel Controls and
LEDs” on page 31.
4.
Perform one of the following tasks based on your verification results:
If the previous steps did not clear the fault, see “Diagnostics
Process” on page 26.
If Step 2 and Step 3 indicate that no faults have been detected, then the processor module has been replaced successfully. No further action is required.
If you are verifying the server after adding a second processor module, return to “Server Upgrade Process” on page 56.
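The decision in Step 4 can be summarized as a small function of the LED observations from Step 2 and Step 3. This Python sketch is only an aid for checklists or scripts: the function name, the parameter names, and the returned strings are hypothetical.

```python
def verification_outcome(ok_led_lit: bool,
                         fault_led_lit: bool,
                         service_required_lit: bool,
                         adding_second_module: bool = False) -> str:
    """Map the LED checks from Step 2 and Step 3 to the next action in Step 4."""
    fault_indicated = (not ok_led_lit) or fault_led_lit or service_required_lit
    if fault_indicated:
        return "see Diagnostics Process"
    if adding_second_module:
        return "return to Server Upgrade Process"
    return "no further action required"


print(verification_outcome(True, False, False))        # no further action required
print(verification_outcome(True, False, False, True))  # return to Server Upgrade Process
print(verification_outcome(True, True, False))         # see Diagnostics Process
```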
Related Information
“Processor Module Components” on page 19
“Processor Module LEDs” on page 59
“Determine Which Processor Module Is Faulty” on page 60
“Remove a Processor Module or Processor Filler Module” on page 60
“Install a Processor Module or Processor Filler Module” on page 64

Servicing DIMMs

Up to 32 DIMMs can be installed in each processor module, for a total of 64 DIMMs in the server.
DIMMs are cold-service components that can be replaced by customers. For the location of the DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
Description                       Links
Understand how to replace DIMMs   “Understanding DIMM Configurations” on page 69
                                  “Identifying DIMMs” on page 71
Locate a faulty DIMM              “DIMM Fault Handling” on page 74
                                  “Determine Which DIMM Is Faulty (PSH)” on page 75
                                  “Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
                                  “DIMM Configuration Errors” on page 72
Replace a DIMM                    “Remove a DIMM” on page 78
                                  “Install a DIMM” on page 80
                                  “Verify a DIMM” on page 83

Understanding DIMM Configurations

These topics describe DIMM configurations:
“Supported Memory Configurations” on page 70
“Identifying DIMMs” on page 71
“DIMM Configuration Errors” on page 72

Supported Memory Configurations

The server supports 16-Gbyte, 32-Gbyte, and 64-Gbyte DIMMs, with up to 4096 Gbytes in a server fully configured with two processor modules.
Each processor module can be either half populated (16 DIMMs) or fully populated (32 DIMMs).
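The 4096-Gbyte maximum follows directly from the DIMM counts and the largest supported DIMM size. The short Python calculation below restates that arithmetic. The constant names are chosen for this example only.

```python
PROCESSOR_MODULES = 2        # fully configured server
DIMMS_PER_MODULE_FULL = 32   # fully populated processor module
DIMMS_PER_MODULE_HALF = 16   # half populated processor module
LARGEST_DIMM_GBYTES = 64     # largest supported DIMM

max_dimms = PROCESSOR_MODULES * DIMMS_PER_MODULE_FULL
max_memory_gbytes = max_dimms * LARGEST_DIMM_GBYTES
print(max_dimms, max_memory_gbytes)  # 64 4096
```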
Consider these population rules when installing, upgrading, or replacing DIMMs in a processor module:
In half-populated configurations, 16 DIMMs must be installed in all CH0 slots.
These slots have black ejector levers.
In fully-populated configurations (32 DIMMs), DIMMs must be installed in all slots (CH0 and CH1).
Note - The DIMM sparing feature is available only in fully-populated servers.
All DIMMs associated with each CMx must be identical (same size, same rank classification).
Mixed configurations are supported (DIMMs associated with CM0 with one size, and DIMMs associated with CM1 with a different size) as long as all DIMMs in the server have the same rank classification. For example, 32 Gbyte 4Rx4 DIMMs associated with PM0/CM0, and 64 Gbyte 4Rx4 DIMMs associated with PM0/CM1.
To identify DIMM architecture, see “Identifying DIMMs” on page 71.
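The population rules above can be checked mechanically before DIMMs are installed. The following Python sketch checks one processor module at a time. It assumes that slots are named as in “DIMM FRU Names” (for example CM0/BOB00/CH0) and that each DIMM is described by a (capacity in Gbytes, rank label) pair. The function name check_module and the data layout are hypothetical, and the server-wide rank rule would need the DIMMs of both processor modules combined.

```python
from itertools import product

BOBS = ("00", "01", "10", "11", "20", "21", "30", "31")
CH0_SLOTS = {f"CM{cm}/BOB{bob}/CH0" for cm, bob in product((0, 1), BOBS)}
CH1_SLOTS = {f"CM{cm}/BOB{bob}/CH1" for cm, bob in product((0, 1), BOBS)}


def check_module(dimms: dict) -> list:
    """Return a list of rule violations for one processor module.

    `dimms` maps a slot name such as "CM0/BOB00/CH0" to a
    (capacity_gbytes, rank_label) tuple, for example (32, "4Rx4").
    """
    problems = []
    # Rule: either all 16 CH0 slots, or all 32 slots (CH0 and CH1).
    if set(dimms) not in (CH0_SLOTS, CH0_SLOTS | CH1_SLOTS):
        problems.append("populate all CH0 slots (16 DIMMs) or all slots (32 DIMMs)")
    # Rule: all DIMMs associated with each CMx must be identical.
    for cm in ("CM0", "CM1"):
        kinds = {kind for slot, kind in dimms.items() if slot.startswith(cm + "/")}
        if len(kinds) > 1:
            problems.append(f"{cm}: DIMMs differ in size or rank classification")
    # Rule: one rank classification throughout (checked here per module).
    if len({rank for _, rank in dimms.values()}) > 1:
        problems.append("rank classification is not uniform")
    return problems


# Half-populated, mixed sizes per CM, same rank classification: allowed.
half_mixed = {
    slot: (32, "4Rx4") if slot.startswith("CM0/") else (64, "4Rx4")
    for slot in CH0_SLOTS
}
print(check_module(half_mixed))  # []
```

An empty list means that no rule in this sketch was violated. It does not replace the checks that system firmware performs when the server boots.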
Related Information
“DIMM FRU Names” on page 73
“Identifying DIMMs” on page 71
“Remove a DIMM” on page 78
“Install a DIMM” on page 80
“Verify a DIMM” on page 83
“Server Upgrade Process” on page 56
“Processor Module Configuration” on page 57

Identifying DIMMs

Each DIMM is affixed with an identifying label. The first four characters on the label describe the DIMM memory capacity; the second four characters describe the rank classification. Use these labels to identify the DIMMs installed in the server, to verify that any replacement DIMMs are compatible, or to confirm that upgrade DIMMs may be installed in a supported configuration.
The following DIMMs are supported:
DIMM Capacity  DRAM Density  Rank Classification  Label
16 Gbyte       4 Gbit        Dual-rank x4         2Rx4
32 Gbyte       4 Gbit        Quad-rank x4         4Rx4
32 Gbyte       8 Gbit        Dual-rank x4         2Rx4
64 Gbyte       8 Gbit        Quad-rank x4         4Rx4
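The supported DIMM table can be held as a lookup keyed on capacity and rank label, which makes it easy to confirm that a replacement DIMM matches the one it replaces. This Python sketch is illustrative: the names SUPPORTED_DIMMS, is_supported, and can_replace are hypothetical, and the matching rule used is the one from “Supported Memory Configurations” (same size and same rank classification).

```python
# (capacity in Gbytes, rank label) -> DRAM density in Gbit, from the table above.
SUPPORTED_DIMMS = {
    (16, "2Rx4"): 4,
    (32, "4Rx4"): 4,
    (32, "2Rx4"): 8,
    (64, "4Rx4"): 8,
}


def is_supported(capacity_gbytes: int, rank_label: str) -> bool:
    """True when the capacity and rank label pair appears in the table."""
    return (capacity_gbytes, rank_label) in SUPPORTED_DIMMS


def can_replace(old_dimm: tuple, new_dimm: tuple) -> bool:
    """True when the new DIMM is supported and matches the old one exactly."""
    return is_supported(*new_dimm) and old_dimm == new_dimm


print(is_supported(64, "4Rx4"))                # True
print(can_replace((32, "4Rx4"), (32, "2Rx4")))  # False
```

Note that 32-Gbyte DIMMs exist in both 2Rx4 and 4Rx4 variants, so capacity alone is not enough to establish a match.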
Related Information
“Understanding DIMM Configurations” on page 69
“DIMM FRU Names” on page 73
“DIMM Configuration Errors” on page 72

DIMM Configuration Errors

When the server boots, system firmware checks the memory configuration against the rules described in “Understanding DIMM Configurations” on page 69. If any violations of these rules are detected, the following general error message is displayed.
Please refer to the service documentation for supported memory configurations.
In some cases, the server boots in a degraded state, and a message such as the following is displayed:
WARNING: Running with a nonstandard DIMM configuration. Refer to service document for details.
In other cases, the configuration error is fatal, and the following message is displayed:
Fatal configuration error - forcing power-down
In addition to these general memory configuration errors, one or more rule-specific messages are displayed, indicating the type of configuration error detected. To identify the DIMMs affected, use the fmadm faulty command as described in “Check for Faults” on page 33.
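When console logs are reviewed in bulk, the general messages above can be matched by their fixed text. The Python sketch below assumes that the console output has been saved as lines of text. The severity labels ("general", "degraded", "fatal") and the function name are chosen for this example and are not Oracle terminology.

```python
# Fixed message text from this section, mapped to labels chosen for this sketch.
MESSAGE_LABELS = {
    "Please refer to the service documentation for supported memory configurations.": "general",
    "WARNING: Running with a nonstandard DIMM configuration.": "degraded",
    "Fatal configuration error - forcing power-down": "fatal",
}


def classify_console_line(line: str):
    """Return the label for a known memory configuration message, else None."""
    for text, label in MESSAGE_LABELS.items():
        if text in line:
            return label
    return None


log = [
    "WARNING: Running with a nonstandard DIMM configuration. Refer to service document for details.",
    "Fatal configuration error - forcing power-down",
    "some unrelated console output",
]
labels = [classify_console_line(line) for line in log]
print(labels)  # ['degraded', 'fatal', None]
```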
Related Information
“Check for Faults” on page 33
“Clear a Fault Manually” on page 40
“Understanding DIMM Configurations” on page 69
“DIMM FRU Names” on page 73
“Identifying DIMMs” on page 71
“DIMM Fault Handling” on page 74

DIMM FRU Names

The following table illustrates the DIMM addresses on a processor module, with the front of the processor module oriented toward the left:
CM1/BOB21/CH1
CM1/BOB21/CH0
CM1/BOB20/CH0
CM1/BOB20/CH1
CM1/BOB30/CH1
CM1/BOB30/CH0
CM1/BOB31/CH0
CM1/BOB31/CH1
CM0/BOB21/CH1
CM0/BOB21/CH0
CM0/BOB20/CH0
CM0/BOB20/CH1
CM0/BOB30/CH1
CM0/BOB30/CH0
CM0/BOB31/CH0
CM0/BOB31/CH1
CM1/BOB01/CH1
CM1/BOB01/CH0
CM1/BOB00/CH0
CM1/BOB00/CH1
CM1/BOB10/CH1
CM1/BOB10/CH0
CM1/BOB11/CH0
CM1/BOB11/CH1
CM0/BOB01/CH1
CM0/BOB01/CH0
CM0/BOB00/CH0
CM0/BOB00/CH1
CM0/BOB10/CH1
CM0/BOB10/CH0
CM0/BOB11/CH0
CM0/BOB11/CH1
CM1
CM0
DIMM NAC names are based both on the location of the DIMM slot on the processor module and on the slot in which the processor module is installed. For example, the full NAC name for the DIMM installed in the front-left corner on a processor module installed at PM0 is:
/SYS/PM0/CM1/CMP/BOB21/CH1/DIMM
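The NAC name pattern shown in the example can be assembled from its four variable parts. The Python sketch below follows that pattern. The function name dimm_nac_name is hypothetical, and the accepted values are taken from the slot names listed in this section.

```python
BOB_IDS = ("00", "01", "10", "11", "20", "21", "30", "31")


def dimm_nac_name(pm: int, cm: int, bob: str, ch: int) -> str:
    """Build the full NAC name of a DIMM slot, following the example above."""
    if pm not in (0, 1) or cm not in (0, 1) or ch not in (0, 1):
        raise ValueError("pm, cm, and ch must each be 0 or 1")
    if bob not in BOB_IDS:
        raise ValueError(f"unknown BOB identifier: {bob!r}")
    return f"/SYS/PM{pm}/CM{cm}/CMP/BOB{bob}/CH{ch}/DIMM"


name = dimm_nac_name(pm=0, cm=1, bob="21", ch=1)
print(name)  # /SYS/PM0/CM1/CMP/BOB21/CH1/DIMM
```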

Related Information

“Servicing Processor Modules” on page 55
“Understanding DIMM Configurations” on page 69
“Identifying DIMMs” on page 71
“DIMM Fault Handling” on page 74
“DIMM Configuration Errors” on page 72

DIMM Fault Handling

A variety of features play a role in how the memory subsystem is configured and how memory faults are handled. Understanding the underlying features helps you identify and repair memory problems.
The following server features manage memory faults:
POST – By default, POST runs when the server is powered on.
For correctable errors (CEs), POST forwards the error to the PSH daemon for error handling. If an uncorrectable memory fault is detected, POST displays the fault with the device name of the faulty DIMMs, and logs the fault. POST then disables the faulty DIMMs. Depending on the memory configuration and the location of the faulty DIMM, POST disables half of the physical memory in the server, or half of the physical memory and half of the processor threads. When this offlining process occurs in normal operation, you must replace the faulty DIMMs based on the fault message and enable the disabled DIMMs with the Oracle ILOM command set device component_state=enabled, where device is the name of the DIMM being enabled.
PSH technology – The Oracle PSH feature uses the Fault Manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, the fault is assigned a UUID and logged. PSH reports the fault and suggests a replacement for the DIMMs associated with the fault.
If you suspect the server has a memory problem, run the Oracle ILOM show faulty command. This command lists memory faults and identifies the DIMM modules associated with the fault.

Related Information

“POST Overview” on page 37
“Understanding DIMM Configurations” on page 69
“DIMM FRU Names” on page 73
“DIMM Configuration Errors” on page 72

Identifying Faulty DIMMs

You can identify faulty DIMMs using the following methods:
“Determine Which DIMM Is Faulty (Oracle ILOM)” on page 75
“Determine Which DIMM Is Faulty (PSH)” on page 75
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76

Determine Which DIMM Is Faulty (Oracle ILOM)

If you suspect that the server has a memory problem, run the Oracle ILOM show faulty command.
This command lists memory faults and identifies the DIMM modules associated with the fault.
Related Information
“Determine Which DIMM Is Faulty (PSH)” on page 75
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76

Determine Which DIMM Is Faulty (PSH)

The Oracle Fault Management tool fmadm faulty displays current server faults, including DIMM failures.
1.
Start the Fault Management Shell:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
2.
Type:
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2014-08-18/21:04:40 7040d859-5b03-4a58-8dfd-e3a80875d62f SPSUN4V-8000-EJ Critical

Problem Status    : solved
Diag Engine       : fdd 1.0
System
   Manufacturer   : Oracle Corporation
   Name           : SPARC T7-4
   Part_Number    : 7021179
   Serial_Number  : 1201CTHC01

System Component
   Manufacturer   : Oracle Corporation
   Name           : SPARC T7-4
   Part_Number    : 7021179
   Serial_Number  : 1201CTHC01

----------------------------------------
Suspect 1 of 1
   Fault class  : fault.memory.dimm-ue
   Certainty    : 100%
   Affects      : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
   Status       : faulted but still in service

   FRU
      Status            : faulty
      Location          : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
      Manufacturer      : Samsung
      Name              : 16384MB DDR4 SDRAM DIMM
      Part_Number       : 07042208,M393B1K70DH0-YK0
      Revision          : 04
      Serial_Number     : 00CE0212153367DD4B
      Chassis
         Manufacturer   : Oracle Corporation
         Name           : SPARC T7-4
         Part_Number    : 7021179
         Serial_Number  : 1201CTHC01

Description : Uncorrectable errors have occurred while accessing memory.

Response    : An attempt will be made to remove the affected memory from service. Host HW may restart.

Impact      : Total system memory capacity has been reduced and some applications may have been terminated.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/SPSUN4V-8000-EJ for the latest service procedures and policies regarding this diagnosis.
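The fields that identify the faulty DIMM (Fault class, Affects, and Location) can be pulled from saved fmadm faulty output with a short script. The Python sketch below assumes the output was captured with one "name : value" field per line, as the shell prints it, and it reports only the first occurrence of each field, which is sufficient when there is a single suspect. The function name suspect_fields and the sample text are illustrative.

```python
import re

# Matches the three fields of interest, allowing variable padding around ':'.
FIELD = re.compile(r"^\s*(Fault class|Affects|Location)\s*:\s*(.+?)\s*$")


def suspect_fields(fmadm_output: str) -> dict:
    """Return the first Fault class, Affects, and Location values found."""
    fields = {}
    for line in fmadm_output.splitlines():
        match = FIELD.match(line)
        if match:
            fields.setdefault(match.group(1), match.group(2))
    return fields


# Sample text modeled on the Suspect portion of the output above.
sample = """\
Suspect 1 of 1
   Fault class  : fault.memory.dimm-ue
   Certainty    : 100%
   Affects      : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
   Status       : faulted but still in service
   FRU
      Status    : faulty
      Location  : /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
"""

fields = suspect_fields(sample)
print(fields["Location"])  # /SYS/PM0/CM1/CMP/BOB10/CH0/DIMM
```

The Location value is the DIMM NAC name, which maps to a physical slot as described in “DIMM FRU Names”.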
Related Information
“Determine Which DIMM Is Faulty (Oracle ILOM)” on page 75
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76

Determine Which DIMM Is Faulty (DIMM Fault LEDs)

DIMMs are cold-service components that can be replaced by customers. For the location of the DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Consider your first steps.
Familiarize yourself with DIMM configuration rules.
See “Understanding DIMM Configurations” on page 69.
Prepare the system for service.
See “Preparing for Service” on page 43.
Remove the processor module containing the faulty DIMM. Place the processor module on an ESD-protected work surface. Remove the processor module cover.
See “Remove a Processor Module or Processor Filler Module” on page 60.
2.
Locate the DIMM Fault Remind button on the processor module.
3.
Verify that the Memory Riser Power LED next to the button is illuminated.
An illuminated Memory Riser Power LED indicates that there is power available to illuminate any Memory DIMM Fault LEDs once you have pressed the DIMM Fault Remind button.
4.
Press the DIMM Fault Remind button on the processor module.
This will cause the Memory DIMM Fault LEDs associated with any faulty DIMMs to illuminate for a few minutes.
5.
Note the address of the DIMM next to any illuminated Memory DIMM Fault LED.
6.
Ensure that all other DIMMs are seated correctly in their slots.
Related Information
“Determine Which DIMM Is Faulty (Oracle ILOM)” on page 75
“Determine Which DIMM Is Faulty (PSH)” on page 75
Remove a DIMM
DIMMs are cold-service components that can be replaced by customers. For the location of the DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Consider your first steps.
Familiarize yourself with DIMM population rules.
See “Understanding DIMM Configurations” on page 69.
Prepare the system for service.
See “Preparing for Service” on page 43.
Remove the processor module. Place the processor module on an ESD-protected work surface.
See “Remove a Processor Module or Processor Filler Module” on page 60.
2.
Remove the cover from the processor module.
Press the green button near the front edge of the cover and slide the cover back and up off the main module.
3.
Locate the DIMMs that need to be replaced.
See “Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76.
4.
Push down on the ejector tabs on each side of the DIMM until the DIMM is released.
Caution - DIMMs and heat sinks on the motherboard might be hot.
5.
Grasp the top corners of the faulty DIMM and lift it out of its slot.
6.
Place the DIMM on an antistatic mat.
7.
Repeat Step 4 through Step 6 for any other DIMMs you intend to remove.
8.
Determine your next step:
If you are installing replacement DIMMs at this time, go to “Install a
DIMM” on page 80.
If you are not installing replacement DIMMs at this time, go to Step 9.
9.
Return the server to operation.
See:
Install the processor module.
See “Install a Processor Module or Processor Filler Module” on page 64.
Power on the server.
See “Power On the Server (Oracle ILOM)” on page 202.
Verify DIMM functionality.
See “Verify a DIMM” on page 83.
Related Information
“Understanding DIMM Configurations” on page 69
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
“Determine Which DIMM Is Faulty (PSH)” on page 75
“Install a DIMM” on page 80
“Verify a DIMM” on page 83
Install a DIMM
DIMMs are cold-service components that can be replaced by customers. For the location of the DIMMs, see “Processor Module Components” on page 19.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Consider your first steps.
Familiarize yourself with DIMM population rules.
See “Understanding DIMM Configurations” on page 69.
Prepare the system for service.
See “Preparing for Service” on page 43.
Remove the processor module. Place the processor module on an ESD-protected work surface.
See “Remove a Processor Module or Processor Filler Module” on page 60.
2.
Consider your next steps.
If you are replacing a faulty DIMM, ensure that you have removed the faulty DIMM.
See “Identifying Faulty DIMMs” on page 74.
See “Remove a DIMM” on page 78.
If you are adding DIMMs to a half-populated processor module:
Ensure you have the correct DIMMs for your server. See “Identifying
DIMMs” on page 71.
If you are populating a new processor module:
Ensure you have the correct DIMMs for your server. See “Understanding DIMM
Configurations” on page 69.
3.
Unpack the replacement DIMMs and place them on an antistatic mat.
4.
Ensure that the ejector tabs on the connector that will receive the DIMM are in the open position.
5.
Align the DIMM notch with the key in the connector.
Caution - Ensure that the orientation is correct. The DIMM might be damaged if the orientation
is reversed.
6.
Push the DIMM into the connector until the ejector tabs lock the DIMM in place.
If the DIMM does not easily seat into the connector, check the DIMM's orientation.
7.
Repeat Step 4 through Step 6 until all new DIMMs are installed.
8.
Place the cover onto the processor module and slide the cover forward until the latch clicks into place.
9.
Consider your next steps.
If you are adding a second processor module to the server, return to “Server
Upgrade Process” on page 56.
If you are replacing a processor module after installing replacement DIMMs, proceed to Step 10.
10.
Finish the installation procedure.
See:
Install the processor module.
See “Install a Processor Module or Processor Filler Module” on page 64.
Return the server to operation.
See “Returning the Server to Operation” on page 201.
Verify DIMM functionality.
See “Verify a DIMM” on page 83.
Related Information
“Understanding DIMM Configurations” on page 69
“Identifying DIMMs” on page 71
“Remove a DIMM” on page 78
“Verify a DIMM” on page 83
“DIMM Fault Handling” on page 74
“DIMM Configuration Errors” on page 72

Verify a DIMM

1.
Access the Oracle ILOM prompt.
Refer to the SPARC T7 Series Server Administration Guide for instructions.
2.
Use the show faulty command to determine how to clear the fault.
If show faulty indicates a POST-detected fault, go to Step 3.
If show faulty output displays a UUID, which indicates a host-detected fault, skip Step 3 and go directly to Step 4.
3.
Use the set command to enable the DIMM that was disabled by POST.
In most cases, replacement of a faulty DIMM is detected when the service processor is power cycled. In those cases, the fault is automatically cleared from the server. If show faulty still displays the fault, the set command will clear it.
-> set /SYS/PM0/CM0/CMP/BOB10/CH0/DIMM requested_config_state=Enabled
Set 'requested_config_state' to 'enabled'
4.
For a host-detected fault, perform the following steps to verify the new DIMM:
a.
Set the virtual keyswitch to diag so that POST will run in Service mode.
-> set /HOST keyswitch_state=diag
Set 'keyswitch_state' to 'diag'
b.
Power cycle the server.
-> stop /System
Are you sure you want to stop /System (y/n)? y
Stopping /System
-> start /System
Are you sure you want to start /System (y/n)? y
Starting /System
c.
Use the show /HOST command to determine when the host has been powered off.
The console will display status=Powered Off. Allow approximately one minute before running this command.
d.
Switch to the system console to view POST output.
Watch the POST output for possible fault messages. The following output indicates that POST did not detect any faults:
-> start /HOST/console
...
0:0:0>INFO:
0:0:0> POST Passed all devices.
0:0:0>POST: Return to VBSC.
0:0:0>Master set ACK for vbsc runpost command and spin...
Note - The server might boot automatically at this point. If so, go directly to Step 4g. If it remains at the ok prompt, go to Step 4e.
e.
If the server remains at the ok prompt, type boot.
f.
Return the virtual keyswitch to normal mode.
-> set /SYS keyswitch_state=normal
Set 'keyswitch_state' to 'normal'
g.
Switch to the system console and check for faults.
# fmadm faulty
If any faults are reported, refer to the diagnostics instructions described in “Check for
Faults” on page 33.
5.
Switch to the Oracle ILOM command shell.
6.
Run the show faulty command.
-> show faulty
Target              | Property               | Value
--------------------+------------------------+-------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/PM0/CM0/CMP/BOB10/D0
/SP/faultmgmt/0     | timestamp              | Dec 14 22:43:59
/SP/faultmgmt/0/    | sunw-msg-id            | SUN4V-8000-DX
faults/0            |                        |
/SP/faultmgmt/0/    | uuid                   | 3aa7c854-9667-e176-efe5-e487e520
faults/0            |                        | 7a8a
/SP/faultmgmt/0/    | timestamp              | Dec 14 22:43:59
faults/0            |                        |
If the show faulty command reports a fault with a UUID, go on to Step 7. If show faulty does not report a fault with a UUID, you are done with the verification process.
7.
Switch to the system console and use the fmadm repair command with the UUID.
Use the same UUID that was displayed from the output of the Oracle ILOM show faulty command. For example:
# fmadm repair 3aa7c854-9667-e176-efe5-e487e5207a8a
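In the show faulty output, a long UUID value wraps onto a continuation row, so it is easy to copy only part of it. The following minimal sketch shows one way to rejoin the wrapped value from captured output before passing it to fmadm repair. The function name read_fault_uuid is hypothetical, and the script assumes the three-column layout separated by | characters, with continuation rows that leave the Property column empty.

```python
def read_fault_uuid(table):
    """Return the first uuid value in 'show faulty' output, rejoining wrapped cells."""
    rows = []  # each entry: [target, property, value]
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3:
            continue  # skip the prompt line and the separator line
        target, prop, value = cells
        if target.startswith("/"):
            rows.append([target, prop, value])
        elif rows and not prop:
            # Continuation row: append the wrapped target and value text.
            rows[-1][0] += target
            rows[-1][2] += value
    for target, prop, value in rows:
        if prop == "uuid":
            return value
    return None

SAMPLE = """\
-> show faulty
Target              | Property               | Value
--------------------+------------------------+-------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/PM0/CM0/CMP/BOB10/D0
/SP/faultmgmt/0/    | uuid                   | 3aa7c854-9667-e176-efe5-e487e520
faults/0            |                        | 7a8a
"""

print("fmadm repair", read_fault_uuid(SAMPLE))
```

If the function returns None, no fault with a UUID was reported and the verification is complete.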
Related Information
“Understanding DIMM Configurations” on page 69
“DIMM Fault Handling” on page 74
“DIMM Configuration Errors” on page 72
“Determine Which DIMM Is Faulty (DIMM Fault LEDs)” on page 76
“Determine Which DIMM Is Faulty (PSH)” on page 75
“Remove a DIMM” on page 78
“Install a DIMM” on page 80

Servicing Hard Drives

Hard drives are hot-service components that can be replaced by customers. For the location of the hard drives, see “Hard Drive Configuration” on page 87.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
These topics describe service procedures for the hard drives in the server.
“Hard Drive Configuration” on page 87
“Hard Drive LEDs” on page 89
“Determine Which Hard Drive Is Faulty” on page 90
“Remove a Hard Drive” on page 90
“Install a Hard Drive” on page 94
“Verify a Hard Drive” on page 95

Hard Drive Configuration

You can install a mix of hard drives and solid state drives. The server requires at least one hard drive to be installed and operational.
No. Description No. Description
1 Drive 1 5 Drive 5
2 Drive 0 6 Drive 4
3 Drive 3 7 Drive 7
4 Drive 2 8 Drive 6
The hard drives in the server are hot-serviceable, meaning that the drives can be removed and inserted while the server is powered on.
Depending on the configuration of the data on a particular drive, the drive might also be removable while the server is online. However, to hot-service a drive while the server is online you must take the drive offline before you can safely remove it. Taking a drive offline prevents any applications from accessing it, and removes logical software links to it.
You cannot hot-service a drive in the following situations:
If the drive contains the operating system and the operating system is not mirrored on another drive.
If the drive cannot be logically isolated from the online operations of the server.
If either of these conditions applies to the drive being serviced, you must take the server offline (shut down the operating system) before you replace the drive.
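The two conditions above reduce to a simple check. The following sketch is illustrative only: the function name can_hot_service and its parameters are hypothetical, and the answers to the three questions must come from your own knowledge of the drive's configuration.

```python
def can_hot_service(holds_os, os_mirrored, can_isolate):
    """Return True if the drive can be replaced without shutting down the OS."""
    if holds_os and not os_mirrored:
        return False  # the only copy of the operating system is on this drive
    if not can_isolate:
        return False  # the drive cannot be logically isolated from online operations
    return True

# A mirrored boot drive that can be taken offline can be hot-serviced.
print(can_hot_service(holds_os=True, os_mirrored=True, can_isolate=True))
```

A False result means the operating system must be shut down before the drive is replaced.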

Related Information

“Supported Storage and Backup Devices” on page 22
“Component Service Task Reference” on page 22
“Hard Drive LEDs” on page 89
“Determine Which Hard Drive Is Faulty” on page 90
“Remove a Hard Drive” on page 90
“Install a Hard Drive” on page 94
“Verify a Hard Drive” on page 95

Hard Drive LEDs

No.  LED                        Description
1    Ready to Remove (blue)     Indicates that a drive can be removed during a hot-service operation.
2    Service Required (amber)   Indicates that the drive has experienced a fault condition.
3    OK/Activity (green)        Indicates the drive's availability for use.
                                ■ On – Read or write activity is in progress.
                                ■ Off – Drive is idle and available for use.

Related Information

“Hard Drive Configuration” on page 87
“Determine Which Hard Drive Is Faulty” on page 90
“Remove a Hard Drive” on page 90
“Install a Hard Drive” on page 94
“Verify a Hard Drive” on page 95

Determine Which Hard Drive Is Faulty

The following LEDs are lit when a hard drive fault is detected:
System Service Required LEDs on the front panel and rear I/O module
Service Required LED on the faulty drive
1.
Determine if the System Service Required LEDs are lit on the front panel or the rear I/O module.
See “Interpreting LEDs” on page 27.
2.
From the front of the server, check the drive LEDs to identify which drive needs to be replaced.
See “Hard Drive LEDs” on page 89. The amber Service Required LED is lit on the drive that needs to be replaced.
3.
Remove the faulty drive.
See “Remove a Hard Drive” on page 90.
Related Information
“Hard Drive Configuration” on page 87
“Hard Drive LEDs” on page 89
“Remove a Hard Drive” on page 90
“Install a Hard Drive” on page 94
“Verify a Hard Drive” on page 95

Remove a Hard Drive

Hard drives are hot-service components that can be replaced by customers. For the location of the hard drives, see “Hard Drive Configuration” on page 87.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Locate the drive in the server that you want to remove.
See “Hard Drive Configuration” on page 87 for the locations of the drives in the server.
See “Determine Which Hard Drive Is Faulty” on page 90 to locate a faulty drive.
2.
Determine if you need to shut down the OS to replace the drive, and perform one of the following actions:
If the drive cannot be taken offline without shutting down the OS, follow instructions in “Power Off the Server (Oracle ILOM)” on page 51, and go to
Step 4.
If the drive can be taken offline without shutting down the OS, go to Step 3.
3.
Take the drive offline:
For a standard drive:
a.
At the Oracle Solaris prompt, type the cfgadm -al command to list all drives in the device tree, including drives that are not configured.
# cfgadm -al
This command lists dynamically reconfigurable hardware resources and shows their operational status. In this case, look for the status of the drive you plan to remove. This information is listed in the Occupant column.
Example:
Ap_idType Receptacle Occupant Condition ... c2 scsi-sas connected configured unknown c2::w5000cca00a76d1f5,0disk-path connected configured  unknown c3 scsi-sas connected configured unknown c3::w5000cca00a772bd1,0 disk-path connected configured unknown c4 scsi-sas connected configured unknown c4::w5000cca00a59b0a9,0 disk-path connected configured unknown ...
You must unconfigure any drive whose status is listed as configured, as described in Step 3b.
b.
Unconfigure the drive using the cfgadm -c unconfigure command.
Example:
# cfgadm -c unconfigure c2::w5000cca00a76d1f5,0
Replace c2::w5000cca00a76d1f5,0 with the drive name that applies to your situation.
c.
Verify that the blue Ready to Remove LED on the drive is lit.
For an NVMe drive:
a.
Determine the name of the NVMe drive to be removed.
# hotplug list -lc
Locate the name of the drive, such as /SYS/DBP/NVME0 in this example.
You can use this same command to check the state of the drive at other stages of the removal procedure.
b.
Disable the NVMe drive.
# hotplug disable /SYS/DBP/NVME0
Check that the drive's state has changed from ENABLED to POWERED.
# hotplug list -lc
c.
Power down the NVMe drive.
# hotplug poweroff /SYS/DBP/NVME0
Check that the drive's state has changed from POWERED to PRESENT.
# hotplug list -lc
In this state, the blue Ready to Remove LED on the NVMe drive is lit.
Note - Do not remove the drive unless the blue Ready to Remove LED is lit.
4.
Press the drive release button to unlock the drive.
5.
Pull on the latch to remove the drive from the server.
Caution - The latch is not an ejector. Do not force the latch too far to the right. Doing so can
damage the latch.
6.
After you remove an NVMe drive, check that the drive slot's state has changed to EMPTY.
# hotplug list -lc
7.
Install the replacement drive or a filler tray.
See “Install a Hard Drive” on page 94.
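For an NVMe drive, the removal steps above move the slot through a fixed sequence of states: ENABLED, POWERED, PRESENT, and finally EMPTY. The following minimal sketch maps the state shown by hotplug list -lc to the next action. The function name next_removal_step is hypothetical, and /SYS/DBP/NVME0 is only the example drive name used in this procedure.

```python
# States and commands are taken from the NVMe removal steps in this procedure.
REMOVAL_COMMANDS = {
    "ENABLED": "hotplug disable",   # ENABLED -> POWERED
    "POWERED": "hotplug poweroff",  # POWERED -> PRESENT
}

def next_removal_step(state, drive="/SYS/DBP/NVME0"):
    """Return the next command or manual action for the given slot state."""
    state = state.strip().upper()
    if state in REMOVAL_COMMANDS:
        return REMOVAL_COMMANDS[state] + " " + drive
    if state == "PRESENT":
        return "Confirm that the blue LED is lit, then remove the drive."
    if state == "EMPTY":
        return "Slot is empty. Install the replacement drive or a filler tray."
    raise ValueError("unrecognized hotplug state: " + state)

for state in ("ENABLED", "POWERED", "PRESENT", "EMPTY"):
    print(state, "->", next_removal_step(state))
```

Run hotplug list -lc after each command to confirm that the state changed before you continue.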
Related Information
“Determine Which Hard Drive Is Faulty” on page 90
“Install a Hard Drive” on page 94
“Verify a Hard Drive” on page 95
Install a Hard Drive
Hard drives are hot-service components that can be replaced by customers. For the location of the hard drives, see “Hard Drive Configuration” on page 87.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
1.
Align the replacement drive to the drive slot, and slide the drive in until it is seated.
Drives are physically addressed according to the slot in which they are installed. If you are replacing a drive, install the replacement drive in the same slot as the drive that was removed. See “Hard Drive Configuration” on page 87 for drive slot information.
2.
Close the latch to lock the drive in place.
3.
Verify the installation.
See “Verify a Hard Drive” on page 95.
Related Information
“Determine Which Hard Drive Is Faulty” on page 90
“Remove a Hard Drive” on page 90
“Verify a Hard Drive” on page 95

Verify a Hard Drive

1.
Determine if you replaced or installed a hard drive in a running server or not.
If you replaced or installed a hard drive in a server that is running (if you hot-serviced the hard drive), then no further action is necessary. The Oracle Solaris OS auto-configures the hard drive.
If you replaced or installed a hard drive in a powered-down server, then continue with these procedures to configure the hard drive.
If you hot-serviced an NVMe drive, it should automatically power up and attach. If not, power up and attach the drive manually.
# hotplug enable /SYS/DBP/NVME0
Check that the drive's state has changed to ENABLED.
# hotplug list -lc
2.
If the OS is shut down, and the drive you replaced was not the boot device, boot the OS.
Depending on the nature of the replaced drive, you might need to perform administrative tasks to reinstall software before the server can boot. Refer to the Oracle Solaris OS administration documentation for more information.
3.
At the Oracle Solaris prompt, type the cfgadm -al command to list all drives in the device tree, including any drives that are not configured.
# cfgadm -al
This command helps you identify the drive you installed. For example:
Ap_idType Receptacle Occupant Condition ... c2 scsi-sas connected configured unknown c2::w5000cca00a76d1f5,0disk-path connected configured  unknown c3 scsi-sas connected configured unknown c3::sd2 disk-path connected unconfigured unknown c4 scsi-sas connected configured unknown c4::w5000cca00a59b0a9,0 disk-path connected configured unknown ...
4.
Configure the drive using the cfgadm -c configure command.
For example:
# cfgadm -c configure c2::w5000cca00a76d1f5,0
Replace c2::w5000cca00a76d1f5,0 with the drive name for your configuration.
5.
Verify that the blue Ready to Remove LED is no longer lit on the drive that you installed.
See “Hard Drive LEDs” on page 89.
6.
At the Oracle Solaris prompt, type the cfgadm -al command to list all drives in the device tree, including any drives that are not configured.
# cfgadm -al
The replacement drive is now listed as configured. For example:
Ap_idType Receptacle Occupant Condition ... c2 scsi-sas connected configured unknown c2::w5000cca00a76d1f5,0disk-path connected configured  unknown c3 scsi-sas connected configured unknown c3::w5000cca00a772bd1,0disk-path connected configured unknown c4 scsi-sas connected configured unknown c4::w5000cca00a59b0a9,0 disk-path connected configured unknown ...
7.
Perform one of the following tasks based on your verification results.
If the previous steps did not verify the drive, see “Diagnostics
Process” on page 26.
If the previous steps indicate that the drive is functioning properly, perform the tasks required to configure the drive. These tasks are covered in the Oracle Solaris OS administration documentation.
For additional drive verification, you can run the Oracle VTS software. Refer to the Oracle VTS documentation for details.
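When the server has many drives, scanning the cfgadm -al listing by eye is error-prone. The following minimal sketch lists the disk attachment points that still need the cfgadm -c configure step. The function name unconfigured_disks is hypothetical, and the script assumes the five whitespace-separated columns shown in the examples in this procedure.

```python
def unconfigured_disks(cfgadm_output):
    """Return the Ap_id of every disk-path whose Occupant column is 'unconfigured'."""
    pending = []
    for line in cfgadm_output.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip elided ("...") or malformed lines
        ap_id, dev_type, receptacle, occupant, condition = fields
        if dev_type == "disk-path" and occupant == "unconfigured":
            pending.append(ap_id)
    return pending

SAMPLE = """\
Ap_id                      Type        Receptacle   Occupant       Condition
c2                         scsi-sas    connected    configured     unknown
c2::w5000cca00a76d1f5,0    disk-path   connected    configured     unknown
c3                         scsi-sas    connected    configured     unknown
c3::sd2                    disk-path   connected    unconfigured   unknown
"""

for ap_id in unconfigured_disks(SAMPLE):
    print("cfgadm -c configure", ap_id)
```

An empty result after the configure step indicates that every listed disk is configured.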
Related Information
“Determine Which Hard Drive Is Faulty” on page 90
“Remove a Hard Drive” on page 90
“Install a Hard Drive” on page 94

Servicing the Main Module

For the location of the main module, see “Front Panel Components (Service)” on page 14.
Caution - This procedure requires that you handle components that are sensitive to electrostatic
discharge. This discharge can cause failure of server components.
Caution - You must disconnect the power cords before servicing this component. See
“Disconnect Power Cords” on page 53.
Step Description Link
1. Determine if the main module is faulty. “Main Module LEDs” on page 100
2. Prepare the server for service. “Preparing for Service” on page 43
3. Remove the main module. “Remove the Main Module” on page 101
4. Service main module components. “Servicing NVMe Switch Cards” on page 111
“Servicing the Drive Backplane” on page 119
“Servicing the SPM” on page 125
“Servicing the SCC PROM” on page 133
“Servicing the Battery” on page 137
“Servicing the Front I/O Assembly” on page 143
5. Install the main module. “Install the Main Module” on page 105
“Verify the Main Module” on page 108
6. Return the server to operation. “Returning the Server to Operation” on page 201

Main Module LEDs

No. LED Description
1 Service Required LED (amber)
Indicates that service is required. POST and Oracle ILOM are two diagnostic tools that can detect a fault or failure resulting in this indication.
The Oracle ILOM show faulty command provides details about any faults that cause this indicator to illuminate.
Under some fault conditions, individual component fault LEDs are lit in addition to the Service Required LED.
2 Power OK LED (green)
Indicates these conditions:
■ Off – System is not running in its normal state. System power might be off. The SPM might be running.
■ Steady on – System is powered on and is running in its normal operating state. No service actions are required.
■ Fast blink – System is running in standby mode and can be quickly returned to full function.
■ Slow blink – A normal but transitory activity is taking place. Slow blinking might indicate that system diagnostics are running or that the system is booting.
3 SPM LED (green)
Indicates these conditions:
■ Off – AC power might have been connected to the power supplies.