intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate,
broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering,
disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us
in writing.
If this is software or related software documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the
following notice is applicable:
U.S. GOVERNMENT END USERS. Oracle programs, including any operating system, integrated software, any programs installed on the hardware,
and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition
Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including
any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license
restrictions applicable to the programs. No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any
inherently dangerous applications, including applications which may create a risk of personal injury. If you use this software or hardware in dangerous
applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle
Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or
registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of
Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle
Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and
services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party
content, products, or services.
Sun Network QDR InfiniBand Gateway Switch Service Manual for Firmware Version 2.1 • March 2013
Contents
Using This Documentation  vii
Detecting and Managing Faults  1
Interpreting Status LEDs  1
Front Panel LEDs  2
Rear Panel LEDs  3
▼ Check Chassis Status LEDs  4
▼ Check NET MGT Port Status LEDs  4
▼ Check Link Status LEDs  5
▼ Check Power Supply Status LEDs  6
▼ Check Fan Status LEDs  7
Managing Faulty Components  7
▼ Display Faulty Components (fault_state)  8
▼ Display Faulty Components (/SP/faultmgmt)  9
▼ Clear a Fault Manually  10
Clearable Fault Targets  11
▼ Identify Faults in the Oracle ILOM Event Log  12
Determining the Alarm State of a Component or System  13
▼ Display the General Alarm State of Systems and Components  14
System Alarm Targets  15
Component Alarm Targets  15
Oracle ILOM Target Alarm States  16
Evaluating Sensor Alarms  17
▼ Display Oracle ILOM Sensor Status  18
▼ Determine Oracle ILOM Sensor Target Types  20
Evaluating a Voltage Sensor Alarm  20
▼ Evaluate a Voltage Sensor  21
Voltage Sensor Values  22
Voltage Out of Range  22
Evaluating a Temperature Sensor Alarm  23
▼ Evaluate a Temperature Sensor  24
Temperature Sensor Values  24
Temperature Out of Range  25
Evaluating a Speed Sensor Alarm  26
▼ Evaluate a Speed Sensor  26
Speed Sensor Values  27
Speed Out of Range  27
Evaluating a State Sensor Alarm  29
▼ Evaluate a State Sensor  29
State Sensor Alarm Conditions  30
Evaluating a Presence Sensor Alarm  30
▼ Evaluate a Presence Sensor  31
Presence Sensor Alarm Conditions  31
Evaluating an Indicator State  32
▼ Evaluate an Indicator State  32
Indicator State Values  33
Indicator State Conditions  33
Accessing CLI Prompts  34
▼ Access the Oracle ILOM CLI (NET MGT Port)  35
▼ Enter the Restricted Linux Shell  35
▼ Exit the Restricted Linux Shell  36
Understanding Service Procedures  37
Replaceable Components  37
Suggested Tools for Service  39
Antistatic Precautions for Service  39
Servicing Power Supplies  41
▼ Determine If a Power Supply Is Faulty  41
Inspecting a Power Supply  43
▼ Identify the Power Supply  43
▼ Inspect the Power Supply Hardware  45
▼ Inspect the Power Supply Connectors  45
▼ Power Off a Power Supply  46
▼ Remove a Power Supply  47
▼ Install a Power Supply  49
▼ Power On a Power Supply  51
Servicing Fans  55
▼ Determine If a Fan Is Faulty  55
Inspecting a Fan  57
▼ Identify the Fan  57
▼ Inspect the Fan Hardware  58
▼ Inspect the Fan Connector  59
▼ Remove a Fan  60
▼ Install a Fan  61
Servicing Data Cables  65
Inspecting the Data Cables  65
▼ Identify the Data Cable  66
▼ Inspect the Data Cable Hardware  67
▼ Inspect the Data Cable Connectors or Transceivers  67
▼ Remove a Data Cable  68
▼ Install a Data Cable  72
Servicing the Battery  75
▼ Determine If the Battery Is Faulty  75
▼ Remove the Gateway From the Rack  77
▼ Replace the Battery  78
Index  85
Using This Documentation
This service manual provides detailed procedures that describe the service of the Sun
Network QDR InfiniBand Gateway Switch from Oracle. This document is written for
technicians, system administrators, and users who have advanced experience
servicing InfiniBand fabric hardware.
■ “Product Notes” on page vii
■ “Related Documentation” on page vii
■ “Feedback” on page viii
■ “Access to Oracle Support” on page viii
Product Notes
For late-breaking information and known issues about this product, refer to the
product notes at:
http://docs.oracle.com/cd/E36256_01
Related Documentation
Documentation | Links
Sun Network QDR InfiniBand Gateway Switch Firmware Version 2.1 | http://docs.oracle.com/cd/E36256_01
Oracle Solaris 11 OS | http://www.oracle.com/goto/Solaris11/docs
Oracle Integrated Lights Out Manager (ILOM) 3.0 | http://docs.oracle.com/cd/E19860-01
All Oracle products | http://docs.oracle.com
Feedback
Provide feedback on this documentation at:
http://www.oracle.com/goto/docfeedback
Access to Oracle Support
Oracle customers have access to electronic support through My Oracle Support. For
information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit
http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
Detecting and Managing Faults
These topics explain how to use various diagnostic tools to find and troubleshoot
faults and alarms in the gateway.
Note – A fault identifies a failure of a component. An alarm identifies an abnormal
condition of a component or system, as reported by a sensor.
Description | Links
Investigate whether there is a fault condition. | “Interpreting Status LEDs” on page 1; “Managing Faulty Components” on page 7; “Identify Faults in the Oracle ILOM Event Log” on page 12
Investigate whether there is an alarm condition. | “Determining the Alarm State of a Component or System” on page 13; “Evaluating Sensor Alarms” on page 17
Related Information
■ “Understanding Service Procedures” on page 37
■ “Servicing Power Supplies” on page 41
■ “Servicing Fans” on page 55
■ “Servicing Data Cables” on page 65
■ “Servicing the Battery” on page 75
Interpreting Status LEDs
Use these topics to interpret LEDs to determine if a component has failed.
■ “Front Panel LEDs” on page 2
■ “Rear Panel LEDs” on page 3
■ “Check Chassis Status LEDs” on page 4
■ “Check NET MGT Port Status LEDs” on page 4
■ “Check Link Status LEDs” on page 5
■ “Check Power Supply Status LEDs” on page 6
■ “Check Fan Status LEDs” on page 7
Related Information
■ “Interpreting Status LEDs” on page 1
■ “Managing Faulty Components” on page 7
■ “Identify Faults in the Oracle ILOM Event Log” on page 12
■ “Determining the Alarm State of a Component or System” on page 13
■ “Evaluating Sensor Alarms” on page 17
■ “Accessing CLI Prompts” on page 34
Front Panel LEDs
No. | LED | Link
1 | Power supply AC LED | “Check Power Supply Status LEDs” on page 6
2 | Power supply Attention LED | “Check Power Supply Status LEDs” on page 6
3 | Power supply OK LED | “Check Power Supply Status LEDs” on page 6
4 | Fan Attention LED | “Check Fan Status LEDs” on page 7
Related Information
■ “Rear Panel LEDs” on page 3
■ “Check Chassis Status LEDs” on page 4
■ “Check NET MGT Port Status LEDs” on page 4
■ “Check Link Status LEDs” on page 5
■ “Check Power Supply Status LEDs” on page 6
■ “Check Fan Status LEDs” on page 7
Rear Panel LEDs
No. | LED | Link
1 | NET MGT status LEDs | “Check NET MGT Port Status LEDs” on page 4
2 | InfiniBand link status LEDs | “Check Link Status LEDs” on page 5
3 | Ethernet link status LEDs | “Check Link Status LEDs” on page 5
4 | Chassis status LEDs | “Check Chassis Status LEDs” on page 4
5 | Not used |
Related Information
■ “Front Panel LEDs” on page 2
■ “Check Chassis Status LEDs” on page 4
■ “Check NET MGT Port Status LEDs” on page 4
■ “Check Link Status LEDs” on page 5
■ “Check Power Supply Status LEDs” on page 6
■ “Check Fan Status LEDs” on page 7
▼ Check Chassis Status LEDs
The chassis status LEDs are located on the left side of the rear panel. See “Rear Panel
LEDs” on page 3.
1. Visually inspect the chassis status LEDs.
2. Compare what you see to this table.
Location | Name | Color | State and Meaning
Top | Locator | White | On – No function. Off – Disabled. Flashing – The gateway is identifying itself.
Middle | Attention | Amber | On – Normal fault detected. Off – No faults detected. Flashing – No function.
Bottom | OK | Green | On – Gateway is functional without fault. Off – Gateway is off or initializing. Flashing – No function.
3. If the Attention LED is lit, there is a fault present.
See “Managing Faulty Components” on page 7.
Related Information
■ “Front Panel LEDs” on page 2
■ “Rear Panel LEDs” on page 3
■ “Check NET MGT Port Status LEDs” on page 4
■ “Check Link Status LEDs” on page 5
■ “Check Power Supply Status LEDs” on page 6
■ “Check Fan Status LEDs” on page 7
▼ Check NET MGT Port Status LEDs
The NET MGT port status LEDs are located on the NET MGT connector of the rear
panel. See “Rear Panel LEDs” on page 3.
1. Visually inspect the NET MGT port status LEDs.
2. Compare what you see to this table.
Name | Location | Color | State and Meaning
Link speed | Left | Amber or green | Amber on – 100BASE-T link. Green on – 1000BASE-T link. Off – No link or link down. Flashing – No function.
Activity | Right | Green | On – No function. Off – No activity. Flashing – Packet activity.
3. If the Activity LED is off, there might be a problem with the communication to
the management controller.
Refer to Gateway Administration, network management troubleshooting guidelines.
Related Information
■ “Front Panel LEDs” on page 2
■ “Rear Panel LEDs” on page 3
■ “Check Chassis Status LEDs” on page 4
■ “Check Link Status LEDs” on page 5
■ “Check Power Supply Status LEDs” on page 6
■ “Check Fan Status LEDs” on page 7
▼ Check Link Status LEDs
The link status LEDs are located at the data cable connectors of the rear panel. See
“Rear Panel LEDs” on page 3.
1. Visually inspect the link status LEDs.
2. Compare what you see for a particular link to this table.
Name | Color | State and Meaning
Link | Green | On – Link established. Off – No link or link down. Flashing – Symbol errors.
3. If the Link LED flashes, there might be a problem with the data cable.
See “Servicing Data Cables” on page 65.
Related Information
■ “Front Panel LEDs” on page 2
■ “Rear Panel LEDs” on page 3
■ “Check Chassis Status LEDs” on page 4
■ “Check NET MGT Port Status LEDs” on page 4
■ “Check Power Supply Status LEDs” on page 6
■ “Check Fan Status LEDs” on page 7
▼ Check Power Supply Status LEDs
The power supply status LEDs are located on the power supply at the front of the
chassis. See “Front Panel LEDs” on page 2.
1. Visually inspect the power supply’s status LEDs.
2. Compare what you see on the power supply to this table.
Caution – If a power supply has shut down because of a thermal or overcurrent
condition, signified by the amber Attention LED lighting, remove the respective
power cord from the chassis. Allow the power supply to completely cool for at least
15 minutes. A shorter cooling time might cause damage to the power supply when
the power cord is reattached. If the Attention LED lights amber upon reattaching the
power cord, replace the power supply.
3. If the Attention LED is lit, there is a fault with that power supply.
See “Servicing Power Supplies” on page 41.
Related Information
■ “Front Panel LEDs” on page 2
■ “Rear Panel LEDs” on page 3
■ “Check Chassis Status LEDs” on page 4
■ “Check NET MGT Port Status LEDs” on page 4
■ “Check Link Status LEDs” on page 5
■ “Check Fan Status LEDs” on page 7
▼ Check Fan Status LEDs
The fan status LEDs are located in the lower right corner of the fans at the front of
the gateway chassis. See “Front Panel LEDs” on page 2.
1. Visually inspect the fan status LEDs.
2. If the LED is lit, there is a fault with that fan.
See “Servicing Fans” on page 55.
Related Information
■ “Front Panel LEDs” on page 2
■ “Rear Panel LEDs” on page 3
■ “Check Chassis Status LEDs” on page 4
■ “Check NET MGT Port Status LEDs” on page 4
■ “Check Link Status LEDs” on page 5
■ “Check Power Supply Status LEDs” on page 6
Managing Faulty Components
If Oracle ILOM has detected a fault with a component, you can display and clear that
fault with these topics:
■ “Display Faulty Components (fault_state)” on page 8
■ “Display Faulty Components (/SP/faultmgmt)” on page 9
■ “Clear a Fault Manually” on page 10
■ “Clearable Fault Targets” on page 11
Related Information
■ “Interpreting Status LEDs” on page 1
■ “Identify Faults in the Oracle ILOM Event Log” on page 12
■ “Determining the Alarm State of a Component or System” on page 13
■ “Evaluating Sensor Alarms” on page 17
■ “Accessing CLI Prompts” on page 34
▼ Display Faulty Components (fault_state)
You can identify faulty components by their fault state.
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Display the fault state of components.
-> show / -a -l 4 -o table fault_state
Target | Property | Value
--------------------+------------------------+------
/SYS/MB | fault_state | OK
/SYS/PSU0 | fault_state | OK
/SYS/PSU1 | fault_state | OK
/SYS/FAN0 | fault_state | OK
/SYS/FAN1 | fault_state | OK
/SYS/FAN2 | fault_state | Faulted
/SYS/FAN3 | fault_state | OK
/SYS/FAN4 | fault_state | OK
->
3. Look in the Value column for Faulted.
4. Look in the same row under the Target column, to find the Oracle ILOM target
of the faulty component.
For example, /SYS/FAN2.
5. Identify the component that has faulted and might need to be replaced.
See “Clearable Fault Targets” on page 11.
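If you save the output of this command as text, a short script can pick out the faulted targets for you. The following Python sketch is illustrative only and is not an Oracle-provided tool. The function name faulted_targets and the sample text are assumptions, and the parsing assumes the three-column Target | Property | Value layout shown in Step 2.

```python
def faulted_targets(table_text):
    """Return the Oracle ILOM targets whose fault_state value is Faulted."""
    faulted = []
    for line in table_text.splitlines():
        # Data rows look like: "/SYS/FAN2 | fault_state | Faulted"
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip the prompt, the separator row, and blank lines
        target, prop, value = fields
        if prop == "fault_state" and value == "Faulted":
            faulted.append(target)
    return faulted


# Sample text modeled on the output in Step 2 (hypothetical capture).
sample = """\
Target | Property | Value
--------------------+------------------------+------
/SYS/MB | fault_state | OK
/SYS/FAN1 | fault_state | OK
/SYS/FAN2 | fault_state | Faulted
/SYS/FAN3 | fault_state | OK
"""

print(faulted_targets(sample))
```

For the sample text, the only target returned is /SYS/FAN2, which matches Step 4 of the procedure.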
Related Information
■ “Display Faulty Components (/SP/faultmgmt)” on page 9
■ “Clear a Fault Manually” on page 10
■ “Clearable Fault Targets” on page 11
▼ Display Faulty Components (/SP/faultmgmt)
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Display any faulty components.
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
x (faulted_target)
->
where:
■ x is the target sequence number (starting at 0).
■ faulted_target is the Oracle ILOM target of the faulty component.
Note – If there are several faulty components, then their respective targets are listed
with increasing target sequence numbers.
Note – If no number is displayed, there are no faulty components.
For example:
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/PSU0)
->
3. Display details of the fault.
-> show -d properties /SP/faultmgmt/x/faults/y
where:
■ x is the target sequence number (starting at 0).
■ y is the fault sequence number (starting at 0) for the target x.
For example:
-> show /SP/faultmgmt/0/faults/0
/SP/faultmgmt/0/faults/0
The class property provides a general reason for the fault.
4. Use faulted_target to identify the component that has faulted and might need to
be replaced.
See “Clearable Fault Targets” on page 11.
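The x (faulted_target) lines from Step 2 can also be read by a script and turned into the follow-up commands for Step 3. The following Python sketch is one possible approach, not an Oracle-provided tool. The names TARGET_LINE and parse_faultmgmt_targets are hypothetical, and the sketch assumes each faulted target appears on its own line in the form shown above.

```python
import re

# Matches lines such as "0 (/SYS/PSU0)": sequence number, then target in parentheses.
TARGET_LINE = re.compile(r"^\s*(\d+)\s+\((\S+)\)\s*$")


def parse_faultmgmt_targets(output):
    """Map each target sequence number to its faulted Oracle ILOM target."""
    targets = {}
    for line in output.splitlines():
        match = TARGET_LINE.match(line)
        if match:
            targets[int(match.group(1))] = match.group(2)
    return targets


# Sample text modeled on the example in Step 2 (hypothetical capture).
sample = """\
/SP/faultmgmt
Targets:
0 (/SYS/PSU0)
"""

targets = parse_faultmgmt_targets(sample)
for seq, target in targets.items():
    # Fault sequence numbers start at 0, so ask for the first fault of each target.
    print(f"show -d properties /SP/faultmgmt/{seq}/faults/0   (details for {target})")
```

An empty dictionary corresponds to the case where no number is displayed, that is, there are no faulty components.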
Related Information
■ “Display Faulty Components (fault_state)” on page 8
■ “Clear a Fault Manually” on page 10
■ “Clearable Fault Targets” on page 11
▼ Clear a Fault Manually
If Oracle ILOM detects a fault and then detects that the component has been replaced,
Oracle ILOM automatically clears the fault. However, you can manually clear the fault
after replacing the component, if necessary.
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Clear the fault.
-> set target clear_fault_action=true
where target is from “Clearable Fault Targets” on page 11.
For example, to clear a fault with power supply 0, type:
-> set /SYS/PSU0 clear_fault_action=true
Are you sure you want to clear /SYS/PSU0 (y/n)? y
Set 'clear_fault_action' to 'true'
->
Related Information
■ “Display Faulty Components (fault_state)” on page 8
■ “Display Faulty Components (/SP/faultmgmt)” on page 9
■ “Clearable Fault Targets” on page 11
Clearable Fault Targets
This table lists the components, their Oracle ILOM targets that are clearable, and
links to servicing procedures.
Component | Target | Links
Battery | /SYS/MB | “Servicing the Battery” on page 75
SSD drive | /SYS/MB | Replace the gateway. See “Remove the Gateway From the Rack” on page 77.
Fan x, where x is 0 to 4 | /SYS/FANx | “Servicing Fans” on page 55
Power supply x, where x is either 0 or 1 | /SYS/PSUx | “Servicing Power Supplies” on page 41
Use this table for these procedures:
■ “Display Faulty Components (/SP/faultmgmt)” on page 9
■ “Clear a Fault Manually” on page 10
■ “Identify Faults in the Oracle ILOM Event Log” on page 12
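The table of clearable targets can be expressed as a small lookup that checks a target before a clear command is built. The following Python sketch is an illustration under stated assumptions, not part of Oracle ILOM. The names CLEARABLE_TARGETS, describe_target, and clear_fault_command are hypothetical, and the patterns encode only the targets and index ranges listed in the table.

```python
import re

# (pattern, component name template, servicing topic), taken from the table.
CLEARABLE_TARGETS = [
    (re.compile(r"^/SYS/MB$"), "Battery or SSD drive",
     "Servicing the Battery (battery) or replace the gateway (SSD drive)"),
    (re.compile(r"^/SYS/FAN([0-4])$"), "Fan {0}", "Servicing Fans"),
    (re.compile(r"^/SYS/PSU([01])$"), "Power supply {0}", "Servicing Power Supplies"),
]


def describe_target(target):
    """Return (component, servicing topic) for a clearable target, else None."""
    for pattern, name, topic in CLEARABLE_TARGETS:
        match = pattern.match(target)
        if match:
            return name.format(*match.groups()), topic
    return None


def clear_fault_command(target):
    """Build the command text that clears the fault on a clearable target."""
    if describe_target(target) is None:
        raise ValueError(f"{target} is not a clearable fault target")
    return f"set {target} clear_fault_action=true"


print(describe_target("/SYS/FAN2"))
print(clear_fault_command("/SYS/PSU0"))
```

Rejecting unknown targets before building the command avoids typing a clear command against a target that the table does not list.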
Related Information
■ “Display Faulty Components (fault_state)” on page 8
■ “Display Faulty Components (/SP/faultmgmt)” on page 9
■ “Clear a Fault Manually” on page 10
▼ Identify Faults in the Oracle ILOM Event Log
1. Access Oracle ILOM.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Display the Oracle ILOM event log.
-> show /SP/logs/event/list Class==class Type==type
where you choose class and type from the table in Gateway Administration, log entry
filters.
For example, to display log entries pertaining to all faults, type:
-> show /SP/logs/event/list Class==Fault
Note – If you want to display log entries pertaining to only component failure, use
the show /SP/logs/event/list Class==Fault Type==Fault command.
3. Identify the faulty components in the output.
The Oracle ILOM targets of the faulty components follow the word component.
For example:
-> show /SP/logs/event/list Class==Fault
Event
ID Date/Time Class Type Severity
Fault detected at time = Tue Sep 25 13:44:56 2012. The suspect component:
/SYS/PSU0 has fault.chassis.device.psu.fail with probability=100. Refer
to http://support.oracle.com/msg/DCSIB-8000-23 for details.
Fault detected at time = Tue Sep 18 15:51:48 2012. The suspect component:
/SYS/PSU0 has fault.chassis.device.psu.fail with probability=100. Refer
to http://support.oracle.com/msg/DCSIB-8000-23 for details.
.
.
.
->
Note – The most recent events are listed at the top of the log.
In this example, Event ID 18567 on September 18, at 15:51, indicated that a critical
fault occurred in the component with Oracle ILOM target /SYS/PSU0. This is
power supply 0 as identified in “Clearable Fault Targets” on page 11. Following
the Oracle ILOM target is the reason for the fault. A URL is provided for more
information about the fault.
Moving up the output, Event ID 18569 on September 18, at 16:43, indicated that a
repair action was taken on the component with Oracle ILOM target /SYS/PSU0.
The power supply was repaired. The term repaired can mean either repaired
or replaced. In either case, the power supply in slot 0 was now functional.
Continuing up the output, Event ID 18820 on September 25 indicated that a
critical fault occurred again in the component with Oracle ILOM target
/SYS/PSU0.
4. Depending on the severity of the fault, replace the component.
See “Clearable Fault Targets” on page 11 for servicing links.
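Because the Oracle ILOM target follows the words "suspect component" in each fault entry, the targets can be extracted from a saved copy of the log with a regular expression. The following Python sketch is a minimal illustration, not an Oracle-provided tool. The names SUSPECT and suspects are hypothetical, and the pattern assumes the entry wording shown in the example output above.

```python
import re

# Captures the target, the fault class, and the probability from one log entry.
SUSPECT = re.compile(
    r"suspect component:\s*(\S+)\s+has\s+(\S+)\s+with\s+probability=(\d+)"
)


def suspects(log_text):
    """Return (target, fault class, probability) for each fault entry found."""
    # Entries wrap across several lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [(target, fault, int(prob)) for target, fault, prob in SUSPECT.findall(flat)]


# Sample text copied from the example log entry above.
sample = """\
Fault detected at time = Tue Sep 25 13:44:56 2012. The suspect component:
/SYS/PSU0 has fault.chassis.device.psu.fail with probability=100. Refer
to http://support.oracle.com/msg/DCSIB-8000-23 for details.
"""

for target, fault, prob in suspects(sample):
    print(target, fault, prob)
```

The extracted target, here /SYS/PSU0, can then be looked up in “Clearable Fault Targets” on page 11.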
Related Information
■ “Interpreting Status LEDs” on page 1
■ “Managing Faulty Components” on page 7
■ “Determining the Alarm State of a Component or System” on page 13
■ “Evaluating Sensor Alarms” on page 17
■ “Accessing CLI Prompts” on page 34
Determining the Alarm State of a Component or System
When a component or system of components experiences a condition which triggers
an alarm, the condition might affect the operation of the gateway. These topics enable
you to display alarm states.
■ “Display the General Alarm State of Systems and Components” on page 14
■ “System Alarm Targets” on page 15
■ “Component Alarm Targets” on page 15
■ “Oracle ILOM Target Alarm States” on page 16
Related Information
■ “Interpreting Status LEDs” on page 1
■ “Managing Faulty Components” on page 7
■ “Identify Faults in the Oracle ILOM Event Log” on page 12
■ “Evaluating Sensor Alarms” on page 17
■ “Accessing CLI Prompts” on page 34
▼ Display the General Alarm State of Systems and Components
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Type:
-> show target alarm_status
where target is from the tables in “System Alarm Targets” on page 15 and
“Component Alarm Targets” on page 15.
For example, to display the general alarm state of fan 1, type:
-> show /SYS/FAN1 alarm_status
/SYS/FAN1
Properties:
alarm_status = cleared
->
3. Compare the value displayed to the alarm states.
See “Oracle ILOM Target Alarm States” on page 16.
4. If the alarm state is major or critical, you might need to replace the
component.
See “Clearable Fault Targets” on page 11 for servicing links.
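Steps 3 and 4 amount to reading the alarm_status property and checking whether it is major or critical. The following Python sketch shows that check on saved command output. It is illustrative only. The names alarm_state and might_need_replacement are hypothetical, and the sketch assumes the property appears as alarm_status = state, as in the example above.

```python
import re

# Alarm states for which Step 4 says the component might need replacing.
REPLACEMENT_STATES = {"major", "critical"}


def alarm_state(output):
    """Return the alarm_status value from the command output, or None."""
    match = re.search(r"alarm_status\s*=\s*(\S+)", output)
    return match.group(1) if match else None


def might_need_replacement(output):
    """True when the reported alarm state is major or critical."""
    return alarm_state(output) in REPLACEMENT_STATES


# Sample text modeled on the fan 1 example above (hypothetical capture).
sample = """\
/SYS/FAN1
Properties:
alarm_status = cleared
"""

print(alarm_state(sample), might_need_replacement(sample))
```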
Related Information
■ “System Alarm Targets” on page 15
■ “Component Alarm Targets” on page 15
■ “Oracle ILOM Target Alarm States” on page 16
System Alarm Targets
This table lists systems that have the ability to report an alarm and their Oracle ILOM
targets. Use these targets for the procedure “Display the General Alarm State of
Systems and Components” on page 14.
System | Target
Cooling system | /SYS/COOLING_ATTN
Signal cable monitoring | /SYS/CABLE_ATTN
Power system | /SYS/POWER_ATTN
Power redundancy | /SYS/POWER_REDUN
Cooling redundancy | /SYS/COOLING_REDUN
Signal cable connections | /SYS/CABLE_CONN_STAT
Temperature monitoring | /SYS/TEMP_ATTN
InfiniBand devices within the gateway | /SYS/IBDEV_ATTN
Entire gateway | /SYS/CHASSIS_STATUS
Related Information
■ “Display the General Alarm State of Systems and Components” on page 14
■ “Component Alarm Targets” on page 15
■ “Oracle ILOM Target Alarm States” on page 16
Component Alarm Targets
This table lists components or sensors that have the ability to report an alarm, and
their Oracle ILOM targets. Use these targets for the procedure “Display the General
Alarm State of Systems and Components” on page 14.
Component | Target
ECB alarm | /SYS/MB/V_ECB
3.3V main voltage alarm | /SYS/MB/V_3.3VMainOK
5V alarm | /SYS/MB/V_5VOK
1.0V alarm | /SYS/MB/V_1.0VOK
I4 switch chip voltage alarm | /SYS/MB/V_I41.2VOK
2.5V alarm | /SYS/MB/V_2.5VOK
Digital power alarm | /SYS/MB/V_V1P2DIG
Analog power alarm | /SYS/MB/V_V1P2ANG
BridgeX chip voltage alarm | /SYS/MB/V_BX1.2VOK
1.8V alarm | /SYS/MB/V_1.8VOK
I4 switch chip boot alarm | /SYS/MB/BOOT_I4A
SSD drive alarm | /SYS/MB/DISK_FAULT
Battery alarm | /SYS/MB/BAT_FAULT
Individual power supply alarm, where x is either 0 or 1 | /SYS/PSUx/FAULT
Individual power supply alert, where x is either 0 or 1 | /SYS/PSUx/ALERT
Individual power supply mains voltage presence, where x is either 0 or 1 | /SYS/PSUx/AC_PRESENT
Individual fan alarm, where x is 0 to 4 | /SYS/FANx/FAULT
Related Information
■ “Display the General Alarm State of Systems and Components” on page 14
■ “System Alarm Targets” on page 15
■ “Oracle ILOM Target Alarm States” on page 16
Oracle ILOM Target Alarm States
Use this table to clarify alarm states as seen in the alarm_status = alarm_state
parameter of Oracle ILOM targets and in the output of the procedure “Display the
General Alarm State of Systems and Components” on page 14.
Alarm State | Description
cleared | The component or system has recovered from an alarmed condition and is fully operational.
warning | An alarm has identified a condition that is abnormal, but does not affect any individual component.
minor | An alarm has identified a condition that might affect an individual component.
major | An alarm has identified a condition that affects only the individual component. The condition might affect a system, but not enough to compromise the operation of the gateway.
critical | An alarm has identified a condition that affects both individual components and systems. The operation of the gateway is compromised or at risk.
indeterminate | Oracle ILOM is unable to provide an alarm state for this component.
(none) | The component or its alarm is not available to Oracle ILOM. (The component might have been removed.)
Related Information
■ “Display the General Alarm State of Systems and Components” on page 14
■ “System Alarm Targets” on page 15
■ “Component Alarm Targets” on page 15
Evaluating Sensor Alarms
These topics enable you to evaluate sensor information, to determine if an
unfavorable condition has occurred or will happen.
Step | Description | Links
1. | Identify a suspect sensor and display its value. | “Display Oracle ILOM Sensor Status” on page 18
2. | Determine the sensor target and alarm type. | “Determine Oracle ILOM Sensor Target Types” on page 20
3. | Evaluate the sensor type alarm. | “Evaluating a Voltage Sensor Alarm” on page 20; “Evaluating a Temperature Sensor Alarm” on page 23; “Evaluating a Speed Sensor Alarm” on page 26; “Evaluating a State Sensor Alarm” on page 29; “Evaluating a Presence Sensor Alarm” on page 30; “Evaluating an Indicator State” on page 32
Related Information
■ “Interpreting Status LEDs” on page 1
■ “Managing Faulty Components” on page 7
■ “Identify Faults in the Oracle ILOM Event Log” on page 12
■ “Determining the Alarm State of a Component or System” on page 13
■ “Accessing CLI Prompts” on page 34
▼ Display Oracle ILOM Sensor Status
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Type:
-> show / -a -l 4 -o table alarm_status
Target | Property | Value
3. Look in the Value column for minor, major, or critical.
For example, minor. For more information about alarm states, see “Oracle ILOM
Target Alarm States” on page 16.
4. Look in the same row under the Target column, to find the Oracle ILOM
sensor target.
For example, /SYS/MB/V_3.3VStby.
5. Display the value of the sensor target.
-> show target value
where target is the Oracle ILOM target for the sensor from Step 4. For example:
-> show /SYS/MB/V_3.3VStby value
/SYS/MB/V_3.3VStby
Properties:
value = 3.490 Volts
->
6. Record the target and value.
For example, /SYS/MB/V_3.3VStby and 3.490 volts.
7. Determine the sensor type.
See “Determine Oracle ILOM Sensor Target Types” on page 20.
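Step 6 asks you to record the target and its value. When the output of Step 5 is saved as text, the numeric reading and its unit can be extracted as follows. This Python sketch is illustrative only. The name sensor_value is hypothetical, and the sketch assumes a numeric reading in the form value = number unit, as in the example above. Non-numeric readings return None.

```python
import re

# A number, then an optional unit word on the same line (for example "Volts").
VALUE_LINE = re.compile(r"value\s*=\s*(-?\d+(?:\.\d+)?)[ \t]*([A-Za-z%]+)?")


def sensor_value(output):
    """Return (reading, unit) from the command output, or None if not numeric."""
    match = VALUE_LINE.search(output)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Sample text copied from the example in Step 5.
sample = """\
/SYS/MB/V_3.3VStby
Properties:
value = 3.490 Volts
"""

print(sensor_value(sample))
```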
Related Information
■ “Determine Oracle ILOM Sensor Target Types” on page 20
■ “Evaluating a Voltage Sensor Alarm” on page 20
■ “Evaluating a Temperature Sensor Alarm” on page 23
■ “Evaluating a Speed Sensor Alarm” on page 26
■ “Evaluating a State Sensor Alarm” on page 29
■ “Evaluating a Presence Sensor Alarm” on page 30
■ “Evaluating an Indicator State” on page 32
▼ Determine Oracle ILOM Sensor Target Types
● Use this table to determine the sensor type from its target and go to the
corresponding link.
The word string represents any string of characters, numbers, and symbols.
Sensor Target | Sensor Type | Links
/SYS/FANx/string | Fan state | “Evaluating a State Sensor Alarm” on page 29
/SYS/FANx/string | Fan speed | “Evaluating a Speed Sensor Alarm” on page 26
/SYS/FANx/string | Fan presence | “Evaluating a Presence Sensor Alarm” on page 30
/SYS/I_string | Indicator | “Evaluating an Indicator State” on page 32
/SYS/MB/T_string | Main board temperature | “Evaluating a Temperature Sensor Alarm” on page 23
/SYS/MB/V_stringOK | Main board voltage state | “Evaluating a State Sensor Alarm” on page 29
/SYS/MB/V_string | Main board voltage | “Evaluating a Voltage Sensor Alarm” on page 20
/SYS/MB/string | Main board system state | “Evaluating a State Sensor Alarm” on page 29
/SYS/PSUx/string | Power supply state | “Evaluating a State Sensor Alarm” on page 29
/SYS/PSUx/string | Power supply presence | “Evaluating a Presence Sensor Alarm” on page 30
/SYS/string | System state | “Evaluating a State Sensor Alarm” on page 29
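The lookup in this table can be expressed as ordered pattern matching, where the more specific patterns are tested first. The following Python sketch is one possible reading of the table, not an Oracle tool. It assumes that x in FANx and PSUx is a decimal number, and the function name is hypothetical. For fan and power supply targets the table lists several sensor types, so the sketch returns all of the candidates.

```python
import re

# Patterns are tested in order, so specific patterns must precede general ones.
# Assumption: the "x" in FANx and PSUx is a decimal number.
SENSOR_PATTERNS = [
    (r"/SYS/FAN\d+/.+", ["fan state", "fan speed", "fan presence"]),
    (r"/SYS/PSU\d+/.+", ["power supply state", "power supply presence"]),
    (r"/SYS/I_.+", ["indicator"]),
    (r"/SYS/MB/T_.+", ["main board temperature"]),
    (r"/SYS/MB/V_.+OK", ["main board voltage state"]),
    (r"/SYS/MB/V_.+", ["main board voltage"]),
    (r"/SYS/MB/.+", ["main board system state"]),
    (r"/SYS/.+", ["system state"]),
]


def sensor_types(target):
    """Return the candidate sensor types for an Oracle ILOM sensor target."""
    for pattern, types in SENSOR_PATTERNS:
        if re.fullmatch(pattern, target):
            return types
    return []


print(sensor_types("/SYS/MB/V_3.3VStby"))
```

Testing /SYS/MB/V_stringOK before /SYS/MB/V_string matters, because a voltage state target such as /SYS/MB/V_3.3VMainOK would otherwise be classified as a voltage sensor.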
Related Information
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Evaluating a Voltage Sensor Alarm” on page 20
■ “Evaluating a Temperature Sensor Alarm” on page 23
■ “Evaluating a Speed Sensor Alarm” on page 26
■ “Evaluating a State Sensor Alarm” on page 29
■ “Evaluating a Presence Sensor Alarm” on page 30
■ “Evaluating an Indicator State” on page 32
Evaluating a Voltage Sensor Alarm
These topics help you resolve voltage sensor alarms.
■ “Evaluate a Voltage Sensor” on page 21
■ “Voltage Sensor Values” on page 22
20Sun Network QDR InfiniBand Gateway Switch Service Manual for Firmware Version 2.1 • March 2013
■ “Voltage Out of Range” on page 22
Related Information
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
■ “Evaluating a Temperature Sensor Alarm” on page 23
■ “Evaluating a Speed Sensor Alarm” on page 26
■ “Evaluating a State Sensor Alarm” on page 29
■ “Evaluating a Presence Sensor Alarm” on page 30
■ “Evaluating an Indicator State” on page 32
▼ Evaluate a Voltage Sensor
1. Display the sensor status and determine the target type.
See:
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Voltage Sensor Values” on page 22.
3. Learn why a voltage sensor might alarm.
See “Voltage Out of Range” on page 22.
4. Determine your next step.
Voltage Sensor Target | Action | Links
/SYS/MB/V_3.3VMain | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/MB/V_3.3VStby | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/MB/V_12V | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/MB/V_BAT | Replace the battery. | “Servicing the Battery” on page 75
All other voltage sensor targets. | Replace the gateway. | “Remove the Gateway From the Rack” on page 77
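The next-step table reduces to a small lookup. The following Python sketch is illustrative only, and the function name is hypothetical. It groups /SYS/MB/V_12V with the two 3.3V targets, on the assumption that all three are supplied by the power supplies, as the note in “Voltage Out of Range” on page 22 states.

```python
# Targets whose voltages are provided by the power supplies (assumed grouping).
POWER_SUPPLY_TARGETS = {
    "/SYS/MB/V_3.3VMain",
    "/SYS/MB/V_3.3VStby",
    "/SYS/MB/V_12V",
}


def voltage_alarm_action(target):
    """Return the suggested action for an alarming voltage sensor target."""
    if target in POWER_SUPPLY_TARGETS:
        return "Replace the power supply."
    if target == "/SYS/MB/V_BAT":
        return "Replace the battery."
    return "Replace the gateway."  # all other voltage sensor targets


print(voltage_alarm_action("/SYS/MB/V_3.3VStby"))
```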
Related Information
■ “Voltage Sensor Values” on page 22
■ “Voltage Out of Range” on page 22
Voltage Sensor Values
This table lists typical values and acceptable ranges for the voltage sensors. You use
this table in conjunction with the target and value you recorded in “Display Oracle
ILOM Sensor Status” on page 18. If your voltage sensor’s value is near a boundary or
outside of the acceptable range, refer to “Voltage Out of Range” on page 22.
Voltage Sensor Target | Typical Value | Acceptable Range
/SYS/MB/V_3.3VMain | 3.266V | 3.112 to 3.403V
/SYS/MB/V_3.3VStby | 3.420V | 3.112 to 3.403V
/SYS/MB/V_12V | 11.966V | 11.346 to 12.338V
/SYS/MB/V_5V | 4.992V | 4.498 to 5.486V
/SYS/MB/V_BAT | 3.136V | 2.746V to N/A
/SYS/MB/V_1.0V | 1.006V | 0.877 to 1.158V
/SYS/MB/V_I41.2V | 1.217V | 1.041 to 1.392V
/SYS/MB/V_2.5V | 2.504V | 2.387 to 2.586V
/SYS/MB/V_V1P2DIG | 1.170V | 1.135 to 1.392V
/SYS/MB/V_V1P2ANG | 1.170V | 1.135 to 1.392V
/SYS/MB/V_BX1.2V | 1.217V | 1.041 to 1.392V
/SYS/MB/V_1.8V | 1.785V | 1.697 to 1.891V
/SYS/MB/V_1.2VStby | 1.193V | 1.048 to 1.387V
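Comparing a recorded value against this table can be automated. The following Python sketch copies the acceptable ranges from the table and classifies a reading. The function name is hypothetical. The sketch treats the N/A upper limit of /SYS/MB/V_BAT as "no upper limit", which is an assumption. It does not attempt to judge whether a value is "near a boundary", because the manual does not define a margin.

```python
# Acceptable ranges in volts, copied from the table above.
# None means that no limit is listed (shown as N/A in the table).
VOLTAGE_RANGES = {
    "/SYS/MB/V_3.3VMain": (3.112, 3.403),
    "/SYS/MB/V_3.3VStby": (3.112, 3.403),
    "/SYS/MB/V_12V": (11.346, 12.338),
    "/SYS/MB/V_5V": (4.498, 5.486),
    "/SYS/MB/V_BAT": (2.746, None),
    "/SYS/MB/V_1.0V": (0.877, 1.158),
    "/SYS/MB/V_I41.2V": (1.041, 1.392),
    "/SYS/MB/V_2.5V": (2.387, 2.586),
    "/SYS/MB/V_V1P2DIG": (1.135, 1.392),
    "/SYS/MB/V_V1P2ANG": (1.135, 1.392),
    "/SYS/MB/V_BX1.2V": (1.041, 1.392),
    "/SYS/MB/V_1.8V": (1.697, 1.891),
    "/SYS/MB/V_1.2VStby": (1.048, 1.387),
}


def check_voltage(target, volts):
    """Classify a reading as 'too low', 'too high', or 'in range'."""
    low, high = VOLTAGE_RANGES[target]
    if volts < low:
        return "too low"
    if high is not None and volts > high:
        return "too high"
    return "in range"


# The value recorded in "Display Oracle ILOM Sensor Status".
print(check_voltage("/SYS/MB/V_3.3VStby", 3.490))
```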
Related Information
■ “Evaluate a Voltage Sensor” on page 21
■ “Voltage Out of Range” on page 22
Voltage Out of Range
Even though all voltages within the chassis are regulated, situations can arise where
a voltage drifts outside of the acceptable range and goes too high or too low.
When a voltage is too high, it can be caused by:
■ The load for which the voltage is provided is missing – A component has failed or
has been removed from the electrical connection.
■ The regulator for that voltage has failed.
For example, if the voltage at sensor target /SYS/MB/V_I41.2V is too high, then
either the regulator is failing, or the I4 switch chip is no longer requiring the supplied
voltage. The latter situation can occur temporarily if the I4 switch chip is reset or if
all of its ports are disabled. If the I4 switch chip has a catastrophic failure, such as
from overheating, the voltage at the sensor target might go too high.
When a voltage is too low, it can be caused by:
■ The load for which the voltage is provided has increased beyond that supported
by the regulator – A component has been overloaded or has an internal
electrical short, its maximum internal temperature has been exceeded, or the
electrical connection has been shorted.
■ The regulator for that voltage has failed.
For example, if the voltage at sensor target /SYS/MB/V_I41.2V is too low, then
either the regulator is failing, or the I4 switch chip is under very heavy throughput
loading, quite possibly in conjunction with overheating.
Because both types of voltage extremes for the /SYS/MB/V_I41.2V sensor target can
indicate a thermal problem with the I4 switch chip, you should also check
the temperature at sensor target /SYS/MB/T_I4A.
Note – The 3.3VMain, 3.3VStby, and 12V voltages are provided redundantly by the
power supplies, which operate in parallel. If one of these voltages is either too high
or too low, one or both of the power supplies could be at fault. Because of this
configuration, you must recheck the 3.3VMain, 3.3VStby, and 12V voltages with only
one power supply operational at a time. Repeat “Display
Oracle ILOM Sensor Status” on page 18 with only the power cord for PSU0
disconnected, and then again with only the power cord for PSU1 disconnected.
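The two rechecks described in the note can be combined into a simple decision. The following Python sketch is an interpretation, not a procedure from this manual. It assumes that a voltage that is out of range while only one power supply is connected points to that power supply. The function and parameter names are hypothetical.

```python
def suspect_power_supplies(ok_with_psu0_unplugged, ok_with_psu1_unplugged):
    """Return the power supplies implicated by the two single-supply rechecks.

    Each argument is True if the voltage was within its acceptable range
    during the corresponding recheck.
    """
    suspects = []
    if not ok_with_psu0_unplugged:
        suspects.append("PSU1")  # PSU1 was the only supply operating
    if not ok_with_psu1_unplugged:
        suspects.append("PSU0")  # PSU0 was the only supply operating
    return suspects


# The voltage is fine on PSU1 alone, but out of range on PSU0 alone.
print(suspect_power_supplies(True, False))
```

An empty result means that neither recheck reproduced the alarm, in which case the rechecks did not isolate a power supply.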
Related Information
■ “Evaluate a Voltage Sensor” on page 21
■ “Voltage Sensor Values” on page 22
Evaluating a Temperature Sensor Alarm
These topics help you resolve temperature sensor alarms.
■ “Evaluate a Temperature Sensor” on page 24
■ “Temperature Sensor Values” on page 24
■ “Temperature Out of Range” on page 25
Related Information
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
■ “Evaluating a Voltage Sensor Alarm” on page 20
■ “Evaluating a Speed Sensor Alarm” on page 26
■ “Evaluating a State Sensor Alarm” on page 29
■ “Evaluating a Presence Sensor Alarm” on page 30
■ “Evaluating an Indicator State” on page 32
▼ Evaluate a Temperature Sensor
1. Display the sensor status and determine the target type.
See:
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Temperature Sensor Values” on page 24.
3. Learn why a temperature sensor might alarm and take action.
See “Temperature Out of Range” on page 25.
Related Information
■ “Temperature Sensor Values” on page 24
■ “Temperature Out of Range” on page 25
Temperature Sensor Values
This table lists typical values and acceptable ranges for the temperature sensors. You
use this table in conjunction with the target and value you recorded in “Display
Oracle ILOM Sensor Status” on page 18. If your temperature sensor’s value is near a
boundary or outside of the acceptable range, refer to “Temperature Out of Range” on
page 25.
Temperature Sensor Target | Typical Value | Acceptable Range
/SYS/MB/T_BACK | 30˚C | 25 to 70˚C
/SYS/MB/T_FRONT | 29˚C | 25 to 70˚C
/SYS/MB/T_SP | 45˚C | 25 to 60˚C
/SYS/MB/T_I4A | 39˚C | 25 to 70˚C
/SYS/MB/T_B0 | 48˚C | 25 to 70˚C
/SYS/MB/T_B1 | 49˚C | 25 to 70˚C
Related Information
■ “Evaluate a Temperature Sensor” on page 24
■ “Temperature Out of Range” on page 25
Temperature Out of Range
Temperatures within the chassis are regulated by the fans. For the fan cooling to be
effective, the intake room air temperature must be below 25˚C.
When a temperature is too high, it can be caused by:
■ Air flow is insufficient – The fan speeds are too slow, the fans have stopped
spinning, or a fan is missing altogether.
■ Cooling air temperature is too high – No component can be cooled to a
temperature lower than the cooling medium itself. Additionally, as the cooling air
temperature increases, the air’s ability to remove heat diminishes.
■ Heat generated within a component is greater than that removed – The cooling
system was designed for a certain power dissipated by the components. When
those components experience high computing or throughput loads, or are
subjected to overvoltage situations when a voltage regulator fails, they generate
more heat.
For example, if the temperature at sensor target /SYS/MB/T_I4A is too high, then
the fan speeds (/SYS/FANx/TACH) are collectively too low, the cooling air
temperature (/SYS/MB/T_FRONT) is too high, the voltage powering the I4 switch
chip (/SYS/MB/V_I41.2V) is too high, or the loading on the switch chip is too high.
When a temperature is too low, it is rarely a detrimental situation. There is one
exception: when the temperature of a component is the same as room temperature or
lower, the component is very likely not functioning as expected.
For example, if the temperature at sensor target /SYS/MB/T_I4A is too low, as
compared to the cooling air temperature (/SYS/MB/T_FRONT), then the I4 switch
chip is being held in a state of reset, the voltage for the I4 switch chip
(/SYS/MB/V_I41.2V) is not being provided, or the I4 switch chip has
catastrophically failed.
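The two /SYS/MB/T_I4A examples can be turned into a checklist. The following Python sketch is illustrative only. The limits are taken from the sensor value tables in this manual. Using the mean fan speed to decide whether fan speeds are "collectively too low" is an assumption, and the function names are hypothetical.

```python
FAN_MIN_RPM = 6322      # lower limit for /SYS/FANx/TACH
T_FRONT_MAX_C = 70      # upper limit for /SYS/MB/T_FRONT
V_I4_MAX_VOLTS = 1.392  # upper limit for /SYS/MB/V_I41.2V


def i4_overtemp_causes(fan_rpms, t_front_c, v_i4_volts):
    """List candidate causes when /SYS/MB/T_I4A reads too high."""
    causes = []
    # Assumption: "collectively too low" means the mean speed is below the limit.
    if sum(fan_rpms) / len(fan_rpms) < FAN_MIN_RPM:
        causes.append("fan speeds too low")
    if t_front_c > T_FRONT_MAX_C:
        causes.append("cooling air temperature too high")
    if v_i4_volts > V_I4_MAX_VOLTS:
        causes.append("I4 supply voltage too high")
    if not causes:
        # Nothing measurable is out of range, so only unmeasured causes remain.
        causes.append("switch chip load too high, or another unmeasured cause")
    return causes


def i4_suspiciously_cold(t_i4a_c, t_front_c):
    """Return True if the chip is no warmer than the cooling air."""
    return t_i4a_c <= t_front_c


print(i4_overtemp_causes([4000, 5000], 30, 1.217))
```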
Note – The gateway is not fitted with an air filter. Therefore, contaminants can enter
the gateway and adhere to cooling surfaces. The effect is two-fold: the contaminants
restrict the flow of cooling air to the components, and they behave as
insulators, retaining waste heat dissipated by the components. If supplied voltages,
cooling air temperatures, and fan speeds are within acceptable values, yet
component temperatures are high, contamination is probably severe.
When temperatures are out of range, the suggested action is to check the fans and
replace any that are not operating properly. See “Servicing Fans” on page 55. If new
fans do not resolve the problem, then replace the gateway.
Related Information
■ “Evaluate a Temperature Sensor” on page 24
■ “Temperature Sensor Values” on page 24
Evaluating a Speed Sensor Alarm
These topics help you resolve speed sensor alarms.
■ “Evaluate a Speed Sensor” on page 26
■ “Speed Sensor Values” on page 27
■ “Speed Out of Range” on page 27
Related Information
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
■ “Evaluating a Voltage Sensor Alarm” on page 20
■ “Evaluating a Temperature Sensor Alarm” on page 23
■ “Evaluating a State Sensor Alarm” on page 29
■ “Evaluating a Presence Sensor Alarm” on page 30
■ “Evaluating an Indicator State” on page 32
▼ Evaluate a Speed Sensor
1. Display the sensor status and determine the target type.
See:
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Speed Sensor Values” on page 27.
3. Learn why a speed sensor might alarm and take action.
See “Speed Out of Range” on page 27.
Related Information
■ “Speed Sensor Values” on page 27
■ “Speed Out of Range” on page 27
Speed Sensor Values
This table lists typical values and acceptable ranges for the speed sensors. You use
this table in conjunction with the target and value you recorded in “Display Oracle
ILOM Sensor Status” on page 18. If your speed sensor’s value is near a boundary or
outside of the acceptable range, refer to “Speed Out of Range” on page 27.
Speed Sensor Target | Typical Value | Acceptable Range or Value
/SYS/FANx/TACH | 12099 RPM | 6322 to 26705 RPM
Related Information
■ “Evaluate a Speed Sensor” on page 26
■ “Speed Out of Range” on page 27
Speed Out of Range
The speed of the fans is varied by the management controller. The management
controller uses an algorithm that considers the cooling air temperature, the number
of fans spinning, and the temperatures within the chassis, to set the speed of the fans.
Note – The management controller sets all fans of identical type to identical speeds,
and their speeds should not differ by more than 2000 RPM from each other. If one fan’s
speed differs by more than 2000 RPM from the average of the remaining identical fans,
that fan will fail soon and should be replaced.
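The comparison in the note can be computed from the recorded tachometer values. The following Python sketch compares each fan with the average of the remaining fans and flags any fan that differs by more than 2000 RPM. It assumes that all fans passed in are of identical type. The function name and the sample readings are hypothetical.

```python
def outlier_fans(rpm_by_fan, limit_rpm=2000):
    """Return the fans whose speed differs from the average of the
    remaining fans by more than limit_rpm."""
    outliers = []
    for fan, rpm in rpm_by_fan.items():
        others = [v for name, v in rpm_by_fan.items() if name != fan]
        if not others:
            continue  # a single fan has nothing to be compared with
        average = sum(others) / len(others)
        if abs(rpm - average) > limit_rpm:
            outliers.append(fan)
    return outliers


# Hypothetical readings, for illustration only.
readings = {
    "/SYS/FAN0/TACH": 12000,
    "/SYS/FAN1/TACH": 12100,
    "/SYS/FAN2/TACH": 15000,
}
print(outlier_fans(readings))
```

Each fan is excluded from its own average, so one badly deviating fan does not mask itself by shifting the mean.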
When a fan speed is too high, it is an indication of the condition of the fan, which, if
left unchecked, can be detrimental to the operation of the gateway. A too-high fan
speed can be caused by:
■ Internal failure – To regulate their speed, the fans use hall-effect sensors in an
internal feedback loop. If the sensor fails, the feedback loop opens, and the motor
overspeeds uncontrollably.
■ Other fan failure – The algorithm used by the management controller
compensates for a fan failure by increasing the speed of the remaining functional
fans.
■ Fan obstruction – If the fan intake is blocked, load on the fan is reduced, and the
fan overspeeds.
■ Temperatures too high – If any component temperatures are too high, the fans
spin faster.
■ Supply voltage too high – If the voltage at sensor target /SYS/MB/V_12V is too
high, the fans spin faster.
If a fan overspeeds for an extended time, it will fail. Consequently, insufficient
cooling air will be provided and the gateway will overheat.
When a fan speed is too low, it also is an indication of the condition of the fan, which
directly affects the operation of the gateway. A too-low fan speed can be caused by:
■ Coil failure – The fan motor uses alternating electromagnetic fields to spin the fan
impeller. Depending upon the fan motor design, if the coil that creates a magnetic
field fails, the fan might spin much slower, or not at all.
■ Controller failure – The controller alternates the electromagnetic fields to spin the
fan impeller. If the controller fails, the fan might not spin at all.
■ Bearing failure – The fan impeller is balanced on a bearing around which it spins.
The bearing is lubricated with an oil. If the bearing fails or the lubricant degrades,
the fan speed is reduced greatly.
■ Supply voltage too low – If the voltage at sensor target /SYS/MB/V_12V is too
low, the fans spin slower.
If the fan speed is too low, insufficient cooling air will be provided and the gateway
will overheat.
When fan speeds are out of range, the suggested action is to replace any fan that is
not operating properly. See “Servicing Fans” on page 55. If new fans do not resolve
the problem, then replace the gateway.
Related Information
■ “Evaluate a Speed Sensor” on page 26
■ “Speed Sensor Values” on page 27
Evaluating a State Sensor Alarm
These topics help you resolve state sensor alarms.
■ “Evaluate a State Sensor” on page 29
■ “State Sensor Alarm Conditions” on page 30
Related Information
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
■ “Evaluating a Voltage Sensor Alarm” on page 20
■ “Evaluating a Temperature Sensor Alarm” on page 23
■ “Evaluating a Speed Sensor Alarm” on page 26
■ “Evaluating a Presence Sensor Alarm” on page 30
■ “Evaluating an Indicator State” on page 32
▼ Evaluate a State Sensor
1. Display the sensor status and determine the target type.
See:
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
2. Learn why a state sensor might alarm.
See “State Sensor Alarm Conditions” on page 30.
3. Determine your next step.
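State Sensor Target | Action | Links
/SYS/CHASSIS_STATUS | Check other targets. | “Display Oracle ILOM Sensor Status” on page 18
/SYS/CABLE_ATTN | Replace the cable. | “Servicing Data Cables” on page 65
/SYS/CABLE_CONN_STAT | Replace the cable. | “Servicing Data Cables” on page 65
/SYS/MB/BAT_FAULT | Replace the battery. | “Servicing the Battery” on page 75
/SYS/MB/V_3.3VMainOK | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/POWER_ATTN | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/POWER_REDUN | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/PSUx/ALERT | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/PSUx/AC_PRESENT | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/PSUx/FAULT | Replace the power supply. | “Servicing Power Supplies” on page 41
/SYS/TEMP_ATTN | Replace the fan. | “Servicing Fans” on page 55
/SYS/COOLING_ATTN | Replace the fan. | “Servicing Fans” on page 55
/SYS/COOLING_REDUN | Replace the fan. | “Servicing Fans” on page 55
/SYS/FANx/FAULT | Replace the fan. | “Servicing Fans” on page 55
/SYS/MB/BOOT_I4A | Check the I4 switch chip. | Refer to Gateway Administration, resetting a port.
/SYS/IBDEV_ATTN | Check the I4 switch chip. | Refer to Gateway Administration, resetting a port.
All other state sensors. | Replace the gateway. | “Remove the Gateway From the Rack” on page 77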
Related Information
■ “State Sensor Alarm Conditions” on page 30
State Sensor Alarm Conditions
The gateway has many sensors that report the state of a voltage, a component, or
the system, such as a fault condition or the presence of a voltage. In an acceptable
state, the state sensors report a
value of State Deasserted, meaning no error. When a voltage, component, or
system goes to a detrimental state, the state sensors report a value of State Asserted.
For example, when the state of sensor target /SYS/FAN1/FAULT is State Asserted, there is a problem with fan 1.
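When you process recorded state sensor values in a script, the two values map directly to a Boolean. The following Python sketch is illustrative only, and the function name is hypothetical. It ignores spacing and letter case in the recorded value, which is an assumption made so that variations in how the value was transcribed do not matter.

```python
def state_sensor_fault(value):
    """Return True for State Asserted and False for State Deasserted."""
    normalized = value.replace(" ", "").lower()
    if normalized == "stateasserted":
        return True
    if normalized == "statedeasserted":
        return False
    raise ValueError("not a state sensor value: %r" % value)


print(state_sensor_fault("State Deasserted"))
```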
Related Information
■ “Evaluate a State Sensor” on page 29
Evaluating a Presence Sensor Alarm
These topics help you resolve presence sensor alarms.
■ “Evaluate a Presence Sensor” on page 31
■ “Presence Sensor Alarm Conditions” on page 31
Related Information
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
■ “Evaluating a Voltage Sensor Alarm” on page 20
■ “Evaluating a Temperature Sensor Alarm” on page 23
■ “Evaluating a Speed Sensor Alarm” on page 26
■ “Evaluating a State Sensor Alarm” on page 29
■ “Evaluating an Indicator State” on page 32
▼ Evaluate a Presence Sensor
1. Display the sensor status and determine the target type.
See:
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
2. Learn why a presence sensor might alarm and take action.
See “Presence Sensor Alarm Conditions” on page 31.
Related Information
■ “Presence Sensor Alarm Conditions” on page 31
Presence Sensor Alarm Conditions
The presence sensors for the power supplies and fans indicate that the component is
physically installed. The sensors do not provide status or health of a component.
During the boot process, the management controller looks for presence sensors to
build a list of Oracle ILOM targets. If the presence sensor cannot be read, yet the
component is physically installed, the management controller does not propagate the
component to the list of targets. Even if the component powers up, so long as it is
invisible to the management controller, the component cannot be used.
If a presence sensor alarms while a component is functional, the management
controller functions as if the component were removed from the chassis. This
situation might cause a fault on the component. If the lack of the component violates
a configuration rule, the chassis Attention LED might illuminate.
When a component is identified as not present, but it is installed, the suggested
action is to replace that component. See “Servicing Fans” on page 55 or “Servicing
Power Supplies” on page 41. If the known good replacement is still identified as not
present, replace the gateway.
Related Information
■ “Evaluate a Presence Sensor” on page 31
Evaluating an Indicator State
These topics help you resolve indicator state alarms.
■ “Evaluate an Indicator State” on page 32
■ “Indicator State Values” on page 33
■ “Indicator State Conditions” on page 33
Related Information
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
■ “Evaluating a Voltage Sensor Alarm” on page 20
■ “Evaluating a Temperature Sensor Alarm” on page 23
■ “Evaluating a Speed Sensor Alarm” on page 26
■ “Evaluating a State Sensor Alarm” on page 29
■ “Evaluating a Presence Sensor Alarm” on page 30
▼ Evaluate an Indicator State
1. Display the sensor status and determine the target type.
See:
■ “Display Oracle ILOM Sensor Status” on page 18
■ “Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Indicator State Values” on page 33.
3. Learn why an indicator might change state and take action.
See “Indicator State Conditions” on page 33.
Related Information
■ “Indicator State Values” on page 33
■ “Indicator State Conditions” on page 33
Indicator State Values
This table lists typical values and acceptable ranges for the indicator targets. The
indicator targets report the state of the chassis status LEDs. You use this table in
conjunction with the value you recorded in “Display Oracle ILOM Sensor Status” on
page 18. If your indicator target’s value is outside of the acceptable range, refer to
“Indicator State Conditions” on page 33.
Indicator Target | Typical Value | Acceptable Value
/SYS/I_LOCATOR | Off | On or Off
/SYS/I_ATTENTION | Off | Off
/SYS/I_POWER | On | On
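The acceptable values in this table can be checked with a small lookup. The following Python sketch is illustrative only, and the function name is hypothetical.

```python
# Acceptable values for each indicator target, copied from the table above.
ACCEPTABLE_INDICATOR_VALUES = {
    "/SYS/I_LOCATOR": {"On", "Off"},
    "/SYS/I_ATTENTION": {"Off"},
    "/SYS/I_POWER": {"On"},
}


def indicator_ok(target, value):
    """Return True if the recorded value is acceptable for the target."""
    return value in ACCEPTABLE_INDICATOR_VALUES[target]


print(indicator_ok("/SYS/I_ATTENTION", "On"))
```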
Related Information
■ “Evaluate an Indicator State” on page 32
■ “Indicator State Conditions” on page 33
Indicator State Conditions
Three primary LED indicators provide management controller status, general chassis
status, and identification. The table correlates the indicator target with the LED that
represents that target.
Indicator Sensor Target | LED
/SYS/I_LOCATOR | Locator
/SYS/I_ATTENTION | Attention
/SYS/I_POWER | OK
When the locator LED is on, it is actually flashing. If the gateway is installed into a
relatively dense rack, the flashing action makes the gateway more conspicuous for
identification.
When the Attention LED is on, it indicates a fault within the gateway chassis. There
is no single fault type that causes the Attention LED to light, so when it is
illuminated, you must determine why.
When the OK LED is off, either the gateway is starting up or it is
completely powered off. If the gateway is in neither state, yet the OK LED is off,
there is a fault with the management controller, and the situation requires further
investigation.
See “Check Chassis Status LEDs” on page 4 and “Display Oracle ILOM Sensor
Status” on page 18 to help determine the alarm condition of the gateway.
Related Information
■ “Evaluate an Indicator State” on page 32
■ “Indicator State Values” on page 33
Accessing CLI Prompts
These tasks enable you to issue Oracle ILOM and restricted shell commands on the
management controller.
■ “Access the Oracle ILOM CLI (NET MGT Port)” on page 35
■ “Enter the Restricted Linux Shell” on page 35
■ “Exit the Restricted Linux Shell” on page 36
Related Information
■ “Interpreting Status LEDs” on page 1
■ “Managing Faulty Components” on page 7
■ “Identify Faults in the Oracle ILOM Event Log” on page 12
■ “Determining the Alarm State of a Component or System” on page 13
■ “Evaluating Sensor Alarms” on page 17
▼ Access the Oracle ILOM CLI (NET MGT Port)
1. If you have not already done so, configure the DHCP server with the MAC
address and new host name of the management controller inside of the gateway.
The MAC address is printed on the customer information (yellow) sheet on the
outside of the gateway shipping carton and on the pull-out tab on the left side
front of the gateway, adjacent to power supply 0.
2. Open an SSH session and connect to the management controller by specifying
the controller’s host name.
where nm2name is the host name of the management controller. Initially, the
password is ilom-admin.
Note – You can change the password at a later time. Refer to Gateway Remote
Management, changing a user role or password, for instructions on how to change
Oracle ILOM user passwords.
The Oracle ILOM shell prompt (->) is displayed.
Related Information
■ “Enter the Restricted Linux Shell” on page 35
■ “Exit the Restricted Linux Shell” on page 36
▼ Enter the Restricted Linux Shell
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Enter the restricted Linux shell.
-> show /SYS/Fabric_Mgmt
NOTE: show on Fabric_Mgmt will launch a restricted Linux shell.
User can execute switch diagnosis, SM Configuration and IB
monitoring commands in the shell. To view the list of commands,
use "help" at rsh prompt.
Use exit command at rsh prompt to revert back to
ILOM shell.
FabMan@gateway_name->
The restricted shell prompt (FabMan@gateway_name->) is displayed, and you can
now issue hardware and InfiniBand commands.
When you want to leave the restricted shell, type the exit command.
Related Information
■ “Access the Oracle ILOM CLI (NET MGT Port)” on page 35
■ “Exit the Restricted Linux Shell” on page 36
▼ Exit the Restricted Linux Shell
When you want to leave the restricted shell, use the exit command.
● On the management controller, type:
FabMan@gateway_name->exit
exit
->
Related Information
■ “Access the Oracle ILOM CLI (NET MGT Port)” on page 35
■ “Enter the Restricted Linux Shell” on page 35
Understanding Service Procedures
Servicing the gateway means a component addition, replacement, or subtraction.
A component addition means installing a component to increase the functionality of
the gateway. Component replacement means removing a failed component and
installing a functional one. Component subtraction means removing a component.
Once a failed part is identified, it can be replaced. The topics listed here help you
service gateway chassis components.
■ “Replaceable Components” on page 37
■ “Suggested Tools for Service” on page 39
■ “Antistatic Precautions for Service” on page 39
Related Information
■ “Detecting and Managing Faults” on page 1
■ “Servicing Power Supplies” on page 41
■ “Servicing Fans” on page 55
■ “Servicing Data Cables” on page 65
■ “Servicing the Battery” on page 75
Replaceable Components
This illustration identifies the replaceable components of the gateway.
FIGURE: Replaceable Components
Figure Legend
1 Battery
2 Fan
3 Power supply
Related Information
■ “Servicing Power Supplies” on page 41
■ “Servicing Fans” on page 55
■ “Servicing Data Cables” on page 65
■ “Servicing the Battery” on page 75
■ “Suggested Tools for Service” on page 39
■ “Antistatic Precautions for Service” on page 39
Suggested Tools for Service
These tools are necessary or beneficial for servicing the gateway:
■ Antistatic wrist strap
■ Antistatic mat
■ No. 2 Phillips screwdriver
■ No. 1 Phillips screwdriver
■ Flashlight
■ Gloves
■ Magnifying glass
Related Information
■ “Replaceable Components” on page 37
■ “Antistatic Precautions for Service” on page 39
Antistatic Precautions for Service
When installing the gateway chassis, take care to follow antistatic precautions:
■ Use an antistatic mat as a work surface.
■ Wear an antistatic wrist strap that is attached to either the mat or a metal portion
of the gateway chassis.
Related Information
■ “Replaceable Components” on page 37
■ “Suggested Tools for Service” on page 39
Servicing Power Supplies
These topics provide procedures for servicing the power supplies.
Description | Links
Add a power supply. | “Inspecting a Power Supply” on page 43
 | “Install a Power Supply” on page 49
 | “Power On a Power Supply” on page 51
Replace a power supply. | “Determine If a Power Supply Is Faulty” on page 41
 | “Power Off a Power Supply” on page 46
 | “Remove a Power Supply” on page 47
 | “Inspecting a Power Supply” on page 43
 | “Install a Power Supply” on page 49
 | “Power On a Power Supply” on page 51
Subtract a power supply. | “Power Off a Power Supply” on page 46
 | “Remove a Power Supply” on page 47
Related Information
■ “Detecting and Managing Faults” on page 1
■ “Understanding Service Procedures” on page 37
■ “Servicing Fans” on page 55
■ “Servicing Data Cables” on page 65
■ “Servicing the Battery” on page 75
▼ Determine If a Power Supply Is Faulty
You must determine which power supply is faulty before you replace it.
1. Check to see if any System Service Required LEDs are lit or flashing.
See “Check Chassis Status LEDs” on page 4.
2. Visually inspect the power supplies to see if any of their status LEDs are lit or
flashing.
See “Check Power Supply Status LEDs” on page 6.
If a power supply is faulty, replace it. See “Remove a Power Supply” on page 47.
3. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
4. Verify that a power supply is faulty.
-> show -d targets /SP/faultmgmt
If a power supply is faulty, you will see /SYS/PSUx listed in the output under
Targets:, where x is 0 (left power supply) or 1 (right power supply).
For example:
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/PSU0)
->
If a power supply is faulty, replace it. See “Remove a Power Supply” on page 47.
If a FRU value in addition to or different from /SYS/PSUx is displayed, see
“Clearable Fault Targets” on page 11 to identify which component is faulty.
If no Oracle ILOM targets are listed, go to Step 5.
5. If you are unable to determine if a power supply is faulty, seek further
information.
See “Detecting and Managing Faults” on page 1.
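If you capture the output of Step 4 to a text file, the faulty targets can be extracted with a short script. The following Python sketch is illustrative only. It assumes that each faulty target appears on its own line in the form shown in the example, a number followed by the target in parentheses. The function names are hypothetical.

```python
import re

# The example output from Step 4.
SAMPLE = """\
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/PSU0)
->
"""


def faulty_targets(show_output):
    """Return the targets listed as 'N (/SYS/...)' in the captured output."""
    return re.findall(r"^[ \t]*\d+[ \t]+\((/\S+)\)[ \t]*$", show_output,
                      flags=re.MULTILINE)


def faulty_power_supplies(show_output):
    """Return only the power supply targets, /SYS/PSU0 or /SYS/PSU1."""
    return [target for target in faulty_targets(show_output)
            if re.fullmatch(r"/SYS/PSU[01]", target)]


print(faulty_power_supplies(SAMPLE))
```

Any target that faulty_targets returns but faulty_power_supplies does not is a component other than a power supply. Identify it as described in “Clearable Fault Targets” on page 11.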
Related Information
■ “Determine If a Fan Is Faulty” on page 55
■ “Determine If the Battery Is Faulty” on page 75
Inspecting a Power Supply
Before installing a power supply, perform these tasks to verify its suitability for
installation.
Step | Description | Links
1. | Identify the power supply. | “Identify the Power Supply” on page 43
2. | Inspect the hardware. | “Inspect the Power Supply Hardware” on page 45
3. | Inspect the connectors. | “Inspect the Power Supply Connectors” on page 45
Related Information
■ “Inspecting a Fan” on page 57
■ “Inspecting the Data Cables” on page 65
▼ Identify the Power Supply
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Power Supply” on page 43.
2. Use this illustration to identify the various features of a power supply.
1 AC connector
2 Release tab
3 Status LEDs
3. Inspect the power supply hardware.
See “Inspect the Power Supply Hardware” on page 45.
Related Information
■ “Identify the Fan” on page 57
■ “Identify the Data Cable” on page 66
▼ Inspect the Power Supply Hardware
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Power Supply” on page 43.
2. Unwrap the replacement power supply from its antistatic packaging.
3. Verify that there is no visible damage to the power supply chassis.
4. Verify that the release tab moves freely and smoothly.
5. Inspect the power supply connectors.
See “Inspect the Power Supply Connectors” on page 45.
Related Information
■ “Inspect the Fan Hardware” on page 58
■ “Inspect the Data Cable Hardware” on page 67
▼ Inspect the Power Supply Connectors
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Power Supply” on page 43.
2. Verify that the connectors are clean and without damage.
3. The power supply is ready for installation.
See “Install a Power Supply” on page 49.
Related Information
■ “Inspect the Fan Connector” on page 59
■ “Inspect the Data Cable Connectors or Transceivers” on page 67
▼Power Off a Power Supply
Note – Powering off both power supplies powers off the gateway.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
2. Determine which power supply is to be removed.
3. At the front of the gateway chassis, remove the power cord from the respective
power supply.
The power supply is completely powered off.
4. Remove the power supply.
See “Remove a Power Supply” on page 47.
Related Information
■ “Power On a Power Supply” on page 51
▼Remove a Power Supply
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
2. Locate the power supply to be removed.
3. Press and hold the release tab to the left and pull on the handle of the power
supply.
4. Continue to pull the handle of the power supply to remove it from the chassis.
5. Set the power supply aside.
6. Install a replacement power supply.
See “Install a Power Supply” on page 49.
Related Information
■ “Remove a Fan” on page 60
■ “Remove a Data Cable” on page 68
■ “Remove the Gateway From the Rack” on page 77
■ “Replace the Battery” on page 78
▼Install a Power Supply
Note – To allow residual power to discharge, leave the power supply slot vacant for
at least one minute before you install a power supply.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
2. Inspect the replacement power supply.
See “Inspecting a Power Supply” on page 43.
3. Verify that the slot where the power supply installs is clean and free of debris.
4. Verify that the slot connector pins are straight and not missing.
5. Verify that the slot connector receptacles are free from obstructions.
6. Orient the power supply to the opening in the gateway chassis with the status
LEDs on the left and the release tab on the right.
7. Slide the power supply into the open slot, pushing at the handle.
8. When the power supply seats, push firmly so that the release tab clicks to secure
the power supply into the chassis.
9. Power on the power supply.
See “Power On a Power Supply” on page 51.
Related Information
■ “Install a Fan” on page 61
■ “Install a Data Cable” on page 72
■ “Replace the Battery” on page 78
▼Power On a Power Supply
1. To allow residual power to discharge, verify that the power cord has been
detached from the power supply for at least one minute before you power on the
power supply.
2. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
3. Reconnect the power cord to the power supply.
The AC LED lights green to indicate that the power supply is connected to facility
power. A moment later, the OK LED lights green to indicate the power supply is at
full power.
4. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
5. Enter the restricted Linux shell.
See “Enter the Restricted Linux Shell” on page 35.
6. Verify the power supply’s operation with the checkpower and checkvoltages
commands on the management controller.
For example, to check the power supplies:
FabMan@gateway_name->checkpower
PSU 0 present status: OK
PSU 1 present status: OK
All PSUs OK
FabMan@gateway_name->
FabMan@gateway_name->checkvoltages
Voltage ECB OK
Measured 3.3V Main = 3.30 V
Measured 3.3V Standby = 3.42 V
Measured 12V = 12.06 V
Measured 5V = 5.03 V
Measured VBAT = 3.17 V
Measured 1.0V = 1.01 V
Measured I4 1.2V = 1.22 V
Measured 2.5V = 2.51 V
Measured V1P2 DIG = 1.18 V
Measured V1P2 ANG = 1.18 V
Measured 1.2V BridgeX = 1.22 V
Measured 1.8V = 1.80 V
Measured 1.2V Standby = 1.20 V
All voltages OK
FabMan@gateway_name->
Related Information
■ Gateway Reference, checkpower command
■ Gateway Reference, checkvoltages command
■ “Power Off a Power Supply” on page 46
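When the verification in Step 6 is repeated often, the checkpower and checkvoltages output can be checked by a small script. The following Python sketch is one possible approach under stated assumptions: the function names are hypothetical, and the line formats are taken only from the example output in this procedure, so other firmware versions might print different text.

```python
import re

# Line formats assumed from the example output in this procedure.
PSU_LINE = re.compile(r"^PSU\s+(\d+)\s+present status:\s*(\S+)$")
VOLT_LINE = re.compile(r"^Measured\s+(.+?)\s*=\s*([0-9.]+)\s*V$")


def parse_checkpower(output):
    """Return {psu_number: status}, e.g. {0: "OK", 1: "OK"}."""
    status = {}
    for line in output.splitlines():
        match = PSU_LINE.match(line.strip())
        if match:
            status[int(match.group(1))] = match.group(2)
    return status


def parse_checkvoltages(output):
    """Return ({rail_name: volts}, True if the summary line is present)."""
    rails = {}
    summary_ok = False
    for line in output.splitlines():
        line = line.strip()
        match = VOLT_LINE.match(line)
        if match:
            rails[match.group(1)] = float(match.group(2))
        elif line == "All voltages OK":
            summary_ok = True
    return rails, summary_ok


# Sample text copied (in part) from the example output.
POWER = """\
PSU 0 present status: OK
PSU 1 present status: OK
All PSUs OK
"""
VOLTAGES = """\
Voltage ECB OK
Measured 3.3V Main = 3.30 V
Measured 12V = 12.06 V
Measured VBAT = 3.17 V
All voltages OK
"""

print(parse_checkpower(POWER))  # {0: 'OK', 1: 'OK'}
rails, summary_ok = parse_checkvoltages(VOLTAGES)
print(rails["12V"], summary_ok)  # 12.06 True
```

The sketch only extracts what the commands report. It does not judge whether a voltage is acceptable; for that, compare the readings with the documented sensor ranges.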
Servicing Fans
These topics provide procedures for servicing the fans.
Description      Links
Add a fan.       “Inspecting a Fan” on page 57
                 “Install a Fan” on page 61
Replace a fan.   “Determine If a Fan Is Faulty” on page 55
                 “Remove a Fan” on page 60
                 “Inspecting a Fan” on page 57
                 “Install a Fan” on page 61
Subtract a fan.  “Remove a Fan” on page 60
Related Information
■ “Detecting and Managing Faults” on page 1
■ “Understanding Service Procedures” on page 37
■ “Servicing Power Supplies” on page 41
■ “Servicing Data Cables” on page 65
■ “Servicing the Battery” on page 75
▼Determine If a Fan Is Faulty
You must determine which fan is faulty before you replace it.
1. Check to see if any System Service Required LEDs are lit or flashing.
See “Check Chassis Status LEDs” on page 4.
2. Visually inspect the fans to see if any of their status LEDs are lit.
See “Check Fan Status LEDs” on page 7.
If a fan is faulty, replace it. See “Remove a Fan” on page 60.
3. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
4. Verify that a fan is faulty.
-> show -d targets /SP/faultmgmt
If a fan is faulty, you will see /SYS/FANx listed in the output under Targets:,
where x ranges from 0 (left fan) to 4 (right fan).
For example:
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/FAN2)
->
If a fan is faulty, replace it. See “Remove a Fan” on page 60.
If a FRU value in addition to or different from /SYS/FANx is displayed, see
“Clearable Fault Targets” on page 11 to identify which component is faulty.
If no Oracle ILOM targets are listed, go to Step 5.
5. Within the Oracle ILOM interface, verify the fan speed.
-> show /SYS/FANx/TACH value
where x is 0 (left fan) to 4 (right fan). For example:
-> show /SYS/FAN2/TACH value
/SYS/FAN2/TACH
Properties:
value = 12317.000 RPM
->
6. Compare the value seen with the typical value and range provided in “Speed
Sensor Values” on page 27.
If the fan is faulty, replace it. See “Remove a Fan” on page 60.
7. If you are unable to determine if a fan is faulty, seek further information.
See “Detecting and Managing Faults” on page 1.
Related Information
■ “Determine If a Power Supply Is Faulty” on page 41
■ “Determine If the Battery Is Faulty” on page 75
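The comparison in Steps 5 and 6 can be sketched in Python as follows. This is a minimal illustration, not an Oracle tool. The function names are hypothetical, and the low and high limits in the usage lines are placeholders only. Take the real limits from “Speed Sensor Values”.

```python
import re

# Assumption: the property is printed as "value = <number> <unit>",
# as in the example output for /SYS/FAN2/TACH.
VALUE_LINE = re.compile(r"^value\s*=\s*([0-9.]+)\s+(\S+)$")


def parse_sensor_value(output):
    """Return (reading, unit) from 'show <sensor> value' output."""
    for line in output.splitlines():
        match = VALUE_LINE.match(line.strip())
        if match:
            return float(match.group(1)), match.group(2)
    raise ValueError("no 'value = ...' line found in output")


def within_range(reading, low, high):
    """True if the reading lies inside the documented range (inclusive)."""
    return low <= reading <= high


# Sample text copied from the example in Step 5.
SAMPLE = """\
-> show /SYS/FAN2/TACH value
/SYS/FAN2/TACH
Properties:
value = 12317.000 RPM
->
"""

rpm, unit = parse_sensor_value(SAMPLE)
print(rpm, unit)  # 12317.0 RPM

# Placeholder limits for illustration only, not the manual's values.
print(within_range(rpm, low=10000.0, high=14000.0))
```

The same parser also reads other ILOM sensor properties that use the value = number unit form.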
Inspecting a Fan
Before installing a fan, inspect its hardware and connector to verify its suitability for
installation.
Step  Description             Links
1.    Identify the fan.       “Identify the Fan” on page 57
2.    Inspect the hardware.   “Inspect the Fan Hardware” on page 58
3.    Inspect the connector.  “Inspect the Fan Connector” on page 59
Related Information
■ “Inspecting a Power Supply” on page 43
■ “Inspecting the Data Cables” on page 65
▼ Identify the Fan
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Fan” on page 57.
2. Use this illustration to identify the various features of a fan.
1  Thumbscrew
2  Status LED
3. Inspect the fan hardware.
See “Inspect the Fan Hardware” on page 58.
Related Information
■ “Identify the Power Supply” on page 43
■ “Identify the Data Cable” on page 66
▼ Inspect the Fan Hardware
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Fan” on page 57.
2. Unwrap the replacement fan from its antistatic packaging.
3. Verify that there is no visible damage to the fan chassis.
4. Verify that the thumbscrew spins freely and smoothly.
5. Inspect the fan connector.
See “Inspect the Fan Connector” on page 59.
Related Information
■ “Inspect the Power Supply Hardware” on page 45
■ “Inspect the Data Cable Hardware” on page 67
▼ Inspect the Fan Connector
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Fan” on page 57.
2. Verify that the connector is clean and without damage.
3. Verify that the connector receptacles are free from obstructions.
4. Verify that the connector freely floats in its mounting.
5. The fan is ready for installation.
See “Install a Fan” on page 61.
Related Information
■ “Inspect the Power Supply Connectors” on page 45
■ “Inspect the Data Cable Connectors or Transceivers” on page 67
▼Remove a Fan
Note – Fans are hot-swappable, so you do not need to power off the gateway to
remove a fan. However, if fewer than two fans are operational, the gateway shuts
down to prevent thermal overload.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Fans” on page 55.
2. Determine which fan is to be removed.
If a fan has failed, its Attention LED lights.
3. Loosen the captive thumbscrew at the right side of the fan.
4. Grasp the handle and pull the fan straight out.
5. Set the fan aside.
6. Consider your next steps:
■ If you are removing the fan for replacement, install a new fan.
See “Install a Fan” on page 61.
■ If you are removing the fan as a subtractive action, you are finished.
Related Information
■ “Remove a Power Supply” on page 47
■ “Remove a Data Cable” on page 68
■ “Remove the Gateway From the Rack” on page 77
■ “Replace the Battery” on page 78
▼Install a Fan
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Fans” on page 55.
2. Inspect the replacement fan.
See “Inspecting a Fan” on page 57.
3. Verify that the slot where the fan installs is clean and free of debris.
4. Verify that the slot connector pins are straight and not missing.
5. Orient the fan to the opening in the gateway chassis with the thumbscrew on
the right.
6. Firmly slide the fan into the chassis until the fan stops.
The fan might immediately power on.
7. Tighten the captive thumbscrew to secure the fan in the gateway chassis.
8. Verify that the fan Attention LED goes out.
9. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
10. Enter the restricted Linux shell.
See “Enter the Restricted Linux Shell” on page 35.
11. Use the getfanspeed command on the management controller to verify the
fan’s operation.
Note – You should see a fan speed for the fan you just installed.
For example, to check the fans:
FabMan@gateway_name->getfanspeed
Fan 0 not present
Fan 1 running at rpm 11212
Fan 2 running at rpm 11313
Fan 3 running at rpm 11521
Fan 4 not present
FabMan@gateway_name->
Related Information
■ Gateway Reference, getfanspeed command
■ “Install a Power Supply” on page 49
■ “Install a Data Cable” on page 72
■ “Replace the Battery” on page 78
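The getfanspeed check in Step 11 can also be scripted. The following Python sketch is a hedged illustration. The function names are hypothetical, and the line formats are assumed from the example output only. The leaves_two_running helper reflects one reading of the note in “Remove a Fan” that the gateway shuts down with fewer than two operational fans. It treats any fan that reports a speed as operational, which is an assumption.

```python
import re

# Line formats assumed from the example output:
#   "Fan 1 running at rpm 11212"  or  "Fan 0 not present"
FAN_LINE = re.compile(r"^Fan\s+(\d+)\s+(?:running at rpm\s+(\d+)|not present)$")


def parse_getfanspeed(output):
    """Return {slot: rpm}, with None for slots reported as not present."""
    fans = {}
    for line in output.splitlines():
        match = FAN_LINE.match(line.strip())
        if match:
            rpm = match.group(2)
            fans[int(match.group(1))] = int(rpm) if rpm is not None else None
    return fans


def running_fans(fans):
    """Return the sorted slot numbers that report a speed."""
    return sorted(slot for slot, rpm in fans.items() if rpm is not None)


def leaves_two_running(fans, slot):
    """True if at least two fans still report a speed once slot is removed."""
    remaining = [s for s in running_fans(fans) if s != slot]
    return len(remaining) >= 2


# Sample text copied from the example output.
SAMPLE = """\
Fan 0 not present
Fan 1 running at rpm 11212
Fan 2 running at rpm 11313
Fan 3 running at rpm 11521
Fan 4 not present
"""

fans = parse_getfanspeed(SAMPLE)
print(running_fans(fans))           # [1, 2, 3]
print(leaves_two_running(fans, 2))  # True
```

After installing a fan, the slot number of the new fan should appear in the list that running_fans returns.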
Servicing Data Cables
These topics provide procedures for servicing the data cables.
Description             Links
Add a data cable.       “Inspecting the Data Cables” on page 65
                        “Install a Data Cable” on page 72
Replace a data cable.   “Remove a Data Cable” on page 68
                        “Inspecting the Data Cables” on page 65
                        “Install a Data Cable” on page 72
Subtract a data cable.  “Remove a Data Cable” on page 68
Related Information
■ “Detecting and Managing Faults” on page 1
■ “Understanding Service Procedures” on page 37
■ “Servicing Power Supplies” on page 41
■ “Servicing Fans” on page 55
■ “Servicing the Battery” on page 75
Inspecting the Data Cables
Before installing a data cable, inspect its hardware and connectors to verify its
suitability for installation.
Step  Description              Links
1.    Identify the cable.      “Identify the Data Cable” on page 66
2.    Inspect the hardware.    “Inspect the Data Cable Hardware” on page 67
3.    Inspect the connectors.  “Inspect the Data Cable Connectors or Transceivers” on page 67
Related Information
■ “Inspecting a Power Supply” on page 43
■ “Inspecting a Fan” on page 57
▼ Identify the Data Cable
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting the Data Cables” on page 65.
2. Use this illustration to identify the various features of the data cable.
1  Retraction strap
2  L groove
3  Paddle board
3. Inspect the data cable hardware.
See “Inspect the Data Cable Hardware” on page 67.
Related Information
■ “Identify the Power Supply” on page 43
■ “Identify the Fan” on page 57
▼ Inspect the Data Cable Hardware
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting the Data Cables” on page 65.
2. Verify that the cable is not cut or damaged.
3. Verify that the cable is not kinked or folded.
4. Verify that the cable is of the correct type from its label.
5. Inspect the cable connectors or transceivers.
See “Inspect the Data Cable Connectors or Transceivers” on page 67.
Related Information
■ “Inspect the Power Supply Hardware” on page 45
■ “Inspect the Fan Hardware” on page 58
▼ Inspect the Data Cable Connectors or
Transceivers
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting the Data Cables” on page 65.
2. Verify that the shell is not bent and is parallel to the inner boards.
3. Verify that there are no contaminants inside of the connector or transceiver.
4. Verify that the retraction strap or latch is adequate to remove the connector or
transceiver from the receptacle.
5. Identify the reference surface by the L groove in the surface at the connector tip.
6. The cable or transceiver is ready for installation.
See “Install a Data Cable” on page 72.
Related Information
■ “Inspect the Power Supply Connectors” on page 45
■ “Inspect the Fan Connector” on page 59
▼Remove a Data Cable
This procedure describes how to remove a data cable from the gateway chassis so that
the cable can be replaced. If you are removing all cables for gateway replacement,
start at the left side of the gateway and work your way to the right.
Note – These instructions are valid for both InfiniBand and Ethernet data cables.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Data Cables” on page 65.
2. Loosen the thumbscrews and remove the cover for the cable management
bracket.
3. Locate the cable to be removed.
4. Consider your next steps:
■ If the cable is a one-piece data cable, follow these steps:
a. Grasp the cable connector to support its weight and apply the removal
force.
b. Pull on the retraction strap while simultaneously pulling on the cable
connector.
The cable connector comes free.
c. Carefully move the cable out of the cable management hardware.
d. Continue to Step 5.
■ If the cable is an assembled data cable, follow these steps:
a. Grasp the release collar on the MTP connector and pull back.
The MTP connector and fiber optic cable come free of the transceiver.
b. Carefully move the fiber optic cable out of the cable management
hardware.
c. Release the latch on the QSFP transceiver and pull on the latch to remove
the transceiver.
The transceiver comes free.
d. Set the transceiver aside.
e. Continue to Step 5.
5. Open the hook-and-loop fasteners at the bundles and securing hard points, and
gently lower the cable to the floor.
Caution – Do not allow the cable or transceiver to drop or strike the floor. Jerking,
bending, pulling on, or dropping the cable can damage the cable.
6. Consider your next steps:
■ If you are removing a single cable for replacement, install the new cable.
See “Install a Data Cable” on page 72.
■ If you are disconnecting all cables for gateway replacement, repeat from Step 4
for all cables.
Related Information
■ “Remove a Power Supply” on page 47
■ “Remove a Fan” on page 60
■ “Remove the Gateway From the Rack” on page 77
■ “Replace the Battery” on page 78
▼Install a Data Cable
Note – These instructions are valid for InfiniBand and Ethernet data cables. Refer to
Gateway Installation, assembling the optical fiber data cables, for instructions on how
to assemble InfiniBand and Ethernet data cables that require assembly.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing Data Cables” on page 65.
2. Determine your next steps:
■ If you are cabling an entire gateway after a replacement procedure, locate the
cable for the connector 0B and go to Step 6.
■ If you are installing a replacement cable to the gateway, start the procedure at
Step 3.
3. If necessary, assemble the data cable.
Refer to Gateway Installation, assembling the optical fiber data cables.
4. Inspect the replacement data cable.
See “Inspecting the Data Cables” on page 65.
5. Bring the replacement cable to the gateway.
6. Feed the cable through the cable management hardware.
7. Orient the cable connector to the QSFP receptacle squarely and horizontally.
Ensure that the L groove is up for the top row of receptacles, or that the L groove
is down for the bottom row of receptacles.
Note – On some QSFP cable connectors, there is a retraction strap. Both the
retraction strap and L groove indicate the reference surface for the connector. When
installing QSFP cables in the top row receptacles (0A, 1A, 2A, and so on), ensure that
the L groove and retraction strap are up. When installing QSFP cables in the bottom
row receptacles (0B, 1B, 2B, and so on) ensure that the L groove and retraction strap
are down. See “Identify the Data Cable” on page 66.
8. Slowly move the connector in.
As you slide the connector in, the shell should be in the center of the QSFP
receptacle.
■ If the connector stops or binds after about 1/4 in. (5 mm) travel, back out and
repeat from Step 7.
■ If the connector stops or binds with about 1/8 in. (2 mm) still to go, back out
and repeat Step 8.
9. Continue to push the connector in until you feel a detent.
10. Secure the cable into the cable management hardware.
Close hook-and-loop fasteners at bundles and securing hard points.
11. If you are installing all cables as part of a gateway replacement procedure,
repeat from Step 6 for all cables, including the Ethernet data cables at
connectors 0A and 1A on the right side of the rear panel.
12. Replace the cover for the cable management bracket and tighten the
thumbscrews.
Related Information
■ “Install a Power Supply” on page 49
■ “Install a Fan” on page 61
■ “Replace the Battery” on page 78
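The orientation rule in Step 7 (L groove up for the top row, down for the bottom row) can be captured in a small lookup, for example when generating a cabling checklist. The following Python sketch is illustrative only. The function name is hypothetical, and it assumes that every receptacle label is a number followed by A (top row) or B (bottom row), as in the examples 0A, 1A, 2A and 0B, 1B, 2B.

```python
import re


def l_groove_orientation(receptacle):
    """Return "up" for top-row (A) receptacles and "down" for bottom-row (B)."""
    # Assumption: labels look like "0A" or "12B" (digits, then A or B).
    match = re.fullmatch(r"(\d+)([AB])", receptacle.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized receptacle label: {receptacle!r}")
    return "up" if match.group(2) == "A" else "down"


for label in ("0A", "0B", "2b"):
    print(label, l_groove_orientation(label))
# 0A up
# 0B down
# 2b down
```

On connectors that have a retraction strap, the strap faces the same direction as the L groove.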
Servicing the Battery
The gateway has a battery on the main board that supports the management
controller. Because the management controller depends on the battery, the only
supported service action is replacement. You cannot add or subtract a battery.
Perform these tasks in order to replace the battery:
Step  Description                          Links
1.    Determine if the battery is faulty.  “Determine If the Battery Is Faulty” on page 75
2.    Remove all data cables.              “Remove a Data Cable” on page 68
3.    Power off both power supplies.       “Power Off a Power Supply” on page 46
4.    Remove the gateway from the rack.    “Remove the Gateway From the Rack” on page 77
5.    Replace the battery.                 “Replace the Battery” on page 78
6.    Install the gateway in the rack.     Gateway Installation, installing the gateway
Related Information
■ “Detecting and Managing Faults” on page 1
■ “Understanding Service Procedures” on page 37
■ “Servicing Power Supplies” on page 41
■ “Servicing Fans” on page 55
■ “Servicing Data Cables” on page 65
▼Determine If the Battery Is Faulty
You must determine if the battery is faulty before you replace it.
1. Check to see if any System Service Required LEDs are lit or flashing.
See “Check Chassis Status LEDs” on page 4.
2. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
3. Verify that the battery is faulty.
a. Type:
-> show -d targets /SP/faultmgmt
If the battery is faulty, you will see /SYS/MB listed in the output under
Targets:.
For example:
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/MB)
->
b. Note the number to the left of /SYS/MB.
c. Type:
-> show -d properties /SP/faultmgmt/number/faults/0
where number is the number to the left of /SYS/MB. For example:
-> show -d properties /SP/faultmgmt/0/faults/0
/SP/faultmgmt/0/faults/0
d. Look for the word battery in the output for the class property.
If the battery is faulty, replace it. See “Replace the Battery” on page 78.
If you do not see the word battery, or if a FRU value in addition to or different
from /SYS/MB is displayed in Step a, see “Clearable Fault Targets” on page 11 to
identify which component is faulty.
If no Oracle ILOM targets are listed in Step a, go to Step 4.
4. Within the Oracle ILOM interface, verify the battery voltage.
-> show /SYS/MB/V_BAT value
/SYS/MB/V_BAT
Properties:
value = 3.136 Volts
->
5. Compare the value seen with the typical value and range provided in “Voltage
Sensor Values” on page 22.
If the battery is faulty, replace it. See “Replace the Battery” on page 78.
6. If you are unable to determine if the battery is faulty, seek further information.
See “Detecting and Managing Faults” on page 1.
Related Information
■ “Determine If a Power Supply Is Faulty” on page 41
■ “Determine If a Fan Is Faulty” on page 55
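Steps 3a through 3c (find the number to the left of /SYS/MB, then query that fault's properties) can be sketched in Python as follows. This is an illustration under stated assumptions, not an Oracle utility. The function names are hypothetical, and the target line format is taken from the example output. The command string follows the form shown in Step 3c.

```python
import re

# Assumption: each target is printed as "<index> (<FRU path>)".
TARGET_LINE = re.compile(r"^\s*(\d+)\s+\((/\S+)\)\s*$")


def fault_numbers_for(output, fru="/SYS/MB"):
    """Return the fault numbers listed to the left of the given FRU path."""
    numbers = []
    for line in output.splitlines():
        match = TARGET_LINE.match(line)
        if match and match.group(2) == fru:
            numbers.append(int(match.group(1)))
    return numbers


def properties_command(number):
    """Build the Step 3c command for one fault number."""
    return f"show -d properties /SP/faultmgmt/{number}/faults/0"


# Sample text copied from the example in Step 3a.
SAMPLE = """\
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/MB)
->
"""

for number in fault_numbers_for(SAMPLE):
    print(properties_command(number))
# show -d properties /SP/faultmgmt/0/faults/0
```

The sketch stops at building the command. Whether the class property mentions the battery still has to be read from the command's output, as described in Step 3d.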
▼Remove the Gateway From the Rack
Note – This procedure assumes that you have removed all data cables from the
gateway and have powered down both power supplies by removing both power
cords. If not, see “Remove a Data Cable” on page 68 and “Power Off a Power
Supply” on page 46.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing the Battery” on page 75.
2. Disconnect the management cables.
3. Use a No. 2 Phillips screwdriver to remove the four screws that secure the front
of the gateway into the rack.
4. Slide the gateway out of the front of the rack.
5. Set the gateway chassis onto a stable work surface.
Related Information
■ Gateway Installation, installing the gateway into the rack
■ “Remove a Power Supply” on page 47
■ “Remove a Fan” on page 60
■ “Remove a Data Cable” on page 68
■ “Replace the Battery” on page 78
▼Replace the Battery
Note – This procedure assumes that you have removed the gateway from the rack.
If not, see “Remove the Gateway From the Rack” on page 77.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing the Battery” on page 75.
2. Use a No. 1 Phillips screwdriver to remove the eight screws that secure the
C-shaped brackets at the rear sides of the gateway chassis.
3. Remove the eight screws that secure the long front brackets at the front sides of
the gateway chassis.
4. Remove the 16 screws that secure the top cover to the chassis.
There are five screws on each side and six screws across the top front of the cover.
5. Slide the cover forward and lift it off.
6. Depress the clip that retains the battery and release the battery from the main
board.
7. Properly dispose of the old battery.
8. Unwrap the replacement battery from its antistatic packaging.
9. Install the replacement battery into the main board with the + side up.
10. Orient the cover over the chassis and lower it in place.
11. Slide the cover rearward so that it engages at the rear panel.
Ensure that the screw holes in the cover align with the holes in the chassis.
12. Use a No. 1 Phillips screwdriver to install the 16 screws that secure the cover to
the chassis.
13. Use eight screws to attach the two front brackets to the front sides of the
chassis.
14. Use eight screws to attach the two C-shaped brackets to the rear sides of the
chassis.
15. Install the gateway into the rack.
Refer to Gateway Installation, installing the gateway into the rack.
Related Information
■ “Install a Power Supply” on page 49
■ “Install a Fan” on page 61
■ “Install a Data Cable” on page 72