Oracle Sun Network QDR InfiniBand Gateway Switch Service Manual

Sun Network QDR InfiniBand Gateway Switch
Service Manual for Firmware Version 2.1
Part No.: E36262-01 March 2013, Revision A
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompression of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related software documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:
U.S. GOVERNMENT END USERS. Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications which may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.
Contents
Using This Documentation vii
Detecting and Managing Faults 1
Interpreting Status LEDs 1
Front Panel LEDs 2
Rear Panel LEDs 3
Check Chassis Status LEDs 4
Check NET MGT Port Status LEDs 4
Check Link Status LEDs 5
Check Power Supply Status LEDs 6
Check Fan Status LEDs 7
Managing Faulty Components 7
Display Faulty Components (fault_state) 8
Display Faulty Components (/SP/faultmgmt) 9
Clear a Fault Manually 10
Clearable Fault Targets 11
Identify Faults in the Oracle ILOM Event Log 12
Determining the Alarm State of a Component or System 13
Display the General Alarm State of Systems and Components 14
System Alarm Targets 15
Component Alarm Targets 15
Oracle ILOM Target Alarm States 16
Evaluating Sensor Alarms 17
Display Oracle ILOM Sensor Status 18
Determine Oracle ILOM Sensor Target Types 20
Evaluating a Voltage Sensor Alarm 20
Evaluate a Voltage Sensor 21
Voltage Sensor Values 22
Voltage Out of Range 22
Evaluating a Temperature Sensor Alarm 23
Evaluate a Temperature Sensor 24
Temperature Sensor Values 24
Temperature Out of Range 25
Evaluating a Speed Sensor Alarm 26
Evaluate a Speed Sensor 26
Speed Sensor Values 27
Speed Out of Range 27
Evaluating a State Sensor Alarm 29
Evaluate a State Sensor 29
State Sensor Alarm Conditions 30
Evaluating a Presence Sensor Alarm 30
Evaluate a Presence Sensor 31
Presence Sensor Alarm Conditions 31
Evaluating an Indicator State 32
Evaluate an Indicator State 32
Indicator State Values 33
Indicator State Conditions 33
Accessing CLI Prompts 34
Access the Oracle ILOM CLI (NET MGT Port) 35
Enter the Restricted Linux Shell 35
Exit the Restricted Linux Shell 36
Understanding Service Procedures 37
Replaceable Components 37
Suggested Tools for Service 39
Antistatic Precautions for Service 39
Servicing Power Supplies 41
Determine If a Power Supply Is Faulty 41
Inspecting a Power Supply 43
Identify the Power Supply 43
Inspect the Power Supply Hardware 45
Inspect the Power Supply Connectors 45
Power Off a Power Supply 46
Remove a Power Supply 47
Install a Power Supply 49
Power On a Power Supply 51
Servicing Fans 55
Determine If a Fan Is Faulty 55
Inspecting a Fan 57
Identify the Fan 57
Inspect the Fan Hardware 58
Inspect the Fan Connector 59
Remove a Fan 60
Install a Fan 61
Servicing Data Cables 65
Inspecting the Data Cables 65
Identify the Data Cable 66
Inspect the Data Cable Hardware 67
Inspect the Data Cable Connectors or Transceivers 67
Remove a Data Cable 68
Install a Data Cable 72
Servicing the Battery 75
Determine If the Battery Is Faulty 75
Remove the Gateway From the Rack 77
Replace the Battery 78
Index 85
Using This Documentation
This service manual provides detailed procedures for servicing the Sun Network QDR InfiniBand Gateway Switch from Oracle. This document is written for technicians, system administrators, and users who have advanced experience servicing InfiniBand fabric hardware.
“Product Notes” on page vii
“Related Documentation” on page vii
“Feedback” on page viii
“Access to Oracle Support” on page viii
Product Notes
For late-breaking information and known issues about this product, refer to the product notes at:
http://docs.oracle.com/cd/E36256_01
Related Documentation
Documentation                                                  | Links
---------------------------------------------------------------+------------------------------------------
Sun Network QDR InfiniBand Gateway Switch Firmware Version 2.1 | http://docs.oracle.com/cd/E36256_01
Oracle Solaris 11 OS                                           | http://www.oracle.com/goto/Solaris11/docs
Oracle Integrated Lights Out Manager (ILOM) 3.0                | http://docs.oracle.com/cd/E19860-01
All Oracle products                                            | http://docs.oracle.com
Feedback
Provide feedback on this documentation at:
http://www.oracle.com/goto/docfeedback
Access to Oracle Support
Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
Detecting and Managing Faults
These topics explain how to use various diagnostic tools to find and troubleshoot faults and alarms in the gateway.
Note – A fault identifies a failure of a component. An alarm identifies an abnormal
condition of a component or system, as reported by a sensor.
Description                                      | Links
-------------------------------------------------+------------------------------------------------------------------
Investigate whether there is a fault condition.  | “Interpreting Status LEDs” on page 1
                                                 | “Managing Faulty Components” on page 7
                                                 | “Identify Faults in the Oracle ILOM Event Log” on page 12
Investigate whether there is an alarm condition. | “Determining the Alarm State of a Component or System” on page 13
                                                 | “Evaluating Sensor Alarms” on page 17
Related Information
“Understanding Service Procedures” on page 37
“Servicing Power Supplies” on page 41
“Servicing Fans” on page 55
“Servicing Data Cables” on page 65
“Servicing the Battery” on page 75
Interpreting Status LEDs
Use these topics to interpret LEDs to determine if a component has failed.
“Front Panel LEDs” on page 2
“Rear Panel LEDs” on page 3
“Check Chassis Status LEDs” on page 4
“Check NET MGT Port Status LEDs” on page 4
“Check Link Status LEDs” on page 5
“Check Power Supply Status LEDs” on page 6
“Check Fan Status LEDs” on page 7
Related Information
“Interpreting Status LEDs” on page 1
“Managing Faulty Components” on page 7
“Identify Faults in the Oracle ILOM Event Log” on page 12
“Determining the Alarm State of a Component or System” on page 13
“Evaluating Sensor Alarms” on page 17
“Accessing CLI Prompts” on page 34
Front Panel LEDs
No. | LED                        | Link
----+----------------------------+---------------------------------------------
1   | Power supply AC LED        | “Check Power Supply Status LEDs” on page 6
2   | Power supply Attention LED | “Check Power Supply Status LEDs” on page 6
3   | Power supply OK LED        | “Check Power Supply Status LEDs” on page 6
4   | Fan Attention LED          | “Check Fan Status LEDs” on page 7
Related Information
“Rear Panel LEDs” on page 3
“Check Chassis Status LEDs” on page 4
“Check NET MGT Port Status LEDs” on page 4
“Check Link Status LEDs” on page 5
“Check Power Supply Status LEDs” on page 6
“Check Fan Status LEDs” on page 7
Rear Panel LEDs
No. | LED                         | Link
----+-----------------------------+---------------------------------------------
1   | NET MGT status LEDs         | “Check NET MGT Port Status LEDs” on page 4
2   | InfiniBand link status LEDs | “Check Link Status LEDs” on page 5
3   | Ethernet link status LEDs   | “Check Link Status LEDs” on page 5
4   | Chassis status LEDs         | “Check Chassis Status LEDs” on page 4
5   | Not used                    |
Related Information
“Front Panel LEDs” on page 2
“Check Chassis Status LEDs” on page 4
“Check NET MGT Port Status LEDs” on page 4
“Check Link Status LEDs” on page 5
“Check Power Supply Status LEDs” on page 6
“Check Fan Status LEDs” on page 7
Check Chassis Status LEDs
The chassis status LEDs are located on the left side of the rear panel. See “Rear Panel LEDs” on page 3.
1. Visually inspect the chassis status LEDs.
2. Compare what you see to this table.
Location | Name      | Color | State and Meaning
---------+-----------+-------+----------------------------------------------
Top      | Locator   | White | On – No function.
         |           |       | Off – Disabled.
         |           |       | Flashing – The gateway is identifying itself.
Middle   | Attention | Amber | On – Normal fault detected.
         |           |       | Off – No faults detected.
         |           |       | Flashing – No function.
Bottom   | OK        | Green | On – Gateway is functional without fault.
         |           |       | Off – Gateway is off or initializing.
         |           |       | Flashing – No function.
3. If the Attention LED is lit, there is a fault present.
See “Managing Faulty Components” on page 7.
Related Information
“Front Panel LEDs” on page 2
“Rear Panel LEDs” on page 3
“Check NET MGT Port Status LEDs” on page 4
“Check Link Status LEDs” on page 5
“Check Power Supply Status LEDs” on page 6
“Check Fan Status LEDs” on page 7
Check NET MGT Port Status LEDs
The NET MGT port status LEDs are located on the NET MGT connector of the rear panel. See “Rear Panel LEDs” on page 3.
1. Visually inspect the NET MGT port status LEDs.
2. Compare what you see to this table.
Name       | Location | Color          | State and Meaning
-----------+----------+----------------+------------------------------
Link speed | Left     | Amber or green | Amber on – 100BASE-T link.
           |          |                | Green on – 1000BASE-T link.
           |          |                | Off – No link or link down.
           |          |                | Flashing – No function.
Activity   | Right    | Green          | On – No function.
           |          |                | Off – No activity.
           |          |                | Flashing – Packet activity.
3. If the Activity LED is off, there might be a problem with the communication to the management controller.
Refer to Gateway Administration, network management troubleshooting guidelines.
Related Information
“Front Panel LEDs” on page 2
“Rear Panel LEDs” on page 3
“Check Chassis Status LEDs” on page 4
“Check Link Status LEDs” on page 5
“Check Power Supply Status LEDs” on page 6
“Check Fan Status LEDs” on page 7
Check Link Status LEDs
The link status LEDs are located at the data cable connectors of the rear panel. See “Rear Panel LEDs” on page 3.
1. Visually inspect the link status LEDs.
2. Compare what you see for a particular link to this table.
Name | Color | State and Meaning
-----+-------+-----------------------------
Link | Green | On – Link established.
     |       | Off – No link or link down.
     |       | Flashing – Symbol errors.
3. If the Link LED flashes, there might be a problem with the data cable.
See “Servicing Data Cables” on page 65.
Related Information
“Front Panel LEDs” on page 2
“Rear Panel LEDs” on page 3
“Check Chassis Status LEDs” on page 4
“Check NET MGT Port Status LEDs” on page 4
“Check Power Supply Status LEDs” on page 6
“Check Fan Status LEDs” on page 7
Check Power Supply Status LEDs
The power supply status LEDs are located on the power supply at the front of the chassis. See “Front Panel LEDs” on page 2.
1. Visually inspect the power supply’s status LEDs.
2. Compare what you see on the power supply to this table.
Location | Name      | Color | State and Meaning
---------+-----------+-------+----------------------------------------
Top      | OK        | Green | On – 12 VDC is supplied.
         |           |       | Off – No DC voltage is present.
         |           |       | Flashing – No function.
Middle   | Attention | Amber | On – Fault detected, 12 VDC shut down.
         |           |       | Off – No faults detected.
         |           |       | Flashing – No function.
Bottom   | AC        | Green | On – AC power present and good.
         |           |       | Off – AC power not present.
         |           |       | Flashing – No function.
Caution – If a power supply has shut down because of a thermal or overcurrent
condition, signified by the amber Attention LED lighting, remove the respective power cord from the chassis. Allow the power supply to completely cool for at least 15 minutes. A shorter cooling time might cause damage to the power supply when the power cord is reattached. If the Attention LED lights amber upon reattaching the power cord, replace the power supply.
3. If the Attention LED is lit, there is a fault with that power supply.
See “Servicing Power Supplies” on page 41.
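The power supply LED table above can be condensed into a small lookup. The following Python sketch is illustrative only: the function name interpret_psu_leds and its boolean parameters are assumptions made for this example, not part of any Oracle tool, and the returned text simply restates the table.

```python
def interpret_psu_leds(ok_on: bool, attention_on: bool, ac_on: bool) -> str:
    """Summarize power supply LED states using the table in this procedure."""
    if attention_on:
        # Amber Attention LED on: fault detected, 12 VDC shut down.
        return "fault detected, 12 VDC shut down; see Servicing Power Supplies"
    if not ac_on:
        # Green AC LED off: AC power not present.
        return "AC power not present"
    if not ok_on:
        # AC is present, but the green OK LED is off: no DC voltage.
        return "AC power present, but no DC voltage is present"
    return "AC power present and 12 VDC is supplied"
```

The Attention LED is checked first because a lit Attention LED is the condition that leads to the servicing procedure.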
Related Information
“Front Panel LEDs” on page 2
“Rear Panel LEDs” on page 3
“Check Chassis Status LEDs” on page 4
“Check NET MGT Port Status LEDs” on page 4
“Check Link Status LEDs” on page 5
“Check Fan Status LEDs” on page 7
Check Fan Status LEDs
The fan status LEDs are located in the lower right corner of the fans at the front of the gateway chassis. See “Front Panel LEDs” on page 2.
1. Visually inspect the fan status LEDs.
2. If the LED is lit, there is a fault with that fan.
See “Servicing Fans” on page 55.
Related Information
“Front Panel LEDs” on page 2
“Rear Panel LEDs” on page 3
“Check Chassis Status LEDs” on page 4
“Check NET MGT Port Status LEDs” on page 4
“Check Link Status LEDs” on page 5
“Check Power Supply Status LEDs” on page 6
Managing Faulty Components
If Oracle ILOM has detected a fault with a component, you can display and clear that fault with these topics:
“Display Faulty Components (fault_state)” on page 8
“Display Faulty Components (/SP/faultmgmt)” on page 9
“Clear a Fault Manually” on page 10
“Clearable Fault Targets” on page 11
Related Information
“Interpreting Status LEDs” on page 1
“Identify Faults in the Oracle ILOM Event Log” on page 12
“Determining the Alarm State of a Component or System” on page 13
“Evaluating Sensor Alarms” on page 17
“Accessing CLI Prompts” on page 34
Display Faulty Components (fault_state)
You can identify faulty components by their fault state.
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Display the fault state of components.
-> show / -a -l 4 -o table fault_state
Target              | Property               | Value
--------------------+------------------------+--------
/SYS/MB             | fault_state            | OK
/SYS/PSU0           | fault_state            | OK
/SYS/PSU1           | fault_state            | OK
/SYS/FAN0           | fault_state            | OK
/SYS/FAN1           | fault_state            | OK
/SYS/FAN2           | fault_state            | Faulted
/SYS/FAN3           | fault_state            | OK
/SYS/FAN4           | fault_state            | OK
->
3. Look in the Value column for Faulted.
4. Look in the same row under the Target column, to find the Oracle ILOM target
of the faulty component.
For example, /SYS/FAN2.
5. Identify the component that has faulted and might need to be replaced.
See “Clearable Fault Targets” on page 11.
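If the table output has been captured to text (for example, from a saved terminal session), Steps 3 and 4 can be automated. This is a minimal Python sketch, assuming the output uses the pipe-separated layout shown in Step 2; the function name faulted_targets and the SAMPLE text are hypothetical and exist only for this illustration.

```python
def faulted_targets(table_output: str) -> list[str]:
    """Return Oracle ILOM targets whose fault_state value is Faulted."""
    targets = []
    for line in table_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have exactly three cells: target, property, value.
        # The dashed separator and the prompt lines contain no "|".
        if len(cells) != 3:
            continue
        target, prop, value = cells
        if prop == "fault_state" and value == "Faulted":
            targets.append(target)
    return targets


# Abbreviated copy of the example output in Step 2.
SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+--------
/SYS/MB             | fault_state            | OK
/SYS/FAN1           | fault_state            | OK
/SYS/FAN2           | fault_state            | Faulted
/SYS/FAN3           | fault_state            | OK
"""

print(faulted_targets(SAMPLE))
```

For the sample above, the only target reported is /SYS/FAN2, which matches the example in Step 4.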
Related Information
“Display Faulty Components (/SP/faultmgmt)” on page 9
“Clear a Fault Manually” on page 10
“Clearable Fault Targets” on page 11
Display Faulty Components (/SP/faultmgmt)
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Display any faulty components.
-> show -d targets /SP/faultmgmt
 /SP/faultmgmt
    Targets:
        x (faulted_target)
->
where:
x is the target sequence number (starting at 0).
faulted_target is the Oracle ILOM target of the faulty component.
Note – If there are several faulty components, then their respective targets are listed
with increasing target sequence numbers.
Note – If no number is displayed, there are no faulty components.
For example:
-> show -d targets /SP/faultmgmt
 /SP/faultmgmt
    Targets:
        0 (/SYS/PSU0)
->
3. Display details of the fault.
-> show -d properties /SP/faultmgmt/x/faults/y
where:
x is the target sequence number (starting at 0).
y is the fault sequence number (starting at 0) for the target x.
For example:
-> show /SP/faultmgmt/0/faults/0
 /SP/faultmgmt/0/faults/0
    Properties:
        class = fault.chassis.device.psu.fail
        sunw-msg-id = DCSIB-8000-23
        uuid = e8f7a292-62ab-43a2-9f32-30991cf8fbd5
        timestamp = 2012-04-01/10:34:18
        fru_part_number = 3002234
        fru_serial_number = 006541
        product_serial_number = AK00022680
        chassis_serial_number = AK00022680
->
The class property provides a general reason for the fault.
4. Use faulted_target to identify the component that has faulted and might need to
be replaced.
See “Clearable Fault Targets” on page 11.
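The two outputs in this procedure have a regular shape: a Targets list of "x (faulted_target)" entries, and a Properties list of "name = value" lines. The Python sketch below shows one way to read both from captured text. The function names, the regular expression, and the sample strings are assumptions for illustration; they are not part of Oracle ILOM.

```python
import re

# A target line looks like "0 (/SYS/PSU0)": sequence number, then the target.
TARGET_LINE = re.compile(r"^\s*(\d+)\s+\((\S+)\)\s*$")


def parse_faulted_targets(output: str) -> dict[int, str]:
    """Map each target sequence number to its faulted Oracle ILOM target."""
    found = {}
    for line in output.splitlines():
        match = TARGET_LINE.match(line)
        if match:
            found[int(match.group(1))] = match.group(2)
    return found


def parse_properties(output: str) -> dict[str, str]:
    """Collect the 'name = value' lines of a Properties section."""
    props = {}
    for line in output.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            props[name.strip()] = value.strip()
    return props


TARGETS_OUTPUT = """\
-> show -d targets /SP/faultmgmt
 /SP/faultmgmt
    Targets:
        0 (/SYS/PSU0)
->
"""

FAULT_OUTPUT = """\
-> show /SP/faultmgmt/0/faults/0
 /SP/faultmgmt/0/faults/0
    Properties:
        class = fault.chassis.device.psu.fail
        sunw-msg-id = DCSIB-8000-23
        timestamp = 2012-04-01/10:34:18
->
"""

print(parse_faulted_targets(TARGETS_OUTPUT))
print(parse_properties(FAULT_OUTPUT)["class"])
```

An empty dictionary from parse_faulted_targets corresponds to the note in Step 2: if no number is displayed, there are no faulty components.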
Related Information
“Display Faulty Components (fault_state)” on page 8
“Clear a Fault Manually” on page 10
“Clearable Fault Targets” on page 11
Clear a Fault Manually
If Oracle ILOM detects a fault and then detects that the faulty component has been replaced, Oracle ILOM automatically clears the fault. If necessary, however, you can clear the fault manually after replacing the component.
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Clear the fault.
-> set target clear_fault_action=true
where target is from “Clearable Fault Targets” on page 11.
For example, to clear a fault with power supply 0, type:
-> set /SYS/PSU0 clear_fault_action=true
Are you sure you want to clear /SYS/PSU0 (y/n)? y
Set 'clear_fault_action' to 'true'
->
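When the clear command is generated by a script, it helps to reject targets that are not listed in “Clearable Fault Targets”. The Python sketch below builds the command string only for /SYS/MB, /SYS/FAN0 through /SYS/FAN4, /SYS/PSU0, and /SYS/PSU1. The function name and the pattern are assumptions for this example, and the sketch only produces the command text; it does not connect to Oracle ILOM or answer the confirmation prompt.

```python
import re

# Clearable targets per the "Clearable Fault Targets" table:
# /SYS/MB, /SYS/FANx (x is 0 to 4), and /SYS/PSUx (x is 0 or 1).
CLEARABLE = re.compile(r"/SYS/(MB|FAN[0-4]|PSU[01])")


def clear_fault_command(target: str) -> str:
    """Return the Oracle ILOM command that clears a fault on the target."""
    if not CLEARABLE.fullmatch(target):
        raise ValueError(f"{target!r} is not a clearable fault target")
    return f"set {target} clear_fault_action=true"


print(clear_fault_command("/SYS/PSU0"))
```

Using fullmatch instead of match means that a string with trailing characters, such as /SYS/PSU0/FAULT, is rejected instead of being silently accepted.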
Related Information
“Display Faulty Components (fault_state)” on page 8
“Display Faulty Components (/SP/faultmgmt)” on page 9
“Clearable Fault Targets” on page 11
Clearable Fault Targets
This table lists the components, their clearable Oracle ILOM targets, and links to servicing procedures.
Component                                | Target    | Links
-----------------------------------------+-----------+--------------------------------------------------------------------------
Battery                                  | /SYS/MB   | “Servicing the Battery” on page 75
SSD drive                                | /SYS/MB   | Replace the gateway. See “Remove the Gateway From the Rack” on page 77.
Fan x, where x is 0 to 4                 | /SYS/FANx | “Servicing Fans” on page 55
Power supply x, where x is either 0 or 1 | /SYS/PSUx | “Servicing Power Supplies” on page 41
Use this table for these procedures:
“Display Faulty Components (/SP/faultmgmt)” on page 9
“Clear a Fault Manually” on page 10
“Identify Faults in the Oracle ILOM Event Log” on page 12
Related Information
“Display Faulty Components (fault_state)” on page 8
“Display Faulty Components (/SP/faultmgmt)” on page 9
“Clear a Fault Manually” on page 10
Identify Faults in the Oracle ILOM Event Log
1. Access Oracle ILOM.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Display the Oracle ILOM event log.
-> show /SP/logs/event/list Class==class Type==type
where you choose class and type from the table in Gateway Administration, log entry filters.
For example, to display log entries pertaining to all faults, type:
-> show /SP/logs/event/list Class==Fault
Note – If you want to display log entries pertaining to only component failure, use
the show /SP/logs/event/list Class==Fault Type==Fault command.
3. Identify the faulty components in the output.
The Oracle ILOM targets of the faulty components follow the word component. For example:
-> show /SP/logs/event/list Class==Fault
Event
ID     Date/Time                 Class     Type      Severity
-----  ------------------------  --------  --------  --------
18820  Tue Sep 25 13:44:56 2012  Fault     Fault     critical
       Fault detected at time = Tue Sep 25 13:44:56 2012. The suspect component: /SYS/PSU0 has fault.chassis.device.psu.fail with probability=100. Refer to http://support.oracle.com/msg/DCSIB-8000-23 for details.
18569  Tue Sep 18 16:43:13 2012  Fault     Repair    minor
       Component /SYS/PSU0 repaired
18567  Tue Sep 18 15:51:48 2012  Fault     Fault     critical
       Fault detected at time = Tue Sep 18 15:51:48 2012. The suspect component: /SYS/PSU0 has fault.chassis.device.psu.fail with probability=100. Refer to http://support.oracle.com/msg/DCSIB-8000-23 for details.
. . .
->
Note – The most recent events are listed at the top of the log.
In this example, Event ID 18567 on September 18, at 15:51, indicated that a critical fault occurred in the component with Oracle ILOM target /SYS/PSU0. This is power supply 0 as identified in “Clearable Fault Targets” on page 11. Following the Oracle ILOM target is the reason for the fault. A URL is provided for more information about the fault.
Moving up the output, Event ID 18569 on September 18, at 16:43, indicated that a repair action was taken on the component with Oracle ILOM target /SYS/PSU0. The power supply was repaired. The term repaired can mean either repaired or replaced. In either case, the power supply in slot 0 was now functional.
Continuing up the output, Event ID 18820 on September 25 indicated that a critical fault occurred again in the component with Oracle ILOM target /SYS/PSU0.
4. Depending on the severity of the fault, replace the component.
See “Clearable Fault Targets” on page 11 for servicing links.
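The reasoning in Step 3 (read each Fault and Repair entry in time order to see which targets are still faulted) can be sketched in Python. This example assumes that each entry starts with a header line of ID, date, class, type, and severity, followed by description lines, as in the example output. The function names, regular expressions, and the SAMPLE_LOG text are hypothetical and would need adjusting for other log layouts.

```python
import re

# One log entry header: ID, date/time, class, type, severity.
ENTRY = re.compile(
    r"^(\d+)\s+(\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2} \d{4})"
    r"\s+(\S+)\s+(\S+)\s+(\S+)$"
)
# The Oracle ILOM target follows the word "component" in the description.
TARGET = re.compile(r"[Cc]omponent:?\s+(/SYS/\S+)")


def parse_event_log(output: str) -> list[dict]:
    """Split captured event log text into entries, newest first."""
    events = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("->") or line == ". . .":
            continue
        match = ENTRY.match(line)
        if match:
            event_id, when, cls, kind, severity = match.groups()
            events.append({"id": int(event_id), "time": when, "class": cls,
                           "type": kind, "severity": severity, "text": ""})
        elif events:
            # Description lines belong to the most recent header line.
            events[-1]["text"] = (events[-1]["text"] + " " + line).strip()
    return events


def open_faults(events: list[dict]) -> set[str]:
    """Targets whose latest Fault-class event is a Fault, not a Repair."""
    state = {}
    # The log lists the most recent events first, so walk it oldest-first.
    for event in reversed(events):
        found = TARGET.search(event["text"])
        if found and event["class"] == "Fault":
            state[found.group(1)] = event["type"]
    return {target for target, kind in state.items() if kind == "Fault"}


SAMPLE_LOG = """\
18820  Tue Sep 25 13:44:56 2012  Fault     Fault     critical
       Fault detected at time = Tue Sep 25 13:44:56 2012. The suspect
       component: /SYS/PSU0 has fault.chassis.device.psu.fail with
       probability=100.
18569  Tue Sep 18 16:43:13 2012  Fault     Repair    minor
       Component /SYS/PSU0 repaired
18567  Tue Sep 18 15:51:48 2012  Fault     Fault     critical
       Fault detected at time = Tue Sep 18 15:51:48 2012. The suspect
       component: /SYS/PSU0 has fault.chassis.device.psu.fail with
       probability=100.
"""

events = parse_event_log(SAMPLE_LOG)
print(open_faults(events))
```

With the three sample entries, /SYS/PSU0 is reported as still faulted because the fault on September 25 follows the repair on September 18.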
Related Information
“Interpreting Status LEDs” on page 1
“Managing Faulty Components” on page 7
“Determining the Alarm State of a Component or System” on page 13
“Evaluating Sensor Alarms” on page 17
“Accessing CLI Prompts” on page 34
Determining the Alarm State of a Component or System
When a component or system of components experiences a condition which triggers an alarm, the condition might affect the operation of the gateway. These topics enable you to display alarm states.
“Display the General Alarm State of Systems and Components” on page 14
“System Alarm Targets” on page 15
“Component Alarm Targets” on page 15
“Oracle ILOM Target Alarm States” on page 16
Related Information
“Interpreting Status LEDs” on page 1
“Managing Faulty Components” on page 7
“Identify Faults in the Oracle ILOM Event Log” on page 12
“Evaluating Sensor Alarms” on page 17
“Accessing CLI Prompts” on page 34
Display the General Alarm State of Systems and Components
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Type:
-> show target alarm_status
where target is from the tables in “System Alarm Targets” on page 15 and “Component Alarm Targets” on page 15.
For example, to display the general alarm state of fan 1, type:
-> show /SYS/FAN1 alarm_status
 /SYS/FAN1
    Properties:
        alarm_status = cleared
->
3. Compare the value displayed to the alarm states.
See “Oracle ILOM Target Alarm States” on page 16.
4. If the alarm state is major or critical, you might need to replace the
component.
See “Clearable Fault Targets” on page 11 for servicing links.
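Steps 3 and 4 amount to reading the alarm_status property and checking whether it is major or critical. A minimal Python sketch follows, assuming the command output has been captured as text. The function names are hypothetical, and the result is only a hint, because the procedure itself says that you might need to replace the component.

```python
from typing import Optional


def read_alarm_status(output: str) -> Optional[str]:
    """Return the alarm_status value from captured show output, if present."""
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "alarm_status":
            return value.strip()
    return None


def might_need_replacement(alarm_state: Optional[str]) -> bool:
    """Step 4: a major or critical alarm state might call for replacement."""
    return alarm_state in ("major", "critical")


FAN1_OUTPUT = """\
-> show /SYS/FAN1 alarm_status
 /SYS/FAN1
    Properties:
        alarm_status = cleared
->
"""

state = read_alarm_status(FAN1_OUTPUT)
print(state, might_need_replacement(state))
```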
Related Information
“System Alarm Targets” on page 15
“Component Alarm Targets” on page 15
“Oracle ILOM Target Alarm States” on page 16
System Alarm Targets
This table lists systems that have the ability to report an alarm and their Oracle ILOM targets. Use these targets for the procedure “Display the General Alarm State of Systems and Components” on page 14.
System                                | Target
--------------------------------------+----------------------
Cooling system                        | /SYS/COOLING_ATTN
Signal cable monitoring               | /SYS/CABLE_ATTN
Power system                          | /SYS/POWER_ATTN
Power redundancy                      | /SYS/POWER_REDUN
Cooling redundancy                    | /SYS/COOLING_REDUN
Signal cable connections              | /SYS/CABLE_CONN_STAT
Temperature monitoring                | /SYS/TEMP_ATTN
InfiniBand devices within the gateway | /SYS/IBDEV_ATTN
Entire gateway                        | /SYS/CHASSIS_STATUS
Related Information
“Display the General Alarm State of Systems and Components” on page 14
“Component Alarm Targets” on page 15
“Oracle ILOM Target Alarm States” on page 16
Component Alarm Targets
This table lists components or sensors that have the ability to report an alarm, and their Oracle ILOM targets. Use these targets for the procedure “Display the General
Alarm State of Systems and Components” on page 14.
Component | Target
----------+-------
ECB alarm | /SYS/MB/V_ECB
3.3V main voltage alarm | /SYS/MB/V_3.3VMainOK
5V alarm | /SYS/MB/V_5VOK
1.0V alarm | /SYS/MB/V_1.0VOK
I4 switch chip voltage alarm | /SYS/MB/V_I41.2VOK
2.5V alarm | /SYS/MB/V_2.5VOK
Digital power alarm | /SYS/MB/V_V1P2DIG
Analog power alarm | /SYS/MB/V_V1P2ANG
BridgeX chip voltage alarm | /SYS/MB/V_BX1.2VOK
1.8V alarm | /SYS/MB/V_1.8VOK
I4 switch chip boot alarm | /SYS/MB/BOOT_I4A
SSD drive alarm | /SYS/MB/DISK_FAULT
Battery alarm | /SYS/MB/BAT_FAULT
Individual power supply alarm, where x is either 0 or 1 | /SYS/PSUx/FAULT
Individual power supply alert, where x is either 0 or 1 | /SYS/PSUx/ALERT
Individual power supply mains voltage presence, where x is either 0 or 1 | /SYS/PSUx/AC_PRESENT
Individual fan alarm, where x is 0 to 4 | /SYS/FANx/FAULT
Related Information
“Display the General Alarm State of Systems and Components” on page 14
“System Alarm Targets” on page 15
“Oracle ILOM Target Alarm States” on page 16
Oracle ILOM Target Alarm States
Use this table to clarify alarm states as seen in the alarm_status = alarm_state parameter of Oracle ILOM targets and in the output of the procedure “Display the
General Alarm State of Systems and Components” on page 14.
Alarm State   | Description
--------------+------------
cleared       | The component or system has recovered from an alarmed condition and is fully operational.
warning       | An alarm has identified a condition that is abnormal, but does not affect any individual component.
minor         | An alarm has identified a condition that might affect an individual component.
major         | An alarm has identified a condition that affects only the individual component. The condition might affect a system, but not enough to compromise the operation of the gateway.
critical      | An alarm has identified a condition that affects both individual components and systems. The operation of the gateway is compromised or at risk.
indeterminate | Oracle ILOM is unable to provide an alarm state for this component.
(none)        | The component or its alarm is not available to Oracle ILOM. (The component might have been removed.)
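When several targets are checked at once, it can be useful to report only the most severe alarm state. The Python sketch below ranks the states in the order cleared, warning, minor, major, critical. That ordering is inferred from the descriptions in the table, and leaving indeterminate and (none) unranked is an assumption of this example, not documented Oracle ILOM behavior.

```python
from typing import Iterable, Optional

# Ranking inferred from the table descriptions; higher means more severe.
RANK = {"cleared": 0, "warning": 1, "minor": 2, "major": 3, "critical": 4}


def worst_alarm_state(states: Iterable[str]) -> Optional[str]:
    """Return the most severe ranked state, or None if none can be ranked."""
    ranked = [state for state in states if state in RANK]
    if not ranked:
        return None
    return max(ranked, key=RANK.__getitem__)


print(worst_alarm_state(["cleared", "minor", "warning"]))
```

Returning None for an input that contains only indeterminate or (none) keeps "no usable alarm state" distinct from cleared.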
Related Information
“Display the General Alarm State of Systems and Components” on page 14
“System Alarm Targets” on page 15
“Component Alarm Targets” on page 15
Evaluating Sensor Alarms
These topics enable you to evaluate sensor information to determine whether an unfavorable condition has occurred or is likely to occur.
Step | Description                                      | Links
-----+--------------------------------------------------+---------------------------------------------------------
1.   | Identify a suspect sensor and display its value. | “Display Oracle ILOM Sensor Status” on page 18
2.   | Determine the sensor target and alarm type.      | “Determine Oracle ILOM Sensor Target Types” on page 20
3.   | Evaluate the sensor type alarm.                  | “Evaluating a Voltage Sensor Alarm” on page 20
     |                                                  | “Evaluating a Temperature Sensor Alarm” on page 23
     |                                                  | “Evaluating a Speed Sensor Alarm” on page 26
     |                                                  | “Evaluating a State Sensor Alarm” on page 29
     |                                                  | “Evaluating a Presence Sensor Alarm” on page 30
     |                                                  | “Evaluating an Indicator State” on page 32
Related Information
“Interpreting Status LEDs” on page 1
“Managing Faulty Components” on page 7
“Identify Faults in the Oracle ILOM Event Log” on page 12
“Determining the Alarm State of a Component or System” on page 13
“Accessing CLI Prompts” on page 34
Display Oracle ILOM Sensor Status
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Type:
-> show / -a -l 4 -o table alarm_status
Target              | Property               | Value
--------------------+------------------------+---------
/SYS/MB/V_ECB       | alarm_status           | cleared
/SYS/MB/V_3.3VMain  | alarm_status           | cleared
/SYS/MB/            | alarm_status           | cleared
V_3.3VMainOK        |                        |
/SYS/MB/V_3.3VStby  | alarm_status           | minor
.
.
.
/SYS/FAN3/PRSNT     | alarm_status           | cleared
/SYS/FAN3/TACH      | alarm_status           | cleared
/SYS/FAN3/FAULT     | alarm_status           | cleared
->
3. Look in the Value column for minor, major, or critical.
For example, minor. For more information about alarm states, see “Oracle ILOM
Target Alarm States” on page 16.
4. Look in the same row under the Target column, to find the Oracle ILOM
sensor target.
For example, /SYS/MB/V_3.3VStby.
5. Display the value of the sensor target.
-> show target value
where target is the Oracle ILOM target for the sensor from Step 4. For example:
-> show /SYS/MB/V_3.3VStby value
/SYS/MB/V_3.3VStby
Properties:
value = 3.490 Volts
->
6. Record the target and value.
For example, /SYS/MB/V_3.3VStby and 3.490 volts.
7. Determine the sensor type.
See “Determine Oracle ILOM Sensor Target Types” on page 20.
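If you capture the table from Step 2 as text, Steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name alarmed_targets and the SAMPLE text are hypothetical, and the parser assumes the three-column layout shown in the example output, including a long target name that wraps onto a second row.

```python
# Sketch only: function and variable names are illustrative.
ALARMED = {"minor", "major", "critical"}


def alarmed_targets(table_text):
    """Return (target, state) pairs whose alarm_status needs a look."""
    rows = []  # [target, state] pairs in display order
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3:
            continue  # prompt, separator, or ellipsis line
        target, prop, value = cells
        if prop == "alarm_status":
            rows.append([target, value])
        elif prop == "" and value == "" and target and rows:
            # Continuation row: a long target name wrapped onto a second line.
            rows[-1][0] += target
    return [(target, state) for target, state in rows if state in ALARMED]


SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+---------
/SYS/MB/V_ECB       | alarm_status           | cleared
/SYS/MB/            | alarm_status           | cleared
V_3.3VMainOK        |                        |
/SYS/MB/V_3.3VStby  | alarm_status           | minor
/SYS/FAN3/TACH      | alarm_status           | cleared
"""

print(alarmed_targets(SAMPLE))
```

For the sample text, the sketch reports only /SYS/MB/V_3.3VStby with the state minor, which matches the example in Steps 3 and 4.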
Related Information
“Determine Oracle ILOM Sensor Target Types” on page 20
“Evaluating a Voltage Sensor Alarm” on page 20
“Evaluating a Temperature Sensor Alarm” on page 23
“Evaluating a Speed Sensor Alarm” on page 26
“Evaluating a State Sensor Alarm” on page 29
“Evaluating a Presence Sensor Alarm” on page 30
“Evaluating an Indicator State” on page 32
Determine Oracle ILOM Sensor Target Types
Use this table to determine the sensor type from its target and go to the
corresponding link.
The word string represents any string of characters, numbers, and symbols.
Sensor Target       Sensor Type                Links
/SYS/FANx/string    • Fan state                “Evaluating a State Sensor Alarm” on page 29
                    • Fan speed                “Evaluating a Speed Sensor Alarm” on page 26
                    • Fan presence             “Evaluating a Presence Sensor Alarm” on page 30
/SYS/I_string       Indicator                  “Evaluating an Indicator State” on page 32
/SYS/MB/T_string    Main board temperature     “Evaluating a Temperature Sensor Alarm” on page 23
/SYS/MB/V_stringOK  Main board voltage state   “Evaluating a State Sensor Alarm” on page 29
/SYS/MB/V_string    Main board voltage         “Evaluating a Voltage Sensor Alarm” on page 20
/SYS/MB/string      Main board system state    “Evaluating a State Sensor Alarm” on page 29
/SYS/PSUx/string    • Power supply state       “Evaluating a State Sensor Alarm” on page 29
                    • Power supply presence    “Evaluating a Presence Sensor Alarm” on page 30
/SYS/string         System state               “Evaluating a State Sensor Alarm” on page 29
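The table is effectively an ordered pattern match: the more specific target patterns must be tried before the general ones. The following Python sketch shows one way to express that. The names RULES and sensor_type are hypothetical. The TACH, PRSNT, and FAULT fan suffixes come from the example output in this manual, while the PSU presence suffix (PRSNT) is an assumption.

```python
import re

# Sketch only. (pattern, sensor type) pairs are checked in order, so the
# more specific /SYS/MB/V_...OK pattern is tried before /SYS/MB/V_...,
# and both are tried before the catch-all /SYS/MB/... entry.
RULES = [
    (r"/SYS/FAN\d+/TACH", "speed"),
    (r"/SYS/FAN\d+/PRSNT", "presence"),
    (r"/SYS/FAN\d+/.+", "state"),
    (r"/SYS/PSU\d+/PRSNT", "presence"),  # assumed name for PSU presence
    (r"/SYS/PSU\d+/.+", "state"),
    (r"/SYS/I_.+", "indicator"),
    (r"/SYS/MB/T_.+", "temperature"),
    (r"/SYS/MB/V_.+OK", "state"),
    (r"/SYS/MB/V_.+", "voltage"),
    (r"/SYS/MB/.+", "state"),
    (r"/SYS/.+", "state"),
]


def sensor_type(target):
    """Return the sensor type for an Oracle ILOM target, or None."""
    for pattern, kind in RULES:
        if re.fullmatch(pattern, target):
            return kind
    return None
```

For example, sensor_type("/SYS/MB/V_3.3VStby") returns "voltage", while sensor_type("/SYS/MB/V_3.3VMainOK") returns "state" because the OK pattern is listed first.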
Related Information
“Display Oracle ILOM Sensor Status” on page 18
“Evaluating a Voltage Sensor Alarm” on page 20
“Evaluating a Temperature Sensor Alarm” on page 23
“Evaluating a Speed Sensor Alarm” on page 26
“Evaluating a State Sensor Alarm” on page 29
“Evaluating a Presence Sensor Alarm” on page 30
“Evaluating an Indicator State” on page 32
Evaluating a Voltage Sensor Alarm
These topics help you resolve voltage sensor alarms.
“Evaluate a Voltage Sensor” on page 21
“Voltage Sensor Values” on page 22
“Voltage Out of Range” on page 22
Related Information
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
“Evaluating a Temperature Sensor Alarm” on page 23
“Evaluating a Speed Sensor Alarm” on page 26
“Evaluating a State Sensor Alarm” on page 29
“Evaluating a Presence Sensor Alarm” on page 30
“Evaluating an Indicator State” on page 32
Evaluate a Voltage Sensor
1. Display the sensor status and determine the target type.
See:
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Voltage Sensor Values” on page 22.
3. Learn why a voltage sensor might alarm.
See “Voltage Out of Range” on page 22.
4. Determine your next step.
Voltage Sensor Target              Action                     Links
/SYS/MB/V_3.3VMain                 Replace the power supply.  “Servicing Power Supplies” on page 41
/SYS/MB/V_3.3VStby
/SYS/MB/V_12V
/SYS/MB/V_BAT                      Replace the battery.       “Servicing the Battery” on page 75
All other voltage sensor targets.  Replace the gateway.       “Remove the Gateway From the Rack” on page 77
Related Information
“Voltage Sensor Values” on page 22
“Voltage Out of Range” on page 22
Voltage Sensor Values
This table lists typical values and acceptable ranges for the voltage sensors. You use this table in conjunction with the target and value you recorded in “Display Oracle
ILOM Sensor Status” on page 18. If your voltage sensor’s value is near a boundary or
outside of the acceptable range, refer to “Voltage Out of Range” on page 22.
Voltage Sensor Target  Typical Value  Acceptable Range
/SYS/MB/V_3.3VMain     3.266V         3.112 to 3.403V
/SYS/MB/V_3.3VStby     3.420V         3.112 to 3.403V
/SYS/MB/V_12V          11.966V        11.346 to 12.338V
/SYS/MB/V_5V           4.992V         4.498 to 5.486V
/SYS/MB/V_BAT          3.136V         2.746V to N/A
/SYS/MB/V_1.0V         1.006V         0.877 to 1.158V
/SYS/MB/V_I41.2V       1.217V         1.041 to 1.392V
/SYS/MB/V_2.5V         2.504V         2.387 to 2.586V
/SYS/MB/V_V1P2DIG      1.170V         1.135 to 1.392V
/SYS/MB/V_V1P2ANG      1.170V         1.135 to 1.392V
/SYS/MB/V_BX1.2V       1.217V         1.041 to 1.392V
/SYS/MB/V_1.8V         1.785V         1.697 to 1.891V
/SYS/MB/V_1.2VStby     1.193V         1.048 to 1.387V
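Comparing a recorded value with this table can be sketched in Python as follows. The ranges are copied from the table. The names VOLTAGE_RANGES and check_voltage are hypothetical, and the 0.02 V margin used to decide that a value is "near a boundary" is an assumption, because this manual does not define how near is near.

```python
# Acceptable ranges in volts, copied from the voltage sensor table.
# None means the table gives no upper limit (N/A).
VOLTAGE_RANGES = {
    "/SYS/MB/V_3.3VMain": (3.112, 3.403),
    "/SYS/MB/V_3.3VStby": (3.112, 3.403),
    "/SYS/MB/V_12V": (11.346, 12.338),
    "/SYS/MB/V_5V": (4.498, 5.486),
    "/SYS/MB/V_BAT": (2.746, None),
    "/SYS/MB/V_1.0V": (0.877, 1.158),
    "/SYS/MB/V_I41.2V": (1.041, 1.392),
    "/SYS/MB/V_2.5V": (2.387, 2.586),
    "/SYS/MB/V_V1P2DIG": (1.135, 1.392),
    "/SYS/MB/V_V1P2ANG": (1.135, 1.392),
    "/SYS/MB/V_BX1.2V": (1.041, 1.392),
    "/SYS/MB/V_1.8V": (1.697, 1.891),
    "/SYS/MB/V_1.2VStby": (1.048, 1.387),
}


def check_voltage(target, volts, margin=0.02):
    """Classify a reading. margin (in volts) is an assumed tolerance
    for "near a boundary"; it is not a value from the manual."""
    low, high = VOLTAGE_RANGES[target]
    if volts < low:
        return "too low"
    if high is not None and volts > high:
        return "too high"
    if volts - low <= margin or (high is not None and high - volts <= margin):
        return "near boundary"
    return "ok"
```

For the value recorded in the earlier example, check_voltage("/SYS/MB/V_3.3VStby", 3.490) returns "too high".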
Related Information
“Evaluate a Voltage Sensor” on page 21
“Voltage Out of Range” on page 22
Voltage Out of Range
Even though all voltages within the chassis are regulated, situations can arise where a voltage drifts outside of the acceptable range and goes too high or too low.
When a voltage is too high, the cause can be:
The load for which the voltage is provided is missing – A component has failed or has been removed from the electrical connection.
The regulator for that voltage has failed.
For example, if the voltage at sensor target /SYS/MB/V_I41.2V is too high, then either the regulator is failing, or the I4 switch chip is no longer requiring the supplied voltage. This latter situation can occur transiently if the I4 switch chip is reset or if all of its ports are disabled. If the I4 switch chip has a catastrophic failure, such as from overheating, the voltage at the sensor target might go too high.
When a voltage is too low, the cause can be:
The load for which the voltage is provided has increased beyond that supported by the regulator – A component has been overtaxed or has an internal electrical short, its maximum internal temperature has been exceeded, or the electrical connection has been shorted.
The regulator for that voltage has failed.
For example, if the voltage at sensor target /SYS/MB/V_I41.2V is too low, then either the regulator is failing, or the I4 switch chip is under very heavy throughput loading, quite possibly in conjunction with overheating.
Because both types of voltage extremes for the /SYS/MB/V_I41.2V sensor target can indicate a thermal problem with the I4 switch chip, check the temperature at sensor target /SYS/MB/T_I4A.
Note – The 3.3VMain, 3.3VStby, and 12V voltages are provided by the power supplies redundantly and in parallel. If one of these voltages is too high or too low, one or both of the power supplies could be at fault. Because of this configuration, you must recheck the 3.3VMain, 3.3VStby, and 12V voltages with only one power supply operational at a time. Repeat “Display Oracle ILOM Sensor Status” on page 18 with only the power cord for PSU0 disconnected, and then again with only the power cord for PSU1 disconnected.
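One way to read the results of the two rechecks is sketched below in Python. This is an interpretation, not a procedure from this manual: it assumes that when the readings are out of range with one power cord disconnected, the supply that is still powered is the suspect. The function name suspect_power_supplies is hypothetical.

```python
def suspect_power_supplies(ok_with_psu0_unplugged, ok_with_psu1_unplugged):
    """Each argument is True when the 3.3VMain, 3.3VStby, and 12V
    readings were all in range during that recheck.

    Assumption: with one cord disconnected, only the other supply feeds
    the voltages, so a bad reading points at the supply still powered.
    """
    suspects = []
    if not ok_with_psu0_unplugged:
        suspects.append("PSU1")  # only PSU1 was operational
    if not ok_with_psu1_unplugged:
        suspects.append("PSU0")  # only PSU0 was operational
    return suspects
```

An empty result means that neither recheck reproduced the problem, in which case the fault is not isolated to a single supply by this test.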
Related Information
“Evaluate a Voltage Sensor” on page 21
“Voltage Sensor Values” on page 22
Evaluating a Temperature Sensor Alarm
These topics help you resolve temperature sensor alarms.
“Evaluate a Temperature Sensor” on page 24
“Temperature Sensor Values” on page 24
“Temperature Out of Range” on page 25
Related Information
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
“Evaluating a Voltage Sensor Alarm” on page 20
“Evaluating a Speed Sensor Alarm” on page 26
“Evaluating a State Sensor Alarm” on page 29
“Evaluating a Presence Sensor Alarm” on page 30
“Evaluating an Indicator State” on page 32
Evaluate a Temperature Sensor
1. Display the sensor status and determine the target type.
See:
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Temperature Sensor Values” on page 24.
3. Learn why a temperature sensor might alarm and take action.
See “Temperature Out of Range” on page 25.
Related Information
“Temperature Sensor Values” on page 24
“Temperature Out of Range” on page 25
Temperature Sensor Values
This table lists typical values and acceptable ranges for the temperature sensors. You use this table in conjunction with the target and value you recorded in “Display
Oracle ILOM Sensor Status” on page 18. If your temperature sensor’s value is near a
boundary or outside of the acceptable range, refer to “Temperature Out of Range” on
page 25.
Temperature Sensor Target  Typical Value  Acceptable Range
/SYS/MB/T_BACK             30˚C           25 to 70˚C
/SYS/MB/T_FRONT            29˚C           25 to 70˚C
/SYS/MB/T_SP               45˚C           25 to 60˚C
/SYS/MB/T_I4A              39˚C           25 to 70˚C
/SYS/MB/T_B0               48˚C           25 to 70˚C
/SYS/MB/T_B1               49˚C           25 to 70˚C
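A set of recorded temperatures can be checked against this table with a short Python sketch like the following. The ranges are copied from the table. The names TEMP_RANGES_C, temperature_status, and out_of_range are hypothetical, and the readings are assumed to be already converted to plain numbers in degrees Celsius.

```python
# Acceptable ranges in degrees Celsius, copied from the temperature table.
TEMP_RANGES_C = {
    "/SYS/MB/T_BACK": (25, 70),
    "/SYS/MB/T_FRONT": (25, 70),
    "/SYS/MB/T_SP": (25, 60),
    "/SYS/MB/T_I4A": (25, 70),
    "/SYS/MB/T_B0": (25, 70),
    "/SYS/MB/T_B1": (25, 70),
}


def temperature_status(target, celsius):
    """Classify one reading as "too low", "too high", or "ok"."""
    low, high = TEMP_RANGES_C[target]
    if celsius < low:
        return "too low"
    if celsius > high:
        return "too high"
    return "ok"


def out_of_range(readings):
    """Map each out-of-range target in readings to its status."""
    problems = {}
    for target, celsius in readings.items():
        status = temperature_status(target, celsius)
        if status != "ok":
            problems[target] = status
    return problems
```

Note that /SYS/MB/T_SP has a lower upper limit than the other sensors, so a reading of 65 is "too high" for it but "ok" for /SYS/MB/T_I4A.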
Related Information
“Evaluate a Temperature Sensor” on page 24
“Temperature Out of Range” on page 25
Temperature Out of Range
Temperatures within the chassis are regulated by the fans. For the fan cooling to be effective, the intake room air temperature must be below 25˚C.
When a temperature is too high, the cause can be:
Air flow is insufficient – The fan speeds are too slow, the fans have stopped spinning, or a fan is missing altogether.
Cooling air temperature is too high – No component can be cooled to a temperature lower than the cooling medium itself. Additionally, as the cooling air temperature increases, the air’s ability to remove heat diminishes.
Heat generated within a component is greater than that removed – The cooling system was designed for a certain power dissipated by the components. When those components experience high computing or throughput loads, or are subjected to overvoltage situations when a voltage regulator fails, they generate more heat.
For example, if the temperature at sensor target /SYS/MB/T_I4A is too high, then the fan speeds (/SYS/FANx/TACH) are collectively too low, the cooling air temperature (/SYS/MB/T_FRONT) is too high, the voltage powering the I4 switch chip (/SYS/MB/V_I41.2V) is too high, or the loading on the switch chip is too high.
A temperature that is too low is rarely detrimental. There is one exception: when the temperature of a component is the same as room temperature or lower, it is very likely that the component is not functioning as expected.
For example, if the temperature at sensor target /SYS/MB/T_I4A is too low, as compared to the cooling air temperature (/SYS/MB/T_FRONT), then the I4 switch chip is being held in a state of reset, the voltage for the I4 switch chip (/SYS/MB/V_I41.2V) is not being provided, or the I4 switch chip has catastrophically failed.
Note – The gateway is not fitted with an air filter. Therefore, contaminants can enter the gateway and adhere to cooling surfaces. The effect is twofold: the contaminants prevent the flow of cooling air to the components, and the contaminants behave as insulators, retaining waste heat dissipated by the components. If supplied voltages, cooling air temperatures, and fan speeds are within acceptable values, yet component temperatures are high, the extent of contamination is severe.
When temperatures are out of range, the suggested action is to check the fans and replace any that are not operating properly. See “Servicing Fans” on page 55. If new fans do not resolve the problem, then replace the gateway.
Related Information
“Evaluate a Temperature Sensor” on page 24
“Temperature Sensor Values” on page 24
Evaluating a Speed Sensor Alarm
These topics help you resolve speed sensor alarms.
“Evaluate a Speed Sensor” on page 26
“Speed Sensor Values” on page 27
“Speed Out of Range” on page 27
Related Information
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
“Evaluating a Voltage Sensor Alarm” on page 20
“Evaluating a Temperature Sensor Alarm” on page 23
“Evaluating a State Sensor Alarm” on page 29
“Evaluating a Presence Sensor Alarm” on page 30
“Evaluating an Indicator State” on page 32
Evaluate a Speed Sensor
1. Display the sensor status and determine the target type.
See:
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Speed Sensor Values” on page 27.
3. Learn why a speed sensor might alarm and take action.
See “Speed Out of Range” on page 27.
Related Information
“Speed Sensor Values” on page 27
“Speed Out of Range” on page 27
Speed Sensor Values
This table lists typical values and acceptable ranges for the speed sensors. You use this table in conjunction with the target and value you recorded in “Display Oracle
ILOM Sensor Status” on page 18. If your speed sensor’s value is near a boundary or
outside of the acceptable range, refer to “Speed Out of Range” on page 27.
Speed Sensor Target  Typical Value  Acceptable Range or Value
/SYS/FANx/TACH       12099 RPM      6322 to 26705 RPM
Related Information
“Evaluate a Speed Sensor” on page 26
“Speed Out of Range” on page 27
Speed Out of Range
The speed of the fans is varied by the management controller. The management controller uses an algorithm that considers the cooling air temperature, the number of fans spinning, and the temperatures within the chassis, to set the speed of the fans.
Note – The management controller sets all fans of identical type to identical speeds, and their speeds should not differ by more than 2000 RPM from each other. If one fan’s speed differs by more than 2000 RPM from the average of the remaining identical fans, that fan will fail soon and should be replaced.
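The comparison described in this note can be sketched in Python as follows: each fan's speed is compared with the average of the remaining fans, and any fan that deviates by more than 2000 RPM is flagged. The names MAX_DEVIATION_RPM and fans_to_replace are hypothetical, and the speeds in the usage example are made-up values.

```python
MAX_DEVIATION_RPM = 2000  # limit stated in the note


def fans_to_replace(tach_rpm):
    """tach_rpm maps a fan target to its speed in RPM. Return the fans
    whose speed deviates by more than the limit from the average speed
    of the remaining fans."""
    flagged = []
    for fan, rpm in tach_rpm.items():
        others = [v for k, v in tach_rpm.items() if k != fan]
        if not others:
            continue  # a single fan has nothing to be compared with
        average = sum(others) / len(others)
        if abs(rpm - average) > MAX_DEVIATION_RPM:
            flagged.append(fan)
    return flagged


# Made-up example readings, for illustration only.
readings = {
    "/SYS/FAN1/TACH": 12099,
    "/SYS/FAN2/TACH": 12200,
    "/SYS/FAN3/TACH": 15500,
}
print(fans_to_replace(readings))
```

In this example only /SYS/FAN3/TACH is flagged. The other two fans stay within 2000 RPM of the average of the remaining fans even though that average is pulled upward by the fast fan.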
When a fan speed is too high, it is an indication of the condition of the fan, which, if left unchecked, can be detrimental to the operation of the gateway. A too-high fan speed can be caused by:
Internal failure – To regulate their speed, the fans use Hall-effect sensors in an internal feedback loop. If the sensor fails, the feedback loop opens, and the motor overspeeds uncontrollably.
Other fan failure – The algorithm used by the management controller compensates for a fan failure by increasing the speed of the remaining functional fans.
Fan obstruction – If the fan intake is blocked, load on the fan is reduced, and the fan overspeeds.
Temperatures too high – If any component temperatures are too high, the fans spin faster.
Supply voltage too high – If the voltage at sensor target /SYS/MB/V_12V is too high, the fans spin faster.
If a fan overspeeds for an extended time, it will fail. Consequently, insufficient cooling air will be provided and the gateway will overheat.
When a fan speed is too low, it also is an indication of the condition of the fan, which directly affects the operation of the gateway. A too-low fan speed can be caused by:
Coil failure – The fan motor uses alternating electromagnetic fields to spin the fan impeller. Depending upon the fan motor design, if the coil that creates a magnetic field fails, the fan might spin much slower, or not at all.
Controller failure – The controller alternates the electromagnetic fields to spin the fan impeller. If the controller fails, the fan might not spin at all.
Bearing failure – The fan impeller is balanced on a bearing around which it spins. The bearing is lubricated with an oil. If the bearing fails or the lubricant degrades, the fan speed is reduced greatly.
Supply voltage too low – If the voltage at sensor target /SYS/MB/V_12V is too low, the fans spin slower.
If the fan speed is too low, insufficient cooling air will be provided and the gateway will overheat.
When fan speeds are out of range, the suggested action is to replace any fan that is not operating properly. See “Servicing Fans” on page 55. If new fans do not resolve the problem, then replace the gateway.
Related Information
“Evaluate a Speed Sensor” on page 26
“Speed Sensor Values” on page 27
Evaluating a State Sensor Alarm
These topics help you resolve state sensor alarms.
“Evaluate a State Sensor” on page 29
“State Sensor Alarm Conditions” on page 30
Related Information
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
“Evaluating a Voltage Sensor Alarm” on page 20
“Evaluating a Temperature Sensor Alarm” on page 23
“Evaluating a Speed Sensor Alarm” on page 26
“Evaluating a Presence Sensor Alarm” on page 30
“Evaluating an Indicator State” on page 32
Evaluate a State Sensor
1. Display the sensor status and determine the target type.
See:
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
2. Learn why a state sensor might alarm.
See “State Sensor Alarm Conditions” on page 30.
3. Determine your next step.
State Sensor Target       Action                     Links
/SYS/CHASSIS_STATUS       Check other targets.       “Display Oracle ILOM Sensor Status” on page 18
/SYS/CABLE_ATTN           Replace the cable.         “Servicing Data Cables” on page 65
/SYS/CABLE_CONN_STAT
/SYS/MB/BAT_FAULT         Replace the battery.       “Servicing the Battery” on page 75
/SYS/MB/V_3.3VMainOK      Replace the power supply.  “Servicing Power Supplies” on page 41
/SYS/POWER_ATTN
/SYS/POWER_REDUN
/SYS/PSUx/ALERT
/SYS/PSUx/AC_PRESENT
/SYS/PSUx/FAULT
/SYS/TEMP_ATTN            Replace the fan.           “Servicing Fans” on page 55
/SYS/COOLING_ATTN
/SYS/COOLING_REDUN
/SYS/FANx/FAULT
/SYS/MB/BOOT_I4A          Check the I4 switch chip.  Refer to Gateway Administration, resetting a port.
/SYS/IBDEV_ATTN
All other state sensors.  Replace the gateway.       “Remove the Gateway From the Rack” on page 77
Related Information
“State Sensor Alarm Conditions” on page 30
State Sensor Alarm Conditions
The gateway has many sensors that check the state of a voltage, component, or system fault, or voltage presence. In an acceptable state, the state sensors report a value of State Deasserted, meaning no error. When a voltage, component, or system goes to a detrimental state, the state sensors report a value of State Asserted.
For example, when the state of sensor target /SYS/FAN1/FAULT is State Asserted, there is a problem with fan 1.
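The rule in this topic can be sketched in Python as follows: a state sensor whose value is State Asserted indicates a problem, and State Deasserted indicates none. The function names state_asserted and problem_sensors are hypothetical, and the sketch assumes the value strings appear exactly as quoted in this topic.

```python
def state_asserted(value_text):
    """Return True for "State Asserted", False for "State Deasserted"."""
    text = value_text.strip().lower()
    if text == "state asserted":
        return True
    if text == "state deasserted":
        return False
    raise ValueError("not a state sensor value: " + repr(value_text))


def problem_sensors(states):
    """states maps a state sensor target to its value string. Return the
    targets that report State Asserted, in sorted order."""
    return sorted(t for t, value in states.items() if state_asserted(value))
```

For example, passing {"/SYS/FAN1/FAULT": "State Asserted", "/SYS/PSU0/FAULT": "State Deasserted"} to problem_sensors returns only /SYS/FAN1/FAULT.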
Related Information
“Evaluate a State Sensor” on page 29
Evaluating a Presence Sensor Alarm
These topics help you resolve presence sensor alarms.
“Evaluate a Presence Sensor” on page 31
“Presence Sensor Alarm Conditions” on page 31
Related Information
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
“Evaluating a Voltage Sensor Alarm” on page 20
“Evaluating a Temperature Sensor Alarm” on page 23
“Evaluating a Speed Sensor Alarm” on page 26
“Evaluating a State Sensor Alarm” on page 29
“Evaluating an Indicator State” on page 32
Evaluate a Presence Sensor
1. Display the sensor status and determine the target type.
See:
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
2. Learn why a presence sensor might alarm and take action.
See “Presence Sensor Alarm Conditions” on page 31.
Related Information
“Presence Sensor Alarm Conditions” on page 31
Presence Sensor Alarm Conditions
The presence sensors for the power supplies and fans indicate that the component is physically installed. The sensors do not provide status or health of a component.
During the boot process, the management controller looks for presence sensors to build a list of Oracle ILOM targets. If the presence sensor cannot be read, yet the component is physically installed, the management controller does not propagate the component to the list of targets. Even if the component powers up, so long as it is invisible to the management controller, the component cannot be used.
If a presence sensor alarms while a component is functional, the management controller functions as if the component were removed from the chassis. This situation might cause a fault on the component. If the lack of the component violates a configuration rule, the chassis Attention LED might illuminate.
When a component is identified as not present, but it is installed, the suggested action is to replace that component. See “Servicing Fans” on page 55 or “Servicing Power Supplies” on page 41. If the known good replacement component is still identified as not present, replace the gateway.
Related Information
“Evaluate a Presence Sensor” on page 31
Evaluating an Indicator State
These topics help you resolve indicator state alarms.
“Evaluate an Indicator State” on page 32
“Indicator State Values” on page 33
“Indicator State Conditions” on page 33
Related Information
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
“Evaluating a Voltage Sensor Alarm” on page 20
“Evaluating a Temperature Sensor Alarm” on page 23
“Evaluating a Speed Sensor Alarm” on page 26
“Evaluating a State Sensor Alarm” on page 29
“Evaluating a Presence Sensor Alarm” on page 30
Evaluate an Indicator State
1. Display the sensor status and determine the target type.
See:
“Display Oracle ILOM Sensor Status” on page 18
“Determine Oracle ILOM Sensor Target Types” on page 20
2. Compare the displayed value with a known good range.
See “Indicator State Values” on page 33.
3. Learn why an indicator might change state and take action.
See “Indicator State Conditions” on page 33.
Related Information
“Indicator State Values” on page 33
“Indicator State Conditions” on page 33
Indicator State Values
This table lists typical values and acceptable ranges for the indicator targets. The indicator targets report the state of the chassis status LEDs. You use this table in conjunction with the value you recorded in “Display Oracle ILOM Sensor Status” on
page 18. If your indicator target’s value is outside of the acceptable range, refer to “Indicator State Conditions” on page 33.
Indicator Target Typical Value Acceptable Value
/SYS/I_LOCATOR Off On or Off
/SYS/I_ATTENTION Off Off
/SYS/I_POWER On On
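Checking recorded indicator values against this table can be sketched in Python as follows. The acceptable values are copied from the table. The names ACCEPTABLE_INDICATOR_VALUES and unexpected_indicators are hypothetical, and the sketch assumes the values are reported as the strings On and Off.

```python
# Acceptable values copied from the indicator table.
ACCEPTABLE_INDICATOR_VALUES = {
    "/SYS/I_LOCATOR": {"On", "Off"},
    "/SYS/I_ATTENTION": {"Off"},
    "/SYS/I_POWER": {"On"},
}


def unexpected_indicators(values):
    """values maps an indicator target to "On" or "Off". Return the
    targets whose value is not acceptable, in sorted order."""
    return sorted(
        target
        for target, value in values.items()
        if value not in ACCEPTABLE_INDICATOR_VALUES[target]
    )
```

For example, if /SYS/I_ATTENTION reports On while the other two indicators report their typical values, only /SYS/I_ATTENTION is returned.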
Related Information
“Evaluate an Indicator State” on page 32
“Indicator State Conditions” on page 33
Indicator State Conditions
Three primary LED indicators provide management controller status, general chassis status, and identification. The table correlates the indicator target with the LED that represents that target.
Indicator Sensor Target LED
/SYS/I_LOCATOR Locator
/SYS/I_ATTENTION Attention
/SYS/I_POWER OK
When the locator LED is on, it is actually flashing. If the gateway is installed into a relatively dense rack, the flashing action makes the gateway more conspicuous for identification.
When the Attention LED is on, it indicates a fault within the gateway chassis. There is no single fault type that causes the Attention LED to light, so when it is illuminated, you must determine why.
When the OK LED is off, it indicates a gateway start up condition or the gateway is completely powered off. If the gateway is in neither state, yet the OK LED is off, there is a fault with the management controller, and the situation requires further investigation.
See “Check Chassis Status LEDs” on page 4 and “Display Oracle ILOM Sensor
Status” on page 18 to help determine the alarm condition of the gateway.
Related Information
“Evaluate an Indicator State” on page 32
“Indicator State Values” on page 33
Accessing CLI Prompts
These tasks enable you to issue Oracle ILOM and restricted shell commands on the management controller.
“Access the Oracle ILOM CLI (NET MGT Port)” on page 35
“Enter the Restricted Linux Shell” on page 35
“Exit the Restricted Linux Shell” on page 36
Related Information
“Interpreting Status LEDs” on page 1
“Managing Faulty Components” on page 7
“Identify Faults in the Oracle ILOM Event Log” on page 12
“Determining the Alarm State of a Component or System” on page 13
“Evaluating Sensor Alarms” on page 17
Access the Oracle ILOM CLI (NET MGT Port)
1. If you have not already done so, configure the DHCP server with the MAC address and new host name of the management controller inside of the gateway.
The MAC address is printed on the customer information (yellow) sheet on the outside of the gateway shipping carton and on the pull-out tab on the left side front of the gateway, adjacent to power supply 0.
2. Open an SSH session and connect to the management controller by specifying the controller’s host name.
For example:
% ssh -l ilom-admin nm2name
ilom-admin@nm2name’s password: password
->
where nm2name is the host name of the management controller. Initially, the password is ilom-admin.
Note – You can change the password at a later time. Refer to Gateway Remote
Management, changing a user role or password, for instructions on how to change
Oracle ILOM user passwords.
The Oracle ILOM shell prompt (->) is displayed.
Related Information
“Enter the Restricted Linux Shell” on page 35
“Exit the Restricted Linux Shell” on page 36
Enter the Restricted Linux Shell
1. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
2. Enter the restricted Linux shell.
-> show /SYS/Fabric_Mgmt
NOTE: show on Fabric_Mgmt will launch a restricted Linux shell. User can execute switch diagnosis, SM Configuration and IB monitoring commands in the shell. To view the list of commands, use "help" at rsh prompt.
Use exit command at rsh prompt to revert back to ILOM shell.
FabMan@gateway_name->
The restricted shell prompt (FabMan@gateway_name->) is displayed, and you can now issue hardware and InfiniBand commands.
When you want to leave the restricted shell, type the exit command.
Related Information
“Access the Oracle ILOM CLI (NET MGT Port)” on page 35
“Exit the Restricted Linux Shell” on page 36
Exit the Restricted Linux Shell
When you want to leave the restricted shell, use the exit command.
On the management controller, type:
FabMan@gateway_name->exit
exit
->
Related Information
“Access the Oracle ILOM CLI (NET MGT Port)” on page 35
“Enter the Restricted Linux Shell” on page 35
Understanding Service Procedures
Servicing the gateway means a component addition, replacement, or subtraction.
A component addition means installing a component to increase the functionality of the gateway. Component replacement means removing a failed component and installing a functional one. Component subtraction means removing a component.
Once a failed part is identified, it can be replaced. The topics listed here help you service gateway chassis components.
“Replaceable Components” on page 37
“Suggested Tools for Service” on page 39
“Antistatic Precautions for Service” on page 39
Related Information
“Detecting and Managing Faults” on page 1
“Servicing Power Supplies” on page 41
“Servicing Fans” on page 55
“Servicing Data Cables” on page 65
“Servicing the Battery” on page 75
Replaceable Components
This illustration identifies the replaceable components of the gateway.
FIGURE: Replaceable Components
Figure Legend
1 Battery
2 Fan
3 Power supply
Related Information
“Servicing Power Supplies” on page 41
“Servicing Fans” on page 55
“Servicing Data Cables” on page 65
“Servicing the Battery” on page 75
“Suggested Tools for Service” on page 39
“Antistatic Precautions for Service” on page 39
Suggested Tools for Service
These tools are necessary or beneficial for servicing the gateway:
Antistatic wrist strap
Antistatic mat
No. 2 Phillips screwdriver
No. 1 Phillips screwdriver
Flashlight
Gloves
Magnifying glass
Related Information
“Replaceable Components” on page 37
“Antistatic Precautions for Service” on page 39
Antistatic Precautions for Service
When installing the gateway chassis, take care to follow antistatic precautions:
Use an antistatic mat as a work surface.
Wear an antistatic wrist strap that is attached to either the mat or a metal portion of the gateway chassis.
Related Information
“Replaceable Components” on page 37
“Suggested Tools for Service” on page 39
Servicing Power Supplies
These topics provide procedures for servicing the power supplies.
Description               Links
Add a power supply.       “Inspecting a Power Supply” on page 43
                          “Install a Power Supply” on page 49
                          “Power On a Power Supply” on page 51
Replace a power supply.   “Determine If a Power Supply Is Faulty” on page 41
                          “Power Off a Power Supply” on page 46
                          “Remove a Power Supply” on page 47
                          “Inspecting a Power Supply” on page 43
                          “Install a Power Supply” on page 49
                          “Power On a Power Supply” on page 51
Subtract a power supply.  “Power Off a Power Supply” on page 46
                          “Remove a Power Supply” on page 47
Related Information
“Detecting and Managing Faults” on page 1
“Understanding Service Procedures” on page 37
“Servicing Fans” on page 55
“Servicing Data Cables” on page 65
“Servicing the Battery” on page 75
Determine If a Power Supply Is Faulty
You must determine which power supply is faulty before you replace it.
1. Check to see if any System Service Required LEDs are lit or flashing.
See “Check Chassis Status LEDs” on page 4.
2. Visually inspect the power supplies to see if any of their status LEDs are lit or flashing.
See “Check Power Supply Status LEDs” on page 6.
If a power supply is faulty, replace it. See “Remove a Power Supply” on page 47.
3. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
4. Verify that a power supply is faulty.
-> show -d targets /SP/faultmgmt
If a power supply is faulty, you will see /SYS/PSUx listed in the output under Targets:, where x is 0 (left power supply) or 1 (right power supply).
For example:
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/PSU0)
->
If a power supply is faulty, replace it. See “Remove a Power Supply” on page 47.
If a FRU value in addition to or different from /SYS/PSUx is displayed, see
“Clearable Fault Targets” on page 11 to identify which component is faulty.
If no Oracle ILOM targets are listed, go to Step 5.
5. If you are unable to determine if a power supply is faulty, seek further information.
See “Detecting and Managing Faults” on page 1.
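The check in Step 4 can be sketched as a small script. This is only an illustration: it assumes the CLI output has already been captured as text (for example, from a terminal session log) and that it follows the layout of the example above. The function names are hypothetical and are not part of Oracle ILOM.

```python
import re

# Matches lines such as "0 (/SYS/PSU0)" in the Targets: list.
TARGET_LINE = re.compile(r"^\s*(\d+)\s+\((/SYS/[^)]+)\)\s*$")


def parse_fault_targets(output):
    """Return {fault_number: target_path} from captured CLI output."""
    targets = {}
    for line in output.splitlines():
        match = TARGET_LINE.match(line)
        if match:
            targets[int(match.group(1))] = match.group(2)
    return targets


def faulty_power_supplies(targets):
    """Return sorted PSU indexes (0 = left, 1 = right) found in targets."""
    found = set()
    for path in targets.values():
        match = re.fullmatch(r"/SYS/PSU([01])", path)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# Sample text copied from the example in this procedure.
sample = """-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/PSU0)
->"""

targets = parse_fault_targets(sample)
print(targets)
print(faulty_power_supplies(targets))
```

Any target that is not /SYS/PSU0 or /SYS/PSU1 is left in the dictionary, so it can still be looked up in “Clearable Fault Targets” on page 11.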
Related Information
“Determine If a Fan Is Faulty” on page 55
“Determine If the Battery Is Faulty” on page 75
Inspecting a Power Supply
Before installing a power supply, perform these tasks to verify its suitability for installation.
Step Description Links
1. Identify the power supply. “Identify the Power Supply” on page 43
2. Inspect the hardware. “Inspect the Power Supply Hardware” on page 45
3. Inspect the connectors. “Inspect the Power Supply Connectors” on page 45
Related Information
“Inspecting a Fan” on page 57
“Inspecting the Data Cables” on page 65
Identify the Power Supply
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Power Supply” on page 43.
2. Use this illustration to identify the various features of a power supply.
1 AC connector
2 Release tab
3 Status LEDs
3. Inspect the power supply hardware.
See “Inspect the Power Supply Hardware” on page 45.
Related Information
“Identify the Fan” on page 57
“Identify the Data Cable” on page 66
Inspect the Power Supply Hardware
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Power Supply” on page 43.
2. Unwrap the replacement power supply from its antistatic packaging.
3. Verify that there is no visible damage to the power supply chassis.
4. Verify that the release tab moves freely and smoothly.
5. Inspect the power supply connectors.
See “Inspect the Power Supply Connectors” on page 45.
Related Information
“Inspect the Fan Hardware” on page 58
“Inspect the Data Cable Hardware” on page 67
Inspect the Power Supply Connectors
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Inspecting a Power Supply” on page 43.
2. Verify that the connectors are clean and without damage.
3. The power supply is ready for installation.
See “Install a Power Supply” on page 49.
Related Information
“Inspect the Fan Connector” on page 59
“Inspect the Data Cable Connectors or Transceivers” on page 67
Power Off a Power Supply
Note – Powering off both power supplies powers off the gateway.
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
2. Determine which power supply is to be removed.
3. At the front of the gateway chassis, remove the power cord from the respective power supply.
The power supply is completely powered off.
4. Remove the power supply.
See “Remove a Power Supply” on page 47.
Related Information
“Power On a Power Supply” on page 51
Remove a Power Supply
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
2. Locate the power supply to be removed.
3. Press and hold the release tab to the left and pull on the handle of the power supply.
4. Continue to pull the handle of the power supply to remove it from the chassis.
5. Set the power supply aside.
6. Install a replacement power supply.
See “Install a Power Supply” on page 49.
Related Information
“Remove a Fan” on page 60
“Remove a Data Cable” on page 68
“Remove the Gateway From the Rack” on page 77
“Replace the Battery” on page 78
Install a Power Supply
Note – To allow residual power to discharge, leave the power supply slot vacant for at least one minute before installing a power supply.
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
2. Inspect the replacement power supply.
See “Inspecting a Power Supply” on page 43.
3. Verify that the slot where the power supply installs is clean and free of debris.
4. Verify that the slot connector pins are straight and not missing.
5. Verify that the slot connector receptacles are free from obstructions.
6. Orient the power supply to the opening in the gateway chassis with the status LEDs on the left and the release tab on the right.
7. Slide the power supply into the open slot, pushing at the handle.
8. When the power supply seats, push firmly so that the release tab clicks to secure the power supply into the chassis.
9. Power on the power supply.
See “Power On a Power Supply” on page 51.
Related Information
“Install a Fan” on page 61
“Install a Data Cable” on page 72
“Replace the Battery” on page 78
Power On a Power Supply
1. To allow residual power to discharge, ensure that the power cord has been disconnected from the power supply for at least one minute before you power on the power supply.
2. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Power Supplies” on page 41.
3. Reconnect the power cord to the power supply.
The AC LED lights green to indicate that the power supply is connected to facility power. A moment later, the OK LED lights green to indicate the power supply is at full power.
4. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
5. Enter the restricted Linux shell.
See “Enter the Restricted Linux Shell” on page 35.
6. Verify the power supply’s operation with the checkpower and checkvoltages
commands on the management controller.
For example, to check the power supplies:
FabMan@gateway_name->checkpower
PSU 0 present status: OK
PSU 1 present status: OK
All PSUs OK
FabMan@gateway_name->
FabMan@gateway_name->checkvoltages
Voltage ECB OK
Measured 3.3V Main = 3.30 V
Measured 3.3V Standby = 3.42 V
Measured 12V = 12.06 V
Measured 5V = 5.03 V
Measured VBAT = 3.17 V
Measured 1.0V = 1.01 V
Measured I4 1.2V = 1.22 V
Measured 2.5V = 2.51 V
Measured V1P2 DIG = 1.18 V
Measured V1P2 ANG = 1.18 V
Measured 1.2V BridgeX = 1.22 V
Measured 1.8V = 1.80 V
Measured 1.2V Standby = 1.20 V
All voltages OK
FabMan@gateway_name->
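If you capture the output of the checkpower and checkvoltages commands as text, the verification in Step 6 can be sketched as follows. This is a minimal illustration that assumes the output matches the examples in this procedure. The function names are hypothetical and are not part of the gateway firmware.

```python
import re


def parse_checkpower(output):
    """Return {psu_index: status} from 'PSU 0 present status: OK' entries."""
    pattern = re.compile(r"PSU (\d+) present status: (\S+)")
    return {int(index): status for index, status in pattern.findall(output)}


def parse_checkvoltages(output):
    """Return {rail_name: volts} from 'Measured 12V = 12.06 V' entries."""
    pattern = re.compile(r"Measured (.+?) = ([0-9.]+) V")
    return {name: float(volts) for name, volts in pattern.findall(output)}


# Sample text copied from the examples in this procedure (voltages shortened).
power_sample = """FabMan@gateway_name->checkpower
PSU 0 present status: OK
PSU 1 present status: OK
All PSUs OK
FabMan@gateway_name->"""

voltage_sample = """FabMan@gateway_name->checkvoltages
Voltage ECB OK
Measured 3.3V Main = 3.30 V
Measured 12V = 12.06 V
Measured VBAT = 3.17 V
All voltages OK
FabMan@gateway_name->"""

psus = parse_checkpower(power_sample)
rails = parse_checkvoltages(voltage_sample)

# The summary lines are the commands' own verdicts; the parsed values
# are only a convenience for logging or comparison.
print(psus, "All PSUs OK" in power_sample)
print(rails, "All voltages OK" in voltage_sample)
```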
Related Information
Gateway Reference, checkpower command
Gateway Reference, checkvoltages command
“Power Off a Power Supply” on page 46
Servicing Fans
These topics provide procedures for servicing the fans.
Description Links
Add a fan. “Inspecting a Fan” on page 57
“Install a Fan” on page 61
Replace a fan. “Determine If a Fan Is Faulty” on page 55
“Remove a Fan” on page 60
“Inspecting a Fan” on page 57
“Install a Fan” on page 61
Subtract a fan. “Remove a Fan” on page 60
Related Information
“Detecting and Managing Faults” on page 1
“Understanding Service Procedures” on page 37
“Servicing Power Supplies” on page 41
“Servicing Data Cables” on page 65
“Servicing the Battery” on page 75
Determine If a Fan Is Faulty
You must determine which fan is faulty before you replace it.
1. Check to see if any System Service Required LEDs are lit or flashing.
See “Check Chassis Status LEDs” on page 4.
2. Visually inspect the fans to see if any of their status LEDs are lit.
See “Check Fan Status LEDs” on page 7.
If a fan is faulty, replace it. See “Remove a Fan” on page 60.
3. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
4. Verify that a fan is faulty.
-> show -d targets /SP/faultmgmt
If a fan is faulty, you will see /SYS/FANx listed in the output under Targets:, where x is 0 (left fan) to 4 (right fan).
For example:
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/FAN2)
->
If a fan is faulty, replace it. See “Remove a Fan” on page 60.
If a FRU value in addition to or different from /SYS/FANx is displayed, see
“Clearable Fault Targets” on page 11 to identify which component is faulty.
If no Oracle ILOM targets are listed, go to Step 5.
5. Within the Oracle ILOM interface, verify the fan speed.
-> show /SYS/FANx/TACH value
where x is 0 (left fan) to 4 (right fan). For example:
-> show /SYS/FAN2/TACH value
/SYS/FAN2/TACH
Properties:
value = 12317.000 RPM
->
6. Compare the value seen with the typical value and range provided in “Speed
Sensor Values” on page 27.
If the fan is faulty, replace it. See “Remove a Fan” on page 60.
7. If you are unable to determine if a fan is faulty, seek further information.
See “Detecting and Managing Faults” on page 1.
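The comparison in Steps 5 and 6 can be sketched as a short script. This is an illustration only: it assumes the sensor output has been captured as text in the format shown above. The limits passed to in_range are placeholders, not values from this manual. Take the real range from “Speed Sensor Values” on page 27. The function names are hypothetical.

```python
import re


def parse_sensor_value(output):
    """Return (number, unit) from a 'value = 12317.000 RPM' properties line."""
    match = re.search(r"value\s*=\s*([0-9.]+)\s+(\S+)", output)
    if match is None:
        raise ValueError("no 'value = ...' line found in output")
    return float(match.group(1)), match.group(2)


def in_range(reading, low, high):
    """Return True if the reading lies within the inclusive range."""
    return low <= reading <= high


# Sample text copied from the example in this procedure.
sample = """-> show /SYS/FAN2/TACH value
/SYS/FAN2/TACH
Properties:
value = 12317.000 RPM
->"""

rpm, unit = parse_sensor_value(sample)

# PLACEHOLDER limits for illustration only; substitute the documented range.
LOW_PLACEHOLDER = 1.0
HIGH_PLACEHOLDER = 100000.0

print(rpm, unit, in_range(rpm, LOW_PLACEHOLDER, HIGH_PLACEHOLDER))
```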
Related Information
“Determine If a Power Supply Is Faulty” on page 41
“Determine If the Battery Is Faulty” on page 75
Inspecting a Fan
Before installing a fan, inspect its hardware and connector to verify its suitability for installation.
Step Description Links
1. Identify the fan. “Identify the Fan” on page 57
2. Inspect the hardware. “Inspect the Fan Hardware” on page 58
3. Inspect the connector. “Inspect the Fan Connector” on page 59
Related Information
“Inspecting a Power Supply” on page 43
“Inspecting the Data Cables” on page 65
Identify the Fan
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Inspecting a Fan” on page 57.
2. Use this illustration to identify the various features of a fan.
1 Thumbscrew
2 Status LED
3. Inspect the fan hardware.
See “Inspect the Fan Hardware” on page 58.
Related Information
“Identify the Power Supply” on page 43
“Identify the Data Cable” on page 66
Inspect the Fan Hardware
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Inspecting a Fan” on page 57.
2. Unwrap the replacement fan from its antistatic packaging.
3. Verify that there is no visible damage to the fan chassis.
4. Verify that the thumbscrew spins freely and smoothly.
5. Inspect the fan connector.
See “Inspect the Fan Connector” on page 59.
Related Information
“Inspect the Power Supply Hardware” on page 45
“Inspect the Data Cable Hardware” on page 67
Inspect the Fan Connector
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Inspecting a Fan” on page 57.
2. Verify that the connector is clean and without damage.
3. Verify that the connector receptacles are free from obstructions.
4. Verify that the connector freely floats in its mounting.
5. The fan is ready for installation.
See “Install a Fan” on page 61.
Related Information
“Inspect the Power Supply Connectors” on page 45
“Inspect the Data Cable Connectors or Transceivers” on page 67
Remove a Fan
Note – Fans are hot-swappable, so you do not need to power off the gateway. However, if fewer than two fans are operational, the gateway shuts down to prevent thermal overload.
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Fans” on page 55.
2. Determine which fan is to be removed.
If a fan has failed, its Attention LED lights.
3. Loosen the captive thumbscrew at the right side of the fan.
4. Grasp the handle and pull the fan straight out.
5. Set the fan aside.
6. Consider your next steps:
If you are removing the fan for replacement, install a new fan.
See “Install a Fan” on page 61.
If you are removing the fan as a subtractive action, you are finished.
Related Information
“Remove a Power Supply” on page 47
“Remove a Data Cable” on page 68
“Remove the Gateway From the Rack” on page 77
“Replace the Battery” on page 78
Install a Fan
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Fans” on page 55.
2. Inspect the replacement fan.
See “Inspecting a Fan” on page 57.
3. Verify that the slot where the fan installs is clean and free of debris.
4. Verify that the slot connector pins are straight and not missing.
5. Orient the fan to the opening in the gateway chassis with the thumbscrew on the right.
6. Firmly slide the fan into the chassis until the fan stops.
The fan might immediately power on.
7. Tighten the captive thumbscrew to secure the fan in the gateway chassis.
8. Verify that the fan Attention LED goes out.
9. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
10. Enter the restricted Linux shell.
See “Enter the Restricted Linux Shell” on page 35.
11. Use the getfanspeed command on the management controller to verify the
fan’s operation.
Note – You should see a fan speed for the fan you just installed.
For example, to check the fans:
FabMan@gateway_name->getfanspeed
Fan 0 not present
Fan 1 running at rpm 11212
Fan 2 running at rpm 11313
Fan 3 running at rpm 11521
Fan 4 not present
FabMan@gateway_name->
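The check in Step 11 can be sketched as a short script, assuming the getfanspeed output has been captured as text in the format shown above. The minimum of two operational fans comes from the note in “Remove a Fan” on page 60. The function name and the constant are hypothetical and are not part of the gateway firmware.

```python
import re

# From the note in "Remove a Fan": with fewer than two operational fans,
# the gateway shuts down to prevent thermal overload.
MIN_OPERATIONAL_FANS = 2


def parse_getfanspeed(output):
    """Return {fan_index: rpm}, with None for a fan that is not present."""
    fans = {}
    pattern = re.compile(r"Fan (\d+) (not present|running at rpm \d+)")
    for index, state in pattern.findall(output):
        if state == "not present":
            fans[int(index)] = None
        else:
            fans[int(index)] = int(state.rsplit(" ", 1)[1])
    return fans


# Sample text copied from the example in this procedure.
sample = """FabMan@gateway_name->getfanspeed
Fan 0 not present
Fan 1 running at rpm 11212
Fan 2 running at rpm 11313
Fan 3 running at rpm 11521
Fan 4 not present
FabMan@gateway_name->"""

fans = parse_getfanspeed(sample)
running = sorted(i for i, rpm in fans.items() if rpm is not None and rpm > 0)

print(fans)
print("running fans:", running)
print("enough fans:", len(running) >= MIN_OPERATIONAL_FANS)
```

After an installation, the slot number of the fan you just installed should appear in the list of running fans.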
Related Information
Gateway Reference, getfanspeed command
“Install a Power Supply” on page 49
“Install a Data Cable” on page 72
“Replace the Battery” on page 78
Servicing Data Cables
These topics provide procedures for servicing the data cables.
Description Links
Add a data cable. “Inspecting the Data Cables” on page 65
“Install a Data Cable” on page 72
Replace a data cable. “Remove a Data Cable” on page 68
“Inspecting the Data Cables” on page 65
“Install a Data Cable” on page 72
Subtract a data cable. “Remove a Data Cable” on page 68
Related Information
“Detecting and Managing Faults” on page 1
“Understanding Service Procedures” on page 37
“Servicing Power Supplies” on page 41
“Servicing Fans” on page 55
“Servicing the Battery” on page 75
Inspecting the Data Cables
Before installing a data cable, inspect its hardware and connectors to verify its suitability for installation.
Step Description Links
1. Identify the cable. “Identify the Data Cable” on page 66
2. Inspect the hardware. “Inspect the Data Cable Hardware” on page 67
3. Inspect the connectors. “Inspect the Data Cable Connectors or Transceivers” on page 67
Related Information
“Inspecting a Power Supply” on page 43
“Inspecting a Fan” on page 57
Identify the Data Cable
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Inspecting the Data Cables” on page 65.
2. Use this illustration to identify the various features of the data cable.
1 Retraction strap
2 L groove
3 Paddle board
3. Inspect the data cable hardware.
See “Inspect the Data Cable Hardware” on page 67.
Related Information
“Identify the Power Supply” on page 43
“Identify the Fan” on page 57
Inspect the Data Cable Hardware
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Inspecting the Data Cables” on page 65.
2. Verify that the cable is not cut or damaged.
3. Verify that the cable is not kinked or folded.
4. Verify that the cable is of the correct type from its label.
5. Inspect the cable connectors or transceivers.
See “Inspect the Data Cable Connectors or Transceivers” on page 67.
Related Information
“Inspect the Power Supply Hardware” on page 45
“Inspect the Fan Hardware” on page 58
Inspect the Data Cable Connectors or Transceivers
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Inspecting the Data Cables” on page 65.
2. Verify that the shell is not bent and is parallel to the inner boards.
3. Verify that there are no contaminants inside of the connector or transceiver.
4. Verify that the retraction strap or latch is adequate to remove the connector or transceiver from the receptacle.
5. Identify the reference surface by the L groove in the surface at the connector tip.
6. The cable or transceiver is ready for installation.
See “Install a Data Cable” on page 72.
Related Information
“Inspect the Power Supply Connectors” on page 45
“Inspect the Fan Connector” on page 59
Remove a Data Cable
This procedure describes how to remove a cable from the gateway chassis so that the cable can be replaced. If you are removing all cables for gateway replacement, start removing the cables from the left side of the gateway, working your way to the right.
Note – These instructions are valid for both InfiniBand and Ethernet data cables.
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Data Cables” on page 65.
2. Loosen the thumbscrews and remove the cover for the cable management bracket.
3. Locate the cable to be removed.
4. Consider your next steps:
If the cable is a one-piece data cable, follow these steps:
a. Grasp the cable connector to support its weight and apply the removal
force.
b. Pull on the retraction strap while simultaneously pulling on the cable connector.
The cable connector comes free.
c. Carefully move the cable out of the cable management hardware.
d. Continue to Step 5.
If the cable is an assembled data cable, follow these steps:
a. Grasp the release collar on the MTP connector and pull back.
The MTP connector and fiber optic cable come free of the transceiver.
b. Carefully move the fiber optic cable out of the cable management
hardware.
c. Release the latch on the QSFP transceiver and pull on the latch to remove
the transceiver.
The transceiver comes free.
d. Set the transceiver aside.
e. Continue to Step 5.
5. Open the hook-and-loop fasteners at the bundles and securing hard points, and gently lower the cable to the floor.
Caution – Do not allow the cable or transceiver to drop or strike the floor. Jerking,
bending, pulling on, or dropping the cable can damage the cable.
6. Consider your next steps:
If you are removing a single cable for replacement, install the new cable.
See “Install a Data Cable” on page 72.
If you are disconnecting all cables for gateway replacement, repeat from Step 4
for all cables.
Related Information
“Remove a Power Supply” on page 47
“Remove a Fan” on page 60
“Remove the Gateway From the Rack” on page 77
“Replace the Battery” on page 78
Install a Data Cable
Note – These instructions are valid for InfiniBand and Ethernet data cables. Refer to Gateway Installation, assembling the optical fiber data cables, for instructions on how to assemble InfiniBand and Ethernet data cables that require assembly.
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing Data Cables” on page 65.
2. Determine your next steps:
If you are cabling an entire gateway after a replacement procedure, locate the
cable for the connector 0B and go to Step 6.
If you are installing a replacement cable to the gateway, start the procedure at
Step 3.
3. If necessary, assemble the data cable.
Refer to Gateway Installation, assembling the optical fiber data cables.
4. Inspect the replacement data cable.
See “Inspecting the Data Cables” on page 65.
5. Bring the replacement cable to the gateway.
6. Feed the cable through the cable management hardware.
7. Orient the cable connector to the QSFP receptacle squarely and horizontally.
Ensure that the L groove is up for the top row of receptacles, or that the L groove is down for the bottom row of receptacles.
Note – On some QSFP cable connectors, there is a retraction strap. Both the
retraction strap and L groove indicate the reference surface for the connector. When installing QSFP cables in the top row receptacles (0A, 1A, 2A, and so on), ensure that the L groove and retraction strap are up. When installing QSFP cables in the bottom row receptacles (0B, 1B, 2B, and so on) ensure that the L groove and retraction strap are down. See “Identify the Data Cable” on page 66.
8. Slowly move the connector in.
As you slide the connector in, the shell should be in the center of the QSFP receptacle.
If the connector stops or binds after about 1/4 in. (5 mm) travel, back out and
repeat from Step 7.
If the connector stops or binds with about 1/8 in. (2 mm) still to go, back out
and repeat Step 8.
9. Continue to push the connector in until you feel a detent.
10. Secure the cable into the cable management hardware.
Close hook-and-loop fasteners at bundles and securing hard points.
11. If you are installing all cables as part of a gateway replacement procedure,
repeat from Step 6 for all cables, including the Ethernet data cables at connectors 0A and 1A on the right side of the rear panel.
12. Replace the cover for the cable management bracket and tighten the thumbscrews.
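The orientation rule in Step 7 (L groove and retraction strap up for the top row, down for the bottom row) can be expressed as a small lookup, for example when generating a cabling checklist. This sketch assumes receptacle labels of the form shown in the note for Step 7, a number followed by A (top row) or B (bottom row). The function name is hypothetical.

```python
import re


def l_groove_orientation(receptacle):
    """Return 'up' for top row (A) receptacles, 'down' for bottom row (B)."""
    match = re.fullmatch(r"(\d+)([AB])", receptacle.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized receptacle label: {receptacle!r}")
    return "up" if match.group(2) == "A" else "down"


# Labels taken from the note in Step 7.
for label in ("0A", "1A", "0B", "2B"):
    print(label, "-> L groove and retraction strap", l_groove_orientation(label))
```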
Related Information
“Install a Power Supply” on page 49
“Install a Fan” on page 61
“Replace the Battery” on page 78
Servicing the Battery
The gateway has a battery on the main board that supports the management controller. Because the management controller depends on the battery, you can only replace the battery. You cannot add or remove one. Perform these tasks in order to replace the battery:
Step Description Links
1. Determine if the battery is faulty. “Determine If the Battery Is Faulty” on page 75
2. Remove all data cables. “Remove a Data Cable” on page 68
3. Power off both power supplies. “Power Off a Power Supply” on page 46
4. Remove the gateway from the rack. “Remove the Gateway From the Rack” on page 77
5. Replace the battery. “Replace the Battery” on page 78
6. Install the gateway in the rack. Gateway Installation, installing the gateway
Related Information
“Detecting and Managing Faults” on page 1
“Understanding Service Procedures” on page 37
“Servicing Power Supplies” on page 41
“Servicing Fans” on page 55
“Servicing Data Cables” on page 65
Determine If the Battery Is Faulty
You must determine if the battery is faulty before you replace it.
1. Check to see if any System Service Required LEDs are lit or flashing.
See “Check Chassis Status LEDs” on page 4.
2. Access the Oracle ILOM CLI.
See “Access the Oracle ILOM CLI (NET MGT Port)” on page 35.
3. Verify that the battery is faulty.
a. Type:
-> show -d targets /SP/faultmgmt
If the battery is faulty, you will see /SYS/MB listed in the output under Targets:.
For example:
-> show -d targets /SP/faultmgmt
/SP/faultmgmt
Targets:
0 (/SYS/MB)
->
b. Note the number to the left of /SYS/MB.
c. Type:
-> show -d properties /SP/faultmgmt/number/faults/0
where number is the number to the left of /SYS/MB. For example:
-> show -d properties /SP/faultmgmt/0/faults/0
/SP/faultmgmt/0/faults/0
Properties:
class = fault.chassis.device.battery.low
sunw-msg-id = DCSIB-8000-45
uuid = 82e90599-8650-47dc-b613-1e602607441b
timestamp = 2002-01-01/00:07:27
fru_part_number = 3002234
fru_serial_number = 006541
product_serial_number = AK00022680
chassis_serial_number = AK00022680
->
d. Look for the word battery in the output for the class property.
If the battery is faulty, replace it. See “Replace the Battery” on page 78.
If you do not see the word battery, or if a FRU value in addition to or different from /SYS/MB is displayed in Step a, see “Clearable Fault Targets” on page 11 to identify which component is faulty.
If no Oracle ILOM targets are listed in Step a, go to Step 4.
4. Within the Oracle ILOM interface, verify the battery voltage.
-> show /SYS/MB/V_BAT value
/SYS/MB/V_BAT
Properties:
value = 3.136 Volts
->
5. Compare the value seen with the typical value and range provided in “Voltage
Sensor Values” on page 22.
If the battery is faulty, replace it. See “Replace the Battery” on page 78.
6. If you are unable to determine if the battery is faulty, seek further information.
See “Detecting and Managing Faults” on page 1.
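The check in Step 3d (looking for the word battery in the class property) can be sketched as follows. This is an illustration only, assuming the properties output has been captured as text with one "name = value" pair per line. The function names are hypothetical and are not part of Oracle ILOM.

```python
def parse_properties(output):
    """Return {name: value} for every 'name = value' line in the output."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_battery_fault(props):
    """Return True if the fault class mentions the battery (Step 3d)."""
    return "battery" in props.get("class", "")


# Sample text copied from the example in this procedure (shortened).
sample = """-> show -d properties /SP/faultmgmt/0/faults/0
/SP/faultmgmt/0/faults/0
Properties:
class = fault.chassis.device.battery.low
sunw-msg-id = DCSIB-8000-45
timestamp = 2002-01-01/00:07:27
->"""

props = parse_properties(sample)
print(props["class"])
print("battery fault:", is_battery_fault(props))
```

If the class does not mention the battery, the other parsed properties can help when consulting “Clearable Fault Targets” on page 11.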
Related Information
“Determine If a Power Supply Is Faulty” on page 41
“Determine If a Fan Is Faulty” on page 55
Remove the Gateway From the Rack
Note – This procedure assumes that you have removed all data cables from the
gateway and have powered down both power supplies by removing both power cords. If not, see “Remove a Data Cable” on page 68 and “Power Off a Power
Supply” on page 46.
1. Identify the prerequisite and subsequent service tasks you must perform in
conjunction with this procedure.
See “Servicing the Battery” on page 75.
2. Disconnect the management cables.
3. Use a No. 2 Phillips screwdriver to remove the four screws that secure the front of the gateway into the rack.
4. Slide the gateway out of the front of the rack.
5. Set the gateway chassis onto a stable work surface.
Related Information
Gateway Installation, installing the gateway into the rack
“Remove a Power Supply” on page 47
“Remove a Fan” on page 60
“Remove a Data Cable” on page 68
“Replace the Battery” on page 78
Replace the Battery
Note – This procedure assumes that you have removed the Sun Network QDR InfiniBand Gateway Switch from the rack. If not, see “Remove the Gateway From the Rack” on page 77.
1. Identify the prerequisite and subsequent service tasks you must perform in conjunction with this procedure.
See “Servicing the Battery” on page 75.
2. Use a No. 1 Phillips screwdriver to remove the eight screws that secure the C-shaped brackets at the rear sides of the gateway chassis.
3. Remove the eight screws that secure the long front brackets at the front sides of the gateway chassis.
4. Remove the 16 screws that secure the top cover to the chassis.
There are five screws on each side and six screws across the top front of the cover.
5. Slide the cover forward and lift it off.
6. Depress the clip that retains the battery and release the battery from the main board.
7. Properly dispose of the old battery.
8. Unwrap the replacement battery from its antistatic packaging.
9. Install the replacement battery into the main board with the + side up.
10. Orient the cover over the chassis and lower it in place.
11. Slide the cover rearward so that it engages at the rear panel.
Ensure that the screw holes in the cover align with the holes in the chassis.
12. Use a No. 1 Phillips screwdriver to install the 16 screws that secure the cover to the chassis.
13. Use eight screws to attach the two front brackets to the front sides of the chassis.
14. Use eight screws to attach the two C-shaped brackets to the rear sides of the chassis.
15. Install the gateway into the rack.
Refer to Gateway Installation, installing the gateway into the rack.
Related Information
“Install a Power Supply” on page 49
“Install a Fan” on page 61
“Install a Data Cable” on page 72
Index
A
accessing
CLI prompts, 34 Oracle ILOM
NET MGT port, 34, 35
alarm conditions
presence sensor, 31 state sensor, 30
alarm state
description, 16 displaying
system, 14
antistatic precautions, 39
B
battery
determining faulty, 75 replacing, 78 servicing, 75
C
chassis
status LEDs, 4
checking
LEDs
chassis status, 4 fan, 7 link, 5 NET MGT, 4 power supply, 6
checkpower command, 51 checkvoltages command, 51
clearable fault targets, 11 CLI
displaying
faulty components, 8, 9
command
checkpower, 51 checkvoltages, 51
components
alarm state, 14 alarm targets, 15 determining alarm state, 13 managing faulty, 7 resetting, 10
D
data cable
features, 66 inspecting, 65
connectors, 67 hardware, 67
transceivers, 67 installing, 72 removing, 68 servicing, 65
detecting faults, 1 determining
component alarm state, 13 faulty
battery, 75
fans, 55
power supplies, 41 sensor alarm types, 20 system alarm state, 13
displaying
alarm state
component, 14
system, 14 from CLI
faulty components, 8, 9 sensor alarm status, 18
E
entering
Linux shell, 35 restricted shell, 35
Ethernet cable
features, 66 inspecting, 65
connectors, 67 hardware, 67
transceivers, 67 installing, 72 removing, 68 servicing, 65
evaluating
indicator state, 32 presence sensor, 31 presence sensor alarms, 30 sensor alarms, 17 speed sensor, 26 speed sensor alarms, 26 state sensor, 29 state sensor alarms, 29 temperature sensor, 24 temperature sensor alarms, 23 voltage sensor, 21 voltage sensor alarms, 20
exiting
Linux shell, 36 restricted shell, 36
F
fan
checking
LEDs, 7 determining faulty, 55 features, 57 inspecting, 57
connector, 59
hardware, 58 installing, 61 LED, 2 removing, 60 servicing, 55
faults
clearing manually, 10 detecting, 1 identifying in log, 12 managing, 1
faulty
battery, 75 fan, 55
power supply, 41 faulty components, 8, 9 features
data cable, 66
Ethernet cable, 66
fan, 57
InfiniBand cable, 66
power supply, 43 front status LEDs, 2
G
gateway
powering off, 46
removing from rack, 77
I
identifying
data cable, 66
Ethernet cable, 66
fan, 57
faults in log, 12
InfiniBand cable, 66
power supply, 43 indicator
evaluating state, 32
state conditions, 33
values, 33 InfiniBand cable
features, 66
inspecting, 65
connectors, 67 hardware, 67
transceivers, 67 installing, 72 removing, 68 servicing, 65
inspecting
data cable, 65
connectors, 67
hardware, 67
transceivers, 67 Ethernet cable, 65
connectors, 67
hardware, 67
transceivers, 67
fan, 57
connector, 59 hardware, 58
InfiniBand cable, 65
connectors, 67 hardware, 67 transceivers, 67
power supply, 43
connectors, 45 hardware, 45
installing
data cable, 72 Ethernet cable, 72 fans, 61 InfiniBand cable, 72 power supply, 49
L
LEDs
chassis status, 3, 4 fan, 2, 7 front, 2 interpreting, 1 link, 3, 5 NET MGT, 3, 4 power supply, 2, 6 rear, 3
link
LEDs, 5
Linux shells
entering, 35 exiting, 36
M
managing
faults, 1 faulty components, 7
N
network management
checking LEDs, 4
O
Oracle ILOM
accessing
NET MGT port, 34, 35
out of range
speed sensor, 27 temperature sensor, 25 voltage sensor, 22
P
paddle boards, 66 power supply
checking
LEDs, 6 determining faulty, 41 features, 43 inspecting, 43
connectors, 45
hardware, 45 installing, 49 LEDs, 2 powering off, 46 powering on, 51 removing, 47 servicing, 41
powering off
gateway, 46 power supply, 46
powering on
power supply, 51
presence sensor
alarm conditions, 31 evaluating, 31
R
rear status LEDs, 3 removing
data cable, 68 Ethernet cable, 68 fan, 60 gateway from rack, 77 InfiniBand cable, 68 power supply, 47
replaceable components, 37 replacing the battery, 78 resetting
components, 10
restricted shell
entering, 35 exiting, 36
retraction strap, 66
S
sensor alarms
determining types, 20 displaying status, 18 evaluating, 17 presence, 30 speed, 26 state, 29 temperature, 23 voltage, 20
servicing
battery, 75 data cable, 65 Ethernet cable, 65 fan, 55 InfiniBand cable, 65 power supply, 41
speed sensor
evaluating, 26 out of range, 27 values, 27
state sensor
alarm conditions, 30 evaluating, 29
system
alarm state, 14 alarm targets, 15 determining alarm state, 13
T
targets
alarm state
component, 15 system, 15
temperature sensor
evaluating, 24 out of range, 25 values, 24
tools, 39
U
understanding
service procedures, 37
V
values
indicator state, 33 speed sensor, 27 temperature sensor, 24 voltage sensor, 22
voltage sensor
evaluating, 21 out of range, 22 values, 22