Sun Microsystems, Inc.
4150 Network Circle
Santa Clara, CA 95054 U.S.A.
650-960-1300
Part No. 816-5255-12
March 2003, Revision A
Send comments about this document to: docfeedback@sun.com
Copyright 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In
particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at
http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and
decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, Sun StorEdge, StorTools, docs.sun.com, Sun Enterprise, Sun Fire, SunOS, Netra, SunSolve
and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC
trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries.
Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
Netscape Navigator is a trademark or registered trademark of Netscape Communications Corporation in the United States and other countries.
U.S. Government Rights - Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Preface XV
How This Book Is Organized XV
Using UNIX Commands XVI
Typographic Conventions XVII
Shell Prompts XVII
Related Documentation XVIII
Accessing Sun Documentation Online XX
Sun Welcomes Your Comments XX
1. Introduction 1
Predictive Failure Analysis (PFA) Capabilities 2
2. General Troubleshooting Procedures 3
High-Level Troubleshooting Tasks 3
Host-Side Troubleshooting 6
Storage Service Processor-Side Troubleshooting 6
Verifying the Configuration Settings 7
▼ To Verify Configuration Settings 7
Clearing the Lock File 10
▼ To Clear the Lock File 10
Sun Proprietary/Confidential: Internal Use Only
Sun StorEdge 6900 Series Multipathing Example 11
Multipathing Options in the Sun StorEdge 6900 Series 16
Manually Halting the I/O 17
▼ To Quiesce the I/O 17
▼ To Unconfigure the c2 Path 17
Suspending the I/O 18
▼ To Put the c2 Path Back into Production 19
▼ To View the Dynamic Multi-Pathing (DMP) Properties 20
▼ To Put the DMP-Enabled Paths Back into Production 22
3. Troubleshooting Tools 23
Storage Automated Diagnostic Environment 2.2 23
Example Topology 24
Generating Component-Specific Event Grids 25
▼ To Customize an Event Report 25
Microsoft Windows 2000 System Errors 26
Command Line Test Examples 27
qlctest(1M) 27
switchtest(1M) 28
Monitoring Sun StorEdge T3 and T3+ Arrays Using the Explorer Data Collection Utility 29
▼ To Install the Explorer Data Collection Utility on the Storage Service Processor 29
Monitoring Host Bus Adapters (HBAs) Using QLogic SANblade Manager 32
4. Troubleshooting Ethernet Hubs 35
5. Troubleshooting the Fibre Channel (FC) Links 37
FC Links 38
FC Link Diagrams 39
Troubleshooting the A1 or B1 FC Link 42
Verifying the Data Host 45
FRU Tests Available for the A1 or B1 FC Link Segment 46
▼ To Isolate the A1 or B1 FC Link 48
Troubleshooting the A2 or B2 FC Link 49
Verifying the Data Host 51
Verifying the A2 or B2 FC Link 52
FRU Tests Available for the A2 or B2 FC Link Segment 52
▼ To Isolate the A2 or B2 FC Link 52
Troubleshooting the A3 or B3 FC Link 54
Verifying the Data Host 56
Verifying the Storage Service Processor-Side 57
FRU Tests Available for the A3 or B3 FC Link Segment 57
▼ To Isolate the A3 or B3 FC Link 58
Quiescing the I/O on the A3 or B3 Link 59
Suspending the I/O on the A3 to B3 Link 59
Troubleshooting the A4 or B4 FC Link 60
Verifying the Data Host 62
Sun StorEdge 3900 Series 62
Sun StorEdge 6900 Series 62
FRU Tests Available for the A4 or B4 FC Link Segment 64
▼ To Isolate the A4 or B4 FC Link 64
6. Troubleshooting Host Devices 67
Using the Host Event Grid 67
▼ To Access the Host Event Grid 67
Replacing the Master, Alternate Master, and Slave Monitoring Host 71
▼ To Replace the Master Host 71
▼ To Replace the Alternate Master or Slave Monitoring Host 72
7. Troubleshooting Switches 73
About the Switches 73
Zone Modifications 74
Switchless Configurations 75
▼ Diagnosing and Troubleshooting Switch Hardware Problems 75
Using the Switch Event Grid 77
▼ To Use the Switch Event Grid 77
setupswitch Exit Values 85
8. Troubleshooting the Sun StorEdge T3+ Array Devices 87
Troubleshooting the T1 or T2 Data Path 88
Notification Events 89
▼ To Verify the Storage Service Processor 92
FRU Tests Available for the T1 or T2 Data Path FRU 93
About the Virtualization Engine 107
Virtualization Engine Diagnostics 108
Service Request Numbers (SRNs) 108
Service and Diagnostic Codes 108
Retrieving Service Information 108
CLI Interface 108
Error Log Analysis Commands 109
▼ To Display the Log Files and Retrieve SRNs 109
Sun StorEdge 3900 and 6900 Series 2.0 Troubleshooting Guide • March 2003
▼ To Clear the Log 110
Virtualization Engine LEDs 110
Power LED Codes 111
Interpreting LED Service and Diagnostic Codes 111
Back Panel Features 112
Ethernet Port LEDs 112
FC Link Error Status Report 113
▼ To Check the FC Link Error Status Manually 113
Translating Host-Device Names 115
Displaying the VLUN Serial Number 116
▼ To Display Devices That are Not Sun StorEdge Traffic Manager (MPxIO)-Enabled 116
▼ To Display Sun StorEdge Traffic Manager (MPxIO)-Enabled Devices 117
Viewing the Virtualization Engine Map 118
▼ To Failback the Virtualization Engine 120
Manually Clearing and Restoring the SAN Database 123
▼ To Reset the SAN Database on Both Virtualization Engines 124
▼ To Reset the SAN Database on a Single Virtualization Engine 125
Restarting the slicd Daemon 126
▼ To Restart the slicd Daemon 126
Diagnosing a creatediskpools(1M) Failure 129
Virtualization Engine Event Grid 132
▼ To Use the Virtualization Engine Event Grid 132
10. Troubleshooting Using Microsoft Windows 2000 137
General Notes 137
Troubleshooting Tasks Using Microsoft Windows 2000 138
Launching the Sun StorEdge T3+ Array Failover Driver GUI 138
Checking the Version of the Sun StorEdge T3+ Array Failover Driver 139
▼ To Use the Sun StorEdge T3+ Array Failover Driver GUI 140
▼ To Use the Sun StorEdge T3+ Array Failover Driver Command Line Interface (CLI) 142
11. Example of Fault Isolation 147
A. Virtualization Engine References 155
SRN Reference 155
SRN/SNMP Single Point-of-Failure Descriptions 159
Port Communication Numbers 160
Virtualization Engine Service Codes 160
B. Configuration Utility Error Messages 163
Virtualization Engine Error Messages 164
Switch Error Messages 168
Sun StorEdge T3+ Array Partner Group Error Messages 171
Other Error Messages 175
List of Figures
FIGURE 2-1 Sun StorEdge 6900 Series Logical View 11
FIGURE 2-2 Primary Data Paths to the Alternate Master 12
FIGURE 2-3 Primary Data Paths to the Master Sun StorEdge T3+ Array 13
FIGURE 2-4 Path Failure: Before the Second Tier of Switches 14
FIGURE 2-5 Path Failure: I/O Routed Through Both HBAs 15
FIGURE 3-1 Storage Automated Diagnostic Environment Example Topology 24
FIGURE 3-2 Microsoft Windows 2000 Event Properties System Log 26
FIGURE 3-3 QLogic SANblade Manager HBA Driver and Firmware Versions 33
FIGURE 3-4 QLogic SANblade Manager Diagnostics 34
FIGURE 5-1 Sun StorEdge 3900 Series FC Link Diagram 39
FIGURE 5-2 Sun StorEdge 6900 Series FC Link Diagram 41
FIGURE 5-3 Data Host Notification of Intermittent Problems 43
FIGURE 5-4 Data Host Notification of Severe Link Error 43
FIGURE 5-5 Storage Service Processor Notification 44
FIGURE 5-6 A2 or B2 FC Link Host-Side Event 49
FIGURE 5-7 A2 or B2 FC Link Storage Service Processor-Side Event 50
FIGURE 5-8 A3 or B3 FC Link Host-Side Event 54
FIGURE 5-9 A3 or B3 FC Link Storage Service Processor-Side Event 55
FIGURE 5-10 A3 or B3 FC Link Storage Service Processor-Side Event 55
FIGURE 5-11 A4 or B4 FC Link Data-Host Notification 60
FIGURE 5-12 Storage Service Processor-Side Notification 61
FIGURE 6-1 Sample Host Event Grid 68
FIGURE 7-1 Switch Event Grid 77
FIGURE 8-1 Storage Service Processor Event 89
FIGURE 8-2 Virtualization Engine Alert 90
FIGURE 8-3 Manage Configuration Files Menu 92
FIGURE 8-4 Example Link Test Text Output from the Storage Automated Diagnostic Environment 93
FIGURE 8-5 Sun StorEdge T3+ Array Event Grid 95
FIGURE 9-1 Virtualization Engine Front Panel LEDs 111
FIGURE 9-2 Virtualization Engine Back Panel 112
FIGURE 9-3 Virtualization Engine Event Grid 132
FIGURE 10-1 Launching the Sun StorEdge T3+ Array Failover Driver 138
FIGURE 10-2 Sun StorEdge T3+ Array Failover Driver Versions 2.0.0.123 and 2.1.0.104 139
FIGURE 10-3 Healthy Sun StorEdge 3900 series system, shown using Multipath Configurator 140
FIGURE 10-4 Sun StorEdge 3900 series system with a LUN failover, shown using Multipath Configurator 141
FIGURE 10-5 Multipath Configurator Array Properties 141
FIGURE 10-6 Multipath Configurator LUN Properties Detail 142
FIGURE 10-7 Sun StorEdge T3+ Array Failover Driver CLI Output for the Sun StorEdge 3900 Series 143
FIGURE 10-8 Sun StorEdge T3+ Array Failover Driver CLI Example Output for the Sun StorEdge 6900 Series 144
FIGURE 11-1 Alerts Display Using the Storage Automated Diagnostic Environment 147
FIGURE 11-2 Drilling Down for Sun StorEdge T3+ Array Failover Driver Fault Detail 148
FIGURE 11-3 Fault Confirmation Using QLogic SunBlade 149
FIGURE 11-4 Diagnostics Using QLogic SunBlade 150
FIGURE 11-5 Storage Automated Diagnostic Environment Test from Topology 151
FIGURE 11-6 Storage Automated Diagnostic Environment Test from Topology Pull-Down Menu 152
FIGURE 11-7 Storage Automated Diagnostic Environment Test from Topology Test Detail 152
FIGURE 11-8 Successful Switch Test Results 153
FIGURE 11-9 Multipath Recovery using the Sun StorEdge T3+ Array Multipath Configurator 154
FIGURE 11-10 Recovered Paths 154
List of Tables
TABLE 1-1 Sun StorEdge 3900 and 6900 Series Configurations 1
TABLE 3-1 Event Grid Sorting Criteria 25
TABLE 5-1 FC Links 38
TABLE 5-2 Ax to Bx FC Links 40
TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for the Host 69
TABLE 7-1 Storage Automated Diagnostic Environment Event Grid for 1 Gbit Switches 78
TABLE 7-2 Storage Automated Diagnostic Environment Event Grid for 2 Gbit Switches 82
TABLE 0-1 setupswitch Exit Values 85
TABLE 8-1 Storage Automated Diagnostic Environment Event Grid for the Sun StorEdge T3+ Array 96
TABLE 9-1 Virtualization Engine LEDs 110
TABLE 9-2 LED Diagnostic Codes 111
TABLE 9-3 Speed, Activity, and Validity of the Link 112
TABLE 9-4 Virtualization Engine Statistical Data 113
TABLE 9-5 Storage Automated Diagnostic Environment Event Grid for Virtualization Engine 133
TABLE 10-1 Tips for Interpreting Sun StorEdge 6910 Series CLI Output 145
TABLE A-1 SRN Reference 156
TABLE A-2 SRN/SNMP Single Point-of-Failure Table 159
TABLE A-3 Port Communication Numbers 160
TABLE A-4 Virtualization Engine Service Codes: 0-399 Host-Side Interface Driver Errors 160
Preface
The Sun StorEdge 3900 and 6900 Series 2.0 Troubleshooting Guide provides guidelines
for isolating problems in supported configurations of the Sun StorEdge™ 3900 and
6900 series. For detailed configuration information, refer to the Sun StorEdge 3900
and 6900 Series Reference Manual.
The scope of this troubleshooting guide is limited to information pertaining to the
components of the Sun StorEdge 3900 and 6900 series, including the Storage Service
Processor, Sun StorEdge 1 Gbit and 2 Gbit switches, Sun StorEdge T3+ arrays, and
the virtualization engines in the Sun StorEdge 6900 series. This guide is written for
Sun personnel who have been fully trained on all the components in the
configuration.
How This Book Is Organized
This book contains the following topics:
Chapter 1 introduces the Sun StorEdge 3900 and 6900 series storage subsystems.
Chapter 2 offers general troubleshooting guidelines, such as manually halting the
I/O and returning paths to production.
Chapter 3 presents information about tools used to troubleshoot. Tools include the
Storage Automated Diagnostic Environment, component-specific event grids,
command line examples, and QLogic’s SANblade Manager.
Chapter 4 discusses Ethernet hub troubleshooting. Information associated with the
3Com Ethernet hubs is limited in this guide, however, because 3Com does not allow
duplication of its information.
Chapter 5 provides Fibre Channel (FC) link diagrams and troubleshooting
procedures.
Chapter 6 provides information on host device troubleshooting.
Chapter 7 provides information on troubleshooting a Sun StorEdge Network FC
switch-8 and switch-16 switch device.
Chapter 8 describes how to troubleshoot the Sun StorEdge T3+ array devices. Also
included in this chapter is information about the Explorer Data Collection Utility.
Chapter 9 provides detailed information for troubleshooting the virtualization
engines.
Chapter 10 describes how to troubleshoot using Microsoft Windows 2000. It also
explains how to launch the Sun StorEdge T3+ Array Failover Driver GUI and
interpret the multipath configurator.
Chapter 11 provides an example of fault isolation. It begins with how to discover an
error and shows the user steps that are necessary for resolution.
Appendix A provides virtualization engine references, including Service Request
Numbers (SRNs) and Simple Network Management Protocol (SNMP) Reference, an
SRN/SNMP single point-of-failure table, and port communication and service code
tables.
Appendix B provides a list of SUNWsecfg(1M) error messages and
recommendations for corrective action.
Using UNIX Commands
This document may not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices.
See one or more of the following documents for this information:
■ Solaris Handbook for Sun Peripherals
■ AnswerBook2™ online documentation for the Solaris™ operating environment
■ Other software documentation that you received with your system
Typographic Conventions
Typeface: AaBbCc123 (monospace)
Meaning: The names of commands, files, and directories; on-screen computer output
Examples:
Edit your .login file.
Use ls -a to list all files.
% You have mail.

Typeface: AaBbCc123 (bold monospace)
Meaning: What you type, when contrasted with on-screen computer output
Examples:
% su
Password:

Typeface: AaBbCc123 (italic)
Meaning: Book titles, new words or terms, words to be emphasized; command-line
variable, to be replaced with a real name or value
Examples:
Read Chapter 6 in the User’s Guide.
These are called class options.
You must be superuser to do this.
To delete a file, type rm filename.
Shell Prompts
Shell                                     Prompt
C shell                                   machine-name%
C shell superuser                         machine-name#
Bourne shell and Korn shell               $
Bourne shell and Korn shell superuser     #
Related Documentation
Product                Title                                                     Part Number
Late-breaking News     Sun StorEdge 3900 and 6900 Series 2.0 Release Notes       816-5254
Sun StorEdge 3900 and
6900 series information
Sun StorEdge T3 and
T3+ array
Diagnostics            Storage Automated Diagnostics Environment User’s Guide    816-3142
Sun StorEdge SAN 4.0
• Netra X1 Server Hard Disk Drive Installation Guide
875-3060
875-1881
875-3059
806-5980
806-5980
806-7670
Accessing Sun Documentation Online
You can view, print, or purchase a broad selection of Sun documentation, including
localized versions, at:
http://www.sun.com/documentation
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can email your comments to Sun at:
docfeedback@sun.com
Please include the part number (816-5255) of your document in the subject line of
your email.
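CHAPTER 1
Introduction
The Sun StorEdge 3900 and 6900 series storage subsystems are complete
preconfigured storage solutions. The configurations for each of the storage
subsystems are shown in TABLE 1-1.

TABLE 1-1 Sun StorEdge 3900 and 6900 Series Configurations

Series: Sun StorEdge 3900 series; 3900SL [2]

  System: Sun StorEdge 3910 system
    Sun StorEdge Fibre Channel Switches Supported [1]: Two 8-port switches
    Sun StorEdge T3+ Array Partner Groups Supported: One to four
    Additional Array Partner Groups Supported with Optional Additional Expansion Cabinet: N/A
    Virtualization Engine: N/A

  System: Sun StorEdge 3960 system
    Sun StorEdge Fibre Channel Switches Supported [1]: Two 16-port switches
    Sun StorEdge T3+ Array Partner Groups Supported: One to four
    Additional Array Partner Groups Supported with Optional Additional Expansion Cabinet: One to five
    Virtualization Engine: N/A

Series: Sun StorEdge 6900 series; 6910SL [3], 6960SL [3]

  System: Sun StorEdge 6910 system
    Sun StorEdge Fibre Channel Switches Supported [1]: Four 8-port switches
    Sun StorEdge T3+ Array Partner Groups Supported: One to three
    Additional Array Partner Groups Supported with Optional Additional Expansion Cabinet: One to four
    Virtualization Engine: One virtualization engine pair

  System: Sun StorEdge 6960 system
    Sun StorEdge Fibre Channel Switches Supported [1]: Four 16-port switches
    Sun StorEdge T3+ Array Partner Groups Supported: One to three
    Additional Array Partner Groups Supported with Optional Additional Expansion Cabinet: One to four
    Virtualization Engine: Two virtualization engine pairs

[1] 1 Gbit or 2 Gbit switches
[2] 3900SL: No switches
[3] 6910SL and 6960SL: No front-end switches; two back-end switches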
Predictive Failure Analysis (PFA) Capabilities
The Storage Automated Diagnostic Environment software provides the health and
monitoring functions for the Sun StorEdge 3900 and 6900 series systems. This
software provides the following predictive failure analysis (PFA) capabilities:
■ FC links: Fibre Channel (FC) links are monitored at all end points using the
Fibre Channel-Extended Link Service (FC-ELS) link counters. When link errors
surpass the threshold values, an alert is sent. This enables Sun-trained personnel
to replace components that are experiencing high transient fault levels before a
hard fault occurs.
■ Enclosure status: Many devices, such as the Sun StorEdge FC switch-8 and
switch-16 switches and the Sun StorEdge T3+ array, cause Storage Automated
Diagnostic Environment alerts to be sent if temperature thresholds are
exceeded. This enables Sun-trained personnel to address the problem before the
component or enclosure fails.
■ Single Point-of-Failure (SPOF) notification: Storage Automated Diagnostic
Environment notification of path failures and failovers (that is, Sun StorEdge
Traffic Manager software failover) can be considered a PFA method, because
Sun-trained personnel are notified and can repair the primary path. This shortens
the time of exposure to a SPOF and helps to preserve customer availability during
the repair process.
PFA is not always effective in detecting or isolating failures. The remainder of this
document provides guidelines that you can use to troubleshoot problems that occur
in supported components of the Sun StorEdge 3900 and 6900 series.
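The threshold-based alerting described for FC links can be illustrated with a small sketch. The counter names, the threshold values, and the links_over_threshold helper are hypothetical and are not taken from the Storage Automated Diagnostic Environment; the sketch only shows the general idea of comparing the growth of link error counters between two polls against per-counter limits.

```python
# Hypothetical thresholds: maximum allowed increase per polling interval.
THRESHOLDS = {"link_failures": 1, "loss_of_sync": 5, "invalid_crc": 10}


def links_over_threshold(previous, current, thresholds=THRESHOLDS):
    """Return {port: {counter: delta}} for counters whose increase exceeds its limit."""
    alerts = {}
    for port, counters in current.items():
        baseline = previous.get(port, {})
        for name, limit in thresholds.items():
            delta = counters.get(name, 0) - baseline.get(name, 0)
            if delta > limit:
                alerts.setdefault(port, {})[name] = delta
    return alerts


if __name__ == "__main__":
    # Invented sample readings from two consecutive polls of one port.
    earlier = {"sw1a:port1": {"link_failures": 0, "loss_of_sync": 2, "invalid_crc": 40}}
    later = {"sw1a:port1": {"link_failures": 0, "loss_of_sync": 3, "invalid_crc": 75}}
    for port, exceeded in links_over_threshold(earlier, later).items():
        print(port, exceeded)
```

Comparing deltas rather than absolute counts means that a port with a long history of occasional errors is flagged only when its error rate rises, which matches the intent of catching high transient fault levels before a hard fault.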
CHAPTER 2
General Troubleshooting Procedures
This chapter contains the following sections:
■ “High-Level Troubleshooting Tasks” on page 3
■ “Host-Side Troubleshooting” on page 6
■ “Storage Service Processor-Side Troubleshooting” on page 6
■ “Verifying the Configuration Settings” on page 7
■ “Sun StorEdge 6900 Series Multipathing Example” on page 11
■ “Multipathing Options in the Sun StorEdge 6900 Series” on page 16
High-Level Troubleshooting Tasks
This section lists the high-level steps you can take to isolate and troubleshoot
problems in the Sun StorEdge 3900 and 6900 series. It offers a methodical approach,
and lists the tools and resources available at each step.
Note – A single problem can cause various errors throughout the storage area
network (SAN). A good practice is to begin by investigating the devices that have
experienced “Loss of Communication” events in the Storage Automated Diagnostic
Environment. These errors usually indicate more serious problems.
A “Loss of Communication” error on a switch, for example, could cause multiple
ports and host bus adapters (HBAs) to go offline. Concentrating on the switch and
fixing that failure can help bring the ports and HBAs back online.
1. Discover the error by checking one or more of the following messages or files:
■ Storage Automated Diagnostic Environment alerts or email messages
■ /var/adm/messages
■ Sun StorEdge T3+ array syslog file
■ Storage Service Processor messages
■ /var/adm/messages.t3 messages
■ /var/adm/log/SEcfglog file
2. Determine the extent of the problem by using one or more of the following
methods:
■ Review the Storage Automated Diagnostic Environment topology view.
■ Using the Storage Automated Diagnostic Environment revision checking
functionality, determine whether the package or patch is installed.
■ Verify the functionality using one of the following tools:
■ checkdefaultconfig(1M)
■ cfgadm -al output
■ luxadm(1M) output
■ Review the multipathing status using the Sun StorEdge Traffic Manager (MPxIO)
software or vxdmp(1M) command.
3. Check the status of a Sun StorEdge T3+ array by using one or more of the
following methods:
■ Review the Storage Automated Diagnostic Environment device monitoring
reports.
■ Run the checkt3config(1M) and showt3(1M) commands, which check and
display the Sun StorEdge T3+ array configuration.
■ Manually open a Telnet session to the Sun StorEdge T3+ array.
■ Review the luxadm(1M) display output.
■ Review the LED status on the Sun StorEdge T3+ array.
■ Review the Explorer Data Collection Utility output, which is located on the
Storage Service Processor.
4. Check the status of the Sun StorEdge network FC switch-8 and switch-16 switches
using the following tools:
■ Review the Storage Automated Diagnostic Environment device monitoring
reports.
■ Run the checkswitch(1M) and showswitch(1M) commands, which check and
display the Sun StorEdge FC switch configurations.
■ Review the online and offline LED status codes and POST error codes, which can
be found in the Sun StorEdge SAN 4.0 and SAN 4.1 Release Installation Guide.
■ Review the Explorer Data Collection Utility output, which is located on the
Storage Service Processor.
■ Refer to the SANsurfer GUI, which supports the Sun StorEdge SAN 4.0 Release, or the
SANbox Manager, which supports the Sun StorEdge SAN 4.1 Release.
Note – To run the SANsurfer GUI or SANbox Manager from the Storage Service
Processor, you must export the X display.
5. Check the status of the virtualization engine using one or more of the following
methods:
■ Review the Storage Automated Diagnostic Environment device monitoring
reports.
■ Run the checkve(1M), checkvemap(1M) and showvemap(1M) commands, which
check and display the virtualization host and LUN configurations.
■ Refer to the LED status blink codes in “Virtualization Engine LEDs” on page 110.
6. Quiesce the I/O along the path to be tested using one of the following methods:
■ For installations using VERITAS Dynamic Multi-Pathing (DMP), disable the path
with the vxdmpadm(1M) command.
■ For installations using the Sun StorEdge Traffic Manager (MPxIO) software,
unconfigure the Fabric device.
■ Refer to “To Quiesce the I/O” on page 17.
■ Halt the application.
7. Test and isolate field-replaceable units (FRUs) using the following tools:
■ Sun StorEdge T3+ array tests, including t3test(1M), t3ofdg(1M), and
t3volverify(1M), which can be found in the Storage Automated Diagnostic
Environment User’s Guide
Note – These tests isolate the problem to a FRU that must be replaced. Follow the
instructions in the Sun StorEdge 3900 and 6900 Series 2.0 Reference and Service Guide
and the Sun StorEdge 3900 and 6900 Series 2.0 Installation Guide for proper FRU
replacement procedures.
8. Verify the fix using the following tools:
■ Storage Automated Diagnostic Environment GUI Topology View and Diagnostic
Tests
■ /var/adm/messages on the data host
9. Return the path to service with one of the following methods:
■ Use the multipathing software
■ Restart the application
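Step 1 can be partly automated. The following sketch scans the log files named in step 1 for lines that contain a few keywords. It assumes that Python is available where the logs are read (for example, on a workstation to which the files have been copied). The keyword list and the scan_log helper are illustrative assumptions, not part of any Sun tool, and files that do not exist on a given host are skipped.

```python
from pathlib import Path

# Log files named in step 1.
LOG_FILES = [
    "/var/adm/messages",
    "/var/adm/messages.t3",
    "/var/adm/log/SEcfglog",
]

# Hypothetical keywords; adjust them to the messages you are looking for.
KEYWORDS = ("WARNING", "ERROR", "FAIL", "offline")


def scan_log(path, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines that contain any keyword."""
    log = Path(path)
    if not log.is_file():
        return []
    hits = []
    with log.open(errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(word in line for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for name in LOG_FILES:
        for number, text in scan_log(name):
            print(f"{name}:{number}: {text}")
```

A keyword scan only narrows down where to look. The matching lines still need to be read in context, because a single underlying problem can produce messages in several of these files.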
Host-Side Troubleshooting
Host-side troubleshooting refers to the messages and errors that the data host detects.
Usually these messages appear in the /var/adm/messages file.
Storage Service Processor-Side Troubleshooting
Storage Service Processor-side troubleshooting refers to messages, alerts, and errors
that the Storage Automated Diagnostic Environment detects while running on the
Storage Service Processor. You can find these messages by monitoring the following
Sun StorEdge 3900 series and Sun StorEdge 6900 series components:
■ Sun StorEdge network FC switch-8 and switch-16 switches
■ Virtualization engine
■ Sun StorEdge T3+ array
Combining the host-side messages and errors and the Storage Service Processor-side
messages, alerts, and errors into a meaningful context is essential for proper
troubleshooting.
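One way to put host-side and Storage Service Processor-side messages into a common context is to merge them into a single timeline. The sketch below is only an illustration: it assumes that each source has already been reduced to (timestamp, source, message) tuples sorted by time, and the sample events are invented for the example.

```python
import heapq
from datetime import datetime


def merge_timelines(*sources):
    """Merge already-sorted event lists into one list ordered by timestamp."""
    return list(heapq.merge(*sources, key=lambda event: event[0]))


if __name__ == "__main__":
    # Invented sample events, not real log output.
    host_events = [
        (datetime(2002, 1, 7, 18, 7, 40), "data host", "path offline"),
        (datetime(2002, 1, 7, 18, 8, 5), "data host", "failover complete"),
    ]
    ssp_events = [
        (datetime(2002, 1, 7, 18, 7, 51), "service processor", "switch port state changed"),
    ]
    for when, source, message in merge_timelines(host_events, ssp_events):
        print(when.isoformat(sep=" "), f"[{source}]", message)
```

The merge is meaningful only if the clocks on the data host and the Storage Service Processor are reasonably synchronized; otherwise the ordering of closely spaced events cannot be trusted.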
Verifying the Configuration Settings
During the course of troubleshooting, you might need to verify configuration
settings on the various components in the Sun StorEdge 3900 or 6900 series.
▼ To Verify Configuration Settings
1. Run one of the following scripts:
■ Run the runsecfg(1M) script and select the various Verify menu selections for
the Sun StorEdge T3+ arrays, the Sun StorEdge network FC switch-8 and
switch-16 switches, and the virtualization engine components.
■ Run the checkdefaultconfig(1M) script to check all accessible components.
The output is shown in CODE EXAMPLE 2-1.
■ Run the checkswitch(1M), checkt3config(1M), checkve(1M), or
checkvemap(1M) script from /opt/SUNWsecfg/bin to check the settings on
the Sun StorEdge network FC switch-8 and switch-16 switches, the Sun StorEdge
T3+ array, and the virtualization engine.
The scripts read the default configuration files in the /opt/SUNWsecfg/etc
directory and compare the current, live settings with the defaults. Any
differences are marked with a FAIL.
Note – For cluster configurations and systems that are attached to Microsoft
Windows NT, the default configurations may not match the current installed
configuration. Be aware of this when running the verification scripts. Certain items
may be flagged as FAIL in these special circumstances.
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list: FAIL <-- Failure Noted
Checking T3+: t3b2
Checking : t3b2 Configuration.......
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list : PASS
<snip>
2. If anything is marked FAIL, check the /var/adm/log/SEcfglog file for the
details of the failure.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : -----------SAVED CONFIGURATION--------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO :sys memsize : 32 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize : 256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : -----------CURRENT CONFIGURATION------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO :sys memsize : 32 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize : 256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
In this example, the mirror setting in the Sun StorEdge T3+ array system settings is
“off.” The saved configuration setting for this parameter, which is the default
setting, should be “auto.”
3. Fix the FAIL condition, and then verify the settings again.
# /opt/SUNWsecfg/bin/checkt3config -n t3b0
Checking : t3b0 Configuration.......
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list : PASS
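The comparison performed by eye in step 2 can also be scripted. The following Python sketch is not part of the Configuration Utility; it assumes that each SEcfglog entry occupies a single line in the format shown above, and the function name diff_saved_vs_current is hypothetical. It returns the settings whose saved and current values differ.

```python
import re

# Matches "... checkt3config: <array> INFO : <key> : <value>."
# (assumed single-line log entries, as in the SEcfglog excerpt above)
ENTRY = re.compile(r"checkt3config: (\S+) INFO\s*:\s*(.+?)\s*:\s*(.+?)\.?\s*$")


def diff_saved_vs_current(lines):
    """Return {setting: (saved, current)} for settings that differ."""
    saved, current, section = {}, {}, None
    for line in lines:
        if "SAVED CONFIGURATION" in line:
            section = saved
        elif "CURRENT CONFIGURATION" in line:
            section = current
        elif section is not None:
            match = ENTRY.search(line)
            if match:
                section[match.group(2)] = match.group(3)
    return {
        key: (value, current.get(key))
        for key, value in saved.items()
        if value != current.get(key)
    }


PREFIX = "Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : "
SAMPLE = [
    PREFIX + "-----------SAVED CONFIGURATION--------------.",
    PREFIX + "cache : auto.",
    PREFIX + "mirror : auto.",
    PREFIX + "-----------CURRENT CONFIGURATION------------.",
    PREFIX + "cache : auto.",
    PREFIX + "mirror : off.",
]
print(diff_saved_vs_current(SAMPLE))
```

Run against the excerpt in step 2, a script like this would report only the mirror setting, which is the one that caused the FAIL.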
Clearing the Lock File
If you interrupt any of the Configuration Utility scripts (by typing Control-C, for
example), a lock file might remain in the /opt/SUNWsecfg/etc directory, causing
subsequent commands to fail. Use the following procedure to clear the lock file.
▼ To Clear the Lock File
1. Type the following command:
# /opt/SUNWsecfg/bin/removelocks
usage : removelocks [-t|-s|-v]
where:
-t - remove all T3+ related lock files.
-s - remove all switch related lock files.
-v - remove all virtualization engine related lock files.
# /opt/SUNWsecfg/bin/removelocks -v
Note – After making any change to the virtualization engine configuration, the
script saves a new copy of the virtualization engine map. This may take a minimum
of two minutes, during which time no additional virtualization engine changes are
accepted.
If a process such as savevemap(1M) is running, you cannot remove the lock file
using the removelocks(1M) command. While the process runs, the affected
component is unavailable.
2. Monitor the /var/adm/log/SEcfglog file to see when the savevemap(1M)
process successfully exits.
CODE EXAMPLE 2-2 savevemap(1M) Output
Tue Jan 29 16:12:34 MST 2002 savevemap: v1 ENTER.
Tue Jan 29 16:12:34 MST 2002 checkslicd: v1 ENTER.
Tue Jan 29 16:12:42 MST 2002 checkslicd: v1 EXIT.
Tue Jan 29 16:14:01 MST 2002 savevemap: v1 EXIT.
When savevemap: ve-pair EXIT is displayed, the savevemap(1M) process has
successfully exited.
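Watching for the EXIT entry can be automated. The sketch below is an illustration only, assuming log entries of the form shown in CODE EXAMPLE 2-2 (a process name, a component name, then ENTER or EXIT); the function name still_running is hypothetical. It reports which processes have entered but not yet exited.

```python
import re

# Matches "<process>: <component> ENTER." or "<process>: <component> EXIT."
EVENT = re.compile(r"\b(\w+): (\S+) (ENTER|EXIT)\.\s*$")


def still_running(lines):
    """Return the set of (process, component) pairs with ENTER but no EXIT."""
    running = set()
    for line in lines:
        match = EVENT.search(line)
        if not match:
            continue
        key = (match.group(1), match.group(2))
        if match.group(3) == "ENTER":
            running.add(key)
        else:
            running.discard(key)
    return running


LOG = [
    "Tue Jan 29 16:12:34 MST 2002 savevemap: v1 ENTER.",
    "Tue Jan 29 16:12:34 MST 2002 checkslicd: v1 ENTER.",
    "Tue Jan 29 16:12:42 MST 2002 checkslicd: v1 EXIT.",
    "Tue Jan 29 16:14:01 MST 2002 savevemap: v1 EXIT.",
]
print(still_running(LOG[:3]))  # savevemap has not exited yet
print(still_running(LOG))      # nothing is still running
```

An empty result suggests that the lock file can be removed with removelocks(1M).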
Sun StorEdge 6900 Series Multipathing
Example
This Sun StorEdge 6900 series multipathing example contains the following
elements:
■ One Sun StorEdge T3+ array partner group
■ Two total LUNs
■ One 500-Gbyte RAID5 LUN per partner group
See FIGURE 2-1 for a logical view of the Sun StorEdge 6900 series.
[Diagram: a host with HBA-0 and HBA-1, two tiers of switches, virtualization engines (1) and (2) presenting LUN0-10G (MPDrive 0) and LUN1-10G (MPDrive 1), and the T3ES master (0A - 1P) and alternate master (1A - 0P) holding LUN0-500G and LUN1-500G. The virtualization engines exchange database, MPDrive, carved LUN, and masking information.]
FIGURE 2-1 Sun StorEdge 6900 Series Logical View
Currently, one 10-Gbyte VLUN is created from each physical LUN, for a total of two
VLUNs. The Sun StorEdge 6900 series has four possible physical paths to each Sun
StorEdge T3+ array volume (LUN).
Refer to FIGURE 2-2, which illustrates the primary data paths to the alternate
master, and FIGURE 2-3, which illustrates the primary data paths to the master
Sun StorEdge T3+ array.
[Diagram: the same components as FIGURE 2-1 (host with HBA-0 and HBA-1, two tiers of switches, virtualization engines (1) and (2), T3ES master (0A - 1P) and alternate master (1A - 0P)), highlighting the paths to LUN1 on the alternate master.]
FIGURE 2-2 Primary Data Paths to the Alternate Master
[Diagram: the same components as FIGURE 2-1 (host with HBA-0 and HBA-1, two tiers of switches, virtualization engines (1) and (2), T3ES master (0A - 1P) and alternate master (1A - 0P)), highlighting the paths to LUN0 on the master.]
FIGURE 2-3 Primary Data Paths to the Master Sun StorEdge T3+ Array
To access the LUN on the alternate master, the Sun StorEdge T3+ array I/O could
travel:
> backend loop to alternate master (secondary route from HBA-1)
The host, using multipathing software, is presented with two primary (active) paths
for each LUN, allowing the host to route I/O through either or both HBAs.
If a path failure occurs before the second tier of Sun StorEdge network FC switch-8
and switch-16 switches, one of the paths is disabled, but the other path continues
sending I/O as it normally would and takes over the entire load. Refer to
FIGURE 2-4, which illustrates a path failure before the second tier of switches.
No Sun StorEdge T3+ array failure is noted because of the redundant path, by way
of the Sun StorEdge network FC switch-8 and switch-16 switch T ports.
[Diagram: the same components as FIGURE 2-1, with a FAILURE marked on one path before the second tier of switches. Virtualization engine (2) and its switches continue to carry I/O to both LUNs.]
FIGURE 2-4 Path Failure—Before the Second Tier of Switches
The virtualization engine recognizes the primary (active) and secondary (passive)
pathing for the LUNs, and routes the I/O to the primary controller unless there is
a path failure to the primary path. In that case, the virtualization engine initiates a
LUN failover and routes the I/O through the secondary path (which, in turn, goes
through the interconnect cables). Refer to FIGURE 2-5, which illustrates a path
failure where I/O is routed through both HBAs.
[Diagram: the same components as FIGURE 2-1, with a FAILURE marked on a path behind virtualization engine (1). The host continues to send I/O through both HBA-0 and HBA-1.]
FIGURE 2-5 Path Failure—I/O Routed Through Both HBAs
In the event of a path failure after the second tier of Sun StorEdge network FC
switch-8 and switch-16 switches (or in the event that both T ports fail between the
switches), the virtualization engine forces a LUN failover of the affected Sun
StorEdge T3+ array and routes all I/O to its secondary path.
From the host side, nothing has changed: all I/O is routed through both HBAs (refer
to FIGURE 2-5).
Multipathing Options in the Sun
StorEdge 6900 Series
The presence of the virtualization engine makes multipathing in a Sun StorEdge
6900 series environment challenging.
Unlike Sun StorEdge T3+ array and Sun StorEdge network FC switch-8 and
switch-16 switch installations (which present primary and secondary pathing options), the
virtualization engines present only primary pathing options to the data host. The
virtualization engines handle all failover and failback operations and mask those
operations from the multipathing software on the data host.
The following example illustrates a Sun StorEdge Traffic Manager (MPxIO) software
problem on a Sun StorEdge 6900 series system.
# /usr/sbin/luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
Status(Port A): O.K.
Status(Port B): O.K.
Vendor: SUN
Product ID: SESS01
WWN(Node): 2a000060220041f4
WWN(Port A): 2b000060220041f4
WWN(Port B): 2b000060220041f9
Revision: 080C
Serial Num: Unsupported
Unformatted capacity: 102400.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c6t29000060220041F96257354230303052d0s2
/devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State ONLINE
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
Device Address 2b000060220041f9,0
Class primary
State ONLINE
Note that in the Class and State fields, the virtualization engines are presented as
two primary ONLINE devices. The current Sun StorEdge Traffic Manager software
design does not enable you to manually halt the I/O (that is, you cannot perform a
failover to the secondary path) when only primary devices are present.
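The condition described above (every path reported as primary) can be detected from saved luxadm display output. The Python sketch below is illustrative only: it assumes the Controller, Device Address, Class, and State layout shown in the example, and the function names parse_paths and only_primary_paths are hypothetical.

```python
def parse_paths(luxadm_output):
    """Collect one dict per Controller entry in luxadm display output."""
    paths, current = [], None
    for raw in luxadm_output.splitlines():
        line = raw.strip()
        if line.startswith("Controller "):
            current = {"controller": line.split(None, 1)[1]}
            paths.append(current)
        elif current is not None:
            for field in ("Device Address", "Class", "State"):
                if line.startswith(field + " "):
                    key = field.lower().replace(" ", "_")
                    current[key] = line[len(field):].strip()
    return paths


def only_primary_paths(paths):
    """True when at least one path exists and none is a secondary path."""
    return bool(paths) and all(p.get("class") == "primary" for p in paths)


SAMPLE = """\
Path(s):
/dev/rdsk/c6t29000060220041F96257354230303052d0s2
 Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
 Device Address 2b000060220041f4,0
 Class primary
 State ONLINE
 Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
 Device Address 2b000060220041f9,0
 Class primary
 State ONLINE
"""
paths = parse_paths(SAMPLE)
print(len(paths), only_primary_paths(paths))
```

For the output shown in this section, both paths are primary and ONLINE, so the check returns True.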
Manually Halting the I/O
As an alternative to using the Sun StorEdge Traffic Manager (MPxIO) software, you
can manually halt the I/O using one of two methods:
■ Quiesce the I/O
■ Unconfigure the c2 path
These methods are explained in the following sections.
# vxdmpadm listctlr all
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c0 OTHER_DISKS ENABLED OTHER_DISKS
c2 SENA ENABLED SENA0
c3 SENA ENABLED SENA0
c20 Disk ENABLED Disk
c23 Disk ENABLED Disk
The vxdisk output includes two physical paths to the LUN:
■ c20t2B000060220041F4d0s2
■ c23t2B000060220041F9d0s2
Both of these paths are currently enabled with DMP.
2. Use the luxadm(1M) command to display further information about the
underlying LUN.
DEVICE PROPERTIES for disk: /dev/rdsk/c23t2B000060220041F9d0s2
Status(Port A): O.K.
Vendor: SUN
Product ID: SESS01
WWN(Node): 2a000060220041f9
WWN(Port A): 2b000060220041f9
Revision: 080C
Serial Num: Unsupported
Unformatted capacity: 102400.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c23t2B000060220041F9d0s2
/devices/pci@e,2000/pci@2/SUNW,qlc@4/fp@0,0/
ssd@w2b000060220041f9,0:c,raw
▼ To Put the DMP-Enabled Paths Back into Production
1. Type:
# vxdmpadm enable ctlr=<cn>
2. Verify that the path has been reenabled by typing:
# vxdmpadm listctlr all
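The verification in step 2 can be checked programmatically against captured vxdmpadm listctlr all output. This Python sketch assumes the four-column layout shown earlier in this chapter; the function name controller_states is hypothetical, and the DISABLED value in the sample is an assumed state string used only for illustration.

```python
def controller_states(listctlr_output):
    """Map each controller name to its STATE column."""
    states = {}
    for line in listctlr_output.splitlines():
        fields = line.split()
        # Skip the header, the separator line, and anything malformed.
        if len(fields) != 4 or fields[0] == "CTLR-NAME":
            continue
        states[fields[0]] = fields[2]
    return states


SAMPLE = """\
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c0 OTHER_DISKS ENABLED OTHER_DISKS
c2 SENA ENABLED SENA0
c3 SENA ENABLED SENA0
c20 Disk DISABLED Disk
c23 Disk ENABLED Disk
"""
states = controller_states(SAMPLE)
not_enabled = sorted(name for name, state in states.items() if state != "ENABLED")
print(not_enabled)
```

An empty list after running vxdmpadm enable indicates that every controller is back in production.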
CHAPTER
3
Troubleshooting Tools
This chapter contains the following information related to tools used to troubleshoot
the Sun StorEdge 3900 or 6900 series components.
■ “Storage Automated Diagnostic Environment 2.2”
■ “Microsoft Windows 2000 System Errors”
■ “Command Line Test Examples”
■ “Monitoring Sun StorEdge T3 and T3+ Arrays Using the Explorer Data Collection
Utility”
■ “Monitoring Host Bus Adapters (HBAs) Using QLogic SANblade Manager”
Storage Automated Diagnostic
Environment 2.2
Check the internal status of the Sun StorEdge 3900 or 6900 series systems using the
Storage Automated Diagnostic Environment utility, version 2.2.
The Storage Automated Diagnostic Environment is installed on every Storage
Service Processor that ships with the unit. All that is needed is web browser access
to the Storage Service Processor.
In non-Sun host configurations such as Microsoft Windows 2000, the Storage
Automated Diagnostic Environment will be able to monitor the internals of the
storage unit (switches, virtualization engines, and the Sun StorEdge T3+ arrays), but
will not be able to completely monitor the host-to-storage unit link (the HBA to
switch). Certain conditions will be noted by Storage Automated Diagnostic
Environment, however, such as a port going offline, or increasing Fibre Channel
errors on the port.
Example Topology
The Storage Automated Diagnostic Environment topology in FIGURE 3-1 shows
the internal components of a Sun StorEdge 3910 system. The view also includes a
Solaris host (diag221) and the Storage Service Processor (diag156). The
Microsoft Windows 2000 host, which is also connected, does not appear.
FIGURE 3-1 Storage Automated Diagnostic Environment Example Topology
Generating Component-Specific Event Grids
The Storage Automated Diagnostic Environment generates component-specific event
grids that describe the severity of an event, tell whether action is required, provide a
description of the event, and recommend an action. Refer to Chapters 5 through 9 of
this troubleshooting guide for component-specific event grids.
▼ To Customize an Event Report
1. Choose the Event Grid link on the Storage Automated Diagnostic
Environment Help menu.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in TABLE 3-1.
TABLE 3-1 Event Grid Sorting Criteria
Category
• All (default)
• Sun StorEdge A3500FC array
• Sun StorEdge A5000 array
• Agent
• Host
• Message
• Sun Switch
• Sun StorEdge T3+ array
• Tape
• Virtualization engine
Component
• All (default)
• Backplane
• Controller
• Disk
• Interface
• LUN
• Port
• Power
Event Type
• Agent Deinstall
• Agent Install
• Alarm
• FC+
• Alternate Master -
• Audit
• Communication Established
• Communication Lost
• Discovery
• Heartbeat
• Insert Component
• Location Change
• Patch Info
• Quiesce End
• Quiesce Start
• Removal
• Remove Component
• State Change + (from offline to online)
• State Change (from online to offline)
• Statistics
• Backup
Severity
• critical (error)
• alert (warning)
• system down
Action
• Yes: This event is actionable and is sent to the RSS/SRS providers
• No: This event is nonactionable
Microsoft Windows 2000 System Errors
You can view Microsoft Windows 2000 errors through the Event Properties System
Log. The types of errors that would indicate a Sun StorEdge T3+ Array Failover
Driver issue have the Source "Jafo". An example is shown in FIGURE 3-2.
You should also look for other events such as any HBA driver-related events
(qla2200, for example) or disk-related events.
FIGURE 3-2 Microsoft Windows 2000 Event Properties System Log
Command Line Test Examples
To run a single Sun StorEdge diagnostic test from the command line rather than
through the Storage Automated Diagnostic Environment interface, you must log in
to the appropriate host or slave for testing the components.
The following two tests, qlctest(1M) and switchtest(1M), are provided as
examples.
qlctest(1M)
The qlctest(1M) test comprises several subtests that test the functions of the Sun
StorEdge PCI dual Fibre Channel (FC) host adapter board. This board is an HBA that
has diagnostic support. This diagnostic test is not scalable.
"qlctest: called with options: dev=/devices/pci@6,4000/SUNW,qlc@3/
fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|el
b=Enable"
"qlctest: Started."
"Program Version is 4.0.1"
"Testing qlc0 device at /devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl."
"QLC Adapter Chip Revision = 1, Risc Revision = 3,
Frame Buffer Revision = 1029, Riscrom Revision = 4,
Driver Revision = 5.a-2-1.15 "
"Running ECHO command test with pattern 0x7e7e7e7e"
"Running ECHO command test with pattern 0x1e1e1e1e"
"Running ECHO command test with pattern 0xf1f1f1f1"
...
"Running ECHO command test with pattern 0x4a4a4a4a"
"Running ECHO command test with pattern 0x78787878"
"Running ECHO command test with pattern 0x25252525"
"FCODE revision is ISP2200 FC-AL Host Adapter Driver: 1.12 01/01/16"
"Firmware revision is 2.1.7f"
"Running CHECKSUM check"
"Running diag selftest"
"qlctest: Stopped successfully."
switchtest(1M)
switchtest(1M) diagnoses the Sun StorEdge network FC switch-8 and switch-16
switch devices. The switchtest process also provides command-line access to
switch diagnostics. switchtest supports testing on local and remote switches.
switchtest runs the port diagnostic on connected switch ports. While
switchtest is running, the switch ports monitor the port statistics and check the
chassis status.
CODE EXAMPLE 3-2 switchtest(1M)
# /opt/SUNWstade/Diags/bin/switchtest -v -o "dev=\
2:192.168.0.30:0x0|xfersize=200"\
"switchtest: called with options: dev=2:192.168.0.30:0x0|xfersize=200"
All Storage Automated Diagnostic Environment diagnostic tests are located in
/opt/SUNWstade/Diags/bin. Refer to the Storage Automated Diagnostic
Environment User’s Guide for a complete list of tests, subtests, options, and
restrictions.
Monitoring Sun StorEdge T3 and T3+
Arrays Using the Explorer Data
Collection Utility
The Explorer Data Collection Utility script is included on the Storage Service
Processor in the /export/packages directory.
The Explorer Data Collection Utility is not installed by default, but can be installed
during rack setup. Customer-specific site information can be entered at that time.
To find out more about the Explorer Data Collection Utility, you can access the web
site with the following URL:
http://webhome.eng/mdeSW/Project/Explorer.html
▼ To Install the Explorer Data Collection Utility on
the Storage Service Processor
1. Type:
# cd /export/packages
# pkgadd -d . SUNWexplo
2. When you are prompted for site-specific information during the installation
process, you can press Return to accept the blank defaults.
Caution – Do not accept automatic emailing of the Explorer Data Collection Utility
output unless the Storage Service Processor is set up to handle mail correctly.
Automatic Email Submission
Would you like all explorer output to be sent to:
explorer-database-americas@sun.com
at the completion of explorer when -mail or -e is specified?
[y,n] n
3. Before running the Explorer Data Collection Utility, make sure that the switch and
Sun StorEdge T3+ array information is added to the proper
/opt/SUNWexplo/etc files.
Example
Type switch information in the /opt/SUNWexplo/etc/saninput.txt file. Edit
the file and add the switch information, as shown in CODE EXAMPLE 3-3.
CODE EXAMPLE 3-3 Editing Switch Information Using vi
# vi saninput.txt
# Input file for extended data collection
# Format is SWITCH SWITCH-TYPE PASSWORD LOGIN
# Valid switch types are ancor and brocade
# LOGIN is required for brocade switches, the default is admin
sw1a ancor
sw1b ancor
sw2a ancor
sw2b ancor
:wq!
4. Type Sun StorEdge T3+ array information in the /opt/SUNWexplo/etc/
t3input.txt file.
5. Type the password for your specific site.
CODE EXAMPLE 3-4 Editing Sun StorEdge T3+ Array Information Using vi
# vi t3input.txt
# Input file for extended data collection
# Format is HOST PASSWORD
t3b0 xxxx
t3b2 xxxx
t3b3 xxxx
:wq!
Note – xxxx represents Sun StorEdge T3+ array passwords.
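Before running the Explorer Data Collection Utility, the saninput.txt entries can be sanity-checked against the format given in the file's own comments. The Python sketch below is not part of the utility; the function name check_saninput is hypothetical, and treating a brocade entry without a LOGIN field as a problem is an assumed reading of the comment "LOGIN is required for brocade switches".

```python
VALID_TYPES = {"ancor", "brocade"}


def check_saninput(text):
    """Return (line number, problem) pairs for entries that look malformed."""
    problems = []
    for number, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no switch entry
        fields = line.split()  # SWITCH SWITCH-TYPE PASSWORD LOGIN
        if len(fields) < 2:
            problems.append((number, "missing switch type"))
        elif fields[1] not in VALID_TYPES:
            problems.append((number, "unknown switch type: " + fields[1]))
        elif fields[1] == "brocade" and len(fields) < 4:
            # Assumption: brocade entries must carry PASSWORD and LOGIN.
            problems.append((number, "brocade entry lacks LOGIN"))
    return problems


SAMPLE = """\
# Input file for extended data collection
sw1a ancor
sw1b ancor
sw2a ancore
sw2b
"""
print(check_saninput(SAMPLE))
```

The sample deliberately contains a misspelled switch type and an entry with no type so that both checks are exercised.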
■ You can now run /opt/SUNWexplo/bin/explorer to collect information about the
Storage Service Processor operating system, the Sun StorEdge network FC
switch-8 or switch-16 switch, and the Sun StorEdge T3+ array that you can use
for troubleshooting purposes.
■ A tar/gzip file is put in the /opt/SUNWexplo/output/tar/gzip file
directory. You can send the tar/gzip file to Sun Solution Center for evaluation.
■ The Sun StorEdge network FC switch-8 and switch-16 switch information is
placed in the san directory of the tar file.
■ Sun StorEdge T3+ array information is placed in the disk’s /t3 directory.
Monitoring Host Bus Adapters (HBAs)
Using QLogic SANblade Manager
The most effective way to retrieve HBA status and information is by using the HBA
manufacturer’s utility, such as the QLogic SANblade Manager software provided by
QLogic for its HBAs. This software is freely downloadable from QLogic’s website
(http://www.qlogic.com).
Note – Other manufacturers’ utilities, such as those from Emulex, are needed for
other HBAs, such as Emulex LightPulse HBAs.
Use the QLogic SANblade Manager to extract information about:
■ HBA Driver versions
■ Firmware versions
■ A primitive topology view
■ A LUN listing
■ Diagnostics on the HBA
FIGURE 3-3 QLogic SANblade Manager HBA Driver and Firmware Versions
QLogic SANblade Manager is also useful for viewing a primitive topology and a
LUN listing.
FIGURE 3-4 QLogic SANblade Manager Diagnostics
Note – Different HBA manufacturers might bundle different features with their
tools. The information in this guide assumes that you are using the QLogic
software.
CHAPTER
4
Troubleshooting Ethernet Hubs
The Sun StorEdge 3900 and 6900 series uses an Ethernet hub as the backbone for the
internal service network. The allocation of Ethernet ports is as follows:
■ One for the Storage Service Processor (per subsystem)
■ One for each FC switch
■ One for each virtualization engine
■ Two for each Sun StorEdge T3+ array partner group
■ One for the Ethernet hub that is installed on the second Sun StorEdge Expansion
Cabinet in the Sun StorEdge 3960 and 6960 series systems
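The allocation above reduces to simple arithmetic. The following Python sketch totals the hub ports for a given configuration; the function name hub_ports_needed and the component counts in the example call are hypothetical and do not describe any specific model.

```python
def hub_ports_needed(fc_switches, virtualization_engines, t3_partner_groups,
                     second_cabinet_hub=False):
    """Total Ethernet hub ports used by the internal service network."""
    ports = 1                          # Storage Service Processor
    ports += fc_switches               # one per FC switch
    ports += virtualization_engines    # one per virtualization engine
    ports += 2 * t3_partner_groups     # two per T3+ array partner group
    if second_cabinet_hub:
        ports += 1                     # uplink to the hub in the second cabinet
    return ports


# Hypothetical configuration: 4 switches, 2 virtualization engines,
# 3 partner groups, no second cabinet.
print(hub_ports_needed(4, 2, 3))
```

Comparing the result with the 12-port or 24-port hub capacity shows how many ports remain free.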
Note – Information about LED status lights, power information, and front panel
settings can be found in the 3Com document SuperStack 3 Baseline Hub 12-Port TP
User Guide or SuperStack 3 Baseline Hub 24-Port TP User Guide, available at
http://www.3com.com.
For repair and replacement procedures, refer to the Sun StorEdge 3900 and 6900
Series Reference and Service Guide.
CHAPTER
5
Troubleshooting the Fibre Channel
(FC) Links
FC links diagnose Sun StorEdge network FC components in a SAN or a direct
attached storage (DAS) environment. linktest(1M), which tests the health of the
FC links, is available only from the Test from Topology view of the Storage
Automated Diagnostic Environment GUI.
Note – linktest tests both ends of the link segment and enters a guided isolation
when a fault is detected.
Faults can be detected in one of two ways: when linktest sends an alert on a bad
or intermittent link, or when a red link appears on the topology graph, indicating a
failure.
This chapter contains the following sections:
■ “FC Links”
■ “Troubleshooting the A1 or B1 FC Link”
■ “Troubleshooting the A2 or B2 FC Link”
■ “Troubleshooting the A3 or B3 FC Link”
■ “Troubleshooting the A4 or B4 FC Link”
FC Links
The following sections provide troubleshooting information for the basic
components and FC links, listed in TABLE 5-1.
TABLE 5-1 FC Links
Link        Provides FC Link Between These Components
A1 to B1    Data host, sw1a, and sw1b
A2          sw1a and v1a*
B2          sw1b and v1b*
A3          v1a and sw2a*
B3          v1b and sw2b*
A4          Master Sun StorEdge T3+ array and the “A” path switch
B4          Alternate master Sun StorEdge T3+ array and the “B” path switch
T1 to T2    sw2a and sw2b*
* Sun StorEdge 6900 1.1 Series only
By using the Storage Automated Diagnostic Environment, you should be able to
isolate the problem to one particular segment of the configuration.
Note – The information found in this section is based on the assumption that the
Storage Automated Diagnostic Environment is running on the data host, and that it
is configured to monitor host errors.
The following diagrams provide troubleshooting information for the basic
components and FC links specific to the Sun StorEdge 3900 1.1 series (shown in
FIGURE 5-1), and the Sun StorEdge 6900 1.1 series (shown in FIGURE 5-2).
Note – An actual Sun StorEdge 3900 or 6900 series configuration could have more
Sun StorEdge T3+ arrays than are shown in FIGURE 5-1 and FIGURE 5-2.
FC Link Diagrams
FIGURE 5-1 shows the basic components and the FC links for a Sun StorEdge 3900
series system:
■ A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
■ A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun
StorEdge T3+ array link
[Diagram: HOST with HBA-A and HBA-B; links A1 and B1 to switches sw1a and sw1b; links A4 and B4 to the T3+ master and the T3+ alternate master.]
FIGURE 5-1 Sun StorEdge 3900 Series FC Link Diagram
TABLE 5-2 and FIGURE 5-2 show the basic components and the FC links for a Sun
StorEdge 6900 series system:
TABLE 5-2 Ax to Bx FC Links
Link        Provides FC Link Between These Components
A1 to B1    HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
A2 to B2    Sun StorEdge network FC switch-8 and switch-16 switch to
            virtualization engine link on the host side
A3 to B3    Sun StorEdge network FC switch-8 and switch-16 switch to the
            virtualization engine link on the device side
A4 to B4    Sun StorEdge network FC switch-8 and switch-16 switch to Sun
            StorEdge T3+ array link
T1 to T2    T port switch-to-switch link
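[Diagram: HOST with HBA-A and HBA-B. The A side runs through link A1, sw1a, link A2, v1a, link A3, sw2a, and link A4; the B side runs through link B1, sw1b, link B2, v1b, link B3, sw2b, and link B4. T1 and T2 are the switch-to-switch links, and the T3+ master and T3+ alternate master are at the end of the A4 and B4 links.]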
FIGURE 5-2 Sun StorEdge 6900 Series FC Link Diagram
Troubleshooting the A1 or B1 FC Link
The A1 or B1 link is the FC link from the HBA to the switch.
What happens when an FC link fails depends on the system. If a problem occurs with
the A1 or B1 FC link:
■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but an error with the FC link can cause a path to go offline.
FIGURE 5-3, FIGURE 5-4, and FIGURE 5-5 are examples of A1 or B1 link notification
events.
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.LOOP_OFFLINE
EventTime: 01/08/2002 14:34:45
Found 1 ’driver.LOOP_OFFLINE’ error(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
info: Loop Offline
Jan 8 14:34:25 WWN:Received 2 ’Loop Offline’ message(s) [threshold is 1
in 5mins] Last-Message: ’diag.xxxxx.xxx.com qlc: [ID 686697 kern.info] NOTICE:
Qlogic qlc(0): Loop OFFLINE ’
FIGURE 5-3 Data Host Notification of Intermittent Problems
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 14:48:02
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053
(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052
(ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,0 is offline
FIGURE 5-4 Data Host Notification of Severe Link Error
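When many such notifications arrive, the affected device, path, and target can be extracted from the mpxio text automatically. This Python sketch assumes that each message occupies a single line in /var/adm/messages (the figure above wraps them for display); the function name degraded_paths is hypothetical.

```python
import re

# Matches the mpxio "multipath status" text shown in FIGURE 5-4.
MPXIO = re.compile(
    r"(\S+) \((\w+)\) multipath status: (\w+), "
    r"path (\S+) \((\w+)\) to target address: (\S+) is (\w+)"
)


def degraded_paths(lines):
    """Return one dict per mpxio status message found in the given lines."""
    found = []
    for line in lines:
        match = MPXIO.search(line)
        if match:
            device, instance, status, path, port, target, state = match.groups()
            found.append({
                "device": device, "instance": instance, "status": status,
                "path": path, "port": port, "target": target, "state": state,
            })
    return found


MESSAGE = (
    "Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: "
    "[ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053 "
    "(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 "
    "(fp1) to target address: 2b000060220041f9,1 is offline"
)
result = degraded_paths([MESSAGE])
print(result[0]["instance"], result[0]["port"], result[0]["state"])
```

Grouping the results by port (fp1 in this example) shows at a glance which HBA path went offline.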
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch Key: switch:100000c0dd0057bd
EventType: StateChangeEvent.X.port.6
EventTime: 01/08/2002 14:54:20
’port.6’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Unknown (status-state changed from ’Online’ to ’Admin’):
FIGURE 5-5 Storage Service Processor Notification
Note – An A1 or B1 FC link error can cause a port in sw1a or sw1b to change state.
44Sun StorEdge 3900 and 6900 Series 2.0 Troubleshooting Guide • March 2003
Verifying the Data Host
The following example shows an error in the A1 or B1 FC link, which can cause a
path to go offline in the multipathing software.
FRU Tests Available for the A1 or B1 FC Link Segment
The following FRU tests are available for the A1 or B1 FC link segment. All
diagnostics are located in /opt/SUNWstade/Diags/bin. Refer to the man pages
for more details.
■ HBA—qlctest(1M)
■ Available only if the Storage Automated Diagnostic Environment is installed
on a data host
■ Causes HBA to go offline and online during tests
■ Switch —switchtest(1M)
■ Can be run while the link is still cabled and online (connected to HBA)
■ Can be run only from the Storage Service Processor.
■ The dev option to switchtest is in the following format:
Port:IP-Address:FCAddress
The FCAddress can be set to 0x0.
Note – If you are testing an A1 or B1 FC link that is connected to an HBA, you must
specify a payload of 200 bytes or less. This is a limitation in the HBA
application-specific integrated circuit (ASIC).
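The dev option format and the 200-byte payload limit can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, and rendering a nonzero FCAddress as unprefixed hexadecimal is an assumption (the guide shows only the value 0).

```python
def format_dev(port, ip_address, fc_address=0x0):
    """Build a dev option value in Port:IP-Address:FCAddress form."""
    # Assumption: FCAddress is written as hexadecimal digits without a prefix.
    return f"dev={port}:{ip_address}:{fc_address:x}"

def clamp_payload(requested, hba_attached, limit=200):
    """Cap the payload at `limit` bytes when the link ends at an HBA."""
    return min(requested, limit) if hba_attached else requested

print(format_dev(2, "192.168.0.30"))           # dev=2:192.168.0.30:0
print(clamp_payload(2000, hba_attached=True))  # 200
```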
CODE EXAMPLE 5-3 switchtest(1M) Called With Options
"switchtest: called with options: dev=2:192.168.0.30:0"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK
Fan 2: OK "
02/06/02 15:09:45 diag Storage Automated Diagnostic Environment MSGID 4001
switchtest.WARNING
switch0: "Maximum transfer size for a FABRIC port is 200. Changing
transfer size 2000 to 200"
"Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e"
"Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
Note – The Storage Automated Diagnostic Environment automatically resets the
transfer size if it notes that it is about to test a switch to the HBA connection. This is
done both in the Storage Automated Diagnostic Environment GUI and from the
command-line interface (CLI).
▼ To Isolate the A1 or B1 FC Link
To isolate the A1 or B1 link, which is the FC link from the HBA to the switch, follow
these steps:
1. Quiesce the I/O on the A1 or B1 FC link path.
2. Run switchtest(1M) or qlctest(1M) to test the entire link.
3. Break the connection by uncabling the link.
4. Insert a loopback connector into the switch port.
5. Rerun switchtest.
a. If switchtest fails, replace the gigabit interface converter (GBIC) and rerun
switchtest.
b. If switchtest fails again, replace the switch.
6. Insert a loopback connector into the HBA.
7. Run qlctest.
a. If the qlctest test fails, replace the HBA.
b. If the qlctest test passes, replace the cable.
8. Recable the entire link.
9. Run switchtest or qlctest to validate the fix.
10. Put the path back into production.
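The replacement decisions in steps 5 through 7 can be summarized as a small decision function. This is a sketch under one stated assumption: the HBA-side loopback test (steps 6 and 7) matters only when the switch-side loopback test passes, which the procedure implies but does not state. The function and argument names are hypothetical.

```python
def isolate_a1_b1(switch_loopback_ok, switch_ok_after_gbic, hba_loopback_ok):
    """Map loopback test results to the FRUs the procedure says to replace.

    Each argument is True when the corresponding test passed.
    `switch_ok_after_gbic` is consulted only when the first switchtest failed.
    """
    if not switch_loopback_ok:
        # Step 5a: replace the GBIC; step 5b: replace the switch if it still fails.
        return ["GBIC"] if switch_ok_after_gbic else ["GBIC", "switch"]
    # Step 7: qlctest with a loopback connector in the HBA.
    return ["cable"] if hba_loopback_ok else ["HBA"]

print(isolate_a1_b1(True, True, False))
```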
Troubleshooting the A2 or B2 FC Link
The A2 or B2 link is the FC link from the first switch to the virtualization engine.
This link exists in the Sun StorEdge 6900 Series only. An error with the FC link can
cause a path to go offline.
FIGURE 5-6 and FIGURE 5-7 are examples of A2 or B2 Link Notification Events.
From root Tue Jan 8 18:39:48 2002
Date: Tue, 8 Jan 2002 18:39:47 -0700 (MST)
Message-Id: <200201090139.g091dlg07015@diag.xxxxx.xxx.com>
From: Storage Automated Diagnostic Environment.Agent
Subject: Message from ’diag.xxxxx.xxx.com’ (2.0.B2.002)
Content-Length: 2742
You requested the following events be forwarded to you from
’diag.xxxxx.xxx.com’.
Site : FSDE LAB Broomfield CO
Source : diag226.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 17:34:47
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages
on diag.xxxxx.xxx.com (id=80fee746):
Info: Fabric warning
Jan 8 17:34:36 WWN:2b000060220041f4 diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(0): N_x Port with D_ID=108000,
PWWN=2b000060220041f4 disappeared from fabric
<snip>
multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to
target address: 2b000060220041f4,1 is offline
Jan 8 17:34:55 WWN:2b000060220041f4 diag.xxxxx.xxx.com
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : San Key: switch:100000c0dd0061bb:1
EventType: LinkEvent.ITW.switch|ve
EventTime: 01/08/2002 17:39:47
ITW-ERROR (765 in 11 mins): Origin: port 1 on ’switch ’sw1b/192.168.0.31’.
Destination: port 1 on ve ’diag-v1b/29000060220041f4’:
Info:
An invalid transmission word (ITW) was detected between two components.
This could indicate a potential problem.
Cause:
Likely Causes are: GBIC, FC Cable and device optical connections.
Action:
To isolate further please run the Storage Automated Diagnostic Environment
tests associated with this link segment.
FIGURE 5-7 A2 or B2 FC Link Storage Service Processor-Side Event
Verifying the Data Host
An error in the A2 or B2 FC link can cause cfgadm to list a device in the
“unusable” state, even though luxadm lists no HBAs in the “unconnected” state.
The multipathing software notes an offline path, as shown in
CODE EXAMPLE 5-4.
CODE EXAMPLE 5-4 cfgadm -al
# /usr/sbin/cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
Status(Port A): O.K.
Status(Port B): O.K.
Vendor: SUN
Product ID: SESS01
WWN(Node): 2a000060220041f9
WWN(Port A): 2b000060220041f9
WWN(Port B): 2b000060220041f4
Revision: 080C
Serial Num: Unsupported
Unformatted capacity: 102400.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c6t29000060220041F96257354230303052d0s2
/devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
Device Address 2b000060220041f9,0
Class primary
State ONLINE
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State OFFLINE
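To find the offline path quickly in longer output, the Controller, Device Address, Class, and State groups can be collected into records. The sketch below assumes the four-line grouping shown above; the parser and its field names are illustrative and are not part of luxadm.

```python
def parse_paths(display_output):
    """Collect Controller/Device Address/Class/State groups from the text."""
    paths, current = [], None
    for raw in display_output.splitlines():
        line = raw.strip()
        if line.startswith("Controller "):
            current = {"controller": line.split(None, 1)[1]}
            paths.append(current)
        elif current is not None:
            for key in ("Device Address", "Class", "State"):
                if line.startswith(key + " "):
                    current[key.lower().replace(" ", "_")] = line[len(key):].strip()
    return paths

sample = """\
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
Device Address 2b000060220041f9,0
Class primary
State ONLINE
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State OFFLINE
"""
offline = [p["controller"] for p in parse_paths(sample) if p.get("state") == "OFFLINE"]
print(offline)
```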
Note – You can find procedures for restoring virtualization engine settings in the
Sun StorEdge 3900 and 6900 Series 2.0 Reference and Service Guide.
Verifying the A2 or B2 FC Link
You can check the A2 or B2 FC link using the Storage Automated Diagnostic
Environment, Diagnose—Test from Topology functionality. The Storage Automated
Diagnostic Environment’s implementation of diagnostic tests verifies the operation
of user-selected components. Using the Topology view, you can select specific tests,
subtests, and test options.
FRU Tests Available for the A2 or B2 FC Link Segment
■ The linktest is not available.
■ Both the switch and the GBIC are tested using the switchtest test. The
switchtest test:
■ Can be used only in conjunction with the loopback connector
■ Cannot run while the switch port is cabled to the virtualization engine
■ No virtualization engine tests are available.
▼ To Isolate the A2 or B2 FC Link
To isolate the A2 or B2 link, which is the FC link from the first switch to the
virtualization engine (only in the Sun StorEdge 6900 Series), follow these steps.
Note – The A2 or B2 FC link exists in a Sun StorEdge 6900 series only.
1. Quiesce the I/O on the A2 or B2 FC link path.
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest:
a. If the test fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
5. If the switch and the GBIC show no errors, replace the remaining components in
the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor the
link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine settings,
recable the link, and monitor the link for errors.
Note – The procedures for restoring virtualization engine settings are in the Sun
StorEdge 3900 and 6900 Series 2.0 Reference and Service Guide.
6. Return the path to production.
Troubleshooting the A3 or B3 FC Link
The A3 or B3 link is the FC link from the virtualization engine to the backend switch.
The A3 or B3 FC link exists in a Sun StorEdge 6900 Series only. An error with the FC
link can cause a path to go offline.
FIGURE 5-8, FIGURE 5-9, and FIGURE 5-10 are examples of A3 or B3 link notification
events.
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 18:25:18
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286
kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053 (ssd19) multipath
status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address:
2b000060220041f9,1 is offline
Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286
kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052 (ssd18) multipath
status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address:
2b000060220041f9,0 is offline
---------------------------------------------------------------
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 18:25:18
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Info:
Fabric warning
Jan 8 18:24:04 WWN:2b000060220041f9 diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(1): N_x Port with D_ID=104000, PWWN=2b000060220041f9
disappeared from fabric
FIGURE 5-8 A3 or B3 FC Link Host-Side Event
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch Key: switch:100000c0dd0057bd
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:38
’port.1’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along FC path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 5-9 A3 or B3 FC Link Storage Service Processor-Side Event
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch Key: switch:100000c0dd00cbfe
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:40
’port.1’ in SWITCH diag-sw2a (ip=192.168.0.32) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along FC path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 5-10 A3 or B3 FC Link Storage Service Processor-Side Event
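The switch state-change events in FIGURE 5-9 and FIGURE 5-10 share one layout, so their fields can be extracted with a single pattern. This is a sketch based only on the two sample events; the pattern and field names are assumptions, and other event types may be worded differently.

```python
import re

# Accepts either a typographic (U+2019) or a plain apostrophe around values.
STATE_RE = re.compile(
    r"[\u2019'](?P<port>port\.\d+)[\u2019'] in SWITCH (?P<switch>\S+) "
    r"\(ip=(?P<ip>[\d.]+)\) is now (?P<now>[\w-]+)\s+"
    r"\(status-state changed from [\u2019'](?P<old>\w+)[\u2019'] "
    r"to [\u2019'](?P<new>\w+)[\u2019']\)"
)

def parse_state_change(text):
    """Return the fields of a switch port state-change line, or None."""
    m = STATE_RE.search(text)
    return m.groupdict() if m else None

event = (
    "\u2019port.1\u2019 in SWITCH diag-sw2a (ip=192.168.0.32) is now Not-Available "
    "(status-state changed from \u2019Online\u2019 to \u2019Offline\u2019):"
)
print(parse_state_change(event))
```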
Verifying the Data Host
An error in the A3 or B3 FC link causes cfgadm to list a device in the “unusable”
state, even though luxadm lists no HBAs in the “unconnected” state. The
multipathing software notes an offline path.
/dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
...
/devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
Device Address 2b000060220041f9,0
Class primary
State OFFLINE
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State ONLINE
Jul 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 997040 kern.notice] NOTICE:
vxvm:vxdmp: disabled path 118/0x1f8 belonging to the dmpnode 231/0xd0
Verifying the Storage Service Processor-Side
You can check the A3 or B3 FC link using the Storage Automated Diagnostic
Environment’s Test from Topology functionality.
The Storage Automated Diagnostic Environment’s implementation of diagnostic
tests verifies the operation of user-selected components. Using the Topology view,
you can select specific tests, subtests, and test options.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
FRU Tests Available for the A3 or B3 FC Link Segment
■ The linktest is not available.
■ Both the switch and the GBIC are tested using the switchtest test. The
switchtest test:
■ Can be used only in conjunction with the loopback connector
■ Cannot run while the switch port is cabled to the virtualization engine
■ No virtualization engine tests are available at this time.
▼ To Isolate the A3 or B3 FC Link
To isolate the A3 or B3 link, which is the FC link from the virtualization engine to
the back-end switch, follow these steps:
Note – The A3 or B3 FC link exists in a Sun StorEdge 6900 series only.
1. Quiesce the I/O on the A3 or B3 FC link path (refer to “Quiescing the I/O on the
A3 or B3 Link” on page 59).
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest:
a. If the test fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
5. If the switch and the GBIC show no errors, replace the remaining components in
the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor the
link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine settings,
recable the link, and monitor the link for errors.
Note – The procedures for restoring virtualization engine settings are in the Sun
StorEdge 3900 and 6900 Series 2.0 Reference and Service Guide.
6. Return the path to production.
Quiescing the I/O on the A3 or B3 Link
1. Determine the path you want to disable.
2. Disable the path by typing the following:
# /usr/bin/vxdmpadm disable ctlr=<cn>
3. Verify that the path is disabled:
# /usr/bin/vxdmpadm listctlr all
Steps 1 and 2 halt I/O only up to the A3 to B3 link. I/O continues to move over the
T1 and T2 paths, as well as the A4 to B4 links to the Sun StorEdge T3+ array.
Suspending the I/O on the A3 to B3 Link
Use one of the following methods to suspend I/O while the failover occurs:
■ Stop all customer applications that are accessing the Sun StorEdge T3+ array.
■ Manually pull the link from the Sun StorEdge T3+ array to the switch and wait
for a Sun StorEdge T3+ array LUN failover.
■ After the failover occurs, replace the cable and proceed with testing and FRU
isolation.
■ After testing is complete and any FRU replacement is finished, return the
controller state back to the default by using the virtualization engine failback
command.
Caution – This action will cause SCSI errors on the data host and a brief suspension
of I/O while the failover occurs.
Troubleshooting the A4 or B4 FC Link
The A4 or B4 link is the FC link from the switch to the Sun StorEdge T3+ array.
If a problem occurs with the A4 or B4 FC link:
■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but an error with the FC link can cause a path to go offline.
FIGURE 5-11 and FIGURE 5-12 are examples of A4 or B4 Link Notification Events.
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80e4aa60):
INFORMATION:
Fabric warning
<snip>
status of hba /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0:devctl on
diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
INFORMATION:
monitors changes in the output of luxadm -e port
Found path to 20 HBA ports
/devices/sbus@2,0/SUNW,socal@d,10000:0 NOT CONNECTED
FIGURE 5-11 A4 or B4 FC Link Data-Host Notification
FIGURE 5-12 Storage Service Processor-Side Notification
Verifying the Data Host
A problem in the A4 or B4 FC Link appears differently on the data host, depending
on whether the array is a Sun StorEdge 3900 series or a Sun StorEdge 6900 series
device.
Sun StorEdge 3900 Series
In a Sun StorEdge 3900 series device, the data host multipathing software
initiates the failover and reports it in /var/adm/messages. These messages
resemble those forwarded in the Storage Automated Diagnostic Environment email
notifications.
The luxadm failover command is used to fail the Sun StorEdge T3+ array LUNs
back to the proper configuration after the failing FRU is replaced. This command is
issued from the data host.
Sun StorEdge 6900 Series
In a Sun StorEdge 6900 series device, the virtualization engine pairs handle the
failover and the failover is not noted on the data host. All paths remain online and
active.
After the failing FRU is replaced, the failbackt3path command is used to fail the
Sun StorEdge T3+ array LUNs back. This command is issued from the Storage
Service Processor.
Note – In the event of a complete sw1b or sw2b failure in a Sun StorEdge 6900
series configuration, the virtualization engine pairs handle the failover. In addition,
the multipathing software notes a path failure on the data host, the Sun StorEdge
Traffic Manager or DMP software takes the entire path that was connected to the
failed switch offline, and the Inter-Switch Link (ISL) ports on the surviving switch
go offline as well.
To verify the failover, use luxadm display. The failed path is marked “offline.”
FRU Tests Available for the A4 or B4 FC Link Segment
■ The switchtest can only be run from the Storage Service Processor.
■ The linktest can isolate the switch and the GBIC on the switch. It cannot
isolate the cable or the Sun StorEdge T3+ array controller.
▼ To Isolate the A4 or B4 FC Link
To isolate the A4 or B4 link, which is the FC link from the switch to the Sun StorEdge
T3+ array, follow these steps.
1. Quiesce the I/O on the A4 or B4 FC link path.
2. Run linktest(1M) from the Storage Automated Diagnostic Environment GUI to
isolate suspected failing components.
Alternatively, follow these steps:
1. Quiesce the I/O on the A4 or B4 FC link path.
2. Run switchtest(1M) to test the entire link (re-create the problem).
3. Break the connection by uncabling the link.
4. Insert the loopback connector into the switch port.
5. Rerun switchtest.
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
6. If switchtest passes, assume that the suspect components are the cable and the
Sun StorEdge T3+ array controller.
a. Replace the cable.
b. Rerun switchtest.
7. If the test fails again, replace the Sun StorEdge T3+ array controller.
8. Return the path to production.
9. Return the Sun StorEdge T3+ array LUNs to the correct controllers, if a failover
occurred. (Determine if failovers occur using the luxadm failover or
failbackt3path commands.)
CHAPTER 6
Troubleshooting Host Devices
This chapter describes how to troubleshoot components associated with a Sun
StorEdge 3900 or 6900 series host.
This chapter contains the following sections:
■ “To Access the Host Event Grid”
■ “To Replace the Master Host”
■ “To Replace the Alternate Master or Slave Monitoring Host”
Using the Host Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort host
events by component, category, or event type. The Storage Automated Diagnostic
Environment GUI displays an event grid that describes the severity of the event,
tells whether action is required, provides a description of the event, and gives the
recommended action. Refer to the Storage Automated Diagnostic Environment User’s
Guide for more information.
▼ To Access the Host Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, choose the
Event Grid link.
2. Select the criteria related to the event you are troubleshooting from the
Host Event Grid, shown in FIGURE 6-1.
FIGURE 6-1 Sample Host Event Grid
TABLE 6-1 lists all the host events in the Storage Automated Diagnostic Environment.

TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for the Host

Component:   HBA
EventType:   Alarm+
Severity:    Yellow
Action:      (blank)
Description: The status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl
             on diag.xxxxx.xxx.com. The status changed from not connected to
             connected.
Information: Monitors changes in the output of luxadm -e port.

Component:   HBA
EventType:   Alarm-
Severity:    Red
Action:      Y
Description: The status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl
             on diag.xxxxx.xxx.com. The status changed from connected to not
             connected.
Information: • Monitors changes in the output of luxadm -e port.
             • Finds the path to 20 HBA ports.

Component:   LUN.t300
EventType:   Alarm-
Severity:    Red
Action:      Y
Description: The state of LUN.t300.c14t50020F2300003EE5d0s2.statusA on
             diag.xxxxx.xxx.com. The status changed from OK to error
             (target=t3:diag244-t3b0/90.0.0.40).
Information: The luxadm display reported a change in the port status of one of
             its paths. The Storage Automated Diagnostic Environment tries to
             find the enclosure corresponding to this path by reviewing its
             database of Sun StorEdge T3+ arrays and virtualization engines.

Component:   LUN.VE
EventType:   Alarm-
Severity:    Red
Action:      Y
Description: The state of LUN.VE.c14t50020F2300003EE5d0s2.statusA on
             diag.xxxxx.xxx.com. The status changed from OK to error
             (target=ve:diag244-ve0/90.0.0.40).
Information: The luxadm display reported a change in the port status of one of
             its paths. The Storage Automated Diagnostic Environment tries to
             find the enclosure corresponding to this path by reviewing its
             database of Sun StorEdge T3+ arrays and virtualization engines.

Component:   ifptest
EventType:   Diagnostic Test-
Severity:    Red
Action:      Y
Description: ifptest (diag240) on the host failed.
Information: Check Test Manager for failure details.

Component:   qlctest
EventType:   Diagnostic Test-
Severity:    Red
Action:      (blank)
Description: qlctest (diag240) on the host failed.
Information: Check Test Manager for failure details.

Component:   socaltest
EventType:   Diagnostic Test-
Severity:    Red
Action:      (blank)
Description: socaltest (diag240) on the host failed.
Information: Check Test Manager for failure details.

Component:   enclosure
EventType:   PatchInfo
Severity:    (blank)
Action:      (blank)
Description: New patch and package information were generated.
Information: Send changes to the output of showrev -p and pkginfo -l.

Component:   enclosure
EventType:   backup
Severity:    (blank)
Action:      (blank)
Description: The Agent was backed up.
Information: Backs up the configuration file of the Agent.

Component:   disk_capacity
EventType:   Alarm
Severity:    Yellow
Action:      Y
Description: Detected that /var/opt/SUNWstade is at or above 98% capacity
             by typing:
Information: Remove unused files and directories to free up space.
             Use a larger disk for
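The disk_capacity event is a threshold check on one file system. A minimal sketch of such a check follows; it is not the agent's own code. The function name is hypothetical, the 98 percent default is taken from the table, and the percentage here is used/total, which can differ slightly from what df reports.

```python
import shutil

def capacity_alarm(path, threshold_pct=98.0):
    """Return (percent_used, alarm) for the file system holding `path`."""
    usage = shutil.disk_usage(path)
    pct = 100.0 * usage.used / usage.total
    return pct, pct >= threshold_pct

# "/" is used so the sketch runs anywhere; on the monitoring host the path
# of interest would be /var/opt/SUNWstade.
pct, alarm = capacity_alarm("/")
print(f"{pct:.1f}% used, alarm={alarm}")
```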
Replacing the Master, Alternate Master, and Slave Monitoring Host
The following procedures are a high-level overview of the procedures that are
detailed in the Storage Automated Diagnostic Environment User’s Guide. Follow these
procedures when replacing a master, alternate master, or slave monitoring host.
Note – The procedures for replacing the master host are different from the
procedures for replacing an alternate master or slave monitoring host.
▼ To Replace the Master Host
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
1. Install the SUNWstade package on a new master host.
2. Run /opt/SUNWstade/bin/ras_install on the new master host.
3. Configure the host as the master host.
4. Connect to the master server’s GUI at
http://<servername>:7654
5. Choose System Utilities -> Recover Config.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions.
a. In the Recover Config window, enter the IP address of any alternate master or
slave monitoring host. (All hosts keep a copy of the configuration.)
b. Make sure the checkboxes for Recover config and Reset slave to this master are
checked.
c. Click Recover.
6. Choose Maintenance -> General Maintenance.
a. Ensure that all host and device settings are recovered correctly.
b. Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s
Guide for detailed instructions.
7. Choose Maintenance -> General Maintenance -> Start/Stop Agent to start the
agent on the master host.
▼ To Replace the Alternate Master or Slave Monitoring Host
1. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to the maintenance section in Chapter 3 of the Storage Automated Diagnostic
Environment User’s Guide.
2. In the Maintain Hosts window, from the Existing Hosts list, select the host to be
replaced and click Delete.
3. Install the new host.
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
4. Install the SUNWstade package on the new host.
5. Run /opt/SUNWstade/bin/ras_install.
6. Configure the host as a slave.
7. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to the maintenance section in Chapter 3 of the Storage Automated Diagnostic
Environment User’s Guide for detailed instructions.
8. In the Maintain Hosts window, select the new host.
9. Configure the options as needed.
10. Choose Maintenance -> Topology Maintenance -> Topology Snapshot.
a. In the Topology Snapshot window, select the new host.
b. Click the Create and Retrieve Selected Topologies button.
c. Click the Merge and Push Master Topology button.
Note – Any time you replace a master, alternate master, or slave monitoring host,
you must recover the configuration using the procedures described in this section.
This is especially important when the Storage Service Processor is replaced as a
FRU, whether the Storage Service Processor is the master or the slave.
CHAPTER 7
Troubleshooting Switches
This chapter describes how to troubleshoot the 1 Gbit and 2 Gbit switch components
associated with a Sun StorEdge 3900 or 6900 series system.
This chapter contains the following sections:
■ “About the Switches”
■ “Using the Switch Event Grid”
■ “setupswitch Exit Values”
About the Switches
The Sun StorEdge network FC switch-8 and switch-16 switches provide cable
consolidation and increased connectivity for the internal data interconnection
infrastructure.
The switches are paired to provide redundancy. Two switches are used in each Sun
StorEdge 3900 series, and four switches are used in each Sun StorEdge 6900 series.
Each Sun StorEdge network FC switch-8 and switch-16 switch is connected through
Ethernet to the service network for management and service from the Storage
Service Processor.
These switches can be monitored through the SANSurfer GUI (for SAN Release 4.0)
or the SANbox Manager (for SAN Release 4.1), which is available on the Storage
Service Processor. You configure and modify the switches using the Configuration
Utilities.
Caution – Do not configure or modify the switches using any method other than
the Configuration Utilities included in the SUNWsecfg package.
The Sun StorEdge network FC switches in a Sun StorEdge 3900 or 6900 configuration
now support the Sun StorEdge SAN 4.1 Release. You can upgrade the switches to
support the 402xx 2 Gbit-compatible firmware.
Caution – Use caution when upgrading back-end switches to the 2 Gbit-compatible
firmware. Use only the setswitchflash command, which performs the upgrade
and creates the zone configuration in a controlled manner (refer to the Sun StorEdge
3900 and 6900 Series 2.0 Reference and Service Guide for the procedures).
Zone Modifications
You should not modify the shared zone set on the back-end switches—doing so can
cause an error (Error State 50) on the virtualization engine. If you determine,
however, that you must modify the shared zone set, follow these steps:
1. Offline the T ports (interswitch links).
2. Offline the virtualization engine ports.
3. Modify the zone on one switch while the other switch continues to run.
4. Online the T ports (interswitch links).
5. Allow the zone database to merge.
6. Online the virtualization engine ports.
You can use the sanbox2(1M) command to offline the ports. For example:
# /opt/SUNWsecfg/flib/sanbox2 -x switch-ip-addr port -state offline
By default:
■ T ports are 6, 7, 14, and 15
■ Virtualization engine ports are 0 and 8
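The order of operations matters more than the individual commands, so the six steps can be written down as an ordered plan before any port is touched. The sketch below only builds that plan and runs no switch commands. The function name and action labels are hypothetical, and the port tuples assume the defaults read as 6, 7, 14, 15 for T ports and 0, 8 for virtualization engine ports.

```python
T_PORTS = (6, 7, 14, 15)   # assumed reading of the default T port list
VE_PORTS = (0, 8)          # assumed reading of the default VE port list

def zone_change_plan(t_ports=T_PORTS, ve_ports=VE_PORTS):
    """Return the ordered (action, ports) steps for a shared zone set change."""
    return [
        ("offline", t_ports),          # step 1: T ports (interswitch links)
        ("offline", ve_ports),         # step 2: virtualization engine ports
        ("modify-zone", ()),           # step 3: change the zone on one switch
        ("online", t_ports),           # step 4
        ("wait-for-zone-merge", ()),   # step 5
        ("online", ve_ports),          # step 6
    ]

for action, ports in zone_change_plan():
    print(action, ports)
```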
Switchless Configurations
In a switchless configuration (Sun StorEdge 3900SL, 6910SL, or 6960SL series system)
you can upgrade the switches that are connected to the Solaris server to the Sun
StorEdge SAN 4.1 Release firmware. For a list of the supported switches visit the
http://www.sun.com web site.
Direct attachment to the Sun StorEdge 3900 and 6900 series arrays with 1 Gbit or
2 Gbit HBAs requires no changes.
Before making any changes to the Sun StorEdge 3900 or 6900 series, you must have
a Sun StorEdge SAN 4.1 infrastructure already in place and functional. This includes
at a minimum:
■ A Solaris host on the SAN management network loaded with SANbox2 Manager.
■ Sun StorEdge 2 Gbit 16-port switch network configured in desired topology (ring,
star, mesh, or cascade) with healthy ISL links.
▼ Diagnosing and Troubleshooting Switch Hardware Problems
Note – Whereas 1 Gbit switch port numbers are numbered starting with 1 (one),
2 Gbit switch port numbers are numbered starting with 0 (zero).
1. To compare the current configuration to the default configuration, type:
# checkswitch -s switch -v
2. To compare the current switch configuration to the most recently saved map file,
type:
# checkswitch -s switch -p -v
3. To display the current switch configuration, type:
# showswitch -s switch
4. To restore the configuration from the saved map file back to the default switch
configuration, type:
# restoreswitch -s switch
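The three read-only checks in steps 1 through 3 can be collected for several switches at once. The sketch below is an illustration under assumptions: the function name switch_check_cmds and the switch names passed to it are hypothetical, and the function prints the command lines rather than running them. The restoreswitch command is deliberately left out because it changes the switch configuration.

```shell
#!/bin/sh
# Print the read-only checks from steps 1-3 for each switch name given.
# usage: switch_check_cmds <switch> [<switch> ...]
# (hypothetical helper name; switch names are whatever "-s" accepts on your system)
switch_check_cmds() {
    for sw in "$@"; do
        echo "checkswitch -s $sw -v"      # compare with the default configuration
        echo "checkswitch -s $sw -p -v"   # compare with the most recently saved map file
        echo "showswitch -s $sw"          # display the current configuration
    done
}
```

Calling switch_check_cmds with two switch names prints six command lines, three per switch, in the order of the steps above.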
For detailed diagnostic and troubleshooting procedures for the Sun StorEdge
network FC switch-8 and switch-16 switch hardware, refer to the Sun StorEdge SAN
4.1 Release Field Troubleshooting Guide.
That guide covers the Sun StorEdge network FC switch-8 and switch-16 switches
and the interconnections (HBA, GBIC, and cables) on either side of the switch. It
also includes an appendix on troubleshooting the Brocade Silkworm switch.
Using the Switch Event Grid
The Storage Automated Diagnostic Environment Switch Event Grid enables you to
sort switch events by component, category, or event type. The Storage Automated
Diagnostic Environment GUI displays an event grid that describes the severity of the
event, tells whether action is required, provides a description of the event, and gives
the recommended action. Refer to the Storage Automated Diagnostic Environment
User’s Guide for more information.
▼ To Use the Switch Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, select the Event
Grid link.
2. In the Switch Event Grid, shown in FIGURE 7-1, select the criteria related to
the event you are troubleshooting.
FIGURE 7-1 Switch Event Grid
TABLE 7-1 lists the switch events for Sun StorEdge network FC switch-8 and
switch-16 1 Gbit switches.
TABLE 7-1 Storage Automated Diagnostic Environment Event Grid for 1 Gbit Switches
Note: In the Description entries, text within quotation marks (“ ”) is exactly as it appears on the Event Grid.

Component: port statistics
EventType: Log
Severity: Yellow
Action: Y
Description: “Change in port statistics on switch diag156-sw1b (ip=192.168.0.31)” The switch has reported a change in an error counter. This could indicate a failing component in the link.
Action Required:
1. Check the Topology GUI for any link errors.
2. Quiesce I/O on the link.
3. Run linktest on the link to isolate the failing FRU.

Component: chassis.fan
EventType: Alarm
Severity: Yellow
Action: Y
Description: “chassis.fan.1 status changed from OK”
Action Required: None.

Component: system_reboot
EventType: Alarm
Severity: Yellow
Action: Y
Description: The uptime of the switch was less than the previous uptime of the switch. This could indicate that the switch has been reset either by a user or by the loss of power.
Action Required:
1. Check to see if the switch has been reset.
2. Check the power going to the switch.

Component: chassis.power
EventType: Alarm
Severity: Yellow
Action: (blank)
Description: “chassis.power.1 status changed from OK” This event monitors changes in the status of the chassis’ power supply, as reported by the SANbox chassis status.
Action Required: None.

Component: chassis.temp
EventType: Alarm
Severity: Yellow
Action: (blank)
Description: “chassis.temp.1 status changed from OK” This event monitors changes in the status of the chassis’ temperature, as reported by the SANbox chassis status.
Action Required: None.
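The system_reboot entry rests on a simple comparison: the alarm applies when a newly sampled uptime is lower than the previous sample. The sketch below illustrates only that comparison. How the agent obtains or stores the uptime is not described here, so the function name uptime_went_backwards and the sample values in seconds are assumptions.

```shell
#!/bin/sh
# usage: uptime_went_backwards <previous-uptime-seconds> <current-uptime-seconds>
# Exit status 0 means the current uptime is lower than the previous one,
# which suggests that the switch was reset between the two samples.
# (hypothetical helper, not the agent's actual implementation)
uptime_went_backwards() {
    [ "$2" -lt "$1" ]
}

# Example with made-up samples: 86400 seconds earlier, 120 seconds now.
if uptime_went_backwards 86400 120; then
    echo "possible switch reset: check for a user reset or loss of power"
fi
```

An equal or higher current uptime returns a nonzero status, so a switch that has simply kept running does not trigger the message.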
called rasd2-swb1 (ip=xxx.0.0.41) 10002000007a609”
Discovery events occur the very first time the agent probes a storage device. The agent creates a detailed description of the monitored device and sends it using any active notifier, such as the Sun™ Remote Services (SRS) Net Connect service or email.

Component: enclosure
EventType: Location Change
Description: “Location of switch rasd2-swb0 (ip xxx.0.0.40) was changed”