document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, Sun StorEdge, StorTools, docs.sun.com, Sun Enterprise, Sun Fire, SunOS, Netra, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
Federal Acquisitions: Commercial Software—Government Users Subject to Standard License Terms and Conditions.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents

1. Introduction
Predictive Failure Analysis Capabilities
2. General Troubleshooting Procedures
Troubleshooting Overview Tasks
Multipathing Options in the Sun StorEdge 6900 Series
Alternatives to Sun StorEdge Traffic Manager
▼ To Quiesce the I/O
▼ To Unconfigure the c2 Path
▼ To Suspend the I/O
▼ To Return the Path to Production
▼ To View the VxDisk Properties
▼ To Quiesce the I/O on the A3/B3 Link
▼ To Suspend the I/O on the A3/B3 Link
▼ To Return the Path to Production
Fibre Channel Links
Fibre Channel Link Diagrams
Host Side Troubleshooting
Storage Service Processor Side Troubleshooting
Service Request Numbers
Service and Diagnostic Codes
▼ To Retrieve Service Information
CLI Interface
▼ To Display Log Files and Retrieve SRNs
▼ To Clear the Log
For Internal Use Only
Virtualization Engine LEDs
Power LED Codes
Interpreting LED Service and Diagnostic Codes
Back Panel Features
Ethernet Port LEDs
Fibre Channel Link Error Status Report
▼ To Check Fibre Channel Link Error Status Manually
Translating Host Device Names
▼ To Display the VLUN Serial Number
Devices That Are Not Sun StorEdge Traffic Manager-Enabled
Sun StorEdge Traffic Manager-Enabled Devices
▼ To View the Virtualization Engine Map
▼ To Failback the Virtualization Engine
▼ To Replace a Failed Virtualization Engine
▼ To Manually Clear the SAN Database
▼ To Reset the SAN Database on Both Virtualization Engines
▼ To Reset the SAN Database on a Single Virtualization Engine
Stopping and Restarting the SLIC Daemon
▼ To Restart the SLIC Daemon
Sun StorEdge 6900 Series Multipathing Example
One Sun StorEdge T3+ Array Partner Pair with 1 500GB RAID 5 LUN per Brick (2 LUNs Total)
Virtualization Engine Event Grid
▼ Using the Virtualization Engine Event Grid
8. Troubleshooting the Sun StorEdge T3+ Array Devices
Explorer Data Collection Utility
▼ To Install Explorer Data Collection Utility on the Storage Service Processor
Sun StorEdge 3900 and 6900 Series Troubleshooting Guide • March 2002
Troubleshooting the T1/T2 Data Path
Notes
T1/T2 Notification Events
Sun StorEdge T3+ Array Storage Service Processor Verification
T1/T2 FRU Tests Available
Notes
T1/T2 Isolation Procedures
Sun StorEdge T3+ Array Event Grid
▼ Using the Sun StorEdge T3+ Array Event Grid
Replacing the Master Midplane
▼ To Replace the Master Midplane
Conclusion
9. Troubleshooting Ethernet Hubs
setupswitch Exit Values
List of Figures

FIGURE 2-1 Sun StorEdge 3900 Series Fibre Channel Link Diagram
FIGURE 2-2 Sun StorEdge 6900 Series Fibre Channel Link Diagram
FIGURE 3-1 Data Host Notification of Intermittent Problems
FIGURE 3-2 Data Host Notification of Severe Link Error
FIGURE 3-3 Storage Service Processor Notification
FIGURE 3-4 A2/B2 FC Link Host-Side Event
FIGURE 3-5 A2/B2 FC Link Storage Service Processor-Side Event
FIGURE 3-6 A3/B3 FC Link Host-Side Event
FIGURE 3-7 A3/B3 FC Link Storage Service Processor-Side Event
FIGURE 3-8 A3/B3 FC Link Storage Service Processor-Side Event
FIGURE 3-9 A4/B4 FC Link Data Host Notification
FIGURE 3-10 Storage Service Processor Notification
FIGURE 5-1 Host Event Grid
FIGURE 6-1 Switch Event Grid
FIGURE 7-1 Virtualization Engine Front Panel LEDs
FIGURE 7-2 Sun StorEdge 6900 Series Logical View
FIGURE 7-3 Primary Data Paths to the Alternate Master
FIGURE 7-4 Primary Data Paths to the Master Sun StorEdge T3+ Array
FIGURE 7-5 Path Failure—Before the Second Tier of Switches
FIGURE 7-6 Path Failure—I/O Routed Through Both HBAs
FIGURE 7-7 Virtualization Engine Event Grid
FIGURE 8-1 Storage Service Processor Event
FIGURE 8-2 Virtualization Engine Alert
FIGURE 8-3 Manage Configuration Files Menu
FIGURE 8-4 Example Link Test Text Output from the Storage Automated Diagnostic Environment
FIGURE 8-5 Sun StorEdge T3+ Array Event Grid
Preface
The Sun StorEdge 3900 and 6900 Series Troubleshooting Guide provides guidelines
for isolating problems in supported configurations of the Sun StorEdge™ 3900 and
6900 series. For detailed configuration information, refer to the Sun StorEdge 3900
and 6900 Series Reference Manual.
The scope of this troubleshooting guide is limited to information pertaining to the
components of the Sun StorEdge 3900 and 6900 series, including the Storage Service
Processor and the virtualization engines in the Sun StorEdge 6900 series. This guide
is written for Sun personnel who have been fully trained on all the components in
the configuration.
How This Book Is Organized
This book contains the following topics:
Chapter 1 introduces the Sun StorEdge 3900 and 6900 series storage subsystems.
Chapter 2 offers general troubleshooting guidelines, such as quiescing the I/O, and
tools you can use to isolate and troubleshoot problems.
Chapter 3 provides Fibre Channel link troubleshooting procedures.
Chapter 4 presents information about configuration settings, specific to the Sun
StorEdge 3900 and 6900 series. It also provides a procedure for how to clear the lock
file.
Chapter 5 provides information on host device troubleshooting.
Chapter 6 provides information on Sun StorEdge network FC switch-8 and
switch-16 switch device troubleshooting.
Chapter 7 provides detailed information for troubleshooting the virtualization
engines.
Chapter 8 describes how to troubleshoot the Sun StorEdge T3+ array devices. Also
included in this chapter is information about the Explorer Data Collection Utility.
Chapter 9 discusses Ethernet hub troubleshooting. Because the 3COM Ethernet
hubs are third-party products, the information about them in this guide is limited.
Appendix A provides virtualization engine references, including SRN and SNMP
Reference, an SRN/SNMP single point of failure table, and port communication and
service code tables.
Appendix B provides a list of SUNWsecfg Error Messages and recommendations for
corrective action.
Using UNIX Commands
This document may not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices.
See one or more of the following for this information:
■ Solaris Handbook for Sun Peripherals
■ AnswerBook2™ online documentation for the Solaris™ operating environment
■ Other software documentation that you received with your system
Typographic Conventions

Typeface    Meaning                                Examples
AaBbCc123   The names of commands, files,          Edit your .login file.
            and directories; on-screen             Use ls -a to list all files.
            computer output                        % You have mail.
AaBbCc123   What you type, when contrasted         % su
            with on-screen computer output         Password:
AaBbCc123   Book titles, new words or terms,       Read Chapter 6 in the User’s Guide.
            words to be emphasized                 These are called class options.
                                                   You must be superuser to do this.
            Command-line variable; replace         To delete a file, type rm filename.
            with a real name or value
Shell Prompts

Shell                                   Prompt
C shell                                 machine_name%
C shell superuser                       machine_name#
Bourne shell and Korn shell             $
Bourne shell and Korn shell superuser   #
Related Documentation

Late-breaking news:
• Sun StorEdge 3900 and 6900 Series Release Notes (816-3247)

Sun StorEdge 3900 and 6900 series hardware information:
• Sun StorEdge 3900 and 6900 Series Site Preparation Guide
• Sun StorEdge 3900 and 6900 Series Regulatory and Safety Compliance Manual
• Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual

Sun StorEdge T3 and T3+ array:
• Sun StorEdge T3 and T3+ Array Start Here
• Sun StorEdge T3 and T3+ Array Installation, Operation, and Service Manual
• Sun StorEdge T3 and T3+ Array Administrator’s Guide
• Sun StorEdge T3 and T3+ Array Configuration Guide
• Sun StorEdge T3 and T3+ Array Site Preparation Guide
• Sun StorEdge T3 and T3+ Field Service Manual
• Sun StorEdge T3 and T3+ Array Release Notes

Diagnostics:
• Storage Automated Diagnostics Environment User’s Guide (816-3142)

Sun StorEdge network FC switch-8 and switch-16:
• Sun StorEdge Network FC Switch-8 and Switch-16 Release Notes
• Sun StorEdge Network FC Switch-8 and Switch-16 Installation and Configuration Guide
• Sun StorEdge Network FC Switch-8 and Switch-16 Best Practices Manual
• Sun StorEdge Network FC Switch-8 and Switch-16 Operations Guide
• Sun StorEdge Network FC Switch-8 and Switch-16 Field Troubleshooting Guide

SANbox switch management using SANsurfer:
• SANbox 8/16 Segmented Loop Switch Management User’s Manual

Expansion cabinet:
• Sun StorEdge Expansion Cabinet Installation and Service

Storage server processor:
• Netra X1 Server User’s Guide
A complete set of Solaris documentation and many other titles are located at:
http://docs.sun.com
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can email your comments to Sun at:
docfeedback@sun.com
Please include the part number (816-4290-10) of your document in the subject line of
your email.
CHAPTER 1
Introduction

The Sun StorEdge 3900 and 6900 series storage subsystems are complete
preconfigured storage solutions. The configurations for each of the storage
subsystems are shown in TABLE 1-1.

TABLE 1-1

                                Sun StorEdge           Sun StorEdge T3+    Additional Array Partner
                                Fibre Channel          Array Partner       Groups Supported with
Series             System       Switch Supported       Groups Supported    Optional Additional Expansion Cabinet
Sun StorEdge       Sun StorEdge
3900 series        3910 system  Two 8-port switches    1 to 4              Not applicable
                   Sun StorEdge
                   3960 system  Two 16-port switches   1 to 4              1 to 5
Sun StorEdge       Sun StorEdge
6900 series        6910 system  Two 8-port switches    1 to 3              Not applicable
                   Sun StorEdge
                   6960 system  Two 16-port switches   1 to 3              1 to 4
Predictive Failure Analysis Capabilities
The Storage Automated Diagnostic Environment software provides the health and
monitoring functions for the Sun StorEdge 3900 and 6900 series systems. This
software provides the following predictive failure analysis (PFA) capabilities.
■ FC links—Fibre Channel links are monitored at all end points using the FC-ELS
link counters. When link errors surpass the threshold values, an alert is sent.
This enables Sun personnel to replace components that are experiencing high
transient fault levels before a hard fault occurs.
■ Enclosure status—Many devices, like the Sun StorEdge network FC switch-8 and
switch-16 switch and the Sun StorEdge T3+ array, will cause the Storage
Automated Diagnostic Environment alerts to be sent if the temperature
thresholds are exceeded. This enables Sun-trained personnel to address the
problem before the component and enclosure fails.
■ SPOF notification—Storage Automated Diagnostic Environment notification for
path failures and failovers (that is, Sun StorEdge Traffic Manager software
failover) can be considered PFA, since Sun-trained personnel are notified and can
repair the primary path. This eliminates the time of exposure to single points of
failure and helps to preserve customer availability during the repair process.
PFA is not always effective in detecting or isolating failures. The remainder of this
document provides guidelines that can be used to troubleshoot problems that occur
in supported components of the Sun StorEdge 3900 and 6900 series.
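The threshold idea behind the FC link monitoring above can be sketched as a simple counter comparison. This is a hypothetical illustration only: the counter names, values, and threshold below are invented, and the real Storage Automated Diagnostic Environment does this internally rather than through a script like this.

```shell
#!/bin/sh
# Hypothetical sketch: compare the growth of an FC-ELS link error counter
# between two polls against a threshold, as PFA-style monitoring does
# conceptually. All values below are sample data, not real counters.
THRESHOLD=5         # alert when errors grow by more than 5 per poll (assumed)
prev_errors=12      # counter value at the previous poll (sample data)
curr_errors=20      # counter value at the current poll (sample data)

delta=$((curr_errors - prev_errors))
if [ "$delta" -gt "$THRESHOLD" ]; then
    echo "ALERT: link errors increased by $delta (threshold $THRESHOLD)"
else
    echo "OK: link errors increased by $delta"
fi
```

With the sample values, the delta of 8 exceeds the threshold, so the alert branch fires; the point is that components with rising transient error counts are flagged before a hard fault.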
CHAPTER 2
General Troubleshooting Procedures

This chapter contains the following sections:
This chapter contains the following sections:
■ “Troubleshooting Overview Tasks” on page 3
■ “Multipathing Options in the Sun StorEdge 6900 Series” on page 7
■ “Fibre Channel Links” on page 15
■ “Storage Automated Diagnostic Environment Event Grid” on page 21
Troubleshooting Overview Tasks
This section lists the high-level steps to isolate and troubleshoot problems in the Sun
StorEdge 3900 and 6900 series. It offers a methodical approach and lists the tools and
resources available at each step.
Note – A single problem can cause various errors throughout the SAN. A good
practice is to begin by investigating the devices that have experienced “Loss of
Communication” events in the Storage Automated Diagnostic Environment. These
errors usually indicate more serious problems.
A “Loss of Communication” error on a switch, for example, could cause multiple
ports and HBAs to go offline. Concentrating on the switch and fixing that failure can
help bring the ports and HBAs back online.
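The advice above, to start with the device showing the most loss-of-communication events, can be sketched as a small tally over an event list. The device names and event lines below are invented for the sketch; in practice the events would come from the Storage Automated Diagnostic Environment, not a hand-written file.

```shell
#!/bin/sh
# Hypothetical sketch: tally "CommunicationLost" events per device so the
# most-affected device (the likely root cause) is investigated first.
# The event lines below are invented sample data.
cat <<'EOF' > /tmp/events.txt
sw1a CommunicationLost
host1-hba0 CommunicationLost
host1-hba1 CommunicationLost
sw1a CommunicationLost
sw1a CommunicationLost
EOF

# Count events per device, most frequent first.
awk '{count[$1]++} END {for (d in count) print count[d], d}' /tmp/events.txt |
    sort -rn
```

Here the switch sw1a tops the list with three events, matching the example in the note: fixing the switch failure is likely to bring the dependent HBAs back online.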
1. Discover the error by checking one or more of the following messages or files:
■ Storage Automated Diagnostic Environment alerts or email messages
■ /var/adm/messages
■ Sun StorEdge T3+ array syslog file
■ Storage Service Processor messages
■ /var/adm/messages.t3 messages
■ /var/adm/log/SEcfglog file
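A quick way to discover errors in the /var/adm/messages-style files listed above is to count occurrences of the driver events this guide discusses later (Loop OFFLINE, degraded multipath). The log lines below are shortened, invented samples so the sketch is self-contained; on a live data host you would point grep at the real files.

```shell
#!/bin/sh
# Sketch: scan a messages-style log for FC driver events of interest.
# The sample lines are invented for illustration only.
cat <<'EOF' > /tmp/messages.sample
Jan  8 14:34:25 diag qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Jan  8 14:35:01 diag last message repeated 1 time
Jan  8 14:47:07 diag mpxio: [ID 779286 kern.info] multipath status: degraded
EOF

# Count occurrences of each event of interest.
offline=$(grep -c "Loop OFFLINE" /tmp/messages.sample)
degraded=$(grep -c "multipath status: degraded" /tmp/messages.sample)
echo "Loop OFFLINE events: $offline"
echo "Degraded multipath events: $degraded"
```

A nonzero count for either pattern is the cue to move on to step 2 and determine the extent of the problem.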
2. Determine the extent of the problem by using one or more of the following
methods:
■ Sun StorEdge T3+ array tests, including t3test(1M), t3ofdg(1M), and
t3volverify(1M), which can be found in the Storage Automated Diagnostic
Environment User’s Guide.
Note – These tests isolate the problem to a FRU that must be replaced. Follow the
instructions in the Sun StorEdge 3900 and 6900 Series Reference Manual and the Sun
StorEdge 3900 and 6900 Installation and Service Manual for proper FRU replacement
procedures.
8. Verify the fix using the following tools:
■ Storage Automated Diagnostic Environment GUI Topology View and Diagnostic
Tests
■ /var/adm/messages on the data host
9. Return the path to service by using one of the following methods:
■ Multipathing software
■ Restarting the application
Multipathing Options in the Sun
StorEdge 6900 Series
Using the virtualization engines presents several challenges in how multipathing is
handled in the Sun StorEdge 6900 series.
Unlike Sun StorEdge T3+ array and Sun StorEdge network FC switch-8 and switch-16 switch installations, which present primary and secondary pathing options, the
virtualization engines present only primary pathing options to the data host. The
virtualization engines handle all failover and failback operations and mask those
operations from the multipathing software on the data host.
The following example illustrates a Sun StorEdge Traffic Manager problem on a Sun
StorEdge 6900 series system.
# luxadm display
/dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/
c6t29000060220041F96257354230303052d0s2
Status(Port A): O.K.
Status(Port B): O.K.
Vendor: SUN
Product ID: SESS01
WWN(Node): 2a000060220041f4
WWN(Port A): 2b000060220041f4
WWN(Port B): 2b000060220041f9
Revision: 080C
Serial Num: Unsupported
Unformatted capacity: 102400.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c6t29000060220041F96257354230303052d0s2
/devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State ONLINE
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
Device Address 2b000060220041f9,0
Class primary
State ONLINE
Note that in the Class and State fields, the virtualization engines are presented as
two primary/ONLINE devices. The current Sun StorEdge Traffic Manager design
does not enable you to manually halt the I/O (that is, you cannot perform a failover
to the secondary path) when only primary devices are present.
Alternatives to Sun StorEdge Traffic Manager
As an alternative to using Sun StorEdge Traffic Manager, you can manually halt the
I/O using one of two methods: quiesce I/O and unconfigure the c2 path. These
methods are explained below.
# vxdmpadm listctlr all
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c0 OTHER_DISKS ENABLED OTHER_DISKS
c2 SENA ENABLED SENA0
c3 SENA ENABLED SENA0
c20 Disk ENABLED Disk
c23 Disk ENABLED Disk
From the VxDisk output, notice that there are two physical paths to the LUN:
■ c20t2B000060220041F4d0s2
■ c23t2B000060220041F9d0s2
Both of these paths are currently enabled with VxDMP.
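Whether both paths are still enabled can be checked mechanically by parsing the controller state column. The sketch below works on a saved copy of the `vxdmpadm listctlr all` output shown above (mirrored here as a here-document so the example is self-contained); on a live system you would pipe the command output instead.

```shell
#!/bin/sh
# Sketch: flag any VxDMP controller that is not in the ENABLED state.
# The sample output mirrors the vxdmpadm listctlr output shown above.
cat <<'EOF' > /tmp/listctlr.out
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c0 OTHER_DISKS ENABLED OTHER_DISKS
c2 SENA ENABLED SENA0
c3 SENA ENABLED SENA0
c20 Disk ENABLED Disk
c23 Disk ENABLED Disk
EOF

# Print any controller not ENABLED (skipping the two header lines);
# silence means every path is enabled.
awk 'NR > 2 && $3 != "ENABLED" {print $1 " is " $3}' /tmp/listctlr.out
disabled=$(awk 'NR > 2 && $3 != "ENABLED"' /tmp/listctlr.out | wc -l)
echo "Controllers not ENABLED: $disabled"
```

With the sample data every controller is ENABLED, so the count is zero; a nonzero count after a quiesce or a failure points at the path to investigate.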
2. Use the luxadm(1M) command to display further information about the
underlying LUN.
DEVICE PROPERTIES for disk: /dev/rdsk/c23t2B000060220041F9d0s2
Status(Port A): O.K.
Vendor: SUN
Product ID: SESS01
WWN(Node): 2a000060220041f9
WWN(Port A): 2b000060220041f9
Revision: 080C
Serial Num: Unsupported
Unformatted capacity: 102400.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c23t2B000060220041F9d0s2
/devices/pci@e,2000/pci@2/SUNW,qlc@4/fp@0,0/
ssd@w2b000060220041f9,0:c,raw
▼ To Quiesce the I/O on the A3/B3 Link
1. Determine the path you want to disable.
2. Disable the path by typing the following:
# vxdmpadm disable ctlr=<c#>
3. Verify that the path is disabled:
# vxdmpadm listctlr all
Steps 1 and 2 halt I/O only up to the A3/B3 link. I/O will continue to move over the
T1 & T2 paths, as well as the A4/B4 links to the Sun StorEdge T3+ array.
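The disable/verify/re-enable sequence from the steps above can be wrapped in small shell functions. This is a dry-run sketch: the `run` helper only echoes the commands it would execute, and the controller name c2 is a placeholder; on a live system you would replace the echo with the real vxdmpadm invocations.

```shell
#!/bin/sh
# Dry-run sketch of the quiesce/restore sequence from the procedures above.
# "run" is a stand-in that prints commands instead of executing them.
run() { echo "would run: $*"; }

quiesce_path() { run vxdmpadm disable ctlr="$1"; run vxdmpadm listctlr all; }
restore_path() { run vxdmpadm enable ctlr="$1"; run vxdmpadm listctlr all; }

CTLR=c2    # placeholder controller name; substitute the path you identified
quiesce_path "$CTLR"
restore_path "$CTLR"
```

Keeping the verify step (`listctlr all`) inside both functions mirrors the procedure: every state change is confirmed before moving on.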
▼ To Suspend the I/O on the A3/B3 Link
Use one of the following methods to suspend I/O while the failover occurs:
1. Stop all customer applications that are accessing the Sun StorEdge T3+ array.
2. Manually pull the link from the Sun StorEdge T3+ array to the switch and wait
for a Sun StorEdge T3+ array LUN failover.
a. After the failover occurs, replace the cable and proceed with testing and FRU
isolation.
b. After testing is complete and any FRU replacement is finished, return the
controller state back to the default by using the virtualization engine failback
command.
Caution – This action will cause SCSI errors on the data host and a brief suspension
of I/O while the failover occurs.
▼ To Return the Path to Production
1. Type:
# vxdmpadm enable ctlr=<c#>
2. Verify that the path has been re-enabled by typing:
# vxdmpadm listctlr all
Fibre Channel Links

The following sections provide troubleshooting information for the basic
components and Fibre Channel links listed in TABLE 2-1.

TABLE 2-1

Link       Provides Fibre Channel Link Between These Components
A1 to B1   Data host, sw1a, and sw1b
A2         sw1a and v1a*
B2         sw1b and v1b*
A3         v1a and sw2a*
B3         v1b and sw2b*
A4         Master Sun StorEdge T3+ array and the “A” path switch
B4         AltMaster Sun StorEdge T3+ array and the “B” path switch
T1 to T2   sw2a and sw2b*

* Sun StorEdge 6900 series only

Note – In an actual Sun StorEdge 3900 or 6900 series configuration, there could be
more Sun StorEdge T3+ arrays than are shown in FIGURE 2-1 and FIGURE 2-2.
By using the Storage Automated Diagnostic Environment, you should be able to
isolate the problem to one particular segment of the configuration.
The information found in this section is based on the assumption that the Storage
Automated Diagnostic Environment is running on the data host, and that it is
configured to monitor host errors. If the Storage Automated Diagnostic Environment
is not installed on the data host, there will be areas of limited monitoring, diagnosis
and isolation.
The following diagrams provide troubleshooting information for the basic
components and Fibre Channel links specific to the Sun StorEdge 3900 series, shown
in FIGURE 2-1, and the Sun StorEdge 6900 series, shown in FIGURE 2-2.
Chapter 2General Troubleshooting Procedures15
For Internal Use Only
Fibre Channel Link Diagrams
FIGURE 2-1 shows the basic components and the Fibre Channel links for a Sun
StorEdge 3900 series system:
■ A1 to B1—HBA to Sun StorEdge FC network switch-8 and switch-16 switch link
■ A4 to B4—Sun StorEdge FC network switch-8 and switch-16 switch to Sun
StorEdge T3+ array link
[Figure: the data host’s HBA-A and HBA-B connect over links A1 and B1 to
switches sw1a and sw1b, which connect over links A4 and B4 to the T3 Master and
T3 Alt-Master arrays.]
FIGURE 2-1 Sun StorEdge 3900 Series Fibre Channel Link Diagram
FIGURE 2-2 shows the basic components and the Fibre Channel links for a Sun
StorEdge 6900 series system:
■ A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
■ A2 to B2—Sun StorEdge network FC switch-8 and switch-16 switch to
virtualization engine link on the host side
■ A3 to B3—Sun StorEdge network FC switch-8 and switch-16 switch to the
virtualization engine link on the device side
■ A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun
StorEdge T3+ array link
■ T1 to T2—T Port switch-to-switch link
[Figure: the data host’s HBA-A and HBA-B connect over links A1 and B1 to
switches sw1a and sw1b; links A2 and B2 connect the switches to virtualization
engines v1a and v1b; links A3 and B3 connect the virtualization engines to
switches sw2a and sw2b, which are joined by T-port links T1 and T2; links A4 and
B4 connect sw2a and sw2b to the T3 Master and T3 Alt-Master arrays.]
FIGURE 2-2 Sun StorEdge 6900 Series Fibre Channel Link Diagram
Host Side Troubleshooting
Host-side troubleshooting refers to the messages and errors the data host detects.
Usually, these messages appear in the /var/adm/messages file.
Storage Service Processor Side Troubleshooting
Storage Service Processor-side troubleshooting refers to messages, alerts, and errors
that the Storage Automated Diagnostic Environment, running on the Storage Service
Processor, detects. You can find these messages by monitoring the following Sun
StorEdge 3900 series and Sun StorEdge 6900 series components:
■ Sun StorEdge network FC switch-8 and switch-16 switches
■ Virtualization engine
■ Sun StorEdge T3+ array
Combining the host side messages and errors and the Storage Service Processor-side
messages, alerts, and errors into a meaningful context is essential for proper
troubleshooting.
Command Line Test Examples
To run a single Sun StorEdge diagnostic test from the command line rather than
through the Storage Automated Diagnostic Environment interface, you must log into
the appropriate host or slave for testing the components. The following two tests,
qlctest(1M) and switchtest(1M), are provided as examples.
qlctest(1M)
The qlctest(1M) comprises several subtests that test the functions of the Sun
StorEdge PCI dual Fibre Channel (FC) host adapter board. This board is an HBA that
has diagnostic support. This diagnostic test is not scalable.
"qlctest: called with options: dev=/devices/pci@6,4000/SUNW,qlc@3/
fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|el
b=Enable"
"qlctest: Started."
"Program Version is 4.0.1"
"Testing qlc0 device at /devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl."
"QLC Adapter Chip Revision = 1, Risc Revision = 3,
Frame Buffer Revision = 1029, Riscrom Revision = 4,
Driver Revision = 5.a-2-1.15 "
"Running ECHO command test with pattern 0x7e7e7e7e"
"Running ECHO command test with pattern 0x1e1e1e1e"
"Running ECHO command test with pattern 0xf1f1f1f1"
<snip>
"Running ECHO command test with pattern 0x4a4a4a4a"
"Running ECHO command test with pattern 0x78787878"
"Running ECHO command test with pattern 0x25252525"
"FCODE revision is ISP2200 FC-AL Host Adapter Driver: 1.12 01/01/16"
"Firmware revision is 2.1.7f"
"Running CHECKSUM check"
"Running diag selftest"
"qlctest: Stopped successfully."
switchtest(1M)
switchtest(1M) is used to diagnose the Sun StorEdge network FC switch-8 and
switch-16 switch devices. The switchtest process also provides command line
access to switch diagnostics. switchtest supports testing on local and remote
switches.
switchtest runs the port diagnostic on connected switch ports. While
switchtest is running, the port statistics are monitored for errors, and the chassis
status is checked.
CODE EXAMPLE 2-2 switchtest(1M)
# /opt/SUNWstade/Diags/bin/switchtest -v -o "dev=
2:192.168.0.30:0x0|xfersize=200"
"switchtest: called with options: dev=2:192.168.0.30:0x0|xfersize=200"
Storage Automated Diagnostic Environment Event Grid

The Storage Automated Diagnostic Environment generates component-specific event
grids that describe the severity of an Event, whether action is required, a description
of the event, and recommended action. Refer to Chapters 5 through 9 of this
troubleshooting guide for component-specific event grids.
▼ To Customize an Event Report
1. Click the Event Grid link on the Storage Automated Diagnostic Environment
Help menu.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in TABLE 2-2.
TABLE 2-2 Event Grid Sorting Criteria

Category:
• All (Default)
• Sun StorEdge A3500FC array
• Sun StorEdge A5000 array
• Agent
• Host
• Message
• Sun Switch
• Sun StorEdge T3+ array
• Tape
• Virtualization engine

Component:
• All (Default)
• Backplane
• Controller
• Disk
• Interface
• LUN
• Port
• Power

Event Type:
• Agent Deinstall
• Agent Install
• Alarm
• Alternate Master +
• Alternate Master -
• Audit
• Communication Established
• Communication Lost
• Discovery
• Heartbeat
• Insert Component
• Location Change
• Patch Info
• Quiesce End
• Quiesce Start
• Removal
• Remove Component
• State Change + (from offline to online)
• State Change - (from online to offline)
• Statistics
• Backup

Severity:
• Red—Critical (Error)
• Yellow—Alert (Warning)
• Down—System Down

Action:
• Y—This event is actionable and is sent to RSS/SRS
• N—This event is non-actionable
CHAPTER 3
Troubleshooting the Fibre Channel Links
A1/B1 Fibre Channel (FC) Link
If a problem occurs with the A1/B1 FC link:
■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.
FIGURE 3-1, FIGURE 3-2, and FIGURE 3-3 are examples of A1/B1 Fibre Channel Link
Notification Events.
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.LOOP_OFFLINE
EventTime: 01/08/2002 14:34:45
Found 1 ’driver.LOOP_OFFLINE’ error(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
info: Loop Offline
Jan 8 14:34:25 WWN:Received 2 ’Loop Offline’ message(s) [threshold is 1
in 5mins] Last-Message: ’diag.xxxxx.xxx.com qlc: [ID 686697 kern.info] NOTICE:
Qlogic qlc(0): Loop OFFLINE ’
FIGURE 3-1 Data Host Notification of Intermittent Problems
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 14:48:02
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053
(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052
(ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,0 is offline
FIGURE 3-2 Data Host Notification of Severe Link Error
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch Key: switch:100000c0dd0057bd
EventType: StateChangeEvent.X.port.6
EventTime: 01/08/2002 14:54:20
’port.6’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Unknown (status-state changed from ’Online’ to ’Admin’):
FIGURE 3-3 Storage Service Processor Notification
Note – An A1/B1 FC link error can cause a port in sw1a or sw1b to change state.
Sun StorEdge 3900 and 6900 Series Troubleshooting Guide • March 2002
▼ To Verify the Data Host
An error in the A1/B1 FC link can cause a path to go offline in the multipathing
software. The luxadm display output for the affected device lists the state of
each controller path, for example:
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State ONLINE
...
For Internal Use Only
Chapter 3 Troubleshooting the Fibre Channel Links
An error in the A1/B1 FC link can also cause a device to enter the “unusable” state
in cfgadm. In this case, the output for luxadm -e port will show that a device that
was “connected” changed to an “unconnected” state.
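One way to spot the transition described above is to scan luxadm -e port output for ports that are no longer connected. The sketch below runs against a simulated capture (the device paths are illustrative); on the data host you would feed it the real luxadm -e port output instead:

```shell
# Flag HBA ports reported as NOT CONNECTED in (simulated) luxadm -e port
# output. A port that was CONNECTED and is now NOT CONNECTED points at a
# severed A1/B1 link.
cat > /tmp/ports.out <<'EOF'
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl    CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl    NOT CONNECTED
EOF
grep -c 'NOT CONNECTED' /tmp/ports.out
```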
FRU Tests Available for the A1/B1 FC Link Segment
■ HBA — qlctest(1M):
■ Available only if the Storage Automated Diagnostic Environment is installed
on a data host
■ Causes the HBA to go “offline” and “online” during tests
■ Switch — switchtest(1M):
■ Can be run while the link is still cabled and online (connected to the HBA)
■ You must specify a payload of 200 bytes or less when testing the A1/B1 FC
link while the link is connected to the HBA (a limitation in the HBA ASIC)
■ Can be run only from the Storage Service Processor
■ The dev option to switchtest has the following format:
Port:IP-Address:FCAddress
The FCAddress can be set to 0x0.
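The dev string can be assembled from its three parts. A small sketch, using the port and IP address values from this chapter's examples:

```shell
# Build the switchtest dev option (Port:IP-Address:FCAddress).
# Port and IP values are taken from the examples in this chapter;
# FCAddress 0x0 is the value noted above.
PORT=2
IP=192.168.0.30
FCADDR=0x0
DEV="dev=${PORT}:${IP}:${FCADDR}"
echo "$DEV"    # dev=2:192.168.0.30:0x0
```

The resulting string is what is passed to switchtest with the -o option, as in CODE EXAMPLE 3-3.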
CODE EXAMPLE 3-3 switchtest(1M) Called With Options
# ./switchtest -v -o "dev=2:192.168.0.30:0"
"switchtest: called with options: dev=2:192.168.0.30:0"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK
Fan 2: OK "
02/06/02 15:09:45 diag Storage Automated Diagnostic Environment MSGID 4001
switchtest.WARNING
switch0: "Maximum transfer size for a FABRIC port is 200. Changing
transfer size 2000 to 200"
"Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e"
"Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
Note – The Storage Automated Diagnostic Environment automatically resets the
transfer size if it notes that it is about to test a switch to HBA connection. This is
done both in the Storage Automated Diagnostic Environment GUI and from the
command-line interface (CLI).
▼ To Isolate the A1/B1 FC Link
1. Quiesce the I/O on the A1/B1 FC link path.
2. Run switchtest or qlctest to test the entire link.
3. Break the connection by uncabling the link.
4. Insert a loopback connector into the switch port.
5. Rerun switchtest.
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If switchtest fails again, replace the switch.
6. Insert a loopback connector into the HBA.
7. Run qlctest.
■ If the test fails, replace the HBA.
■ If the test passes, replace the cable.
8. Recable the entire link.
9. Run switchtest or qlctest to validate the fix.
10. Return the path to production.
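The replace-and-retest logic of steps 5 through 7 can be sketched as a small shell flow. The switchtest and qlctest calls here are stubs standing in for the real diagnostics (which run from the Storage Service Processor and the data host, respectively); only the escalation order is the point:

```shell
# Stub diagnostics: in this simulation the first switchtest run fails
# until the GBIC is "replaced", and qlctest passes (healthy HBA).
switchtest() { [ "$GBIC_OK" = yes ]; }
qlctest()    { [ "$HBA_OK" = yes ]; }

GBIC_OK=no
HBA_OK=yes

if ! switchtest; then
    echo "switchtest failed: replace the GBIC and rerun"
    GBIC_OK=yes                       # simulate the GBIC swap
    switchtest || echo "still failing: replace the switch"
fi

if qlctest; then
    echo "qlctest passed: suspect the cable"
else
    echo "qlctest failed: replace the HBA"
fi
```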
A2/B2 Fibre Channel (FC) Link
If a problem occurs with the A2/B2 FC link:
■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.
FIGURE 3-4 and FIGURE 3-5 are examples of A2/B2 FC Link Notification Events.
From root Tue Jan 8 18:39:48 2002
Date: Tue, 8 Jan 2002 18:39:47 -0700 (MST)
Message-Id: <200201090139.g091dlg07015@diag.xxxxx.xxx.com>
From: Storage Automated Diagnostic Environment.Agent
Subject: Message from ’diag.xxxxx.xxx.com’ (2.0.B2.002)
Content-Length: 2742
You requested the following events be forwarded to you from
’diag.xxxxx.xxx.com’.
Site : FSDE LAB Broomfield CO
Source : diag226.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 17:34:47
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages
on diag.xxxxx.xxx.com (id=80fee746):
Info: Fabric warning
Jan 8 17:34:36 WWN:2b000060220041f4 diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(0): N_x Port with D_ID=108000,
PWWN=2b000060220041f4 disappeared from fabric
<snip>
multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to
target address: 2b000060220041f4,1 is offline
Jan 8 17:34:55 WWN:2b000060220041f4 diag.xxxxx.xxx.com
FIGURE 3-4 A2/B2 FC Link Host-Side Event
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : San Key: switch:100000c0dd0061bb:1
EventType: LinkEvent.ITW.switch|ve
EventTime: 01/08/2002 17:39:47
ITW-ERROR (765 in 11 mins): Origin: port 1 on switch ’sw1b/192.168.0.31’.
Destination: port 1 on ve ’diag-v1b/29000060220041f4’:
Info:
An invalid transmission word (ITW) was detected between two components.
This could indicate a potential problem.
Cause:
Likely Causes are: GBIC, FC Cable and device optical connections.
Action:
To isolate further please run the Storage Automated Diagnostic Environment
tests associated with this link segment.
FIGURE 3-5 A2/B2 FC Link Storage Service Processor Side Event
▼ To Verify the Host Side
An error in the A2/B2 FC link can result in a device being listed as in an “unusable”
state in cfgadm, but no HBAs are listed as in the “unconnected” state in luxadm
output. The multipathing software will note an OFFLINE path.
DEVICE PROPERTIES for disk: /dev/rdsk/
c6t29000060220041F96257354230303052d0s2
Status(Port A): O.K.
Status(Port B): O.K.
Vendor: SUN
Product ID: SESS01
WWN(Node): 2a000060220041f9
WWN(Port A): 2b000060220041f9
WWN(Port B): 2b000060220041f4
Revision: 080C
Serial Num: Unsupported
Unformatted capacity: 102400.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c6t29000060220041F96257354230303052d0s2
/devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
Device Address 2b000060220041f9,0
Class primary
State ONLINE
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State OFFLINE
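The State lines in the listing above are what the multipathing check keys on. A reduced sketch, simulating just the relevant lines of luxadm display output:

```shell
# Count OFFLINE controller paths in (simulated) luxadm display output.
# One primary path OFFLINE with the partner ONLINE matches the degraded
# state described in this section.
cat > /tmp/luxadm.out <<'EOF'
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
State ONLINE
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
State OFFLINE
EOF
grep -c 'State OFFLINE' /tmp/luxadm.out
```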
Note – You can find procedures for restoring virtualization engine settings in
the Sun StorEdge 3900 and 6900 Series Reference Manual.
▼ To Verify the A2/B2 FC Link
You can check the A2/B2 FC link using the Storage Automated Diagnostic
Environment, Diagnose—Test from Topology functionality. The Storage Automated
Diagnostic Environment’s implementation of diagnostic tests verifies the operation
of user-selected components. Using the Topology view, you can select specific tests,
subtests, and test options.
Refer to Chapter 5 of the Storage Automated Diagnostic Environment User’s Guide for
more information.
FRU Tests Available for A2/B2 FC Link Segment
■ The linktest is not available.
■ The switch and/or GBIC — switchtest:
■ Can be used only in conjunction with the loopback connector.
■ Cannot be cabled to the virtualization engine while switchtest runs.
■ No virtualization engine tests are available at this time.
▼ To Isolate the A2/B2 FC Link
1. Quiesce the I/O on the A2/B2 FC link path.
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest:
a. If the test fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
5. If the switch or the GBIC shows no errors, replace the remaining components
in the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor
the link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine
settings, recable the link, and monitor the link for errors.
6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge
3900 and 6900 Series Reference Manual.
A3/B3 Fibre Channel (FC) Link
If a problem occurs with the A3/B3 FC link:
■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.
FIGURE 3-6, FIGURE 3-7, and FIGURE 3-8 are examples of A3/B3 FC link Notification
Events.
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 18:25:18
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053
(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052
(ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,0 is offline
---------------------------------------------------------------
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Message Key: message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 18:25:18
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages
on diag.xxxxx.xxx.com (id=80fee746):
Info:
Fabric warning
Jan 8 18:24:04 WWN:2b000060220041f9 diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(1): N_x Port with D_ID=104000,
PWWN=2b000060220041f9 disappeared from fabric
FIGURE 3-6 A3/B3 FC Link Host-Side Event
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch Key: switch:100000c0dd0057bd
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:38
’port.1’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-7 A3/B3 FC Link Storage Service Processor-Side Event
Site : FSDE LAB Broomfield CO
Source : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch Key: switch:100000c0dd00cbfe
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:40
’port.1’ in SWITCH diag-sw2a (ip=192.168.0.32) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-8 A3/B3 FC Link Storage Service Processor-Side Event
▼ To Verify the Host Side
An error in the A3/B3 FC link results in a device being listed as in an “unusable”
state in cfgadm, but no HBAs are listed as in the “unconnected” state in luxadm
output. The multipathing software will note an “offline” path.
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/
c6t29000060220041F96257354230303052d0s2
<snip>
/devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
Device Address 2b000060220041f9,0
Class primary
State OFFLINE
Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
Device Address 2b000060220041f4,0
Class primary
State ONLINE
CODE EXAMPLE 3-6 VxDMP Error Message
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 619769 kern.notice] NOTICE:
vxdmp: Path failure on 118/0x1f8
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 997040 kern.notice] NOTICE:
vxvm:vxdmp: disabled path 118/0x1f8 belonging to the dmpnode 231/0xd0
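VxDMP identifies the failed path by a major/minor device pair (118/0x1f8 above). When mapping that pair back to a device node, the hex minor number is often easier to work with in decimal:

```shell
# Convert the hex minor number from the vxdmp message to decimal.
printf '%d\n' 0x1f8    # prints 504
```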
▼ To Verify the Storage Service Processor
You can check the A3/B3 FC link using the Storage Automated Diagnostic
Environment, Diagnose—Test from Topology functionality. The Storage Automated
Diagnostic Environment’s implementation of diagnostic tests verifies the operation of
user-selected components. Using the Topology view, you can select specific tests,
subtests, and test options.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
FRU Tests Available for the A3/B3 FC Link
Segment
■ The linktest is not available.
■ The switch and/or GBIC — switchtest:
■ Can be used only in conjunction with the loopback connector.
■ Cannot be cabled to the virtualization engine while switchtest runs.
■ No virtualization engine tests are available at this time.
▼ To Isolate the A3/B3 FC Link
1. Quiesce the I/O on the A3/B3 FC link path.
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest:
a. If the test fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
5. If the switch or the GBIC shows no errors, replace the remaining components
in the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor
the link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine
settings, recable the link, and monitor the link for errors.
6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge
3900 and 6900 Series Reference Manual.
A4/B4 Fibre Channel (FC) Link
If a problem occurs with the A4/B4 FC link:
■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.
FIGURE 3-9 and FIGURE 3-10 are examples of A4/B4 FC Link Notification Events.
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80e4aa60):
INFORMATION:
Fabric warning
<snip>
status of hba /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0:devctl on
diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
INFORMATION:
monitors changes in the output of luxadm -e port
Found path to 20 HBA ports
/devices/sbus@2,0/SUNW,socal@d,10000:0 NOT CONNECTED
FIGURE 3-9 A4/B4 FC Link Data Host Notification
FIGURE 3-10 Storage Service Processor Notification
▼ To Verify the Data Host
A problem in the A4/B4 FC Link appears differently on the data host, depending
on whether the array is a Sun StorEdge 3900 series or a Sun StorEdge 6900 series
device.
Sun StorEdge 3900 Series
In a Sun StorEdge 3900 series device, the data host multipathing software is
responsible for initiating the failover and reports it in /var/adm/messages;
these messages are also forwarded by the Storage Automated Diagnostic
Environment email notifications.
The luxadm failover command is used to fail the Sun StorEdge T3+ array LUNs
back to the proper configuration after the failing FRU is replaced. This command is
issued from the data host.
Sun StorEdge 6900 Series
In a Sun StorEdge 6900 series device, the virtualization engine pairs handle the
failover and the failover is not noted on the data host. All paths would remain
ONLINE and ACTIVE.
The mpdrive failback command is used to fail the Sun StorEdge T3+ array LUNs
back to the proper configuration; it is issued from the Storage Service
Processor.
Note – In the event of a complete sw1b or sw2b failure in a Sun StorEdge 6900
series configuration, the virtualization engine pairs handle the failover. In addition,
the multipathing software notes a path failure on the data host, Sun StorEdge Traffic
Manager or VxDMP takes the entire path that was connected to the failed switch
offline, and the ISL ports on the surviving switch go offline as well.
To verify the failover, use luxadm display; the failed path is marked OFFLINE,
as shown in CODE EXAMPLE 3-7.
CODE EXAMPLE 3-7 Failed Path Marked OFFLINE
# luxadm display /dev/rdsk/c26t60020F200000644>
DEVICE PROPERTIES for disk: /dev/rdsk/
c26t60020F20000064433C3352A60003E82Fd0s2
Status(Port A): O.K.
Status(Port B): O.K.
Vendor: SUN
Product ID: T300
WWN(Node): 50020f2000006443
WWN(Port A): 50020f2300006355
WWN(Port B): 50020f2300006443
Revision: 0118
Serial Num: Unsupported
Unformatted capacity: 488642.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
/devices/scsi_vhci/ssd@g60020f20000064433c3352a60003e82f:c,raw
Controller /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0
Device Address 50020f2300006355,1
Class primary
State OFFLINE
Controller /devices/pci@e,2000/pci@2/SUNW,qlc@5/fp@0,0
Device Address 50020f2300006443,1
Class secondary
State ONLINE
Note – This type of error may also cause the device to show up "unusable" in
cfgadm, as shown in CODE EXAMPLE 3-8.
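A device in the “unusable” condition can be picked out of cfgadm -al output. The sketch below works on a simulated capture (the attachment-point IDs are illustrative); on the data host you would feed it the real cfgadm -al listing:

```shell
# List attachment points whose condition column reads "unusable" in
# (simulated) cfgadm -al output. Columns here follow the usual layout:
# Ap_Id Type Receptacle Occupant Condition.
cat > /tmp/cfgadm.out <<'EOF'
c6::29000060220041f9   disk    connected    configured   unusable
c7::50020f2300006355   disk    connected    configured   ok
EOF
awk '$5 == "unusable" { print $1 }' /tmp/cfgadm.out
```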
FRU Tests Available for the A4/B4 FC Link Segment
■ The switchtest can be run only from the Storage Service Processor.
■ The linktest can isolate the switch and the GBIC on the switch. It cannot
isolate the cable or the Sun StorEdge T3+ array controller.
▼ To Isolate the A4/B4 FC Link
1. Quiesce the I/O on the A4/B4 FC link path.
2. Run linktest from the Storage Automated Diagnostic Environment GUI to
isolate suspected failing components.
Alternatively, follow these steps:
1. Quiesce the I/O on the A4/B4 FC link path.
2. Run switchtest to test the entire link (re-create the problem).
3. Break the connection by uncabling the link.
4. Insert the loopback connector into the switch port.
5. Rerun switchtest.
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
6. If switchtest passes, assume that the suspect components are the cable and the
Sun StorEdge T3+ array controller.
a. Replace the cable.
b. Rerun switchtest.
7. If the test fails again, replace the Sun StorEdge T3+ array controller.
8. Return the path to production.
9. If a failover occurred, return the Sun StorEdge T3+ array LUNs to the
correct controllers using the luxadm failover or mpdrive failback commands.
CHAPTER 4
Configuration Settings
This chapter contains the following sections:
■ “Verifying Configuration Settings” on page 47
■ “To Clear the Lock File” on page 50
For a complete listing of SUNWsecfg Error Messages and recommended action, refer
to Appendix B.
Verifying Configuration Settings
During the course of troubleshooting, you might need to verify configuration
settings on the various components in the Sun StorEdge 3900 or 6900 series.
▼ To Verify Configuration Settings
1. Run one of the following scripts:
■ Use the /opt/SUNWsecfg/runsecfg script and select the various Verify menu
selections.
■ Run the /opt/SUNWsecfg/bin/checkdefaultconfig script to check all
accessible components. The output is shown in CODE EXAMPLE 4-1.
■ Run the checkswitch, checkt3config, checkve, or checkvemap scripts
manually from /opt/SUNWsecfg/bin.
These scripts check the default configuration files in the /opt/SUNWsecfg/etc
directory and compare the current, live settings against the defaults. Any
differences are marked with a FAIL.
Note – For cluster configurations and systems that are attached to Windows NT, the
default configurations may not match the current installed configuration. Be aware
of this when running the verification scripts. Certain items may be flagged as FAIL
in these special circumstances.
CODE EXAMPLE 4-1 /opt/SUNWsecfg/checkdefaultconfig Output
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list: FAIL <-- Failure Noted
Checking T3+: t3b2
Checking : t3b2 Configuration.......
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list : PASS
<snip>
2. If anything is marked FAIL, check the /var/adm/log/SEcfglog file for the
details of the failure.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : -----------SAVED CONFIGURATION--------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize : 256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : -----------CURRENT CONFIGURATION------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize : 256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
In this example, the mirror setting in the Sun StorEdge T3+ array system settings is
“off.” The SAVED CONFIGURATION setting for this parameter, which is the default
setting, should be “auto.”
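At heart, what checkt3config logs above is a field-by-field comparison of the SAVED and CURRENT configurations. A toy sketch of that comparison (file names are illustrative; the settings are taken from the mirror mismatch in this example):

```shell
# Compare saved defaults against live settings and emit PASS/FAIL in the
# style of the check scripts. Here only the mirror setting differs.
cat > /tmp/saved.cfg <<'EOF'
mirror : auto
rd_ahead : off
EOF
cat > /tmp/current.cfg <<'EOF'
mirror : off
rd_ahead : off
EOF
if diff /tmp/saved.cfg /tmp/current.cfg > /dev/null; then
    echo "Checking command sys list : PASS"
else
    echo "Checking command sys list : FAIL"
fi
```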
3. Fix the FAIL condition, and then verify the settings again.
# /opt/SUNWsecfg/bin/checkt3config -n t3b0
Checking : t3b0 Configuration.......
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list : PASS
If you interrupt any of the SUNWsecfg scripts (by typing Control-C, for
example), a lock file might remain in the /opt/SUNWsecfg/etc directory,
causing subsequent commands to fail. Use the following procedure to clear the
lock file.
▼ To Clear the Lock File
1. Type the following command:
# /opt/SUNWsecfg/bin/removelocks
usage : removelocks [-t|-s|-v]
where:
-t - remove all T3+ related lock files.
-s - remove all switch related lock files.
-v - remove all virtualization engine related lock files.
# /opt/SUNWsecfg/bin/removelocks -v
Note – After any virtualization engine configuration change, the script saves a new
copy of the virtualization engine map. This may take a minimum of two minutes,
during which time no additional virtualization engine changes are accepted.
2. Monitor the /var/adm/log/SEcfglog file to see when the savevemap process
successfully exits.
CODE EXAMPLE 4-2 savevemap Output
Tue Jan 29 16:12:34 MST 2002 savevemap: v1 ENTER.
Tue Jan 29 16:12:34 MST 2002 checkslicd: v1 ENTER.
Tue Jan 29 16:12:42 MST 2002 checkslicd: v1 EXIT.
Tue Jan 29 16:14:01 MST 2002 savevemap: v1 EXIT.
When savevemap: <ve-pair> EXIT is displayed, the savevemap process has
successfully exited.
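One way to watch for that EXIT line is a simple grep on the log. The sketch below simulates /var/adm/log/SEcfglog with the lines from CODE EXAMPLE 4-2; on the Storage Service Processor you would grep the real log file:

```shell
# Report completion once the savevemap EXIT line appears in the
# (simulated) SEcfglog.
cat > /tmp/SEcfglog <<'EOF'
Tue Jan 29 16:12:34 MST 2002 savevemap: v1 ENTER.
Tue Jan 29 16:14:01 MST 2002 savevemap: v1 EXIT.
EOF
grep -q 'savevemap: .* EXIT' /tmp/SEcfglog && echo "savevemap complete"
```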
CHAPTER 5
Troubleshooting Host Devices
This chapter describes how to troubleshoot components associated with a Sun
StorEdge 3900 or 6900 series Host.
This chapter contains the following sections:
■ “Using the Host Event Grid” on page 53
■ “To Replace the Master Host” on page 57
■ “To Replace the Alternate Master or Slave Monitoring Host” on page 58
Host Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort host
events by component, category, or event type. The Storage Automated Diagnostic
Environment GUI displays an event grid that describes the severity of the event,
whether action is required, a description of the event, and the recommended action.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
▼ Using the Host Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 5-1.
FIGURE 5-1 Host Event Grid
TABLE 5-1 lists all the host events in the Storage Automated Diagnostic Environment.
TABLE 5-1 Storage Automated Diagnostic Environment Event Grid for the Host

■ host hba Alarm (Red, action required). Sample event: "... on
diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED". Description:
monitors changes in the output of luxadm -e port ("Found path to 20 HBA
ports").
■ host lun.t300 Alarm (Red, action required). Sample event: "[Info] The state
of lun.T300.c14t50020F2300003EE5d0s2.statusA on diag.xxxxx.xxx.com changed
from OK to ERROR (target=t3:diag244t3b0/90.0.0.40)". Description: luxadm
display reported a change in the port status of one of its paths. The Storage
Automated Diagnostic Environment then tries to find which enclosure this path
corresponds to by reviewing its database of Sun StorEdge T3+ arrays and
virtualization engines.
■ host lun.VE Alarm (Red, action required). Sample event: "[Info] The state of
lun.VE.c14t50020F2300003EE5d0s2.statusA on diag.xxxxx.xxx.com changed from OK
to ERROR (target=ve:diag244ve0/90.0.0.40)". Description: same luxadm display
path-status check as for lun.t300 above.
■ host ifptest Diagnostic Test (Red). Sample event: "ifptest (diag240) on
host failed."
■ host qlctest Diagnostic Test (Red). Sample event: "qlctest (diag240) on
host failed."
■ host socaltest Diagnostic Test (Red). Sample event: "socaltest (diag240) on
host failed."
■ host enclosure PatchInfo (Info). Sample event: "New patch and package
information generated." Description: sends changes to the output of
showrev -p and pkginfo -l.
■ host enclosure backup (Info). Sample event: "Agent Backup". Description:
backup of the configuration file of the agent.
Replacing the Master, Alternate Master,
and Slave Monitoring Host
The following procedures are a high-level overview of the procedures that are
detailed in the Storage Automated Diagnostic Environment User’s Guide. Follow these
procedures when replacing a master, alternate master, or slave monitoring host.
Note – The procedures for replacing the master host are different from the
procedures for replacing an alternate master or slave monitoring host.
▼ To Replace the Master Host
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
1. Install the SUNWstade package on a new Master Host.
2. Run /opt/SUNWstade/bin/ras_install on the new Master Host.
3. Configure the Host as the Master Host.
4. Connect to the Master Server’s GUI at http://<servername>:7654.
5. Choose Utilities -> System -> Recover Config.
Refer to Chapter 7 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions.
a. In the Recover Config window, enter the IP address of any alternate master or
slave monitoring host (all hosts keep a copy of the configuration).
b. Make sure the Recover Config and Reset slave to this master checkboxes are
checked.
c. Click Recover.
6. Choose Maintenance -> General Maintenance.
Ensure that all host and device settings are recovered correctly.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions.
7. Choose Maintenance -> General Maintenance -> Start/Stop Agent to start the
agent on the master host.
▼ To Replace the Alternate Master or Slave
Monitoring Host
1. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic Environment
User’s Guide.
2. In the Maintain Hosts window, select the host to be replaced from the Existing
Hosts list, and click Delete.
3. Install the new host.
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
4. Install the SUNWstade package on the new host.
5. Run /opt/SUNWstade/bin/ras_install.
6. Configure the host as a slave.
7. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic
Environment User’s Guide for detailed instructions.
8. In the Maintain Hosts window, select the new host.
9. Configure the options as needed.
10. Choose Maintenance -> Topology Maintenance -> Topology Snapshot.
a. In the Topology Snapshot window, select the new host.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master, alternate master, or slave monitoring host is replaced, you must
recover the configuration using the procedures described above. This is especially
important when the Storage Service Processor is replaced as a FRU, whether the
Storage Service Processor is the master or the slave.
CHAPTER 6
Troubleshooting Sun StorEdge FC
Switch-8 and Switch-16 Devices
This chapter describes how to troubleshoot the switch components associated with a
Sun StorEdge 3900 or 6900 series system.
This chapter contains the following sections:
■ “Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description” on
page 61
■ “Switch Event Grid” on page 62
■ “setupswitch Exit Values” on page 68
■ “Replacing the Master Midplane” on page 68
Sun StorEdge Network FC Switch-8 and
Switch-16 Switch Description
The Sun StorEdge network FC switch-8 and switch-16 switches provide cable
consolidation and increased connectivity for the internal data interconnection
infrastructure.
The switches are paired to provide redundancy. Two switches are used in each Sun
StorEdge 3900 series, and four switches are used in each Sun StorEdge 6900 series.
Each Sun StorEdge network FC switch-8 and switch-16 switch is connected by way
of an Ethernet to the service network for management and service from the Storage
Service Processor.
These switches can be monitored through the SANSurfer GUI, which is available on
the Storage Service Processor. You configure and modify the switches using the
SUNWsecfg Configuration Utilities. Do not configure or modify the switches using
any method other than the SUNWsecfg tools.
▼ To Diagnose and Troubleshoot Switch Hardware
1. To diagnose and troubleshoot the switch hardware, begin by running the
SUNWsecfg checkswitch utility.
2. For detailed troubleshooting procedures, refer to the Sun StorEdge SAN Field
Troubleshooting Guide, Release 3.0.
The Sun StorEdge SAN Field Troubleshooting Guide, Release 3.0 describes how to
diagnose and troubleshoot the switch hardware. The scope of this document
includes the Sun StorEdge network FC switch-8 and switch-16 switch and the
interconnections (HBA, GBIC, cables) on either side of the switch. In addition, the
document provides examples of fault isolation and includes a Brocade switch
appendix.
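A minimal sketch of Step 1, assuming checkswitch signals failure through a nonzero exit status (an assumption; this guide does not document the utility's exit behavior):

```shell
# Hypothetical sketch: run checkswitch and branch on its exit status.
# The path is taken from the SUNWsecfg naming in the text; the exit
# convention is an assumption.
if /opt/SUNWsecfg/bin/checkswitch >/dev/null 2>&1; then
  result="switch check passed"
else
  result="switch check failed; consult the SAN Field Troubleshooting Guide"
fi
echo "$result"
```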
Switch Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort
switch events by component, category, or event type. The Storage Automated
Diagnostic Environment GUI displays an event grid that describes the severity of the
event, whether action is required, a description of the event, and the recommended
action. Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
▼ Using the Switch Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 6-1.
FIGURE 6-1 Switch Event Grid
Chapter 6 Troubleshooting Sun StorEdge FC Switch-8 and Switch-16 Devices
TABLE 6-1 lists the switch events.
TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for Switches

Event: discovery
Example: new switch called rasd2-swb1 (ip=xxx.0.0.41) 10002000007a609
Information: Discovery events occur the very first time the agent probes a
storage device. The agent creates a detailed description of the monitored
device and sends it using any active notifier (NetConnect, Email).

Event: switchenclosureLocationChange
Example: Location of switch rasd2-swb0 (ip xxx.0.0.40) was changed
TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for Switches (Continued)

Example: port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Available
(status state changed from OFFLINE to ONLINE)
Information: Port on switch is now available.

Event: switchportStateChange-RedY [Info/Action]
Example: port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Not-Available
(status state changed from ONLINE to OFFLINE)
Information: A port on the switch has logged out of the Fabric and has gone
offline.
Recommended action:
1. Verify cables, GBICs, and connections along the Fibre Channel path.
2. Check the Storage Automated Diagnostic Environment SAN Topology GUI to
identify the failing segment of the data path.
3. Verify the correct FC switch configuration.

Event: switchenclosureStatistics [Info]
Example: Statistics about switch d2-swb1 (ip xxx.0.0.41) 10002000007a609
Information: Port Statistics
Replacing the Master Midplane
Follow this procedure when replacing the master midplane in a Sun StorEdge
network FC switch-8 or switch-16 switch or a Brocade Silkworm switch. This
procedure is detailed in the Storage Automated Diagnostic Environment User’s Guide.
▼ To Replace the Master Midplane
1. Choose Maintenance -> General Maintenance -> Maintain Devices.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide.
2. In the Maintain Devices window, delete the device that is to be replaced.
3. Choose Maintenance -> General Maintenance -> Discovery.
4. In the Device Discovery window, rediscover the device.
5. Choose Maintenance -> Topology Maintenance -> Topology Snapshot.
a. Select the host that monitors the replaced FRU.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master midplane is replaced, you must rediscover the device using the
procedure described above. This is especially important when the Storage Service
Processor is replaced as a FRU, whether the Storage Service Processor is the master
or the slave.
CHAPTER 7
Troubleshooting Virtualization Engine Devices
This chapter describes how to troubleshoot the virtualization engine component of a
Sun StorEdge 6900 series system.
This chapter contains the following sections:
■ “Virtualization Engine Description” on page 69
■ “Translating Host Device Names” on page 78
■ “Sun StorEdge 6900 Series Multipathing Example” on page 89
■ “Virtualization Engine Event Grid” on page 95
Virtualization Engine Description
The virtualization engine supports the multipathing functionality of the Sun
StorEdge T3+ array. Each virtualization engine has physical access to all underlying
Sun StorEdge T3+ arrays and controls access to half of the Sun StorEdge T3+ arrays.
The virtualization engine has the ability to assume control of all arrays in the event
of component failure. The configuration is maintained between virtualization engine
pairs through redundant T Port connections by way of a pair of Sun StorEdge
network FC switch-8 or switch-16 switches.
Virtualization Engine Diagnostics
The virtualization engine monitors the following components:
■ Virtualization engine router
■ Sun StorEdge T3+ array
■ Cabling between the router and storage
Service Request Numbers
The service request numbers are used to inform the user of storage subsystem
activities.
Service and Diagnostic Codes
The virtualization engine’s service and diagnostic codes inform the user of
subsystem activities. The codes are presented as an LED readout. See Appendix A
for the table of codes and actions to take. In some cases, you might not be able
to receive Service Request Numbers (SRNs) because of communication errors. If
this occurs, you must read the virtualization engine LEDs to determine the
problem.
▼ To Retrieve Service Information
You can retrieve service information in two ways:
■ CLI Interface
■ Error Log Analysis Commands
Both of these methods are described in the following sections.
CLI Interface
The SLIC daemon, which runs on the Storage Service Processor, communicates with
the virtualization engine. The SLIC daemon periodically polls the virtualization
engine for all subsystem errors and for topology changes. It then passes this
information in the form of an SRN to the Error Log file.
▼ To Display Log Files and Retrieve SRNs
Use the /opt/svengine/sduc/sreadlog command to display log files and
retrieve the Service Request Numbers (SRN) for errors that need action. Data is
returned in the following format:
TimeStamp   Time and date when the error occurred
nnn         The name of the virtualization engine pair (v1 or v2)
Txxxxx      The LUN where the error occurred.
            Note: Txxxxx can represent a physical or a logical LUN.
uuuuuuuu    The unique ID of the drive or the virtualization engine router
SRN=mmmmm   The SRN, defined in numerical order

For example:

nnn         v1 (virtualization engine pair v1)
uuuuuuuu    29000060-220041F9 (v1a, obtained by checking the virtualization
            engine map from the SEcfg utility)
SRN=mmmmm   SRN=70030: SAN Configuration Changed

(Refer to Appendix A for codes.)
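As an illustration of the format above, the following sketch pulls the SRN field out of one log line. The sample line is invented from the field descriptions; only the SRN=mmmmm suffix is taken from the text.

```shell
# Hypothetical sketch: extract the SRN from a sreadlog line. The sample
# line is invented from the documented fields (timestamp, pair name,
# LUN, unique ID, SRN); it is not real sreadlog output.
line='Mon Mar 4 10:22:13 2002 v1 T49152 29000060-220041F9 SRN=70030'
srn=${line##*SRN=}   # strip everything up to and including "SRN="
echo "SRN is $srn"
```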
▼ To Clear the Log
● Use the /opt/svengine/sduc/sclrlog command.
Virtualization Engine LEDs
TABLE 7-1 describes the LEDs on the back of the virtualization engine.
TABLE 7-1 Virtualization Engine LEDs

LED       Color  State                        Description
Power     Green  Solid on                     The virtualization engine is
                                              powered on
Status 1  Green  • Solid on                   • Normal operating mode
                 • Blink                      • Service code (number of blinks
                                              to indicate a decimal number)
Fault     Amber  Solid on                     Serious problem. Decipher the
                                              blinking of the Status LED to
                                              determine the service code. Once
                                              you have determined the service
                                              code, look up the decimal number
                                              of the service code in Appendix A.

1. The Status LED will blink a service code when the Fault LED is solid on.
Power LED Codes
The virtualization engine LEDs are shown in FIGURE 7-1.
FIGURE 7-1 Virtualization Engine Front Panel LEDs
Interpreting LED Service and Diagnostic Codes
The Status LED communicates the status of the virtualization engine in decimal
numbers. Each decimal number is represented by a number of blinks, followed by a
medium duration (two seconds) of LED off. TABLE 7-2 lists the status LED code
descriptions.

TABLE 7-2 LED Service and Diagnostic Codes

0    Fast blink
1    LED blinks once
2    LED blinks twice with one short duration (one second) between blinks
3    LED blinks three times with one short duration (one second) between blinks
...
10   LED blinks ten times with one short duration (one second) between blinks
The blink code repeats continuously, with a four-second off interval between code
sequences.
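The blink scheme can be sketched as a small helper that joins the blink count of each group (the groups are separated by the two-second pause) into the decimal service code. The helper and its digit handling are illustrative, not part of the product.

```shell
# Hypothetical helper: each argument is the blink count observed in one
# group; concatenating the groups yields the decimal service code.
# A fast blink is written as 0, per TABLE 7-2.
decode_blinks() {
  code=""
  for group in "$@"; do
    code="$code$group"
  done
  echo "$code"
}

# Example: two blinks, pause, one blink -> service code 21.
service_code=$(decode_blinks 2 1)
echo "service code: $service_code"
```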
The back panel of the virtualization engine contains the connections to the Sun
StorEdge network FC switch-8 or switch-16 switches, a socket for the AC power
input, and various data ports and LEDs.
Ethernet Port LEDs
The Ethernet port LEDs indicate the speed, activity, and validity of the link,
as shown in TABLE 7-3.

TABLE 7-3 Speed, Activity, and Validity of the Link

LED            Color  State     Description
Speed          Amber  Solid on  The link is 100Base-TX
                      Off       The link is 10Base-T
Link Activity  Green  Solid on  A valid link is established
                      Blink     Normal operations, including data activity
Fibre Channel Link Error Status Report
The virtualization engine’s host-side and device-side interfaces provide
statistical data for the counts listed in TABLE 7-4.

TABLE 7-4 Virtualization Engine Statistical Data

Count Type                    Description
Link Failure Count            The number of times the virtualization engine’s
                              frame manager detects a non-operational state or
                              other failure of N_Port initialization protocol.
Loss of Synchronization       The number of times that the virtualization
Count                         engine detects a loss in synchronization.
Loss of Signal Count          The number of times that the virtualization
                              engine’s frame manager detects a loss of signal.
Primitive Sequence            The number of times that the virtualization
Protocol Error                engine’s frame manager detects N_Port protocol
                              errors.
Invalid Transmission          The number of times that the virtualization
Word                          engine’s 8b/10b decoder does not detect a valid
                              10-bit code.
Invalid CRC Count             The number of times that the virtualization
                              engine receives frames with a bad CRC and a
                              valid EOF. A valid EOF includes EOFn, EOFt, or
                              EOFdti.
The Storage Automated Diagnostic Environment, which runs on the Storage Service
Processor, monitors the Fibre Channel link status of the virtualization engine.
The virtualization engine must be power-cycled to reset the counters. Therefore,
you should manually check the accumulation of errors over a fixed period of
time. To check the status manually, follow these steps:
1. Use the svstat command to take a reading, as shown in CODE EXAMPLE 7-1.
A status report for the host-side and device-side ports is displayed.
2. Within the next few minutes, take another reading.
The number of new errors that occurred within that time frame represents the
number of link errors.
Note – If the t3ofdg(1M) utility is running while you perform these steps, the following
error message is displayed:
Daemon error: check the SLIC router.
CODE EXAMPLE 7-1Fibre Channel Link Error Status Example
# /opt/svengine/sduc/svstat -d v1
I00001 Host Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 8
Invalid CRC Count 0
I00001 Device Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 139
Invalid CRC Count 0
I00002 Host Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 11
Invalid CRC Count 0
I00002 Device Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 135
Invalid CRC Count 0
diag.xxxxx.xxx.com: root#
Note – v1 represents the first virtualization engine pair.
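The two-reading check amounts to subtracting matching counters. A minimal sketch, using sample Invalid Word Count values in place of real svstat output:

```shell
# Hypothetical sketch: estimate link errors by diffing one counter
# between two svstat readings taken a few minutes apart. The values
# are sample numbers, not real command output.
first_reading=8     # Invalid Word Count, first svstat run
second_reading=11   # Invalid Word Count, second svstat run
link_errors=$((second_reading - first_reading))
echo "new link errors in interval: $link_errors"
```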
Vendor: SUN
Product: SESS01
Revision: 080E
Removable media: no
Device type: 0
From this screen, note that the VLUN number is 62 57 33 4b 30 30 31 48, beginning
with the 5th pair of numbers on the 3rd line, up to and including the 12th pair.
Initiator  UID               VE Host  Online  Revision  Number of SLIC Zones
----------------------------------------------------------------------------
I00001     2900006022004195  v1a      Yes     08.14     0
I00002     2900006022004186  v1b      Yes     08.14     0
*****
ZONE SUMMARY
Zone Name  HBA WWN  Initiator  Online  Number of VLUNs
2. You can optionally establish a telnet connection to the virtualization engine and
run the runsecfg utility to poll a live snapshot of the virtualization engine map.
Refer to “To Replace a Failed Virtualization Engine” on page 84 for telnet
instructions.
Determining the virtualization engine pairs on the system .........
MAIN MENU - SUN StorEdge 6910 SYSTEM CONFIGURATION TOOL
1) T3+ Configuration Utility
2) Switch Configuration Utility
3) Virtualization Engine Configuration Utility
4) View Logs
5) View Errors
6) Exit
Select option above:> 3
VIRTUALIZATION ENGINE MAIN MENU
1) Manage VLUNs
2) Manage Virtualization Engine Zones
3) Manage Configuration Files
4) Manage Virtualization Engine Hosts
5) Help
6) Return
Select option above:> 3
MANAGE CONFIGURATION FILES MENU
1) Display Virtualization Engine Map
2) Save Virtualization Engine Map
3) Verify Virtualization Engine Map
4) Help
5) Return
Select configuration option above:> 1
Do you want to poll the live system (time consuming) or view the file [l|f]: l
From the virtualization engine map output, you can match the VLUN serial number
to the VLUN name (VDRV000), the disk pool (t3b00) and the MP drive target
(T49152). This information can also help you find the controller serial number
(60020F2000006DFA), which you need to perform Sun StorEdge T3+ array LUN
failback commands.
▼ To Fail Back the Virtualization Engine
In the event of a Sun StorEdge T3+ array LUN failover, use the following procedure
to fail the LUN back to its original controller.
1. Run the mpdrive failback command with the following options:
-d   Virtualization engine pair on which to run the command
-j   Controller serial number, which corresponds to the Sun
     StorEdge T3+ array WWN of the affected partner pair
The failback command is always performed on the controller serial number,
regardless of which controller (the master or the alternate master) currently
owns the LUN. All VLUNs are affected by a failover and failback of the
underlying physical LUN.
The controller serial number is the system WWN for the Sun StorEdge T3+ array. In
the above example, the master Sun StorEdge T3+ array WWN is
50020F2300006DFA, and the number used in the failback command is
60020F2000006DFA.
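Putting the -d and -j options together, a failback invocation might look like the following dry run; the /opt/svengine/sduc path and the mpdrive failback syntax are assumptions pieced together from the surrounding text, so the sketch only prints the command.

```shell
# Hypothetical dry run of the failback command described above; the
# command path and subcommand syntax are assumptions, and the serial
# number is the example value from the text.
ve_pair=v1
ctrl_serial=60020F2000006DFA
cmd="/opt/svengine/sduc/mpdrive failback -d $ve_pair -j $ctrl_serial"
echo "$cmd"
```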
2. The SLIC daemon must be running for the mpdrive failback command to
work. Ensure that the SLIC daemon is running by using the command found in
CODE EXAMPLE 7-3.
If no SLIC processes are running, you can start them manually using the
SUNWsecfg startslicd script, which is located in the /opt/SUNWsecfg/bin
directory (for example, startslicd -n v1).
CODE EXAMPLE 7-3slicd Output Example
# ps -ef | grep slic
root 6299 6295 0 Jan 04 ? 0:00 ./slicd
root 6296 6295 0 Jan 04 ? 0:02 ./slicd
root 6295 1 0 Jan 04 ? 0:01 ./slicd
root 6357 6295 0 Jan 04 ? 0:00 ./slicd
root 6362 6295 0 Jan 04 ? 0:03 ./slicd
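Step 2 can be scripted as a simple guard. The check mirrors CODE EXAMPLE 7-3; the script prints the restart command (from the text above) rather than running it.

```shell
# Check for running slicd processes (the [s] in the pattern keeps grep
# from matching its own command line); print the restart command if
# none are found.
if ps -ef | grep '[s]licd' > /dev/null; then
  slicd_state="running"
else
  slicd_state="stopped"
  echo "restart with: /opt/SUNWsecfg/bin/startslicd -n v1"
fi
echo "slicd is $slicd_state"
```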