
Sun StorEdge™ 3900 and 6900 Series Troubleshooting Guide
Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. 650-960-1300
Part No. 816-4290-11 March 2002, Revision A
Send comments about this document to: docfeedback@sun.com
Copyright 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, Sun StorEdge, StorTools, docs.sun.com, Sun Enterprise, Sun Fire, SunOS, Netra, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
Federal Acquisitions: Commercial Software—Government Users Subject to Standard License Terms and Conditions.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

1. Introduction 1
Predictive Failure Analysis Capabilities 2
2. General Troubleshooting Procedures 3
Troubleshooting Overview Tasks 3
Multipathing Options in the Sun StorEdge 6900 Series 7
Alternatives to Sun StorEdge Traffic Manager 8
To Quiesce the I/O 8
To Unconfigure the c2 Path 8
To Suspend the I/O 10
To Return the Path to Production 10
To View the VxDisk Properties 11
To Quiesce the I/O on the A3/B3 Link 13
To Suspend the I/O on the A3/B3 Link 13
To Return the Path to Production 14
Fibre Channel Links 15
Fibre Channel Link Diagrams 16
Host Side Troubleshooting 18
Storage Service Processor Side Troubleshooting 18
Command Line Test Examples 19
qlctest(1M) 19
switchtest(1M) 20
Storage Automated Diagnostic Environment Event Grid 21
To Customize an Event Report 21
3. Troubleshooting the Fibre Channel Links 23
A1/B1 Fibre Channel (FC) Link 23
To Verify the Data Host 25
FRU Tests Available for A1/B1 FC Link Segment 26
To Isolate the A1/B1 FC Link 28
A2/B2 Fibre Channel (FC) Link 29
To Verify the Host Side 31
To Verify the A2/B2 FC Link 33
To Isolate the A2/B2 FC Link 33
A3/B3 Fibre Channel (FC) Link 35
To Verify the Host Side 37
To Verify the Storage Service Processor 38
FRU Tests Available for the A3/B3 FC Link Segment 38
To Isolate the A3/B3 FC Link 39
A4/B4 Fibre Channel (FC) Link 40
To Verify the Data Host 42
Sun StorEdge 3900 Series 42
Sun StorEdge 6900 Series 42
FRU tests available for the A4/B4 FC Link Segment 44
To Isolate the A4/B4 FC Link 44
4. Configuration Settings 47
Verifying Configuration Settings 47
To Verify Configuration Settings 47
To Clear the Lock File 50
5. Troubleshooting Host Devices 53
Host Event Grid 53
Using the Host Event Grid 53
Replacing the Master, Alternate Master, and Slave Monitoring Host 57
To Replace the Master Host 57
To Replace the Alternate Master or Slave Monitoring Host 58
Conclusion 59
6. Troubleshooting Sun StorEdge FC Switch-8 and Switch-16 Devices 61
Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description 61
To Diagnose and Troubleshoot Switch Hardware 62
Switch Event Grid 62
Using the Switch Event Grid 62
Replacing the Master Midplane 68
To Replace the Master Midplane 68
Conclusion 68
7. Troubleshooting Virtualization Engine Devices 69
Virtualization Engine Description 69
Virtualization Engine Diagnostics 70
Service Request Numbers 70
Service and Diagnostic Codes 70
To Retrieve Service Information 70
CLI Interface 70
To Display Log Files and Retrieve SRNs 71
To Clear the Log 72
Virtualization Engine LEDs 72
Power LED Codes 73
Interpreting LED Service and Diagnostic Codes 73
Back Panel Features 74
Ethernet Port LEDs 74
Fibre Channel Link Error Status Report 75
To Check Fibre Channel Link Error Status Manually 76
Translating Host Device Names 78
To Display the VLUN Serial Number 79
Devices That Are Not Sun StorEdge Traffic Manager-Enabled 79
Sun StorEdge Traffic Manager-Enabled Devices 80
To View the Virtualization Engine Map 81
To Failback the Virtualization Engine 83
To Replace a Failed Virtualization Engine 84
To Manually Clear the SAN Database 86
To Reset the SAN Database on Both Virtualization Engines 86
To Reset the SAN Database on a Single Virtualization Engine 86
Stopping and Restarting the SLIC Daemon 87
To Restart the SLIC Daemon 87
Sun StorEdge 6900 Series Multipathing Example 89
One Sun StorEdge T3+ Array Partner Pair with One 500-GB RAID 5 LUN per Brick (2 LUNs Total) 89
Virtualization Engine Event Grid 95
Using the Virtualization Engine Event Grid 95
8. Troubleshooting the Sun StorEdge T3+ Array Devices 99
Explorer Data Collection Utility 99
To Install Explorer Data Collection Utility on the Storage Service Processor 99
Troubleshooting the T1/T2 Data Path 102
Notes 102
T1/T2 Notification Events 103
Sun StorEdge T3+ Array Storage Service Processor Verification 106
T1/T2 FRU Tests Available 107
Notes 108
T1/T2 Isolation Procedures 108
Sun StorEdge T3+ Array Event Grid 109
Using the Sun StorEdge T3+ Array Event Grid 109
Replacing the Master Midplane 122
To Replace the Master Midplane 122
Conclusion 122
9. Troubleshooting Ethernet Hubs 123
setupswitch Exit Values 141

List of Figures

FIGURE 2-1 Sun StorEdge 3900 Series Fibre Channel Link Diagram 16
FIGURE 2-2 Sun StorEdge 6900 Series Fibre Channel Link Diagram 17
FIGURE 3-1 Data Host Notification of Intermittent Problems 23
FIGURE 3-2 Data Host Notification of Severe Link Error 24
FIGURE 3-3 Storage Service Processor Notification 24
FIGURE 3-4 A2/B2 FC Link Host Side Event 29
FIGURE 3-5 A2/B2 FC Link Storage Service Processor Side Event 30
FIGURE 3-6 A3/B3 FC Link Host-Side Event 35
FIGURE 3-7 A3/B3 FC Link Storage Service Processor-Side Event 36
FIGURE 3-8 A3/B3 FC Link Storage Service Processor-Side Event 36
FIGURE 3-9 A4/B4 FC Link Data Host Notification 40
FIGURE 3-10 Storage Service Processor Notification 41
FIGURE 5-1 Host Event Grid 54
FIGURE 6-1 Switch Event Grid 63
FIGURE 7-1 Virtualization Engine Front Panel LEDs 73
FIGURE 7-2 Sun StorEdge 6900 Series Logical View 90
FIGURE 7-3 Primary Data Paths to the Alternate Master 91
FIGURE 7-4 Primary Data Paths to the Master Sun StorEdge T3+ Array 92
FIGURE 7-5 Path Failure—Before the Second Tier of Switches 93
FIGURE 7-6 Path Failure—I/O Routed through Both HBAs 94
FIGURE 7-7 Virtualization Engine Event Grid 95
FIGURE 8-1 Storage Service Processor Event 103
FIGURE 8-2 Virtualization Engine Alert 105
FIGURE 8-3 Manage Configuration Files Menu 106
FIGURE 8-4 Example Link Test Text Output from the Storage Automated Diagnostic Environment 107
FIGURE 8-5 Sun StorEdge T3+ Array Event Grid 109

Preface

The Sun StorEdge 3900 and 6900 Series Troubleshooting Guide provides guidelines for isolating problems in supported configurations of the Sun StorEdge™ 3900 and 6900 series. For detailed configuration information, refer to the Sun StorEdge 3900 and 6900 Series Reference Manual.
The scope of this troubleshooting guide is limited to information pertaining to the components of the Sun StorEdge 3900 and 6900 series, including the Storage Service Processor and the virtualization engines in the Sun StorEdge 6900 series. This guide is written for Sun personnel who have been fully trained on all the components in the configuration.

How This Book Is Organized

This book contains the following topics:
Chapter 1 introduces the Sun StorEdge 3900 and 6900 series storage subsystems.
Chapter 2 offers general troubleshooting guidelines, such as quiescing the I/O, and tools you can use to isolate and troubleshoot problems.
Chapter 3 provides Fibre Channel link troubleshooting procedures.
Chapter 4 presents information about configuration settings specific to the Sun StorEdge 3900 and 6900 series. It also provides a procedure for clearing the lock file.
Chapter 5 provides information on host device troubleshooting.
Chapter 6 provides information on Sun StorEdge network FC switch-8 and switch-16 switch device troubleshooting.
Chapter 7 provides detailed information for troubleshooting the virtualization engines.
Chapter 8 describes how to troubleshoot the Sun StorEdge T3+ array devices. Also included in this chapter is information about the Explorer Data Collection Utility.
Chapter 9 discusses Ethernet hub troubleshooting. Information about the 3Com Ethernet hubs is limited in this guide, however, because it is third-party hardware.
Appendix A provides virtualization engine references, including SRN and SNMP Reference, an SRN/SNMP single point of failure table, and port communication and service code tables.
Appendix B provides a list of SUNWsecfg Error Messages and recommendations for corrective action.

Using UNIX Commands

This document may not contain information on basic UNIX® commands and procedures such as shutting down the system, booting the system, and configuring devices.
See one or more of the following for this information:
Solaris Handbook for Sun Peripherals
AnswerBook2™ online documentation for the Solaris™ operating environment
Other software documentation that you received with your system

Typographic Conventions

Typeface    Meaning                              Examples

AaBbCc123   The names of commands, files,        Edit your .login file.
            and directories; on-screen           Use ls -a to list all files.
            computer output                      % You have mail.

AaBbCc123   What you type, when contrasted       % su
            with on-screen computer output       Password:

AaBbCc123   Book titles, new words or terms,     Read Chapter 6 in the User’s Guide.
            words to be emphasized               These are called class options.
                                                 You must be superuser to do this.
            Command-line variable; replace       To delete a file, type rm filename.
            with a real name or value

Shell Prompts

Shell                                    Prompt

C shell                                  machine_name%
C shell superuser                        machine_name#
Bourne shell and Korn shell              $
Bourne shell and Korn shell superuser    #

Related Documentation

Product: Late-breaking news
• Sun StorEdge 3900 and 6900 Series Release Notes (816-3247)

Product: Sun StorEdge 3900 and 6900 series hardware information
• Sun StorEdge 3900 and 6900 Series Site Preparation Guide (816-3242)
• Sun StorEdge 3900 and 6900 Series Regulatory and Safety Compliance Manual (816-3243)
• Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual (816-3244)

Product: Sun StorEdge T3 and T3+ array
• Sun StorEdge T3 and T3+ Array Start Here (816-0772)
• Sun StorEdge T3 and T3+ Array Installation, Operation, and Service Manual (816-0773)
• Sun StorEdge T3 and T3+ Array Administrator’s Guide (816-0776)
• Sun StorEdge T3 and T3+ Array Configuration Guide (816-0777)
• Sun StorEdge T3 and T3+ Array Site Preparation Guide (816-0778)
• Sun StorEdge T3 and T3+ Field Service Manual (816-0779)
• Sun StorEdge T3 and T3+ Array Release Notes (816-0781)

Product: Diagnostics
• Storage Automated Diagnostic Environment User’s Guide (816-3142)

Product: Sun StorEdge network FC switch-8 and switch-16
• Sun StorEdge Network FC Switch-8 and Switch-16 Release Notes (816-0842)
• Sun StorEdge Network FC Switch-8 and Switch-16 Installation and Configuration Guide (816-0830)
• Sun StorEdge Network FC Switch-8 and Switch-16 Best Practices Manual (816-2688)
• Sun StorEdge Network FC Switch-8 and Switch-16 Operations Guide (816-1986)
• Sun StorEdge Network FC Switch-8 and Switch-16 Field Troubleshooting Guide (816-1701)

Product: SANbox switch management using SANsurfer
• SANbox 8/16 Segmented Loop Switch Management User’s Manual (875-3060)
• SANbox-8 Segmented Loop Fibre Channel Switch Installer’s/User’s Manual (875-1881)
• SANbox-16 Segmented Loop Fibre Channel Switch Installer’s/User’s Manual (875-3059)

Product: Expansion cabinet
• Sun StorEdge Expansion Cabinet Installation and Service Manual (805-3067)

Product: Storage Service Processor
• Netra X1 Server User’s Guide (806-5980)
• Netra X1 Server Hard Disk Drive Installation Guide (806-7670)

Accessing Sun Documentation Online

A broad selection of Sun system documentation is located at:
http://www.sun.com/products-n-solutions/hardware/docs
A complete set of Solaris documentation and many other titles are located at:
http://docs.sun.com

Sun Welcomes Your Comments

Sun is interested in improving its documentation and welcomes your comments and suggestions. You can email your comments to Sun at:
docfeedback@sun.com
Please include the part number (816-4290-11) of your document in the subject line of your email.
CHAPTER 1

Introduction

The Sun StorEdge 3900 and 6900 series storage subsystems are complete preconfigured storage solutions. The configurations for each of the storage subsystems are shown in TABLE 1-1.

TABLE 1-1

Series                      System                      Sun StorEdge Fibre     Sun StorEdge T3+       Additional Array Partner Groups
                                                        Channel Switch         Array Partner Groups   Supported with Optional Additional
                                                        Supported              Supported              Expansion Cabinet

Sun StorEdge 3900 series    Sun StorEdge 3910 system    Two 8-port switches    1 to 4                 Not applicable
                            Sun StorEdge 3960 system    Two 16-port switches   1 to 4                 1 to 5
Sun StorEdge 6900 series    Sun StorEdge 6910 system    Two 8-port switches    1 to 3                 Not applicable
                            Sun StorEdge 6960 system    Two 16-port switches   1 to 3                 1 to 4

Predictive Failure Analysis Capabilities

The Storage Automated Diagnostic Environment software provides the health and monitoring functions for the Sun StorEdge 3900 and 6900 series systems. This software provides the following predictive failure analysis (PFA) capabilities.
FC links—Fibre Channel links are monitored at all end points using the FC-ELS link counters. When link errors surpass the threshold values, an alert is sent. This enables Sun personnel to replace components that are experiencing high transient fault levels before a hard fault occurs.
Enclosure status—Many devices, such as the Sun StorEdge network FC switch-8 and switch-16 switch and the Sun StorEdge T3+ array, cause Storage Automated Diagnostic Environment alerts to be sent if temperature thresholds are exceeded. This enables Sun-trained personnel to address the problem before the component and enclosure fail.
SPOF notification—Storage Automated Diagnostic Environment notification for path failures and failovers (that is, Sun StorEdge Traffic Manager software failover) can be considered PFA, since Sun-trained personnel are notified and can repair the primary path. This eliminates the time of exposure to single points of failure and helps to preserve customer availability during the repair process.
PFA is not always effective in detecting or isolating failures. The remainder of this document provides guidelines that can be used to troubleshoot problems that occur in supported components of the Sun StorEdge 3900 and 6900 series.
CHAPTER 2

General Troubleshooting Procedures

This chapter contains the following sections:
“Troubleshooting Overview Tasks” on page 3
“Multipathing Options in the Sun StorEdge 6900 Series” on page 7
“Fibre Channel Links” on page 15
“Storage Automated Diagnostic Environment Event Grid” on page 21

Troubleshooting Overview Tasks

This section lists the high-level steps to isolate and troubleshoot problems in the Sun StorEdge 3900 and 6900 series. It offers a methodical approach and lists the tools and resources available at each step.
Note – A single problem can cause various errors throughout the SAN. A good
practice is to begin by investigating the devices that have experienced “Loss of Communication” events in the Storage Automated Diagnostic Environment. These errors usually indicate more serious problems.
A “Loss of Communication” error on a switch, for example, could cause multiple ports and HBAs to go offline. Concentrating on the switch and fixing that failure can help bring the ports and HBAs back online.

1. Discover the error by checking one or more of the following messages or files:

Storage Automated Diagnostic Environment alerts or email messages
/var/adm/messages
Sun StorEdge T3+ array syslog file
Storage Service Processor messages
/var/adm/messages.t3 file on the Storage Service Processor
/var/adm/log/SEcfglog file
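
For example, a quick way to scan for the Fibre Channel driver events discussed in this chapter (a minimal sketch; run the first command on the data host and the second on the Storage Service Processor):

# egrep -i 'loop offline|fabric|mpxio' /var/adm/messages | tail -20
# tail -20 /var/adm/messages.t3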

2. Determine the extent of the problem by using one or more of the following methods:

Storage Automated Diagnostic Environment Topology view
Storage Automated Diagnostic Environment Revision Checking (to check whether a required package or patch is installed)
Verify the functionality using one of the following:
checkdefaultconfig(1M)
checkt3config(1M)
cfgadm -al output
luxadm(1M) output
Check the multipathing status using the Sun StorEdge Traffic Manager software or VxDMP.

3. Check the status of a Sun StorEdge T3+ array by using one or more of the following methods:

Storage Automated Diagnostic Environment device monitoring reports
Run the SEcfg script, which displays the Sun StorEdge T3+ array configuration
Manually open a telnet session to the Sun StorEdge T3+ array
luxadm(1M) display output
LED status on the Sun StorEdge T3+ array
Explorer Data Collection Utility output (located on the Storage Service Processor)
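
For example, to check the array status manually, open a telnet session to the array and run its status commands (a sketch; the array name t3b0 is hypothetical):

# telnet t3b0
t3b0:/:<1> vol stat
t3b0:/:<2> fru stat
t3b0:/:<3> port listmap

vol stat reports volume and drive status, fru stat reports the health of each FRU, and port listmap shows which controller currently owns each LUN.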
4. Check the status of the Sun StorEdge FC network switch-8 and switch-16 switches using the following tools:
Storage Automated Diagnostic Environment device monitoring reports
Run the SEcfg script, which displays the switch configuration
LED status (online/offline, POST error codes found in the Sun StorEdge Network FC Switch-8 and Switch-16 Installation and Configuration Guide)
Explorer Data Collection Utility output (located on the Storage Service Processor)
SANsurfer GUI
Note – To run the SANsurfer GUI from the Storage Service Processor, you must export the X display.
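
For example, from a Bourne shell on the Storage Service Processor (a sketch; myhost:0.0 is a hypothetical X display on your workstation):

# DISPLAY=myhost:0.0
# export DISPLAY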

5. Check the status of the virtualization engine using one or more of the following methods:

Storage Automated Diagnostic Environment device monitoring reports
Run the SEcfg script, which displays the virtualization engine configuration
Refer to the LED status blink codes in Chapter 7.

6. Quiesce the I/O along the path to be tested as follows:

For installations using VERITAS VxDMP, disable the path with vxdmpadm (a sketch of both options follows this list).
For installations using the Sun StorEdge Traffic Manager software, unconfigure the Fabric device; refer to “To Quiesce the I/O” on page 8.
Halt the application.
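
A minimal sketch of both options (the controller number and Ap_Id shown are taken from the examples later in this chapter; substitute values from your own vxdmpadm listctlr all or cfgadm -al output):

# vxdmpadm disable ctlr=c20
# cfgadm -c unconfigure c2::2b000060220041f4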

7. Test and isolate the FRUs using the following tools:

Storage Automated Diagnostic Environment diagnostic tests (this might require
the use of a loopback cable for isolation)
Sun StorEdge T3+ array tests, including t3test(1M), t3ofdg(1M), and t3volverify(1M), which are described in the Storage Automated Diagnostic Environment User’s Guide.
Note – These tests isolate the problem to a FRU that must be replaced. Follow the instructions in the Sun StorEdge 3900 and 6900 Series Reference Manual and the Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual for proper FRU replacement procedures.

8. Verify the fix using the following tools:

Storage Automated Diagnostic Environment GUI Topology View and Diagnostic
Tests
/var/adm/messages on the data host

9. Return the path to service by using one of the following methods:

Multipathing software
Restarting the application

Multipathing Options in the Sun StorEdge 6900 Series

Using the virtualization engines presents several challenges in how multipathing is handled in the Sun StorEdge 6900 series.
Unlike Sun StorEdge T3+ array and Sun StorEdge network FC switch-8 and switch-16 switch installations, which present primary and secondary pathing options, the virtualization engines present only primary pathing options to the data host. The virtualization engines handle all failover and failback operations and mask those operations from the multipathing software on the data host.
The following example illustrates a Sun StorEdge Traffic Manager problem on a Sun StorEdge 6900 series system.
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
 Status(Port A):   O.K.
 Status(Port B):   O.K.
 Vendor:           SUN
 Product ID:       SESS01
 WWN(Node):        2a000060220041f4
 WWN(Port A):      2b000060220041f4
 WWN(Port B):      2b000060220041f9
 Revision:         080C
 Serial Num:       Unsupported
 Unformatted capacity: 102400.000 MBytes
 Write Cache:      Enabled
 Read Cache:       Enabled
 Minimum prefetch: 0x0
 Maximum prefetch: 0x0
 Device Type:      Disk device
 Path(s):
 /dev/rdsk/c6t29000060220041F96257354230303052d0s2
 /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
  Controller       /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
   Device Address  2b000060220041f4,0
   Class           primary
   State           ONLINE
  Controller       /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
   Device Address  2b000060220041f9,0
   Class           primary
   State           ONLINE
Note that in the Class and State fields, the virtualization engines are presented as two primary/ONLINE devices. The current Sun StorEdge Traffic Manager design does not enable you to manually halt the I/O (that is, you cannot perform a failover to the secondary path) when only primary devices are present.

Alternatives to Sun StorEdge Traffic Manager

As an alternative to using Sun StorEdge Traffic Manager, you can manually halt the I/O using one of two methods: quiescing the I/O or unconfiguring the c2 path. These methods are explained below.

To Quiesce the I/O

1. Determine the path you want to disable.
2. Type:
# cfgadm -c unconfigure device
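
For example, to quiesce the path through the virtualization engine with WWN 2b000060220041f4 (the same Ap_Id used in the next procedure):

# cfgadm -c unconfigure c2::2b000060220041f4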

To Unconfigure the c2 Path

1. Type:
# cfgadm -al
Ap_Id                 Type        Receptacle   Occupant      Condition
c0                    scsi-bus    connected    configured    unknown
c0::dsk/c0t0d0        disk        connected    configured    unknown
c0::dsk/c0t1d0        disk        connected    configured    unknown
c1                    scsi-bus    connected    configured    unknown
c1::dsk/c1t6d0        CD-ROM      connected    configured    unknown
c2                    fc-fabric   connected    configured    unknown
c2::210100e08b23fa25  unknown     connected    unconfigured  unknown
c2::2b000060220041f4  disk        connected    configured    unknown
c3                    fc-fabric   connected    configured    unknown
c3::210100e08b230926  unknown     connected    unconfigured  unknown
c3::2b000060220041f9  disk        connected    configured    unknown
c4                    fc-private  connected    unconfigured  unknown
c5                    fc          connected    unconfigured  unknown
2. Using Storage Automated Diagnostic Environment Topology GUI, determine which virtualization engine is in the path you need to disable.
3. Use the world wide name (WWN) of that virtualization engine in the unconfigure command, as follows:
# cfgadm -c unconfigure c2::2b000060220041f4
# cfgadm -al
Ap_Id                 Type        Receptacle   Occupant      Condition
c0                    scsi-bus    connected    configured    unknown
c0::dsk/c0t0d0        disk        connected    configured    unknown
c0::dsk/c0t1d0        disk        connected    configured    unknown
c1                    scsi-bus    connected    configured    unknown
c1::dsk/c1t6d0        CD-ROM      connected    configured    unknown
c2                    fc-fabric   connected    unconfigured  unknown
c2::210100e08b23fa25  unknown     connected    unconfigured  unknown
c2::2b000060220041f4  disk        connected    unconfigured  unknown
c3                    fc-fabric   connected    configured    unknown
c3::210100e08b230926  unknown     connected    unconfigured  unknown
c3::2b000060220041f9  disk        connected    configured    unknown
c4                    fc-private  connected    unconfigured  unknown
c5                    fc          connected    unconfigured  unknown
4. Verify that I/O has halted.
This halts the I/O only up to the A3/B3 link (see FIGURE 2-2). I/O continues to move over the T1 and T2 paths, as well as the A4/B4 links to the Sun StorEdge T3+ array.

To Suspend the I/O

Use one of the following methods to suspend the I/O while the failover occurs:
1. Stop all customer applications that are accessing the Sun StorEdge T3+ array.
2. Manually pull the link from the Sun StorEdge T3+ array to the switch and wait for a Sun StorEdge T3+ array LUN failover.
After the failover occurs, replace the cable and proceed with testing and FRU isolation.
After testing and any FRU replacement are finished, return the controller state to the default by using virtualization engine failback. Refer to “To Failback the Virtualization Engine” on page 83.
Note – To confirm that a failover is occurring, open a telnet session to the Sun StorEdge T3+ array and check the output of port listmap.
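
For example (a sketch; the array name is hypothetical and the output line is illustrative of the port listmap format):

t3b0:/:<1> port listmap
port      targetid  addr_type  lun  volume  owner  access
u1p1      1         hard       0    v0      u1     primary

During a LUN failover, the owner column for the volume changes to the surviving controller unit.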
Another, but slower, method is to run the runsecfg script and verify the virtualization engine maps by polling them against a live system.
Caution – During the failover, SCSI errors will occur on the data host and a brief suspension of I/O will occur.

To Return the Path to Production

1. Type cfgadm -c configure device.
# cfgadm -c configure c2::2b000060220041f4
2. Verify that I/O has resumed on all paths.

To View the VxDisk Properties

1. Type the following:
# vxdisk list Disk_1
Device:    Disk_1
devicetag: Disk_1
type:      sliced
hostid:    diag.xxxxx.xxx.COM
disk:      name=t3dg02 id=1010283311.1163.diag.xxxxx.xxx.com
group:     name=t3dg id=1010283312.1166.diag.xxxxx.xxx.com
flags:     online ready private autoconfig nohotuse autoimport imported
pubpaths:  block=/dev/vx/dmp/Disk_1s4 char=/dev/vx/rdmp/Disk_1s4
privpaths: block=/dev/vx/dmp/Disk_1s3 char=/dev/vx/rdmp/Disk_1s3
version:   2.2
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=4 offset=0 len=209698816
private:   slice=3 offset=1 len=4095
update:    time=1010434311 seqno=0.6
headers:   0 248
configs:   count=1 len=3004
logs:      count=1 len=455
Defined regions:
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-003021[002773]: copy=01 offset=000231 enabled
 log      priv 003022-003476[000455]: copy=01 offset=000000 enabled
Multipathing information:
numpaths:  2
c20t2B000060220041F4d0s2 state=enabled
c23t2B000060220041F9d0s2 state=enabled

# vxdmpadm listctlr all
CTLR-NAME   ENCLR-TYPE      STATE       ENCLR-NAME
=====================================================
c0          OTHER_DISKS     ENABLED     OTHER_DISKS
c2          SENA            ENABLED     SENA0
c3          SENA            ENABLED     SENA0
c20         Disk            ENABLED     Disk
c23         Disk            ENABLED     Disk
From the VxDisk output, notice that there are two physical paths to the LUN:
c20t2B000060220041F4d0s2
c23t2B000060220041F9d0s2
Both of these paths are currently enabled with VxDMP.
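
To see the individual paths behind each DMP node, you can also use vxdmpadm getsubpaths (a sketch; the output columns vary by VxVM release):

# vxdmpadm getsubpaths ctlr=c20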
2. Use the luxadm(1M) command to display further information about the underlying LUN.
# luxadm display /dev/rdsk/c20t2B000060220041F4d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c20t2B000060220041F4d0s2
 Status(Port A):   O.K.
 Vendor:           SUN
 Product ID:       SESS01
 WWN(Node):        2a000060220041f4
 WWN(Port A):      2b000060220041f4
 Revision:         080C
 Serial Num:       Unsupported
 Unformatted capacity: 102400.000 MBytes
 Write Cache:      Enabled
 Read Cache:       Enabled
 Minimum prefetch: 0x0
 Maximum prefetch: 0x0
 Device Type:      Disk device
 Path(s):
 /dev/rdsk/c20t2B000060220041F4d0s2
 /devices/pci@a,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f4,0:c,raw

# luxadm display /dev/rdsk/c23t2B000060220041F9d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c23t2B000060220041F9d0s2
 Status(Port A):   O.K.
 Vendor:           SUN
 Product ID:       SESS01
 WWN(Node):        2a000060220041f9
 WWN(Port A):      2b000060220041f9
 Revision:         080C
 Serial Num:       Unsupported
 Unformatted capacity: 102400.000 MBytes
 Write Cache:      Enabled
 Read Cache:       Enabled
 Minimum prefetch: 0x0
 Maximum prefetch: 0x0
 Device Type:      Disk device
 Path(s):
 /dev/rdsk/c23t2B000060220041F9d0s2
 /devices/pci@e,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f9,0:c,raw

To Quiesce the I/O on the A3/B3 Link

1. Determine the path you want to disable.
2. Disable the path by typing the following:
# vxdmpadm disable ctlr=<c#>
3. Verify that the path is disabled:
# vxdmpadm listctlr all
Steps 1 and 2 halt I/O only up to the A3/B3 link. I/O will continue to move over the T1 and T2 paths, as well as the A4/B4 links to the Sun StorEdge T3+ array.
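
For example, to disable and then verify the c20 path shown in the VxDisk output earlier in this chapter:

# vxdmpadm disable ctlr=c20
# vxdmpadm listctlr all

The disabled controller should now show DISABLED in the STATE column.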

To Suspend the I/O on the A3/B3 Link

Use one of the following methods to suspend I/O while the failover occurs:
1. Stop all customer applications that are accessing the Sun StorEdge T3+ array.
2. Manually pull the link from the Sun StorEdge T3+ array to the switch and wait for a Sun StorEdge T3+ array LUN failover.
a. After the failover occurs, replace the cable and proceed with testing and FRU isolation.
b. After testing is complete and any FRU replacement is finished, return the controller state to the default by using the virtualization engine failback command.
Caution – This action will cause SCSI errors on the data host and a brief suspension of I/O while the failover occurs.

To Return the Path to Production

1. Type:
# vxdmpadm enable ctlr=<c#>
2. Verify that the path has been re-enabled by typing:
# vxdmpadm listctlr all

Fibre Channel Links

The following sections provide troubleshooting information for the basic components and Fibre Channel links, listed in TABLE 2-1.

TABLE 2-1

Link       Provides Fibre Channel Link Between these Components
A1 to B1   Data host, sw1a, and sw1b
A2         sw1a and v1a*
B2         sw1b and v1b*
A3         v1a and sw2a*
B3         v1b and sw2b*
A4         Master Sun StorEdge T3+ array and the “A” path switch
B4         AltMaster Sun StorEdge T3+ array and the “B” path switch
T1 to T2   sw2a and sw2b*
* Sun StorEdge 6900 series only

Note – In an actual Sun StorEdge 3900 or 6900 series configuration, there could be more Sun StorEdge T3+ arrays than are shown in FIGURE 2-1 and FIGURE 2-2.
By using the Storage Automated Diagnostic Environment, you should be able to isolate the problem to one particular segment of the configuration.
The information found in this section is based on the assumption that the Storage Automated Diagnostic Environment is running on the data host, and that it is configured to monitor host errors. If the Storage Automated Diagnostic Environment is not installed on the data host, there will be areas of limited monitoring, diagnosis and isolation.
The following diagrams provide troubleshooting information for the basic components and Fibre Channel links specific to the Sun StorEdge 3900 series, shown in FIGURE 2-1, and the Sun StorEdge 6900 series, shown in FIGURE 2-2.

Fibre Channel Link Diagrams

FIGURE 2-1 shows the basic components and the Fibre Channel links for a Sun StorEdge 3900 series system:
A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun StorEdge T3+ array link

[Diagram: the host HBAs (HBA-A and HBA-B) connect over the A1 and B1 links to switches sw1a and sw1b, which connect over the A4 and B4 links to the master and alternate master Sun StorEdge T3+ arrays.]

FIGURE 2-1 Sun StorEdge 3900 Series Fibre Channel Link Diagram
FIGURE 2-2 shows the basic components and the Fibre Channel links for a Sun StorEdge 6900 series system:
A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
A2 to B2—Sun StorEdge network FC switch-8 and switch-16 switch to virtualization engine link on the host side
A3 to B3—Sun StorEdge network FC switch-8 and switch-16 switch to the virtualization engine link on the device side
A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun StorEdge T3+ array link
T1 to T2—T Port switch-to-switch link

[Diagram: the host HBAs (HBA-A and HBA-B) connect over the A1 and B1 links to switches sw1a and sw1b, over A2 and B2 to virtualization engines v1a and v1b, over A3 and B3 to switches sw2a and sw2b (joined to each other by the T1 and T2 links), and over A4 and B4 to the master and alternate master Sun StorEdge T3+ arrays.]

FIGURE 2-2 Sun StorEdge 6900 Series Fibre Channel Link Diagram

Host Side Troubleshooting

Host-side troubleshooting refers to the messages and errors the data host detects. Usually, these messages appear in the /var/adm/messages file.
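
For example, to watch for new driver events on the data host while you exercise a link (a minimal sketch):

# tail -f /var/adm/messages | egrep 'qlc|fp|mpxio|vxdmp'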

Storage Service Processor Side Troubleshooting

Storage Service Processor-side troubleshooting refers to messages, alerts, and errors that the Storage Automated Diagnostic Environment, running on the Storage Service Processor, detects. You can find these messages by monitoring the following Sun StorEdge 3900 and 6900 series components:
Sun StorEdge network FC switch-8 and switch-16 switches
Virtualization engine
Sun StorEdge T3+ array
Combining the host-side messages and errors with the Storage Service Processor-side messages, alerts, and errors into a meaningful context is essential for proper troubleshooting.

Command Line Test Examples

To run a single Sun StorEdge diagnostic test from the command line rather than through the Storage Automated Diagnostic Environment interface, you must log in to the appropriate host or slave for testing the components. The following two tests, qlctest(1M) and switchtest(1M), are provided as examples.
qlctest(1M)
qlctest(1M) comprises several subtests that test the functions of the Sun StorEdge PCI dual Fibre Channel (FC) host adapter board. This board is an HBA that has diagnostic support. This diagnostic test is not scalable.
CODE EXAMPLE 2-1 qlctest(1M)
# /opt/SUNWstade/Diags/bin/qlctest -v -o "dev=/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable"
"qlctest: called with options: dev=/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable"
"qlctest: Started."
"Program Version is 4.0.1"
"Testing qlc0 device at /devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl."
"QLC Adapter Chip Revision = 1, Risc Revision = 3, Frame Buffer Revision = 1029, Riscrom Revision = 4, Driver Revision = 5.a-2-1.15"
"Running ECHO command test with pattern 0x7e7e7e7e"
"Running ECHO command test with pattern 0x1e1e1e1e"
"Running ECHO command test with pattern 0xf1f1f1f1"
<snip>
"Running ECHO command test with pattern 0x4a4a4a4a"
"Running ECHO command test with pattern 0x78787878"
"Running ECHO command test with pattern 0x25252525"
"FCODE revision is ISP2200 FC-AL Host Adapter Driver: 1.12 01/01/16"
"Firmware revision is 2.1.7f"
"Running CHECKSUM check"
"Running diag selftest"
"qlctest: Stopped successfully."
switchtest(1M)
switchtest(1M) is used to diagnose the Sun StorEdge network FC switch-8 and switch-16 switch devices. The switchtest process also provides command-line access to switch diagnostics. switchtest supports testing on local and remote switches.
switchtest runs the port diagnostic on connected switch ports. While switchtest is running, the port statistics are monitored for errors, and the chassis status is checked.
CODE EXAMPLE 2-2 switchtest(1M)
# /opt/SUNWstade/Diags/bin/switchtest -v -o "dev=2:192.168.0.30:0x0|xfersize=200"
"switchtest: called with options: dev=2:192.168.0.30:0x0|xfersize=200"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK Fan 2: OK"
"Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e"
"Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
"Testing Device: Switch Port: 2 Pattern: 0xf1f1f1f1"
"Testing Device: Switch Port: 2 Pattern: 0xb5b5b5b5"
"Testing Device: Switch Port: 2 Pattern: 0x4a4a4a4a"
"Testing Device: Switch Port: 2 Pattern: 0x78787878"
"Testing Device: Switch Port: 2 Pattern: 0xe7e7e7e7"
"Testing Device: Switch Port: 2 Pattern: 0xaa55aa55"
"Testing Device: Switch Port: 2 Pattern: 0x7f7f7f7f"
"Testing Device: Switch Port: 2 Pattern: 0x0f0f0f0f"
"Testing Device: Switch Port: 2 Pattern: 0x00ff00ff"
"Testing Device: Switch Port: 2 Pattern: 0x25252525"
"Port: 2 passed all tests on Switch"
"switchtest: Stopped successfully."
All Storage Automated Diagnostic Environment diagnostic tests are located in /opt/SUNWstade/Diags/bin. Refer to the Storage Automated Diagnostic Environment User’s Guide for a complete list of tests, subtests, options, and restrictions.
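
For example, to list the diagnostic tests installed with the package (assuming the default installation location):

# ls /opt/SUNWstade/Diags/bin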

Storage Automated Diagnostic Environment Event Grid

The Storage Automated Diagnostic Environment generates component-specific event grids that describe the severity of an event, whether action is required, a description of the event, and the recommended action. Refer to Chapters 5 through 9 of this troubleshooting guide for component-specific event grids.

To Customize an Event Report

1. Click the Event Grid link on the Storage Automated Diagnostic Environment Help menu.
2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in TABLE 2-2.

TABLE 2-2 Event Grid Sorting Criteria

Category:
• All (Default)
• Sun StorEdge A3500FC array
• Sun StorEdge A5000 array
• Agent
• Host
• Message
• Sun Switch
• Sun StorEdge T3+ array
• Tape
• Virtualization engine

Component:
• All (Default)
• Backplane
• Controller
• Disk
• Interface
• LUN
• Port
• Power

Event Type:
• Agent Deinstall
• Agent Install
• Alarm
• Alternate Master +
• Alternate Master —
• Audit
• Communication Established
• Communication Lost
• Discovery
• Heartbeat
• Insert Component
• Location Change
• Patch Info
• Quiesce End
• Quiesce Start
• Removal
• Remove Component
• State Change + (from offline to online)
• State Change — (from online to offline)
• Statistics
• Backup

Severity:
• Red—Critical (Error)
• Yellow—Alert (Warning)
• Down—System Down

Action:
• Y—This event is actionable and is sent to RSS/SRS
• N—This event is not actionable
CHAPTER 3
Troubleshooting the Fibre Channel Links

A1/B1 Fibre Channel (FC) Link

If a problem occurs with the A1/B1 FC link:
In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over, but a severe problem can cause a path to go offline.
FIGURE 3-1, FIGURE 3-2, and FIGURE 3-3 are examples of A1/B1 Fibre Channel Link Notification Events.
Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : Message
Key       : message:diag.xxxxx.xxx.com
EventType : LogEvent.driver.LOOP_OFFLINE
EventTime : 01/08/2002 14:34:45

Found 1 ’driver.LOOP_OFFLINE’ error(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746): info: Loop Offline
Jan 8 14:34:25 WWN: Received 2 ’Loop Offline’ message(s) [threshold is 1 in 5mins]
Last-Message: ’diag.xxxxx.xxx.com qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE’
FIGURE 3-1 Data Host Notification of Intermittent Problems
Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : Message
Key       : message:diag.xxxxx.xxx.com
EventType : LogEvent.driver.MPXIO_offline
EventTime : 01/08/2002 14:48:02

Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746):
Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053 (ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052 (ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address: 2b000060220041f9,0 is offline
FIGURE 3-2 Data Host Notification of Severe Link Error
Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : Switch
Key       : switch:100000c0dd0057bd
EventType : StateChangeEvent.X.port.6
EventTime : 01/08/2002 14:54:20

’port.6’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Unknown (status-state changed from ’Online’ to ’Admin’):
FIGURE 3-3 Storage Service Processor Notification
Note – An A1/B1 FC link error can cause a port in sw1a or sw1b to change state.

To Verify the Data Host

An error in the A1/B1 FC link can cause a path to go offline in the multipathing software.
CODE EXAMPLE 3-1 luxadm(1M) Display
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
 Status(Port A):   O.K.
 Status(Port B):   O.K.
 Vendor:           SUN
 Product ID:       SESS01
 WWN(Node):        2a000060220041f4
 WWN(Port A):      2b000060220041f4
 WWN(Port B):      2b000060220041f9
 Revision:         080C
 Serial Num:       Unsupported
 Unformatted capacity: 102400.000 MBytes
 Write Cache:      Enabled
 Read Cache:       Enabled
 Minimum prefetch: 0x0
 Maximum prefetch: 0x0
 Device Type:      Disk device
 Path(s):
 /dev/rdsk/c6t29000060220041F96257354230303052d0s2
 /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
  Controller       /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
   Device Address  2b000060220041f9,0
   Class           primary
   State           OFFLINE
  Controller       /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
   Device Address  2b000060220041f4,0
   Class           primary
   State           ONLINE
...
An error in the A1/B1 FC link can also cause a device to enter the “unusable” state in cfgadm. In this case, the output for luxadm -e port will show that a device that was “connected” changed to an “unconnected” state.
CODE EXAMPLE 3-2 cfgadm -al Display
...
# cfgadm -al
Ap_Id                 Type        Receptacle   Occupant      Condition
c0                    scsi-bus    connected    configured    unknown
c0::dsk/c0t0d0        disk        connected    configured    unknown
c0::dsk/c0t1d0        disk        connected    configured    unknown
c1                    scsi-bus    connected    configured    unknown
c1::dsk/c1t6d0        CD-ROM      connected    configured    unknown
c2                    fc-fabric   connected    configured    unknown
c2::210100e08b23fa25  unknown     connected    unconfigured  unknown
c2::2b000060220041f4  disk        connected    configured    unknown
c3                    fc-fabric   connected    configured    unknown
c3::2b000060220041f9  disk        connected    configured    unusable
c4                    fc-private  connected    unconfigured  unknown
c5                    fc          connected    unconfigured  unknown
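
A sketch of what luxadm -e port reports in this case (the device paths are taken from the examples above; the NOT CONNECTED state is illustrative of a broken A1/B1 link):

# luxadm -e port
Found path to 2 HBA ports
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl    CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl    NOT CONNECTED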

FRU Tests Available for A1/B1 FC Link Segment

HBA—qlctest(1M)
Available only if the Storage Automated Diagnostic Environment is installed on a data host
Causes the HBA to go “offline” and “online” during tests
Switch—switchtest(1M)
Can be run while the link is still cabled and online (connected to the HBA)
You must specify a payload of 200 bytes or less when testing the A1/B1 FC link while the link is connected to the HBA (a limitation in the HBA ASIC)
Can be run only from the Storage Service Processor
The dev option to switchtest is in the following format: Port:IP-Address:FCAddress. The FCAddress can be set to 0x0.
CODE EXAMPLE 3-3 switchtest(1M) called with options
# ./switchtest -v -o "dev=2:192.168.0.30:0"
"switchtest: called with options: dev=2:192.168.0.30:0" "switchtest: Started." "Testing port: 2" "Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port." "Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK Fan 2: OK " 02/06/02 15:09:45 diag Storage Automated Diagnostic Environment MSGID 4001 switchtest.WARNING switch0: "Maximum transfer size for a FABRIC port is 200. Changing transfer size 2000 to 200" "Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e" "Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
Note – The Storage Automated Diagnostic Environment automatically resets the
transfer size if it notes that it is about to test a switch to HBA connection. This is done both in the Storage Automated Diagnostic Environment GUI and from the command-line interface (CLI).

To Isolate the A1/B1 FC Link

1. Quiesce the I/O on the A1/B1 FC link path.
2. Run switchtest or qlctest to test the entire link.
3. Break the connection by uncabling the link.
4. Insert a loopback connector into the switch port.
5. Rerun switchtest.
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If switchtest fails again, replace the switch.
6. Insert a loopback connector into the HBA.
7. Run qlctest.
If the test fails, replace the HBA.
If the test passes, replace the cable.
8. Recable the entire link.
9. Run switchtest or qlctest to validate the fix.
10. Return the path to production.

A2/B2 Fibre Channel (FC) Link

If a problem occurs with the A2/B2 FC link:
In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over, but a severe problem can cause a path to go offline.
FIGURE 3-4 and FIGURE 3-5 are examples of A2/B2 FC Link Notification Events.
From root Tue Jan 8 18:39:48 2002
Date: Tue, 8 Jan 2002 18:39:47 -0700 (MST)
Message-Id: <200201090139.g091dlg07015@diag.xxxxx.xxx.com>
From: Storage Automated Diagnostic Environment.Agent
Subject: Message from ’diag.xxxxx.xxx.com’ (2.0.B2.002)
Content-Length: 2742
You requested the following events be forwarded to you from ’diag.xxxxx.xxx.com’.

Site      : FSDE LAB Broomfield CO
Source    : diag226.xxxxx.xxx.com
Severity  : Normal
Category  : Message
Key       : message:diag.xxxxx.xxx.com
EventType : LogEvent.driver.Fabric_Warning
EventTime : 01/08/2002 17:34:47

Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746): Info: Fabric warning
Jan 8 17:34:36 WWN:2b000060220041f4 diag.xxxxx.xxx.com fp: [ID 517869 kern.warning] WARNING: fp(0): N_x Port with D_ID=108000, PWWN=2b000060220041f4 disappeared from fabric
<snip>
multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to target address: 2b000060220041f4,1 is offline
Jan 8 17:34:55 WWN:2b000060220041f4 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052 (ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to target address: 2b000060220041f4,0 is offline
FIGURE 3-4 A2/B2 FC Link Host Side Event
Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : Switch
Key       : switch:100000c0dd0061bb
EventType : StateChangeEvent.X.port.1
EventTime : 01/08/2002 17:38:32

’port.1’ in SWITCH diag-sw1b (ip=192.168.0.31) is now Unknown (status-state changed from ’Online’ to ’Admin’):

----------------------------------------------------------------

Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : San
Key       : switch:100000c0dd0061bb:1
EventType : LinkEvent.ITW.switch|ve
EventTime : 01/08/2002 17:39:47

ITW-ERROR (765 in 11 mins): Origin: port 1 on switch ’sw1b/192.168.0.31’. Destination: port 1 on ve ’diag-v1b/29000060220041f4’:
Info: An invalid transmission word (ITW) was detected between two components. This could indicate a potential problem.
Cause: Likely Causes are: GBIC, FC Cable and device optical connections.
Action: To isolate further please run the Storage Automated Diagnostic Environment tests associated with this link segment.
FIGURE 3-5 A2/B2 FC Link Storage Service Processor Side Event

To Verify the Host Side

An error in the A2/B2 FC link can result in a device being listed in an “unusable” state in cfgadm, although no HBAs are listed in an “unconnected” state in luxadm output. The multipathing software will note an OFFLINE path.
CODE EXAMPLE 3-4 cfgadm -al
# cfgadm -al
Ap_Id                 Type        Receptacle   Occupant      Condition
c0                    scsi-bus    connected    configured    unknown
<snip>

# luxadm -e port
Found path to 2 HBA ports
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl    CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl    CONNECTED

# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
 Status(Port A):   O.K.
 Status(Port B):   O.K.
 Vendor:           SUN
 Product ID:       SESS01
 WWN(Node):        2a000060220041f9
 WWN(Port A):      2b000060220041f9
 WWN(Port B):      2b000060220041f4
 Revision:         080C
 Serial Num:       Unsupported
 Unformatted capacity: 102400.000 MBytes
 Write Cache:      Enabled
 Read Cache:       Enabled
 Minimum prefetch: 0x0
 Maximum prefetch: 0x0
 Device Type:      Disk device
 Path(s):
 /dev/rdsk/c6t29000060220041F96257354230303052d0s2
 /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
  Controller       /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
   Device Address  2b000060220041f9,0
   Class           primary
   State           ONLINE
  Controller       /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
   Device Address  2b000060220041f4,0
   Class           primary
   State           OFFLINE
Note – You can find procedures for restoring virtualization engine settings in the Sun StorEdge 3900 and 6900 Series Reference Manual.

To Verify the A2/B2 FC Link

You can check the A2/B2 FC link using the Storage Automated Diagnostic Environment, Diagnose—Test from Topology functionality. The Storage Automated Diagnostic Environment’s implementation of diagnostic tests verifies the operation of user-selected components. Using the Topology view, you can select specific tests, subtests, and test options.
Refer to Chapter 5 of the Storage Automated Diagnostic Environment User’s Guide for more information.
FRU Tests Available for A2/B2 FC Link Segment
The linktest is not available.
The switch and/or GBIC—switchtest test:
Can be used only in conjunction with the loopback connector.
Cannot be cabled to the virtualization engine while switchtest runs.
No virtualization engine tests are available at this time.

To Isolate the A2/B2 FC Link

1. Quiesce the I/O on the A2/B2 FC link path.
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest:
a. If the test fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
5. If the switch or the GBIC show no errors, replace the remaining components in the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor the link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine settings, recable the link, and monitor the link for errors.
6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge 3900 and 6900 Series Reference Manual.

A3/B3 Fibre Channel (FC) Link

If a problem occurs with the A3/B3 FC link:
In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over, but a severe problem can cause a path to go offline.
FIGURE 3-6, FIGURE 3-7, and FIGURE 3-8 are examples of A3/B3 FC Link Notification Events.
Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : Message
Key       : message:diag.xxxxx.xxx.com
EventType : LogEvent.driver.MPXIO_offline
EventTime : 01/08/2002 18:25:18

Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746):
Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053 (ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052 (ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address: 2b000060220041f9,0 is offline

----------------------------------------------------------------

Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : Message
Key       : message:diag.xxxxx.xxx.com
EventType : LogEvent.driver.Fabric_Warning
EventTime : 01/08/2002 18:25:18

Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746): Info: Fabric warning
Jan 8 18:24:04 WWN:2b000060220041f9 diag.xxxxx.xxx.com fp: [ID 517869 kern.warning] WARNING: fp(1): N_x Port with D_ID=104000, PWWN=2b000060220041f9 disappeared from fabric
FIGURE 3-6 A3/B3 FC Link Host-Side Event
Site      : FSDE LAB Broomfield CO
Source    : diag.xxxxx.xxx.com
Severity  : Normal
Category  : Switch
Key       : switch:100000c0dd0057bd
EventType : StateChangeEvent.M.port.1
EventTime : 01/08/2002 18:28:38

’port.1’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Not-Available (status-state changed from ’Online’ to ’Offline’):
Info: A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-7 A3/B3 FC Link Storage Service Processor-Side Event
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd00cbfe
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:40

’port.1’ in SWITCH diag-sw2a (ip=192.168.0.32) is now Not-Available
(status-state changed from ’Online’ to ’Offline’)
Info: A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-8 A3/B3 FC Link Storage Service Processor-Side Event

To Verify the Host Side

An error in the A3/B3 FC link results in a device being listed in an “unusable” state in cfgadm output, but no HBAs are listed in the “unconnected” state in luxadm output. The multipathing software notes an “offline” path.
CODE EXAMPLE 3-5 Devices in the “connected” state
# cfgadm -al
Ap_Id                   Type        Receptacle  Occupant      Condition
c0                      scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0          disk        connected   configured    unknown
c0::dsk/c0t1d0          disk        connected   configured    unknown
c1                      scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0          CD-ROM      connected   configured    unknown
c2                      fc-fabric   connected   configured    unknown
c2::210100e08b23fa25    unknown     connected   unconfigured  unknown
c2::2b000060220041f4    disk        connected   configured    unknown
c3                      fc-fabric   connected   configured    unknown
c3::2b000060220041f9    disk        connected   configured    unusable
c3::210100e08b230926    unknown     connected   unconfigured  unknown
c4                      fc-private  connected   unconfigured  unknown
c5                      fc          connected   unconfigured  unknown
# luxadm -e port
Found path to 2 HBA ports
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl    CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl    CONNECTED
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
<snip>
/devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
  Controller       /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
  Device Address   2b000060220041f9,0
  Class            primary
  State            OFFLINE
  Controller       /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
  Device Address   2b000060220041f4,0
  Class            primary
  State            ONLINE
CODE EXAMPLE 3-6 VxDMP Error Message
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 619769 kern.notice] NOTICE: vxdmp: Path failure on 118/0x1f8
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 997040 kern.notice] NOTICE: vxvm:vxdmp: disabled path 118/0x1f8 belonging to the dmpnode 231/0xd0

To Verify the Storage Service Processor

You can check the A3/B3 FC link using the Storage Automated Diagnostic Environment Diagnose–Test from Topology functionality. The Storage Automated Diagnostic Environment’s diagnostic tests verify the operation of user-selected components. Using the Topology view, you can select specific tests, subtests, and test options.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more information.

FRU Tests Available for the A3/B3 FC Link Segment

The Linktest is not available.
The switch and/or GBIC - switchtest:
Can be used only in conjunction with the loopback connector.
The switch port cannot be cabled to the virtualization engine while switchtest runs.
No virtualization engine tests are available at this time.

To Isolate the A3/B3 FC Link

1. Quiesce the I/O on the A3/B3 FC link path.
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest:
a. If the test fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
5. If the switch or the GBIC shows no errors, replace the remaining components in the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor the link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine settings, recable the link, and monitor the link for errors.
6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge 3900 and 6900 Series Reference Manual.

A4/B4 Fibre Channel (FC) Link

If a problem occurs with the A4/B4 FC link:
In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over, but a severe problem can cause a path to go offline.
FIGURE 3-9 and FIGURE 3-10 are examples of A4/B4 FC link Notification Events.
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/29/2002 14:28:06
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=80e4aa60):
<snip>
----------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/29/2002 14:28:06
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=80e4aa60): INFORMATION: Fabric warning
<snip>
status of hba /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0:devctl on
diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
INFORMATION: monitors changes in the output of luxadm -e port
Found path to 20 HBA ports
/devices/sbus@2,0/SUNW,socal@d,10000:0    NOT CONNECTED
FIGURE 3-9 A4/B4 FC Link Data Host Notification
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : Switch
DeviceId : switch:100000c0dd0061bb
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:25:05
Change in Port Statistics on switch diag-sw1b (ip=192.168.0.31):
Port-1: Received 16289 ’InvalidTxWds’ in 0 mins (value=365972 )
----------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : T3message
DeviceId : t3message:83060c0c
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:25:06
Warning(s) found in logfile: /var/adm/messages.t3 on diag (id=83060c0c):
Jan 29 14:12:58 t3b0 ISR1[2]: W: u2ctr ISP2100[2] Received LOOP DOWN async event
Jan 29 14:13:32 t3b0 MNXT[1]: W: u1ctr starting lun 1 failover
---------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : T3message
DeviceId : t3message:83060c0c
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:11:14
Warning(s) found in logfile: /var/adm/messages.t3 on diag (id=83060c0c):
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d4 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d5 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d6 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d7 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d8 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d9 SVD_PATH_FAILOVER: path_id = 0
FIGURE 3-10 Storage Service Processor Notification

To Verify the Data Host

A problem in the A4/B4 FC link appears differently on the data host, depending on whether the system is a Sun StorEdge 3900 series or a Sun StorEdge 6900 series device.
Sun StorEdge 3900 Series
In a Sun StorEdge 3900 series device, the data host multipathing software is responsible for initiating the failover and reports it in /var/adm/messages; these are the messages reported by the Storage Automated Diagnostic Environment email notifications.
The luxadm failover command is used to fail the Sun StorEdge T3+ array LUNs back to the proper configuration after the failing FRU is replaced. This command is issued from the data host.
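For example, a minimal sketch of failing LUNs back over the primary path; the subcommand follows the luxadm(1M) failover syntax, and the device path below is borrowed from CODE EXAMPLE 3-7 for illustration (yours will differ):

# luxadm failover primary /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2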
Sun StorEdge 6900 Series
In a Sun StorEdge 6900 series device, the virtualization engine pairs handle the failover, and the failover is not noted on the data host. All paths remain ONLINE and ACTIVE.
The mpdrive failback command is used instead; it is issued from the Storage Service Processor.
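For example, using the syntax shown in the failback procedure in Chapter 7; the virtualization engine pair name and controller serial number below are taken from the Chapter 7 example and will differ on your system:

# /opt/svengine/sduc/mpdrive failback -d v1 -j 60020F2000006DFA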
Note – In the event of a complete sw1b or sw2b failure in a Sun StorEdge 6900
series configuration, the virtualization engine pairs handle the failover. In addition, the multipathing software notes a path failure on the data host, Sun StorEdge Traffic Manager or VxDMP takes the entire path that was connected to the failed switch offline, and the ISL ports on the surviving switch go offline as well.
To verify the failover, use luxadm display; the failed path is marked OFFLINE, as shown in CODE EXAMPLE 3-7.
CODE EXAMPLE 3-7 Failed Path marked OFFLINE
# luxadm display /dev/rdsk/c26t60020F200000644>
DEVICE PROPERTIES for disk: /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
  Status(Port A):        O.K.
  Status(Port B):        O.K.
  Vendor:                SUN
  Product ID:            T300
  WWN(Node):             50020f2000006443
  WWN(Port A):           50020f2300006355
  WWN(Port B):           50020f2300006443
  Revision:              0118
  Serial Num:            Unsupported
  Unformatted capacity:  488642.000 MBytes
  Write Cache:           Enabled
  Read Cache:            Enabled
  Minimum prefetch:      0x0
  Maximum prefetch:      0x0
  Device Type:           Disk device
  Path(s):
  /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
  /devices/scsi_vhci/ssd@g60020f20000064433c3352a60003e82f:c,raw
   Controller           /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      50020f2300006355,1
    Class               primary
    State               OFFLINE
   Controller           /devices/pci@e,2000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      50020f2300006443,1
    Class               secondary
    State               ONLINE
Note – This type of error may also cause the device to show up “unusable” in cfgadm, as shown in CODE EXAMPLE 3-8.
CODE EXAMPLE 3-8 Failed Path marked “unusable”
# cfgadm -al
Ap_Id                   Type        Receptacle  Occupant      Condition
ac0:bank0               memory      connected   configured    ok
ac0:bank1               memory      empty       unconfigured  unknown
c1                      scsi-bus    connected   configured    unknown
c16                     scsi-bus    connected   unconfigured  unknown
c18                     scsi-bus    connected   unconfigured  unknown
c19                     scsi-bus    connected   unconfigured  unknown
c1::dsk/c1t6d0          CD-ROM      connected   configured    unknown
c20                     fc-private  connected   unconfigured  unknown
c21                     fc-fabric   connected   configured    unknown
c21::50020f2300006355   disk        connected   configured    unusable

FRU Tests Available for the A4/B4 FC Link Segment

The switchtest can be run only from the Storage Service Processor.
The linktest can isolate the switch and the GBIC on the switch. It cannot isolate the cable or the Sun StorEdge T3+ array controller.

To Isolate the A4/B4 FC Link

1. Quiesce the I/O on the A4/B4 FC link path.
2. Run linktest from the Storage Automated Diagnostic Environment GUI to isolate suspected failing components.
Alternatively, follow these steps:
1. Quiesce the I/O on the A4/B4 FC link path.
2. Run switchtest to test the entire link (re-create the problem).
3. Break the connection by uncabling the link.
4. Insert the loopback connector into the switch port.
5. Rerun switchtest:
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
6. If switchtest passes, assume that the suspect components are the cable and the Sun StorEdge T3+ array controller.
a. Replace the cable.
b. Rerun switchtest.
7. If the test fails again, replace the Sun StorEdge T3+ array controller.
8. Return the path to production.
9. If a failover occurred, return the Sun StorEdge T3+ array LUNs to the correct controllers (use the luxadm failover or mpdrive failback commands).
CHAPTER 4
Configuration Settings
This chapter contains the following sections:
“Verifying Configuration Settings” on page 47
“To Clear the Lock File” on page 50
For a complete listing of SUNWsecfg Error Messages and recommended action, refer to Appendix B.

Verifying Configuration Settings

During the course of troubleshooting, you might need to verify configuration settings on the various components in the Sun StorEdge 3900 or 6900 series.
To Verify Configuration Settings
Run one of the following scripts:
Use the /opt/SUNWsecfg/runsecfg script and select the various Verify menu selections.
Run the /opt/SUNWsecfg/bin/checkdefaultconfig script to check all accessible components. The output is shown in CODE EXAMPLE 4-1.
Run the checkswitch, checkt3config, checkve, or checkvemap scripts manually from /opt/SUNWsecfg/bin.
The scripts listed above check the default configuration files in the /opt/SUNWsecfg/etc directory and compare the current, live settings to the defaults. Any differences are marked with a FAIL.
Note – For cluster configurations and systems that are attached to Windows NT, the
default configurations may not match the current installed configuration. Be aware of this when running the verification scripts. Certain items may be flagged as FAIL in these special circumstances.
CODE EXAMPLE 4-1 /opt/SUNWsecfg/checkdefaultconfig output
# /opt/SUNWsecfg/checkdefaultconfig
Checking all accessible components.....
Checking switch: sw1a
Switch sw1a - PASSED
Checking switch: sw1b
Switch sw1b - PASSED
Checking switch: sw2a
Switch sw2a - PASSED
Checking switch: sw2b
Switch sw2b - PASSED
Please enter the Sun StorEdge T3+ array password :
Checking T3+: t3b0
Checking : t3b0 Configuration.......
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list : FAIL <-- Failure Noted
Checking T3+: t3b2
Checking : t3b2 Configuration.......
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list : PASS
<snip>
Checking Virtualization Engine Pair Parameters: v1a v1a configuration check passed
Checking Virtualization Engine Pair Parameters: v1b v1b configuration check passed
Checking Virtualization Engine Pair Configuration: v1 checkvemap: virtualization engine map v1 verification complete: PASS.
If anything is marked FAIL, check the /var/adm/log/SEcfglog file for the details of the failure. For example:
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : -----------SAVED CONFIGURATION--------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize : 256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : -----------CURRENT CONFIGURATION------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize : 256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
In this example, the mirror setting in the Sun StorEdge T3+ array system settings is “off.” The SAVED CONFIGURATION setting for this parameter, which is the default setting, should be “auto.”
Fix the FAIL condition, and then verify the settings again:
# /opt/SUNWsecfg/bin/checkt3config -n t3b0
Checking : t3b0 Configuration.......
Checking command ver : PASS
Checking command vol stat : PASS
Checking command port list : PASS
Checking command port listmap : PASS
Checking command sys list : PASS
If you interrupt any of the SUNWsecfg scripts (by typing Control-C, for example), a lock file might remain in the /opt/SUNWsecfg/etc directory, causing subsequent commands to fail. Use the following procedure to clear the lock file.
To Clear the Lock File
1. Type the following command:
# /opt/SUNWsecfg/bin/removelocks
usage : removelocks [-t|-s|-v]
where:
-t - remove all T3+ related lock files.
-s - remove all switch related lock files.
-v - remove all virtualization engine related lock files.
# /opt/SUNWsecfg/bin/removelocks -v
Note – After any virtualization engine configuration change, the script saves a new
copy of the virtualization engine map. This may take a minimum of two minutes, during which time no additional virtualization engine changes are accepted.
2. Monitor the /var/adm/log/SEcfglog file to see when the savevemap process successfully exits.
CODE EXAMPLE 4-2 savevemap output
Tue Jan 29 16:12:34 MST 2002 savevemap: v1 ENTER.
Tue Jan 29 16:12:34 MST 2002 checkslicd: v1 ENTER.
Tue Jan 29 16:12:42 MST 2002 checkslicd: v1 EXIT.
Tue Jan 29 16:14:01 MST 2002 savevemap: v1 EXIT.
When savevemap: <ve-pair> EXIT is displayed, the savevemap process has successfully exited.
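As a convenience, you can watch for the EXIT message as it is written. The following is a minimal sketch using standard Solaris tools; press Control-C once the line appears:

# tail -f /var/adm/log/SEcfglog | grep savevemap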
CHAPTER 5
Troubleshooting Host Devices
This chapter describes how to troubleshoot components associated with a Sun StorEdge 3900 or 6900 series Host.
This chapter contains the following sections:
“Using the Host Event Grid” on page 53
“To Replace the Master Host” on page 57
“To Replace the Alternate Master or Slave Monitoring Host” on page 58

Host Event Grid

The Storage Automated Diagnostic Environment Event Grid enables you to sort host events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, whether action is required, a description of the event, and the recommended action. Refer to the Storage Automated Diagnostic Environment User’s Guide for more information.

Using the Host Event Grid

1. From the Storage Automated Diagnostic Environment Help menu, click the Event Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in FIGURE 5-1.
FIGURE 5-1 Host Event Grid
TABLE 5-1 lists all the host events in the Storage Automated Diagnostic Environment.
TABLE 5-1 Storage Automated Diagnostic Environment Event Grid for the Host

Category: host  Component: hba  EventType: Alarm+  Sev: Yellow
Description: status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com changed from NOT CONNECTED to CONNECTED
Information: Monitors changes in the output of luxadm -e port.

Category: host  Component: hba  EventType: Alarm-  Sev: Red  Action: Y
Description: status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
Information: Monitors changes in the output of luxadm -e port. Found path to 20 HBA ports.

Category: host  Component: lun.t300  EventType: Alarm-  Sev: Red  Action: Y
Description: The state of lun.T300.c14t50020F2300003EE5d0s2.statusA on diag.xxxxx.xxx.com changed from OK to ERROR (target=t3:diag244-t3b0/90.0.0.40)
Information: luxadm display reported a change in the port status of one of its paths. The Storage Automated Diagnostic Environment then tries to find to which enclosure this path corresponds by reviewing its database of Sun StorEdge T3+ arrays and virtualization engines.

Category: host  Component: lun.VE  EventType: Alarm-  Sev: Red  Action: Y
Description: The state of lun.VE.c14t50020F2300003EE5d0s2.statusA on diag.xxxxx.xxx.com changed from OK to ERROR (target=ve:diag244-ve0/90.0.0.40)
Information: luxadm display reported a change in the port status of one of its paths. The Storage Automated Diagnostic Environment then tries to find to which enclosure this path corresponds by reviewing its database of Sun StorEdge T3+ arrays and virtualization engines.

Category: host  Component: ifptest  EventType: DiagnosticTest-  Sev: Red  Action: Y
Description: ifptest (diag240) on host failed.

Category: host  Component: qlctest  EventType: DiagnosticTest-  Sev: Red
Description: qlctest (diag240) on host failed.

Category: host  Component: socaltest  EventType: DiagnosticTest-  Sev: Red
Description: socaltest (diag240) on host failed.

Category: host  Component: enclosure  EventType: PatchInfo
Description: New patch and package information generated.
Information: Sends changes to the output of showrev -p and pkginfo -l.

Category: host  Component: enclosure  EventType: backup
Description: Agent Backup
Information: Backup of the configuration file of the agent.
Replacing the Master, Alternate Master, and Slave Monitoring Host
The following procedures are a high-level overview of the steps detailed in the Storage Automated Diagnostic Environment User’s Guide. Follow them when replacing a master, alternate master, or slave monitoring host.
Note – The procedures for replacing the master host are different from the
procedures for replacing an alternate master or slave monitoring host.

To Replace the Master Host

Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for detailed instructions for the next four steps.
1. Install the SUNWstade package on a new Master Host.
2. Run /opt/SUNWstade/bin/ras_install on the new Master Host.
3. Configure the Host as the Master Host.
4. Connect to the Master Server’s GUI at http://<servername>:7654.
5. Choose Utilities -> System -> Recover Config.
Refer to Chapter 7 of the Storage Automated Diagnostic Environment User’s Guide for detailed instructions.
a. In the Recover Config window, enter the IP address of any alternate master or slave monitoring host (all hosts keep a copy of the configuration).
b. Make sure the Recover Config and Reset slave to this master checkboxes are checked.
c. Click Recover.
6. Choose Maintenance -> General Maintenance.
Ensure that all host and device settings are recovered correctly. Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide for detailed instructions.
7. Choose Maintenance -> General Maintenance -> Start/Stop Agent to start the agent on the master host.
To Replace the Alternate Master or Slave Monitoring Host
1. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic Environment User’s Guide.
2. In the Maintain Hosts window, select the host to be replaced from the Existing Hosts list, and click Delete.
3. Install the new host.
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for detailed instructions for the next four steps.
4. Install the SUNWstade package on the new host.
5. Run /opt/SUNWstade/bin/ras_install.
6. Configure the host as a slave.
7. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic User’s Guide for detailed instructions.
8. In the Maintain Hosts window, select the new host.
9. Configure the options as needed.
10. Choose Maintenance -> Topology Maintenance -> Topology Snapshot.
a. In the Topology Snapshot window, select the new host.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master, alternate master, or slave monitoring host is replaced, you must recover the configuration using the procedures described above. This is especially important when the Storage Service Processor is replaced as a FRU, whether the Storage Service Processor is the master or the slave.
CHAPTER 6
Troubleshooting Sun StorEdge FC Switch-8 and Switch-16 Devices
This chapter describes how to troubleshoot the switch components associated with a Sun StorEdge 3900 or 6900 series system.
This chapter contains the following sections:
“Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description” on page 61
“Switch Event Grid” on page 62
“setupswitch Exit Values” on page 68
“Replacing the Master Midplane” on page 68
Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description
The Sun StorEdge network FC switch-8 and switch-16 switches provide cable consolidation and increased connectivity for the internal data interconnection infrastructure.
The switches are paired to provide redundancy. Two switches are used in each Sun StorEdge 3900 series, and four switches are used in each Sun StorEdge 6900 series. Each Sun StorEdge network FC switch-8 and switch-16 switch is connected by way of an Ethernet to the service network for management and service from the Storage Service Processor.
These switches can be monitored through the SANSurfer GUI, which is available on the Storage Service Processor. You configure and modify the switches using the Configuration Utilities. Do not configure or modify the switches using any method other than the SUNWsecfg tools.

To Diagnose and Troubleshoot Switch Hardware

1. To diagnose and troubleshoot the switch hardware, begin by running the SUNWsecfg checkswitch utility.
2. For detailed troubleshooting procedures, refer to the Sun StorEdge SAN Field Troubleshooting Guide, Release 3.0.
The Sun StorEdge SAN Field Troubleshooting Guide, Release 3.0 describes how to diagnose and troubleshoot the switch hardware. The scope of this document includes the Sun StorEdge network FC switch-8 and switch-16 switch and the interconnections (HBA, GBIC, cables) on either side of the switch. In addition, the document provides examples of fault isolation and includes a Brocade switch appendix.
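For step 1, a minimal sketch of invoking the utility from the Storage Service Processor follows. The -s sw1a argument is an assumption for illustration only; run the script with no arguments, or use the runsecfg menus, to confirm the options it accepts:

# /opt/SUNWsecfg/bin/checkswitch -s sw1a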

Switch Event Grid

The Storage Automated Diagnostic Environment Event Grid enables you to sort switch events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, whether action is required, a description of the event, and the recommended action. Refer to the Storage Automated Diagnostic Environment User’s Guide for more information.
Using the Switch Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, click the Event Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in FIGURE 6-1.
FIGURE 6-1 Switch Event Grid
TABLE 6-1 lists the switch events.
TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for Switches

Cat: switch  Component: port statistics  EventType: Log  Sev: Yellow  Action: Y
Description: Change in port statistics on switch diag156-sw1b (ip=192.168.0.31)
Information: The switch has reported a change in an error counter. This could indicate a failing component in the link.
Action: Check the Topology GUI for any link errors. Run linktest on the link to isolate the failing FRU. Quiesce I/O on the link before running linktest.

Cat: switch  Component: chassis.fan  EventType: Alarm  Sev: Yellow
Description: chassis.fan.1 status changed from OK

Cat: switch  Component: chassis.power  EventType: Alarm  Sev: Yellow
Description: chassis.power.1 status changed from OK
Information: This event monitors changes in the status of the chassis’ power supply, as reported by SANbox chassis_status.

Cat: switch  Component: chassis.temp  EventType: Alarm  Sev: Yellow
Description: chassis.temp.1 status changed from OK
Information: This event monitors changes in the status of the chassis’ temperature, as reported by SANbox chassis_status.

Cat: switch  Component: chassis.zone  EventType: Alarm  Sev: Yellow
Description: Switch sw1a was rezoned: [new zones ...]
Information: This event reports changes in the zoning of a switch.

Cat: switch  Component: enclosure  EventType: Audit
Description: Auditing a new switch called rasd2-swb1 (ip=xxx.0.0.41) 10002000007a609

Cat: switch  Component: oob  EventType: Comm_Established
Description: Communication regained with sw1a (ip=xxx.20.67.213)

Cat: switch  Component: oob  EventType: Comm_Lost  Sev: Down  Action: Yes
Description: Lost communication with sw1a (ip=xxx.20.67.213)
Information: Ethernet connectivity to the switch has been lost.
Recommended action:
1. Check Ethernet connectivity to the switch.
2. Verify that the switch is booted correctly with no POST errors.
3. Verify that the switch Test Mode is set for normal operations.
4. Verify the TCP/IP settings on the switch via Forced PROM Mode access.
5. Replace the switch, if needed.

Cat: switch  Component: switchtest  EventType: DiagnosticTest-  Sev: Red
Description: switchtest (diag240) on d2-swb1 (ip=xxx.0.0.41) 10002000007a609

Cat: switch  Component: enclosure  EventType: Discovery
Description: Discovered a new switch called rasd2-swb1 (ip=xxx.0.0.41) 10002000007a609
Information: Discovery events occur the very first time the agent probes a storage device. It creates a detailed description of the device monitored and sends it using any active notifier (NetConnect, Email).

Cat: switch  Component: enclosure  EventType: LocationChange
Description: Location of switch rasd2-swb0 (ip=xxx.0.0.40) was changed

Cat: switch  Component: port  EventType: StateChange+
Description: port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Available (status-state changed from OFFLINE to ONLINE)
Information: Port on switch is now available.

Cat: switch  Component: port  EventType: StateChange-  Sev: Red  Action: Y
Description: port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Not-Available (status-state changed from ONLINE to OFFLINE)
Information: A port on the switch has logged out of the Fabric and has gone offline.
Recommended action:
1. Verify cables, GBICs, and connections along the Fibre Channel path.
2. Check the Storage Automated Diagnostic Environment SAN Topology GUI to identify the failing segment of the data path.
3. Verify the correct FC switch configuration.

Cat: switch  Component: enclosure  EventType: Statistics
Description: Statistics about switch d2-swb1 (ip=xxx.0.0.41) 10002000007a609
Information: Port Statistics

Replacing the Master Midplane

Follow this procedure when replacing the master midplane in a Sun StorEdge network FC switch-8 or switch-16 switch or a Brocade Silkworm switch. This procedure is detailed in the Storage Automated Diagnostic Environment User’s Guide.
To Replace the Master Midplane
1. Choose Maintenance -> General Maintenance -> Maintain Devices.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide.
2. In the Maintain Devices window, delete the device that is to be replaced.
3. Choose Maintenance -> General Maintenance -> Discovery.
4. In the Device Discovery window, rediscover the device.
5. Choose Maintenance -> Topology Maintenance -> Topology Snapshot.
a. Select the host that monitors the replaced FRU.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master midplane is replaced, you must rediscover the device using the procedure described above. This is especially important when the Storage Service Processor is replaced as a FRU, whether the Storage Service Processor is the master or the slave.
CHAPTER 7
Troubleshooting Virtualization Engine Devices
This chapter describes how to troubleshoot the virtualization engine component of a Sun StorEdge 6900 series system.
This chapter contains the following sections:
“Virtualization Engine Description” on page 69
“Translating Host Device Names” on page 78
“Sun StorEdge 6900 Series Multipathing Example” on page 89
“Virtualization Engine Event Grid” on page 95

Virtualization Engine Description

The virtualization engine supports the multipathing functionality of the Sun StorEdge T3+ array. Each virtualization engine has physical access to all underlying Sun StorEdge T3+ arrays and controls access to half of the Sun StorEdge T3+ arrays. The virtualization engine has the ability to assume control of all arrays in the event of component failure. The configuration is maintained between virtualization engine pairs through redundant T Port connections by way of a pair of Sun StorEdge network FC switch-8 or switch-16 switches.

Virtualization Engine Diagnostics

The virtualization engine monitors the following components:
Virtualization engine router
Sun StorEdge T3+ array
Cabling between the router and the storage

Service Request Numbers

The service request numbers are used to inform the user of storage subsystem activities.

Service and Diagnostic Codes

The virtualization engine’s service and diagnostic codes inform the user of subsystem activities. The codes are presented as an LED readout. See Appendix A for the table of codes and the actions to take. In some cases, you might not be able to receive Service Request Numbers (SRNs) because of communication errors. If this occurs, you must read the virtualization engine LEDs to determine the problem.

To Retrieve Service Information

You can retrieve service information in two ways:
CLI Interface
Error Log Analysis Commands
Both of these methods are described in the following sections.
CLI Interface
The SLIC daemon, which runs on the Storage Service Processor, communicates with the virtualization engine. The SLIC daemon periodically polls the virtualization engine for all subsystem errors and for topology changes. It then passes this information in the form of an SRN to the Error Log file.
To Display Log Files and Retrieve SRNs
Use the /opt/svengine/sduc/sreadlog command to display log files and retrieve the Service Request Numbers (SRN) for errors that need action. Data is returned in the following format:
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
Item        Description
TimeStamp   Time and date when the error occurred
nnn         The name of the virtualization engine pair (v1 or v2)
Txxxxx      The LUN where the error occurred.
            Note: Txxxxx can represent a physical or a logical LUN.
uuuuuuuu    The unique ID of the drive or the virtualization engine router
SRN=mmmmm   The SRN, defined in numerical order
Example
# /opt/svengine/sduc/sreadlog -d v1
2002:Jan:3:10:13:05:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:13:31:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:17:10:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:17:37:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:22:26:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:25:54:v1.29000060-220041F9.SRN=70030
Item        Description
TimeStamp   January 3, 2002 10:13
nnn         v1 (virtualization engine pair v1)
uuuuuuuu    29000060-220041F9 (v1a, obtained by checking the virtualization engine map from the SEcfg utility)
SRN=mmmmm   SRN=70030: SAN Configuration Changed (refer to Appendix A for codes)
To Clear the Log
Use the /opt/svengine/sduc/sclrlog command.
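For example, a minimal sketch; the -d v1 argument mirrors the sreadlog and svstat examples in this chapter and is an assumption, v1 being the virtualization engine pair:

# /opt/svengine/sduc/sclrlog -d v1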

Virtualization Engine LEDs

TABLE 7-1 describes the LEDs on the back of the virtualization engine.

TABLE 7-1 Virtualization Engine LEDs

LED         Color   State     Description
Power       Green   Solid on  The virtualization engine is powered on
Status (1)  Green   Solid on  Normal operating mode
                    Blink     Service code; the number of blinks indicates a decimal number
Fault       Amber   Solid on  Serious problem. Decipher the blinking of the Status LED to determine the service code, then look up the decimal number of the service code in Appendix A.

(1) The Status LED blinks a service code when the Fault LED is solid on.

Power LED Codes

The virtualization engine LEDs are shown in FIGURE 7-1.
FIGURE 7-1 Virtualization Engine Front Panel LEDs

Interpreting LED Service and Diagnostic Codes

The Status LED communicates the status of the virtualization engine in decimal numbers. Each decimal number is represented by a number of blinks, followed by a medium duration (two seconds) of LED off. TABLE 7-2 lists the status LED code descriptions.

TABLE 7-2 LED Service and Diagnostic Codes

Code  LED Behavior
0     Fast blink
1     LED blinks once
2     LED blinks twice with one short duration (one second) between blinks
3     LED blinks three times with one short duration (one second) between blinks
...
10    LED blinks ten times with one short duration (one second) between blinks
The blink code repeats continuously, with a four-second off interval between code sequences.

Back Panel Features

The back panel of the virtualization engine contains a socket for the AC power input and various data ports and LEDs, including the Fibre Channel ports that connect to the Sun StorEdge network FC switch-8 or switch-16 switches.

Ethernet Port LEDs

The Ethernet port LEDs indicate the speed, activity, and validity of the link, as shown in TABLE 7-3.

TABLE 7-3 Speed, Activity, and Validity of the Link

LED            Color  State     Description
Speed          Amber  Solid on  The link is 100Base-TX
                      Off       The link is 10Base-T
Link Activity  Green  Solid on  A valid link is established
                      Blink     Normal operations, including data activity

Fibre Channel Link Error Status Report

The virtualization engine’s host-side and device-side interfaces provide statistical data for the counts listed in TABLE 7-4.

TABLE 7-4 Virtualization Engine Statistical Data

Count Type                         Description
Link Failure Count                 The number of times the virtualization engine’s frame manager detects a non-operational state or other failure of N_Port initialization protocol.
Loss of Synchronization Count      The number of times that the virtualization engine detects a loss in synchronization.
Loss of Signal Count               The number of times that the virtualization engine’s frame manager detects a loss of signal.
Primitive Sequence Protocol Error  The number of times that the virtualization engine’s frame manager detects N_Port protocol errors.
Invalid Transmission Word          The number of times that the virtualization engine’s 8b/10b decoder does not detect a valid 10-bit code.
Invalid CRC Count                  The number of times that the virtualization engine receives frames with a bad CRC and a valid EOF. A valid EOF includes EOFn, EOFt, or EOFdti.
To Check Fibre Channel Link Error Status Manually
The Storage Automated Diagnostic Environment, which runs on the Storage Service Processor, monitors the Fibre Channel link status of the virtualization engine. The virtualization engine must be power-cycled to reset the counters. Therefore, you should manually check the accumulation of errors between a fixed period of time. To check the status manually, follow these steps:
1. Use the svstat command to take a reading, as shown in CODE EXAMPLE 7-1.
A status report for the host-side and device-side ports is displayed.
2. Within the next few minutes, take another reading.
The number of new errors that occurred within that time frame represents the number of link errors.
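One convenient way to compare the two readings is to capture each to a file and diff them. This is a minimal sketch built on the svstat command shown in CODE EXAMPLE 7-1; the five-minute interval is arbitrary:

# /opt/svengine/sduc/svstat -d v1 > /tmp/svstat.before
# sleep 300
# /opt/svengine/sduc/svstat -d v1 > /tmp/svstat.after
# diff /tmp/svstat.before /tmp/svstat.after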
Note – If t3ofdg(1M) is running while you perform these steps, the following error message is displayed:
Daemon error: check the SLIC router.
CODE EXAMPLE 7-1 Fibre Channel Link Error Status Example
# /opt/svengine/sduc/svstat -d v1
I00001 Host Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 8
Invalid CRC Count 0

I00001 Device Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 139
Invalid CRC Count 0

I00002 Host Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 11
Invalid CRC Count 0

I00002 Device Side FC Vital Statistics:
Link Failure Count 0
Loss of Sync Count 0
Loss of Signal Count 0
Protocol Error Count 0
Invalid Word Count 135
Invalid CRC Count 0
diag.xxxxx.xxx.com: root#
Note – v1 represents the first virtualization engine pair
Note – The SLIC daemon must be running for the /opt/svengine/sduc/svstat -d v1 command to work.
Translating Host Device Names
You can translate host device names to VLUN, disk pool, and physical Sun StorEdge T3+ array LUNs.
The luxadm output for a host device, shown in CODE EXAMPLE 7-2, does not include the unique VLUN serial number that is needed to identify this LUN.
CODE EXAMPLE 7-2 luxadm Output for a Host Device
# luxadm display /dev/rdsk/c4t2B00006022004186d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c4t2B00006022004186d0s2
  Status(Port A):        O.K.
  Vendor:                SUN
  Product ID:            SESS01
  WWN(Node):             2a00006022004186
  WWN(Port A):           2b00006022004186
  Revision:              080E
  Serial Num:            Unsupported
  Unformatted capacity:  56320.000 MBytes
  Write Cache:           Enabled
  Read Cache:            Enabled
  Minimum prefetch:      0x0
  Maximum prefetch:      0x0
  Device Type:           Disk device
  Path(s):
  /dev/rdsk/c4t2B00006022004186d0s2
  /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0/ssd@w2b00006022004186,0:c,raw
To Display the VLUN Serial Number
Devices That Are Not Sun StorEdge Traffic Manager-Enabled
1. Use the format -e command.
2. Type the disk on which you are working at the format prompt.
3. Type inquiry at the scsi prompt.
4. Find the VLUN serial number in the Inquiry displayed list.
# format -e c4t2B00006022004186d0
format> scsi
...
scsi> inquiry
Inquiry:
00 00 03 12 2b 00 00 02 53 55 4e 20 20 20 20 20 ....+...SUN
53 45 53 53 30 31 20 20 20 20 20 20 20 20 20 20 SESS01
30 38 30 45 62 57 33 4b 30 30 31 48 30 30 30    080EbW3K001H000
Vendor:   SUN
Product:  SESS01
Revision: 080E
Removable media: no
Device type: 0
From this screen, note that the VLUN serial number is 62 57 33 4b 30 30 31 48, beginning with the 5th pair of numbers on the 3rd line, up to and including the 12th pair.
Sun StorEdge Traffic Manager-Enabled Devices
If the devices support the Sun StorEdge Traffic Manager software, you can use the following shortcut. Type:
# luxadm display /dev/rdsk/c6t29000060220041956257334B30303148d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041956257334B30303148d0s2
  Status(Port A):        O.K.
  Status(Port B):        O.K.
  Vendor:                SUN
  Product ID:            SESS01
  WWN(Node):             2a00006022004195
  WWN(Port A):           2b00006022004195
  WWN(Port B):           2b00006022004186
  Revision:              080E
  Serial Num:            Unsupported
  Unformatted capacity:  56320.000 MBytes
  Write Cache:           Enabled
  Read Cache:            Enabled
  Minimum prefetch:      0x0
  Maximum prefetch:      0x0
  Device Type:           Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041956257334B30303148d0s2
  /devices/scsi_vhci/ssd@g29000060220041956257334b30303148:c,raw
   Controller           /devices/pci@1f,4000/SUNW,qlc@4/fp@0,0
    Device Address      2b00006022004195,0
    Class               primary
    State               ONLINE
   Controller           /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      2b00006022004186,0
    Class               primary
    State               ONLINE
The c#t# portion of the /dev/rdsk path represents the Global Unique Identifier of the device. It is 32 characters long.
The first 16 characters correspond to the WWN of the master virtualization engine router.
The remaining 16 characters are the VLUN serial number.
Virtualization engine WWN = 2900006022004195
VLUN serial number = 6257334B30303148
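To split the identifier without counting characters by hand, you can use cut. A minimal sketch using standard Solaris shell tools, with the identifier from the example above:

# GUID=29000060220041956257334B30303148
# echo "VE WWN:      `echo $GUID | cut -c1-16`"
# echo "VLUN serial: `echo $GUID | cut -c17-32`"

This prints 2900006022004195 and 6257334B30303148, matching the values above.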
To View the Virtualization Engine Map
The virtualization engine map is stored on the Storage Service Processor.
1. To view the virtualization engine map, type:
# showvemap -n v1 -f
VIRTUAL LUN SUMMARY
Disk pool  VLUN Serial Number  MP Drive Target  VLUN Target  VLUN Name  Size GB  Slic Zones
---------------------------------------------------------------------------
t3b00      6257334B30303148    T49152           T16384       VDRV000    55.0
t3b00      6257334B30303149    T49152           T16385       VDRV001    55.0

***** DISK POOL SUMMARY
Disk pool  RAID  MP Drive Target  Size GB  Free Space GB  T3+ Active Path WWN  Number of VLUNs
-----------------------------------------------------------------------
t3b00      5     T49152           116.7    6.7            50020F2300006DFA     2
t3b01      5     T49153           116.7    116.7          50020F230000725B     0

***** MULTIPATH DRIVE SUMMARY
Disk pool  MP Drive Target  T3+ Active Path WWN  Controller Serial Number
--------------------------------------------------------
t3b00      T49152           50020F2300006DFA     60020F2000006DFA
t3b01      T49153           50020F230000725B     60020F2000006DFA

***** VIRTUALIZATION ENGINE SUMMARY
Initiator  UID               VE Host  Online  Revision  Number of SLIC Zones
--------------------------------------------------------------------------
I00001     2900006022004195  v1a      Yes     08.14     0
I00002     2900006022004186  v1b      Yes     08.14     0

***** ZONE SUMMARY
Zone Name  HBA WWN           Initiator  Online  Number of VLUNs
---------------------------------------------------------------------
Undefined  210000E08B033401  I00001     Yes     0
Undefined  210000E08B026C0F  I00002     Yes     0
Note – This example uses the virtualization engine map file, which could include
old information.
2. You can optionally establish a telnet connection to the virtualization engine and run the runsecfg utility to poll a live snapshot of the virtualization engine map.
Refer to “To Replace a Failed Virtualization Engine” on page 84 for telnet instructions.
Determining the virtualization engine pairs on the system .........
MAIN MENU - SUN StorEdge 6910 SYSTEM CONFIGURATION TOOL
1) T3+ Configuration Utility
2) Switch Configuration Utility
3) Virtualization Engine Configuration Utility
4) View Logs
5) View Errors
6) Exit
Select option above:> 3
VIRTUALIZATION ENGINE MAIN MENU
1) Manage VLUNs
2) Manage Virtualization Engine Zones
3) Manage Configuration Files
4) Manage Virtualization Engine Hosts
5) Help
6) Return
Select option above:> 3
MANAGE CONFIGURATION FILES MENU
1) Display Virtualization Engine Map
2) Save Virtualization Engine Map
3) Verify Virtualization Engine Map
4) Help
5) Return
Select configuration option above:> 1
Do you want to poll the live system (time consuming) or view the file [l|f]: l
From the virtualization engine map output, you can match the VLUN serial number to the VLUN name (VDRV000), the disk pool (t3b00) and the MP drive target (T49152). This information can also help you find the controller serial number (60020F2000006DFA), which you need to perform Sun StorEdge T3+ array LUN failback commands.
To Failback the Virtualization Engine
In the event of a Sun StorEdge T3+ array LUN failover, use the following procedure to fail the LUN back to its original controller.
1. From the Storage Service Processor, type:
# /opt/svengine/sduc/mpdrive failback -d v1 -j 60020F2000006DFA
where:
-d   Virtualization engine pair on which to run the command
-j   Controller serial number, which corresponds to the Sun StorEdge T3+ array WWN of the affected partner pair
The failback command is always performed on the controller serial number, regardless of which controller (the master or alternate master) currently owns the LUN. All VLUNs are affected by a failover and failback of the underlying physical LUN.
The controller serial number is the system WWN for the Sun StorEdge T3+ array. In the above example, the master Sun StorEdge T3+ array WWN is 50020F2300006DFA, and the number used in the failback command is 60020F2000006DFA.
2. Ensure that the SLIC daemon is running, using the command shown in CODE EXAMPLE 7-3. The SLIC daemon must be running for the mpdrive failback command to work.
If no SLIC processes are running, you can start them manually with the SUNWsecfg script:
# /opt/SUNWsecfg/bin/startslicd -n v1
CODE EXAMPLE 7-3 slicd Output Example
# ps -ef | grep slic
root 6299 6295 0 Jan 04 ? 0:00 ./slicd
root 6296 6295 0 Jan 04 ? 0:02 ./slicd
root 6295    1 0 Jan 04 ? 0:01 ./slicd
root 6357 6295 0 Jan 04 ? 0:00 ./slicd
root 6362 6295 0 Jan 04 ? 0:03 ./slicd
For detailed information about the SUNWsecfg scripts, refer to the Sun StorEdge 3900 and 6900 Series Reference Manual.
To Replace a Failed Virtualization Engine
1. Replace the old (failed) virtualization engine unit with a new unit.
2. Identify the MAC address of the new unit and replace the old MAC address with the new one in the /etc/ethers file:
8:0:20:7d:82:9e virtualization engine-name
3. Verify that RARP is running on the Storage Service Processor.
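A quick check that the RARP daemon is running (a sketch; on Solaris the daemon is in.rarpd):

# ps -ef | grep rarpd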
4. Disable the switch port:
# /opt/SUNWsecfg/flib/setveport -v VE-name -d
5. Power on the new unit.
6. Log in to the new unit, for example:
# telnet v1a
7. From the User Service Utility Menu, enter 9 to clear the SAN database.
8. Choose Quit to clear the SAN database.
9. Configure the new unit:
# setupve -n virtualization engine-name
10. Check the configuration:
# checkve -n virtualization engine-name