ADX, AnyIO, Brocade, Brocade Assurance, the B-wing symbol, DCX, Fabric OS, ICX, MLX, MyBrocade, OpenScript, VCS, VDX, and
Vyatta are registered trademarks, and HyperEdge, The Effortless Network, and The On-Demand Data Center are trademarks of
Brocade Communications Systems, Inc., in the United States and/or in other countries. Other brands, products, or service names
mentioned may be trademarks of their respective owners.
Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning
any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to
this document at any time, without notice, and assumes no responsibility for its use. This informational document describes
features that may not be currently available. Contact a Brocade sales office for information on feature and product availability.
Export of technical data contained in this document may require an export license from the United States government.
The authors and Brocade Communications Systems, Inc. shall have no liability or responsibility to any person or entity with
respect to any loss, cost, liability, or damages arising from the information contained in this book or the computer programs that
accompany it.
The product described by this document may contain “open source” software covered by the GNU General Public License or other
open source license agreements. To find out which open source software is included in Brocade products, view the licensing
terms applicable to the open source software, and obtain a copy of the programming source code, please visit
http://www.brocade.com/support/oscd.
Brocade Communications Systems, Incorporated
Corporate and Latin American Headquarters
Brocade Communications Systems, Inc.
130 Holger Way
San Jose, CA 95134
Tel: 1-408-333-8000
Fax: 1-408-333-8101
E-mail: info@brocade.com
European Headquarters
Brocade Communications Switzerland Sàrl
Centre Swissair
Tour B - 4ème étage
29, Route de l'Aéroport
Case Postale 105
CH-1215 Genève 15
Switzerland
Tel: +41 22 799 5640
Fax: +41 22 799 5641
E-mail: emea-info@brocade.com
Asia-Pacific Headquarters
Brocade Communications Systems China HK, Ltd.
No. 1 Guanghua Road
Chao Yang District
Units 2718 and 2818
Beijing 100020, China
Tel: +8610 6588 8888
Fax: +8610 6588 9999
E-mail: china-info@brocade.com
Asia-Pacific Headquarters
Brocade Communications Systems Co., Ltd. (Shenzhen WFOE)
Citic Plaza
No. 233 Tian He Road North
Unit 1308 – 13th Floor
Guangzhou, China
Tel: +8620 3891 2000
Fax: +8620 3891 2111
E-mail: china-info@brocade.com
Document History
Title: Fabric OS Troubleshooting and Diagnostics Guide (all editions)

Publication number   Date            Summary of changes
53-0000853-01        March 2008      First released edition.
53-1001187-01        November 2008   Added support for Virtual Fabrics, fcPing, pathInfo, and
                                     additional troubleshooting tips.
53-1001340-01        July 2009       Added support for checking physical connections, updated
                                     commands, removed obsolete information, and moved the FCIP
                                     and FICON chapters into their respective books.
53-1001769-01        March 2010      Added support for the Rolling Reboot Detection feature and
                                     the Superping tool; added enhancements for supportSave and
                                     spinFab; updated commands; transferred the iSCSI chapter
                                     into its respective book.
53-1002150-01        April 2011      Added Frame Viewer and Diagnostics port features.
53-1002150-02        June 2011       Updated the Diagnostics port feature.
53-1002751-01        December 2012   Updated for Fabric OS v7.1.0.
53-1002751-02        March 2013      Corrected errors and omissions in the guide.
53-1002930-01        July 2013       Updated for Fabric OS v7.2.0.
53-1003141-01        June 2014       Updated for Fabric OS v7.3.0.
Fabric OS Troubleshooting and Diagnostics Guide
53-1003141-01
About This Document
How this document is organized
• Chapter 1, “Introduction,” gives a brief overview of troubleshooting the Fabric OS, and provides
  procedures for gathering basic information from your switch and fabric to aid in
  troubleshooting.
• Chapter 2, “General Troubleshooting,” provides information on licensing, hardware, and syslog
  issues.
• Chapter 3, “Connectivity,” provides information and procedures on troubleshooting various link
  issues.
• Chapter 4, “Configuration,” provides troubleshooting information and procedures for
  configuration file issues.
• Chapter 5, “Firmware Download Errors,” provides procedures for troubleshooting firmware
  download issues.
• Chapter 6, “Security,” provides procedures for user account and security issues.
• Chapter 7, “Virtual Fabrics,” provides procedures for troubleshooting Virtual Fabrics.
• Chapter 8, “ISL Trunking,” provides procedures for resolving trunking issues.
• Chapter 9, “Zoning,” provides procedures for troubleshooting zoning issues.
• Chapter 10, “Diagnostic Features,” provides procedures for the use of the diagnostics
  commands for the chassis, ports, and other chassis equipment, as well as providing
  information on the system messages.
• Appendix A, “Switch Type and Blade ID,” provides reference information to guide you in
  understanding switch output.
• Appendix B, “Hexadecimal Conversion,” provides reference information for translating
  hexadecimal output.
Supported hardware and software
In those instances in which procedures or parts of procedures documented here apply to some
switches but not to others, this guide identifies which switches are supported and which are not.
Although many different software and hardware configurations are tested and supported by
Brocade Communications Systems, Inc. for Fabric OS v7.3.0, documenting all possible
configurations and scenarios is beyond the scope of this document.
The following hardware platforms are supported by this release of Fabric OS:
• Brocade 300 switch
• Brocade 5100 switch
• Brocade 5300 switch
• Brocade 5410 embedded switch
• Brocade 5424 embedded switch
• Brocade 5430 embedded switch
• Brocade 5431 embedded switch
• Brocade 5432 embedded switch
• Brocade 5450 embedded switch
• Brocade 5460 embedded switch
• Brocade 5470 embedded switch
• Brocade 5480 embedded switch
• Brocade M6505 embedded switch
• Brocade 6505 switch
• Brocade 6510 switch
• Brocade 6520 switch
• Brocade 6547 embedded switch
• Brocade 6548 embedded switch
• Brocade 7800 extension switch
• Brocade 7840 extension switch
• Brocade VA-40FC
• Brocade Encryption Switch
• Brocade DCX Backbone family:
  - Brocade DCX
  - Brocade DCX-4S
• Brocade DCX 8510 Backbone family:
  - Brocade DCX 8510-4
  - Brocade DCX 8510-8
What’s new in this document
The following information is added:
• Support for Brocade 7840 extension switch.
• Support for FC16-64 port blade.
• POST2 section is updated.
For further information about documentation updates for this release, refer to the release notes.
Document conventions
This section describes text formatting conventions and important notice formats used in this
document.
Text formatting
The narrative-text formatting conventions that are used are as follows:
bold text      Identifies command names
               Identifies the names of user-manipulated GUI elements
               Identifies keywords and operands
               Identifies text to enter at the GUI or CLI
italic text    Provides emphasis
               Identifies variables
               Identifies paths and Internet addresses
               Identifies document titles
code text      Identifies CLI output
               Identifies command syntax examples
Command syntax conventions
For readability, command names in the narrative portions of this guide are presented in mixed
lettercase: for example, switchShow. In actual examples, command lettercase is often all
lowercase. Otherwise, this manual specifically notes those cases in which a command is
case-sensitive. Command syntax in this manual follows these conventions:
command              Commands are printed in bold.
--option, option     Command options are printed in bold.
-argument, arg       Arguments.
[ ]                  Optional element.
variable             Variables are printed in italics. In the help pages, values are underlined
                     or enclosed in angled brackets < >.
...                  Repeat the previous element, for example “member[;member...]”
value                Fixed values following arguments are printed in plain font. For example,
                     --show WWN
|                    Boolean. Elements are exclusive. Example:
                     --show -mode egress | ingress
Command examples
This book describes how to perform configuration tasks using the Fabric OS command line
interface, but does not describe the commands in detail. For complete descriptions of all Fabric OS
commands, including syntax, operand descriptions, and sample output, refer to the Fabric OS Command Reference.
Notes, cautions, and warnings
The following notices and statements are used in this manual. They are listed below in order of
increasing severity of potential hazards.
NOTE
A note provides a tip, guidance, or advice, emphasizes important information, or provides a
reference to related information.
ATTENTION
An Attention statement indicates potential damage to hardware or data.
CAUTION
A Caution statement alerts you to situations that can be potentially hazardous to you or cause
damage to hardware, firmware, software, or data.
DANGER
A Danger statement indicates conditions or situations that can be potentially lethal or extremely
hazardous to you. Safety labels are also attached directly to products to warn of these conditions
or situations.
Key terms
For definitions specific to Brocade and Fibre Channel, refer to the Brocade Glossary.
For definitions of SAN-specific terms, visit the Storage Networking Industry Association online
dictionary at:
http://www.snia.org/education/dictionary
Additional information
This section lists additional Brocade and industry-specific documentation that you might find
helpful.
Brocade resources
To get up-to-the-minute information, go to http://my.brocade.com and register at no cost for a user
ID and password.
White papers, online demonstrations, and data sheets are available through the Brocade website
at:
For additional Brocade documentation, visit the Brocade website:
http://www.brocade.com
Release notes are available on the MyBrocade website and are also bundled with the Fabric OS
firmware.
Other industry resources
For additional resource information, visit the Technical Committee T11 website. This website
provides interface standards for high-performance and mass storage applications for Fibre
Channel, storage management, and other applications:
http://www.t11.org
For information about the Fibre Channel industry, visit the Fibre Channel Industry Association
website:
http://www.fibrechannel.org
Getting technical help
Contact your switch support supplier for hardware, firmware, and software support, including
product repairs and part ordering. To expedite your call, have the following information available:
1. General Information
• Switch model
• Switch operating system version
• Error numbers and messages received
• supportSave command output
• Detailed description of the problem, including the switch or fabric behavior immediately
  following the problem, and specific questions
• Description of any troubleshooting steps already performed and the results
• Serial console and Telnet session logs
• syslog message logs
2. Switch Serial Number
The switch serial number and corresponding bar code are provided on the serial number label,
as illustrated below.
[Figure: serial number label with bar code; example serial number FT00X0054E9]
The serial number label is located as follows:
• Brocade 5424: On the bottom of the switch module.
• Brocade 300, 5100, and 5300: On the switch ID pull-out tab located on the bottom of the
  port side of the switch.
• Brocade 6505, 6510, and 6520: On the switch ID pull-out tab located inside the chassis
  on the port side on the left.
• Brocade 7800: On the bottom of the chassis.
• Brocade DCX Backbone: On the bottom right on the port side of the chassis.
• Brocade DCX-4S Backbone: On the bottom right on the port side of the chassis.
• Brocade DCX 8510-4: On the nonport side of the chassis, on the left just below the left
  power supply.
• Brocade DCX 8510-8: On the bottom right on the port side of the chassis and directly
  above the cable management comb.
3. World Wide Name (WWN)
Use the licenseIdShow command to display the chassis WWN.
If you cannot use the licenseIdShow command because the switch is inoperable, you can get
the WWN from the same place as the serial number, except for the Brocade DCX. For the
Brocade DCX, access the numbers on the WWN cards by removing the Brocade logo plate at
the top of the nonport side of the chassis.
Document feedback
Quality is our first concern at Brocade and we have made every effort to ensure the accuracy and
completeness of this document. However, if you find an error or an omission, or you think that a
topic needs further development, we want to hear from you. Forward your feedback to:
documentation@brocade.com
Provide the title and version number of the document and as much detail as possible about your
comment, including the topic heading and page number and your suggestions for improvement.
Troubleshooting overview
This book is a companion guide to be used in conjunction with the Fabric OS Administrator’s Guide.
Although it provides a lot of common troubleshooting tips and techniques, it does not teach
troubleshooting methodology.
Troubleshooting should begin at the center of the SAN — the fabric. Because switches are located
between the hosts and storage devices and have visibility into both sides of the storage network,
starting with them can help narrow the search path. After eliminating the possibility of a fault within
the fabric, see if the problem is on the storage side or the host side, and continue a more detailed
diagnosis from there. Using this approach can quickly pinpoint and isolate problems.
For example, if a host cannot detect a storage device, run the switchShow command to determine if
the storage device is logically connected to the switch. If not, focus first on the switch directly
connecting to storage. Use your vendor-supplied storage diagnostic tools to better understand why
it is not visible to the switch. If the storage can be detected by the switch, and the host still cannot
detect the storage device, then there is still a problem between the host and the switch.
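The check described above can be partly scripted once switchShow output has been captured to a file. The following Python sketch assumes that each port appears on one line together with its state (for example, Online) and the WWN of the attached device. The sample text, the WWNs, and the function names are hypothetical, and real output varies by platform and Fabric OS version.

```python
import re

# Hypothetical excerpt of captured switchShow output (layout assumed for illustration).
SAMPLE_SWITCHSHOW = """\
Index Port Address Media Speed State     Proto
==============================================
  0   0   010000   id    N8   Online      FC  F-Port  10:00:00:05:1e:aa:bb:01
  1   1   010100   id    N8   No_Light    FC
  2   2   010200   id    N8   Online      FC  F-Port  50:06:01:60:3b:a0:11:22
"""

# A WWN is written as eight colon-separated hexadecimal bytes.
WWN_PATTERN = re.compile(r"(?:[0-9a-f]{2}:){7}[0-9a-f]{2}", re.IGNORECASE)


def online_wwns(switchshow_text):
    """Return the set of WWNs that appear on port lines marked Online."""
    found = set()
    for line in switchshow_text.splitlines():
        if "Online" not in line.split():
            continue
        for wwn in WWN_PATTERN.findall(line):
            found.add(wwn.lower())
    return found


def is_logged_in(switchshow_text, wwn):
    """Return True if the given device WWN is seen on an Online port."""
    return wwn.lower() in online_wwns(switchshow_text)
```

If is_logged_in returns False for the storage WWN, focus on the switch-to-storage link first; if it returns True and the host still cannot see the device, look between the host and the switch.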
Network Time Protocol
One of the most frustrating parts of troubleshooting is trying to synchronize a switch’s message
logs and portlogs with other switches in the fabric. If you do not have Network Time Protocol (NTP)
set up on your switches, then trying to synchronize log files to track a problem is more difficult.
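To illustrate why synchronized clocks matter, the following Python sketch merges message-log lines from several switches into a single timeline. It assumes that each line starts with a timestamp in the YYYY/MM/DD-HH:MM:SS form followed by a comma; the switch names, message IDs, and message text are invented for illustration. The resulting order is only trustworthy if the switch clocks agree, which is what NTP provides.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y/%m/%d-%H:%M:%S"  # assumed timestamp layout


def parse_timestamp(line):
    """Parse the leading timestamp of a message-log line."""
    stamp = line.split(",", 1)[0].strip()
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def merge_logs(logs_by_switch):
    """Merge log lines from several switches into one time-ordered list.

    logs_by_switch maps a switch name to a list of log lines.
    Returns (timestamp, switch_name, line) tuples sorted by timestamp.
    """
    merged = []
    for switch_name, lines in logs_by_switch.items():
        for line in lines:
            merged.append((parse_timestamp(line), switch_name, line))
    merged.sort(key=lambda entry: entry[0])
    return merged


# Hypothetical log lines from two switches.
logs = {
    "switchA": ["2014/06/02-10:04:42, [EXAMPLE-0001], link went down"],
    "switchB": ["2014/06/02-10:04:40, [EXAMPLE-0002], port fault"],
}
timeline = merge_logs(logs)
```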
Most common problem areas
Table 1 identifies the most common problem areas that arise within SANs and identifies tools to
use to resolve them.

TABLE 1 Common troubleshooting problems and tools

Problem area: Fabric
  Investigate:
  • Missing devices
  • Marginal links (unstable connections)
  • Incorrect zoning configurations
  • Incorrect switch configurations
  Tools:
  • Switch LEDs
  • Switch commands (for example, switchShow or nsAllShow) for diagnostics
  • Web or GUI-based monitoring and management software tools

Problem area: Storage Devices
  Investigate:
  • Physical issues between switch and devices
  • Incorrect storage software configurations
  Tools:
  • Device LEDs
  • Storage diagnostic tools
  • Switch commands (for example, switchShow or nsAllShow) for diagnostics

Problem area: Hosts
  Investigate:
  • Physical issues between switch and devices
  • Downgraded HBA firmware
  • Incorrect device driver installation
  • Incorrect device driver configuration
  Tools:
  • Device LEDs
  • Host operating system diagnostic tools
  • Device driver diagnostic tools
  • Switch commands (for example, switchShow or nsAllShow) for diagnostics
  Also, make sure you use the latest HBA firmware recommended by the switch supplier or on the
  HBA supplier's website.

Problem area: Storage Management Applications
  Investigate:
  • Incorrect installation and configuration of the storage devices that the software references.
    For example, if using a volume-management application, check for:
    - Incorrect volume installation
    - Incorrect volume configuration
  Tools:
  • Application-specific tools and resources
Questions for common symptoms
First determine what the problem is. Some symptoms are obvious, such as a switch that reboots
without any user intervention; others are more obscure, such as storage that has intermittent
connectivity to a particular host. Whatever the symptom is, you must gather information from
the devices that are directly involved in the symptom.
Table 2 lists common symptoms and possible areas to check. Note that an intermittent
connectivity problem has many variables to look into, such as the type of connection between the
two devices, how the connection is behaving, and the port type involved.
TABLE 2 Common symptoms

Symptom                                    Areas to check                          Chapter or Document
Blade is faulty                            Firmware or application download;
                                           hardware connections
Blade is stuck in the “LOADING” state      Firmware or application download        Chapter 5, “Firmware Download Errors”
Configuration upload or download fails     FTP or SCP server or USB availability   Chapter 4, “Configuration”
E_Port failed to come online               Correct licensing; fabric parameters;
                                           zoning
EX_Port does not form                      Links                                   Chapter 3, “Connectivity”
User is unable to change switch settings   RBAC settings; account settings         Chapter 6, “Security”
Virtual Fabric does not form               FIDs                                    Chapter 7, “Virtual Fabrics”
Zone configuration mismatch                Effective configuration                 Chapter 9, “Zoning”
Zone content mismatch                      Effective configuration                 Chapter 9, “Zoning”
Zone type mismatch                         Effective configuration                 Chapter 9, “Zoning”
Gathering information for your switch support provider
If you are troubleshooting a production system, you must gather data quickly. As soon as a problem
is observed, perform the following tasks. For more information about these commands and their
operands, refer to the Fabric OS Command Reference.
1. Enter the supportSave command to save RASlog, TRACE, supportShow, core file, FFDC data,
and other support information from the switch, chassis, blades, and logical switches.
2. Gather console output and logs.
To execute the supportSave command on the chassis, you must log in to the switch on an account
with the admin role that has the chassis role permission.
Setting up your switch for FTP
1. Connect to the switch and log in using an account with admin permissions.
2. Enter the supportFtp command and respond to the prompts.
Example of the supportFtp command
switch:admin> supportftp -s
Host IP Addr[1080::8:800:200C:417A]:
User Name[njoe]: userFoo
Password[********]: <hidden>
Remote Dir[support]:
supportftp: parameters changed
Refer to “Automatic trace dump transfers” on page 100 for more information on setting up for
automatic transfer of diagnostic files as part of standard switch configuration.
Using the supportSave command
The supportSave command uses the default switch name to replace the chassis name regardless
of whether the chassis name has been changed to a non-factory setting. If Virtual Fabrics is
enabled, the supportSave command uses the default switch name for each logical fabric.
1. Connect to the switch and log in using an account with admin permissions.
2. Enter the appropriate supportSave command based on your needs:
• If you are saving to an FTP or SCP server, use the following syntax:
  supportSave [-n] [-c]
  When invoked without operands, this command goes into interactive mode. The following
  operands are optional:
  -n    Does not prompt for confirmation. If omitted, you are prompted for confirmation.
  -c    Uses the FTP parameters saved by the supportFtp command. If omitted, specify the FTP
        parameters through command line options or interactively. To display the current FTP
        parameters, run supportFtp (on a dual-CP system, run supportFtp on the active CP).
• On platforms that support USB devices, you can use your Brocade USB device to save the
  support files. To use your USB device, use the following syntax:
  supportsave [-U -d remote_dir]
  -U    Saves support data to an attached USB device. When using this option, a target
        directory must be specified with the -d option.
  -d remote_dir    Specifies the remote directory to which the file is to be transferred. When
        saving to a USB device, the remote directory is created in the /support directory of
        the USB device by default.
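The operand rules above can be captured in a small helper that assembles the command line before it is pasted into a session. This is a minimal Python sketch that covers only the operands described in this section; the function name and parameter names are hypothetical, and restricting -d to the USB case is a simplification made for this sketch.

```python
def build_supportsave_command(no_prompt=False, use_saved_ftp=False,
                              usb=False, remote_dir=None):
    """Assemble a supportSave command line from the operands described above."""
    if usb and not remote_dir:
        # The text states that -U needs a target directory given with -d.
        raise ValueError("-U requires a target directory given with -d")
    if remote_dir and not usb:
        # Simplification for this sketch: -d is only handled together with -U.
        raise ValueError("this sketch only uses -d together with -U")
    parts = ["supportsave"]
    if no_prompt:
        parts.append("-n")
    if use_saved_ftp:
        parts.append("-c")
    if usb:
        parts.extend(["-U", "-d", remote_dir])
    return " ".join(parts)
```

For example, build_supportsave_command(no_prompt=True, use_saved_ftp=True) produces a command that reuses the parameters stored by supportFtp and skips the confirmation prompt.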
Changing the supportSave timeout value
While running the supportSave command, you may encounter a timeout. A timeout occurs if the
system is busy because it is CPU-bound or I/O-bound from heavy port traffic or file access. A
timeout can also occur on very large machine configurations or when the machine is under heavy
usage. If this occurs, an SS-1004 message is generated to both the console and the RASlog to
report the error. You must then rerun the supportSave command with the -t option.
Example of SS-1004 message:
SS-1004: “One or more modules timed out during supportsave. Please retry supportsave with -t
option to collect all logs.”
Use the following procedure when you observe that supportSave has timed out.
1. Connect to the switch and log in using an account with admin permissions.
2. Enter the supportSave command with the -t operand, and specify a value from 1 through 5.
The following example increases the supportSave module timeout to twice the original
timeout setting.
switch:admin> supportSave -t 2
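The retry decision can also be scripted against captured console or RASlog text. The following Python sketch looks for the SS-1004 message ID and, if it is present, returns the retry command with a timeout multiplier. The function name is hypothetical, and treating the -t value as a multiplier of the original timeout follows the example above.

```python
def retry_command_for_timeout(log_text, multiplier=2):
    """Return a supportSave retry command if the log reports SS-1004, else None."""
    if not 1 <= multiplier <= 5:
        # The procedure above allows values from 1 through 5 only.
        raise ValueError("the -t value must be from 1 through 5")
    if "SS-1004" not in log_text:
        return None
    return f"supportsave -t {multiplier}"
```

Starting with a small multiplier and increasing it only if SS-1004 appears again keeps the collection time short on lightly loaded systems.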
Capturing output from a console
Some information, such as boot information, is output only to the console. To capture
this information, you must connect directly to the switch through its management interface, either
a serial cable or an RJ-45 connector that is specifically used for Ethernet connection to the
management network.
1. Connect directly to the switch using a terminal emulation utility such as HyperTerminal.
2. Log in to the switch using an account with admin permissions.
3. Set the utility to capture output from the screen.
Some utilities require this step to be performed prior to opening up a session. Check with your
utility vendor for instructions.
4. Enter the command or start the process to capture the required data on the console.
Capturing command output
1. Connect to the switch through a Telnet or SSH utility.
2. Log in using an account with admin permissions.
3. Set the Telnet or SSH utility to capture output from the screen.
Some Telnet or SSH utilities require this step to be performed prior to opening up a session.
Check with your Telnet or SSH utility vendor for instructions.
4. Enter the command or start the process to capture the required data on the console.
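As an alternative to the capture feature of a terminal utility, command output can be collected by a script. The following Python sketch assumes that an OpenSSH-style ssh client is installed on the workstation and that the switch accepts a single command passed on the ssh command line; both are assumptions to verify in your environment. The function names, host address, and log file name are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ssh_argv(host, user, command):
    """Build the argument list for running one command over SSH."""
    return ["ssh", f"{user}@{host}", command]


def capture_command(host, user, command, log_path, runner=subprocess.run):
    """Run one command over SSH and append its output to a log file.

    The runner parameter exists so the function can be exercised without a
    real switch; by default it is subprocess.run.
    """
    result = runner(build_ssh_argv(host, user, command),
                    capture_output=True, text=True, check=False)
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(f"$ {command}\n{result.stdout}{result.stderr}")
    return result.returncode


# Example call (hypothetical address and file name):
# capture_command("192.0.2.10", "admin", "switchshow", "switch_capture.log")
```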
Building a case for your switch support provider
Print the questions listed in “Basic information,” answer them in their entirety, and have the
answers ready to send to your switch support provider when you contact them. Having this
information immediately available expedites the information-gathering process that is necessary to
begin determining the problem and finding a solution.
Basic information
1. What is the switch’s current Fabric OS level?
To determine the switch’s Fabric OS level, enter the firmwareShow command and write down
the information.
2. What is the switch model?
To determine the switch model, enter the switchShow command and write down the value in
the switchType field. Cross-reference this value with the chart located in Appendix A, “Switch
Type and Blade ID”.
3. Is the switch operational? Yes or no.
4. Impact assessment and urgency:
   • Is the switch down? Yes or no.
   • Is it a standalone switch? Yes or no.
   • Are there VE, VEX, or EX ports connected to the chassis? Yes or no.
     Use the switchShow command to determine the answer.
   • How large is the fabric?
     Use the nsAllShow command to determine the answer.
   • Do you have encryption blades or switches installed in the fabric? Yes or no.
   • Do you have Virtual Fabrics enabled in the fabric? Yes or no.
     Use the switchShow command to determine the answer.
   • Do you have IPsec installed on the switch’s Ethernet interface? Yes or no.
     Use the ipsecConfig --show command to determine the answer.
   • Do you have In-band Management installed on the switch’s Gigabit Ethernet ports? Yes or
     no.
     Use the portShow iproute geX command to determine the answer.
   • Are you using NPIV? Yes or no.
     Use the switchShow command to determine the answer.
   • Are there security policies turned on in the fabric? If so, what are they? Gather the output
     from the following commands:
     secPolicyShow
     fddCfg --showall
     ipFilter --show
     authUtil --show
     secAuthSecret --show
     fipsCfg --showall
5. Is the fabric redundant? If yes, what is the MPIO software? (List vendor and version.)
6. If you have a redundant fabric, did a failover occur? To verify, view the RASlogs on both CPs and
look for messages related to haFailover, for example, HAM-1004.
7. Was POST enabled on the switch? Use the diagPost command to verify whether POST is enabled.
8. Which CP blade was active? (Only applicable to Brocade DCX, DCX-4S, and DCX 8510 family
enterprise-class platforms.) Use the haShow command in conjunction with the RASlogs to
determine which is the active and standby CP. They will reverse roles in a failover and their
logs are separate.
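The questions above map naturally onto a checklist of commands to run before calling support. The following Python sketch records that mapping as data, using only the commands named in this section; the variable and function names are hypothetical, and geX is left as the placeholder used in the text.

```python
# Each entry pairs a question from the list above with the command the text names.
BASIC_INFO_COMMANDS = [
    ("Fabric OS level", "firmwareShow"),
    ("Switch model (switchType field)", "switchShow"),
    ("VE, VEX, or EX ports connected", "switchShow"),
    ("Fabric size", "nsAllShow"),
    ("Virtual Fabrics enabled", "switchShow"),
    ("IPsec on the Ethernet interface", "ipsecConfig --show"),
    ("In-band Management on Gigabit Ethernet ports", "portShow iproute geX"),
    ("NPIV in use", "switchShow"),
    ("Security policies", "secPolicyShow"),
    ("Security policies", "fddCfg --showall"),
    ("Security policies", "ipFilter --show"),
    ("Security policies", "authUtil --show"),
    ("Security policies", "secAuthSecret --show"),
    ("Security policies", "fipsCfg --showall"),
    ("POST enabled", "diagPost"),
    ("Active and standby CP", "haShow"),
]


def unique_commands(checklist):
    """Return each command once, in first-seen order."""
    seen = []
    for _question, command in checklist:
        if command not in seen:
            seen.append(command)
    return seen
```

Running each command from unique_commands(BASIC_INFO_COMMANDS) once and saving the output answers every question in the list without repeating switchShow.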
Detailed problem information
Obtain as much of the following information as possible prior to contacting the SAN technical
support vendor.
Document the sequence of events by answering the following questions:
• When did the problem occur?
• Is this a new installation?
• How long has the problem been occurring?
• Are specific devices affected?
  - If so, what are their World Wide Names?
• What happened prior to the problem?
• Is the problem reproducible?
  - If so, what are the steps to reproduce the problem?
• What configuration was in place when the problem occurred?
• A description of the problem with the switch or the fault with the fabric.
• The last actions or changes made to the system environment:
  - Settings
  - supportSave output
• Host information:
  - OS version and patch level
  - HBA type
  - HBA firmware version
  - HBA driver version
  - Configuration settings
• Storage information:
  - Disk/tape type
  - Disk/tape firmware level
  - Controller type
  - Controller firmware level
  - Configuration settings
  - Storage software (such as EMC Control Center, Veritas SPC, and so on)
• If this is a Brocade DCX, DCX-4S, or DCX 8510 family enterprise-class platform, are the CPs
  in sync? Yes or no.
  Use the haShow command to determine the answer.
• List the last actions or changes made to the switch, the fabric, and the SAN or metaSAN,
  and when they were made.
• In Table 3, list the environmental changes added to the network.

TABLE 3 Environmental changes

Type of change                    Date when change occurred
Gathering additional information
The following features require you to gather additional information. The additional information
is necessary for your switch support provider to troubleshoot your issue effectively and
efficiently. Refer to the chapter or document specified for the commands used for the data you must
capture:
• Configurations, refer to Chapter 3, “Connectivity”.
• Firmware download, refer to Chapter 5, “Firmware Download Errors”.
• Trunking, refer to Chapter 8, “ISL Trunking”.
• Zoning, refer to Chapter 9, “Zoning”.
• FCIP tunnels, refer to the Fibre Channel over IP Administrator’s Guide.
Some features require licenses in order to work properly. To view a list of features and their
associated licenses, refer to the Fabric OS Administrator’s Guide. Licenses are created using a
switch’s License Identifier, so you cannot apply one license to different switches. Before calling your
switch support provider, verify that you have the correct licenses installed by using the licenseShow
command.
Symptom: A feature is not working.
Probable cause and recommended action
Refer to the Fabric OS Administrator’s Guide to determine if the appropriate licenses are installed
on the local switch and any connecting switches.
Determining installed licenses
1. Connect to the switch and log in using an account with admin permissions.
2. Enter the licenseShow command.
A list of the currently installed licenses on the switch is displayed.
Time
Symptom: Time is not in sync.
Probable cause and recommended action
NTP is not set up on the switches in your fabric. Set up NTP on your switches in all fabrics in your
SAN and metaSAN.
For more information on setting up NTP, refer to the Fabric OS Administrator’s Guide.
Frame Viewer
Symptom: Frames are being dropped.
A frame is discarded when it cannot reach its destination because of a timeout, because it is
unroutable, or because the destination is unreachable. You can use Frame Viewer to identify the
flows that contained the dropped frames, which can help you determine which applications might
be impacted. Frame Viewer also shows when the frames were dropped (timestamps are accurate to
within one second), which assists in the debug process.
You can view and filter up to 20 discarded frames per chip per second for 1200 seconds using a
number of fields with the framelog command.
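As a quick check of what these limits imply, the following Python sketch multiplies the two figures quoted above. Reading the product as an upper bound on the number of discarded frames that can be viewed per chip is an interpretation of the text, not a documented specification; the function name is hypothetical.

```python
FRAMES_PER_CHIP_PER_SECOND = 20   # figure quoted in the text above
RETENTION_SECONDS = 1200          # figure quoted in the text above (20 minutes)


def max_viewable_frames(chips=1):
    """Upper bound on viewable discarded frames for the given number of chips."""
    return FRAMES_PER_CHIP_PER_SECOND * RETENTION_SECONDS * chips
```

Under that reading, a single chip yields at most 20 x 1200 = 24,000 frames, covering the most recent 20 minutes.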
Probable cause and recommended action
Frames are timing out.
Viewing frames
1. Connect to the switch and log in using an account with admin permissions.
2. Enter the framelog --show command.
Switch message logs
Switch message logs (RAS logs) contain information on events that happen on the switch or in the
fabric, which makes them an effective tool for understanding what is going on in your fabric or
on your switch. Review the RAS logs weekly to catch problems at an early stage and to prevent
minor problems from becoming larger issues.
On director-class switches, each control processor (CP) blade keeps its own RAS log, so there
are two sets of logs. The ipAddrShow command provides the IP addresses of the CP0 and
CP1 control processor blades and associated RAS logs.
The following common problems can occur with or in your system message log.
SymptomInaccurate information in the system message log.
Probable cause and recommended action
In rare instances, events gathered by the Track Change feature can report inaccurate information
to the system message log.
For example, a user enters a correct user name and password, but the login is rejected because
the maximum number of users has been reached. However, the system message log reports the
login as successful.
If the maximum number of switch users has been reached, the switch still performs correctly, in
that it rejects the login of additional users, even if they enter the correct user name and password
information.
However, in this limited example, the Track Change feature reports this event inaccurately to the
system message log; it appears that the login was successful. This scenario only occurs when the
maximum number of users has been reached; otherwise, the login information displayed in the
system message log reflects reality.
Refer to the Fabric OS Administrator’s Guide for information regarding enabling and disabling Track
Changes (TC).
Symptom: MQ errors are appearing in the switch log.
Probable cause and recommended action
An MQ error is a message queue error. Identify an MQ error message by looking for the two letters
MQ followed by a number in the error message:
2004/08/24-10:04:42, [MQ-1004], 218,, ERROR, ras007, mqRead, queue = raslog-test-string0123456-raslog, queue ID = 1, type = 2
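The identification rule above (the letters MQ followed by a number) can be sketched as a small log filter. This is a minimal Python sketch, not a Brocade tool: it assumes the message ID appears in square brackets, as in the one sample line shown, and the function name find_mq_errors is hypothetical.

```python
import re

# Matches a bracketed message ID such as [MQ-1004]. The bracketed form is an
# assumption taken from the single sample line in this guide.
MQ_ID = re.compile(r"\[MQ-(\d+)\]")


def find_mq_errors(log_lines):
    """Return (message_id, line) pairs for lines that carry an MQ message ID."""
    hits = []
    for line in log_lines:
        match = MQ_ID.search(line)
        if match:
            hits.append(("MQ-" + match.group(1), line))
    return hits


sample = [
    "2004/08/24-10:04:42, [MQ-1004], 218,, ERROR, ras007, mqRead, "
    "queue = raslog-test-string0123456-raslog, queue ID = 1, type = 2",
    "placeholder line without a message queue ID",  # illustrative, not real output
]

for message_id, line in find_mq_errors(sample):
    print(message_id, "->", line)
```

A filter like this only flags candidate lines; the recommended action is still to run supportSave and forward the data to the switch supplier.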
MQ errors can result in devices dropping from the switch’s Name Server or can prevent a switch
from joining the fabric. MQ errors are rare and difficult to troubleshoot; resolve them by working
with the switch supplier. When encountering an MQ error, issue the supportSave command to
capture debug information about the switch; then, forward the supportSave data to the switch
supplier for further investigation.
Symptom: I²C bus errors are appearing in the switch log.
Probable cause and recommended action
I²C bus errors generally indicate defective hardware or poorly seated devices or blades; the specific
item is listed in the error message. Refer to the Fabric OS Message Reference for information
specific to the error that was received. Some Chip-Port (CPT) and Environmental Monitor (EM)
messages contain I²C-related information.
If the I²C message does not indicate the specific hardware that may be failing, begin debugging the
hardware, as this is the most likely cause.
Symptom: Core file or FFDC warning messages appear on the serial console or in the system log.
Probable cause and recommended action
Issue the supportSave command. The messages can be dismissed by issuing the supportSave -R
command after all data is confirmed to be collected properly.
Error example:
*** CORE FILES WARNING (10/22/08 - 05:00:01 ) ***
3416 KBytes in 1 file(s)
use "supportsave" command to upload
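If console output is being captured to a file, a warning like the one above could be picked out automatically. The following is a minimal Python sketch under the assumption that the warning always follows the layout of this single example; the patterns and the function name parse_core_warning are hypothetical and are not part of Fabric OS.

```python
import re

# Both patterns are inferred from the one example in this guide:
#   *** CORE FILES WARNING (10/22/08 - 05:00:01 ) ***
#   3416 KBytes in 1 file(s)
HEADER = re.compile(r"\*\*\* CORE FILES WARNING \((.+?)\s*\) \*\*\*")
SIZE = re.compile(r"(\d+) KBytes in (\d+) file\(s\)")


def parse_core_warning(text):
    """Return timestamp, size, and file count, or None if no warning is found."""
    header = HEADER.search(text)
    size = SIZE.search(text)
    if header is None or size is None:
        return None
    return {
        "timestamp": header.group(1).strip(),
        "kbytes": int(size.group(1)),
        "files": int(size.group(2)),
    }


console_text = (
    "*** CORE FILES WARNING (10/22/08 - 05:00:01 ) ***\n"
    "3416 KBytes in 1 file(s)\n"
    'use "supportsave" command to upload\n'
)
print(parse_core_warning(console_text))
```

Detecting the warning does not clear it; the messages are dismissed only by the supportSave -R command described above.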
Switch boot
Symptom: The enterprise-class platform model rebooted again after an initial bootup.
Probable cause and recommended action
This issue can occur during an enterprise-class platform bootup with two CPs. If any failure occurs
on the active CP, before the standby CP is fully functional and has obtained HA sync, the standby CP
may not be able to take on the active role to perform failover successfully.
In this case, both CPs reboot to recover from the failure.
Rolling Reboot Detection
A rolling reboot occurs when a switch or enterprise-class platform experiences unexpected reboots
continuously. The reboots continue until the system detects the rolling reboot.
Once Rolling Reboot Detection (RRD) occurs, the switch is put into a stable state so that only
minimal supportSave output needs to be collected and sent to your service support provider for
analysis. USB is also supported in RRD mode: enable the USB device by entering usbstorage -e,
and collect the results by entering supportsave -U -d MySupportSave.
Not every reboot activates the Rolling Reboot Detection feature.
ATTENTION: If a rolling reboot is caused by a Linux kernel panic, then the RRD feature is not activated.
Reboot classification
There are two types of reboots that occur on a switch and enterprise-class platform: expected and
unexpected. Expected reboots are initiated by commands; the Rolling Reboot Detection (RRD)
feature ignores them. They are caused by the following commands:
• reboot
• haFailover
• fastBoot
• firmwareDownload
The RRD feature is activated and halts rebooting when an unexpected reboot reason is shown
continuously in the reboot history within a certain period of time. The period of time depends on
the switch. The following reboots are considered unexpected reboots:
• Reset
A reset reboot may be caused by one of the following:
- Power-cycle of the switch or CP
- Linux reboot command
- Hardware watchdog timeout
- Heartbeat loss-related reboot
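The distinction between expected and unexpected reboots can be illustrated with a short Python sketch. It is not the Fabric OS implementation. The guide states only that the detection period depends on the switch, so the window and threshold parameters, the reason strings, and the function names are all assumptions made for illustration. The Linux kernel panic exception is not modeled.

```python
from datetime import datetime, timedelta

# Commands the guide lists as causing expected reboots (ignored by RRD).
EXPECTED_COMMANDS = {"reboot", "hafailover", "fastboot", "firmwaredownload"}


def is_expected(reason):
    """True if the reboot reason is one of the listed commands."""
    return reason.lower() in EXPECTED_COMMANDS


def rolling_reboot_detected(history, window, threshold):
    """history is a list of (timestamp, reason) pairs, oldest first.

    Returns True when the most recent `threshold` reboots are all unexpected
    and fall within `window`. Both parameters are hypothetical values.
    """
    if len(history) < threshold:
        return False
    recent = history[-threshold:]
    if any(is_expected(reason) for _, reason in recent):
        return False
    return recent[-1][0] - recent[0][0] <= window


# Illustrative history: three "Reset" reboots five minutes apart.
start = datetime(2024, 1, 1, 0, 0)
history = [(start + timedelta(minutes=5 * i), "Reset") for i in range(3)]
print(rolling_reboot_detected(history, timedelta(minutes=30), 3))
```

In this sketch an expected reboot among the most recent entries breaks the streak, which mirrors the requirement that the unexpected reason appear continuously in the reboot history.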