Brocade, the B-wing symbol, BigIron, DCX, Fabric OS, FastIron, IronPoint, IronShield, IronView, IronWare, JetCore, NetIron,
SecureIron, ServerIron, StorageX, and TurboIron are registered trademarks, and Brocade Network Advisor (formerly Data Center
Fabric Manager or DCFM), Extraordinary Networks, and SAN Health are trademarks of Brocade Communications Systems, Inc., in
the United States and/or in other countries. All other brands, products, or service names are or may be trademarks or service
marks of, and are used to identify, products or services of their respective owners.
Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning
any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to
this document at any time, without notice, and assumes no responsibility for its use. This informational document describes
features that may not be currently available. Contact a Brocade sales office for information on feature and product availability.
Export of technical data contained in this document may require an export license from the United States government.
The authors and Brocade Communications Systems, Inc. shall have no liability or responsibility to any person or entity with
respect to any loss, cost, liability, or damages arising from the information contained in this book or the computer programs that
accompany it.
The product described by this document may contain “open source” software covered by the GNU General Public License or other
open source license agreements. To find out which open source software is included in Brocade products, view the licensing
terms applicable to the open source software, and obtain a copy of the programming source code, please visit
http://www.brocade.com/support/oscd.
Brocade Communications Systems, Incorporated
Corporate and Latin American Headquarters
Brocade Communications Systems, Inc.
130 Holger Way
San Jose, CA 95134
Tel: 1-408-333-8000
Fax: 1-408-333-8101
E-mail: info@brocade.com
Asia-Pacific Headquarters
Brocade Communications Systems China HK, Ltd.
No. 1 Guanghua Road
Chao Yang District
Units 2718 and 2818
Beijing 100020, China
Tel: +8610 6588 8888
Fax: +8610 6588 9999
E-mail: china-info@brocade.com
European Headquarters
Brocade Communications Switzerland Sàrl
Centre Swissair
Tour B - 4ème étage
29, Route de l'Aéroport
Case Postale 105
CH-1215 Genève 15
Switzerland
Tel: +41 22 799 5640
Fax: +41 22 799 5641
E-mail: emea-info@brocade.com
Asia-Pacific Headquarters
Brocade Communications Systems Co., Ltd. (Shenzhen WFOE)
Citic Plaza
No. 233 Tian He Road North
Unit 1308 – 13th Floor
Guangzhou, China
Tel: +8620 3891 2000
Fax: +8620 3891 2111
E-mail: china-info@brocade.com
Document History
Title: Fabric OS Troubleshooting and Diagnostics Guide (all editions listed below)

Publication number: 53-0000853-01
Date: March 2008
Summary of changes: First released edition.

Publication number: 53-1001187-01
Date: November 2008
Summary of changes: Added support for Virtual Fabrics, fcPing, pathInfo, and additional troubleshooting tips.

Publication number: 53-1001340-01
Date: July 2009
Summary of changes: Added support for checking physical connections, updated commands, removed obsolete information, and moved the FCIP and FICON chapters into their respective books.

Publication number: 53-1001769-01
Date: March 2010
Summary of changes: Added support for the Rolling Reboot Detection feature and the Superping tool; added enhancements for supportSave and spinFab; updated commands; transferred the iSCSI chapter into its respective book.

Publication number: 53-1002150-01
Date: April 2011
Summary of changes: Added Frame Viewer and Diagnostics port features.

Publication number: 53-1002150-02
Date: June 2011
Summary of changes: Updated the Diagnostics port feature.

Publication number: 53-1002751-01
Date: December 2012
Summary of changes: Updated for Fabric OS v7.1.0.
Fabric OS Troubleshooting and Diagnostics Guideiii
53-1002751-01
About This Document
How this document is organized
• Chapter 9, “Zoning,” provides preparations and procedures for performing firmware
downloads, as well as troubleshooting information.
• Chapter 10, “Diagnostic Features,” provides procedures for the use of the diagnostics
commands for the chassis, ports, and other chassis equipment. This chapter also provides
information on the system messages.
• Appendix A, “Switch Type and Blade ID,” provides reference information to guide you in
understanding switch output.
• Appendix B, “Hexadecimal Conversion,” provides reference information for translating
hexadecimal output.
Supported hardware and software
In those instances in which procedures or parts of procedures documented here apply to some
switches but not to others, this guide identifies which switches are supported and which are not.
Although many different software and hardware configurations are tested and supported by
Brocade Communications Systems, Inc. for Fabric OS v7.1.0, documenting all possible
configurations and scenarios is beyond the scope of this document.
The following hardware platforms are supported by this release of Fabric OS:
• Brocade 300 switch
• Brocade 5100 switch
• Brocade 5300 switch
• Brocade 5410 embedded switch
• Brocade 5424 embedded switch
• Brocade 5450 embedded switch
• Brocade 5460 embedded switch
• Brocade 5470 embedded switch
• Brocade 5480 embedded switch
• Brocade 6505 switch
• Brocade 6510 switch
• Brocade 6520 switch
• Brocade 7800 extension switch
• Brocade 8000 FCoE switch
• Brocade VA-40FC
• Brocade Encryption Switch
• Brocade DCX
• Brocade DCX-4S
• Brocade DCX 8510-4
• Brocade DCX 8510-8
What’s new in this document
Updated for Brocade Fabric OS v7.1.0, including the following:
• Updated system messages related to firmware downloads. (Refer to Chapter 5, “Firmware
Download Errors,” on page 51.)
• Introduced new features available with the D_Port diagnostic tool. (Refer to Chapter 10,
“Diagnostic Features,” on page 81.)
For further information about documentation updates for this release, refer to the release notes.
Document conventions
This section describes text formatting conventions and important notice formats used in this
document.
TEXT FORMATTING
The narrative-text formatting conventions that are used are as follows:
bold text      Identifies command names
               Identifies the names of user-manipulated GUI elements
               Identifies keywords and operands
               Identifies text to enter at the GUI or CLI
italic text    Provides emphasis
               Identifies variables
               Identifies paths and Internet addresses
               Identifies document titles
code text      Identifies CLI output
               Identifies command syntax examples
COMMAND SYNTAX CONVENTIONS
For readability, command names in the narrative portions of this guide are presented in mixed
lettercase: for example, switchShow. In actual examples, command lettercase is often all
lowercase. Otherwise, this manual specifically notes those cases in which a command is case
sensitive. Command syntax in this manual follows these conventions:
command            Commands are printed in bold.
--option, option   Command options are printed in bold.
-argument, arg     Arguments.
[ ]                Optional element.
variable           Variables are printed in italics. In the help pages, values are underlined or
                   enclosed in angled brackets < >.
...                Repeat the previous element, for example “member[;member...]”
value              Fixed values following arguments are printed in plain font. For example,
                   --show WWN
|                  Boolean. Elements are exclusive. Example:
                   --show -mode egress | ingress
COMMAND EXAMPLES
This book describes how to perform configuration tasks using the Fabric OS command line
interface, but does not describe the commands in detail. For complete descriptions of all Fabric OS
commands, including syntax, operand description, and sample output, refer to the Fabric OS Command Reference.
NOTES, CAUTIONS, AND WARNINGS
The following notices and statements are used in this manual. They are listed below in order of
increasing severity of potential hazards.
NOTE
A note provides a tip, guidance, or advice, emphasizes important information, or provides a
reference to related information.
ATTENTION
An Attention statement indicates potential damage to hardware or data.
CAUTION
A Caution statement alerts you to situations that can be potentially hazardous to you or cause
damage to hardware, firmware, software, or data.
DANGER
A Danger statement indicates conditions or situations that can be potentially lethal or extremely
hazardous to you. Safety labels are also attached directly to products to warn of these conditions
or situations.
KEY TERMS
For definitions specific to Brocade and Fibre Channel, refer to the Brocade Glossary.
For definitions of SAN-specific terms, visit the Storage Networking Industry Association online
dictionary at:
http://www.snia.org/education/dictionary
Additional information
This section lists additional Brocade and industry-specific documentation that you might find
helpful.
BROCADE RESOURCES
To get up-to-the-minute information, go to http://my.brocade.com and register at no cost for a user
ID and password.
For practical discussions about SAN design, implementation, and maintenance, you can obtain
Building SANs with Brocade Fabric Switches through:
http://www.amazon.com
White papers, online demonstrations, and data sheets are available through the Brocade website
at:
For additional Brocade documentation, visit the Brocade website:
http://www.brocade.com
Release notes are available on the MyBrocade website and are also bundled with the Fabric OS
firmware.
OTHER INDUSTRY RESOURCES
For additional resource information, visit the Technical Committee T11 website. This website
provides interface standards for high-performance and mass storage applications for Fibre
Channel, storage management, and other applications:
http://www.t11.org
For information about the Fibre Channel industry, visit the Fibre Channel Industry Association
website:
http://www.fibrechannel.org
Getting technical help
Contact your switch support supplier for hardware, firmware, and software support, including
product repairs and part ordering. To expedite your call, have the following information available:
1. General Information
• Switch model
• Switch operating system version
• Error numbers and messages received
• supportSave command output
• Detailed description of the problem, including the switch or fabric behavior immediately
following the problem, and specific questions
• Description of any troubleshooting steps already performed and the results
• Serial console and Telnet session logs
• syslog message logs
2. Switch Serial Number
The switch serial number and corresponding bar code are provided on the serial number label,
as illustrated below:
*FT00X0054E9*
FT00X0054E9
The serial number label is located as follows:
• Brocade 5424 — On the bottom of the switch module.
• Brocade 300, 5100, and 5300 — On the switch ID pull-out tab located on the bottom of the
port side of the switch.
• Brocade 6505, 6510, and 6520— On the switch ID pull-out tab located inside the chassis
on the port side on the left.
• Brocade 7800 and 8000 — On the bottom of the chassis.
• Brocade DCX Backbone — On the bottom right on the port side of the chassis.
• Brocade DCX-4S Backbone — On the bottom right on the port side of the chassis.
• Brocade DCX 8510-4 — On the nonport side of the chassis, on the left just below the left
power supply.
• Brocade DCX 8510-8 — On the bottom right on the port side of the chassis and directly
above the cable management comb.
3. World Wide Name (WWN)
Use the licenseIdShow command to display the chassis’ WWN.
If you cannot use the licenseIdShow command because the switch is inoperable, you can get
the WWN from the same place as the serial number, except for the Brocade DCX. For the
Brocade DCX, access the numbers on the WWN cards by removing the Brocade logo plate at
the top of the nonport side of the chassis.
Document feedback
Quality is our first concern at Brocade and we have made every effort to ensure the accuracy and
completeness of this document. However, if you find an error or an omission, or you think that a
topic needs further development, we want to hear from you. Forward your feedback to:
documentation@brocade.com
Provide the title and version number of the document and as much detail as possible about your
comment, including the topic heading and page number and your suggestions for improvement.
Troubleshooting overview
This book is a companion guide to be used in conjunction with the Fabric OS Administrator’s Guide.
Although it provides a lot of common troubleshooting tips and techniques, it does not teach
troubleshooting methodology.
Troubleshooting should begin at the center of the SAN — the fabric. Because switches are located
between the hosts and storage devices and have visibility into both sides of the storage network,
starting with them can help narrow the search path. After eliminating the possibility of a fault within
the fabric, see if the problem is on the storage side or the host side, and continue a more detailed
diagnosis from there. Using this approach can quickly pinpoint and isolate problems.
For example, if a host cannot detect a storage device, run the switchShow command to determine if
the storage device is logically connected to the switch. If not, focus first on the switch directly
connecting to storage. Use your vendor-supplied storage diagnostic tools to better understand why
it is not visible to the switch. If the storage can be detected by the switch, and the host still cannot
detect the storage device, then there is still a problem between the host and switch.
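The triage order described above can be written down as a small decision function. The following Python sketch is not part of Fabric OS; the function name and its two inputs are hypothetical and only encode the reasoning in the preceding paragraphs.

```python
def next_area_to_check(storage_seen_by_switch: bool, host_detects_storage: bool) -> str:
    """Return the part of the SAN to investigate next.

    storage_seen_by_switch: True if switchShow lists the storage device as
        logically connected to the switch.
    host_detects_storage: True if the host can detect the storage device.
    """
    if not storage_seen_by_switch:
        # Start with the switch that connects directly to storage.
        return "switch-to-storage connection"
    if not host_detects_storage:
        # Storage is visible to the fabric, so look between host and switch.
        return "host-to-switch connection"
    return "no connectivity fault indicated"


print(next_area_to_check(storage_seen_by_switch=True, host_detects_storage=False))
```

Starting from the switch keeps each test binary: either the device is visible to the fabric or it is not, which decides which side of the SAN to examine next.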
Network time protocol
One of the most frustrating parts of troubleshooting is trying to correlate one switch’s message logs
and portlogs with those of other switches in the fabric. If NTP is not set up on your switches, their
clocks can drift apart, and lining up log files to track a problem becomes more difficult.
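To illustrate why consistent clocks matter, the following Python sketch merges log entries from two switches into one timeline. It is an illustration only: the switch names and messages are made up, the timestamp layout is assumed to match the date and time format seen in the message log examples in this guide, and the optional per-switch offset is a hypothetical manual correction for clock skew.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y/%m/%d-%H:%M:%S"  # assumed layout, e.g. 2004/08/24-10:04:42


def merge_logs(logs, clock_offsets=None):
    """Merge per-switch log entries into one timeline.

    logs: dict mapping switch name -> list of (timestamp_string, message).
    clock_offsets: optional dict mapping switch name -> seconds to add to
        that switch's timestamps (a manual correction for clock skew).
    """
    clock_offsets = clock_offsets or {}
    merged = []
    for switch, entries in logs.items():
        offset = timedelta(seconds=clock_offsets.get(switch, 0))
        for stamp, message in entries:
            when = datetime.strptime(stamp, TIMESTAMP_FORMAT) + offset
            merged.append((when, switch, message))
    # Tuples sort by corrected time first, then switch name, then message.
    return sorted(merged)


# Illustrative entries, not captured from real switches.
logs = {
    "switchA": [("2012/12/01-10:00:05", "link down")],
    "switchB": [("2012/12/01-10:00:02", "port fault")],
}
for when, switch, message in merge_logs(logs):
    print(when.strftime(TIMESTAMP_FORMAT), switch, message)
```

If switchB's clock were ten seconds slow, the merged order would reverse once the offset is applied, which is exactly the ambiguity that NTP removes.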
Most common problem areas
Table 1 identifies the most common problem areas that arise within SANs and the tools to
use to resolve them.
TABLE 1 Common troubleshooting problems and tools

Problem area: Fabric
Investigate:
• Missing devices
• Marginal links (unstable connections)
• Incorrect zoning configurations
• Incorrect switch configurations
Tools:
• Switch LEDs
• Switch commands (for example, switchShow or nsAllShow) for diagnostics
• Web or GUI-based monitoring and management software tools

Problem area: Storage Devices
Investigate:
• Physical issues between switch and devices
• Incorrect storage software configurations
Tools:
• Device LEDs
• Storage diagnostic tools
• Switch commands (for example, switchShow or nsAllShow) for diagnostics

Problem area: Hosts
Investigate:
• Physical issues between switch and devices
• Downgrade HBA firmware
• Incorrect device driver installation
• Incorrect device driver configuration
Tools:
• Device LEDs
• Host operating system diagnostic tools
• Device driver diagnostic tools
• Switch commands (for example, switchShow or nsAllShow) for diagnostics
Also, make sure you use the latest HBA firmware recommended by the switch
supplier or on the HBA supplier's website.

Problem area: Storage Management Applications
Investigate:
• Incorrect installation and configuration of the storage devices that the software references.
For example, if using a volume-management application, check for:
-Incorrect volume installation
-Incorrect volume configuration
Tools:
• Application-specific tools and resources
Questions for common symptoms
First, determine what the problem is. Some symptoms are obvious, such as a switch that
rebooted without any user intervention; others are more obscure, such as storage that has
intermittent connectivity to a particular host. Whatever the symptom is, you must gather
information from the devices that are directly involved in the symptom.
Table 2 lists common symptoms and possible areas to check. You may notice that an intermittent
connectivity problem has many variables to look into, such as the type of connection between the
two devices, how the connection is behaving, and the port type involved.
TABLE 2 Common symptoms

Symptom: Blade is faulty
Areas to check: Firmware or application download; hardware connections

Symptom: Blade is stuck in the “LOADING” state
Areas to check: Firmware or application download
Chapter or document: Chapter 5, “Firmware Download Errors”

Symptom: Configupload or download fails
Areas to check: FTP or SCP server or USB availability
Chapter or document: Chapter 4, “Configuration”

Symptom: E_Port failed to come online
Areas to check: Correct licensing; fabric parameters; zoning

Symptom: EX_Port does not form
Areas to check: Links
Chapter or document: Chapter 3, “Connectivity”

Symptom: User is unable to change switch settings
Areas to check: RBAC settings; account settings
Chapter or document: Chapter 6, “Security”

Symptom: Virtual Fabric does not form
Areas to check: FIDs
Chapter or document: Chapter 7, “Virtual Fabrics”

Symptom: Zone configuration mismatch
Areas to check: Effective configuration
Chapter or document: Chapter 9, “Zoning”

Symptom: Zone content mismatch
Areas to check: Effective configuration
Chapter or document: Chapter 9, “Zoning”

Symptom: Zone type mismatch
Areas to check: Effective configuration
Chapter or document: Chapter 9, “Zoning”
Gathering information for your switch support provider
If you are troubleshooting a production system, you must gather data quickly. As soon as a problem
is observed, perform the following tasks. For more information about these commands and their
operands, refer to the Fabric OS Command Reference.
1. Enter the supportSave command to save RASlog, TRACE, supportShow, core file, FFDC data,
and other support information from the switch, chassis, blades, and logical switches.
2. Gather console output and logs.
NOTE
To execute the supportSave command on the chassis, you must log in to the switch on an account
with the admin role that has the chassis role permission.
Setting up your switch for FTP
1. Connect to the switch and log in using an account with admin permissions.
2. Type the supportFtp command and respond to the prompts.
Example of supportFTP command
switch:admin> supportftp -s
Host IP Addr[1080::8:800:200C:417A]:
User Name[njoe]: userFoo
Password[********]: <hidden>
Remote Dir[support]:
supportftp: parameters changed
Capturing a supportSave
The supportSave command uses the default switch name to replace the chassis name, regardless
of whether the chassis name has been changed to a non-factory setting. If Virtual Fabrics is enabled, the
supportSave command uses the default switch name for each logical fabric.
1. Connect to the switch and log in using an account with admin permissions.
2. Type the appropriate supportSave command based on your needs:
• If you are saving to an FTP or SCP server, use the following syntax:
supportSave
When invoked without operands, this command goes into interactive mode. The following
operands are optional:
-n Does not prompt for confirmation. This operand is optional; if omitted, you are prompted
for confirmation.
-c Uses the FTP parameters saved by the supportFtp command. This operand is optional; if
omitted, specify the FTP parameters through command line options or interactively. To
display the current FTP parameters, run supportFtp (on a dual-CP system, run supportFtp
on the active CP).
• On platforms that support USB devices, you can use your Brocade USB device to save the
support files. To use your USB device, use the following syntax:
supportsave [-U -d remote_dir]
-d Specifies the remote directory to which the file is to be transferred. When saving to a
USB device, the predefined /support directory must be used.
• While running the supportSave command, you may encounter a timeout. A timeout occurs
when the system is busy because heavy port traffic or file access has left it CPU-bound or
I/O-bound. If this occurs, an SS-1004 message is generated to both the console and the
RASlog to report the error. You must rerun the supportSave command with the -t option.
Example of SS-1004 message:
SS-1004: “One or more modules timed out during supportsave. Please retry supportsave
with -t option to collect all logs.”
Changing the supportSave timeout value
1. Connect to the switch and log in using an account with admin permissions.
2. Enter the supportSave command with the -t operand, and specify a value from 1 through 5.
The following example increases the supportSave modules timeout to twice the original
timeout setting.
switch:admin> supportSave -t 2
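When console output is being captured to a file, the SS-1004 check can be automated. The following Python sketch is an illustration, not a Brocade tool: the function name is hypothetical, the starting multiplier of 1 is an assumption, and the policy of raising the multiplier one step per retry is an assumption layered on the documented -t range of 1 through 5.

```python
MIN_MULTIPLIER = 1
MAX_MULTIPLIER = 5


def retry_command(console_text, current_multiplier=1):
    """Return a supportSave retry command if SS-1004 is present, else None."""
    if "SS-1004" not in console_text:
        return None
    # Hypothetical policy: raise the multiplier by one step per retry,
    # staying inside the documented range of 1 through 5.
    multiplier = min(max(current_multiplier + 1, MIN_MULTIPLIER), MAX_MULTIPLIER)
    return f"supportSave -t {multiplier}"


message = ("SS-1004: One or more modules timed out during supportsave. "
           "Please retry supportsave with -t option to collect all logs.")
print(retry_command(message))
```

The function only builds the command text; running it on the switch remains a manual step.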
Capturing output from a console
Some information, such as boot information, is output only to the console. To capture this
information, you must connect directly to the switch through its management interface, using
either a serial cable or an RJ-45 connection.
1. Connect directly to the switch using a terminal emulator such as HyperTerminal.
2. Log in to the switch using an account with admin permissions.
3. Set the utility to capture output from the screen.
Some utilities require this step to be performed prior to opening up a session. Check with your
utility vendor for instructions.
4. Type the command or start the process to capture the required data on the console.
Capturing command output
1. Connect to the switch through a Telnet or SSH utility.
2. Log in using an account with admin permissions.
3. Set the Telnet or SSH utility to capture output from the screen.
Some Telnet or SSH utilities require this step to be performed prior to opening up a session.
Check with your Telnet or SSH utility vendor for instructions.
4. Type the command or start the process to capture the required data on the console.
Building a case for your switch support provider
The questions listed under “Basic information” should be printed, answered in their entirety, and
ready to send to your switch support provider when you contact them. Having this information
immediately available expedites the information-gathering process that is necessary to begin
determining the problem and finding a solution.
Basic information
1. What is the switch’s current Fabric OS level?
To determine the switch’s Fabric OS level, type the firmwareShow command and write down
the information.
2. What is the switch model?
To determine the switch model, type the switchShow command and write down the value in the
switchType field. Cross-reference this value with the chart located in Appendix A, “Switch Type
and Blade ID”.
3. Is the switch operational? Yes or no.
4. Impact assessment and urgency:
• Is the switch down? Yes or no.
• Is it a standalone switch? Yes or no.
• Are there VE, VEX, or EX ports connected to the chassis? Yes or no.
• Use the switchShow command to determine the answer.
• How large is the fabric?
• Use the nsAllShow command to determine the answer.
• Do you have encryption blades or switches installed in the fabric? Yes or no.
• Do you have Virtual Fabrics enabled in the fabric? Yes or no.
• Use the switchShow command to determine the answer.
• Do you have IPsec installed on the switch’s Ethernet interface? Yes or no.
• Use the ipsecConfig --show command to determine the answer.
• Do you have Inband Management installed on the switch’s GigE ports? Yes or no.
• Use the portShow iproute geX command to determine the answer.
• Are you using NPIV? Yes or no.
• Use the switchShow command to determine the answer.
• Are there security policies turned on in the fabric? If so, what are they? Gather the output from
the following commands:
-secPolicyShow
-fddCfg --showall
-ipFilter --show
-authUtil --show
-secAuthSecret --show
-fipsCfg --showall
• Is the fabric redundant? If yes, what is the MPIO software? (List vendor and version.)
5. If you have a redundant fabric, did a failover occur?
6. Was POST enabled on the switch?
7. Which CP blade was active? (Only applicable to Brocade DCX, DCX 8510 family, and DCX-4S
enterprise-class platforms.)
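Several of the questions above point to the same command, so it can help to work out the set of commands to run before opening a case. The following Python sketch is only an aid for organizing the checklist; the table pairs each question with the commands named above, geX is the placeholder used in this guide, and the variable and function names are hypothetical.

```python
# Question -> Fabric OS commands, as named in the checklist above.
CHECKLIST = [
    ("Fabric OS level", ["firmwareShow"]),
    ("Switch model (switchType field)", ["switchShow"]),
    ("VE, VEX, or EX ports present", ["switchShow"]),
    ("Fabric size", ["nsAllShow"]),
    ("Virtual Fabrics enabled", ["switchShow"]),
    ("IPsec on the Ethernet interface", ["ipsecConfig --show"]),
    ("Inband Management on GigE ports", ["portShow iproute geX"]),
    ("NPIV in use", ["switchShow"]),
    ("Security policies", ["secPolicyShow", "fddCfg --showall", "ipFilter --show",
                           "authUtil --show", "secAuthSecret --show",
                           "fipsCfg --showall"]),
]


def commands_to_capture(checklist=CHECKLIST):
    """Return each command once, in the order it is first needed."""
    seen = []
    for _question, commands in checklist:
        for command in commands:
            if command not in seen:
                seen.append(command)
    return seen


for command in commands_to_capture():
    print(command)
```

Capturing the output of each listed command once, with session logging enabled, answers most of the basic questions in a single pass.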
Detailed problem information
Obtain as many of the following items of information as possible before contacting the SAN
technical support vendor.
Document the sequence of events by answering the following questions:
• When did the problem occur?
• Is this a new installation?
• How long has the problem been occurring?
• Are specific devices affected?
-If so, what are their World Wide Node Names?
• What happened prior to the problem?
• Is the problem reproducible?
-If so, what are the steps to produce the problem?
• What configuration was in place when the problem occurred?
• A description of the problem with the switch or the fault with the fabric.
• The last actions or changes made to the system environment:
-settings
-supportShow output
• Host information:
-OS version and patch level
-HBA type
-HBA firmware version
-HBA driver version
-Configuration settings
• Storage information:
-Disk/tape type
-Disk/tape firmware level
-Controller type
-Controller firmware level
-Configuration settings
-Storage software (such as EMC Control Center, Veritas SPC, etc.)
• If this is a Brocade DCX, DCX 8510 family, or DCX-4S enterprise-class platform, are the CPs
in sync? Yes or no.
• Use the haShow command to determine the answer.
• List the most recent actions or changes made to the switch, the fabric, and
the SAN or metaSAN, and when each was made.
• In Table 3, list the environmental changes added to the network.
TABLE 3 Environmental changes
Type of change        Date when change occurred
Gathering additional information
Below are features that require you to gather additional information. The additional information is
necessary in order for your switch support provider to effectively and efficiently troubleshoot your
issue. Refer to the chapter or document specified for the commands whose data you must capture:
• Configurations, see Chapter 3, “Connectivity”.
• Firmwaredownload, see Chapter 5, “Firmware Download Errors”.
• Trunking, see Chapter 8, “ISL Trunking”.
• Zoning, see Chapter 9, “Zoning”.
• FCIP tunnels, refer to the Fibre Channel over IP Administrator’s Guide.
• FICON, refer to the FICON Administrator’s Guide.
Chapter 2
General
Licenses
Some features need licenses in order to work properly. To view a list of features and their
associated licenses, refer to the Fabric OS Administrator’s Guide. Licenses are created using a
switch’s License Identifier, so you cannot apply one license to different switches. Before calling your
switch support provider, verify that you have the correct licenses installed by using the licenseShow
command.
Refer to the Fabric OS Administrator’s Guide to determine if the appropriate licenses are installed
on the local switch and any connecting switches.
Determining installed licenses
1. Connect to the switch and log in using an account with admin permissions.
2. Type the licenseShow command.
A list of the currently installed licenses on the switch is displayed.
Symptom: Time is not in sync.
Probable cause and recommended action
NTP is not set up on the switches in your fabric. Set up NTP on your switches in all fabrics in your
SAN and metaSAN.
For more information on setting up NTP, refer to the Fabric OS Administrator’s Guide.
Frame Viewer
When a frame is unable to reach its destination due to timeout, it is discarded. You can use Frame
Viewer to find out which flows contained the dropped frames, which can help you determine which
applications might be impacted. Using Frame Viewer, you can see exactly what time the frames
were dropped. (Timestamps are accurate to within one second.) Additionally, this assists in the
debug process.
You can view and filter up to 20 discarded frames per chip per second for 1200 seconds using a
number of fields with the framelog command.
Symptom: Frames are being dropped.
Probable cause and recommended action
Frames are timing out.
Viewing frames
1. Connect to the switch and log in using an account with admin permissions.
2. Type the framelog --show command.
Switch message logs
Switch message logs (RAS logs) contain information on events that happen on the switch or in the
fabric. This is an effective tool in understanding what is going on in your fabric or on your switch.
Weekly review of the RAS logs is necessary to catch problems at an early stage and to prevent
minor problems from becoming larger issues.
Below are some common problems that can occur with or in your system message log.
Symptom: Inaccurate information in the system message log
Probable cause and recommended action
In rare instances, events gathered by the track change feature can report inaccurate information to
the system message log.
For example, a user enters a correct user name and password, but the login is rejected because
the maximum number of users has been reached. However, the system message log reports the
login as successful.
If the maximum number of switch users has been reached, the switch still performs correctly, in
that it rejects the login of additional users, even if they enter the correct user name and password
information.
However, in this limited example, the Track Change feature reports this event inaccurately to the
system message log; it appears that the login was successful. This scenario only occurs when the
maximum number of users has been reached; otherwise, the login information displayed in the
system message log reflects reality.
Refer to the Fabric OS Administrator’s Guide for information regarding enabling and disabling track
changes (TC).
Symptom: MQ errors are appearing in the switch log.
Probable cause and recommended action
An MQ error is a message queue error. Identify an MQ error message by looking for the two letters
MQ followed by a number in the error message:
2004/08/24-10:04:42, [MQ-1004], 218,, ERROR, ras007, mqRead, queue = raslog-test-string0123456-raslog, queue ID = 1, type = 2
MQ errors can result in devices dropping from the switch’s Name Server or can prevent a switch
from joining the fabric. MQ errors are rare and difficult to troubleshoot; resolve them by working
with the switch supplier. When encountering an MQ error, issue the supportSave command to
capture debug information about the switch; then, forward the supportSave data to the switch
supplier for further investigation.
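If you have saved the message log to a text file, the "MQ followed by a number" rule can be applied with a short script. The following Python sketch is an illustration under stated assumptions: it assumes the message ID appears in square brackets as in the example above, the function name is hypothetical, and the second sample line is invented purely to show a non-matching entry.

```python
import re

# Matches a bracketed message ID such as [MQ-1004] and captures the number.
MQ_ID = re.compile(r"\[MQ-(\d+)\]")


def find_mq_errors(log_lines):
    """Return (message_number, line) pairs for lines that carry an MQ ID."""
    hits = []
    for line in log_lines:
        match = MQ_ID.search(line)
        if match:
            hits.append((int(match.group(1)), line))
    return hits


sample = [
    "2004/08/24-10:04:42, [MQ-1004], 218,, ERROR, ras007, mqRead, "
    "queue = raslog-test-string0123456-raslog, queue ID = 1, type = 2",
    # Invented line with a made-up module ID, used only as a non-match.
    "2004/08/24-10:05:00, [XX-0000], 219,, INFO, ras007, illustrative entry",
]
for number, line in find_mq_errors(sample):
    print(number, line)
```

Any hit is a cue to run supportSave and forward the data to the switch supplier, as described above.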
Symptom: I2C bus errors are appearing in the switch log.
Probable cause and recommended action
I2C bus errors generally indicate defective hardware or poorly seated devices or blades; the specific
item is listed in the error message. Refer to the Fabric OS Message Reference for information
specific to the error that was received. Some Chip-Port (CPT) and Environmental Monitor (EM)
messages contain I2C-related information.
If the I2C message does not indicate the specific hardware that may be failing, begin debugging the
hardware, as this is the most likely cause. The next sections provide procedures for debugging the
hardware.
Symptom: Core file or FFDC warning messages appear on the serial console or in the system log.
Probable cause and recommended action
Issue the supportSave command. The messages can be dismissed by issuing the supportSave -R
command after all data is confirmed to be collected properly.
Error example:
*** CORE FILES WARNING (10/22/08 - 05:00:01 ) ***
3416 KBytes in 1 file(s)
use "supportsave" command to upload
Switch boot
Symptom: The enterprise-class platform model rebooted again after an initial bootup.
Probable cause and recommended action
This issue can occur when an enterprise-class platform with two CPs boots up. If a failure occurs
on the active CP before the standby CP is fully functional and has obtained HA sync, the standby CP
may not be able to take on the active role and perform the failover successfully.
In this case, both CPs reboot to recover from the failure.
Rolling Reboot Detection
A rolling reboot occurs when a switch or enterprise-class platform has continuously experienced
unexpected reboots. This behavior is continuous until the rolling reboot is detected by the system.
Once the Rolling Reboot Detection (RRD) occurs, the switch is put into a stable state so that a
minimal supportSave can be collected and sent to your service support provider for analysis. Not
every reboot activates the Rolling Reboot Detection feature.
ATTENTION
If a rolling reboot is caused by a panic inside the Linux kernel, then the RRD feature is not activated.
Reboot classification
Two types of reboots occur on a switch or enterprise-class platform: expected and
unexpected. Expected reboots are initiated by commands; the Rolling Reboot Detection (RRD)
feature ignores them. They include the following:
• reboot
• haFailover
• fastBoot
• firmwareDownload
The RRD feature is activated and halts rebooting when an unexpected reboot reason is shown
continuously in the reboot history within a certain period of time. The period of time is switch
dependent. The following are considered unexpected reboots:
• Reset
A reset reboot may be caused by one of the following:
-Power-cycle of the switch or CP.
-Linux reboot command.
-Hardware watchdog timeout.
-Heartbeat loss-related reboot.
• Software Fault:Kernel Panic
-If the system detects an internal fatal error from which it cannot safely recover, it outputs
an error message to the console, dumps a stack trace for debugging, and then performs
an automatic reboot.
-After a kernel panic, the system may not have enough time to write the reboot reason,
which leaves the reboot reason empty. This is treated as an Unknown/reset case.
• Software fault
-Software Fault:Software Watchdog
-Software Fault:ASSERT
• Software recovery failure
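The classification above can be summarized in a short sketch. The following Python code is not the RRD implementation; it only illustrates the idea of ignoring command-initiated reboots and flagging a run of unexpected ones inside a time window. The reason strings, the threshold, and the window length are all assumptions, because this guide states only that the period is switch dependent.

```python
# Reboots initiated by these commands are expected and ignored by RRD.
EXPECTED_REBOOT_COMMANDS = {"reboot", "hafailover", "fastboot", "firmwaredownload"}


def is_expected(reason):
    """True if the recorded reason names one of the expected reboot commands."""
    return reason.strip().lower() in EXPECTED_REBOOT_COMMANDS


def rolling_reboot_detected(history, threshold, window_seconds):
    """Check the most recent reboots for a rolling-reboot pattern.

    history: list of (epoch_seconds, reason) tuples, oldest first. An empty
        reason is treated as unexpected (the Unknown/reset case).
    threshold, window_seconds: hypothetical tuning values, not Brocade's.
    """
    if threshold < 1 or len(history) < threshold:
        return False
    recent = history[-threshold:]
    if any(is_expected(reason) for _, reason in recent):
        return False
    # All recent reboots are unexpected; check that they fall in the window.
    return recent[-1][0] - recent[0][0] <= window_seconds


# Illustrative history, not taken from a real switch.
history = [(0, "Reset"), (60, "Software Fault:Kernel Panic"), (120, "")]
print(rolling_reboot_detected(history, threshold=3, window_seconds=300))
```

A single expected reboot among the most recent entries breaks the run, which mirrors the statement that not every reboot activates the feature.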