High Availability for the Cisco Catalyst 6500 Series Switches
Overview
Cisco Catalyst® 6500 Series multilayer switches have become an essential component of a sound network design in today’s enterprise and service provider environments. Given this critical role, the Cisco Catalyst 6500 Series must provide a reliable switching platform while offering high performance and intelligent network services. Its high availability is such that it can even maintain an IP phone call during a supervisor engine failover. This paper discusses how the Cisco Catalyst 6500 Series provides high system availability through hardware and software redundancy features, and focuses specifically on the following three areas:
• Fabric redundancy of the Switch Fabric Module (SFM)
• Supervisor engine redundancy with the Cisco Catalyst Operating System (Catalyst OS) High Availability feature, which includes the stateful protocol redundancy and image versioning functions
• Multilayer Switch Feature Card (MSFC) Cisco IOS® Software redundancy features: Dual Router Mode (DRM), Configuration-Synchronization (config-sync), and Single Router Mode (SRM)
This paper is based on the hybrid software model for the Cisco Catalyst 6500 Series (Cisco Catalyst OS on the supervisor engine, Cisco IOS Software on the MSFC), not on the Cisco IOS Software model (native Cisco IOS Software). Each feature reference specifically identifies whether it is a Cisco Catalyst OS feature on a supervisor engine or a Cisco IOS Software feature on an MSFC. The Cisco Catalyst OS High Availability feature was first introduced in Cisco Catalyst OS Release 5.4 and is available for both the Cisco Catalyst Supervisor Engine 1A and Supervisor Engine 2. Support for DRM began in Cisco IOS Software Release 12.0(7)XE1. The MSFC config-sync redundancy feature for DRM is supported in Cisco IOS Software Release 12.1(3a)E4 for both the MSFC and MSFC2. The MSFC SRM feature was first supported with Cisco Catalyst OS 6.3.1 and Cisco IOS Software Release 12.1(8)E2 for the MSFC2.
Figure 1: The Cisco Catalyst 6500 Series WS-C6503, WS-C6506, WS-C6509, WS-C6509-NEBS, and WS-C6513
This paper is the second version of the original paper written in September 2000. This version includes some updated sections for more precise understanding and a discussion of SRM.
Although component-level redundancy is very important, a high-availability network design relies on the proper combination of individual system redundancy and overall network redundancy.
Since its introduction, the Cisco Catalyst 6500 Series has been built on a single 32-Gbps bus switching architecture that provides the data path for all packets through the system. The Cisco Catalyst 6500 Series also supports a 256-Gbps crossbar switching fabric, the SFM, which provides higher bandwidth capacity and more than 30 Mpps of forwarding performance. The SFM is supported in the Cisco Catalyst 6506 and Cisco Catalyst 6509 chassis. The SFM2 is essentially the same fabric but is designed to work in the Cisco Catalyst 6506, 6509, and 6513 chassis.
Switching Fabric Failover
The SFM also provides another level of hardware redundancy to the system. The single fabric channel versions of the fabric-enabled line cards provide a connection to both the switching fabric and the existing system bus backplane. This allows the Cisco Catalyst 6500 Series to use the SFM as the primary data path between fabric-enabled line cards. If an SFM fails, the system fails over to the 32-Gbps bus so that packet switching continues (albeit at the bus capacity of 15 Mpps throughput and 32-Gbps bandwidth) and the network remains online. Additionally, a Cisco Catalyst 6500 Series can be configured with dual SFMs (in slots 5 and 6 of a Catalyst 6506 or Catalyst 6509, or in slots 7 and 8 of a Catalyst 6513), which provides a third level of fabric redundancy. In this configuration, a failure of the primary fabric module results in a switchover to the secondary fabric module for continued operation at 30 Mpps. If the secondary fabric module then fails as well, the system can still switch over to the system bus.
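The failover order described above (primary SFM, then a redundant SFM, then the system bus) can be sketched as a small selection function. This is a minimal illustration, not Catalyst OS logic: the function name `select_data_path` and the list-of-health-flags input are hypothetical, and the 30 Mpps figure assumes an all-fabric-enabled system running in compact mode.

```python
from typing import List, Tuple

# Approximate forwarding rates quoted in the text, in Mpps.
FABRIC_MPPS = 30  # via an SFM, assuming an all-fabric (compact mode) system
BUS_MPPS = 15     # via the 32-Gbps system bus


def select_data_path(sfm_health: List[bool]) -> Tuple[str, int]:
    """Return (data path, approximate Mpps) for traffic between fabric-enabled cards.

    sfm_health lists the installed SFMs in priority order (primary first);
    True means the module is healthy. An empty list models a chassis with no SFM.
    """
    if len(sfm_health) > 2:
        raise ValueError("a chassis holds at most two SFMs")
    for index, healthy in enumerate(sfm_health):
        if healthy:
            name = "primary SFM" if index == 0 else "secondary SFM"
            return name, FABRIC_MPPS
    # No healthy fabric remains: fall back to the shared system bus.
    return "32-Gbps bus", BUS_MPPS


print(select_data_path([False, True]))   # ('secondary SFM', 30)
print(select_data_path([False, False]))  # ('32-Gbps bus', 15)
```

Each additional healthy SFM in the list adds one more level before the bus fallback, which matches the "up to three levels" of backplane redundancy in the text.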
Switching Fabric Operation
Different combinations of SFMs, fabric-enabled line cards, and classic line cards in a chassis affect the internal switching
operation, which in turn affects the failover characteristics. This is an important point to understand as fabric-to-fabric or
fabric-to-bus failover scenarios are discussed. When an SFM is installed in a system of only fabric-enabled line cards, the switching operation is called compact mode. In compact mode, only 32-byte compacted headers (not the entire packet) are sent across the bus to the supervisor engine for each forwarding decision. This more efficient operation gives the system an inherent forwarding performance of up to 30 Mpps. The data path for fabric-enabled cards is via the SFM.
If a classic line card is installed in a system with an SFM, the header format on the bus must be compatible with all the line
cards in the system. Because classic line cards do not support compact mode, the fabric-enabled line cards will change their
switching modes to truncated mode. Truncated mode allows the fabric-enabled line card to send packets in a 64-byte
header-only format that the classic line cards can understand. It is very important to note that the truncated mode still uses the
SFM as the data path between fabric-enabled line cards. Although the maximum centralized forwarding performance is 15
Mpps in a system of classic and fabric cards, the switch fabric is still used to provide higher bandwidth to the system. If
fabric-enabled line cards are installed in a system with no SFM, they operate in flow-through mode whether or not classic cards are present. This mode essentially programs the line card to operate in a classic mode, whereby the entire packet is sent across the system bus for a forwarding decision. A system in flow-through mode is capable of switching 15 Mpps, and the data path is via the 32-Gbps bus.
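The three switching modes above depend only on whether an SFM is present and whether any classic line cards are installed. The following sketch encodes those rules for a fabric-enabled line card. The function `switching_mode` and the `CENTRALIZED_MPPS` table are illustrative names, not Catalyst OS structures. The mode strings mirror the values reported by show fabric channel switchmode.

```python
# Centralized forwarding ceilings quoted in the text, in Mpps.
CENTRALIZED_MPPS = {"compact": 30, "truncated": 15, "flow through": 15}


def switching_mode(sfm_installed: bool, classic_cards_present: bool) -> str:
    """Switching mode a fabric-enabled line card uses, following the rules above."""
    if not sfm_installed:
        # No fabric: the entire packet crosses the 32-Gbps bus.
        return "flow through"
    if classic_cards_present:
        # Bus headers must stay readable by classic cards (64-byte headers),
        # but traffic between fabric-enabled cards still uses the SFM.
        return "truncated"
    # Only fabric-enabled cards: 32-byte compacted headers on the bus.
    return "compact"


for sfm, classic in [(True, False), (True, True), (False, True)]:
    mode = switching_mode(sfm, classic)
    print(sfm, classic, mode, CENTRALIZED_MPPS[mode])
```

Truncated and flow-through mode share the same 15 Mpps ceiling. However, truncated mode still carries data between fabric-enabled cards over the SFM and therefore keeps the fabric's higher bandwidth.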
Switching mode changes occur automatically, depending on the hardware installed. No specific configuration of the SFM is necessary for typical operation. The current switching mode of the switch fabric module can be monitored through the Catalyst OS command-line interface (CLI) using the show fabric channel switchmode command. Example 1 shows a completely fabric-enabled system (all compact mode), and Example 2 shows classic and fabric-enabled line cards in an SFM system (flow-through and truncated mode).
Example 1: Fabric-Enabled System
The following output is from a configuration with dual supervisor engines, one SFM, and a fabric-enabled line card in slot 3.
Sup2-A> (enable) show fabric channel switchmode
Module Num Fab Chan Fab Chan Switch Mode Channel Status
CLI output description for show fabric channel switchmode:
Num Fab Chan: The number of fabric channels that the module is associated with.
Fab Chan: The first number is the fabric channel number that the module is associated with. The second number is the fabric channel number that the SFM is associated with.
Switch Mode: Possible output is “flow through,” “truncated,” or “compact.” Switch mode applies only to line cards with fabric and bus connections.
Channel Status: Possible output is “ok,” “sync error,” “CRC error,” “heartbeat error,” “buffer error,” “timeout error,” or “unknown.” Channel status applies only to line cards with fabric and bus connections.
The following output is from a configuration with dual supervisor engines, one SFM, one classic line card in slot 3, and two
fabric-enabled line cards in slots 7 and 9.
Sup-A> (enable) show fabric channel switchmode
Module Num Fab Chan Fab Chan Switch Mode Channel Status
Automatic switching mode changes allow classic or fabric-enabled cards to be accepted into a system with no manual configuration change. As previously stated, installing classic line cards into a fabric-enabled system involves a tradeoff between performance and interoperability. Because many network environments place a higher priority on performance, a fabric-enabled system can be configured to reject classic cards (that is, not to support flow-through mode). When the set system crossbar-fallback none command is issued, the system does not start classic line cards installed in the chassis and therefore runs in compact switching mode (30 Mpps) only.
Sup-A> (enable) set system crossbar-fallback none
The default crossbar-fallback setting is bus mode. To determine the current system state, use the show system crossbar-fallback command.
Sup-A> (enable) show system crossbar-fallback
Cross-fallback: bus-mode
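The effect of the crossbar-fallback setting on which line cards come online can be modeled as a simple filter. This sketch assumes an SFM is installed. The function `cards_started` and the slot-to-card-type dictionary are hypothetical. The setting strings "bus-mode" and "none" follow the CLI shown above.

```python
from typing import Dict, List


def cards_started(cards: Dict[int, str], fallback: str = "bus-mode") -> List[int]:
    """Return the slots whose line cards come online.

    cards maps slot number to "classic" or "fabric"; fallback mirrors the
    crossbar-fallback setting, either "bus-mode" (the default) or "none".
    """
    if fallback not in ("bus-mode", "none"):
        raise ValueError(f"unknown crossbar-fallback setting: {fallback!r}")
    started = []
    for slot, kind in sorted(cards.items()):
        if kind == "classic" and fallback == "none":
            continue  # left offline so the fabric stays in compact mode
        started.append(slot)
    return started


# Slot layout from Example 2: a classic card in slot 3, fabric cards in slots 7 and 9.
chassis = {3: "classic", 7: "fabric", 9: "fabric"}
print(cards_started(chassis))          # [3, 7, 9]
print(cards_started(chassis, "none"))  # [7, 9]
```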
In summary, the SFM can be redundantly configured in a chassis to provide fabric-to-fabric and fabric-to-bus failover. A system configured with dual SFMs can use the standby SFM for failover. Additionally, single-SFM systems with fabric-enabled line cards can fail over to the 32-Gbps bus for continuous operation. In both of these scenarios, recovery and return to normal operation occur in less than three seconds. This recovery time includes the switching mode change and the synchronization process that must take place between each line card, the supervisor engine, and the SFM fabric channels. The capability to configure redundant SFMs provides up to three levels of backplane redundancy, helping to enable continuous operation with minimal impact on network availability in the event of a hardware failure.
As previously mentioned, the High Availability feature on the Cisco Catalyst 6500 Series provides low-impact, stateful switchover between redundant supervisor engines. This feature was first available in Cisco Catalyst OS Software Release 5.4.
Supervisor Engine Switchover
Dual supervisor engines provide hardware redundancy for the forwarding intelligence of the Cisco Catalyst 6500 Series. The Cisco Catalyst 6500 Series can support up to two supervisor engines, in slots 1 and 2 only. One is the active supervisor engine and the other is the standby supervisor engine. The active supervisor engine is the first one to come online; this can be confirmed by the “Active” LED on the supervisor engine or by entering the show module command from the console. Both supervisor engines must be the same hardware model. For example, if a Policy Feature Card (PFC) and an MSFC are on a Supervisor Engine 1A in slot 1, then a PFC and an MSFC must also be on a Supervisor Engine 1A in slot 2; likewise, if a Supervisor Engine 2 is in slot 1, a Supervisor Engine 2 must also be in slot 2. Supervisor Engines 1A and 2 can be used in both the Cisco Catalyst 6000 and 6500 Series. If the active supervisor engine is taken offline or fails, the standby supervisor engine takes control of the system.
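The pairing and election rules above can be sketched as two small checks. The `Supervisor` record, the function names, and the online-timestamp input are all hypothetical. Treating "same hardware model" as an equal engine model plus the same PFC and MSFC daughter cards is a simplification that, for example, does not distinguish MSFC from MSFC2.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Supervisor:
    model: str              # e.g. "Supervisor Engine 1A" or "Supervisor Engine 2"
    has_pfc: bool = False   # Policy Feature Card daughter card present
    has_msfc: bool = False  # MSFC daughter card present


def valid_redundant_pair(slot1: Supervisor, slot2: Supervisor) -> bool:
    """Both engines must be the same model with the same daughter cards."""
    return slot1 == slot2


def elect_active(online_at: Dict[int, float]) -> int:
    """The supervisor (slot 1 or 2) that comes online first becomes active."""
    return min(online_at, key=online_at.get)


sup1a = Supervisor("Supervisor Engine 1A", has_pfc=True, has_msfc=True)
print(valid_redundant_pair(sup1a, Supervisor("Supervisor Engine 1A", True, True)))  # True
print(valid_redundant_pair(sup1a, Supervisor("Supervisor Engine 2", True, True)))   # False
print(elect_active({1: 12.4, 2: 11.9}))  # 2
```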
The two supervisor engines in a redundant supervisor configuration have different responsibilities. The active supervisor engine is responsible for controlling the system bus and all line cards. All protocols run on the active supervisor engine, and it performs all packet forwarding. The standby supervisor engine does not communicate with the line cards. It receives packets from the network and populates its forwarding tables with this information but does not participate in any packet forwarding. The relevant protocols on the system are initialized, but not active, on the standby supervisor engine. The Cisco Catalyst 6500 Series supervisor engines are hot swappable, and the standby supervisor engine can be installed in an active system without affecting network operation. Also important to note is that redundant supervisor engines do not perform load sharing. The active supervisor engine provides all of the packet forwarding intelligence for the system (N+1 redundancy). If the active supervisor engine fails, the standby supervisor engine can carry the same system load.
The standby supervisor engine polls the active supervisor engine via the Ethernet out-of-band channel (EOBC) every 5–10
milliseconds to monitor the online status of the active supervisor engine. The active supervisor engine may go offline for a
variety of reasons such as hardware failures, system overload conditions, memory corruption issues, removal from chassis, or
being reset by the operator. The standby supervisor engine detects this type of failure and becomes the new active supervisor
engine. The Cisco Catalyst OS software on the supervisor engine is responsible for restoring the protocols, line cards, and
forwarding engines to normal operation. This restoration takes place via a fast switchover or a high-availability switchover.
Supervisor Fast Switchover
Because the Cisco Catalyst OS High Availability feature is disabled by default, the switchover mechanism used instead is referred to as Fast Switchover. Fast Switchover is the predecessor to the High Availability feature and, as such, is the supervisor switchover mechanism in place when high availability is disabled or not supported in the software version. This feature reduces the switchover time by skipping some events that would typically take place when a supervisor engine fails. Specifically, the fast switchover mechanism allows each line card to skip its software download and a portion of the diagnostics that are normally part of system reinitialization. The switchover still includes restarting all protocols at Layer 2 and above, as well as resetting all ports. With default settings, a fast switchover takes approximately 28 seconds plus the time the protocols need to restart. For example, a switch with the default Spanning Tree Protocol timer values took approximately 58 seconds after the fast switchover to begin forwarding traffic again. However, the time to begin forwarding traffic after a fast switchover can be reduced by tuning the switch from its default settings. By enabling PortFast, disabling port channeling (PAgP), and turning trunking off on ports to which workstations are directly attached, the fast switchover time can be reduced to approximately 10 seconds. In a live network environment, these switchover times represent a major disruption to network operations.
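The paper gives about 28 seconds for the fast switchover itself and about 58 seconds until forwarding resumes with default spanning-tree timers. One plausible reading of the 30-second gap is that, after its ports are reset, a non-PortFast port passes through listening and learning for one default IEEE 802.1D forward delay (15 seconds) each. The arithmetic below reflects that inference only; the paper does not state it. The sketch does not attempt to model the tuned 10-second figure.

```python
FAST_SWITCHOVER_BASE_S = 28  # from the text: fast switchover with default settings
STP_FORWARD_DELAY_S = 15     # IEEE 802.1D default forward delay (assumed contributor)


def seconds_until_forwarding(base_s: int, forward_delay_s: int) -> int:
    """Fast switchover time plus spanning-tree reconvergence on a non-PortFast port.

    After the ports reset, such a port spends one forward-delay period in the
    listening state and another in the learning state before it forwards.
    """
    return base_s + 2 * forward_delay_s


print(seconds_until_forwarding(FAST_SWITCHOVER_BASE_S, STP_FORWARD_DELAY_S))  # 58
```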
Supervisor High Availability Feature
The High Availability software feature of Cisco Catalyst OS further enhances the Cisco Catalyst 6500 Series hardware
redundancy by also providing protocol redundancy. This feature includes stateful protocol redundancy and image versioning.
The High Availability feature must be enabled via the CLI for these features to operate.
Sup-A> (enable) set system highavailability enable
System high availability enabled.
As a general practice with redundant supervisors, it is recommended that the High Availability feature be enabled for normal
operation.
Supervisor Stateful Protocol Redundancy
A stateful supervisor switchover reduces the switchover time from the active to the standby supervisor engine to less than three seconds. This reduced downtime is achieved by synchronizing many of the Layer 2, Layer 3, and Layer 4 protocols¹ between the active and standby supervisor engines, which is called maintaining protocol state.
For stateful protocol redundancy between dual supervisor engines, a protocol state database is maintained on each supervisor engine for all protocols and features requiring high-availability support. Most of these protocols run only on the active supervisor engine. In the event of a high-availability switchover, the new active supervisor engine can start the protocols from the updated database state rather than from the initialization state. This is how a redundant system maintains stateful protocol redundancy and minimal network downtime when the active supervisor engine goes offline.
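The role of the protocol state database can be illustrated with a toy model. Only state for HA-supported protocols is copied to the standby engine. After a switchover, those protocols resume from the copied state, and every other protocol cold-starts. All names and state fields here are hypothetical. Spanning tree is treated as HA-supported purely for illustration, and GMRP follows the text's example of a protocol that restarts from the initialization state. The paper describes a database kept continuously updated, which this sketch reduces to a one-time copy.

```python
import copy
from typing import Dict, Iterable, Set

State = Dict[str, object]


def sync_to_standby(active_db: Dict[str, State], ha_supported: Set[str]) -> Dict[str, State]:
    """Copy the current state of HA-supported protocols into the standby's database."""
    return {name: copy.deepcopy(state)
            for name, state in active_db.items() if name in ha_supported}


def start_after_switchover(standby_db: Dict[str, State],
                           protocols: Iterable[str]) -> Dict[str, State]:
    """New active engine: resume from synced state where available, else cold-start."""
    return {name: standby_db.get(name, {"phase": "init"}) for name in protocols}


# Illustrative state only; real Catalyst OS protocol databases are not modeled.
active = {"spanning_tree": {"phase": "running", "root_port": "3/1"},
          "gmrp": {"phase": "running", "groups": 4}}
standby = sync_to_standby(active, ha_supported={"spanning_tree"})
print(start_after_switchover(standby, ["spanning_tree", "gmrp"]))
# {'spanning_tree': {'phase': 'running', 'root_port': '3/1'}, 'gmrp': {'phase': 'init'}}
```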
Each feature falls into one of three high-availability categories:
• High Availability Supported Feature: High availability is fully supported. The state of the feature is preserved between the active and standby supervisor engines in the protocol database.
• High Availability Compatible Feature: High availability is not supported for these features. The protocol database for these features is not synchronized between supervisor engines, but the feature can be used while the High Availability feature is enabled. For example, if GARP Multicast Registration Protocol (GMRP) and high availability were both enabled and a high-availability supervisor engine failover took place, GMRP would be restarted from the initialization state (non-stateful). Stateful protocol redundancy remains in place for the supported features when a compatible feature is enabled.
• High Availability Incompatible Feature: High availability is not supported. The protocol database for these features is not synchronized between supervisor engines. These features should not be enabled when the High Availability feature is enabled, because incorrect behavior may result.
Important: Do not use these features if a high-availability system is required.
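The three categories can be applied as a simple pre-flight check on a configuration. The enum, the function `check_ha_config`, and the feature name "feature_x" are hypothetical; "feature_x" stands in for an incompatible feature, since none is named in this section. GMRP is classified as compatible, following the text.

```python
from enum import Enum
from typing import Dict, List


class HASupport(Enum):
    SUPPORTED = "supported"        # state synchronized; resumes after switchover
    COMPATIBLE = "compatible"      # allowed with HA, but restarts from init state
    INCOMPATIBLE = "incompatible"  # must not be enabled together with HA


def check_ha_config(features: Dict[str, HASupport], ha_enabled: bool) -> List[str]:
    """Return warnings for the enabled features, per the three categories above."""
    if not ha_enabled:
        return []
    warnings = []
    for name, support in sorted(features.items()):
        if support is HASupport.INCOMPATIBLE:
            warnings.append(f"{name}: incompatible with high availability; disable it")
        elif support is HASupport.COMPATIBLE:
            warnings.append(f"{name}: restarts from the initialization state after failover")
    return warnings


# "feature_x" is a hypothetical incompatible feature; GMRP is the text's compatible example.
enabled = {"gmrp": HASupport.COMPATIBLE, "feature_x": HASupport.INCOMPATIBLE}
for line in check_ha_config(enabled, ha_enabled=True):
    print(line)
```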
1. Layer 4 protocols include the Layer 4 information in extended IP access lists.