Extreme Networks BlackDiamond 6800, MSM-3, Hitless Failover, Hitless User Manual

Hitless Failover and Hitless Upgrade User Guide
This guide describes hitless failover and hitless upgrade, including:
Causes and Behaviors of MSM Failover
Summary of Supported Features
Overview of Hitless Failover
Configuring ESRP for Hitless Failover
Overview of Hitless Upgrade
Performing a Hitless Upgrade
T-sync is a term used to describe the hitless failover and hitless upgrade features available on the BlackDiamond® Management Switch Module 3 (MSM-3). In simple terms, hitless failover transfers switch management control from the master MSM-3 to the slave MSM-3 without causing traffic to be dropped. Hitless upgrade allows an ExtremeWare® software upgrade on a BlackDiamond 6800 series chassis without taking it out of service or losing traffic.
To configure hitless failover or hitless upgrade, you must install MSM-3 modules in your BlackDiamond chassis; MSM64i modules do not support hitless failover or hitless upgrade.
If you enable T-sync and normally use scripts to configure your switch, Extreme Networks recommends using the download configuration incremental command instead.
NOTE
To use the T-sync features available on the MSM-3 modules, you must install and run ExtremeWare 7.1.1 or later and BootROM 8.1 or later.
Causes and Behaviors of MSM Failover
This section describes the events that cause an MSM failover and the behavior of the system after failover occurs.
The following events cause an MSM failover:
Operator command
Part Number: 121071-00 Rev 01
Software exception
Watchdog timeout
Keepalive failure
Diagnostic failure
Hot-removal of the master MSM
Hard-reset of the master MSM
NOTE
Operator command and software exception support hitless failover.
Operator Command and Software Exception. Of the listed events, only operator command and software exception result in a hitless failover. The remaining sections of this guide describe T-sync, including:
Supported features
How to configure the T-sync features
The behavior surrounding hitless failover and hitless upgrade
Watchdog Timeout and Keepalive Failure. Both the watchdog timeout and the keepalive failure are long-duration events, so they are not hitless. If one of these events occurs:
All saved operational state information is discarded
The failed master is hard reset
The slave uses its own flash configuration file
Diagnostic Failure, Hot-removal, or Hard-reset of the Master MSM. If the master MSM-3 experiences a diagnostic failure or you hot-remove it, a “partial” hitless failover is performed, and some traffic flows do not experience traffic hits. The switch cannot perform a completely hitless failover because it has lost hardware that it uses during normal operation.
To understand how traffic is affected when MSM-3 hardware is lost, it helps to know how the switch fabric works. Each MSM-3 has switching logic that provides bandwidth to each I/O module. When two MSM-3s are present, both provide bandwidth, so twice the bandwidth is available. For each traffic flow that requires inter-module data movement, the I/O module chooses an MSM-3 to switch the data for that flow. When an MSM-3 is lost, the remaining MSM-3 eventually instructs the I/O module that all inter-module traffic is to use the switching logic of the remaining MSM-3. In the time between the loss of an MSM-3 and the reprogramming of the I/O module, traffic destined for the lost MSM-3 switching logic is dropped.
The I/O module also switches some traffic flows directly between its own ports without MSM-3 involvement.
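The fabric behavior described above can be illustrated with a small model. The following Python sketch is purely illustrative and is not ExtremeWare code: the Flow class, the "A" and "B" labels for the two MSM-3s, and the function names are hypothetical. The guide does not say how an I/O module picks an MSM-3 for a flow, so each flow's choice is simply supplied as input.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    name: str
    src_module: int  # I/O module slot where the flow enters (hypothetical field)
    dst_module: int  # I/O module slot where the flow leaves (hypothetical field)
    msm: str         # MSM-3 chosen to switch the flow; unused for intra-module flows


def is_inter_module(flow):
    """A flow needs MSM-3 switching logic only if it crosses I/O modules."""
    return flow.src_module != flow.dst_module


def dropped_before_reprogramming(flows, lost_msm):
    """Flows that lose traffic between the MSM-3 loss and I/O module reprogramming."""
    return [f.name for f in flows if is_inter_module(f) and f.msm == lost_msm]


def reprogram(flows, remaining_msm):
    """The remaining MSM-3 tells the I/O modules to send all inter-module traffic to it."""
    return [replace(f, msm=remaining_msm) if is_inter_module(f) else f for f in flows]


flows = [
    Flow("f1", src_module=1, dst_module=2, msm="A"),
    Flow("f2", src_module=1, dst_module=3, msm="B"),
    Flow("f3", src_module=4, dst_module=4, msm="A"),  # switched inside one I/O module
]

hit = dropped_before_reprogramming(flows, lost_msm="A")
after = reprogram(flows, remaining_msm="B")
```

In this model only f1 is affected when MSM-3 "A" is lost: f2 already uses the surviving MSM-3, and f3 never leaves its I/O module.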
If you hot-remove the master MSM-3, only half of the switch fabric remains operational. The slave becomes the master and reprograms each I/O module to send all traffic through its own switch fabric logic. In the time between the failure and the reprogramming of the I/O module, traffic destined for the removed MSM-3’s switching logic is lost. After the new master recovers, it reprograms the I/O module so that all traffic uses the available MSM-3 switching logic.
If you hard-reset the master MSM-3 (using the recessed reset button on the MSM-3), all of the master’s switch programming is lost. As a result, traffic that the I/O module forwards to the master is also lost.
After a failover occurs, the new master reprograms the “reset” MSM-3’s switch fabric and the switching logic of both MSM-3s is available again. In this case, the “Cause of last MSM failover” displayed by the show msm-failover command indicates “removal,” and a “partial” hitless failover has occurred.
A “partial” hitless failover preserves:
Data flows in the hardware and software, layer 2 protocol states, configurations, etc.
All of the software states and the hardware states that are not interrupted by the diagnostic failure, hot-removal, or hard-reset.
After a failover caused by hot-removal or diagnostic failure, the I/O modules are reprogrammed to use only the switching logic of the remaining MSM-3. After a failover caused by a hard-reset of the master MSM-3, the reset MSM-3’s switch fabric is reprogrammed and placed into full operation. Thus, a data hit of several seconds occurs for flows that were directed to the failed MSM-3. For flows that were directed to the currently active MSM-3, or for intra-module flows (which an I/O module switches between its own ports), there is no hit.
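The event-by-event behavior described in this section can be summarized as a lookup table. The Python sketch below is only a reading aid: the FailoverKind enum, the FAILOVER_KIND table, and the classify function are hypothetical names, and the mapping is transcribed from the descriptions above, not taken from any ExtremeWare interface.

```python
from enum import Enum


class FailoverKind(Enum):
    HITLESS = "hitless"
    PARTIAL = "partial hitless"
    NOT_HITLESS = "not hitless"


# Transcribed from the event descriptions in this section.
FAILOVER_KIND = {
    "operator command": FailoverKind.HITLESS,
    "software exception": FailoverKind.HITLESS,
    "watchdog timeout": FailoverKind.NOT_HITLESS,
    "keepalive failure": FailoverKind.NOT_HITLESS,
    "diagnostic failure": FailoverKind.PARTIAL,
    "hot-removal": FailoverKind.PARTIAL,
    "hard-reset": FailoverKind.PARTIAL,
}


def classify(event):
    """Return the kind of failover an event causes; raises KeyError if unknown."""
    return FAILOVER_KIND[event.strip().lower()]
```

For example, classify("Software exception") returns FailoverKind.HITLESS, while classify("Watchdog timeout") returns FailoverKind.NOT_HITLESS.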
NOTE
Hitless upgrade of configuration is not supported on MSM-3.
Summary of Supported Features
This section describes the features supported by T-sync. If the information in the release notes differs from the information in this guide, follow the release notes.
Preserves unsaved configurations across a failover
Load sharing
Learned MAC address
ARP
STP
EAPSv1
IP FDB entries
Access lists
ESRP
SNMP trap failover
Configuration via the web, CLI, and SNMP
NOTE
T-sync does not support EAPSv2.
Overview of Hitless Failover
When you install two MSM-3 modules in a BlackDiamond chassis, one MSM-3 assumes the role of master and the other assumes the role of slave. The master executes the switch’s management function,
and the slave acts in a standby role. Hitless failover is a mechanism to transfer switch management control from the master to the slave.
When there is a software exception in the master, the slave may be configured to take over as the master. Without T-sync, a software exception results in a traffic “hit” because the hardware is reinitialized and all FDB information is lost. The modules require seconds to complete the initialization, but it may take minutes to relearn the forwarding information from the network. With T-sync, it is possible for this transition to occur without interrupting existing unicast traffic flows.
During failover, the master passes control of all system management functions to the slave. In addition, hitless failover preserves layer 2 data and layer 3 unicast flows for recently routed packets. When a hitless failover event occurs, the failover timer begins and all previously established traffic flows continue to function without packet loss. Hitless failover also preserves the:
Master’s active configuration (both saved and unsaved)
Forwarding and resolution database entries (layer 2, layer 3, and ARP)
Loop redundancy and protocol states (STP, EAPS, ESRP, and others)
Load shared ports
Access control lists
NOTE
Hitless failover does not preserve the full route table, routing protocol databases for OSPF, BGP, RIP, etc., or ICMP traffic.
Hitless Failover Concepts
T-sync preserves the current active configuration across a hitless failover. When you first boot up your BlackDiamond switch, it uses the master MSM-3 configuration. During the initialization of the slave, the master’s active configuration is relayed to the slave. As you make configuration changes to the master, the master relays those individual changes to the slave. When a failover occurs, the slave continues to use the master’s configuration. Regardless of the number of failovers, the active configuration remains in effect provided the slave can process it.
NOTE
It is important to save any switch configuration changes that you make. Configuration changes made in real time must be saved on the master MSM-3 to guarantee hitless failover and hitless upgrade operation. Failure to save the configuration may result in an unstable environment after the hitless failover or upgrade operation is complete.
If a hitless failover occurs before you can save the changes, the changes are still in effect on the new master MSM-3. An asterisk appears in front of the command-line prompt if unsaved configuration changes are present after a hitless failover. To save your changes after a hitless failover, use the save command.
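The configuration relay described in this section can be sketched as a toy model. This is an illustration under stated assumptions, not a description of ExtremeWare internals: the Msm and Chassis classes and their methods are hypothetical, configurations are reduced to dictionaries, and save is modeled as affecting only the current master's saved copy.

```python
class Msm:
    def __init__(self, name, flash_config):
        self.name = name
        self.flash = dict(flash_config)   # saved configuration
        self.active = dict(flash_config)  # configuration currently in effect

    def has_unsaved_changes(self):
        """True when the prompt would show an asterisk."""
        return self.active != self.flash

    def save(self):
        self.flash = dict(self.active)


class Chassis:
    def __init__(self, master, slave):
        self.master, self.slave = master, slave
        # During slave initialization, the master's active configuration is relayed.
        self.slave.active = dict(self.master.active)

    def configure(self, key, value):
        """Apply a change on the master and relay that individual change to the slave."""
        self.master.active[key] = value
        self.slave.active[key] = value

    def hitless_failover(self):
        """The slave takes over and keeps using the relayed active configuration."""
        self.master, self.slave = self.slave, self.master


chassis = Chassis(Msm("A", {"vlan": "default"}), Msm("B", {"vlan": "default"}))
chassis.configure("stp", "enabled")  # change made but not saved
chassis.hitless_failover()
```

After the failover in this model, the new master still has the unsaved change in effect and reports unsaved changes until save is called.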
NOTE
If you have a BlackDiamond 6816 switch populated with four MSM-3 modules, the MSMs in slots C and D provide extra switch bandwidth; they do not participate in switch management functions.
Configuring Hitless Failover
You can configure failover so that one of the following occurs:
All links are forced to be in a down state (nothing is preserved)
Only the configuration is preserved
Only the link up/down state is preserved
The configuration and link up/down states are preserved
The configuration, link up/down states, and layer 2 FDB and states (STP, EAPS, and ESRP) are preserved
The configuration, link up/down states, layer 2 FDB and states, and the layer 3 FDB and ARP table are preserved
Hitless failover operation utilizes the last two options. To enable hitless failover, see the following section, “Enabling Hitless Failover.”
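As a compact restatement of the list above, the six outcomes can be written as the set of items each one preserves. The names OPTIONS, HITLESS_OPTIONS, and is_hitless in this Python sketch are hypothetical labels chosen for illustration; they do not correspond to ExtremeWare settings.

```python
# The six failover outcomes, in the order listed above, as sets of preserved items.
OPTIONS = [
    frozenset(),                                              # all links forced down
    frozenset({"configuration"}),
    frozenset({"link state"}),
    frozenset({"configuration", "link state"}),
    frozenset({"configuration", "link state", "layer 2"}),
    frozenset({"configuration", "link state", "layer 2", "layer 3"}),
]

# Hitless failover operation uses the last two outcomes.
HITLESS_OPTIONS = OPTIONS[-2:]


def is_hitless(preserved):
    """True if the preserved items match one of the two hitless outcomes."""
    return frozenset(preserved) in HITLESS_OPTIONS
```

For instance, is_hitless({"configuration", "link state", "layer 2"}) is True, while is_hitless({"configuration", "link state"}) is False.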
You can also configure ESRP hitless failover behavior. See “Configuring ESRP for Hitless Failover” for more information.
To use the hitless failover feature, you must have a BlackDiamond 6800 series chassis installed with MSM-3 modules running ExtremeWare 7.1.1 or later and BootROM 8.1 or later.
Enabling Hitless Failover
To enable hitless failover, you need to:
Configure the system recovery level to automatically reboot after a software exception
Enable the slave MSM-3 to “inherit” its configuration from the master MSM-3
Configure the external ports to remain active when a failover occurs
Enable the preservation of layer 2 and/or layer 3 state in the slave MSM-3
NOTE
If you have an active Telnet session and initiate a hitless failover on that switch, the session disconnects when failover occurs.
Configuring the System Recovery Level
You must configure the slave MSM-3 to take over control of the switch if there is a software exception on the master. To configure the slave to assume the role of master, use the following command:
configure sys-recovery-level [all | critical] msm-failover
where the following is true:
all: Configures ExtremeWare to log an error into the syslog and automatically reboot the system after any task exception
critical: Configures ExtremeWare to log an error into the syslog and automatically reboot the system after a critical task exception
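If you generate switch commands from scripts, the syntax above expands to exactly two concrete commands. The Python helper below is a hypothetical convenience, not part of ExtremeWare; it relies only on the syntax shown above and rejects any level other than all or critical.

```python
VALID_LEVELS = ("all", "critical")


def sys_recovery_command(level):
    """Build the sys-recovery-level command for MSM failover from the documented syntax."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    return f"configure sys-recovery-level {level} msm-failover"


commands = [sys_recovery_command(level) for level in VALID_LEVELS]
```

The resulting list contains "configure sys-recovery-level all msm-failover" and "configure sys-recovery-level critical msm-failover".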