Allied Telesis EPSR AT-8948 User Manual

AlliedWare™ OS

How To | Configure EPSR (Ethernet Protection Switching Ring) to Protect a Ring from Loops


Putting a ring of Ethernet switches at the core of a network is a simple way to increase the network’s resilience, because such a network is no longer susceptible to a single point of failure. However, the ring must be protected from Layer 2 loops. Traditionally, STP-based technologies are used to protect rings, but they are relatively slow to recover from link failure. This can create problems for applications that have strict loss requirements, such as voice and video traffic, where the speed of recovery is highly significant.

This How To Note describes a fast alternative to STP: Ethernet Protection Switching Ring (EPSR). EPSR enables rings to recover rapidly from link or node failures, within as little as 50 ms, depending on port type and configuration. This is much faster than STP at 30 seconds or even RSTP at 1 to 3 seconds.

What information will you find in this document?

This How To Note begins by describing EPSR in the following sections:

"How EPSR Works"

"Establishing a Ring"

"Detecting a Fault"

"Recovering from a Fault"

"Restoring Normal Operation"

Next it gives step-by-step configuration details and examples in the following sections:

"How To Configure EPSR"

"Example 1: A Basic Ring"

"Example 2: A Double Ring"

"Example 3: EPSR and RSTP"

"Example 4: EPSR with Nested VLANs"

"Example 5: EPSR with management stacking"

"Example 6: EPSR with an iMAP"

Next, it discusses important implementation details in the following sections:

"Classifiers and Hardware Filters"

"Ports and Recovery Times"

"IGMP Snooping and Recovery Times"

"Health Message Priority"

Finally, it ends with troubleshooting information in the following sections:

"EPSR State and Settings"

"SNMP Traps"

"Counters"

"Debugging"

C613-16092-00 REV D

Which products and software versions does it apply to?

This How To Note applies to the following Allied Telesis switches:

AT-8948, x900-48FE, x900-48FE-N, AT-9924T, AT-9924SP, and AT-9924T/4SP switches, running software version 2.8.1 or later

AT-9924Ts, x900-24XT, and x900-24XT-N switches running software version 3.1.1 or later

EPSR is also available on the following Allied Telesis switches, running the AlliedWare Plus OS software version 5.2.1 or later:

SwitchBlade x908

x900 series

For information about using the AlliedWare Plus OS, see the AlliedWare Plus Note, How To Configure EPSR (Ethernet Protection Switching Ring) to Protect a Ring from Loops. This Note is available

The implementation on the above switches is also compatible with EPSR on Allied Telesis’ Multiservice Access Platforms (iMAPs).

Page 2 | AlliedWare™ OS How To Note: EPSR

How EPSR Works

EPSR operates on physical rings of switches (note, not on meshed networks). When all nodes and links in the ring are up, EPSR prevents a loop by blocking data transmission across one port. When a node or link fails, EPSR detects the failure rapidly and responds by unblocking the blocked port so that data can flow around the ring.

In EPSR, each ring of switches forms an EPSR domain. One of the domain’s switches is the master node and the others are transit nodes. Each node connects to the ring via two ports.

One or more data VLANs send data around the ring, and a control VLAN sends EPSR messages. A physical ring can have more than one EPSR domain, but each domain operates as a separate logical group of VLANs and has its own control VLAN and master node.

On the master node, one port is the primary port and the other is the secondary port. When all the nodes in the ring are up, EPSR prevents loops by blocking the data VLAN on the secondary port.

The master node does not need to block any port on the control VLAN because loops never form on the control VLAN. This is because the master node never forwards any EPSR messages that it receives.

The following diagram shows a basic ring with all the switches in the ring up.


[Figure: a basic EPSR ring with all switches up. Labels: End User Ports on each switch; Primary Port and Secondary Port on the master node; Control VLAN is forwarding on both master node ring ports; Data VLAN is forwarding on the primary port; Data VLAN is blocked on the secondary port. Legend: Control VLAN, Data VLAN_1, Data VLAN_2.]

EPSR Components

EPSR domain:

A protection scheme for an Ethernet ring that consists of one or more data VLANs and a control VLAN.

Master node:

The controlling node for a domain, responsible for polling the ring state, collecting error messages, and controlling the flow of traffic in the domain.

Transit node:

Other nodes in the domain.

Ring port:

A port that connects the node to the ring. On the master node, each ring port is either the primary port or the secondary port. On transit nodes, ring ports do not have roles.

Primary port:

A ring port on the master node. This port determines the direction of the traffic flow, and is always operational.

Secondary port:

A second ring port on the master node. This port remains active, but blocks all protected VLANs from operating unless the ring fails. Similar to the blocking port in an STP/RSTP instance.

Control VLAN:

The VLAN over which all control messages are sent and received. EPSR never blocks this VLAN.


Data VLAN:

A VLAN that needs to be protected from loops. Each EPSR domain has one or more data VLANs.

Establishing a Ring

Once you have configured EPSR on the switches, the following steps complete the EPSR ring:

1. The master node creates an EPSR Health message and sends it out the primary port. This increments the master node’s Transmit: Health counter in the show epsr count command.

2. The first transit node receives the Health message on one of its two ring ports and, using a hardware filter, sends the message out its other ring port.

Note that transit nodes never generate Health messages, only receive them and forward them with their switching hardware. This does not increment the transit node’s Transmit: Health counter. However, it does increment the Transmit counter in the show switch port command.

The hardware filter also copies the Health message to the CPU. This increments the transit node’s Receive: Health counter. The CPU processes this message as required by the state machines, but does not send the message anywhere because the switching hardware has already done this.

3. The Health message continues around the rest of the transit nodes, being copied to the CPU and forwarded in the switching hardware.

4. The master node eventually receives the Health message on its secondary port. The master node's hardware filter copies the packet to the CPU (which increments the master node’s Receive: Health counter). Because the master received the Health message on its secondary port, it knows that all links and nodes in the ring are up.

When the master node receives the Health message back on its secondary port, it resets the Failover timer. If the Failover timer expires before the master node receives the Health message back, it concludes that the ring must be broken.

Note that the master node does not send that particular Health message out again. If it did, the packet would be continuously flooded around the ring. Instead, the master node generates a new Health message when the Hello timer expires.
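The Health message circuit and the Failover timer can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not the switch implementation: the Node class, the function names, and the timer values (a Hello time of 1 second and a Failover time of 2 seconds) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rx_health: int = 0  # models the "Receive: Health" counter
    tx_health: int = 0  # models the "Transmit: Health" counter


def send_health(ring, broken_links=frozenset()):
    """Send one Health message out the master's primary port.

    ring[0] is the master node. The message visits ring[1], ring[2], ...
    and returns to ring[0] on its secondary port. An index i in
    broken_links means the link from ring[i] to the next node is down.
    Returns True if the message arrives back at the master.
    """
    ring[0].tx_health += 1  # only the master generates Health messages
    for i in range(len(ring)):
        if i in broken_links:
            return False  # message lost; the Failover timer keeps running
        receiver = ring[(i + 1) % len(ring)]
        receiver.rx_health += 1  # hardware filter copies the message to the CPU
        # Transit nodes forward in hardware, so their tx_health is unchanged.
    return True


def ring_state(ring, broken_links=frozenset(), hello_time=1.0, failover_time=2.0):
    """Poll until the (assumed) Failover timer would expire."""
    elapsed = 0.0
    while elapsed < failover_time:
        if send_health(ring, broken_links):
            return "Complete"  # message came back, Failover timer is reset
        elapsed += hello_time  # wait for the Hello timer, then send a new message
    return "Failed"
```

For example, with ring = [Node("master"), Node("t1"), Node("t2"), Node("t3")], calling ring_state(ring) reports a complete ring, while ring_state(ring, {2}) models a break between t2 and t3 and reports a failed ring once the assumed Failover time has elapsed.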

Detecting a Fault

EPSR uses a fault detection scheme that alerts the ring when a break occurs, instead of using a spanning-tree-like calculation to determine the best path. The ring then automatically heals itself by sending traffic over a protected reverse path.

EPSR uses the following two methods to detect when a transit node or a link goes down:

Master node polling fault detection

To check the condition of the ring, the master node regularly sends Health messages out its primary port, as described in "Establishing a Ring". If all links and nodes in the ring are up, the messages arrive back at the master node on its secondary port.

This can be a relatively slow detection method, because it depends on how often the master node sends Health messages.

Note that the master node only ever sends Health messages out its primary port. If its primary port goes down, it does not send Health messages.

Transit node unsolicited fault detection

To speed up fault detection, EPSR transit nodes directly communicate when one of their interfaces goes down. When a transit node detects a fault at one of its interfaces, it immediately sends a Link-Down message over the link that remains up. This notifies the master node that the ring is broken and causes it to respond immediately.

Recovering from a Fault

Fault in a link or a transit node

When the master node detects an outage somewhere in the ring, using either detection method, it restores traffic flow by:

1. declaring the ring to be in a Failed state

2. unblocking its secondary port, which enables data VLAN traffic to pass between its primary and secondary ports

3. flushing its own forwarding database (FDB) for the two ring ports

4. sending an EPSR Ring-Down-Flush-FDB control message to all the transit nodes, via both its primary and secondary ports

Master Node States

Complete:

The state when there are no link or node failures on the ring.

Failed:

The state when there is a link or node failure on the ring. This state indicates that the master node received a Link-Down message or that the failover timer expired before the master node’s secondary port received a Health message.

Transit Node States

Idle:

The state when EPSR is first configured, before the master node determines that all links in the ring are up. In this state, both ports on the node are blocked for the data VLAN. From this state, the node can move to Links Up or Links Down.

Links Up:

The state when both the node’s ring ports are up and forwarding. From this state, the node can move to Links Down.

Links Down:

The state when one or both of the node’s ring ports are down. From this state, the node can move to Preforwarding.

Preforwarding:

The state when both ring ports are up, but one has only just come up and is still blocked to prevent loops. From this state, the transit node can move to Links Up if the master node blocks its secondary port, or to Links Down if another port goes down.

The transit nodes respond to the Ring-Down-Flush-FDB message by flushing their forwarding databases for each of their ring ports. As the data starts to flow in the ring’s new configuration, the nodes (master and transit) re-learn their layer 2 addresses. During this period, the master node continues to send Health messages over the control VLAN. This situation continues until the faulty link or node is repaired.

For a multidomain ring, this process occurs separately for each domain within the ring.

The following figure shows the flow of control frames when a link breaks.

[Figure: flow of control frames when a link breaks. Labels: Control VLAN is forwarding (on both master node ring ports); Data VLANs are forwarding; Data VLANs move from blocking to forwarding; Data ports move from forwarding to blocking. Legend: Master Node Health Message, Transit Node Link-Down Message, Control VLAN.]

Fault in the master node

If the master node goes down, the transit nodes simply continue forwarding traffic around the ring; their operation does not change.

The only observable effects on the transit nodes are that:

They stop receiving Health messages and other messages from the master node.

The transit nodes connected to the master node experience a broken link, so they send Link-Down messages. If the master node is down, these messages are simply dropped.

Neither of these symptoms affects how the transit nodes forward traffic. Once the master node recovers, it continues its function as the master node.

Restoring Normal Operation

Master Node

Once the fault has been fixed, the master node’s Health messages traverse the whole ring and arrive at the master node’s secondary port. The master node then restores normal conditions by:

1. declaring the ring to be in a state of Complete

2. blocking its secondary port for data VLAN traffic (but not for the control VLAN)

3. flushing its forwarding database for its two ring ports

4. sending a Ring-Up-Flush-FDB message from its primary port, to all transit nodes.
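The master node's two ring states, and the actions it takes when moving between them, can be summarised as a small state machine. The sketch below only restates the steps listed in this Note; the class name, method names, and log strings are hypothetical and do not correspond to anything in the switch software.

```python
class MasterNode:
    """Toy model of the master node's ring state for one EPSR domain."""

    def __init__(self):
        self.state = "Complete"
        # Applies to data VLANs only; the control VLAN is never blocked.
        self.secondary_blocked = True
        self.log = []

    def on_fault(self):
        """A Link-Down message arrived, or the Failover timer expired."""
        if self.state == "Failed":
            return  # already handling the fault
        self.state = "Failed"
        self.secondary_blocked = False
        self.log.append("unblock secondary port for data VLANs")
        self.log.append("flush FDB for both ring ports")
        self.log.append("send Ring-Down-Flush-FDB out primary and secondary ports")

    def on_health_on_secondary(self):
        """A Health message travelled all the way around the ring."""
        if self.state == "Complete":
            return  # normal operation: only the Failover timer is reset
        self.state = "Complete"
        self.secondary_blocked = True
        self.log.append("block secondary port for data VLANs")
        self.log.append("flush FDB for both ring ports")
        self.log.append("send Ring-Up-Flush-FDB out primary port")
```

Note the ordering in on_health_on_secondary: the secondary port is blocked before the Ring-Up-Flush-FDB message goes out, so the ring always keeps at least one blocked port while the transit nodes unblock theirs.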

Transit Nodes with One Port Down

As soon as the fault has been fixed, the transit nodes on each side of the (previously) faulty link section detect that link connectivity has returned. They change their ring port state from Links Down to Pre-Forwarding, and wait for the master node to send a Ring-Up-Flush-FDB control message.

Once these transit nodes receive the Ring-Up-Flush-FDB message, they:

flush the forwarding databases for both their ring ports

change the state of their ports from blocking to forwarding for the data VLAN, which allows data to flow through their previously-blocked ring ports

The transit nodes do not start forwarding traffic on the previously-down ports until after they receive the Ring-Up-Flush-FDB message. This makes sure the previously-down transit node ports stay blocked until after the master node blocks its secondary port. Otherwise, the ring could form a loop because it had no blocked ports.
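The transit node behaviour for a single failed port can be sketched as a state machine over the states Idle, Links Up, Links Down and Preforwarding. This is an illustration only: the class, its method names, and the port labels "a" and "b" are hypothetical, and the sketch assumes that a node in the Idle state also moves to Links Up when it receives a Ring-Up-Flush-FDB message. The case where both ports are down is deliberately not modelled.

```python
class TransitNode:
    """Toy model of a transit node's EPSR state for one domain."""

    def __init__(self):
        self.state = "Idle"
        self.link_up = {"a": True, "b": True}
        # In Idle, both ring ports are blocked for the data VLAN.
        self.data_blocked = {"a": True, "b": True}

    def _other(self, port):
        return "b" if port == "a" else "a"

    def port_down(self, port):
        """Return the message sent out the remaining port, if it is up."""
        self.link_up[port] = False
        self.state = "Links Down"
        return "Link-Down" if self.link_up[self._other(port)] else None

    def port_up(self, port):
        self.link_up[port] = True
        if all(self.link_up.values()):
            # Keep the recovered port blocked until the master has
            # blocked its secondary port and announced Ring Up.
            self.data_blocked[port] = True
            self.state = "Preforwarding"

    def ring_up_flush_fdb(self):
        """Handle a Ring-Up-Flush-FDB message (the FDB flush is not modelled)."""
        if all(self.link_up.values()):
            self.data_blocked = {"a": False, "b": False}
            self.state = "Links Up"
```

The important property is that port_up never unblocks the recovered port by itself; only ring_up_flush_fdb does, which mirrors the loop-avoidance argument in the paragraph above.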

Transit Nodes with Both Ports Down

The Allied Telesis implementation includes an extra feature to improve handling of double link failures. If both ports on a transit node are down and one port comes up, the node:

1. puts the port immediately into the forwarding state and starts forwarding data out that port. It does not need to wait, because the node knows there is no loop in the ring: the other ring port on the node is down

2. remains in the Links Down state

3. starts a DoubleFailRecovery timer with a timeout of four seconds

4. waits for the timer to expire. At that time, if one port is still up and one is still down, the transit node sends a Ring-Up-Flush-FDB message out the port that is up. This message is usually called a “Fake Ring Up message”.

Sending this message allows any ports on other transit nodes that are blocking or in the Preforwarding state to move to forwarding traffic in the Links Up state. The timer delay lets the device at the other end of the link that came up configure its port appropriately, so that it is ready to receive the transmitted message.

Note that the master node would not send a Ring-Up-Flush-FDB message in these circumstances, because the ring is not in a state of Complete. The master node’s secondary port remains unblocked.
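The timing of the double-failure recovery steps can be made concrete with a short sketch. The four-second timeout comes from the text above; the function name, its parameters, and the action strings are hypothetical, and the sketch ignores the case where the recovered port goes down again before the timer expires.

```python
DOUBLE_FAIL_RECOVERY_SECONDS = 4  # timeout stated in the text


def double_fail_actions(first_up_at, second_up_at=None):
    """List (time, action) pairs for a transit node whose ring ports were both down.

    first_up_at:  time in seconds at which one ring port comes back up.
    second_up_at: time at which the other port comes back up, or None
                  if it stays down.
    """
    actions = [
        (first_up_at, "forward data on the recovered port, stay in Links Down"),
        (first_up_at, "start DoubleFailRecovery timer"),
    ]
    expiry = first_up_at + DOUBLE_FAIL_RECOVERY_SECONDS
    # The fake Ring Up message is only sent if, when the timer expires,
    # one port is up and the other is still down.
    if second_up_at is None or second_up_at > expiry:
        actions.append((expiry, "send Ring-Up-Flush-FDB out the port that is up"))
    return actions
```

Calling double_fail_actions(10) yields three entries, the last one at time 14; calling double_fail_actions(10, second_up_at=12) yields only the first two, because both ports are up again by the time the timer expires.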

How To Configure EPSR

This section first outlines, step by step, how to configure EPSR. Then it discusses changing the settings for the control VLAN, if you need to do this after initial configuration.

Configuring EPSR

1. Connect your switches into a ring

EPSR does not in itself limit the number of nodes that can exist on any given ring. Each switch can participate in up to 16 rings.

If you already have a ring in a live network, disconnect the cable between any two of the nodes before you start configuring EPSR, to prevent a loop.

2. On each switch, configure EPSR

On each switch, perform the following configuration steps. Configuration of the master node and each transit node is very similar.

i. Configure the control VLAN

This step creates the control VLAN and adds the ring ports to it as tagged ports.

Enter the commands:

create vlan=control-vlan-name vid=control-vid

add vlan=control-vid port=ring-ports frame=tagged

Note that you can use trunk groups for the ring ports.

ii. Configure the data VLAN

This step creates the data VLAN (or VLANs; you can have as many as you want) and adds the ring ports as tagged ports.

Enter the commands:

create vlan=data-vlan-name vid=data-vid

add vlan=data-vid port=ring-ports frame=tagged

The two ring ports must belong to the control VLAN and all data VLANs.

iii. Remove the ring ports from the default VLAN

If you leave all the ring ports in the default VLAN (vlan1), they will create a loop, unless vlan1 is part of the EPSR domain. To avoid loops, you need to do one of the following:

make vlan1 a data VLAN, or

remove the ring ports from vlan1, or

remove at least one of the ring ports from vlan1 on at least one of the switches. We do not recommend this option, because the action you have taken is less obvious when maintaining the network later.

In this How To Note, we remove the ring ports from the default VLAN. Use the command:

delete vlan=1 port=ring-ports

iv. Configure the EPSR domain

This step creates the domain, specifying whether the switch is the master node or a transit node. It also specifies which VLAN is the control VLAN, and on the master node which port is the primary port.

Enter one of the following commands:

On the master node:

create epsr=name mode=master controlvlan=control-vlan-name primaryport=port-number

On each transit node:

create epsr=name mode=transit controlvlan=control-vlan-name

This step also adds the data VLAN to the domain. Enter the command:

add epsr=name datavlan=data-vlan-name

v. Enable EPSR

This step enables the domain on each switch. Enter the command:

enable epsr=name

3. Configure other ports and protocols as required

On each switch, configure the other ports and protocols that are required for your network.
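The per-switch command sequence from step 2 can be generated from a handful of parameters, which helps keep the master and transit node configurations consistent. The helper below is a sketch that only fills in the command templates shown in this section; the function name, its parameters, and the example values (domain name, VLAN names, VIDs, and the ring port string) are hypothetical. The ring ports are passed through unchanged as a string, so use whatever port list syntax your switch accepts.

```python
def epsr_commands(name, mode, control_vlan, data_vlans, ring_ports, primary_port=None):
    """Build the command list for one switch.

    control_vlan: (vlan_name, vid) tuple for the control VLAN.
    data_vlans:   list of (vlan_name, vid) tuples for the data VLANs.
    ring_ports:   port list string, inserted verbatim after port=.
    """
    if mode not in ("master", "transit"):
        raise ValueError("mode must be 'master' or 'transit'")
    if mode == "master" and primary_port is None:
        raise ValueError("the master node needs a primary port")

    cmds = []
    # Steps i and ii: create each VLAN and add the ring ports as tagged ports.
    for vlan_name, vid in [control_vlan, *data_vlans]:
        cmds.append(f"create vlan={vlan_name} vid={vid}")
        cmds.append(f"add vlan={vid} port={ring_ports} frame=tagged")
    # Step iii: remove the ring ports from the default VLAN.
    cmds.append(f"delete vlan=1 port={ring_ports}")
    # Step iv: create the domain and add the data VLANs to it.
    create = f"create epsr={name} mode={mode} controlvlan={control_vlan[0]}"
    if mode == "master":
        create += f" primaryport={primary_port}"
    cmds.append(create)
    for vlan_name, _vid in data_vlans:
        cmds.append(f"add epsr={name} datavlan={vlan_name}")
    # Step v: enable the domain.
    cmds.append(f"enable epsr={name}")
    return cmds
```

For instance, epsr_commands("ring1", "master", ("ctrl", 1000), [("data", 2)], "1-2", primary_port=1) returns eight commands, starting with the control VLAN creation and ending with enable epsr=ring1. Calling it with mode "transit" produces the same sequence without the primaryport parameter.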

Modifying the Control VLAN

You cannot modify the control VLAN while EPSR is enabled. If you try to remove or add ports to the control VLAN, the switch generates an error message as follows:

Manager> delete vlan=1000 port=1

Error (3089409): VLAN 1000 is a control VLAN in EPSR and cannot be modified

Disable the EPSR domain and then make the required changes. Note that disabling EPSR will create a loop, so is not recommended on a network with live data. Of course, in a live network, you can manually prevent a loop by disconnecting the cable between any two of the nodes.
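The order of operations for changing the control VLAN can be sketched as a command sequence wrapped around the change. This is a hedged illustration: the helper name is hypothetical, and the disable epsr=name command is assumed by symmetry with enable epsr=name; this Note does not show its syntax, so check the command reference for your software version before relying on it.

```python
def modify_control_vlan(domain, vlan_changes):
    """Wrap control VLAN changes in a disable/enable pair for the domain.

    vlan_changes: list of command strings to run while EPSR is disabled.
    Remember to break the ring physically first on a live network,
    because disabling EPSR removes the loop protection.
    """
    return [
        f"disable epsr={domain}",  # assumed command, see the note above
        *vlan_changes,
        f"enable epsr={domain}",
    ]
```

For example, modify_control_vlan("ring1", ["delete vlan=1000 port=1"]) returns the disable command, the VLAN change, and then the enable command, in that order.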
