Configure EPSR (Ethernet Protection Switching
Ring) to Protect a Ring from Loops
Introduction
Putting a ring of Ethernet switches at the core of a network is a simple way to increase the
network’s resilience—such a network is no longer susceptible to a single point of failure.
However, the ring must be protected from Layer 2 loops. Traditionally, STP-based
technologies are used to protect rings, but they are relatively slow to recover from link
failure. This can create problems for applications that have strict loss requirements, such as
voice and video traffic, where the speed of recovery is highly significant.
This How To Note describes a fast alternative to STP: Ethernet Protection Switching Ring
(EPSR). EPSR enables rings to recover rapidly from link or node failures—within as little as
50ms, depending on port type and configuration. This is much faster than STP at 30 seconds
or even RSTP at 1 to 3 seconds.
What information will you find in this document?
This How To Note begins by describing EPSR in the following sections:
•"How EPSR Works" on page 3
•"Establishing a Ring" on page 4
•"Detecting a Fault" on page 5
•"Recovering from a Fault" on page 5
•"Restoring Normal Operation" on page 7
Next it gives step-by-step configuration details and examples in the following sections:
•"How To Configure EPSR" on page 8
•"Example 1: A Basic Ring" on page 11
C613-16092-00 REV D
•"Example 2: A Double Ring" on page 14
www.alliedtelesis.com
•"Example 3: EPSR and RSTP" on page 17
•"Example 4: EPSR with Nested VLANs" on page 20
•"Example 5: EPSR with management stacking" on page 23
•"Example 6: EPSR with an iMAP" on page 26
Next, it discusses important implementation details in the following sections:
•"Classifiers and Hardware Filters" on page 29
•"Ports and Recovery Times" on page 30
•"IGMP Snooping and Recovery Times" on page 31
•"Health Message Priority" on page 31
Finally, it ends with troubleshooting information in the following sections:
•"EPSR State and Settings" on page 32
•"SNMP Traps" on page 34
•"Counters" on page 35
•"Debugging" on page 36
Which products and software versions does it apply to?
This How To Note applies to the following Allied Telesis switches:
•AT-8948, x900-48FE, x900-48FE-N, AT-9924T, AT-9924SP, and AT-9924T/4SP switches,
running software version 2.8.1 or later
•AT-9924Ts, x900-24XT, and x900-24XT-N switches running software version 3.1.1 or
later
EPSR is also available on the following Allied Telesis switches, running the AlliedWare Plus OS
software version 5.2.1 or later:
•SwitchBlade x908
•x900 series
For information about using the AlliedWare Plus OS, see the AlliedWare Plus Note, How To Configure EPSR (Ethernet Protection Switching Ring) to Protect a Ring from Loops. This Note is
available from www.alliedtelesis.com/resources/literature/howto_plus.aspx.
The implementation on the above switches is also compatible with EPSR on Allied
Telesis’ Multiservice Access Platforms (iMAPs).
Page 2 | AlliedWare™ OS How To Note: EPSR
EPSR Components
EPSR domain: A protection scheme for an Ethernet ring that consists of one or more data VLANs and a control VLAN.
Master node: The controlling node for a domain, responsible for polling the ring state, collecting error messages, and controlling the flow of traffic in the domain.
Transit node: Other nodes in the domain.
Ring port: A port that connects the node to the ring. On the master node, each ring port is either the primary port or the secondary port. On transit nodes, ring ports do not have roles.
Primary port: A ring port on the master node. This port determines the direction of the traffic flow, and is always operational.
Secondary port: A second ring port on the master node. This port remains active, but blocks all protected VLANs from operating unless the ring fails. Similar to the blocking port in an STP/RSTP instance.
Control VLAN: The VLAN over which all control messages are sent and received. EPSR never blocks this VLAN.
Data VLAN: A VLAN that needs to be protected from loops. Each EPSR domain has one or more data VLANs.
[Figure epsr-basic-ring: a ring of five switches (the master node and transit nodes 1 to 4) carrying the control VLAN, Data VLAN_1, and Data VLAN_2, with end user ports on each node. On the master node's primary port (P), the control VLAN and data VLANs are forwarding. On its secondary port (S), the control VLAN is forwarding and the data VLANs are blocked.]
How EPSR Works
EPSR operates on physical rings of switches (note, not on
meshed networks). When all nodes and links in the ring
are up, EPSR prevents a loop by blocking data transmission
across one port. When a node or link fails, EPSR detects
the failure rapidly and responds by unblocking the blocked
port so that data can flow around the ring.
In EPSR, each ring of switches forms an EPSR domain.
One of the domain’s switches is the master node and
the others are transit nodes. Each node connects to the
ring via two ports.
One or more data VLANs send data around the ring,
and a control VLAN sends EPSR messages. A physical
ring can have more than one EPSR domain, but each
domain operates as a separate logical group of VLANs and
has its own control VLAN and master node.
On the master node, one port is the primary port and
the other is the secondary port. When all the nodes in
the ring are up, EPSR prevents loops by blocking the data
VLAN on the secondary port.
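The blocking rule described above can be checked with a small model. The following Python sketch is only an illustration: the node names, the `analyse` helper, and the choice of which link sits on the secondary port are hypothetical and are not part of EPSR or AlliedWare. It treats the ring as a list of links and reports whether the links that forward the data VLAN contain a loop and whether every node can still reach every other node.

```python
NODES = ["master", "t1", "t2", "t3", "t4"]

# Physical ring. In this sketch the master's primary port faces t1
# and its secondary port faces t4 (an arbitrary, assumed layout).
RING_LINKS = [("master", "t1"), ("t1", "t2"), ("t2", "t3"),
              ("t3", "t4"), ("t4", "master")]
SECONDARY_LINK = ("t4", "master")


def find(parent, node):
    """Return the representative of the set that contains node."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path compression
        node = parent[node]
    return node


def analyse(links):
    """Return (has_loop, is_connected) for the given forwarding links."""
    parent = {n: n for n in NODES}
    has_loop = False
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a == root_b:
            has_loop = True  # this link closes a cycle
        else:
            parent[root_a] = root_b
    is_connected = len({find(parent, n) for n in NODES}) == 1
    return has_loop, is_connected


def data_vlan_links(failed_link=None):
    """Links that forward the data VLAN.

    Ring intact: every link except the one on the master's secondary port.
    Ring broken: the secondary port is unblocked, the failed link is gone.
    """
    blocked = SECONDARY_LINK if failed_link is None else failed_link
    return [link for link in RING_LINKS if link != blocked]


if __name__ == "__main__":
    print("unprotected ring:", analyse(RING_LINKS))
    print("secondary blocked:", analyse(data_vlan_links()))
    print("t2-t3 link failed:", analyse(data_vlan_links(("t2", "t3"))))
```

In this model the unprotected ring contains a loop, while both the normal case (secondary port blocked) and the failure case (secondary port unblocked) are loop-free and fully connected.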
The master node does not need to block any port on the
control VLAN because loops never form on the control
VLAN. This is because the master node never forwards
any EPSR messages that it receives.
The following diagram shows a basic ring with all the
switches in the ring up.
Establishing a Ring
Once you have configured EPSR on the switches, the following steps complete the EPSR ring:
1.The master node creates an EPSR Health message and sends it out the primary port. This
increments the master node’s Transmit: Health counter in the show epsr count
command.
2.The first transit node receives the Health message on one of its two ring ports and, using
a hardware filter, sends the message out its other ring port.
Note that transit nodes never generate Health messages, only receive them and forward
them with their switching hardware. This does not increment the transit node’s Transmit:
Health counter. However, it does increment the Transmit counter in the show switch port command.
The hardware filter also copies the Health message to the CPU. This increments the
transit node’s Receive: Health counter. The CPU processes this message as required by
the state machines, but does not send the message anywhere because the switching
hardware has already done this.
3.The Health message continues around the rest of the transit nodes, being copied to the
CPU and forwarded in the switching hardware.
4.The master node eventually receives the Health message on its secondary port. The
master node's hardware filter copies the packet to the CPU (which increments the master
node’s Receive: Health counter). Because the master received the Health message on its
secondary port, it knows that all links and nodes in the ring are up.
When the master node receives the Health message back on its secondary port, it resets
the Failover timer. If the Failover timer expires before the master node receives the Health
message back, it concludes that the ring must be broken.
Note that the master node does not send that particular Health message out again. If it
did, the packet would be continuously flooded around the ring. Instead, the master node
generates a new Health message when the Hello timer expires.
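The steps above can be sketched as a short Python model of one Health message travelling around the ring. This is a simplified illustration, not the switch implementation: the counter names `tx_health` and `rx_health` stand in for the Transmit: Health and Receive: Health counters, and the node names and the `break_before` parameter are hypothetical.

```python
def new_counters(node_names):
    """Create zeroed Health counters for each node."""
    return {name: {"tx_health": 0, "rx_health": 0} for name in node_names}


def circulate_health(counters, transit_nodes, break_before=None):
    """Send one Health message out the master node's primary port.

    break_before names the node that the message cannot reach because
    the link in front of it is down ("master" means the last link of
    the ring). None means the ring is intact.
    Returns True if the message arrives on the master's secondary port.
    """
    # Step 1: only the master node generates Health messages.
    counters["master"]["tx_health"] += 1

    # Steps 2 and 3: each transit node forwards the frame in hardware and
    # copies it to the CPU, so only its receive counter changes.
    for node in transit_nodes:
        if node == break_before:
            return False
        counters[node]["rx_health"] += 1

    # Step 4: the message arrives on the master's secondary port and is
    # not sent out again.
    if break_before == "master":
        return False
    counters["master"]["rx_health"] += 1
    return True


if __name__ == "__main__":
    transit = ["t1", "t2", "t3", "t4"]
    counters = new_counters(["master"] + transit)
    print("ring complete:", circulate_health(counters, transit))
    print("break in front of t3:", circulate_health(counters, transit, "t3"))
    print(counters)
```

When the function returns False, the master node's Failover timer is not reset, which is how the master eventually concludes that the ring is broken.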
Detecting a Fault
Master Node States
Complete: The state when there are no link or node failures on the ring.
Failed: The state when there is a link or node failure on the ring. This state indicates that the master node received a Link-Down message or that the failover timer expired before the master node’s secondary port received a Health message.
Transit Node States
Idle: The state when EPSR is first configured, before the master node determines that all links in the ring are up. In this state, both ports on the node are blocked for the data VLAN. From this state, the node can move to Links Up or Links Down.
Links Up: The state when both the node’s ring ports are up and forwarding. From this state, the node can move to Links Down.
Links Down: The state when one or both of the node’s ring ports are down. From this state, the node can move to Pre-forwarding.
Pre-forwarding: The state when both ring ports are up, but one has only just come up and is still blocked to prevent loops. From this state, the transit node can move to Links Up if the master node blocks its secondary port, or to Links Down if another port goes down.
EPSR uses a fault detection scheme that alerts the ring
when a break occurs, instead of using a spanning tree-like calculation to determine the best path. The ring
then automatically heals itself by sending traffic over a
protected reverse path.
EPSR uses the following two methods to detect when
a transit node or a link goes down:
•Master node polling fault detection
To check the condition of the ring, the master
node regularly sends Health messages out its
primary port, as described in "Establishing a
Ring" on page 4. If all links and nodes in the ring are
up, the messages arrive back at the master node on
its secondary port.
This can be a relatively slow detection method,
because it depends on how often the node sends
Health messages.
Note that the master node only ever sends Health
messages out its primary port. If its primary port
goes down, it does not send Health messages.
•Transit node unsolicited fault detection
To speed up fault detection, EPSR transit nodes
directly communicate when one of their interfaces
goes down. When a transit node detects a fault at
one of its interfaces, it immediately sends a Link-Down message over the link that remains up. This
notifies the master node that the ring is broken and
causes it to respond immediately.
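To make the difference between the two detection methods concrete, the following Python sketch compares how long each takes to notice a break. It is a simplified model under stated assumptions: transit delay around the ring is treated as zero, and the Hello time of 1000 ms and Failover time of 2000 ms used in the example are illustrative values chosen for this sketch, not settings quoted from this Note. The function names are hypothetical.

```python
def polling_detection_ms(break_at_ms, hello_ms, failover_ms, limit_ms=60_000):
    """Time at which the master declares a fault using polling alone.

    The master sends a Health message every hello_ms. Each message that
    returns on the secondary port resets the Failover timer. The fault is
    declared when the Failover timer expires. Returns None if no fault is
    declared within limit_ms.
    """
    deadline = failover_ms  # Failover timer running from time 0
    next_hello = 0
    for now in range(limit_ms + 1):
        if now >= deadline:
            return now
        if now == next_hello:
            next_hello += hello_ms
            if now < break_at_ms:  # ring still intact: message returns
                deadline = now + failover_ms
    return None


def link_down_detection_ms(break_at_ms, notify_delay_ms=0):
    """Time at which the master learns of the fault from a Link-Down
    message. notify_delay_ms is an assumed propagation delay."""
    return break_at_ms + notify_delay_ms


if __name__ == "__main__":
    break_at = 2500
    print("polling:  ", polling_detection_ms(break_at, 1000, 2000), "ms")
    print("link-down:", link_down_detection_ms(break_at), "ms")
```

With these assumed timer values, a break at 2500 ms is noticed by polling only when the Failover timer expires at 4000 ms, whereas the Link-Down message reports it as soon as the transit node sees the port go down.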
Recovering from a Fault
Fault in a link or a transit node
When the master node detects an outage somewhere
in the ring, using either detection method, it restores
traffic flow by:
1.declaring the ring to be in a Failed state
2.unblocking its secondary port, which enables data
VLAN traffic to pass between its primary and
secondary ports
3.flushing its own forwarding database (FDB) for the
two ring ports
4.sending an EPSR Ring-Down-Flush-FDB control message to all the transit nodes, via
both its primary and secondary ports
The transit nodes respond to the Ring-Down-Flush-FDB message by flushing their
forwarding databases for each of their ring ports. As the data starts to flow in the ring’s
new configuration, the nodes (master and transit) re-learn their layer 2 addresses. During
this period, the master node continues to send Health messages over the control VLAN.
This situation continues until the faulty link or node is repaired.
For a multidomain ring, this process occurs separately for each domain within the ring.
The following figure shows the flow of control frames when a link breaks.
[Figure epsr-broken-ring: the master node and transit nodes 1 to 4 with a broken link. Control frames shown on the control VLAN: Master Node Health Message, Transit Node Link-Down Message, and Ring-Down-Flush-FDB Message. On the master node's primary port (P), the control VLAN and data VLANs are forwarding. On its secondary port (S), the control VLAN is forwarding and the data VLANs move from blocking to forwarding. Data ports at the break move from forwarding to blocking.]
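The master node's four recovery steps, and the transit nodes' response, can be sketched as a toy Python model. The class names, attribute names, and MAC addresses below are hypothetical and exist only for illustration. The model records which control message is sent on which ports rather than sending real frames.

```python
class MasterNode:
    """Toy model of the master node's reaction to a ring fault."""

    def __init__(self):
        self.ring_state = "Complete"
        self.secondary_blocks_data = True
        # FDB entries learned on each ring port (made-up addresses).
        self.fdb = {"primary": {"00:00:5e:00:53:01"}, "secondary": set()}
        self.sent = []  # (message, ports) tuples in send order

    def on_fault_detected(self):
        """Called on a Link-Down message or Failover timer expiry."""
        if self.ring_state == "Failed":
            return  # this outage is already being handled
        self.ring_state = "Failed"              # step 1
        self.secondary_blocks_data = False      # step 2
        for entries in self.fdb.values():       # step 3
            entries.clear()
        self.sent.append(                       # step 4
            ("Ring-Down-Flush-FDB", ("primary", "secondary")))


class TransitNode:
    """Toy model of a transit node's response to Ring-Down-Flush-FDB."""

    def __init__(self):
        self.fdb = {"ring_port_1": {"00:00:5e:00:53:02"},
                    "ring_port_2": {"00:00:5e:00:53:03"}}

    def on_ring_down_flush_fdb(self):
        # Flush the entries for each ring port so that addresses are
        # re-learned along the ring's new path.
        for entries in self.fdb.values():
            entries.clear()


if __name__ == "__main__":
    master, transit = MasterNode(), TransitNode()
    master.on_fault_detected()
    for message, ports in master.sent:
        print(message, "sent on", ports)
        transit.on_ring_down_flush_fdb()
    print(master.ring_state, master.secondary_blocks_data, transit.fdb)
```

The early return in `on_fault_detected` is a design choice for this sketch, so that a Link-Down message followed by a Failover timer expiry for the same outage does not trigger a second flush.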
Fault in the master node
If the master node goes down, the transit nodes simply continue forwarding traffic around
the ring—their operation does not change.
The only observable effects on the transit nodes are that:
•They stop receiving Health messages and other messages from the master node.
•The transit nodes connected to the master node experience a broken link, so they send
Link-Down messages. If the master node is down, these messages are simply dropped.
Neither of these symptoms affects how the transit nodes forward traffic.
Once the master node recovers, it continues its function as the master node.
Restoring Normal Operation
Master Node
Once the fault has been fixed, the master node’s Health messages traverse the whole ring and
arrive at the master node’s secondary port. The master node then restores normal
conditions by:
1.declaring the ring to be in a state of Complete
2.blocking its secondary port for data VLAN traffic (but not for the control VLAN)
3.flushing its forwarding database for its two ring ports
4.sending a Ring-Up-Flush-FDB message from its primary port, to all transit nodes.
Transit Nodes with One Port Down
As soon as the fault has been fixed, the transit nodes on each side of the (previously) faulty
link section detect that link connectivity has returned. They change their ring port state from
Links Down to Pre-Forwarding, and wait for the master node to send a Ring-Up-Flush-FDB
control message.
Once these transit nodes receive the Ring-Up-Flush-FDB message, they:
•flush the forwarding databases for both their ring ports
•change the state of their ports from blocking to forwarding for the data VLAN, which
allows data to flow through their previously-blocked ring ports
The transit nodes do not start forwarding traffic on the previously-down ports until after
they receive the Ring-Up-Flush-FDB message. This makes sure the previously-down transit
node ports stay blocked until after the master node blocks its secondary port. Otherwise,
the ring could form a loop because it had no blocked ports.
Transit Nodes with Both Ports Down
The Allied Telesis implementation includes an extra feature to improve handling of double
link failures. If both ports on a transit node are down and one port comes up, the node:
1.puts the port immediately into the forwarding state and starts forwarding data out that
port. It does not need to wait, because the node knows there is no loop in the ring—
because the other ring port on the node is down
2.remains in the Links Down state
3.starts a DoubleFailRecovery timer with a timeout of four seconds
4.waits for the timer to expire. At that time, if one port is still up and one is still down, the
transit node sends a Ring-Up-Flush-FDB message out the port that is up. This message is
usually called a “Fake Ring Up message”.
Sending this message allows any ports on other transit nodes that are blocking or in the Pre-forwarding state to move to forwarding traffic in the Links Up state. The timer delay lets the
device at the other end of the link that came up configure its port appropriately, so that it is
ready to receive the transmitted message.
Note that the master node would not send a Ring-Up-Flush-FDB message in these
circumstances, because the ring is not in a state of Complete. The master node’s secondary
port remains unblocked.
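The transit node behaviour described in the two sections above can be summarised as a small state machine. The Python sketch below is a simplified model, not the switch implementation: the class, method, and port names are hypothetical, the timer is represented by a flag and an explicit expiry call rather than a real four-second clock, and cancelling the timer when the second port comes up is an assumption made for this sketch.

```python
class TransitNodeStates:
    """Toy model of a transit node's ring ports and EPSR state."""

    DOUBLE_FAIL_RECOVERY_S = 4  # timeout stated in the text above

    def __init__(self):
        self.state = "Links Up"
        self.link_up = {"a": True, "b": True}
        self.forwarding = {"a": True, "b": True}  # data VLAN only
        self.timer_running = False
        self.fdb_flushes = 0
        self.sent = []  # (message, port) tuples

    @staticmethod
    def _other(port):
        return "b" if port == "a" else "a"

    def on_link_down(self, port):
        self.link_up[port] = False
        self.forwarding[port] = False
        self.state = "Links Down"
        other = self._other(port)
        if self.link_up[other]:
            self.sent.append(("Link-Down", other))

    def on_link_up(self, port):
        self.link_up[port] = True
        if self.link_up[self._other(port)]:
            # One port was down: keep the recovered port blocked until
            # the master node sends Ring-Up-Flush-FDB.
            self.state = "Pre-forwarding"
            self.timer_running = False  # assumption, see lead-in
        else:
            # Both ports were down: no loop is possible through this
            # node, so forward at once and stay in Links Down.
            self.forwarding[port] = True
            self.timer_running = True

    def on_ring_up_flush_fdb(self):
        if self.state == "Pre-forwarding":
            self.fdb_flushes += 1
            for port in self.forwarding:
                self.forwarding[port] = True
            self.state = "Links Up"

    def on_double_fail_timer_expired(self):
        if not self.timer_running:
            return
        self.timer_running = False
        up_ports = [p for p, is_up in self.link_up.items() if is_up]
        if len(up_ports) == 1:
            # The "Fake Ring Up message".
            self.sent.append(("Ring-Up-Flush-FDB", up_ports[0]))


if __name__ == "__main__":
    node = TransitNodeStates()
    node.on_link_down("a")
    node.on_link_down("b")
    node.on_link_up("a")
    node.on_double_fail_timer_expired()
    print(node.state, node.forwarding, node.sent)
```

Keeping the recovered port blocked in Pre-forwarding until the Ring-Up-Flush-FDB message arrives is what prevents a loop in the single-failure case. In the double-failure case, the port can forward immediately because the node's other ring port is still down.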
How To Configure EPSR
This section first outlines, step-by-step, how to configure EPSR. Then it discusses changing
the settings for the control VLAN, if you need to do this after initial configuration.
Configuring EPSR
1.Connect your switches into a ring
EPSR does not in itself limit the number of nodes that can exist on any given ring. Each switch
can participate in up to 16 rings.
If you already have a ring in a live network, disconnect the cable between any two of the
nodes before you start configuring EPSR, to prevent a loop.
2.On each switch, configure EPSR
On each switch, perform the following configuration steps. Configuration of the master node
and each transit node is very similar.
i.Configure the control VLAN
This step creates the control VLAN and adds the ring ports to it as tagged ports.
Enter the commands:
create vlan=control-vlan-name vid=control-vid
add vlan=control-vid port=ring-ports frame=tagged
Note that you can use trunk groups for the ring ports.
ii. Configure the data VLAN
This step creates the data VLAN (or VLANs—you can have as many as you want) and
adds the ring ports as tagged ports.
Enter the commands:
create vlan=data-vlan-name vid=data-vid
add vlan=data-vid port=ring-ports frame=tagged
The two ring ports must belong to the control VLAN and all data VLANs.
iii. Remove the ring ports from the default VLAN
If you leave all the ring ports in the default VLAN (vlan1), they will create a loop, unless
vlan1 is part of the EPSR domain. To avoid loops, you need to do one of the following:
•make vlan1 a data VLAN, or
•remove the ring ports from vlan1, or
•remove at least one of the ring ports from vlan1 on at least one of the switches.
We do not recommend this option, because the action you have taken is less
obvious when maintaining the network later.
In this How To Note, we remove the ring ports from the default VLAN. Use the
command:
delete vlan=1 port=ring-ports
iv. Configure the EPSR domain
This step creates the domain, specifying whether the switch is the master node or a
transit node. It also specifies which VLAN is the control VLAN, and on the master node
which port is the primary port.
This step also adds the data VLAN to the domain. Enter the command:
add epsr=name datavlan=data-vlan-name
v.Enable EPSR
This step enables the domain on each switch. Enter the command:
enable epsr=name
3.Configure other ports and protocols as required
On each switch, configure the other ports and protocols that are required for your network.
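When the same steps have to be repeated on several switches, it can help to generate the command list from a few parameters. The following Python sketch is one possible way to do that, under stated assumptions: the function name `epsr_commands`, its parameters, and the domain name `test` in the example are hypothetical. It emits only the command forms quoted in the steps above. The command that creates the EPSR domain itself is not quoted in this section, so the sketch leaves it out rather than guessing its syntax.

```python
def epsr_commands(domain, control_vlan, control_vid, data_vlans, ring_ports):
    """Build the per-switch command list for the steps shown above.

    data_vlans is a list of (name, vid) pairs.
    ring_ports is a port string such as "1-2".
    """
    # Step i: control VLAN, with the ring ports as tagged members.
    commands = [
        f"create vlan={control_vlan} vid={control_vid}",
        f"add vlan={control_vid} port={ring_ports} frame=tagged",
    ]
    # Step ii: each data VLAN, with the ring ports as tagged members.
    for name, vid in data_vlans:
        commands.append(f"create vlan={name} vid={vid}")
        commands.append(f"add vlan={vid} port={ring_ports} frame=tagged")
    # Step iii: take the ring ports out of the default VLAN.
    commands.append(f"delete vlan=1 port={ring_ports}")
    # Step iv: the domain must be created before this point; that command
    # is not shown in this section, so it is intentionally not generated.
    for name, _vid in data_vlans:
        commands.append(f"add epsr={domain} datavlan={name}")
    # Step v: enable the domain.
    commands.append(f"enable epsr={domain}")
    return commands


if __name__ == "__main__":
    for line in epsr_commands("test", "vlan1000", 1000, [("vlan2", 2)], "1-2"):
        print(line)
```

Review the generated list against the switch's command reference, and add the domain creation command for the master or transit role, before applying it to a device.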
Modifying the Control VLAN
You cannot modify the control VLAN while EPSR is enabled. If you try to remove or add
ports to the control VLAN, the switch generates an error message as follows:
Manager> delete vlan=1000 port=1
Error (3089409): VLAN 1000 is a control VLAN in EPSR and cannot be modified
Disable the EPSR domain and then make the required changes. Note that disabling EPSR will
create a loop, so is not recommended on a network with live data. Of course, in a live
network, you can manually prevent a loop by disconnecting the cable between any two of the
nodes.
Example 1: A Basic Ring
This example builds a simple 3-switch ring with one data VLAN, as shown in the following
diagram. Control packets are transmitted around the ring on vlan1000 and data packets on
vlan2.
[Figure epsr-example-basic-ring: master node (A) with port 1 as the primary port (P) and port 2 as the secondary port (S), and transit nodes (B) and (C), each with ports 1 and 2 as ring ports. Each node also has end user ports.]
Configure the Master Node (A)
1.Create the control VLAN
create vlan=vlan1000 vid=1000
2.Add the ring ports to the control VLAN
add vlan=1000 port=1-2 frame=tagged
3.Create the data VLAN
create vlan=vlan2 vid=2
4.Add the ring ports to the data VLAN
The two ring ports must belong to the control VLAN and all data VLANs.
add vlan=2 port=1-2 frame=tagged
5.Remove the ring ports from the default VLAN
delete vlan=1 port=1-2
6.Create the EPSR domain
This step creates the domain, specifying that this switch is the master node. It also specifies
which VLAN is the control VLAN and which port is the primary port.