Ethernet Switch Blade User's Guide, Release 3.2.2j
Legal Notices
The information in this document is subject to change without notice.
Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but
not limited to, the implied warranties of merchantability and fitness for a particular
purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct,
indirect, special, incidental or consequential damages in connection with the furnishing,
performance, or use of this material.
Restricted Rights Legend. Use, duplication or disclosure by the U.S. Government is subject
to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and
Computer Software clause at DFARS 252.227-7013 for DOD agencies, and subparagraphs
(c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause at FAR
52.227-19 for other agencies.
Information in this document is provided in connection with Intel® products. No
license, express or implied, by estoppel or otherwise, to any intellectual property rights
is granted by this document. Except as provided in Intel’s Terms and Conditions of
Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any
express or implied warranty, relating to sale and/or use of Intel products including
liability or warranties relating to fitness for a particular purpose, merchantability, or
infringement of any patent, copyright or other intellectual property right. Intel
products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without
notice.
HEWLETT-PACKARD COMPANY 3000 Hanover Street Palo Alto, California 94304
U.S.A.
Additional Copyright Notices. AdvancedTCA® is a registered trademark of the PCI
Industrial Computer Manufacturers Group. Linux® is a registered trademark of Linus
Torvalds. ZNYX Networks, RAIN, RAINlink, OpenArchitect®, CarrierClass and HotSwap
are trademarks or registered trademarks of ZNYX Networks in the United States and/or
other countries. All other marks, trademarks or service marks are the property of their
respective owners.
About the Ethernet Switch Blade Manual
This manual includes everything you need to begin using the HP Ethernet Switch Blade with
OpenArchitect software, Release 3.2.2j.
Table of Contents
Chapter 1 Overview of the Ethernet Switch Blade
High Performance Embedded Switching
Using the S50layer3 Script
Layer 3 Switch Using Multiple VLANs
Using the S50multivlan Script
To Modify the Layer 3 Multivlan Script
Modify the example script you copied into the /etc/rcZ.d directory. Adjust and
assign the number of IP addresses as applicable. In the example below, the IP
address is changed for the interface in the ifconfig command line of the script.
Editing the S50layer2 script can change the Ethernet Switch Blade Fabric Interface
default configuration. The S50layer2 script and included example scripts
(/etc/rcZ.d/examples) can be used as templates to create custom scripts. The default
S50layer2 script configures the switch accordingly:
The Ethernet Switch Blade is a 72-port AdvancedTCA® hub that provides Gigabit Ethernet switching.
Up to 14 ATCA node boards may be addressed via the PICMG 3.0 Base Interface and via the
PICMG 3.1 Fabric Interface. The Base and Fabric switching domains are kept totally separate, both
on the physical layer and the software layer. The Ethernet Switch Blade provides a tightly
integrated modular switching platform that enables high-density solutions.
The Ethernet Switch Blade is actually two separate switches, one for the Base ports and one for
the fabric ports. There are two OpenArchitect® operating system images, one for each switch,
allowing the maximum in separation between the control signaling and the data. The modular
design provides great flexibility and control.
Ethernet Switch Blades can support a 10 Gigabit Ethernet Inter-Switch Link (ISL) for the Fabric
Interfaces, and a Gigabit Ethernet ISL for the Base Interface switches. Depending on the version
of OpenArchitect used, the ISL for the Fabric Interface switches may be operated at 10 Gigabits
per second and provide stacking features.
Linux-based OpenArchitect 3 runs on the embedded processors, providing a comprehensive
package for the management of Layer 2 and Layer 3 packet switching. VLAN management and
Layer 2-7 packet classification are also included with a user-friendly interface. OpenArchitect can
be used with a variety of IP routing protocols.
As part of AdvancedTCA, the switch incorporates the PICMG 3.0 Intelligent Platform
Management Interface (IPMI) standard for Field Replaceable Unit (FRU) management by the
Shelf Manager.
High Performance Embedded Switching
The Ethernet Switch Blade with OpenArchitect combines the performance of silicon-based
switching fabric with the flexibility of software-managed routing policies. It provides Base fabric
PICMG 3.0 (1 Gigabit Ethernet) links to each of the payload slots, plus two to four PICMG 3.1
in-band GigE ports to each node card, and GigE links to management ports and the second
switch. The Ethernet Switch Blade maintains the forwarding table on silicon, providing the
capability to switch and route at full line rate performance on every port.
AdvancedTCA® Compliant
The AdvancedTCA® standard developed by the PCI Industrial Computer Manufacturers Group
defines an embedded Ethernet environment for high availability chassis. This environment
includes two switch fabric slots that create a dual star Ethernet network to the 14 Base node slots.
Placing the Ethernet Switch Blade in a hub slot provides embedded Ethernet services to each
node card of the chassis. A standard HA configuration is one Ethernet Switch Blade placed in
each of the two hub slots in a chassis for creation of a redundant, high availability system.
The OpenArchitect software component (open source Linux, IP protocol stack, control
applications, and the OA Engine) runs on two embedded PowerPC microprocessors.
OpenArchitect provides extensive managed IP routing protocols and other open standards for
switch management. Examples include network services; Virtual Router Redundancy Protocol;
Routing Information Protocol; Open Shortest Path First; Border Gateway Protocol; Quality of
Service and Class of Service; access control lists; Simple Network Management Protocol MIBs;
Common Open Policy Service; and web.
Extensible Customization of Routing Policies
The OpenArchitect software environment enables rapid porting of other UNIX/Linux-based
protocols, including open source software conforming to RFCs and other standards. It also
enables the development of application-specific protocol configuration scripts.
Powerful CarrierClass Features
The Ethernet Switch Blade has High Availability hardware features for advanced
telecommunication applications. The switch implements full PICMG 3.0 hot swap support.
This feature provides field-replaceable capability, so a failed switch can be replaced without
impacting the operational performance of a chassis.
The PICMG 3.0 Intelligent Platform Management Interface (IPMI) standard is also supported.
IPMI uses message-based interfaces that monitor the physical health characteristics of the
Ethernet Switch Blade. The switch provides operational status information to an IPMI
management application. End customers benefit from advance notice of potential problems.
The Ethernet Switch Blade also implements the Media Dependent Interface feature called Auto MDI-X.
Auto MDI-X allows connections to any device (switches, hubs, or systems) using a regular
straight-through or crossover Cat 5 cable. The RJ-45 port automatically detects and switches
between MDI and MDI-X modes. This IEEE standard makes cabling, especially between switches,
faster and less error prone.
E-Keying is supported by the Ethernet Switch Blade.
Ethernet Port Layout
The Ethernet Switch Blade has a total of 72 switched Gigabit Ethernet ports. The base fabric is
connected via 24 Gigabit Ethernet ports and the data fabric is connected via 48 Gigabit Ethernet
ports. The Ethernet Switch Blade is actually composed of two separate switches, one for Base
port activity and another for fabric port activity. The Base ports (control and signaling) are
switched on the Base switch, and the fabric ports (data) are switched on the fabric switch, which
provides total separation between system management or control packets and customer data
packets.
You will find the Ethernet Switch Blade has a straightforward installation and configuration.
UNIX or Linux system management skills and some understanding of network protocols will be
required. Configure the Ethernet Switch Blades to your networking application before you
begin using the OpenArchitect switch.
OpenArchitect Switch Environment
The key elements of the OpenArchitect environment include two embedded Linux operating
systems, OpenArchitect-specific applications and libraries, and an innovative switch hardware
design.
OpenArchitect hardware is in many ways similar to typical switch architectures. The primary
difference in OpenArchitect is that the PCI bus that interfaces with the embedded processor and
the switch fabric is at a higher performance level than a typical switch (see Figure 1.1: Fabric
Switch Elements). The use of PCI creates a pipe of significant bandwidth between the processor
and the switch fabric.
The embedded processors, running Linux and the OpenArchitect processes, control the flow of all
traffic by maintaining the switch forwarding tables. These tables define the flow of the switch
traffic. Because they are on the switching chips, packets proceed at line rate.
OpenArchitect Software Structure
Figure 1.1: Fabric Switch Elements
OpenArchitect is based on an embedded Linux operating system and includes a number of ZNYX
Networks-supplied modules. The key element is the Linux routing table, which is crucial in a
The purpose of the routing table is to tell the packet forwarding software where to forward the
data packets. In Linux, the packet-forwarding algorithm is operated in software. Normally, the
routing tables are maintained by operator configuration and the various routing protocols that run
in the application environment of Linux.
OpenArchitect uses an innovative approach for forwarding packets. It provides embedded
software daemons that replicate (shadow) the Linux routing tables in the silicon-based
forwarding tables (see Figure 1.1: Fabric Switch Elements). In the OpenArchitect switching
environment, the switching chips do the real-time work of switching network packets. The switch
fabric consults its own forwarding tables for each incoming packet, and either filters or forwards
the packet to any egress port, the embedded CPU, or any combination of these. The Linux routing
tables, running in software, are used to update the silicon-based tables. This provides both the
flexibility and control of the Linux software environment and the speed of dedicated switching
silicon.
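The shadowing idea can be illustrated with a toy model. The sketch below is not OpenArchitect code: the `shadow_routes` function, the dictionary layout, and the prefixes and interface names are assumptions made for illustration only. It shows one daemon-style pass that compares a software routing table with a stand-in for the silicon forwarding table and applies only the differences.

```python
def shadow_routes(routing_table, forwarding_table):
    """Make forwarding_table match routing_table; return the changes applied.

    Both tables are plain dicts mapping a destination prefix to a next hop.
    The forwarding_table dict is a hypothetical stand-in for the silicon table.
    """
    changes = []
    # Remove entries that no longer exist in the software routing table.
    for prefix in sorted(set(forwarding_table) - set(routing_table)):
        del forwarding_table[prefix]
        changes.append(("delete", prefix))
    # Add new entries and update entries whose next hop changed.
    for prefix, next_hop in sorted(routing_table.items()):
        if forwarding_table.get(prefix) != next_hop:
            forwarding_table[prefix] = next_hop
            changes.append(("set", prefix))
    return changes


# Illustrative data only; the prefixes and interface names are made up.
linux_routes = {"10.0.1.0/24": "zhp1", "10.0.2.0/24": "zhp2"}
silicon_table = {"10.0.1.0/24": "zhp1", "10.0.9.0/24": "zhp3"}

applied = shadow_routes(linux_routes, silicon_table)
print(applied)
```

After the pass the two tables hold the same entries, and a second pass applies no changes. In the real switch the per-packet lookups happen in hardware; only the table maintenance runs in software.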
The OpenArchitect environment includes additional features. For example, installing the
OpenArchitect switch gives you immediate implementation of Linux routing protocols. Also, you
have complete support of routing table updates and a standardized method for configuration.
Finally, you can quickly integrate bug fixes, protocol enhancements and additional protocol
implementations from the Linux community. You can also integrate OpenArchitect into other
Linux applications including VPN software, voice over IP protocols, Quality of Service, and
HTML configuration.
RAIN Management API (RMAPI) is a generic interface for passing control data. The
OpenArchitect libraries are implemented completely above RMAPI. The libraries provide a frontend to RMAPI to simplify application writing. Currently one library is implemented, a general
library called zlxlib. As the OpenArchitect application requirements grow, the existing library
will be expanded and additional libraries will be created.
OpenArchitect applications are used to program and configure the Ethernet Switch Blade. These
applications are implemented above the libraries and RMAPI.
The PICMG 3.1 standard defines an embedded Ethernet environment for Telco chassis. This
environment includes two switch fabric slots that create a dual star Ethernet network to the
fourteen node slots. Placing the Ethernet Switch Blade in a hub slot provides embedded Ethernet
services to each node card across the Packet Switching Backplane of the chassis. A standard
configuration is to place an Ethernet Switch Blade in each hub slot, creating a redundant, high
availability system. This chapter provides information on the Ethernet Switch Blade port
connectors and LED indicators.
Connecting the Cables
Your switch setup may require some or all of the following types of cables:
10/100/1000 Port Cabling
Category 5 cabling is required for all external ports. Be sure that your cable length is within the
minimum and maximum length restrictions for Ethernet; otherwise you could experience
signal or data loss. All copper GigE ports on the Ethernet Switch Blade are auto-MDI sensing
and will automatically determine whether an MDI (straight-through) or MDI-X (crossover)
cable is attached.
Console Port Cabling
The switch console can be accessed via one RJ-45 10/100 service port located on the front panel
of the Ethernet Switch Blade.
NOTE: There are two switch portions that make up an Ethernet Switch Blade unit. Each
switch portion, Base and fabric, has its own console ports, and requires its own console
cable or OOB Ethernet cable.
The RS-232 configured RJ-45 connector console port on the front panel can be used to recover
from a system failure. It is used for maintenance only, and is generally not connected. Use an HP
console cable (P/N A6900-63006) provided with the HP bh5700 ATCA 14-Slot Blade Server, in
combination with a Modem Eliminator cable, to access the switch software through the console
port. Refer to the HP bh5700 ATCA 14-Slot Blade Server Installation Guide for additional
information.
Connecting to the Console Port
To attach the console cable to the OpenArchitect Base or fabric switch:
1. Plug the RJ-45 end of the console cable into the RJ-45 Console Port on the front.
2. Connect the Modem Eliminator cable to the DB-9 connector on the console cable.
3. Connect the other end of the Modem Eliminator cable to a standard COM port (9600, n,
8, 1).
4. Reinsert the switch into the shelf chassis and power up.
Use a terminal emulation program to access the switch console.
Out of Band Ports (OOB Ports)
Each switch, fabric and Base, in an Ethernet Switch Blade unit has out-of-band (OOB) Ethernet
ports on the front panel. An OOB port is an alternative maintenance port that supplies Ethernet
connectivity instead of serial connectivity, and it is connected only when performing switch
maintenance activities. Use ifconfig to bring up and configure the OOB ports. The OOB ports run
at 100 Mbps full duplex and are not auto-sensing. The front OOB port is eth0, and the rear port
(not implemented in this release) is eth1.
LED Reference
See Figure 2.1 for a schematic view of the front of a typical Ethernet Switch Blade board. Note
that there are out-of-band ports, RS232 ports, a USB port, and 10 Gig egress ports (not
implemented in this release). In-band ports from the Base and fabric switches have LED status
lights controlled from the LED Mode button. Press the button successively to display the Base
switch ports, fabric switch ports 0-23, and finally the fabric switch ports 24-47. There are
separate LEDs for the out-of-band ports, and the ATCA status functions.
High availability networking is achieved by eliminating any single point of failure through
redundant connectivity: redundant cables, switches, and network interfaces for hardware,
combined with HA software solutions on both the hosts and switches to control the HA hardware
and maintain connectivity. An HA solution called Surviving Partner is provided on the switch.
For host-side HA, the most common solution is to use the Linux bonding driver. HA solutions
like the Linux bonding driver present a single, virtual interface to the protocol stack while
managing multiple physical links. Figure 3.1: Host HA Architecture shows the relation of the
protocol stack, a bonding driver and physical ports.
Figure 3.1: Host HA Architecture
A failover between physical links can be made very quickly without requiring a change to the IP or
MAC address of the virtual interface, so it is effectively transparent from the application's point
of view. With redundant links from a switch (or switches) to the host, one link is maintained as the
ACTIVE link and the other as STANDBY. If the ACTIVE link goes down, the STANDBY link
becomes the new ACTIVE link, while presenting the same virtual interface to the host.
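The ACTIVE/STANDBY behavior can be sketched as a small model. This is an illustration only, not the Linux bonding driver: the `ActiveBackupBond` class and the link names are hypothetical, and the model assumes the bond does not switch back automatically when the failed link recovers.

```python
class ActiveBackupBond:
    """Toy model of an active-backup bond: one virtual interface, several links."""

    def __init__(self, links):
        # links: ordered list of physical link names; the first starts as ACTIVE.
        self.link_up = {name: True for name in links}
        self.active = links[0]

    def set_link(self, name, up):
        """Record a link state change and fail over if the ACTIVE link is lost."""
        self.link_up[name] = up
        if not self.link_up[self.active]:
            for candidate, is_up in self.link_up.items():
                if is_up:
                    self.active = candidate  # STANDBY becomes the new ACTIVE
                    break

    def status(self):
        """Return the role of each link as seen by this model."""
        return {
            name: "DOWN" if not up else "ACTIVE" if name == self.active else "STANDBY"
            for name, up in self.link_up.items()
        }


bond = ActiveBackupBond(["eth0", "eth1"])  # hypothetical host port names
bond.set_link("eth0", False)               # the ACTIVE link fails
print(bond.active, bond.status())
```

The protocol stack only ever sees the bond itself, so the change of ACTIVE link does not alter the IP or MAC address that applications use.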
NOTE: It is important that the bonding solution provide an active-backup mode. For the
Linux bonding driver, set mode=1 (active-backup); see the documentation at
http://sourceforge.net/projects/bonding/ for more information. Use the recommendations for
Linux kernel 2.4.x, not 2.6.x.
Redundant connections provide an ACTIVE and STANDBY link to a switch, or provide
redundant links between more than one switch. In the case of more than one switch, a complete
HA solution requires a switch-based HA solution.
Surviving Partner
Surviving Partner is a switch-based HA solution. Surviving Partner runs on the switches to
provide transition of Layer 2 and Layer 3 switching functionality between two or more switches.
Surviving Partner consists of many interacting protocols and processes, including VRRP,
zlmd, zlc, and others.
Because most end nodes use a default router address, changing that address during
a switch failover would require the end nodes to be reconfigured. Layer 3 switches that fail over
must therefore preserve the default router address so that the failover is transparent to the end
nodes' IP configuration. The Virtual Router Redundancy Protocol (VRRP, RFC 2338) running in the
Surviving Partner switches provides transparent movement of the default router address. VRRP
maintains the notion of a Master switch and one or more Backup switches. This group of switches
presents a virtual router IP address that can be used by hosts on that net as their default route.
If a Backup switch determines the Master switch is no longer available, one of the Backup
switches will assume the role as Master. Physically, each switch maintains a link to the local
network. Only the Master switch answers for the default gateway address, and the hosts on that
net have no need to relearn the router address.
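The Master/Backup selection can be sketched as follows. This is a simplified model rather than the vrrpd implementation: the `elect_master` function and the sample addresses are assumptions for illustration. It uses the RFC 2338 rule that the highest priority wins and that a tie is broken by the higher primary IP address.

```python
import ipaddress


def elect_master(routers):
    """Pick the Master among reachable routers, or None if none is reachable.

    routers: list of dicts with 'name', 'priority', 'ip', and 'alive' keys.
    Highest priority wins; a tie goes to the higher primary IP address.
    """
    candidates = [r for r in routers if r["alive"]]
    if not candidates:
        return None
    best = max(
        candidates,
        key=lambda r: (r["priority"], ipaddress.ip_address(r["ip"])),
    )
    return best["name"]


# Illustrative values only.
switch_a = {"name": "Switch A", "priority": 254, "ip": "10.0.0.2", "alive": True}
switch_b = {"name": "Switch B", "priority": 253, "ip": "10.0.0.3", "alive": True}

print(elect_master([switch_a, switch_b]))  # Switch A has the higher priority
switch_a["alive"] = False                  # the Master becomes unavailable
print(elect_master([switch_a, switch_b]))  # the Backup takes over
```

Because hosts point at the virtual router address and not at either physical address, the change of Master needs no host reconfiguration.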
In an HA configuration, the goal is to avoid any single point of failure. VRRP provides a good
mechanism to provide a static route for a local network, but a true HA configuration must also
provide redundant connections for the host. Providing a virtual router for the local network is not
enough. Take the simple case of two hosts on the local network with a connection to the virtual
router. Each host needs a connection to each physical switch participating in VRRP. In the
simplest configuration, each host would have one connection to the network. An HA solution
would include redundant connections from each host to each switch in the virtual router.
Combining the features of Surviving Partner on the switches and HA bonding drivers on the hosts
allows implementation of this true HA configuration.
zlmd
In addition to complete switch failover, single link failures must be properly handled. The Link
Monitor Daemon, zlmd, monitors the link status of each port. If a link goes down, zlmd
communicates with the VRRP daemon (vrrpd) to change its priority. Changing the VRRP
priority results in movement of switching functionality. By combining zlmd with the zlc
application, links connected to hosts that have not failed can be deterministically moved to the
new master switch if desired. Supported modes include:
• switch - The switch with the greatest number of UP links becomes the Master for all
VLANs under HA management.
• vlan - The switch with the greatest number of UP links in that particular VLAN becomes
the Master for that particular VLAN. If the switch has additional VLANs, they each
change independently.
• port - The Master will remain the Master for that particular VLAN until all ports in that
VLAN are down. The Backup then becomes the new Master for that VLAN. Failed links
move their connectivity through the Backup Switch and the switch interconnect to reach
the Master Switch. This option alleviates the need to move all nodes to a new switch just
because a single link goes down.
NOTE: All modes require inclusion of the interconnect in the VLAN. The ISL
connection between the two Base switches is port 23 for the Ethernet Switch Blade. The
ISL connection between the two fabric switches is port 51.
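The three modes can be compared with a small model of a two-switch pair. This is a sketch under stated assumptions, not zlmd or zlc code: the `masters_by_mode` function, the switch labels "A" and "B", and the rule that a tie leaves the current Master in place are all illustrative choices.

```python
def masters_by_mode(mode, up_links, current_master="A"):
    """Return {vlan: master_switch} for a two-switch pair labelled "A" and "B".

    up_links: {vlan: {"A": n_up, "B": n_up}} counts of UP links per switch.
    Assumption: on a tie, the current Master keeps its role.
    """
    other = "B" if current_master == "A" else "A"

    def prefer(count_current, count_other):
        return other if count_other > count_current else current_master

    if mode == "switch":
        # One Master for all VLANs, chosen by total UP links.
        total_current = sum(c[current_master] for c in up_links.values())
        total_other = sum(c[other] for c in up_links.values())
        winner = prefer(total_current, total_other)
        return {vlan: winner for vlan in up_links}
    if mode == "vlan":
        # Each VLAN picks its own Master by UP links in that VLAN.
        return {vlan: prefer(c[current_master], c[other])
                for vlan, c in up_links.items()}
    if mode == "port":
        # The Master keeps a VLAN until all of its ports in that VLAN are down.
        return {vlan: (other if c[current_master] == 0 else current_master)
                for vlan, c in up_links.items()}
    raise ValueError("unknown failover mode: %s" % mode)


links = {"vlan1": {"A": 3, "B": 4}, "vlan2": {"A": 4, "B": 4}}
for mode in ("switch", "vlan", "port"):
    print(mode, masters_by_mode(mode, links))
```

With one link down on switch A in vlan1, switch mode moves every VLAN to B, vlan mode moves only vlan1, and port mode moves nothing, which is why port mode relies on the interconnect to carry traffic for the failed link.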
When a switch fails, it must be replaced. The replacement switch will likely require proper
configuration. For transparent switch replacement, the newly replaced switch must learn its
configuration from its Surviving Partner.
In a simple failover scenario, Host A and Host B are configured with failover between two host
ports, one port connected to Switch A and the other connected to Switch B. Assume Switch A
provides connectivity between Host A and Host B. If Switch A fails, the active link on each host
moves over to the port connected to Switch B. Surviving Partner software on Switch B
recognizes that Switch A has failed, and assumes the role of switching traffic between Host A and
Host B. When the failed Switch A is replaced with a new Switch A', Switch A' will learn its
network configuration from the surviving partner Switch B. Switch A' is now ready as a backup
to Switch B in case of failure of Switch B.
This is achieved through the use of DHCP. When a switch becomes a VRRP Master, a DHCP
server is started with a pointer to a configuration file that contains configuration information for
its partners. The replacement switch comes up running a DHCP client to retrieve its configuration.
Proper configuration of Surviving Partner requires coordinated configuration of many different
processes, including vrrpd, zlmd, zlc, and dhcpd. The daemon processes run scripts to
perform their actions. Because these scripts are complex and inter-dependent, a configuration
application called zspconfig is used to build them.
The basic steps to configuring Surviving Partner are:
1. Determine your desired configuration.
2. Modify the configuration file (the default) to use as input to the configuration utility
(zspconfig).
3. Configure startup scripts or other scripts such as gated routing scripts and vrrp
configuration scripts.
4. Run zspconfig.
5. Run zspconfig -u.
zspconfig performs the job of building the scripts based on a provided input file locally, or
from a remote machine. A text-based configuration file provides input to zspconfig. Example
configuration files are included on the switch in /etc/rcZ.d/surviving_partner. The
result of zspconfig is to create several configuration files and runtime shell scripts, and
optionally start the Surviving Partner processes. Scripts are generated for configuring VLANs,
starting the network, and starting the vrrpd and zlmd daemons.
zspconfig can also be used by sibling backup switches to retrieve configuration from the
Surviving Partner and start the vrrpd and zlmd daemons. zspconfig is generally only run
once to configure Surviving Partner.
The configuration and runtime scripts created are as follows:
• S70Surviving_partner - Switch initialization script that is run at boot time. This
script will restart the switch with the original configuration given to zspconfig.
Optionally, zspconfig will run this script from the initial invocation.
• zsp.conf.<n> - zspconfig configuration file that contains the configuration of
the sibling backup switches. The <n> is used to distinguish potentially more than one
backup switch. This configuration file is placed in /tftpboot, and is retrieved via
DHCP during configuration of the backup switch by zspconfig with the -u option
or by a replacement switch on boot up.
• vrrpd.conf - Configuration script for the VRRP daemon. This configuration is
used when the S70Surviving_partner script launches vrrpd. There is a line in this file for
each virtual router address vrrpd will manage.
• dhcpd.conf - Configuration script used by dhcpd when the switch becomes
Master. dhcpd is also used to give replacement switches their configuration scripts,
namely a zsp.conf.<n> file that can be input to zspconfig with the -u flag.
• dhclient.conf - If zspconfig is executed with the -u flag, a dhclient.conf file
is created, and then dhclient is used to retrieve a zspconfig configuration file from the
/tftpboot area of the Master switch.
• vrrpd.script - Runtime script that executes each time vrrpd changes state.
This script starts and stops dhcpd, and toggles down RAINlink ports to force the
RAINlink nodes to a new Master switch.
• zlmd.script - Runtime script executed by zlmd when a link goes up or down. This
script modifies the priority of vrrpd, which in turn may cause the VRRP Master to move
from one sibling switch to another.
After the scripts are created, zspconfig may run the S70Surviving_partner script to start
the Surviving Partner tasks. The tasks started are vrrpd, zlmd, and dhcpd.
The vrrpd and zlmd daemons run scripts to perform their actions. When vrrpd changes state
between Master and Backup, it runs a script that starts and stops dhcpd. When zlmd sees a link
go up or down, it runs a script that communicates with vrrpd via vrrpconfig.
Example HA Switch Configuration
The following walks through a basic Surviving Partner configuration typical for an HA setup.
Assume an HA chassis with multiple hosts, such as single-board CPUs, and two switches
configured for Surviving Partner. Each of the hosts has two Base Ethernet ports providing a link
to each of the Base switches and up to four fabric Ethernet ports providing links to each of the
Fabric switches.
Each host runs Linux bonding drivers (or ZNYX OA Node software with embedded RAINlink)
with the ports configured for failover. An interlink provides communication between the Base
switches. Another interlink provides communication between the Fabric switches.
When using a Linux Bonding driver on the node card, the bonding driver should be configured
for Mode 1 (active/standby). See the Linux Bonding documentation at
http://sourceforge.net/projects/bonding/ for complete information.
The two Base switches will be configured as Surviving Partners, using VRRP to form a single
virtual interface to the hosts, as will the two Fabric switches. The ports can be configured in many
different ways, with blocks of ports configured as VLANs. The configuration is set up in the zsp
configuration file, zsp.conf.
NOTE: The actual name on the system may change slightly from zsp.conf, depending on
current release requirements.
Modifying zsp.conf on the Base switch
An example file for setting up zspconfig on an Ethernet Switch Blade is
/etc/rcZ.d/surviving_partner/zsp.conf. The following documents the default settings.
NOTE: It is unlikely that any installation will use this default script in production. You will
have to modify it to suit your network design.
On Switch A (Master), make a backup copy of zsp.conf, and edit zsp.conf:
cd /etc/rcZ.d/surviving_partner/
cp zsp.conf zsp.conf.save
vi zsp.conf
The first section uses zconfig to create the VLANs. Many of these choices are determined by
the physical configuration of the switch and ATCA backplane. For instance, the Base switch
interconnect will always be port 23, and the shelf managers will be ports 22 and 13.
zconfig zhp0: vlan100 = zre23;
zconfig zhp1: vlan1 = zre0..11, zre20..21, zre23;
zconfig zre0..11, zre20..21 = untag1;
zconfig zre23 = untag100;
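To check which ports a list such as zre0..11, zre20..21, zre23 covers, the range notation can be expanded with a short helper. This is a hypothetical sketch, not part of zconfig: the `expand_ports` function is invented for illustration, and it assumes that a..b denotes an inclusive range, which is inferred from the example above rather than taken from the zconfig documentation.

```python
import re


def expand_ports(spec):
    """Expand a port list such as 'zre0..11, zre20..21, zre23' into port names.

    Assumption: 'zreA..B' means every port from A through B inclusive.
    """
    ports = []
    for item in spec.split(","):
        item = item.strip()
        match = re.fullmatch(r"([a-z]+)(\d+)(?:\.\.(\d+))?", item)
        if match is None:
            raise ValueError("unrecognized port spec: %r" % item)
        prefix, first, last = match.group(1), int(match.group(2)), match.group(3)
        last = first if last is None else int(last)
        ports.extend("%s%d" % (prefix, n) for n in range(first, last + 1))
    return ports


vlan1_ports = expand_ports("zre0..11, zre20..21, zre23")
print(len(vlan1_ports), vlan1_ports)
print("interconnect in vlan1:", "zre23" in vlan1_ports)
```

Under the inclusive-range assumption, vlan1 in the example holds 15 ports, including the interconnect port zre23.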
The next section sets up the physical IP addresses to use for the Master and the Backup switch.
The Master provides the addresses to the Backup on a first-come, first-served basis. Note that the
physical IP address should be different from the virtual IP address that spans the pair of switches.
Once configured, the pair appears as one connection point to other hosts on the VLAN. You need
to supply an IP address for each interface on each switch. The first IP address on each line is for
the Master and the second is for the Backup.
Now configure the virtual address for each sibling group. We are going to create a virtual
interface across one VLAN, but not for the interconnect. This provides a single point to
connect/route to the VLANs.
Next come port definitions, as defined on the zspconfig man page. Since our hosts are
connected using the Linux bonding driver (or RAINlink), we choose RAINlink for
each of the ports in VLANs on each switch, and interconnect for the interconnect port on each
switch.
The port definitions are:
interconnect - Ports connected between groups of Surviving Partner switches. VRRP
heartbeat messages are sent on the interconnect ports.
Crossconnect - Ports that are connected to other Surviving Partner switches that are not
part of this Surviving Partner group. Crossconnect ports behave
differently than bonding driver/RAINlink ports. The links are not brought down
temporarily, and VRRP runs with the native MAC addresses to avoid MAC address
duplication with the other VRRP group.
RAINlink - Ports connected to bonding driver/RAINlink enabled nodes. These ports contain
virtual addresses managed by VRRP. During a failover event, the links are toggled
down to force failover to the Master switch.
Route - Ports connected to upstream routers. VRRP does not manage virtual IP addresses for
these links. Routing protocols must be used to inform upstream routers of a different path
to the VRRP managed networks.
monitor_only - Ports that are monitored but do not have a virtual address managed on them.
Their links are not brought down temporarily during a failover scenario. A problem on
this type of link will still cause a failover scenario.
configure_only - Ports that are configured as per the zconfig commands, but do not participate
in the high availability network. Problems on these links will not cause a switch failover.
interconnect: zhp0;
RAINlink: zre0..11, zre20..21;
Next come special modes for VRRP for use when one pair of Ethernet Switch Blades
is connected to another pair of Ethernet Switch Blades in a redundant configuration. The intent
of these modes is to provide Spanning Tree-like capabilities that eliminate network loops between
pairs of Surviving Partner configurations, and to expedite address learning between the two
pairs of switches:
The next section determines the failover mode between the Surviving Partner switches. There
are three modes:
• switch - Failover by switch. Failover from Master switch to Backup on any port
failure. The switch with the most links becomes the new Master. One port failure
will cause the switch to fail over.
• vlan - Failover by VLAN. The switch with the most UP links in the VLAN becomes
the Master of that VLAN. When VLANs fail over and the VLAN Masters are not all
located on a common switch, the interconnect link is used to carry data traffic and
could become saturated. The use of the interconnect for data traffic in a failover
situation depends on the VLAN design, from one extreme where one VLAN
contains all ports to the other extreme of one port per VLAN.
• port - Failover by port. The Master switch will remain the Master until all ports in the
VLAN are down. The Backup then becomes the new Master for that VLAN.
Similar to VLAN failover, the interconnect link will carry data traffic in this mode
when ports fail over.
failover_mode: switch;
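The difference between the three modes can be illustrated with a small model. The following is only a sketch of the rules as described above, not the VRRP implementation: the function name elect_masters, the switch labels "A" and "B", and the choice to keep the current Master on a tie are all assumptions made for illustration.

```python
def elect_masters(mode, up_links, current="A", backup="B"):
    """Return {vlan: switch} under a simplified model of the failover modes.

    up_links maps each VLAN name to {switch: number of up links}.
    Ties keep the current Master (an assumption of this sketch).
    """
    if mode == "switch":
        # One decision for the whole switch: the switch with the most up links wins.
        total_current = sum(links[current] for links in up_links.values())
        total_backup = sum(links[backup] for links in up_links.values())
        winner = backup if total_backup > total_current else current
        return {vlan: winner for vlan in up_links}
    if mode == "vlan":
        # One decision per VLAN: the switch with the most up links in that VLAN wins.
        return {
            vlan: backup if links[backup] > links[current] else current
            for vlan, links in up_links.items()
        }
    if mode == "port":
        # The Master keeps a VLAN until all of its ports in that VLAN are down.
        return {
            vlan: backup if links[current] == 0 else current
            for vlan, links in up_links.items()
        }
    raise ValueError("unknown failover mode: " + mode)


# Switch A has lost one link in vlan1; vlan2 is healthy on both switches.
links = {"vlan1": {"A": 3, "B": 4}, "vlan2": {"A": 4, "B": 4}}
for mode in ("switch", "vlan", "port"):
    print(mode, elect_masters(mode, links))
```

In this model the same single link failure moves every VLAN in switch mode, moves only vlan1 in vlan mode, and moves nothing in port mode, which is why the vlan and port modes can leave traffic crossing the interconnect.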
Next, you can set vrrp_msg_rate and the default priority. vrrp_msg_rate is the time in
milliseconds between VRRP message transmissions over the interconnect link.
vrrp_def_priority is the default priority for both switches. The value is set to 254 and
should not need to be changed.
vrrp_msg_rate: 100; # In milliseconds
vrrp_def_priority: 254;
The following optional entries provide a mechanism to propagate files and/or startup scripts to
sibling switches. An example might be startup scripts or scripts to configure gated. Example
scripts are included to start gated with RIP1, RIP2, or OSPF setup. You must use absolute path
names.
# start_script: Allows the user to add files and scripts that are moved
# to the slave switches when they do a zspconfig -u. An example might
# be the gated configuration script S55... Absolute path names are
# required. Multiple start_script commands can be used to move more than
If you use the special failover modes vlan or port (see above for details), you can also specify an
individual address to be the default master; that is, a port or VLAN should run on a specific
switch when the VRRP priorities are equal between switches.
NOTE: VLAN or port mastering is not appropriate for switch mode and should not be
attempted.
When addresses designated 'master' fail over, they return to their Master switch as soon as the
link is repaired. If they are not designated 'master', they remain at the backup switch after
repairs.
If both switches are equal in priority for a VLAN, then the switch with the IP address designated
'master' will become Master for that VLAN.
Add the keyword “(master)” after one of the sibling_addresses. The local address comes first.
Once the configuration files are complete, run the zspconfig utility on the Master to configure
all the scripts:
NOTE: This command can take 60 seconds or more with no screen output.
zspconfig -f zsp.conf
You will see output similar to this:
zspconfig -f zsp.conf
….
Would you like to install the Surviving Partner startup script[y,n,?] y
Would you like to start the Surviving Partner daemons without rebooting [y,n,?] y
Once configuration is complete, ensure there are no superfluous S-type startup scripts in
/etc/rcZ.d, and zsync your switch to save your configuration.
Now go to the backup switch and run zspconfig -u to get the appropriate configuration
information from the Master:
zspconfig -u zhp0
Modifying zsp_vlan.conf on the Fabric Switch
An example file for setting up zspconfig on an Ethernet Switch Blade Fabric board is
/etc/rcZ.d/surviving_partner/zsp_vlan.conf. Refer to the previous section for
descriptions of each configuration section.
# Sample configuration is based on the idea that there are separate VLANs
# for the multiple connections to a slot.
#
# zhp0: Interconnect VLAN
# zhp1..4: Data interface VLANs, configured such that Option 2
# slots have 2 VLANs connected to them and Option 3 slots have
Once the configuration files are complete, run the zspconfig utility on the Master to configure
all the scripts:
NOTE: This command can take 60 seconds or more with no screen output.
zspconfig -f zsp.conf
You will see output similar to this:
zspconfig -f zsp.conf
….
Would you like to install the Surviving Partner startup script[y,n,?] y
Would you like to start the Surviving Partner daemons without rebooting [y,n,?] y
Once configuration is complete, ensure there are no superfluous S-type startup scripts in
/etc/rcZ.d, and zsync your switch to save your configuration.
Now go to the backup switch and run zspconfig -u to get the appropriate configuration
information from the Master:
zspconfig -u zhp0
Configuring Surviving Partner
The S60SP_startup script is useful in setting up proper switch replacement. By factory
installing the S60SP_startup script in replacement switches, the replacement switches will boot
looking for a Master switch configuration. The script tries several sources in turn.
It first looks for a local file. If the file exists, it is used to configure the switch. Only the
originally configured switch or Central Authority should contain this file. See Central
Authority later in this Chapter for more information.
Next it uses zspconfig -u to attempt to contact a running Master switch to retrieve the proper
configuration. This is the normal case for a replacement switch.
Finally, it lets the currently saved S70Surviving_Partner script execute. This would be
the case of a power up of an already configured backup switch when the other HA switch is
unavailable. This case could occur after losing power to the entire chassis.
Central Authority
Modifications can be made to the S60SP_startup script to use a third machine running DHCP
that is not part of the Surviving Partner pair. The third machine is referred to as the Central
Authority.
Setup requires a DHCP daemon configuration file on the Central Authority and a dhclient
configuration file for each of the two Surviving Partner switches in the pair. The format of the
DHCP daemon configuration file is dependent on the machine and operating system being used.
An example can be obtained from the Surviving Partner primary switch in the location
/etc/rcZ.d/surviving_partner/dhcpd.conf
This configuration will contain configuration for only one of the two Surviving Partner switches.
It must be edited. For example:
subnet 100.0.0.0 netmask 255.0.0.0 {
    option broadcast-address 100.255.255.255;
    host ZNYX1 {
        fixed-address 100.0.0.31;
        option dhcp-client-identifier "ZNYX";
        option vendor-encapsulated-options "zsp_conf.1";
    }
}
A second host entry must be added with unique information.
The last step is to modify the startup scripts that run zspconfig to use the -c option. The -c
option allows you to provide a dhclient.conf script rather than having zspconfig create a
default. For example, the S60SP_startup script line that reads:
echo y n | zspconfig -t 10 -su zhp0 > /dev/null 2>&1
can be modified to:
echo y n | zspconfig -c /etc/rcZ.d/surviving_partner/dhclient.new.conf -t 10 -su zhp0 > /dev/null 2>&1
If you use S60SP_startup, the /etc/rcZ.d/surviving_partner/zsp.primary.conf file
should not exist. This way the S60SP_startup script will first look at the Central Authority. If
the Central Authority is down, then it will use its current configuration.
There are two separate switch portions in the Ethernet Switch Blade units, the base switch and the
fabric switch. The fabric switch handles the data traffic for the ATCA rack over ports 0-47. It
runs the Ethernet Switch Blade software. Two or four GigE connections are provided to node
cards using the ATCA backplane.
Connecting to the Fabric Switch Console
You can connect to the fabric switch console using a telnet connection or with a console
cable. Use the procedure below for a telnet connection. See Connecting to the Console Port
for console cable instructions.
Connect an Ethernet cable to the host and the switch. The OOB port is not active in the default
configuration. You can connect to the fabric OOB port on the front panel.
Work from a host on the 10.0.0.0 network.
The OpenArchitect switch is pre-configured with address 10.0.0.43. Telnet to 10.0.0.43.
telnet 10.0.0.43
After you are connected, enter the login name root. No password is required.
OpenArchitect
ZX7100-OA<release no.>#
login: root
OpenArchitect Configuration Procedure
Layer 2 and Layer 3 switch configurations can be accomplished with a few simple commands.
Once you have configured your switch, the commands should be placed into a startup
configuration script. Like most Linux systems, the OpenArchitect switch boot process runs
initialization commands and scripts through /etc/init.d/rcS; scripts whose names begin with an
uppercase “S” run in alphabetical order. Any configuration scripts you create should be named in the
standard Linux/Unix manner, starting with an uppercase “S” and numbered in the sequence you
would like them executed. The final step once the switch has been properly configured is to use
the zsync command to save all files into flash for reloading.
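The naming rule matters because the number in a script name only controls ordering through plain alphabetical comparison. The sketch below models that rule; the function name startup_order and the sample file list are hypothetical, and the model assumes a simple character-by-character sort.

```python
def startup_order(names):
    """Return the names that start with an uppercase "S", in alphabetical order."""
    return sorted(name for name in names if name.startswith("S"))


# Hypothetical directory contents, deliberately listed out of order.
scripts = ["S50layer2", "S20stack", "README", "s99ignored", "S55gatedRip1"]
print(startup_order(scripts))
```

Under this model a name such as S100custom would sort before S20stack, because "1" sorts before "2". Keeping the numeric part at two digits avoids that surprise.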
You may use standard bash shell procedures to change the prompts on your base switches. Many
sites choose a system that distinguishes among the individual switches at their location. The same
rules apply for saving your choice (zsync) as for all other configuration changes.
Default Configuration Scripts
As shipped the following scripts are run from /etc/rcZ.d as the switch boots up:
NOTE: These default scripts will change in later releases. Use them as examples.
S20stack - Script that calls zstack to combine the two BCM56504 24-port switch fabric chips
into a single 48-port virtual switch. zstack must be run before any other switch configuration.
S50layer2 - Script that sets up a basic Layer 2 switch. All 48 ports are set up on one VLAN. This
configuration script is appropriate for an Ethernet Switch Blade. It may need to be modified for
other models.
Example Configuration Scripts
Example scripts are provided that can be used as templates. Use one of the scripts located in the
/etc/rcZ.d/examples directory to help you configure the switch. The default
configuration for the switch is located in the script file /etc/rcZ.d/S50layer2.
The following scripts are included (each is examined in more detail later in the appropriate
section describing common Layer 2 and Layer 3 configurations):
• S50layer2 - Script which sets up a basic Layer 2 switch. All 48 ports are set up on one
VLAN. This is a copy of the script in /etc/rcZ.d that is loaded in the default
configuration.
• S50layer3 - Script which sets up a basic Layer 3 switch. All 48 ports are set up on
individual IP networks (VLANs). Layer 3 switching is enabled.
• S50multivlan - Script which sets up multiple untagged VLANs. (See Using the S50layer3
Script) Layer 3 switching is enabled.
• S55gatedRip1 - Script which is used with a Layer 3 switch and calls the GateD daemon to
enable RIP 1 routing protocol.
• S55gatedRip2 - Script which is used with a Layer 3 switch and calls the GateD daemon to
enable RIP 2 routing protocol.
• S55gatedOspf - Script which is used with a Layer 3 switch and calls the GateD daemon to
A zhp device is associated with one VLAN. zhp may have one or more physical ports and their
associated zre devices. A VLAN from the viewpoint of the switch is a logical mapping of ports
based on intended use. The primary purpose of a VLAN is to isolate traffic and enable
communication to flow more efficiently within groups of mutual interest. The switch is used to
bridge from one VLAN to another. Figure 4.1: Fabric VLANs shows an example of a custom Layer 2
VLAN network structure in a fabric switch.
In Figure 4.1, four VLANs for each fabric switch are used to organize traffic. This is just one
example of how a Layer 2 switch could be configured with the fabric switch.
Tagging and Untagging VLANs
The OpenArchitect switch is capable of switching VLAN tagged and untagged data packets.
VLAN tagged packets conform to the 802.1q specification and the packet header contains an
additional four bytes of VLAN tag information. A given port can be specified to accept VLAN
tagged or untagged traffic. Internally, all traffic for a particular VLAN is treated as tagged traffic.
For each switch port, OpenArchitect creates a separate interface with its own MAC address called
a ZNYX raw Ethernet (zre). After the initial power up, 48 zre interfaces are created, one for
each in band port. You cannot directly access or modify the zre interfaces.
During the initial power up of the switch, the default configuration creates a Layer 2 switch. The
Layer 2 configuration places the zre interfaces in one zhp interface. See Figure 4.1: Fabric
VLANs. The number after zre represents the corresponding switch port number (that is, zre1
represents port 1 on the switch).
Layer 2 Switch Configuration
The steps to build a Layer 2 switch involve creating groups of switch ports in VLANs (Layer 2
switching domains) and bringing the interfaces up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group.
A startup script called S50layer2 sets up a single VLAN (zhp0) for all ports. The ISL is
assigned its own VLAN. The interface to the host is then
assigned the IP address of 10.0.0.43 to allow access to the switch. The VLAN is assigned an IP
address. The /etc/rcZ.d/S50layer2 script:
## Create a single untagged vlan (i.e. interface), consisting
# of the 48 Gigabit Ethernet ports Layer 2 forwarding enabled
# Put the ISL in its own vlan to avoid loops
#
/usr/sbin/zconfig zhp0: vlan1=zre0..50
/usr/sbin/zconfig zre0..50=untag1
/usr/sbin/zconfig zhp1: vlan2=zre51
/usr/sbin/zconfig zre51=untag2
sleep 1
#
# Assign the ZNYX default IP address 10.0.0.43 to the
# At this point the system will act as a Layer 2 switch
# across all ports. Also, the system will accept telnet()
# connections on 10.0.0.43 on any port. Script(s) may then
# be run to reinitialize the system and modify its
# configuration.
Using the S50layer2 Script
The S50layer2 script can be used as an example, and edited to customize your Layer 2 setup.
The default script may not match your physical port configuration. In that case you will have to
alter the script to suit your circumstances. For example, to reconfigure the IP address on your
Layer 2 switch:
Open the S50layer2 file in the Linux vi editor.
Change the IP address value listed under the Linux ifconfig(1M) command line.
Save your changes by running OpenArchitect zsync.
zsync
Reboot the switch.
Rapid Spanning Tree
The Rapid Spanning Tree Protocol (RSTP) configures a simply connected active topology from
the arbitrarily connected components of a Bridged Local Area Network. RSTP participants use a
simple dialog carried in packets called Bridge Protocol Data Units (BPDUs) for finding the
shortest path between two networks and for eliminating loops from the topology. If nodes
attached to ports fail or are added or deleted, the topology dynamically changes to accommodate
the new configuration. If your network topology is such that there is no real redundancy or
chance for loops, you do not need to turn on Spanning Tree.
zl2d is a shell script used to create Linux bridges. Each bridge is named after the previously
created zhp device, preceded with a "b" (for example, if you are creating a Bridge
device from zhp0, the resulting device would be bzhp0). zl2d then starts a background task
that monitors the port information of the Linux bridge at a specified interval and updates the
Spanning Tree state fields in the hardware when necessary.
brctl(8) is called by zl2d for configuring certain RSTP parameters. For an explanation of
these parameters, see the IEEE 802.1d specification, or reference the brctl(8) man page in
Appendix A. The following demonstrates a simple example of setting up a Layer 2 switch and
starting RSTP.
Create a VLAN containing the ports that will be a part of the Linux bridge running Rapid
Spanning Tree. This example will use ports 0-3 (untagged):
zconfig zhp0: vlan1=zre0..3
zconfig zre0..3=untag1
Create a bridge device from the zhp device,
zl2d start zhp0
A Bridge device named bzhp0 should now exist consisting of ports zre0 through zre3 with
Spanning Tree enabled. To view the bridge device, use the brctl command,
brctl show
brctl showbr bzhp0
Port Path Cost
Each port has an associated cost that contributes to the total cost of the path to the Root Bridge
when the port is the root port. The smaller the cost, the better the path. The Ethernet Switch Blade
uses the following IEEE 802.1D recommendations based on the connection speed of your port:
Port Path Cost
Link Speed    Recommended Value    Recommended Range
10 Mb/s       100                  50-600
100 Mb/s      19                   10-60
1 Gb/s        4                    3-10
10 Gb/s       TBD                  TBD
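As a worked illustration of how these values add up along a path, the sketch below encodes the table and sums the recommended costs of the links between a bridge and the Root Bridge. The names RECOMMENDED, path_cost, in_recommended_range and root_path_cost are made up for this example, and the 10 Gb/s row is left out because the table lists it as TBD.

```python
# Link speed in Mb/s -> (recommended value, (lowest, highest) recommended value).
RECOMMENDED = {
    10: (100, (50, 600)),
    100: (19, (10, 60)),
    1000: (4, (3, 10)),
}


def path_cost(speed_mbps):
    """Recommended path cost for a port running at the given speed."""
    return RECOMMENDED[speed_mbps][0]


def in_recommended_range(speed_mbps, cost):
    """True if a manually chosen cost falls inside the recommended range."""
    low, high = RECOMMENDED[speed_mbps][1]
    return low <= cost <= high


def root_path_cost(link_speeds):
    """Total cost of a path made of links with the given speeds."""
    return sum(path_cost(speed) for speed in link_speeds)


# A path over one gigabit link and one 100 Mb/s link.
print(root_path_cost([1000, 100]))
```

A path with a lower total is preferred, so in this model two gigabit hops (cost 8) beat a single 100 Mb/s hop (cost 19).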
To change the port path cost, use the brctl setpathcost option, choosing a value consistent
with the link speed (for example, 4 for a gigabit interface).
The previous section outlines the Layer 2 switch configuration that is automatically configured
when you initially bring up the OpenArchitect switch. In order to communicate between Layer 2
interfaces, you must properly set up routing.
The steps to build a Layer 2 switch involve creating a group of switch ports in a VLAN (or Layer
2 switching domain) and bringing that interface up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group with Layer 2 switching. Layer 3 routing information is then used to route
between the Layer 2 network devices.
Take a simple example of two VLANs configured on the switch, each with four ports. First
tear down any existing configuration,
zconfig -t
Use zconfig to create two new VLANs, each with four ports, and untag them,
Now, use ifconfig to assign each zhp interface an IP address,
ifconfig zhp0 10.0.0.1
ifconfig zhp1 11.0.0.1
At this point, the Linux host has enough information to route between the networks of the directly
attached interfaces, 10.0.0.0 via zhp0, and 11.0.0.0 via zhp1.
The next step is to enable the zl3d daemon to move that routing information from the host to the
Ethernet Switch Blade switching tables in silicon. Once enabled, zl3d will monitor the Linux
routing tables for changes in configuration and update the switch silicon tables. Start zl3d to
update the switch tables:
zl3d zhp0 zhp1
The Ethernet Switch Blade switch is now configured as a Layer3 switch that can route between
two Layer2 devices in silicon.
Using the S50layer3 Script
To modify the configuration to a Layer 3 switch, remove the default Layer 2 script from the
/etc/rcZ.d directory, and replace it with the example Layer 3 script file. In the example
script, separate VLANs are set up for each port. The VLAN interfaces are labeled
zhp0..zhpn. Each VLAN is associated with an individual zre interface. There is always a
one to one connection between VLANs and zhp interfaces. Remember, zre and zhp interfaces
can begin with a zero value but a VLAN cannot (that is, zhp0 has zre0 on vlan1, zhp1 has
zre1 on vlan2). Each zhp interface is assigned a separate IP address in the example script.
The S50layer3 script executes the following commands:
• Runs zconfig command to create 48 untagged VLANs (one for each switch port).
/usr/sbin/zconfig zhp0..47: vlan1..48=zre0+
/usr/sbin/zconfig zre0..47=untag1+
NOTE: Double periods (..) after zhp0, vlan1 and zre0 indicate a range of values.
The plus (+) sign after zre0 and untag1 is a wildcard character that means auto-increment and
causes each zhp interface to hold only one zre (that is, zhp0 has zre0 on vlan1,
zhp1 has zre1 on vlan2).
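To make the range and auto-increment notation concrete, the sketch below expands the arguments of the first zconfig command into the resulting interface layout. It is a model of the notation only, not of how zconfig parses its arguments; the helper names expand_range and auto_increment are hypothetical.

```python
import re


def expand_range(spec):
    """Expand 'zhp0..47' into ['zhp0', ..., 'zhp47']; other names pass through."""
    match = re.fullmatch(r"([a-z]+)(\d+)\.\.(\d+)", spec)
    if not match:
        return [spec]
    prefix, first, last = match.group(1), int(match.group(2)), int(match.group(3))
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def auto_increment(spec, count):
    """Expand 'zre0+' into count names starting at zre0, one per interface."""
    match = re.fullmatch(r"([a-z]+)(\d+)\+", spec)
    if not match:
        raise ValueError("expected a name ending in '+': " + spec)
    prefix, start = match.group(1), int(match.group(2))
    return [f"{prefix}{start + i}" for i in range(count)]


# Model of: zconfig zhp0..47: vlan1..48=zre0+
interfaces = expand_range("zhp0..47")
vlans = expand_range("vlan1..48")
ports = auto_increment("zre0+", len(interfaces))
layout = {zhp: (vlan, zre) for zhp, vlan, zre in zip(interfaces, vlans, ports)}

print(layout["zhp0"], layout["zhp1"], layout["zhp47"])
```

Each zhp interface ends up with exactly one VLAN and one zre port, with VLAN numbers offset by one because VLAN numbering cannot start at zero.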
• Runs the Linux ifconfig(1M) command for each interface to assign default IP addresses
(10.0.0.43-10.0.47.43), set the netmask and bring up the interfaces.
ifconfig zhp0 10.0.00.42 netmask 255.255.255.0 up
ifconfig zhp1 10.0.01.42 netmask 255.255.255.0 up
ifconfig zhp2 10.0.02.42 netmask 255.255.255.0 up
.
.
.
ifconfig zhp45 10.0.45.42 netmask 255.255.255.0 up
ifconfig zhp46 10.0.46.42 netmask 255.255.255.0 up
ifconfig zhp47 10.0.47.42 netmask 255.255.255.0 up
• Runs the OpenArchitect zl3d. The zl3d application monitors the Linux routing tables
and updates the switch routing tables for each interface configured above.
/usr/sbin/zl3d zhp0..47
zl3d initially creates and adds each zhp interface (VLAN) to the switch routing tables. The
zhp0..zhp47 is shorthand for the list of interfaces (zhp0, zhp1, …, zhp47) to monitor
with zl3d.
To Modify the Layer 3 Script
• Modify the example script you copied into the /etc/rcZ.d directory. Adjust and assign
the number of IP addresses as applicable. In the example below, the IP address is changed
for the interface in the ifconfig command line of the script.
interfaces, that are added to the routing tables, depending on the
number of VLANs you are adding for your network. Include any other details, as
applicable.
• Run the OpenArchitect zsync command to save your changes.
zsync
• Reboot the switch.
• After rebooting, your switch works from your customized Layer 3 configuration.
Layer 3 Routing Protocols with GateD
An advanced networking configuration may require using the GateD software platform for
deployment of Routing Information Protocols (RIP 1 or RIP 2) and Open Shortest Path First
(OSPF) protocols. Once you’ve configured your Layer2 and Layer3 devices, start gated.
Using the S55gatedRip1 Script
To use GateD protocol with the switch, you need to copy two files into the same directory as your
Layer 3 configuration file. From the /etc/rcZ.d/examples folder, copy the example script file
S55gatedRip1 and its corresponding GateD configuration file (for example,
gated.conf.rip1).
The example startup script executes the following commands (S55gatedRip1 is used as an
example):
• Starts GateD with Rip1 using gated.conf.rip1 as the configuration file:
/usr/sbin/gated -f /etc/rcZ.d/gated.conf.rip1
The GateD conf file specifies the following configuration commands:
• Implements the passive function so GateD is prevented from rerouting information to a
different interface if insufficient information is received.
• Open and make configuration changes to the listed conf file to coincide with the current
Layer 3 configuration (that is, adjust IP addresses and number of interfaces available). See
GateD documentation if you have questions regarding the conf file.
• Run the OpenArchitect zsync command to save your changes. Be sure your changes are
correct:
zsync
• Reboot the switch. After rebooting, your switch operates as a Layer 3 switch with GateD
routing.
Class of Service (COS)
The following section provides information on using the OpenArchitect switch to provide Class
of Service (COS) support. The switching fabric architecture defines the scope of the COS
parameters. Some apply to an individual port, and others apply to the whole switch. It is
important to understand the scope of each parameter to ensure that the expected
behavior occurs.
Egress Queues
The Ethernet Switch Blade fabric switch provides 1 to 8 COS queues per egress port, and for
packets destined to the CPU from the switching fabric. By default, a freshly booted
OpenArchitect switch has a single queue per egress port (and the CPU).
Ingress Classification
Incoming packets are mapped to queues based on their priority tags. The built-in behavior of the
Ethernet Switch Blade uses the 802.1p tag within a packet as the queue selector. There is one
COS to queue selector map per port.
By using the Linux iptables utility and zfilterd with ztmd, the queue selection can be
based on any information in the first 64 bytes of the IP packet header. The default OpenArchitect
switch behavior has all COS values mapping to a single queue on each of the egress ports.
A default priority for an untagged packet can be assigned for each port. By default, these
incoming priority values are all mapped to COS queue 0. To change the default priority for
untagged packets, or to define the mapping from priority values to COS queues, use the zcos
command (refer to Appendix A).
The OpenArchitect switch can mark or remark packets using the TOS field or 802.1p tag. This is
also controlled through the Linux iptables utility.
Scheduling
The servicing of configured queues by the switching fabric is referred to as scheduling. The
OpenArchitect switch has three built-in scheduling algorithms. The type of scheduling algorithm
used is implied, rather than being explicitly specified, based on the number of queues and which
options are configured. The following scheduling algorithms are provided:
First In First Out (FIFO) – When only one queue is configured per port, packets are serviced in
the order in which they arrive. This is the default for the OpenArchitect switch.
Strict Priority – This algorithm is used when more than one queue is provisioned on the port. The
highest priority queue, which is also the highest numbered one, is always serviced first (Example:
If four queues are configured, queue three is of higher priority than queue zero). As long as there
are packets in the highest priority queue, the lower priority queues are not serviced. The danger is
that higher priority traffic could block lower priority traffic.
Weighted Round Robin (WRR) – This algorithm is similar to Strict Priority scheduling, but it
provides fairness with quanta for each queue. Each queue is assigned a number of packets, known
as weight, that it is allowed to transmit before it yields to a lower priority queue. Note that with
WRR, the priorities of the queues are dependent on the weights allocated. A higher priority queue
with a smaller weight will get less wire-time than a lower priority queue configured with a larger
weight. The relative weights used for priority queues on a port can be set using the zcos
command (this is a switch-wide parameter).
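The practical difference between Strict Priority and WRR is easiest to see by serving the same backlog both ways. The sketch below is a simplified software model of the two disciplines as described above, not the switching fabric's scheduler; the function names and packet labels are invented, and weights are assumed to be whole numbers of packets of at least 1.

```python
from collections import deque


def strict_priority(queues):
    """Serve the highest-numbered non-empty queue until everything is sent."""
    pending = [deque(q) for q in queues]
    order = []
    while any(pending):
        top = max(i for i, q in enumerate(pending) if q)
        order.append(pending[top].popleft())
    return order


def weighted_round_robin(queues, weights):
    """Visit queues from highest to lowest; each sends up to its weight per round."""
    if min(weights) < 1:
        raise ValueError("this sketch assumes every weight is at least 1")
    pending = [deque(q) for q in queues]
    order = []
    while any(pending):
        for i in reversed(range(len(pending))):
            for _ in range(weights[i]):
                if not pending[i]:
                    break
                order.append(pending[i].popleft())
    return order


# Queue 0 is the low priority queue, queue 1 the high priority queue.
backlog = [["low1", "low2", "low3"], ["high1", "high2", "high3"]]
print(strict_priority(backlog))
print(weighted_round_robin(backlog, weights=[1, 2]))
```

With Strict Priority the first low priority packet waits behind the entire high priority backlog. With weights of 2 and 1 it is sent after only two high priority packets.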
ztmd Explained
ztmd is a traffic management daemon which accepts messages from traffic filtering and quality
of service applications and sets up the hardware.
zfilterd Explained
zfilterd is a daemon that intercepts filtering rules entered by the user via iptables, checks
them for validity and then passes them on to ztmd for entry in the switch.
Running zfilterd
Before starting zfilterd, ztmd must be running. You can start both from within a script, or
directly from the command line. For example,
ztmd
zfilterd
iptables rules can be entered at any time. If your iptables filtering rule set is extensive,
you may want to move your set of iptables commands to a startup script to run upon
initialization. This could be accomplished by creating a standalone "S" script and placing that
script into /etc/rcZ.d.
Restrictions on Implementation
Several restrictions exist on the rules that can be implemented on the FFP hardware. These
include:
Actions - DROP the packet or ACCEPT the packet.
Output Port - Should be specified if the action is ACCEPT; if no output port is specified, an
IRULE table entry is generated for every port.
Field values - If specified as ranges, they must be on power of two boundaries.
Negation - Can only be used for icmp, tcp, or udp fields.
Fields supported are: Source IP address, destination IP address, IP protocol, TCP or UDP source
port or destination port, ICMP type, and TCP flags bits (such as SYN).
The input port and output port may also be specified as either zre<n>, where <n> is one of the
48 physical ports, or as zhp<n>, where the zhp interface used must be previously defined using
zconfig.
A restriction on the fields supported is the size of the IMASK table. There are only 16 entries per
port available, which means only 16 combinations of fields can be used at any time.
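The "power of two boundaries" restriction on field ranges can be checked before a rule is entered. The sketch below assumes the usual meaning of that phrase for mask-based hardware matching: the range must cover a power-of-two number of values and start on a multiple of that size, so that a single value and mask pair describes it. That interpretation and the function name are assumptions, not statements from the hardware documentation.

```python
def on_power_of_two_boundary(low, high):
    """True if low..high (inclusive) can be described by one value/mask pair."""
    size = high - low + 1
    if size <= 0:
        return False
    is_power_of_two = (size & (size - 1)) == 0
    return is_power_of_two and low % size == 0


# Destination port ranges a user might want to filter on.
print(on_power_of_two_boundary(1024, 2047))  # 1024 values starting at 1024
print(on_power_of_two_boundary(1000, 2000))  # 1001 values, not a power of two
```

A range that fails the check can often be split into several aligned sub-ranges, at the cost of more rules. Each distinct combination of fields also counts against the 16 combinations available per port.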
Conflict Resolution
There are differences from the expected behavior of implementing iptables in a host:
Although the rules are taken from the FORWARD and INPUT chains, they are applied to all
packets, including those destined for the local CPU. The order of application of the rules is not
necessarily the order in which they appear in the chains. If a rule uses a mask that is less
restrictive than another rule, it will be applied first. The last rule that is matched determines the
action that will take place. For example, the rules:
iptables -A FORWARD -i zhp3 -j DROP
iptables -A FORWARD -i zhp3 -o zhp1 -p tcp --dport smtp -j ACCEPT
cause SMTP packets received on any port in zhp3 and destined for any port in zhp1 to be
accepted; all other packets from zhp3 are dropped. The order of the two rules in the FORWARD
chain does not matter.
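The two-rule example can be modeled to show why chain order does not matter here. The sketch below is a deliberately simplified model: it treats "less restrictive" as "examines fewer fields", sorts on that, and lets the last matching rule win. Real hardware ordering depends on the field masks, so the helper decide, the dictionary field names and the use of port 25 for smtp are illustrative assumptions only.

```python
def decide(rules, packet):
    """Apply rules from least to most restrictive; the last match sets the action.

    rules is a list of (fields, action) pairs in chain order. Returns None
    when no rule matches.
    """
    # sorted() is stable, so rules with the same field count keep chain order.
    ordered = sorted(rules, key=lambda rule: len(rule[0]))
    action = None
    for fields, verdict in ordered:
        if all(packet.get(name) == value for name, value in fields.items()):
            action = verdict
    return action


drop_all = ({"in": "zhp3"}, "DROP")
allow_smtp = ({"in": "zhp3", "out": "zhp1", "proto": "tcp", "dport": 25}, "ACCEPT")

smtp = {"in": "zhp3", "out": "zhp1", "proto": "tcp", "dport": 25}
dns = {"in": "zhp3", "out": "zhp1", "proto": "udp", "dport": 53}

for chain in ([drop_all, allow_smtp], [allow_smtp, drop_all]):
    print(decide(chain, smtp), decide(chain, dns))
```

Both chain orders give ACCEPT for the SMTP packet and DROP for the other packet, matching the behavior described above. On a host running ordinary iptables, placing the DROP rule first would instead drop everything from zhp3.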
On the other hand, in the following sequence of rules, the position of the rule that drops SYN
packets is important. Since the set of fields it examines is not a subset of the fields examined by
the ACCEPT rules, and vice versa, the ordering rule given above does not apply. In this case,
the rule is applied according to its position in the FORWARD chain, and all TCP SYN packets
from zhp5 for zhp3 will be dropped, even if they also match one of
the ACCEPT rules.
iptables -A FORWARD -i zhp5 -o zhp3 -p tcp --syn -j DROP
iptables and filtering
iptables is a firewall management user-space utility used in conjunction with the Linux 2.4
kernels, and takes advantage of the netfilter 2.4 kernel code. iptables is extended with a few
more targets to support the hardware filtering functionality used in the chips on the Ethernet
Switch Blade (fabric board). Generally, all of the iptables functionality is usable with a few
minor extensions.
A more detailed source on iptables can be found at:
http://www.netfilter.org/
Almost all the contents described here are derived from there.
There are also many tutorials and iptables manipulation tools, both graphical and command
line. This is expressive of the Open Architect concept. A good place to start is:
http://freshmeat.net/search/?q=iptables
Introduction
Firewall rules are stored in tables. These tables are sometimes also known as firewall chains or
just chains. Tables normally store rules for what are known as hooks, which can be looked at as
packet-path junctions. There are five defined hooks: PRE-ROUTE, POST-ROUTE, INPUT,
OUTPUT and FORWARDING. The example below illustrates the default chains on boot up.
By default, INPUT, FORWARD and OUTPUT chains are installed on boot up. Additional rules
can be installed for the other chains. Additionally, one can write software extensions to add more
chains. Figure 4.2 provides an illustration of the Firewall Flow.
[Figure: flow diagram with the elements Incoming, Preroute, Routing Decision, Input, Local
Process, Output, Forward, Post Route and Outgoing]
Figure 4.2: Firewall Flow
When a packet reaches a circle in the diagram, that chain is examined to decide the fate of the
packet. Two basic fates of a packet are defined as DROP and ACCEPT. If the chain says to
DROP the packet, it is killed there; however, if the chain says to ACCEPT the packet, it
continues traversing the diagram, ultimately terminating at an application or getting forwarded
out of the box. There are additional actions which may be applied to packets. These are
described in the "Supported Targets" section.
A chain is a checklist of rules. Each rule is checked against the packet header and if a rule
matches, action is taken. If the rule doesn't match the packet, then the next rule in the chain is
consulted. Finally, if there are no more rules to consult, then the kernel looks at the chain default
policy to decide what to do. In a security-conscious system, this policy usually tells the kernel to
DROP the packet.
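The checklist behavior of a software chain can be written in a few lines. This is a minimal sketch of the traversal just described, with packet headers and rule conditions reduced to plain dictionaries; the function name traverse and the field names are hypothetical.

```python
def traverse(chain, policy, packet):
    """Return the target of the first matching rule, else the chain's policy."""
    for conditions, target in chain:
        if all(packet.get(field) == value for field, value in conditions.items()):
            return target
    return policy


# A chain with two rules and a security-conscious default policy.
chain = [
    ({"proto": "udp", "sport": 53}, "DROP"),
    ({"proto": "tcp"}, "ACCEPT"),
]

print(traverse(chain, "DROP", {"proto": "tcp", "dport": 80}))
print(traverse(chain, "DROP", {"proto": "icmp"}))
```

The TCP packet matches the second rule and is accepted. The ICMP packet matches no rule, so the default policy of DROP decides its fate.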
In the Ethernet Switch Blade product, both the FORWARD chain hook, and the INPUT chain
hook (packets destined for the CPU) are implemented in hardware. The rest of the hooks are in
software in the Linux kernel. An extension of the FORWARD hook also resides in software. It is
important to note that this is in sync with routing being implemented in hardware with software
assist for exception handling. Under general circumstances, when routing happens in hardware,
only the FORWARD chain is traversed. Under exceptional handling of an incoming packet, one
can force the full software traversal. As a router you do not really care about the other hooks
except in the situation where you have some special handling, in which case a policy would force
the packet to be sent to the CPU for further processing.
NOTE: This is also how one would extend the OA packet munging capabilities (for
example, introduce NAT).
Packet Walk
When a packet comes in via one of the interface ports, the Ethernet Switch Blade makes a routing
decision. If the packet was destined for the Ethernet Switch Blade fabric switch itself or if the
send to CPU action is specified, it is sent to the INPUT chain for further processing. If there is no
valid way to forward the packet, it is dropped. If the switch is configured to forward the packet, it
is sent to the FORWARD chain.
Next the hardware FORWARD chain is walked. If an inserted rule matches the packet headers,
that rule is applied, and its policy decides the packet's fate.
In essence, a filter rule will be used to scan the packet data for certain characteristics. Upon a
match a selected 'target' is executed. The target decides what should happen to the packet.
Filter Rules Specifications
A rule can be appended (-A) to a chain, deleted (-D) from a chain, replaced (-R) in a chain, or
inserted (-I) at a specific position in a chain. Each rule specifies a set of conditions the packet
must meet, and what to do if it meets them ('what to do' is referred to as a 'target').
Here's an example filter rule:
iptables -A FORWARD -p UDP -s 0/0 -d 10.0.0.1/32 --source-port 53 -j DROP
This adds to the FORWARD chain the rule: "If you see UDP packets (-p UDP) from anywhere
(-s 0/0) going to host 10.0.0.1 (-d 10.0.0.1/32) with a source port number 53 (--source-port 53)
then the target is to DROP (-j DROP)." More details on rule specifications follow.
Specifying Source and Destination IP Addresses
Source (-s, --source or --src) and destination (-d) IP addresses can be
specified in four ways. The most common way is to use the full name, such as
www.linuxhq.com. The second way is to specify the IP address such as 127.0.0.1.
Netmasks can be applied to IP addresses to specify ranges, like 199.95.207.0/24 or
199.95.207.0/255.255.255.0. Both specify any IP address from 199.95.207.0 to 199.95.207.255
inclusive. To specify an all-inclusive IP address, /0 can be used, like: -s 0/0. The example
rule above applies this trick. Note however that the effect is the same as not
specifying the -s option at all.
Specifying Protocol
The protocol can be specified with the -p (or --protocol) option. The protocol can be a number (if you
know the numeric protocol values for IP) or a name for the special cases of TCP, UDP or ICMP.
Case does not matter, so tcp works as well as TCP.
Specifying an ICMP Message Type
If the protocol is ICMP, the --icmp-type option can be used to match a specific message type,
for example, --icmp-type ping.
The type can be preceded by ! to match any message except the type listed, for example,
--icmp-type ! 1.
Specifying TCP or UDP ports
If the protocol is TCP or UDP, the --source-port (or --sport) and --destination-port (or --dport)
options specify the TCP or UDP ports to match.
A range of ports can be specified by giving the first and last ports separated by a :, as in
--dport 0:1023. It is also possible to precede the port specification with a ! to match all ports
which are not included in the range, for example, --sport ! 0:1023. However, the number of
ports in a range must be a power of two, and the range must start at a port number which is a
multiple of that number.
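The constraint on port ranges can be checked before a rule is written. The following is a minimal sketch; the port_range_ok helper is hypothetical and simply applies the two conditions stated above (a power-of-two number of ports, and a first port that is a multiple of that number).

```shell
#!/bin/sh
# Check a port range "first:last" against the stated constraint:
# the number of ports must be a power of two, and the first port
# must be a multiple of that number.
port_range_ok() {
    first=${1%%:*}
    last=${1#*:}
    size=$((last - first + 1))
    [ "$size" -ge 1 ] || return 1
    # A power of two has exactly one bit set.
    [ $((size & (size - 1))) -eq 0 ] || return 1
    [ $((first % size)) -eq 0 ]
}

for range in 0:1023 1024:2047 1000:1023 256:1279; do
    if port_range_ok "$range"; then
        echo "$range: meets the constraint"
    else
        echo "$range: does not meet the constraint"
    fi
done
```

Here 1000:1023 fails because it covers 24 ports, and 256:1279 fails because its 1024 ports do not start on a multiple of 1024.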
Specifying TCP flags
If the protocol is TCP, a match on particular TCP flags is specified by listing the flag names; for
example, -p tcp --syn.
Specifying an Interface
The -i (or --in-interface) and -o (or --out-interface) options specify the name of an
interface to match. An interface is the physical device the packet came in on (-i) or is going out
on (-o). You can use the ifconfig command to list the 'up' interfaces (that is, those working
at the moment).
As a special case, an interface name ending with a + will match all interfaces, whether they
currently exist or not, which begin with that string. For example, to specify a rule which matches
all zhp interfaces, the -i zhp+ option would be used.
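The trailing + behaves like a prefix wildcard. As a rough illustration of that matching rule, and not of the actual iptables code, a hypothetical iface_matches helper could be written as:

```shell
#!/bin/sh
# Return success when interface name $2 is selected by pattern $1.
# A pattern ending in "+" matches every name that begins with the
# text before the "+"; any other pattern must match exactly.
iface_matches() {
    pattern=$1; name=$2
    case $pattern in
        *+) prefix=${pattern%+}
            case $name in
                "$prefix"*) return 0 ;;
            esac
            return 1 ;;
        *)  [ "$pattern" = "$name" ] ;;
    esac
}

for name in zhp0 zhp12 zre0; do
    if iface_matches "zhp+" "$name"; then
        echo "$name matches zhp+"
    else
        echo "$name does not match zhp+"
    fi
done
```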
Filter Rule Targets
As mentioned above, the -j option within a filter rule specifies the target to be used.
Supported Targets
The following are the supported targets. The switch has many additional targets that are software
based (for example, Network Address Translation or generic connection tracking).
tc, which stands for Traffic Control, is a mechanism for enabling Quality of Service on Linux.
tc uses three functional objects: queuing disciplines, which comprise queuing and scheduling
algorithms such as FIFO queues, priority queues, RED queues, and token buckets; classes, which
are leafs in queuing discipline hierarchies; and filters, such as u32 filters and route filters. In
addition to these three building blocks, tc also includes policers and meters, which may be
associated with filters.
The functional elements of tc may be combined to produce complex QoS rules. For example, a
packet may be matched to a filter, metered, policed as in-profile or out-of-profile, remarked,
mapped to a FIFO queue, and transmitted by a priority scheduler. tc is very flexible in the data
paths that it allows.
The utility zqosd is a daemon that monitors Linux QoS policy and shadows the policy rules into
a hardware configuration. When zqosd is running, tc rules are translated into hardware rules.
NOTE: This document does not detail all of the capabilities of the tc command, rather it
explicitly mentions only features that are supported by OpenArchitect-based switches.
The examples that follow assume that the switch is running the standard Layer 2 start-up script,
/etc/rcZ.d/examples/S50layer2, with all ports placed in a single VLAN, zhp0. Note that
this assumption is implied only by the fact that changes to zhp0 are shown to configure all ports.
Neither tc nor zqosd is limited by the interface setup. Each utility works on either VLANs
(zhp) or ports (zre).
FIFO Queues (pfifo and bfifo disciplines)
The simplest configuration for tc involves no classes or filters, and only a single FIFO queue.
With tc, queue sizes may be specified in bytes or packets. The first example defines a
packet-limited FIFO. This example begins with only tc and then illustrates tc in conjunction with
zqosd.
As a first step, confirm that no tc configuration is active on the switch, by listing any queue
disciplines:
tc qdisc ls
The command should return nothing. Now, add a single packet-limited FIFO queue to zhp0 and
confirm that it has been installed in software:
tc qdisc add dev zhp0 handle 100:0 root pfifo limit 32
tc qdisc ls
In tc, a VLAN, such as zhp0, and a port, such as zre0, are each treated as devices. Breakdown of the options:
handle 100:0
Defines the handle for the queuing discipline. This handle may be used to reference the pfifo
queue. Note that the handle is included with the output of the qdisc ls command.
(100:0 and 100: are equivalent in tc.) The choice of handle is significant for zqosd.
root
Tells tc that this is the base queuing discipline for the device, not a child of another queuing
discipline.
pfifo limit 32
Specifies a packet-limited FIFO queue with an upper bound of 32 packets.
Now, delete the queuing discipline from zhp0 and confirm that it has been removed:
tc qdisc del dev zhp0 root
tc qdisc ls
Thus far, tc has been used without zqosd. It is not sufficient to install software rules on the
OpenArchitect switch though, because the normal case is for packets to be switched in hardware.
For that reason, zqosd must be used to shadow tc configuration into hardware. Like
zfilterd, zqosd works with ztmd, which provides the actual hardware interaction.
If ztmd is not already running, start it, then start the zqosd daemon with no parameters:
ztmd
zqosd
Now, repeat the same tc command as before, to install a packet-limited FIFO queue:
When this command is processed, zqosd detects the state change and generates output.
For each port belonging to zhp0, the queue size has changed to 32 packets. Under the default
switch configuration, all ports other than the CPU port belong to zhp0; so all queues other than
the CPU queue are affected.
As before, remove the tc configuration with the command:
tc qdisc del dev zhp0 root
Note that zqosd detects this state change. In fact, examining the CoS configuration on the
switch reveals that the queue sizes have reverted to their default values.
The byte-limited FIFO queue case differs only slightly from the packet-limited FIFO case. The
syntax is almost identical. In hardware the limit is based on 128-byte cells. The specified byte
limit is divided by 128 to determine the cell limit. Always specify a byte limit of at least 128
bytes to avoid setting the queue length to zero.
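The byte-to-cell conversion is simple integer arithmetic. The sketch below assumes plain truncating division by 128, since the text does not say how remainders are rounded; the bytes_to_cells helper is hypothetical.

```shell
#!/bin/sh
# Convert a bfifo byte limit into 128-byte hardware cells.
# Assumption: the division truncates (remainders are discarded).
bytes_to_cells() {
    echo $(( $1 / 128 ))
}

bytes_to_cells 4096   # 32 cells
bytes_to_cells 100    # 0 cells: the zero-length queue the text warns about
```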
For example, to set the byte limit for zhp0 to 4096,
Tear down any installed rules before proceeding with the next example:
tc qdisc del dev zhp0 root
PRIO and WRR queues
The FIFO examples used a single queue for each interface. In fact, the Ethernet Switch Blade
fabric switch is capable of attaching 1 to 8 queues to each port, with either priority or weighted
round robin (WRR) scheduling, and classification based on a priority map.
In tc, the prio queuing discipline establishes multiple queues and specifies their associated
priority map. Although WRR support is not part of the standard tc distribution, it has been
added to the prio discipline.
The final example in this document illustrates WRR. A strict priority scheduler is a simpler case
that can be constructed easily from this example.
Examine the existing CoS settings on the switch, noting the number of queues per port, queue
sizes, scheduling parameters, and priority map. Each of these values changes with this test.
The full set of commands to install four queues, a priority map, and weights is as follows:
tc qdisc add dev zhp0 parent 100:1 pfifo limit 120
tc qdisc add dev zhp0 parent 100:2 pfifo limit 100
tc qdisc add dev zhp0 parent 100:3 pfifo limit 80
tc qdisc add dev zhp0 parent 100:4 pfifo limit 60
The first command attaches a queuing discipline as the root discipline for zhp0, with a handle of
“100:0,” as in the FIFO cases. The “prio” option identifies the type of queuing discipline.
Priority scheduling implies multiple queues and the “bands 4” parameters specify that there are
four queues.
The priority map may be read from left to right as Priority n maps to Queue q, where n is the
index of the list element (numbering from 0) and q is the value specified by that element. So, this
example would read:
Priority 0 maps to Queue 1
Priority 1 maps to Queue 2
Priority 2 maps to Queue 2
Priority 3 maps to Queue 2
Priority 4 maps to Queue 3
Note that the tc priority map applies to a 4-bit field. With the Ethernet Switch Blade, the
priority map refers to the 802.1p tag, which is a 3-bit field. When translating this tc rule to
hardware, only Priorities 0 through 7 are significant; the other eight priorities are ignored.
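To see which entries of a priority map matter to the hardware, the map can be printed with only its first eight entries. This is a sketch under stated assumptions: the show_priomap helper is hypothetical, and the sixteen map values passed to it are made up for illustration rather than taken from the example in this guide.

```shell
#!/bin/sh
# Print "Priority n maps to Queue q" for the first eight entries of a
# 16-entry priority map. Entries 8 through 15 are skipped because the
# 802.1p tag is only 3 bits wide.
show_priomap() {
    prio=0
    for queue in "$@"; do
        [ "$prio" -le 7 ] || break
        echo "Priority $prio maps to Queue $queue"
        prio=$((prio + 1))
    done
}

# Illustrative map only (hypothetical values).
show_priomap 1 2 2 2 3 3 0 0 1 1 1 1 1 1 1 1
```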
The parameters wrr 1 2 4 6 specify that WRR scheduling is being used and assign a relative
weight to each queue. The weights are treated as numbers of packets to be sent from each queue.
In this example, if the queues have sufficient packets, queue 1 will have twice as many packets
sent as queue 0, queue 2 will have four times as many, and queue 3 will have six times as many.
wrr parameters are scaled such that the maximum value is no more than 15. Values which would
be 0 are set to 1:
Queue 0 has a weight of 1000 bytes
Queue 1 has a weight of 2000 bytes
Queue 2 has a weight of 4000 bytes
Queue 3 has a weight of 6000 bytes
The remaining commands each define a packet-limited FIFO queue. As with all previous tc
examples, these queues are created on device zhp0. However, unlike all previous examples,
they are not created as root disciplines for the device. Instead, the “parent” option identifies them
as child queues of the prio discipline.
For example, “parent 100:1” identifies that queue as the first child of the prio discipline (Queue
0), because the prio discipline’s handle is 100:0.
After running each of those commands, again examine the CoS parameters. As with the simple
FIFO example, queue sizes change to 32 packets. In addition, though, the number of queues
changes to 4 for each port in zhp0. Furthermore, the weights have changed for each queue, as
have the queue mappings.
To test the strict priority case, simply remove the wrr 1 2 4 6 options from the first tc
command. Note that all queue disciplines in this test may be cleared by deleting the root
discipline, as before:
tc qdisc del dev zhp0 root
The U32 filter provides the capability to match on fields in the L2, L3 or L4 header of a packet.
Each match rule gives the location of the field to be tested, which is always a 32 bit word, a mask
selecting the bits to be tested, and a value which is to be matched by the packet field. Many
matches can be specified in one tc filter command. Only if all matches succeed does the filter
match. In that case, the flowid field identifies the classid of the class this packet belongs in.
The following tc commands put all ICMP packets in class 100:10, packets from IP address
1.2.3.4 in class 100:20, packets for IP address 1.2.3.4 in class 100:20, and ARP reply packets in
class 100:30. The last filter illustrates using an offset from the beginning of the protocol header,
along with a mask, to locate the field to be matched:
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip protocol 1 0xff flowid 100:10
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip src 1.2.3.4/32 flowid 100:20
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip dst 1.2.3.4/32 flowid 100:20
tc filter add dev zhp0 protocol arp parent 100:0 u32 match u32 2 0xffff at +4 flowid 100:30
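The value-and-mask test that each u32 match performs can be reproduced with shell arithmetic. The sketch below is an illustration only: u32_match is a hypothetical helper, and the sample words assume an Ethernet/IPv4 ARP header, whose 32-bit word at offset 4 holds the hardware address length (6), the protocol address length (4), and the 16-bit operation code (1 for a request, 2 for a reply).

```shell
#!/bin/sh
# Succeed when the masked 32-bit word equals the wanted value,
# mirroring "match u32 VALUE MASK".
u32_match() {
    word=$1; value=$2; mask=$3
    [ $(( $word & $mask )) -eq $(( $value )) ]
}

arp_reply_word=0x06040002     # hlen=6, plen=4, operation=2 (reply)
arp_request_word=0x06040001   # hlen=6, plen=4, operation=1 (request)

u32_match "$arp_reply_word"   2 0xffff && echo "reply: matches"
u32_match "$arp_request_word" 2 0xffff || echo "request: does not match"
```

The mask 0xffff discards the two length bytes, so only the operation code takes part in the comparison.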
Combining Queuing Disciplines
Any of the queue length limiting disciplines can be used with the bandwidth management queue
disciplines, by defining them with the handle of one of the classes as their parent. For the htb
queueing discipline, each class has an explicit handle specified when it is defined. For the prio
queueing discipline, including wrr, each band is a class; their handles are formed from the
handle of the prio qdisc by appending a minor number of 1 to n for the n bands. For
example, the following commands define two strict priority queues for port zre5, with the lower
priority queue limited to 32 kb and the higher priority queue limited to 32 kb:
These translation rules handle conversions of individual rules from tc entries into hardware
entries. They do not explain the results of creating rules that are individually supported but
do not make sense in conjunction.
Although the translation rules handle some inconsistency between software and hardware, a user
must define a combination of rules that is reasonable in hardware, to ensure predictable results.
Handle Semantics
All examples have illustrated zqosd copying tc rules into hardware. In fact, the zqosd utility
also enables the user to add tc rules that remain only in software. This selection is based on
handles. zqosd processes all supported queue disciplines and filters with handles between 100:0
and 200:FFFF.
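Whether a given handle falls inside the range that zqosd processes can be checked with a short helper. This is a sketch, not part of zqosd: the zqosd_handles function is hypothetical, and it assumes that handles are written in hexadecimal as tc normally treats them, so the range 100:0 through 200:FFFF covers every handle whose major number lies between 0x100 and 0x200.

```shell
#!/bin/sh
# Succeed when the major number of a tc handle ("major:minor", hex)
# lies within the range 100:0 .. 200:FFFF.
zqosd_handles() {
    major=${1%%:*}
    major_val=$(( 0x$major ))
    [ "$major_val" -ge $(( 0x100 )) ] && [ "$major_val" -le $(( 0x200 )) ]
}

for handle in 100:0 1ff:10 200:FFFF 1:0 201:0; do
    if zqosd_handles "$handle"; then
        echo "$handle: within the zqosd range"
    else
        echo "$handle: outside the zqosd range"
    fi
done
```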
COPS: Common Open Policy Service
The Common Open Policy Service (COPS) is a protocol for distributing networking policy to
devices such as switches and routers. COPS allows a single Policy Decision Point (PDP) to
distribute policy to multiple Policy Enforcement Points (PEPs). A PDP acts as a server for PEP
clients. Figure 4.3 provides an illustration of the COPS Network Architecture.
Figure 4.3: COPS Network Architecture
A PDP contains all of the policy rules for its associated PEPs. A PDP typically stores rules in a
database and is a dedicated server, not a forwarding device.
A PEP is any network device that has to enforce policy decisions. For example, a switch that
restricts network access or prioritizes traffic fits the definition of a Policy Enforcement Point. A
PEP makes no policy decision. It simply applies policy that it receives from its PDP.
COPS uses a connection-based query and response mechanism. The following scenario illustrates
PEP-PDP communication:
• A PEP comes online and opens a connection to its PDP.
• After a connection has been established, the PEP transmits state information to the PDP.
• The PDP uses that state information to determine what policy is applicable for the PEP.
• The PEP installs the policy and applies it to future traffic.
As long as COPS is running, a connection between the PEP and PDP should stay open. A PEP
could query a PDP at any time asking for a policy decision. Alternatively, an administrator could
modify the policy on a PDP, which would then push any policy changes to its PEPs.
Protocol Architecture
The COPS protocol is broken into several components. The base layer is the COPS protocol
itself, which defines the messaging format. This protocol defines how communication is handled
without specifying the details of the message data.
The base COPS protocol is then used by different client types. These client types apply the COPS
messaging scheme to particular types of data. The currently standardized client types deal with
the RSVP model (COPS-RSVP) and provisioning model (COPS-PR).
The COPS-RSVP scheme is designed around the requirement that a PEP will have to query a
PDP in response to events. An RSVP PEP is constantly listening for resource reservation requests
and relaying those requests to its PDP.
By contrast, the provisioning model is based on longer lasting policy. The expectation is that
policy should be administratively defined at the PDP and pushed to the PEPs as needed.
OpenArchitect is a COPS-PR client.
The most common use of COPS-PR is for distributing Differentiated Services (Diffserv) policy.
Diffserv is concerned with such Quality of Service elements as queues and schedulers.
OpenArchitect PEP
The OpenArchitect PEP implementation is known as pepd. The pepd utility is based on:
RFC 2748: Common Open Policy Service (COPS)
RFC 3084: COPS Usage for Policy Provisioning
RFC 3159: Structure of Policy Provisioning Information
RFC 3289: Management Information Base (MIB) for the Differentiated Services Architecture
Internet Draft: Differentiated Services Quality of Service Policy Information Base (latest version
draft-ietf-diffserv-pib-09)
Internet Draft: Framework Policy Information Base (latest version draft-ietf-rap-frameworkpib-09)
A Policy Information Base (PIB) defines the representation of a particular data set. For example,
the Diffserv PIB specifies the structures used to represent all Diffserv elements. PIBs are
functionally equivalent to Management Information Bases (MIBs) such as those used by SNMP.
The OA PEP has implemented those portions of the Diffserv and Framework PIBs that are
supported by the underlying switch architecture.
The pepd utility requires a PDP that has implemented the above RFCs and drafts. Until all draft
standards are approved, certain COPS-PR data types will not be assigned OIDs. pepd uses
non-standard OIDs for the unassigned values.
Using pepd
The pepd utility works by connecting to a PDP, informing the PDP of its roles, and installing
any rules that the PDP has for those roles. Configuration information should be specified in a
configuration file, specified on the command line with the -f option.
One of the main benefits of the OpenArchitect switch is that it runs Linux, so much of the switch
administration is already familiar to most network or system administrators. It is a good idea to
complement these instructions with a standard Linux reference guide, such as Linux Network
Administrator’s Guide available from O’Reilly. Below are brief descriptions of some of the more
routine administrative tasks pertinent to the switch.
Setting the Root Password
The switch is shipped with a default user root and no password. To set the root password, use the
passwd command:
ZX7100-OA<release no.># passwd
Changing password for root
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and
numbers.
Enter new password:
Re-enter new password:
Password changed.
ZX7100-OA<release no.>#
NOTE: Even when just changing the password, you need to save the file system overlay
with the zsync command, or you will lose your changes upon reboot.
Adding Additional Users
Additional users can be added with the adduser command. Additional users are desirable for
connecting to the switch via ftpd and other daemons that require a login other than root and a
password. To create a user named guest, run adduser:
ZX7100-OA<release no.># adduser guest
Changing password for guest
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and
numbers.
If you wish to access the switch from some place other than a directly attached network, you may
want to setup a default route. Use the route command to set a default gateway.
route add default gw 10.0.0.254
Put the entry into the /etc/init.d/rcS startup script to automatically set a default route upon
reboot.
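When adding a line to a startup script, it helps to avoid adding it twice. The following is a minimal sketch of an idempotent append; the add_boot_line helper is hypothetical, and it is demonstrated on a scratch file rather than on /etc/init.d/rcS. Where in the real script the route command belongs depends on when the interfaces are configured, which this sketch does not address.

```shell
#!/bin/sh
# Append a line to a file only if an identical line is not already there.
add_boot_line() {
    file=$1; line=$2
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file, not the real startup script.
rcs=$(mktemp)
add_boot_line "$rcs" "route add default gw 10.0.0.254"
add_boot_line "$rcs" "route add default gw 10.0.0.254"   # second call changes nothing
cat "$rcs"
```

On the switch itself, remember that changes to the file system must be saved with zsync to survive a reboot.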
Name Service Resolution
Name service lookups will be done locally using /etc/hosts. You can also tell the switch
which name server to use by including an entry in /etc/resolv.conf.
DHCP Client Configuration
A utility is included to dynamically determine the IP address of the OpenArchitect switch
interfaces. To set the IP address dynamically, execute the command:
dhclient zhp0
The default device name, zhp0, works with the default configuration of the OpenArchitect
switch and will attempt to obtain an IP address from the local DHCP server. To use DHCP to
set your IP addresses automatically on boot up, uncomment the following line in
/etc/init.d/rcS by removing the # sign:
/usr/sbin/dhclient zhp0
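Uncommenting a line can also be scripted. The sketch below is one possible approach, shown on a scratch file with made-up contents because the exact layout of /etc/init.d/rcS is not reproduced here. The uncomment_line helper is hypothetical; it assumes a sed that supports in-place editing with -i, and it treats the text to uncomment as a regular expression.

```shell
#!/bin/sh
# Remove a leading "#" (and any blanks after it) from lines whose
# remaining text is exactly $2. Other comment lines are left alone.
uncomment_line() {
    file=$1; text=$2
    sed -i "s|^#[[:space:]]*\($text\)[[:space:]]*\$|\1|" "$file"
}

# Scratch file standing in for the startup script (contents are made up).
rcs=$(mktemp)
printf '%s\n' '# start the DHCP client' '#/usr/sbin/dhclient zhp0' > "$rcs"

uncomment_line "$rcs" '/usr/sbin/dhclient zhp0'
cat "$rcs"
```

The same helper applies to the other commented-out lines that this chapter mentions, provided the text passed to it matches the line exactly.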
DHCP Server Configuration
The OpenArchitect switch includes a DHCP server. To start the DHCP server, configure
Consult Linux Network administration manuals for more information on DHCP and
configuration options.
To use DHCP to set your IP addresses automatically on boot up, uncomment the following
line in /etc/init.d/rcS by removing the # sign:
dhcpd
Network Time Protocol (NTP) Client Configuration
NTP is a protocol for setting the real time clock on a system. There are numerous primary and
secondary servers available on the network. For more NTP information, and a list of available
NTP servers, see the following URL:
http://www.ntp.org/
You will need to have your network settings properly configured to reach an available NTP
server on your local network or the internet. To set the time and date, execute ntpdate with the
server of your choice. For example,
ntpdate -u ntp.ucsd.edu
The -u is required if the OpenArchitect switch is operating behind some types of firewalls.
If you wish for ntpdate to set your date and time automatically each time you boot,
uncomment the example ntpdate command line in /etc/init.d/rcS by removing the # sign.
ntpdate returns the Universal Time (UTC, formerly Greenwich Mean Time, or GMT). To
display the local time, set the TZ variable to the appropriate name and the number of hours offset
from UTC. For instance,
export TZ=PST8
for Pacific Standard Time offset from UTC by 8 hours. To set an environment variable, add the
entry to /etc/profile. Remember to zsync to make your changes permanent.
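The effect of the TZ variable can be seen by formatting one fixed instant under two settings. This sketch assumes a date command that accepts -d "@seconds" (GNU date does; other implementations may differ). The timestamp is simply midnight UTC on 1 January 2000, chosen for illustration.

```shell
#!/bin/sh
# 2000-01-01 00:00:00 UTC, expressed as seconds since the epoch.
stamp=946684800

# The same instant shown as UTC and as Pacific Standard Time (UTC minus 8 hours).
TZ=UTC0 date -d "@$stamp" '+%Y-%m-%d %H:%M %Z'
TZ=PST8 date -d "@$stamp" '+%Y-%m-%d %H:%M %Z'
```

The second line shows the previous calendar day at 16:00, eight hours behind the first.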
Network File System (NFS) Client Configuration
The OpenArchitect switch includes an NFS client for mounting remote file systems. You will
need to start NFS server processes in order to use NFS. You will need to start the following
servers:
Once the above servers are started, you can mount a remote NFS file system.
mount rhost:nfs_file_system local_mount_point
If the remote NFS file system you’re mounting is on an OA switch, you should mount with
caching disabled.
mount rhost:nfs_file_system -o noac local_mount_point
All the necessary servers are included in /etc/init.d/rcS but are commented out by default.
To automatically start all NFS client services each time you boot, uncomment the NFS Client
servers. Go to the /etc/init.d/rcS file. Uncomment the following command lines by
removing the # sign.
/sbin/portmap
/sbin/rpc.statd
/usr/sbin/rpc.mountd -r
You can also include commands to mount remote NFS file systems at boot time. There is an
example line included at the appropriate location in /etc/init.d/rcS. Uncomment and alter
the mount command included for your particular configuration.
NOTE: A “sleep” of 5 seconds is included to allow time for the links to come up prior to
attempting the mount.
sleep 5
mount 10.0.0.1:/nfs -t nfs -o noac /mnt
NFS Server Configuration
The switch also contains an NFS server so that you can mount the switch file system from other
systems. To enable the NFS server, first follow the steps to enable the NFS client. Then, edit
/etc/exports to include the file systems you wish to export. Consult a standard Linux Network
Administrator’s Guide (or man pages) regarding options for exported file systems. Generally, an
/etc/exports looks like the following:
Now start nfsd to export the mount points and begin answering requests from remote clients.
/sbin/rpc.nfsd -r
To export file systems automatically on boot, edit /etc/init.d/rcS and uncomment the
/sbin/rpc.nfsd command line by removing the #.
/sbin/rpc.nfsd -r
Connecting to the Switch Using FTP
Use ftp to transfer files to or from the switch. See the Linux Reference Guide for details of the
ftp command. In general, you can use ftp to connect to any system running an ftp server,
including other OpenArchitect switches, to either get (transfer files from the remote host to the
switch) or put (transfer files from the switch to the remote host) files.
ftp <remote_host>
ftpd Server Configuration
The switch itself can also be configured to run an FTP server (ftpd). See the Linux Reference
Guide for details of the ftpd command. You will need to add a user to the switch in order to
connect via ftp from a remote host, since root is not allowed ftp access. See the earlier section
in this chapter regarding how to add a user. The ftp daemon is started by default. If you wish to
shut down the ftp daemon, comment out the betaftpd line in /etc/init.d/rcS.
Connecting to the Switch Using TFTP
Trivial File Transfer Protocol, or tftp, is a very simple protocol used to transfer files. It is
designed to be small and easy to implement. Therefore, it lacks most of the features of regular
FTP, like user authentication. You can use tftp to connect to any system running a tftp server
(tftpd) including other OpenArchitect switches.
tftp <remote_host>
TFTPD Server Configuration
The tftp server is started by inetd(8) using the configuration set up in
The use of tftp(1) does not require an account or password on the remote system. Due to the
lack of authentication information, tftpd will allow only publicly readable files to be accessed.
The default location of these files is
Simple Network Management Protocol (SNMP) is the de facto standard for network management.
An SNMP agent maintains a structure of data for a network device in a virtual information
database, called a Management Information Base (MIB). A network management station is
capable of accessing the MIB of the network device to monitor and configure the network device.
The OpenArchitect switch utilizes the NET-SNMP (formerly UCD-SNMP) agent core.
Additional information on the agent can be found at: http://www.net-snmp.com. The
OpenArchitect switch agent will respond to SNMPv1, SNMPv2, and SNMPv3 requests.
Protocols supported on the OpenArchitect switch by gated, such as RIP and OSPF, communicate
with the SNMP agent via the SNMP Multiplexing (SMUX) protocol.
Supported MIBS
OpenArchitect includes MIB support as documented by each of the RFCs listed. The MIBs
themselves are located on the switch in the /usr/share/snmp/mibs directory.
Supported MIBs
RFC 1155: Structure and Identification of Management Information for TCP/IP-based Internets
RFC 1227: SNMP MUX Protocol and MIB
RFC 1493: Definitions of Managed Objects for Bridges (obsoletes RFC 1286)
RFC 1657: Definitions of Managed Objects for the Fourth Version of the Border Gateway Protocol (BGP-4) using SMI-V2
RFC 1724: RIP Version 2 MIB Extension (obsoletes RFC 1389)
RFC 1850: OSPF Version 2 Management Information Base (obsoletes RFC 1253, which obsoletes RFC 1252, which obsoletes RFC 1248)
RFC 2011: SNMPv2 Management Information Base for the Internet Protocol Using SMIv2
RFC 2012: SNMPv2 Management Information Base for the Transmission Control Protocol Using SMIv2
RFC 2013: SNMPv2 Management Information Base for the User Datagram Protocol Using SMIv2
RFC 1213: Management Information Base for Network Management of TCP/IP-based internets: MIB-II (obsoletes RFC 1158)
RFC 2021: Remote Network Monitoring Management Information Base Version 2
RFC 2096: IP Forwarding Table MIB
RFC 2571: An Architecture for Describing SNMP Management Frameworks
RFC 2572: Message Processing and Dispatching for the Simple Network Management Protocol (SNMP)
RFC 2574: User-based Security Model (USM) for version 3 of the Simple Network Management Protocol (SNMPv3)
RFC 2575: View-based Access Control Model (VACM) for version 3 of the Simple Network Management Protocol (SNMP)
RFC 2576: Coexistence between Version 1, Version 2 and Version 3 of the Internet-standard Network Management Framework
RFC 2665: Definitions of Managed Objects for Ethernet-like Interfaces
RFC 2674: Definitions of Managed Objects for Bridges with Traffic Classes, Multicast Filtering and Virtual LAN Extensions
RFC 2742: Definitions of Managed Objects for Extensible SNMP Agents
RFC 2787: Definitions of Managed Objects for the Virtual Router Redundancy Protocol
RFC 2819: Remote Network Monitoring Management Information Base
RFC 2863: The Interfaces Group MIB (obsoletes RFC 2233, which obsoletes RFC 1573, which obsoletes RFC 1229)
RFC 2932: IPv4 Multicast Routing MIB
RFC 3165: Definitions of Managed Objects for the Delegation of Management Scripts
RFC 3231: Definitions of Managed Objects for Scheduling Management Operations
ZNYX Networks Private MIB: Custom ZNYX MIB to support software and hardware features not covered by standard MIBs. The Private MIBs are ZX7100BASE.MIB and ZX7100FABRIC.MIB, pointed to by ZNYX-H.MIB.
UCD-SNMP Enterprise MIB: UCD-SNMP MIB related to management and monitoring of the Linux host
Table 5.1: Supported MIBs
Supported Traps
Upon certain events, the OpenArchitect switch can be configured to send notification of the
event, called an SNMP Trap, to a defined recipient/manager or managers. Traps are not issued
in real time. OpenArchitect will send SNMP traps for the following conditions:
zhp interface consisting of ports (zres) and trunks of ports (zrls)
A zrl (trunk device) is treated as an aggregate of its constituent zres (ports). A zhp is an
aggregate of its immediately contributing sub-interfaces (zres and zrls). The ports that make
up a trunk do not contribute to the zhp.
The administrative status of a zre and zhp are independent of each other. If the administrative
status is down, then the operational status will be down independent of the underlying link state.
You must ifconfig up the zres to see the operational link status for a zre. When the
administrative status is up, the operational status is dependent on the underlying physical state.
For example, if
status given the administrative status is up on
otherwise interface X has interface Y as a logical constituent.
SNMP Configuration
The SNMP agent is called snmpd and is started by default from the Linux boot up script
/etc/rcZ.d/S75snmpd
Configuration of the OpenArchitect switch SNMP agent is the same as configuration of any
standard Linux host that uses the NET-SNMP agent. Configuration information for persistent
data and security information is kept in
location, which for the OpenArchitect switch is
location to change sys information such as the syslocation and syscontact, as well as
permissions such as the rocommunity or rwcommunity.
NOTE: For NET-SNMP agents, these objects (sysLocation.0, sysContact.0 and
sysName.0) ordinarily are read-write. However, specifying the value for one of these
objects by giving the appropriate token in snmpd.conf makes the corresponding object
read-only, and attempts to set the value of the object will result in a notWritable error.
The processing for link up and link down traps is now user configurable. As the default, traps
conform to RFC2863, meaning the trap contents will include:
ifIndex, ifAdminStatus and ifOperStatus
You can alter this behavior by specifying:
cisco_link_traps on
If cisco_link_traps are turned on as described then link up and link down traps will have a
cisco-like format and the trap contents will include:
ifDescr and ifType
Examine and edit /usr/share/snmp/snmpd.conf appropriately for your configuration.
Information in /usr/share/snmp/snmpd.conf is only read at startup, or when the daemon is
forced to reread its configuration. See the standard Linux man page for snmpd.conf for more
details.
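The settings discussed above can be collected in one place. The following is a minimal sketch, not the shipped configuration: it appends sample directives to a file you name, so you can review them before merging them into /usr/share/snmp/snmpd.conf. The location, contact and community values are placeholders, and placing cisco_link_traps in snmpd.conf is an assumption, since this guide does not state where that token is specified.

```shell
# Sketch only: append sample SNMP settings to the file given as $1.
# All values are placeholders; cisco_link_traps is assumed to belong in snmpd.conf.
write_snmp_settings() {
    conf_file="$1"
    cat >> "$conf_file" <<'EOF'
syslocation Lab 3 shelf 2
syscontact admin@example.com
rocommunity public
cisco_link_traps on
EOF
}

# Example: write to a scratch file rather than the live configuration.
# write_snmp_settings /tmp/snmpd.conf.sample
```

Remember that giving syslocation or syscontact in the file makes the corresponding object read-only, and that snmpd must be restarted or forced to reread its configuration before the changes take effect.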
SNMP Applications
The OpenArchitect switch includes the snmpget, snmpwalk, and snmpset applications. You
can use these standard Linux utilities to test your SNMP agent. For example,
snmpwalk localhost -c public
walks the entire MIB of the localhost (that is, OpenArchitect switch) starting at the top of the
MIB. See the Linux Reference Man Pages for the usage of the SNMP utilities.
MIB values are decoded from their numerical representations into readable text by parsing MIBs
located in the /usr/share/snmp/mibs/ directory. If you need to add a MIB, add it to that
directory and zsync to save across reboots.
Port Mirroring
zmirror sets packet mirroring from a given set of ports to a given port. Turning on packet
mirroring causes a copy of the packet to be sent to the mirror_to port. There is only one
mirror_to port, and no limitation on mirror_from ports. Use the zmirror command in
the following way,
zmirror mirror_from mirror_to
After executing the following three commands, packets received on ports 0, 1 and 2 would be
mirrored (copied and transmitted) to port 12. This mirroring would be in addition to any Layer 3
or Layer 2 switching.
zmirror zre0 zre12
zmirror zre1 zre12
zmirror zre2 zre12
To clear the current mirroring, use the -t option. The -e option can be used to indicate that
packets being sent on a given port should be copied to the mirror_to port. For example, if the
-e option is used as follows, the packets transmitted, as opposed to received, on ports 0, 1 or 2
would be mirrored to port 12.
zmirror -e zre0 zre12
zmirror -e zre1 zre12
zmirror -e zre2 zre12
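When several ports are mirrored to the same port, the repeated commands can be generated rather than typed. The helper below is a hypothetical convenience, not part of OpenArchitect: it only prints zmirror command lines in the form shown above, so you can inspect them before running them.

```shell
# Print (do not run) one zmirror command per mirrored port.
# Usage: mirror_cmds [-e] MIRROR_TO PORT_NUMBER...
mirror_cmds() {
    opt=""
    if [ "$1" = "-e" ]; then
        opt="-e "        # mirror transmitted packets instead of received ones
        shift
    fi
    mirror_to="$1"
    shift
    for port in "$@"; do
        echo "zmirror ${opt}zre${port} ${mirror_to}"
    done
}

# Prints the three receive-side commands from the example above.
mirror_cmds zre12 0 1 2
```

To apply the generated commands on the switch, pipe the output to the shell, for example mirror_cmds zre12 0 1 2 | sh.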
Link and LED Control
The zlc application sets the link speed and state of individual ports of the switch, or displays
their current state. It can also set or clear the extract LED or the internal fault LED, or set a
port down or up. To force the link on port 1 down,
zlc zre1 down
To check the status of a link,
zlc zre1 query
To check the status of all links,
zlc zre0..51 query
Link Event Monitoring
The zlmd application is intended to run as a daemon, waiting for a configured event to occur and
then running the program configured for that event. The events monitored are changes in the link
status at any of the in-band ports of the switch, the start of removal of the switch from the ATCA
backplane, or the cancellation of the removal before it actually takes place. The program can be a
shell script that initiates appropriate actions to respond to the event.
This chapter includes basic information about the OpenArchitect switch environment including
an overview of the file system structure, modifying and updating switch files, upgrading the
switch driver and kernel, and implementing a system recovery.
Overview of the OpenArchitect switch boot process
The OpenArchitect switch is equipped with a Random Access Memory (RAM) disk and three
Read-Only Memory (ROM) devices: a boot ROM and two application flash devices.
Figure 6.1: ROM Devices in Open Architect
The boot ROM is located on device 0 and contains the OpenArchitect zmon application that
operates as a boot loader and includes a device bootstring. Device 1 contains the application
flash 1 image of the Linux operating system and the OpenArchitect overlay file system.
Application flash 1 is the primary working image for the switch. Device 2 contains the
application flash 2 that is an exact copy of application flash 1. You would only boot from this
device if application flash 1 is corrupted and you need to restore the switch to the factory-shipped
configuration.
Under normal circumstances, the boot process follows the flow outlined in Figure 6.2.
During boot up, the zmon bootloader reads the device bootstring to locate and validate the
correct application image to load. The bootstring command is in the following format:
boot : X | [<options>]
X represents the device value 0, 1 or 2.
Figure 6.2: Boot Flow Chart
The boot process opens and uncompresses the initrd image onto the RAM disk. Then zmon
begins booting the Linux image. After Linux boots, the init process executes the
/etc/init.d/rcS script which, in turn, executes the scripts in /etc/rcZ.d (see Figure 6.2: Boot
Flow Chart). The /etc/rcZ.d files are the switch configuration files (for example, S50layer2).
Any modifications made to the scripts for your particular configuration must be properly saved or
your changes are lost when you reboot. The file system for the switch only exists in memory. A
rewritable overlay is contained within the upper four megabytes of the first application flash.
Modifying Files and Updating the Switch
Any file in OpenArchitect can be added, deleted or modified, with the exception of
/sbin/init, /usr/sbin/zmnt, /lib/modules/zfm_c.o, and the /tmp directory. Files are saved
across a system reboot by running the script zsync.
A directory /.zsync contains database files used by zsync for managing the file system
overlaying process. The user should not modify the files in this directory or unpredictable results
may occur.
Recovering from a System Failure
If the switch does not function after you initially change or reconfigure the image, you have
several options for recovering from an error. First, try to telnet into the switch. If you are
successful, remember to run zsync after fixing your problem.
If you cannot telnet, attach a console cable to the switch. Bring down the system and properly
attach the console cable; see Connecting to the Console Port.
System Boots with a Console Cable
After attaching the system console cable, if the system boots, fix the problem that does not allow
you to telnet to the box, run zsync, and reboot. The problem is likely to be in the
configured interface with a proper IP address. For example, zhp0 is configured with the IP
address 10.0.0.43 in the factory default configuration.
Booting with the -i option
If you cannot telnet into the switch and Linux fails to boot, it is likely that a change saved by
zsync has left the switch in an inaccessible state. To allow users to recover from mistakes saved
in the overlay file system, a boot argument of -i passed to the init process will stop the
untarring of the saved overlay files. As a result, the system boots to the factory-shipped
configuration.
• Connect through the console port. During boot up, the system displays the Linux boot string
Linux/PPC load: for 5 seconds. During the 5 second pause, enter the boot option -i
and press Return:
Linux/PPC load: root=/dev/ram init=/sbin/init -i
• Initiate the -i option of zbootcfg:
zbootcfg -d 1 -i
• Reboot the system. After the reboot, clear the -i option from the boot string. Enter the
following command:
zbootcfg -d 1
The reboot command will also take -i as an option and pass it to the Linux boot:
reboot -i
• When the system boots, the overlay file system is returned to the factory-installed
configuration. At this point, you have a few options.
• Run zsync and the factory-installed system will be restored to your flash.
CAUTION: All changes you have made and saved prior to the zsync command will
be lost.
• Restore particular files from the existing overlay. Use the zmnt command to mount the
overlay in a designated directory and copy back just the changes you want to keep from the
existing overlay. For example, if you wanted to recover your
existing overlay, use zmnt to mount the overlay in a designated directory, like
After attaching the system console cable, if the system hangs during boot, try booting with the -i
option as described in the previous section. It is possible that important Linux system files
became corrupted and were incorrectly saved in the flash overlay. Use zmnt as described in the
previous section to fix or remove the problem files from the overlay. If the system will not boot
with the -i option, refer to the Booting the Duplicate Flash Image section in this chapter.
Booting the Duplicate Flash Image
Another recovery method, if Linux fails to boot, is to temporarily boot the factory-installed
duplicate image located in the second flash device.
Connect through the console port.
When you see the number counter appear after the zmonitor … banner, press any key on the
console keyboard to enter the zmon application.
At the monitor prompt, type
boot:2
You should see the counter again, but the system should boot into the secondary kernel. If you
have difficulties booting, contact Hewlett-Packard technical support.
At this point, follow the Upgrading the OpenArchitect Image section to put a new RAM disk
image in the application flash 1.
IMPORTANT: Be sure not to program flash 2, since currently this is your only
bootable image.
The command to program flash 1 should be similar to the following command. The image name
may be slightly different depending on the model of switch and version of the image:
zflash -d 1 rdr7100.zImage.initrd
Upgrading the OpenArchitect Image
Use telnet, or preferably, attach a console cable to the switch, and login to the switch. If you
are connecting via telnet, be aware that the upgrade process will reset the switch to the default
IP address of 10.0.0.43, so you will have to be able to reach 10.0.0.43.
Download the OpenArchitect image to a local system.
The OpenArchitect image is very close to the limit of free space available on a default system so
you may need to clear some space prior to downloading the OpenArchitect image to the switch.
Check for free space with the df command.
One of the easiest ways to create free space is to remove /usr/sbin/gated. The application
will be replaced during the update procedure. Once you have enough free space, proceed.
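The free-space check can be scripted before the download. The following is a minimal sketch, assuming the df on the switch accepts the POSIX -P and -k options; the function name and the size threshold are illustrative and should be replaced with the size of your actual image.

```shell
# Succeed if the file system holding DIR has at least NEED_KB kilobytes available.
has_free_kb() {
    dir="$1"
    need_kb="$2"
    # With -P each file system is reported on one line; with -k, column 4 is available KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: check for roughly 8 MB before fetching the image (threshold is a placeholder).
if has_free_kb / 8192; then
    echo "enough free space"
else
    echo "free some space first"
fi
```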
• From the switch console, ftp the new OpenArchitect (rdr) image from the local system to
your switch.
• The switch has two flash devices available: device 1 and device 2. Use the zflash command to
write the new OpenArchitect image into the first flash device.
NOTE: Make sure that Surviving Partner is not running before using zflash. The delays
incurred while zflash writes the flash can cause the Surviving Partner daemons to think
there is a failure, resulting in link oscillation.
zflash -d 1 <image_file>
The image file name will be similar to the following:
zflash -d 1 rdr7000.zImage.initrd
Upgrading or Adding Files
Follow the procedure below to upgrade or add a new file to the switch. Place the file you are
adding or upgrading into the appropriate location in the file system. Save the file in the overlay
directory area on the application flash by running zsync.
zsync
After running zsync, the file is saved to the flash for future reboots.
Excluding Saving Files to Flash
Specific files or directories can be excluded from saving to flash by zsync by including an entry
in /etc/exclude. Likewise, existing entries in /etc/exclude such as /tmp can be removed in
order to save those files to flash with zsync.
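Adding an exclusion can be done with a small helper that avoids duplicate entries. This is a sketch under a stated assumption: it treats the exclude file as one path per line, which this guide does not spell out, so check the existing contents of /etc/exclude before relying on it. The function name is hypothetical.

```shell
# Append ENTRY to the exclude file unless an identical line is already present.
# Assumes one path per line (not confirmed by this guide).
add_exclude() {
    exclude_file="$1"
    entry="$2"
    if ! grep -qxF "$entry" "$exclude_file" 2>/dev/null; then
        echo "$entry" >> "$exclude_file"
    fi
}

# Example (path is illustrative): add_exclude /etc/exclude /var/log
```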
Upgrading the Switch Driver
The switch driver upgrade process is the same as a file upgrade. However, more caution should
be taken since the driver module is likely to be the method by which you are logging into the
system. If the switch driver has a problem, you will need to have a console cable to recover. To
upgrade a switch driver, replace the file /
apt-get is a utility created by the Debian Linux community to allow remote fetching and
installation of software stored in a repository in Debian package format. It allows users to keep
their software up-to-date with the latest binaries, and install new software without the need to
recompile.
Users may create their own repositories and add entries in /etc/apt/sources.list
(empty by default) for their private access methods to their private repository. See
http://www.debian.org for complete APT documentation.
At this point, the OpenArchitect Ethernet Switch Blade should be installed and powered up for
the first time. This chapter helps you connect and configure the base switch by presenting
command line examples as well as a discussion of the example configuration scripts. You may
configure the fabric switch independently from the base switch.
Two switches, two consoles
There are two separate switches in the Ethernet Switch Blade. The base switch handles traffic
among base ports 0-23. These ports are reserved for control functions on the ATCA rack such as
connecting to IPMI (shelf managers), and connecting each node card to control and monitoring
devices.
Connecting to the Base Switch Console
You can connect to the switch console using a telnet connection or with a console cable. Use
the procedure below for a telnet connection. See Connecting to the Console Port for
console cable instructions.
• Connect an Ethernet cable to the host and the switch.
• Configure a host on the 10.0.0.0 network.
• The OpenArchitect switch is pre-configured with address 10.0.0.42. telnet to 10.0.0.42.
telnet 10.0.0.42
After you are connected, enter the login name root. No password is required.
ZX6000-OA login: root
ZX6000-OA<release no.>#
OpenArchitect Configuration Procedure
Switch configurations can be accomplished with a few simple commands. Once you’ve
configured your switch, the commands should be placed into a start up configuration script. Like
most Linux systems, the OpenArchitect switch boot process runs initialization commands and
scripts in /etc/init.d/. In particular, OpenArchitect executes all scripts located in /etc/rcZ.d
starting with an uppercase “S” in alphabetical order.
Any configuration scripts you create should be named in the standard Linux/Unix manner,
starting with an uppercase “S” and numbered in the sequence you would like them executed. The
final step once the switch has been properly configured is to use the zsync command to save all
changes.
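To see where a new script would fall in the start-up sequence, you can list the "S" scripts in the order the shell sorts them. The function below is an illustrative sketch, not an OpenArchitect tool; it relies on the shell expanding the S* pattern in sorted order.

```shell
# List the start-up scripts in DIR in the order they would be executed.
list_start_scripts() {
    dir="$1"
    for script in "$dir"/S*; do
        # Skip the unexpanded pattern (no matches) and anything that is not a file.
        if [ -f "$script" ]; then
            basename "$script"
        fi
    done
}

# Example: list_start_scripts /etc/rcZ.d
```

A script named S60mysite (a hypothetical name) would therefore run after S50layer2 and before any S75 script.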
You may use standard bash shell procedures to change the prompts on your base switches. Many
sites choose a system that distinguishes among the individual switches at their location. The
same rules apply for saving your choice (zsync) as for all other configuration changes.
Default Configuration Scripts
As shipped, the following scripts are run from /etc/rcZ.d as the switch boots up:
• S20stack - Script that calls zstack to combine the two BCM5695 twelve-port
switch fabric chips into a single 24 port virtual switch. zstack must be run before
any other switch configuration.
• S30e1000 - Script that loads the e1000 driver module for the Out-of-Band Ethernet
ports.
• S40vpd - Script that checks the current OA version, and loads into the Vital Product
Data (VPD) area if necessary.
• S50layer2 - Script that sets up a basic Layer 2 switch. All 24 10/100/1000 ports are
set up on one IP network (VLAN). The ISL is set up in its own VLAN.
Example Configuration Scripts
Example scripts are supplied that can be used as templates. Use one of the scripts located in the
/etc/rcZ.d/examples directory to help you configure the switch. The default
configuration for the switch is located in the script file /etc/rcZ.d/S50layer2.
The following scripts are included. Each is examined in more detail later in the appropriate
section describing common Layer 2 and Layer 3 configurations:
• S50layer2 - Script which sets up a basic Layer 2 switch. All 24 10/100/1000 ports are set
up on one IP network (VLAN). This is a copy of the script in /etc/rcZ.d that is
loaded in the default configuration.
• S50layer2sp - Script which sets up a basic Layer 2 switch. All 24 10/100/1000 ports
are set up on one IP network (VLAN), and bridge support for Spanning Tree is turned on.
• S50layer3 - Script which sets up a basic Layer 3 switch. All 24 10/100/1000 ports are set
up on individual IP networks (VLANs). Layer 3 switching is enabled.
• S50multivlan - Script which sets up multiple untagged VLANs. The first VLAN
includes the first ten 10/100/1000 ports, the next contains the last ten 10/100/1000
ports, the third VLAN contains two 10/100/1000 ports, the last VLAN contains the
last two 10/100/1000 ports. Layer 3 switching is enabled.
• S55gatedRip1 - Script which is used with a Layer 3 switch and calls the GateD
daemon to enable RIP 1 routing protocol.
• S55gatedRip2 - Script which is used with a Layer 3 switch and calls the GateD
daemon to enable RIP 2 routing protocol.
• S55gatedOspf - Script which is used with a Layer 3 switch and calls the GateD
daemon to enable OSPF routing protocol.
Overview of OpenArchitect VLAN Interfaces
When you initially boot up the switch, one virtual host port is automatically created by
OpenArchitect to enable interaction between the software and hardware. This initial host port,
called ZNYX Host Port (zhp), is a network interface that provides communication between all
24 in-band ports. Therefore, linking to any port on the switch enables you to connect with
OpenArchitect.
A zhp device is associated with one Virtual Local Area Network (VLAN). A virtual local area
network (VLAN) is a logical mapping of workstations and network devices on some basis other
than geographic location (for example, by department, type of user, or primary application). The
primary purpose of a VLAN is to isolate traffic and enable communication to flow more
efficiently within groups of mutual interest. VLANs reduce the time it takes to implement
workstation and network moves, adds and changes. The switch is used to bridge from one VLAN
to another. Figure 7.1 is an illustration of multiple VLANs.
The OpenArchitect switch is capable of switching VLAN tagged and untagged data packets.
VLAN tagged packets conform to the 802.1q specification and the packet header contains an
additional four bytes of VLAN tag information. A given port can be specified to accept VLAN
tagged or untagged traffic. Internally, all traffic for a particular VLAN is treated as tagged traffic.
Switch Port Interfaces
For each switch port, OpenArchitect creates a separate interface with its own MAC address called
a ZNYX raw Ethernet (zre). After the initial power up, 24 zre interfaces are created, one for
each in band port. You cannot directly access or modify the zre interfaces.
During the initial power up of the switch, the default configuration creates a Layer 2 switch. The
Layer 2 configuration places all of the zre interfaces in the same zhp interface. The number
after zre represents the corresponding switch port number (that is, zre1 represents port 1 on the
switch).
Layer 2 Switch Configuration
The steps to build a Layer 2 switch involve creating a group of switch ports in a VLAN (or Layer
2 switching domain) and bringing that interface up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group. Figure 7.2 provides an illustration of a Layer 2 Switch connection.
During the initial power up, a startup script called /etc/rcZ.d/S50layer2 is executed at boot
time creating a single untagged VLAN (IP interface labeled as zhp0) which includes all Ethernet
and gigabit ports as one Layer2 switch. The interface to the host is then assigned the IP address of
10.0.0.42 to allow access to the switch. The S50layer2 script does the following:
• Uses zconfig to create and configure a single, untagged VLAN that contains all 24 switch
ports.
/usr/sbin/zconfig zhp0: vlan1=zre0..23
/usr/sbin/zconfig zre0..23=untag1
• Uses ifconfig(1M) to assign the IP address 10.0.0.42 to the interface.
/usr/sbin/ifconfig zhp0 10.0.0.42 up
To create another VLAN that contains only two ports, first use zconfig from the
command line to build the VLAN and create a network interface for the host.
zconfig zhp1: vlan2=zre20,zre21
Then, bring up the interface with ifconfig(1M):
ifconfig zhp1 193.08.1.1 up
Note that ports zre20 and zre21 are members of both vlan1 and vlan2, and that they are
tagged for vlan2. A port cannot be untagged for more than one VLAN. You can view the
configured VLANs with zconfig.
The S50layer2 script can be used as an example, or edited to customize your Layer2 setup. For
example, to reconfigure the IP address on your Layer 2 switch,
•Open the S50layer2 file in the Linux vi editor.
•Change the IP address value listed under the Linux ifconfig(1M) command line.
•Save your changes by running OpenArchitect zsync.
•Reboot the switch.
Rapid Spanning Tree
The Rapid Spanning Tree Protocol (RSTP) configures a simply connected active topology from
the arbitrarily connected components of a Bridged Local Area Network. RSTP participants use a
simple dialog carried in packets called Bridge Protocol Data Units (BPDUs) for finding the
shortest path between two networks and for eliminating loops from the topology. If nodes
attached to ports fail or are added or deleted, the topology dynamically changes to accommodate
the new configuration. If your network topology is such that there is no real redundancy or
chance for loops, you do not need to turn on Spanning Tree.
zl2d is a shell script used to create Linux bridges consisting of the name of the previously
created zhp device or devices preceded with a "b" (for example, if you are creating a Bridge
device from zhp0, the resulting device would be bzhp0). zl2d then starts a background task
that monitors the port information of the Linux bridge at a specified interval and updates the
Spanning Tree state fields in the hardware when necessary.
brctl(8) is called by zl2d for configuring certain RSTP parameters. For an explanation of
these parameters, see the IEEE 802.1d specification, or reference the brctl(8) man page in
Appendix A. The following demonstrates a simple example of setting up a Layer 2 switch and
starting RSTP.
To Enable Rapid Spanning Tree:
Create a VLAN containing the ports that will be a part of the Linux bridge running Rapid
Spanning Tree. This example will use ports 0-3 (untagged):
zconfig zhp0: vlan1=zre0..3
zconfig zre0..3=untag1
Create a bridge device from the zhp device,
zl2d start zhp0
A Bridge device named bzhp0 should now exist consisting of ports zre0 through zre3 with
Spanning Tree enabled. To view the bridge device, use the brctl command,
Each port has an associated cost that contributes to the total cost of the path to the Root Bridge
when the port is the root port. The smaller the cost, the better the path. The Ethernet Switch Blade
uses the following IEEE 802.1D recommendations based on the connection speed of your port:
Port Path Cost
Link Speed    Recommended Value    Recommended Range
10 Mb/s       100                  50-600
100 Mb/s      19                   10-60
1 Gb/s        4                    3-10
Table 7.1: Port Path Cost
To change the port path cost, use the brctl setpathcost option. For example, to set the path
cost to a value consistent with a gigabit interface,
brctl setpathcost bzhp0 zre1 4
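The recommended values in Table 7.1 can be looked up by link speed when path costs are set from a script. The function below is a hypothetical helper that encodes only the three recommended values from the table; speeds are given in Mb/s.

```shell
# Print the recommended path cost (Table 7.1) for a link speed given in Mb/s.
path_cost_for_speed() {
    case "$1" in
        10)   echo 100 ;;
        100)  echo 19 ;;
        1000) echo 4 ;;
        *)    echo "unsupported speed: $1" >&2
              return 1 ;;
    esac
}

# Example on the switch, for a gigabit port:
# brctl setpathcost bzhp0 zre1 "$(path_cost_for_speed 1000)"
```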
Layer 3 Switch Configuration
The previous section outlines the Layer 2 switch configuration that is automatically configured
when you initially bring up the OpenArchitect switch. In order to communicate between Layer2
interfaces, you must properly setup routing.
The steps to build a Layer 2 switch involve creating a group of switch ports in a VLAN (or Layer
2 switching domain) and bringing that interface up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group with Layer 2 switching. Layer3 routing information is then used to route
between the Layer2 network devices.
Take a simple example of two VLANs configured on the switch, each with four ports. First
teardown any existing configuration,
zconfig -t
Use zconfig to create two new VLANs, each with four ports, and untag them,
Now, use ifconfig to assign each zhp interface an IP address,
ifconfig zhp0 10.0.0.1
ifconfig zhp1 11.0.0.1
At this point, the Linux host has enough information to route between the networks of the directly
attached interfaces, 10.0.0.0 via zhp0, and 11.0.0.0 via zhp1.
The next step is to enable the ZNYX zl3d daemon to move that routing information from the
host to the base switch switching tables in silicon. Once enabled, zl3d will monitor the Linux
routing tables for changes in configuration and update the switch silicon tables. Start zl3d to
update the switch tables:
zl3d zhp0 zhp1
The base switch is now configured as a Layer3 switch that can route between two Layer2
devices in silicon.
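The whole walk-through can be collected into one command sequence. The sketch below only prints the commands so they can be reviewed first. The zconfig lines follow the syntax used in the Layer 2 examples in this chapter; the choice of ports zre0..3 and zre4..7 for the two VLANs is an assumption made for illustration.

```shell
# Print the command sequence for two untagged four-port VLANs with Layer 3 routing.
# Port ranges are illustrative; adjust them to your cabling.
layer3_two_vlans() {
    cat <<'EOF'
zconfig -t
zconfig zhp0: vlan1=zre0..3
zconfig zhp1: vlan2=zre4..7
zconfig zre0..3=untag1
zconfig zre4..7=untag2
ifconfig zhp0 10.0.0.1
ifconfig zhp1 11.0.0.1
zl3d zhp0 zhp1
EOF
}

layer3_two_vlans
```

Once the sequence is correct for your site, place it in a start-up script in /etc/rcZ.d and run zsync so that it survives a reboot.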
Using the S50layer3 Script
To modify the configuration to a Layer 3 switch, remove the S50layer2 script from the
/etc/rcZ.d directory, and replace it with the example script file, S50layer3.
In the
S50layer3 file, each port is assigned its own Virtual Local Area Network (VLAN)
interface (port interfaces are labeled as
with an individual
but a VLAN cannot. Each zre interface is assigned a separate IP address in the example script
(see Figure 7.3).
Figure 7.3: Layer 3 Switch. Each VLAN interface (zhp) has only one switch port (zre), from
VLAN 1 on zre0 through VLAN 24 on zre23.
The S50layer3 script executes the following commands:
•Runs zconfig command to create 24 untagged VLANs (one for each switch port).
/usr/sbin/zconfig zhp0..23: vlan1..24=zre0+
/usr/sbin/zconfig zre0..23=untag1+
NOTE: Double periods (..), as in vlan1..24, are used to indicate a range of values.
The plus (+) sign after zre0 is a wildcard character that means auto-incremented and
causes each zhp interface to hold only one zre (that is, zhp0 has zre0 on vlan1,
zhp1 has zre1 on vlan2).
•Runs the Linux ifconfig(1M) command for each interface to assign default IP
addresses (10.0.0.42-10.0.23.42), sets the netmask and brings up the interfaces.
ifconfig zhp0 10.0.0.42 netmask 255.255.255.0 up
ifconfig zhp1 10.0.1.42 netmask 255.255.255.0 up
ifconfig zhp2 10.0.2.42 netmask 255.255.255.0 up
.
.
.
ifconfig zhp21 10.0.21.42 netmask 255.255.255.0 up
ifconfig zhp22 10.0.22.42 netmask 255.255.255.0 up
ifconfig zhp23 10.0.23.42 netmask 255.255.255.0 up
•Runs the OpenArchitect zl3d. The zl3d application monitors the Linux routing tables
and updates the switch routing tables for each interface configured above.
/usr/sbin/zl3d zhp0..23
zl3d initially creates and adds each zhp interface (VLAN) to the switch routing tables. The
zhp0..zhp23 is shorthand for the list of interfaces (zhp0, zhp1, …, zhp23) to monitor with
zl3d.
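The 24 ifconfig lines in the script follow one pattern: interface zhpN receives the address 10.0.N.42. The loop below is a sketch that prints those lines rather than running them, which is useful when you want to generate a variant of the script with a different address plan. The function name is hypothetical.

```shell
# Print one ifconfig line per zhp interface, following the 10.0.N.42 pattern.
layer3_ifconfig_cmds() {
    i=0
    while [ "$i" -le 23 ]; do
        echo "ifconfig zhp$i 10.0.$i.42 netmask 255.255.255.0 up"
        i=$((i + 1))
    done
}

layer3_ifconfig_cmds
```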
To Modify the Layer 3 Script
•Modify the example script you copied into the /etc/rcZ.d directory. Adjust and
assign the number of IP addresses as applicable. In the example below, the IP address is
changed for the interface in the ifconfig command line of the script.
From:
ifconfig zhp0 10.0.0.42 netmask 255.255.255.0 broadcast 10.0.0.255 up
interfaces, that are added to the routing tables, depending on
the number of VLANs you are adding for your network. Include any other details, as
applicable.
•Run the OpenArchitect zsync command to save your changes.
zsync
•Reboot the switch.
•After rebooting, your switch works from your customized Layer 3 configuration.
Layer 3 Switch Using Multiple VLANs
An example script is also provided for setting up multiple VLANs each with multiple ports.
Using the S50multivlan Script
The Layer 3 switch example file, S50multivlan, is included to help you configure multiple
VLANs to a Layer 3 switch. A VLAN can include one or more switch ports. In the
S50multivlan
•VLAN 1, zhp0: for the first set of six ports, zre0-zre5
•VLAN 2, zhp1: for the second set of six ports, zre6-zre11
•VLAN 3, zhp2: for the third set of six ports, zre12-zre17