ATM OC-3/12/Ethernet IP Router Example Design
Application Note - Rev 1.0, 3/20/2002
Order Number: 278393-001
Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual
property rights is granted by this document. Except as provided in Intel’s Terms and Conditions of Sale for such products, Intel assumes no liability
whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to
fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not
intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The IXP1200 Network Processor may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
MPEG is an international standard for video compression/decompression promoted by ISO. Implementations of MPEG CODECs, or MPEG enabled
platforms may require licenses from various entities, including Intel Corporation.
This ATM OC-3 / Ethernet IP Router Example Design Application Note as well as the software described in it is furnished under license and may only
be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to
change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for
any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Except as
permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means
without the express written consent of Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-
548-4725 or by visiting Intel’s website at http://www.intel.com.
IXP1200 Network Processor Family ATM OC-3/12/Ethernet IP Router Example Design
1.0 Introduction
Intel develops example software to demonstrate the capabilities of the IXP1200 Network Processor
Family. This document describes the implementation of example software demonstrating the
IXP1200, IXP1240, and IXP1250 in an ATM environment. In particular, this example design uses
the IXP12xx to route IP packets between ATM and Ethernet networks.
From the point of view of this example software, the IXP1240 and IXP1250 are equivalent: the
project uses their common hardware CRC feature, but is not aware of the IXP1250’s additional
ECC capability. The IXP1200, on the other hand, does not have hardware CRC support, and thus
supports only a software-CRC configuration.
This document serves as a companion to the comments in the source code, and is intended to
clarify the structure and general workings of the design. The following material is covered: purpose
and scope of the design; software partitioning and data flow; StrongARM® core and microengine
initialization; microengine functional block description; subsystems and data structures; inter-thread
signaling; project configuration; testing environments; simulation support; limitations; and
example design extension. The end of this document contains lists of document conventions,
acronyms and definitions, and related documents.
1.1 Purpose of ATM Example Design
This example design demonstrates just one software architecture in which the IXP12xx can be used
in ATM-related designs. It is not intended to be ’production ready’. Rather, it is intended to serve as
a starting point for customers designing similar applications. It is also intended to help customers
understand the IXP12xx Network Processor’s capabilities and expected performance.
Users may modify the code, adding additional modules that are proprietary or more specific to their
needs, and estimate performance, although performance numbers gained from this design are
applicable only to the example as presented. Customer changes to the design can result in either
increases or decreases in performance.
1.2 Scope of Example Design
This document describes the implementation in sufficient detail that a programmer should be able
to successfully modify the source code. The README.txt file that accompanies the software
should be consulted for instructions on running the project, building the code, and the actual layout
of the source files.
Modified on: 3/20/02,
1.2.1 Supported / Not Implemented Functions
The following identifies the ATM, Ethernet, and StrongARM supported functions, as well as those
functions that are not supported.
ATM Support
• 1xOC-12 port or up to 4xOC-3 ports (full-duplex).
• Segmentation and Reassembly (SAR).
• ATM Adaptation Layer 5 (AAL5 with CRC-32).
• IP over ATM LLC/SNAP Encapsulation.
• Routing from ATM to Ethernet ports based on IP.
• Unspecified Bit Rate (UBR).
• Full ATM VC name space.
• 16K Virtual Circuits (VC) simultaneously in use.
Ethernet Support
• Up to 8 100Mbps Ethernet ports (full duplex).
• Routing from Ethernet to ATM ports based on IP.
StrongARM Core Processing Hooks
• RFC1812 compliance.
• AAL5 Protocol data units (PDUs) for signaling (ILMI, LECS, PNNI, CIP) forwarded to the
StrongARM core.
NOT Implemented
• Control Plane processing.
• ATM Traffic shaping.
• ATM ARP support.
The majority of RFC1812 router validations are performed in the layer 3 forwarding code running
on the microengines, while rare case exception packets are sent to the StrongARM core control
plane for validation and processing. No processing code on the StrongARM core is currently
implemented. Refer to the document "IXP1200 Network Processor RFC 1812 Compliant Layer 3
Forwarding Example Design Implementation Details" for further information.
This example design can be configured to run in three different hardware/software configurations
(see the README.TXT file for further information):
Configuration: One ATM OC-12 port and eight 100Mbps Ethernet ports
Description: For use with the IXP1240/1250, which uses hardware CRC capability.
Configuration: Four ATM OC-3 ports and eight 100Mbps Ethernet ports
Description: Similar to the above configuration (requires the IXP1240/50), except that
it uses four OC-3 ports.
Configuration: Two ATM OC-3 ports and four 100Mbps Ethernet ports
Description: For use with the IXP1200 (which does not have hardware CRC
capability). Instead, CRC computation is performed by two microengines
(thus the reduced data rates).
1.3 Background
1.3.1 Ethernet, IP and AAL5 Protocol Processing
Figure 1 identifies how this design processes Ethernet, IP, and AAL5 protocols. Reading from top
to bottom, Ethernet packets go through the LLC/SNAP Encapsulation, followed by segmentation
into ATM AAL5 cells. Reading from bottom to top, it also shows the reverse process, in which
AAL5 cells are reassembled into Ethernet packets.
Figure 1. IP over ATM Encapsulation Format
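[Figure 1 diagram not reproduced. Read top to bottom (Ethernet to ATM), it shows an Ethernet
frame (Ethernet header, IP header, payload) reduced to its IP packet, which is wrapped in
LLC/SNAP encapsulation (LLC 3 bytes, OUI 3 bytes, PID 2 bytes) and an AAL5 CS trailer
(padding 0-47 bytes, UU 1 byte, CPI 1 byte, Length 2 bytes, CRC 4 bytes). The SAR sub-layer
splits the result into 48-byte payloads, each carried in an ATM cell behind a 5-byte ATM header
(GFC 4 bits, VPI 8 bits, VCI 16 bits, PTI 3 bits, CLP 1 bit, HEC 8 bits). Cells from other VCs
can be interleaved with cells from this VC.]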
1.3.2 Frame and PDU Length vs. IP Packet Length
Figure 2 shows the relationship between IP Packet Length (X axis), Ethernet Frame Length, and
AAL5 PDU length (Y axis). Packet lengths 20 - 128 bytes are shown to illustrate 1-, 2-, and 3-cell
PDUs. The same pattern continues through the maximum Ethernet MTU size - the 1500 byte
packet, which requires 32 cells. There are a few important items to notice on this graph:
• The smallest possible Ethernet frame is 64 bytes, which includes the IP packet in addition to
a 14-byte Ethernet header and 4-byte FCS. Adding an 8-byte preamble and 12-byte interframe
gap (960ns) to this frame increases its wire-occupancy time to 84 bytes. After IP packet length
exceeds 46 bytes, Ethernet frame length is a linear function of IP packet length.
• AAL5 PDU length is a step-wise function of IP packet length, due to rounding up to ATM cell
boundaries. At 53 bytes per cell, the 4-byte ATM header and 1 byte HEC are included here, but
the physical layer SONET overhead is not shown.
• The smallest possible IP packet, 20 bytes, corresponds to an IP header that does not contain an
IP payload. This packet fits into a single cell PDU, as do packets up to size 32 bytes (20 byte
IP header plus 12 payload bytes).
• Minimized TCP/IP packets are 40 bytes - 20 byte IP header, 20 byte TCP header, and 0 TCP
payload bytes. These "40 byte packets" require 2 cell PDUs - they do not fit into single cell
PDUs because 8 bytes of LLC/SNAP plus 8 bytes of AAL5 trailer push them over the 48 byte
payload capacity of a single ATM cell.
• Fully populated 64-byte minimum-sized Ethernet frames carry 46-byte IP packets, and also fit
into 2 cell PDUs, as do IP packets up through 80 bytes.
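The length relationships in the list above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the example design; the function and constant names are illustrative, and the byte counts are the ones quoted in this section (8 bytes of LLC/SNAP, 8 bytes of AAL5 trailer, 48-byte cell payload, 53-byte cell, 14-byte Ethernet header, 4-byte FCS, 8-byte preamble, 12-byte interframe gap).

```python
import math

LLC_SNAP_BYTES = 8        # LLC (3) + OUI (3) + PID (2)
AAL5_TRAILER_BYTES = 8    # UU (1) + CPI (1) + Length (2) + CRC-32 (4)
CELL_PAYLOAD_BYTES = 48
CELL_BYTES = 53           # 48-byte payload plus 5-byte ATM header


def aal5_cells(ip_len):
    """Number of ATM cells needed to carry one IP packet as an AAL5 PDU."""
    pdu_bytes = ip_len + LLC_SNAP_BYTES + AAL5_TRAILER_BYTES
    return math.ceil(pdu_bytes / CELL_PAYLOAD_BYTES)


def aal5_wire_bytes(ip_len):
    """Bytes on the ATM wire (cell headers included, SONET overhead excluded)."""
    return aal5_cells(ip_len) * CELL_BYTES


def ethernet_frame_bytes(ip_len):
    """Ethernet frame length: header + IP packet + FCS, padded to 64 bytes."""
    return max(64, 14 + ip_len + 4)


def ethernet_wire_bytes(ip_len):
    """Frame length plus 8-byte preamble and 12-byte interframe gap."""
    return ethernet_frame_bytes(ip_len) + 8 + 12


if __name__ == "__main__":
    for ip_len in (20, 32, 33, 40, 46, 80, 81, 1500):
        print(ip_len, aal5_cells(ip_len), ethernet_wire_bytes(ip_len))
```

Running it for IP lengths 32, 33, 80 and 1500 gives 1, 2, 2 and 32 cells, matching the boundaries called out in the list.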
Figure 2. Frame and PDU Length vs. IP Packet Length
1.3.3 Expected Ethernet Transmit Bandwidth
This example design has more Ethernet transmit wire capacity than most full-bandwidth ATM
input workloads are able to consume. All configurations of this example design include more
Ethernet bandwidth than ATM bandwidth. This assures that Ethernet reception is fast enough to
supply ATM transmit at full wire rate, and that Ethernet can transmit fast enough to consume ATM
receive at full wire rate.
When Ethernet receive bandwidth exceeds ATM transmit wire-rate, the design discards the excess
Ethernet input. In the reverse direction, ATM receive wire-rate is less than Ethernet transmit
wire-rate, and so Ethernet transmit will never be fully consumed.
Given that the design receives cells at OC-3 or OC-12 wire-rate, Figure 3 shows the expected
Ethernet Transmit bandwidth. This pattern is a direct result of the minimum Ethernet frame size
and cell granularity of AAL5 shown in the previous figure. For example, a 32-byte IP packet would
completely fill one cell, and when forwarded to Ethernet, it expands to consume the entire
84 bytes of wire-time associated with a 64-byte minimum size Ethernet frame. In this scenario
ATM is more bandwidth-efficient than Ethernet, and 949Mbps of Ethernet output would be expected.
However, as only 800Mbps of Ethernet bandwidth is available, the one-cell PDU workload will
drive the Ethernet wires to their 800Mbps capacity and discard the last 149Mbps.
A 33-byte IP packet overflows into 2 cells, requiring 53 more bytes on the input wire. This
effectively slows down the input rate, and the theoretical best-case Ethernet Transmit bandwidth
for this input drops to 475Mbps, well within the capacity of the 8 100Mbps Ethernet ports. Indeed,
only in the one-cell/PDU case does the Ethernet transmit bandwidth requirement exceed the
800Mbps available.
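The 949Mbps and 475Mbps figures can be reproduced with the short Python sketch below. It is an illustration only: the cell payload rates of 599.04 Mbps for OC-12 and 149.76 Mbps for OC-3 are assumed SONET payload capacities that this note does not state, and the function name is hypothetical.

```python
import math

# Assumed SONET payload rates available to ATM cells (not given in this note).
OC12_PAYLOAD_BPS = 599.04e6
OC3_PAYLOAD_BPS = 149.76e6


def ethernet_tx_mbps(ip_len, atm_payload_bps):
    """Ethernet wire bandwidth needed to forward wire-rate ATM input.

    Every PDU carries one IP packet of ip_len bytes.
    """
    cells = math.ceil((ip_len + 8 + 8) / 48)        # LLC/SNAP + AAL5 trailer
    pdus_per_second = atm_payload_bps / (cells * 53 * 8)
    frame_bytes = max(64, ip_len + 14 + 4)          # pad to minimum frame
    wire_bytes = frame_bytes + 8 + 12               # preamble + interframe gap
    return pdus_per_second * wire_bytes * 8 / 1e6


if __name__ == "__main__":
    print(round(ethernet_tx_mbps(32, OC12_PAYLOAD_BPS)))   # one-cell PDUs
    print(round(ethernet_tx_mbps(33, OC12_PAYLOAD_BPS)))   # two-cell PDUs
```

Under these assumptions the one-cell case needs about 949 Mbps of Ethernet wire time and the two-cell case about 475 Mbps. Four OC-3 ports together give the same aggregate as one OC-12 port.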
As packets grow larger, the net effect of overflowing to the next cell is smaller. However, the peaks
in maximum bandwidth are also lower, reflecting the additional ATM header that is needed for
each additional cell in the PDU.
The following figure identifies the expected aggregate Ethernet transmit bandwidth for ATM OC-3
and OC-12 wire-rate input:
Figure 3. Expected Ethernet Transmit Bandwidth
1.4 Execution Environment
1.4.1 Software
The software execution environment supported by the Developer’s Workbench is described in the README.txt file that accompanies the source code files for the project. This includes descriptions
of the directory and file structure, and project reconfiguration instructions. See Section 5.0 for
additional information on configuring the project.
The software simulation of the example design consumes test data streams from the Data Stream
feature of the Developer’s Workbench or through a network simulator Dynamic Linked Library
(DLL). Sample Ethernet and ATM data streams are provided.
Figure 4 shows how data stream PDUs can be created in the Workbench for ATM, Ethernet, IP, and
other protocol data streams. These data streams can then be assigned to feed different ports. To test
how the example design performs IP routing, different destination IP addresses can be chosen in
the PDU.
Figure 4. Developer’s Workbench - ATM Data Stream Dialog Box
Figure 5 shows the IX Bus Device Status window. This window gives a continually updated
snapshot of IX Bus activity. It can be used to gain an overall picture of what data is being
transferred over the IX Bus "on-the-fly", and the data or wire transmission rate. The Data
Streaming feature and the IX Bus Device Status window are both documented in the IXP1200
Development Tools User’s Guide.
In the simulation environment, the IP and ATM VC table management software that normally runs
on the StrongARM core is emulated with a combination of Transactor (simulator) foreign models
and interpreted Transactor scripts.
Figure 5. Developer’s Workbench - IX Bus Device Status Window
1.4.2 Hardware
The README.txt file contained in the vxworks subdirectory of the project source code describes
how to build and run the project on hardware using VxWorks®. While the project runs in
simulation mode by default, some simple changes to the project configuration must be made before
it will run on hardware. To run on hardware, Tornado 2.1® as well as the IXP1200 Developer’s
Workbench 2.01 need to be installed on the host system. Further details may be found in the
README.txt file in the vxworks subdirectory.
2.0 System Overview
2.1 System Programming Model
Figure 6 shows the system hardware, as seen by the software. Data flows from the receive ports on
the left, through the IXP12xx’s RFIFO and its various hardware resources, and then to the TFIFO
and out the transmit ports on the right. (While logically independent, receive and transmit ports for
each interface are implemented in the same physical hardware package. The figure uses a single
block arrow to illustrate 1-4 ATM ports, and 1-8 Ethernet ports, depending on the configuration.)
The StrongARM core shares access to SRAM and DRAM with the microengines, and thus can
manage the VC and IP tables. The StrongARM core runs a Developer’s Workbench debug library
to connect to Developer’s Workbench running on a remote host to debug and download microcode.
Figure 6. System Programming Model
2.2 StrongARM Core Software
In this example implementation, the StrongARM core runs VxWorks, and initializes the hardware;
controls the baseboard 82559 PCI Ethernet NIC; runs the IXP1200 Developer's Workbench debug
library, and connects it to a remote system host via the PCI Ethernet NIC; runs various startup
utilities (including atm_init() to initialize the IP route and VC Lookup tables) and provides those
utilities for run-time; and runs an agent to consume exception packets which are not handled by the
microengines in the data plane.
In the simulation environment, the IP and VC table management software is emulated with
Transactor foreign models - DLLs which are linked into the Transactor. The same source code is
compiled into the Transactor foreign models for SIMULATION, and the VxWorks utilities to run
on HARDWARE.
2.3 Software Partitioning
The following figures show how the microcode functional blocks are partitioned on IXP12xx
hardware for the three system configurations.
Figure 7. IXP1240 1xATM OC-12 and 8xEthernet 100Mbps Microengine Partitioning
[Figure 7 diagram not reproduced. It shows the OC-12 port on the left and the eight Ethernet ports
on the right, with these microengines: ATM RX (four threads, all on port 8); IPR (four IP Route
threads, fed by a MSGQ); Ethernet TX (one Scheduler thread and three Fill threads, fed by eight
PktQs); two Ethernet RX microengines (one thread per port, ports 0-3 and ports 4-7), each feeding
a MSGQ; and ATM TX (three Fill threads and one unused thread). Legend: thread, microengine,
physical port, scratchpad memory, SRAM, MSGQ.]
All three figures show the ATM ports on the left, and the Ethernet ports on the right. All ports are
bi-directional, but are shown as uni-directional for clarity. The IX bus is configured in dual 32 bit
unidirectional mode.
The ATM Receive microengine uses the SRAM VC Lookup Table to assemble ATM cells into
AAL5 PDUs in DRAM. It forwards the descriptor to the fully-assembled PDUs to the IP Route
microengine via a single message queue (MSGQ) in scratchpad RAM.
The IP Route microengine reads the IP header from DRAM, performs additional checks per
RFC1812, performs an IP lookup to make a routing decision, then enqueues the Ethernet frame to
the appropriate Ethernet Transmit packet queue. In the software CRC configuration, the packet is
processed by a CRC-32 checking microengine before being enqueued to an Ethernet transmit
packet queue.
In the reverse direction, Ethernet frames are received on the Ethernet ports by the Ethernet receive
microengine(s), which perform IP lookup and RFC1812 checks. The packets are then enqueued on
the appropriate queues to be consumed by the ATM transmit microengine. In the software CRC
configuration (Figure 9), the PDUs are first processed by a CRC generation microengine before
going to the ATM Transmit microengine.
In the OC-12 configuration, there are two message queues (MSGQs) in scratchpad RAM, one for
PDUs from each Ethernet Receive microengine. The pool of threads in the ATM transmit
microengine alternately polls the two MSGQs.
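Alternating between two message queues can be pictured with the following minimal Python sketch. It is not the microcode: the class name, the use of `collections.deque` in place of scratchpad-RAM MSGQs, and the rule of skipping an empty queue are all assumptions made for illustration.

```python
from collections import deque


class AlternatingPoller:
    """Polls a fixed set of queues in turn, skipping any that are empty."""

    def __init__(self, queues):
        self.queues = queues
        self.next_index = 0   # queue to try first on the next poll

    def poll(self):
        """Return the next PDU descriptor, or None if every queue is empty."""
        for _ in range(len(self.queues)):
            queue = self.queues[self.next_index]
            # Advance before returning so the other queue is tried next time.
            self.next_index = (self.next_index + 1) % len(self.queues)
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    msgq0 = deque(["pdu-a0", "pdu-a1"])   # from Ethernet RX ports 0-3
    msgq1 = deque(["pdu-b0"])             # from Ethernet RX ports 4-7
    poller = AlternatingPoller([msgq0, msgq1])
    print([poller.poll() for _ in range(4)])
```

Starting the next poll at the other queue keeps either Ethernet Receive microengine from starving the other when both have PDUs pending.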
In the OC-3 configurations, there is a buffer descriptor queue (BDQ) in SRAM associated with
each ATM transmit port. BDQs are similar to packetqs, but they are slightly more efficient in
configurations where, for example, the transmitter dedicates a thread to each BDQ.
Figure 8. IXP1240 OC-3 4xATM and 8xEthernet 100Mbps Microengine Partitioning
[Figure 8 diagram not reproduced. It shows four OC-3 ports on the left and eight Ethernet ports on
the right, with these microengines: ATM RX (one thread per port, ports 8-11); IPR (four IP Route
threads, fed by a MSGQ); Ethernet TX (one Scheduler thread and three Fill threads, fed by eight
PktQs); two Ethernet RX microengines (ports 0-3 and ports 4-7); and ATM TX (ports 8-11, fed by
four BDQs). Legend: thread, microengine, physical port, scratchpad memory, SRAM, MSGQ.]
2.3.1 Lookup Tables
Not shown in the diagrams, the microengines make use of either three or four lookup tables:
• VC Lookup Table - resides in SRAM and is used by the ATM Receive microengine.
• IP Lookup Table - resides partially in SRAM and partially in DRAM, and is used by the IP
Route microengine and the Ethernet Receive microengine.
• MAC Address Hash Table - resides in SRAM and is used for RFC 1812 Port address
verification.
• Software CRC configurations use a table of pre-computed CRC-32 syndromes in SRAM.
Figure 9. IXP1200 2xATM OC-3 Software-CRC and 4xEthernet 100Mbps Microengine
Partitioning
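[Figure 9 diagram not reproduced. It shows two OC-3 ports on the left and four Ethernet ports on
the right, with these microengines: ATM RX (two receive threads, ports 8 and 9, plus two IP Route
threads); CRC CHK (four Check threads); CRC GEN (four Generate threads); Ethernet TX (one
Scheduler thread and three Fill threads, fed by four PktQs); Ethernet RX (ports 0-3); and ATM TX
(ports 8 and 9, two unused threads, fed by two BDQs). Legend: thread, microengine, physical port,
scratchpad memory, SRAM, MSGQ.]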
2.4 Data Flow
2.4.1 ATM to Ethernet Data Flow
Figure 10 outlines the processing to receive ATM cells and forward them to Ethernet ports. For a
given VC, three different types of cells of the PDU can arrive: the first cell, middle cells, and last
cell:
1. The first cell of the IP over ATM PDU contains three types of headers: ATM header, LLC/
SNAP header, and IP Header. This is sufficient information to make a forwarding decision.
The payload portion of this cell is moved directly from the RFIFO to DRAM.
2. Subsequent middle cells are moved directly from the RFIFO to DRAM without any additional
processing.
3. When the last cell of the PDU (which contains the AAL5 trailer) is received, the payload of the
cell is moved directly from the RFIFO to DRAM, and the completed PDU is then enqueued
for Ethernet transmission.
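A receiver tells these cell types apart from the ATM header. The Python sketch below decodes the 5-byte header using the field widths shown in Figure 1 (GFC 4 bits, VPI 8 bits, VCI 16 bits, PTI 3 bits, CLP 1 bit, HEC 8 bits). The use of the low PTI bit of a user cell to mark the last cell of an AAL5 PDU is standard AAL5 behavior rather than something this note spells out, and the function names are illustrative, not taken from the microcode.

```python
def parse_atm_header(header):
    """Decode a 5-byte ATM UNI cell header into its fields."""
    if len(header) != 5:
        raise ValueError("ATM header must be exactly 5 bytes")
    word = int.from_bytes(header[:4], "big")
    return {
        "gfc": (word >> 28) & 0xF,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "pti": (word >> 1) & 0x7,
        "clp": word & 0x1,
        "hec": header[4],
    }


def is_user_cell(pti):
    """User data cells have the most significant PTI bit clear."""
    return (pti & 0x4) == 0


def is_last_cell_of_pdu(pti):
    """In AAL5, the low PTI bit of a user cell marks the end of the PDU."""
    return is_user_cell(pti) and (pti & 0x1) == 1


if __name__ == "__main__":
    # VPI 1, VCI 32, PTI 1 (user cell, end of PDU), CLP 0, HEC left as 0.
    fields = parse_atm_header(bytes([0x00, 0x10, 0x02, 0x02, 0x00]))
    print(fields, is_last_cell_of_pdu(fields["pti"]))
```

A cell is a "first cell" when no PDU is in progress on its VC, which is per-VC state rather than a header bit.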
2.4.1.1 VC Lookup
A VC lookup is performed on each cell received over an ATM port. The appropriate VC Table
Entry is located using the VPI/VCI value in the ATM header plus the port number. The lookup
provides a DRAM packet buffer base address, plus the CRC-32 syndrome for the PDU. As each
additional payload is added to the DRAM buffer, the offset value is incremented and the CRC
syndrome is updated appropriately. The VC Table Entry also contains an AAL type field.
Currently, this example design supports only classical IP over ATM, where the AAL type can be
either 0 or 5. A value of 0 indicates that the VC is not open, so any cell received on that VC is
immediately discarded.
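The per-cell bookkeeping described above can be sketched in Python as follows. This is a simplified model, not the design's SRAM table layout: the dictionary keyed by (port, VPI, VCI), the `VcEntry` fields, and the bit-at-a-time CRC routine are assumptions for illustration. The CRC uses the AAL5 generator polynomial 0x04C11DB7, processed most significant bit first with an all-ones initial value.

```python
from dataclasses import dataclass, field

CRC32_POLY = 0x04C11DB7
CRC32_INIT = 0xFFFFFFFF


def crc32_update(state, data):
    """Fold more bytes into a running MSB-first CRC-32 state."""
    for byte in data:
        state ^= byte << 24
        for _ in range(8):
            if state & 0x80000000:
                state = ((state << 1) ^ CRC32_POLY) & 0xFFFFFFFF
            else:
                state = (state << 1) & 0xFFFFFFFF
    return state


@dataclass
class VcEntry:
    aal_type: int = 0                      # 0 = VC not open, 5 = AAL5
    buffer: bytearray = field(default_factory=bytearray)  # stands in for DRAM
    crc_state: int = CRC32_INIT            # running CRC-32 for the PDU


vc_table = {}   # (port, vpi, vci) -> VcEntry


def receive_cell(port, vpi, vci, payload):
    """Append one cell payload to its VC; return False if the cell is dropped."""
    entry = vc_table.get((port, vpi, vci))
    if entry is None or entry.aal_type == 0:
        return False                       # VC not open: discard the cell
    entry.buffer += payload                # buffer offset advances by 48 bytes
    entry.crc_state = crc32_update(entry.crc_state, payload)
    return True
```

Because the CRC state is carried in the table entry, cells from different VCs can be interleaved on the wire without disturbing each other's running CRC.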
The LLC/SNAP field specifies the protocol type. Currently, the only valid value is 0x AA AA 03
00 00 00 08 00 (classical IP over ATM). While this implementation consumes and produces just
one valid LLC/SNAP pattern, this pattern is not hard-coded. The LLC/SNAP bits are included in
the IP route table entry, as well as the VC lookup table. This makes it easy to modify the design,
not only to support a different LLC/SNAP pattern, but also to support different valid
patterns for each VC.
2.4.1.2 IP Lookup Table
Each PDU contains an IP header in its first cell. Therefore, a single IP lookup is performed for each
PDU, regardless of the number of cells in the PDU.
Figure 10. ATM to Ethernet Processing Steps
[Figure 10 diagram not reproduced. It follows an ATM PDU from the receive port to an Ethernet
PDU on the transmit port through these numbered steps: (1) receive ATM cell; (2) VC look-up, and
check of the LLC/SNAP header on the first cell; (3) IP look-up on the first cell; (4) locate buffer and
offset; (5) move payload to the packet buffer in SDRAM; (6) if end of PDU, check length; (7) check
CRC on the AAL-5 PDU; (8) strip the AAL-5 trailer; (9) build MPKT, adding the Ethernet header on
the first MPKT; (10) transmit MPKT. The VC Lookup Table Entry holds the VPI/VCI, AAL type,
LLC/SNAP header, buffer base address, buffer offset, and CRC-32 residue; the Route Table entry
holds the port number, port type, and Ethernet header.]
2.4.2 Ethernet to ATM Data Flow
Figure 11 outlines the sequence of events that takes place when processing incoming Ethernet
packets. Incoming Ethernet packets can either fit within a single MPKT ("m-packet", 64 byte
packet "fragment"), or span multiple MPKTs. The SOP (start of packet) and EOP (end of packet)
bits indicate the starting and ending MPKTs. As MPKTs are received, they are stored in a DRAM
data buffer.
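The relationship between a frame and its MPKTs can be illustrated with a small Python sketch. It models only the 64-byte fragmentation and the SOP/EOP flags described above; the dictionary representation and the function name are hypothetical, and the real flags are delivered by the receive hardware rather than computed in software.

```python
MPKT_BYTES = 64


def split_into_mpkts(frame):
    """Split an Ethernet frame into 64-byte MPKTs tagged with SOP and EOP."""
    mpkts = []
    for offset in range(0, len(frame), MPKT_BYTES):
        mpkts.append({
            "sop": offset == 0,                         # first fragment
            "eop": offset + MPKT_BYTES >= len(frame),   # last fragment
            "data": frame[offset:offset + MPKT_BYTES],
        })
    return mpkts


if __name__ == "__main__":
    for mpkt in split_into_mpkts(bytes(130)):
        print(mpkt["sop"], mpkt["eop"], len(mpkt["data"]))
```

A minimum-size 64-byte frame yields a single MPKT with both SOP and EOP set.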
When the first MPKT is received (SOP asserted), the IP header is read from the RFIFO, the header
checksum is checked, the appropriate IP fields are updated (e.g. TTL), and an IP lookup is
performed. The IP Lookup Table Entry tells the receiver which port to route to, and which LLC/
SNAP pattern to prepend to the PDU. The LLC/SNAP and modified IP headers are then written to
DRAM.
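The header checksum check and TTL update can be sketched in Python as shown below. This is a generic IPv4 illustration rather than the microcode: the function names are hypothetical, and the checksum is recomputed from scratch here, whereas a forwarding engine may update it incrementally.

```python
def ones_complement_sum16(data):
    """16-bit one's-complement sum over an even-length byte string."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_checksum_ok(header):
    """A valid IPv4 header, checksum field included, sums to 0xFFFF."""
    return ones_complement_sum16(header) == 0xFFFF


def decrement_ttl(header):
    """Return a copy of the header with TTL reduced by one and checksum redone."""
    out = bytearray(header)
    if out[8] <= 1:
        raise ValueError("TTL expired; packet must not be forwarded")
    out[8] -= 1                            # TTL is byte 8 of the IPv4 header
    out[10:12] = b"\x00\x00"               # checksum field is bytes 10-11
    checksum = ~ones_complement_sum16(out) & 0xFFFF
    out[10:12] = checksum.to_bytes(2, "big")
    return bytes(out)


if __name__ == "__main__":
    hdr = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
    print(header_checksum_ok(hdr), header_checksum_ok(decrement_ttl(hdr)))
```

The sample header has TTL 0x40 and checksum 0xB861; after the update the TTL is 0x3F and the header still verifies.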
When the final MPKT is received (EOP asserted), the AAL5 trailer is written out to DRAM and the
fully assembled PDU is enqueued for ATM transmission.
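The fully assembled PDU has the layout shown in Figure 1. The following Python sketch builds such a PDU and splits it into cell payloads. It is an illustration under stated assumptions: the LLC/SNAP value is the one quoted in Section 2.4.1.1, the trailer layout (UU, CPI, Length, CRC-32) follows Figure 1, the Length field is taken to cover the LLC/SNAP header plus IP packet, and all function names are hypothetical.

```python
LLC_SNAP_IP = bytes.fromhex("AAAA030000000800")   # classical IP over ATM
CELL_PAYLOAD_BYTES = 48


def crc32_aal5(data):
    """MSB-first CRC-32, polynomial 0x04C11DB7, all-ones init and final XOR."""
    state = 0xFFFFFFFF
    for byte in data:
        state ^= byte << 24
        for _ in range(8):
            if state & 0x80000000:
                state = ((state << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                state = (state << 1) & 0xFFFFFFFF
    return state ^ 0xFFFFFFFF


def build_aal5_pdu(ip_packet, uu=0, cpi=0):
    """Prepend LLC/SNAP, pad to a cell boundary, and append the AAL5 trailer."""
    body = LLC_SNAP_IP + ip_packet
    pad_len = -(len(body) + 8) % CELL_PAYLOAD_BYTES   # 8-byte trailer follows
    pdu = body + bytes(pad_len) + bytes([uu, cpi]) + len(body).to_bytes(2, "big")
    return pdu + crc32_aal5(pdu).to_bytes(4, "big")   # CRC covers all but itself


def segment(pdu):
    """Split a PDU into 48-byte cell payloads."""
    return [pdu[i:i + CELL_PAYLOAD_BYTES]
            for i in range(0, len(pdu), CELL_PAYLOAD_BYTES)]


if __name__ == "__main__":
    for ip_len in (32, 40, 1500):
        print(ip_len, len(segment(build_aal5_pdu(bytes(ip_len)))))
```

For IP packets of 32, 40 and 1500 bytes this produces 1, 2 and 32 cell payloads, in line with Section 1.3.2.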
Figure 11. Ethernet to ATM Processing Steps
[Figure 11 diagram not reproduced. It follows an Ethernet frame from the receive port to an ATM
PDU on the transmit port through these numbered steps: (1) receive MPKT; (2) perform IP lookup
on SOP; (3) set current port state on the first MPKT and strip the Ethernet header; (4) locate buffer
and offset; (5) move MPKT payload to the packet buffer in SDRAM; (6) add LLC/SNAP header and
AAL-5 trailer on EOP; (7) generate CRC on the PDU; (8) place on Tx queue; (9) transmit from the
segmentation queues, adding the ATM header on transmit. The Route Table entry holds the port
type, port number, ATM header, and LLC header; the port state holds the buffer base address,
buffer offset, and length. One UBR segmentation queue is shown per ATM port.]
2.5 StrongARM Core Initialization
On hardware, NetApp_Init is linked into VxWorks, and does the following:
1. Initialize the hardware, including the MACs and PHYs via VxWorks network drivers.
2. Control the baseboard 82559 PCI Ethernet NIC.
3. Run the IXP1200 Developer’s Workbench debug library, and connect it to a remote system
host via the PCI Ethernet NIC to download and debug IXP1240 microcode.
Then, atm_init() is invoked to initialize data structures in memory:
• Buffer Descriptor Free-list.
• CRC-32 Lookup Table.
• IP Lookup Table.
• VC Lookup Table and hash miss free-list.
• IP directed broadcast address hash table.
• Ethernet receive port MAC address hash table.
On hardware, atm_init() resides in the atm_utils.o VxWorks-loadable module running on the
StrongARM core. In the simulation environment, atm_init() resides in the atm_util.dll foreign
model and is invoked from the Transactor startup script atm_ether_init.ind.
2.6 Microengine Initialization
One microengine includes system_init.uc and invokes system_init() at its beginning. system_init()
is the central microcode initialization macro. It handles initialization not handled by the
StrongARM core, and then sends a signal to thread0 of every microengine, including itself.
(system_init() can be invoked from any microengine. ether_tx_threads.uc is used simply because
of available microstore space.)
Reset causes every microengine to execute thread0 first, so every microengine begins with thread0
waiting for the inter-thread signal from system_init(). Upon receipt, thread0 is responsible for
starting up the microengine in an orderly fashion, e.g. initializing absolute registers and signaling
the other threads to start.
3.0 Microengine Functional Blocks
3.1 ATM Receive Microengine
The ATM Receive microengine is a single microengine dedicated to receive cells from the ATM
ports, check CRC-32 while re-assembling them into PDUs, and then forward them to the IP Router
microengine. (In the software CRC configuration, an additional microengine is used to handle
CRC checking.)
3.1.1 Structure
The following identifies the ATM Receive microengine structure for OC-12 and OC-3 ports:
OC-12 Port
• Four threads working in parallel on one port.
• "Fast-port" speculative receive requests.
• VC Cache enabled.
• NUMBER_OF_ATM_PORTS must be 1.
OC-3 Ports
• One thread/port.
• "Slow-port" status check before receive requests.
• VC Cache disabled.
• NUMBER_OF_ATM_PORTS may be 1, 2, or 4.
3.1.2 High Level Algorithm
In all configurations, each Receive thread gets its own RFIFO element, as assigned by
port_rx_init().
Figure 12. ATM Receive High Level Algorithm
while(1)
#if (ATM_OC3_PORTS)
    poll RCV_RDY_LO until port is ready
#endif
    wait until < 3 receive requests in flight from this engine
    receive cell from PHY to RFIFO
    if (no Buffer Descriptor available "on deck")
        pop buffer descriptor from free list.
    read ATM header from RFIFO
#if (ATM_OC12_PORT)
    if (RX_CANCEL)
        handle & continue
#endif
    if (RXFAIL)
        handle & continue
    if (not user cell)
        handle & continue
#if (ATM_OC12_PORT)
    if (ATM header hits in VC cache)
        get VC info from VC cache
    else // cache miss
        allocate unused cache entry
#endif // ATM_OC12_PORT
        look-up VC in hashed VC table
    if (VC not open)
        handle & continue
    if (no Buffer Descriptor associated with VC)
        assign "on deck" descriptor to this VC.
    if (VC not open for AAL5)
        drop cell & continue
    if (first cell of PDU)
        if (cell LLC/SNAP != VC table LLC/SNAP)
            drop cell
        move first cell to DRAM from RFIFO, calculate CRC-32
    else
        move nth cell to DRAM from RFIFO, calculate CRC-32
    if (last cell of PDU)
        if (bad CRC-32)
            drop PDU, continue
        if (AAL5 length == 0)
            drop PDU, continue
        update buffer descriptor
        msgq_send() buffer descriptor to IP Route engine
    else // not last cell
#if (ATM_OC12_PORT)
        update and exit VC cache entry
#endif
        update VC table entry
3.2 ATM Transmit Microengine
The ATM Transmit microengine is an AAL5 Unspecified Bit Rate (UBR) Transmitter that uses a
single microengine to move cells at wire-rate in either single OC-12 or up to four OC-3 port
configurations. No attempt is made to mix, schedule, or otherwise ’shape’ the order of the cells on
the wire.
The transmitter consumes PDUs one at a time from beginning to end, resulting in an output stream
in which cells from the same PDU are transmitted "back-to-back" from first through the last cell of
the PDU.
The transmitter is implemented with 3 identical fill threads. Unlike the Ethernet transmitter, the
ATM transmitter does not have a thread dedicated to scheduling the work of the fill threads. Rather,
the fill threads use shared absolute registers to act as a "distributed scheduler". The fourth thread
could also be enabled as a fill thread, but is not needed at the wire rates in this design.
In IXP1240/1250 hardware CRC configurations, the ATM Transmitter generates CRC-32 upon
transferring cells from DRAM to the TFIFO. In the IXP1200 software CRC configurations, CRC-32
is computed by a dedicated CRC-32 generation microengine.
3.2.1 High Level Algorithm
Figure 13. ATM Transmit High Level Algorithm
while(1)
    critsect_enter(@poll_for_new_work_mutex)
    if (engine not active sending a PDU)
        dequeue a PDU
    if (Rosetta not ready to transmit)
        goto skip#
    critsect_exit(@poll_for_new_work_mutex)
    get transmit (cell) assignment from active PDU
    sequence_enter(SEQ_TFIFO) - remember TFIFO element allocation order
    _atm_tfifo_element() to claim the next TFIFO element
    write payload from DRAM to TFIFO
    _build_atm_tx_assignment() set-up TFIFO control word
    _my_tfifo_status_write() write control to TFIFO
    atm_tx_tfifo_write_cell_header_and_data0() - ATM header into TFIFO
    sequence_wait(SEQ_TFIFO) - wait for my element to be next
    tfifo_ptr_wait() - don't validate too far ahead of xmit_ptr
    tfifo_validate_write()
    sequence_exit(SEQ_TFIFO)
    continue
skip#: // skip a TFIFO element
    critsect_exit(@poll_for_new_work_mutex)
    sequence_enter(SEQ_TFIFO) - remember TFIFO element allocation order
    _atm_tfifo_element() - to claim the next TFIFO element
    _my_tfifo_skipstatus_write() - write control to TFIFO
    sequence_wait(SEQ_TFIFO) - wait for my element to be next
    tfifo_ptr_wait() - don't validate too far ahead of xmit_ptr
    tfifo_validate_write()
    sequence_exit(SEQ_TFIFO)
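The sequence_enter(), sequence_wait(), and sequence_exit() steps in Figure 13 ensure that TFIFO elements are validated in the order in which they were claimed. The following C sketch models that ordering with a pair of counters. It is a single-threaded illustration under assumptions: the type and function names (seq_t, seq_enter, seq_is_my_turn, seq_exit) are hypothetical and are not the microcode macros themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequence: two counters standing in for
 * shared absolute registers. */
typedef struct {
    uint32_t next_ticket;  /* advanced by each thread that enters */
    uint32_t now_serving;  /* advanced by each thread that exits  */
} seq_t;

/* Claim a place in line; the returned ticket records allocation order. */
uint32_t seq_enter(seq_t *s) { return s->next_ticket++; }

/* A real thread would swap out and retry until this returns true. */
bool seq_is_my_turn(const seq_t *s, uint32_t ticket) {
    return s->now_serving == ticket;
}

/* Release the sequence so that the next ticket holder may proceed. */
void seq_exit(seq_t *s, uint32_t ticket) {
    assert(seq_is_my_turn(s, ticket));
    s->now_serving++;
}
```

In this model a fill thread may prepare its TFIFO element at its own pace, but it validates the element only when its ticket comes up, so elements become valid in allocation order.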
3.3 IP-Router Microengine
The IP Router microengine consumes packets from the ATM receive microengine via a message
queue, and routes them to the appropriate Ethernet transmit packetq. In the IXP1200 software-CRC
configuration, this function is carried out by two threads residing on the ATM Receive microengine
rather than on a dedicated IP router microengine.
3.3.1 Structure
All threads are identical. In hardware-CRC configurations, four IP Router threads reside on the
dedicated IP-router microengine. In the software-CRC configuration, two IP Router threads reside
on the ATM Receive microengine.
3.3.2 High Level Algorithm
Figure 14. IP Router High Level Algorithm
while(1)
    msgq_receive() packet from ATM RX engine
    ip_filter() out SNMP, IGMP
    ip_addr_validation() to discard packets from reserved addresses
    ip_dbcast_check() to filter out packets from directed broadcast addresses
    ip_proc()
    ip_verify() check TTL and checksum
    ip_modify() update TTL
    ip_route_lookup()
    port_enabled_check() to discard packets from disabled port
    update Ethernet MAC Source Address with our own
#ifdef ATM_LOOPBACK //Allow hardware configurations with ATM outputs
                    //connected directly to ATM inputs
    if(output port == ATM port)
        over-ride ATM destination port with round-robin Ethernet port
#endif
    packetq_send() packet to destination Ethernet port
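The ip_verify() and ip_modify() steps in Figure 14 check the TTL and header checksum and then update the TTL. The C sketch below shows one common way to do this: a full one's-complement checksum for verification, and an incremental checksum patch (in the style of RFC 1624) after the TTL is decremented. The function names and the rule of rejecting a TTL of 0 or 1 are assumptions for illustration, not the behavior of the microcode macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over an IP header in network byte order.
 * Over a header with a valid checksum field, the result is 0. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and patch the checksum (bytes 10-11)
 * incrementally: HC' = ~(~HC + ~m + m'), where m is the 16-bit word
 * holding TTL and protocol. Returns 0 on success, or -1 if the packet
 * should not be forwarded (assumed policy: TTL of 0 or 1). */
int ip_ttl_decrement(uint8_t *hdr) {
    if (hdr[8] <= 1)
        return -1;
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
    uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    hc = (uint16_t)~sum;
    hdr[10] = (uint8_t)(hc >> 8);
    hdr[11] = (uint8_t)(hc & 0xFFu);
    return 0;
}
```

The incremental form touches only two 16-bit words of the header, which is why it suits a per-packet fast path better than recomputing the checksum over all header words.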
3.4 Ethernet Receive Microengine
The Ethernet Receive microengine is based on rx_ether100m.uc, an extended version of the
Ethernet receive threads from the Software Development Kit’s (SDK's) 16-port Ethernet example
design¹. While the code looks quite different from that on the SDK, most of the changes required a
simple move to a more efficient structure, without changing the logical function of the
microengine. For example, the threads take advantage of updated APIs for the RFC1812 macros to
lower the overhead of RFC1812 support.
Semantically, there are only a few differences from the SDK Ethernet design.
• IP lookup can return an ATM destination port, or an Ethernet destination port.
• For ATM destinations, prepend the LLC/SNAP to the payload.
• For ATM destinations, append the AAL5 trailer.
• For ATM destinations, enqueue to the ATM Transmit microengine, or for software CRC, to the
appropriate AAL5 CRC-32 generation queues.
1. The SDK (Software Development Kit) 2.01 CD contains a number of earlier IXP1200 Ethernet example designs that have remained
relatively unchanged from previous releases of the SDK. The Ethernet receive and transmit code in this example design reuses that code with
few modifications.
The ETHERNET_LOOPBACK build option enables routing packets from Ethernet Receive ports
to Ethernet Transmit ports. This is useful for equipment checkout in the lab. If this option is not
defined, packets received from Ethernet ports that route to Ethernet output ports are discarded
with an IP_NO_ROUTE exception. If this option is defined, the packets are forwarded as requested.
3.4.1 Ethernet Receive Structure
There are four identical threads on each Ethernet receive microengine. Each thread services a
specific port and uses a specific RFIFO element.
3.4.2 Ethernet Receive High Level Algorithm
Figure 15. Ethernet Receive High Level Algorithm
while(1)
    if(no receive buffer in hand)
        allocate a receive buffer
    receive MPKT from MAC to RFIFO
    if(SOP)
        read link layer header from RFIFO
        if (not Ethernet)
            record output queue as the StrongARM core
        else
            transfer end of MPKT from RFIFO to DRAM
            read IP header from RFIFO
            if (IP header checksum error)
                remember to discard this packet
            endif
            update IP header TTL and checksum
            ip_lookup()
            write LLC/SNAP and modified IP header to DRAM
        endif
    else // !SOP
        extract byte count from receive state
        transfer MPKT from RFIFO to DRAM data buffer
    endif
    if(EOP)
        write AAL5 trailer
        enqueue PDU to ATM transmitter
    endif
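For ATM destinations, Figure 15 prepends the LLC/SNAP header and appends the AAL5 trailer. The C sketch below works out the resulting PDU size using the standard AAL5 framing rules: an 8-byte trailer (UU, CPI, 16-bit length, CRC-32) and padding so that the PDU fills a whole number of 48-byte cell payloads. The LLC/SNAP bytes shown are the standard encapsulation for routed IPv4; that the design's LLC0_IP and LLC1_IP constants hold these values is an assumption, and the names below are hypothetical.

```c
#include <stdint.h>

#define ATM_CELL_PAYLOAD 48u
#define AAL5_TRAILER_LEN  8u  /* UU(1) + CPI(1) + Length(2) + CRC-32(4) */
#define LLC_SNAP_LEN      8u

/* LLC/SNAP header for routed IPv4 (RFC 2684, formerly RFC 1483). */
const uint8_t aal5_llc_snap_ip[LLC_SNAP_LEN] = {
    0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00, 0x08, 0x00
};

typedef struct {
    uint32_t length_field; /* value of the trailer Length: LLC/SNAP + IP */
    uint32_t pad_len;      /* zero padding between payload and trailer   */
    uint32_t pdu_len;      /* total bytes on the wire, multiple of 48    */
    uint32_t cells;        /* number of ATM cells                        */
} aal5_layout_t;

/* Lay out an AAL5 PDU that carries an IP packet of ip_len bytes. */
aal5_layout_t aal5_layout(uint32_t ip_len) {
    aal5_layout_t l;
    l.length_field = LLC_SNAP_LEN + ip_len;
    uint32_t unpadded = l.length_field + AAL5_TRAILER_LEN;
    l.cells = (unpadded + ATM_CELL_PAYLOAD - 1u) / ATM_CELL_PAYLOAD;
    l.pdu_len = l.cells * ATM_CELL_PAYLOAD;
    l.pad_len = l.pdu_len - unpadded;
    return l;
}
```

For example, a 32-byte IP packet fits exactly in one cell with no padding, while a 33-byte packet needs a second cell and 47 bytes of padding.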
3.5 Ethernet Transmit Microengine
The Ethernet Transmit microengine is rooted in ether_tx_threads.uc, which simply includes
system_init.uc, invokes system_init(), sets some definitions, and includes tx_ether100m.uc from
the 16-port Ethernet example design on the 2.01 SDK.
Other than that change, there is only one other difference between this Ethernet transmitter and the
implementation used by SDK example designs like L3fwd8_1f. With RFC1812 enabled, the SDK
example designs place the Ports-With-Packets (PWP) vector in SRAM and poll it to find packets
to send. This design uses a more efficient implementation that polls a scratchpad-resident PWP
vector for the data plane, and checks for a signal before polling an SRAM-resident PWP vector to
consume packets from the StrongARM core.
3.5.1 Ethernet Transmit Structure
The Ethernet Transmit microengine contains three fill threads and one transmit scheduler thread.
The Ethernet transmitter uses the eight even TFIFO elements, allowing the ATM transmitter to use
the eight odd TFIFO elements. This is the same TFIFO sharing mechanism that is used by
the L3fwd8_1f SDK example, except here the peer transmitter is ATM instead of Ethernet.
3.5.2 High Level Algorithm
As mentioned in “project_config.h”, defining ETHERNET_LOOPBACK allows the project to
forward packets from Ethernet source ports to Ethernet destination ports. Enabling this option adds
a small cost in the Ethernet transmitter because it needs to be able to handle transmit data starting
on variable buffer offsets.
This implementation uses thread0 as a scheduler, and the others are used as fill threads:
Thread0:
    while(1)
        tx_100m_assign()
tx_100m_assign() makes work assignments to the three fill threads of this microengine. Slow ports
are mapped directly to TFIFO elements. Therefore, if the target port has no packets, the fill thread
is given a ‘skip’ assignment. When the fill thread executes a skip assignment, it forces the
hardware to skip a TFIFO element without transmitting any data from the TFIFO element onto the
IX bus.
Threads 1, 2, 3:
    while(1)
        read assignment from scheduler
        restore portinfo state from absolute registers
        if (assigned to transmit a packet)
            transfer MPKT to TFIFO and validate
            update portinfo state
        else
            skip TFIFO element
        endif
3.6 CRC-32 Calculations using IXP1240/1250 Hardware
The IXP1240 adds sdram_crc[] instructions to the IXP1200 instruction set for efficient CRC
calculation. This design takes advantage of that hardware support in the ATM receiver and the
ATM transmitter. On receive (reassembly), CRC is checked when ATM cells are transferred from
RFIFO to DRAM. On transmit (segmentation), CRC is generated when ATM cells are transferred
from DRAM to the TFIFO.
3.6.1 CRC-32 Hardware Checking on Receive
Quadword 0 is copied with an sdram_crc[r_fifo_rd], mask_right instruction. This applies the CRC
to the four bytes labeled "LLC0" in Figure 16, but not to the ATM header. The ATM header is not
actually needed in the DRAM data buffer, but it is transferred, because this is more efficient than
performing a read/modify/write to preserve insignificant bits in the buffer.
Quadwords 1-5 are transferred by an sdram_crc[r_fifo_rd, 5] instruction. Quadword 6 contains
"Data11" -- the last 32-bit longword of the cell payload (longwords are numbered Data0 through
Data11). Data11 is stored in the VC table entry to be consumed when the next cell in this PDU
arrives. When the first cell is also the last cell (that is, for a single-cell PDU), Data11 contains
the CRC-32 of the AAL5 trailer, and it is compared to the one’s complement of the computed CRC
syndrome.
Figure 16. First Cell of a PDU in RFIFO and in DRAM
This design can actually skip the first RFIFO->DRAM transfer because LLC0 is constant on the
first cell and it is explicitly compared with the LLC0 value in the microengine. After a successful
compare, it is stripped from the packet. With the following optimization-enabling definition, the
CRC computation begins with LLC1 using the syndrome that would result from CRC over LLC0
(in the initial configuration, it is enabled by default):
#define CRC32_RX_LLC0
The algorithm for transferring the nth cell of a PDU is slightly different than that for moving the
first cell - as illustrated in Figure 17.
• The four bytes labeled ’11’ make up the longword ’data11’ from the first cell. The four bytes
labeled ’0’ make up the longword ’data0’ from the second cell.
• Upon reception of the first cell, data11 is saved in the VC cache/table entry. Upon reception of
the 2nd cell, data11 is retrieved from the VC cache/table entry, combined with data0 of the
second cell, and written in a single burst to DRAM.
Moving the nth cell (not cell0) in a PDU from the RFIFO to DRAM is similar to using the macro
atm_move_cell0_rfifo_to_sdram(), except that:
• The nth cell must start with a run-time crc_residue resulting from CRC on the previous cell in
the PDU.
• The nth cell must combine data11 of the previous cell with data0, as shown in Figure 17.
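The data11 carry described above can be sketched in C as follows. The sketch models DRAM as an array of 32-bit longwords and shows only the data movement: the first cell writes the ATM header alongside data0, each later cell writes the saved data11 alongside its own data0, and every write is a whole, aligned quadword. CRC handling is omitted, and all names (vc_state_t, move_cell0, move_celln) are hypothetical rather than taken from the microcode.

```c
#include <stdint.h>

#define CELL_LONGWORDS 12u  /* 48-byte payload = data0..data11 */
#define QUADS_PER_CELL  6u  /* aligned quadwords written per cell */

typedef struct {
    uint32_t data11;       /* last longword of the previous cell  */
    uint32_t quad_offset;  /* next quadword to write in the buffer */
} vc_state_t;

/* One aligned 8-byte write: two longwords into quadword 'quad'. */
static void write_quad(uint32_t *dram, uint32_t quad, uint32_t hi, uint32_t lo) {
    dram[2u * quad] = hi;
    dram[2u * quad + 1u] = lo;
}

/* 'first' leads the opening quadword: the ATM header for cell 0,
 * or the saved data11 of the previous cell otherwise. */
static void move_cell(uint32_t *dram, vc_state_t *vc, uint32_t first,
                      const uint32_t data[CELL_LONGWORDS]) {
    write_quad(dram, vc->quad_offset, first, data[0]);
    for (uint32_t i = 1; i < QUADS_PER_CELL; i++)
        write_quad(dram, vc->quad_offset + i, data[2u * i - 1u], data[2u * i]);
    vc->data11 = data[CELL_LONGWORDS - 1u];  /* held for the next cell */
    vc->quad_offset += QUADS_PER_CELL;
}

void move_cell0(uint32_t *dram, vc_state_t *vc, uint32_t atm_header,
                const uint32_t data[CELL_LONGWORDS]) {
    vc->quad_offset = 0;
    move_cell(dram, vc, atm_header, data);
}

void move_celln(uint32_t *dram, vc_state_t *vc,
                const uint32_t data[CELL_LONGWORDS]) {
    move_cell(dram, vc, vc->data11, data);
}
```

Because the 4-byte header offsets the payload by half a quadword, carrying data11 forward keeps every DRAM write quadword-aligned, so no read/modify/write is needed.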
3.6.2 CRC-32 Hardware Generation on Transmit
Figure 18 and Figure 19 show the layout of the cell source as it appears in DRAM, and the desired
format in the TFIFO, respectively. Aspects of the first, nth, and last cell are all overlaid on the same
diagram, as the positions are the same. In each diagram, rows are 64-bit “quadwords”.
The alignment of this cell in DRAM is dependent on how the data was received. In this example
design, the data was received on Ethernet, with a 14 byte Ethernet header. Therefore, the first byte
of the IP header starts on the 15th byte of the buffer.
The sdram_crc[t_fifo_wr] commands account for this alignment by using the IXP12xx byte
alignment hardware. These diagrams show bytes in big-endian order, while the instruction
encoding asks for byte alignment assuming little endian order. Therefore the 6-byte offset shown
here, becomes a 2-byte offset as encoded in the indirect_ref.
The hardware byte aligner operates on the data before the CRC computation hardware. This can be
seen in the transfer to quadword 0 of the TFIFO element with sdram_crc[t_fifo_wr], mask_right
with a byte alignment of 2 and a CRC mask value of 4.
Quadwords 1-5 are transferred with sdram_crc[t_fifo_wr, 5] with the same alignment. For
quadword 6, the processing depends upon whether or not this is the last cell of a PDU:
• If this is not the last cell, quadword 6 is transferred via sdram[t_fifo_wr], mask_left, then the
syndrome is extracted for use when the next cell is sent on this VC.
• If this is the last cell, the syndrome is read after quadword 5 is finished, then inverted
and transferred from the microengine to quadword 6 via t_fifo_wr[].
In all cases, after the cell is transferred and CRC is done, the first quadword is overwritten by the
microengine to insert the ATM header on the front of the cell. As the TFIFO is addressable only as
quadwords, the write will also update the first four bytes of cell payload (labeled LLC0 in the
example diagram). To preserve these first four payload bytes, the microengine first reads them
from DRAM and combines them with the ATM header before overwriting quadword0.
As with LLC0 in the ATM receiver, this design can be optimized to take advantage of the fact that
the constant LLC0 constitutes the first four bytes of payload on the first cell of a PDU (in the initial
configuration, it is enabled by default):
#define CRC32_TX_LLC0
3.7 CRC-32 Checker and Generator Microengines (Soft-CRC)
The CRC-32 microengine code, "Software CRC", is needed only for IXP1200 configurations.
IXP1240 or IXP1250 designs employ sdram_crc[] hardware instructions to perform the same
calculation more efficiently.
In IXP1200 configurations, there are two microengines dedicated to AAL5 CRC-32 calculations:
• One consumes the ATM Receive data stream and checks the CRC-32 before routing to
Ethernet Transmit packet-queues.
• One consumes the Ethernet Receive data stream and generates CRC-32 before forwarding to
the appropriate ATM Transmit queues.
3.7.1 Functional Differences between Checker and Generator
There are four functional differences between the Checker and Generator:
• DRAM data buffer payload alignment: depends on if it was received from ATM or Ethernet.
• Queues to be consumed.
• Queues to be supplied.
• CRC-32 answer - the checker compares it to the received CRC, while the Generator writes it
into the AAL5 trailer.
The source code is assembled into binaries optimal for Checking or Generating based on the
microengine number assignments from system_config.h.
3.7.2 CRC-32 Checker and Generator High Level Algorithm
Figure 20. CRC-32 High Level Algorithm
// CRC Checker
while(1)
    dequeue PDU from CRC CHK BDQ
    calculate_crc() over entire PDU
    if (AAL5 trailer CRC == calculated CRC)
        enqueue PDU onto Ethernet Transmit packet queue
    else
        drop PDU
    endif
// CRC Generator
while(1)
    dequeue PDU from CRC GEN BDQ
    calculate_crc() over entire PDU
    write calculated CRC into AAL5 trailer in DRAM data buffer
    enqueue PDU onto ATM TX UBR BDQ
The PDUs within each VC on each port are enqueued on the output in the same order that they
were dequeued from the input.
3.7.3 CRC-32 Computation
CRC-32 computation is performed by the calculate_crc32() macro in atm_aal5_crc32lib.uc.
The data stream is used to index tables of pre-computed CRC-32 results. The results are combined
serially to produce the CRC-32 for the entire AAL5 PDU.
The lookup tables are generated by code in atm_aal5_crc32_table.c. In simulation, the code
produces files that contain the tables and are downloaded into SRAM by startup scripts.
For hardware, the tables are generated by the same code running on the StrongARM core, but
rather than creating files, the tables are written directly to memory.
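As an illustration of table-driven CRC-32, the C sketch below computes the AAL5 CRC-32 one byte at a time: polynomial 0x04C11DB7, most significant bit first, initial value of all ones, and a final one's complement. It is not the code in atm_aal5_crc32lib.uc or atm_aal5_crc32_table.c; the single 256-entry table and the function names are assumptions, and the real tables may be organized differently.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0x04C11DB7u
#define CRC32_INIT 0xFFFFFFFFu

static uint32_t crc_table[256];

/* Pre-compute the CRC of every possible leading byte. */
void crc32_table_init(void) {
    for (uint32_t i = 0; i < 256u; i++) {
        uint32_t r = i << 24;
        for (int bit = 0; bit < 8; bit++)
            r = (r & 0x80000000u) ? (r << 1) ^ CRC32_POLY : (r << 1);
        crc_table[i] = r;
    }
}

/* Fold 'len' bytes into a running syndrome. Start with CRC32_INIT; the
 * returned value can be saved and passed back in for the next chunk. */
uint32_t crc32_update(uint32_t syndrome, const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++)
        syndrome = (syndrome << 8) ^ crc_table[(syndrome >> 24) ^ data[i]];
    return syndrome;
}

/* The value carried in the AAL5 trailer is the complemented syndrome. */
uint32_t crc32_finish(uint32_t syndrome) { return ~syndrome; }
```

Because crc32_update() takes and returns the running syndrome, a PDU can be processed in pieces, which mirrors how the hardware-CRC path saves the syndrome per VC between cells.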
4.0 Software Subsystems & Data Structures
4.1 Virtual Circuit Lookup Table - atm_vc_table.uc
4.1.1 VC Table Function
The ATM receive microengine uses a VC Lookup Table to manage reassembly of cells into PDUs.
The virtual circuit address bits in each cell header, plus the receive port number, uniquely specify a
VC table entry for that VC. ATM Receive performs a VC Lookup to qualify every cell received.
The lookup returns the VC Lookup Table Entry structure with the format shown in Figure 23 and
Figure 24.
The OC-12 configuration uses a VC Table Cache in conjunction with the VC table; however, the
description of the backing VC table in this section applies with or without a VC Cache.
The VC table entry answers the following questions for the ATM Receive thread:
• Is the VC open? (If no, discard the cell)
• Which LLC/SNAP patterns are expected at the start of each PDU? (If no match, discard cell.)
• Which AAL is the VC open for? (ATM Receive currently processes only AAL5.)
• Where should ATM Receive put the payload in DRAM (buffer and offset)?
• For hardware CRC: what is the current syndrome for this PDU?
4.1.2 VC_TABLE_HASHED Structure
VC_TABLE_HASHED supports the entire ATM VC name-space by employing the IXP12xx
hashing hardware as follows:
• At initialization, microcode loads the hash48 multiplier CSRs with the largest prime number
that fits into 48 bits: 0xffffffffffc5.
• At run-time, ATM Receive locates entries like so:
The index is used to read an entry from a 64K entry "primary" hashed VC Table in SRAM. If the
key in the entry matches the starting key, the hash table has successfully delivered the right VC
table entry with just one SRAM read. If the key does not match the key in the entry read from the
primary table, ATM Receive follows a linked "collision" list threaded through the entry "Next" field
(see Figure 23).
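The lookup and collision handling can be sketched in C as shown below. The sketch keeps the structure described in this section (a 64K-entry primary table, a separate 16K-entry collision pool, a Next link with 0 meaning "no next entry", and an AAL value of 0 meaning "not open") and the key format (atm_header & 0xFFFFFFF0) | port#. The hash function is a simple stand-in, not the hash48 hardware, and the bump allocator stands in for the hardware freelist. All identifiers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PRIMARY_ENTRIES   (64u * 1024u)
#define COLLISION_ENTRIES (16u * 1024u)

typedef struct {
    uint32_t next;  /* index into the collision pool, 0 = end of chain */
    uint32_t key;   /* (atm_header & 0xFFFFFFF0) | port                */
    uint32_t aal;   /* 5 = AAL5, 0 = VC is not open                    */
} vc_entry_t;

static vc_entry_t primary[PRIMARY_ENTRIES];
static vc_entry_t collision[COLLISION_ENTRIES];  /* slot 0 is reserved */
static uint32_t next_free = 1;  /* stand-in for the hardware freelist  */

uint32_t vc_key(uint32_t atm_header, uint32_t port) {
    return (atm_header & 0xFFFFFFF0u) | (port & 0xFu);
}

/* Stand-in hash (assumption); the design uses the hash48 hardware. */
uint32_t vc_hash(uint32_t key) {
    return (key ^ (key >> 16)) & (PRIMARY_ENTRIES - 1u);
}

/* One read finds the entry in the common case; otherwise walk the chain. */
vc_entry_t *vc_lookup(uint32_t key) {
    vc_entry_t *e = &primary[vc_hash(key)];
    for (;;) {
        if (e->aal != 0 && e->key == key)
            return e;
        if (e->next == 0)
            return NULL;
        e = &collision[e->next];
    }
}

/* Use the primary slot if it is free; otherwise thread a pool entry
 * onto the Next pointer. Returns NULL when the pool is exhausted. */
vc_entry_t *vc_create(uint32_t key, uint32_t aal) {
    vc_entry_t *e = &primary[vc_hash(key)];
    if (e->aal == 0) {
        e->key = key;
        e->aal = aal;
        return e;
    }
    if (next_free >= COLLISION_ENTRIES)
        return NULL;
    uint32_t idx = next_free++;
    collision[idx].key = key;
    collision[idx].aal = aal;
    collision[idx].next = e->next;
    e->next = idx;
    return &collision[idx];
}
```

Keeping collision entries in their own pool means that a colliding VC never occupies another VC's primary slot, which preserves the one-read lookup for VCs that hash to distinct rows.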
Figure 21. Hashed VC Table Structure
[Figure: the Primary VC Table in SRAM, showing a primary VC entry with a collision list, a VC
entry on a collision list, and a primary VC entry without a collision list. The collision freelist
starts at the SRAM hardware top-of-stack register.]
When atm_vc_table_entry_create() attempts to add an entry to the table and determines that the
entry in the primary table is already occupied, it needs to come up with an available entry to thread
onto the Next pointer. Although other implementations (which have less available RAM) take
entries from the primary table to perform this task, this implementation has a dedicated pool of 16K
collision entries that are available in a buf.uc style freelist threaded on hardware stack 1. The
motivation is that VC lookup is on the critical performance path. Therefore, this design needs to
maximize the chances that entries will be found in the primary table rather than on the collision
lists. However, the optimal primary table and collision free-list sizes will depend on the target
workload (an implementation issue).
4.1.3 VC_TABLE_LINEAR Structure
VC_TABLE_LINEAR implements a simple linear array of VC table entry structures in SRAM.
The size of the table depends on the number of VCs being supported, which correspondingly
depends on the number of ports and the number of significant VCI and VPI bits in the ATM header.
The defaults for these parameters are set in system_config.h, and can be overridden in project_config.h.
The project defaults to a 64K-entry VC table, independent of the number of ports. It does
this with eight significant VCI bits, and eight more bits split between VPI and ports. This means
that the design can distinguish 64K different VCs. However, it does not
mean that the design can simultaneously reassemble PDUs on all 64K entries. The system supports
only 16K packet buffers, and would run out of buffers were it to attempt to assemble PDUs on
more than 16K VCs.
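The index arithmetic for a linear table can be sketched in C as follows. The eight VCI bits come from the text above; the split of the remaining eight bits into six VPI bits and two port bits is an assumption chosen for illustration, as are the macro and function names. The header layout used is the standard UNI cell header without the HEC byte.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split for illustration: 8 + 6 + 2 = 16 index bits. */
#define VCI_BITS  8u
#define VPI_BITS  6u
#define PORT_BITS 2u
#define VC_TABLE_ENTRIES (1u << (VCI_BITS + VPI_BITS + PORT_BITS))

/* UNI cell header without HEC:
 * GFC[31:28] VPI[27:20] VCI[19:4] PTI[3:1] CLP[0]. */
uint32_t vc_linear_index(uint32_t atm_header, uint32_t port) {
    uint32_t vpi = (atm_header >> 20) & ((1u << VPI_BITS) - 1u);
    uint32_t vci = (atm_header >> 4) & ((1u << VCI_BITS) - 1u);
    assert(port < (1u << PORT_BITS));
    return (port << (VPI_BITS + VCI_BITS)) | (vpi << VCI_BITS) | vci;
}
```

With this split the index is always below VC_TABLE_ENTRIES (65536), so the lookup is a single array access with no key comparison; VPI or VCI bits above the significant ones are ignored.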
4.1.4 VC Table Management API - atm_utils.c
atm_utils.c implements C-language utilities to manage the VC Lookup Table. These utilities are
available both in simulation at the Transactor command prompt and as VxWorks kernel entry
points.
The current implementation assumes Permanent Virtual Circuits (PVCs), i.e. it does not support the
StrongARM core updating the VC table while the microcode is using the table. Switched Virtual
Circuit (SVC) support could be added by employing SRAM locks or atomic operations to avoid
conflicts between simultaneous StrongARM core and microengine access to the same VC entry.
4.1.5 VC Table Entry
The format of the VC Table entry for VC_TABLE_HASHED is the same as for
VC_TABLE_LINEAR, with the addition of two 32-bit words to hold the Next address and the hash
Key for the entry.
This format is only partially hidden from ATM Receive, the consumer of the VC table API, though
macros could be implemented to make it appear opaque.
Entry          Description
Next           Address of the next entry in the chain of entries that hash to the same row.
               0 indicates no next entry. (21-bit SRAM address)
Key            Hash key used to find this entry, also used to confirm arrival at the desired entry.
               Key = (atm_header & 0xFFFFFFF0) | port#
Buffer Offset  Indicates which 64-bit DRAM word in the buffer should receive the next payload.
               On completion of PDU assembly, this field is copied to the buffer descriptor.
Buffer Index   Buffer descriptor (and data buffer) to be used by ATM Receive to deposit payloads
               on this VC.
LLC/SNAP       1: LLC0_IP, LLC1_IP
               else: available for other patterns
Q              Queue To StrongARM core "Q" flag
               1: queue all traffic to core
               0: do not queue to core
AAL            5: ATM Adaptation Layer 5
               0: VC is not open
CRC            The CRC-32 syndrome associated with the PDU. It is saved in the VC table entry after
               a cell is moved, and then retrieved and used when the next cell in the PDU is received.
Cell Data11    The last four bytes of the previous cell in this PDU. Used during re-assembly of PDUs
               to allow 8-byte quadword burst writes to DRAM without using DRAM
               Read/Modify/Write instructions.

Word  Fields
0     Buffer Offset | Buffer Index | LLC/SNAP | Q | AAL
1     CRC
2     Cell Data11

Entry          Description
Buffer Offset  Indicates which 64-bit DRAM word in the buffer should receive the next payload.
               On completion of PDU assembly, this field is copied to the buffer descriptor.
Buffer Index   Buffer descriptor (and data buffer) to be used by ATM Receive to deposit payloads
               on this VC.
LLC/SNAP       1: LLC0_IP, LLC1_IP
               else: available for other patterns
Q              Queue To StrongARM core "Q" flag
               1: queue all traffic to core
               0: do not queue to core
AAL            5: ATM Adaptation Layer 5
               0: VC is not open
CRC            The CRC-32 syndrome associated with the PDU. It is saved in the VC table entry after
               a cell is moved, and then retrieved and used when the next cell in the PDU is received.
Cell Data11    The last four bytes of the previous cell in this PDU. Used during re-assembly of PDUs
               to allow 8-byte quadword burst writes to DRAM without using DRAM
               Read/Modify/Write instructions.
4.2 Virtual Circuit Lookup Table Cache
4.2.1 VC Cache Function
4.2.1.1 OC-12 Configuration
The OC-12 configuration caches the results of VC Table lookup operations in absolute registers.
The intent of the VC cache is not to reduce average latency, but rather to account for back-to-back
cells from the same VC. It is not possible to reduce average latency, because the design has to
account for worst-case cache miss on every VC lookup. In this scenario, processing of the
subsequent cell can only commence once processing of the previous cell has been completed and
recorded in the VC Table Entry. In particular, the subsequent cell can access the VC Table Entry
only after the previous cell has updated the buffer offset telling the cell where to go, and updated
the CRC syndrome. The CRC syndrome is known only after the previous cell is done transferring
from RFIFO to DRAM, and it must be known before the subsequent cell starts transferring from
RFIFO to DRAM.
4.2.1.2 OC-3 Configuration
The OC-3 configuration does not require, and thus does not enable, the VC Cache. In the OC-3
receiver, there is a single thread dedicated to each port. Therefore, by definition the cells coming in
on each port are on different VCs and threads will thus never have to wait for access to the same
VC Table Entry.
4.2.2 VC Cache Structure
There are four VC Cache entries, enough to guarantee that every thread in the ATM Receive
microengine will always be able to find one to use. Each VC Cache entry occupies 6 absolute
registers.
Register(s)                    Description
@vc_key0...@vc_key3            VC and port associated with the entry.
@seq_enter0...@seq_enter3      Implement a sequence number for each entry to maintain the order that
@seq_exit0...@seq_exit3        multiple threads attempt to access the entry.
@vc_flags0...@vc_flags3        Local working copy of the flags in the VC Table Entry.
@vc_crc0...@vc_crc3            Local working copy of the CRC syndrome in the VC Table Entry.
@data11_0...@data11_3          Holds the last four bytes of the previous cell in the VC table, so the
                               microengine can combine it with the first four bytes of the subsequent cell
                               and perform a single 8-byte DRAM write including them both.
@vc_address0...@vc_address3    Records the address in SRAM where the backing VC Table Entry came
                               from, so that it is not necessary to re-compute it when it is time to write the
                               updated entry back to SRAM.
4.2.3 VC Cache API
There is no interaction between the StrongARM core and the VC Cache. In particular, there is no
method for the StrongARM core to force the ATM Receive microengine to invalidate cache entries
to synchronize with StrongARM core initiated updates to the VC Table. If the design is enhanced
to support SVCs in addition to PVCs, then the Core will need such an interface to guarantee that
the ATM Receive microengine does not operate with stale cache entries. (As the ATM Receive
microengine does not consume any inter-thread signals after initialization, they are available for
interaction with the StrongARM core.)
The macros that implement the microcode API to the VC Cache are implemented and described in
atm_rx.uc.
4.3 IP Lookup Table
The IP lookup table used in the ATM/Ethernet router is an extension of the implementation used in
the homogeneous Ethernet example designs. The same table is used to store both ATM and
Ethernet port destinations. The two IP Lookup Table Entry formats are shown in Figure 25 and
Figure 26.
4.3.1 IP Table Function
The route table provides routing information for a given IP destination address. The type of
information provided by the table differs slightly depending on which technology (ATM or
Ethernet) will be used to transmit the packet.
• If the output port is Ethernet, the route table will provide the output port number and the MAC
address information.
• If the output port is ATM, the route table will provide the output queue, the VCI/VPI for the
connection, and the LLC/SNAP header to use when encapsulating the IP packet. (In the current
implementation the output queue is a physical port identifier; future designs may use this queue
designation to represent a "virtual" port.)
4.3.2 IP Table Structure
The ATM project uses the Trie5 Longest Prefix Match algorithm implemented in ip.uc. The lookup
portion of the table is maintained in SRAM with the actual route table entries in DRAM.
4.3.3 IP Table Management API
The route table is managed by the Route Table Manager (RTM), which may be used from both
Transactor Scripts and VxWorks. It may be compiled and loaded as a local foreign model, thus
allowing its C functions to be called from a Transactor Script. Or, it can be compiled as a VxWorks
loadable object.
The API may be printed out by entering rt_help() at the command line of either VxWorks or the
Transactor simulator.
4.3.3.1 route_table_init()
Initializes route table memory and data structures.
route_table_init(int sram_base_addr, int dram_base_addr)
Parameter        Description
sram_base_addr   The starting address of the SRAM memory allocated for route lookup entries.
dram_base_addr   The starting address of the DRAM memory allocated for the route table entries.
4.3.3.2 mtu_change()
Sets the MTU for subsequent route table additions.
mtu_change(int new_mtu)
Parameter        Description
int new_mtu      New default MTU.
4.3.3.3 atm_route_add()
Adds a route for an ATM destination to the route table.
atm_route_add(char *dest, char *netmask, char *gateway, int port_type, int
queue_index, int atm_hdr, int llc_snap_hi, int llc_snap_lo)
Parameter          Description
char *dest         String IP destination, e.g. "1.1.1.1"
char *netmask      String netmask, e.g., "255.255.0.0"
char *gateway      String next hop gateway, e.g., "255.255.0.0"
int port_type      Type of port.
int queue_index    Index of the output queue.
int atm_hdr        vpi/vci for the connection.
int llc_snap_hi    hi 32 bits of llc/snap header.
int llc_snap_lo    lo 32 bits of llc/snap header.
4.3.3.4 enet_route_add()
Adds a route with an Ethernet destination to the route table.
enet_route_add(char *dest, char *netmask, char *gateway, int itf, int
gateway_da_hi32, int gateway_da_lo16, int gateway_sa_hi16, int gateway_sa_lo32)
Parameter              Description
char *dest             String IP destination, e.g. "1.1.1.1"
char *netmask          String netmask, e.g., "255.255.0.0"
char *gateway          String next hop gateway, e.g., "255.255.0.0"
int itf                Physical interface id (output port number).
int gateway_da_hi32    High 32 bits of the MAC destination address.
int gateway_da_lo16    Low 16 bits of the MAC destination address.
int gateway_sa_hi16    High 16 bits of the MAC source address.
int gateway_sa_lo32    Low 32 bits of the MAC source address.
4.3.3.5 rt_ent_info()
Displays the available route table information for a given destination address.
rt_ent_info(char *destination)
Parameter      Description
destination    The destination address, in dotted decimal form, of the route entry to display.
4.3.3.6 route_delete()
Deletes a route from the route table.
route_delete(char *dest, char *netmask)
Parameter    Description
dest         String IP destination, e.g. "1.1.1.1"
netmask      String netmask, e.g., "255.255.0.0"
4.3.3.7 rt_help()
Outputs a list of command line RTM functions.
4.3.4 IP Route Table Entry
The IP lookup table entries reside in DRAM. The same table is used for both ATM and Ethernet
destinations. The ATM and Ethernet Receive threads call the macro route_lookup() to obtain the
index of the matching entry in the route table. If the ITF field contains the ATM port type bit
(0x80000000), the entry is interpreted as an ATM destination; otherwise it is an Ethernet
destination.
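A minimal C sketch of that test follows. It assumes that the "ATM bit + MTU" word of an ATM entry occupies the same position as the ITF word of an Ethernet entry, so that one mask distinguishes the two formats. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RT_ATM_PORT_BIT 0x80000000u

/* True when the first word of a route entry marks an ATM destination. */
bool rt_is_atm(uint32_t first_word) {
    return (first_word & RT_ATM_PORT_BIT) != 0;
}

/* For an ATM entry the remaining bits carry the MTU (0x80000000 | MTU). */
uint32_t rt_atm_mtu(uint32_t first_word) {
    return first_word & ~RT_ATM_PORT_BIT;
}

/* Build the first word of an ATM entry from an MTU value. */
uint32_t rt_make_atm_word(uint32_t mtu) {
    return RT_ATM_PORT_BIT | mtu;
}
```

An Ethernet entry whose ITF word holds a small port number has the top bit clear, so rt_is_atm() returns false for it.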
Figure 25. IP Route Table Entry - ATM Destination
[Figure: byte layout of the entry, bytes 0 through 31, containing the fields described below.]
Entry            Description
ATM bit + MTU    0x80000000 | MTU
Queue Index      Queue index (16 bits)
ATM Header       ATM header for this VC, sans PTI bits
IP Dest          IP destination address (32 bits)
IP Mask          IP subnet mask (32 bits)
IP Gateway       IP next hop gateway (32 bits)
LLC High         Upper 32 bits of LLC/SNAP header
LLC Low          Lower 32 bits of LLC/SNAP header
Figure 26. IP Route Table Entry - Ethernet Destination
[Figure: byte layout of the entry, bytes 0 through 31, containing the fields described below.]
Entry            Description
ITF              Output interface (32 bits).
MAC DA 0-3       Upper 32 bits of the destination MAC address.
MAC DA 4-5       Lower 16 bits of the destination MAC address.
IP Dest          IP destination address (32 bits).
IP Mask          IP subnet mask (32 bits).
IP Gateway       IP next hop gateway (32 bits).
MAC SA 0-3       Upper 16 bits of this gateway’s source MAC address.
MAC SA (4,5)     Lower 32 bits of this gateway’s source MAC address.
MTU              Maximum packet size.
4.4 SRAM Buffer Descriptors and DRAM Data Buffers
SRAM Buffer Descriptors and DRAM Data Buffers are a fundamental component of this design.
Each descriptor occupies 16 bytes of SRAM, and is used as a handle to describe and manage the
buffer. Each data buffer occupies 2K bytes of DRAM and holds the PDU payloads.
Both descriptors and buffers are stored in arrays. The array index is used to associate a unique
DRAM Data Buffer with each SRAM Descriptor:
Figure 27. SRAM Descriptor to DRAM Buffer Mapping
[Figure: SRAM descriptor [i], descriptor [i+1], and descriptor [i+2] map to DRAM data buffer [i],
data buffer [i+1], and data buffer [i+2], respectively.]
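The index relationship between the two arrays can be sketched in C as follows. The function names are hypothetical, and the sketch works in plain byte offsets from the start of each array; the address units actually used by the microcode for SRAM and DRAM are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define DESCRIPTOR_BYTES 16u   /* each SRAM descriptor occupies 16 bytes */
#define BUFFER_BYTES     2048u /* each DRAM data buffer occupies 2K bytes */

/* Byte offset (within the buffer array) of the data buffer that pairs
 * with the descriptor at the given byte offset (within the descriptor array). */
uint32_t buffer_offset_from_descriptor(uint32_t descriptor_offset)
{
    assert(descriptor_offset % DESCRIPTOR_BYTES == 0);
    uint32_t index = descriptor_offset / DESCRIPTOR_BYTES;
    return index * BUFFER_BYTES;
}

/* Reverse mapping: any byte offset inside a buffer maps back to its descriptor. */
uint32_t descriptor_offset_from_buffer(uint32_t buffer_offset)
{
    uint32_t index = buffer_offset / BUFFER_BYTES;
    return index * DESCRIPTOR_BYTES;
}
```

Because both element sizes are powers of 2, the divisions and multiplications reduce to shifts, which is presumably one reason the 2K buffer size is convenient.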
4.4.1 SRAM Buffer Descriptor Format
This buffer descriptor format is used throughout the design, except when a descriptor is enqueued
onto a packet_queue for Ethernet transmit.
Figure 28. Buffer Descriptor Format for ATM Transmit Destination Port
Entry               Description
Z                   Unused - will be overwritten upon enqueue/dequeue address updates
Next BD             32-bit SRAM address of the next buffer descriptor in the same queue
Last Quad           Offset of the last quadword in the buffer that contains data
X                   Unused - will be erased every time LAST_QUAD is updated, Rx any cell
Queue Index         Index of the queue where this descriptor came from
Start Byte Offset   Offset of the first byte of data to be transmitted
Y                   Unused - will be erased every time Start byte offset is updated,
                    Rx first cell -- Tx any cell
ATM Header          ATM Header (w/o HEC) to be attached to each cell of the PDU in the buffer
Figure 29. Buffer Descriptor Format for Ethernet Transmit Destination Port
[Figure: four 32-bit words, bits 31..0. Word 1 carries the RCV_PORT, FL_ID, START_BYTE, END_BYTE,
and ELE_COUNT fields.]
Entry        Description
RCV_PORT     Receive Port
FL_ID        Free list ID
START_BYTE   Frame start location in the buffer (zero-based)
END_BYTE     Number of bytes in the last MPKT - minus 1 (e.g. 0 means 1 byte)
ELE_COUNT    Number of 64-byte MPKTs in packet
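As a worked example of how ELE_COUNT and END_BYTE combine, the following C sketch computes a frame length from the two fields. It assumes that every MPKT except the last is a full 64 bytes; the function name is hypothetical, and the bit positions of the fields within the descriptor word are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MPKT_BYTES 64u

/* Frame length in bytes, given the number of MPKTs and the
 * "bytes in last MPKT minus 1" value from the descriptor. */
uint32_t packet_length_bytes(uint32_t ele_count, uint32_t end_byte)
{
    assert(ele_count >= 1u && end_byte < MPKT_BYTES);
    return (ele_count - 1u) * MPKT_BYTES + (end_byte + 1u);
}
```

Under this assumption a 1500-byte frame would be described by ELE_COUNT = 24 and END_BYTE = 27, since 23 * 64 + 28 = 1500.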
4.4.2 DRAM Data Buffer Format
Packet payloads are stored in DRAM data buffers. The payload lands at a different offset within
the data buffer depending on whether the data was received on an ATM or an Ethernet port.
Figure 30. DRAM Data Buffer Format - 12 Byte Offset (Received by ATM)
[Figure: byte layout. ATM Header | LLC/SNAP | IP ... IP | Pad | AAL5 Trailer]
Figure 31. DRAM Data Buffer Format - 6 Byte Offset (Received by ATM, Transmitted by Ethernet)
[Figure: byte layout. Enet Dest Addr | Enet Src Addr | Typ | IP ... IP]
Figure 32. DRAM Data Buffer Format - 6 Byte Offset (Received by Ethernet, Transmitted by ATM)
[Figure: byte layout. LLC/SNAP | IP ... IP | Pad | AAL5 Trailer]
Figure 33. DRAM Data Buffer Received by Ethernet
[Figure: byte layout. Enet DstAdr | Enet SrcAdr | TYP | IP ... IP]
4.4.3 System Limit on Packet Buffers
Several factors limit the number of packet buffers the system can support:
• The Ethernet transmitter uses packetqs (packetq.uc), and the implementation of packetqs can
address only 16,000 different buffers.
• DRAM capacity used = 2KB/buffer * number of buffers. Therefore, 16,000 buffers consume 32MB
of DRAM, which is half the memory capacity of most baseboards. (The DRAM used by packet
buffers can be reduced by shrinking the buffer size to just fit a 1500-byte MTU, since 2KB is
more than needed but a handy power of 2, and by enhancing the design to also support small
data buffers for small packets.)
• SRAM capacity used = 16B * number of buffers. Therefore, 16,000 buffers use only 256KB of
SRAM, out of an 8MB SRAM capacity.
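The arithmetic above can be checked with a short C sketch. It reads "16,000" as 16K (16384) buffers, which is an assumption that matches the rounded 32MB and 256KB figures; the function names are hypothetical.

```c
#include <stdint.h>

#define BUFFER_BYTES     2048u /* DRAM data buffer size */
#define DESCRIPTOR_BYTES 16u   /* SRAM buffer descriptor size */

/* DRAM consumed by a given number of data buffers. */
uint64_t dram_bytes_for(uint32_t buffer_count)
{
    return (uint64_t)buffer_count * BUFFER_BYTES;
}

/* SRAM consumed by the matching buffer descriptors. */
uint64_t sram_bytes_for(uint32_t buffer_count)
{
    return (uint64_t)buffer_count * DESCRIPTOR_BYTES;
}
```

With 16384 buffers this gives 33,554,432 bytes (32MB) of DRAM and 262,144 bytes (256KB) of SRAM.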
4.5 Sequence Numbers - sequence.uc
Intra-microengine register-based sequence numbers are supplied by sequence.uc, and are used
extensively throughout the ATM portion of this design. This example employs a single-microengine
fast port receiver and so, unlike other designs, it has no use for the global hardware
enqueue sequence number registers. ATM Receive has intersecting sequence numbers to decouple
RFIFO receive order, VC cache/table lookup, and msgq_send(). ATM Transmit has sequence
numbers to decouple cell-within-a-PDU order from TFIFO validate order. On the IXP1200 software
CRC microengine, sequence numbers are used to maintain PDU order within a VC.
sequence.uc contains the following API calls:
API Call                          Description
sequence_init(SEQUENCE_HANDLE)    Initialize global state for the sequence number.
sequence_enter(SEQUENCE_HANDLE)   Increment absolute enter sequence number, and return that
                                  number in a relative GPR.
sequence_wait(SEQUENCE_HANDLE)    Wait until exit sequence number is equal to mine.
sequence_exit(SEQUENCE_HANDLE)    Increment exit sequence number and continue.
4.5.1 SEQUENCE_HANDLE Usage
All sequence.uc calls use the same parameters. For convenience, a handle is typically defined and
used for all of the calls, as shown in the example below.
Parameter      Description
in_my_seq      Relative GPR to hold sequence number for this thread.
in_enter       Absolute GPR to hold ENTER sequence for all threads.
in_enter_inc   A register containing the value 1, or the constant 1. Register gives highest
               performance.
io_exit        Absolute GPR to hold the EXIT sequence for all threads.
in_exit_inc    A register containing the value 1, or the constant 1. Register gives highest
               performance.
NUM_BITS       Number of bits in the sequence number. Must be a power of 2, from 1 to 32
               inclusive. 32 is highest performance.
4.5.2 Usage Model
The following model is described by an analogy to waiting in line at a bakery:
Step  Sequence Operation                                          Bakery Line Analogy
1     sequence_enter() returns a sequence number to a thread      Enter bakery and take a ticket.
      and updates the absolute.enter so that the next time
      sequence_enter() is invoked, the following sequence
      number will be returned.
2     sequence_wait() compares its sequence number with the       Wait in line for the "Now Serving" sign to
      absolute.exit, and context swaps until they are the same.   match your ticket.
3     Having gotten past sequence_wait(), the thread              Get served, keep others in line away from
      processes the critical region.                              counter.
4     sequence_exit() increments absolute.exit to let the next    Exit bakery, "Now Serving..." sign gets
      sequence number past sequence_wait().                       incremented to let next customer to
                                                                  counter.
4.5.2.1 Example
#define MY_SEQUENCE_HANDLE my_seq_number, @enter, @one, @exit, @one, 32
sequence_init(MY_SEQUENCE_HANDLE)        // initialize global state
while()
    <...>                                // get work in order
    sequence_enter(MY_SEQUENCE_HANDLE)   // record the order
    <...>                                // process non-critical section
    sequence_wait(MY_SEQUENCE_HANDLE)    // wait my turn
    msgq_send()                          // process critical section
    sequence_exit(MY_SEQUENCE_HANDLE)    // let the next guy go
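The ticket discipline behind these macros can be modelled in C as follows. This is a single-threaded model under stated assumptions, not the sequence.uc implementation: the type and function names are hypothetical, and where the real sequence_wait() context swaps until it is the caller's turn, the model only exposes the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t enter; /* next ticket to hand out (the absolute ENTER value) */
    uint32_t exit;  /* ticket now being served (the absolute EXIT value) */
} sequence_t;

void seq_init(sequence_t *s)
{
    s->enter = 0;
    s->exit = 0;
}

/* Take a ticket: return the current enter value, then advance it. */
uint32_t seq_enter(sequence_t *s)
{
    return s->enter++;
}

/* The test a waiting thread repeats until it succeeds. */
bool seq_is_my_turn(const sequence_t *s, uint32_t my_seq)
{
    return s->exit == my_seq;
}

/* Leave the critical region and let the next ticket holder through. */
void seq_exit(sequence_t *s)
{
    s->exit++;
}
```

With 32-bit counters the wrap-around is handled by ordinary unsigned arithmetic, which may be why a 32-bit sequence number is described as the highest-performance choice.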
4.6 Message Queues - msgq.uc
The Message Queue subsystem supports 31-bit messages between microengines. The queues are
implemented with circular buffers, typically in scratchpad RAM. The queues are point-to-point:
there can be only one sender microengine and one receiver microengine, because the queue indexes
are stored privately in microengine registers rather than shared in RAM.
If the sender sends to a full queue, the send returns an error so that the sender is able to
determine what to do with the unsent message.
The threads within the sender (and likewise within the receiver) must cooperate and not
simultaneously access the same queue. This is typically done by putting the msgq_send() or
msgq_receive() inside a critical section.
The message queue handle can specify that receives be either asynchronous or synchronous:
• Asynchronous receives (MSGQ_ASYNC) will return after reading what was in the queue, no
matter if it was valid or invalid. The invoking thread must look at the invalid bit to decide what
to do with the message.
• Synchronous receives can either loop internally on receipt of invalid messages
(MSGQ_SYNC_POLL), or go to sleep after receiving an invalid message
(MSGQ_SYNC_SLEEP). The sender must know to (always) wake up the receiver if
MSGQ_SYNC_SLEEP is used.
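One way such a point-to-point queue could work is sketched below in C. Everything here is an assumption made for illustration: the names are hypothetical, and marking an empty slot with a word that has bit 31 set is a guess at the encoding, not the msgq.uc implementation. The sketch does show why each side can keep its index privately, and why the queue size should be a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSGQ_SIZE    16u         /* longwords in the ring; must be a power of 2 */
#define MSGQ_INVALID 0x80000000u /* assumed: bit 31 set marks an empty slot */

typedef struct {
    uint32_t slots[MSGQ_SIZE];   /* shared ring, e.g. in scratchpad RAM */
} msgq_ring_t;

typedef struct {
    msgq_ring_t *ring;
    uint32_t index;              /* private to one side of the queue */
} msgq_end_t;

void msgq_ring_init(msgq_ring_t *r)
{
    for (uint32_t i = 0; i < MSGQ_SIZE; i++)
        r->slots[i] = MSGQ_INVALID;
}

void msgq_end_init(msgq_end_t *e, msgq_ring_t *r)
{
    e->ring = r;
    e->index = 0;
}

/* Returns true on success, false when the queue is full. */
bool msgq_try_send(msgq_end_t *tx, uint32_t message)
{
    assert(message != 0u && (message & MSGQ_INVALID) == 0u);
    uint32_t *slot = &tx->ring->slots[tx->index];
    if ((*slot & MSGQ_INVALID) == 0u)
        return false;            /* receiver has not emptied this slot yet */
    *slot = message;
    tx->index = (tx->index + 1u) & (MSGQ_SIZE - 1u);
    return true;
}

/* Asynchronous receive: returns the slot contents, valid or not.
 * The caller inspects bit 31 to decide what to do with the result. */
uint32_t msgq_receive_async(msgq_end_t *rx)
{
    uint32_t *slot = &rx->ring->slots[rx->index];
    uint32_t message = *slot;
    if ((message & MSGQ_INVALID) == 0u) {
        *slot = MSGQ_INVALID;    /* hand the slot back to the sender */
        rx->index = (rx->index + 1u) & (MSGQ_SIZE - 1u);
    }
    return message;
}
```

A polling synchronous receive would simply loop on msgq_receive_async() until bit 31 of the result is clear.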
4.6.1 MSGQ_HANDLE Parameters
The following parameters make up MSGQ_HANDLE and are common to all macros in msgq.uc:
Parameter      Description
io_index       GPR storing the current index into the queue. An absolute register is used to share
               the index between threads. However, if the threads don’t share access to the queue,
               a relative GPR can be used.
in_base_addr   GPR storing the base address of the queue in RAM_TYPE (scratchpad or SRAM). An
               absolute GPR is used when the queue is shared between threads.
in_const_one   The value one in a GPR, typically absolute, or the constant 1. The register is
               generally used to save cycles.
BASE_ADDR      Base address of the queue in RAM_TYPE -- loaded into in_base_addr by msgq_init().
SYNC_TYPE      Synchronization type, as follows:
               #define MSGQ_ASYNC 0 - return immediately, with or without data
               #define MSGQ_SYNC_POLL 1 - wait for data -- poll while waiting
               #define MSGQ_SYNC_SLEEP 2 - wait for data -- sleep while waiting, sender must
               know to wake up receiver
RAM_TYPE       RAM type. Typically scratchpad, can also be SRAM.
MSGQ_SIZE      Number of longwords in the message queue. Must be a power of 2. 16 is typically
               used for scratchpad queues because it saves instructions.
4.6.2 msgq_init_queue()
Initializes the global queue in RAM_TYPE. Called by central initialization code before queues are
accessed.
msgq_init_queue(MSGQ_HANDLE)
Parameter     Description
MSGQ_HANDLE   Parameters described in “MSGQ_HANDLE Parameters”.
4.6.3 msgq_init_regs()
Initializes the registers used to access the queue. Called by both producer and consumer.
msgq_init_regs(MSGQ_HANDLE)
Parameter     Description
MSGQ_HANDLE   Parameters described in “MSGQ_HANDLE Parameters”.
4.6.4 msgq_send()
Sends a message to the queue.
msgq_send(io_message, MSGQ_HANDLE, RAM_OPTION)
Parameter     Description
io_message    The message to be sent. Valid messages must have bit 31 clear, and must not
              be 0. 0 is returned on success, the message is untouched on failure.
MSGQ_HANDLE   Parameters described in “MSGQ_HANDLE Parameters”.
RAM_OPTION    ctx_swap, sig_done, no_option -- depending on the behavior desired for the
              write at the end of msgq_send().
4.6.5 msgq_receive()
Receives a message from the queue.
msgq_receive(io_xfer, MSGQ_HANDLE)
Parameter     Description
io_xfer       A read/write SRAM transfer register for use by msgq_receive(). The write
              transfer is terminated and the read transfer returns the message.
MSGQ_HANDLE   Parameters described in “MSGQ_HANDLE Parameters”.
4.6.6 Example
In the following example, a single microengine uses four threads to receive from INPUT_MSGQ,
perform some processing, then send to OUTPUT_MSGQ in the order received. The example shows
how critical sections are used to control multiple threads accessing the same queue, and how
sequence numbers can be used to maintain queue order.
msgq_init_queue(INPUT_MSGQ)      ; must complete before any threads access queue
msgq_init_queue(OUTPUT_MSGQ)     ; must complete before any threads access queue
...
msgq_init_regs(INPUT_MSGQ)
msgq_init_regs(OUTPUT_MSGQ)
sequence_init(MY_SEQUENCE)
critsect_init(@mutex)
...
critsect_enter(@mutex)           ; allow only 1 thread to access queue at a time
sequence_enter(MY_SEQUENCE)      ; remember the order messages were received
msgq_receive($xfer, INPUT_MSGQ)  ; receive a message
critsect_exit(@mutex)            ; allow next thread to receive
...                              ; process the message, threads may get out of order.
move(message, $xfer)
sequence_wait(MY_SEQUENCE)       ; wait until it is my turn to send
msgq_send(message, $xfer, OUTPUT_MSGQ, ctx_swap)
.if (message != 0)
    counter_inc(OUTPUT_MSGQ_IS_FULL) ; record failure
    buf_push(message, ...)           ; if message is descriptor, return it...
.endif
sequence_exit(MY_SEQUENCE)       ; allow next thread through sequence_wait()
4.7 Buffer Descriptor Queues - bdq.uc
This design uses a generic buffer descriptor queuing subsystem to pass data between microengines.
This section describes the facility so that it will be clear when it is applied throughout the design.
Buffer Descriptor Queues (BDQs) are analogous to packet queues, as defined in packetq.uc and
tx.uc. BDQs support cached dequeues, and are therefore more efficient when a microengine
dequeues from a small number of queues.
4.7.1 BDQ Management Macros
Buffer descriptor queue management macros are used for queueing SRAM buffer descriptors
between microengines.
4.7.1.1 Features
Feature                      Description
Arbitrary queue capacity     Queues are implemented via a linked list of buffer descriptors in SRAM.
                             These lists can grow to any size until a configurable water mark is
                             reached or the enqueuing microengine exhausts its supply of available
                             buffers.
High water marks (HWMs)      The queue handle has settings for LWMs and HWMs to manage queue
and low water marks (LWMs)   length. bdq_enqueue() will reject all enqueues when the queue size is
                             above the HWM. bdq_enqueue() will reject a handle-specified ratio of
                             the enqueues when queue length is between LWM and HWM.
Non-blocking simultaneous    If the queue has more than 1 entry, then the dequeuing thread can
enqueue and dequeue          perform a "cached dequeue" where it not only doesn’t contend for the
                             lock on the queue header, it doesn’t read the queue header at all.
Empty queue notification     The dequeuing threads have the option of sleeping on an inter-thread
                             signal if the queue is empty.
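The water mark policy in the table can be sketched in C. This is an illustrative model only: the structure, the function name, and the "reject one of every N" form of the ratio are hypothetical, and the treatment of lengths exactly equal to a water mark is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t low_water_mark;
    uint32_t high_water_mark;
    uint32_t reject_one_in;      /* hypothetical ratio: between the marks, reject 1 of every N */
    uint32_t between_marks_seen; /* enqueue attempts seen while between the marks */
} bdq_admission_t;

/* Decide whether an enqueue is accepted at the current queue length. */
bool bdq_admit(bdq_admission_t *a, uint32_t queue_length)
{
    if (queue_length > a->high_water_mark)
        return false;            /* above the HWM: reject everything */
    if (queue_length < a->low_water_mark)
        return true;             /* below the LWM: accept everything */
    a->between_marks_seen++;     /* between the marks: reject a fixed ratio */
    if (a->reject_one_in != 0u && a->between_marks_seen % a->reject_one_in == 0u)
        return false;
    return true;
}
```

Rejecting only a fraction of enqueues between the marks lets the queue shed load gradually before it reaches the point where every enqueue is refused.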
4.7.1.2 Limitations
For the front of the queue to be cached by the dequeuing microengine, a single microengine
must be assigned to dequeue from each queue, and must have three available absolute registers.
For the synchronous empty->non-empty queue notification feature to be used, only one
microengine can be assigned to dequeue from each queue. Further, it is optimal when threads on
that microengine dequeue from a single queue rather than from multiple queues.
If the dequeuing thread services multiple queues, it can use packetq_send queues and associated
dequeue code, or the polled scratchpad bit-vector notification mechanism can be added to these
macros. Queue headers must be in SRAM, as these macros do not currently support scratchpad
RAM headers.
Figure 34. Buffer Descriptor Queue API
bdq_init()      Initialize queue structure.
bdq_enqueue()   Enqueue on Back.
bdq_dequeue()   Dequeue from Front.
Figure 35. Buffer Descriptor Queue Descriptor Structure (Resides in SRAM)
Word   Bits 31..0
0      reserved | Count
1      Back (32 bit address)
2      Front (32 bit address)
Figure 36. Buffer Descriptor Queue Structure (Only Relevant Part Shown)
Word   Bits 31..0
0      overwritten | Next BD Address
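The queue descriptor (Count, Back, Front) and the Next BD link can be modelled in C as an ordinary singly linked list. The names below are hypothetical, C pointers stand in for SRAM addresses, and the locking, water marks, and cached dequeue of the real macros are left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct buffer_descriptor {
    struct buffer_descriptor *next; /* stands in for the "Next BD Address" word */
} buffer_descriptor_t;

typedef struct {
    uint32_t count;                 /* number of descriptors in the queue */
    buffer_descriptor_t *back;      /* enqueue end */
    buffer_descriptor_t *front;     /* dequeue end */
} bdq_t;

void bdq_sketch_init(bdq_t *q)
{
    q->count = 0;
    q->back = NULL;
    q->front = NULL;
}

/* Enqueue on Back. */
void bdq_sketch_enqueue(bdq_t *q, buffer_descriptor_t *bd)
{
    bd->next = NULL;
    if (q->count == 0)
        q->front = bd;
    else
        q->back->next = bd;
    q->back = bd;
    q->count++;
}

/* Dequeue from Front; returns NULL when the queue is empty. */
buffer_descriptor_t *bdq_sketch_dequeue(bdq_t *q)
{
    if (q->count == 0)
        return NULL;
    buffer_descriptor_t *bd = q->front;
    q->front = bd->next;
    q->count--;
    if (q->count == 0)
        q->back = NULL;
    return bd;
}
```

When more than one entry is queued, the dequeuer can follow the next link from the descriptor it already holds, which is the idea behind the cached dequeue described in the Features table.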
4.8 Counters
This design uses a counter subsystem wrapper around incrementing scratchpad RAM locations.
The subsystem manages counter names, enabling and disabling counters at compile time, and
pretty printing. Part of the counter subsystem runs on the microengines, and part on the
StrongARM core.
counters.uc provides the following microcode API:
• counter_reset()
• counter_inc()
• port_counter_inc()
counters.c provides the following API to the Transactor command prompt as well as VxWorks
console (neither macro requires parameters):
• counters_init()
• counters_print()
The counter names are allocated in system_config.h.
• In simulation, counters.c is compiled into the atm_utils.dll Transactor foreign model.
• On hardware, counters.c is compiled into the atm_utils.o VxWorks-loadable module to
provide counters at the VxWorks console.
4.8.1 Global Parameters
Parameter           Description
COUNTERS_BASE       Base address of the scratchpad counter array (mandatory).
COUNTER_LOCATIONS   Size of the counter array (optional). Default is 64.
COUNTER_STRINGn     String to print for counter n, where n is from 0 to
                    COUNTER_LOCATIONS - 1 (optional). Default is "Counter n".
4.8.2 Use of the Counter Subsystem
In this design, system_config.h controls the counter subsystem and defines a handle for each
counter. This handle provides the parameters to counter_inc() in the microcode. For example,
counter_inc(ATM_RX_CELL_DROP_VC_CLOSED) is invoked in ATM Receive threads every
time a cell is discarded because it arrived on a VC that is not open.
The handle also carries the flags that determine at compile time whether the counter is invoked.
4.8.2.1 Counter Base Address
The base address of the counter array is defined so that it starts immediately after the per-port
exception counters defined in mem_map.h, and it is used as the first member of every counter
handle. (This is why the counter example in “counters_print()” starts at scratchpad location
195 decimal, which is 0xc3.)
#define COUNTERS_BASE 0xc3
4.8.2.2 Counter Index
The index of the counter is simply entered directly into the list of counter handle definitions. Be
careful not to duplicate any counter indexes, because it would cause multiple handles to increment
the same location.
4.8.2.3 Global Counter Enable and Flags
COUNTERS_ENABLE_MASK is the global counter enable and is set via a #define statement in
system_config.h:
#define Statement                 Description
COUNTERS_ENABLE_MASK 0xFFFFFFFF   Enable all counters (default).
COUNTERS_ENABLE_MASK 0            Disable all counters.
To enable a counter for a command:
1. Ensure that the COUNTERS_ENABLE_MASK is set to enable.
2. Set the individual command’s IN_ENABLE_FLAGS parameter to match the
COUNTERS_ENABLE_MASK definition.
Counter Flags
The counters are enabled by membership in the “counter groups” enumerated in the table; the
counter groups are enabled by having their corresponding bit set in the
COUNTERS_ENABLE_MASK.
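The gating of a counter by its group bit can be sketched in C. The function below is hypothetical and makes the decision at run time so that different masks can be tried; in the design the decision is made at compile time. The two flag values are the ones this document defines for per-port counting.

```c
#include <stdint.h>

#define COUNT_PORT_EVENTS     (1u << 11) /* normal port activity */
#define COUNT_PORT_EXCEPTIONS (1u << 12) /* per-port exceptions */

/* Increment counters[index] only if the counter's group bit is set in the mask. */
void counter_inc_if_enabled(uint32_t *counters, uint32_t index,
                            uint32_t counter_flags, uint32_t enable_mask)
{
    if ((counter_flags & enable_mask) != 0u)
        counters[index]++;
}
```

With a mask of 0xFFFFFFFF every group is counted, and with a mask of 0 no counter is ever incremented.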
The default COUNTERS_ENABLE_MASK enables all the error counters and disables all the
normal counters in an effort to record abnormal events without a measurable performance impact.
For example, the following definition enables just the cell and packet drop related counters.
Parameter               Description
in_port_index           Port index.
IN_PORT_BASE            Base port number.
IN_EXCEPTION_INDEX      The per-port counter to be incremented.
IN_PORT_COUNTERS_BASE   Address of 0th counter for port 0.
IN_TOTAL_DISCARDS       Address of global discard counter.
IN_MAX_PORT_NUMBER      Highest valid port number -- from a per-port counters point of view.
                        If the sum of IN_PORT_BASE and in_port_index exceeds
                        IN_MAX_PORT_NUMBER, then the port number is truncated to
                        IN_MAX_PORT_NUMBER. This allows limiting the scratchpad
                        RAM dedicated to counters while still allowing event counting on
                        very high numbered ports (e.g., logical ports used by the
                        StrongARM core).
IN_ENABLE_FLAGS         Counter increment flag. Must match the COUNTERS_ENABLE_MASK bit.
                        If set to COUNT_PORT_EXCEPTIONS, the global counter at
                        IN_TOTAL_DISCARDS will be incremented in addition to the
                        per-port counter.
#define COUNT_PORT_EVENTS (1 << 11) // normal port activity
#define COUNT_PORT_EXCEPTIONS (1 << 12) // per-port exceptions
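The port truncation rule in the parameter table can be sketched in C. The function name is hypothetical, and the layout, in which each port owns 16 consecutive counter locations starting at the port counters base, is an assumption made for illustration.

```c
#include <stdint.h>

#define COUNTERS_PER_PORT 16u /* the document names 16 per-port counters */

/* Location of one per-port counter, with the port number truncated
 * to max_port_number as described for IN_MAX_PORT_NUMBER. */
uint32_t port_counter_location(uint32_t port_counters_base, uint32_t port_base,
                               uint32_t port_index, uint32_t max_port_number,
                               uint32_t exception_index)
{
    uint32_t port = port_base + port_index;
    if (port > max_port_number)
        port = max_port_number; /* high-numbered ports share the last counter set */
    return port_counters_base + port * COUNTERS_PER_PORT + exception_index;
}
```

Under this layout, events on all ports above the maximum accumulate in the counters of the highest valid port, which bounds the scratchpad RAM needed.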
The 16 per-port counters are named by various include files, as summarized by the string table that
counters_print() uses to print the per-port counters:
4.8.4 counters.c
4.8.4.1 counters_init()
Initializes all counters.
4.8.4.2 counters_print()
Prints the names and values of all counters.
Example
In this example of output from counters_print(), the system ran the dual-OC-3 software-CRC
configuration overnight with an ATM loop-back cable. All counters were enabled. The first column
is the word’s location in scratchpad RAM; the second column, the number in [] brackets, is the
counter index; the third column is the counter value; and the rest of the line is a string
identifying the counter. At the end, a few of the per-port counters have incremented as well.
4.9 Global $transfer Register Name Manager - xfer.uc
SRAM transfer registers are easily allocated and deallocated by using .local/.endlocal, or by using
the xbuf.uc subsystem, which is based on .local. This works well for read transfer registers, because
the programmer always knows when the read is done, and thus when the read transfer register can
be freed.
However, write transfer registers are a different problem. While it is possible to use the same
mechanism as for read transfer registers, this requires waiting for writes to complete before reusing the write transfer registers, and this wait may impact performance.
An alternative is to not wait for the write to complete, but to infer the completion of writes by their
order before subsequent reads in the ordered SRAM queue. The .local mechanism and xbuf.uc
require strict block structure, and are thus not well suited to write transfer registers becoming
available based on seemingly unrelated events. The question becomes then how to manage the
name space for write transfer registers.
The answer, at least for some implementations such as the ATM receive microengine, is to allocate
transfer registers globally, and to use the new xfer.uc subsystem to help manage the name space.
// Macros to aid in manually allocating transfer registers.
// Essentially wrappers for .xfer_order, .operand_synonym
// that use the pre-processor to do as much assembly-time
// sanity checking as possible.
// sram[read, $foo], ordered, ctx_swap
4.10 Mutex Vectors
Mutex vectors are an extension to critical sections that allows multiple critical sections to be
contained within a single absolute register. (critsect.uc implements critical sections; the critsect
macros are documented in the IXP1200 Macro Library Reference Manual.) The critsect macros allow
only one of the four threads of a microengine to execute a critical code section at a time, using
a semaphore implemented in an absolute register to restrict use of a resource shared by those
threads. The OC-3 Ethernet receiver uses mutex vectors to prevent multiple threads from enqueuing
on the same transmit queue, while allowing them to concurrently enqueue on different transmit
queues. The mutex vector subsystem is implemented in mutex_vector.uc.
The following critical section macros are for use within a microengine. Up to 32 critical sections
can be implemented with each absolute register. These macros are used where run-time selection
between multiple mutexes is necessary. If only one mutex is needed, the macros in critsect.uc are
slightly smaller and faster.
4.10.1 mutex_vector_init()
Initializes critical sections to enable subsequent mutex_vector_enter() to succeed.
mutex_vector_init(out_abs_reg)
Parameter       Description
out_abs_reg     Absolute register containing the semaphores.
4.10.2 mutex_vector_enter()
Enters the specified microengine critical section.
mutex_vector_enter(io_abs_reg, in_bit_number)
Parameter       Description
io_abs_reg      Absolute register containing the semaphores.
in_bit_number   Bit number of the semaphore.
                0 bits: critical section available
                1 bits: critical section occupied
                init: clears all bits
4.10.3 mutex_vector_exit()
Exits the specified microengine critical section.
mutex_vector_exit(io_abs_reg, in_bit_number)
Parameter       Description
io_abs_reg      Absolute register containing the semaphores.
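The bit-per-critical-section idea can be modelled in C as follows. The function names are hypothetical, and the model returns false where the real mutex_vector_enter() would presumably wait for the bit to clear; it relies on the caller not being preempted between the test and the set, which is an assumption about how the threads of one microengine share the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clear all bits: every critical section becomes available. */
void mutex_vector_sketch_init(uint32_t *vector)
{
    *vector = 0;
}

/* Try to occupy one critical section; returns false if it is already taken. */
bool mutex_vector_sketch_try_enter(uint32_t *vector, unsigned bit_number)
{
    assert(bit_number < 32u);
    uint32_t bit = UINT32_C(1) << bit_number;
    if ((*vector & bit) != 0u)
        return false;
    *vector |= bit;
    return true;
}

/* Release one critical section. */
void mutex_vector_sketch_exit(uint32_t *vector, unsigned bit_number)
{
    assert(bit_number < 32u);
    *vector &= ~(UINT32_C(1) << bit_number);
}
```

Using the transmit queue number as the bit number would let threads enqueue on different queues concurrently while excluding each other on the same queue.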
• Initialization, as detailed in the “Microengine Initialization” section.
• Notification to a BDQ (Buffer Descriptor Queue) dequeue thread that data is available, as
detailed in the BDQ section.
• Within the Ethernet Transmit microengine.
• The StrongARM core signals the Ethernet Transmit microengine to notify it that it has
enqueued packets to send.
5.0 Project Configuration / Modifying the Example Design
The design can be assembled with a variety of options, all of which are configurable in the header
files project_config.h and system_config.h.
5.1 project_config.h
As detailed in the project’s README.txt, shared project source code can be simultaneously
compiled and run in a number of different configurations. project_config.h is a small top-level
header file that is copied and modified into those different configurations.
// ATM Wire Rate
#define ATM_OC3_PORTS
// Number of ATM Ports -- OC3 defaults to 4.
// To run on IXD4521 "Rainsford" WAN Card Daughter Card, limit to 2 ports.
#define NUMBER_OF_ATM_PORTS 2
// Define NUMBER_OF_ETHERNET_PORTS to 4 for IXP1200.
// Default is 8, as supported by the IXP1240 version of this project.
#define NUMBER_OF_ETHERNET_PORTS 4
// Define SW_CRC_RX to enable CRC-32 checking via microcode table lookup.
// Project build must also load the appropriate threads.
#define SW_CRC_RX
// Define SW_CRC_TX to enable CRC-32 checking via microcode table lookup.
// Project build must also load the appropriate threads.
#define SW_CRC_TX
// Define DEBUG to enable all the counters and run-time checking.
// Disable for maximum performance.
// #define DEBUG
// Define COUNTERS_ENABLE_MASK to all 1’s to enable every system counter.
// Otherwise its default is set in system_config.h
// #define COUNTERS_ENABLE_MASK 0xFFFFFFFF
// Define ATM_LOOPBACK to allow hardware configurations with ATM outputs
// connected directly to ATM inputs -- either via board loopback jumper
// or external loopback cable. Normally the design would discard
// an IP packet received on ATM with an IP destination on an ATM port.
// ATM_LOOPBACK simply forwards it to the next ethernet port.
#define ATM_LOOPBACK
// Define ETHERNET_LOOPBACK to allow routing packets from Ethernet
// Receive to Ethernet Transmit. Otherwise packets received on
// Ethernet ports with Ethernet destinations will be discarded.
// Useful for equipment check-out in the lab.
// #define ETHERNET_LOOPBACK
// Define RFC1812 to enable all the required router tests under spec RFC1812
// on ethernet to ethernet and ATM to ethernet traffic.
#define RFC1812
5.2 system_config.h
The system_config.h header file is used to define ATM headers, counters, and other settings. The
project’s README.txt file should be consulted for more detail.
5.3 Switching Between Hardware Configurations
As detailed in the README.txt file, the project source code comes with three sub-projects, one for
each of the configurations shown above. All of the project source code is shared by the three
projects, except for the three files that are necessary to distinguish the hardware configurations:
atm_ether.dwp, atm_ether.dwo, and project_config.h. Additional projects can be built from the
same source tree by simply copying and modifying the closest sub-project and its three unique
files.
The software-CRC configuration can run on any version of the IXP12xx hardware. However, the
hardware-CRC configurations depend on the IXP1240 or greater (CHIP_ID >= 6). OC-3 and
OC-12 configurations require different versions of the WAN daughter card (the OC-12 requires a
modified OC-3 daughter card).
6.0 Testing Environments
In simulation, this project was tested with IXA SDK V2.01 Development Environment on
Windows 2000. On hardware, it has been tested with VxWorks Tornado 2.1, on the IXDP1240
Advanced Development Platform.
7.0 Simulation Support (Scripts, etc.)
Simulation support for this example design is provided by using a combination of the Foreign
Model DLLs (libraries linked to the Transactor simulator), with interpreted Transactor scripts (.ind
files).
The IP Route Table Manager and associated RFC1812 utilities are implemented in the rtm_dll.dll
foreign model. The ATM VC table manager and associated utilities are implemented in the
atm_utils.dll foreign model. Entry points in these DLLs, such as route_populate() and atm_init(),
are called from the atm_ether_init.ind Transactor script upon initialization. DLL entry points are
also available from the Transactor command line interface. The same utilities are compiled into the
atm_utils.o VxWorks kernel module, and are thus available at the VxWorks command prompt.
Some simple C programs are also provided to check the Developer’s Workbench output files for
correct output data (i.e. CRC verification for PDUs; and integrity of output stream). See the
README.txt file for more details.
8.0 Limitations
This design supports the entire ATM VC name space. However, the implementation has 16K
buffers, and thus can support simultaneous reassembly of no more than 16K PDUs. The buffer
limitation comes from two sources.
• The fixed-length 2KB DRAM buffers must fit in physical memory. 16K 2KB buffers consume
32MB of DRAM.
• The Ethernet Transmit Packetq implementation can address only 16K buffer descriptors.
9.0 Extending the Example Design
This example design shows how microcode handles "fast-path" data-plane processing. It queues
exception packets to the StrongARM core where they are simply discarded. Customers can supply
their own software running on the StrongARM core to process these packets.
• This design supports only AAL5. The ATM receiver with its VC table, and the ATM
Transmitter could be modified to support other AALs.
• This design does not support ATM traffic shaping. However, this code could be applied to
other configurations where threads are dedicated to traffic shaping.
• This design does not support ATM receive policing, but the ATM receiver could be enhanced
to do so.
• Switched Virtual Circuits (SVCs) are not implemented; only Permanent Virtual Circuits
(PVCs) are currently implemented.
10.0 Document Conventions
In illustrations of 32-bit registers or data structures in memory, smaller addresses appear toward
the top of the figure, as they would appear in a memory dump on the screen. Bit positions are
numbered from the right to the left.
Figure 37. Illustration of Array of 32-bit Words
              bits 31 ....................... 0
address n     Byte 0   Byte 1   Byte 2   Byte 3
address n+1   Byte 4   Byte 5   Byte 6   Byte 7
address n+2   Byte 8   Byte 9   Byte 10  Byte 11
Bytes are numbered from left to right as shown in the array in Figure 37, as well as in the example
byte sequence in Figure 38. Bytes of a word are numbered starting at the most significant byte.
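This byte numbering convention can be expressed as a small C helper. The function name is hypothetical; it simply treats byte 0 of each 32-bit word as the most significant byte, as stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Return byte number `byte_number` of an array of 32-bit words, where
 * bytes are numbered from the most significant byte of word 0 upward. */
uint8_t byte_at(const uint32_t *words, size_t byte_number)
{
    uint32_t word = words[byte_number / 4];
    unsigned shift = 24u - 8u * (unsigned)(byte_number % 4);
    return (uint8_t)(word >> shift);
}
```

For the words 0x00112233 and 0x44556677, byte 1 is 0x11 and byte 4 is 0x44.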
There are two README.txt files. One is in the atm_ether project source
directory, and is a "Quick Start and Source Code Guide." The second
README.txt file can be found in the vxworks subdirectory, and describes
how to run the project on hardware.