Intel® IXP2800 Network Processor
Hardware Reference Manual
August 2004
Order Number: 278882-010
Date            Revision  Description
March 2002      001       First release for IXP2800 Customer Information Book V 0.4.
May 2002        002       Update for the IXA SDK 3.0 release.
August 2002     003       Update for the IXA SDK 3.0 Pre-Release 4.
November 2002   004       Update for the IXA SDK 3.0 Pre-Release 5.
May 2003        005       Update for the IXA SDK 3.1 Alpha Release.
September 2003  006       Update for the IXA SDK 3.5 Pre-Release 1.
October 2003    007       Added information about Receiver and Transmitter Interoperation with Framers and Switch Fabrics.
January 2004    008       Updated for new trademark usage: Intel XScale® technology.
May 2004        009       Updated Sections 6.5.2, 8.5.2.2, 9.2.2.1, 9.3.1, 9.3.3.2, 9.5.1.4, 9.5.3.4, and 10.3.1. Updated Figure 123 and timing diagrams in Figures 43, 44, 46, 47, 50, 51, 54, and 55. Added Chapter 11, “Performance Monitor Unit”.
August 2004     010       Preparation for web posting.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The IXP2800 Network Processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
Intel and XScale are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2004, Intel Corporation.
Contents
1 Introduction.................................................................................................................................. 25
1.1 About This Document ......................................................................................................... 25
1.2 Related Documentation ...................................................................................................... 25
1.3 Terminology ........................................................................................................................26
2 Technical Description ................................................................................................................. 27
2.1 Overview............................................................................................................................. 27
2.2 Intel XScale® Core Microarchitecture ................................................................................. 30
2.2.1 ARM* Compatibility................................................................................................30
2.2.2 Features................................................................................................................. 30
2.2.2.1 Multiply/Accumulate (MAC).................................................................... 30
2.2.2.2 Memory Management ............................................................................ 30
2.2.2.3 Instruction Cache ................................................................................... 30
2.2.2.4 Branch Target Buffer..............................................................................31
2.2.2.5 Data Cache ............................................................................................31
2.2.2.6 Interrupt Controller ................................................................................. 31
2.2.2.7 Address Map.......................................................................................... 32
2.3 Microengines ......................................................................................................................33
2.3.1 Microengine Bus Arrangement ..............................................................................35
2.3.2 Control Store.......................................................................................................... 35
2.3.3 Contexts.................................................................................................................35
2.3.4 Datapath Registers ................................................................................................ 37
2.3.4.1 General-Purpose Registers (GPRs) ......................................................37
2.3.4.2 Transfer Registers ................................................................................. 37
2.3.4.3 Next Neighbor Registers........................................................................38
2.3.4.4 Local Memory ....................................................................................... 39
2.3.5 Addressing Modes ................................................................................................. 41
2.3.5.1 Context-Relative Addressing Mode .......................................................41
2.3.5.2 Absolute Addressing Mode .................................................................... 42
2.3.5.3 Indexed Addressing Mode ..................................................................... 42
2.3.6 Local CSRs............................................................................................................ 43
2.3.7 Execution Datapath ............................................................................................... 43
2.3.7.1 Byte Align............................................................................................... 43
2.3.7.2 CAM ....................................................................................................... 45
2.3.8 CRC Unit................................................................................................................ 48
2.3.9 Event Signals......................................................................................................... 49
2.4 DRAM .................................................................................................................................50
2.4.1 Size Configuration ................................................................................................. 50
2.4.2 Read and Write Access ......................................................................................... 51
2.5 SRAM .................................................................................................................................51
2.5.1 QDR Clocking Scheme ..........................................................................................52
2.5.2 SRAM Controller Configurations............................................................................52
2.5.3 SRAM Atomic Operations ......................................................................................53
2.5.4 Queue Data Structure Commands ........................................................................54
2.5.5 Reference Ordering ............................................................................................... 54
2.5.5.1 Reference Order Tables ........................................................................ 54
2.5.5.2 Microengine Software Restrictions to Maintain Ordering.......................56
2.6 Scratchpad Memory............................................................................................................56
2.6.1 Scratchpad Atomic Operations .............................................................................. 57
2.6.2 Ring Commands .................................................................................................... 57
2.7 Media and Switch Fabric Interface ..................................................................................... 59
2.7.1 SPI-4......................................................................................................................60
2.7.2 CSIX ...................................................................................................................... 61
2.7.3 Receive.................................................................................................................. 61
2.7.3.1 RBUF ..................................................................................................... 62
2.7.3.1.1 SPI-4 and the RBUF .............................................................. 62
2.7.3.1.2 CSIX and RBUF..................................................................... 63
2.7.3.2 Full Element List .................................................................................... 63
2.7.3.3 RX_THREAD_FREELIST...................................................................... 63
2.7.3.4 Receive Operation Summary................................................................. 64
2.7.4 Transmit................................................................................................................. 65
2.7.4.1 TBUF...................................................................................................... 65
2.7.4.1.1 SPI-4 and TBUF..................................................................... 66
2.7.4.1.2 CSIX and TBUF ..................................................................... 67
2.7.4.2 Transmit Operation Summary................................................................ 67
2.7.5 The Flow Control Interface .................................................................................... 68
2.7.5.1 SPI-4...................................................................................................... 68
2.7.5.2 CSIX....................................................................................................... 68
2.8 Hash Unit............................................................................................................................ 69
2.9 PCI Controller ..................................................................................................................... 71
2.9.1 Target Access........................................................................................................ 71
2.9.2 Master Access ....................................................................................................... 71
2.9.3 DMA Channels....................................................................................................... 71
2.9.3.1 DMA Descriptor...................................................................................... 72
2.9.3.2 DMA Channel Operation........................................................................ 73
2.9.3.3 DMA Channel End Operation ................................................................ 74
2.9.3.4 Adding Descriptors to an Unterminated Chain....................................... 74
2.9.4 Mailbox and Message Registers............................................................................ 74
2.9.5 PCI Arbiter ............................................................................................................. 75
2.10 Control and Status Register Access Proxy......................................................................... 76
2.11 Intel XScale® Core Peripherals .......................................................................................... 76
2.11.1 Interrupt Controller................................................................................................. 76
2.11.2 Timers....................................................................................................................77
2.11.3 General Purpose I/O.............................................................................................. 77
2.11.4 Universal Asynchronous Receiver/Transmitter...................................................... 77
2.11.5 Slowport................................................................................................................. 77
2.12 I/O Latency ......................................................................................................................... 78
2.13 Performance Monitor .......................................................................................................... 78
3 Intel XScale® Core ....................................................................................................................... 79
3.1 Introduction ......................................................................................................................... 79
3.2 Features.............................................................................................................................. 80
3.2.1 Multiply/Accumulate (MAC)................................................................................... 80
3.2.2 Memory Management............................................................................................ 80
3.2.3 Instruction Cache................................................................................................... 81
3.2.4 Branch Target Buffer (BTB) ................................................................................... 81
3.2.5 Data Cache............................................................................................................ 81
3.2.6 Performance Monitoring ........................................................................................ 81
3.2.7 Power Management...............................................................................................81
3.2.8 Debugging ............................................................................................................. 81
3.2.9 JTAG...................................................................................................................... 81
3.3 Memory Management.........................................................................................................82
3.3.1 Architecture Model ................................................................................................. 82
3.3.1.1 Version 4 versus Version 5 .................................................................... 82
3.3.1.2 Memory Attributes.................................................................................. 82
3.3.1.2.1 Page (P) Attribute Bit ............................................................. 82
3.3.1.2.2 Instruction Cache ................................................................... 83
3.3.1.2.3 Data Cache and Write Buffer .................................................83
3.3.1.2.4 Details on Data Cache and Write Buffer Behavior................. 83
3.3.1.2.5 Memory Operation Ordering ..................................................84
3.3.2 Exceptions .............................................................................................................84
3.3.3 Interaction of the MMU, Instruction Cache, and Data Cache.................................85
3.3.4 Control ...................................................................................................................85
3.3.4.1 Invalidate (Flush) Operation...................................................................85
3.3.4.2 Enabling/Disabling ................................................................................. 85
3.3.4.3 Locking Entries ...................................................................................... 86
3.3.4.4 Round-Robin Replacement Algorithm ...................................................87
3.4 Instruction Cache................................................................................................................ 88
3.4.1 Instruction Cache Operation .................................................................................. 89
3.4.1.1 Operation when Instruction Cache is Enabled.......................................89
3.4.1.2 Operation when Instruction Cache is Disabled ...................................... 90
3.4.1.3 Fetch Policy ........................................................................................... 90
3.4.1.4 Round-Robin Replacement Algorithm ...................................................90
3.4.1.5 Parity Protection..................................................................................... 91
3.4.1.6 Instruction Cache Coherency.................................................................91
3.4.2 Instruction Cache Control ...................................................................................... 92
3.4.2.1 Instruction Cache State at Reset ...........................................................92
3.4.2.2 Enabling/Disabling ................................................................................. 92
3.4.2.3 Invalidating the Instruction Cache.......................................................... 92
3.4.2.4 Locking Instructions in the Instruction Cache ........................................ 92
3.4.2.5 Unlocking Instructions in the Instruction Cache .....................................94
3.5 Branch Target Buffer (BTB) ................................................................................................ 94
3.5.1 Branch Target Buffer Operation.............................................................................94
3.5.1.1 Reset......................................................................................................95
3.5.2 Update Policy......................................................................................................... 96
3.5.3 BTB Control ...........................................................................................................96
3.5.3.1 Disabling/Enabling ................................................................................. 96
3.5.3.2 Invalidation............................................................................................. 96
3.6 Data Cache.........................................................................................................................96
3.6.1 Overviews ..............................................................................................................97
3.6.1.1 Data Cache Overview ............................................................................ 97
3.6.1.2 Mini-Data Cache Overview ....................................................................98
3.6.1.3 Write Buffer and Fill Buffer Overview..................................................... 99
3.6.2 Data Cache and Mini-Data Cache Operation ........................................................ 99
3.6.2.1 Operation when Caching is Enabled...................................................... 99
3.6.2.2 Operation when Data Caching is Disabled ............................................99
3.6.2.3 Cache Policies ..................................................................................... 100
3.6.2.3.1 Cacheability ......................................................................... 100
3.6.2.3.2 Read Miss Policy .................................................................100
3.6.2.3.3 Write Miss Policy..................................................................101
3.6.2.3.4 Write-Back versus Write-Through........................................ 101
3.6.2.4 Round-Robin Replacement Algorithm ................................................. 102
3.6.2.5 Parity Protection................................................................................... 102
3.6.2.6 Atomic Accesses.................................................................................. 102
3.6.3 Data Cache and Mini-Data Cache Control .......................................................... 103
3.6.3.1 Data Memory State After Reset........................................................... 103
3.6.3.2 Enabling/Disabling ............................................................................... 103
3.6.3.3 Invalidate and Clean Operations.......................................................... 103
3.6.3.3.1 Global Clean and Invalidate Operation ................................ 104
3.6.4 Reconfiguring the Data Cache as Data RAM ...................................................... 105
3.6.5 Write Buffer/Fill Buffer Operation and Control ..................................................... 106
3.7 Configuration .................................................................................................................... 106
3.8 Performance Monitoring ................................................................................................... 107
3.8.1 Performance Monitoring Events .......................................................................... 107
3.8.1.1 Instruction Cache Efficiency Mode....................................................... 108
3.8.1.2 Data Cache Efficiency Mode................................................................ 109
3.8.1.3 Instruction Fetch Latency Mode........................................................... 109
3.8.1.4 Data/Bus Request Buffer Full Mode .................................................... 109
3.8.1.5 Stall/Writeback Statistics...................................................................... 110
3.8.1.6 Instruction TLB Efficiency Mode .......................................................... 111
3.8.1.7 Data TLB Efficiency Mode ................................................................... 111
3.8.2 Multiple Performance Monitoring Run Statistics .................................................. 111
3.9 Performance Considerations ............................................................................................ 111
3.9.1 Interrupt Latency.................................................................................................. 112
3.9.2 Branch Prediction ................................................................................................ 112
3.9.3 Addressing Modes ............................................................................................... 113
3.9.4 Instruction Latencies............................................................................................ 113
3.9.4.1 Performance Terms ............................................................................. 113
3.9.4.2 Branch Instruction Timings .................................................................. 115
3.9.4.3 Data Processing Instruction Timings ................................................... 115
3.9.4.4 Multiply Instruction Timings.................................................................. 116
3.9.4.5 Saturated Arithmetic Instructions......................................................... 117
3.9.4.6 Status Register Access Instructions .................................................... 118
3.9.4.7 Load/Store Instructions........................................................................ 118
3.9.4.8 Semaphore Instructions....................................................................... 118
3.9.4.9 Coprocessor Instructions ..................................................................... 119
3.9.4.10 Miscellaneous Instruction Timing......................................................... 119
3.9.4.11 Thumb Instructions .............................................................................. 119
3.10 Test Features....................................................................................................................119
3.10.1 IXP2800 Network Processor Endianness............................................................ 120
3.10.1.1 Read and Write Transactions Initiated by the Intel XScale® Core ...... 121
3.10.1.1.1 Reads Initiated by the Intel XScale® Core ........................ 121
3.10.1.1.2 The Intel XScale® Core Writing to the IXP2800
Network Processor .................................................................. 123
3.11 Intel XScale® Core Gasket Unit ....................................................................................... 125
3.11.1 Overview.............................................................................................................. 125
3.11.2 Intel XScale® Core Gasket Functional Description ............................................. 127
3.11.2.1 Command Memory Bus to Command Push/Pull Conversion .............. 127
3.11.3 CAM Operation .................................................................................................... 127
3.11.4 Atomic Operations ............................................................................................... 128
3.11.4.1 Summary of Rules for the Atomic Command Regarding I/O ............... 129
3.11.4.2 Intel XScale® Core Access to SRAM Q-Array..................................... 129
3.11.5 I/O Transaction ....................................................................................................130
3.11.6 Hash Access ........................................................................................................ 130
3.11.7 Gasket Local CSR ...............................................................................................131
3.11.8 Interrupt ...............................................................................................................132
3.12 Intel XScale® Core Peripheral Interface........................................................................... 134
3.12.1 XPI Overview .......................................................................................................134
3.12.1.1 Data Transfers ..................................................................................... 135
3.12.1.2 Data Alignment .................................................................................... 135
3.12.1.3 Address Spaces for XPI Internal Devices ............................................ 136
3.12.2 UART Overview ...................................................................................................137
3.12.3 UART Operation .................................................................................................. 138
3.12.3.1 UART FIFO Operation....................................138
3.12.3.1.1 UART FIFO Interrupt Mode Operation –
Receiver Interrupt .................................................................... 138
3.12.3.1.2 FIFO Polled Mode Operation ............................................. 139
3.12.4 Baud Rate Generator...........................................................................................139
3.12.5 General Purpose I/O (GPIO) ............................................................................... 140
3.12.6 Timers.................................................................................................................. 141
3.12.6.1 Timer Operation ...................................................................................141
3.12.7 Slowport Unit ....................................................................................................... 142
3.12.7.1 PROM Device Support.........................................................................143
3.12.7.2 Microprocessor Interface Support for the Framer ................................ 143
3.12.7.3 Slowport Unit Interfaces.......................................................................144
3.12.7.4 Address Space.....................................................................................145
3.12.7.5 Slowport Interfacing Topology ............................................................. 145
3.12.7.6 Slowport 8-Bit Device Bus Protocols ...................................................146
3.12.7.6.1 Mode 0 Single Write Transfer for Fixed-Timed Device ......147
3.12.7.6.2 Mode 0 Single Write Transfer for Self-Timing Device........ 148
3.12.7.6.3 Mode 0 Single Read Transfer for Fixed-Timed Device...... 149
3.12.7.6.4 Single Read Transfer for a Self-Timing Device..................150
3.12.7.7 SONET/SDH Microprocessor Access Support ....................................150
3.12.7.7.1 Mode 1: 16-Bit Microprocessor Interface Support with
16-Bit Address Lines................................................................151
3.12.7.7.2 Mode 2: Interface with 8 Data Bits and 11 Address Bits ....155
3.12.7.7.3 Mode 3: Support for the Intel and AMCC* 2488 Mbps
SONET/SDH Microprocessor Interface ...................................157
4 Microengines............................................................................................................................. 167
4.1 Overview........................................................................................................................... 167
4.1.1 Control Store........................................................................................................ 169
4.1.2 Contexts...............................................................................................................169
4.1.3 Datapath Registers .............................................................................................. 171
4.1.3.1 General-Purpose Registers (GPRs) ....................................................171
4.1.3.2 Transfer Registers ............................................................................... 171
4.1.3.3 Next Neighbor Registers......................................................................172
4.1.3.4 Local Memory ...................................................................................... 172
4.1.4 Addressing Modes ............................................................................................... 173
4.1.4.1 Context-Relative Addressing Mode .....................................................173
4.1.4.2 Absolute Addressing Mode .................................................................. 174
4.1.4.3 Indexed Addressing Mode ...................................................................174
4.2 Local CSRs....................................................................................................................... 174
4.3 Execution Datapath .......................................................................................................... 174
4.3.1 Byte Align............................................................................................................. 174
4.3.2 CAM..................................................................................................................... 176
4.4 CRC Unit........................................................................................................................... 179
4.5 Event Signals.................................................................................................................... 180
4.5.1 Microengine Endianness ..................................................................................... 181
4.5.1.1 Read from RBUF (64 Bits)................................................................... 181
4.5.1.2 Write to TBUF ...................................................................................... 182
4.5.1.3 Read/Write from/to SRAM ................................................................... 182
4.5.1.4 Read/Write from/to DRAM ................................................................... 182
4.5.1.5 Read/Write from/to SHaC and Other CSRs......................................... 182
4.5.1.6 Write to Hash Unit................................................................................ 183
4.5.2 Media Access ...................................................................................................... 183
4.5.2.1 Read from RBUF ................................................................................. 184
4.5.2.2 Write to TBUF ...................................................................................... 185
4.5.2.3 TBUF to SPI-4 Transfer ....................................................................... 186
5 DRAM.......................................................................................................................................... 187
5.1 Overview........................................................................................................................... 187
5.2 Size Configuration ............................................................................................................188
5.3 DRAM Clocking ................................................................................................................189
5.4 Bank Policy ....................................................................................................................... 190
5.5 Interleaving ....................................................................................................................... 191
5.5.1 Three Channels Active (3-Way Interleave).......................................................... 191
5.5.2 Two Channels Active (2-Way Interleave) ............................................................ 193
5.5.3 One Channel Active (No Interleave) .................................................................... 193
5.5.4 Interleaving Across RDRAMs and Banks ............................................................ 194
5.6 Parity and ECC................................................................................................................. 194
5.6.1 Parity and ECC Disabled ..................................................................................... 194
5.6.2 Parity Enabled ..................................................................................................... 195
5.6.3 ECC Enabled ....................................................................................................... 195
5.6.4 ECC Calculation and Syndrome .......................................................................... 196
5.7 Timing Configuration.........................................................................................................196
5.8 Microengine Signals .........................................................................................................197
5.9 Serial Port......................................................................................................................... 197
5.10 RDRAM Controller Block Diagram.................................................................................... 198
5.10.1 Commands .......................................................................................................... 199
5.10.2 DRAM Write......................................................................................................... 199
5.10.2.1 Masked Write....................................................................................... 199
5.10.3 DRAM Read......................................................................................................... 200
5.10.4 CSR Write............................................................................................................ 200
5.10.5 CSR Read............................................................................................................ 200
5.10.6 Arbitration ............................................................................................................ 201
5.10.7 Reference Ordering ............................................................................................. 201
5.11 DRAM Push/Pull Arbiter ................................................................................................... 201
5.11.1 Arbiter Push/Pull Operation ................................................................................. 202
5.11.2 DRAM Push Arbiter Description .......................................................................... 203
5.12 DRAM Pull Arbiter Description.......................................................................................... 204
6 SRAM Interface.......................................................................................................................... 207
6.1 Overview........................................................................................................................... 207
6.2 SRAM Interface Configurations ........................................................................................ 208
6.2.1 Internal Interface..................................................................................................209
6.2.2 Number of Channels............................................................................................209
6.2.3 Coprocessor and/or SRAMs Attached to a Channel............................................ 209
6.3 SRAM Controller Configurations.......................................................................................209
6.4 Command Overview .........................................................................................................211
6.4.1 Basic Read/Write Commands.............................................................................. 211
6.4.2 Atomic Operations ............................................................................................... 211
6.4.3 Queue Data Structure Commands ......................................................................213
6.4.3.1 Read_Q_Descriptor Commands.......................................................... 216
6.4.3.2 Write_Q_Descriptor Commands ..........................................................216
6.4.3.3 ENQ and DEQ Commands .................................................................. 217
6.4.4 Ring Data Structure Commands.......................................................................... 217
6.4.5 Journaling Commands......................................................................................... 217
6.4.6 CSR Accesses ..................................................................................................... 217
6.5 Parity................................................................................................................................. 217
6.6 Address Map..................................................................................................................... 218
6.7 Reference Ordering .......................................................................................................... 219
6.7.1 Reference Order Tables ...................................................................................... 219
6.7.2 Microcode Restrictions to Maintain Ordering ....................................................... 220
6.8 Coprocessor Mode ........................................................................................................... 221
7 SHaC — Unit Expansion ...........................................................................................................225
7.1 Overview........................................................................................................................... 225
7.1.1 SHaC Unit Block Diagram.................................................................................... 225
7.1.2 Scratchpad........................................................................................................... 227
7.1.2.1 Scratchpad Description........................................................................227
7.1.2.2 Scratchpad Interface............................................................................229
7.1.2.2.1 Command Interface .............................................................229
7.1.2.2.2 Push/Pull Interface...............................................................229
7.1.2.2.3 CSR Bus Interface ............................................................... 229
7.1.2.2.4 Advanced Peripherals Bus Interface (APB) ......................... 229
7.1.2.3 Scratchpad Block Level Diagram......................................................... 229
7.1.2.3.1 Scratchpad Commands .......................................................230
7.1.2.3.2 Ring Commands ..................................................................231
7.1.2.3.3 Clocks and Reset.................................................................235
7.1.2.3.4 Reset Registers ...................................................................235
7.1.3 Hash Unit .............................................................................................................236
7.1.3.1 Hashing Operation ............................................................................... 237
7.1.3.2 Hash Algorithm .................................................................................... 239
8 Media and Switch Fabric Interface...........................................................................................241
8.1 Overview........................................................................................................................... 241
8.1.1 SPI-4.................................................................................................................... 243
8.1.2 CSIX ....................................................................................................................246
8.1.3 CSIX/SPI-4 Interleave Mode................................................................................246
8.2 Receive............................................................................................................................. 247
8.2.1 Receive Pins........................................................................................................248
8.2.2 RBUF ...................................................................................................................248
8.2.2.1 SPI-4 .................................................................................................... 250
8.2.2.2 CSIX.....................................................................................................253
8.2.3 Full Element List .................................................................................................. 255
8.2.4 Rx_Thread_Freelist_# ......................................................................................... 255
8.2.5 Rx_Thread_Freelist_Timeout_# .......................................................................... 256
8.2.6 Receive Operation Summary............................................................................... 256
8.2.7 Receive Flow Control Status ............................................................................... 258
8.2.7.1 SPI-4.................................................................................................... 258
8.2.7.2 CSIX..................................................................................................... 259
8.2.7.2.1 Link-Level............................................................................. 259
8.2.7.2.2 Virtual Output Queue ........................................................... 260
8.2.8 Parity.................................................................................................................... 260
8.2.8.1 SPI-4.................................................................................................... 260
8.2.8.2 CSIX..................................................................................................... 261
8.2.8.2.1 Horizontal Parity................................................................... 261
8.2.8.2.2 Vertical Parity....................................................................... 261
8.2.9 Error Cases.......................................................................................................... 261
8.3 Transmit............................................................................................................................ 262
8.3.1 Transmit Pins....................................................................................................... 262
8.3.2 TBUF ................................................................................................................... 263
8.3.2.1 SPI-4.................................................................................................... 266
8.3.2.2 CSIX..................................................................................................... 267
8.3.3 Transmit Operation Summary.............................................................................. 268
8.3.3.1 SPI-4.................................................................................................... 268
8.3.3.2 CSIX..................................................................................................... 269
8.3.3.3 Transmit Summary............................................................................... 270
8.3.4 Transmit Flow Control Status .............................................................................. 270
8.3.4.1 SPI-4.................................................................................................... 271
8.3.4.2 CSIX..................................................................................................... 273
8.3.4.2.1 Link-Level............................................................................. 273
8.3.4.2.2 Virtual Output Queue ........................................................... 273
8.3.5 Parity.................................................................................................................... 273
8.3.5.1 SPI-4.................................................................................................... 273
8.3.5.2 CSIX..................................................................................................... 274
8.3.5.2.1 Horizontal Parity................................................................... 274
8.3.5.2.2 Vertical Parity....................................................................... 274
8.4 RBUF and TBUF Summary .............................................................................................. 274
8.5 CSIX Flow Control Interface ............................................................................................. 275
8.5.1 TXCSRB and RXCSRB Signals .......................................................................... 275
8.5.2 FCIFIFO and FCEFIFO Buffers ........................................................................... 276
8.5.2.1 Full Duplex CSIX.................................................................................. 277
8.5.2.2 Simplex CSIX....................................................................................... 278
8.5.3 TXCDAT/RXCDAT, TXCSOF/RXCSOF, TXCPAR/RXCPAR,
and TXCFC/RXCFC Signals................................................................................ 280
8.6 Deskew and Training ........................................................................................................ 280
8.6.1 Data Training Pattern........................................................................................... 282
8.6.2 Flow Control Training Pattern .............................................................................. 282
8.6.3 Use of Dynamic Training ..................................................................................... 283
8.7 CSIX Startup Sequence.................................................................................................... 287
8.7.1 CSIX Full Duplex ................................................................................................. 287
8.7.1.1 Ingress IXP2800 Network Processor................................................... 287
8.7.1.2 Egress IXP2800 Network Processor.................................................... 287
8.7.1.3 Single IXP2800 Network Processor..................................................... 288
8.7.2 CSIX Simplex....................................................................................................... 288
8.7.2.1 Ingress IXP2800 Network Processor................................................... 288
8.7.2.2 Egress IXP2800 Network Processor.................................................... 289
8.7.2.3 Single IXP2800 Network Processor.....................................................289
8.8 Interface to Command and Push and Pull Buses ............................................................. 290
8.8.1 RBUF or MSF CSR to Microengine S_TRANSFER_IN Register for Instruction: ... 291
8.8.2 Microengine S_TRANSFER_OUT Register to TBUF or
MSF CSR for Instruction:..................................................................................... 291
8.8.3 Microengine to MSF CSR for Instruction: ............................................................ 291
8.8.4 From RBUF to DRAM for Instruction: .................................................................. 291
8.8.5 From DRAM to TBUF for Instruction:................................................................... 292
8.9 Receiver and Transmitter Interoperation with Framers and Switch Fabrics .....................292
8.9.1 Receiver and Transmitter Configurations ............................................................293
8.9.1.1 Simplex Configuration..........................................................................293
8.9.1.2 Hybrid Simplex Configuration ..............................................................294
8.9.1.3 Dual Network Processor Full Duplex Configuration ............................. 295
8.9.1.4 Single Network Processor Full Duplex Configuration (SPI-4.2)........... 296
8.9.1.5 Single Network Processor, Full Duplex Configuration
(SPI-4.2 and CSIX-L1) .........................................................................297
8.9.2 System Configurations.........................................................................................297
8.9.2.1 Framer, Single Network Processor Ingress and Egress, and
Fabric Interface Chip............................................................................ 298
8.9.2.2 Framer, Dual Network Processor Ingress, Single
Network Processor Egress, and Fabric Interface Chip ........................298
8.9.2.3 Framer, Single Network Processor Ingress and Egress, and
CSIX-L1 Chips for Translation and Fabric Interface ............................299
8.9.2.4 CPU Complex, Network Processor, and Fabric Interface Chip ...........299
8.9.2.5 Framer, Single Network Processor, Co-Processor, and
Fabric Interface Chip............................................................................ 300
8.9.3 SPI-4.2 Support ................................................................................................... 301
8.9.3.1 SPI-4.2 Receiver..................................................................................301
8.9.3.2 SPI-4.2 Transmitter..............................................................................302
8.9.4 CSIX-L1 Protocol Support ...................................................................................303
8.9.4.1 CSIX-L1 Interface Reference Model: Traffic Manager and Fabric
Interface Chip.......................................................................................303
8.9.4.2 Intel® IXP2800 Support of the CSIX-L1 Protocol ................................304
8.9.4.2.1 Mapping to 16-Bit Wide DDR LVDS .................................... 304
8.9.4.2.2 Support for Dual Chip, Full-Duplex Operation ..................... 305
8.9.4.2.3 Support for Simplex Operation............................................. 306
8.9.4.2.4 Support for Hybrid Simplex Operation .................................307
8.9.4.2.5 Support for Dynamic De-Skew Training...............................308
8.9.4.3 CSIX-L1 Protocol Receiver Support ....................................................309
8.9.4.4 CSIX-L1 Protocol Transmitter Support ................................................310
8.9.4.5 Implementation of a Bridge Chip to CSIX-L1 .......................................311
8.9.5 Dual Protocol (SPI and CSIX-L1) Support ...........................................................312
8.9.5.1 Dual Protocol Receiver Support...........................................................312
8.9.5.2 Dual Protocol Transmitter Support....................................................... 312
8.9.5.3 Implementation of a Bridge Chip to CSIX-L1 and SPI-4.2 ................... 313
8.9.6 Transmit State Machine .......................................................................................314
8.9.6.1 SPI-4.2 Transmitter State Machine...................................................... 314
8.9.6.2 Training Transmitter State Machine..................................................... 315
8.9.6.3 CSIX-L1 Transmitter State Machine .................................................... 315
8.9.7 Dynamic De-Skew ...............................................................................................316
8.9.8 Summary of Receiver and Transmitter Signals ...................................................317
9 PCI Unit....................................................................................................................................... 319
9.1 Overview........................................................................................................................... 319
9.2 PCI Pin Protocol Interface Block....................................................................................... 321
9.2.1 PCI Commands ................................................................................................... 322
9.2.2 IXP2800 Network Processor Initialization............................................................ 323
9.2.2.1 Initialization by the Intel XScale® Core................................................ 324
9.2.2.2 Initialization by a PCI Host................................................................... 324
9.2.3 PCI Type 0 Configuration Cycles......................................................................... 325
9.2.3.1 Configuration Write .............................................................................. 325
9.2.3.2 Configuration Read.............................................................................. 325
9.2.4 PCI 64-Bit Bus Extension .................................................................................... 325
9.2.5 PCI Target Cycles................................................................................................ 326
9.2.5.1 PCI Accesses to CSR.......................................................................... 326
9.2.5.2 PCI Accesses to DRAM....................................................................... 326
9.2.5.3 PCI Accesses to SRAM ....................................................................... 326
9.2.5.4 Target Write Accesses from the PCI Bus ............................................ 326
9.2.5.5 Target Read Accesses from the PCI Bus ............................................ 327
9.2.6 PCI Initiator Transactions .................................................................................... 327
9.2.6.1 PCI Request Operation........................................................................ 327
9.2.6.2 PCI Commands.................................................................................... 328
9.2.6.3 Initiator Write Transactions .................................................................. 328
9.2.6.4 Initiator Read Transactions.................................................................. 328
9.2.6.5 Initiator Latency Timer ......................................................................... 328
9.2.6.6 Special Cycle ....................................................................................... 329
9.2.7 PCI Fast Back-to-Back Cycles............................................................................. 329
9.2.8 PCI Retry ............................................................................................................. 329
9.2.9 PCI Disconnect .................................................................................................... 329
9.2.10 PCI Built-In System Test...................................................................................... 329
9.2.11 PCI Central Functions......................................................................................... 330
9.2.11.1 PCI Interrupt Inputs.............................................................................. 330
9.2.11.2 PCI Reset Output................................................................................. 330
9.2.11.3 PCI Internal Arbiter .............................................................................. 331
9.3 Slave Interface Block ........................................................................................................332
9.3.1 CSR Interface ...................................................................................................... 332
9.3.2 SRAM Interface ................................................................................................... 333
9.3.2.1 SRAM Slave Writes ............................................................................. 333
9.3.2.2 SRAM Slave Reads ............................................................................. 334
9.3.3 DRAM Interface ................................................................................................... 334
9.3.3.1 DRAM Slave Writes ............................................................................. 334
9.3.3.2 DRAM Slave Reads............................................................................. 335
9.3.4 Mailbox and Doorbell Registers........................................................................... 336
9.3.5 PCI Interrupt Pin .................................................................................................. 339
9.4 Master Interface Block ...................................................................................................... 340
9.4.1 DMA Interface...................................................................................................... 340
9.4.1.1 Allocation of the DMA Channels .......................................................... 341
9.4.1.2 Special Registers for Microengine Channels....................................... 341
9.4.1.3 DMA Descriptor.................................................................................... 342
9.4.1.4 DMA Channel Operation...................................................................... 343
9.4.1.5 DMA Channel End Operation .............................................................. 344
9.4.1.6 Adding Descriptor to an Unterminated Chain ...................................... 344
9.4.1.7 DRAM to PCI Transfer......................................................................... 344
9.4.1.8 PCI to DRAM Transfer......................................................................... 345
9.4.2 Push/Pull Command Bus Target Interface........................................................... 345
9.4.2.1 Command Bus Master Access to Local Configuration Registers ........345
9.4.2.2 Command Bus Master Access to Local Control and
Status Registers...................................................................................346
9.4.2.3 Command Bus Master Direct Access to PCI Bus ................................ 346
9.4.2.3.1 PCI Address Generation for IO and MEM Cycles................346
9.4.2.3.2 PCI Address Generation for Configuration Cycles...............347
9.4.2.3.3 PCI Address Generation for Special and IACK Cycles........ 347
9.4.2.3.4 PCI Enables ......................................................................... 347
9.4.2.3.5 PCI Command ..................................................................... 347
9.5 PCI Unit Error Behavior .................................................................................................... 348
9.5.1 PCI Target Error Behavior ................................................................................... 348
9.5.1.1 Target Access Has an Address Parity Error ........................................348
9.5.1.2 Initiator Asserts PCI_PERR_L in Response to One of Our Data
Phases ................................................................................................. 348
9.5.1.3 Discard Timer Expires on a Target Read.............................................348
9.5.1.4 Target Access to the PCI_CSR_BAR Space Has Illegal
Byte Enables........................................................................................ 348
9.5.1.5 Target Write Access Receives Bad Parity PCI_PAR with the Data .....349
9.5.1.6 SRAM Responds with a Memory Error on One or More Data Phases
on a Target Read .................................................................................349
9.5.1.7 DRAM Responds with a Memory Error on One or More Data Phases
on a Target Read .................................................................................349
9.5.2 As a PCI Initiator During a DMA Transfer ............................................................349
9.5.2.1 DMA Read from DRAM (Memory-to-PCI Transaction) Gets a
Memory Error .......................................................................................349
9.5.2.2 DMA Read from SRAM (Descriptor Read) Gets a Memory Error........ 350
9.5.2.3 DMA from DRAM Transfer (Write to PCI) Receives PCI_PERR_L on
PCI Bus................................................................................................350
9.5.2.4 DMA To DRAM (Read from PCI) Has Bad Data Parity ....................... 350
9.5.2.5 DMA Transfer Experiences a Master Abort (Time-Out) on PCI ...........351
9.5.2.6 DMA Transfer Receives a Target Abort Response During a
Data Phase .......................................................................................... 351
9.5.2.7 DMA Descriptor Has a 0x0 Word Count (Not an Error) .......................351
9.5.3 As a PCI Initiator During a Direct Access from the Intel
XScale® Core or Microengine .............................................................................351
9.5.3.1 Master Transfer Experiences a Master Abort (Time-Out) on PCI ........351
9.5.3.2 Master Transfer Receives a Target Abort Response During
a Data Phase .......................................................................................351
9.5.3.3 Master from the Intel XScale® Core or Microengine Transfer
(Write to PCI) Receives PCI_PERR_L on PCI Bus .............................352
9.5.3.4 Master Read from PCI (Read from PCI) Has Bad Data Parity ............352
9.5.3.5 Master Transfer Receives PCI_SERR_L from the PCI Bus ................352
9.5.3.6 Intel XScale® Core or Microengine Requests Direct Transfer when
the PCI Bus is in Reset ........................................................................352
9.6 PCI Data Byte Lane Alignment ......................................................................................... 352
9.6.1 Endian for Byte Enable ........................................................................................ 355
10 Clocks and Reset....................................................................................................................... 359
10.1 Clocks ............................................................................................................................... 359
10.2 Synchronization Between Frequency Domains ................................................................363
10.3 Reset ................................................................................................................................364
10.3.1 Hardware Reset Using nRESET or PCI_RST_L .................................................364
Hardware Reference Manual 13
Contents
10.3.2 PCI-Initiated Reset............................................................................................... 366
10.3.3 Watchdog Timer-Initiated Reset .......................................................................... 366
10.3.3.1 Slave Network Processor (Non-Central Function)............................... 367
10.3.3.2 Master Network Processor (PCI Host, Central Function) .................... 367
10.3.3.3 Master Network Processor (Central Function)..................................... 367
10.3.4 Software-Initiated Reset ...................................................................................... 367
10.3.5 Reset Removal Operation Based on CFG_PROM_BOOT.................................. 368
10.3.5.1 When CFG_PROM_BOOT is 1 (BOOT_PROM is Present) ................ 368
10.3.5.2 When CFG_PROM_BOOT is 0 (BOOT_PROM is Not Present) .........368
10.3.6 Strap Pins ............................................................................................................ 368
10.3.7 Powerup Reset Sequence ................................................................................... 370
10.4 Boot Mode ........................................................................................................................ 370
10.4.1 Flash ROM........................................................................................................... 372
10.4.2 PCI Host Download ............................................................................................. 372
10.5 Initialization ....................................................................................................................... 373
11 Performance Monitor Unit ........................................................................................................ 375
11.1 Introduction ....................................................................................................................... 375
11.1.1 Motivation for Performance Monitors................................................................... 375
11.1.2 Motivation for Choosing CHAP Counters ............................................................ 376
11.1.3 Functional Overview of CHAP Counters.............................................................. 377
11.1.4 Basic Operation of the Performance Monitor Unit ............................................... 378
11.1.5 Definition of CHAP Terminology .......................................................................... 379
11.1.6 Definition of Clock Domains................................................................................. 380
11.2 Interface and CSR Description ......................................................................................... 380
11.2.1 APB Peripheral .................................................................................................... 381
11.2.2 CAP Description .................................................................................................. 381
11.2.2.1 Selecting the Access Mode.................................................................. 381
11.2.2.2 PMU CSR ............................................................................................ 381
11.2.2.3 CAP Writes .......................................................................................... 381
11.2.2.4 CAP Reads .......................................................................................... 381
11.2.3 Configuration Registers ....................................................................................... 382
11.3 Performance Measurements ............................................................................................ 382
11.4 Events Monitored in Hardware ......................................................................................... 385
11.4.1 Queue Statistics Events....................................................................................... 385
11.4.1.1 Queue Latency..................................................................................... 385
11.4.1.2 Queue Utilization.................................................................................. 385
11.4.2 Count Events ....................................................................................................... 385
11.4.2.1 Hardware Block Execution Count ........................................................ 385
11.4.3 Design Block Select Definitions ........................................................................... 386
11.4.4 Null Event ............................................................................................................ 387
11.4.5 Threshold Events................................................................................................. 388
11.4.6 External Input Events........................................................................................... 389
11.4.6.1 XPI Events Target ID(000001) / Design Block #(0100) ....................... 389
11.4.6.2 SHaC Events Target ID(000010) / Design Block #(0101).................... 393
11.4.6.3 IXP2800 Network Processor MSF Events Target ID(000011) /
Design Block #(0110)........................................................................... 396
11.4.6.4 Intel XScale® Core Events Target ID(000100) /
Design Block #(0111)........................................................................... 402
11.4.6.5 PCI Events Target ID(000101) / Design Block #(1000) ....................... 405
11.4.6.6 ME00 Events Target ID(100000) / Design Block #(1001).................... 409
11.4.6.7 ME01 Events Target ID(100001) / Design Block #(1001).................... 410
11.4.6.8 ME02 Events Target ID(100010) / Design Block #(1001).................... 411
11.4.6.9 ME03 Events Target ID(100011) / Design Block #(1001).................... 411
11.4.6.10 ME04 Events Target ID(100100) / Design Block #(1001).................... 412
11.4.6.11 ME05 Events Target ID(100101) / Design Block #(1001).................... 412
11.4.6.12 ME06 Events Target ID(100110) / Design Block #(1001).................... 413
11.4.6.13 ME07 Events Target ID(100111) / Design Block #(1001).................... 413
11.4.6.14 ME10 Events Target ID(110000) / Design Block #(1010).................... 414
11.4.6.15 ME11 Events Target ID(110001) / Design Block #(1010).................... 414
11.4.6.16 ME12 Events Target ID(110010) / Design Block #(1010).................... 415
11.4.6.17 ME13 Events Target ID(110011) / Design Block #(1010).................... 415
11.4.6.18 ME14 Events Target ID(110100) / Design Block #(1010).................... 416
11.4.6.19 ME15 Events Target ID(110101) / Design Block #(1010).................... 416
11.4.6.20 ME16 Events Target ID(110110) / Design Block #(1010).................... 417
11.4.6.21 ME17 Events Target ID(110111) / Design Block #(1010).................... 417
11.4.6.22 SRAM DP1 Events Target ID(001001) / Design Block #(0010)...........418
11.4.6.23 SRAM DP0 Events Target ID(001010) / Design Block #(0010)...........418
11.4.6.24 SRAM CH3 Events Target ID(001011) / Design Block #(0010)...........420
11.4.6.25 SRAM CH2 Events Target ID(001100) / Design Block #(0010)...........421
11.4.6.26 SRAM CH1 Events Target ID(001101) / Design Block #(0010)...........421
11.4.6.27 SRAM CH0 Events Target ID(001110) / Design Block #(0010)...........422
11.4.6.28 DRAM DPLA Events Target ID(010010) / Design Block #(0011) ........423
11.4.6.29 DRAM DPSA Events Target ID(010011) / Design Block #(0011)........ 424
11.4.6.30 IXP2800 Network Processor DRAM CH2 Events Target ID(010100) /
Design Block #(0011)........................................................................... 425
11.4.6.31 IXP2800 Network Processor DRAM CH1 Events Target ID(010101) /
Design Block #(0011)........................................................................... 429
11.4.6.32 IXP2800 Network Processor DRAM CH0 Events Target ID(010110) /
Design Block #(0011)........................................................................... 429
Figures
1 IXP2800 Network Processor Functional Block Diagram ............................................................ 28
2 IXP2800 Network Processor Detailed Diagram.......................................................................... 29
3 Intel XScale® Core 4-GB (32-Bit) Address Space ..................................................................... 32
4 Microengine Block Diagram........................................................................................................ 34
5 Context State Transition Diagram .............................................................................................. 36
6 Byte-Align Block Diagram........................................................................................................... 44
7 CAM Block Diagram ................................................................................................................... 46
8 Echo Clock Configuration ........................................................................................................... 52
9 Logical View of Rings ................................................................................................................. 57
10 Example System Block Diagram ................................................................................................ 59
11 Full-Duplex Block Diagram......................................................................................................... 60
12 Simplified MSF Receive Section Block Diagram........................................................................ 61
13 Simplified Transmit Section Block Diagram................................................................................ 65
14 Hash Unit Block Diagram ........................................................................................................... 70
15 DMA Descriptor Reads............................................................................................................... 72
16 Intel XScale® Core Architecture Features ................................................................................ 80
17 Example of Locked Entries in TLB ............................................................................................. 88
18 Instruction Cache Organization .................................................................................................. 89
19 Locked Line Effect on Round Robin Replacement..................................................................... 93
20 BTB Entry ................................................................................................................................... 95
21 Branch History............................................................................................................................ 95
22 Data Cache Organization ........................................................................................................... 97
23 Mini-Data Cache Organization ................................................................................................... 98
24 Byte Steering for Read and Byte-Enable Generation by the Intel XScale® Core ..................... 122
25 Intel XScale® Core-Initiated Write to the IXP2800 Network Processor.................................... 124
26 Intel XScale® Core-Initiated Write to the IXP2800 Network Processor (Continued)................ 125
27 Global Buses Connection to the Intel XScale® Core Gasket ................................................... 126
28 Flow Through the Intel XScale® Core Interrupt Controller ...................................................... 132
29 Interrupt Mask Block Diagram ..................................................................................................133
30 XPI Interfaces for IXP2800 Network Processor........................................................................ 135
31 UART Data Frame.................................................................................................................... 138
32 GPIO Functional Diagram ........................................................................................................ 140
33 Timer Control Unit Interfacing Diagram .................................................................................... 141
34 Timer Internal Logic Diagram ................................................................................................... 142
35 Slowport Unit Interface Diagram...............................................................................................144
36 Address Space Hole Diagram .................................................................................................. 145
37 Slowport Example Application Topology .................................................................................. 146
38 Mode 0 Single Write Transfer for a Fixed-Timed Device.......................................................... 147
39 Mode 0 Single Write Transfer for a Self-Timing Device ........................................................... 148
40 Mode 0 Single Read Transfer for Fixed-Timed Device ............................................................ 149
41 Mode 0 Single Read Transfer for a Self-Timing Device ........................................................... 150
42 An Interface Topology with Lucent* TDAT042G5 SONET/SDH............................................... 152
43 Mode 1 Single Write Transfer for Lucent* TDAT042G5 Device (B0) ....................................... 153
44 Mode 1 Single Read Transfer for Lucent* TDAT042G5 Device (B0) ....................................... 154
45 An Interface Topology with PMC-Sierra* PM5351 S/UNI-TETRA* .......................................... 155
46 Mode 2 Single Write Transfer for PMC-Sierra* PM5351 Device (B0) ...................................... 156
47 Mode 2 Single Read Transfer for PMC-Sierra* PM5351 Device (B0) ...................................... 157
48 An Interface Topology with Intel / AMCC* SONET/SDH Device .............................................. 158
49 Mode 3 Second Interface Topology with Intel / AMCC* SONET/SDH Device.......................... 159
50 Mode 3 Single Write Transfer Followed by Read (B0) ............................................................. 160
51 Mode 3 Single Read Transfer Followed by Write (B0) ............................................................. 161
52 An Interface Topology with Intel / AMCC* SONET/SDH Device in Motorola* Mode ................162
53 Second Interface Topology with Intel / AMCC* SONET/SDH Device.......................................163
54 Mode 4 Single Write Transfer (B0) ........................................................................................... 164
55 Mode 4 Single Read Transfer (B0) ...........................................................................................165
56 Microengine Block Diagram...................................................................................................... 168
57 Context State Transition Diagram............................................................................................. 170
58 Byte Align Block Diagram ......................................................................................................... 175
59 CAM Block Diagram .................................................................................................................177
60 Read from RBUF (64 Bits)........................................................................................................181
61 Write to TBUF (64 Bits).............................................................................................................182
62 48-, 64-, and 128-Bit Hash Operand Transfers ........................................................................ 183
63 Bit, Byte, and Longword Organization in One RBUF Element.................................................. 184
64 Write to TBUF ........................................................................................................................... 185
65 MSF Interface ........................................................................................................................... 186
66 Clock Configuration .................................................................................................................. 189
67 IXP2800 Clocking for RDRAM at 400 MHz ..............................................................................190
68 IXP2800 Clocking for RDRAM at 508 MHz ..............................................................................190
69 Address Mapping Flow .............................................................................................................191
70 RDRAM Controller Block Diagram............................................................................................198
71 DRAM Push/Pull Arbiter Functional Blocks ..............................................................................202
72 DRAM Push Arbiter Functional Blocks .....................................................................................204
73 DRAM Pull Arbiter Functional Blocks .......................................................................................205
74 SRAM Controller/Chassis Block Diagram................................................................................. 208
75 SRAM Clock Connection on a Channel.................................................................................... 210
76 External Pipeline Registers Block Diagram ..............................................................................211
77 Queue Descriptor with Four Links ............................................................................................ 213
78 Enqueueing One Buffer at a Time ............................................................................................ 213
79 Previously Linked String of Buffers...........................................................................................214
80 First Step to Enqueue a String of Buffers to a Queue (ENQ_Tail_and_Link) ...........................214
81 Second Step to Enqueue a String of Buffers to a Queue (ENQ_Tail) ...................................... 214
82 Connection to a Coprocessor Through Standard QDR Interface .............................. 221
83 Coprocessor with Memory Mapped FIFO Ports ....................................................................... 222
84 SHaC Top Level Diagram......................................................................................................... 226
85 Scratchpad Block Diagram .......................................................................................................228
86 Ring Communication Logic Diagram ........................................................................................231
87 Hash Unit Block Diagram..........................................................................................................236
88 Example System Block Diagram .............................................................................................. 242
89 Full-Duplex Block Diagram .......................................................................................................243
90 Receive and Transmit Clock Generation ..................................................................................245
91 Simplified Receive Section Block Diagram ............................................................................... 247
92 RBUF Element State Diagram..................................................................................................257
93 Extent of DIP-4 Codes .............................................................................................................. 260
94 Simplified Transmit Section Block Diagram.............................................................................. 262
95 TBUF State Diagram ................................................................................................................ 270
96 Tx Calendar Block Diagram...................................................................................................... 271
97 CSIX Flow Control Interface — TXCSRB and RXCSRB .......................................................... 275
98 CSIX Flow Control Interface — FCIFIFO and FCEFIFO in Full Duplex Mode ......................... 277
99 CSIX Flow Control Interface — FCIFIFO and FCEFIFO in Simplex Mode .............................. 278
100 MSF to Command and Push and Pull Buses Interface Block Diagram .................................... 290
101 Basic I/O Capability of the Intel® IXP2800 Network Processor................................................ 292
102 Simplex Configuration .............................................................................................................. 293
103 Hybrid Simplex Configuration ................................................................................................... 294
104 Dual Network Processor, Full Duplex Configuration ................................................................ 295
105 Single Network Processor, Full Duplex Configuration (SPI-4.2 Protocol) ................................ 296
106 Single Network Processor, Full Duplex Configuration (SPI-4.2 and CSIX-L1 Protocols) ......... 297
107 Framer, Single Network Processor Ingress and Egress, and Fabric Interface Chip ................ 298
108 Framer, Dual Processor Ingress, Single Processor Egress, and Fabric Interface Chip ........... 298
109 Framer, Single Network Processor Ingress, Single Network Processor Egress,
CSIX-L1 Translation Chip and CSIX-L1 Fabric Interface Chip................................................. 299
110 CPU Complex, Network Processor, and Fabric Interface Chips .............................................. 299
111 Framer, Single Network Processor, Co-Processor, and Fabric Interface Chip ........................ 300
112 SPI-4.2 Interface Reference Model with Receiver and Transmitter Labels
Corresponding to Link Layer Device Functions........................................................................ 301
113 CSIX-L1 Interface Reference Model with Receiver and Transmitter Labels
Corresponding to Fabric Interface Chip Functions ................................................................... 303
114 Reference Model for IXP2800 Support of the Simplex Configuration Using
Independent Ingress and Egress Interfaces............................................................................. 306
115 Reference Model for Hybrid Simplex Operation ....................................................................... 307
116 Block Diagram of Dual Protocol (SPI-4.2 and CSIX-L1) Bridge Chip ....................................... 313
117 Summary of Receiver and Transmitter Signaling ..................................................................... 317
118 PCI Functional Blocks .............................................................................................................. 320
119 Data Access Paths ................................................................................................................... 321
120 PCI Arbiter Configuration Using CFG_PCI_ARB(GPIO[2]) ...................................................... 331
121 Example of Target Write to SRAM of 68 Bytes ........................................................................ 333
122 Example of Target Write to DRAM of 68 Bytes ........................................................................ 335
123 Example of Target Read from DRAM Using 64-Byte Burst ...................................................... 336
124 Generation of the Doorbell Interrupts to PCI ............................................................................ 337
125 Generation of the Doorbell Interrupts to the Intel XScale® Core.............................................. 338
126 PCI Interrupts ........................................................................................................................... 339
127 DMA Descriptor Reads............................................................................................................. 342
128 PCI Address Generation for Command Bus Master to PCI...................................................... 346
129 PCI Address Generation for Command Bus Master to PCI Configuration Cycle ..................... 347
130 Overall Clock Generation and Distribution ............................................................................... 360
131 IXP2800 Network Processor Clock Generation........................................................................ 363
132 Synchronization Between Frequency Domains........................................................................ 364
133 Reset Out Behavior .................................................................................................................. 365
134 Reset Generation ..................................................................................................................... 366
135 Boot Process ............................................................................................................................ 371
136 Performance Monitor Interface Block Diagram......................................................................... 376
137 Block Diagram of a Single CHAP Counter ............................................................................... 378
138 Basic Block Diagram of IXP2800 Network Processor with PMU .............................................. 379
139 CAP Interface to the APB ......................................................................................................... 380
140 Conceptual Diagram of Counter Array ..................................................................................... 382
Tables
1 Data Terminology ....................................................................................................................... 26
2 Longword Formats......................................................................................................................26
3 IXP2800 Network Processor Microengine Bus Arrangement .....................................................35
4 Next Neighbor Write as a Function of CTX_ENABLE[NN_MODE] ............................................38
5 Registers Used By Contexts in Context-Relative Addressing Mode .......................................... 41
6 Align Value and Shift Amount .....................................................................................................43
7 Register Contents for Example 10.............................................................................................. 44
8 Register Contents for Example 11.............................................................................................. 45
9 RDRAM Sizes............................................................................................................................. 50
10 SRAM Controller Configurations................................................................................................. 52
11 Total Memory per Channel .........................................................................................................53
12 Address Reference Order........................................................................................................... 55
13 Q_array Entry Reference Order.................................................................................................. 55
14 Ring Full Signal Use – Number of Contexts and Length versus Ring Size ................................58
15 TBUF SPI-4 Control Definition....................................................................................................66
16 TBUF CSIX Control Definition ....................................................................................................67
17 DMA Descriptor Format ..............................................................................................................72
18 Doorbell Interrupt Registers........................................................................................................ 75
19 I/O Latency ................................................................................................................................. 78
20 Data Cache and Buffer Behavior when X = 0............................................................................. 83
21 Data Cache and Buffer Behavior when X = 1............................................................................. 83
22 Memory Operations that Impose a Fence ..................................................................................84
23 Valid MMU and Data/Mini-Data Cache Combinations................................................................85
24 Performance Monitoring Events ............................................................................................... 107
25 Some Common Uses of the PMU............................................................................................. 108
26 Branch Latency Penalty............................................................................................................ 112
27 Latency Example ...................................................................................................................... 114
28 Branch Instruction Timings (Predicted by the BTB)..................................................................115
29 Branch Instruction Timings (Not Predicted by the BTB) ........................................................... 115
30 Data Processing Instruction Timings ........................................................................................ 115
31 Multiply Instruction Timings ......................................................................................................116
32 Multiply Implicit Accumulate Instruction Timings ...................................................................... 117
33 Implicit Accumulator Access Instruction Timings......................................................................117
34 Saturated Data Processing Instruction Timings........................................................................ 117
35 Status Register Access Instruction Timings .............................................................................118
36 Load and Store Instruction Timings ..........................................................................................118
37 Load and Store Multiple Instruction Timings............................................................................. 118
38 Semaphore Instruction Timings ................................................................................................118
39 CP15 Register Access Instruction Timings............................................................................... 119
40 CP14 Register Access Instruction Timings............................................................................... 119
41 SWI Instruction Timings............................................................................................................ 119
42 Count Leading Zeros Instruction Timings ................................................................................. 119
43 Little-Endian Encoding.............................................................................................................. 120
44 Big-Endian Encoding ................................................................................................................120
45 Byte-Enable Generation by the Intel XScale® Core for Byte Transfers in Little- and
Big-Endian Systems .................................................................121
46 Byte-Enable Generation by the Intel XScale® Core for 16-Bit Data Transfers in Little-
and Big-Endian Systems ..........................................................123
Hardware Reference Manual 19
Contents
47 Byte-Enable Generation by the Intel XScale® Core for Byte Writes in Little- and
Big-Endian Systems ................................................................................................................. 123
48 Byte-Enable Generation by the Intel XScale® Core for Word Writes in Little- and
Big-Endian Systems ................................................................................................................. 124
49 CMB Write Command to CPP Command Conversion ............................................................. 127
50 IXP2800 Network Processor SRAM Q-Array Access Alias Addresses .................................... 129
51 GCSR Address Map (0xd700 0000)......................................................................................... 131
52 Data Transaction Alignment .....................................................................................................136
53 Address Spaces for XPI Internal Devices................................................................................. 136
54 8-Bit Flash Memory Device Density ......................................................................................... 143
55 SONET/SDH Devices ............................................................................................................... 143
56 Next Neighbor Write as a Function of CTX_Enable[NN_Mode] ............................................... 172
57 Registers Used by Contexts in Context-Relative Addressing Mode......................................... 173
58 Align Value and Shift Amount ................................................................................................... 174
59 Register Contents for Example 23............................................................................................ 175
60 Register Contents for Example 24............................................................................................ 176
61 RDRAM Loading....................................................................................................................... 188
62 RDRAM Sizes........................................................................................................................... 188
63 Address Rearrangement for 3-Way Interleave (Sheet 1 of 2) .................................................. 192
64 Address Rearrangement for 3-Way Interleave (Sheet 2 of 2) (Rev B) .................................... 193
65 Address Bank Interleaving........................................................................................................ 194
66 RDRAM Timing Parameter Settings ......................................................................................... 196
67 Ordering of Reads and Writes to the Same Address for DRAM............................................... 201
68 DRAM Push Arbiter Operation ................................................................................................. 203
69 DPLA Description ..................................................................................................................... 204
70 SRAM Controller Configurations .............................................................................................. 209
71 Total Memory per Channel ....................................................................................................... 210
72 Atomic Operations .................................................................................................................... 212
73 Queue Format .......................................................................................................................... 215
74 Ring/Journal Format ................................................................................................................. 216
75 Ring Size Encoding .................................................................................................................. 216
76 Address Map ............................................................................................................................ 218
77 Address Reference Order......................................................................................................... 219
78 Q_array Entry Reference Order ............................................................................................... 220
79 Ring Full Signal Use – Number of Contexts and Length versus Ring Size .............................. 232
80 Head/Tail, Base, and Full Threshold – by Ring Size ................................................................ 233
81 Intel XScale® Core and Microengine Instructions .................................................................... 235
82 S_Transfer Registers Hash Operands ..................................................................................... 237
83 SPI-4 Control Word Format ......................................................................................................244
84 Order of Bytes within the SPI-4 Data Burst .............................................................................. 245
85 CFrame Types.......................................................................................................................... 246
86 Receive Pins Usage by Protocol .............................................................................................. 248
87 Order in which Received Data Is Stored in RBUF .................................................................... 248
88 Mapping of Received Data to RBUF Partitions ........................................................................ 249
89 Number of Elements per RBUF Partition.................................................................................. 249
90 RBUF SPI-4 Status Definition ................................................................ 252
91 RBUF CSIX Status Definition ...................................................................................................254
92 Rx_Thread_Freelist Use........................................................................................................... 255
93 Summary of SPI-4 and CSIX RBUF Operations ...................................................................... 258
94 Transmit Pins Usage by Protocol ............................................................................................. 262
95 Order in which Data is Transmitted from TBUF ........................................................................ 263
96 Mapping of TBUF Partitions to Transmit Protocol ....................................................................263
97 Number of Elements per TBUF Partition .................................................................................. 264
98 TBUF SPI-4 Control Definition..................................................................................................266
99 TBUF CSIX Control Definition ..................................................................................................267
100 Transmit SPI-4 Control Word....................................................................................................268
101 Transmit CSIX Header..............................................................................................................269
102 Summary of RBUF and TBUF Operations................................................................................ 274
103 SRB Definition by Clock Phase Number...................................................................................276
104 Data Deskew Functions............................................................................................................281
105 Calendar Deskew Functions.....................................................................................................281
106 Flow Control Deskew Functions ............................................................................................... 281
107 Data Training Sequence ........................................................................................................... 282
108 Flow Control Training Sequence ..............................................................................................282
109 Calendar Training Sequence ....................................................................................................283
110 IXP2800 Network Processor Requires Data Training...............................................................284
111 Switch Fabric or SPI-4 Framer Requires Data Training ...........................................................285
112 IXP2800 Network Processor Requires Flow Control Training ..................................................286
113 Switch Fabric Requires Flow Control Training..........................................................................286
114 SPI-4.2 Transmitter State Machine Transitions on 16-Bit Bus Transfers .................................314
115 Training Transmitter State Machine Transitions on 16-Bit Bus Transfers ................................315
116 CSIX-L1 Transmitter State Machine Transitions on CWord Boundaries ..................................315
117 PCI Block FIFO Sizes ...............................................................................................................322
118 Maximum Loading .................................................................................................................... 322
119 PCI Commands ........................................................................................................................ 322
120 PCI BAR Programmable Sizes ................................................................................................. 324
121 PCI BAR Sizes with PCI Host Initialization ...............................................................................324
122 Legal Combinations of the Strap Pin Options...........................................................................330
123 Slave Interface Buffer Sizes ..................................................................................................... 332
124 Doorbell Interrupt Registers...................................................................................................... 337
125 IRQ Interrupt Options by Stepping............................................................................................339
126 DMA Descriptor Format ............................................................................................................ 342
127 PCI Maximum Burst Size..........................................................................................................345
128 Command Bus Master Configuration Transactions ..................................................................347
129 Command Bus Master Address Space Map to PCI.................................................................. 347
130 Byte Lane Alignment for 64-Bit PCI Data In (64 Bits PCI Little-Endian to Big-Endian
with Swap) ................................................................................................................................ 353
131 Byte Lane Alignment for 64-Bit PCI Data In (64 Bits PCI Big-Endian to Big-Endian
without Swap) ........................................................................................................................... 353
132 Byte Lane Alignment for 32-Bit PCI Data In (32 Bits PCI Little-Endian to Big-Endian
with Swap) ................................................................................................................................ 353
133 Byte Lane Alignment for 32-Bit PCI Data In (32 Bits PCI Big-Endian to Big-Endian
without Swap) ........................................................................................................................... 353
134 Byte Lane Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Little
Endian with Swap) .................................................................................................................... 354
135 Byte Lane Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Big-Endian
without Swap) ........................................................................................................................... 354
136 Byte Lane Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Little
Endian with Swap) .................................................................................................................... 354
137 Byte Lane Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Big-Endian
without Swap) ........................................................................................................................... 354
138 Byte Enable Alignment for 64-Bit PCI Data In (64 Bits PCI Little-Endian to Big-
Endian with Swap).................................................................................................................... 355
139 Byte Enable Alignment for 64-Bit PCI Data In (64 Bits PCI Big-Endian to Big-Endian
without Swap) ........................................................................................................................... 355
140 Byte Enable Alignment for 32-Bit PCI Data In (32 bits PCI Little-Endian to Big-
Endian with Swap).................................................................................................................... 355
141 Byte Enable Alignment for 32-Bit PCI Data In (32 Bits PCI Big-Endian to Big-Endian
without Swap) ........................................................................................................................... 356
142 Byte Enable Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Little
Endian with Swap).................................................................................................................... 356
143 Byte Enable Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Big
Endian without Swap)............................................................................................................... 356
144 Byte Enable Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Little
Endian with Swap).................................................................................................................... 356
145 Byte Enable Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Big
Endian without Swap)............................................................................................................... 357
146 PCI I/O Cycles with Data Swap Enable .................................................................................... 358
147 Clock Usage Summary............................................................................................................. 360
148 Clock Rates Examples ............................................................................................................. 362
149 IXP2800 Network Processor Strap Pins................................................................................... 369
150 Supported Strap Combinations ................................................................................................ 370
151 APB Usage ............................................................................................................................... 381
152 Hardware Blocks and Their Performance Measurement Events.............................................. 383
153 PMU Design Unit Selection ...................................................................................................... 386
154 Chap Counter Threshold Events (Design Block # 0001).......................................................... 388
155 XPI PMU Event List .................................................................................................................. 389
156 SHaC PMU Event List .............................................................................................................. 393
157 IXP2800 Network Processor MSF PMU Event List .................................................................. 396
158 Intel XScale® Core Gasket PMU Event List............................................................................. 402
159 PCI PMU Event List.................................................................................................................. 405
160 ME00 PMU Event List .............................................................................................................. 409
161 ME01 PMU Event List .............................................................................................................. 410
162 ME02 PMU Event List .............................................................................................................. 411
163 ME03 PMU Event List .............................................................................................................. 411
164 ME04 PMU Event List .............................................................................................................. 412
165 ME05 PMU Event List .............................................................................................................. 412
166 ME06 PMU Event List .............................................................................................................. 413
167 ME07 PMU Event List .............................................................................................................. 413
168 ME10 PMU Event List .............................................................................................................. 414
169 ME11 PMU Event List .............................................................................................................. 414
170 ME12 PMU Event List .............................................................................................................. 415
171 ME13 PMU Event List .............................................................................................................. 415
172 ME14 PMU Event List .............................................................................................................. 416
173 ME15 PMU Event List .............................................................................................................. 416
174 ME16 PMU Event List .............................................................................................................. 417
175 ME17 PMU Event List .............................................................................................................. 417
176 SRAM DP1 PMU Event List ..................................................................................................... 418
177 SRAM DP0 PMU Event List ..................................................................................................... 418
178 SRAM CH3 PMU Event List .....................................................420
179 SRAM CH2 PMU Event List .....................................................421
180 SRAM CH1 PMU Event List .....................................................421
181 SRAM CH0 PMU Event List .....................................................................................................422
182 IXP2800 Network Processor Dram DPLA PMU Event List....................................................... 423
183 IXP2800 Network Processor Dram DPSA PMU Event List ......................................................424
184 IXP2800 Network Processor Dram CH2 PMU Event List.........................................................425
185 IXP2800 Network Processor Dram CH1 PMU Event List.........................................................429
186 IXP2800 Network Processor Dram CH0 PMU Event List.........................................................429

Introduction 1

1.1 About This Document

This document is the hardware reference manual for the Intel® IXP2800 Network Processor. This information is intended for use by developers and is organized as follows:
Section 2, “Technical Description” contains a hardware overview.
Section 3, “Intel XScale® Core” describes the embedded core.
Section 4, “Microengines” describes Microengine operation.
Section 5, “DRAM” describes the DRAM Unit.
Section 6, “SRAM Interface” describes the SRAM Unit.
Section 7, “SHaC — Unit Expansion” describes the Scratchpad, Hash Unit, and CSRs (SHaC).
Section 8, “Media and Switch Fabric Interface” describes the Media and Switch Fabric (MSF) Interface used to connect the network processor to a physical layer device.
Section 9, “PCI Unit” describes the PCI Unit.
Section 10, “Clocks and Reset” describes the clocks, reset, and initialization sequence.
Section 11, “Performance Monitor Unit” describes the PMU.

1.2 Related Documentation

Further information on the IXP2800 is available in the following documents:
IXP2800 Network Processor Datasheet – Contains summary information on the IXP2800 Network Processor including a functional description, signal descriptions, electrical specifications, and mechanical specifications.
IXP2400 and IXP2800 Network Processor Programmer’s Reference Manual – Contains detailed programming information for designers.
IXP2400/IXP2800 Network Processor Development Tools User’s Guide – Describes the Developer Workbench and the development tools you can access through the use of the Workbench GUI.

1.3 Terminology

Table 1 and Table 2 list the terminology used in this manual.
Table 1. Data Terminology
Term Words Bytes Bits
Byte ½ 1 8
Word 1 2 16
Longword 2 4 32
Quadword 4 8 64
Table 2. Longword Formats
Endian Type      32-Bit                                      64-Bit
Little-Endian    (0x12345678) arranged as {78 56 34 12}      64-bit data 0x12345678 9ABCDE56 arranged as {78 56 34 12, 56 DE BC 9A}
Big-Endian       (0x12345678) arranged as {12 34 56 78}      64-bit data 0x12345678 9ABCDE56 arranged as {12 34 56 78 9A BC DE 56}

Technical Description 2

2.1 Overview

This section provides a brief overview of the IXP2800 Network Processor internal hardware. This section is intended as an overall hardware introduction to the network processor.
The major blocks are:
Intel XScale® core — General purpose 32-bit RISC processor (ARM* Version 5 Architecture compliant) used to initialize and manage the network processor; it can also be used for higher layer network processing tasks.
Intel XScale® technology Peripherals (XPI) — Interrupt Controller, Timers, UART, General Purpose I/O (GPIO), and interface to low-speed off-chip peripherals (such as the maintenance port of network devices) and Flash ROM.
Microengines (MEs) — Sixteen 32-bit programmable engines specialized for network processing. Microengines do the main data plane processing per packet.
DRAM Controllers — Three independent controllers for Rambus* DRAM. Typically, DRAM is used for data buffer storage.
SRAM Controllers — Four independent controllers for QDR SRAM. Typically, SRAM is used for control information storage.
Scratchpad Memory — 16 Kbytes of storage for general purpose use.
Hash Unit — Polynomial hash accelerator. The Intel XScale® core and Microengines can use it to offload hash calculations.
Control and Status Register Access Proxy (CAP) — Provides special inter-processor communication features that allow flexible and efficient inter-Microengine and Microengine to Intel XScale® core communication.
Media and Switch Fabric Interface (MSF) — Interface for network framers and/or switch fabric. Contains the receive and transmit buffers.
PCI Controller — PCI Local Bus Specification, Version 2.2* interface for 64-bit, 66-MHz I/O. PCI can be used either to connect to a host processor or to attach PCI-compliant peripheral devices.
Performance Monitor — Counters that can be programmed to count selected internal chip hardware events, which can be used to analyze and tune performance.
Figure 1 is a simple block diagram of the network processor showing the major internal hardware
blocks. Figure 2 is a detailed diagram of the network processor units and buses.
Figure 1. IXP2800 Network Processor Functional Block Diagram
[Figure: block diagram showing the Media and Switch Fabric (MSF) interface, Hash Unit, Scratchpad Memory, PCI Controller, CAP, SRAM Controllers 0–3, DRAM Controllers 0–2, ME Cluster 0 (MEs 0x0–0x7), ME Cluster 1 (MEs 0x10–0x17), the Intel XScale® core, the Intel XScale® technology peripherals (XPI), and the Performance Monitor.]
Figure 2. IXP2800 Network Processor Detailed Diagram
[Figure: detailed diagram of the chassis interconnect — command buses Cmd_0/Cmd_1 with Command Bus Arbiters 0 and 1, the S_Push/S_Pull buses (with their push and pull arbiters) and the D_Push/D_Pull buses (with their arbiters), connecting ME Cluster 0 (MEs 0x0–0x7) and ME Cluster 1 (MEs 0x10–0x17) through their S and D transfer registers to the SRAM Controllers, the DRAM Controllers, the SHaC Unit (Scratchpad, Hash, CAP), the Media Controller (RBUF, TBUF, and CSRs), the PCI Controller (master/target CSRs, PCI configuration space, DMA transfers), and the Intel XScale® core via its gasket. The SRAM and DRAM controllers attach to external SRAM and DRAM devices; the Media Controller attaches to an SPI-4/CSIX device and the PCI Controller to PCI devices.]

2.2 Intel XScale® Core Microarchitecture

The Intel XScale® microarchitecture consists of a 32-bit general purpose RISC processor that incorporates an extensive list of architecture features that allow it to achieve high performance.

2.2.1 ARM* Compatibility

The Intel XScale® microarchitecture is ARM* Version 5 (V5) Architecture compliant. It implements the integer instruction set of ARM* V5, but does not provide hardware support of the floating point instructions.
The Intel XScale® microarchitecture provides the Thumb instruction set (ARM V5T) and the ARM V5E DSP extensions.
Backward compatibility with the first generation of StrongARM* products is maintained for user-mode applications. Operating systems may require modifications to match the specific hardware features of the Intel XScale® microarchitecture and to take advantage of the performance enhancements added to the Intel XScale® core.

2.2.2 Features

2.2.2.1 Multiply/Accumulate (MAC)
The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a throughput of a MAC operation every cycle. Several architectural enhancements were made to the MAC to support audio coding algorithms, which include a 40-bit accumulator and support for 16-bit packed values.
2.2.2.2 Memory Management
The Intel XScale® microarchitecture implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. The MMU provides access protection and virtual to physical address translation.
The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:
identifying code as cacheable or non-cacheable
selecting between the mini-data cache or data cache
write-back or write-through data caching
enabling data write allocation policy
enabling the write buffer to coalesce stores to external memory
2.2.2.3 Instruction Cache
The Intel XScale® microarchitecture implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided.
2.2.2.4 Branch Target Buffer
The Intel XScale® microarchitecture provides a Branch Target Buffer (BTB) to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.
The BTB holds 128 entries.
2.2.2.5 Data Cache
The Intel XScale® microarchitecture implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache. Each cache has a line size of 32 bytes, and supports write-through or write-back caching.
The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15.
The Intel XScale® microarchitecture allows applications to reconfigure a portion of the data cache as data RAM. Software may place special tables or frequently used variables in this RAM.
2.2.2.6 Interrupt Controller
The Intel XScale® microarchitecture provides two levels of interrupt, IRQ and FIQ. They can be masked via coprocessor 13. Note that there is also a memory-mapped interrupt controller, described with the Intel XScale® technology peripherals (see Section 3.12), which is used to mask and steer many chip-wide interrupt sources.
2.2.2.7 Address Map
Figure 3 shows the partitioning of the Intel XScale® core microarchitecture 4-Gbyte address space.
Figure 3. Intel XScale® Core 4-GB (32-Bit) Address Space

[Figure: the 4-Gbyte address space is divided into four major regions — DRAM and Intel XScale® core FLASH ROM (2 Gbytes, 0x0000 0000 – 0x7FFF FFFF); SRAM (1 Gbyte, 0x8000 0000 – 0xBFFF FFFF); an “Other” region (1/2 Gbyte, 0xC000 0000 – 0xDFFF FFFF) containing 32-Mbyte blocks for the SRAM CSRs and Queues, SRAM Rings, DRAM CSRs, Scratchpad, MSF, CAP-CSRs, Intel XScale® Core CSRs, PCI I/O, PCI CFG, PCI Spec/IACK, PCI Config registers, and PCI Local CSRs, plus a 64-Mbyte FLASH ROM window and reserved space; and PCI MEM (1/2 Gbyte, 0xE000 0000 – 0xFFFF FFFF).]

2.3 Microengines

The Microengines do most of the programmable per-packet processing in the IXP2800 Network Processor. There are 16 Microengines, connected as shown in Figure 1. The Microengines have access to all shared resources (SRAM, DRAM, MSF, etc.) as well as private connections between adjacent Microengines (referred to as “next neighbors”).
The block diagram in Figure 4 is used in the Microengine description. Note that this block diagram is simplified for clarity; some blocks and connectivity have been omitted to make the diagram more readable. Also, this block diagram does not show any pipeline stages; rather, it shows the logical flow of information.
Microengines provide support for software-controlled multi-threaded operation. Given the disparity between processor cycle times and external memory access times, a single thread of execution often blocks, waiting for external memory operations to complete. Multiple threads allow for thread-interleaved operation, as there is often at least one thread ready to run while others are blocked.
Figure 4. Microengine Block Diagram
[Figure: the Microengine datapath — the Control Store feeds instruction decode; the A_Operand and B_Operand are selected from the 128 A-bank GPRs, the 128 B-bank GPRs, the 128 Next Neighbor registers (NN_Data_In arrives from the previous ME; NN_Get indexes ring reads), 640 words of Local Memory (addressed by Lm_addr_0/Lm_addr_1), the 128 D and 128 S Transfer In registers (indexed by T_Index), immediate data, and Local CSRs; the Execution Datapath (shift, add, subtract, multiply, logicals, find first bit, CAM) and the CRC Unit (with CRC_Remainder) produce ALU_Out, which can be written to GPRs, Local Memory, NN_Data_Out (to the next ME), or the 128 D and 128 S Transfer Out registers; commands are issued through a 4-entry CMD FIFO; data arrives on the D_Push bus (from DRAM) and the S_Push bus (from SRAM, Scratchpad, MSF, Hash, PCI, CAP) and leaves on the D_Pull and S_Pull buses. (B1670-01)]

2.3.1 Microengine Bus Arrangement

The IXP2800 Network Processor supports a single D_Push/D_Pull bus, and both Microengine clusters interface to the same bus. It also supports two command buses and two sets of S_Push/S_Pull buses, connected as shown in Table 3, which also shows the next-neighbor relationships between the Microengines.
Table 3. IXP2800 Network Processor Microengine Bus Arrangement
Cluster   Microengine   Next       Previous   Command   S_Push and
          Number        Neighbor   Neighbor   Bus       S_Pull Bus
0         0x00          0x01       NA         0         0
0         0x01          0x02       0x00       0         0
0         0x02          0x03       0x01       0         0
0         0x03          0x04       0x02       0         0
0         0x04          0x05       0x03       0         0
0         0x05          0x06       0x04       0         0
0         0x06          0x07       0x05       0         0
0         0x07          0x10       0x06       0         0
1         0x10          0x11       0x07       1         1
1         0x11          0x12       0x10       1         1
1         0x12          0x13       0x11       1         1
1         0x13          0x14       0x12       1         1
1         0x14          0x15       0x13       1         1
1         0x15          0x16       0x14       1         1
1         0x16          0x17       0x15       1         1
1         0x17          NA         0x16       1         1

2.3.2 Control Store

The Control Store is a RAM that holds the program that is executed by the Microengine. It holds 8192 instructions, each of which is 40 bits wide. It is initialized by the Intel XScale® core, which writes to the USTORE_ADDR and USTORE_DATA Local CSRs.
The Control Store is protected by parity against soft errors. Parity checking is enabled by CTX_ENABLE[CONTROL STORE PARITY ENABLE]. A parity error on an instruction read will halt the Microengine and assert an interrupt to the Intel XScale® core.
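Parity protection of this kind can be illustrated with a small software model. This is only a sketch: the function names are illustrative, and even parity is an assumption of the model, not a detail taken from the manual.

```python
def parity40(instr):
    # Even-parity bit over a 40-bit Control Store instruction word
    # (polarity is an assumption of this sketch).
    return bin(instr & ((1 << 40) - 1)).count("1") & 1

def read_is_clean(instr, stored_parity):
    # An instruction read flags a soft error when the recomputed
    # parity disagrees with the parity stored alongside the word.
    return parity40(instr) == stored_parity
```

A single flipped bit in either the instruction or the stored parity bit makes the two disagree, which is what triggers the Microengine halt described above.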

2.3.3 Contexts

There are eight hardware Contexts available in the Microengine. To allow for efficient context swapping, each Context has its own register set, Program Counter, and Context-specific Local registers. Having a copy per Context eliminates the need to move Context-specific information to/from shared memory and Microengine registers for each Context swap. Fast context swapping allows a Context to do computation while other Contexts wait for I/O (typically external memory accesses) to complete or for a signal from another Context or hardware unit. (A context swap is similar to a taken branch in timing.)
Each of the eight Contexts is in one of four states.
1. Inactive — Some applications may not require all eight contexts. A Context is in the Inactive state when its CTX_ENABLE CSR enable bit is a 0.
2. Executing — A Context is in Executing state when its context number is in ACTIVE_CTX_STS CSR. The executing Context’s PC is used to fetch instructions from the Control Store. A Context will stay in this state until it executes an instruction that causes it to go to Sleep state (there is no hardware interrupt or preemption; Context swapping is completely under software control). At most one Context can be in Executing state at any time.
3. Ready — In this state, a Context is ready to execute, but is not because a different Context is executing. When the Executing Context goes to the Sleep state, the Microengine’s context arbiter selects the next Context to go to the Executing state from among all the Contexts in the Ready state. The arbitration is round robin.
4. Sleep — Context is waiting for external event(s) specified in the INDIRECT_WAKEUP_EVENTS CSR to occur (typically, but not limited to, an I/O access). In this state the Context does not arbitrate to enter the Executing state.
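The round-robin selection from among Ready Contexts can be sketched in software as follows. This is a model only; the function name and the exact rotation order implemented by the hardware arbiter are assumptions of the sketch.

```python
def next_context(ready, last):
    # Round-robin arbiter sketch: scan context numbers after `last`
    # (the Context that just went to Sleep), wrapping modulo 8, and
    # return the first one in the Ready state.
    # Returns None when no Context is Ready (the Microengine idles).
    for i in range(1, 9):
        candidate = (last + i) % 8
        if ready[candidate]:
            return candidate
    return None
```

For example, if only Contexts 3 and 6 are Ready and Context 3 just went to Sleep, the arbiter selects Context 6; when 6 sleeps, it wraps back around to 3.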
The state diagram in Figure 5 illustrates the Context state transitions. Each of the eight Contexts will be in one of these states. At most one Context can be in Executing state at a time; any number of Contexts can be in any of the other states.
Figure 5. Context State Transition Diagram

[Figure: Reset puts all Contexts into the Inactive state. A Context moves from Inactive to Ready when its CTX_ENABLE bit is set by the Intel XScale® core, and back to Inactive when the bit is cleared. A Ready Context moves to Executing when the Executing Context goes to the Sleep state and this Context is the highest round-robin priority. An Executing Context moves to Sleep when it executes a CTX Arbitration instruction. A Sleeping Context moves back to Ready when the external event signal arrives, or to Inactive if its CTX_ENABLE bit is cleared. (A9352-03)]

Note: After reset, the Intel XScale® core processor must load the starting address into CTX_PC, load CTX_WAKEUP_EVENTS with 0x1 (voluntary), and then set the appropriate CTX_ENABLE bits to begin executing Context(s).

The Microengine is in Idle state whenever no Context is running (all Contexts are in either Inactive or Sleep states). This state is entered:
1. After reset (the CTX_ENABLE Local CSR is clear, putting all Contexts into the Inactive state).
2. When a context swap is executed, but no Context is ready to wake up.
3. When a ctx_arb[bpt] instruction is executed by the Microengine (this is a special case of condition 2 above, since ctx_arb[bpt] clears CTX_ENABLE, putting all Contexts into the Inactive state).
The Microengine provides the following functionality during the Idle state:
1. The Microengine continuously checks if a Context is in Ready state. If so, a new Context begins to execute. If no Context is Ready, the Microengine remains in the Idle state.
2. Only the ALU instructions are supported. They are used for debug via special hardware defined in number 3 below.
3. A write to the USTORE_ADDR Local CSR with the USTORE_ADDR[ECS] bit set causes the Microengine to repeatedly execute the instruction pointed to by the address specified in the USTORE_ADDR CSR. Only the ALU instructions are supported in this mode. Also, the result of the execution is written to the ALU_OUT Local CSR rather than to a destination register.
4. A write to the USTORE_ADDR Local CSR with the USTORE_ADDR[ECS] bit set, followed by a write to the USTORE_DATA Local CSR loads an instruction into the Control Store. After the Control Store is loaded, execution proceeds as described in number 3 above.

2.3.4 Datapath Registers

As shown in the block diagram in Figure 4, each Microengine contains four types of 32-bit datapath registers:
1. 256 General Purpose registers
2. 512 Transfer registers
3. 128 Next Neighbor registers
4. 640 32-bit words of Local Memory
2.3.4.1 General-Purpose Registers (GPRs)
GPRs are used for general programming purposes. They are read and written exclusively under program control. GPRs, when used as a source in an instruction, supply operands to the execution datapath. When used as a destination in an instruction, they are written with the result of the execution datapath. The specific GPRs selected are encoded in the instruction.
The GPRs are physically and logically contained in two banks, GPR A and GPR B, as defined in Table 5.
2.3.4.2 Transfer Registers
Transfer (abbreviated as Xfer) registers are used for transferring data between the Microengine and locations external to the Microengine (for example, DRAMs, SRAMs, etc.). There are four types of transfer registers.
S_TRANSFER_IN
S_TRANSFER_OUT
D_TRANSFER_IN
D_TRANSFER_OUT
TRANSFER_IN registers, when used as a source in an instruction, supply operands to the execution datapath. The specific register selected is either encoded in the instruction, or selected indirectly via T_INDEX. TRANSFER_IN registers are written by external units (A typical case is when the external unit returns data in response to read instructions. However, there are other
methods to write TRANSFER_IN registers, for example a read instruction executed by one Microengine may cause the data to be returned to a different Microengine. Details are covered in the instruction set descriptions).
TRANSFER_OUT registers, when used as a destination in an instruction, are written with the result from the execution datapath. The specific register selected is encoded in the instruction, or selected indirectly via T_INDEX. TRANSFER_OUT registers supply data to external units (for example, write data for an SRAM write).
The S_TRANSFER_IN and S_TRANSFER_OUT registers connect to the S_PUSH and S_PULL buses, respectively.
The D_TRANSFER_IN and D_TRANSFER_OUT Transfer registers connect to the D_PUSH and D_PULL buses, respectively.
Typically, the external units access the Transfer registers in response to instructions executed by the Microengines. However, it is possible for an external unit to access a given Microengine’s Transfer registers either autonomously, or under control of a different Microengine, the Intel XScale® core, etc. The Microengine interface signals controlling writing/reading of the TRANSFER_IN and TRANSFER_OUT registers are independent of the operation of the rest of the Microengine; therefore, the data movement does not stall or impact other instruction processing (it is the responsibility of software to synchronize usage of read data).
2.3.4.3 Next Neighbor Registers
Next Neighbor registers, when used as a source in an instruction, supply operands to the execution datapath. They are written in two different ways:
1. By an adjacent Microengine (the “Previous Neighbor”).
2. By the same Microengine they are in, as controlled by CTX_ENABLE[NN_MODE].
The specific register is selected in one of two ways:
1. Context-relative, the register number is encoded in the instruction.
2. As a Ring, selected via NN_GET and NN_PUT CSR registers.
The usage is configured in CTX_ENABLE[NN_MODE].
When CTX_ENABLE[NN_MODE] is ‘0’ — when Next Neighbor is a destination in an instruction, the result is sent out of the Microengine, to the Next Neighbor Microengine.
When CTX_ENABLE[NN_MODE] is ‘1’ — when Next Neighbor is used as a destination in an instruction, the instruction result data is written to the selected Next Neighbor register in the same Microengine. Note that there is a 5-instruction latency until the newly written data may be read. The data is not sent out of the Microengine as it would be when CTX_ENABLE[NN_MODE] is ‘0’.
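Ring usage of the Next Neighbor registers via the NN_PUT and NN_GET index registers can be modeled as below. This is a sketch only: the wrap-around behavior and the occupancy count are assumptions of the model, not details taken from the manual.

```python
class NNRing:
    # Software model of the 128-register Next Neighbor file used as a
    # ring: the previous (producer) ME writes at NN_PUT, this
    # (consumer) ME reads at NN_GET.
    SIZE = 128

    def __init__(self):
        self.regs = [0] * self.SIZE
        self.nn_put = 0   # advanced by the producer
        self.nn_get = 0   # advanced by the consumer

    def put(self, value):
        self.regs[self.nn_put] = value
        self.nn_put = (self.nn_put + 1) % self.SIZE

    def get(self):
        value = self.regs[self.nn_get]
        self.nn_get = (self.nn_get + 1) % self.SIZE
        return value

    def count(self):
        # Entries produced but not yet consumed (model assumption).
        return (self.nn_put - self.nn_get) % self.SIZE
```

The producer and consumer never touch each other’s index, which is what makes the ring usable for passing work between adjacent Microengines without locks.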
Table 4. Next Neighbor Write as a Function of CTX_ENABLE[NN_MODE]

              Where the Write Goes
NN_MODE       External?    NN Register in this Microengine?
0             Yes          No
1             No           Yes
2.3.4.4 Local Memory
Local Memory is addressable storage within the Microengine. Local Memory is read and written exclusively under program control. Local Memory supplies operands to the execution datapath as a source, and receives results as a destination. The specific Local Memory location selected is based on the value in one of the LM_ADDR registers, which are written by local_csr_wr instructions. There are two LM_ADDR registers per Context and a working copy of each. When a Context goes to the Sleep state, the values of the working copies are put into the Context’s copies of LM_ADDR. When the Context goes to the Executing state, the values in its copies of LM_ADDR are put into the working copies. The choice of LM_ADDR_0 or LM_ADDR_1 is selected in the instruction.
It is also possible to use one or both LM_ADDR registers as global by setting CTX_ENABLE[LM_ADDR_0_GLOBAL] and/or CTX_ENABLE[LM_ADDR_1_GLOBAL]. When used globally, all Contexts use the working copy of LM_ADDR in place of their own Context-specific one; the Context-specific ones are unused. There is a three-instruction latency when writing a new value to LM_ADDR, as shown in Example 1.
Example 1. Three-Cycle Latency when Writing a New Value to LM_ADDR
;some instruction to compute the address into gpr_m
local_csr_wr[INDIRECT_LM_ADDR_0, gpr_m] ; put gpr_m into lm_addr
;unrelated instruction 1
;unrelated instruction 2
;unrelated instruction 3
alu[dest_reg, *l$index0, op, src_reg]
;dest_reg can be used as a source in next instruction
LM_ADDR can also be incremented or decremented in parallel with use as a source and/or destination (using the notation *l$index#++ and *l$index#--), as shown in Example 2, where three consecutive Local Memory locations are used in three consecutive instructions.
Example 2. Using LM_ADDR in Consecutive Instructions
alu[dest_reg1, src_reg1, op, *l$index0++]
alu[dest_reg2, src_reg2, op, *l$index0++]
alu[dest_reg3, src_reg3, op, *l$index0++]
Local Memory is written by selecting it as a destination. Example 3 shows copying a section of Local Memory to another section. Each instruction accesses the next sequential Local Memory location from the previous instruction.
Example 3. Copying One Section of Local Memory to Another Section
alu[*l$index1++, --, B, *l$index0++]
alu[*l$index1++, --, B, *l$index0++]
alu[*l$index1++, --, B, *l$index0++]
Example 4 shows loading and using both Local Memory addresses.
Example 4. Loading and Using Both Local Memory Addresses
local_csr_wr[INDIRECT_LM_ADDR_0, gpr_m]
local_csr_wr[INDIRECT_LM_ADDR_1, gpr_n]
;unrelated instruction 1
;unrelated instruction 2
alu[dest_reg1, *l$index0, op, src_reg1]
alu[dest_reg2, *l$index1, op, src_reg2]
As shown in Example 1, there is a latency in loading LM_ADDR. Until the new value is loaded, the old value is still usable. Example 5 shows the maximum pipelined usage of LM_ADDR.
Example 5. Maximum Pipelined Usage of LM_ADDR
local_csr_wr[INDIRECT_LM_ADDR_0, gpr_m]
local_csr_wr[INDIRECT_LM_ADDR_0, gpr_n]
local_csr_wr[INDIRECT_LM_ADDR_0, gpr_o]
local_csr_wr[INDIRECT_LM_ADDR_0, gpr_p]
alu[dest_reg1, *l$index0, op, src_reg1] ; uses address from gpr_m
alu[dest_reg2, *l$index0, op, src_reg2] ; uses address from gpr_n
alu[dest_reg3, *l$index0, op, src_reg3] ; uses address from gpr_o
alu[dest_reg4, *l$index0, op, src_reg4] ; uses address from gpr_p
LM_ADDR can also be used as the base of a 16 32-bit word region of memory, with the instruction specifying the offset from that base, as shown in Example 6. The source and destination can use different offsets.
Example 6. LM_ADDR Used as Base of a 16 32-Bit Word Region of Local Memory
alu[*l$index0[3], *l$index0[4], +, 1]
Note: Local Memory has 640 32-bit words. The local memory pointers (LM_ADDR) have an addressing range of up to 1K longwords. However, only 640 longwords are currently populated with RAM. Therefore:
0 – 639 (0x0 – 0x27F) are addressable as local memory.
640 – 1023 (0x280 – 0x3FF) are addressable, but not populated with RAM.
To the programmer, all instructions using Local Memory act as follows, including read/modify/write instructions like immed_w0, ld_field, etc.
1. Read LM_ADDR location (if LM_ADDR is specified as source).
2. Execute logic function.
3. Write LM_ADDR location (if LM_ADDR is specified as destination).
4. If specified, increment or decrement LM_ADDR.
5. Proceed to next instruction.
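The five steps above can be modeled in software, here replaying Example 8 (write the inverse of a GPR through a post-incremented LM_ADDR, then read one location past it). The function and variable names are illustrative, not hardware interfaces.

```python
def lm_instruction(mem, lm_addr, fn, use_as_src=True, use_as_dst=True, post=0):
    # Model of the five steps:
    #   (1) read the LM_ADDR location if Local Memory is a source,
    #   (2) execute the logic function,
    #   (3) write the LM_ADDR location if it is a destination,
    #   (4) post-increment/decrement LM_ADDR (`post` is +1, -1, or 0),
    #   (5) proceed (return the result and the updated LM_ADDR).
    operand = mem[lm_addr] if use_as_src else None       # step 1
    result = fn(operand) & 0xFFFFFFFF                    # step 2
    if use_as_dst:
        mem[lm_addr] = result                            # step 3
    return result, lm_addr + post                        # steps 4, 5

# Replaying Example 8: alu[*l$index0++, --, ~B, gpr_n] then
# alu[gpr_m, --, ~B, *l$index0].
mem = {0: 0, 1: 0x0000AAAA}
gpr_n = 0x0F
_, addr = lm_instruction(mem, 0, lambda _: ~gpr_n, use_as_src=False, post=1)
gpr_m, _ = lm_instruction(mem, addr, lambda x: ~x, use_as_dst=False)
```

After the first instruction, location 0 holds the inverted gpr_n and LM_ADDR has advanced, so the second instruction reads location 1, exactly as the text describes for Example 8.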
Example 7 is legal because lm_addr_0[2] does not post-modify LM_ADDR.
Example 7. LM_ADDR Use as Source and Destination
alu[*l$index0[2], --, ~B, *l$index0]
In Example 7, the programmer sees:
1. Read the Local Memory location pointed to by LM_ADDR.
2. Invert the data.
3. Write the data into the address pointed to by LM_ADDR with the value of 2 that is OR’ed into the lower bits.
4. Increment LM_ADDR.
5. Proceed to next instruction.
In Example 8, the second instruction will access the Local Memory location one past the source/destination of the first.
Example 8. LM_ADDR Post-Increment
alu[*l$index0++, --, ~B, gpr_n]
alu[gpr_m, --, ~B, *l$index0]

2.3.5 Addressing Modes

GPRs can be accessed in either a context-relative or an absolute addressing mode. Some instructions can specify either mode; other instructions can specify only Context-Relative mode.
Transfer and Next Neighbor registers can be accessed in Context-Relative and Indexed modes, and Local Memory is accessed in Indexed mode. The addressing mode in use is encoded directly into each instruction, for each source and destination specifier.
2.3.5.1 Context-Relative Addressing Mode
The GPRs are logically subdivided into equal regions such that each Context has relative access to one of the regions. The number of regions is configured in the CTX_ENABLE CSR, and can be either 4 or 8. Thus a Context-Relative register number is actually associated with multiple different physical registers. The actual register to be accessed is determined by the Context making the access request (the Context number is concatenated with the register number specified in the instruction). Context-Relative addressing is a powerful feature that enables eight (or four) different contexts to share the same code image, yet maintain separate data.
Table 5 shows how the Context number is used in selecting the register number in relative mode.
The register number in Table 5 is the Absolute GPR address, or Transfer or Next Neighbor Index number to use to access the specific Context-Relative register. For example, with eight active Contexts, Context-Relative Register 0 for Context 2 is Absolute Register Number 32.
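The concatenation of Context number and register number can be sketched as follows. The function name is illustrative; the mapping reproduces Table 5 (with 8 Contexts, each Context owns 16 registers per bank; with 4 Contexts, only even Context numbers are active and each owns 32).

```python
def absolute_gpr(context, rel_reg, active_contexts=8):
    # Context-Relative to Absolute register mapping (per Table 5):
    # the Context number is concatenated above the register number
    # encoded in the instruction.
    if active_contexts == 8:
        assert 0 <= rel_reg < 16
        return (context << 4) | rel_reg
    else:
        # 4 Contexts: only even context numbers (0, 2, 4, 6) are active.
        assert 0 <= rel_reg < 32 and context % 2 == 0
        return ((context >> 1) << 5) | rel_reg
```

This reproduces the worked example in the text: with eight active Contexts, Context-Relative Register 0 for Context 2 maps to Absolute Register Number 32.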
Table 5. Registers Used By Contexts in Context-Relative Addressing Mode

8 active Contexts (the instruction always specifies registers in the range 0 – 15):

Active Context Number    Absolute Register Numbers
0                        0 – 15
1                        16 – 31
2                        32 – 47
3                        48 – 63
4                        64 – 79
5                        80 – 95
6                        96 – 111
7                        112 – 127

4 active Contexts (the instruction always specifies registers in the range 0 – 31):

Active Context Number    Absolute Register Numbers
0                        0 – 31
2                        32 – 63
4                        64 – 95
6                        96 – 127

NOTE: The same ranges apply to each register set: the GPR A and B ports, the S_Transfer or Neighbor Index Numbers, and the D_Transfer Index Numbers.
2.3.5.2 Absolute Addressing Mode
With Absolute addressing, any GPR can be read or written by any of the eight Contexts in a Microengine. Absolute addressing enables register data to be shared among all of the Contexts, e.g., for global variables or for parameter passing. All 256 GPRs can be read by Absolute address.
2.3.5.3 Indexed Addressing Mode
With Indexed addressing, any Transfer or Next Neighbor register can be read or written by any one of the eight Contexts in a Microengine. Indexed addressing enables register data to be shared among all of the Contexts. For indexed addressing the register number comes from the T_INDEX register for Transfer registers or NN_PUT and NN_GET registers (for Next Neighbor registers).
Example 9 shows the Index Mode usage. Assume that the numbered bytes have been moved into
the S_TRANSFER_IN registers as shown.
Example 9. Use of Indexed Addressing Mode

Transfer Register            Data
Number               [31:24]  [23:16]  [15:8]  [7:0]
0                    0x00     0x01     0x02    0x03
1                    0x04     0x05     0x06    0x07
2                    0x08     0x09     0x0a    0x0b
3                    0x0c     0x0d     0x0e    0x0f
4                    0x10     0x11     0x12    0x13
5                    0x14     0x15     0x16    0x17
6                    0x18     0x19     0x1a    0x1b
7                    0x1c     0x1d     0x1e    0x1f
If the software wants to access a specific byte that is known at compile-time, it will normally use context-relative addressing. For example to access the word in transfer register 3:
alu[dest, --, B, $xfer3] ; move the data from s_transfer 3 to gpr dest
If the location of the data is found at run-time, indexed mode can be used, e.g., if the start of an encapsulated header depends on an outer header value (the outer header byte is in a fixed location).
; Check byte 2 of transfer 0.
; If value==5 header starts on byte 0x9, else byte 0x14.
; The t_index and byte_index registers are loaded by the same instruction.
br=byte[$0, 2, 0x5, L1#], defer_[1]
local_csr_wr[t_index_byte_index, 0x09]
local_csr_wr[t_index_byte_index, 0x14]
L1#:
nop ; wait for index registers to be loaded
nop
; Move bytes right justified into destination registers
byte_align_be[dest1, *$index++]
byte_align_be[dest2, *$index++]
;etc.

2.3.6 Local CSRs

Local Control and Status registers (CSRs) are external to the Execution Datapath, and hold specific data. They can be read and written by special instructions (local_csr_rd and local_csr_wr) and are accessed less frequently than datapath registers.
Because Local CSRs are not built in the datapath, there is a write-to-use delay of three instructions, and a read-to-consume penalty of two instructions.

2.3.7 Execution Datapath

The Execution Datapath can take one or two operands, perform an operation, and optionally write back a result. The sources and destinations can be GPRs, Transfer registers, Next Neighbor registers, and Local Memory. The operations are shifts, add/subtract, logicals, multiply, byte align, and find first one bit.
2.3.7.1 Byte Align
The datapath provides a mechanism to move data from source register(s) to any destination register(s) with byte alignment. Byte aligning takes four consecutive bytes from two concatenated values (8 bytes), starting at any of four byte boundaries (0, 1, 2, 3), based on the endian type (which is defined in the instruction opcode), as shown in the examples below. Four of the bytes are always supplied from a temporary register that holds the A or B operand from the previous cycle, and the other four bytes come from the B or A operand of the byte-align instruction.
The operation is described below, using the block diagram in Figure 6. The alignment is controlled by the two LSBs of the BYTE_INDEX Local CSR.
Table 6. Align Value and Shift Amount

Align Value             Right Shift Amount (Number of Bits, Decimal)
(in Byte_Index[1:0])    Little-Endian    Big-Endian
0                       0                32
1                       8                24
2                       16               16
3                       24               8
Figure 6. Byte-Align Block Diagram

[Figure: the A_Operand and B_Operand inputs, together with the Prev_A and Prev_B temporary registers, feed a right shifter; the shift amount is selected by Byte_Index[1:0]; the shifter output is the align result. (A9353-01)]

Example 10 shows a big-endian align sequence of instructions and the value of the various operands. Table 7 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR (which controls the shift amount) for this example is 2.

Table 7. Register Contents for Example 10

Register   Byte 3 [31:24]   Byte 2 [23:16]   Byte 1 [15:8]   Byte 0 [7:0]
0          0                1                2               3
1          4                5                6               7
2          8                9                A               B
3          C                D                E               F

Example 10. Big-Endian Align

Instruction                Prev B   A Operand   B Operand   Result
Byte_align_be[--, r0]      --       --          0123        --
Byte_align_be[dest1, r1]   0123     0123        4567        2345
Byte_align_be[dest2, r2]   4567     4567        89AB        6789
Byte_align_be[dest3, r3]   89AB     89AB        CDEF        ABCD

NOTE: A Operand comes from Prev_B register during byte_align_be instructions.
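The big-endian alignment in Example 10 can be modeled as follows. This is a sketch: the function name is illustrative, and byte values 0x00 – 0x0F stand in for the digit-labeled bytes of Table 7.

```python
def byte_align_be(prev, operand, index):
    # Model of byte_align_be: concatenate the previous operand with
    # the current one (8 bytes) and take the 4 bytes starting at byte
    # `index`.  The right-shift amount follows the Big-Endian column
    # of Table 6: index 0..3 -> shift 32, 24, 16, 8 bits.
    shift = 32 - 8 * index
    return (((prev << 32) | operand) >> shift) & 0xFFFFFFFF

# Replaying Example 10 with Byte_Index = 2:
r = [0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F]
prev = r[0]          # first instruction only primes the Prev register
out = []
for op in r[1:]:
    out.append(byte_align_be(prev, op, 2))
    prev = op
```

The three results correspond to the digit strings 2345, 6789, and ABCD in Example 10, and the priming step is why aligning “n” words takes “n+1” instructions.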
Example 11 shows a little-endian sequence of instructions and the value of the various operands. Table 8 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR
(which controls the shift amount) for this example is 2.
Table 8. Register Contents for Example 11

Register   Byte 3 [31:24]   Byte 2 [23:16]   Byte 1 [15:8]   Byte 0 [7:0]
0          3                2                1               0
1          7                6                5               4
2          B                A                9               8
3          F                E                D               C

Example 11. Little-Endian Align

Instruction                A Operand   B Operand   Prev A   Result
Byte_align_le[--, r0]      3210        --          --       --
Byte_align_le[dest1, r1]   7654        3210        3210     5432
Byte_align_le[dest2, r2]   BA98        7654        7654     9876
Byte_align_le[dest3, r3]   FEDC        BA98        BA98     DCBA

NOTE: B Operand comes from Prev_A register during byte_align_le instructions.

As the examples show, byte aligning “n” words takes “n+1” cycles due to the first instruction needed to start the operation.

Another mode of operation is to use the T_INDEX register with post-increment to select the source registers. T_INDEX operation is described later in this chapter.
2.3.7.2 CAM
The block diagram in Figure 7 is used to explain the CAM operation.
The CAM has 16 entries. Each entry stores a 32-bit value, which can be compared against a source operand by instruction:
CAM_Lookup[dest_reg, source_reg]
All entries are compared in parallel, and the result of the lookup is a 9-bit value that is written into the specified destination register in bits 11:3, with all other bits of the register 0 (the choice of bits 11:3 is explained below). The result can also optionally be written into either of the LM_Addr registers (see below in this section for details).
The 9-bit result consists of four State bits (dest_reg[11:8]), concatenated with a 1-bit Hit/Miss indication (dest_reg[7]), concatenated with 4-bit entry number (dest_reg[6:3]). All other bits of dest_reg are written with 0. Possible results of the lookup are:
miss (0) — lookup value is not in the CAM; the entry number is the Least Recently Used entry (which can be used as a suggested entry to replace), and the State bits are 0000.
hit (1) — lookup value is in the CAM; the entry number is the entry that matched, and the State bits are the value from the entry that matched.
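The status encoding and the LRU behavior on hits and misses can be modeled as follows. This is a software sketch; the class and method names are illustrative, not hardware interfaces.

```python
class CamModel:
    # Sketch of the 16-entry CAM: parallel tag compare, 4 State bits
    # per entry, and an LRU-ordered usage list.
    def __init__(self):
        self.tags = [0] * 16          # CAM_clear state: all tags 0
        self.state = [0] * 16         # 4-bit State value per entry
        self.lru = list(range(16))    # lru[0] is LRU, lru[-1] is MRU

    def _touch(self, entry):
        # A load or a lookup hit makes the entry MRU.
        self.lru.remove(entry)
        self.lru.append(entry)

    def write(self, entry, tag, state):
        self.tags[entry] = tag
        self.state[entry] = state & 0xF
        self._touch(entry)

    def lookup(self, value):
        # Returns the 9-bit status in destination-register bits [11:3]:
        # State (4) | Hit/Miss (1) | Entry Number (4), all << 3.
        if value in self.tags:
            entry = self.tags.index(value)
            self._touch(entry)
            return (self.state[entry] << 8) | (1 << 7) | (entry << 3)
        # Miss: State = 0000, Hit = 0, entry = LRU; LRU list unchanged.
        return self.lru[0] << 3
```

With all 16 entries freshly written, a miss reports entry 0 (the LRU), and a hit on some tag returns that entry’s State and number while promoting it to MRU, matching the behavior described for CAM_Lookup.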
Figure 7. CAM Block Diagram
[Figure: the lookup value (from the A port) is compared in parallel against the Tag field of all 16 entries; the Match signals feed the Status and LRU Logic, which forms the lookup status returned to the destination register — on a miss: State = 0000, Hit/Miss = 0, Entry Number = LRU entry; on a hit: State = the stored State bits, Hit/Miss = 1, Entry Number = the hit entry. (A9354-01)]
Note: The State bits are data associated with the entry. Their use is defined only by software. There is no implication of ownership of the entry by any Context. The State bits hardware function is:
The value is set by software (at the time the entry is loaded, or changed in an already loaded entry).
Its value is read out on a lookup that hits, and used as part of the status written into the destination register.
Its value can be read out separately (normally only used for diagnostic or debug).
The LRU (Least Recently Used) Logic maintains a time-ordered list of CAM entry usage. When an entry is loaded, or matches on a lookup, it is marked as MRU (Most Recently Used). Note that a lookup that misses does not modify the LRU list.
The CAM is loaded by instruction:
CAM_Write[entry_reg, source_reg, state_value]
The value in the register specified by source_reg is put into the Tag field of the entry specified by entry_reg. The value for the State bits of the entry is specified in the instruction as state_value.
The value in the State bits for an entry can be written, without modifying the Tag, by instruction:
CAM_Write_State[entry_reg, state_value]
Note: CAM_Write_State does not modify the LRU list.
One possible way to use the result of a lookup is to dispatch to the proper code using instruction:
jump[register, label#],defer [3]
where the register holds the result of the lookup. The State bits can be used to differentiate cases where the data associated with the CAM entry is in flight, or is pending a change, etc. Because the lookup result was loaded into bits [11:3] of the destination register, the jump destinations are spaced eight instructions apart. This is a balance between giving enough space for many applications to complete their task without having to jump to another region, versus consuming too much Control Store. Another way to use the lookup result is to branch on just the Hit/Miss bit, and use the entry number as a base pointer into a block of Local Memory.
When enabled, the CAM lookup result is loaded into Local_Addr as follows:
LM_Addr[5:0] = 0 ([1:0] are read-only bits)
LM_Addr[9:6] = lookup result [6:3] (entry number)
LM_Addr[11:10] = constant specified in instruction
This function is useful when the CAM is used as a cache, and each entry is associated with a block of data in Local Memory. Note that the latency from when CAM_Lookup executes until the LM_Addr is loaded is the same as when LM_Addr is written by a Local_CSR_Wr instruction.
The Tag and State bits for a given entry can be read by instructions:
CAM_Read_Tag[dest_reg, entry_reg] CAM_Read_State[dest_reg, entry_reg]
The Tag value and State bits value for the specified entry is written into the destination register, respectively for the two instructions (the State bits are placed into bits [11:8] of dest_reg, with all other bits 0). Reading the tag is useful in the case where an entry needs to be evicted to make room for a new value—the lookup of the new value results in a miss, with the LRU entry number returned as a result of the miss. The CAM_Read_Tag instruction can then be used to find the value that was stored in that entry. An alternative would be to keep the tag value in a GPR. These two instructions can also be used by debug and diagnostic software. Neither of these modify the state of the LRU pointer.
Note: The following rules must be adhered to when using the CAM.
1. The CAM is not reset by Microengine reset. Software must either do a CAM_clear prior to using the CAM, to initialize the LRU list and clear the tags to 0, or explicitly write all entries with CAM_write.
2. No two tags can be written to have the same value. If this rule is violated, the result of a lookup that matches that value is unpredictable, and the LRU state is unpredictable.
3. The value 0x00000000 can be used as a valid lookup value. However, note that CAM_clear puts 0x00000000 into all tags. To avoid violating rule 2 after doing a CAM_clear, it is necessary to write all entries to unique values prior to doing a lookup of 0x00000000.
An algorithm for debug software to find out the contents of the CAM is shown in Example 12.
Example 12. Algorithm for Debug Software to Find out the Contents of the CAM
; First read each of the tag entries. Note that these reads
; don't modify the LRU list or any other CAM state.
tag[0] = CAM_Read_Tag(entry_0);
......
tag[15] = CAM_Read_Tag(entry_15);
; Now read each of the state bits
state[0] = CAM_Read_State(entry_0);
...
state[15] = CAM_Read_State(entry_15);
; Knowing what tags are in the CAM makes it possible to
; create a value that is not in any tag, and will therefore
; miss on a lookup.
; Next loop through a sequence of 16 lookups, each of which will
; miss, to obtain the LRU values of the CAM.
for (i = 0; i < 16; i++)
BEGIN_LOOP
    ; Do a lookup with a tag not present in the CAM. On a
    ; miss, the LRU entry will be returned. Since this lookup
    ; missed, the LRU state is not modified.
    LRU[i] = CAM_Lookup(some_tag_not_in_cam);
    ; Now do a lookup using the tag of the LRU entry. This
    ; lookup will hit, which makes that entry MRU.
    ; This is necessary to allow the next lookup miss to
    ; see the next LRU entry.
    junk = CAM_Lookup(tag[LRU[i]]);
END_LOOP
; Because all entries were hit in the same order as they were
; LRU, the LRU list is now back to where it started before the
; loop executed.
; LRU[0] through LRU[15] holds the LRU list.
The CAM can be cleared with the CAM_Clear instruction. This instruction writes 0x00000000 simultaneously to all entries' tags, clears all the state bits, and puts the LRU into an initial state (where entry 0 is LRU, ..., entry 15 is MRU).
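The CAM behavior and the debug algorithm of Example 12 can be modeled in software. The following Python sketch is illustrative only (class and function names are ours, and it is not cycle-accurate); it shows why the algorithm recovers the LRU list and leaves the CAM state exactly as it found it.

```python
# Minimal software model of the 16-entry Microengine CAM with true LRU,
# used to illustrate the debug algorithm of Example 12 (illustrative
# sketch only; not a cycle-accurate model of the hardware).

class Cam:
    SIZE = 16

    def __init__(self):
        self.clear()

    def clear(self):
        # CAM_Clear: all tags 0x00000000, state bits 0, entry 0 is LRU.
        self.tags = [0] * self.SIZE
        self.state = [0] * self.SIZE
        self.lru = list(range(self.SIZE))   # lru[0] is LRU, lru[-1] is MRU

    def write(self, entry, tag, state=0):
        # CAM_Write: load a tag/state and make the entry MRU.
        self.tags[entry] = tag
        self.state[entry] = state
        self._make_mru(entry)

    def _make_mru(self, entry):
        self.lru.remove(entry)
        self.lru.append(entry)

    def lookup(self, tag):
        # On a hit, the entry becomes MRU and (True, entry) is returned.
        # On a miss, the LRU entry number is returned; LRU is unchanged.
        if tag in self.tags:
            entry = self.tags.index(tag)
            self._make_mru(entry)
            return True, entry
        return False, self.lru[0]

    def read_tag(self, entry):
        return self.tags[entry]      # CAM_Read_Tag: no LRU side effect

def read_lru_order(cam):
    """Example 12: recover the LRU list without disturbing it."""
    tags = [cam.read_tag(e) for e in range(Cam.SIZE)]
    missing = max(tags) + 1          # a value known not to be in any tag
    order = []
    for _ in range(Cam.SIZE):
        _, lru_entry = cam.lookup(missing)   # miss: returns LRU entry
        order.append(lru_entry)
        cam.lookup(tags[lru_entry])          # hit: makes that entry MRU
    return order
```

Because every entry is hit in exactly its LRU order, the LRU list after the loop matches the list before it, so the routine can be run repeatedly.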

2.3.8 CRC Unit

The CRC Unit operates in parallel with the Execution Datapath. It takes two operands, performs a CRC operation, and writes back a result. CRC-CCITT, CRC-32, CRC-10, CRC-5, and iSCSI polynomials are supported. One of the operands is the CRC_Remainder Local CSR, and the other is a GPR, Transfer_In register, Next Neighbor, or Local Memory, specified in the instruction and passed through the Execution Datapath to the CRC Unit.
The instruction specifies the CRC operation type, whether to swap bytes and/or bits, and which bytes of the operand to include in the operation. The result of the CRC operation is written back into CRC_Remainder. The source operand can also be written into a destination register (however, the byte/bit swapping and masking do not affect the destination register; they only affect the CRC computation). This allows moving data, for example, from S_TRANSFER_IN registers to S_TRANSFER_OUT registers at the same time as computing the CRC.
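The remainder-folding operation can be sketched in Python for one of the supported polynomials. The sketch below shows a standard MSB-first CRC-CCITT update (polynomial 0x1021); the hardware's byte/bit swap and byte-masking options are not modeled, and the function names are ours.

```python
# Illustrative bit-serial CRC step, mirroring how the CRC Unit folds an
# operand into the running CRC_Remainder. CRC-CCITT (x^16+x^12+x^5+1,
# polynomial 0x1021) is shown, MSB-first, with no byte or bit swapping.

CRC_CCITT_POLY = 0x1021

def crc_ccitt_update(remainder, data, nbytes=4):
    """Fold `nbytes` of an operand (most significant byte first) into a
    16-bit CRC remainder."""
    for i in range(nbytes):
        byte = (data >> (8 * (nbytes - 1 - i))) & 0xFF
        remainder ^= byte << 8
        for _ in range(8):
            if remainder & 0x8000:
                remainder = ((remainder << 1) ^ CRC_CCITT_POLY) & 0xFFFF
            else:
                remainder = (remainder << 1) & 0xFFFF
    return remainder

def crc_ccitt(words, seed=0xFFFF):
    """Process a payload one 32-bit word at a time, as a Microengine loop
    would: the remainder is seeded once, then updated per crc instruction."""
    remainder = seed
    for w in words:
        remainder = crc_ccitt_update(remainder, w)
    return remainder
```

With the conventional 0xFFFF seed and no reflection, this matches the common CRC-16/CCITT-FALSE check value of 0x29B1 for the test string "123456789".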

2.3.9 Event Signals

Event Signals are used to coordinate a program with the completion of external events. For example, when a Microengine executes an instruction to an external unit to read data (which will be written into a Transfer_In register), the program must ensure that it does not try to use the data until the external unit has written it. This time is not deterministic due to queuing delays and other uncertainty in the external units (for example, DRAM refresh). There is no hardware mechanism to flag that a register write is pending and prevent the program from using it. Instead, the coordination is under software control, with hardware support.
In the instructions that use external units (i.e., SRAM, DRAM, etc.) there are fields that direct the external unit to supply an indication (called an Event Signal) that the command has been completed. There are 15 Event Signals per Context that can be used, and Local CSRs per Context to track which Event Signals are pending and which have been returned. The Event Signals can be used to move a Context from Sleep state to Ready state, or alternatively, the program can test and branch on the status of Event Signals.
Event Signals can be set in nine different ways.
1. When data is written into S_TRANSFER_IN registers
2. When data is written into D_TRANSFER_IN registers
3. When data is taken from S_TRANSFER_OUT registers
4. When data is taken from D_TRANSFER_OUT registers
5. By a write to INTERTHREAD_SIGNAL register
6. By a write from Previous Neighbor Microengine to NEXT_NEIGHBOR_SIGNAL
7. By a write from Next Neighbor Microengine to PREVIOUS_NEIGHBOR_SIGNAL
8. By a write to SAME_ME_SIGNAL Local CSR
9. By Internal Timer
Any or all Event Signals can be set by any of the above sources.
When a Context goes to the Sleep state (executes a ctx_arb instruction, or an instruction with a ctx_swap token), it specifies which Event Signal(s) it requires to be put in Ready state. The ctx_arb instruction also specifies whether the logical AND or logical OR of the Event Signal(s) is needed to put the Context into Ready state.
When all of the Context’s Event Signals arrive, the Context goes to Ready state, and then eventually to Executing state. In the case where the Event Signal is linked to moving data into or out of Transfer registers (numbers 1 through 4 in the list above), the code can safely use the Transfer register as the first instruction (for example, using a Transfer_In register as a source operand will get the new read data). The same is true when the Event Signal is tested for branches (br_=signal or br_!signal instructions).
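The AND/OR wakeup test described above reduces to a simple mask comparison. The following sketch is a behavioral model only; the argument names are ours and do not correspond to actual Local CSR layouts.

```python
# Sketch of the Sleep-to-Ready wakeup test: a sleeping Context specifies
# a set of Event Signals plus AND/OR semantics, and hardware compares
# that set against the signals that have arrived.

def context_ready(wakeup_mask, arrived_mask, wait_for_all):
    """Return True if the Context should move from Sleep to Ready.

    wakeup_mask  -- one bit per Event Signal the ctx_arb specified
    arrived_mask -- one bit per Event Signal that has been returned
    wait_for_all -- True for logical AND semantics, False for logical OR
    """
    if wait_for_all:
        # AND: every requested signal must have arrived.
        return (arrived_mask & wakeup_mask) == wakeup_mask
    # OR: any one requested signal is enough.
    return (arrived_mask & wakeup_mask) != 0
```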
The ctx_arb instruction, CTX_SIG_EVENTS, and ACTIVE_CTX_WAKEUP_#_EVENTS Local CSR descriptions provide details.

2.4 DRAM

The IXP2800 Network Processor has controllers for three Rambus* DRAM (RDRAM) channels. Each of the controllers independently accesses its own RDRAMs, and can operate concurrently with the other controllers (i.e., they are not operating as a single, wider memory). DRAM provides high-density, high-bandwidth storage and is typically used for data buffers.
RDRAM sizes of 64, 128, 256, or 512 Mbytes, and 1 Gbyte are supported; however, each of
the channels must have the same number, size, and speed of RDRAMs populated. Refer to
Section 5.2 for supported size and loading configurations.
Up to two Gbytes of DRAM is supported. If less than two Gbytes of memory is present, the upper part of the address space is not used. It is also possible, for system cost and area savings, to have Channels 0 and 1 populated with Channel 2 empty, or Channel 0 populated with Channels 1 and 2 empty.
Reads and writes to RDRAM are generated by the Microengines, the Intel XScale® core, and PCI (external Bus Masters and DMA Channels). The controllers also do refresh and calibration cycles to the RDRAMs, transparently to software.
RDRAM Powerdown and Nap modes are not supported.
Hardware interleaving (also known as striping) of addresses is done to provide balanced
access to all populated channels. The interleave size is 128 bytes. Interleaving helps to maintain utilization of available bandwidth by spreading consecutive accesses to multiple channels. The interleaving is done in the hardware in such a way that the three channels appear to software as a single contiguous memory space.
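The interleaving can be pictured as a simple address-to-channel mapping. The sketch below assumes straightforward modulo striping of 128-byte units; the hardware's actual channel-selection function is not detailed in this chapter, so this is an illustration of the concept rather than the exact mapping.

```python
# Sketch of 128-byte address interleaving across populated RDRAM
# channels. Simple modulo striping is assumed for illustration; the
# hardware makes the channels appear to software as one contiguous space.

STRIPE = 128  # bytes per interleave unit

def channel_and_offset(addr, num_channels):
    """Map a flat DRAM byte address to (channel, address-within-channel)."""
    stripe = addr // STRIPE
    channel = stripe % num_channels
    # Stripes assigned to the same channel pack together, so each
    # channel's local address space is dense.
    local = (stripe // num_channels) * STRIPE + (addr % STRIPE)
    return channel, local
```

Consecutive 128-byte accesses thus rotate through channels 0, 1, 2, 0, ..., which is what keeps all populated channels busy under sequential traffic.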
ECC (Error Correcting Code) is supported, but can be disabled. Enabling ECC requires that
x18 RDRAMs be used. If ECC is disabled x16 RDRAMs can be used. ECC can detect and correct all single-bit errors, and detect all double-bit errors. When ECC is enabled, partial writes (writes of less than 8 bytes) must be done as read-modify-writes.

2.4.1 Size Configuration

Each channel can be populated with anywhere from one to four RDRAMs (Short Channel Mode). Refer to Section 5.2 for supported size and loading configurations. The RAM technology used determines the increment size and maximum memory per channel, as shown in Table 9.
Table 9. RDRAM Sizes

RDRAM Technology (1)   Increment Size   Maximum per Channel
64/72 MB               8 MB             256 MB
128/144 MB             16 MB            512 MB
256/288 MB             32 MB            1 GB (2)
512/576 MB             64 MB            2 GB (2)

NOTES:
1. The two numbers shown for each technology indicate x16 parts and x18 parts.
2. The maximum memory that can be addressed across all channels is 2 GB. This limitation is based on the partitioning of the 4-GB address space (32-bit addresses). Therefore, if all three channels are used, each can be populated up to a maximum of 768 MB. Two channels can be populated to a maximum of 1 GB each. A single channel can be populated to a maximum of 2 GB.
RDRAMs with 1 x 16 or 2 x 16 dependent banks, and 4 independent banks are supported.

2.4.2 Read and Write Access

The minimum DRAM physical access length is 16 bytes. Software (and PCI) can read or write as little as a single byte, however the time (and bandwidth) taken at the DRAMs is the same as for an access of 16 bytes. Therefore, the best utilization of DRAM bandwidth will be for accesses that are multiples of 16 bytes.
If ECC is enabled, writes of less than 8 bytes must do read-modify-writes, which take two 16-byte time accesses (one for the read and one for the write).
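The access-cost rules above can be modeled directly. In the sketch below, the ECC check code is a trivial placeholder (the real code is a SECDED code over the 8-byte word), and the class and method names are ours; the point is only that a sub-word write costs a read access plus a write access, while aligned full-word writes do not.

```python
# Sketch of DRAM write cost with ECC enabled: the ECC code word covers
# 8 data bytes, so writing fewer bytes forces a read-modify-write.

def ecc_of(word8):
    # Placeholder check code; the real RDRAM ECC is a SECDED code.
    return sum(word8) & 0xFF

class EccDram:
    def __init__(self, size):
        self.data = bytearray(size)
        self.ecc = {}            # check code per 8-byte-aligned address

    def write(self, addr, payload):
        """Write payload at addr; returns the number of DRAM accesses."""
        if len(payload) >= 8 and addr % 8 == 0 and len(payload) % 8 == 0:
            # Full, aligned ECC words: write directly.
            self.data[addr:addr + len(payload)] = payload
            for base in range(addr, addr + len(payload), 8):
                self.ecc[base] = ecc_of(self.data[base:base + 8])
            return max(1, len(payload) // 16)   # one access per 16 bytes
        # Partial write: read the covering word, merge, write it back.
        base = addr & ~7
        word = bytearray(self.data[base:base + 8])   # access 1: read
        off = addr - base
        word[off:off + len(payload)] = payload       # merge new bytes
        self.data[base:base + 8] = word              # access 2: write
        self.ecc[base] = ecc_of(word)                # recompute check bits
        return 2
```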

2.5 SRAM

The IXP2800 Network Processor has four independent SRAM controllers, each of which supports pipelined QDR synchronous static RAM (SRAM) and/or a coprocessor that adheres to QDR signaling. Any or all controllers can be left unpopulated if the application does not need to use them. SRAM is accessible by the Microengines, the Intel XScale® core, and the PCI Unit (external bus masters and DMA).
The memory is logically four bytes (32-bits) wide; physically the data pins are two bytes wide and are double clocked. Byte parity is supported. Each of the four bytes has a parity bit, which is written when the byte is written and checked when the data is read. There are byte-enables that select which bytes to write for writes of less than 32 bits.
Each of the four QDR ports is QDR and QDRII compatible. Each port implements the "_K" and "_C" output clocks and "_CQ" as an input, along with their inversions. (Note: the "_C" and "_CQ" clocks are optional.) Extensive work has been performed providing impedance controls within the IXP2800 Network Processor for processor-initiated signals driving to QDR parts. Providing a clean signaling environment is critical to achieving 200 – 250 MHz QDRII data transfers.
The configuration assumptions for the IXP2800 Network Processor I/O driver/receiver development include four QDR loads plus the IXP2800 Network Processor itself. The IXP2800 Network Processor supports burst-of-two SRAMs, but does not support burst-of-four SRAMs.
Each SRAM controller can also be configured to interface to an external coprocessor that adheres to the QDR electrical specifications and protocol, through its standard QDR interface. This allows SRAM devices and coprocessors to coexist and operate on the same bus. The coprocessor behaves as a memory-mapped device on the SRAM bus.

2.5.1 QDR Clocking Scheme

The controller drives out two pairs of K clocks (K and K#) and two pairs of C clocks (C and C#). Both C/C# clocks return externally to the controller for reading data. Figure 8 shows the clocking scheme for a QDR interface driving four clam-shelled SRAM chips.
Figure 8. Echo Clock Configuration
[Figure shows the Intel IXP2800 Network Processor package balls QDRn_K[0]/QDRn_C[0] and QDRn_K[1]/QDRn_C[1] driving K/K# and C/C# to clam-shelled SRAMs, with the SRAMs' CQ/CQ# echo clocks returning to QDRn_CIN[0]; termination is shown at both ends.]
Note: The CIN[1] pin is not used internally to capture the READ data; however, the I/O Pad can be used to terminate the signal.

2.5.2 SRAM Controller Configurations

Each channel has enough address pins (24) to support up to 64 Mbytes of SRAM. The SRAM controllers can directly generate multiple port enables (up to four pairs) to allow for depth expansion. Two pairs of pins are dedicated for port enables. Smaller RAMs use fewer address signals than the number provided to accommodate the largest RAMs, so some address pins (23:20) are configurable as either address or port enable based on CSR setting as shown in Table 10. Note that all of the SRAMs on a given channel must be the same size.
Table 10. SRAM Controller Configurations

SRAM            SRAM    Addresses Needed   Addresses Used    Total Number of Port
Configuration   Size    to Index SRAM      as Port Enables   Select Pairs Available
512K x 18       1 MB    17:0               23:22, 21:20      4
1M x 18         2 MB    18:0               23:22, 21:20      4
2M x 18         4 MB    19:0               23:22, 21:20      4
4M x 18         8 MB    20:0               23:22             3
8M x 18         16 MB   21:0               23:22             3
16M x 18        32 MB   22:0               None              2
32M x 18        64 MB   23:0               None              2
Each channel can be expanded by depth according to the number of port enables available. If external decoding is used, then the number of SRAMs used is not limited by the number of port enables generated by the SRAM controller.
Note: Doing external decoding may require external pipeline registers to account for the decode time,
depending on the desired frequency.
Maximum SRAM system sizes are shown in Table 11. Shaded entries require external decoding, because they use more port enables than the SRAM controller can supply directly.
Table 11. Total Memory per Channel

            Number of SRAMs on Channel
SRAM Size   1       2       3       4       5       6       7       8
512K x 18   1 MB    2 MB    3 MB    4 MB    5 MB    6 MB    7 MB    8 MB
1M x 18     2 MB    4 MB    6 MB    8 MB    10 MB   12 MB   14 MB   16 MB
2M x 18     4 MB    8 MB    12 MB   16 MB   20 MB   24 MB   28 MB   32 MB
4M x 18     8 MB    16 MB   24 MB   32 MB   64 MB   NA      NA      NA
8M x 18     16 MB   32 MB   48 MB   64 MB   NA      NA      NA      NA
16M x 18    32 MB   64 MB   NA      NA      NA      NA      NA      NA
32M x 18    64 MB   NA      NA      NA      NA      NA      NA      NA

2.5.3 SRAM Atomic Operations

In addition to normal reads and writes, SRAM supports the following atomic operations. Microengines have specific instructions to do each atomic operation; Intel XScale microarchitecture uses aliased address regions to do atomic operations.
bit set
bit clear
increment
decrement
add
swap
The SRAM controller does read-modify-writes for the atomic operations; the pre-modified data can also be returned if desired. The atomic operations operate on a single 32-bit word.
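The atomic read-modify-write behavior can be sketched as follows. This is an illustrative model only (the function name and the dict-based memory are ours); the optional return of the pre-modified value corresponds to the test-and-operate forms of the Microengine instructions.

```python
# Sketch of the SRAM atomic operations as read-modify-writes on a single
# 32-bit word, optionally returning the pre-modified value.

MASK32 = 0xFFFFFFFF

def sram_atomic(mem, addr, op, operand=0, return_old=False):
    """mem is a dict of 32-bit words keyed by address."""
    old = mem.get(addr, 0)           # the read half of the RMW
    new = {
        "bit_set":   old | operand,
        "bit_clear": old & ~operand & MASK32,
        "incr":      (old + 1) & MASK32,
        "decr":      (old - 1) & MASK32,
        "add":       (old + operand) & MASK32,
        "swap":      operand & MASK32,
    }[op]
    mem[addr] = new                  # the write half of the RMW
    return old if return_old else None
```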

2.5.4 Queue Data Structure Commands

The ability to enqueue and dequeue data buffers at a fast rate is key to meeting line-rate performance. This is a difficult problem as it involves dependent memory references that must be turned around very quickly. The SRAM controller includes a data structure (called the Q_array) and associated control logic to perform efficient enqueue and dequeue operations. The Q_array has 64 entries, each of which can be used in one of four ways.
Linked-list queue descriptor (resident queues)
Cache of recently used linked-list queue descriptors (backing store for the cache is in SRAM)
Ring descriptor
Journal
The commands provided are:
For Linked-list queues or Cache of recently used linked-list queue descriptors
Read_Q_Descriptor_Head(address, length, entry, xfer_addr)
Read_Q_Descriptor_Tail(address, length, entry)
Read_Q_Descriptor_Other(address, entry)
Write_Q_Descriptor(address, entry)
Write_Q_Descriptor_Count(address, entry)
ENQ(buff_desc_adr, cell_count, EOP, entry)
ENQ_tail(buff_desc_adr, entry)
DEQ(entry, xfer_addr)
For Rings
Get(entry, length, xfer_addr)
Put(entry, length, xfer_addr)
For Journals
Journal(entry, length, xfer_addr)
Fast_journal(entry)
Note: The queue descriptor commands (Read_Q_Descriptor_Head, Read_Q_Descriptor_Tail, etc.) are used to initialize the rings and journals, but are not used to perform the ring and journal functions.
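The linked-list queue that a Q_array entry manages can be sketched as follows. The field names and layout here are illustrative simplifications (the real descriptor also carries cell count, EOP, and other state, and the ENQ/DEQ commands take more parameters); the sketch shows only the head/tail/count manipulation that makes enqueue and dequeue fast.

```python
# Sketch of a linked-list queue as managed by a Q_array entry: the queue
# descriptor holds head, tail, and count, and buffer descriptors are
# linked through a "next" field stored in SRAM.

class QueueDescriptor:
    def __init__(self):
        self.head = 0    # address of first buffer descriptor (0 = empty)
        self.tail = 0
        self.count = 0

def enq(sram, q, buf_addr):
    """Append a buffer descriptor to the tail of the queue."""
    sram[buf_addr] = 0             # new tail has no successor
    if q.count == 0:
        q.head = buf_addr          # queue was empty: new buffer is head
    else:
        sram[q.tail] = buf_addr    # link old tail to the new buffer
    q.tail = buf_addr
    q.count += 1

def deq(sram, q):
    """Remove and return the buffer descriptor at the head, or None."""
    if q.count == 0:
        return None
    buf = q.head
    q.head = sram[buf]             # follow the link to the next buffer
    q.count -= 1
    return buf
```

The dependent memory reference mentioned above is visible in `deq`: the new head is not known until the old head's link word has been read, which is why the hardware support matters at line rate.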

2.5.5 Reference Ordering

This section covers the ordering between accesses to any one SRAM controller.
2.5.5.1 Reference Order Tables
Table 12 shows the architectural guarantees of ordering for accesses to the SAME SRAM address between a reference of any given type (shown in the column labels) and a subsequent reference of any given type (shown in the row labels). First and second are defined by the order in which the references are received by the SRAM controller.
Note: A given Network Processor version may implement a superset of these order guarantees. However,
that superset may not be supported in future implementations.
Verification is required to test only the order rules shown in Table 12 and Table 13.
Note: A blank entry in Table 12 means that no order is enforced.
Table 12. Address Reference Order

2nd ref \ 1st ref   Memory   CSR     Memory   CSR     Memory   Queue / Ring /
                    Read     Read    Write    Write   RMW      Q_Descr Commands
Memory Read         Order
CSR Read                     Order
Memory Write                         Order
CSR Write                                     Order
Memory RMW                                            Order
Queue / Ring /                                                 See Table 13.
Q_Descr Commands

Table 13 shows the architectural guarantees of ordering for accesses to the SAME SRAM Q_array entry between a reference of any given type (shown in the column labels) and a subsequent reference of any given type (shown in the row labels). The definition of first and second is defined by the order they are received by the SRAM controller. The same caveats apply as for Table 12.

Table 13. Q_array Entry Reference Order

2nd ref \ 1st ref   Read_Q_Descr   Read_Q_Descr   Write_Q_   Enqueue   Dequeue   Put     Get     Journal
                    head, tail     other          Descr
Read_Q_Descr
head, tail
Read_Q_Descr
other
Write_Q_Descr
Enqueue             Order                         Order      Order
Dequeue             Order                         Order                Order
Put                                                                              Order
Get                                                                                      Order
Journal                                                                                          Order
2.5.5.2 Microengine Software Restrictions to Maintain Ordering
It is the Microengine programmer’s job to ensure order where the program flow finds order to be necessary and where the architecture does not guarantee that order. The signaling mechanism can be used to do this. For example, say that microcode needs to update several locations in a table. A location in SRAM is used to “lock” access to the table. Example 13 is the code for the table update.
Example 13. Table Update Code
IMMED [$xfer0, 1]
SRAM [write, $xfer0, flag_address, 0, 1], ctx_swap [SIG_DONE_2]
; At this point, the write to flag_address has passed the point of
; coherency. Do the table updates.
SRAM [write, $xfer1, table_base, offset1, 2], sig_done [SIG_DONE_3]
SRAM [write, $xfer3, table_base, offset2, 2], sig_done [SIG_DONE_4]
CTX_ARB [SIG_DONE_3, SIG_DONE_4]
; At this point, the table writes have passed the point of coherency.
; Clear the flag to allow access by other threads.
IMMED [$xfer0, 0]
SRAM [write, $xfer0, flag_address, 0, 1], ctx_swap [SIG_DONE_2]
Other rules:
All accesses to atomic variables should be via read-modify-write instructions.
If the flow must know that a write is completed (actually in the SRAM itself), follow the write
with a read to the same address. The write is guaranteed to be complete when the read data has been returned to the Microengine.
With the exception of initialization, never do WRITE commands to the first three longwords
of a queue_descriptor data structure (these are the longwords that hold head, tail, and count, etc.). All accesses to this data must be via the Q commands.
To initialize the Q_array registers, perform a memory write of at least three longwords,
followed by a memory read to the same address (to guarantee that the write completed). Then, for each entry in the Q_array, perform a read_q_descriptor_head followed by a read_q_descriptor_other using the address of the same three longwords.

2.6 Scratchpad Memory

The IXP2800 Network Processor contains 16 Kbytes of Scratchpad Memory, organized as 4K 32-bit words, that is accessible by the Microengines and the Intel XScale® core. The Scratchpad Memory provides the following operations:
Normal reads and writes. 1–16 32-bit words can be read/written with a single Microengine
instruction. Note that Scratchpad is not byte-writable (each write must write all four bytes).
Atomic read-modify-write operations, bit-set, bit-clear, increment, decrement, add, subtract,
and swap. The RMW operations can also optionally return the pre-modified data.
Sixteen Hardware Assisted Rings for interprocess communication. (A ring is a FIFO that uses
a head and tail pointer to store/read information in Scratchpad memory.)
Scratchpad Memory is provided as a third memory resource (in addition to SRAM and DRAM) that is shared by the Microengines and the Intel XScale® core. The Microengines and the Intel XScale® core can distribute memory accesses between these three types of memory resources to provide a greater number of memory accesses occurring in parallel.

2.6.1 Scratchpad Atomic Operations

In addition to normal reads and writes, the Scratchpad Memory supports the following atomic operations. Microengines have specific instructions to do each atomic operation; the Intel XScale microarchitecture uses aliased address regions to do atomic operations.
bit set
bit clear
increment
decrement
add
subtract
swap
The Scratchpad Memory does read-modify-writes for the atomic operations; the pre-modified data can also be returned if desired. The atomic operations operate on a single 32-bit word.

2.6.2 Ring Commands

The Scratchpad Memory provides sixteen Rings used for interprocess communication. The rings provide two operations.
Get(ring, length)
Put(ring, length)
Ring is the number of the ring (0 through 15) to get from or put to, and length specifies the number of 32-bit words to transfer. A logical view of one of the rings is shown in Figure 9.
Figure 9. Logical View of Rings
[Figure shows, for one of the 16 rings, the Head, Tail, Count, and Size registers, an address decoder selecting Read/Write/Atomic addresses in the Scratchpad RAM, and the Full output signal.]
Head, Tail, and Size are registers in the Scratchpad Unit. Head and Tail point to the actual ring data, which is stored in the Scratchpad RAM. The count of how many entries are on the Ring is determined by hardware using the Head and Tail. For each Ring in use, a region of Scratchpad RAM must be reserved for the ring data.
Note: The reservation is by software convention. The hardware does not prevent other accesses to the
region of Scratchpad Memory used by the Ring. Also the regions of Scratchpad Memory allocated to different Rings must not overlap.
Head points to the next address to be read on a get, and Tail points to the next address to be written on a put. The size of each Ring is selectable from the following choices: 128, 256, 512, or 1024 32-bit words.
Note: The region of Scratchpad used for a Ring is naturally aligned to its size.
When the Ring is near full, it asserts an output signal, which is used as a state input to the Microengines. They must use that signal to test (by doing Branch on Input State) for room on the Ring before putting data onto it. There is a lag in time from a put instruction executing to the Full signal being updated to reflect that put. To guarantee that a put will not overfill the ring there is a bound on the number of Contexts and the number of 32-bit words per write based on the size of the ring, as shown in Table 14. Each Context should test the Full signal, then do the put if not Full, and then wait until the Context has been signaled that the data has been pulled before testing the Full signal again.
An alternate usage method is to have Contexts allocate and deallocate entries from a shared count variable, using the atomic subtract to allocate and atomic add to deallocate. In this case the Full signal is not used.
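The ring mechanics described above can be sketched in Python. The near-full threshold is not specified in this section, so a `headroom` parameter stands in for it, and the class and method names are ours; the sketch only illustrates how Head, Tail, the derived count, and the Full signal relate.

```python
# Sketch of one Scratchpad ring: Head and Tail index into a reserved,
# naturally aligned region of Scratchpad RAM; the count is derived from
# them, and a Full signal asserts when the ring is near full.

class ScratchRing:
    def __init__(self, size):
        assert size in (128, 256, 512, 1024)   # allowed sizes, in words
        self.ram = [0] * size
        self.size = size
        self.head = 0    # total words gotten (next read position)
        self.tail = 0    # total words put (next write position)

    def count(self):
        # Determined by hardware from Head and Tail.
        return self.tail - self.head

    def full(self, headroom=16):
        # "Near full" signal; the actual hardware threshold is an
        # assumption of this model.
        return self.size - self.count() < headroom

    def put(self, words):
        # Software must have tested full() first (Branch on Input State);
        # the signal lags the put, hence the Table 14 length limits.
        for w in words:
            self.ram[self.tail % self.size] = w
            self.tail += 1

    def get(self, length):
        out = [self.ram[(self.head + i) % self.size] for i in range(length)]
        self.head += length
        return out
```

A Context using the Full signal would test `full()`, put at most the Table 14 length, and wait for its pull signal before testing again.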
Table 14. Ring Full Signal Use – Number of Contexts and Length versus Ring Size

Number of     Ring Size
Contexts      128       256       512       1024
1             16        16        16        16
2             16        16        16        16
4             8         16        16        16
8             4         12        16        16
16            2         6         14        16
24            1         4         9         16
32            1         3         7         15
40            Illegal   2         5         12
48            Illegal   2         4         10
64            Illegal   1         3         7
128           Illegal   Illegal   1         3

NOTES:
1. The number in each table entry is the largest length that should be put. 16 is the largest length that a single put instruction can generate.
2. Illegal -- With that number of Contexts, even a length of one could cause the Ring to overfill.

2.7 Media and Switch Fabric Interface

The Media and Switch Fabric (MSF) Interface is used to connect the IXP2800 Network Processor to a physical layer device (PHY) and/or to a Switch Fabric. The MSF consists of separate receive and transmit interfaces. Each of the receive and transmit interfaces can be separately configured for either SPI-4 Phase 2 (System Packet Interface) for PHY devices, or the CSIX-L1 protocol for Switch Fabric interfaces.
The receive and transmit ports are unidirectional and independent of each other. Each port has 16 data signals, a clock, a control signal, and a parity signal, all of which use LVDS (differential) signaling, and are sampled on both edges of the clock. There is also a flow control port consisting of a clock, data, and ready status bits, and used to communicate between two IXP2800 Network Processors, or the IXP2800 Network Processor chip and a Switch Fabric Interface. These are also LVDS, dual-edge data transfer. All of the high speed LVDS interfaces support dynamic deskew training.
The block diagram in Figure 10 shows a typical configuration.
Figure 10. Example System Block Diagram
[Figure shows a Framing/MAC device (PHY) connected over SPI-4 (RDAT/RSTAT) to an Ingress Intel® IXP2800 Network Processor (receive protocol is SPI-4, transmit mode is CSIX), which connects through an optional gasket (Note 1) to the Switch Fabric using the CSIX protocol (TDAT/TSTAT); an Egress IXP2800 Network Processor (receive protocol is CSIX, transmit mode is SPI-4) completes the path, with a flow control connection between the two processors.]
Notes:
1. The gasket is used to convert 16-bit, dual-data IXP2800 signals to the wider, single-edge CWord signals used by the Switch Fabric, if required.
2. Per the CSIX specification, the terms "egress" and "ingress" are with respect to the Switch Fabric. So the egress processor handles traffic received from the Switch Fabric and the ingress processor handles traffic sent to the Switch Fabric.
Hardware Reference Manual 59
Intel® IXP2800 Network Processor
Technical Description
An alternate system configuration is shown in the block diagram in Figure 11. In this case, a single IXP2800 Network Processor is used for both Ingress and Egress. The bit rate supported would be less than in Figure 10. A hypothetical Bus Converter chip, external to the IXP2800 Network Processor is used. The block diagram in Figure 11 is only an illustrative example.
Figure 11. Full-Duplex Block Diagram
[Figure shows a single Intel® IXP2800 Network Processor whose receive and transmit protocols are SPI-4 and CSIX on a transfer-by-transfer basis (TDAT/RDAT), connected to a Bus Converter chip that steers traffic between a Framing/MAC device (PHY) and a Switch Fabric.]

2.7.1 SPI-4

SPI-4 is an interface for packet and cell transfer between a physical layer (PHY) device and a link layer device (the IXP2800 Network Processor), for aggregate bandwidths of OC-192 ATM and Packet over SONET/SDH (POS), as well as 10 Gb/s Ethernet applications.
The Optical Internetworking Forum (OIF), www.oiforum.com, controls the SPI-4 Implementation Agreement document.
Note to Figure 11: The Bus Converter chip receives and transmits both the SPI-4 and CSIX protocols from/to the Intel IXP2800 Network Processor. It steers the data, based on protocol, to either the PHY device (over UTOPIA-3, IXBUS, or any other required protocol) or the Switch Fabric (over CSIX).
SPI-4 protocol transfers data in variable length bursts. Associated with each burst is information such as Port number (for a multi-port device such as a 10 x 1 GbE), SOP, and EOP. This information is collected by the MSF and passed to the Microengines.

2.7.2 CSIX

CSIX-L1 (Common Switch Interface) defines an interface between a Traffic Manager (TM) and a Switch Fabric (SF) for ATM, IP, MPLS, Ethernet, and similar data communications applications.
The Network Processor Forum (NPF) www.npforum.org, controls the CSIX-L1 specification.
The basic unit of information transferred between Traffic Managers and Switch Fabrics is called a CFrame. There are three categories of CFrames:
Data
Control
Flow Control
Associated with each CFrame is information such as length, type, and address. This information is collected by the MSF and passed to the Microengines.
The MSF also contains a number of hardware features related to flow control.

2.7.3 Receive

Figure 12 is a simplified block diagram of the MSF receive section.
Figure 12. Simplified MSF Receive Section Block Diagram
[Figure shows the receive signals (RDAT, RCTL, RPAR, RPROT, RSTAT, clocked by RCLK/RCLK REF) feeding the SPI-4 and CSIX protocol logic and SPI-4 flow control; received data is written into the RBUF (with checksum computed) and can be moved to DRAM, full-element status is queued on the Full List and dispatched to the Microengines via the Receive Thread Freelists, and CSIX CFrames mapped by the RX_Port_Map CSR (normally Flow Control CFrames) go to the FCEFIFO, whose contents are sent out on TXCDAT and whose full indication (FCIFIFO full) is signaled to the flow control element via TXCFC.]
Hardware Reference Manual 61
Intel® IXP2800 Network Processor
Technical Description
2.7.3.1 RBUF
RBUF is a RAM that holds received data. It stores received data in sub-blocks (referred to as elements), and is accessed by Microengines or the Intel XScale® core reading the received information. Details of how RBUF elements are allocated and filled are based on the receive data protocol. When data is received, the associated status is put into the FULL_ELEMENT_LIST FIFO and subsequently sent to Microengines to process. The FULL_ELEMENT_LIST ensures that received elements are sent to Microengines in the order that the data was received.
RBUF contains a total of 8 Kbytes of data storage. The element size is programmable as 64, 128, or 256 bytes per element. In addition, RBUF can be programmed to be split into one, two, or three partitions, depending on the application. For receiving SPI-4 only, one partition is used. For receiving CSIX only, two partitions are used (Control CFrames and Data CFrames). When both protocols are in use, RBUF is split into three partitions.
Microengines can read data from the RBUF to Microengine S_TRANSFER_IN registers using the msf[read] instruction, where they specify the starting byte number (which must be aligned to 4 bytes) and the number of 32-bit words to read. The number in the instruction can be either the number of 32-bit words or the number of 32-bit word pairs, using the single and double instruction modifiers, respectively.
Microengines can move data from RBUF to DRAM using the dram instruction, where they specify the starting byte number (which must be aligned to 4 bytes), the number of 32-bit words to read, and the address in DRAM to write the data.
For both types of RBUF read, reading an element does not modify any RBUF data, and does not free the element, so buffered data can be read as many times as desired. This allows, for example, a processing pipeline to have different Microengines handle different protocol layers, with each Microengine reading only the specific header information it requires.
2.7.3.1.1 SPI-4 and the RBUF
SPI-4 data is placed into RBUF with each SPI-4 burst allocating an element. If a SPI-4 burst is larger than the element size, another element is allocated. The status information for the element contains the following information:
[The status is a two-longword value containing the following fields: Element, Byte Count, RPROT, SOP, EOP, Err, Len Err, Abort Err, Par Err, Type, Null, ADR, Checksum, and Reserved bits.]
The definitions of the fields are shown in Table 90, “RBUF SPI-4 Status Definition” on page 252.
2.7.3.1.2 CSIX and RBUF
CSIX CFrames are placed into RBUF with each CFrame allocating an element. Unlike SPI-4, a single CFrame must not spill over into another element. Since the CSIX specification defines a maximum CFrame size of 256 bytes, this can be guaranteed by programming the element size to 256 bytes. However, if the Switch Fabric uses a smaller CFrame size, then a smaller RBUF element size can be used.
Flow Control CFrames are put into the FCEFIFO, to be sent to the Ingress IXP2800 Network Processor where a Microengine will read them to manage flow control information to the Switch Fabric.
The status information for the element contains the following information:
[The status is a two-longword value containing the following fields: Element, Payload Length, RPROT, Extension Header, CR, Len Err, Err, P, HP Err, VP Err, Null, Type, and Reserved bits.]
The definitions of the fields are shown in Table 91, “RBUF CSIX Status Definition” on page 254.
2.7.3.2 Full Element List
Receive control hardware maintains the FULL_ELEMENT_LIST to hold the status of valid RBUF elements, in the order in which they were received. When an RBUF element is filled, its status is added to the tail of the FULL_ELEMENT_LIST. When a Microengine is notified of an element's arrival (by having the status written to its S_Transfer register), the status is removed from the head of the FULL_ELEMENT_LIST.
2.7.3.3 RX_THREAD_FREELIST
RX_THREAD_FREELIST is a FIFO that indicates Microengine Contexts that are awaiting an RBUF element to process. This allows the Contexts to indicate their ready status prior to the reception of the data, as a way to eliminate latency. Each entry added to a Freelist also has an associated S_TRANSFER register and signal number. There are three RX_THREAD_FREELISTS that correspond to the RBUF partitions.
To be added as ready to receive an element, a Microengine does an msf[write] or an msf[fast_write] to the RX_THREAD_FREELIST address; the write data is the Microengine/CONTEXT/S_TRANSFER register number to add to the Freelist.
When there is valid status at the head of the Full Element List, it will be pushed to a Microengine. The receive control logic pushes the status information (which includes the element number) to the Microengine in the head entry of RX_THREAD_FREELIST, and sends an Event Signal to the Microengine. It then removes that entry from the RX_THREAD_FREELIST, and removes the status from Full Element List.
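The pairing of the two FIFOs described above can be sketched in C. This is an illustrative model only; the queue depth, types, and function names are invented for the sketch, not hardware definitions:

```c
#include <stdint.h>

#define QLEN 16

/* Illustrative model: one FIFO of element status words (the Full
 * Element List) and one FIFO of waiting contexts (the thread
 * freelist). */
typedef struct { uint64_t q[QLEN]; int head, count; } fifo64_t;

static int fifo_pop(fifo64_t *f, uint64_t *out) {
    if (f->count == 0) return 0;
    *out = f->q[f->head];
    f->head = (f->head + 1) % QLEN;
    f->count--;
    return 1;
}

static void fifo_push(fifo64_t *f, uint64_t v) {
    f->q[(f->head + f->count) % QLEN] = v;
    f->count++;
}

/* When both a full element and a waiting context exist, pair them:
 * the status is autopushed to the context and both entries retire. */
int dispatch(fifo64_t *full_elements, fifo64_t *thread_freelist,
             uint64_t *status_out, uint64_t *ctx_out) {
    if (full_elements->count == 0 || thread_freelist->count == 0)
        return 0;                      /* nothing to dispatch yet */
    fifo_pop(full_elements, status_out);
    fifo_pop(thread_freelist, ctx_out);
    return 1;
}
```

The same rule covers the fall-behind case described later: statuses simply wait at the head of the full-element FIFO until a context enters the freelist.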
Each RX_THREAD_FREELIST has an associated countdown timer. If the timer expires and no new receive data is available yet, the receive logic will autopush a Null Receive Status Word to the next thread on the RX_THREAD_FREELIST. A Null Receive Status Word has the “Null” bit set, and does not have any data or RBUF entry associated with it.
The RX_THREAD_FREELIST timer is useful for certain applications. Its primary purpose is to keep the receive processing pipeline (implemented as code running on the Microengines) moving even when the line has gone idle.
It is especially useful if the pipeline is structured to handle mpackets in groups, e.g., eight mpackets at a time. If seven mpackets are received and then the line goes idle, the timeout will trigger the autopush of a null Receive Status Word, filling the eighth slot and allowing the pipeline to advance. Similarly, if one valid mpacket is received before the line goes idle for a long period, seven null Receive Status Words will be autopushed, allowing the pipeline to proceed. Typically the timeout interval is programmed to be slightly larger than the minimum arrival time of the incoming cells or packets.
The timer is controlled using the RX_THREAD_FREELIST_TIMEOUT_# CSR. The timer may be enabled or disabled, and the timeout value specified using this CSR.
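As a sketch, the timer rule might be modeled as follows. The bit position chosen for the Null flag and all structure and function names are assumptions of this model, not the CSR layout:

```c
#include <stdint.h>

/* Illustrative countdown-timer model: tick() is called once per cycle.
 * When the programmed interval expires with no mpacket arrival, a Null
 * Receive Status Word is autopushed to the next waiting thread. */
typedef struct {
    uint32_t reload;    /* programmed timeout interval, in ticks */
    uint32_t count;     /* current countdown */
    int      enabled;
} fl_timer_t;

#define NULL_STATUS (1ull << 31)  /* Null bit; position assumed here */

/* Returns the status word autopushed this cycle, or 0 for none. */
uint64_t tick(fl_timer_t *t, int data_arrived, uint64_t data_status) {
    if (data_arrived) {
        t->count = t->reload;     /* real data resets the countdown */
        return data_status;
    }
    if (!t->enabled || t->count == 0) return 0;
    if (--t->count == 0) {
        t->count = t->reload;
        return NULL_STATUS;       /* Null bit set, no RBUF element */
    }
    return 0;
}
```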
2.7.3.4 Receive Operation Summary
During receive processing, received CFrames, and SPI-4 cells and packets (which in this context are all called mpackets) are placed into the RBUF, and then handed off to a Microengine to process. Normally, by application design, some number of Microengine Contexts will be assigned to receive processing. Those Contexts will have their number added to the proper RX_THREAD_FREELIST (via msf[write] or msf[fast_write]), and then will go to sleep to wait for arrival of an mpacket (or alternatively poll waiting for arrival of an mpacket).
When an mpacket arrives, MSF receive control logic will autopush eight bytes of information for the element to the Microengine/CONTEXT/S_TRANSFER registers at the head of RX_THREAD_FREELIST. The information pushed is:
Status Word (SPI-4) or Header Status (CSIX) — see Table 90, “RBUF SPI-4 Status Definition” on page 252 for more information.
Checksum (SPI-4) or Extension Header (CSIX) — see Table 91, “RBUF CSIX Status Definition” on page 254 for more information.
To handle the case where the receive Contexts temporarily fall behind and RX_THREAD_FREELIST is empty, all received element numbers are held in the FULL_ELEMENT_LIST. In that case, as soon as an RX_THREAD_FREELIST entry is entered, the status of the head element of FULL_ELEMENT_LIST will be pushed to it.
The Microengines may read part of (or the entire) RBUF element to their S_TRANSFER registers (via an msf[read] instruction) for header processing, etc., and may also move the element data to DRAM (via a dram[rbuf_rd] instruction).
When a Context is done with an element, it does an msf[write] or msf[fast_write] to the RBUF_ELEMENT_DONE address; the write data is the element number. This marks the element as free and available to be re-used. There is no restriction on the order in which elements are freed; Contexts can do different amounts of processing per element based on the contents of the element — therefore elements can be returned in a different order than they were handed to Contexts.

2.7.4 Transmit

Figure 13 is a simplified Block Diagram of the MSF transmit section.
Figure 13. Simplified Transmit Section Block Diagram
(Block diagram: data enters the TBUF elements from Microengines and from DRAM; control and per-element valid logic feed byte-align and SPI-4/CSIX protocol logic, driving the TDAT, TCTL, TPAR, and TCLK pins; flow control status arrives on RXCDAT, RXCFC, and RXCSRB into the FCIFIFO, which Microengines read over the S_Push_Bus.)
2.7.4.1 TBUF
TBUF is a RAM that holds data and status to be transmitted. The data is written into sub-blocks referred to as elements, by Microengines or the Intel XScale® core.
TBUF contains a total of 8 Kbytes of data. The element size is programmable as either 64 bytes, 128 bytes, or 256 bytes per element. In addition, TBUF can be programmed to be split into one, two, or three partitions depending on the application. For transmitting SPI-4, one partition would be used. For transmitting CSIX, two partitions are used (Control CFrames and Data CFrames). For transmitting both SPI-4 and CSIX, three partitions are used.
Microengines can write data from Microengine S_TRANSFER_OUT registers to the TBUF using the msf[write] instruction, where they specify the starting byte number (which must be aligned to 4 bytes) and the number of 32-bit words to write. The number in the instruction can be either the number of 32-bit words, or the number of 32-bit word pairs, using the single and double instruction modifiers, respectively.
Microengines can move data from DRAM to TBUF using the dram instruction, where they specify the starting byte number (which must be aligned to 4 bytes), the number of 32-bit words to write, and the address in DRAM of the data.
All elements within a TBUF partition are transmitted in order. Control information associated with the element defines which bytes are valid. The data from the TBUF will be shifted and byte aligned as required for transmission.
2.7.4.1.1 SPI-4 and TBUF
For SPI-4, data is put into the data portion of the element, and information for the SPI-4 Control Word that will precede the data is put into the Element Control Word.
When the Element Control Word is written, the information is:
(Bit-field diagram not reproduced. The Element Control Word contains: Payload Length, Prepend Offset, Prepend Length, Payload Offset, Skip, SOP, EOP, ADR, and Reserved fields.)
The definitions of the fields are shown in Table 15.
Table 15. TBUF SPI-4 Control Definition
Field: Definition
Payload Length: Indicates the number of Payload bytes, from 1 to 256, in the element. The value of 0x00 means 256 bytes. The sum of Prepend Length and Payload Length will be sent. That value will also control the EOPS field (1 or 2 bytes valid indicated) of the Control Word that will succeed the data transfer. Note 1.
Prepend Offset: Indicates the first valid byte of Prepend, from 0 to 7.
Prepend Length: Indicates the number of bytes in Prepend, from 0 to 31.
Payload Offset: Indicates the first valid byte of Payload, from 0 to 7.
Skip: Allows software to allocate a TBUF element and then not transmit any data from it. 0 = transmit data according to other fields of Control Word; 1 = free the element without transmitting any data.
SOP: Indicates if the element is the start of a packet. This field will be sent in the SOPC field of the Control Word that will precede the data transfer.
EOP: Indicates if the element is the end of a packet. This field will be sent in the EOPS field of the Control Word that will succeed the data transfer. Note 1.
ADR: The port number to which the data is directed. This field will be sent in the ADR field of the Control Word that will precede the data transfer.
NOTE:
1. Normally EOPS is sent on the next Control Word (along with ADR and SOP) to start the next element. If there is no valid element pending at the end of sending the data, the transmit logic will insert an Idle Control Word with the EOPS information.
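For illustration, composing such a control word in software might look like the following C sketch. The field widths follow Table 15, but the bit positions used here are placeholders, not the documented hardware layout:

```c
#include <stdint.h>

/* Sketch of composing a TBUF SPI-4 Element Control Word. Field widths
 * follow Table 15; bit positions below are illustrative only. */
typedef struct {
    uint16_t payload_len;   /* 1..256; encoded so that 0x00 means 256 */
    uint8_t  prepend_off;   /* 0..7  */
    uint8_t  prepend_len;   /* 0..31 */
    uint8_t  payload_off;   /* 0..7  */
    uint8_t  adr;           /* destination port number */
    uint8_t  sop, eop, skip;
} tbuf_spi4_ctl_t;

uint64_t pack_ctl(const tbuf_spi4_ctl_t *c) {
    uint64_t w = 0;
    w |= (uint64_t)(c->payload_len & 0xFF) << 16;  /* 256 encodes as 0 */
    w |= (uint64_t)(c->prepend_off & 0x7)  << 40;
    w |= (uint64_t)(c->prepend_len & 0x1F) << 43;
    w |= (uint64_t)(c->payload_off & 0x7)  << 48;
    w |= (uint64_t)(c->adr & 0xFF);                /* ADR in low bits */
    w |= (uint64_t)(c->sop  & 1) << 8;
    w |= (uint64_t)(c->eop  & 1) << 9;
    w |= (uint64_t)(c->skip & 1) << 10;
    return w;
}
```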
2.7.4.1.2 CSIX and TBUF
For CSIX, payload information is put into the data area of the element, and Base and Extension Header information is put into the Element Control Word.
When the Element Control Word is written, the information is:
(Bit-field diagram not reproduced. The Element Control Word contains: Payload Length, Prepend Offset, Prepend Length, Payload Offset, Extension Header, Skip, CR, P, Type, and Reserved fields.)
The definitions of the fields are shown in Table 16.
Table 16. TBUF CSIX Control Definition
Field: Definition
Payload Length: Indicates the number of Payload bytes, from 1 to 256, in the element. The value of 0x00 means 256 bytes. The sum of Prepend Length and Payload Length will be sent, and also put into the CSIX Base Header Payload Length field. Note that this length does not include any padding that may be required. Padding is inserted by transmit hardware as needed.
Prepend Offset: Indicates the first valid byte of Prepend, from 0 to 7.
Prepend Length: Indicates the number of bytes in Prepend, from 0 to 31.
Payload Offset: Indicates the first valid byte of Payload, from 0 to 7.
Skip: Allows software to allocate a TBUF element and then not transmit any data from it. 0 = transmit data according to other fields of Control Word; 1 = free the element without transmitting any data.
CR: CR (CSIX Reserved) bit to put into the CSIX Base Header.
P: P (Private) bit to put into the CSIX Base Header.
Type: Type field to put into the CSIX Base Header. Idle type is not legal here.
Extension Header: The Extension Header to be sent with the CFrame. The bytes are sent in big-endian order; byte 0 is in bits 63:56, byte 1 is in bits 55:48, byte 2 is in bits 47:40, and byte 3 is in bits 39:32.
2.7.4.2 Transmit Operation Summary
During transmit processing, data to be transmitted is placed into the TBUF under Microengine control. The Microengine allocates an element in software; the transmit hardware processes TBUF elements within a partition in strict sequential order, so the software can track which element to allocate next.
Microengines may write directly into an element by an msf[write] instruction, or have data from DRAM written into the element by a dram[tbuf_wr] instruction. Data can be merged into the element by doing both.
There is a Transmit Valid bit per element that marks the element as ready to be transmitted. Microengines move all data into the element, write the element Transmit Control Word with information about the element, and then set the element valid bit:
1. Move data into TBUF by either or both of msf[write] and dram[tbuf_wr] instructions to the TBUF.
2. Wait for step 1 to complete.
3. Write the Transmit Control Word at the TBUF_ELEMENT_CONTROL_# address. Using this address sets the Transmit Valid bit.
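The handoff can be sketched as a small C model. The element count, names, and free-on-transmit behavior shown are illustrative assumptions, not register definitions:

```c
#include <stdint.h>

/* Sketch of the transmit handoff. tbuf_valid[] models the per-element
 * Transmit Valid bit; the helper names are invented. */
#define TBUF_ELEMS 128
static int tbuf_valid[TBUF_ELEMS];
static uint64_t tbuf_ctl[TBUF_ELEMS];

/* Step 3: writing the Transmit Control Word at the
 * TBUF_ELEMENT_CONTROL_# address is what sets the Valid bit. */
void write_element_control(int elem, uint64_t ctl_word) {
    tbuf_ctl[elem] = ctl_word;
    tbuf_valid[elem] = 1;
}

/* The transmit hardware consumes elements strictly in order within a
 * partition: it only advances when the next element is marked valid. */
int tx_next(int *next_elem) {
    if (!tbuf_valid[*next_elem]) return -1;   /* not ready yet */
    int e = *next_elem;
    tbuf_valid[e] = 0;                        /* element freed */
    *next_elem = (e + 1) % TBUF_ELEMS;
    return e;
}
```

Note how an element written out of order (say element 2 before element 1) sits idle until its predecessor becomes valid, matching the strict sequential rule.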

2.7.5 The Flow Control Interface

The MSF provides flow control support for SPI-4 and CSIX.
2.7.5.1 SPI-4
SPI-4 uses a FIFO Status Channel to provide flow control information. MSF receives the information from the PHY device and stores it so that Microengines can read the information on a per-port basis. They can then use that information to determine when to transmit data to a given port.
The MSF also sends status to the PHY based on the amount of available space in the RBUF; this is done by hardware, without Microengine involvement.
2.7.5.2 CSIX
CSIX provides two types of flow control — link level and per queue.
The link level control is handled by hardware. MSF will stop transmission in response to link level flow control received from the Switch Fabric. MSF will assert link level flow control based on the amount of available space in the RBUF.
Per queue flow control information is put into the FCIFIFO and handled by Microengine software. Also, if required, Microengines can send Flow Control CFrames to the Switch Fabric under software control.
In both cases, for a full-duplex configuration, information is passed from the Switch Fabric to the Egress IXP2800 Network Processor, which then passes it to the Ingress IXP2800 Network Processor over a proprietary flow control interface.

2.8 Hash Unit

The IXP2800 Network Processor contains a Hash Unit that can take 48-, 64-, or 128-bit data and produce a 48-, 64-, or 128-bit hash index, respectively. The Hash Unit is accessible by the Microengines and the Intel XScale® core, and is useful in doing table searches with large keys, for example L2 addresses. Figure 14 is a block diagram of the Hash Unit.
Up to three hash indexes can be created using a single Microengine instruction. This helps to minimize command overhead. The Intel XScale® core can only do a single hash at a time.
A Microengine initiates a hash operation by writing the hash operands into a contiguous set of S_TRANSFER_OUT registers and then executing the hash instruction. The Intel XScale® core initiates a hash operation by writing a set of memory-mapped HASH_OP registers, which are built in the Intel XScale® core gasket, with the data to be used to generate the hash index. There are separate registers for 48-, 64-, and 128-bit hashes. The data is written from MSB to LSB, with the write to the LSB triggering the Hash Operation. In both cases, the Hash Unit reads the operand into an input buffer, performs the hash operation, and returns the result.
The Hash Unit uses a hard-wired polynomial algorithm and a programmable hash multiplier to create hash indexes. Three separate multipliers are supported, one for 48-bit hash operations, one for 64-bit hash operations and one for 128-bit hash operations. The multiplier is programmed through Control registers in the Hash Unit.
The multiplicand is shifted into the hash array, 16 bits at a time. The hash array performs a 1’s-complement multiply and polynomial divide, using the multiplier and 16 bits of the multiplicand. The result is placed into an output buffer register and also feeds back into the array. This process is repeated three times for a 48-bit hash (16 bits x 3 = 48), four times for a 64-bit hash (16 bits x 4 = 64), and eight times for a 128-bit hash (16 x 8 = 128). After the multiplicand has been passed through the hash array, the resulting hash index is placed into a two-stage output buffer.
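The iteration pattern (16 bits of multiplicand per pass, with the result fed back into the array) can be illustrated with a simplified CRC-style reduction. This stand-in is not the IXP2800's 1's-complement multiply and polynomial divide; it only shows the per-pass folding structure and the 3/4/8-pass counts:

```c
#include <stdint.h>

/* Simplified stand-in for the hash array: fold the multiplicand into
 * the state 16 bits per pass (3 passes for 48-bit, 4 for 64-bit, 8 for
 * 128-bit operands). CRC-style shift/XOR replaces the real polynomial
 * arithmetic; the polynomial plays the role of the programmable
 * multiplier. */
uint64_t hash_pass(uint64_t state, uint16_t chunk, uint64_t poly) {
    state ^= chunk;
    for (int i = 0; i < 16; i++)        /* one bit per shift step */
        state = (state & 1) ? (state >> 1) ^ poly : state >> 1;
    return state;                        /* fed back for next pass */
}

uint64_t hash_index(const uint16_t *chunks, int npasses, uint64_t poly) {
    uint64_t state = 0;
    for (int p = 0; p < npasses; p++)    /* 3, 4, or 8 passes */
        state = hash_pass(state, chunks[p], poly);
    return state;
}
```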
After each hash index is completed, the Hash Unit returns the hash index to the Microengines’ S_TRANSFER_IN registers, or the Intel XScale® core HASH_OP registers. For Microengine initiated hash operations, the Microengine is signaled after all the hashes specified in the instruction have been completed.
For the Intel XScale® core initiated hash operations, the Intel XScale® core reads the results from the memory-mapped HASH_OP registers. The addresses of Hash Results are the same as the HASH_OP registers. Because of queuing delays at the Hash Unit, the time to complete an operation is not fixed. The Intel XScale® core can do one of two operations to get the hash results.
Poll the HASH_DONE register. This register is cleared when the HASH_OP registers are written. Bit [0] of the HASH_DONE register is set when the HASH_OP registers get the return result from the Hash Unit (when the last word of the result is returned). The Intel XScale® core software can poll on HASH_DONE, and read HASH_OP when HASH_DONE is equal to 0x00000001.
Read HASH_OP directly. The interface hardware will acknowledge the read only when the result is valid. This method will result in the Intel XScale® core stalling if the result is not valid when the read happens.
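A sketch of the polling method, with plain C variables standing in for the memory-mapped registers and helper names invented for the model:

```c
#include <stdint.h>

/* Sketch of the polling method: HASH_DONE reads 0 until the Hash Unit
 * returns the last result word, then reads 0x00000001. */
typedef struct {
    volatile uint32_t hash_done;
    volatile uint64_t hash_op;      /* operand in, result out */
} hash_regs_t;

void start_hash(hash_regs_t *r, uint64_t operand) {
    r->hash_done = 0;               /* cleared when HASH_OP is written */
    r->hash_op = operand;
}

/* Models the unit completing: the result lands in HASH_OP and bit [0]
 * of HASH_DONE is set. */
void unit_complete(hash_regs_t *r, uint64_t result) {
    r->hash_op = result;
    r->hash_done = 0x00000001;
}

int result_ready(const hash_regs_t *r) {
    return r->hash_done == 0x00000001;
}
```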
The number of clock cycles required to perform a single hash operation equals: two or four cycles through the input buffers, three, four or eight cycles through the hash array, and two or four cycles through the output buffers. Because of the pipeline characteristics of the Hash Unit, performance is improved if multiple hash operations are initiated with a single instruction rather than separate hash instructions for each hash operation.
Figure 14. Hash Unit Block Diagram
(Block diagram: data used to create the hash index arrives from the S_Transfer_Out registers into a 2-stage input buffer; up to three 128-bit multiplicands are shifted 16 bits at a time through the hash array, using the selected Hash_Multiplier_48, Hash_Multiplier_64, or Hash_Multiplier_128; the hashed multiplicands pass through a 2-stage output buffer and return as hash indexes to the S_Transfer_In registers.)

2.9 PCI Controller

The PCI Controller provides a 64-bit, 66 MHz capable PCI Local Bus Revision 2.2 interface, and is compatible with 32-bit and 33 MHz PCI devices. The PCI controller provides the following functions:
Target Access (external Bus Master access to SRAM, DRAM, and CSRs)
Master Access (the Intel XScale® core access to PCI Target devices)
Two DMA Channels
Mailbox and Doorbell registers for the Intel XScale® core to Host communication
PCI arbiter
The IXP2800 Network Processor can be configured to act as PCI central function (for use in a stand-alone system), where it provides the PCI reset signal, or as an add-in device, where it uses the PCI reset signal as the chip reset input. The choice is made by connecting the cfg_rst_dir input pin low or high.

2.9.1 Target Access

There are three Base Address Registers (BARs) to allow PCI Bus Masters to access SRAM, DRAM, and CSRs, respectively. Examples of PCI Bus Masters include a Host Processor (for example a Pentium® processor), or an I/O device such as an Ethernet controller, SCSI controller, or encryption coprocessor.
The SRAM BAR can be programmed to sizes of 16, 32, 64, 128, or 256 Mbytes, or no access.
The DRAM BAR can be programmed to sizes of 128, 256, or 512 Mbytes or 1 Gbyte, or no access.
The CSR BAR is 8 KB.
PCI Boot Mode is supported, in which the Host downloads the Intel XScale® core boot image into DRAM, while holding the Intel XScale® core in reset. Once the boot image has been loaded, the Intel XScale® core reset is deasserted. The alternative is to provide the boot image in a Flash ROM attached to the Slowport.

2.9.2 Master Access

The Intel XScale® core and Microengines can directly access the PCI bus. The Intel XScale® core can do loads and stores to specific address regions to generate all PCI command types. Microengines use the PCI instruction, and also use address regions to generate different PCI commands.

2.9.3 DMA Channels

There are two DMA Channels, each of which can move blocks of data from DRAM to the PCI or from the PCI to DRAM. The DMA channels read parameters from a list of descriptors in SRAM, perform the data movement to or from DRAM, and stop when the list is exhausted. The descriptors are loaded from predefined SRAM entries or may be set directly by CSR writes to DMA Channel registers. There is no restriction on byte alignment of the source address or the destination address.
For PCI to DRAM transfers, the PCI command is Memory Read, Memory Read line, or Memory Read Multiple. For DRAM to PCI transfers, the PCI command is Memory Write. Memory Write Invalidate is not supported.
Up to two DMA channels can be running at a time, with three descriptors outstanding. Effectively, the active channels interleave bursts to or from the PCI Bus.
Interrupts are generated at the end of the DMA operation for the Intel XScale® core. However, Microengines do not provide an interrupt mechanism; the DMA Channel will instead use an Event Signal to notify the particular Microengine on completion of DMA.
2.9.3.1 DMA Descriptor
Each descriptor uses four 32-bit words in SRAM, aligned on a 16-byte boundary. The DMA channels read the descriptors from SRAM into working registers once the control register has been set to initiate the transaction. This control register must be set explicitly; this starts the DMA transfer. Figure 15 lists the register names for the DMA channels, and Table 17 lists the descriptor contents.
Figure 15. DMA Descriptor Reads
(Diagram: the DMA channel reads descriptors from local SRAM, in chain order, into its working registers.)
DMA Channel Register: Channel Register Name (X can be 1, 2, or 3)
Byte Count Register: CHAN_X_BYTE_COUNT
PCI Address Register: CHAN_X_PCI_ADDR
DRAM Address Register: CHAN_X_DRAM_ADDR
Descriptor Pointer Register: CHAN_X_DESC_PTR
Control Register: CHAN_X_CONTROL
After a descriptor is processed, the next descriptor is loaded in the working registers. This process repeats until the chain of descriptors is terminated (i.e., the End of Chain bit is set).
Table 17. DMA Descriptor Format
Offset from Descriptor Pointer Description
0x0 Byte Count
0x4 PCI Address
0x8 DRAM Address
0xC Next Descriptor Address
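The four-word layout of Table 17 maps naturally onto a C structure; the End of Chain flag in bit 31 of the byte count follows the description in Section 2.9.3.2:

```c
#include <stdint.h>

/* The four-word DMA descriptor of Table 17, as it would be laid out in
 * SRAM on a 16-byte boundary. */
typedef struct {
    uint32_t byte_count;    /* offset 0x0; bit 31 = End of Chain */
    uint32_t pci_addr;      /* offset 0x4 */
    uint32_t dram_addr;     /* offset 0x8 */
    uint32_t next_desc;     /* offset 0xC; 0 = unterminated chain */
} dma_desc_t;

#define EOC_BIT (1u << 31)

int is_end_of_chain(const dma_desc_t *d) {
    return (d->byte_count & EOC_BIT) != 0;
}
```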
2.9.3.2 DMA Channel Operation
The DMA channel can be set up to read the first descriptor in SRAM, or with the first descriptor written directly to the DMA channel registers. When descriptors and the descriptor list are in SRAM, the procedure is as follows:
1. The DMA channel owner writes the address of the first descriptor into the DMA Channel Descriptor Pointer register (DESC_PTR).
2. The DMA channel owner writes the DMA Channel Control register (CONTROL) with miscellaneous control information and also sets the channel enable bit (bit 0). The channel initial descriptor bit (bit 4) in the CONTROL register must also be cleared to indicate that the first descriptor is in SRAM.
3. Depending on the DMA channel number, the DMA channel reads the descriptor block into the corresponding DMA registers, BYTE_COUNT, PCI_ADDR, DRAM_ADDR, and DESC_PTR.
4. The DMA channel transfers the data until the byte count is exhausted, and then sets the channel transfer done bit in the CONTROL register.
5. If the end of chain bit (bit 31) in the BYTE_COUNT register is clear, the channel checks the Chain Pointer value. If the Chain Pointer value is not equal to 0, it reads the next descriptor and transfers the data (steps 3 and 4 above). If the Chain Pointer value is equal to 0, it waits for the Descriptor Added bit of the Channel Control register to be set before reading the next descriptor and transferring the data (steps 3 and 4 above). If bit 31 is set, the channel sets the channel chain done bit in the CONTROL register and then stops.
6. Proceed to the Channel End Operation.
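The chain-walking rule of step 5 can be sketched as follows; the flat array standing in for SRAM and the pause-on-zero-pointer return are modeling simplifications:

```c
#include <stdint.h>

/* Sketch of descriptor-chain traversal: keep following next-descriptor
 * pointers until a descriptor with End of Chain set (bit 31 of its
 * byte count) has been processed. */
typedef struct {
    uint32_t byte_count;   /* bit 31 = End of Chain */
    uint32_t next;         /* index of next descriptor; 0 = none yet */
} desc_t;

/* Returns the number of descriptors processed before stopping. */
int run_chain(const desc_t *sram, uint32_t first) {
    int processed = 0;
    uint32_t cur = first;
    for (;;) {
        processed++;                   /* transfer for this descriptor */
        if (sram[cur].byte_count >> 31)
            return processed;          /* chain done: channel stops */
        if (sram[cur].next == 0)
            return processed;          /* pause: waits for Descriptor
                                          Added before continuing */
        cur = sram[cur].next;
    }
}
```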
When single descriptors are written into the DMA channel registers, the procedure is as follows:
1. The DMA channel owner writes the descriptor values directly into the DMA channel registers. The end of chain bit (bit 31) in the BYTE_COUNT register must be set, and the value in the DESC_PTR register is not used.
2. The DMA channel owner writes the base address of the DMA transfer into the PCI_ADDR to specify the PCI starting address.
3. When the first descriptor is in the BYTE_COUNT register, the DRAM_ADDR register must be written with the address of the data to be moved.
4. The DMA channel owner writes the CONTROL register with miscellaneous control information, along with setting the channel enable bit (bit 0). The channel initial descriptor bit (bit 4) in the CONTROL register must also be set to indicate that the first descriptor is already in the channel descriptor registers.
5. The DMA channel transfers the data until the byte count is exhausted, and then sets the channel transfer done bit (bit 2) in the CONTROL register.
6. Since the end of chain bit (bit 31) in the BYTE_COUNT register is set, the channel sets the channel chain done bit (bit 7) in the CONTROL register and then stops.
7. Proceed to the Channel End Operation.
2.9.3.3 DMA Channel End Operation
1. Channel owned by PCI: If not masked via the PCI Outbound Interrupt Mask register, the DMA channel interrupts the PCI host after the setting of the DMA done bit in the CHAN_X_CONTROL register, which is readable in the PCI Outbound Interrupt Status register.
2. Channel owned by the Intel XScale® core: If enabled via the Intel XScale® core Interrupt Enable registers, the DMA channel interrupts the Intel XScale® core by setting the DMA channel done bit in the CHAN_X_CONTROL register, which is readable in the Intel XScale® core Interrupt Status register.
3. Channel owned by Microengine: If enabled via the Microengine Auto-Push Enable registers, the DMA channel signals the Microengine after setting the DMA channel done bit in the CHAN_X_CONTROL register, which is readable in the Microengine Auto-Push Status register.
2.9.3.4 Adding Descriptors to an Unterminated Chain
It is possible to add a descriptor to a chain while a channel is running. To do so, the chain should be left unterminated, i.e., the last descriptor should have End of Chain clear, and the Chain Pointer value equal to 0. A new descriptor (or linked list of descriptors) can be added to the chain by overwriting the Chain Pointer value of the unterminated descriptor (in SRAM) with the Local Memory address of the (first) added descriptor (the added descriptor must actually be valid in Local Memory prior to that). After updating the Chain Pointer field, the software must write a 1 to the Descriptor Added bit of the Channel Control register. This is necessary to reactivate the channel in the case where it was paused. However, software need not check the state of the channel before writing that bit; there is no side-effect of writing that bit in the case where the channel had not yet read the unlinked descriptor.
If the channel was paused or had read an unlinked Pointer, it will re-read the last descriptor processed (i.e., the one that originally had the 0 value for Chain Pointer) to get the address of the newly added descriptor.
A descriptor cannot be added to a descriptor that has End of Chain set.
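The ordering constraints above can be sketched in C; the structures are illustrative and index-based pointers stand in for SRAM addresses:

```c
#include <stdint.h>

/* Sketch of appending to an unterminated chain: the new descriptor is
 * made valid in memory first, then the old tail's Chain Pointer is
 * overwritten, then Descriptor Added is written. */
typedef struct { uint32_t byte_count, next; } desc_t;
typedef struct { desc_t *sram; uint32_t desc_added; } chan_t;

int append_desc(chan_t *ch, uint32_t tail, uint32_t new_idx) {
    if (ch->sram[tail].byte_count >> 31)
        return -1;                    /* End of Chain set: cannot add */
    if (ch->sram[tail].next != 0)
        return -1;                    /* tail was not unterminated */
    /* the new descriptor must already be valid before linking it */
    ch->sram[tail].next = new_idx;    /* 1. overwrite Chain Pointer */
    ch->desc_added = 1;               /* 2. set Descriptor Added bit;
                                         safe even if the channel had
                                         not yet read the unlinked
                                         descriptor */
    return 0;
}
```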

2.9.4 Mailbox and Message Registers

Mailbox and Doorbell registers provide hardware support for communication between the Intel XScale® core and a device on the PCI Bus.
Four 32-bit mailbox registers are provided so that messages can be passed between the Intel XScale® core and a PCI device. All four registers can be read and written with byte resolution from both the Intel XScale® core and PCI. How the registers are used is application dependent and the messages are not used internally by the PCI Unit in any way. The mailbox registers are often used with the Doorbell interrupts.
Doorbell interrupts provide an efficient method of generating an interrupt as well as encoding the purpose of the interrupt. The PCI Unit supports a 32-bit Intel XScale® core DOORBELL register that is used by a PCI device to generate an Intel XScale® core interrupt, and a separate 32-bit PCI DOORBELL register that is used by the Intel XScale® core to generate a PCI interrupt. A source generating the Doorbell interrupt can write a software defined bitmap to the register to indicate a specific purpose. This bitmap is translated into a single interrupt signal to the destination
(either a PCI interrupt or an Intel XScale® core interrupt). When an interrupt is received, the DOORBELL registers can be read and the bit mask can be interpreted. If a larger bit mask is required than that is provided by the DOORBELL register, the MAILBOX registers can be used to pass up to 16 bytes of data.
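A minimal model of the doorbell handshake; the clear-the-bits-you-read behavior in take() is an assumption of this sketch, not a documented register semantic:

```c
#include <stdint.h>

/* Sketch of a doorbell: the source ORs a software-defined bitmap into
 * the register; any nonzero content raises a single interrupt line;
 * the destination reads and decodes the bitmap. */
typedef struct { uint32_t doorbell; } db_t;

void ring(db_t *d, uint32_t bitmap) { d->doorbell |= bitmap; }

int irq_asserted(const db_t *d) { return d->doorbell != 0; }

/* Destination: read the cause bits, then clear the ones handled. */
uint32_t take(db_t *d) {
    uint32_t cause = d->doorbell;
    d->doorbell &= ~cause;
    return cause;
}
```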
The doorbell interrupts are controlled through the registers shown in Table 18.
Table 18. Doorbell Interrupt Registers
Register Name: Description
XSCALE DOORBELL: Used to generate the Intel XScale® core Doorbell interrupts.
XSCALE DOORBELL SETUP: Used to initialize the Intel XScale® core Doorbell register and for diagnostics.
PCI DOORBELL: Used to generate the PCI Doorbell interrupts.
PCI DOORBELL SETUP: Used to initialize the PCI Doorbell register and for diagnostics.

2.9.5 PCI Arbiter

The PCI unit contains a PCI bus arbiter that supports two external masters in addition to the PCI Unit’s initiator interface. If more than two external masters are used in the system, the arbiter can be disabled and an arbiter external to the IXP2800 Network Processor used. In that case, the IXP2800 Network Processor will provide its PCI request signal to the external arbiter, and use that arbiter’s grant signal.
The arbiter uses a simple round-robin priority algorithm; it asserts the grant signal corresponding to the next request in the round-robin during the current executing transaction on the PCI bus (this is also called hidden arbitration). If the arbiter detects that an initiator has failed to assert frame_l after 16 cycles of both grant assertion and PCI bus idle condition, the arbiter deasserts the grant. That master does not receive any more grants until it deasserts its request for at least one PCI clock cycle. Bus parking is implemented in that the last bus grant will stay asserted if no request is pending.
To prevent bus contention, if the PCI bus is idle, the arbiter never asserts one grant signal in the same PCI cycle in which it deasserts another. It deasserts one grant, and then asserts the next grant after one full PCI clock cycle has elapsed to provide for bus driver turnaround.

2.10 Control and Status Register Access Proxy

The Control and Status Register Access Proxy (CAP) contains a number of chip-wide control and status registers. Some provide miscellaneous control and status, while others are used for inter-Microengine or Microengine to the Intel XScale® core communication (note that rings in Scratchpad Memory and SRAM can also be used for inter-process communication). These include:
INTERTHREAD SIGNAL — Each thread (or context) on a Microengine can send a signal to any other thread by writing to the InterThread_Signal register. This allows a thread to go to sleep waiting for completion of a task by a different thread.
THREAD MESSAGE — Each thread has a message register where it can post a software-specific message. Other Microengine threads, or the Intel XScale® core, can poll for availability of messages by reading the THREAD_MESSAGE_SUMMARY register. Both the THREAD_MESSAGE and corresponding THREAD_MESSAGE_SUMMARY clear upon a read of the message; this eliminates a race condition when there are multiple message readers. Only one reader will get the message.
SELF DESTRUCT — This register provides another type of communication. Microengine
software can atomically set individual bits in the SELF_DESTRUCT registers; the registers clear upon read. The meaning of each bit is software-specific. Clearing the register upon read eliminates a race condition when there are multiple readers.
THREAD INTERRUPT — Each thread can interrupt the Intel XScale® core on two different interrupts; the usage is software-specific. Having two interrupts allows for flexibility; for example, one can be assigned to normal service requests and one can be assigned to error conditions. If more information needs to be associated with the interrupt, mailboxes or Rings in Scratchpad Memory or SRAM could be used.
REFLECTOR — CAP provides a function (called “reflector”) where any Microengine thread
can move data between its registers and those of any other thread. In response to a single write or read instruction (with the address in the specific reflector range) CAP will get data from the source Microengine and put it into the destination Microengine. Both the sending and receiving threads can optionally be signaled upon completion of the data movement.
®
core communication (note that rings in
®
core, can poll for
®
core on two different

2.11 Intel XScale® Core Peripherals

2.11.1 Interrupt Controller

The Interrupt Controller provides the ability to enable or mask interrupts from a number of chip-wide sources, for example:
Timers (normally used by a Real-Time Operating System).
Interrupts generated by Microengine software to request services from the Intel XScale® core.
External agents such as PCI devices.
Error conditions, such as a DRAM ECC error or an SPI-4 parity error.
Interrupt status is read as memory-mapped registers; the state of an interrupt signal can be read even if it is masked from interrupting. Enabling and masking of interrupts is done as writes to memory-mapped registers.

2.11.2 Timers

The IXP2800 Network Processor contains four programmable 32-bit timers, which can be used for software support. Each timer can be clocked by the internal clock, by a divided version of the clock, or by a signal on an external GPIO pin. Each timer can be programmed to generate a periodic interrupt after a programmed number of clocks. The range is from several ns to several minutes depending on the clock frequency.
In addition, timer 4 can be used as a watchdog timer. In this use, software must periodically reload the timer value; if it fails to do so and the timer counts to 0, it will reset the chip. This can be used to detect if software “hangs” or for some other reason fails to reload the timer.

2.11.3 General Purpose I/O

The IXP2800 Network Processor contains eight General Purpose I/O (GPIO) pins. These can be programmed as either input or output and can be used for slow-speed I/O such as LEDs or input switches. They can also be used as interrupts to the Intel XScale® core, or to clock the programmable timers.

2.11.4 Universal Asynchronous Receiver/Transmitter

The IXP2800 Network Processor contains a standard RS-232 compatible Universal Asynchronous Receiver/Transmitter (UART), which can be used for communication with a debugger or maintenance console. Modem controls are not supported; if they are needed, GPIO pins can be used for that purpose.
The UART performs serial-to-parallel conversion on data characters received from a peripheral device and parallel-to-serial conversion on data characters received from the processor. The processor can read the complete status of the UART at any time during operation. Available status information includes the type and condition of the transfer operations being performed by the UART and any error conditions (parity, overrun, framing or break interrupt).
The serial ports can operate in either FIFO or non-FIFO mode. In FIFO mode, a 64-byte transmit FIFO holds data from the processor to be transmitted on the serial link and a 64-byte receive FIFO buffers data from the serial link until read by the processor.
The UART includes a programmable baud rate generator that is capable of dividing the internal clock input by divisors of 1 to 2^16 - 1, and produces a 16X clock to drive the internal transmitter logic. It also drives the receive logic. The UART can be operated in polled or in interrupt-driven mode, as selected by software.

2.11.5 Slowport

The Slowport is an external interface to the IXP2800 Network Processor, used for Flash ROM access and 8-, 16-, or 32-bit asynchronous device access. It allows the Intel XScale® core to do read/write data transfers to these slave devices.
The address bus and data bus are multiplexed to reduce the pin count. In addition, 24 bits of address are shifted out on three clock cycles. Therefore, an external set of buffers is needed to latch the address. Two chip selects are provided.
The access is asynchronous. Insertion of delay cycles for both data setup and hold time is programmable via internal Control registers. The transfer can also wait for a handshake acknowledge signal from the external device.

2.12 I/O Latency

Table 19 shows the latencies for transferring data between the Microengine and the other sub-system components. The latency is measured in 1.4 GHz cycles.

Table 19. I/O Latency

Sub-system   Transfer Size                Average Read Latency                 Average Write Latency
DRAM (RDR)   8 bytes – 16 bytes (note 2)  ~295 cycles (note 3)                 ~53 cycles
SRAM (QDR)   4 bytes                      100 (light load) – 160 (heavy load)  ~53 cycles
Scratch      4 bytes                      ~100 cycles (range 53 – 152)         ~40 cycles
MSF          8 bytes                      range 53 – 120 (RBUF)                ~48 cycles (TBUF)

Note 1: RDR, QDR, MSF, and Scratch values are extracted from a simulation model.
Note 2: Minimum DRAM burst size on pins is 16 bytes. Transfers of less than 16 bytes incur the same latency as a 16-byte transfer.
Note 3: At 1016 MHz, read latency should be ~240 cycles.

2.13 Performance Monitor

The Intel XScale® core hardware provides two 32-bit performance counters that allow two unique events to be monitored simultaneously. In addition, the Intel XScale® core implements a 32-bit clock counter that can be used in conjunction with the performance counters; its sole purpose is to count the number of core clock cycles, which is useful in measuring total execution time.

3 Intel XScale® Core
This section contains information describing the Intel XScale® core, the Intel XScale® core gasket, and the Intel XScale® core Peripherals (XPI).
For additional information about the Intel XScale® architecture, refer to the Intel XScale® Core Developers Manual available on Intel’s Developers web site (http://www.developer.intel.com).

3.1 Introduction

The Intel XScale® core is an ARM* V5TE compliant microprocessor. It has been designed for high performance and low power, leading the industry in mW/MIPS. The Intel XScale® core incorporates an extensive list of architecture features that allow it to achieve high performance. Many of the architectural features added to the Intel XScale® core help hide memory latency, which often is a serious impediment to high-performance processors.
These include:
The ability to continue instruction execution even while the data cache is retrieving data from external memory.
A write buffer.
Write-back caching.
Various data cache allocation policies that can be configured differently for each application.
Cache locking.
All these features improve the efficiency of the memory bus external to the core.
ARM* Version 5 (V5) Architecture added floating-point instructions to ARM* Version 4. The Intel XScale® core implements the integer instruction set architecture of ARM* V5, but does not provide hardware support for the floating-point instructions.
The Intel XScale® core provides the Thumb instruction set (ARM* V5T) and the ARM* V5E DSP extensions.

3.2 Features

Figure 16 shows the major functional blocks of the Intel XScale® core.
Figure 16. Intel XScale® Core Architecture Features
[Figure: block diagram of the major functional blocks —
Instruction Cache: 32 Kbytes, 32 ways, lockable by line
Branch Target Buffer: 128 entries
IMMU: 32-entry TLB, fully associative, lockable by entry
DMMU: 32-entry TLB, fully associative, lockable by entry
Data Cache: max 32 Kbytes, 32 ways, write-back or write-through, hit under miss
Mini-Data Cache: 2 Kbytes, 2 ways
Data RAM: max 28 Kbytes, re-map of data cache
Fill Buffer: 4 – 8 entries
Write Buffer: 8 entries, full coalescing
MAC: single-cycle throughput (16×32), 16-bit SIMD, 40-bit accumulator
Performance Monitoring
Power Management: idle, drowsy, sleep
Debug: hardware breakpoint, branch history table
JTAG]

3.2.1 Multiply/Accumulate (MAC)

The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a throughput of a MAC operation every cycle. Architectural enhancements to the MAC support audio coding algorithms, including a 40-bit accumulator and support for 16-bit packed data.

3.2.2 Memory Management

The Intel XScale® core implements the Memory Management Unit (MMU) Architecture specified in the ARM* Architecture Reference Manual (see the ARM* website at http://www.arm.com). The MMU provides access protection and virtual to physical address translation. The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:
identifying code as cacheable or non-cacheable
selecting between the mini-data cache or data cache
write-back or write-through data caching
enabling data write allocation policy
and enabling the write buffer to coalesce stores to external memory

3.2.3 Instruction Cache

The Intel XScale® core implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided.

3.2.4 Branch Target Buffer (BTB)

The Intel XScale® core provides a Branch Target Buffer to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.
The BTB holds 128 entries.

3.2.5 Data Cache

The Intel XScale® core implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache. Each cache has a line size of 32 bytes, and supports write-through or write-back caching.
The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15. The Intel XScale® core allows applications to reconfigure a portion of the data cache as data RAM. Software may place special tables or frequently used variables in this RAM.

3.2.6 Performance Monitoring

Two performance monitoring counters have been added to the Intel XScale® core that can be configured to monitor various events. These events allow a software developer to measure cache efficiency, detect system bottlenecks, and reduce the overall latency of programs.

3.2.7 Power Management

The Intel XScale® core incorporates a power and clock management unit that can assist in controlling clocking and managing power.

3.2.8 Debugging

The Intel XScale® core supports software debugging through two instruction address breakpoint registers, one data-address breakpoint register, one data-address/mask breakpoint register, and a trace buffer.

3.2.9 JTAG

Testability is supported on the Intel XScale® core through the Test Access Port (TAP) Controller implementation, which is based on the IEEE 1149.1 (JTAG) Standard Test Access Port and Boundary-Scan Architecture. The purpose of the TAP controller is to support test logic internal and external to the Intel XScale® core, such as built-in self-test, boundary-scan, and scan.

3.3 Memory Management

The Intel XScale® core implements the Memory Management Unit (MMU) Architecture specified in the ARM* Architecture Reference Manual. To accelerate virtual to physical address translation, the Intel XScale® core uses both an instruction Translation Look-aside Buffer (TLB) and a data TLB to cache the latest translations. Each TLB holds 32 entries and is fully-associative. Not only do the TLBs contain the translated addresses, but also the access rights for memory references.
If an instruction or data TLB miss occurs, a hardware translation-table-walking mechanism is invoked to translate the virtual address to a physical address. Once translated, the physical address is placed in the TLB along with the access rights and attributes of the page or section. These translations can also be locked down in either TLB to guarantee the performance of critical routines.
The Intel XScale® core allows system software to associate various attributes with regions of memory:
cacheable
bufferable
line allocate policy
write policy
I/O
mini-data cache
coalescing
P bit
Note: The virtual address with which the TLBs are accessed may be remapped by the PID register.

3.3.1 Architecture Model

3.3.1.1 Version 4 versus Version 5
ARM* MMU Version 5 Architecture introduces the support of tiny pages, which are 1 Kbyte in size. The reserved field in the first-level descriptor (encoding 0b11) is used as the fine page table base address.
3.3.1.2 Memory Attributes
The attributes associated with a particular region of memory are configured in the memory management page table and control the behavior of accesses to the instruction cache, data cache, mini-data cache and the write buffer. These attributes are ignored when the MMU is disabled.
To allow compatibility with older system software, the new Intel XScale® core attributes take advantage of encoding space in the descriptors that was formerly reserved.
3.3.1.2.1 Page (P) Attribute Bit
The P bit assigns a page attribute to a memory region. Refer to the Intel® IXP2400 and IXP2800 Network Processor Programmer’s Reference Manual for details about the P bit.
3.3.1.2.2 Instruction Cache
When examining these bits in a descriptor, the Instruction Cache only utilizes the C bit. If the C bit is clear, the Instruction Cache considers a code fetch from that memory to be non-cacheable, and will not fill a cache entry. If the C bit is set, then fetches from the associated memory region will be cached.
3.3.1.2.3 Data Cache and Write Buffer
All of these descriptor bits affect the behavior of the Data Cache and the Write Buffer.
If the X bit for a descriptor is 0 (see Table 20), the C and B bits operate as mandated by the ARM* architecture. If the X bit for a descriptor is 1, the meaning of the C and B bits is extended, as detailed in Table 21.
Table 20. Data Cache and Buffer Behavior when X = 0

C B  Cacheable?  Bufferable?  Write Policy          Line Allocation Policy  Notes
0 0  N           N            Stall until complete  —                       1
0 1  N           Y            —                     —
1 0  Y           Y            Write Through         Read Allocate
1 1  Y           Y            Write Back            Read Allocate

Note 1: Normally, the processor will continue executing after a data access if no dependency on that access is encountered. With this setting, the processor will stall execution until the data access completes. This guarantees to software that the data access has taken effect by the time execution of the data access instruction completes. External data aborts from such accesses will be imprecise.

Table 21. Data Cache and Buffer Behavior when X = 1

C B  Cacheable?  Bufferable?  Write Policy  Line Allocation Policy  Notes
0 0  Unpredictable; do not use
0 1  N           Y            —             —                       Writes will not coalesce into buffers (note 1)
1 0  (Mini-Data Cache) — cache policy is determined by the MD field of the Auxiliary Control register
1 1  Y           Y            Write Back    Read/Write Allocate

Note 1: Normally, bufferable writes can coalesce with previously buffered data in the same address range.
3.3.1.2.4 Details on Data Cache and Write Buffer Behavior
If the MMU is disabled all data accesses will be non-cacheable and non-bufferable. This is the same behavior as when the MMU is enabled, and a data access uses a descriptor with X, C, and B all set to 0.
The X, C, and B bits determine when the processor should place new data into the Data Cache. Data is placed into the cache in lines (also called blocks). Thus, the basis for making a decision about placing new data into the cache is called a “Line Allocation Policy.”
If the Line Allocation Policy is read-allocate, all load operations that miss the cache request a 32-byte cache line from external memory and allocate it into either the data cache or mini-data cache (this is assuming the cache is enabled). Store operations that miss the cache will not cause a line to be allocated.
If read/write-allocate is in effect, load or store operations that miss the cache will request a 32-byte cache line from external memory if the cache is enabled.
The other policy determined by the X, C, and B bits is the Write Policy. A write-through policy instructs the Data Cache to keep external memory coherent by performing stores to both external memory and the cache. A write-back policy only updates external memory when a line in the cache is cleaned or needs to be replaced with a new line. Generally, write-back provides higher performance because it generates less data traffic to external memory.
3.3.1.2.5 Memory Operation Ordering
A fence memory operation (memop) is one that guarantees all memops issued prior to the fence will execute before any memop issued after the fence. Thus software may issue a fence to impose a partial ordering on memory accesses.
Table 22 shows the circumstances in which memops act as fences.
Any swap (SWP or SWPB) to a page that would create a fence on a load or store is a fence.
Table 22. Memory Operations that Impose a Fence

operation      X  C  B
load           —  0  —
store          1  0  1
load or store  0  0  0

3.3.2 Exceptions

The MMU may generate prefetch aborts for instruction accesses and data aborts for data memory accesses.
Data address alignment checking is enabled by setting bit 1 of the Control register (CP15, register 1). Alignment faults are still reported even if the MMU is disabled. All other MMU exceptions are disabled when the MMU is disabled.

3.3.3 Interaction of the MMU, Instruction Cache, and Data Cache

The MMU, instruction cache, and data/mini-data cache may be enabled/disabled independently. The instruction cache can be enabled with the MMU enabled or disabled. However, the data cache can only be enabled when the MMU is enabled. Therefore only three of the four combinations of the MMU and data/mini-data cache enables are valid (see Table 23). The invalid combination will cause undefined results.
Table 23. Valid MMU and Data/Mini-Data Cache Combinations
MMU Data/Mini-data Cache
Off Off
On Off
On On

3.3.4 Control

3.3.4.1 Invalidate (Flush) Operation
The entire instruction and data TLB can be invalidated at the same time with one command or they can be invalidated separately. An individual entry in the data or instruction TLB can also be invalidated.
Globally invalidating a TLB will not affect locked TLB entries. However, the invalidate-entry operations can invalidate individual locked entries. In this case, the locked entry remains in the TLB but will never “hit” on an address translation; effectively, a hole exists in the TLB. This situation may be rectified by unlocking the TLB.
3.3.4.2 Enabling/Disabling
The MMU is enabled by setting bit 0 in coprocessor 15, register 1 (Control register). When the MMU is disabled, accesses to the instruction cache default to cacheable and all accesses to data memory are made non-cacheable. A recommended code sequence for enabling the MMU is shown in Example 14.
Example 14. Enabling the MMU
; This routine provides software with a predictable way of enabling the MMU.
; After the CPWAIT, the MMU is guaranteed to be enabled. Be aware
; that the MMU will be enabled sometime after MCR and before the instruction
; that executes after the CPWAIT.
; Programming Note: This code sequence requires a one-to-one virtual to
; physical address mapping on this code since the MMU may be enabled
; part way through. This would allow the instructions after MCR to execute
; properly regardless of the state of the MMU.
MRC P15,0,R0,C1,C0,0 ; Read CP15, register 1
ORR R0, R0, #0x1     ; Turn on the MMU
MCR P15,0,R0,C1,C0,0 ; Write to CP15, register 1
CPWAIT               ; wait for the write to take effect
; The MMU is guaranteed to be enabled at this point; the next instruction or
; data address will be translated.
3.3.4.3 Locking Entries
Individual entries can be locked into the instruction and data TLBs. If a lock operation finds the virtual address translation already resident in the TLB, the results are unpredictable. An invalidate by entry command before the lock command will ensure proper operation. Software can also accomplish this by invalidating all entries, as shown in Example 15.
Locking entries into either the instruction TLB or data TLB reduces the available number of entries (by the number that was locked down) for hardware to cache other virtual to physical address translations.
A procedure for locking entries into the instruction TLB is shown in Example 15.
If a MMU abort is generated during an instruction or data TLB lock operation, the Fault Status register is updated to indicate a Lock Abort, and the exception is reported as a data abort.
Example 15. Locking Entries into the Instruction TLB
; R1, R2 and R3 contain the virtual addresses to translate and lock into ; the instruction TLB.
; The value in R0 is ignored in the following instruction. ; Hardware guarantees that accesses to CP15 occur in program order
MCR P15,0,R0,C8,C5,0 ; Invalidate the entire instruction TLB
MCR P15,0,R1,C10,C4,0 ; Translate virtual address (R1) and lock into
; instruction TLB
MCR P15,0,R2,C10,C4,0 ; Translate
; virtual address (R2) and lock into instruction TLB
MCR P15,0,R3,C10,C4,0 ; Translate virtual address (R3) and lock into
; instruction TLB
CPWAIT
; The MMU is guaranteed to be updated at this point; the next instruction will ; see the locked instruction TLB entries.
Note: If exceptions are allowed to occur in the middle of this routine, the TLB may end up caching a
translation that is about to be locked. For example, if R1 is the virtual address of an interrupt service routine and that interrupt occurs immediately after the TLB has been invalidated, the lock operation will be ignored when the interrupt service routine returns back to this code sequence. Software should disable interrupts (FIQ or IRQ) in this case.
As a general rule, software should avoid locking in all other exception types.
The proper procedure for locking entries into the data TLB is shown in Example 16.
Example 16. Locking Entries into the Data TLB
; R1, and R2 contain the virtual addresses to translate and lock into the data TLB
MCR P15,0,R1,C8,C6,1 ; Invalidate the data TLB entry specified by the
; virtual address in R1
MCR P15,0,R1,C10,C8,0 ; Translate virtual address (R1) and lock into
; data TLB
; Repeat sequence for virtual address in R2
MCR P15,0,R2,C8,C6,1 ; Invalidate the data TLB entry specified by the
; virtual address in R2
MCR P15,0,R2,C10,C8,0 ; Translate virtual address (R2) and lock into
; data TLB
CPWAIT ; wait for locks to complete
; The MMU is guaranteed to be updated at this point; the next instruction will ; see the locked data TLB entries.
Note: Care must be exercised here when allowing exceptions to occur during this routine if the exception handlers may have data that lies in a page that is being locked into the TLB.
3.3.4.4 Round-Robin Replacement Algorithm
The line replacement algorithm for the TLBs is round-robin; there is a round-robin pointer that keeps track of the next entry to replace. The next entry to replace is the one sequentially after the last entry that was written. For example, if the last virtual to physical address translation was written into entry 5, the next entry to replace is entry 6.
At reset, the round-robin pointer is set to entry 31. Once a translation is written into entry 31, the round-robin pointer gets set to the next available entry, beginning with entry 0 if no entries have been locked down. Subsequent translations move the round-robin pointer to the next sequential entry until entry 31 is reached, where it will wrap back to entry 0 upon the next translation.
A lock pointer is used for locking entries into the TLB and is set to entry 0 at reset. A TLB lock operation places the specified translation at the entry designated by the lock pointer, moves the lock pointer to the next sequential entry, and resets the round-robin pointer to entry 31. Locking entries into either TLB effectively reduces the available entries for updating. For example, if the first three entries were locked down, the round-robin pointer would be entry 3 after it rolled over from entry 31.
Only entries 0 through 30 can be locked in either TLB; entry 31 can never be locked. If the lock pointer is at entry 31, a lock operation will update the TLB entry with the translation and ignore the lock. In this case, the round-robin pointer will stay at entry 31.
Figure 17 illustrates locked entries in the TLB.
Figure 17. Example of Locked Entries in TLB
[Figure: a 32-entry TLB with entries 0 – 7 locked and entries 8 – 31 available for round-robin replacement]
Note: 8 entries locked, 24 entries available for round-robin replacement.

3.4 Instruction Cache

The Intel XScale® core instruction cache enhances performance by reducing the number of instruction fetches from external memory. The cache provides fast execution of cached code. Code can also be locked down when guaranteed or fast access time is required.
Figure 18 shows the cache organization and how the instruction address is used to access the cache.
The instruction cache is a 32-Kbyte, 32-way set associative cache; this means there are 32 sets with each set containing 32 ways. Each way of a set contains eight 32-bit words and one valid bit, which is referred to as a line. The replacement policy is a round-robin algorithm and the cache also supports the ability to lock code in at a line granularity.
Figure 18. Instruction Cache Organization
[Figure: the cache consists of 32 sets (Set 0 – Set 31), each containing 32 ways (way 0 – way 31). Each way holds one cache line of 8 words plus a CAM tag (CAM = Content Addressable Memory). The virtual instruction address is split into a Tag (bits 31:10), a Set Index (bits 9:5), and a Word select (bits 4:2); the Set Index selects one set, the Tag is compared against the CAM tags of that set’s ways, and the Word field selects the instruction word (4 bytes) within the matching line.]
The instruction cache is virtually addressed and virtually tagged. The virtual address presented to the instruction cache may be remapped by the PID register.

3.4.1 Instruction Cache Operation

3.4.1.1 Operation when Instruction Cache is Enabled
When the cache is enabled, it compares every instruction request address to the addresses of instructions that it is holding in cache. If the requested instruction is found, the access “hits” the cache, which returns the requested instruction. If the instruction is not found, the access “misses” the cache, which requests a fetch from external memory of the 8-word line (32 bytes) that contains the instruction (using the fetch policy). As the fetch returns instructions to the cache, they are put in one of two fetch buffers and the requested instruction is delivered to the instruction decoder. A fetched line is written into the cache if it is cacheable (code is cacheable if the MMU is disabled or if the MMU is enabled and the cacheable (C) bit is set to 1 in its corresponding page).
Note: An instruction fetch may “miss” the cache but “hit” one of the fetch buffers. If this happens, the
requested instruction is delivered to the instruction decoder in the same manner as a cache “hit.”
3.4.1.2 Operation when Instruction Cache is Disabled
Disabling the cache prevents any lines from being written into the instruction cache. Although the cache is disabled, it is still accessed and may generate a “hit” if the data is already in the cache.
Disabling the instruction cache does not disable instruction buffering that may occur within the instruction fetch buffers. Two 8-word instruction fetch buffers will always be enabled in the cache disabled mode. As instruction fetches continue to “hit” within either buffer (even in the presence of forward and backward branches), no external fetches for instructions are generated. A miss causes one or the other buffer to be filled from external memory using the fill policy.
3.4.1.3 Fetch Policy
An instruction-cache “miss” occurs when the requested instruction is not found in the instruction fetch buffers or instruction cache; a fetch request is then made to external memory. The instruction cache can handle up to two “misses.” Each external fetch request uses a fetch buffer that holds 32-bytes and eight valid bits, one for each word. A miss causes the following:
1. A fetch buffer is allocated.
2. The instruction cache sends a fetch request to the external bus. This request is for a 32-byte line.
3. Instruction words are returned from the external bus at a maximum rate of 1 word per core cycle. As each word returns, the corresponding valid bit is set for the word in the fetch buffer.
4. As soon as the fetch buffer receives the requested instruction, it forwards the instruction to the instruction decoder for execution.
5. When all words have returned, the fetched line will be written into the instruction cache if it is cacheable and if the instruction cache is enabled. The line chosen for update in the cache is controlled by the round-robin replacement algorithm. This update may evict a valid line at that location.
6. Once the cache is updated, the eight valid bits of the fetch buffer are invalidated.
3.4.1.4 Round-Robin Replacement Algorithm
The line replacement algorithm for the instruction cache is round-robin. Each set in the instruction cache has a round-robin pointer that keeps track of the next line (in that set) to replace. The next line to replace in a set is the one after the last line that was written. For example, if the line for the last external instruction fetch was written into way 5-set 2, the next line to replace for that set would be way 6. None of the other round-robin pointers for the other sets are affected in this case.
After reset, way 31 is pointed to by the round-robin pointer for all the sets. Once a line is written into way 31, the round-robin pointer points to the first available way of a set, beginning with way 0 if no lines have been locked into that particular set. Locking lines into the instruction cache effectively reduces the available lines for cache updating. For example, if the first three lines of a set were locked down, the round-robin pointer would point to the line at way 3 after it rolled over from way 31.
3.4.1.5 Parity Protection
The instruction cache is protected by parity to ensure data integrity. Each instruction cache word has 1 parity bit. (The instruction cache tag is not parity protected.) When a parity error is detected on an instruction cache access, a prefetch abort exception occurs if the Intel XScale® core attempts to execute the instruction. Before servicing the exception, hardware places a notification of the error in the Fault Status register (Coprocessor 15, register 5).
A software exception handler can recover from an instruction cache parity error. This can be accomplished by invalidating the instruction cache and the branch target buffer and then returning to the instruction that caused the prefetch abort exception. A simplified code example is shown in
Example 17. A more complex handler might choose to invalidate the specific line that caused the
exception and then invalidate the BTB.
Example 17. Recovering from an Instruction Cache Parity Error
; Prefetch abort handler
MCR P15,0,R0,C7,C5,0 ; Invalidate the instruction cache and branch target
                     ; buffer
CPWAIT               ; wait for effect
SUBS PC,R14,#4       ; Returns to the instruction that generated the
                     ; parity error
; The Instruction Cache is guaranteed to be invalidated at this point
If a parity error occurs on an instruction that is locked in the cache, the software exception handler needs to unlock the instruction cache, invalidate the cache, and then re-lock the code before it returns to the faulting instruction.
3.4.1.6 Instruction Cache Coherency
The instruction cache does not detect modification to program memory by loads, stores or actions of other bus masters. Several situations may require program memory modification, such as uploading code from disk.
The application program is responsible for synchronizing code modification and invalidating the cache. In general, software must ensure that modified code space is not accessed until modification and invalidating are completed.
To achieve cache coherence, instruction cache contents can be invalidated after code modification in external memory is complete.
If the instruction cache is not enabled, or code is being written to a non-cacheable region, software must still invalidate the instruction cache before using the newly-written code. This precaution ensures that state associated with the new code is not buffered elsewhere in the processor, such as the fetch buffers or the BTB.
Naturally, when writing code as data, care must be taken to force it completely out of the processor into external memory before attempting to execute it. If writing into a non-cacheable region, flushing the write buffers is sufficient precaution. If writing to a cacheable region, then the data cache should be submitted to a Clean/Invalidate operation to ensure coherency.

3.4.2 Instruction Cache Control

3.4.2.1 Instruction Cache State at Reset
After reset, the instruction cache is always disabled, unlocked, and invalidated (flushed).
3.4.2.2 Enabling/Disabling
The instruction cache is enabled by setting bit 12 in coprocessor 15, register 1 (Control register). This process is illustrated in Example 18.
Example 18. Enabling the Instruction Cache
; Enable the ICache
MRC P15, 0, R0, C1, C0, 0 ; Get the control register
ORR R0, R0, #0x1000       ; set bit 12 -- the I bit
MCR P15, 0, R0, C1, C0, 0 ; Set the control register
CPWAIT
3.4.2.3 Invalidating the Instruction Cache
The entire instruction cache along with the fetch buffers are invalidated by writing to coprocessor 15, register 7. This command does not unlock any lines that were locked in the instruction cache nor does it invalidate those locked lines. To invalidate the entire cache including locked lines, the unlock instruction cache command needs to be executed before the invalidate command.
There is an inherent delay from the execution of the instruction cache invalidate command to where the next instruction will see the result of the invalidate. The routine in Example 19 can be used to guarantee proper synchronization.
Example 19. Invalidating the Instruction Cache
MCR P15,0,R1,C7,C5,0 ; Invalidate the instruction cache and branch
; target buffer
CPWAIT
; The instruction cache is guaranteed to be invalidated at this point; the next
; instruction sees the result of the invalidate command.
The Intel XScale® core also supports invalidating an individual line from the instruction cache.
3.4.2.4 Locking Instructions in the Instruction Cache
Software has the ability to lock performance-critical routines into the instruction cache. Up to 28 lines in each set can be locked; hardware will ignore the lock command if software tries to lock all the lines in a particular set (i.e., ways 28–31 can never be locked). When this happens, the line is still allocated into the cache, but the lock is ignored. The round-robin pointer stays at way 31 for that set.
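The lock-allocation rule above can be sketched as follows. This is an illustrative model, not Intel code; the function name and return convention are invented:

```python
MAX_LOCKED_WAYS = 28  # ways 0..27 are lockable; ways 28..31 never are

def try_lock(locked_count):
    """Model a lock request into one set.

    Returns (way_used, was_locked): locked lines fill the low ways in
    order; once 28 lines are locked, the line is still allocated (via the
    round-robin pointer, which stays at way 31) but the lock is ignored.
    """
    if locked_count < MAX_LOCKED_WAYS:
        return locked_count, True
    return 31, False
```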
Lines can be locked into the instruction cache by initiating a write to coprocessor 15. Register Rd contains the virtual address of the line to be locked into the cache.
There are several requirements for locking down code:
1. The routine used to lock lines down in the cache must be placed in non-cacheable memory, which means the MMU is enabled. As a corollary: no fetches of cacheable code should occur while locking instructions into the cache.
2. The code being locked into the cache must be cacheable.
3. The instruction cache must be enabled and invalidated prior to locking down lines.
Failure to follow these requirements will produce unpredictable results when accessing the instruction cache.
System programmers should ensure that the code to lock instructions into the cache does not reside closer than 128 bytes to a non-cacheable/cacheable page boundary. If the processor fetches ahead into a cacheable page, then the first requirement noted above could be violated.
Lines are locked into a set starting at way 0 and may progress up to way 27; which set a line gets locked into depends on the set index of the virtual address. Figure 19 is an example of where lines of code may be locked into the cache along with how the round-robin pointer is affected.
Figure 19. Locked Line Effect on Round Robin Replacement
[The figure shows four example sets of the 32-way instruction cache, with locked lines occupying the low-numbered ways and the round-robin pointer positioned at the first unlocked way.]
Notes:
set 0: 8 ways locked, 24 ways available for round-robin replacement
set 1: 23 ways locked, 9 ways available for round-robin replacement
set 2: 28 ways locked, only ways 28-31 available for replacement
set 31: all 32 ways available for round-robin replacement
Software can lock down several different routines located at different memory locations. This may cause some sets to have more locked lines than others as shown in Figure 19.
Example 20 shows how a routine, called “lockMe” in this example, might be locked into the
instruction cache. Note that it is possible to receive an exception while locking code.
Example 20. Locking Code into the Cache
lockMe: ; This is the code that will be locked into the cache
    mov r0, #5
    add r5, r1, r2
    . . .
lockMeEnd:
    . . .
codeLock: ; here is the code to lock the “lockMe” routine
    ldr r0, =(lockMe AND NOT 31)    ; r0 gets a pointer to the first line we should lock
    ldr r1, =(lockMeEnd AND NOT 31) ; r1 contains a pointer to the last line we should lock
lockLoop:
    mcr p15, 0, r0, c9, c1, 0 ; lock next line of code into ICache
    cmp r0, r1                ; are we done yet?
    add r0, r0, #32           ; advance pointer to next line
    bne lockLoop              ; if not done, do the next line
3.4.2.5 Unlocking Instructions in the Instruction Cache
The Intel XScale® core provides a global unlock command for the instruction cache. Writing to coprocessor 15, register 9 unlocks all the locked lines in the instruction cache and leaves them valid. These lines then become available for the round-robin replacement algorithm.

3.5 Branch Target Buffer (BTB)

The Intel XScale® core uses dynamic branch prediction to reduce the penalties associated with changing the flow of program execution. The Intel XScale® core features a branch target buffer (BTB) that provides the instruction cache with the target address of branch type instructions. The branch target buffer is implemented as a 128-entry, direct mapped cache.

3.5.1 Branch Target Buffer Operation

The BTB stores the history of branches that have executed along with their targets. Figure 20 shows an entry in the BTB, where the tag is the instruction address of a previously executed branch and the data contains the target address of the previously executed branch along with two bits of history information.
Figure 20. BTB Entry
[The figure shows one BTB entry: the TAG holds Branch Address[31:9,1], and the DATA holds Target Address[31:1] plus History Bits[1:0].]
The BTB takes the current instruction address and checks to see if this address is a branch that was previously seen. It uses bits [8:2] of the current address to read out the tag and then compares this tag to bits [31:9,1] of the current instruction address. If the current instruction address matches the tag in the cache and the history bits indicate that this branch has usually been taken in the past, the BTB uses the data (target address) as the next instruction address to send to the instruction cache.
Bit[1] of the instruction address is included in the tag comparison to support Thumb execution. This organization means that two consecutive Thumb branch (B) instructions, with instruction address bits[8:2] the same, will contend for the same BTB entry. Thumb also requires 31 bits for the branch target address. In ARM* mode, bit[1] is 0.
The history bits represent four possible prediction states for a branch entry in the BTB. Figure 21 shows these states along with the possible transitions. The initial state for branches stored in the BTB is Weakly-Taken (WT). Every time a branch that exists in the BTB is executed, the history bits are updated to reflect the latest outcome of the branch, either taken or not-taken.
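The indexing and history-bit behavior described above can be sketched with a small Python model. This is illustrative only, not Intel code; the function names and the encoding of the four states as counter values 0..3 are assumptions:

```python
# Four prediction states as a saturating 2-bit counter.
SN, WN, WT, ST = 0, 1, 2, 3  # Strongly/Weakly Not Taken, Weakly/Strongly Taken

def btb_index(addr):
    """Bits [8:2] of the instruction address select one of 128 entries."""
    return (addr >> 2) & 0x7F

def btb_tag(addr):
    """Bits [31:9] plus bit [1] form the tag (bit [1] supports Thumb)."""
    return ((addr >> 9) << 1) | ((addr >> 1) & 1)

def update(state, taken):
    """Move one state toward ST on a taken branch, toward SN otherwise."""
    return min(state + 1, ST) if taken else max(state - 1, SN)

def predict_taken(state):
    """WT and ST predict taken; SN and WN predict not-taken."""
    return state >= WT

# A newly stored branch starts in the Weakly-Taken state.
s = WT
s = update(s, taken=True)   # Weakly-Taken -> Strongly-Taken
```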
The BTB does not have to be managed explicitly by software; it is disabled by default after reset and is invalidated when the instruction cache is invalidated.
Figure 21. Branch History
[The figure shows a four-state diagram: SN, WN, WT, ST. Each Taken outcome moves the state one step toward ST; each Not Taken outcome moves it one step toward SN.]
Notes:
SN: Strongly Not Taken
WN: Weakly Not Taken
WT: Weakly Taken
ST: Strongly Taken
3.5.1.1 Reset
After Processor Reset, the BTB is disabled and all entries are invalidated.

3.5.2 Update Policy

A new entry is stored into the BTB when the following conditions are met:
• The branch instruction has executed
• The branch was taken
• The branch is not currently in the BTB
The entry is then marked valid and the history bits are set to WT. If another valid branch exists at the same entry in the BTB, it will be evicted by the new branch.
Once a branch is stored in the BTB, the history bits are updated upon every execution of the branch as shown in Figure 21.

3.5.3 BTB Control

3.5.3.1 Disabling/Enabling
The BTB is always disabled with Reset. Software can enable the BTB through a bit in a coprocessor register.
Before enabling or disabling the BTB, software must invalidate it (described in the following section). This action will ensure correct operation in case stale data is in the BTB. Software should not place any branch instruction between the code that invalidates the BTB and the code that enables/disables it.
3.5.3.2 Invalidation
There are four ways the contents of the BTB can be invalidated.
1. Reset.
2. Software can directly invalidate the BTB via a CP15, register 7 function.
3. The BTB is invalidated when the Process ID register is written.
4. The BTB is invalidated when the instruction cache is invalidated via CP15, register 7 functions.

3.6 Data Cache

The Intel XScale® core data cache enhances performance by reducing the number of data accesses to and from external memory. There are two data cache structures in the Intel XScale® core, a 32-Kbyte data cache and a 2-Kbyte mini-data cache. An eight-entry write buffer and a four-entry fill buffer are also implemented to decouple the Intel XScale® core instruction execution from external memory accesses, which increases overall system performance.

3.6.1 Overviews

3.6.1.1 Data Cache Overview
The data cache is a 32-Kbyte, 32-way set associative cache, i.e., there are 32 sets and each set has 32 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist two dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache, the dirty bit associated with it is set. The replacement policy is a round-robin algorithm and the cache also supports the ability to reconfigure each line as data RAM.
Figure 22 shows the cache organization and how the data address is used to access the cache.
Cache policies may be adjusted for particular regions of memory by altering page attribute bits in the MMU descriptor that controls that memory.
The data cache is virtually addressed and virtually tagged. It supports write-back and write-through caching policies. The data cache always allocates a line in the cache when a cacheable read miss occurs and will allocate a line into the cache on a cacheable write miss when write allocate is specified by its page attribute. Page attribute bits determine whether a line gets allocated into the data cache or mini-data cache.
Figure 22. Data Cache Organization
[The figure shows the data cache organization: the virtual data address is divided into Tag (bits [31:10]), Set Index (bits [9:5]), Word (bits [4:2]), and Byte (bits [1:0]). The Set Index selects one of 32 sets; the Tag is compared against the DataCAM of each of the 32 ways in the selected set, each way holding a 32-byte cache line. The selected word is byte-aligned and sign-extended (4 bytes to the destination register).
Note: CAM = Content Addressable Memory]
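The address split shown in Figure 22 can be modeled directly. This is an illustrative sketch, not Intel code; the function name and dictionary keys are invented:

```python
def split_dcache_address(addr):
    """Decompose a 32-bit virtual address for the 32-set, 32-way,
    32-byte-line data cache: Tag [31:10], Set [9:5], Word [4:2], Byte [1:0]."""
    return {
        "tag":  addr >> 10,          # compared against the DataCAM of each way
        "set":  (addr >> 5) & 0x1F,  # selects one of 32 sets
        "word": (addr >> 2) & 0x7,   # one of 8 words in the 32-byte line
        "byte": addr & 0x3,          # byte within the word
    }

fields = split_dcache_address(0x1234)
```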
3.6.1.2 Mini-Data Cache Overview
The mini-data cache is a 2-Kbyte, 2-way set associative cache; this means there are 32 sets with each set containing 2 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist 2 dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache, the dirty bit associated with it is set. The replacement policy is a round-robin algorithm.
Figure 23 shows the cache organization and how the data address is used to access the cache.
The mini-data cache is virtually addressed and virtually tagged and supports the same caching policies as the data cache. However, lines cannot be locked into the mini-data cache.
Figure 23. Mini-Data Cache Organization
[The figure shows the mini-data cache organization: 32 sets of 2 ways, each way holding a 32-byte cache line. The virtual data address is divided the same way as for the data cache: Tag (bits [31:10]), Set Index (bits [9:5]), Word (bits [4:2]), and Byte (bits [1:0]). The selected word is byte-aligned and sign-extended (4 bytes to the destination register).
Note: CAM = Content Addressable Memory]
3.6.1.3 Write Buffer and Fill Buffer Overview
The Intel XScale® core employs an eight entry write buffer, each entry containing 16 bytes. Stores to external memory are first placed in the write buffer and subsequently taken out when the bus is available. The write buffer supports the coalescing of multiple store requests to external memory. An incoming store may coalesce with any of the eight entries.
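The coalescing behavior described above can be sketched with a simple model. This is illustrative only, not Intel code; the class design, the 16-byte-aligned entry regions, and the oldest-first drain policy are assumptions:

```python
NUM_ENTRIES = 8    # eight write buffer entries
ENTRY_BYTES = 16   # each entry holds 16 bytes

class WriteBuffer:
    """Model of a coalescing write buffer: stores to the same 16-byte
    region merge into one entry instead of consuming a new one."""

    def __init__(self):
        self.entries = {}  # 16-byte-aligned base address -> {offset: byte}

    def store(self, addr, data_bytes):
        base = addr & ~(ENTRY_BYTES - 1)
        if base not in self.entries:
            if len(self.entries) >= NUM_ENTRIES:
                self.drain_one()   # in hardware, the core would stall here
            self.entries[base] = {}
        for i, b in enumerate(data_bytes):   # coalesce into the entry
            self.entries[base][(addr + i) - base] = b

    def drain_one(self):
        # Oldest entry is written out to external memory and freed.
        self.entries.pop(next(iter(self.entries)))

wb = WriteBuffer()
wb.store(0x1000, b"\x11\x22\x33\x44")  # allocates an entry for region 0x1000
wb.store(0x1008, b"\x55\x66")          # coalesces into that same entry
```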
The fill buffer holds the external memory request information for a data cache or mini-data cache fill or non-cacheable read request. Up to four 32-byte read request operations can be outstanding in the fill buffer before the Intel XScale® core needs to stall.
The fill buffer has been augmented with a four-entry pend buffer that captures data memory requests to outstanding fill operations. Each entry in the pend buffer contains enough data storage to hold one 32-bit word, specifically for store operations. Cacheable load or store operations that hit an entry in the fill buffer get placed in the pend buffer and are completed when the associated fill completes. Any entry in the pend buffer can be pended against any of the entries in the fill buffer; multiple entries in the pend buffer can be pended against a single entry in the fill buffer. Pended operations complete in program order.

3.6.2 Data Cache and Mini-Data Cache Operation

The following discussions refer to the data cache and mini-data cache as one cache (data/mini­data) since their behavior is the same when accessed.
3.6.2.1 Operation when Caching is Enabled
When the data/mini-data cache is enabled for an access, the data/mini-data cache compares the address of the request against the addresses of data that it is currently holding. If the line containing the address of the request is resident in the cache, the access “hits” the cache. For a load operation the cache returns the requested data to the destination register and for a store operation the data is stored into the cache. The data associated with the store may also be written to external memory if write-through caching is specified for that area of memory. If the cache does not contain the requested data, the access “misses” the cache, and the sequence of events that follows depends on the configuration of the cache, the configuration of the MMU and the page attributes.
3.6.2.2 Operation when Data Caching is Disabled
The data/mini-data cache is still accessed even though it is disabled. If a load hits the cache it will return the requested data to the destination register. If a store hits the cache, the data is written into the cache. Any access that misses the cache will not allocate a line in the cache when it’s disabled, even if the MMU is enabled and the memory region’s cacheability attribute is set.
3.6.2.3 Cache Policies
3.6.2.3.1 Cacheability
Data at a specified address is cacheable given the following:
• The MMU is enabled
• The cacheable attribute is set in the descriptor for the accessed address
• The data/mini-data cache is enabled
3.6.2.3.2 Read Miss Policy
The following sequence of events occurs when a cacheable load operation misses the cache:
1. The fill buffer is checked to see if an outstanding fill request already exists for that line.
— If so, the current request is placed in the pend buffer and waits until the previously requested fill completes, after which it accesses the cache again to obtain the requested data and return it to the destination register.
— If there is no outstanding fill request for that line, the current load request is placed in the fill buffer and a 32-byte external memory read request is made. If the pend buffer or fill buffer is full, the Intel XScale® core will stall until an entry is available.
2. A line is allocated in the cache to receive the 32 bytes of fill data. The line selected is determined by the round-robin pointer (see Section 3.6.2.4). The line chosen may contain a valid line previously allocated in the cache. In this case both dirty bits are examined and if set, the four words associated with a dirty bit that’s asserted will be written back to external memory as a 4-word burst operation.
3. When the data requested by the load is returned from external memory, it is immediately sent to the destination register specified by the load. A system that returns the requested data back first, with respect to the other bytes of the line, will obtain the best performance.
4. As data returns from external memory, it is written into the cache in the previously allocated line.
A load operation that misses the cache and is not cacheable makes a request from external memory for the exact data size of the original load request. For example, LDRH requests exactly two bytes from external memory, LDR requests four bytes from external memory, etc. This request is placed in the fill buffer until the data is returned from external memory, and the data is then forwarded to the destination register(s).
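The victim write-back in step 2 above can be sketched as follows. This is an illustrative model, not Intel code; the function name and the string representation of a burst are invented:

```python
def evict_line(valid, dirty_low, dirty_high):
    """Model evicting one 32-byte victim line: each line carries two dirty
    bits, one per 16-byte half, and each dirty half is written back to
    external memory as a 4-word (16-byte) burst before the line is reused."""
    bursts = []
    if valid and dirty_low:
        bursts.append("4-word burst: bytes 0-15")
    if valid and dirty_high:
        bursts.append("4-word burst: bytes 16-31")
    return bursts

# A clean or invalid victim needs no write-back; a fully dirty line needs two bursts.
writebacks = evict_line(valid=True, dirty_low=True, dirty_high=True)
```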