Intel IXP2800 User Manual

Download

Intel® IXP2800 Network Processor

Hardware Reference Manual

August 2004

Order Number: 278882-010

Revision History

Date Revision Description

March 2002 001 First release for IXP2800 Customer Information Book V 0.4

May 2002 002 Update for the IXA SDK 3.0 release.

August 2002 003 Update for the IXA SDK 3.0 Pre-Release 4.

November 2002 004 Update for the IXA SDK 3.0 Pre-Release 5.

May 2003 005 Update for the IXA SDK 3.1 Alpha Release

September 2003 006 Update for the IXA SDK 3.5 Pre-Release 1

October 2003 007

January 2004 008 Updated for new trademark usage: Intel XScale

May 2004 009

August 2004 010 Preparation for web posting.

Added information about Receiver and Transmitter Interoperation with Framers and Switch Fabrics.

technology.

Updated Sections 6.5.2, 8.5.2.2, 9.2.2.1, 9.3.1, 9.3.3.2,

9.5.1.4, 9.5.3.4, and 10.3.1. Updated Figure 123 and Timing Diagrams in Figures 43, 44,

46, 47, 50, 51, 54, and 55. Added Chapter 11, “Performance Monitor Unit”.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel may make changes to specifications and product descriptions at any time, without notice.

Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

The IXP2800 Network Processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.

Intel and XScale are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

2 Hardware Reference Manual

Contents

1 Introduction.................................................................................................................................. 25

1.1 About This Document ......................................................................................................... 25

1.2 Related Documentation ...................................................................................................... 25

1.3 Terminology ........................................................................................................................26

2 Technical Description ................................................................................................................. 27

2.1 Overview............................................................................................................................. 27

2.2 Intel XScale

2.2.1 ARM* Compatibility................................................................................................30

2.2.2 Features................................................................................................................. 30

2.3 Microengines ......................................................................................................................33

2.3.1 Microengine Bus Arrangement ..............................................................................35

2.3.2 Control Store.......................................................................................................... 35

2.3.3 Contexts.................................................................................................................35

2.3.4 Datapath Registers ................................................................................................ 37

2.3.5 Addressing Modes ................................................................................................. 41

2.3.6 Local CSRs............................................................................................................ 43

2.3.7 Execution Datapath ............................................................................................... 43

2.3.8 CRC Unit................................................................................................................ 48

2.3.9 Event Signals......................................................................................................... 49

2.4 DRAM .................................................................................................................................50

2.4.1 Size Configuration ................................................................................................. 50

2.4.2 Read and Write Access ......................................................................................... 51

2.5 SRAM .................................................................................................................................51

2.5.1 QDR Clocking Scheme ..........................................................................................52

2.5.2 SRAM Controller Configurations............................................................................52

2.5.3 SRAM Atomic Operations ......................................................................................53

2.5.4 Queue Data Structure Commands ........................................................................54

2.5.5 Reference Ordering ............................................................................................... 54

Core Microarchitecture ................................................................................. 30

2.2.2.1 Multiply/Accumulate (MAC).................................................................... 30

2.2.2.2 Memory Management ............................................................................ 30

2.2.2.3 Instruction Cache ................................................................................... 30

2.2.2.4 Branch Target Buffer..............................................................................31

2.2.2.5 Data Cache ............................................................................................31

2.2.2.6 Interrupt Controller ................................................................................. 31

2.2.2.7 Address Map.......................................................................................... 32

2.3.4.1 General-Purpose Registers (GPRs) ......................................................37

2.3.4.2 Transfer Registers ................................................................................. 37

2.3.4.3 Next Neighbor Registers........................................................................38

2.3.4.4 Local Memory ....................................................................................... 39

2.3.5.1 Context-Relative Addressing Mode .......................................................41

2.3.5.2 Absolute Addressing Mode .................................................................... 42

2.3.5.3 Indexed Addressing Mode ..................................................................... 42

2.3.7.1 Byte Align............................................................................................... 43

2.3.7.2 CAM ....................................................................................................... 45

2.5.5.1 Reference Order Tables ........................................................................ 54

2.5.5.2 Microengine Software Restrictions to Maintain Ordering.......................56

Hardware Reference Manual 3

Contents

2.6 Scratchpad Memory............................................................................................................56

2.6.1 Scratchpad Atomic Operations .............................................................................. 57

2.6.2 Ring Commands .................................................................................................... 57

2.7 Media and Switch Fabric Interface ..................................................................................... 59

2.7.1 SPI-4......................................................................................................................60

2.7.2 CSIX ...................................................................................................................... 61

2.7.3 Receive.................................................................................................................. 61

2.7.3.1 RBUF ..................................................................................................... 62

2.7.3.1.1 SPI-4 and the RBUF .............................................................. 62

2.7.3.1.2 CSIX and RBUF..................................................................... 63

2.7.3.2 Full Element List .................................................................................... 63

2.7.3.3 RX_THREAD_FREELIST...................................................................... 63

2.7.3.4 Receive Operation Summary................................................................. 64

2.7.4 Transmit................................................................................................................. 65

2.7.4.1 TBUF...................................................................................................... 65

2.7.4.1.1 SPI-4 and TBUF..................................................................... 66

2.7.4.1.2 CSIX and TBUF ..................................................................... 67

2.7.4.2 Transmit Operation Summary................................................................ 67

2.7.5 The Flow Control Interface .................................................................................... 68

2.7.5.1 SPI-4...................................................................................................... 68

2.7.5.2 CSIX....................................................................................................... 68

2.8 Hash Unit............................................................................................................................ 69

2.9 PCI Controller ..................................................................................................................... 71

2.9.1 Target Access........................................................................................................ 71

2.9.2 Master Access ....................................................................................................... 71

2.9.3 DMA Channels....................................................................................................... 71

2.9.3.1 DMA Descriptor...................................................................................... 72

2.9.3.2 DMA Channel Operation........................................................................ 73

2.9.3.3 DMA Channel End Operation ................................................................ 74

2.9.3.4 Adding Descriptors to an Unterminated Chain....................................... 74

2.9.4 Mailbox and Message Registers............................................................................ 74

2.9.5 PCI Arbiter ............................................................................................................. 75

2.10 Control and Status Register Access Proxy......................................................................... 76

2.11 Intel XScale

Core Peripherals .......................................................................................... 76

2.11.1 Interrupt Controller................................................................................................. 76

2.11.2 Timers....................................................................................................................77

2.11.3 General Purpose I/O.............................................................................................. 77

2.11.4 Universal Asynchronous Receiver/Transmitter...................................................... 77

2.11.5 Slowport................................................................................................................. 77

2.12 I/O Latency ......................................................................................................................... 78

2.13 Performance Monitor .......................................................................................................... 78

3 Intel XScale

Core ....................................................................................................................... 79

3.1 Introduction ......................................................................................................................... 79

3.2 Features.............................................................................................................................. 80

3.2.1 Multiply/ACcumulate (MAC)................................................................................... 80

3.2.2 Memory Management............................................................................................ 80

3.2.3 Instruction Cache................................................................................................... 81

3.2.4 Branch Target Buffer (BTB) ................................................................................... 81

3.2.5 Data Cache............................................................................................................ 81

3.2.6 Performance Monitoring ........................................................................................ 81

4 Hardware Reference Manual

Contents

3.2.7 Power Management...............................................................................................81

3.2.8 Debugging ............................................................................................................. 81

3.2.9 JTAG...................................................................................................................... 81

3.3 Memory Management.........................................................................................................82

3.3.1 Architecture Model ................................................................................................. 82

3.3.1.1 Version 4 versus Version 5 .................................................................... 82

3.3.1.2 Memory Attributes.................................................................................. 82

3.3.1.2.1 Page (P) Attribute Bit ............................................................. 82

3.3.1.2.2 Instruction Cache ................................................................... 83

3.3.1.2.3 Data Cache and Write Buffer .................................................83

3.3.1.2.4 Details on Data Cache and Write Buffer Behavior................. 83

3.3.1.2.5 Memory Operation Ordering ..................................................84

3.3.2 Exceptions .............................................................................................................84

3.3.3 Interaction of the MMU, Instruction Cache, and Data Cache.................................85

3.3.4 Control ...................................................................................................................85

3.3.4.1 Invalidate (Flush) Operation...................................................................85

3.3.4.2 Enabling/Disabling ................................................................................. 85

3.3.4.3 Locking Entries ...................................................................................... 86

3.3.4.4 Round-Robin Replacement Algorithm ...................................................87

3.4 Instruction Cache................................................................................................................ 88

3.4.1 Instruction Cache Operation .................................................................................. 89

3.4.1.1 Operation when Instruction Cache is Enabled.......................................89

3.4.1.2 Operation when Instruction Cache is Disabled ...................................... 90

3.4.1.3 Fetch Policy ........................................................................................... 90

3.4.1.4 Round-Robin Replacement Algorithm ...................................................90

3.4.1.5 Parity Protection..................................................................................... 91

3.4.1.6 Instruction Cache Coherency.................................................................91

3.4.2 Instruction Cache Control ...................................................................................... 92

3.4.2.1 Instruction Cache State at Reset ...........................................................92

3.4.2.2 Enabling/Disabling ................................................................................. 92

3.4.2.3 Invalidating the Instruction Cache.......................................................... 92

3.4.2.4 Locking Instructions in the Instruction Cache ........................................ 92

3.4.2.5 Unlocking Instructions in the Instruction Cache .....................................94

3.5 Branch Target Buffer (BTB) ................................................................................................ 94

3.5.1 Branch Target Buffer Operation.............................................................................94

3.5.1.1 Reset......................................................................................................95

3.5.2 Update Policy......................................................................................................... 96

3.5.3 BTB Control ...........................................................................................................96

3.5.3.1 Disabling/Enabling ................................................................................. 96

3.5.3.2 Invalidation............................................................................................. 96

3.6 Data Cache.........................................................................................................................96

3.6.1 Overviews ..............................................................................................................97

3.6.1.1 Data Cache Overview ............................................................................ 97

3.6.1.2 Mini-Data Cache Overview ....................................................................98

3.6.1.3 Write Buffer and Fill Buffer Overview..................................................... 99

3.6.2 Data Cache and Mini-Data Cache Operation ........................................................ 99

3.6.2.1 Operation when Caching is Enabled...................................................... 99

3.6.2.2 Operation when Data Caching is Disabled ............................................99

3.6.2.3 Cache Policies ..................................................................................... 100

3.6.2.3.1 Cacheability ......................................................................... 100

3.6.2.3.2 Read Miss Policy .................................................................100

3.6.2.3.3 Write Miss Policy..................................................................101

Hardware Reference Manual 5

Contents

3.6.2.3.4 Write-Back versus Write-Through........................................ 101

3.6.2.4 Round-Robin Replacement Algorithm ................................................. 102

3.6.2.5 Parity Protection................................................................................... 102

3.6.2.6 Atomic Accesses.................................................................................. 102

3.6.3 Data Cache and Mini-Data Cache Control .......................................................... 103

3.6.3.1 Data Memory State After Reset........................................................... 103

3.6.3.2 Enabling/Disabling ............................................................................... 103

3.6.3.3 Invalidate and Clean Operations.......................................................... 103

3.6.3.3.1 Global Clean and Invalidate Operation ................................ 104

3.6.4 Reconfiguring the Data Cache as Data RAM ...................................................... 105

3.6.5 Write Buffer/Fill Buffer Operation and Control ..................................................... 106

3.7 Configuration .................................................................................................................... 106

3.8 Performance Monitoring ................................................................................................... 107

3.8.1 Performance Monitoring Events .......................................................................... 107

3.8.1.1 Instruction Cache Efficiency Mode....................................................... 108

3.8.1.2 Data Cache Efficiency Mode................................................................ 109

3.8.1.3 Instruction Fetch Latency Mode........................................................... 109

3.8.1.4 Data/Bus Request Buffer Full Mode .................................................... 109

3.8.1.5 Stall/Writeback Statistics...................................................................... 110

3.8.1.6 Instruction TLB Efficiency Mode .......................................................... 111

3.8.1.7 Data TLB Efficiency Mode ................................................................... 111

3.8.2 Multiple Performance Monitoring Run Statistics .................................................. 111

3.9 Performance Considerations ............................................................................................ 111

3.9.1 Interrupt Latency.................................................................................................. 112

3.9.2 Branch Prediction ................................................................................................ 112

3.9.3 Addressing Modes ............................................................................................... 113

3.9.4 Instruction Latencies............................................................................................ 113

3.9.4.1 Performance Terms ............................................................................. 113

3.9.4.2 Branch Instruction Timings .................................................................. 115

3.9.4.3 Data Processing Instruction Timings ................................................... 115

3.9.4.4 Multiply Instruction Timings.................................................................. 116

3.9.4.5 Saturated Arithmetic Instructions......................................................... 117

3.9.4.6 Status Register Access Instructions .................................................... 118

3.9.4.7 Load/Store Instructions........................................................................ 118

3.9.4.8 Semaphore Instructions....................................................................... 118

3.9.4.9 Coprocessor Instructions ..................................................................... 119

3.9.4.10 Miscellaneous Instruction Timing......................................................... 119

3.9.4.11 Thumb Instructions .............................................................................. 119

3.10 Test Features....................................................................................................................119

3.10.1 IXP2800 Network Processor Endianness............................................................ 120

3.10.1.1 Read and Write Transactions Initiated by the Intel XScale

3.10.1.1.1 Reads Initiated by the Intel XScale® Core ........................ 121

3.10.1.1.2 The Intel XScale

Core Writing to the IXP2800

Network Processor .................................................................. 123

3.11 Intel XScale® Core Gasket Unit ....................................................................................... 125

3.11.1 Overview.............................................................................................................. 125

3.11.2 Intel XScale® Core Gasket Functional Description ............................................. 127

3.11.2.1 Command Memory Bus to Command Push/Pull Conversion .............. 127

3.11.3 CAM Operation .................................................................................................... 127

3.11.4 Atomic Operations ............................................................................................... 128

3.11.4.1 Summary of Rules for the Atomic Command Regarding I/O ............... 129

3.11.4.2 Intel XScale® Core Access to SRAM Q-Array..................................... 129

Core ...... 121

6 Hardware Reference Manual

Contents

3.11.5 I/O Transaction ....................................................................................................130

3.11.6 Hash Access ........................................................................................................ 130

3.11.7 Gasket Local CSR ...............................................................................................131

3.11.8 Interrupt ...............................................................................................................132

3.12 Intel XScale® Core Peripheral Interface........................................................................... 134

3.12.1 XPI Overview .......................................................................................................134

3.12.1.1 Data Transfers ..................................................................................... 135

3.12.1.2 Data Alignment .................................................................................... 135

3.12.1.3 Address Spaces for XPI Internal Devices ............................................ 136

3.12.2 UART Overview ...................................................................................................137

3.12.3 UART Operation .................................................................................................. 138

3.12.3.1 UART FIFO OPERATION....................................................................138

3.12.3.1.1 UART FIFO Interrupt Mode Operation –

Receiver Interrupt .................................................................... 138

3.12.3.1.2 FIFO Polled Mode Operation ............................................. 139

3.12.4 Baud Rate Generator...........................................................................................139

3.12.5 General Purpose I/O (GPIO) ............................................................................... 140

3.12.6 Timers.................................................................................................................. 141

3.12.6.1 Timer Operation ...................................................................................141

3.12.7 Slowport Unit ....................................................................................................... 142

3.12.7.1 PROM Device Support.........................................................................143

3.12.7.2 Microprocessor Interface Support for the Framer ................................ 143

3.12.7.3 Slowport Unit Interfaces.......................................................................144

3.12.7.4 Address Space.....................................................................................145

3.12.7.5 Slowport Interfacing Topology ............................................................. 145

3.12.7.6 Slowport 8-Bit Device Bus Protocols ...................................................146

3.12.7.6.1 Mode 0 Single Write Transfer for Fixed-Timed Device ......147

3.12.7.6.2 Mode 0 Single Write Transfer for Self-Timing Device........ 148

3.12.7.6.3 Mode 0 Single Read Transfer for Fixed-Timed Device...... 149

3.12.7.6.4 Single Read Transfer for a Self-Timing Device..................150

3.12.7.7 SONET/SDH Microprocessor Access Support ....................................150

3.12.7.7.1 Mode 1: 16-Bit Microprocessor Interface Support with

16-Bit Address Lines................................................................151

3.12.7.7.2 Mode 2: Interface with 8 Data Bits and 11 Address Bits ....155

3.12.7.7.3 Mode 3: Support for the Intel and AMCC* 2488 Mbps

SONET/SDH Microprocessor Interface ...................................157

4Microengines............................................................................................................................. 167

4.1 Overview........................................................................................................................... 167

4.1.1 Control Store........................................................................................................ 169

4.1.2 Contexts...............................................................................................................169

4.1.3 Datapath Registers .............................................................................................. 171

4.1.3.1 General-Purpose Registers (GPRs) ....................................................171

4.1.3.2 Transfer Registers ............................................................................... 171

4.1.3.3 Next Neighbor Registers......................................................................172

4.1.3.4 Local Memory ...................................................................................... 172

4.1.4 Addressing Modes ............................................................................................... 173

4.1.4.1 Context-Relative Addressing Mode .....................................................173

4.1.4.2 Absolute Addressing Mode .................................................................. 174

4.1.4.3 Indexed Addressing Mode ...................................................................174

4.2 Local CSRs....................................................................................................................... 174

4.3 Execution Datapath .......................................................................................................... 174

Hardware Reference Manual 7

Contents

4.3.1 Byte Align............................................................................................................. 174

4.3.2 CAM..................................................................................................................... 176

4.4 CRC Unit........................................................................................................................... 179

4.5 Event Signals.................................................................................................................... 180

4.5.1 Microengine Endianness ..................................................................................... 181

4.5.1.1 Read from RBUF (64 Bits)................................................................... 181

4.5.1.2 Write to TBUF ...................................................................................... 182

4.5.1.3 Read/Write from/to SRAM ................................................................... 182

4.5.1.4 Read/Write from/to DRAM ................................................................... 182

4.5.1.5 Read/Write from/to SHaC and Other CSRs......................................... 182

4.5.1.6 Write to Hash Unit................................................................................ 183

4.5.2 Media Access ...................................................................................................... 183

4.5.2.1 Read from RBUF ................................................................................. 184

4.5.2.2 Write to TBUF ...................................................................................... 185

4.5.2.3 TBUF to SPI-4 Transfer ....................................................................... 186

5 DRAM.......................................................................................................................................... 187

5.1 Overview........................................................................................................................... 187

5.2 Size Configuration ............................................................................................................188

5.3 DRAM Clocking ................................................................................................................189

5.4 Bank Policy ....................................................................................................................... 190

5.5 Interleaving ....................................................................................................................... 191

5.5.1 Three Channels Active (3-Way Interleave).......................................................... 191

5.5.2 Two Channels Active (2-Way Interleave) ............................................................ 193

5.5.3 One Channel Active (No Interleave) .................................................................... 193

5.5.4 Interleaving Across RDRAMs and Banks ............................................................ 194

5.6 Parity and ECC................................................................................................................. 194

5.6.1 Parity and ECC Disabled ..................................................................................... 194

5.6.2 Parity Enabled ..................................................................................................... 195

5.6.3 ECC Enabled ....................................................................................................... 195

5.6.4 ECC Calculation and Syndrome .......................................................................... 196

5.7 Timing Configuration.........................................................................................................196

5.8 Microengine Signals .........................................................................................................197

5.9 Serial Port......................................................................................................................... 197

5.10 RDRAM Controller Block Diagram.................................................................................... 198

5.10.1 Commands .......................................................................................................... 199

5.10.2 DRAM Write......................................................................................................... 199

5.10.2.1 Masked Write....................................................................................... 199

5.10.3 DRAM Read......................................................................................................... 200

5.10.4 CSR Write............................................................................................................ 200

5.10.5 CSR Read............................................................................................................ 200

5.10.6 Arbitration ............................................................................................................ 201

5.10.7 Reference Ordering ............................................................................................. 201

5.11 DRAM Push/Pull Arbiter ................................................................................................... 201

5.11.1 Arbiter Push/Pull Operation ................................................................................. 202

5.11.2 DRAM Push Arbiter Description .......................................................................... 203

5.12 DRAM Pull Arbiter Description.......................................................................................... 204

6 SRAM Interface.......................................................................................................................... 207

6.1 Overview........................................................................................................................... 207

6.2 SRAM Interface Configurations ........................................................................................ 208

8 Hardware Reference Manual

Contents

6.2.1 Internal Interface..................................................................................................209

6.2.2 Number of Channels............................................................................................209

6.2.3 Coprocessor and/or SRAMs Attached to a Channel............................................ 209

6.3 SRAM Controller Configurations.......................................................................................209

6.4 Command Overview .........................................................................................................211

6.4.1 Basic Read/Write Commands.............................................................................. 211

6.4.2 Atomic Operations ............................................................................................... 211

6.4.3 Queue Data Structure Commands ......................................................................213

6.4.3.1 Read_Q_Descriptor Commands.......................................................... 216

6.4.3.2 Write_Q_Descriptor Commands ..........................................................216

6.4.3.3 ENQ and DEQ Commands .................................................................. 217

6.4.4 Ring Data Structure Commands.......................................................................... 217

6.4.5 Journaling Commands......................................................................................... 217

6.4.6 CSR Accesses ..................................................................................................... 217

6.5 Parity................................................................................................................................. 217

6.6 Address Map..................................................................................................................... 218

6.7 Reference Ordering .......................................................................................................... 219

6.7.1 Reference Order Tables ...................................................................................... 219

6.7.2 Microcode Restrictions to Maintain Ordering ....................................................... 220

6.8 Coprocessor Mode ........................................................................................................... 221

7 SHaC — Unit Expansion ...........................................................................................................225

7.1 Overview........................................................................................................................... 225

7.1.1 SHaC Unit Block Diagram.................................................................................... 225

7.1.2 Scratchpad........................................................................................................... 227

7.1.2.1 Scratchpad Description........................................................................227

7.1.2.2 Scratchpad Interface............................................................................229

7.1.2.2.1 Command Interface .............................................................229

7.1.2.2.2 Push/Pull Interface...............................................................229

7.1.2.2.3 CSR Bus Interface ............................................................... 229

7.1.2.2.4 Advanced Peripherals Bus Interface (APB) ......................... 229

7.1.2.3 Scratchpad Block Level Diagram......................................................... 229

7.1.2.3.1 Scratchpad Commands .......................................................230

7.1.2.3.2 Ring Commands ..................................................................231

7.1.2.3.3 Clocks and Reset.................................................................235

7.1.2.3.4 Reset Registers ...................................................................235

7.1.3 Hash Unit .............................................................................................................236

7.1.3.1 Hashing Operation ............................................................................... 237

7.1.3.2 Hash Algorithm .................................................................................... 239

8 Media and Switch Fabric Interface...........................................................................................241

8.1 Overview........................................................................................................................... 241

8.1.1 SPI-4.................................................................................................................... 243

8.1.2 CSIX ....................................................................................................................246

8.1.3 CSIX/SPI-4 Interleave Mode................................................................................246

8.2 Receive............................................................................................................................. 247

8.2.1 Receive Pins........................................................................................................248

8.2.2 RBUF ...................................................................................................................248

8.2.2.1 SPI-4 .................................................................................................... 250

8.2.2.2 CSIX.....................................................................................................253

8.2.3 Full Element List .................................................................................................. 255

8.2.4 Rx_Thread_Freelist_# ......................................................................................... 255

Hardware Reference Manual 9

Contents

8.2.5 Rx_Thread_Freelist_Timeout_# .......................................................................... 256

8.2.6 Receive Operation Summary............................................................................... 256

8.2.7 Receive Flow Control Status ............................................................................... 258

8.2.7.1 SPI-4.................................................................................................... 258

8.2.7.2 CSIX..................................................................................................... 259

8.2.7.2.1 Link-Level............................................................................. 259

8.2.7.2.2 Virtual Output Queue ........................................................... 260

8.2.8 Parity.................................................................................................................... 260

8.2.8.1 SPI-4.................................................................................................... 260

8.2.8.2 CSIX..................................................................................................... 261

8.2.8.2.1 Horizontal Parity................................................................... 261

8.2.8.2.2 Vertical Parity....................................................................... 261

8.2.9 Error Cases.......................................................................................................... 261

8.3 Transmit............................................................................................................................ 262

8.3.1 Transmit Pins....................................................................................................... 262

8.3.2 TBUF ................................................................................................................... 263

8.3.2.1 SPI-4.................................................................................................... 266

8.3.2.2 CSIX..................................................................................................... 267

8.3.3 Transmit Operation Summary.............................................................................. 268

8.3.3.1 SPI-4.................................................................................................... 268

8.3.3.2 CSIX..................................................................................................... 269

8.3.3.3 Transmit Summary............................................................................... 270

8.3.4 Transmit Flow Control Status .............................................................................. 270

8.3.4.1 SPI-4.................................................................................................... 271

8.3.4.2 CSIX..................................................................................................... 273

8.3.4.2.1 Link-Level............................................................................. 273

8.3.4.2.2 Virtual Output Queue ........................................................... 273

8.3.5 Parity.................................................................................................................... 273

8.3.5.1 SPI-4.................................................................................................... 273

8.3.5.2 CSIX..................................................................................................... 274

8.3.5.2.1 Horizontal Parity................................................................... 274

8.3.5.2.2 Vertical Parity....................................................................... 274

8.4 RBUF and TBUF Summary .............................................................................................. 274

8.5 CSIX Flow Control Interface ............................................................................................. 275

8.5.1 TXCSRB and RXCSRB Signals .......................................................................... 275

8.5.2 FCIFIFO and FCEFIFO Buffers ........................................................................... 276

8.5.2.1 Full Duplex CSIX.................................................................................. 277

8.5.2.2 Simplex CSIX....................................................................................... 278

8.5.3 TXCDAT/RXCDAT, TXCSOF/RXCSOF, TXCPAR/RXCPAR,

and TXCFC/RXCFC Signals................................................................................ 280

8.6 Deskew and Training ........................................................................................................ 280

8.6.1 Data Training Pattern........................................................................................... 282

8.6.2 Flow Control Training Pattern .............................................................................. 282

8.6.3 Use of Dynamic Training ..................................................................................... 283

8.7 CSIX Startup Sequence.................................................................................................... 287

8.7.1 CSIX Full Duplex ................................................................................................. 287

8.7.1.1 Ingress IXP2800 Network Processor................................................... 287

8.7.1.2 Egress IXP2800 Network Processor.................................................... 287

8.7.1.3 Single IXP2800 Network Processor..................................................... 288

8.7.2 CSIX Simplex....................................................................................................... 288

8.7.2.1 Ingress IXP2800 Network Processor................................................... 288

8.7.2.2 Egress IXP2800 Network Processor.................................................... 289

10 Hardware Reference Manual

Contents

8.7.2.3 Single IXP2800 Network Processor.....................................................289

8.8 Interface to Command and Push and Pull Buses ............................................................. 290

8.8.1 RBUF or MSF CSR to Microengine S_TRANSFER_IN Register for Instruction:.291

8.8.2 Microengine S_TRANSFER_OUT Register to TBUF or

MSF CSR for Instruction:..................................................................................... 291

8.8.3 Microengine to MSF CSR for Instruction: ............................................................ 291

8.8.4 From RBUF to DRAM for Instruction: .................................................................. 291

8.8.5 From DRAM to TBUF for Instruction:................................................................... 292

8.9 Receiver and Transmitter Interoperation with Framers and Switch Fabrics .....................292

8.9.1 Receiver and Transmitter Configurations ............................................................293

8.9.1.1 Simplex Configuration..........................................................................293

8.9.1.2 Hybrid Simplex Configuration ..............................................................294

8.9.1.3 Dual Network Processor Full Duplex Configuration ............................. 295

8.9.1.4 Single Network Processor Full Duplex Configuration (SPI-4.2)........... 296

8.9.1.5 Single Network Processor, Full Duplex Configuration

(SPI-4.2 and CSIX-L1) .........................................................................297

8.9.2 System Configurations.........................................................................................297

8.9.2.1 Framer, Single Network Processor Ingress and Egress, and

Fabric Interface Chip............................................................................ 298

8.9.2.2 Framer, Dual Network Processor Ingress, Single

Network Processor Egress, and Fabric Interface Chip ........................298

8.9.2.3 Framer, Single Network Processor Ingress and Egress, and

CSIX-L1 Chips for Translation and Fabric Interface ............................299

8.9.2.4 CPU Complex, Network Processor, and Fabric Interface Chip ...........299

8.9.2.5 Framer, Single Network Processor, Co-Processor, and

Fabric Interface Chip............................................................................ 300

8.9.3 SPI-4.2 Support ................................................................................................... 301

8.9.3.1 SPI-4.2 Receiver..................................................................................301

8.9.3.2 SPI-4.2 Transmitter..............................................................................302

8.9.4 CSIX-L1 Protocol Support ...................................................................................303

8.9.4.1 CSIX-L1 Interface Reference Model: Traffic Manager and Fabric

Interface Chip.......................................................................................303

8.9.4.2 Intel® IXP2800 Support of the CSIX-L1 Protocol ................................304

8.9.4.2.1 Mapping to 16-Bit Wide DDR LVDS .................................... 304

8.9.4.2.2 Support for Dual Chip, Full-Duplex Operation ..................... 305

8.9.4.2.3 Support for Simplex Operation............................................. 306

8.9.4.2.4 Support for Hybrid Simplex Operation .................................307

8.9.4.2.5 Support for Dynamic De-Skew Training...............................308

8.9.4.3 CSIX-L1 Protocol Receiver Support ....................................................309

8.9.4.4 CSIX-L1 Protocol Transmitter Support ................................................310

8.9.4.5 Implementation of a Bridge Chip to CSIX-L1 .......................................311

8.9.5 Dual Protocol (SPI and CSIX-L1) Support ...........................................................312

8.9.5.1 Dual Protocol Receiver Support...........................................................312

8.9.5.2 Dual Protocol Transmitter Support....................................................... 312

8.9.5.3 Implementation of a Bridge Chip to CSIX-L1 and SPI-4.2 ................... 313

8.9.6 Transmit State Machine .......................................................................................314

8.9.6.1 SPI-4.2 Transmitter State Machine...................................................... 314

8.9.6.2 Training Transmitter State Machine..................................................... 315

8.9.6.3 CSIX-L1 Transmitter State Machine .................................................... 315

8.9.7 Dynamic De-Skew ...............................................................................................316

8.9.8 Summary of Receiver and Transmitter Signals ...................................................317

Hardware Reference Manual 11

Contents

9 PCI Unit....................................................................................................................................... 319

9.1 Overview........................................................................................................................... 319

9.2 PCI Pin Protocol Interface Block....................................................................................... 321

9.2.1 PCI Commands ................................................................................................... 322

9.2.2 IXP2800 Network Processor Initialization............................................................ 323

9.2.2.1 Initialization by the Intel XScale® Core................................................ 324

9.2.2.2 Initialization by a PCI Host................................................................... 324

9.2.3 PCI Type 0 Configuration Cycles......................................................................... 325

9.2.3.1 Configuration Write .............................................................................. 325

9.2.3.2 Configuration Read.............................................................................. 325

9.2.4 PCI 64-Bit Bus Extension .................................................................................... 325

9.2.5 PCI Target Cycles................................................................................................ 326

9.2.5.1 PCI Accesses to CSR.......................................................................... 326

9.2.5.2 PCI Accesses to DRAM....................................................................... 326

9.2.5.3 PCI Accesses to SRAM ....................................................................... 326

9.2.5.4 Target Write Accesses from the PCI Bus ............................................ 326

9.2.5.5 Target Read Accesses from the PCI Bus ............................................ 327

9.2.6 PCI Initiator Transactions .................................................................................... 327

9.2.6.1 PCI Request Operation........................................................................ 327

9.2.6.2 PCI Commands.................................................................................... 328

9.2.6.3 Initiator Write Transactions .................................................................. 328

9.2.6.4 Initiator Read Transactions.................................................................. 328

9.2.6.5 Initiator Latency Timer ......................................................................... 328

9.2.6.6 Special Cycle ....................................................................................... 329

9.2.7 PCI Fast Back-to-Back Cycles............................................................................. 329

9.2.8 PCI Retry ............................................................................................................. 329

9.2.9 PCI Disconnect .................................................................................................... 329

9.2.10 PCI Built-In System Test...................................................................................... 329

9.2.11 PCI Central Functions......................................................................................... 330

9.2.11.1 PCI Interrupt Inputs.............................................................................. 330

9.2.11.2 PCI Reset Output................................................................................. 330

9.2.11.3 PCI Internal Arbiter .............................................................................. 331

9.3 Slave Interface Block ........................................................................................................332

9.3.1 CSR Interface ...................................................................................................... 332

9.3.2 SRAM Interface ................................................................................................... 333

9.3.2.1 SRAM Slave Writes ............................................................................. 333

9.3.2.2 SRAM Slave Reads ............................................................................. 334

9.3.3 DRAM Interface ................................................................................................... 334

9.3.3.1 DRAM Slave Writes ............................................................................. 334

9.3.3.2 DRAM Slave Reads............................................................................. 335

9.3.4 Mailbox and Doorbell Registers........................................................................... 336

9.3.5 PCI Interrupt Pin .................................................................................................. 339

9.4 Master Interface Block ...................................................................................................... 340

9.4.1 DMA Interface...................................................................................................... 340

9.4.1.1 Allocation of the DMA Channels .......................................................... 341

9.4.1.2 Special Registers for Microengine Channels....................................... 341

9.4.1.3 DMA Descriptor.................................................................................... 342

9.4.1.4 DMA Channel Operation...................................................................... 343

9.4.1.5 DMA Channel End Operation .............................................................. 344

9.4.1.6 Adding Descriptor to an Unterminated Chain ...................................... 344

9.4.1.7 DRAM to PCI Transfer......................................................................... 344

9.4.1.8 PCI to DRAM Transfer......................................................................... 345

12 Hardware Reference Manual

Contents

9.4.2 Push/Pull Command Bus Target Interface........................................................... 345

9.4.2.1 Command Bus Master Access to Local Configuration Registers ........345

9.4.2.2 Command Bus Master Access to Local Control and

Status Registers...................................................................................346

9.4.2.3 Command Bus Master Direct Access to PCI Bus ................................ 346

9.4.2.3.1 PCI Address Generation for IO and MEM Cycles................346

9.4.2.3.2 PCI Address Generation for Configuration Cycles...............347

9.4.2.3.3 PCI Address Generation for Special and IACK Cycles........ 347

9.4.2.3.4 PCI Enables ......................................................................... 347

9.4.2.3.5 PCI Command ..................................................................... 347

9.5 PCI Unit Error Behavior .................................................................................................... 348

9.5.1 PCI Target Error Behavior ................................................................................... 348

9.5.1.1 Target Access Has an Address Parity Error ........................................348

9.5.1.2 Initiator Asserts PCI_PERR_L in Response to One of Our Data

Phases ................................................................................................. 348

9.5.1.3 Discard Timer Expires on a Target Read.............................................348

9.5.1.4 Target Access to the PCI_CSR_BAR Space Has Illegal

Byte Enables........................................................................................ 348

9.5.1.5 Target Write Access Receives Bad Parity PCI_PAR with the Data .....349

9.5.1.6 SRAM Responds with a Memory Error on One or More Data Phases

on a Target Read .................................................................................349

9.5.1.7 DRAM Responds with a Memory Error on One or More Data Phases

on a Target Read .................................................................................349

9.5.2 As a PCI Initiator During a DMA Transfer ............................................................349

9.5.2.1 DMA Read from DRAM (Memory-to-PCI Transaction) Gets a

Memory Error .......................................................................................349

9.5.2.2 DMA Read from SRAM (Descriptor Read) Gets a Memory Error........ 350

9.5.2.3 DMA from DRAM Transfer (Write to PCI) Receives PCI_PERR_L on

PCI Bus................................................................................................350

9.5.2.4 DMA To DRAM (Read from PCI) Has Bad Data Parity ....................... 350

9.5.2.5 DMA Transfer Experiences a Master Abort (Time-Out) on PCI ...........351

9.5.2.6 DMA Transfer Receives a Target Abort Response During a

Data Phase .......................................................................................... 351

9.5.2.7 DMA Descriptor Has a 0x0 Word Count (Not an Error) .......................351

9.5.3 As a PCI Initiator During a Direct Access from the Intel

XScale® Core or Microengine .............................................................................351

9.5.3.1 Master Transfer Experiences a Master Abort (Time-Out) on PCI ........351

9.5.3.2 Master Transfer Receives a Target Abort Response During

a Data Phase .......................................................................................351

9.5.3.3 Master from the Intel XScale® Core or Microengine Transfer

(Write to PCI) Receives PCI_PERR_L on PCI Bus .............................352

9.5.3.4 Master Read from PCI (Read from PCI) Has Bad Data Parity ............352

9.5.3.5 Master Transfer Receives PCI_SERR_L from the PCI Bus ................352

9.5.3.6 Intel XScale® Core Microengine Requests Direct Transfer when

the PCI Bus is in Reset ........................................................................352

9.6 PCI Data Byte Lane Alignment ......................................................................................... 352

9.6.1 Endian for Byte Enable ........................................................................................ 355

10 Clocks and Reset....................................................................................................................... 359

10.1 Clocks ............................................................................................................................... 359

10.2 Synchronization Between Frequency Domains ................................................................363

10.3 Reset ................................................................................................................................364

10.3.1 Hardware Reset Using nRESET or PCI_RST_L .................................................364

Hardware Reference Manual 13

Contents

10.3.2 PCI-Initiated Reset............................................................................................... 366

10.3.3 Watchdog Timer-Initiated Reset .......................................................................... 366

10.3.3.1 Slave Network Processor (Non-Central Function)............................... 367

10.3.3.2 Master Network Processor (PCI Host, Central Function) .................... 367

10.3.3.3 Master Network Processor (Central Function)..................................... 367

10.3.4 Software-Initiated Reset ...................................................................................... 367

10.3.5 Reset Removal Operation Based on CFG_PROM_BOOT.................................. 368

10.3.5.1 When CFG_PROM_BOOT is 1 (BOOT_PROM is Present) ................ 368

10.3.5.2 When CFG_PROM_BOOT is 0 (BOOT_PROM is Not Present) .........368

10.3.6 Strap Pins ............................................................................................................ 368

10.3.7 Powerup Reset Sequence ................................................................................... 370

10.4 Boot Mode ........................................................................................................................ 370

10.4.1 Flash ROM........................................................................................................... 372

10.4.2 PCI Host Download ............................................................................................. 372

10.5 Initialization ....................................................................................................................... 373

11 Performance Monitor Unit ........................................................................................................ 375

11.1 Introduction ....................................................................................................................... 375

11.1.1 Motivation for Performance Monitors................................................................... 375

11.1.2 Motivation for Choosing CHAP Counters ............................................................ 376

11.1.3 Functional Overview of CHAP Counters.............................................................. 377

11.1.4 Basic Operation of the Performance Monitor Unit ............................................... 378

11.1.5 Definition of CHAP Terminology .......................................................................... 379

11.1.6 Definition of Clock Domains................................................................................. 380

11.2 Interface and CSR Description ......................................................................................... 380

11.2.1 APB Peripheral .................................................................................................... 381

11.2.2 CAP Description .................................................................................................. 381

11.2.2.1 Selecting the Access Mode.................................................................. 381

11.2.2.2 PMU CSR ............................................................................................ 381

11.2.2.3 CAP Writes .......................................................................................... 381

11.2.2.4 CAP Reads .......................................................................................... 381

11.2.3 Configuration Registers ....................................................................................... 382

11.3 Performance Measurements ............................................................................................ 382

11.4 Events Monitored in Hardware ......................................................................................... 385

11.4.1 Queue Statistics Events....................................................................................... 385

11.4.1.1 Queue Latency..................................................................................... 385

11.4.1.2 Queue Utilization.................................................................................. 385

11.4.2 Count Events ....................................................................................................... 385

11.4.2.1 Hardware Block Execution Count ........................................................ 385

11.4.3 Design Block Select Definitions ........................................................................... 386

11.4.4 Null Event ............................................................................................................ 387

11.4.5 Threshold Events................................................................................................. 388

11.4.6 External Input Events........................................................................................... 389

11.4.6.1 XPI Events Target ID(000001) / Design Block #(0100) ....................... 389

11.4.6.2 SHaC Events Target ID(000010) / Design Block #(0101).................... 393

11.4.6.3 IXP2800 Network Processor MSF Events Target ID(000011) /

Design Block #(0110)........................................................................... 396

11.4.6.4 Intel XScale® Core Events Target ID(000100) /

Design Block #(0111)........................................................................... 402

11.4.6.5 PCI Events Target ID(000101) / Design Block #(1000) ....................... 405

11.4.6.6 ME00 Events Target ID(100000) / Design Block #(1001).................... 409

14 Hardware Reference Manual

Contents

11.4.6.7 ME01 Events Target ID(100001) / Design Block #(1001).................... 410

11.4.6.8 ME02 Events Target ID(100010) / Design Block #(1001).................... 411

11.4.6.9 ME03 Events Target ID(100011) / Design Block #(1001).................... 411

11.4.6.10 ME04 Events Target ID(100100) / Design Block #(1001).................... 412

11.4.6.11 ME05 Events Target ID(100101) / Design Block #(1001).................... 412

11.4.6.12 ME06 Events Target ID(100110) / Design Block #(1001).................... 413

11.4.6.13 ME07 Events Target ID(100111) / Design Block #(1001).................... 413

11.4.6.14 ME10 Events Target ID(110000) / Design Block #(1010).................... 414

11.4.6.15 ME11 Events Target ID(110001) / Design Block #(1010).................... 414

11.4.6.16 ME12 Events Target ID(110010) / Design Block #(1010).................... 415

11.4.6.17 ME13 Events Target ID(110011) / Design Block #(1010).................... 415

11.4.6.18 ME14 Events Target ID(110100) / Design Block #(1010).................... 416

11.4.6.19 ME15 Events Target ID(110101) / Design Block #(1010).................... 416

11.4.6.20 ME16 Events Target ID(100110) / Design Block #(1010).................... 417

11.4.6.21 ME17 Events Target ID(110111) / Design Block #(1010).................... 417

11.4.6.22 SRAM DP1 Events Target ID(001001) / Design Block #(0010)...........418

11.4.6.23 SRAM DP0 Events Target ID(001010) / Design Block #(0010)...........418

11.4.6.24 SRAM CH3 Events Target ID(001011) / Design Block #(0010)...........420

11.4.6.25 SRAM CH2 Events Target ID(001100) / Design Block #(0010)...........421

11.4.6.26 SRAM CH1 Events Target ID(001101) / Design Block #(0010)...........421

11.4.6.27 SRAM CH0 Events Target ID(001110) / Design Block #(0010)...........422

11.4.6.28 DRAM DPLA Events Target ID(010010) / Design Block #(0011) ........423

11.4.6.29 DRAM DPSA Events Target ID(010011) / Design Block #(0011)........ 424

11.4.6.30 IXP2800 Network Processor DRAM CH2 Events Target ID(010100) /

Design Block #(0011)........................................................................... 425

11.4.6.31 IXP2800 Network Processor DRAM CH1 Events Target ID(010101) /

Design Block #(0011)........................................................................... 429

11.4.6.32 IXP2800 Network Processor DRAM CH0 Events Target ID(010110) /

Design Block #(0011)........................................................................... 429

Hardware Reference Manual 15

Contents

Figures

1 IXP2800 Network Processor Functional Block Diagram ............................................................ 28

2 IXP2800 Network Processor Detailed Diagram.......................................................................... 29

3 Intel XScale® Core 4-GB (32-Bit) Address Space ..................................................................... 32

4 Microengine Block Diagram........................................................................................................ 34

5 Context State Transition Diagram .............................................................................................. 36

6 Byte-Align Block Diagram........................................................................................................... 44

7 CAM Block Diagram ................................................................................................................... 46

8 Echo Clock Configuration ........................................................................................................... 52

9 Logical View of Rings ................................................................................................................. 57

10 Example System Block Diagram ................................................................................................ 59

11 Full-Duplex Block Diagram......................................................................................................... 60

12 Simplified MSF Receive Section Block Diagram........................................................................ 61

13 Simplified Transmit Section Block Diagram................................................................................ 65

14 Hash Unit Block Diagram ........................................................................................................... 70

15 DMA Descriptor Reads............................................................................................................... 72

16 Intel XScale

17 Example of Locked Entries in TLB ............................................................................................. 88

18 Instruction Cache Organization .................................................................................................. 89

19 Locked Line Effect on Round Robin Replacement..................................................................... 93

20 BTB Entry ................................................................................................................................... 95

21 Branch History............................................................................................................................ 95

22 Data Cache Organization ........................................................................................................... 97

23 Mini-Data Cache Organization ................................................................................................... 98

24 Byte Steering for Read and Byte-Enable Generation by the Intel XScale

25 Intel XScale® Core-Initiated Write to the IXP2800 Network Processor.................................... 124

26 Intel XScale® Core-Initiated Write to the IXP2800 Network Processor (Continued)................ 125

27 Global Buses Connection to the Intel XScale® Core Gasket ................................................... 126

28 Flow Through the Intel XScale

29 Interrupt Mask Block Diagram ..................................................................................................133

30 XPI Interfaces for IXP2800 Network Processor........................................................................ 135

31 UART Data Frame.................................................................................................................... 138

32 GPIO Functional Diagram ........................................................................................................ 140

33 Timer Control Unit Interfacing Diagram .................................................................................... 141

34 Timer Internal Logic Diagram ................................................................................................... 142

35 Slowport Unit Interface Diagram...............................................................................................144

36 Address Space Hole Diagram .................................................................................................. 145

37 Slowport Example Application Topology .................................................................................. 146

38 Mode 0 Single Write Transfer for a Fixed-Timed Device.......................................................... 147

39 Mode 0 Single Write Transfer for a Self-Timing Device ........................................................... 148

40 Mode 0 Single Read Transfer for Fixed-Timed Device ............................................................ 149

41 Mode 0 Single Read Transfer for a Self-Timing Device ........................................................... 150

42 An Interface Topology with Lucent* TDAT042G5 SONET/SDH............................................... 152

43 Mode 1 Single Write Transfer for Lucent* TDAT042G5 Device (B0) ....................................... 153

44 Mode 1 Single Read Transfer for Lucent* TDAT042G5 Device (B0) ....................................... 154

45 An Interface Topology with PMC-Sierra* PM5351 S/UNI-TETRA* .......................................... 155

46 Mode 2 Single Write Transfer for PMC-Sierra* PM5351 Device (B0) ...................................... 156

47 Mode 2 Single Read Transfer for PMC-Sierra* PM5351 Device (B0) ...................................... 157

Core Architecture Features .................................................................................. 80

Core ..................... 122

Core Interrupt Controller........................................................ 132

16 Hardware Reference Manual

Contents

48 An Interface Topology with Intel / AMCC* SONET/SDH Device .............................................. 158

49 Mode 3 Second Interface Topology with Intel / AMCC* SONET/SDH Device.......................... 159

50 Mode 3 Single Write Transfer Followed by Read (B0) ............................................................. 160

51 Mode 3 Single Read Transfer Followed by Write (B0) ............................................................. 161

52 An Interface Topology with Intel / AMCC* SONET/SDH Device in Motorola* Mode ................162

53 Second Interface Topology with Intel / AMCC* SONET/SDH Device.......................................163

54 Mode 4 Single Write Transfer (B0) ........................................................................................... 164

55 Mode 4 Single Read Transfer (B0) ...........................................................................................165

56 Microengine Block Diagram...................................................................................................... 168

57 Context State Transition Diagram............................................................................................. 170

58 Byte Align Block Diagram ......................................................................................................... 175

59 CAM Block Diagram .................................................................................................................177

60 Read from RBUF (64 Bits)........................................................................................................181

61 Write to TBUF (64 Bits).............................................................................................................182

62 48-, 64-, and 128-Bit Hash Operand Transfers ........................................................................ 183

63 Bit, Byte, and Longword Organization in One RBUF Element.................................................. 184

64 Write to TBUF ........................................................................................................................... 185

65 MSF Interface ........................................................................................................................... 186

66 Clock Configuration .................................................................................................................. 189

67 IXP2800 Clocking for RDRAM at 400 MHz ..............................................................................190

68 IXP2800 Clocking for RDRAM at 508 MHz ..............................................................................190

69 Address Mapping Flow .............................................................................................................191

70 RDRAM Controller Block Diagram............................................................................................198

71 DRAM Push/Pull Arbiter Functional Blocks ..............................................................................202

72 DRAM Push Arbiter Functional Blocks .....................................................................................204

73 DRAM Pull Arbiter Functional Blocks .......................................................................................205

74 SRAM Controller/Chassis Block Diagram................................................................................. 208

75 SRAM Clock Connection on a Channel.................................................................................... 210

76 External Pipeline Registers Block Diagram ..............................................................................211

77 Queue Descriptor with Four Links ............................................................................................ 213

78 Enqueueing One Buffer at a Time ............................................................................................ 213

79 Previously Linked String of Buffers...........................................................................................214

80 First Step to Enqueue a String of Buffers to a Queue (ENQ_Tail_and_Link) ...........................214

81 Second Step to Enqueue a String of Buffers to a Queue (ENQ_Tail) ...................................... 214

82 Connection to a Coprocessor Though Standard QDR Interface .............................................. 221

83 Coprocessor with Memory Mapped FIFO Ports ....................................................................... 222

84 SHaC Top Level Diagram......................................................................................................... 226

85 Scratchpad Block Diagram .......................................................................................................228

86 Ring Communication Logic Diagram ........................................................................................231

87 Hash Unit Block Diagram..........................................................................................................236

88 Example System Block Diagram .............................................................................................. 242

89 Full-Duplex Block Diagram .......................................................................................................243

90 Receive and Transmit Clock Generation ..................................................................................245

91 Simplified Receive Section Block Diagram ............................................................................... 247

92 RBUF Element State Diagram..................................................................................................257

93 Extent of DIP-4 Codes .............................................................................................................. 260

94 Simplified Transmit Section Block Diagram.............................................................................. 262

95 TBUF State Diagram ................................................................................................................ 270

96 Tx Calendar Block Diagram...................................................................................................... 271

97 CSIX Flow Control Interface — TXCSRB and RXCSRB .......................................................... 275

Hardware Reference Manual 17

Contents

98 CSIX Flow Control Interface — FCIFIFO and FCEFIFO in Full Duplex Mode ......................... 277

99 CSIX Flow Control Interface — FCIFIFO and FCEFIFO in Simplex Mode .............................. 278

100 MSF to Command and Push and Pull Buses Interface Block Diagram .................................... 290

101 Basic I/O Capability of the Intel® IXP2800 Network Processor................................................ 292

102 Simplex Configuration .............................................................................................................. 293

103 Hybrid Simplex Configuration ................................................................................................... 294

104 Dual Network Processor, Full Duplex Configuration ................................................................ 295

105 Single Network Processor, Full Duplex Configuration (SPI-4.2 Protocol) ................................ 296

106 Single Network Processor, Full Duplex Configuration (SPI-4.2 and CSIX-L1 Protocols) ......... 297

107 Framer, Single Network Processor Ingress and Egress, and Fabric Interface Chip ................ 298

108 Framer, Dual Processor Ingress, Single Processor Egress, and Fabric Interface Chip ........... 298

109 Framer, Single Network Processor Ingress, Single Network Processor Egress,

CSIX-L1 Translation Chip and CSIX-L1 Fabric Interface Chip................................................. 299

110 CPU Complex, Network Processor, and Fabric Interface Chips .............................................. 299

111 Framer, Single Network Processor, Co-Processor, and Fabric Interface Chip ........................ 300

112 SPI-4.2 Interface Reference Model with Receiver and Transmitter Labels

Corresponding to Link Layer Device Functions........................................................................ 301

113 CSIX-L1 Interface Reference Model with Receiver and Transmitter Labels

Corresponding to Fabric Interface Chip Functions ................................................................... 303

114 Reference Model for IXP2800 Support of the Simplex Configuration Using

Independent Ingress and Egress Interfaces............................................................................. 306

115 Reference Model for Hybrid Simplex Operation ....................................................................... 307

116 Block Diagram of Dual Protocol (SPI-4.2 and CSIX-L1) Bridge Chip ....................................... 313

117 Summary of Receiver and Transmitter Signaling ..................................................................... 317

118 PCI Functional Blocks .............................................................................................................. 320

119 Data Access Paths ................................................................................................................... 321

120 PCI Arbiter Configuration Using CFG_PCI_ARB(GPIO[2]) ...................................................... 331

121 Example of Target Write to SRAM of 68 Bytes ........................................................................ 333

122 Example of Target Write to DRAM of 68 Bytes ........................................................................ 335

123 Example of Target Read from DRAM Using 64-Byte Burst ...................................................... 336

124 Generation of the Doorbell Interrupts to PCI ............................................................................ 337

125 Generation of the Doorbell Interrupts to the Intel XScale® Core.............................................. 338

126 PCI Interrupts ........................................................................................................................... 339

127 DMA Descriptor Reads............................................................................................................. 342

128 PCI Address Generation for Command Bus Master to PCI...................................................... 346

129 PCI Address Generation for Command Bus Master to PCI Configuration Cycle ..................... 347

130 Overall Clock Generation and Distribution ............................................................................... 360

131 IXP2800 Network Processor Clock Generation........................................................................ 363

132 Synchronization Between Frequency Domains........................................................................ 364

133 Reset Out Behavior .................................................................................................................. 365

134 Reset Generation ..................................................................................................................... 366

135 Boot Process ............................................................................................................................ 371

136 Performance Monitor Interface Block Diagram......................................................................... 376

137 Block Diagram of a Single CHAP Counter ............................................................................... 378

138 Basic Block Diagram of IXP2800 Network Processor with PMU .............................................. 379

139 CAP Interface to the APB ......................................................................................................... 380

140 Conceptual Diagram of Counter Array ..................................................................................... 382

18 Hardware Reference Manual

Contents

Tables

1 Data Terminology ....................................................................................................................... 26

2 Longword Formats......................................................................................................................26

3 IXP2800 Network Processor Microengine Bus Arrangement .....................................................35

4 Next Neighbor Write as a Function of CTX_ENABLE[NN_MODE] ............................................38

5 Registers Used By Contexts in Context-Relative Addressing Mode .......................................... 41

6 Align Value and Shift Amount .....................................................................................................43

7 Register Contents for Example 10.............................................................................................. 44

8 Register Contents for Example 11.............................................................................................. 45

9 RDRAM Sizes............................................................................................................................. 50

10 SRAM Controller Configurations................................................................................................. 52

11 Total Memory per Channel .........................................................................................................53

12 Address Reference Order........................................................................................................... 55

13 Q_array Entry Reference Order.................................................................................................. 55

14 Ring Full Signal Use – Number of Contexts and Length versus Ring Size ................................58

15 TBUF SPI-4 Control Definition....................................................................................................66

16 TBUF CSIX Control Definition ....................................................................................................67

17 DMA Descriptor Format ..............................................................................................................72

18 Doorbell Interrupt Registers........................................................................................................ 75

19 I/O Latency ................................................................................................................................. 78

20 Data Cache and Buffer Behavior when X = 0............................................................................. 83

21 Data Cache and Buffer Behavior when X = 1............................................................................. 83

22 Memory Operations that Impose a Fence ..................................................................................84

23 Valid MMU and Data/Mini-Data Cache Combinations................................................................85

24 Performance Monitoring Events ............................................................................................... 107

25 Some Common Uses of the PMU............................................................................................. 108

26 Branch Latency Penalty............................................................................................................ 112

27 Latency Example ...................................................................................................................... 114

28 Branch Instruction Timings (Predicted by the BTB)..................................................................115

29 Branch Instruction Timings (Not Predicted by the BTB) ........................................................... 115

30 Data Processing Instruction Timings ........................................................................................ 115

31 Multiply Instruction Timings ......................................................................................................116

32 Multiply Implicit Accumulate Instruction Timings ...................................................................... 117

33 Implicit Accumulator Access Instruction Timings......................................................................117

34 Saturated Data Processing Instruction Timings........................................................................ 117

35 Status Register Access Instruction Timings .............................................................................118

36 Load and Store Instruction Timings ..........................................................................................118

37 Load and Store Multiple Instruction Timings............................................................................. 118

38 Semaphore Instruction Timings ................................................................................................118

39 CP15 Register Access Instruction Timings............................................................................... 119

40 CP14 Register Access Instruction Timings............................................................................... 119

41 SWI Instruction Timings............................................................................................................ 119

42 Count Leading Zeros Instruction Timings ................................................................................. 119

43 Little-Endian Encoding.............................................................................................................. 120

44 Big-Endian Encoding ................................................................................................................120

45 Byte-Enable Generation by the Intel XScale

Big-Endian Systems .................................................................................................................121

46 Byte-Enable Generation by the Intel XScale

and Big-Endian Systems ..........................................................................................................123

Core for Byte Transfers in Little- and

Core for 16-Bit Data Transfers in Little-

Hardware Reference Manual 19

Contents

47 Byte-Enable Generation by the Intel XScale® Core for Byte Writes in Little- and

Big-Endian Systems ................................................................................................................. 123

48 Byte-Enable Generation by the Intel XScale

Core for Word Writes in Little- and

Big-Endian Systems ................................................................................................................. 124

49 CMB Write Command to CPP Command Conversion ............................................................. 127

50 IXP2800 Network Processor SRAM Q-Array Access Alias Addresses .................................... 129

51 GCSR Address Map (0xd700 0000)......................................................................................... 131

52 Data Transaction Alignment .....................................................................................................136

53 Address Spaces for XPI Internal Devices................................................................................. 136

54 8-Bit Flash Memory Device Density ......................................................................................... 143

55 SONET/SDH Devices ............................................................................................................... 143

56 Next Neighbor Write as a Function of CTX_Enable[NN_Mode] ............................................... 172

57 Registers Used by Contexts in Context-Relative Addressing Mode......................................... 173

58 Align Value and Shift Amount ................................................................................................... 174

59 Register Contents for Example 23............................................................................................ 175

60 Register Contents for Example 24............................................................................................ 176

61 RDRAM Loading....................................................................................................................... 188

62 RDRAM Sizes........................................................................................................................... 188

63 Address Rearrangement for 3-Way Interleave (Sheet 1 of 2) .................................................. 192

64 Address Rearrangement for 3-Way Interleave (Sheet 2 of 2) (Rev B) .................................... 193

65 Address Bank Interleaving........................................................................................................ 194

66 RDRAM Timing Parameter Settings ......................................................................................... 196

67 Ordering of Reads and Writes to the Same Address for DRAM............................................... 201

68 DRAM Push Arbiter Operation ................................................................................................. 203

69 DPLA Description ..................................................................................................................... 204

70 SRAM Controller Configurations .............................................................................................. 209

71 Total Memory per Channel ....................................................................................................... 210

72 Atomic Operations .................................................................................................................... 212

73 Queue Format .......................................................................................................................... 215

74 Ring/Journal Format ................................................................................................................. 216

75 Ring Size Encoding .................................................................................................................. 216

76 Address Map ............................................................................................................................ 218

77 Address Reference Order......................................................................................................... 219

78 Q_array Entry Reference Order ............................................................................................... 220

79 Ring Full Signal Use – Number of Contexts and Length versus Ring Size .............................. 232

80 Head/Tail, Base, and Full Threshold – by Ring Size ................................................................ 233

81 Intel XScale® Core and Microengine Instructions .................................................................... 235

82 S_Transfer Registers Hash Operands ..................................................................................... 237

83 SPI-4 Control Word Format ......................................................................................................244

84 Order of Bytes within the SPI-4 Data Burst .............................................................................. 245

85 CFrame Types.......................................................................................................................... 246

86 Receive Pins Usage by Protocol .............................................................................................. 248

87 Order in which Received Data Is Stored in RBUF .................................................................... 248

88 Mapping of Received Data to RBUF Partitions ........................................................................ 249

89 Number of Elements per RBUF Partition.................................................................................. 249

90 RBUF SPIF-4 Status Definition ................................................................................................ 252

91 RBUF CSIX Status Definition ...................................................................................................254

92 Rx_Thread_Freelist Use........................................................................................................... 255

93 Summary of SPI-4 and CSIX RBUF Operations ...................................................................... 258

94 Transmit Pins Usage by Protocol ............................................................................................. 262

20 Hardware Reference Manual

Contents

95 Order in which Data is Transmitted from TBUF ........................................................................ 263

96 Mapping of TBUF Partitions to Transmit Protocol ....................................................................263

97 Number of Elements per TBUF Partition .................................................................................. 264

98 TBUF SPI-4 Control Definition..................................................................................................266

99 TBUF CSIX Control Definition ..................................................................................................267

100 Transmit SPI-4 Control Word....................................................................................................268

101 Transmit CSIX Header..............................................................................................................269

102 Summary of RBUF and TBUF Operations................................................................................ 274

103 SRB Definition by Clock Phase Number...................................................................................276

104 Data Deskew Functions............................................................................................................281

105 Calendar Deskew Functions.....................................................................................................281

106 Flow Control Deskew Functions ............................................................................................... 281

107 Data Training Sequence ........................................................................................................... 282

108 Flow Control Training Sequence ..............................................................................................282

109 Calendar Training Sequence ....................................................................................................283

110 IXP2800 Network Processor Requires Data Training...............................................................284

111 Switch Fabric or SPI-4 Framer Requires Data Training ...........................................................285

112 IXP2800 Network Processor Requires Flow Control Training ..................................................286

113 Switch Fabric Requires Flow Control Training..........................................................................286

114 SPI-4.2 Transmitter State Machine Transitions on 16-Bit Bus Transfers .................................314

115 Training Transmitter State Machine Transitions on 16-Bit Bus Transfers ................................315

116 CSIX-L1 Transmitter State Machine Transitions on CWord Boundaries ..................................315

117 PCI Block FIFO Sizes ...............................................................................................................322

118 Maximum Loading .................................................................................................................... 322

119 PCI Commands ........................................................................................................................ 322

120 PCI BAR Programmable Sizes ................................................................................................. 324

121 PCI BAR Sizes with PCI Host Initialization ...............................................................................324

122 Legal Combinations of the Strap Pin Options...........................................................................330

123 Slave Interface Buffer Sizes ..................................................................................................... 332

124 Doorbell Interrupt Registers...................................................................................................... 337

125 IRQ Interrupt Options by Stepping............................................................................................339

126 DMA Descriptor Format ............................................................................................................ 342

127 PCI Maximum Burst Size..........................................................................................................345

128 Command Bus Master Configuration Transactions ..................................................................347

129 Command Bus Master Address Space Map to PCI.................................................................. 347

130 Byte Lane Alignment for 64-Bit PCI Data In (64 Bits PCI Little-Endian to Big-Endian

with Swap) ................................................................................................................................ 353

131 Byte Lane Alignment for 64-Bit PCI Data In (64 Bits PCI Big-Endian to Big-Endian

without Swap) ........................................................................................................................... 353

132 Byte Lane Alignment for 32-Bit PCI Data In (32 Bits PCI Little-Endian to Big-Endian

with Swap) ................................................................................................................................ 353

133 Byte Lane Alignment for 32-Bit PCI Data In (32 Bits PCI Big-Endian to Big-Endian

without Swap) ........................................................................................................................... 353

134 Byte Lane Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Little

Endian with Swap) .................................................................................................................... 354

135 Byte Lane Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Big-Endian

without Swap) ........................................................................................................................... 354

136 Byte Lane Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Little

Endian with Swap) .................................................................................................................... 354

137 Byte Lane Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Big-Endian

without Swap) ........................................................................................................................... 354

Hardware Reference Manual 21

Contents

138 Byte Enable Alignment for 64-Bit PCI Data In (64 Bits PCI Little-Endian to Big-

Endian with Swap).................................................................................................................... 355

139 Byte Enable Alignment for 64-Bit PCI Data In (64 Bits PCI Big-Endian to Big-Endian

without Swap) ........................................................................................................................... 355

140 Byte Enable Alignment for 32-Bit PCI Data In (32 bits PCI Little-Endian to Big-

Endian with Swap).................................................................................................................... 355

141 Byte Enable Alignment for 32-Bit PCI Data In (32 Bits PCI Big-Endian to Big-Endian

without Swap) ........................................................................................................................... 356

142 Byte Enable Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Little

Endian with Swap).................................................................................................................... 356

143 Byte Enable Alignment for 64-Bit PCI Data Out (Big-Endian to 64 Bits PCI Big

Endian without Swap)............................................................................................................... 356

144 Byte Enable Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Little

Endian with Swap).................................................................................................................... 356

145 Byte Enable Alignment for 32-Bit PCI Data Out (Big-Endian to 32 Bits PCI Big

Endian without Swap)............................................................................................................... 357

146 PCI I/O Cycles with Data Swap Enable .................................................................................... 358

147 Clock Usage Summary............................................................................................................. 360

148 Clock Rates Examples ............................................................................................................. 362

149 IXP2800 Network Processor Strap Pins................................................................................... 369

150 Supported Strap Combinations ................................................................................................ 370

151 APB Usage ............................................................................................................................... 381

152 Hardware Blocks and Their Performance Measurement Events.............................................. 383

153 PMU Design Unit Selection ...................................................................................................... 386

154 Chap Counter Threshold Events (Design Block # 0001).......................................................... 388

155 XPI PMU Event List .................................................................................................................. 389

156 SHaC PMU Event List .............................................................................................................. 393

157 IXP2800 Network Processor MSF PMU Event List .................................................................. 396

158 Intel XScale® Core Gasket PMU Event List............................................................................. 402

159 PCI PMU Event List.................................................................................................................. 405

160 ME00 PMU Event List .............................................................................................................. 409

161 ME01 PMU Event List .............................................................................................................. 410

162 ME02 PMU Event List .............................................................................................................. 411

163 ME03 PMU Event List .............................................................................................................. 411

164 ME04 PMU Event List .............................................................................................................. 412

165 ME05 PMU Event List .............................................................................................................. 412

166 ME06 PMU Event List .............................................................................................................. 413

167 ME07 PMU Event List .............................................................................................................. 413

168 ME10 PMU Event List .............................................................................................................. 414

169 ME11 PMU Event List .............................................................................................................. 414

170 ME12 PMU Event List .............................................................................................................. 415

171 ME13 PMU Event List .............................................................................................................. 415

172 ME14 PMU Event List .............................................................................................................. 416

173 ME15 PMU Event List .............................................................................................................. 416

174 ME16 PMU Event List .............................................................................................................. 417

175 ME17 PMU Event List .............................................................................................................. 417

176 SRAM DP1 PMU Event List ..................................................................................................... 418

177 SRAM DP0 PMU Event List ..................................................................................................... 418

178 SRAM CH3 PMU Event List .....................................................................................................420

179 SRAM CH3 PMU Event List .....................................................................................................421

180 SRAM CH3 PMU Event List .....................................................................................................421

22 Hardware Reference Manual

Contents

181 SRAM CH0 PMU Event List .....................................................................................................422

182 IXP2800 Network Processor Dram DPLA PMU Event List....................................................... 423

183 IXP2800 Network Processor Dram DPSA PMU Event List ......................................................424

184 IXP2800 Network Processor Dram CH2 PMU Event List.........................................................425

185 IXP2800 Network Processor Dram CH1 PMU Event List.........................................................429

186 IXP2800 Network Processor Dram CH0 PMU Event List.........................................................429

Hardware Reference Manual 23

Contents

24 Hardware Reference Manual

Intel® IXP2800 Network Processor

Introduction

Introduction 1

1.1 About This Document

This document is the hardware reference manual for the Intel® IXP2800 Network Processor. This information is intended for use by developers and is organized as follows:

Section 2, “Technical Description” contains a hardware overview.

Section 3, “Intel XScale

Section 4, “Microengines” describes Microengine operation.

Section 5, “DRAM” describes the DRAM Unit.

Section 6, “SRAM Interface” describes the SRAM Unit.

Section 7, “SHaC — Unit Expansion” describes the Scratchpad, Hash Unit, and CSRs (SHaC).

Core” describes the embedded core.

Section 8, “Media and Switch Fabric Interface” describes the Media and Switch Fabric (MSF)

Interface used to connect the network processor to a physical layer device.

Section 9, “PCI Unit” describes the PCI Unit.

Section 10, “Clocks and Reset” describes the clocks, reset and initialization sequence.

Section 11, “Performance Monitor Unit” describes the PMU.

1.2 Related Documentation

Further information on the IXP2800 is available in the following documents:

IXP2800 Network Processor Datasheet – Contains summary information on the IXP2800 Network Processor including a functional description, signal descriptions, electrical specifications, and mechanical specifications.

IXP2400 and IXP2800 Network Processor Programmer’s Reference Manual – Contains detailed programming information for designers.

IXP2400/IXP2800 Network Processor Development Tools User’s Guide – Describes the Developer Workbench and the development tools you can access through the use of the Workbench GUI.

Hardware Reference Manual 25

Introduction

1.3 Terminology

Table 1 and Table 2 list the terminology used in this manual.

Table 1. Data Terminology

Term Words Bytes Bits

Byte ½ 1 8

Word 1 2 16

Longword 2 4 32

Quadword 4 8 64

Table 2. Longword Formats

Endian Type 32-Bit 64-Bit

Little-Endian (0x12345678) arranged as {12 34 56 78}

Big-Endian (0x12345678) arranged as {78 56 34 12}

Intel® IXP2800 Network Processor

64-bit data 0x12345678 9ABCDE56 arranged as {12 34 56 78 9A BC DE 56}

64-bit data 0x12345678 9ABCDE56 arranged as {78 56 34 12, 56 DE BC 9A}

26 Hardware Reference Manual

Intel® IXP2800 Network Processor

Technical Description

Technical Description 2

2.1 Overview

This section provides a brief overview of the IXP2800 Network Processor internal hardware. This section is intended as an overall hardware introduction to the network processor.

The major blocks are:

• Intel XScale

compliant) used to initialize and manage the network processor, and can be used for higher layer network processing tasks.

• Intel XScale

Purpose I/O (GPIO) and interface to low-speed off chip peripherals (such as maintenance port of network devices) and Flash ROM.

• Microengines (MEs) — Sixteen 32-bit programmable engines specialized for Network

Processing. Microengines do the main data plane processing per packet.

core — General purpose 32-bit RISC processor (ARM* Version 5 Architecture

technology Peripherals (XPI) — Interrupt Controller, Timers, UART, General

• DRAM Controllers — Three independent controllers for Rambus* DRAM. Typically DRAM

is used for data buffer storage.

• SRAM Controllers — Four independent controllers for QDR SRAM. Typically SRAM is used

for control information storage.

• Scratchpad Memory — 16 Kbytes storage for general purpose use.

• Hash Unit — Polynomial hash accelerator. The Intel XScale

it to offload hash calculations.

core and Microengines can use

• Control and Status Register Access Proxy (CAP) — These provide special inter-processor

communication features to allow flexible and efficient inter-Microengine and Microengine to Intel XScale

core communication.

• Media and Switch Fabric Interface (MSF) — Interface for network framers and/or Switch

Fabric. Contains receive and transmit buffers.

• PCI Controller — PCI Local Bus Specification, Version 2.2* interface for 64-bit 66-MHz I/O. PCI can

be used to either connect to a Host processor, or to attach PCI-compliant peripheral devices.

• Performance Monitor — Counters that can be programmed to count selected internal chip

hardware events, which can be used to analyze and tune performance.

Figure 1 is a simple block diagram of the network processor showing the major internal hardware

blocks. Figure 2 is a detailed diagram of the network processor units and buses.

Hardware Reference Manual 27

Intel® IXP2800 Network Processor

Technical Description

Figure 1. IXP2800 Network Processor Functional Block Diagram

Media Switch

Fabric (MSF)

Hash

Unit

Scratched

Memory

PCI

Controller

SRAM

Controller

CAP

SRAM

Controller

ME 0x1

ME 0x2

ME 0x5

ME 0x6

ME Cluster 0

SRAM

Controller

ME 0x0

ME 0x3

ME 0x4

ME 0x7

SRAM

Controller

0x10

0x13

0x14

0x17

ME Cluster 1

DRAM

Controller

0x11

0x12

0x15

0x16

DRAM

Controller

Intel

XScale

Core

Peripherals

(XPI)

Performance

Monitor

DRAM

Controller

Intel

XScale

Core

A9226-02

28 Hardware Reference Manual

Figure 2. IXP2800 Network Processor Detailed Diagram

Intel® IXP2800 Network Processor

Technical Description

Device

SP14/CSIX

PCI

Device

SHaC Unit

Scratch Hash CAP

Media

Controller

RBuf TBuf CSR

mast/targ

CSRs PCI

space DMA

transfers

PCI

Controller

SRAM

Controller

S_Pull

Arb 1

SRAM

Controller

S_Push

Arb 1

Cmd_Arb_1 (grant/request)

Controller

S_Push

Arb 0

S_Pull_1

S_Push_1

S in xfer

CSRs

SRAM

S_Pull

Arb 0

S out

xfer

ME0x10-0x17

D in xfer

Cluster 1

D out

D_Push

D_Pull

Cmd_1

Cmd_0

S_Pull_0

S_Push_0

xfer

DRAM

Controller

Cmd

FIFO

CSRs

DRAM

Controller

S in

S out

xfer

ME0x0-0x7

Cluster 0

DRAM

Controller

Arbiter

D_Push_Bus

D out

D in

xfer

Cmd_Arb_0 (grant/request)

Arbiter

D_Pull_Bus

Gasket

Intel

XScale

Core

Command Bus

Arbiter 1

Command Bus

Arbiter 0

Notes:

Connected to the S_Push/Pull Buses Connected to the S_Push/Pull Buses and D_Push/Pull Buses

= Chassis Components

A9750-03

Hardware Reference Manual 29

Intel® IXP2800 Network Processor

Technical Description

2.2 Intel XScale® Core Microarchitecture

The Intel XScale® microarchitecture consists of a 32-bit general purpose RISC processor that incorporates an extensive list of architecture features that allows it to achieve high performance.

2.2.1 ARM* Compatibility

The Intel XScale® microarchitecture is ARM* Version 5 (V5) Architecture compliant. It implements the integer instruction set of ARM* V5, but does not provide hardware support of the floating point instructions.

The Intel XScale ARM V5E DSP extensions.

Backward compatibility with the first generation of StrongARM* products is maintained for usermode applications. Operating systems may require modifications to match the specific hardware features of the Intel XScale enhancements added to the Intel XScale

microarchitecture provides the Thumb instruction set (ARM V5T) and the

microarchitecture and to take advantage of the performance

core.

2.2.2 Features

2.2.2.1 Multiply/Accumulate (MAC)

The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a throughput of a MAC operation every cycle. Several architectural enhancements were made to the MAC to support audio coding algorithms, which include a 40-bit accumulator and support for 16-bit packed values.

2.2.2.2 Memory Management

The Intel XScale® microarchitecture implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. The MMU provides access protection and virtual to physical address translation.

The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:

• identifying code as cacheable or non-cacheable

• selecting between the mini-data cache or data cache

• write-back or write-through data caching

• enabling data write allocation policy

• and enabling the write buffer to coalesce stores to external memory

2.2.2.3 Instruction Cache

The Intel XScale® microarchitecture implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided.

30 Hardware Reference Manual

2.2.2.4 Branch Target Buffer

The Intel XScale® microarchitecture provides a Branch Target Buffer (BTB) to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.

The BTB holds 128 entries.

2.2.2.5 Data Cache

The Intel XScale® microarchitecture implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache. Each cache has a line size of 32 bytes, and supports write-through or write-back caching.

The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15.

Intel® IXP2800 Network Processor

Technical Description

The Intel XScale as data RAM. Software may place special tables or frequently used variables in this RAM.

microarchitecture allows applications to reconfigure a portion of the data cache

2.2.2.6 Interrupt Controller

The Intel XScale® microarchitecture provides two levels of interrupt, IRQ and FIQ. They can be masked via coprocessor 13. Note that there is also a memory-mapped interrupt controller described with the Intel XScale many chip-wide interrupt sources.

technology peripherals (see Section 3.12), which is used to mask and steer

Hardware Reference Manual 31

Intel® IXP2800 Network Processor

Technical Description

2.2.2.7 Address Map

Figure 3 shows the partitioning of the Intel XScale® core microarchitecture 4-Gbyte address space.

Figure 3. Intel XScale

0XFFFF FFF

0XE000 0000

0XDFFF FFF

0XC000 0000

0XBFFF FFF

0x8000 0000

0X7FFF FFFF

Core 4-GB (32-Bit) Address Space

PCI MEM

(1/2 Gb)

Other

(1/2 Gb)

SRAM (1 Gb)

3.5 Gb

3.0 Gb

PCI Local CSRs

PCI Config Regs

PCI Spec/IACK

PCI CFG (32 Mb)

PCI I/O (32 Mb)

Intel XScale® Core CSR

RESERVED

(32 Mb x 2)

DRAM CSR (32 Mb) SRAM Ring (32 Mb)

SRAM CSR & Queue

Scratch (32 Mb)

MSF (32 Mb) FLASH ROM

(64 Mb)

RESERVED

CAP-CSRs (32 Mb)

0XDF00 0000 0XDE00 0000

0XDC00 0000 0XDA00 0000 0XD800 0000 0XD600 0000

0XD000 0000 0XCE00 0000 0XCC00 0000 0XCA00 0000 0XC800 0000

0XC400 0000 0XC200 0000 0XC000 0000

DRAM

and

Intel

XScale

Core

FLASH ROM

(2 Gb)

0X0000 0000

A9693-02

32 Hardware Reference Manual

2.3 Microengines

The Microengines do most of the programmable pre-packet processing in the IXP2800 Network Processor. There are 16 Microengines, connected as shown in Figure 1. The Microengines have access to all shared resources (SRAM, DRAM, MSF, etc.) as well as private connections between adjacent Microengines (referred to as “next neighbors”).

The block diagram in Figure 4 is used in the Microengine description. Note that this block diagram is simplified for clarity; some blocks and connectivity have been omitted to make the diagram more readable. Also, this block diagram does not show any pipeline stages, rather it shows the logical flow of information.

Microengines provide support for software-controlled multi-threaded operation. Given the disparity in processor cycle times versus external memory times, a single thread of execution often blocks, waiting for external memory operations to complete. Multiple threads allow for threadinterleave operation, as there is often at least one thread ready to run while others are blocked.

Intel® IXP2800 Network Processor

Technical Description

Hardware Reference Manual 33

Intel® IXP2800 Network Processor

Technical Description

Figure 4. Microengine Block Diagram

NNData_In (from previous ME)

640

Local

Mem

Lm_addr_1 Lm_addr_0

CRC_Remainder

CRC Unit

S_Push

Local CSRs

(from DRAM)

d e c o d e

128

GPRs

(A Bank)

A_Operand

128

GPRs

(B Bank)

128

Neighbor

B_Operand

Execution

Datapath

(Shift, Add, Subtract, Multiply Logicals,

Find First Bit, CAM)

ALU_Out

128

XFER

Out

128

XFER

Out

D_Push

Command

CMD FIFO

(4)

128

XFER

T_Index NN_Get

128

XFER

S_Push (from SRAM Scratchpad, MSF, Hash, PCI, CAP)

Control

Store

A_Src B_Src

Immed

Dest

NN_Data_Out (to next ME)

Control Data

S_PullD_Pull

B1670-01

34 Hardware Reference Manual

Intel® IXP2800 Network Processor

2.3.1 Microengine Bus Arrangement

The IXP2800 Network Processor supports a single D_Push/D_Pull bus, and both Microengine clusters interface to the same bus. Also, it supports two command buses, and two sets of S_Push/S_Pull buses connected as shown in Table 3, which also shows the next neighbor relationship between the Microengine.

Table 3. IXP2800 Network Processor Microengine Bus Arrangement

Technical Description

Microengine

Cluster

2.3.2 Control Store

Microengine

Number

0x00 0x01 NA

0x01 0x02 0x00

0x02 0x03 0x01

0x03 0x04 0x02

0x04 0x05 0x03

0x05 0x06 0x04

0x06 0x07 0x05

0x07 0x10 0x06

0x10 0x11 0x07

0x11 0x12 0x10

0x12 0x13 0x11

0x13 0x14 0x12

0x14 0x15 0x13

0x15 0x16 0x14

0x16 0x17 0x15

0x17 NA 0x16

Neighbor

Command

Bus

0 0

S_Push and

S_Pull Bus

The Control Store is a RAM that holds the program that is executed by the Microengine. It holds 8192 instructions, each of which is 40 bits wide. It is initialized by the Intel XScale

core, which

writes to USTORE_ADDR and USTORE_DATA Local CSRs.

The Control Store is protected by parity against soft errors. Parity checking is enabled by CTX_ENABLE[CONTROL STORE PARITY ENABLE]. A parity error on an instruction read will halt the Microengine and assert an interrupt to the Intel XScale

core.

2.3.3 Contexts

There are eight hardware Contexts available in the Microengine. To allow for efficient context swapping, each Context has its own register set, Program Counter, and Context specific Local registers. Having a copy per Context eliminates the need to move Context specific information to/ from shared memory and Microengine registers for each Context swap. Fast context swapping allows a Context to do computation while other Contexts wait for I/O (typically external memory accesses) to complete or for a signal from another Context or hardware unit. (A context swap is similar to a taken branch in timing.)

Hardware Reference Manual 35

Intel® IXP2800 Network Processor

Technical Description

Each of the eight Contexts is in one of four states.

1. Inactive — Some applications may not require all eight contexts. A Context is in the Inactive state when its CTX_ENABLE CSR enable bit is a 0.

2. Executing — A Context is in Executing state when its context number is in ACTIVE_CTX_STS CSR. The executing Context’s PC is used to fetch instructions from the Control Store. A Context will stay in this state until it executes an instruction that causes it to go to Sleep state (there is no hardware interrupt or preemption; Context swapping is completely under software control). At most one Context can be in Executing state at any time.

3. Ready — In this state, a Context is ready to execute, but is not because a different Context is executing. When the Executing Context goes to the Sleep state, the Microengine’s context arbiter selects the next Context to go to the Executing state from among all the Contexts in the Ready state. The arbitration is round robin.

4. Sleep — Context is waiting for external event(s) specified in the INDIRECT_WAKEUP_EVENTS CSR to occur (typically, but not limited to, an I/O access). In this state the Context does not arbitrate to enter the Executing state.

The state diagram in Figure 5 illustrates the Context state transitions. Each of the eight Contexts will be in one of these states. At most one Context can be in Executing state at a time; any number of Contexts can be in any of the other states.

Figure 5. Context State Transition Diagram

Reset

CTX_ENABLE

Note:

After reset, the Intel XScale CTX_WAKEUP_EVENTS to 0x1 (voluntary), and then set the appropriate CTX_ENABLE bits to begin executing Context(s).

Inactive Ready

bit is cleared

Sleep Executing

Core processor must load the starting address of the CTX_PC, load the

The Microengine is in Idle state whenever no Context is running (all Contexts are in either Inactive or Sleep states). This state is entered:

CTX_ENABLE bit is set by

Intel XScale

CTX_ENABLE bit is cleared

Context executes

CTX Arbitration instruction

Core

Executing Context goes to Sleep state, and this Context is the highest round-robin priority.

A9352-03

1. After reset (CTX_ENABLE Local CSR is clear, putting all Contexts into Inactive states).

2. When a context swap is executed, but no context is ready to wake up.

3. When a condition 2 above, since the

ctx_arb[bpt] instruction is executed by the Microengine (this is a special case of

ctx_arb[bpt] clears CTX_ENABLE, putting all Contexts into

Inactive states).

36 Hardware Reference Manual

The Microengine provides the following functionality during the Idle state:

1. The Microengine continuously checks if a Context is in Ready state. If so, a new Context begins to execute. If no Context is Ready, the Microengine remains in the Idle state.

2. Only the ALU instructions are supported. They are used for debug via special hardware defined in number 3 below.

3. A write to the USTORE_ADDR Local CSR with the USTORE_ADDR[ECS] bit set, causing the Microengine to repeatedly execute the instruction pointed by the address specified in the USTORE_ADDR CSR. Only the ALU instructions are supported in this mode. Also, the result of the execution is written to the ALU_OUT Local CSR rather than a destination register.

4. A write to the USTORE_ADDR Local CSR with the USTORE_ADDR[ECS] bit set, followed by a write to the USTORE_DATA Local CSR loads an instruction into the Control Store. After the Control Store is loaded, execution proceeds as described in number 3 above.

2.3.4 Datapath Registers

As shown in the block diagram in Figure 4, each Microengine contains four types of 32-bit datapath registers:

1. 256 General Purpose registers

Intel® IXP2800 Network Processor

Technical Description

2. 512 Transfer registers

3. 128 Next Neighbor registers

4. 640 32-bit words of Local Memory

2.3.4.1 General-Purpose Registers (GPRs)

GPRs are used for general programming purposes. They are read and written exclusively under program control. GPRs, when used as a source in an instruction, supply operands to the execution datapath. When used as a destination in an instruction, they are written with the result of the execution datapath. The specific GPRs selected are encoded in the instruction.

The GPRs are physically and logically contained in two banks, GPR A, and GPR B, defined in

Table 5.

2.3.4.2 Transfer Registers

Transfer (abbreviated as Xfer) registers are used for transferring data to and from the Microengine and locations external to the Microengine, (for example DRAMs, SRAMs etc.). There are four types of transfer registers.

• S_TRANSFER_IN

• S_TRANSFER_OUT

• D_TRANSFER_IN

• D_TRANSFER_OUT

TRANSFER_IN registers, when used as a source in an instruction, supply operands to the execution datapath. The specific register selected is either encoded in the instruction, or selected indirectly via T_INDEX. TRANSFER_IN registers are written by external units (A typical case is when the external unit returns data in response to read instructions. However, there are other

Hardware Reference Manual 37

Intel® IXP2800 Network Processor

Technical Description

methods to write TRANSFER_IN registers, for example a read instruction executed by one Microengine may cause the data to be returned to a different Microengine. Details are covered in the instruction set descriptions).

TRANSFER_OUT registers, when used as a destination in an instruction, are written with the result from the execution datapath. The specific register selected is encoded in the instruction, or selected indirectly via T_INDEX. TRANSFER_OUT registers supply data to external units (for example, write data for an SRAM write).

The S_TRANSFER_IN and S_TRANSFER_OUT registers connect to the S_PUSH and S_PULL buses, respectively.

The D_TRANSFER_IN and D_TRANSFER_OUT Transfer registers connect to the D_PUSH and D_PULL buses, respectively.

Typically, the external units access the Transfer registers in response to instructions executed by the Microengines. However, it is possible for an external unit to access a given Microengine’s Transfer registers either autonomously, or under control of a different Microengine, or the Intel XScale core, etc. The Microengine interface signals controlling writing/reading of the TRANSFER_IN and TRANSFER_OUT registers are independent of the operation of the rest of the Microengine, therefore the data movement does not stall or impact other instruction processing (it is the responsibility of software to synchronize usage of read data).

2.3.4.3 Next Neighbor Registers

Next Neighbor registers, when used as a source in an instruction, supply operands to the execution datapath. They are written in two different ways:

1. By an adjacent Microengine (the “Previous Neighbor”).

2. By the same Microengine they are in, as controlled by CTX_ENABLE[NN_MODE].

The specific register is selected in one of two ways:

1. Context-relative, the register number is encoded in the instruction.

2. As a Ring, selected via NN_GET and NN_PUT CSR registers.

The usage is configured in CTX_ENABLE[NN_MODE].

• When CTX_ENABLE[NN_MODE] is ‘0’ — when Next Neighbor is a destination in an

instruction, the result is sent out of the Microengine, to the Next Neighbor Microengine.

• When CTX_ENABLE[NN_MODE] is ‘1’ — when Next Neighbor is used as a destination in

an instruction, the instruction result data is written to the selected Next Neighbor register in the same Microengine. Note that there is a 5-instruction latency until the newly written data may be read. The data is not sent out of the Microengine as it would be when CTX_ENABLE[NN_MODE] is ‘0’.

Table 4. Next Neighbor Write as a Function of CTX_ENABLE[NN_MODE]

Where the Write Goes

NN_MODE

External?

0Yes No

1No Yes

NN Register in this

Microengine?

38 Hardware Reference Manual

Intel® IXP2800 Network Processor

2.3.4.4 Local Memory

Local Memory is addressable storage within the Microengine. Local Memory is read and written exclusively under program control. Local Memory supplies operands to the execution datapath as a source, and receives results as a destination. The specific Local Memory location selected is based on the value in one of the LM_ADDR registers, which are written by local_csr_wr instructions. There are two LM_ADDR registers per Context and a working copy of each. When a Context goes to the Sleep state, the value of the working copies is put into the Context’s copy of LM_ADDR. When the Context goes to the Executing state, the value in its copy of LM_ADDR are put into the working copies. The choice of LM_ADDR_0 or LM_ADDR_1 is selected in the instruction.

It is also possible to make use of both or one LM_ADDRs as global by setting CTX_ENABLE[LM_ADDR_0_GLOBAL] and/or CTX_ENABLE[LM_ADDR_1_GLOBAL]. When used globally, all Contexts use the working copy of LM_ADDR in place of their own Context specific one; the Context specific ones are unused. There is a three-instruction latency when writing a new value to the LM_ADDR, as shown in Example 1.

Example 1. Three-Cycle Latency when Writing a New Value to LM_ADDR

;some instruction to compute the address into gpr_m local_csr_wr[INDIRECT_LM_ADDR_0, gpr_m]; put gpr_m into lm_addr ;unrelated instruction 1 ;unrelated instruction 2 ;unrelated instruction 3 alu[dest_reg, *l$index0, op, src_reg] ;dest_reg can be used as a source in next instruction

Technical Description

LM_ADDR can also be incremented or decremented in parallel with use as a source and/or destination (using the notation *l$index#++ and *l$index#--), as shown in Example 2, where three consecutive Local Memory locations are used in three consecutive instructions.

Example 2. Using LM_ADDR in Consecutive Instructions

alu[dest_reg1, src_reg1, op, *l$index0++] alu[dest_reg2, src_reg2, op, *l$index0++] alu[dest_reg3, src_reg3, op, *l$index0++]

Local Memory is written by selecting it as a destination. Example 3 shows copying a section of Local Memory to another section. Each instruction accesses the next sequential Local Memory location from the previous instruction.

Example 3. Copying One Section of Local Memory to Another Section

alu[*l$index1++, --, B, *l$index0++] alu[*l$index1++, --, B, *l$index0++] alu[*l$index1++, --, B, *l$index0++]

Example 4 shows loading and using both Local Memory addresses.

Example 4. Loading and Using Both Local Memory Addresses

local_csr_wr[INDIRECT_LM_ADDR_0, gpr_m] local_csr_wr[INDIRECT_LM_ADDR_1, gpr_n] ;unrelated instruction 1 ;unrelated instruction 2 alu[dest_reg1, *l$index0, op, src_reg1] alu[dest_reg2, *l$index1, op, src_reg2]

Hardware Reference Manual 39

Intel® IXP2800 Network Processor

Technical Description

As shown in Example 1, there is a latency in loading LM_ADDR. Until the new value is loaded, the old value is still usable. Example 5 shows the maximum pipelined usage of LM_ADDR.

Example 5. Maximum Pipelined Usage of LM_ADDR

local_csr_wr[INDIRECT_LM_ADDR_0, gpr_m] local_csr_wr[INDIRECT_LM_ADDR_0, gpr_n] local_csr_wr[INDIRECT_LM_ADDR_0, gpr_o] local_csr_wr[INDIRECT_LM_ADDR_0, gpr_p] alu[dest_reg1, *l$index0, op, src_reg1] ; uses address from gpr_m alu[dest_reg2, *l$index0, op, src_reg2] ; uses address from gpr_n alu[dest_reg3, *l$index0, op, src_reg3] ; uses address from gpr_o alu[dest_reg4, *l$index0, op, src_reg4] ; uses address from gpr_p

LM_ADDR can also be used as the base of a 16 32-bit word region of memory, with the instruction specifying the offset from that base, as shown in Example 6. The source and destination can use different offsets.

Example 6. LM_ADDR Used as Base of a 16 32-Bit Word Region of Local Memory

alu[*l$index0[3], *l$index0[4], +, 1]

Note: Local Memory has 640 32-bit words. The local memory pointers (LM_ADDR) have an addressing

range of up to 1K longwords. However, only 640 longwords are currently populated with RAM. Therefore:

0 – 639 (0x0 – 0x27F) are addressable as local memory.

640 – 1023 (0x280 – 0x3FF) are addressable, but not populated with RAM.

To the programmer, all instructions using Local Memory act as follows, including read/modify/write instructions like immed_w0, ld_field, etc.

1. Read LM_ADDR location (if LM_ADDR is specified as source).

2. Execute logic function.

3. Write LM_ADDR location (if LM_ADDR is specified as destination).

4. If specified, increment or decrement LM_ADDR.

5. Proceed to next instruction.

Example 7 is legal because

lm_addr_0[2] does not post-modify LM_ADDR.

Example 7. LM_ADDR Use as Source and Destination

alu[*l$index0[2], --, ~B, *l$index0]

In Example 7, the programmer sees:

1. Read Local Memory memory location pointed to by LM_ADDR.

2. Invert the data.

3. Write the data into the address pointed to by LM_ADDR with the value of 2 that is OR’ed into the lower bits.

4. Increment LM_ADDR.

5. Proceed to next instruction.

40 Hardware Reference Manual

In Example 8, the second instruction will access the Local Memory location one past the source/ destination of the first.

Example 8. LM_ADDR Post-Increment

alu[*l$index0++, --, ~B, gpr_n] alu[gpr_m, --, ~B, *l$index0]

2.3.5 Addressing Modes

GPRs can be accessed in either a context-relative or an absolute addressing mode. Some instructions can specify either mode; other instructions can specify only Context-Relative mode.

Transfer and Next Neighbor registers can be accessed in Context-Relative and Indexed modes, and Local Memory is accessed in Indexed mode. The addressing mode in use is encoded directly into each instruction, for each source and destination specifier.

2.3.5.1 Context-Relative Addressing Mode

The GPRs are logically subdivided into equal regions such that each Context has relative access to one of the regions. The number of regions is configured in the CTX_ENABLE CSR, and can be either 4 or 8. Thus a Context-Relative register number is actually associated with multiple different physical registers. The actual register to be accessed is determined by the Context making the access request (the Context number is concatenated with the register number specified in the instruction). Context-Relative addressing is a powerful feature that enables eight (or four) different contexts to share the same code image, yet maintain separate data.

Intel® IXP2800 Network Processor

Technical Description

Table 5 shows how the Context number is used in selecting the register number in relative mode.

The register number in Table 5 is the Absolute GPR address, or Transfer or Next Neighbor Index number to use to access the specific Context-Relative register. For example, with eight active Contexts, Context-Relative Register 0 for Context 2 is Absolute Register Number 32.

Table 5. Registers Used By Contexts in Context-Relative Addressing Mode

Number of

Active

Contexts

(Instruction

always specifies

registers in

range 0 – 15)

(Instruction

always specifies

registers in

range 0 – 31)

Active Context Number

0 0 – 15 0 – 15 0 – 15 0 – 15

1 16 – 31 16 – 31 16 – 31 16 – 31

2 32 – 47 32 – 47 32 – 47 32 – 47

3 48 – 63 48 – 63 48 – 63 48 – 63

4 64 – 79 64 – 79 64 – 79 64 – 79

5 80 – 95 80 – 95 80 – 95 80 – 95

6 96 – 111 96 – 111 96 – 111 96 – 111

7 112 – 127 112 – 127 112 – 127 112 – 127

0 0 – 31 0 – 31 0 – 31 0 – 31

2 32 – 63 32 – 63 32 – 63 32 – 63

4 64 – 95 64 – 95 64 – 95 64 – 95

6 96 – 127 96 – 127 96 – 127 96 – 127

Absolute Register Numbers

A Port B Port

GPR

S_Transfer or

Neighbor

Index Number

D_Transfer

Index Number

Hardware Reference Manual 41

Intel® IXP2800 Network Processor

Technical Description

2.3.5.2 Absolute Addressing Mode

With Absolute addressing, any GPR can be read or written by any of the eight Contexts in a Microengine. Absolute addressing enables register data to be shared among all of the Contexts, e.g., for global variables or for parameter passing. All 256 GPRs can be read by Absolute address.

2.3.5.3 Indexed Addressing Mode

With Indexed addressing, any Transfer or Next Neighbor register can be read or written by any one of the eight Contexts in a Microengine. Indexed addressing enables register data to be shared among all of the Contexts. For indexed addressing the register number comes from the T_INDEX register for Transfer registers or NN_PUT and NN_GET registers (for Next Neighbor registers).

Example 9 shows the Index Mode usage. Assume that the numbered bytes have been moved into

the S_TRANSFER_IN registers as shown.

Example 9. Use of Indexed Addressing Mode

Transfer Register

Number

0 0x00 0x01 0x02 0x03

1 0x04 0x05 0x06 0x07

2 0x08 0x09 0x0a 0x0b

3 0x0c 0x0d 0x0e 0x0f

4 0x10 0x11 0x12 0x013

5 0x14 0x15 0x16 0x17

6 0x18 0x19 0x1a 0x1b

7 0x1c 0x1d 0x1e 0x1f

31:24 23:16 15:8 7:0

Data

If the software wants to access a specific byte that is known at compile-time, it will normally use context-relative addressing. For example to access the word in transfer register 3:

alu[dest, --, B, $xfer3] ; move the data from s_transfer 3 to gpr dest

If the location of the data is found at run-time, indexed mode can be used, e.g., if the start of an encapsulated header depends on an outer header value (the outer header byte is in a fixed location).

; Check byte 2 of transfer 0 ; If value==5 header starts on byte 0x9, else byte 0x14 br=byte[$0, 2, 0x5, L1#], defer_[1] local_csr_wr[t_index_byte_index, 0x09] local_csr_wr[t_index_byte_index, 0x14] nop ; wait for index registers to be loaded L1#: ; Move bytes right justified into destination registers nop ; wait for index registers to be loaded nop ; byte_align_be[dest1, *$index++] byte_align_be[dest2, *$index++] ;etc. ; The t_index and byte_index registers are loaded by the same instruction.

42 Hardware Reference Manual

2.3.6 Local CSRs

Local Control and Status registers (CSRs) are external to the Execution Datapath, and hold specific data. They can be read and written by special instructions (local_csr_rd and local_csr_wr) and are accessed less frequently than datapath registers.

Because Local CSRs are not built in the datapath, there is a write-to-use delay of three instructions, and a read-to-consume penalty of two instructions.

2.3.7 Execution Datapath

The Execution Datapath can take one or two operands, perform an operation, and optionally write back a result. The sources and destinations can be GPRs, Transfer registers, Next Neighbor registers, and Local Memory. The operations are shifts, add/subtract, logicals, multiply, byte align, and find first one bit.

2.3.7.1 Byte Align

The datapath provides a mechanism to move data from source register(s) to any destination register(s) with byte aligning. Byte aligning takes four consecutive bytes from two concatenated values (8 bytes), starting at any of four byte boundaries (0, 1, 2, 3), and based on the endian-type (which is defined in the instruction opcode), as shown in Example 5. The four bytes are taken from two concatenated values. Four bytes are always supplied from a temporary register that always holds the A or B operand from the previous cycle, and the other four bytes from the B or A operand of the Byte Align instruction.

Intel® IXP2800 Network Processor

Technical Description

The operation is described below, using the block diagram in Figure 6. The alignment is controlled by the two LSBs of the BYTE_INDEX Local CSR.

Table 6. Align Value and Shift Amount

Right Shift Amount (Number of Bits)

Align Value

(in Byte_Index[1:0])

Little-Endian Big-Endian

0032

1824

21616

3248

(Decimal)

Hardware Reference Manual 43

Intel® IXP2800 Network Processor

Technical Description

Figure 6. Byte-Align Block Diagram

Prev_B

. . .

Example 10 shows a big-endian align sequence of instructions and the value of the various

operands. Table 7 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR (which controls the shift amount) for this example is 2.

Table 7. Register Contents for Example 10

00123

14567

289AB

3CDEF

Byte 3 [31:24]

Byte 2 [23:16]

Byte 1

[15:8]

Shift

Result

Prev_A

. . .

B_OperandA_Operand

Byte_Index

A9353-01

Byte 0

[7:0]

Example 10. Big-Endian Align

Instruction Prev B A Operand B Operand Result

Byte_align_be[--, r0] -- -- 0123 -Byte_align_be[dest1, r1] 0123 0123 4567 2345 Byte_align_be[dest2, r2] 4567 4567 89AB 6789 Byte_align_be[dest3, r3] 89AB 89AB CDEF ABCD

NOTE: A Operand comes from Prev_B register during byte_align_be instructions.

44 Hardware Reference Manual

Example 11 shows a little-endian sequence of instructions and the value of the various operands. Table 8 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR

(which controls the shift amount) for this example is 2.

Table 8. Register Contents for Example 11

Intel® IXP2800 Network Processor

Technical Description

03210

17654

2BA98

3FEDC

Byte 3

[31:24]

Example 11. Little-Endian Align

Instruction A Operand B Operand Prev A Result

Byte_align_le[--, r0] 3210 -- -- -- Byte_align_le[dest1, r1] 7654 3210 3210 5432 Byte_align_le[dest2, r2] BA98 7654 7654 9876 Byte_align_le[dest3, r3] FEDC BA98 BA98 DCBA

NOTE: B Operand comes from Prev_A register during byte_align_le instructions.

As the examples show, byte aligning “n” words takes “n+1” cycles due to the first instruction needed to start the operation.

Another mode of operation is to use the T_INDEX register with post-increment, to select the source registers. T_INDEX operation is described later in this chapter.

Byte 2 [23:16]

Byte 1

[15:8]

Byte 0

[7:0]

2.3.7.2 CAM

The block diagram in Figure 7 is used to explain the CAM operation.

The CAM has 16 entries. Each entry stores a 32-bit value, which can be compared against a source operand by instruction:

CAM_Lookup[dest_reg, source_reg]

All entries are compared in parallel, and the result of the lookup is a 9-bit value that is written into the specified destination register in bits 11:3, with all other bits of the register 0 (the choice of bits 11:3 is explained below). The result can also optionally be written into either of the LM_Addr registers (see below in this section for details).

The 9-bit result consists of four State bits (dest_reg[11:8]), concatenated with a 1-bit Hit/Miss indication (dest_reg[7]), concatenated with 4-bit entry number (dest_reg[6:3]). All other bits of dest_reg are written with 0. Possible results of the lookup are:

• miss (0) — lookup value is not in CAM, entry number is Least Recently Used entry (which

can be used as a suggested entry to replace), and State bits are 0000.

• hit (1) — lookup value is in CAM, entry number is entry that has matched; State bits are the

value from the entry that has matched.

Hardware Reference Manual 45

Intel® IXP2800 Network Processor

Technical Description

Figure 7. CAM Block Diagram

Lookup Value

(from A port)

Tag Tag State

Tag State

State

Match Match Match

Match

State Status Entry Number

0000 Miss 0 LRU Entry

State Hit 1 Hit Entry

Status

and

LRU

Logic

Lookup Status

(to Dest Req)

A9354-01

Note: The State bits are data associated with the entry. The use is only by software. There is no

implication of ownership of the entry by any Context. The State bits hardware function is:

• the value is set by software (at the time the entry is loaded, or changed in an already loaded

entry).

• its value is read out on a lookup that hits, and used as part of the status written into the

destination register.

• its value can be read out separately (normally only used for diagnostic or debug).

The LRU (Least Recently Used) Logic maintains a time-ordered list of CAM entry usage. When an entry is loaded, or matches on a lookup, it is marked as MRU (Most Recently Used). Note that a lookup that misses does not modify the LRU list.

The CAM is loaded by instruction:

CAM_Write[entry_reg, source_reg, state_value]

The value in the register specified by source_reg is put into the Tag field of the entry specified by entry_reg. The value for the State bits of the entry is specified in the instruction as state_value.

46 Hardware Reference Manual

The value in the State bits for an entry can be written, without modifying the Tag, by instruction:

CAM_Write_State[entry_reg, state_value]

Note: CAM_Write_State does not modify the LRU list.

One possible way to use the result of a lookup is to dispatch to the proper code using instruction:

jump[register, label#],defer [3]

where the register holds the result of the lookup. The State bits can be used to differentiate cases where the data associated with the CAM entry is in flight, or is pending a change, etc. Because the lookup result was loaded into bits[11:3] of the destination register, the jump destinations are spaced eight instructions apart. This is a balance between giving enough space for many applications to complete their task without having to jump to another region, versus consuming too much Control Store. Another way to use the lookup result is to branch on just the hit miss bit, and use the entry number as a base pointer into a block of Local Memory.

When enabled, the CAM lookup result is loaded into Local_Addr as follows:

LM_Addr[5:0] = 0 ([1:0] are read-only bits) LM_Addr[9:6] = lookup result [6:3] (entry number) LM_Addr[11:10] = constant specified in instruction

Intel® IXP2800 Network Processor

Technical Description

This function is useful when the CAM is used as a cache, and each entry is associated with a block of data in Local Memory. Note that the latency from when CAM_Lookup executes until the LM_Addr is loaded is the same as when LM_Addr is written by a Local_CSR_Wr instruction.

The Tag and State bits for a given entry can be read by instructions:

CAM_Read_Tag[dest_reg, entry_reg] CAM_Read_State[dest_reg, entry_reg]

The Tag value and State bits value for the specified entry is written into the destination register, respectively for the two instructions (the State bits are placed into bits [11:8] of dest_reg, with all other bits 0). Reading the tag is useful in the case where an entry needs to be evicted to make room for a new value—the lookup of the new value results in a miss, with the LRU entry number returned as a result of the miss. The CAM_Read_Tag instruction can then be used to find the value that was stored in that entry. An alternative would be to keep the tag value in a GPR. These two instructions can also be used by debug and diagnostic software. Neither of these modify the state of the LRU pointer.

Note: The following rules must be adhered to when using the CAM.

• CAM is not reset by Microengine reset. Software must either do a CAM_clear prior to using

the CAM to initialize the LRU and clear the tags to 0, or explicitly write all entries with

CAM_write.

• No two tags can be written to have same value. If this rule is violated, the result of a lookup

that matches that value will be unpredictable, and LRU state is unpredictable.

The value 0x00000000 can be used as a valid lookup value. However, note that instruction puts 0x00000000 into all tags. To avoid violating rule 2 after doing necessary to write all entries to unique values prior to doing a lookup of 0x00000000.

CAM_clear

CAM_clear, it is

Hardware Reference Manual 47

Intel® IXP2800 Network Processor

Technical Description

An algorithm for debug software to find out the contents of the CAM is shown in Example 12.

Example 12. Algorithm for Debug Software to Find out the Contents of the CAM

; First read each of the tag entries. Note that these reads ; don’t modify the LRU list or any other CAM state. tag[0] = CAM_Read_Tag(entry_0);

......

tag[15] = CAM_Read_Tag(entry_15); ; Now read each of the state bits

state[0] = CAM_Read_State(entry_0); ... state[15] = CAM_Read_State(entry_15);

; Knowing what tags are in the CAM makes it possible to ; create a value that is not in any tag, and will therefore ; miss on a lookup.

; Next loop through a sequence of 16 lookups, each of which will ; miss, to obtain the LRU values of the CAM. for (i = 0; i < 16; i++) BEGIN_LOOP ; Do a lookup with a tag not present in the CAM. On a ; miss, the LRU entry will be returned. Since this lookup ; missed the LRU state is not modified. LRU[i] = CAM_Lookup(some_tag_not_in_cam); ; Now do a lookup using the tag of the LRU entry. This ; lookup will hit, which makes that entry MRU. ; This is necessary to allow the next lookup miss to ; see the next LRU entry. junk = CAM_Lookup(tag[LRU[i]]); END_LOOP

; Because all entries were hit in the same order as they were ; LRU, the LRU list is now back to where it started before the ; loop executed. ; LRU[0] through LRU[15] holds the LRU list.

The CAM can be cleared with CAM_Clear instruction. This instruction writes 0x00000000 simultaneously to all entries tag, clears all the state bits, and puts the LRU into an initial state (where entry 0 is LRU, ..., entry 15 is MRU).

2.3.8 CRC Unit

The CRC Unit operates in parallel with the Execution Datapath. It takes two operands, performs a CRC operation, and writes back a result. CRC-CCITT, CRC-32, CRC-10, CRC-5, and iSCSI polynomials are supported. One of the operands is the CRC_Remainder Local CSR, and the other is a GPR, Transfer_In register, Next Neighbor, or Local Memory, specified in the instruction and passed through the Execution Datapath to the CRC Unit.

The instruction specifies the CRC operation type, whether to swap bytes and or bits, and which bytes of the operand to include in the operation. The result of the CRC operation is written back into CRC_Remainder. The source operand can also be written into a destination register (however the byte/bit swapping and masking do not affect the destination register; they only affect the CRC computation). This allows moving data, for example, from S_TRANSFER_IN registers to S_TRANSFER_OUT registers at the same time as computing the CRC.

48 Hardware Reference Manual

2.3.9 Event Signals

Event Signals are used to coordinate a program with completion of external events. For example, when a Microengine executes an instruction to an external unit to read data (which will be written into a Transfer_In register), the program must insure that it does not try to use the data until the external unit has written it. This time is not deterministic due to queuing delays and other uncertainty in the external units (for example, DRAM refresh). There is no hardware mechanism to flag that a register write is pending, and then prevent the program from using it. Instead the coordination is under software control, with hardware support.

In the instructions that use external units (i.e., SRAM, DRAM, etc.) there are fields that direct the external unit to supply an indication (called an Event Signal) that the command has been completed. There are 15 Event Signals per Context that can be used, and Local CSRs per Context to track which Event Signals are pending and which have been returned. The Event Signals can be used to move a Context from Sleep state to Ready state, or alternatively, the program can test and branch on the status of Event Signals.

Event Signals can be set in nine different ways.

1. When data is written into S_TRANSFER_IN registers

2. When data is written into D_TRANSFER_IN registers

3. When data is taken from S_TRANSFER_OUT registers

Intel® IXP2800 Network Processor

Technical Description

4. When data is taken from D_TRANSFER_OUT registers

5. By a write to INTERTHREAD_SIGNAL register

6. By a write from Previous Neighbor Microengine to NEXT_NEIGHBOR_SIGNAL

7. By a write from Next Neighbor Microengine to PREVIOUS_NEIGHBOR_SIGNAL

8. By a write to SAME_ME_SIGNAL Local CSR

9. By Internal Timer

Any or all Event Signals can be set by any of the above sources.

When a Context goes to the Sleep state (executes a

ctx_swap token), it specifies which Event Signal(s) it requires to be put in Ready state.

The

ctx_arb instruction also specifies if the logical AND or logical OR of the Event Signal(s) is

ctx_arb instruction, or an instruction with

needed to put the Context into Ready state.

When all of the Context’s Event Signals arrive, the Context goes to Ready state, and then eventually to Executing state. In the case where the Event Signal is linked to moving data into or out of Transfer registers (numbers 1 through 4 in the list above), the code can safely use the Transfer register as the first instruction (for example, using a Transfer_In register as a source operand will get the new read data). The same is true when the Event Signal is tested for branches (br_=signal or br_!signal instructions).

The

ctx_arb instruction, CTX_SIG_EVENTS, and ACTIVE_CTX_WAKEUP_#_EVENTS

Local CSR descriptions provide details.

Hardware Reference Manual 49

Intel® IXP2800 Network Processor

Technical Description

2.4 DRAM

The IXP2800 Network Processor has controllers for three Rambus* DRAM (RDRAM) channels. Each of the controllers independently accesses its own RDRAMs, and can operate concurrently with the other controllers (i.e., they are not operating as a single, wider memory). DRAM provides high-density, high-bandwidth storage and is typically used for data buffers.

• RDRAM sizes of 64, 128, 256, or 512 Mbytes, and 1 Gbyte are supported; however, each of

the channels must have the same number, size, and speed of RDRAMs populated. Refer to

Section 5.2 for supported size and loading configurations.

• Up to two Gbytes of DRAM is supported. If less than two Gbytes of memory is present, the

upper part of the address space is not used. It is also possible, for system cost and area savings, to have Channels 0 and 1 populated with Channel 2 empty, or Channel 0 populated with Channels 1and 2 empty.

• Reads and writes to RDRAM are generated by Microengines, The Intel XScale

(external Bus Masters and DMA Channels). The controllers also do refresh and calibration cycles to the RDRAMs, transparently to software.

• RDRAM Powerdown and Nap modes are not supported.

• Hardware interleaving (also known as striping) of addresses is done to provide balanced

access to all populated channels. The interleave size is 128 bytes. Interleaving helps to maintain utilization of available bandwidth by spreading consecutive accesses to multiple channels. The interleaving is done in the hardware in such a way that the three channels appear to software as a single contiguous memory space.

core, and PCI

• ECC (Error Correcting Code) is supported, but can be disabled. Enabling ECC requires that

x18 RDRAMs be used. If ECC is disabled x16 RDRAMs can be used. ECC can detect and correct all single-bit errors, and detect all double-bit errors. When ECC is enabled, partial writes (writes of less than 8 bytes) must be done as read-modify-writes.

2.4.1 Size Configuration

Each channel can be populated with anywhere from one-to-four RDRAMs (Short Channel Mode). Refer to Section 5.2 for supported size and loading configurations. The RAM technology used will determine the increment size and maximum memory per channel as shown in Table 9.

Table 9. RDRAM Sizes

RDRAM Technology

64/72 MB 8 MB 256 MB

128/144 MB 16 MB 512 MB

256/288 MB 32 MB 1 GB

512/576 MB 64 MB 2 GB

NOTES:

1. The two numbers shown for each technology indicate x16 parts and x18 parts.

2. The maximum memory that can be addressed across all channels is 2 GB. This limitation is based on the partitioning of the 4-GB address space (32-bit addresses). Therefore, if all three channels are used, each can be populated up to a maximum of 768 MB. Two channels can be populated to a maximum of 1 GB each. A single channel can be populated to a maximum of 2 GB.

Increment Size Maximum per Channel

RDRAMs with 1 x 16 or 2 x 16 dependent banks, and 4 independent banks are supported.

50 Hardware Reference Manual

2.4.2 Read and Write Access

The minimum DRAM physical access length is 16 bytes. Software (and PCI) can read or write as little as a single byte, however the time (and bandwidth) taken at the DRAMs is the same as for an access of 16 bytes. Therefore, the best utilization of DRAM bandwidth will be for accesses that are multiples of 16 bytes.

If ECC is enabled, writes of less than 8 bytes must do read-modify-writes, which take two 16-byte time accesses (one for the read and one for the write).

2.5 SRAM

The IXP2800 Network Processor has four independent SRAM controllers, which each support pipelined QDR synchronous static RAM (SRAM) and/or a coprocessor that adheres to QDR signaling. Any or all controllers can be left unpopulated if the application does not need to use them. SRAM are accessible by the Microengines, the Intel XScale (external bus masters and DMA).

The memory is logically four bytes (32-bits) wide; physically the data pins are two bytes wide and are double clocked. Byte parity is supported. Each of the four bytes has a parity bit, which is written when the byte is written and checked when the data is read. There are byte-enables that select which bytes to write for writes of less than 32 bits.

Intel® IXP2800 Network Processor

Technical Description

core, and the PCI Unit

Each of the 4 QDR ports are QDR and QDRII compatible. Each port implements the “_K” and “_C” output clocks and “_CQ” as an input and their inversions. (Note: the “_C” and “_CQ” clocks are optional). Extensive work has been performed providing impedance controls within the IXP2800 Network Processor for processor-initiated signals driving to QDR parts. Providing a clean signaling environment is critical to achieving 200 – 250 MHz QDRII data transfers.

The configuration assumptions for the IXP2800 Network Processor I/O driver/receiver development includes four QDR loads and the IXP2800 Network Processor. The IXP2800 Network Processor supports bursts of two SRAMs, but does not support bursts of four SRAMs.

The SRAM controller can also be configured to interface to an external coprocessor that adheres to the QDR electricals and protocol. Each SRAM controller may also interface to an external coprocessor through its standard QDR interface. This interface enables the cohabitation of both SRAM devices and coprocessors to operate on the same bus. The coprocessor behaves as a memory-mapped device on the SRAM bus.

Hardware Reference Manual 51

Intel® IXP2800 Network Processor

Technical Description

2.5.1 QDR Clocking Scheme

The controller drives out two pairs of K clock (K and K#). It also drives out two pairs of C clock (C and C#). Both C/C# clocks externally return to the controller for reading data. Figure 8 shows

Figure 8. Echo Clock Configuration

the clock diagram if the clocking scheme for QDR interface driving four SRAM chips.

Termination

Clam-shelled SRAMS

CQ/CQ#

Package Balls

QDRn_CIN[0]

Intel IXP2800 Network Processor

*The CIN[1] pin is not used internally to capture the READ data; however, the I/O Pad can be used to terminate the signal.

QDRn_K[0] QDRn_C[0]

QDRn_C[1] QDRn_K[1]

*QDRn_CIN[1]

Termination

2.5.2 SRAM Controller Configurations

Each channel has enough address pins (24) to support up to 64 Mbytes of SRAM. The SRAM controllers can directly generate multiple port enables (up to four pairs) to allow for depth expansion. Two pairs of pins are dedicated for port enables. Smaller RAMs use fewer address signals than the number provided to accommodate the largest RAMs, so some address pins (23:20) are configurable as either address or port enable based on CSR setting as shown in Table 10. Note that all of the SRAMs on a given channel must be the same size.

K/K#

C/C#

K/K#

Package Balls

CQ/CQ#

B3664-01

Table 10. SRAM Controller Configurations (Sheet 1 of 2)

SRAM

Configuration

512K x 18 1 MB 17:0 23:22, 21:20 4

1M x 18 2 MB 18:0 23:22, 21:20 4

2M x 18 4 MB 19:0 23:22, 21:20 4

4M x 18 8 MB 20:0 23:22 3

SRAM Size

52 Hardware Reference Manual

Addresses Needed

to Index SRAM

Addresses Used

as Port Enables

Total Number of Port

Select Pairs Available

Table 10. SRAM Controller Configurations (Sheet 2 of 2)

Intel® IXP2800 Network Processor

Technical Description

SRAM

Configuration

8M x 18 16 MB 21:0 23:22 3

16M x 18 32 MB 22:0 None 2

32M x 18 64 MB 23:0 None 2

SRAM Size

Addresses Needed

to Index SRAM

Addresses Used

as Port Enables

Total Number of Port

Select Pairs Available

Each channel can be expanded by depth according to the number of port enables available. If external decoding is used, then the number of SRAMs used is not limited by the number of port enables generated by the SRAM controller.

Note: Doing external decoding may require external pipeline registers to account for the decode time,

depending on the desired frequency.

Maximum SRAM system sizes are shown in Table 11. Shaded entries require external decoding, because they use more port enables than the SRAM controller can supply directly.

Table 11. Total Memory per Channel

SRAM Size

12345678

512K x 18 1 MB 2 MB 3 MB

1M x 18 2 MB 4 MB 6 MB 8 MB 10 MB 12 MB 14 MB 16 MB

2M x 18 4 MB 8 MB 12 MB 16 MB 20 MB 24 MB 28 MB 32 MB

4M x 18 8 MB 16 MB 24 MB 32 MB 64 MB NA NA NA

8M x 18 16 MB 32 MB 48 MB 64 MB NA NA NA NA

16M x 18 32 MB 64 MB NA NA NA NA NA NA

32M x 18 64 MB NA NA NA NA NA NA NA

Number of SRAMs on Channel

4 MB 5 MB 6 MB 7 MB 8 MB

2.5.3 SRAM Atomic Operations

In addition to normal reads and writes, SRAM supports the following atomic operations. Microengines have specific instructions to do each atomic operation; Intel XScale microarchitecture uses aliased address regions to do atomic operations.

• bit set

• bit clear

• increment

• decrement

• add

• swap

The SRAM does read-modify-writes for the atomic operations, the pre-modified data can also be returned if desired. The atomic operations operate on a single 32-bit word.

Hardware Reference Manual 53

Intel® IXP2800 Network Processor

Technical Description

2.5.4 Queue Data Structure Commands

The ability to enqueue and dequeue data buffers at a fast rate is key to meeting line-rate performance. This is a difficult problem as it involves dependent memory references that must be turned around very quickly. The SRAM controller includes a data structure (called the Q_array) and associated control logic to perform efficient enqueue and dequeue operations. The Q_array has 64 entries, each of which can be used in one of four ways.

• Linked-list queue descriptor (resident queues)

• Cache of recently used linked-list queue descriptors (backing store for the cache is in SRAM)

• Ring descriptor

• Journal

The commands provided are:

For Linked-list queues or Cache of recently used linked-list queue descriptors

• Read_Q_Descriptor_Head(address, length, entry, xfer_addr)

• Read_Q_Descriptor_Tail(address, length, entry)

• Read_Q_Descriptor_Other(address, entry)

• Write_Q_Descriptor(address, entry)

• Write_Q_Descriptor_Count(address, entry)

• ENQ(buff_desc_adr, cell_count, EOP, entry)

• ENQ_tail(buff_desc_adr, entry)

• DEQ(entry, xfer_addr)

For Rings

• Get(entry, length, xfer_addr)

• Put(entry, length, xfer_addr)

For Journals

• Journal(entry, length, xfer_addr)

• Fast_journal(entry)

Note: The Read_Q_Descriptor_Head, Read_Q_Descriptor_Tail, etc.) are used to initialize the rings and

journals but not used to perform the ring and journal function.

2.5.5 Reference Ordering

This section covers the ordering between accesses to any one SRAM controller.

2.5.5.1 Reference Order Tables

Table 12 shows the architectural guarantees of order to access to the SAME SRAM address

between a reference of any given type (shown in the column labels) and a subsequent reference of any given type (shown in the row labels). The definition of first and second is defined by the order they are received by the SRAM controller.

Note: A given Network Processor version may implement a superset of these order guarantees. However,

that superset may not be supported in future implementations.

54 Hardware Reference Manual

Verification is required to test only the order rules shown in Table 12 and Table 13).

Note: A blank entry in Table 12 means that no order is enforced.

Table 12. Address Reference Order

Intel® IXP2800 Network Processor

Technical Description

1st ref

ref Memory

Read

Memory Read Order

CSR Read Order

Memory Write Order

CSR Write Order

Memory RMW Order

Queue / Ring / Q_ Descr Commands

CSR Read

Table 13 shows the architectural guarantees of order to access to the SAME SRAM Q_array entry

Table 13. Q_array Entry Reference Order

1st ref

ref

Read_Q

_Descr

head,

tail

Read_

Q_Des

other

Write_Q

_Descr

Memory

Write

Enqueue Dequeue Put Get Journal

CSR Write

Memory

RMW

Queue /

Ring /

Q_Descr

Commands

See

Table 13.

Read_Q_Descr head,tail

Read_Q_ Descr other

Write_Q_ Descr

Enqueue Order Order Order

Dequeue Order Order Order

Put Order

Get Order

Journal Order

Order

Hardware Reference Manual 55

Intel® IXP2800 Network Processor

Technical Description

2.5.5.2 Microengine Software Restrictions to Maintain Ordering

It is the Microengine programmer’s job to ensure order where the program flow finds order to be necessary and where the architecture does not guarantee that order. The signaling mechanism can be used to do this. For example, say that microcode needs to update several locations in a table. A location in SRAM is used to “lock” access to the table. Example 13 is the code for the table update.

Example 13. Table Update Code

IMMED [$xfer0, 1] SRAM [write, $xfer0, flag_address, 0, 1], ctx_swap [SIG_DONE_2] ; At this point, the write to flag_address has passed the point of coherency. Do

the table updates. SRAM [write, $xfer1, table_base, offset1, 2] , sig_done [SIG_DONE_3] SRAM [write, $xfer3, table_base, offset2, 2] , sig_done [SIG_DONE_4] CTX_ARB [SIG_DONE_3, SIG_DONE_4] ; At this point, the table writes have passed the point of coherency. Clear the

flag to allow access by other threads. IMMED [$xfer0, 0] SRAM [write, $xfer0, flag_address, 0, 1, ctx_swap [SIG_DONE_2]

Other rules:

• All accesses to atomic variables should be via read-modify-write instructions.

• If the flow must know that a write is completed (actually in the SRAM itself), follow the write

with a read to the same address. The write is guaranteed to be complete when the read data has been returned to the Microengine.

• With the exception of initialization, never do WRITE commands to the first three longwords

of a queue_descriptor data structure (these are the longwords that hold head, tail, and count, etc.). All accesses to this data must be via the Q commands.

• To initialize the Q_array registers, perform a memory write of at least three longwords,

followed by a memory read to the same address (to guarantee that the write completed). Then, for each entry in the Q_array, perform a read_q_descriptor_head followed by a read_q_descriptor_other using the address of the same three longwords.

2.6 Scratchpad Memory

The IXP2800 Network Processor contains a 16 Kbytes of Scratchpad Memory, organized as 4K 32-bit words, that is accessible by Microengines and the Intel XScale Memory provides the following operations:

• Normal reads and writes. 1–16 32-bit words can be read/written with a single Microengine

instruction. Note that Scratchpad is not byte-writable (each write must write all four bytes).

• Atomic read-modify-write operations, bit-set, bit-clear, increment, decrement, add, subtract,

and swap. The RMW operations can also optionally return the pre-modified data.

• Sixteen Hardware Assisted Rings for interprocess communication. (A ring is a FIFO that uses

a head and tail pointer to store/read information in Scratchpad memory.)

core. The Scratchpad

Scratchpad Memory is provided as a third memory resource (in addition to SRAM and DRAM) that is shared by the Microengines and the Intel XScale

XScale

core can distribute memory accesses between these three types of memory resources to

core. The Microengines and the Intel

provide a greater number of memory accesses occurring in parallel.

56 Hardware Reference Manual

2.6.1 Scratchpad Atomic Operations

Intel® IXP2800 Network Processor

Technical Description

In addition to normal reads and writes, the Scratchpad Memory supports the following atomic operations. Microengines have specific instructions to do each atomic operation; the Intel XScale microarchitecture uses aliased address regions to do atomic operations.

• bit set

• bit clear

• increment

• decrement

• add

• subtract

• swap

The Scratchpad Memory does read-modify-writes for the atomic operations, the pre-modified data can also be returned if desired. The atomic operations operate on a single 32-bit word.

2.6.2 Ring Commands

The Scratchpad Memory provides sixteen Rings used for interprocess communication. The rings provide two operations.

• Get(ring, length)

• Put(ring, length)

Ring is the number of the ring (0 through 15) to get or put from, and length specifies the number of 32-bit words to transfer. A logical view of one of the rings is shown in Tabl e 9.

Figure 9. Logical View of Rings

1 of 16

Head Count SizeTail

Address Decoder

Scratchpad RAM

Read / Write / Atomic Addresses

Full

A9355-01

Hardware Reference Manual 57

Intel® IXP2800 Network Processor

Technical Description

Head, Tail, and Size are registers in the Scratchpad Unit. Head and Tail point to the actual ring data, which is stored in the Scratchpad RAM. The count of how many entries are on the Ring is determined by hardware using the Head and Tail. For each Ring in use, a region of Scratchpad RAM must be reserved for the ring data.

Note: The reservation is by software convention. The hardware does not prevent other accesses to the

region of Scratchpad Memory used by the Ring. Also the regions of Scratchpad Memory allocated to different Rings must not overlap.

Head points to the next address to be read on a get, and Tail points to the next address to be written on a put. The size of each Ring is selectable from the following choices: 128, 256, 512, or 1024 32-bit words.

Note: The region of Scratchpad used for a Ring is naturally aligned to it size.

When the Ring is near full, it asserts an output signal, which is used as a state input to the Microengines. They must use that signal to test (by doing Branch on Input State) for room on the Ring before putting data onto it. There is a lag in time from a put instruction executing to the Full signal being updated to reflect that put. To guarantee that a put will not overfill the ring there is a bound on the number of Contexts and the number of 32-bit words per write based on the size of the ring, as shown in Table 14. Each Context should test the Full signal, then do the put if not Full, and then wait until the Context has been signaled that the data has been pulled before testing the Full signal again.

An alternate usage method is to have Contexts allocate and deallocate entries from a shared count variable, using the atomic subtract to allocate and atomic add to deallocate. In this case the Full signal is not used.

Table 14. Ring Full Signal Use – Number of Contexts and Length versus Ring Size

Number of

Contexts

1 16161616

2 16161616

4 8 16 16 16

8 4 12 16 16

16 2 6 14 16

2414916

3213715

40 Illegal 2 5 12

48 Illegal 2 4 10

64 Illegal 1 3 7

128 Illegal Illegal 1 3

NOTES:

1. Number in each table entry is the largest length that should be put. 16 is the largest length that a single put instruction can generate.

2. Illegal -- With that number of Contexts, even a length of one could cause the Ring to overfill.

128 256 512 1024

Ring Size

58 Hardware Reference Manual

Intel® IXP2800 Network Processor

2.7 Media and Switch Fabric Interface

The Media and Switch Fabric (MSF) Interface is used to connect the IXP2800 Network Processor to a physical layer device (PHY) and/or to a Switch Fabric. the MSF consists of separate receive and transmit interfaces. Each of the receive and transmit interfaces can be separately configured for either SPI-4 Phase 2 (System Packet Interface) for PHY devices or CSIX-L1 protocol for Switch Fabric Interfaces.

The receive and transmit ports are unidirectional and independent of each other. Each port has 16 data signals, a clock, a control signal, and a parity signal, all of which use LVDS (differential) signaling, and are sampled on both edges of the clock. There is also a flow control port consisting of a clock, data, and ready status bits, and used to communicate between two IXP2800 Network Processors, or the IXP2800 Network Processor chip and a Switch Fabric Interface. These are also LVDS, dual-edge data transfer. All of the high speed LVDS interfaces support dynamic deskew training.

The block diagram in Figure 10 shows a typical configuration.

Figure 10. Example System Block Diagram

Technical Description

Receive protocol is SPI-4 Transmit mode is CSIX

Ingress

Intel

IXP2800

Network Processor

Framing/MAC

Device

(PHY)

SPI-4

Protocol

RDAT

RSTAT

Intel

Egress

IXP2800

TDAT

Flow Control

Optional

Gasket

(Note

Switch

)

Fabric

CSIX

Protocol

Network Processor

TSTAT

TDAT

Receive protocol is CSIX Transmit mode is SPI-4

Notes:

1. Gasket is used to convert 16-bit, dual-data IXP2800 signals to wider single edge CWord signals used by Switch Fabric, if required.

2. Per the CSIX specification, the terms "egress" and ingress" are with respect to the Switch Fabric.

So the egress processor handles traffic received from the Switch Fabric and the ingress processor handles traffic sent to the Switch Fabric.

RDAT

A9356-03

Hardware Reference Manual 59

Intel® IXP2800 Network Processor

Technical Description

An alternate system configuration is shown in the block diagram in Figure 11. In this case, a single IXP2800 Network Processor is used for both Ingress and Egress. The bit rate supported would be less than in Figure 10. A hypothetical Bus Converter chip, external to the IXP2800 Network Processor is used. The block diagram in Figure 11 is only an illustrative example.

Figure 11. Full-Duplex Block Diagram

Receive and transmit protocol is SPI-4 and CSIX on transferby-transfer basis.

Intel® IXP2800

Network Processor

TDATRDAT

Notes:

2.7.1 SPI-4

SPI-4 is an interface for packet and cell transfer between a physical layer (PHY) device and a link layer device (the IXP2800 Network Processor), for aggregate bandwidths of OC-192 ATM and Packet over SONET/SDH (POS), as well as 10 Gb/s Ethernet applications.

The Optical Internetworking Forum (OIF), www.oiforum.com, controls the SPI-4 Implementation Agreement document.

Framing/MAC

Device

(PHY)

Switch Fabric

Bus Converter

UTOPIA-3

or IXBUS

Protocol

The Bus Converter chip receives and transmits both SPI-4 and CSIX protocols from/to Intel IXP2800 Network Processor. It steers the data, based on protocol, to either PHY device or Switch Fabric. PHY interface can be UTOPIA-3, IXBUS, or any other required protocol.

CSIX

Protocol

A9357-02

SPI-4 protocol transfers data in variable length bursts. Associated with each burst is information such as Port number (for a multi-port device such as a 10 x 1 GbE), SOP, and EOP. This information is collected by the MSF and passed to the Microengines.

60 Hardware Reference Manual

2.7.2 CSIX

CSIX-L1 (Common Switch Interface) defines an interface between a Traffic Manager (TM) and a Switch Fabric (SF) for ATM, IP, MPLS, Ethernet, and similar data communications applications.

The Network Processor Forum (NPF) www.npforum.org, controls the CSIX-L1 specification.

The basic unit of information transferred between Traffic Managers and Switch Fabrics is called a CFrame. There are three categories of CFrames:

• Data

• Control

• Flow Control

Associated with each CFrame is information such as length, type, address. This information is collected by MSF and passed to Microengines.

MSF also contains a number of hardware features related to flow control.

2.7.3 Receive

Intel® IXP2800 Network Processor

Technical Description

Figure 12 is a simplified block diagram of the MSF receive section.

Figure 12. Simplified MSF Receive Section Block Diagram

Checksum

RDAT RCTL RPAR

RPROT

RCLK

RCLK REF

RSTAT

CSIX

Protocol

Logic

SPI-4

Protocol

Logic

SPI-4

Flow

Control

Clock for Receive Functions

FCEFIFO

- - - - - -

TXCDAT

RBUF

- - - - - -

- - - - - Full Indication to Flow Control

Element

TXCFC

(FCIFIFO full)

128

Full List

CSIX CFrames mapped by RX_Port_Map CSR (normally Flow Control CFrames are mapped here)

Buffers

Control

Receive

Thread

Freelists

(to MEs)

(to DRAM)

CSR Write

A9365-01

Hardware Reference Manual 61

Intel® IXP2800 Network Processor

Technical Description

2.7.3.1 RBUF

RBUF is a RAM that holds received data. It stores received data in sub-blocks (referred to as elements), and is accessed by Microengines or the Intel XScale

core reading the received information. Details of how RBUF elements are allocated and filled is based on the receive data protocol. When data is received, the associated status is put into the FULL_ELEMENT_LIST FIFO and subsequently sent to Microengines to process. FULL_ELEMENT_LIST insures that received elements are sent to Microengines in the order that the data was received.

RBUF contains a total of 8 KB of data. The element size is programmable as either 64 bytes, 128 bytes, or 256 bytes per element. In addition, RBUF can be programmed to be split into one, two, or three partitions depending on application. For receiving SPI-4, one partition would be used. For receiving CSIX, two partitions are used (Control CFrames and Data CFrames). When both protocols are being used, the RBUF can be split into three partitions. For both SPI-4 and CSIX, three partitions are used.

Microengines can read data from the RBUF to Microengine S_TRANSFER_IN registers using the

msf[read] instruction where they specify the starting byte number (which must be aligned to 4

bytes), and number of 32-bit words to read. The number in the instruction can be either the number of 32-bit words, or number of 32-bit word pairs, using the single and double instruction modifiers, respectively.

Microengines can move data from RBUF to DRAM using the

dram instruction where they specify

the starting byte number (which must be aligned to 4 bytes), the number of 32-bit words to read, and the address in DRAM to write the data.

For both types of RBUF read, reading an element does not modify any RBUF data, and does not free the element, so buffered data can be read as many times as desired. This allows, for example, a processing pipeline to have different Microengines handle different protocol layers, with each Microengine reading only the specific header information it requires.

2.7.3.1.1 SPI-4 and the RBUF

SPI-4 data is placed into RBUF with each SPI-4 burst allocating an element. If a SPI-4 burst is larger than the element size, another element is allocated. The status information for the element contains the following information:

3130292827262524232221201918171615141312111

RPROT

Element Byte Count

636261605958575655545352515049484746454443424140393837363534333

Reserved Checksum

SOP

EOP

Err

Len Err

9876543210

0 Abort Err

Par Err

Type

Null

ADR

The definitions of the fields are shown in Table 90, “RBUF SPIF-4 Status Definition” on page 252.

62 Hardware Reference Manual

2.7.3.1.2 CSIX and RBUF

CSIX CFrames are placed into either RBUF with each CFrame allocating an element. Unlike SPI-4, a single CFrame must not spill over into another element. Since CSIX spec specifies a maximum CFrame size of 256 bytes, this can be done by programming the element size to 256 bytes. However, if the Switch Fabric uses a smaller CFrame size, then a smaller RBUF element size can be used.

Flow Control CFrames are put into the FCEFIFO, to be sent to the Ingress IXP2800 Network Processor where a Microengine will read them to manage flow control information to the Switch Fabric.

The status information for the element contains the following information:

Intel® IXP2800 Network Processor

Technical Description

3130292827262524232221201918171615141312111

RPROT

Element Payload Length

636261605958575655545352515049484746454443424140393837363534333

Extension Header

Len Err

Err

9876543210

HP Err

VP Err

Null

Reserved Type

The definitions of the fields are shown in Table 91, “RBUF CSIX Status Definition” on page 254.

2.7.3.2 Full Element List

Receive control hardware maintains the FULL_ELEMENT_LIST to hold the status of valid RBUF elements, in the order in which they were received. When an RBUF element is filled, its status is added to the tail of the FULL_ELEMENT_LIST. When a Microengine is notified of element arrival (by having the status written to its S_Transfer register), it is removed from the head of the FULL_ELEMENT_LIST.

2.7.3.3 RX_THREAD_FREELIST

RX_THREAD_FREELIST is a FIFO that indicates Microengine Contexts that are awaiting an RBUF element to process. This allows the Contexts to indicate their ready status prior to the reception of the data, as a way to eliminate latency. Each entry added to a Freelist also has an associated S_TRANSFER register and signal number. There are three RX_THREAD_FREELISTS that correspond to the RBUF partitions.

To be added as ready to receive an element, a Microengine does an

msf[fast_write] to the RX_THREAD_FREELIST address; the write data is the Microengine/

msf[write] or an

CONTEXT/S_TRANSFER register number to add to the Freelist.

When there is valid status at the head of the Full Element List, it will be pushed to a Microengine. The receive control logic pushes the status information (which includes the element number) to the Microengine in the head entry of RX_THREAD_FREELIST, and sends an Event Signal to the Microengine. It then removes that entry from the RX_THREAD_FREELIST, and removes the status from Full Element List.

Hardware Reference Manual 63

Intel® IXP2800 Network Processor

Technical Description

Each RX_THREAD_FREELIST has an associated countdown timer. If the timer expires and no new receive data is available yet, the receive logic will autopush a Null Receive Status Word to the next thread on the RX_THREAD_FREELIST. A Null Receive Status Word has the “Null” bit set, and does not have any data or RBUF entry associated with it.

The RX_THREAD_FREELIST timer is useful for certain applications. Its primary purpose is to keep the receive processing pipeline (implemented as code running on the Microengines) moving even when the line has gone idle.

It is especially useful if the pipeline is structured to handle mpackets in groups, i.e., eight mpackets at a time. If seven mpackets are received, then the line goes idle, then the timeout will trigger the autopush of a null Receive Status Word, filling the eighth slot and allowing the pipeline to advance. Another example is if one valid mpacket is received before the line goes idle for a long period; seven null Receive Status Words will be autopushed, allowing the pipeline to proceed. Typically the timeout interval is programmed to be slightly larger than the minimum arrival time of the incoming cells or packets.

The timer is controlled using the RX_THREAD_FREELIST_TIMEOUT_# CSR. The timer may be enabled or disabled, and the timeout value specified using this CSR.

2.7.3.4 Receive Operation Summary

During receive processing, received CFrames, and SPI-4 cells and packets (which in this context are all called mpackets) are placed into the RBUF, and then handed off to a Microengine to process. Normally, by application design, some number of Microengine Contexts will be assigned to receive processing. Those Contexts will have their number added to the proper RX_THREAD_FREELIST (via wait for arrival of an mpacket (or alternatively poll waiting for arrival of an mpacket).

When an mpacket arrives, MSF receive control logic will autopush eight bytes of information for the element to the Microengine/CONTEXT/S_TRANSFER registers at the head of RX_THREAD_FREELIST. The information pushed is:

msf[write]or msf[fast_write]), and then will go to sleep to

• Status Word (SPI-4) or Header Status (CSIX) — see Table 90, “RBUF SPIF-4 Status

Definition” on page 252 for more information.

• Checksum (SPI-4) or Extension Header (CSIX) — see Table 91, “RBUF CSIX Status

Definition” on page 254 for more information.

To handle the case where the receive Contexts temporarily fall behind and RX_THREAD_FREELIST is empty, all received element numbers are held in the FULL_ELEMENT_LIST. In that case, as soon as an RX_THREAD_FREELIST entry is entered, the status of the head element of FULL_ELEMENT_LIST will be pushed to it.

The Microengines may read part of (or the entire) RBUF element to their S_TRANSFER registers (via an

msf[read] instruction) for header processing, etc., and may also move the element data to

DRAM (via a

When a Context is done with an element, it does an RBUF_ELEMENT_DONE address; the write data is the element number. This marks the element as free and available to be re-used. There is no restriction on the order in which elements are freed; Contexts can do different amounts of processing per element based on the contents of the element — therefore elements can be returned in a different order than they were handed to Contexts.

dram[rbuf_rd] instruction).

msf[write]or msf[fast_write] to

64 Hardware Reference Manual

2.7.4 Transmit

Figure 13 is a simplified Block Diagram of the MSF transmit section.

Figure 13. Simplified Transmit Section Block Diagram

Intel® IXP2800 Network Processor

Technical Description

From Other CSRs

2.7.4.1 TBUF

TBUF is a RAM that holds data and status to be transmitted. The data is written into sub-blocks referred to as elements, by Microengines or the Intel XScale

From ME

From DRAM

- - - - - -

Element

RXCFC

(FCIFIFO full)

TBUF

Valid Logic

FCIFIFO

- - - - - -

RXCDAT

Control

ME Reads

(S_Push_Bus)

SPI-4

Protocol

Logic

CSIX

Protocol

Logic

RXCSRB

(Ready Bits)

core.

Internal Clock

for Transmit

Logic

Byte Align

Internal

Clock

TDAT TCTL TPAR

TCLK

TCLK REF

A9366-01

TBUF contains a total of 8 Kbytes of data. The element size is programmable as either 64 bytes, 128 bytes, or 256 bytes per element. In addition, TBUF can be programmed to be split into one, two, or three partitions depending on application. For transmitting SPI-4, one partition would be used. For transmitting CSIX, two partitions are used (Control CFrames and Data CFrames). For both SPI-4 and CSIX, three partitions are used.

Microengines can write data from Microengine S_TRANSFER_OUT registers to the TBUF using the

msf[write] instruction where they specify the starting byte number (which must be aligned to

4 bytes), and number of 32-bit words to write. The number in the instruction can be either the number of 32-bit words, or number of 32-bit word pairs, using the single and double instruction modifiers, respectively.

Microengines can move data from DRAM to TBUF using the

dram instruction where they specify

the starting byte number (which must be aligned to 4 bytes), the number of 32-bit words to write, and the address in DRAM of the data.

Hardware Reference Manual 65

Intel® IXP2800 Network Processor

Technical Description

All elements within a TBUF partition are transmitted in the order. Control information associated with the element defines which bytes are valid. The data from the TBUF will be shifted and byte aligned as required to be transmitted.

2.7.4.1.1 SPI-4 and TBUF

For SPI-4, data is put into the data portion of the element, and information for the SPI-4 Control Word that will precede the data is put into the Element Control Word.

When the Element Control Word is written, the information is:

3130292827262524232221201918171615141312111

Payload Length

636261605958575655545352515049484746454443424140393837363534333

Prepend

Offset

Prepend Length

Payload

Offset

Res

9876543210

SOP

Res

EOP

ADR

Skip

The definitions of the fields are shown in Table 15.

Table 15. TBUF SPI-4 Control Definition

Field Definition

Indicates the number of Payload bytes, from 1 to 256, in the element. The value of 0x00

Payload Length

Prepend Offset Indicates the first valid byte of Prepend, from 0 to 7

Prepend Length Indicates the number of bytes in Prepend, from 0 to 31.

Payload Offset Indicates the first valid byte of Payload, from 0 to 7.

Skip

SOP

EOP

ADR

NOTE:

1. Normally EOPS is sent on the next Control Word (along with ADR and SOP) to start the next element. If there is no valid element pending at the end of sending the data, the transmit logic will insert an Idle Control Word with the EOPS information.

means 256 bytes. The sum of Prepend Length and Payload Length will be sent. That value will also control the EOPS field (1 or 2 bytes valid indicated) of the Control Word that will succeed the data transfer. Note 1.

Allows software to allocate a TBUF element and then not transmit any data from it. 0—transmit data according to other fields of Control Word. 1—free the element without transmitting any data.

Indicates if the element is the start of a packet. This field will be sent in the SOPC field of the Control Word that will precede the data transfer.

Indicates if the element is the end of a packet. This field will be sent in the EOPS field of the Control Word that will succeed the data transfer. Note 1.

The port number to which the data is directed. This field will be sent in the ADR field of the Control Word that will precede the data transfer.

66 Hardware Reference Manual

2.7.4.1.2 CSIX and TBUF

For CSIX, payload information is put into the data area of the element, and Base and Extension Header information is put into the Element Control Word.

When the Element Control Word is written, the information is:

Intel® IXP2800 Network Processor

Technical Description

3130292827262524232221201918171615141312111

Payload Length

636261605958575655545352515049484746454443424140393837363534333

Prepend

Offset

Prepend Length

Extension Header

Payload

Offset

Res

9876543210

Skip

Res

Res Type

The definitions of the fields are shown in Table 16.

Table 16. TBUF CSIX Control Definition

Field Definition

Indicates the number of Payload bytes, from 1 to 256, in the element. The value of 0x00

Payload Length

Prepend Offset Indicates the first valid byte of Prepend, from 0 to 7.

Prepend Length Indicates the number of bytes in Prepend, from 0 to 31.

Payload Offset Indicates the first valid byte of Payload, from 0 to 7.

Skip

CR CR (CSIX Reserved) bit to put into the CSIX Base Header.

P P (Private) bit to put into the CSIX Base Header.

Type Type Field to put into the CSIX Base Header. Idle type is not legal here.

Extension Header

means 256 bytes. The sum of Prepend Length and Payload Length will be sent, and also put into the CSIX Base Header Payload Length field. Note that this length does not include any padding that may be required. Padding is inserted by transmit hardware as needed.

Allows software to allocate a TBUF element and then not transmit any data from it. 0—transmit data according to other fields of Control Word. 1—free the element without transmitting any data.

The Extension Header to be sent with the CFrame. The bytes are sent in big-endian order; byte 0 is in bits 63:56, byte 1 is in bits 55:48, byte 2 is in bits 47:40, and byte 3 is in bits 39:32.

2.7.4.2 Transmit Operation Summary

During transmit processing data to be transmitted is placed into the TBUF under Microengine control. The Microengine allocates an element in software; the transmit hardware processes TBUF elements within a partition in strict sequential order so the software can track which element to allocate next.

Microengines may write directly into an element by an DRAM written into the element by a

dram[tbuf_wr] instruction. Data can be merged into the

element by doing both.

Hardware Reference Manual 67

msf[write] instruction, or have data from

Intel® IXP2800 Network Processor

Technical Description

There is a Transmit Valid bit per element, that marks the element as ready to be transmitted. Microengines move all data into the element, by either or both of

dram[tbuf_wr] instructions to the TBUF. Microengines also write the element Transmit Control

Word with information about the element. When all of the data movement is complete, the Microengine sets the element valid bit.

1. Move data into TBUF by either or both of the TBUF.

2. Wait for 1 to complete.

3. Write Transmit Control Word at TBUF_ELEMENT_CONTROL_# address. Using this address sets the Transmit Valid bit.

2.7.5 The Flow Control Interface

The MSF provides flow control support for SPI-4 and CSIX.

2.7.5.1 SPI-4

SPI-4 uses a FIFO Status Channel to provide flow control information. MSF receives the information from the PHY device and stores it so that Microengines can read the information on a per-port basis. It can then use that information to determine when to transmit data to a given port.

msf[write] and

msf[write] and dram[tbuf_wr] instructions to

The MSF also sends status to the PHY based on the amount of available space in the RBUF — i.e., done by hardware without Microengines.

2.7.5.2 CSIX

CSIX provides two types of flow control — link level and per queue.

• The link level control is handled by hardware. MSF will stop transmission is response to link

level flow control received from the Switch Fabric. MSF will assert link level flow control based on the amount of available space in the RBUF.

• Per queue flow control information is put into the FCIFIFO and handled by Microengine

software. Also, if required, Microengines can send Flow Control CFrames to the Switch Fabric under software control.

In both cases, for a full-duplex configuration, information is passed from the Switch Fabric to the Egress IXP2800 Network Processor, which then passes it to the Ingress IXP2800 Network Processor over a proprietary flow control interface.

68 Hardware Reference Manual

2.8 Hash Unit

The IXP2800 Network Processor contains a Hash Unit that can take 48-, 64-, or 128-bit data and produce a 48-, 64-, or a 128-bit hash index, respectively. The Hash Unit is accessible by the Microengines and the Intel XScale example L2 addresses. Figure 14 is a block diagram of the Hash Unit.

Intel® IXP2800 Network Processor

Technical Description

core, and is useful in doing table searches with large keys, for

Up to three hash indexes can be created using a single Microengine instruction. This helps to minimize command overhead. The Intel XScale

A Microengine initiates a hash operation by writing the hash operands into a contiguous set of S_TRANSFER_OUT registers and then executing the hash instruction. The Intel XScale initiates a hash operation by writing a set of memory-mapped HASH_OP registers, which are built in the Intel XScale

core gasket, with the data to be used to generate the hash index. There are

core can only do a single hash at a time.

core

separate registers for 48-, 64-, and 128-bit hashes. The data is written from MSB to LSB, with the write to LSB triggering the Hash Operation. In both cases, the Hash Unit reads the operand into an input buffer, performs the hash operation, and returns the result.

The Hash Unit uses a hard-wired polynomial algorithm and a programmable hash multiplier to create hash indexes. Three separate multipliers are supported, one for 48-bit hash operations, one for 64-bit hash operations and one for 128-bit hash operations. The multiplier is programmed through Control registers in the Hash Unit.

The multiplicand is shifted into the hash array, 16 bits at a time. The hash array performs a 1’s-complement multiply and polynomial divide, using the multiplier and 16 bits of the multiplicand. The result is placed into an output buffer register and also feeds back into the array. This process is repeated three times for a 48-bit hash (16 bits x 3 = 48), four times for a 64-bit hash (16 bits x 4 = 64), and eight times for a 128-bit hash (16 x 8 = 128). After the multiplicand has been passed through the hash array, the resulting hash index is placed into a two-stage output buffer.

After each hash index is completed, the Hash Unit returns the hash index to the Microengines’ S_TRANSFER_IN registers, or the Intel XScale

core HASH_OP registers. For Microengine initiated hash operations, the Microengine is signaled after all the hashes specified in the instruction have been completed.

For the Intel XScale

core initiated hash operations, the Intel XScale® core reads the results from the memory-mapped HASH_OP registers. The addresses of Hash Results are the same as the HASH_OP registers. Because of queuing delays at the Hash Unit, the time to complete an operation is not fixed. The Intel XScale

core can do one of two operations to get the hash results.

• Poll the HASH_DONE register. This register is cleared when the HASH_OP registers are

written. Bit [0] of HASH_DONE register is set when the HASH_OP registers get the return result from the Hash Unit (when the last word of the result is returned). The Intel XScale software can poll on HASH_DONE, and read HASH_OP when HASH_DONE is equal to 0x00000001.

• Read HASH_OP directly. The interface hardware will acknowledge the read only when the

result is valid. This method will result in the Intel XScale

core stalling if the result is not

valid when the read happens.

The number of clock cycles required to perform a single hash operation equals: two or four cycles through the input buffers, three, four or eight cycles through the hash array, and two or four cycles through the output buffers. Because of the pipeline characteristics of the Hash Unit, performance is improved if multiple hash operations are initiated with a single instruction rather than separate hash instructions for each hash operation.

Hardware Reference Manual 69

core

Intel® IXP2800 Network Processor

Technical Description

Figure 14. Hash Unit Block Diagram

Data Used to Create Hash Index from S_Transfer_Out

2-Stage Input Buffer

128

Hashed Multiplicand 3

128

Hashed Multiplicand 2 Hashed Multiplicand 1

Hash Indexes to S_Transfer_In

Registers

Multiplicand 3 Multiplicand 2

128

Multiplicand 1

Hash Array

2-Stage Output Buffer

shift

Hash_Multiplier_48

Hash_Multiplier_64

Hash_Multiplier_128

48-bit, 64-bit or 128-bit Hash Select

A9367-02

70 Hardware Reference Manual

2.9 PCI Controller

The PCI Controller provides a 64-bit, 66 MHz capable PCI Local Bus Revision 2.2 interface, and is compatible to 32-bit or 33 MHz PCI devices. The PCI controller provides the following functions:

• Target Access (external Bus Master access to SRAM, DRAM, and CSRs)

• Master Access (the Intel XScale

• Two DMA Channels

• Mailbox and Doorbell registers for the Intel XScale

• PCI arbiter

The IXP2800 Network Processor can be configured to act as PCI central function (for use in a stand-alone system), where it provides the PCI reset signal, or as an add-in device, where it uses the PCI reset signal as the chip reset input. The choice is made by connecting the cfg_rst_dir input pin low or high.

2.9.1 Target Access

Intel® IXP2800 Network Processor

core access to PCI Target devices)

core to Host communication

Technical Description

There are three Base Address Registers (BARs) to allow PCI Bus Masters to access SRAM, DRAM, and CSRs, respectively. Examples of PCI Bus Masters include a Host Processor (for example a Pentium encryption coprocessor.

The SRAM BAR can be programmed to sizes of 16, 32, 64, 128, or 256 Mbytes, or no access.

The DRAM BAR can be programmed to sizes of 128, 256, or 512 Mbytes or 1 Gbyte, or no access.

The CSR BAR is 8 KB.

PCI Boot Mode is supported, in which the Host downloads the Intel XScale DRAM, while holding the Intel XScale Intel XScale attached to the Slowport.

processor), or an I/O device such as an Ethernet controller, SCSI controller, or

core reset is deasserted. The alternative is to provide the boot image in a Flash ROM

2.9.2 Master Access

The Intel XScale® core and Microengines can directly access the PCI bus. The Intel XScale® core can do loads and stores to specific address regions to generate all PCI command types. Microengines use PCI instruction, and also use address regions to generate different PCI commands.

2.9.3 DMA Channels

core in reset. Once the boot image has been loaded, the

core boot image into

There are two DMA Channels, each of which can move blocks of data from DRAM to the PCI or from the PCI to DRAM. The DMA channels read parameters from a list of descriptors in SRAM, perform the data movement to or from DRAM, and stop when the list is exhausted. The descriptors are loaded from predefined SRAM entries or may be set directly by CSR writes to DMA Channel registers. There is no restriction on byte alignment of the source address or the destination address.

Hardware Reference Manual 71

Intel® IXP2800 Network Processor

Technical Description

For PCI to DRAM transfers, the PCI command is Memory Read, Memory Read line, or Memory Read Multiple. For DRAM to PCI transfers, the PCI command is Memory Write. Memory Write Invalidate is not supported.

Up to two DMA channels are running at a time with three descriptors outstanding. Effectively, the active channels interleave bursts to or from the PCI Bus.

Interrupts are generated at the end of DMA operation for the Intel XScale Microengines do not provide an interrupt mechanism. The DMA Channel will instead use an Event Signal to notify the particular Microengine on completion of DMA.

2.9.3.1 DMA Descriptor

Each descriptor uses four 32-bit words in SRAM, aligned on a 16-byte boundary. The DMA channels read the descriptors from SRAM into working registers once the control register has been set to initiate the transaction. This control must be set explicitly; this starts the DMA transfer. Register names for DMA channels are listed in Figure 15 and Table 17 lists the descriptor contents.

Figure 15. DMA Descriptor Reads

Local SRAM

Last Descriptor

Prior Descriptor

Next Descriptor

Current Descriptor

Working Register

DMA Channel Register

Byte Count Register PCI Address Register DRAM Address REgister Descriptor Pointer Register

Channel Register Name

(X can be 1, 2, or 3) CHAN_X_BYTE_COUNT CHAN_X_PCI_ADDR CHAN_X_DRAM_ADDR CHAN_X_DESC_PTR

Control Register

DMA Channel Register

Control Register CHAN_X_CONTROL

Channel Register Name

(X can be 1, 2, or 3)

core. However,

A9368-01

After a descriptor is processed, the next descriptor is loaded in the working registers. This process repeats until the chain of descriptors is terminated (i.e., the End of Chain bit is set).

Table 17. DMA Descriptor Format

Offset from Descriptor Pointer Description

0x0 Byte Count

0x4 PCI Address

0x8 DRAM Address

0xC Next Descriptor Address

72 Hardware Reference Manual

2.9.3.2 DMA Channel Operation

The DMA channel can be set up to read the first descriptor in SRAM, or with the first descriptor written directly to the DMA channel registers. When descriptors and the descriptor list are in SRAM, the procedure is as follows:

1. The DMA channel owner writes the address of the first descriptor into the DMA Channel Descriptor Pointer register (DESC_PTR).

2. The DMA channel owner writes the DMA Channel Control register (CONTROL) with miscellaneous control information and also sets the channel enable bit (bit 0). The channel initial descriptor bit (bit 4) in the CONTROL register must also be cleared to indicate that the first descriptor is in SRAM.

3. Depending on the DMA channel number, the DMA channel reads the descriptor block into the corresponding DMA registers, BYTE_COUNT, PCI_ADDR, DRAM_ADDR, and DESC_PTR.

4. The DMA channel transfers the data until the byte count is exhausted, and then sets the channel transfer done bit in the CONTROL register.

5. If the end of chain bit (bit 31) in the BYTE_COUNT register is clear, the channel checks the Chain Pointer value. If the Chain Pointer value is not equal to 0. it reads the next descriptor and transfers the data (step 3 and 4 above). If the Chain Pointer value is equal to 0, it waits for the Descriptor Added bit of the Channel Control register to be set before reading the next descriptor and transfers the data (step 3 and 4 above). If bit 31 is set, the channel sets the channel chain done bit in the CONTROL register and then stops.

6. Proceed to the Channel End Operation.

Intel® IXP2800 Network Processor

Technical Description

When single descriptors are written into the DMA channel registers, the procedure is as follows:

1. The DMA channel owner writes the descriptor values directly into the DMA channel registers. The end of chain bit (bit 31) in the BYTE_COUNT register must be set, and the value in the DESC_PTR register is not used.

2. The DMA channel owner writes the base address of the DMA transfer into the PCI_ADDR to specify the PCI starting address.

3. When the first descriptor is in the BYTE_COUNT register, the DRAM_ADDR register must be written with the address of the data to be moved.

4. The DMA channel owner writes the CONTROL register with miscellaneous control information, along with setting the channel enable bit (bit 0). The channel initial descriptor in register bit (bit 4) in the CONTROL register must also be set to indicate that the first descriptor is already in the channel descriptor registers.

5. The DMA channel transfers the data until the byte count is exhausted, and then sets the channel transfer done bit (bit 2) in the CONTROL register.

6. Since the end of the chain bit (bit 31) in the BYTE_CONT register is set, the channel sets the channel chain done bit (bit 7) in the CONTROL register and then stops.

7. Proceed to the Channel End Operation.

Hardware Reference Manual 73

Intel® IXP2800 Network Processor

Technical Description

2.9.3.3 DMA Channel End Operation

1. Channel owned by PCI: If not masked via the PCI Outbound Interrupt Mask register, the DMA channel interrupts the

PCI host after the setting of the DMA done bit in the CHAN_X_CONTROL register, which is readable in the PCI Outbound Interrupt Status register.

2. Channel owned by the Intel XScale If enabled via the Intel XScale

the Intel XScale

core by setting the DMA channel done bit in the CHAN_X_CONTROL

core:

core Interrupt Enable registers, the DMA channel interrupts

core Interrupt Status register.

3. Channel owned by Microengine: If enabled via the Microengine Auto-Push Enable registers, the DMA channel signals the

Microengine after setting the DMA channel done bit in the CHAN_X_CONTROL register, which is readable in the Microengine Auto-Push Status register.

2.9.3.4 Adding Descriptors to an Unterminated Chain

It is possible to add a descriptor to a chain while a channel is running. To do so, the chain should be left unterminated, i.e., the last descriptor should have End of Chain clear, and the Chain Pointer value equal to 0. A new descriptor (or linked list of descriptors) can be added to the chain by overwriting the Chain Pointer value of the unterminated descriptor (in SRAM) with the Local Memory address of the (first) added descriptor (the added descriptor must actually be valid in Local Memory prior to that). After updating the Chain Pointer field, the software must write a 1 to the Descriptor Added bit of the Channel Control register. This is necessary for the case where the channel was paused to reactivate the channel. However, software need not check the state of the channel before writing that bit; there is no side-effect of writing that bit in the case where the channel had not yet read the unlinked descriptor.

If the channel was paused or had read an unlinked Pointer, it will re-read the last descriptor processed (i.e., the one that originally had the 0 value for Chain Pointer) to get the address of the newly added descriptor.

A descriptor cannot be added to a descriptor that has End of Chain set.

2.9.4 Mailbox and Message Registers

Mailbox and Doorbell registers provide hardware support for communication between the Intel

XScale

Four 32-bit mailbox registers are provided so that messages can be passed between the Intel XScale both the Intel XScale messages are not used internally by the PCI Unit in any way. The mailbox registers are often used with the Doorbell interrupts.

Doorbell interrupts provide an efficient method of generating an interrupt as well as encoding the purpose of the interrupt. The PCI Unit supports a 32-bit the Intel XScale register that is used by a PCI device to generate an the Intel XScale 32-bit PCI DOORBELL register that is used by the Intel XScale A source generating the Doorbell interrupt can write a software defined bitmap to the register to indicate a specific purpose. This bitmap is translated into a single interrupt signal to the destination

core and a device on the PCI Bus.

core and a PCI device. All four registers can be read and written with byte resolution from

core and PCI. How the registers are used is application dependent and the

core DOORBELL

core interrupt, and a separate

core to generate a PCI interrupt.

74 Hardware Reference Manual

(either a PCI interrupt or an Intel XScale® core interrupt). When an interrupt is received, the DOORBELL registers can be read and the bit mask can be interpreted. If a larger bit mask is required than that is provided by the DOORBELL register, the MAILBOX registers can be used to pass up to 16 bytes of data.

The doorbell interrupts are controlled through the registers shown in Table 18.

Table 18. Doorbell Interrupt Registers

XSCALE DOORBELL Used to generate the Intel XScale

XSCALE DOORBELL SETUP

PCI DOORBELL Used to generate the PCI Doorbell interrupts.

PCI DOORBELL SETUP Used to initialize the PCI Doorbell register and for diagnostics.

2.9.5 PCI Arbiter

The PCI unit contains a PCI bus arbiter that supports two external masters in addition to the PCI Unit’s initiator interface. If more than two external masters are used in the system, the aribter can be disabled and an external (to the IXP2800 Network Processor used. In that case, the IXP2800 Network Processor will provide its PCI request signal to the external aribter, and use that arbiters grant signal.

Used to initialize the Intel XScale

Intel® IXP2800 Network Processor

Technical Description

core Doorbell interrupts.

core Doorbell register and for diagnostics.

The arbiter uses a simple round-robin priority algorithm; it asserts the grant signal corresponding to the next request in the round-robin during the current executing transaction on the PCI bus (this is also called hidden arbitration). If the arbiter detects that an initiator has failed to assert frame_l after 16 cycles of both grant assertion and PCI bus idle condition, the arbiter deasserts the grant. That master does not receive any more grants until it deasserts its request for at least one PCI clock cycle. Bus parking is implemented in that the last bus grant will stay asserted if no request is pending.

To prevent bus contention, if the PCI bus is idle, the arbiter never asserts one grant signal in the same PCI cycle in which it deasserts another, It deasserts one grant, and then asserts the next grant after one full PCI clock cycle has elapsed to provide for bus driver turnaround.

Hardware Reference Manual 75

Intel® IXP2800 Network Processor

Technical Description

2.10 Control and Status Register Access Proxy

The Control and Status Register Access Proxy (CAP) contains a number of chip-wide control and status registers. Some provide miscellaneous control and status, while others are used for interMicroengine or Microengine to the Intel XScale Scratchpad Memory and SRAM can also be used for inter-process communication). These include:

• INTERTHREAD SIGNAL — Each thread (or context) on a Microengine can send a signal to

any other thread by writing to InterThread_Signal register. This allows a thread to go to sleep waiting completion of a task by a different thread.

• THREAD MESSAGE — Each thread has a message register where it can post a software-

specific message. Other Microengine threads, or the Intel XScale availability of messages by reading theTHREAD_MESSAGE_SUMMARY register. Both the THREAD_MESSAGE and corresponding THREAD_MESSAGE_SUMMARY clear upon a read of the message; this eliminates a race condition when there are multiple message readers. Only one reader will get the message.

• SELF DESTRUCT — This register provides another type of communication. Microengine

software can atomically set individual bits in the SELF_DESTRUCT registers; the registers clear upon read. The meaning of each bit is software-specific. Clearing the register upon read eliminates a race condition when there are multiple readers.

• THREAD INTERRUPT — Each thread can interrupt the Intel XScale

interrupts; the usage is software-specific. Having two interrupts allows for flexibility, for example, one can be assigned to normal service requests and one can be assigned to error conditions. If more information needs to be associated with the interrupt, mailboxes or Rings in Scratchpad Memory or SRAM could be used.

• REFLECTOR — CAP provides a function (called “reflector”) where any Microengine thread

can move data between its registers and those of any other thread. In response to a single write or read instruction (with the address in the specific reflector range) CAP will get data from the source Microengine and put it into the destination Microengine. Both the sending and receiving threads can optionally be signaled upon completion of the data movement.

core communication (note that rings in

core, can poll for

core on two different

2.11 Intel XScale® Core Peripherals

2.11.1 Interrupt Controller

The Interrupt Controller provides the ability to enable or mask interrupts from a number of chip wide sources, for example:

• Timers (normally used by Real-Time Operating System).

• Interrupts generated by Microengine software to request services from the Intel XScale

• External agents such as PCI devices.

• Error conditions, such as DRAM ECC error, or SPI-4 parity error.

Interrupt status is read as memory mapped registers; the state of an interrupt signal can be read even if it is masked from interrupting. Enabling and masking of interrupts is done as writes to memory mapped registers.

76 Hardware Reference Manual

core.

Intel® IXP2800 Network Processor

Technical Description

2.11.2 Timers

The IXP2800 Network Processor contains four programmable 32-bit timers, which can be used for software support. Each timer can be clocked by the internal clock, by a divided version of the clock, or by a signal on an external GPIO pin. Each timer can be programmed to generate a periodic interrupt after a programmed number of clocks. The range is from several ns to several minutes depending on the clock frequency.

In addition, timer 4 can be used as a watchdog timer. In this use, software must periodically reload the timer value; if it fails to do so and the timer counts to 0, it will reset the chip. This can be used to detect if software “hangs” or for some other reason fails to reload the timer.

2.11.3 General Purpose I/O

The IXP2800 Network Processor contains eight General Purpose I/O (GPIO) pins. These can be programmed as either input or output and can be used for slow speed I/O such as LEDs or input switches. They can also be used as interrupts to the Intel XScale programmable timers.

core, or to clock the

2.11.4 Universal Asynchronous Receiver/Transmitter

The IXP2800 Network Processor contains a standard RS-232 compatible Universal Asynchronous Receiver/Transmitter (UART), which can be used for communication with a debugger or maintenance console. Modem controls are not supported; if they are needed, GPIO pins can be used for that purpose.

The UART performs serial-to-parallel conversion on data characters received from a peripheral device and parallel-to-serial conversion on data characters received from the processor. The processor can read the complete status of the UART at any time during operation. Available status information includes the type and condition of the transfer operations being performed by the UART and any error conditions (parity, overrun, framing or break interrupt).

The serial ports can operate in either FIFO or non-FIFO mode. In FIFO mode, a 64-byte transmit FIFO holds data from the processor to be transmitted on the serial link and a 64-byte receive FIFO buffers data from the serial link until read by the processor.

The UART includes a programmable baud rate generator that is capable of dividing the internal clock input by divisors of 1 to 2 logic. It also drives the receive logic. The UART can be operated in polled or in interrupt driven mode as selected by software.

2.11.5 Slowport

The Slowport is an external interface to the IXP2800 Network Processor, used for Flash ROM access and 8, 16, or 32-bit asynchronous device access. It allows the Intel XScale write data transfers to these slave devices.

- 1 and produces a 16X clock to drive the internal transmitter

core do read/

The address bus and data bus are multiplexed to reduce the pin count. In addition, 24 bits of address are shifted out on three clock cycles. Therefore, an external set of buffers is needed to latch the address. Two chip selects are provided.

Hardware Reference Manual 77

Intel® IXP2800 Network Processor

Technical Description

The access is asynchronous. Insertion of delay cycles for both data setup and hold time is programmable via internal Control registers. The transfer can also wait for a handshake acknowledge signal from the external device.

2.12 I/O Latency

Table 19 shows the latencies for transferring data between the Microengine and the other sub-

system components. The latency is measured in 1.4 GHz cycles.

Table 19. I/O Latency

Sub-system

DRAM

(RDR)

Transfer Size

Average Read

Latency

Average Write

Latency

Note1: RDR, QDR, MSF, and Scratch values are extracted from a simulation model. Note 2: Minimum DRAM burst size on pins is 16 bytes. Transfers less than 16 bytes incur the same as a

16-byte transfer. Note 3: At 1016 MHz, read latency should be ~ 240 cycles.

8 bytes – 16 bytes

(note 2)

~ 295 cycles

(note 3)

~ 53 cycles ~ 53 cycles ~ 40 cycles

2.13 Performance Monitor

The Intel XScale® core hardware provides two 32-bit performance counters that allow two unique events to be monitored simultaneously. In addition, the Intel XScale clock counter that can be used in conjunction with the performance counters; its sole purpose is to count the number of core clock cycles, which is useful in measuring total execution time.

SRAM (QDR)

4 bytes 4 bytes 8 bytes

100 (light load) – 160 (heavy load)

Scratch MSF

~ 100 cycles

(range 53 – 152)

core implements a 32-bit

range 53 – 120

(RBUF)

~ 48 cycles

(TBUF)

78 Hardware Reference Manual

Intel® IXP2800 Network Processor

Intel XScale

Core

Intel XScale

This section contains information describing the Intel XScale® core, Intel XScale® core gasket, and Intel XScale

For additional information about the Intel XScale Developers Manual available on Intel’s Developers web site (http://www.developer.intel.com).

3.1 Introduction

The Intel XScale® core is an ARM* V5TE compliant microprocessor. It has been designed for high performance and low-power; leading the industry in mW/MIPs. The Intel XScale incorporates an extensive list of architecture features that allows it to achieve high performance. Many of the architectural features added to the Intel XScale often is a serious impediment to high performance processors.

This includes:

• The ability to continue instruction execution even while the data cache is retrieving data from

external memory.

• A write buffer.

• Write-back caching.

• Various data cache allocation policies that can be configured different for each application.

core Peripherals (XPI).

Core 3

architecture refer to the Intel XScale® Core

core

core help hide memory latency that

• Cache locking.

All these features improve the efficiency of the memory bus external to the core.

ARM* Version 5 (V5) Architecture added floating point instructions to ARM* Version 4. The Intel

XScale provide hardware support of the floating point instructions.

The Intel XScale extensions.

core implements the integer instruction set architecture of ARM* V5, but does not

core provides the Thumb instruction set (ARM* V5T) and the ARM* V5E DSP

Hardware Reference Manual 79

Intel® IXP2800 Network Processor

Intel XScale

Core

3.2 Features

Figure 16 shows the major functional blocks of the Intel XScale® core.

Figure 16. Intel XScale

Core Architecture Features

Instruction Cache

32 Kbytes 32 ways Lockable by line

Branch Target Buffer

128 entries

IMMU

32 entry TLB Fully associative Lockable by entry

Data Cache

Performance

Monitoring

Power Management

Debug

Hardware Breakpoint Branch History Table

Idle Drowsy Sleep

3.2.1 Multiply/ACcumulate (MAC)

The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a throughput of a MAC operation every cycle. Architectural enhancements to the MAC support audio coding algorithms, including a 40-bit accumulator and support for 16-bit packed data.

Max 32 Kbytes 32 ways wr-back or

wr-through Hit under

miss

DMMU

32 entry TLB Fully associative Lockable by entry

MAC

Single Cycle Throughput (16*32)

16-bit SIMD 40-bit Accumulator

Data RAM

Max 28 Kbytes Re-map of

data cache

Mini-Data Cache

2 Kbytes 2 ways

Fill Buffer

4 - 8 entries

Write Buffer

8 entries Full coalescing

JTAG

A9642-01

3.2.2 Memory Management

The Intel XScale® core implements the Memory Management Unit (MMU) Architecture specified in the ARM* Architecture Reference Manual (see the ARM* website at http://www.arm.com). The MMU provides access protection and virtual to physical address translation. The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:

• identifying code as cacheable or non-cacheable

• selecting between the mini-data cache or data cache

• write-back or write-through data caching

• enabling data write allocation policy

• and enabling the write buffer to coalesce stores to external memory

80 Hardware Reference Manual

3.2.3 Instruction Cache

The Intel XScale® core implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided.

3.2.4 Branch Target Buffer (BTB)

The Intel XScale® core provides a Branch Target Buffer to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.

The BTB holds 128 entries.

3.2.5 Data Cache

The Intel XScale® core implements a 32-Kbyte, a 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache. Each cache has a line size of 32 bytes, and supports writethrough or write-back caching.

Intel® IXP2800 Network Processor

Intel XScale

Core

The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15. The Intel XScale cache as data RAM. Software may place special tables or frequently used variables in this RAM.

core allows applications to reconfigure a portion of the data

3.2.6 Performance Monitoring

Two performance monitoring counters have been added to the Intel XScale® core that can be configured to monitor various events. These events allow a software developer to measure cache efficiency, detect system bottlenecks, and reduce the overall latency of programs.

3.2.7 Power Management

The Intel XScale® core incorporates a power and clock management unit that can assist in controlling clocking and managing power.

3.2.8 Debugging

The Intel XScale® core supports software debugging through two instruction address breakpoint registers, one data-address breakpoint register, one data-address/mask breakpoint register, and a trace buffer.

3.2.9 JTAG

Testability is supported on the Intel XScale® core through the Test Access Port (TAP) Controller implementation, which is based on IEEE 1149.1 (JTAG) Standard Test Access Port and BoundaryScan Architecture. The purpose of the TAP controller is to support test logic internal and external to the Intel XScale

Hardware Reference Manual 81

core such as built-in self-test, boundary-scan, and scan.

Intel® IXP2800 Network Processor

Intel XScale

Core

3.3 Memory Management

The Intel XScale® core implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. To accelerate virtual to physical address translation, the Intel XScale TLB to cache the latest translations. Each TLB holds 32 entries and is fully-associative. Not only do the TLBs contain the translated addresses, but also the access rights for memory references.

If an instruction or data TLB miss occurs, a hardware translation-table-walking mechanism is invoked to translate the virtual address to a physical address. Once translated, the physical address is placed in the TLB along with the access rights and attributes of the page or section. These translations can also be locked down in either TLB to guarantee the performance of critical routines.

The Intel XScale memory:

• cacheable

• bufferable

• line allocate policy

• write policy

core uses both an instruction Translation Look-aside Buffer (TLB) and a data

core allows system software to associate various attributes with regions of

• I/O

• mini Data Cache

• Coalescing

• P bit

Note: The virtual address with which the TLBs are accessed may be remapped by the PID register.

3.3.1 Architecture Model

3.3.1.1 Version 4 versus Version 5

ARM* MMU Version 5 Architecture introduces the support of tiny pages, which are 1 Kbyte in size. The reserved field in the first-level descriptor (encoding 0b11) is used as the fine page table base address.

3.3.1.2 Memory Attributes

The attributes associated with a particular region of memory are configured in the memory management page table and control the behavior of accesses to the instruction cache, data cache, mini-data cache and the write buffer. These attributes are ignored when the MMU is disabled.

To allow compatibility with older system software, the new Intel XScale advantage of encoding space in the descriptors that was formerly reserved.

core attributes take

3.3.1.2.1 Page (P) Attribute Bit

The P bit assigns a page attribute to a memory region. Refer to the Intel Network Processor Programmer’s Reference Manual for details about the P bit.

82 Hardware Reference Manual

IXP2400 and IXP2800

3.3.1.2.2 Instruction Cache

When examining these bits in a descriptor, the Instruction Cache only utilizes the C bit. If the C bit is clear, the Instruction Cache considers a code fetch from that memory to be non-cacheable, and will not fill a cache entry. If the C bit is set, then fetches from the associated memory region will be cached.

3.3.1.2.3 Data Cache and Write Buffer

All of these descriptor bits affect the behavior of the Data Cache and the Write Buffer.

If the X bit for a descriptor is 0 (see Table 20), the C and B bits operate as mandated by the ARM* architecture. If the X bit for a descriptor is one, the C and B bits’ meaning is extended, as detailed in Table 21.

Table 20. Data Cache and Buffer Behavior when X = 0

Intel® IXP2800 Network Processor

Intel XScale

Core

C B Cacheable? Bufferable? Write Policy

0 0 N N — — Stall until complete

0 1 N Y — —

1 0 Y Y Write Through Read Allocate

1 1 Y Y Write Back Read Allocate

1. Normally, the processor will continue executing after a data access if no dependency on that access is encountered. With this setting, the processor will stall execution until the data access completes. This guarantees to software that the data access has taken effect by the time execution of the data access instruction completes. External data aborts from such accesses will be imprecise.

Table 21. Data Cache and Buffer Behavior when X = 1

C B Cacheable? Bufferable? Write Policy

0 0 — — — — Unpredictable; do not use

0 1 N Y — —

1 0

1 1 Y Y Write Back

1. Normally, bufferable writes can coalesce with previously buffered data in the same address range

(Mini Data

Cache)

———

Line

Allocation

Policy

Line

Allocation

Policy

Read/Write

Allocate

Notes

Writes will not coalesce into

buffers

Cache policy is determined by MD field of Auxiliary Control register

3.3.1.2.4 Details on Data Cache and Write Buffer Behavior

If the MMU is disabled all data accesses will be non-cacheable and non-bufferable. This is the same behavior as when the MMU is enabled, and a data access uses a descriptor with X, C, and B all set to 0.

The X, C, and B bits determine when the processor should place new data into the Data Cache. The cache places data into the cache in lines (also called blocks). Thus, the basis for making a decision about placing new data into the cache is a called a “Line Allocation Policy.”

Hardware Reference Manual 83

Intel® IXP2800 Network Processor

Intel XScale

Core

If the Line Allocation Policy is read-allocate, all load operations that miss the cache request a 32-byte cache line from external memory and allocate it into either the data cache or mini-data cache (this is assuming the cache is enabled). Store operations that miss the cache will not cause a line to be allocated.

If read/write-allocate is in effect, load or store operations that miss the cache will request a 32-byte cache line from external memory if the cache is enabled.

The other policy determined by the X, C, and B bits is the Write Policy. A write-through policy instructs the Data Cache to keep external memory coherent by performing stores to both external memory and the cache. A write-back policy only updates external memory when a line in the cache is cleaned or needs to be replaced with a new line. Generally, write-back provides higher performance because it generates less data traffic to external memory.

3.3.1.2.5 Memory Operation Ordering

A fence memory operation (memop) is one that guarantees all memops issued prior to the fence will execute before any memop issued after the fence. Thus software may issue a fence to impose a partial ordering on memory accesses.

Table 22 shows the circumstances in which memops act as fences.

Any swap (SWP or SWPB) to a page that would create a fence on a load or store is a fence.

Table 22. Memory Operations that Impose a Fence

operation X C B

load — 0 —

store101

load or store 0 0 0

3.3.2 Exceptions

The MMU may generate prefetch aborts for instruction accesses and data aborts for data memory accesses.

Data address alignment checking is enabled by setting bit 1 of the Control register (CP15, register 1). Alignment faults are still reported even if the MMU is disabled. All other MMU exceptions are disabled when the MMU is disabled.

84 Hardware Reference Manual

Intel® IXP2800 Network Processor

Intel XScale

Core

3.3.3 Interaction of the MMU, Instruction Cache, and Data Cache

The MMU, instruction cache, and data/mini-data cache may be enabled/disabled independently. The instruction cache can be enabled with the MMU enabled or disabled. However, the data cache can only be enabled when the MMU is enabled. Therefore only three of the four combinations of the MMU and data/mini-data cache enables are valid (see Table 23). The invalid combination will cause undefined results.

Table 23. Valid MMU and Data/Mini-Data Cache Combinations

MMU Data/Mini-data Cache

Off Off

On Off

On On

3.3.4 Control

3.3.4.1 Invalidate (Flush) Operation

The entire instruction and data TLB can be invalidated at the same time with one command or they can be invalidated separately. An individual entry in the data or instruction TLB can also be invalidated.

Globally invalidating a TLB will not affect locked TLB entries. However, the invalidate-entry operations can invalidate individual locked entries. In this case, the locked remains in the TLB, but will never “hit” on an address translation. Effectively, a hole is in the TLB. This situation may be rectified by unlocking the TLB.

3.3.4.2 Enabling/Disabling

The MMU is enabled by setting bit 0 in coprocessor 15, register 1 (Control register). When the MMU is disabled, accesses to the instruction cache default to cacheable and all accesses to data memory are made non-cacheable. A recommended code sequence for enabling the MMU is shown in Example 14.

Example 14. Enabling the MMU

; This routine provides software with a predictable way of enabling the MMU. ; After the CPWAIT, the MMU is guaranteed to be enabled. Be aware ; that the MMU will be enabled sometime after MCR and before the instruction ; that executes after the CPWAIT. ; Programming Note: This code sequence requires a one-to-one virtual to ; physical address mapping on this code since ; the MMU may be enabled part way through. This would allow the instructions ; after MCR to execute properly regardless the state of the MMU.

MRC P15,0,R0,C1,C0,0; Read CP15, register 1 ORR R0, R0, #0x1; Turn on the MMU MCR P15,0,R0,C1,C0,0; Write to CP15, register 1

; The MMU is guaranteed to be enabled at this point; the next instruction or ; data address will be translated.

Hardware Reference Manual 85

Intel® IXP2800 Network Processor

Intel XScale

Core

3.3.4.3 Locking Entries

Individual entries can be locked into the instruction and data TLBs. If a lock operation finds the virtual address translation already resident in the TLB, the results are unpredictable. An invalidate by entry command before the lock command will ensure proper operation. Software can also accomplish this by invalidating all entries, as shown in Example 15.

Locking entries into either the instruction TLB or data TLB reduces the available number of entries (by the number that was locked down) for hardware to cache other virtual to physical address translations.

A procedure for locking entries into the instruction TLB is shown in Example 15.

If a MMU abort is generated during an instruction or data TLB lock operation, the Fault Status register is updated to indicate a Lock Abort, and the exception is reported as a data abort.

Example 15. Locking Entries into the Instruction TLB

; R1, R2 and R3 contain the virtual addresses to translate and lock into ; the instruction TLB.

; The value in R0 is ignored in the following instruction. ; Hardware guarantees that accesses to CP15 occur in program order

MCR P15,0,R0,C8,C5,0 ; Invalidate the entire instruction TLB

MCR P15,0,R1,C10,C4,0 ; Translate virtual address (R1) and lock into

; instruction TLB

MCR P15,0,R2,C10,C4,0 ; Translate

; virtual address (R2) and lock into instruction TLB

MCR P15,0,R3,C10,C4,0 ; Translate virtual address (R3) and lock into

; instruction TLB

CPWAIT

; The MMU is guaranteed to be updated at this point; the next instruction will ; see the locked instruction TLB entries.

Note: If exceptions are allowed to occur in the middle of this routine, the TLB may end up caching a

translation that is about to be locked. For example, if R1 is the virtual address of an interrupt service routine and that interrupt occurs immediately after the TLB has been invalidated, the lock operation will be ignored when the interrupt service routine returns back to this code sequence. Software should disable interrupts (FIQ or IRQ) in this case.

As a general rule, software should avoid locking in all other exception types.

86 Hardware Reference Manual

Intel® IXP2800 Network Processor

Intel XScale

The proper procedure for locking entries into the data TLB is shown in Example 16.

Example 16. Locking Entries into the Data TLB

; R1, and R2 contain the virtual addresses to translate and lock into the data TLB

MCR P15,0,R1,C8,C6,1 ; Invalidate the data TLB entry specified by the

; virtual address in R1

MCR P15,0,R1,C10,C8,0 ; Translate virtual address (R1) and lock into

; data TLB

; Repeat sequence for virtual address in R2 MCR P15,0,R2,C8,C6,1 ; Invalidate the data TLB entry specified by the

; virtual address in R2

MCR P15,0,R2,C10,C8,0 ; Translate virtual address (R2) and lock into

; data TLB

CPWAIT ; wait for locks to complete

; The MMU is guaranteed to be updated at this point; the next instruction will ; see the locked data TLB entries.

Note: Care must be exercised here when allowing exceptions to occur during this routine whose handlers

may have data that lies in a page that is trying to be locked into the TLB.

Core

3.3.4.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the TLBs is round-robin; there is a round-robin pointer that keeps track of the next entry to replace. The next entry to replace is the one sequentially after the last entry that was written. For example, if the last virtual to physical address translation was written into entry 5, the next entry to replace is entry 6.

At reset, the round-robin pointer is set to entry 31. Once a translation is written into entry 31, the round-robin pointer gets set to the next available entry, beginning with entry 0 if no entries have been locked down. Subsequent translations move the round-robin pointer to the next sequential entry until entry 31 is reached, where it will wrap back to entry 0 upon the next translation.

A lock pointer is used for locking entries into the TLB and is set to entry 0 at reset. A TLB lock operation places the specified translation at the entry designated by the lock pointer, moves the lock pointer to the next sequential entry, and resets the round-robin pointer to entry 31. Locking entries into either TLB effectively reduces the available entries for updating. For example, if the first three entries were locked down, the round-robin pointer would be entry 3 after it rolled over from entry 31.

Only entries 0 through 30 can be locked in either TLB; entry 31can never be locked. If the lock pointer is at entry 31, a lock operation will update the TLB entry with the translation and ignore the lock. In this case, the round-robin pointer will stay at entry 31.

Hardware Reference Manual 87

Intel® IXP2800 Network Processor

Intel XScale

Core

Figure 17 illustrates locked entries in TLB.

Figure 17. Example of Locked Entries in TLB

entry 0 entry 1

Note: 8 entries locked, 24 entries available for round robin replacement

3.4 Instruction Cache

The Intel XScale® core instruction cache enhances performance by reducing the number of instruction fetches from external memory. The cache provides fast execution of cached code. Code can also be locked down when guaranteed or fast access time is required.

Figure 18 shows the cache organization and how the instruction address is used to access the cache.

The instruction cache is a 32-Kbyte, 32-way set associative cache; this means there are 32 sets with each set containing 32 ways. Each way of a set contains eight 32-bit words and one valid bit, which is referred to as a line. The replacement policy is a round-robin algorithm and the cache also supports the ability to lock code in at a line granularity.

entry 7 entry 8

entry 22 entry 23

entry 30 entry 31

Locked

A9684-01

88 Hardware Reference Manual

Figure 18. Instruction Cache Organization

Set Index

Set 0

way 0

This example

shows Set 0 being

selected by the

Set Index

Tag

Word Select

way 1

way 31

Set 1

way 0 way 1

Intel® IXP2800 Network Processor

Set 31

way 0 way 1

8 Words (cache line)

DataCAM

8 Words (cache line)

DataCAM

Intel XScale

Core

Instruction Word

(4 bytes)

Instruction Address (Virtual)

31 54 21010 9

Tag

Note: CAM = Content Addressable Memory

The instruction cache is virtually addressed and virtually tagged. The virtual address presented to the instruction cache may be remapped by the PID register.

3.4.1 Instruction Cache Operation

3.4.1.1 Operation when Instruction Cache is Enabled

When the cache is enabled, it compares every instruction request address to the addresses of instructions that it is holding in cache. If the requested instruction is found, the access “hits” the cache, which returns the requested instruction. If the instruction is not found, the access “misses” the cache, which requests a fetch from external memory of the 8-word line (32 bytes) that contains the instruction (using the fetch policy). As the fetch returns instructions to the cache, they are put in one of two fetch buffers and the requested instruction is delivered to the instruction decoder. A fetched line is written into the cache if it is cacheable (code is cacheable if the MMU is disabled or if the MMU is enabled and the cacheable (C) bit is set to 1 in its corresponding page).

Set Index

Word

A9685-01

Note: An instruction fetch may “miss” the cache but “hit” one of the fetch buffers. If this happens, the

requested instruction is delivered to the instruction decoder in the same manner as a cache “hit.”

Hardware Reference Manual 89

Intel® IXP2800 Network Processor

Intel XScale

Core

3.4.1.2 Operation when Instruction Cache is Disabled

Disabling the cache prevents any lines from being written into the instruction cache. Although the cache is disabled, it is still accessed and may generate a “hit” if the data is already in the cache.

Disabling the instruction cache does not disable instruction buffering that may occur within the instruction fetch buffers. Two 8-word instruction fetch buffers will always be enabled in the cache disabled mode. As instruction fetches continue to “hit” within either buffer (even in the presence of forward and backward branches), no external fetches for instructions are generated. A miss causes one or the other buffer to be filled from external memory using the fill policy.

3.4.1.3 Fetch Policy

An instruction-cache “miss” occurs when the requested instruction is not found in the instruction fetch buffers or instruction cache; a fetch request is then made to external memory. The instruction cache can handle up to two “misses.” Each external fetch request uses a fetch buffer that holds 32-bytes and eight valid bits, one for each word. A miss causes the following:

1. A fetch buffer is allocated.

2. The instruction cache sends a fetch request to the external bus. This request is for a 32-byte line.

3. Instructions words are returned back from the external bus, at a maximum rate of 1 word per core cycle. As each word returns, the corresponding valid bit is set for the word in the fetch buffer.

4. As soon as the fetch buffer receives the requested instruction, it forwards the instruction to the instruction decoder for execution.

5. When all words have returned, the fetched line will be written into the instruction cache if it is cacheable and if the instruction cache is enabled. The line chosen for update in the cache is controlled by the round-robin replacement algorithm. This update may evict a valid line at that location.

6. Once the cache is updated, the eight valid bits of the fetch buffer are invalidated.

3.4.1.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the instruction cache is round-robin. Each set in the instruction cache has a round-robin pointer that keeps track of the next line (in that set) to replace. The next line to replace in a set is the one after the last line that was written. For example, if the line for the last external instruction fetch was written into way 5-set 2, the next line to replace for that set would be way 6. None of the other round-robin pointers for the other sets are affected in this case.

After reset, way 31 is pointed to by the round-robin pointer for all the sets. Once a line is written into way 31, the round-robin pointer points to the first available way of a set, beginning with way0 if no lines have been locked into that particular set. Locking lines into the instruction cache effectively reduces the available lines for cache updating. For example, if the first three lines of a set were locked down, the round-robin pointer would point to the line at way 3 after it rolled over from way 31.

90 Hardware Reference Manual

3.4.1.5 Parity Protection

The instruction cache is protected by parity to ensure data integrity. Each instruction cache word has 1 parity bit. (The instruction cache tag is not parity protected.) When a parity error is detected on an instruction cache access, a prefetch abort exception occurs if the Intel XScale to execute the instruction. Before servicing the exception, hardware place a notification of the error in the Fault Status register (Coprocessor 15, register 5).

A software exception handler can recover from an instruction cache parity error. This can be accomplished by invalidating the instruction cache and the branch target buffer and then returning to the instruction that caused the prefetch abort exception. A simplified code example is shown in

Example 17. A more complex handler might choose to invalidate the specific line that caused the

exception and then invalidate the BTB.

Example 17. Recovering from an Instruction Cache Parity Error

; Prefetch abort handler MCR P15,0,R0,C7,C5,0 ; Invalidate the instruction cache and branch target

; buffer

CPWAIT ; wait for effect

;

Intel® IXP2800 Network Processor

Intel XScale

Core

core attempts

SUBS PC,R14,#4 ; Returns to the instruction that generated the

; parity error

; The Instruction Cache is guaranteed to be invalidated at this point

If a parity error occurs on an instruction that is locked in the cache, the software exception handler needs to unlock the instruction cache, invalidate the cache and then re-lock the code in before it returns to the faulting instruction.

3.4.1.6 Instruction Cache Coherency

The instruction cache does not detect modification to program memory by loads, stores or actions of other bus masters. Several situations may require program memory modification, such as uploading code from disk.

The application program is responsible for synchronizing code modification and invalidating the cache. In general, software must ensure that modified code space is not accessed until modification and invalidating are completed.

To achieve cache coherence, instruction cache contents can be invalidated after code modification in external memory is complete.

If the instruction cache is not enabled, or code is being written to a non-cacheable region, software must still invalidate the instruction cache before using the newly-written code. This precaution ensures that state associated with the new code is not buffered elsewhere in the processor, such as the fetch buffers or the BTB.

Naturally, when writing code as data, care must be taken to force it completely out of the processor into external memory before attempting to execute it. If writing into a non-cacheable region, flushing the write buffers is sufficient precaution. If writing to a cacheable region, then the data cache should be submitted to a Clean/Invalidate operation to ensure coherency.

Hardware Reference Manual 91

Intel® IXP2800 Network Processor

Intel XScale

Core

3.4.2 Instruction Cache Control

3.4.2.1 Instruction Cache State at Reset

After reset, the instruction cache is always disabled, unlocked, and invalidated (flushed).

3.4.2.2 Enabling/Disabling

The instruction cache is enabled by setting bit 12 in coprocessor 15, register 1 (Control register). This process is illustrated in Example 18.

Example 18. Enabling the Instruction Cache

; Enable the ICache MRC P15, 0, R0, C1, C0, 0 ; Get the control register ORR R0, R0, #0x1000 ; set bit 12 -- the I bit MCR P15, 0, R0, C1, C0, 0 ; Set the control register

CPWAIT

3.4.2.3 Invalidating the Instruction Cache

The entire instruction cache along with the fetch buffers are invalidated by writing to coprocessor 15, register 7. This command does not unlock any lines that were locked in the instruction cache nor does it invalidate those locked lines. To invalidate the entire cache including locked lines, the unlock instruction cache command needs to be executed before the invalidate command.

There is an inherent delay from the execution of the instruction cache invalidate command to where the next instruction will see the result of the invalidate. The routine in Example 19 can be used to guarantee proper synchronization.

Example 19. Invalidating the Instruction Cache

MCR P15,0,R1,C7,C5,0 ; Invalidate the instruction cache and branch

; target buffer

CPWAIT

; The instruction cache is guaranteed to be invalidated at this point; the next ; instruction sees the result of the invalidate command.

The Intel XScale

core also supports invalidating an individual line from the instruction cache.

3.4.2.4 Locking Instructions in the Instruction Cache

Software has the ability to lock performance critical routines into the instruction cache. Up to 28 lines in each set can be locked; hardware will ignore the lock command if software is trying to lock all the lines in a particular set (i.e., ways 28 – 31can never be locked). When this happens, the line is still allocated into the cache, but the lock will be ignored. The round-robin pointer will stay at way 31 for that set.

Lines can be locked into the instruction cache by initiating a write to coprocessor 15. Register Rd contains the virtual address of the line to be locked into the cache.

92 Hardware Reference Manual

There are several requirements for locking down code:

1. The routine used to lock lines down in the cache must be placed in non-cacheable memory, which means the MMU is enabled. As a corollary: no fetches of cacheable code should occur while locking instructions into the cache.

2. The code being locked into the cache must be cacheable.

3. The instruction cache must be enabled and invalidated prior to locking down lines.

Failure to follow these requirements will produce unpredictable results when accessing the instruction cache.

System programmers should ensure that the code to lock instructions into the cache does not reside closer than 128 bytes to a non-cacheable/cacheable page boundary. If the processor fetches ahead into a cacheable page, then the first requirement noted above could be violated.

Lines are locked into a set starting at way 0 and may progress up to way 27; which set a line gets locked into depends on the set index of the virtual address. Figure 19 is an example of where lines of code may be locked into the cache along with how the round-robin pointer is affected.

Figure 19. Locked Line Effect on Round Robin Replacement

Intel® IXP2800 Network Processor

Intel XScale

Core

way 0 way 1

way 7 way 8

set 0

Locked

set 1

Locked

set 2 set 31

Locked

way 22 way 23

way 30 way 31

Notes:

set 0: 8 ways locked, 24 ways available for round robin replacement set 1: 23 ways locked, 9 ways available for round robin replacement set 2: 28 ways locked, only way 28-31 available for replacement set 31: all 32 ways available for round robin replacement

A9686-01

Software can lock down several different routines located at different memory locations. This may cause some sets to have more locked lines than others as shown in Figure 19.

Hardware Reference Manual 93

Intel® IXP2800 Network Processor

Intel XScale

Core

Example 20 shows how a routine, called “lockMe” in this example, might be locked into the

instruction cache. Note that it is possible to receive an exception while locking code.

Example 20. Locking Code into the Cache

lockMe: ; This is the code that will be locked into the cache

mov r0, #5 add r5, r1, r2

. . .

lockMeEnd:

. . .

codeLock: ; here is the code to lock the “lockMe” routine

ldr r0, =(lockMe AND NOT 31); r0 gets a pointer to the first line we should lock ldr r1, =(lockMeEnd AND NOT 31); r1 contains a pointer to the last line we should lock

lockLoop:

mcr p15, 0, r0, c9, c1, 0; lock next line of code into ICache cmp r0, r1 ; are we done yet? add r0, r0, #32 ; advance pointer to next line bne lockLoop ; if not done, do the next line

3.4.2.5 Unlocking Instructions in the Instruction Cache

The Intel XScale® core provides a global unlock command for the instruction cache. Writing to coprocessor 15, register 9 unlocks all the locked lines in the instruction cache and leaves them valid. These lines then become available for the round-robin replacement algorithm.

3.5 Branch Target Buffer (BTB)

The Intel XScale® core uses dynamic branch prediction to reduce the penalties associated with changing the flow of program execution. The Intel XScale that provides the instruction cache with the target address of branch type instructions. The branch target buffer is implemented as a 128-entry, direct mapped cache.

3.5.1 Branch Target Buffer Operation

The BTB stores the history of branches that have executed along with their targets. Figure 20 shows an entry in the BTB, where the tag is the instruction address of a previously executed branch and the data contains the target address of the previously executed branch along with two bits of history information.

core features a branch target buffer

94 Hardware Reference Manual

Figure 20. BTB Entry

Intel® IXP2800 Network Processor

Intel XScale

Core

Branch Address[31:9,1]

The BTB takes the current instruction address and checks to see if this address is a branch that was previously seen. It uses bits [8:2] of the current address to read out the tag and then compares this tag to bits [31:9,1] of the current instruction address. If the current instruction address matches the tag in the cache and the history bits indicate that this branch is usually taken in the past, the BTB uses the data (target address) as the next instruction address to send to the instruction cache.

Bit[1] of the instruction address is included in the tag comparison to support Thumb execution. This organization means that two consecutive Thumb branch (B) instructions, with instruction address bits[8:2] the same, will contend for the same BTB entry. Thumb also requires 31 bits for the branch target address. In ARM* mode, bit[1] is 0.

The history bits represent four possible prediction states for a branch entry in the BTB. Figure 21 shows these states along with the possible transitions. The initial state for branches stored in the BTB is Weakly-Taken (WT). Every time a branch that exists in the BTB is executed, the history bits are updated to reflect the latest outcome of the branch, either taken or not-taken.

The BTB does not have to be managed explicitly by software; it is disabled by default after reset and is invalidated when the instruction cache is invalidated.

Figure 21. Branch History

TAG

DATA

Target Address[31:1]

History

Bits[1:0]

A9687-01

SN WN WT ST

Notes:

SN: Strongly Not Take ST: Strongly Taken WN: Weakly Not Taken WT: Weakly Taken

A9688-01

3.5.1.1 Reset

After Processor Reset, the BTB is disabled and all entries are invalidated.

Hardware Reference Manual 95

Intel® IXP2800 Network Processor

Intel XScale

Core

3.5.2 Update Policy

A new entry is stored into the BTB when the following conditions are met:

• The branch instruction has executed

• The branch was taken

• The branch is not currently in the BTB

The entry is then marked valid and the history bits are set to WT. If another valid branch exists at the same entry in the BTB, it will be evicted by the new branch.

Once a branch is stored in the BTB, the history bits are updated upon every execution of the branch as shown in Figure 21.

3.5.3 BTB Control

3.5.3.1 Disabling/Enabling

The BTB is always disabled with Reset. Software can enable the BTB through a bit in a coprocessor register.

Before enabling or disabling the BTB, software must invalidate it (described in the following section). This action will ensure correct operation in case stale data is in the BTB. Software should not place any branch instruction between the code that invalidates the BTB and the code that enables/disables it.

3.5.3.2 Invalidation

There are four ways the contents of the BTB can be invalidated.

1. Reset.

2. Software can directly invalidate the BTB via a CP15, register 7 function.

3. The BTB is invalidated when the Process ID register is written.

4. The BTB is invalidated when the instruction cache is invalidated via CP15, register 7 functions.

3.6 Data Cache

The Intel XScale® core data cache enhances performance by reducing the number of data accesses to and from external memory. There are two data cache structures in the Intel XScale Kbyte data cache and a 2-Kbyte mini-data cache. An eight entry write buffer and a four entry fill buffer are also implemented to decouple the Intel XScale memory accesses, which increases overall system performance.

core, a 32-

core instruction execution from external

96 Hardware Reference Manual

3.6.1 Overviews

3.6.1.1 Data Cache Overview

The data cache is a 32-Kbyte, 32-way set associative cache, i.e., there are 32 sets and each set has 32 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist two dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache, the dirty bit associated with it is set. The replacement policy is a round-robin algorithm and the cache also supports the ability to reconfigure each line as data RAM.

Figure 22 shows the cache organization and how the data address is used to access the cache.

Cache policies may be adjusted for particular regions of memory by altering page attribute bits in the MMU descriptor that controls that memory.

The data cache is virtually addressed and virtually tagged. It supports write-back and write-through caching policies. The data cache always allocates a line in the cache when a cacheable read miss occurs and will allocate a line into the cache on a cacheable write miss when write allocate is specified by its page attribute. Page attribute bits determine whether a line gets allocated into the data cache or mini-data cache.

Figure 22. Data Cache Organization

Intel® IXP2800 Network Processor

Intel XScale

Core

Set 31

way 0 way 1

Set Index

Set 0

This example

shows Set 0 being

selected by the

Set Index

Tag

Word Select

Byte Select

Data Address (Virtual)

31 54 21 010 9

way 0 way 1

way 31

Set 1

way 0 way 1

32 bytes (cache line)

Byte Alignment Sign Extension

(4 bytes to Destination Register)

Tag

32 bytes (cache line)

DataCAM

Data Word

32 bytes (cache line)

DataCAM

Set Index

Word

Byte

Note: CAM = Content Addressable Memory

A9689-01

Hardware Reference Manual 97

Intel® IXP2800 Network Processor

Intel XScale

Core

3.6.1.2 Mini-Data Cache Overview

The mini-data cache is a 2-Kbyte, 2-way set associative cache; this means there are 32 sets with each set containing 2 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist 2 dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache, the dirty bit associated with it is set. The replacement policy is a round-robin algorithm.

Figure 23 shows the cache organization and how the data address is used to access the cache.

The mini-data cache is virtually addressed and virtually tagged and supports the same caching policies as the data cache. However, lines cannot be locked into the mini-data cache.

Figure 23. Mini-Data Cache Organization

This example

shows Set 0 being

selected by the

Set Index

Set 0

way 0 way 1

Set 1

way 0 way 1

Set 31

way 0 way 1

32 bytes (cache line)

Tag

Word Select

Byte Select

Data Address (Virtual)

31 54 21 010 9

Tag

Note: CAM = Content Addressable Memory

Byte Alignment Sign Extension

Data Word

(4 bytes to Destination Register)

Set Index

Word

Byte

A9692-01

98 Hardware Reference Manual

Intel® IXP2800 Network Processor

3.6.1.3 Write Buffer and Fill Buffer Overview

The Intel XScale® core employs an eight entry write buffer, each entry containing 16 bytes. Stores to external memory are first placed in the write buffer and subsequently taken out when the bus is available. The write buffer supports the coalescing of multiple store requests to external memory. An incoming store may coalesce with any of the eight entries.

The fill buffer holds the external memory request information for a data cache or mini-data cache fill or non-cacheable read request. Up to four 32-byte read request operations can be outstanding in the fill buffer before the Intel XScale

The fill buffer has been augmented with a four-entry pend buffer that captures data memory requests to outstanding fill operations. Each entry in the pend buffer contains enough data storage to hold one 32-bit word, specifically for store operations. Cacheable load or store operations that hit an entry in the fill buffer get placed in the pend buffer and are completed when the associated fill completes. Any entry in the pend buffer can be pended against any of the entries in the fill buffer; multiple entries in the pend buffer can be pended against a single entry in the fill buffer. Pended operations complete in program order.

core needs to stall.

3.6.2 Data Cache and Mini-Data Cache Operation

Intel XScale

Core

The following discussions refer to the data cache and mini-data cache as one cache (data/minidata) since their behavior is the same when accessed.

3.6.2.1 Operation when Caching is Enabled

When the data/mini-data cache is enabled for an access, the data/mini-data cache compares the address of the request against the addresses of data that it is currently holding. If the line containing the address of the request is resident in the cache, the access “hits’ the cache. For a load operation the cache returns the requested data to the destination register and for a store operation the data is stored into the cache. The data associated with the store may also be written to external memory if write-through caching is specified for that area of memory. If the cache does not contain the requested data, the access ‘misses’ the cache, and the sequence of events that follows depends on the configuration of the cache, the configuration of the MMU and the page attributes.

3.6.2.2 Operation when Data Caching is Disabled

The data/mini-data cache is still accessed even though it is disabled. If a load hits the cache it will return the requested data to the destination register. If a store hits the cache, the data is written into the cache. Any access that misses the cache will not allocate a line in the cache when it’s disabled, even if the MMU is enabled and the memory region’s cacheability attribute is set.

Hardware Reference Manual 99

Intel® IXP2800 Network Processor

Intel XScale

Core

3.6.2.3 Cache Policies

3.6.2.3.1 Cacheability

Data at a specified address is cacheable given the following:

• The MMU is enabled

• The cacheable attribute is set in the descriptor for the accessed address

• The data/mini-data cache is enabled

3.6.2.3.2 Read Miss Policy

The following sequence of events occurs when a cacheable load operation misses the cache:

1. The fill buffer is checked to see if an outstanding fill request already exists for that line.

— If so, the current request is placed in the pending buffer and waits until the previously

requested fill completes, after which it accesses the cache again, to obtain the request data and returns it to the destination register.

— If there is no outstanding fill request for that line, the current load request is placed in the

fill buffer and a 32-byte external memory read request is made. If the pending buffer or fill buffer is full, the Intel XScale

core will stall until an entry is available.

2. A line is allocated in the cache to receive the 32 bytes of fill data. The line selected is determined by the round-robin pointer (see Section 3.6.2.4). The line chosen may contain a valid line previously allocated in the cache. In this case both dirty bits are examined and if set, the four words associated with a dirty bit that’s asserted will be written back to external memory as a 4-word burst operation.

3. When the data requested by the load is returned from external memory, it is immediately sent to the destination register specified by the load. A system that returns the requested data back first, with respect to the other bytes of the line, will obtain the best performance.

4. As data returns from external memory, it is written into the cache in the previously allocated line.

A load operation that misses the cache and is not cacheable makes a request from external memory for the exact data size of the original load request. For example, LDRH requests exactly two bytes from external memory, LDR requests four bytes from external memory, etc. This request is placed in the fill buffer until, the data is returned from external memory, which is then forwarded back to the destination register(s).

100 Hardware Reference Manual

Intel IXP2800 User Manual

Specifications and Main Features

Frequently Asked Questions

User Manual

Contents

Introduction 1

1.1 About This Document

1.2 Related Documentation

1.3 Terminology

Technical Description 2

2.1 Overview

2.2 Intel XScale® Core Microarchitecture

2.2.1 ARM* Compatibility

2.2.2 Features

2.2.2.1 Multiply/Accumulate (MAC)

2.2.2.2 Memory Management

2.2.2.3 Instruction Cache

2.2.2.4 Branch Target Buffer

2.2.2.5 Data Cache

2.2.2.6 Interrupt Controller

2.2.2.7 Address Map

2.3 Microengines

2.3.1 Microengine Bus Arrangement

2.3.2 Control Store

2.3.3 Contexts

2.3.4 Datapath Registers

2.3.4.1 General-Purpose Registers (GPRs)

2.3.4.2 Transfer Registers

2.3.4.3 Next Neighbor Registers

2.3.4.4 Local Memory

2.3.5 Addressing Modes

2.3.5.1 Context-Relative Addressing Mode

2.3.5.2 Absolute Addressing Mode

2.3.5.3 Indexed Addressing Mode

2.3.6 Local CSRs

2.3.7 Execution Datapath

2.3.7.1 Byte Align

2.3.7.2 CAM

2.3.8 CRC Unit

2.3.9 Event Signals

2.4 DRAM

2.4.1 Size Configuration

2.4.2 Read and Write Access

2.5 SRAM

2.5.1 QDR Clocking Scheme

2.5.2 SRAM Controller Configurations

2.5.3 SRAM Atomic Operations

2.5.4 Queue Data Structure Commands

2.5.5 Reference Ordering

2.5.5.1 Reference Order Tables

2.5.5.2 Microengine Software Restrictions to Maintain Ordering

2.6 Scratchpad Memory

2.6.1 Scratchpad Atomic Operations

2.6.2 Ring Commands

2.7 Media and Switch Fabric Interface

2.7.1 SPI-4

2.7.2 CSIX

2.7.3 Receive

2.7.3.1 RBUF

2.7.3.1.1 SPI-4 and the RBUF

2.7.3.1.2 CSIX and RBUF

2.7.3.2 Full Element List

2.7.3.3 RX_THREAD_FREELIST

2.7.3.4 Receive Operation Summary

2.7.4 Transmit

2.7.4.1 TBUF

2.7.4.1.1 SPI-4 and TBUF

2.7.4.1.2 CSIX and TBUF

2.7.4.2 Transmit Operation Summary

2.7.5 The Flow Control Interface

2.7.5.1 SPI-4

2.7.5.2 CSIX

2.8 Hash Unit

2.9 PCI Controller

2.9.1 Target Access

2.9.2 Master Access

2.9.3 DMA Channels

2.9.3.1 DMA Descriptor

2.9.3.2 DMA Channel Operation

2.9.3.3 DMA Channel End Operation