Intel® IXP42X Product Line of
Network Processors and IXC1100
Control Plane Processor
Developer’s Manual
September 2006
Order Number: 252480-006US
Legal Lines and Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS
OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING
TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE,
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for
use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the
presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel
or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
The Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor may contain design defects or errors known as errata which
may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature may be obtained by calling
1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
BunnyPeople, Celeron, Chips, Dialogic, EtherExpress, ETOX, FlashFile, i386, i486, i960, iCOMP, InstantIP, Intel, Intel Centrino, Intel Centrino logo, Intel
logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel
SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Xeon, Intel XScale, IPLink, Itanium, MCS, MMX, MMX logo, Optimizer logo, OverDrive, Paragon,
PDCharm, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, Sound Mark, The Computer Inside., The Journey Inside, VTune,
and Xircom are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
12. Updated Table 123, Configuration Register 0 Description, bit values
13. Added Section 8.9.9.1, User-Configurable Field
14. Updated Section 8.10, Expansion Bus Controller Performance
15. Updated Table 126, Address Map for the APB, peripheral descriptions
16. Updated Section 12, GPIO, description
17. Updated Table 143, GPIO Interrupt Selections
18. Added Section 15, Ethernet MAC
19. Added Sections 17.5-17.6, High Speed Serial Interface

June 2003, Revision 002: Incorporated specification changes, specification clarifications, and document changes from the Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor Specification Update (252702-001). Incorporated information for the Intel® IXC1100 Control Plane Processor.

February 2003, Revision 001: Initial release of this document. Document reissued, without "Confidential" marking.
1.0 Introduction

1.1 About This Document

This document is the main reference for the external architecture of the Intel® IXP42X
Product Line of Network Processors and IXC1100 Control Plane Processor.

1.1.1 How to Read This Document

Familiarity with the ARM* Version 5TE Architecture is necessary in order to understand
some aspects of this document.

Each chapter in this document focuses on a specific architectural feature of the Intel®
IXP42X product line and IXC1100 control plane processors.

Note: This document's special terms and acronyms are listed in "Terminology and
Conventions" on page 26.
1.2 Other Relevant Documents

Document Title / Document #:
• Intel® IXP4XX Product Line Specification Update: 306428
• Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor Datasheet
• Intel® IXP400 Software Specification Update: 273795
• Intel® IXP400 Software Programmer's Guide: 252539
• ARM* Architecture Version 5TE Specification
• PCI Local Bus Specification, Rev. 2.2: N/A
• Universal Serial Bus Specification, Revision 1.1: N/A
• UTOPIA Level 2 Specification, Revision 1.0: N/A
All numbers in this document can be assumed to be base 10 unless designated
otherwise. In text and pseudo code descriptions, hexadecimal numbers have a prefix of
0x and binary numbers have a prefix of 0b. For example, 107 would be represented as
0x6B in hexadecimal and 0b1101011 in binary.
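The stated equivalence can be checked mechanically. The following C sketch (illustrative only, not part of the manual) parses the digit strings of the worked example in each base; the 0x and 0b prefixes are notation only and are stripped before parsing:

```c
#include <stdlib.h>

/* Decimal 107 equals 0x6B in hexadecimal and 0b1101011 in binary.
 * strtol parses each digit string in the corresponding base. */
static long parse_hex(const char *digits) { return strtol(digits, NULL, 16); }
static long parse_bin(const char *digits) { return strtol(digits, NULL, 2); }
```

Both `parse_hex("6B")` and `parse_bin("1101011")` return 107.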
Table 1. Acronyms and Terminology

Acronym/Terminology: Description

Clean: An operation that updates external memory with the contents of the specified line in the data/mini-data cache if any of the dirty bits are set and the line is valid. There are two dirty bits associated with each line in the cache, so only the portion that is dirty will get written back to external memory. After this operation, the line is still valid and both dirty bits are deasserted.
Coalescing: Bringing together a new store operation with an existing store operation already resident in the write buffer. The new store is placed in the same write-buffer entry as an existing store when the address of the new store falls in the four-word, aligned address of the existing entry. This includes, in PCI terminology, write merging, write collapsing, and write combining.
Deassert: The logically inactive value of a signal or bit.
DMA: Direct Memory Access
DSP: Digital Signal Processor
E1: Euro 1 trunk line
FIFO: First In, First Out
Flush: An operation that invalidates the location(s) in the cache by de-asserting the valid bit. Individual entries (lines) may be flushed, or the entire cache may be flushed with one command. Once an entry is flushed from the cache, it can no longer be used by the program.
GCI: General Circuit Interface
GPIO: General-Purpose Input/Output
G.SHDSL: ITU G-series specification for Single-Pair HDSL
HDLC: High-Level Data Link Control
HDSL: High-Bit-Rate Digital Subscriber Line
HDSL2: High-Bit-Rate Digital Subscriber Line, Version 2
HEC: Header-Error Correction
HPI: (Texas Instruments) Host Port Interface
HSS: High-Speed Serial (port)
ISDN: Integrated Services Digital Network
IOM: ISDN-Oriented Modular
LFSR: Linear Feedback Shift Register
LSb: Least-Significant bit
LSB: Least-Significant Byte
LUT: Look-Up Table
MAC: Media Access Controller
MDIO: Management Data Input/Output
MIB: Management Information Base
MII: Media-Independent Interface
MMU: Memory Management Unit
MSb: Most-Significant bit
MSB: Most-Significant Byte
MVIP: Multi-Vendor Integration Protocol
NPE: Network Processor Engine
NRZI: Non-Return-to-Zero Inverted
PCI: Peripheral Component Interconnect
PEC: Programmable Event Counters
PHY: Physical Layer (Layer 1) Interface
Reserved: A field that may be used by an implementation. Software should not modify reserved fields or depend on any values in reserved fields.
RX: Receive (HSS is receiving from off-chip)
SFD: Start-of-Frame Delimiter
SRAM: Static Random Access Memory
SDRAM: Synchronous Dynamic Random Access Memory
T1: Type 1 trunk line
TDM: Time-Division Multiplex
TLB: Translation Look-Aside Buffer
TX: Transmit (HSS is transmitting off-chip)
UART: Universal Asynchronous Receiver-Transmitter
WAN: Wide Area Network
2.0 Overview of Product Line
The Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane
Processor contain an ARM* V5TE-compliant microprocessor referred to as the Intel
XScale® Processor. The Intel® IXP42X product line and IXC1100 control plane
processors are designed with Intel 0.18-micron production semiconductor process
technology. This process technology, along with the compactness of the Intel XScale
processor, simultaneous processing by three integrated Network Processing Engines,
and numerous dedicated-function peripheral interfaces, enables the IXP42X product
line and IXC1100 control plane processors to address a wide range of low-cost
networking applications with industry-leading performance.
As indicated in Figure 1 through Figure 5, the IXP42X product line and IXC1100 control
plane processors combine many features with the Intel XScale processor to create a
highly integrated processor applicable to LAN/WAN based networking applications. The
IXP42X product line and IXC1100 control plane processors provide two MII interfaces;
a UTOPIA Level-2 interface; a USB v1.1 device controller with embedded transceiver;
a 32-bit, 33/66-MHz PCI bus; a 16-bit expansion bus; a 32-bit, 133-MHz SDRAM
interface; two UARTs; two High-Speed Serial interfaces; and 16 GPIOs.

Unless otherwise specified, the functional descriptions apply to all of the IXP42X
product line and IXC1100 control plane processors. Refer to the table, "Processor
Features", in the Intel® IXP42X Product Line of Network Processors and IXC1100
Control Plane Processor Datasheet for an overview feature matrix that includes
software enables for all supported processors.
The Intel XScale® Processor incorporates an extensive list of architecture features that
allow it to achieve high performance. This rich feature set allows programmers to
select the features that obtain the best performance for their application.
Many of the architectural features added to the Intel XScale processor help hide memory
latency, which often is a serious impediment to high-performance processors.

Intel XScale® Processor features include:
• The ability to continue instruction execution even while the data cache is retrieving
data from external memory
• A write buffer
• Write-back caching
• Various data-cache allocation policies that can be configured differently for each
application
• Cache-locking
All these features improve the efficiency of the memory bus external to the IXP42X
product line and IXC1100 control plane processors.
The IXP42X product line and IXC1100 control plane processors have been equipped to
efficiently handle audio processing through the support of 16-bit data types and 16-bit
operations. These audio-coding enhancements center around multiply and accumulate
operations which accelerate many of the audio filter operations.
2.1.1 Intel XScale® Processor Overview

Figure 6 shows the major functional blocks of the Intel XScale processor. This section
gives brief, high-level overviews of these blocks.
Figure 6. Intel XScale® Technology Architecture Features

Figure 6 shows the following blocks and their key attributes:
• Instruction Cache: 32 or 16 Kbytes; 32 ways; lockable by line
• Branch Target Buffer: 128 entries
• Data Cache: 32 or 16 Kbytes; 32 ways; write-back or write-through; hit under miss
• Instruction MMU (IMMU): 32-entry TLB; fully associative; lockable by entry
• Data MMU (DMMU): 32-entry TLB; fully associative; lockable by entry
• Data RAM: 28 or 12 Kbytes; re-map of data cache
• Mini-Data Cache: 2 or 1 Kbytes; 2 ways
• Fill Buffer: 4 - 8 entries
• Write Buffer: 8 entries; full coalescing
• MAC: single-cycle throughput (16*32); 16-bit SIMD; 40-bit accumulator
• Performance Monitoring
• Power Management Control
• Debug: hardware breakpoints; branch history table
• JTAG

Note: The Power Management Control feature was not implemented in the IXP42X product
line and IXC1100 control plane processors.

2.1.1.1 ARM* Compatibility

ARM* Version 5 Architecture added floating-point instructions to ARM Version 4. The
Intel XScale processor implements the integer instruction set architecture of ARM V5
but does not provide hardware support for the floating-point instructions.

The Intel XScale processor provides the Thumb* instruction set (ARM V5T) and the ARM
V5E DSP extensions.
Backward compatibility with ARM products is maintained for user-mode applications.
Operating systems may require modifications to match the specific hardware features
of the IXP42X product line and IXC1100 control plane processors and to take
advantage of added performance enhancements.
2.1.1.2 Multiply/Accumulate (MAC)
The MAC unit supports early termination of multiplies/accumulates in two cycles and
can sustain a throughput of a MAC operation every cycle. Several architectural
enhancements were made to the MAC to support audio coding algorithms, which
include a 40-bit accumulator and support for 16-bit packed data.
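The 40-bit accumulator behavior can be sketched in software. The following C model is illustrative only (it is not a hardware-exact description of the MAC unit): a 16x16 signed multiply is added to an accumulator that is carried in an int64_t and wrapped to a signed 40-bit value:

```c
#include <stdint.h>

/* Illustrative model of one 16x16 multiply-accumulate step feeding a
 * 40-bit accumulator, in the spirit of the ARM V5E enhanced-DSP
 * operations described above. Not a cycle- or flag-accurate model. */
static int64_t mac16(int64_t acc40, int16_t a, int16_t b)
{
    int64_t sum = acc40 + (int64_t)a * (int64_t)b;
    sum &= ((int64_t)1 << 40) - 1;     /* keep the low 40 bits */
    if (sum & ((int64_t)1 << 39))      /* sign-extend from bit 39 */
        sum -= (int64_t)1 << 40;
    return sum;
}
```

An audio filter kernel would call this once per tap; on the real MAC unit such an operation sustains single-cycle throughput.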
2.1.1.3 Memory Management
The Intel XScale processor implements the Memory Management Unit (MMU)
Architecture specified in the ARM Architecture Reference Manual. The MMU provides
access protection and virtual-to-physical address translation.
The MMU Architecture also specifies the caching policies for the instruction cache and
data cache. These policies are specified as page attributes and include:
• Identifying code as cacheable or non-cacheable
• Selecting between the mini-data cache or data cache
• Write-back or write-through data caching
• Enabling data-write allocation policy
• Enabling the write buffer to coalesce stores to external memory
For more details, see Section 3.1, “Memory Management Unit” on page 44.
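As a sketch of where such page attributes live, the following simplified C helper builds an ARM V5 first-level section descriptor (a 1-MB mapping). This is illustrative only: field positions follow the ARM Architecture Reference Manual, and the Intel XScale X-bit extension described later in this manual is omitted.

```c
#include <stdint.h>

/* Builds a simplified ARM V5 first-level *section* descriptor. Only the
 * fields discussed here are shown; illustrative, not a complete driver. */
static uint32_t section_descriptor(uint32_t base_mb, unsigned ap,
                                   unsigned domain, int cacheable,
                                   int bufferable)
{
    uint32_t d = 0;
    d |= (base_mb & 0xFFFu) << 20;        /* section base address [31:20] */
    d |= (ap & 0x3u) << 10;               /* access permissions   [11:10] */
    d |= (domain & 0xFu) << 5;            /* domain               [8:5]   */
    d |= 1u << 4;                         /* should-be-one bit    [4]     */
    d |= (cacheable  ? 1u : 0u) << 3;     /* C: cacheable         [3]     */
    d |= (bufferable ? 1u : 0u) << 2;     /* B: bufferable        [2]     */
    d |= 0x2u;                            /* descriptor type: section     */
    return d;
}
```

Clearing both C and B yields an uncached, unbuffered mapping suitable for I/O regions.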
2.1.1.4 Instruction Cache
The Intel XScale processor comes with a 32-Kbyte instruction cache. The instruction
cache is 32-way set associative and has a line size of 32 bytes. All requests that “miss”
the instruction cache generate a 32-byte read request to external memory. A
mechanism to lock critical code within the cache also is provided.
For more details, see “Instruction Cache” on page 52.
2.1.1.5 Branch Target Buffer

The Intel XScale processor provides a Branch Target Buffer (BTB) to predict the
outcome of branch-type instructions. It provides storage for the target address of
branch-type instructions and predicts the next address to present to the instruction
cache when the current instruction address is that of a branch.

The BTB holds 128 entries. For more details, see "Branch Target Buffer" on page 58.
2.1.1.6 Data Cache

The Intel XScale processor comes with a 32-Kbyte data cache. Besides the main data
cache, a mini-data cache is provided whose size is 1/16th that of the main data cache.
(A 32-Kbyte main data cache has a 2-Kbyte mini-data cache.)

The main data cache is 32-way set associative and the mini-data cache is two-way set
associative. Each cache has a line size of 32 bytes and supports write-through or
write-back caching.

The data/mini-data cache is controlled by page attributes defined in the MMU
Architecture and by coprocessor 15.

For more details, see "Data Cache" on page 60.

The Intel XScale processor allows applications to reconfigure a portion of the data
cache as data RAM. Software may place special tables or frequently used variables in
this RAM. For more information, see "Reconfiguring the Data Cache as Data
RAM" on page 68.
Two performance-monitoring counters have been added to the Intel XScale processor
that can be configured to monitor various events in the Intel XScale processor. These
events allow a software developer to measure cache efficiency, detect system
bottlenecks, and reduce the overall latency of programs.
For more details, see "Performance Monitoring" on page 133 and Section 11.0,
"Internal Bus Performance Monitoring Unit (IBPMU)" on page 372.
2.2 Network Processor Engines (NPE)

The network processor engines are dedicated-function processors integrated into many
of the IXP42X product line and IXC1100 control plane processors to offload processing
functions required by the Intel XScale processor. Table 2 specifies which devices of the
IXP42X product line and IXC1100 control plane processors have which of these
capabilities.
Table 2. Network Processor Functions

(Table 2 indicates, for each device, which of the following capabilities are present:
UTOPIA, HSS, MII 0, MII 1, AES/DES/3DES, SHA-1/MD-5, and Multi-Channel HDLC.
The devices covered are the Intel® IXP425, IXP423, IXP422, IXP421, and IXP420
Network Processors and the Intel® IXC1100 Control Plane Processor.)
The network processor engines are high-performance, hardware-multi-threaded
processors. All instruction code is stored locally with a dedicated instruction-memory
bus. These engines support processing for the dedicated peripherals. The peripherals
supported by the network processor engines are the following interfaces:
• Up to two MII
• UTOPIA Level-2
• Up to two HSS

The combination of hardware multi-threading, local code store, independent
instruction memory, and parallel processing allows the Intel XScale processor to be
used purely for application purposes. This parallel processing of the peripheral-interface
functions allows unsurpassed performance to be achieved by the application
running on the IXP42X product line and IXC1100 control plane processors.

For further information on the network processor engines, see Section 4.0, "Network
Processor Engines (NPE)" on page 202.
2.3 Internal Bus
The internal bus architecture of the Intel XScale processor is designed to allow parallel
processing to occur and isolate bus utilization based on particular traffic patterns. The
bus is segmented into three major buses: the North AHB, the South AHB, and the APB.
The North AHB is a 133.32 MHz, 32-bit bus that can be mastered by the WAN NPE or
both of the Ethernet NPEs. The targets of the North AHB can be the SDRAM or the AHB/
AHB Bridge.
The AHB/AHB Bridge will allow access by the NPEs to the peripherals and internal
targets on the South AHB. Data transfers by the NPEs on the North AHB to the South
AHB are targeted predominately to the queue manager. Transfers to the AHB/AHB
Bridge may be “posted” when writing or “split” when reading. This allows control of the
North AHB to be given to another master on the North AHB and enables the bus to
achieve maximum efficiency.
Transfers to the AHB/AHB Bridge are considered to be small and infrequent relative to
the traffic passed between the NPEs on the North AHB and the SDRAM.
The South AHB is a 133.32-MHz, 32-bit bus that can be mastered by the Intel XScale
processor, the PCI controller DMA engines, the AHB/AHB Bridge, and the AHB/APB Bridge.
The targets of the South AHB can be the SDRAM, PCI interface, Queue Manager, or the
AHB/APB Bridge. Accessing across the AHB/APB Bridge allows interfacing to peripherals
attached to the APB.
The APB is a 66.66-MHz, 32-bit bus that can be mastered only by the AHB/APB Bridge.
The targets of the APB can be the High-Speed UART interface, Console UART interface,
USB v1.1 interface, all NPEs, the Performance Monitoring Unit (PMU), Interrupt
Controller, General-Purpose Input/Output (GPIO), and timers. The APB interface to the
NPEs is used for code download and part configuration.
For more information, see Section 5.0, “Internal Bus” on page 204.
2.4 MII Interfaces

Two industry-standard Media-Independent Interfaces (MII) are integrated into the
IXP42X product line and IXC1100 control plane processors, with separate Media Access
Controllers and Network Processing Engines. This enables parallel processing of data
traffic on the interfaces and offloading of processing overhead required by the Intel
XScale processor.

The IXP42X product line and IXC1100 control plane processors are compliant with the
IEEE 802.3 specification.
2.5 AHB Queue Manager

The AHB Queue Manager (AQM) provides queue functionality for various internal
blocks. It maintains the queues as circular buffers in an embedded 8-Kbyte SRAM. It also
implements the status flags and pointers required for each queue.

The AQM manages 64 independent queues. Each queue is configurable for buffer and
entry size. Additionally, status flags are maintained for each queue.
The AQM interfaces include an Advanced High-performance Bus (AHB) interface to the
NPEs and Intel XScale processor (or any other AHB bus master), a Flag Bus interface,
an event bus (to the NPE condition select logic) and two interrupts to the Intel XScale
processor. The AHB interface is used for configuration of the AQM and provides access
to queues, queue status and SRAM. Individual queue status for queues 0-31 is
communicated to the NPEs via the flag bus. Combined queue status for queues 32-63
is communicated to the NPEs via the event bus. The two interrupts, one for queues
0-31 and one for queues 32-63, provide status interrupts to the Intel XScale processor.
For more information on the AHB Queue Manager, see Section 21.0, “AHB Queue
Manager (AQM)” on page 556.
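The circular-buffer behavior described above can be sketched in software. The model below is illustrative only (it is not the AQM's hardware register interface), and the queue depth is an assumption; real queues are configurable for buffer and entry size:

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative software model of one AQM-style queue: a circular buffer
 * of 32-bit entries with empty/full status flags updated on each access. */
#define QUEUE_DEPTH 16   /* assumed depth for illustration */

struct aqm_queue {
    uint32_t buf[QUEUE_DEPTH];
    unsigned head, tail, count;
    bool     empty, full;        /* per-queue status flags */
};

static void aqm_init(struct aqm_queue *q)
{
    q->head = q->tail = q->count = 0;
    q->empty = true;
    q->full  = false;
}

static bool aqm_write(struct aqm_queue *q, uint32_t entry)
{
    if (q->full)
        return false;
    q->buf[q->tail] = entry;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    q->empty = false;
    q->full  = (q->count == QUEUE_DEPTH);
    return true;
}

static bool aqm_read(struct aqm_queue *q, uint32_t *entry)
{
    if (q->empty)
        return false;
    *entry = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    q->full  = false;
    q->empty = (q->count == 0);
    return true;
}
```

In hardware, the flags corresponding to `empty` and `full` are what the flag bus and event bus carry to the NPEs.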
2.6 UTOPIA 2

The integrated UTOPIA Level-2 interface has a dedicated network-processing engine.
The interface allows a multiple- or single-physical-interface configuration. The network-processing
engine handles segmentation and reassembly of Asynchronous Transfer
Mode (ATM) cells, CRC checking/generation, and the transfer of data to and from
memory. This enables parallel processing of data traffic on the UTOPIA Level-2
interface, offloading processing overhead required by the Intel XScale processor.

The IXP42X product line and IXC1100 control plane processors are compliant with the
ATM Forum UTOPIA Level-2, Revision 1.0 specification.

For more information on the UTOPIA Level-2 interface, see Section 19.0, "UTOPIA
Level-2" on page 538.
2.7 USB v1.1

The integrated USB v1.1 interface is a device-only controller. The interface supports
full-speed operation and 16 endpoints and includes an integrated transceiver. The
endpoints include:
• Six isochronous endpoints (three input and three output)
• Two control endpoints (one input and one output)
• Two interrupt endpoints (one input and one output)
• Six bulk endpoints (three input and three output)
For more information on the USB v1.1 interface, see Section 18.0, “Universal Serial Bus
(USB) v1.1 Device Controller” on page 468.
2.8 PCI

The IXP42X product line and IXC1100 control plane processors' PCI controller is
compatible with the PCI Local Bus Specification, Rev. 2.2. The PCI interface is a
32-bit bus capable of operating as either a host or an option (that is, non-host) device.
For more information on the PCI interface, see Section 6.0, “PCI Controller” on
page 208.
2.9 Memory Controller

The memory controller manages the interface to external SDRAM memory chips. The
interface:
• Operates at 133.32 MHz (four times the OSC_IN input-pin frequency)
• Supports eight open pages simultaneously
• Has two banks to support memory configurations from 8 Mbytes to 256 Mbytes
The memory controller supports only 32-bit memory. If x16 memory chips are used, a
minimum of two chips is required to provide the 32-bit interface required by the
IXP42X product line and IXC1100 control plane processors. A maximum of four SDRAM
memory chips may be attached to the processors.

The memory controller internally interfaces to the North AHB and South AHB through
independent ports. This architecture allows SDRAM transfers to be interleaved
and pipelined to achieve the maximum possible efficiency. The maximum burst size
supported to the SDRAM interface is eight 32-bit words. This burst size gives the best
efficiency/fairness trade-off between accesses from the North and South AHB.
For more information on the memory controller, see Section 7.0, “SDRAM Controller”
on page 276.
2.10 Expansion Bus

The expansion interface allows easy and, in most cases, glue-less connection to
peripheral devices. It also provides input information for device configuration after
reset. Typical peripheral device types are flash, ATM control interfaces, and DSPs
used for voice applications. (Some voice configurations can be supported by the HSS
interfaces and the Intel XScale® Processor implementing voice-compression
algorithms.)

The expansion bus interface is a 16-bit interface that allows an address range of
512 bytes to 16 Mbytes, using 24 address lines, for each of the eight independent chip
selects.

Accesses on the expansion bus interface consist of five phases. Each of the five phases
can be lengthened or shortened by setting various configuration registers on a
per-chip-select basis. This feature allows the IXP42X product line and IXC1100 control
plane processors to connect to a wide variety of peripheral devices with varying speeds.
The expansion bus interface supports Intel or Motorola* microprocessor-style bus
cycles. The bus cycles can be configured to be multiplexed address/data cycles or
separate address/data cycles for each of the eight chip-selects.
Additionally, Chip Selects 4 through 7 can be configured to support Texas Instruments
HPI-8 or HPI-16 style accesses for DSPs.
The expansion bus interface is an asynchronous interface to externally connected
chips. However, a clock must be supplied to the IXP42X product line and IXC1100
control plane processors’ expansion bus interface for the interface to operate. This
clock can be driven from GPIO 15 or an external source. The maximum clock rate that
the expansion bus interface can accept is 66.66 MHz.
At the de-assertion of reset, the 24-bit address bus is used to capture configuration
information from the levels applied to the pins at that time. External pull-up/
pull-down resistors are used to tie the signals to particular logic levels.
For more information on the Expansion Interface, see Section 8.0, “Expansion Bus
Controller” on page 292.
2.11 High-Speed Serial Interfaces

Each High-Speed Serial interface is a six-signal interface that supports serial transfer
speeds from 512 KHz to 8.192 MHz.
For more information on the High-Speed Serial Interfaces, see Section 17.0, “High-
Speed Serial Interfaces” on page 448.
2.12 Universal Asynchronous Receiver Transceiver

The UART interfaces are 16550-compliant UARTs with the exception of the transmit and
receive buffers, which are 64 bytes deep versus the 16 bytes required by the 16550
UART specification.

The interface can be configured to support speeds from 1,200 baud to 921 Kbaud. The
interface supports configurations of:
• Five, six, seven, or eight data-bit transfers
• One or two stop bits
• Even, odd, or no parity
The request-to-send (RTS_N) and clear-to-send (CTS_N) modem control signals also
are available with the interface for hardware flow control.
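The configuration options above map onto the industry-standard 16550 Line Control Register layout. The helper below is an illustrative sketch of that encoding only; consult the UART chapter for the processors' actual register map:

```c
#include <stdint.h>

/* Encodes data bits, stop bits, and parity into a 16550-style LCR value.
 * Illustrative; bit positions follow the standard 16550 register layout. */
enum parity { PARITY_NONE, PARITY_ODD, PARITY_EVEN };

static uint8_t uart_lcr(unsigned data_bits /* 5..8 */,
                        unsigned stop_bits /* 1..2 */,
                        enum parity p)
{
    uint8_t lcr = (uint8_t)((data_bits - 5) & 0x3); /* bits 1:0: word length */
    if (stop_bits == 2)
        lcr |= 1u << 2;                             /* bit 2: extra stop bit */
    if (p != PARITY_NONE) {
        lcr |= 1u << 3;                             /* bit 3: parity enable */
        if (p == PARITY_EVEN)
            lcr |= 1u << 4;                         /* bit 4: even parity */
    }
    return lcr;
}
```

For example, the common 8-data-bit, no-parity, one-stop-bit configuration encodes as 0x03.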
For more information on the UART interfaces, see Section 10.0, “Universal
Asynchronous Receiver Transceiver (UART)” on page 332.
2.13 GPIO

There are 16 GPIO pins supported by the IXP42X product line and IXC1100 control
plane processors. GPIO pins 0 through 13 can be configured as general-purpose
input or general-purpose output. Additionally, GPIO pins 0 through 12 can be
configured as interrupt inputs.

GPIO pin 14 can be configured similarly to GPIO pin 13 or as a clock output. The
output-clock configuration can be set at various speeds, up to 33.33 MHz, with various
duty cycles. GPIO pin 14 is configured as an input upon reset.

GPIO pin 15 can be configured similarly to GPIO pin 13 or as a clock output. The
output-clock configuration can be set at various speeds, up to 33.33 MHz, with various
duty cycles. GPIO pin 15 is configured as a clock output upon reset and can be
used to clock the expansion interface after reset.
For more information on the GPIO pins, see Section 12.0, “General Purpose Input/
Output (GPIO)” on page 386.
2.14 Interrupt Controller

The IXP42X product line and IXC1100 control plane processors provide 32 interrupt
sources as an extension of the Intel XScale processor's FIQ and IRQ interrupt
sources. These sources can originate from external GPIO pins or internal peripheral
interfaces.

The interrupt controller can configure each interrupt source as FIQ, IRQ, or disabled.
The interrupt sources tied to interrupts 0 to 7 can be prioritized. The remaining
interrupts are prioritized in ascending order. (For example, interrupt 8 has a higher
priority than interrupt 9.)
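The ascending-order rule for sources 8-31 amounts to "lowest-numbered pending source wins". The following C sketch (an illustrative model, not the controller's register interface) resolves that priority from a 32-bit pending mask:

```c
#include <stdint.h>

/* Among pending interrupt sources 8-31, returns the lowest-numbered one
 * (8 outranks 9, and so on), or -1 if none is pending. Sources 0-7 have
 * programmable priority and are not modeled here. */
static int highest_pending(uint32_t pending)
{
    pending &= 0xFFFFFF00u;          /* consider only sources 8-31 */
    if (pending == 0)
        return -1;
    int n = 8;
    while (!(pending & (1u << n)))
        n++;
    return n;
}
```

A real dispatcher would read the pending mask from the interrupt controller's status register and index a handler table with the result.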
For more information on the interrupt controller, see Section 13.0, “Interrupt
Controller” on page 398.
2.15 Timers

The IXP42X product line and IXC1100 control plane processors contain four internal
timers operating at 66.66 MHz (twice the OSC_IN input-pin frequency) to allow task
scheduling and to prevent software lock-ups. The device has four 32-bit counters:
• Watch-Dog Timer
• Timestamp Timer
• Two general-purpose timers
For more information on the timers, see Section 14.0, “Timers” on page 408.
2.16 JTAG
Testability is supported on the IXP42X product line and IXC1100 control plane
processors through the Test Access Port (TAP) Controller implementation, which is
based on IEEE 1149.1 (JTAG) Standard Test Access Port and Boundary-Scan
Architecture. The purpose of the TAP controller is to support test logic internal and
external to the IXP42X product line and IXC1100 control plane processors, such as
built-in self test and boundary scan.
For more information on JTAG, see Section 20.0, “JTAG Interface” on page 548.
§ §
3.0 Intel XScale® Processor

This chapter provides functional descriptions of the Intel XScale® Processor.

3.1 Memory Management Unit

This section describes the memory management unit implemented in the Intel® IXP42X
Product Line of Network Processors and IXC1100 Control Plane Processor.
The Intel XScale® Processor implements the Memory Management Unit (MMU)
Architecture specified in the ARM* Architecture Reference Manual. To accelerate
virtual-to-physical address translation, the Intel XScale processor uses both an instruction
Translation Look-Aside Buffer (TLB) and a data TLB to cache the latest translations.
Each TLB holds 32 entries and is fully associative.
Not only do the TLBs contain the translated addresses, but also the access rights for
memory references.
If an instruction or data TLB miss occurs, a hardware translation-table-walking
mechanism is invoked to translate the virtual address to a physical address. Once
translated, the physical address is placed in the TLB along with the access rights and
attributes of the page or section. These translations can also be locked down in either
TLB to guarantee the performance of critical routines.
For more information, refer to “Exceptions” on page 47.
The Intel XScale processor allows system software to associate various attributes with
regions of memory:
• Cacheable
• Bufferable
• Line-allocate policy
• Write policy
• I/O
• Mini data cache
• Coalescing
For a description of page attributes, see “Cacheable (C), Bufferable (B), and eXtension
(X) Bits” on page 45. For information on where these attributes have been mapped in
the MMU descriptors, see “New Page Attributes” on page 152.
Note: The virtual address with which the TLBs are accessed may be remapped by the PID
register. For a description of the PID register, see “Register 13: Process ID” on page 84.
ARM MMU Version 5 Architecture introduces the support of tiny pages, which are
1 Kbyte in size. The reserved field in the first-level descriptor (encoding 0b11) is used
as the fine page table base address. The exact bit fields and the format of the first and
second-level descriptors can be found in “New Page Attributes” on page 152.
The attributes associated with a particular region of memory are configured in the
memory management page table and control the behavior of accesses to the
instruction cache, data cache, mini-data cache, and the write buffer. These attributes
are ignored when the MMU is disabled.
To allow compatibility with older system software, the new Intel XScale processor
attributes take advantage of encoding space in the descriptors that was formerly
reserved.
3.1.1 Memory Attributes
3.1.1.1 Page (P) Attribute Bit
The selection between address coherency and data coherency is controlled by a
software-programmable P-Attribute bit in the Intel XScale processor’s Memory
Management Unit (MMU) and by the BYTE_SWAP_EN bit. The BYTE_SWAP_EN bit is
bit 8 of Expansion-Bus Controller Configuration Register 1 (Table 126). When the
IXP42X product line and IXC1100 control plane processors are reset, this bit resets to 0.
The default endian-conversion method for IXP42X product line and IXC1100 control
plane processors is address coherency. This was selected for backward compatibility
with the IXP425 A0-step device.
The BYTE_SWAP_EN bit is an enable bit that allows data coherency to be performed,
based on the P-Attribute bit.
• When the bit is 0, address coherency is always performed.
• When the bit is 1, the type of coherency performed is dependent on the P-Attribute
bit.
The P-Attribute bit is associated with each 1-Mbyte page. The P-Attribute bit is output
from the Intel XScale processor with any store or load access associated with that
page.
Note: When preparing data for processing by the NPEs (if byte swapping is necessary
for the application), the P-attribute bit should be used to byte-swap the entire memory
map belonging to the NPE region. For instance, when the Intel XScale processor is
operating in little endian mode, all data arriving from the NPE that is to be read by the
Intel XScale processor should be configured to swap all bytes of data. When writing this
data from the Intel XScale processor to memory (with the intention of the NPE using
this data) all bytes should be swapped using the P-attribute. Using the P-attribute bit to
byte swap all of the NPE memory region will ensure compatible software code porting
to future releases of the Intel XScale processor. Using the P-attribute bit to byte-swap
1-Mbyte regions of the NPE memory may not allow compatible software code porting to
a future Intel XScale microarchitecture.
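The two conversion methods can be sketched in C. These helpers are illustrative only (the names are inventions, not from this manual): data coherency swaps the bytes within each 32-bit word, while address coherency leaves the data in place and adjusts the low address bits of byte accesses so the byte lanes line up.

```c
#include <stdint.h>

/* Illustrative sketch (function names are assumptions): the two
 * endian-conversion methods described above. */

/* Data coherency: swap the four bytes of each 32-bit word. */
uint32_t data_coherency_swap(uint32_t word)
{
    return ((word & 0x000000FFu) << 24) |
           ((word & 0x0000FF00u) << 8)  |
           ((word & 0x00FF0000u) >> 8)  |
           ((word & 0xFF000000u) >> 24);
}

/* Address coherency ("address munging") for byte accesses: XOR the low two
 * address bits so byte 0 maps to byte 3, byte 1 to byte 2, and vice versa. */
uint32_t address_coherency_munge(uint32_t byte_addr)
{
    return byte_addr ^ 0x3u;
}
```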
3.1.1.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits
3.1.1.2.1 Instruction Cache
When examining these bits in a descriptor, the Instruction Cache only utilizes the C bit.
If the C bit is clear, the Instruction Cache considers a code fetch from that memory to
be non-cacheable and will not fill a cache entry. If the C bit is set, then fetches from the
associated memory region will be cached.
3.1.1.2.2 Details on Data Cache and Write Buffer Behavior
If the MMU is disabled, all data accesses will be non-cacheable and non-bufferable. This
is the same behavior as when the MMU is enabled and a data access uses a descriptor
with X, C, and B all set to 0.
The X, C, and B bits determine when the processor should place new data into the Data
Cache. The cache places data into the cache in lines (also called blocks). Thus, the
basis for deciding whether to place new data into the cache is called a “Line-Allocation
Policy.”
If the Line-Allocation Policy is read-allocate, all load operations that miss the cache,
request a 32-byte cache line from external memory and allocate it into either the data
cache or mini-data cache. (This statement assumes that the cache is enabled.) Store
operations that miss the cache will not cause a line to be allocated.
If read/write-allocate is in effect, load or store operations that miss the cache will
request a 32-byte cache line from external memory if the cache is enabled.
The other policy determined by the X, C, and B bits is the Write Policy. A write-through
policy instructs the data cache to keep external memory coherent by performing stores
to both external memory and the cache. A write-back policy only updates external
memory when a line in the cache is cleaned or needs to be replaced with a new line.
Generally, write-back provides higher performance because it generates less data
traffic to external memory. For more details on cache policies, see “Cacheability” on
page 63.
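The traffic difference between the two write policies can be seen with a toy model. This is purely illustrative, not a model of the actual cache: it counts external-memory write transactions caused by repeated stores that all hit a single cached line.

```c
/* Toy model (illustrative only, not the processor's actual cache): count
 * external-memory writes for a run of stores that all hit one cached line. */
typedef enum { WRITE_THROUGH, WRITE_BACK } write_policy;

unsigned external_writes(write_policy p, unsigned stores_to_hot_line)
{
    if (p == WRITE_THROUGH)
        return stores_to_hot_line; /* every store also goes to memory */
    /* Write-back: external memory is updated only when the dirty line is
     * cleaned or evicted -- one line write-back regardless of store count. */
    return stores_to_hot_line ? 1 : 0;
}
```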
3.1.1.2.3 Data Cache and Write Buffer
All of these descriptor bits affect the behavior of the Data Cache and the Write Buffer.
If the X bit for a descriptor is zero, the C and B bits operate as mandated by the ARM
architecture, refer to the ARM* Architecture Reference Manual. This behavior is
detailed in Table 3.
If the X bit for a descriptor is one, the C and B bits’ meaning is extended, as detailed in
Table 4.
Table 3. Data Cache and Buffer Behavior When X = 0

C B  Cacheable  Bufferable  Write Policy   Line-Allocation Policy  Notes
0 0  N          N           -              -                       Stall until complete*
0 1  N          Y           -              -                       -
1 0  Y          Y           Write Through  Read Allocate           -
1 1  Y          Y           Write Back     Read Allocate           -

Note: * Normally, the processor will continue executing after a data access if no dependency on that access is
encountered. With this setting, the processor will stall execution until the data access completes. This
guarantees to software that the data access has taken effect by the time execution of the data access
instruction completes. External data aborts from such accesses will be imprecise (but see “Data
Aborts” on page 156 for a method to shield code from this imprecision).
Table 4. Data Cache and Buffer Behavior When X = 1

C B  Cacheable          Bufferable  Write Policy  Line-Allocation Policy  Notes
0 0  -                  -           -             -                       Unpredictable -- do not use
0 1  N                  Y           -             -                       Writes will not coalesce into buffers (Note 1)
1 0  (Mini Data Cache)  -           -             -                       Cache policy is determined by MD field of Auxiliary Control register (Note 2)
1 1  Y                  Y           Write Back    Read/Write Allocate     -

Notes:
1. Normally, bufferable writes can coalesce with previously buffered data in the same address range.
2. See “Register 1: Control and Auxiliary Control Registers” on page 77 for a description of this register.

3.1.1.2.4 Memory Operation Ordering

A fence memory operation (memop) is one that guarantees all memops issued prior to
the fence will execute before any memop issued after the fence. Thus software may
issue a fence to impose a partial ordering on memory accesses.

Table 5 on page 47 shows the circumstances in which memops act as fences.

Any swap (SWP or SWPB) to a page that would create a fence on a load or store is a
fence.

Table 5. Memory Operations that Impose a Fence

Operation      X  C  B
load           -  0  -
store          1  0  1
load or store  0  0  0
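The encodings in Tables 3, 4, and 5 can be summarized in a small decoder. This sketch is illustrative (the enum and function names are inventions); each branch mirrors one row of the tables above.

```c
/* Sketch (enum and function names are invented, not from this manual):
 * decode the X, C, and B page-attribute bits into the data-cache behavior
 * summarized in Tables 3 and 4, and test whether a memop fences (Table 5). */
typedef enum {
    DC_UNCACHED_STALL,       /* X=0 C=0 B=0: uncached, stall until complete */
    DC_UNCACHED_BUFFERED,    /* X=0 C=0 B=1: uncached, buffered */
    DC_WT_READ_ALLOC,        /* X=0 C=1 B=0: write-through, read-allocate */
    DC_WB_READ_ALLOC,        /* X=0 C=1 B=1: write-back, read-allocate */
    DC_UNPREDICTABLE,        /* X=1 C=0 B=0: do not use */
    DC_UNCACHED_NO_COALESCE, /* X=1 C=0 B=1: writes will not coalesce */
    DC_MINI_DCACHE,          /* X=1 C=1 B=0: mini data cache, MD field */
    DC_WB_RW_ALLOC           /* X=1 C=1 B=1: write-back, read/write-allocate */
} dc_policy;

dc_policy dcache_policy(int x, int c, int b)
{
    if (!x)
        return (dc_policy)((c << 1) | b);                  /* Table 3 rows */
    return (dc_policy)(DC_UNPREDICTABLE + ((c << 1) | b)); /* Table 4 rows */
}

/* Table 5: a load to any C=0 page fences; a store fences only for
 * X=1,C=0,B=1 or X=0,C=0,B=0. */
int imposes_fence(int x, int c, int b, int is_store)
{
    if (c)
        return 0;
    if (!is_store)
        return 1;
    return (x == 1 && b == 1) || (x == 0 && b == 0);
}
```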
3.1.1.2.5 Exceptions
The MMU may generate prefetch aborts for instruction accesses and data aborts for
data memory accesses. The types and priorities of these exceptions are described in
“Event Architecture” on page 154.
Data address alignment checking is enabled by setting bit 1 of the Control Register
(CP15, register 1). Alignment faults are still reported even if the MMU is disabled. All
other MMU exceptions are disabled when the MMU is disabled.
3.1.2 Interaction of the MMU, Instruction Cache, and Data Cache
The MMU, instruction cache, and data/mini-data cache may be enabled/disabled
independently. The instruction cache can be enabled with the MMU enabled or disabled.
However, the data cache can only be enabled when the MMU is enabled. Therefore only
three of the four combinations of the MMU and data/mini-data cache enables are valid.
The invalid combination will cause undefined results.
Table 6. Valid MMU and Data/Mini-Data Cache Combinations

MMU  Data/Mini-Data Cache
Off  Off
On   Off
On   On
3.1.3 MMU Control
3.1.3.1 Invalidate (Flush) Operation
The entire instruction and data TLB can be invalidated at the same time with one
command or they can be invalidated separately. An individual entry in the data or
instruction TLB can also be invalidated. See Table 19, “TLB Functions” on page 82 for a
listing of commands supported by the Intel XScale processor.
Globally invalidating a TLB will not affect locked TLB entries. However, the invalidate-entry
operations can invalidate individual locked entries. In this case, the locked
contents remain in the TLB but will never “hit” on an address translation, effectively
creating a hole in the TLB. This situation may be rectified by unlocking the TLB.
3.1.3.2 Enabling/Disabling
The MMU is enabled by setting bit 0 in coprocessor 15, register 1 (Control Register).
When the MMU is disabled, accesses to the instruction cache default to cacheable
accesses and all accesses to data memory are made non-cacheable.
A recommended code sequence for enabling the MMU is shown in Example 1 on
page 49.
Example 1. Enabling the MMU
; This routine provides software with a predictable way of enabling the MMU.
; After the CPWAIT, the MMU is guaranteed to be enabled. Be aware
; that the MMU will be enabled sometime after MCR and before the instruction
; that executes after the CPWAIT.
; Programming Note: This code sequence requires a one-to-one virtual-to-physical
; address mapping for this code, since the MMU may be enabled part way through.
; This allows the instructions after the MCR to execute properly regardless of
; the state of the MMU.
MRC P15,0,R0,C1,C0,0; Read CP15, register 1
ORR R0, R0, #0x1; Turn on the MMU
MCR P15,0,R0,C1,C0,0; Write to CP15, register 1
; For a description of CPWAIT, see
; “Additions to CP15 Functionality” on page 153
CPWAIT
; The MMU is guaranteed to be enabled at this point; the next instruction or
; data address will be translated.
3.1.3.3 Locking Entries
Individual entries can be locked into the instruction and data TLBs. See Table 20,
“Cache Lock-Down Functions” on page 83 for the exact commands. If a lock operation
finds the virtual address translation already resident in the TLB, the results are
unpredictable. An invalidate by entry command before the lock command will ensure
proper operation. Software can also accomplish this by invalidating all entries, as
shown in Example 2 on page 50.
Locking entries into either the instruction TLB or data TLB reduces the available number
of entries (by the number that was locked down) for hardware to cache other virtual to
physical address translations.
A procedure for locking entries into the instruction TLB is shown in Example 2 on
page 50.
If a MMU abort is generated during an instruction or data TLB lock operation, the Fault
Status Register is updated to indicate a Lock Abort (see “Data Aborts” on page 156),
and the exception is reported as a data abort.
Example 2. Locking Entries into the Instruction TLB
; R1, R2 and R3 contain the virtual addresses to translate and lock into
; the instruction TLB.
; The value in R0 is ignored in the following instruction.
; Hardware guarantees that accesses to CP15 occur in program order
MCR P15,0,R0,C8,C5,0; Invalidate the entire instruction TLB
MCR P15,0,R1,C10,C4,0; Translate virtual address (R1) and lock into
; instruction TLB
MCR P15,0,R2,C10,C4,0; Translate virtual address (R2) and lock into
; instruction TLB
MCR P15,0,R3,C10,C4,0; Translate virtual address (R3) and lock into
; instruction TLB
CPWAIT
; The MMU is guaranteed to be updated at this point; the next instruction will
; see the locked instruction TLB entries.
Note: If exceptions are allowed to occur in the middle of this routine, the TLB may end up
caching a translation that is about to be locked. For example, if R1 is the virtual
address of an interrupt service routine and that interrupt occurs immediately after the
TLB has been invalidated, the lock operation will be ignored when the interrupt service
routine returns back to this code sequence. Software should disable interrupts (FIQ or
IRQ) in this case.
As a general rule, software should avoid locking in all other exception types.
The proper procedure for locking entries into the data TLB is shown in Example 3 on
page 51.
Example 3. Locking Entries into the Data TLB
; R1, and R2 contain the virtual addresses to translate and lock into the data TLB
MCR P15,0,R1,C8,C6,1; Invalidate the data TLB entry specified by the
; virtual address in R1
MCR P15,0,R1,C10,C8,0; Translate virtual address (R1) and lock into
; data TLB
; Repeat sequence for virtual address in R2
MCR P15,0,R2,C8,C6,1; Invalidate the data TLB entry specified by the
; virtual address in R2
MCR P15,0,R2,C10,C8,0; Translate virtual address (R2) and lock into
; data TLB
CPWAIT; wait for locks to complete
; The MMU is guaranteed to be updated at this point; the next instruction will
; see the locked data TLB entries.
Note: Care must be exercised when allowing exceptions to occur during this routine,
because an exception handler’s data may lie in a page that is in the process of being
locked into the TLB.
3.1.3.4 Round-Robin Replacement Algorithm
The line replacement algorithm for the TLBs is round-robin; there is a round-robin
pointer that keeps track of the next entry to replace. The next entry to replace is the
one sequentially after the last entry that was written. For example, if the last virtual to
physical address translation was written into entry 5, the next entry to replace is
entry 6.
At reset, the round-robin pointer is set to entry 31. Once a translation is written into
entry 31, the round-robin pointer gets set to the next available entry, beginning with
entry 0 if no entries have been locked down. Subsequent translations move the
round-robin pointer to the next sequential entry until entry 31 is reached, where it will
wrap back to entry 0 upon the next translation.
A lock pointer is used for locking entries into the TLB and is set to entry 0 at reset. A
TLB lock operation places the specified translation at the entry designated by the lock
pointer, moves the lock pointer to the next sequential entry, and resets the round-robin
pointer to entry 31. Locking entries into either TLB effectively reduces the available
entries for updating. For example, if the first three entries were locked down, the
round-robin pointer would be entry 3 after it rolled over from entry 31.
Only entries 0 through 30 can be locked in either TLB; entry 31 can never be locked. If
the lock pointer is at entry 31, a lock operation will update the TLB entry with the
translation and ignore the lock. In this case, the round-robin pointer will stay at
entry 31.
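The pointer behavior described above can be modeled in a few lines of C. This toy model is illustrative only (the names are inventions): it tracks just the two pointers, not the translations themselves.

```c
/* Toy model (names invented, not from this manual) of the 32-entry TLB
 * replacement scheme described above: a round-robin pointer for ordinary
 * fills and a lock pointer for locked translations. */
typedef struct {
    int rr;   /* round-robin pointer: next entry to replace */
    int lock; /* lock pointer: entry the next lock operation will use */
} tlb_state;

void tlb_reset(tlb_state *t) { t->rr = 31; t->lock = 0; }

/* Ordinary fill: replace the round-robin entry, then advance the pointer,
 * wrapping from entry 31 back to the first unlocked entry. */
int tlb_fill(tlb_state *t)
{
    int used = t->rr;
    t->rr = (t->rr == 31) ? t->lock : t->rr + 1;
    return used;
}

/* Lock operation: place the translation at the lock pointer, advance it,
 * and reset the round-robin pointer to entry 31. Entry 31 can never be
 * locked; a lock aimed at entry 31 degenerates into a plain fill there. */
int tlb_lock(tlb_state *t)
{
    if (t->lock == 31)
        return 31; /* lock ignored; round-robin pointer stays at 31 */
    int used = t->lock;
    t->lock++;
    t->rr = 31;
    return used;
}
```

For example, after three locks the round-robin pointer rolls over from entry 31 to entry 3, matching the text above.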
Figure 7. Example of Locked Entries in TLB: entries 0 through 7 locked (eight entries),
leaving 24 entries (8 through 31) available for round-robin replacement.
3.2 Instruction Cache
The Intel XScale processor instruction cache enhances performance by reducing the
number of instruction fetches from external memory. The cache provides fast execution
of cached code. Code can also be locked down when guaranteed or fast access time is
required.
Figure 8 shows the cache organization and how the instruction address is used to
access the cache.
The instruction cache is a 32-Kbyte, 32-way set-associative cache. Each set is
1,024 bytes in size and contains 32 ways. Each way of a set contains eight 32-bit
words and one valid bit, which together are referred to as a line. The replacement policy is a
round-robin algorithm and the cache also supports the ability to lock code in at a line
granularity.
The instruction cache is virtually addressed and virtually tagged.
Note: The virtual address presented to the instruction cache may be remapped by the PID
register. For a description of the PID register, see “Register 13: Process ID” on page 84.
3.2.1 Operation When Instruction Cache is Enabled
When the cache is enabled, it compares every instruction request address against the
addresses of instructions that it is currently holding. If the cache contains the
requested instruction, the access “hits” the cache, and the cache returns the requested
instruction. If the cache does not contain the requested instruction, the access “misses”
the cache. The cache requests an eight-word, also known as a line, fetch from external
memory that contains the requested instruction using the fetch policy described in
“Instruction-Cache ‘Miss’” on page 53. As the fetch returns instructions to the cache,
the instructions are placed in one of two fetch buffers and the requested instruction is
delivered to the instruction decoder.
A fetched line will be written into the cache if it is cacheable. Code is designated as
cacheable when the Memory Management Unit (MMU) is disabled, or when the MMU is
enabled and the cacheable (C) bit is set to 1 in its corresponding page. See “Memory
Management Unit” on page 44 for a discussion on page attributes.
Note that an instruction fetch may “miss” the cache but “hit” one of the fetch buffers.
When this happens, the requested instruction will be delivered to the instruction
decoder in the same manner as a cache “hit.”
Figure 8. Instruction Cache Organization (32-Kbyte cache example). The 32-bit virtual
instruction address is split into a tag (bits 31:10), a set index (bits 9:5) that selects
one of the 32 sets, and a word select (bits 4:2) that picks one of the eight words in the
selected cache line. Each set contains 32 ways; each way holds a CAM (Content
Addressable Memory) tag entry and an 8-word cache line of data.
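Given the address split in Figure 8, extracting the cache-index fields is simple bit manipulation. These helpers are an illustrative sketch (the names are inventions), assuming the 32-Kbyte, 32-set, 32-byte-line organization described above.

```c
#include <stdint.h>

/* Sketch (names are assumptions): split a virtual instruction address into
 * the tag, set-index, and word-select fields shown in Figure 8 for the
 * 32-Kbyte, 32-set, 32-byte-line instruction cache. */
uint32_t icache_tag(uint32_t va)  { return va >> 10; }         /* bits 31:10 */
uint32_t icache_set(uint32_t va)  { return (va >> 5) & 0x1F; } /* bits 9:5  */
uint32_t icache_word(uint32_t va) { return (va >> 2) & 0x7; }  /* bits 4:2  */
```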
Disabling the cache prevents any lines from being written into the instruction cache.
Although the cache is disabled, it is still accessed and may generate a “hit” if the data is
already in the cache.
Disabling the instruction cache does not disable instruction buffering that may occur
within the instruction fetch buffers. Two 8-word instruction fetch buffers will always be
enabled in the cache disabled mode. So long as instruction fetches continue to “hit”
within either buffer (even in the presence of forward and backward branches), no
external fetches for instructions are generated. A miss causes one or the other buffer to
be filled from external memory using the fill policy described in “Instruction-Cache
‘Miss’” on page 53.
3.2.1.1 Instruction-Cache ‘Miss’
An instruction-cache “miss” occurs when the requested instruction is not found in the
instruction fetch buffers or instruction cache; a fetch request is then made to external
memory. The instruction cache can handle up to two “misses.” Each external fetch
request uses a fetch buffer that holds 32-bytes and eight valid bits, one for each word.
A miss causes the following:
• A fetch buffer is allocated
• The instruction cache sends a fetch request to the external bus. This request is for
a 32-byte line.
• Instruction words are returned from the external bus at a maximum rate of
one word per core cycle. The instruction cache can have the eight words of data
return in any order, which allows the requested instruction to be returned
first, thus reducing fetch latency. (This is referred to as critical word
first.) As each word returns, the corresponding valid bit is set for the word in the
fetch buffer.
• As soon as the fetch buffer receives the requested instruction, it forwards the
instruction to the instruction decoder for execution.
• When all words have returned, the fetched line will be written into the instruction
cache if it’s cacheable and if the instruction cache is enabled. The line chosen for
update in the cache is controlled by the round-robin replacement algorithm. This
update may evict a valid line at that location. For more information on enabling or
disabling the instruction cache, refer to “Instruction-Cache Coherence” on page 55.
• Once the cache is updated, the eight valid bits of the fetch buffer are invalidated.
The line replacement algorithm for the instruction cache is round-robin. Each set in the
instruction cache has a round-robin pointer that keeps track of the next line (in that
set) to replace. The next line to replace in a set is the one after the last line that was
written. For example, if the line for the last external instruction fetch was written into
way 5, set 2, the next line to replace for that set would be way 6. None of the other
round-robin pointers for the other sets are affected in this case.
After reset, way 31 is pointed to by the round-robin pointer for all the sets. Once a line
is written into way 31, the round-robin pointer points to the first available way of a set,
beginning with way 0 if no lines have been locked into that particular set. Locking lines
into the instruction cache effectively reduces the available lines for cache updating. For
example, if the first three lines of a set were locked down, the round-robin pointer
would point to the line at way 3 after it rolled over from way 31. For more details on
cache locking, see “Instruction-Cache Coherence” on page 55.
The instruction cache is protected by parity to ensure data integrity. Each instruction
cache word has 1 parity bit. (The instruction cache tag is NOT parity protected.) When
a parity error is detected on an instruction cache access, a prefetch abort exception
occurs if the Intel XScale processor attempts to execute the instruction. Before
servicing the exception, hardware places a notification of the error in the Fault Status
Register (Coprocessor 15, register 5).
A software exception handler can recover from an instruction cache parity error.
Recovery is accomplished by invalidating the instruction cache and the branch
target buffer and then returning to the instruction that caused the prefetch abort
exception. A simplified code example is shown in Example 4 on page 55. A more
complex handler might choose to invalidate the specific line that caused the exception
and then invalidate the BTB.
Example 4. Recovering from an Instruction Cache Parity Error
; Prefetch abort handler
MCR P15,0,R0,C7,C5,0; Invalidate the instruction cache and branch target
; buffer
CPWAIT; wait for effect (see “Additions to CP15 Functionality” on page 153
; for a description of CPWAIT)
SUBS PC,R14,#4; Returns to the instruction that generated the
; parity error
; The Instruction Cache is guaranteed to be invalidated at this point
If a parity error occurs on an instruction that is locked in the cache, the software
exception handler needs to unlock the instruction cache, invalidate the cache, and then
re-lock the code before it returns to the faulting instruction.
The instruction cache does not detect modification to program memory by loads, stores
or actions of other bus masters. Several situations may require program memory
modification, such as uploading code from disk.
The application program is responsible for synchronizing code modification and
invalidating the cache. In general, software must ensure that modified code space is
not accessed until modification and invalidating are completed.
3.2.1.3 Instruction-Cache Coherence
To achieve cache coherence, instruction cache contents can be invalidated after code
modification in external memory is complete.
If the instruction cache is not enabled, or code is being written to a non-cacheable
region, software must still invalidate the instruction cache before using the newly
written code. This precaution ensures that state associated with the new code is not
buffered elsewhere in the processor, such as the fetch buffers or the BTB.
Naturally, when writing code as data, care must be taken to force it completely out of
the processor into external memory before attempting to execute it. If writing into a
non-cacheable region, flushing the write buffers is sufficient precaution (see “Register
7: Cache Functions” on page 81 for a description of this operation). If writing to a
cacheable region, then the data cache should be submitted to a Clean/Invalidate
operation (see “Cacheability” on page 63) to ensure coherency.
After reset, the instruction cache is always disabled, unlocked, and invalidated
(flushed).
The instruction cache is enabled by setting bit 12 in coprocessor 15, register 1 (Control
Register). This process is illustrated in Example 5, Enabling the Instruction Cache.
Example 5. Enabling the Instruction Cache
; Enable the ICache
MRC P15, 0, R0, C1, C0, 0; Get the control register
ORR R0, R0, #0x1000; set bit 12 -- the I bit
MCR P15, 0, R0, C1, C0, 0; Set the control register
CPWAIT
The entire instruction cache along with the fetch buffers are invalidated by writing to
coprocessor 15, register 7. (See Table 18, “Cache Functions” on page 81 for the exact
command.) The invalidate command does not unlock any lines that were locked in the
instruction cache nor does it invalidate those locked lines. To invalidate the entire cache
including locked lines, the unlock instruction cache command needs to be executed
before the invalidate command. The unlock command can also be found in Table 20,
“Cache Lock-Down Functions” on page 83.
There is an inherent delay from the execution of the instruction cache invalidate
command to where the next instruction will see the result of the invalidate. The
following routine can be used to guarantee proper synchronization.
Example 6. Invalidating the Instruction Cache
MCR P15,0,R1,C7,C5,0; Invalidate the instruction cache and branch
; target buffer
CPWAIT
; The instruction cache is guaranteed to be invalidated at this point; the next
; instruction sees the result of the invalidate command.
The Intel XScale processor also supports invalidating an individual line from the
instruction cache. See Table 18, “Cache Functions” on page 81 for the exact command.
Software has the ability to lock performance-critical routines into the instruction cache.
Up to 28 lines in each set can be locked; hardware will ignore the lock command if
software is trying to lock all the lines in a particular set (i.e., ways 28-31 can never be
locked). When all ways in a particular set are requested to be locked, the instruction
cache line will still be allocated into the cache but the lock will be ignored. The
round-robin pointer will stay at way 31 for that set.
Cache lines can be locked into the instruction cache by initiating a write to
coprocessor 15. (See Table 20, “Cache Lock-Down Functions” on page 83 for the exact
command.) Register Rd contains the virtual address of the line to be locked into the
cache.
There are several requirements for locking down code:
• The routine used to lock lines down in the cache must be placed in non-cacheable
memory, which means the MMU is enabled.
As a result: no fetches of cacheable code should occur while locking instructions
into the cache.
• The code being locked into the cache must be cacheable
• The instruction cache must be enabled and invalidated prior to locking down lines.
Failure to follow these requirements will produce unpredictable results when accessing
the instruction cache.
System programmers should ensure that the code to lock instructions into the cache
does not reside closer than 128 bytes to a non-cacheable/cacheable page boundary. If
the processor fetches ahead into a cacheable page, then the first requirement noted
above could be violated.
Lines are locked into a set starting at way 0 and may progress up to way 27; which set
a line gets locked into depends on the set index of the virtual address. Figure 9 is an
example (32-Kbyte cache) of where lines of code may be locked into the cache along
with how the round-robin pointer is affected.
Figure 9. Locked Line Effect on Round-Robin Replacement (32-Kbyte cache example):
• set 0: 8 ways locked, 24 ways available for round-robin replacement
• set 1: 23 ways locked, 9 ways available for round-robin replacement
• set 2: 28 ways locked, only ways 28-31 available for replacement
• set 31: all 32 ways available for round-robin replacement
Software can lock down several different routines located at different memory
locations. This may cause some sets to have more locked lines than others as shown in
Figure 9.
Example 7 on page 58 shows how a routine, called “lockMe” in this example, might be
locked into the instruction cache. Note that it is possible to receive an exception while
locking code (see “Event Architecture” on page 154).
Example 7. Locking Code into the Cache
lockMe:                             ; This is the code that will be locked into the cache
    mov r0, #5
    add r5, r1, r2
    . . .
lockMeEnd:
    . . .
codeLock:                           ; here is the code to lock the “lockMe” routine
    ldr r0, =(lockMe AND NOT 31)    ; r0 gets a pointer to the first line we should lock
    ldr r1, =(lockMeEnd AND NOT 31) ; r1 contains a pointer to the last line we should lock
lockLoop:
    mcr p15, 0, r0, c9, c1, 0       ; lock next line of code into ICache
    cmp r0, r1                      ; are we done yet?
    add r0, r0, #32                 ; advance pointer to next line
    bne lockLoop                    ; if not done, do the next line
The Intel XScale processor provides a global unlock command for the instruction cache.
Writing to coprocessor 15, register 9 unlocks all the locked lines in the instruction
cache and leaves them valid. These lines then become available for the round-robin
replacement algorithm. (See Table 20, “Cache Lock-Down Functions” on page 83 for
the exact command.)
3.3 Branch Target Buffer
The Intel XScale processor uses dynamic branch prediction to reduce the penalties
associated with changing the flow of program execution. The Intel XScale processor
features a branch target buffer that provides the instruction cache with the target
address of branch type instructions. The branch target buffer is implemented as a
128-entry, direct-mapped cache.
This section is primarily for those optimizing their code for performance. An
understanding of the branch target buffer is needed in this case so that code can be
scheduled to best utilize the performance benefits of the branch target buffer.
3.3.1 Branch Target Buffer (BTB) Operation
The BTB stores the history of branches that have executed along with their targets.
Figure 10 shows an entry in the BTB, where the tag is the instruction address of a
previously executed branch and the data contains the target address of the previously
executed branch along with two bits of history information.
Figure 10. BTB Entry
TAG: Branch Address[31:9,1]
DATA: Target Address[31:1], History Bits[1:0]
The BTB takes the current instruction address and checks to see if this address is a
branch that was previously seen. The BTB uses bits [8:2] of the current address to read
out the tag and then compares this tag to bits [31:9,1] of the current instruction
address. If the current instruction address matches the tag in the cache and the history
bits indicate that this branch has usually been taken in the past, the BTB uses the data (target
address) as the next instruction address to send to the instruction cache.
Bit[1] of the instruction address is included in the tag comparison in order to support
Thumb execution. This organization means that two consecutive Thumb branch (B)
instructions, with instruction address bits[8:2] the same, will contend for the same BTB
entry. Thumb also requires 31 bits for the branch target address. In ARM mode, bit[1]
is zero.
The history bits represent four possible prediction states for a branch entry in the BTB.
Figure 11, “Branch History” on page 59 shows these states along with the possible
transitions. The initial state for branches stored in the BTB is Weakly-Taken (WT). Every
time a branch that exists in the BTB is executed, the history bits are updated to reflect
the latest outcome of the branch, either taken or not-taken.
“Performance Considerations” on page 159 describes which instructions are
dynamically predicted by the BTB and the performance penalty for incorrectly
predicting a branch.
The BTB does not have to be managed explicitly by software; it is disabled by default
after reset and is invalidated when the instruction cache is invalidated.
Figure 11. Branch History
The figure shows a four-state diagram with states SN (Strongly Not Taken), WN (Weakly
Not Taken), WT (Weakly Taken), and ST (Strongly Taken). Each taken branch moves the
state one step toward ST; each not-taken branch moves it one step toward SN.
3.3.1.1 Reset
After Processor Reset, the BTB is disabled and all entries are invalidated.
A new entry is stored into the BTB when the following conditions are met:
• The branch instruction has executed
• The branch was taken
• The branch is not currently in the BTB
The entry is then marked valid and the history bits are set to WT. If another valid
branch exists at the same entry in the BTB, it will be evicted by the new branch.
Once a branch is stored in the BTB, the history bits are updated upon every execution
of the branch as shown in Figure 11.
The BTB is always disabled with Reset. Software can enable the BTB through a bit in a
coprocessor register (see “Register 1: Control and Auxiliary Control Registers” on
page 77).
Before enabling or disabling the BTB, software must invalidate the BTB (described in
the following section). This action will ensure correct operation in case stale data is in
the BTB. Software should not place any branch instruction between the code that
invalidates the BTB and the code that enables/disables it.
There are four ways the contents of the BTB can be invalidated.
• Reset
• Software can directly invalidate the BTB via a CP15, register 7 function.
Refer to “Register 7: Cache Functions” on page 81.
• The BTB is invalidated when the Process ID Register is written.
• The BTB is invalidated when the instruction cache is invalidated via CP15, register 7
functions.
3.4 Data Cache
The Intel XScale processor data cache enhances performance by reducing the number
of data accesses to and from external memory. There are two data cache structures in
the Intel XScale processor: a 32-Kbyte data cache and a 2-Kbyte mini-data cache. An
eight-entry write buffer and a four-entry fill buffer are also implemented to decouple
the Intel XScale processor instruction execution from external memory accesses, which
increases overall system performance.
3.4.1 Data Cache Overview
The data cache is a 32-Kbyte, 32-way set-associative cache. The 32-Kbyte cache has
32 sets. Each set contains 32 ways. Each way of a set contains 32 bytes (one cache
line) and one valid bit. There also exist two dirty bits for every line, one for the lower
16 bytes and the other one for the upper 16 bytes. When a store hits the cache the
dirty bit associated with it is set. The replacement policy is a round-robin algorithm and
the cache also supports the ability to reconfigure each line as data RAM.
Figure 12, “Data Cache Organization” on page 61 shows the cache organization and
how the data address is used to access the cache.
Cache policies may be adjusted for particular regions of memory by altering page
attribute bits in the MMU descriptor that controls that memory. See “Memory
Attributes” on page 45 for a description of these bits.
The data cache is virtually addressed and virtually tagged. The data cache supports
write-back and write-through caching policies. The data cache always allocates a line in
the cache when a cacheable read miss occurs and will allocate a line into the cache on
a cacheable write miss when write allocate is specified by its page attribute. Page
attribute bits determine whether a line gets allocated into the data cache or mini-data
cache.
Figure 12. Data Cache Organization
Example: 32-Kbyte cache. The virtual data address is split into four fields: Tag
(bits [31:10]), Set Index (bits [9:5]), Word Select (bits [4:2]), and Byte Select
(bits [1:0]). The set index selects one of 32 sets; each set contains 32 ways, and
each way holds a CAM tag and a 32-byte cache line. Byte alignment and sign
extension deliver four bytes to the destination register.
(CAM: Content Addressable Memory)
The mini-data cache is 2 Kbytes in size. The 2-Kbyte mini-data cache has 32 sets and is
two-way set associative. Each way of a set contains 32 bytes (one cache line) and one
valid bit. There also exist two dirty bits for every line, one for the lower 16 bytes and
the other one for the upper 16 bytes. When a store hits the cache the dirty bit
associated with it is set. The replacement policy is a round-robin algorithm.
Figure 13, “Mini-Data Cache Organization” on page 62 shows the cache organization
and how the data address is used to access the cache.
The mini-data cache is virtually addressed and virtually tagged and supports the same
caching policies as the data cache. However, lines can’t be locked into the mini-data
cache.
Figure 13. Mini-Data Cache Organization
Example: 2-Kbyte cache. The virtual data address is split into Tag (bits [31:10]),
Set Index (bits [9:5]), Word Select (bits [4:2]), and Byte Select (bits [1:0]). The
set index selects one of 32 sets; each set contains two ways (way 0 and way 1), and
each way holds one 32-byte cache line. Byte alignment and sign extension deliver
four bytes to the destination register.
The Intel XScale processor employs an eight-entry write buffer, each entry containing
16 bytes. Stores to external memory are first placed in the write buffer and
subsequently taken out when the bus is available.
The write buffer supports the coalescing of multiple store requests to external memory.
An incoming store may coalesce with any of the eight entries.
The fill buffer holds the external memory request information for a data cache or
mini-data cache fill or non-cacheable read request. Up to four 32-byte read request
operations can be outstanding in the fill buffer before the Intel XScale processor needs
to stall.
The fill buffer has been augmented with a four-entry pend buffer that captures data
memory requests to outstanding fill operations. Each entry in the pend buffer contains
enough data storage to hold one 32-bit word, specifically for store operations.
Cacheable load or store operations that hit an entry in the fill buffer get placed in the
pend buffer and are completed when the associated fill completes. Any entry in the
pend buffer can be pended against any of the entries in the fill buffer; multiple entries
in the pend buffer can be pended against a single entry in the fill buffer.
Pended operations complete in program order.
The following discussions refer to the data cache and mini-data cache as one cache
(data/mini-data) since their behavior is the same when accessed.
When the data/mini-data cache is enabled for an access, the data/mini-data cache
compares the address of the request against the addresses of data that it is currently
holding. If the line containing the address of the request is resident in the cache, the
access “hits” the cache. For a load operation the cache returns the requested data to
the destination register and for a store operation the data is stored into the cache. The
data associated with the store may also be written to external memory if write-through
caching is specified for that area of memory. If the cache does not contain the
requested data, the access “misses” the cache, and the sequence of events that follows
depends on the configuration of the cache, the configuration of the MMU and the page
attributes, which are described in “Cacheability” on page 63.
The data/mini-data cache is still accessed even when it is disabled. If a load hits the
cache it will return the requested data to the destination register. If a store hits the
cache, the data is written into the cache. Any access that misses the cache will not
allocate a line in the cache when it’s disabled, even if the MMU is enabled and the
memory region’s cacheability attribute is set.
3.4.2 Cacheability
Data at a specified address is cacheable given the following:
• the MMU is enabled
• the cacheable attribute is set in the descriptor for the accessed address
• and the data/mini-data cache is enabled
The following sequence of events occurs when a cacheable (see “Cacheability” on
page 63) load operation misses the cache:
1. The fill buffer is checked to see if an outstanding fill request already exists for that
line.
If so, the current request is placed in the pending buffer and waits until the
previously requested fill completes, after which it accesses the cache again to
obtain the requested data and return it to the destination register.
If there is no outstanding fill request for that line, the current load request is placed
in the fill buffer and a 32-byte external memory read request is made. If the
pending buffer or fill buffer is full, the Intel XScale processor will stall until an entry
is available.
2. A line is allocated in the cache to receive the 32 bytes of fill data. The line selected
is determined by the round-robin pointer. (See “Cacheability” on page 63.) The line
chosen may contain a valid line previously allocated in the cache. In this case both
dirty bits are examined and if set, the four words associated with a dirty bit that’s
asserted will be written back to external memory as a four word burst operation.
3. When the data requested by the load is returned from external memory, it is
immediately sent to the destination register specified by the load. A system that
returns the requested data back first, with respect to the other bytes of the line,
will obtain the best performance.
4. As data returns from external memory it is written into the cache in the previously
allocated line.
A load operation that misses the cache and is NOT cacheable makes a request from
external memory for the exact data size of the original load request. For example,
LDRH requests exactly two bytes from external memory, LDR requests four bytes from
external memory, and so on. This request is placed in the fill buffer until the data is
returned from external memory, at which point it is forwarded to the destination register(s).
A write operation that misses the cache will request a 32-byte cache line from external
memory if the access is cacheable and write allocation is specified in the page. In this
case the following sequence of events occurs:
1. The fill buffer is checked to see if an outstanding fill request already exists for that
line.
If so, the current request is placed in the pending buffer and waits until the
previously requested fill completes, after which it writes its data into the recently
allocated cache line.
September 2006DM
Order Number: 252480-006US63
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor
If there is no outstanding fill request for that line, the current store request is
placed in the fill buffer and a 32-byte external memory read request is made. If the
pending buffer or fill buffer is full, the Intel XScale processor will stall until an entry
is available.
2. The 32 bytes of data can be returned back to the Intel XScale processor in any
word order, i.e., the eight words in the line can be returned in any order. Note that,
for performance reasons, it does not matter in which order the data is returned to
the Intel XScale processor, since the store operation has to wait until the entire line
is written into the cache before it can complete.
3. When the entire 32-byte line has returned from external memory, a line is allocated
in the cache, selected by the round-robin pointer. The line to be written into the
cache may replace a valid line previously allocated in the cache. In this case both
dirty bits are examined and if any are set, the four words associated with a dirty bit
that’s asserted will be written back to external memory as a four-word burst
operation. This write operation will be placed in the write buffer.
4. The line is written into the cache along with the data associated with the store
operation.
If the above condition for requesting a 32-byte cache line is not met, a write miss will
cause a write request to external memory for the exact data size specified by the store
operation, assuming the write request doesn’t coalesce with another write operation in
the write buffer.
The Intel XScale processor supports write-back caching or write-through caching,
controlled through the MMU page attributes. When write-through caching is specified,
all store operations are written to external memory even if the access hits the cache.
This feature keeps the external memory coherent with the cache, i.e., no dirty bits are
set for this region of memory in the data/mini-data cache. However, write-through does
not guarantee that the data/mini-data cache is coherent with external memory, which
is dependent on the system level configuration, specifically if the external memory is
shared by another master.
When write-back caching is specified, a store operation that hits the cache will not
generate a write to external memory, thus reducing external memory traffic.
The line replacement algorithm for the data cache is round-robin. Each set in the data
cache has a round-robin pointer that keeps track of the next line (in that set) to
replace. The next line to replace in a set is the next sequential line after the last one
that was just filled. For example, if the line for the last fill was written into way 5-set 2,
the next line to replace for that set would be way 6. None of the other round-robin
pointers for the other sets are affected in this case.
After reset, way 31 is pointed to by the round-robin pointer for all the sets. Once a line
is written into way 31, the round-robin pointer points to the first available way of a set,
beginning with way 0 if no lines have been re-configured as data RAM in that particular
set. Re-configuring lines as data RAM effectively reduces the available lines for cache
updating. For example, if the first three lines of a set were re-configured, the roundrobin pointer would point to the line at way 3 after it rolled over from way 31. Refer to
“Reconfiguring the Data Cache as Data RAM” on page 68 for more details on data RAM.
The mini-data cache follows the same round-robin replacement algorithm as the data
cache except that there are only two lines the round-robin pointer can point to, such
that the round-robin pointer always points to the least recently filled line. A least
recently used replacement algorithm is not supported because the purpose of the
mini-data cache is to cache data that exhibits low temporal locality, i.e., data that is
placed into the mini-data cache is typically modified once and then written back out to
external memory.
The data cache and mini-data cache are protected by parity to ensure data integrity;
there is one parity bit per byte of data. (The tags are NOT parity protected.) When a
parity error is detected on a data/mini-data cache access, a data abort exception
occurs. Before servicing the exception, hardware will set bit 10 of the Fault Status
Register.
A data/mini-data cache parity error is an imprecise data abort, meaning R14_ABORT
(+8) may not point to the instruction that caused the parity error. If the parity error
occurred during a load, the targeted register may be updated with incorrect data.
A data abort due to a data/mini-data cache parity error may not be recoverable if the
data address that caused the abort occurred on a line in the cache that has a
write-back caching policy. Prior updates to this line may be lost; in this case the software
exception handler should perform a “clean and clear” operation on the data cache,
ignoring subsequent parity errors, and restart the offending process.
The SWP and SWPB instructions generate an atomic load and store operation allowing
a memory semaphore to be loaded and altered without interruption. These accesses
may hit or miss the data/mini-data cache depending on configuration of the cache,
configuration of the MMU, and the page attributes.
After processor reset, both the data cache and mini-data cache are disabled, all valid
bits are set to zero (invalid), and the round-robin bit points to way 31. Any lines in the
data cache that were configured as data RAM before reset are changed back to
cacheable lines after reset, i.e., there are 32 Kbytes of data cache and zero bytes of
data RAM.
The data cache and mini-data cache are enabled by setting bit 2 in coprocessor 15,
register 1 (Control Register). See “Configuration” on page 73, for a description of this
register and others.
Example 8 shows code that enables the data and mini-data caches. Note that the MMU
must be enabled to use the data cache.
Example 8. Enabling the Data Cache
enableDCache:
MCR p15, 0, r0, c7, c10, 4 ; Drain pending data operations
                           ; (see Chapter 7.2.8, Register 7: Cache functions)
MRC p15, 0, r0, c1, c0, 0  ; Get the current Control register
ORR r0, r0, #4             ; Enable the data cache by setting bit 2 (the C bit)
MCR p15, 0, r0, c1, c0, 0  ; And update the Control register
Individual entries can be invalidated and cleaned in the data cache and mini-data cache
via coprocessor 15, register 7. Note that a line locked into the data cache remains
locked even after it has been subjected to an invalidate-entry operation. This will leave
an unusable line in the cache until a global unlock has occurred. For this reason, do not
use these commands on locked lines.
This same register also provides the command to invalidate the entire data cache and
mini-data cache. Refer to Table 18, “Cache Functions” on page 81 for a listing of the
commands. These global invalidate commands have no effect on lines locked in the
data cache. Locked lines must be unlocked before they can be invalidated. This is
accomplished by the Unlock Data Cache command found in Table 20, “Cache Lock-
Down Functions” on page 83.
A simple software routine is used to globally clean the data cache. It takes advantage
of the line-allocate data cache operation, which allocates a line into the data cache.
This allocation evicts any cache dirty data back to external memory. Example 9 shows
how the data cache can be cleaned.
Example 9. Global Clean Operation
; Global Clean/Invalidate THE DATA CACHE
; R1 contains the virtual address of a region of cacheable memory reserved for
; this clean operation
; R0 is the loop count; iterate 1024 times, which is the number of lines in the
; data cache
;
;; Macro ALLOCATE performs the line-allocate cache operation on the
;; address specified in register Rx.
;;
MACRO ALLOCATE Rx
    MCR P15, 0, Rx, C7, C2, 5
ENDM
    MOV R0, #1024
LOOP1:
    ALLOCATE R1         ; Allocate a line at the virtual address
                        ; specified by R1.
    ADD R1, R1, #32     ; Increment the address in R1 to the next cache line
    SUBS R0, R0, #1     ; Decrement loop count
    BNE LOOP1
;
; Clean the Mini-data Cache
; Can’t use the line-allocate command, so cycle 2 Kbytes of unused data through.
; R2 contains the virtual address of a region of cacheable memory reserved for
; cleaning the Mini-data Cache
; R0 is the loop count; iterate 64 times, which is the number of lines in the
; Mini-data Cache.
    MOV R0, #64
LOOP2:
    LDR R3, [R2], #32   ; Load and increment to next cache line
    SUBS R0, R0, #1     ; Decrement loop count
    BNE LOOP2
;
; Invalidate the data cache and mini-data cache
    MCR P15, 0, R0, C7, C6, 0
The line-allocate operation does not require physical memory to exist at the virtual
address specified by the instruction, since it does not generate a load/fill request to
external memory. Also, the line-allocate operation does not set the 32 bytes of data
associated with the line to any known value. Reading this data will produce
unpredictable results.
The line-allocate command will not operate on the mini Data Cache, so system software
must clean this cache by reading 2 Kbytes of contiguous unused data into it. This data
must be unused and reserved for this purpose so that it will not already be in the
cache. It must reside in a page that is marked as mini Data Cache cacheable (see “New
Page Attributes” on page 152).
The time it takes to execute a global clean operation depends on the number of dirty
lines in cache.
3.4.3 Reconfiguring the Data Cache as Data RAM
Software has the ability to lock tags associated with 32-byte lines in the data cache,
thus creating the appearance of data RAM. Any subsequent access to this line will
always hit the cache unless it is invalidated. Once a line is locked into the data cache it
is no longer available for cache allocation on a line fill. Up to 28 lines in each set can be
reconfigured as data RAM, such that the maximum data RAM size is 28 Kbytes for the
32-Kbyte cache and 12 Kbytes for the 16-Kbyte cache.
Hardware does not support locking lines into the mini-data cache; any attempt to do
this will produce unpredictable results.
There are two methods for locking tags into the data cache; the method of choice
depends on the application. One method is used to lock data that resides in external
memory into the data cache and the other method is used to re-configure lines in the
data cache as data RAM. Locking data from external memory into the data cache is
useful for lookup tables, constants, and any other data that is frequently accessed.
Re-configuring a portion of the data cache as data RAM is useful when an application needs
scratch memory (bigger than the register file can provide) for frequently used
variables. These variables may be strewn across memory, making it advantageous for
software to pack them into data RAM memory.
Code examples for these two applications are shown in Example 10 on page 69 and
Example 11 on page 70. The difference between these two routines is that Example 10
on page 69 actually requests the entire line of data from external memory and
Example 11 on page 70 uses the line-allocate operation to lock the tag into the cache.
No external memory request is made, which means software can map any unallocated
area of memory as data RAM. However, the line-allocate operation does validate the
target address with the MMU, so system software must ensure that the memory has a
valid descriptor in the page table.
Another item to note in Example 11 on page 70 is that the 32 bytes of data located in a
newly allocated line in the cache must be initialized by software before it can be read.
The line allocate operation does not initialize the 32 bytes and therefore reading from
that line will produce unpredictable results.
In both examples, the code drains the pending loads before and after locking data. This
step ensures that outstanding loads do not end up in the wrong place -- either
unintentionally locked into the cache or mistakenly left out in the proverbial cold. Note
also that a drain operation has been placed after the operation that locks the tag into
the cache. This drain ensures predictable results if a programmer tries to lock more
than 28 lines in a set; the tag will get allocated in this case but not locked into the
cache.
    MOV R2, #0x1              ; Lock-mode enable value
    MCR P15, 0, R2, C9, C2, 0 ; Put the data cache in lock mode
    CPWAIT
    MOV R0, #16
LOOP1:
    ALLOCATE R1               ; Allocate and lock a tag into the data cache at
                              ; address [R1].
    ; initialize 32 bytes of newly allocated line
    DRAIN
    STRD R4, [R1], #8
    STRD R4, [R1], #8
    STRD R4, [R1], #8
    STRD R4, [R1], #8
    SUBS R0, R0, #1           ; Decrement loop count
    BNE LOOP1
    ; Turn off data cache locking
    DRAIN                     ; Finish all pending operations
    MOV R2, #0x0
    MCR P15, 0, R2, C9, C2, 0 ; Take the data cache out of lock mode.
    CPWAIT
Tags can be locked into the data cache by enabling the data cache lock mode bit
located in coprocessor 15, register 9. (See Table 20, “Cache Lock-Down Functions” on
page 83 for the exact command.) Once enabled, any new lines allocated into the data
cache will be locked down.
Note that the PLD instruction will not affect the cache contents if it encounters an error
while executing. For this reason, system software should ensure the memory address
used in the PLD is correct. If this cannot be ascertained, replace the PLD with a LDR
instruction that targets a scratch register.
Lines are locked into a set starting at way 0 and may progress up to way 27; which set
a line gets locked into depends on the set index of the virtual address of the request.
Figure 9, “Locked Line Effect on Round-Robin Replacement” on page 57 is an example
of where lines of code may be locked into the cache along with how the round-robin
pointer is affected.
Figure 14. Locked Line Effect on Round-Robin Replacement
set 0: 8 ways locked, 24 ways available for round-robin replacement
set 1: 23 ways locked, 9 ways available for round-robin replacement
set 2: 28 ways locked, only ways 28-31 available for replacement
set 31: all 32 ways available for round-robin replacement
Software can lock down data located at different memory locations. This may cause
some sets to have more locked lines than others as shown in Figure 9.
Lines are unlocked in the data cache by performing an unlock operation. See “Register
9: Cache Lock Down” on page 82 for more information about locking and unlocking the
data cache.
Before locking, the programmer must ensure that no part of the target data range is
already resident in the cache. The Intel XScale processor will not refetch such data,
which will result in it not being locked into the cache. If there is any doubt as to the
location of the targeted memory data, the cache should be cleaned and invalidated to
prevent this scenario. If the cache contains a locked region which the programmer
wishes to lock again, then the cache must be unlocked before being cleaned and
invalidated.
See “Terminology and Conventions” on page 26 for a definition of coalescing.
The write buffer is always enabled which means stores to external memory will be
buffered. The K bit in the Auxiliary Control Register (CP15, register 1) is a global
enable/disable for allowing coalescing in the write buffer. When this bit disables
coalescing, no coalescing will occur regardless of the value of the page attributes. If this
bit enables coalescing, the page attributes X, C, and B are examined to see if
coalescing is enabled for each region of memory.
All reads and writes to external memory occur in program order when coalescing is
disabled in the write buffer. If coalescing is enabled in the write buffer, writes may
occur out of program order to external memory. Program correctness is maintained in
this case by comparing all store requests with all the valid entries in the fill buffer.
The write buffer and fill buffer support a drain operation, such that before the next
instruction executes, all Intel XScale processor data requests to external memory have
completed. See Table 18, “Cache Functions” on page 81 for the exact command.
Writes to a region marked non-cacheable/non-bufferable (page attributes C, B, and X
all 0) will cause execution to stall until the write completes.
If software is running in a privileged mode, it can explicitly drain all buffered writes. For
details on this operation, see the description of Drain Write Buffer in “Register 7: Cache
Functions” on page 81.
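In instruction form, the drain operation is the CP15 register 7 function with CRm = 0b1010 and opcode_2 = 4; the Rd value is ignored:

MOV  R0, #0
MCR  P15, 0, R0, C7, C10, 4   ;; Drain Write (and Fill) Buffer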
3.5Configuration
This section describes the System Control Coprocessor (CP15) and coprocessor 14
(CP14). CP15 configures the MMU, caches, buffers and other system attributes. Where
possible, the definition of CP15 follows the definition of the ARM products. CP14
contains the performance monitor registers, clock and power management registers
and the debug registers.
CP15 is accessed through MRC and MCR coprocessor instructions and allowed only in
privileged mode. Any access to CP15 in user mode or with LDC or STC coprocessor
instructions will cause an undefined instruction exception.
All CP14 registers can be accessed through MRC and MCR coprocessor instructions.
LDC and STC coprocessor instructions can only access the clock and power
management registers, and the debug registers. The performance monitoring registers
can’t be accessed by LDC and STC because CRm != 0x0. Access to all registers is
allowed only in privileged mode. Any access to CP14 in user mode will cause an
undefined instruction exception.
Coprocessors CP15 and CP14 on the Intel XScale® processor do not support access
via CDP, MRRC, or MCRR instructions. An attempt to access these coprocessors with
these instructions will result in an Undefined Instruction exception.
Many of the MCR commands available in CP15 modify hardware state sometime after
execution. A software sequence is available for those wishing to determine when this
update occurs and can be found in “Additions to CP15 Functionality” on page 153.
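The sequence referenced above is the CPWAIT sequence commonly documented for Intel XScale processors; it guarantees a preceding CP15 operation has taken effect before subsequent instructions execute:

MRC  P15, 0, R0, C2, C0, 0    ;; arbitrary read of CP15
MOV  R0, R0                   ;; create a dependency on the MRC result
SUB  PC, PC, #4               ;; branch to the next instruction,
                              ;; flushing the instruction pipeline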
Like certain other ARM architecture products, the Intel XScale® processor includes an
extra level of virtual address translation in the form of a PID (Process ID) register and
associated logic. For a detailed description of this facility, see “Register 13: Process ID”
on page 84. Privileged code needs to be aware of this facility because, when interacting
with CP15, some addresses are modified by the PID and others are not.
An address that has yet to be modified by the PID (“PIDified”) is known as a virtual
address (VA). An address that has been through the PID logic, but not translated into a
physical address, is a modified virtual address (MVA). Non-privileged code always deals
with VAs, while privileged code that programs CP15 occasionally needs to use MVAs.
The format of MRC and MCR is shown in Table 7. On the Intel XScale processor, cp_num
is defined for CP15, CP14, and CP0. CP0 supports DSP-specific instructions and is
described in “Programming Model” on page 144.
Unless otherwise noted, unused bits in coprocessor registers have unpredictable values
when read. For compatibility with future implementations, software should not rely on
the values in those bits.
Bits 23:21, opcode_1: Should be programmed to zero for future compatibility.
Bit 20, n: Read or write coprocessor register. 0 = MCR, 1 = MRC.
Bits 19:16, CRn: Specifies which coprocessor register.
Bits 15:12, Rd: General Purpose Register, R0..R15.
Bits 11:8, cp_num: Coprocessor number. The Intel XScale processor defines three
coprocessors: 0b1111 = CP15, 0b1110 = CP14, 0b0000 = CP0.
Note: Mappings are implementation defined for all coprocessors below CP14 and
above CP0. Access to unimplemented coprocessors (as defined by the cpConfig bus)
causes exceptions.
Bits 7:5, opcode_2: Function bits. This field should be programmed to zero for future
compatibility unless a value has been specified in the command.
Bits 3:0, CRm: Function bits. This field should be programmed to zero for future
compatibility unless a value has been specified in the command.
The format of LDC and STC for CP14 is shown in Table 8. LDC and STC follow the
programming notes in the ARM* Architecture Reference Manual. Note that access to
CP15 with LDC and STC will cause an undefined exception.
LDC and STC transfer a single 32-bit word between a coprocessor register and
memory. These instructions do not allow the programmer to specify values for
opcode_1, opcode_2, or Rm; those fields implicitly contain zero, which means the
performance monitoring registers are not accessible.
Architecture version = ARM Version 5TE (0x05 will be the value returned).
Core Generation: 0b010 = Intel XScale processor. This field reflects a specific set of
architecture features supported by the core. If new features are added, deleted, or
modified, this field will change.
Intel XScale processor Revision: This field reflects revisions of core generations.
Differences may include errata that dictate different operating conditions, software
work-arounds, etc. Value returned will be 0b000.
Cache class = 0b0101. The caches support locking, write back and round-robin
replacement. They do not support address by index.
Data Cache Size (Dsize): 0b110 = 32 KB.
Instruction cache size (Isize): 0b110 = 32 KB.
3.5.1.2Register 1: Control and Auxiliary Control Registers
Register 1 is made up of two registers: one that is compliant with ARM Version 5TE
and referred to by opcode_2 = 0x0, and another, specific to the Intel XScale
processor, referred to by opcode_2 = 0x1. The latter is known as the Auxiliary
Control Register.
The Exception Vector Relocation bit (bit 13 of the ARM control register) allows the
vectors to be mapped into high memory rather than their default location at address 0.
This bit is readable and writable by software. If the MMU is enabled, the exception
vectors will be accessed via the usual translation method involving the PID register
(see “Register 13: Process ID” on page 84) and the TLBs. To avoid automatic
application of the PID to exception vector accesses, software may relocate the
exceptions to high memory.
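Relocating the vectors is a read-modify-write of the ARM control register. The sketch below assumes only what the text above states, that the relocation control is bit 13 of CP15 register 1 (opcode_2 = 0):

MRC  P15, 0, R0, C1, C0, 0    ;; read ARM control register
ORR  R0, R0, #0x2000          ;; set bit 13: vectors at high memory
MCR  P15, 0, R0, C1, C0, 0    ;; write it back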
ROM Protection (R)
This selects the access checks performed by the memory
management unit. See the ARM* Architecture Reference Manual for more information.
System Protection (S)
This selects the access checks performed by the memory
management unit. See the ARM* Architecture Reference Manual for more information.
Big/Little Endian (B)
0 = Little-endian operation
1 = Big-endian operation
Data cache enable/disable (C)
0 = Disabled
1 = Enabled
Alignment fault enable/disable (A)
0 = Disabled
1 = Enabled
Memory management unit enable/disable (M)
0 = Disabled
1 = Enabled
The mini-data cache attribute bits, in the Auxiliary Control Register, are used to control
the allocation policy for the mini-data cache and whether it will use write-back caching
or write-through caching.
The configuration of the mini-data cache should be set up before any data access is
made that may be cached in the mini-data cache. Once data is cached, software must
ensure that the mini-data cache has been cleaned and invalidated before the mini-data
cache attributes can be changed.
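As a hedged sketch (assuming, per the Intel XScale core documentation, that the MD field occupies bits [5:4] of the Auxiliary Control Register; the encoding chosen below is illustrative only), the attribute change after cleaning and invalidating might look like:

MRC  P15, 0, R0, C1, C0, 1    ;; read Auxiliary Control Register
BIC  R0, R0, #0x30            ;; clear the MD field
ORR  R0, R0, #0x20            ;; illustrative MD encoding
MCR  P15, 0, R0, C1, C0, 1    ;; write it back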
Mini Data Cache Attributes (MD)
All configurations of the Mini-data cache are cacheable;
stores are buffered in the write buffer and stores will be
coalesced in the write buffer as long as coalescing is
globally enabled (bit 0 of this register).
Access permissions for all 16 domains - The meaning
of each field can be found in the ARM* Architecture Reference Manual.
3.5.1.5Register 4: Reserved
Register 4 is reserved. Reading and writing this register yields unpredictable results.
3.5.1.6Register 5: Fault Status Register
The Fault Status Register (FSR) indicates which fault has occurred, which could be
either a prefetch abort or a data abort. Bit 10 extends the encoding of the status field
for prefetch aborts and data aborts. The definition of the extended status field is found
in “Event Architecture” on page 154. Bit 9 indicates that a debug event occurred and
the exact source of the event is found in the debug control and status register (CP14,
register 10). When bit 9 is set, the domain and extended status field are undefined.
Upon entry into the prefetch abort or data abort handler, hardware will update this
register with the source of the exception. Software is not required to clear these fields.
All the functions defined in existing ARM products appear here. The Intel XScale
processor adds other functions as well. This register should be accessed as write-only.
Reads from this register, as with an MRC, have an undefined effect.
The Drain Write Buffer function not only drains the write buffer but also drains the fill
buffer.
The Intel XScale processor does not check permissions on addresses supplied for cache
or TLB functions. Because only privileged software may execute these functions, full
accessibility is assumed. Cache functions will not generate any of the following:
• Translation faults
• Domain faults
• Permission faults
Fault Virtual Address - Contains the MVA of the data
access that caused the memory abort
The invalidate instruction cache line command does not invalidate the BTB. If software
invalidates a line from the instruction cache and modifies the same location in external
memory, it needs to invalidate the BTB also. Not invalidating the BTB in this case may
cause unpredictable results.
Disabling/enabling a cache has no effect on contents of the cache: valid data stays
valid, locked items remain locked. All operations defined in Table 18 work regardless of
whether the cache is enabled or disabled.
Since the Clean DCache Line function reads from the data cache, it is capable of
generating a parity fault. The other operations will not generate parity faults.
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor
The line-allocate command allocates a tag into the data cache specified by bits [31:5]
of Rd. If a valid dirty line (with a different MVA) already exists at this location it will be
evicted. The 32 bytes of data associated with the newly allocated line are not initialized
and therefore will generate unpredictable results if read.
Line allocate command may be used for cleaning the entire data cache on a context
switch and also when reconfiguring portions of the data cache as data RAM. In both
cases, Rd is a virtual address that maps to some non-existent physical memory. When
creating data RAM, software must initialize the data RAM before read accesses can
occur. Specific uses of these commands can be found in Chapter 3.0, “Data Cache”.
Other items to note about the line-allocate command are:
• It forces all pending memory operations to complete.
• Bits [31:5] of Rd are used to specify the virtual address of the line to be allocated
into the data cache.
• If the targeted cache line is already resident, this command has no effect.
• The command cannot be used to allocate a line in the mini Data Cache.
• The newly allocated line is not marked as “dirty” so it will never get evicted.
However, if a valid store is made to that line it will be marked as “dirty” and will get
written back to external memory if another line is allocated to the same cache
location. This eviction will produce unpredictable results.
To avoid this situation, the line-allocate operation should only be used if one of the
following can be guaranteed:
— The virtual address associated with this command is not one that will be
generated during normal program execution. This is the case when line-allocate
is used to clean/invalidate the entire cache.
— The line-allocate operation is used only on a cache region destined to be
locked. When the region is unlocked, it must be invalidated before making
another data access.
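The whole-cache clean described above can be sketched with the line-allocate function (CP15 register 7, CRm = 0b0010, opcode_2 = 5). This is a hedged outline: the base address is a hypothetical unused virtual region that software must guarantee maps to non-existent physical memory, and the line count assumes the 32-KB, 32-byte-line data cache described earlier.

LDR  R0, =0xC0000000          ;; hypothetical unused virtual region
MOV  R1, #1024                ;; 32 KB / 32-byte lines = 1024 lines
clean_loop:
MCR  P15, 0, R0, C7, C2, 5    ;; allocate line: evicts any valid dirty line
ADD  R0, R0, #32              ;; advance to the next cache line
SUBS R1, R1, #1
BNE  clean_loop
MCR  P15, 0, R0, C7, C10, 4   ;; drain write buffer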
3.5.1.9Register 8: TLB Operations
Disabling/enabling the MMU has no effect on the contents of either TLB: valid entries
stay valid, locked items remain locked. All operations defined in Table 19 work
regardless of whether the TLB is enabled or disabled.
This register should be accessed as write-only. Reads from this register, as with an
MRC, have an undefined effect.
Register 9 is used for locking down entries into the instruction cache and data cache.
(The protocol for locking down entries can be found in Chapter 3.0, “Data Cache”.)
Table 20 shows the command for locking down entries in the instruction and data
cache. The entry to lock in the instruction cache is specified by the virtual address in
Rd. The data cache locking mechanism follows a different procedure than the
instruction cache. The data cache is placed in lock down mode such that all subsequent
fills to the data cache result in that line being locked in, as controlled by Table 21.
Lock/unlock operations on a disabled cache have an undefined effect.
Read and write access is allowed to the data cache lock register bit[0]. All other
accesses to register 9 should be write-only; reads, as with an MRC, have an undefined
effect.
Table 20. Cache Lock-Down Functions
Function                        opcode_2  CRm     Data                  Instruction
Fetch and Lock I cache line     0b000     0b0001  MVA                   MCR p15, 0, Rd, c9, c1, 0
Unlock Instruction cache        0b001     0b0001  Ignored               MCR p15, 0, Rd, c9, c1, 1
Read data cache lock register   0b000     0b0010  Read lock mode value  MRC p15, 0, Rd, c9, c2, 0
Write data cache lock register  0b000     0b0010  Set/clear lock mode   MCR p15, 0, Rd, c9, c2, 0
Unlock Data Cache               0b001     0b0010  Ignored               MCR p15, 0, Rd, c9, c2, 1
Register 10 is used for locking down entries into the instruction TLB, and data TLB.
(The protocol for locking down entries can be found in Chapter 3.0, “Memory
Management Unit”.) Lock/unlock operations on a TLB when the MMU is disabled have
an undefined effect.
This register should be accessed as write-only. Reads from this register, as with an
MRC, have an undefined effect.
Table 22 shows the command for locking down entries in the instruction TLB, and data
TLB. The entry to lock is specified by the virtual address in Rd.
Data Cache Lock Mode (L):
0 = No locking occurs
1 = Any fill into the data cache while this bit is set gets locked in

Table 22. TLB Lockdown Functions
Function                        opcode_2  CRm     Data     Instruction
Translate and Lock I TLB entry  0b000     0b0100  MVA      MCR p15, 0, Rd, c10, c4, 0
Translate and Lock D TLB entry  0b000     0b1000  MVA      MCR p15, 0, Rd, c10, c8, 0
Unlock I TLB                    0b001     0b0100  Ignored  MCR p15, 0, Rd, c10, c4, 1
Unlock D TLB                    0b001     0b1000  Ignored  MCR p15, 0, Rd, c10, c8, 1
3.5.1.12Register 11-12: Reserved
These registers are reserved. Reading and writing them yields unpredictable results.
3.5.1.13Register 13: Process ID
The Intel XScale processor supports the remapping of virtual addresses through a
Process ID (PID) register. This remapping occurs before the instruction cache,
instruction TLB, data cache and data TLB are accessed. The PID register controls when
virtual addresses are remapped and to what value.
The PID register is a 7-bit value that is ORed with bits 31:25 of the virtual address
when they are zero. This action effectively remaps the address to one of 128 “slots” in
the 4 Gbytes of address space. If bits 31:25 are not zero, no remapping occurs. This
feature is useful for operating system management of processes that may map to the
same virtual address space. In those cases, the virtually mapped caches on the Intel
XScale processor would not require invalidating on a process switch.
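A context switch can therefore install a new PID instead of invalidating the caches. This is a hedged sketch in which R1 is a hypothetical 7-bit process slot number (0..127); writing the PID register also invalidates the BTB, as described below.

MOV  R0, R1, LSL #25          ;; place the slot number in bits [31:25]
MCR  P15, 0, R0, C13, C0, 0   ;; write Process ID register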
Table 23. Accessing Process ID
Function                   opcode_2  CRm     Instruction
Read Process ID Register   0b000     0b0000  MRC p15, 0, Rd, c13, c0, 0
Write Process ID Register  0b000     0b0000  MCR p15, 0, Rd, c13, c0, 0

Process ID: This field is used for remapping the virtual address when bits 31-25 of
the virtual address are zero.
Reserved: Should be programmed to zero for future compatibility.
3.5.1.14The PID Register Effect On Addresses
All addresses generated and used by User Mode code are eligible for being “PIDified” as
described in the previous section. Privileged code, however, must be aware of certain
special cases in which address generation does not follow the usual flow.
The PID register is not used to remap the virtual address when accessing the Branch
Target Buffer (BTB). Any writes to the PID register invalidate the BTB, which prevents
any virtual addresses from being double mapped between two processes.
A breakpoint address (see “Register 14: Breakpoint Registers” on page 85) must be
expressed as an MVA when written to the breakpoint register. This requirement means
the value of the PID must be combined appropriately with the address before it is
written to the breakpoint register. All virtual addresses in translation descriptors (see
“Memory Management Unit” on page 44) are MVAs.
3.5.1.15Register 14: Breakpoint Registers
The Intel XScale processor contains two instruction breakpoint address registers
(IBCR0 and IBCR1), one data breakpoint address register (DBR0), one configurable
data mask/address register (DBR1), and one data breakpoint control register (DBCON).
The Intel XScale processor also supports a 256-entry trace buffer that records program
execution information. The registers to control the trace buffer are located in CP14.
Refer to “Software Debug” on page 88 for more information on these features of the
Intel XScale processor.
Table 25.Accessing the Debug Registers
Functionopcode_2CRmInstruction
Access Instruction Breakpoint Control Register 0 (IBCR0)
Access Instruction Breakpoint Control Register 1 (IBCR1)
3.5.1.16Register 15: Coprocessor Access Register
This register is selected when opcode_2 = 0 and CRm = 1.
This register controls access rights to all the coprocessors in the system except for
CP15 and CP14. Both CP15 and CP14 can only be accessed in privilege mode. This
register is accessed with an MCR or MRC with the CRm field set to 1.
This register controls access to CP0. A typical use for this register is for an operating
system to control resource sharing among applications. Initially, all applications are
denied access to shared resources by clearing the appropriate coprocessor bit in the
Coprocessor Access Register. An application may request the use of a shared resource
(e.g., the accumulator in CP0) by issuing an access to the resource, which will result in
an undefined exception. The operating system may grant access to this coprocessor by
setting the appropriate bit in the Coprocessor Access Register and return to the
application where the access is retried.
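The grant step in the operating system's undefined-instruction handler can be sketched as a read-modify-write of the CPAR (CP15 register 15, CRm = 1, as noted above); this is an outline, not the manual's own listing:

MRC  P15, 0, R0, C15, C1, 0   ;; read Coprocessor Access Register
ORR  R0, R0, #1               ;; grant the application access to CP0
MCR  P15, 0, R0, C15, C1, 0   ;; write it back
;; ... then return to the application to retry the access ...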
Sharing resources among different applications requires a state saving mechanism.
Two possibilities are:
• The operating system, during a context switch, could save the state of the
coprocessor if the last executing process had access rights to the coprocessor.
• The operating system, during a request for access, saves off the old coprocessor
state, associating it with the last process to have had access to it.
Under both scenarios, the OS needs to restore state when a request for access is made.
This means the OS has to maintain a list of what processes are modifying CP0 and their
associated state.
Example 12. Disallowing access to CP0
;; The following code clears bit 0 of the CPAR.
;; This will cause the processor to fault if software attempts to access CP0.
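A sketch of code that clears bit 0 of the CPAR, consistent with the comments above (the CPAR is accessed as CP15 register 15 with CRm = 1), might be:

MRC  P15, 0, R0, C15, C1, 0   ;; read Coprocessor Access Register
BIC  R0, R0, #1               ;; deny access to CP0
MCR  P15, 0, R0, C15, C1, 0   ;; write it back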
3.5.2.1Performance Monitoring Registers
The performance monitoring unit contains a control register (PMNC), a clock counter
(CCNT), interrupt enable register (INTEN), overflow flag register (FLAG), event
selection register (EVTSEL) and four event counters (PMN0 through PMN3). The format
of these registers can be found in “Performance Monitoring” on page 133, along with a
description on how to use the performance monitoring facility.
Opcode_2 should be zero on all accesses.
These registers can’t be accessed by LDC and STC coprocessor instructions.
Table 28. Accessing the Performance Monitoring Registers
Description                          CRn (Register #)  CRm
(PMNC) Performance Monitor Control   0b0000            0b0001
(CCNT) Clock Counter Register        0b0001            0b0001
(INTEN) Interrupt Enable Register    0b0100            0b0001
(FLAG) Overflow Flag Register        0b0101            0b0001
(EVTSEL) Event Selection Register    0b1000            0b0001
(PMN0) Performance Count Register 0  0b0000            0b0010
(PMN1) Performance Count Register 1  0b0001            0b0010
(PMN2) Performance Count Register 2  0b0010            0b0010
(PMN3) Performance Count Register 3  0b0011            0b0010
3.5.2.2Clock and Power Management Registers
These registers contain functions for managing the core clock and power.
For the IXP42X product line and IXC1100 control plane processors, Opcode_2 and CRm
should be zero when accessing these registers. The CCLKCFG field is defined; all other
bits are reserved (write as zero).

3.5.2.3Software Debug Registers
Software debug is supported by address breakpoint registers (Coprocessor 15,
register 14), serial communication over the JTAG interface, and a trace buffer.
Registers 8 and 9 are used for the serial interface and registers 10 through 13 support
a 256-entry trace buffer. Registers 14 and 15 are the debug link register and debug
SPSR (saved program status register). These registers are explained in more detail in
“Software Debug” on page 88.
Table 32. Accessing the Debug Registers
Function                                                   CRn (Register #)
Access Transmit Debug Register (TX)                        0b1000
Access Receive Debug Register (RX)                         0b1001
Access Debug Control and Status Register (DBGCSR)          0b1010
Access Trace Buffer Register (TBREG)                       0b1011
Access Checkpoint 0 Register (CHKPT0)                      0b1100
Access Checkpoint 1 Register (CHKPT1)                      0b1101
Access Transmit and Receive Debug Control Register (TXRXCTRL)
3.6Software Debug
This section describes the software debug and related features implemented in the
IXP42X product line and IXC1100 control plane processors, namely:
• Debug modes, registers and exceptions
• A serial debug communication link via the JTAG interface
• A trace buffer
• A mechanism to load the instruction cache through JTAG
CP15 registers are accessible using MRC and MCR. CRn and CRm specify the register to
access. The opcode_1 and opcode_2 fields are not used and should be set to 0.
CP14 registers are accessible using MRC, MCR, LDC and STC (CDP to any CP14
registers will cause an undefined instruction trap). The CRn field specifies the number
of the register to access. The CRm, opcode_1, and opcode_2 fields are not used and
should be set to 0.
Software access to all debug registers must be done in privileged mode. User mode
access will generate an undefined instruction exception. Specifying registers which do
not exist has unpredictable results.
The TX and RX registers, certain bits in the TXRXCTRL register, and certain bits in the
DCSR can be accessed by a debugger through the JTAG interface.
3.6.3Debug Modes
The IXP42X product line and IXC1100 control plane processors’ debug unit, when used
with a debugger application, allows software running on an IXP42X product line or
IXC1100 control plane processor target to be debugged. The debug unit allows the
debugger to stop program execution and re-direct execution to a debug handling
routine. Once program execution has stopped, the debugger can examine or modify
processor state, co-processor state, or memory. The debugger can then restart
execution of the application.
On IXP42X product line and IXC1100 control plane processors, one of two debug
modes can be entered:
• Halt mode
• Monitor mode
3.6.3.1Halt Mode
When the debug unit is configured for halt mode, the reset vector is overloaded to
serve as the debug vector. A new processor mode, DEBUG mode (CPSR[4:0] = 0x15),
is added to allow debug exceptions to be handled similarly to other types of ARM
exceptions.
When a debug exception occurs, the processor switches to debug mode and redirects
execution to a debug handler, via the reset vector. After the debug handler begins
execution, the debugger can communicate with the debug handler to examine or alter
processor state or memory through the JTAG interface.
The debug handler can be downloaded and locked directly into the instruction cache
through JTAG so external memory is not required to contain debug handler code.
3.6.3.2Monitor Mode
In monitor mode, debug exceptions are handled like ARM prefetch aborts or ARM data
aborts, depending on the cause of the exception.
When a debug exception occurs, the processor switches to abort mode and branches to
a debug handler using the pre-fetch abort vector or data abort vector. The debugger
then communicates with the debug handler to access processor state or memory
contents.
3.6.4Debug Control and Status Register (DCSR)
The DCSR register is the main control register for the debug unit. Table 33 shows the
format of the register. The DCSR register can be accessed in privileged modes by
software running on the processor or by a debugger through the JTAG interface. Refer
to “SELDCSR JTAG Register” on page 103 for details about accessing DCSR through
JTAG.
Table 33.Debug Control and Status Register (DCSR) (Sheet 1 of 2)
3.6.4.1Global Enable Bit (GE)
The Global Enable bit disables and enables all debug functionality (except the reset
vector trap). Following a processor reset, this bit is clear so all debug functionality is
disabled. When debug functionality is disabled, the BKPT instruction becomes a loop
and external debug breaks, hardware breakpoints, and non-reset vector traps are
ignored.
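A hedged sketch of privileged software enabling debug functionality (assuming, per the Intel XScale core documentation, that the Global Enable bit is bit 31 of the DCSR, which is CP14 register 10 as noted elsewhere in this chapter):

MRC  P14, 0, R0, C10, C0, 0   ;; read DCSR
ORR  R0, R0, #0x80000000      ;; set Global Enable (GE)
MCR  P14, 0, R0, C10, C0, 0   ;; write DCSR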
3.6.4.2Halt Mode Bit (H)
The Halt Mode bit configures the debug unit for either halt mode or monitor mode.
Method Of Entry (MOE), reset value 0b000:
000: Processor Reset
001: Instruction Breakpoint Hit
010: Data Breakpoint Hit
011: BKPT Instruction Executed
100: External Debug Event Asserted
101: Vector Trap Occurred
110: Trace Buffer Full Break
111: Reserved
Trace Buffer Mode (M), reset value 0:
0: Wrap-around mode
1: Fill-once mode
Trace Buffer Enable (E), reset value 0:
0: Disabled
1: Enabled
3.6.4.3Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR)
The Vector Trap bits allow instruction breakpoints to be set on exception vectors
without using up any of the breakpoint registers. When a bit is set, it acts as if an
instruction breakpoint was set up on the corresponding exception vector. A debug
exception is generated before the instruction in the exception vector executes.
Software running on IXP42X product line and IXC1100 control plane processors must
set the Global Enable bit and the debugger must set the Halt Mode bit and the
appropriate vector trap bit through JTAG to set up a non-reset vector trap.
To set up a reset vector trap, the debugger sets the Halt Mode bit and reset vector trap
bit through JTAG. The Global Enable bit does not affect the reset vector trap. A reset
vector trap can be set up before or during a processor reset. When processor reset is
de-asserted, a debug exception occurs before the instruction in the reset vector
executes.
3.6.4.4Sticky Abort Bit (SA)
The Sticky Abort bit is only valid in Halt mode. It indicates a data abort occurred within
the Special Debug State (see “Halt Mode” on page 93). Since Special Debug State
disables all exceptions, a data abort exception does not occur. However, the processor
sets the Sticky Abort bit to indicate a data abort was detected. The debugger can use
this bit to determine if a data abort was detected during the Special Debug State. The
Sticky Abort bit must be cleared by the debug handler before exiting the debug handler.
3.6.4.5Method of Entry Bits (MOE)
The Method of Entry bits specify the cause of the most recent debug exception. When
multiple exceptions occur in parallel, the processor places the highest priority exception
(based on the priorities in Table 34) in the MOE field.
3.6.4.6Trace Buffer Mode Bit (M)
The Trace Buffer Mode bit selects one of two trace buffer modes:
• Wrap-around mode — Trace buffer fills up and wraps around until a debug
exception occurs.
• Fill-once mode — The trace buffer automatically generates a debug exception
(trace buffer full break) when it becomes full.
3.6.4.7Trace Buffer Enable Bit (E)
The Trace Buffer Enable bit enables and disables the trace buffer. Both DCSR.e and
DCSR.ge must be set to enable the trace buffer. The processor automatically clears this
bit to disable the trace buffer when a debug exception occurs. For more details on the
trace buffer refer to “Trace Buffer” on page 109.
3.6.5Debug Exceptions
A debug exception causes the processor to re-direct execution to a debug event
handling routine. IXP42X product line and IXC1100 control plane processors’ debug
architecture defines the following debug exceptions:
• Instruction breakpoint
• Data breakpoint
• Software breakpoint
• External debug break
• Exception vector trap
• Trace-buffer full break
When a debug exception occurs, the processor’s actions depend on whether the debug
unit is configured for Halt mode or Monitor mode.
Table 34 shows the priority of debug exceptions relative to other processor exceptions.
Table 34. Event Priority
Event                                    Priority
Reset                                    1
Vector Trap                              2
Data abort (precise)                     3
Data breakpoint                          4
Data abort (imprecise)                   5
External debug break, trace-buffer full  6
FIQ                                      7
IRQ                                      8
Instruction breakpoint                   9
Pre-fetch abort                          10
Undef, SWI, software breakpoint          11
3.6.5.1Halt Mode
The debugger turns on Halt mode through the JTAG interface by scanning in a value
that sets the bit in DCSR. The debugger turns off Halt mode through JTAG, either by
scanning in a new DCSR value or by a TRST. Processor reset does not affect the value
of the Halt mode bit.
When halt mode is active, the processor uses the reset vector as the debug vector. The
debug handler and exception vectors can be downloaded directly into the instruction
cache, to intercept the default vectors and reset handler, or they can be resident in
external memory. Downloading into the instruction cache allows a system with memory
problems, or no external memory, to be debugged. Refer to “Downloading Code in
ICache” on page 116 for details about downloading code into the instruction cache.
During Halt mode, software running on IXP42X product line and IXC1100 control plane
processors cannot access DCSR, or any of hardware breakpoint registers, unless the
processor is in Special Debug State (SDS), described below.
When a debug exception occurs during Halt mode, the processor takes the following
actions:
• Disables the trace buffer
• Sets DCSR.moe encoding
• Processor enters a Special Debug State (SDS)
• For data breakpoints, trace-buffer full break, and external debug break:
R14_dbg = PC of the next instruction to execute + 4
For instruction breakpoints, software breakpoints, and vector traps:
R14_dbg = PC of the aborted instruction + 4
• SPSR_dbg = CPSR
• CPSR[4:0] = 0b10101 (DEBUG mode)
• CPSR[5] = 0
• CPSR[6] = 1
• CPSR[7] = 1
• PC = 0x0
Note: When the vector table is relocated (CP15 Control Register[13] = 1), the debug
vector is relocated to 0xffff0000.
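The halt-mode entry actions above can be sketched in C. This is a hypothetical model, not part of the manual: the enum and function names are invented for illustration, and only the R14_dbg and CPSR updates from the list above are represented.

```c
#include <stdint.h>

/* Illustrative event classification (names are invented). */
enum dbg_event { DBG_DATA_BKPT, DBG_TRACE_FULL, DBG_EXT_BREAK,
                 DBG_INSN_BKPT, DBG_SW_BKPT, DBG_VECTOR_TRAP };

/* R14_dbg: for data breakpoints, trace-buffer full break, and external
   debug break, the link register holds (PC of next instruction + 4);
   for instruction/software breakpoints and vector traps it holds
   (PC of the aborted instruction + 4). */
uint32_t r14_dbg(enum dbg_event ev, uint32_t next_pc, uint32_t aborted_pc)
{
    switch (ev) {
    case DBG_DATA_BKPT:
    case DBG_TRACE_FULL:
    case DBG_EXT_BREAK:
        return next_pc + 4;
    default:    /* instruction bkpt, software bkpt, vector trap */
        return aborted_pc + 4;
    }
}

/* CPSR update on debug entry: mode = DEBUG (0b10101), CPSR[5] (T)
   cleared, CPSR[6] (F) and CPSR[7] (I) set; other bits preserved. */
uint32_t debug_entry_cpsr(uint32_t cpsr)
{
    cpsr = (cpsr & ~0x1Fu) | 0x15u;  /* CPSR[4:0] = 0b10101 */
    cpsr &= ~(1u << 5);              /* CPSR[5] = 0 */
    cpsr |=  (1u << 6);              /* CPSR[6] = 1 */
    cpsr |=  (1u << 7);              /* CPSR[7] = 1 */
    return cpsr;
}
```

The SPSR_dbg = CPSR save and the PC = 0x0 redirect happen alongside these updates.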
Following a debug exception, the processor switches to debug mode and enters SDS,
which allows the following special functionality:
• All events are disabled. SWI or undefined instructions have unpredictable results.
The processor ignores pre-fetch aborts, FIQ and IRQ (SDS disables FIQ and IRQ
regardless of the enable values in the CPSR). The processor reports data aborts
detected during SDS by setting the Sticky Abort bit in the DCSR, but does not
generate an exception (processor also sets up FSR and FAR as it normally would for
a data abort).
• Normally, during halt mode, software cannot write the hardware breakpoint
registers or the DCSR. However, during the SDS, software has write access to the
breakpoint registers (see “HW Breakpoint Resources” on page 95) and the DCSR
(see Table 33, “Debug Control and Status Register (DCSR)” on page 90).
• The IMMU is disabled. In halt mode, since the debug handler would typically be
downloaded directly into the IC, it would not be appropriate to do TLB accesses or
translation walks, since there may not be any external memory or if there is, the
translation table or TLB may not contain a valid mapping for the debug handler
code. To avoid these problems, the processor internally disables the IMMU during
SDS.
• The PID is disabled for instruction fetches. This prevents fetches of the debug
handler code from being remapped to a different address than where the code was
downloaded.
The SDS remains in effect regardless of the processor mode. This allows the debug
handler to switch to other modes, maintaining SDS functionality. Entering user mode
may cause unpredictable behavior. The processor exits SDS following a CPSR restore
operation.
When exiting, the debug handler should use:
subs pc, lr, #4
This restores CPSR, turns off all of SDS functionality, and branches to the target
instruction.
3.6.5.2 Monitor Mode
In monitor mode, the processor handles debug exceptions like normal ARM exceptions.
If debug functionality is enabled (DCSR[31] = 1) and the processor is in Monitor mode,
debug exceptions cause either a data abort or a pre-fetch abort.
The following debug exceptions cause data aborts:
• Data breakpoint
• External debug break
• Trace-buffer full break
The following debug exceptions cause pre-fetch aborts:
• Instruction breakpoint
• BKPT instruction
The processor ignores vector traps during monitor mode.
When an exception occurs in monitor mode, the processor takes the following actions:
• Disables the trace buffer
• Sets DCSR.moe encoding
• Sets FSR[9]
• R14_abt = PC of the next instruction to execute + 4 (for Data Aborts)
R14_abt = PC of the faulting instruction + 4 (for Prefetch Aborts)
• SPSR_abt = CPSR
• CPSR[4:0] = 0b10111 (ABORT mode)
• CPSR[5] = 0
• CPSR[6] = unchanged
• CPSR[7] = 1
• PC = 0xc (for Prefetch Aborts),
PC = 0x10 (for Data Aborts)
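The monitor-mode entry actions can be sketched the same way; this is an illustrative model (names invented) of the vector selection and CPSR update for the abort taken on a debug exception:

```c
#include <stdint.h>

/* Illustrative classification of monitor-mode debug exceptions. */
enum mon_event { MON_DATA_BKPT, MON_EXT_BREAK, MON_TRACE_FULL,
                 MON_INSN_BKPT, MON_BKPT_INSN };

static int is_data_abort(enum mon_event ev)
{
    /* Data breakpoint, external debug break, and trace-buffer full
       break are reported as data aborts; the rest as prefetch aborts. */
    return ev == MON_DATA_BKPT || ev == MON_EXT_BREAK || ev == MON_TRACE_FULL;
}

/* PC = 0x10 for data aborts, PC = 0xc for prefetch aborts. */
uint32_t monitor_vector(enum mon_event ev)
{
    return is_data_abort(ev) ? 0x10u : 0x0cu;
}

/* CPSR on entry: ABORT mode (0b10111), CPSR[5] cleared,
   CPSR[7] set, CPSR[6] unchanged. */
uint32_t abort_entry_cpsr(uint32_t cpsr)
{
    cpsr = (cpsr & ~0x1Fu) | 0x17u;  /* CPSR[4:0] = 0b10111 */
    cpsr &= ~(1u << 5);              /* CPSR[5] = 0 */
    cpsr |=  (1u << 7);              /* CPSR[7] = 1; CPSR[6] untouched */
    return cpsr;
}
```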
During abort mode, external debug breaks and trace buffer full breaks are internally
pended. When the processor exits abort mode, either through a CPSR restore or a write
directly to the CPSR, the pended debug breaks will immediately generate a debug
exception. Any pending debug breaks are cleared out when any type of debug
exception occurs.
When exiting, the debug handler should do a CPSR restore operation that branches to
the next instruction to be executed in the program under debug.
3.6.6 HW Breakpoint Resources
The IXP42X product line and IXC1100 control plane processors’ debug architecture defines
two instruction and two data breakpoint registers, denoted IBCR0, IBCR1, DBR0, and
DBR1.
The instruction and data address breakpoint registers are 32-bit registers. The
instruction breakpoint causes a break before execution of the target instruction. The
data breakpoint causes a break after the memory access has been issued.
In this section Modified Virtual Address (MVA) refers to the virtual address ORed with
the PID. Refer to “Register 13: Process ID” on page 84 for more details on the PID. The
processor does not OR the PID with the specified breakpoint address prior to doing
address comparison. This must be done by the programmer and written to the
breakpoint register as the MVA. This applies to data and instruction breakpoints.
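The MVA formation described above can be sketched as follows; the helper name is invented, and the sketch simply applies the OR the programmer must perform before writing a breakpoint register:

```c
#include <stdint.h>

/* The processor does not OR the PID into the breakpoint address itself;
   the programmer must form the Modified Virtual Address (MVA) and write
   that to DBRx/IBCRx. Helper name is illustrative. */
uint32_t breakpoint_mva(uint32_t va, uint32_t pid)
{
    return va | pid;
}
```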
3.6.6.1 Instruction Breakpoints
The debug architecture defines two instruction breakpoint registers (IBCR0 and
IBCR1). The format of these registers is shown in Table 35, Instruction Breakpoint
Address and Control Register (IBCRx). In ARM mode, the upper 30 bits contain a
word-aligned MVA to break on. In Thumb mode, the upper 31 bits contain a half-word-aligned
MVA to break on. In both modes, bit 0 enables and disables that instruction breakpoint
register. Enabling instruction breakpoints while debug is globally disabled (DCSR.GE=0)
may result in unpredictable behavior.
September 2006DM
Order Number: 252480-006US95
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor
Table 35. Instruction Breakpoint Address and Control Register (IBCRx)
Bits [31:1]: Instruction Breakpoint MVA (in ARM mode, IBCRx[1] is ignored)
Bit [0]: IBCRx Enable (E). 0 = Breakpoint disabled; 1 = Breakpoint enabled
An instruction breakpoint generates a debug exception before the instruction at the
address specified in the IBCR executes. When an instruction breakpoint occurs, the
processor sets the DCSR.MOE bits to 0b001.
Software must disable the breakpoint before exiting the handler. This allows the breakpointed instruction to execute after the exception is handled.
Single step execution is accomplished using the instruction breakpoint registers and
must be completely handled in software (either on the host or by the debug handler).
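Under the encoding described above (bit 0 enable, upper bits holding the aligned MVA), forming an IBCRx value might look like this sketch; the helper names are illustrative, not part of the manual:

```c
#include <stdint.h>

/* ARM mode: word-aligned MVA in bits [31:2] (bit 1 is ignored, so it
   is cleared here), enable in bit 0. */
uint32_t ibcr_arm(uint32_t mva, int enable)
{
    return (mva & ~0x3u) | (enable ? 1u : 0u);
}

/* Thumb mode: half-word-aligned MVA in bits [31:1], enable in bit 0. */
uint32_t ibcr_thumb(uint32_t mva, int enable)
{
    return (mva & ~0x1u) | (enable ? 1u : 0u);
}
```

Per the note above, these values should only be written while debug is globally enabled (DCSR.GE = 1).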
3.6.6.2 Data Breakpoints
The IXP42X product line and IXC1100 control plane processors’ debug architecture
defines two data breakpoint registers (DBR0, DBR1). The format of the registers is
shown below.
DBR0 is a dedicated data address breakpoint register. DBR1 can be programmed for
one of two operations:
• Data address mask
• Second data address breakpoint
The DBCON register controls the functionality of DBR1, as well as the enables for both
DBRs. DBCON also controls what type of memory access to break on.
DBRx format:
DBR0: Data Breakpoint MVA
DBR1: Data Address Mask or Data Breakpoint MVA
DBCON fields (M, E1, E0):
DBR1 Mode (M):
0 = DBR1 is a second Data Address Breakpoint
1 = DBR1 is a Data Address Mask
DBR1 Enable (E1), when DBR1 = Data Address Breakpoint:
0b00 = DBR1 disabled
0b01 = DBR1 enabled, Store only
0b10 = DBR1 enabled, Any data access (load or store)
0b11 = DBR1 enabled, Load only
(When DBR1 = Data Address Mask, this field has no effect.)
DBR0 Enable (E0):
0b00 = DBR0 disabled
0b01 = DBR0 enabled, Store only
0b10 = DBR0 enabled, Any data access (load or store)
0b11 = DBR0 enabled, Load only
When DBR1 is programmed as a data address mask, it is used in conjunction with the
address in DBR0. The bits set in DBR1 are ignored by the processor when comparing
the address of a memory access with the address in DBR0. Using DBR1 as a data
address mask allows a range of addresses to generate a data breakpoint. When DBR1
is selected as a data address mask, it is unaffected by the E1 field of DBCON. The mask
is used only when DBR0 is enabled.
When DBR1 is programmed as a second data address breakpoint, it functions
independently of DBR0. In this case, the DBCON.E1 controls DBR1.
A data breakpoint is triggered if the memory access matches the access type and the
address of any byte within the memory access matches the address in DBRx. For
example, LDR triggers a breakpoint if DBCON.E0 is 0b10 or 0b11, and the address of
any of the 4 bytes accessed by the load matches the address in DBR0.
The processor does not trigger data breakpoints for the PLD instruction or any CP15,
register 7, 8, 9, or 10 functions. Any other type of memory access can trigger a data
breakpoint. For data breakpoint purposes, the SWP and SWPB instructions are treated
as stores: they will not cause a data breakpoint if the breakpoint is set up to break on
loads only and an address match occurs.
On unaligned memory accesses, breakpoint address comparison is done on a
word-aligned address (aligned down to a word boundary).
When a memory access triggers a data breakpoint, the breakpoint is reported after the
access is issued. The memory access will not be aborted by the processor. The actual
timing of when the access completes with respect to the start of the debug handler
depends on the memory configuration.
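The match rule above can be sketched in C, assuming DBR1 is programmed as an address mask for DBR0 (DBCON.M = 1). The helper and its E0 encoding follow the DBCON description above; the names are invented, and for simplicity the sketch compares a single access address rather than every byte of the access span:

```c
#include <stdint.h>

enum access { ACC_LOAD, ACC_STORE };   /* illustrative access type */

/* Match if DBR0 is enabled, the access type matches E0
   (0b01 store only, 0b10 any access, 0b11 load only), and the
   address equals DBR0 on all bits NOT set in the DBR1 mask. */
int dbr0_match(uint32_t addr, uint32_t dbr0, uint32_t dbr1_mask,
               unsigned e0, enum access acc)
{
    if (e0 == 0u)                        /* 0b00: DBR0 disabled */
        return 0;
    if (e0 == 1u && acc != ACC_STORE)    /* 0b01: store only */
        return 0;
    if (e0 == 3u && acc != ACC_LOAD)     /* 0b11: load only */
        return 0;
    /* Bits set in DBR1 are ignored in the comparison. */
    return ((addr ^ dbr0) & ~dbr1_mask) == 0;
}
```

With a mask of 0xFF, for example, a whole 256-byte range around DBR0 triggers the breakpoint.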
On a data breakpoint, the processor generates a debug exception and re-directs
execution to the debug handler before the next instruction executes. The processor
reports the data breakpoint by setting the DCSR.MOE to 0b010. The link register of a
data breakpoint is always PC (of the next instruction to execute) + 4, regardless of
whether the processor is configured for monitor mode or halt mode.
3.6.7 Software Breakpoints
Mnemonics: BKPT (see ARM* Architecture Reference Manual, ARMv5T)
Operation: If DCSR[31] = 0, BKPT is a nop;
If DCSR[31] = 1, BKPT causes a debug exception
The processor handles the software breakpoint as described in “Debug Exceptions” on
page 92.
3.6.8 Transmit/Receive Control Register (TXRXCTRL)
Communications between the debug handler and debugger are controlled through
handshaking bits that ensure the debugger and debug handler make synchronized
accesses to TX and RX. The debugger side of the handshaking is accessed through the
DBGTX (“DBGTX JTAG Register” on page 105) and DBGRX (“DBGRX JTAG Register” on
page 106) JTAG data registers, depending on the direction of the data transfer. The
debug handler uses separate handshaking bits in the TXRXCTRL register for accessing
TX and RX.
The TXRXCTRL register also contains two other bits that support high-speed download.
One bit indicates an overflow condition that occurs when the debugger attempts to
write the RX register before the debug handler has read the previous data written to
RX. The other bit is used by the debug handler as a branch flag during high-speed
download.
All of the bits in the TXRXCTRL register are placed such that they can be read directly
into the CC flags in the CPSR with an MRC (with Rd = PC). The subsequent instruction
can then conditionally execute based on the updated CC value.
3.6.8.1 RX Register Ready Bit (RR)
The debugger and debug handler use the RR bit to synchronize accesses to RX.
Normally, the debugger and debug handler use a handshaking scheme that requires
both sides to poll the RR bit. To support higher download performance for large
amounts of data, a high-speed download handshaking scheme can be used in which
only the debug handler polls the RR bit before accessing the RX register, while the
debugger continuously downloads data.
Table 39 shows the normal handshaking used to access the RX register.
Table 39. Normal RX Handshaking
Debugger Actions
• Debugger wants to send data to debug handler.
• Before writing new data to the RX register, the debugger polls RR through JTAG until the bit is cleared.
• After the debugger reads a ‘0’ from the RR bit, it scans data into JTAG to write to the RX register and sets
the valid bit. The write to the RX register automatically sets the RR bit.
Debug Handler Actions
• Debug handler is expecting data from the debugger.
• The debug handler polls the RR bit until it is set, indicating data in the RX register is valid.
• Once the RR bit is set, the debug handler reads the new data from the RX register. The read operation
automatically clears the RR bit.
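The RR handshaking, including the sticky overflow behavior described later in “Overflow Flag (OV)”, can be modeled with a toy sketch. The struct and function names are invented for illustration; only the RR/OV semantics come from the manual:

```c
#include <stdint.h>

/* Toy model of the RX-side handshaking bits. */
struct txrx { uint32_t rx; int rr; int ov; };

/* A debugger write to RX automatically sets RR; writing while RR is
   still set (handler has not consumed the previous word) sets the
   sticky overflow flag. */
void debugger_write_rx(struct txrx *t, uint32_t data)
{
    if (t->rr)
        t->ov = 1;
    t->rx = data;
    t->rr = 1;
}

/* A handler read of RX is valid only when RR is set, and the read
   automatically clears RR. */
int handler_read_rx(struct txrx *t, uint32_t *out)
{
    if (!t->rr)
        return 0;
    *out = t->rx;
    t->rr = 0;
    return 1;
}

/* Drivers used in the checks below. */
uint32_t rx_roundtrip(uint32_t v)
{
    struct txrx t = {0, 0, 0};
    uint32_t got = 0;
    debugger_write_rx(&t, v);
    handler_read_rx(&t, &got);
    return got;
}

int rx_overflow_demo(void)
{
    struct txrx t = {0, 0, 0};
    debugger_write_rx(&t, 1);
    debugger_write_rx(&t, 2);  /* second write before any read */
    return t.ov;
}
```

The high-speed download of Table 40 is the case where the debugger deliberately skips its RR poll and relies on the handler keeping up, which is exactly when the overflow path can fire.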
When data is being downloaded by the debugger, part of the normal handshaking can
be bypassed to allow the download rate to be increased. Table 40 shows the
handshaking used when the debugger is doing a high-speed download. Note that
before the high-speed download can start, both the debugger and debug handler must
be synchronized, such that the debug handler is executing a routine that supports the
high-speed download.
Although it is similar to the normal handshaking, the debugger polling of RR is
bypassed with the assumption that the debug handler can read the previous data from
RX before the debugger can scan in the new data.
Table 40. High-Speed Download Handshaking States
Debugger Actions
• Debugger wants to transfer code into the IXP42X product line and IXC1100 control plane processors’
system memory.
• Prior to starting the download, the debugger must poll the RR bit until it is clear. Once the RR bit is clear,
indicating the debug handler is ready, the debugger starts the download.
• The debugger scans data into JTAG to write to the RX register with the download bit and the valid bit set.
Following the write to RX, the RR bit and D bit are automatically set in TXRXCTRL.
• Without polling RR to see whether the debug handler has read the data just scanned in, the debugger
continues scanning new data into JTAG for RX, with the download bit and the valid bit set.
• An overflow condition occurs if the debug handler does not read the previous data before the debugger
completes scanning in the new data, (See “Overflow Flag (OV)” on page 100 for more details on the
overflow condition).
• After completing the download, the debugger clears the D bit allowing the debug handler to exit the
download loop.
Debug Handler Actions
• The debug handler is in a routine waiting to write data out to memory. The routine loops based on the D bit
in TXRXCTRL.
• The debug handler polls the RR bit until it is set. It then reads the RX register and writes the data out to
memory. The handler loops, repeating these operations until the debugger clears the D bit.
3.6.8.2 Overflow Flag (OV)
The Overflow flag is a sticky flag that is set when the debugger writes to the RX register
while the RR bit is set.
The flag is used during high-speed download to indicate that some data was lost. The
assumption during high-speed download is that the time it takes for the debugger to
shift in the next data word is greater than the time necessary for the debug handler to
process the previous data word. So, before the debugger shifts in the next data word,
the handler will be polling for that data.
However, if the handler incurs stalls that are long enough such that the handler is still
processing the previous data when the debugger completes shifting in the next data
word, an overflow condition occurs and the OV bit is set.
Once set, the overflow flag remains set until cleared by a write to TXRXCTRL with
an MCR. After the debugger completes the download, it can examine the OV bit to
determine if an overflow occurred. The debug handler software is responsible for saving
the address of the last valid store before the overflow occurred.
3.6.8.3 Download Flag (D)
The value of the download flag is set by the debugger through JTAG. This flag is used
during high-speed download to replace a loop counter.
The download flag becomes especially useful when an overflow occurs. If a loop
counter is used and an overflow occurs, the debug handler cannot determine how
many data words overflowed. The debug handler counter may therefore get out of
sync with the debugger: the debugger may finish downloading the data, but the debug
handler counter may indicate there is more data to be downloaded, which may result
in unpredictable behavior of the debug handler.
Using the download flag, the debug handler loops until the debugger clears the flag.
Therefore, when doing a high-speed download, for each data word downloaded, the
debugger should set the D bit.
3.6.8.4 TX Register Ready Bit (TR)
The debugger and debug handler use the TR bit to synchronize accesses to the TX
register. The debugger and debug handler must poll the TR bit before accessing the TX
register. Table 41 shows the handshaking used to access the TX register.
Table 41. TX Handshaking
Debugger Actions
• Debugger is expecting data from the debug handler.
• Before reading data from the TX register, the debugger polls the TR bit through JTAG until the bit is set.
NOTE: While polling TR, the debugger must scan out the TR bit and the TX register data.
• Reading a ‘1’ from the TR bit indicates that the TX data scanned out is valid.
• The action of scanning out data when the TR bit is set, automatically clears TR.
Debug Handler Actions
• Debug handler wants to send data to the debugger (in response to a previous request).
• The debug handler polls the TR bit to determine when the TX register is empty (any previous data has
been read out by the debugger). The handler polls the TR bit until it is clear.
• Once the TR bit is clear, the debug handler writes new data to the TX register. The write operation
automatically sets the TR bit.
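The TR handshaking can be modeled the same way as the RR side; again, the struct and function names are invented for illustration:

```c
#include <stdint.h>

/* Toy model of the TX-side handshaking bit. */
struct txchan { uint32_t tx; int tr; };

/* A handler write is only done when TR is clear (TX empty);
   the write automatically sets TR. */
int handler_write_tx(struct txchan *c, uint32_t data)
{
    if (c->tr)
        return 0;   /* previous word not yet scanned out */
    c->tx = data;
    c->tr = 1;
    return 1;
}

/* Scanning out data while TR is set returns valid data and
   automatically clears TR. */
int debugger_scan_tx(struct txchan *c, uint32_t *out)
{
    if (!c->tr)
        return 0;   /* TX data not valid */
    *out = c->tx;
    c->tr = 0;
    return 1;
}

/* Drivers used in the checks below. */
uint32_t tx_roundtrip(uint32_t v)
{
    struct txchan c = {0, 0};
    uint32_t got = 0;
    handler_write_tx(&c, v);
    debugger_scan_tx(&c, &got);
    return got;
}

int tx_double_write(void)
{
    struct txchan c = {0, 0};
    handler_write_tx(&c, 1);
    return handler_write_tx(&c, 2);  /* rejected: TR still set */
}
```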