Intel 80200 User Manual

Intel® 80200 Processor based on Intel® XScale™ Microarchitecture
Developer’s Manual
March, 2003
Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel® products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The Intel® 80200 Processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
Copyright © Intel Corporation, 2003
*Other brands and names are the property of their respective owners.
ARM and StrongARM are registered trademarks of ARM, Ltd.
Contents
1 Introduction .............................................................................................. 1
1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview .........1
1.1.1 ARM* Architecture Compliance ................................................................................... 1
1.1.2 Features....................................................................................................................... 2
1.1.2.1 Multiply/Accumulate (MAC) ......................................................................2
1.1.2.2 Memory Management ............................................................................... 3
1.1.2.3 Instruction Cache...................................................................................... 3
1.1.2.4 Branch Target Buffer ................................................................................ 3
1.1.2.5 Data Cache............................................................................................... 3
1.1.2.6 Power Management..................................................................................4
1.1.2.7 Interrupt Controller ....................................................................................4
1.1.2.8 Bus Controller ...........................................................................................4
1.1.2.9 Performance Monitoring ........................................................................... 4
1.1.2.10 Debug ....................................................................................................... 4
1.1.2.11 JTAG.........................................................................................................4
1.2 Terminology and Conventions ...................................................................................................... 5
1.2.1 Number Representation............................................................................................... 5
1.2.2 Terminology and Acronyms .........................................................................................5
1.3 Other Relevant Documents ..........................................................................................................6
2 Programming Model ................................................................................ 1
2.1 ARM* Architecture Compliance .................................................................................................... 1
2.2 ARM* Architecture Implementation Options ................................................................................. 1
2.2.1 Big Endian versus Little Endian ................................................................................... 1
2.2.2 26-Bit Code .................................................................................................................. 1
2.2.3 Thumb* ........................................................................................................................ 1
2.2.4 ARM* DSP-Enhanced Instruction Set.......................................................................... 2
2.2.5 Base Register Update.................................................................................................. 2
2.3 Extensions to ARM* Architecture..................................................................................................3
2.3.1 DSP Coprocessor 0 (CP0)...........................................................................................3
2.3.1.1 Multiply With Internal Accumulate Format ................................................ 4
2.3.1.2 Internal Accumulator Access Format........................................................ 7
2.3.2 New Page Attributes .................................................................................................... 9
2.3.3 Additions to CP15 Functionality ................................................................................. 11
2.3.4 Event Architecture .....................................................................................................12
2.3.4.1 Exception Summary................................................................................12
2.3.4.2 Event Priority ..........................................................................................12
2.3.4.3 Prefetch Aborts .......................................................................................13
2.3.4.4 Data Aborts............................................................................................. 14
2.3.4.5 Events from Preload Instructions ............................................................ 16
2.3.4.6 Debug Events ......................................................................................... 16
3 Memory Management .............................................................................. 1
3.1 Overview....................................................................................................................................... 1
3.2 Architecture Model........................................................................................................................ 2
3.2.1 Version 4 vs. Version 5................................................................................................ 2
3.2.2 Memory Attributes........................................................................................................2
3.2.2.1 Page (P) Attribute Bit ................................................................................ 2
3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits ............................ 2
3.2.2.3 Instruction Cache...................................................................................... 2
3.2.2.4 Data Cache and Write Buffer.................................................................... 3
3.2.2.5 Details on Data Cache and Write Buffer Behavior ................................... 4
3.2.2.6 Memory Operation Ordering..................................................................... 4
3.2.3 Exceptions ................................................................................................................... 4
3.3 Interaction of the MMU, Instruction Cache, and Data Cache ....................................................... 5
3.4 Control .......................................................................................................................................... 6
3.4.1 Invalidate (Flush) Operation ........................................................................................ 6
3.4.2 Enabling/Disabling....................................................................................................... 6
3.4.3 Locking Entries ............................................................................................................7
3.4.4 Round-Robin Replacement Algorithm ......................................................................... 9
4 Instruction Cache ..................................................................................... 1
4.1 Overview....................................................................................................................................... 1
4.2 Operation...................................................................................................................................... 2
4.2.1 Operation When Instruction Cache is Enabled............................................................ 2
4.2.2 Operation When The Instruction Cache Is Disabled.................................................... 2
4.2.3 Fetch Policy ................................................................................................................. 3
4.2.4 Round-Robin Replacement Algorithm ......................................................................... 3
4.2.5 Parity Protection ..........................................................................................................4
4.2.6 Instruction Fetch Latency............................................................................................. 5
4.2.7 Instruction Cache Coherency ...................................................................................... 5
4.3 Instruction Cache Control ............................................................................................................. 6
4.3.1 Instruction Cache State at RESET .............................................................................. 6
4.3.2 Enabling/Disabling....................................................................................................... 6
4.3.3 Invalidating the Instruction Cache................................................................................ 7
4.3.4 Locking Instructions in the Instruction Cache ..............................................................8
4.3.5 Unlocking Instructions in the Instruction Cache........................................................... 9
5 Branch Target Buffer ............................................................................... 1
5.1 Branch Target Buffer (BTB) Operation .........................................................................................1
5.1.1 Reset ........................................................................................................................... 2
5.1.2 Update Policy............................................................................................................... 2
5.2 BTB Control .................................................................................................................................. 3
5.2.1 Disabling/Enabling....................................................................................................... 3
5.2.2 Invalidation................................................................................................................... 3
6 Data Cache................................................................................................ 1
6.1 Overviews ..................................................................................................................................... 1
6.1.1 Data Cache Overview.................................................................................................. 1
6.1.2 Mini-Data Cache Overview .......................................................................................... 3
6.1.3 Write Buffer and Fill Buffer Overview........................................................................... 4
6.2 Data Cache and Mini-Data Cache Operation ............................................................................... 5
6.2.1 Operation When Caching is Enabled........................................................................... 5
6.2.2 Operation When Data Caching is Disabled ................................................................. 5
6.2.3 Cache Policies .............................................................................................................5
6.2.3.1 Cacheability .............................................................................................. 5
6.2.3.2 Read Miss Policy ...................................................................................... 6
6.2.3.3 Write Miss Policy ...................................................................................... 7
6.2.3.4 Write-Back Versus Write-Through ............................................................ 7
6.2.4 Round-Robin Replacement Algorithm ......................................................................... 8
6.2.5 Parity Protection ..........................................................................................................8
6.2.6 Atomic Accesses .........................................................................................................8
6.3 Data Cache and Mini-Data Cache Control ...................................................................................9
6.3.1 Data Memory State After Reset................................................................................... 9
6.3.2 Enabling/Disabling ....................................................................................................... 9
6.3.3 Invalidate & Clean Operations .....................................................................................9
6.3.3.1 Global Clean and Invalidate Operation ...................................................10
6.4 Re-configuring the Data Cache as Data RAM ............................................................................12
6.5 Write Buffer/Fill Buffer Operation and Control ............................................................................ 16
7 Configuration ........................................................................................... 1
7.1 Overview....................................................................................................................................... 1
7.2 CP15 Registers............................................................................................................................. 4
7.2.1 Register 0: ID and Cache Type Registers ................................................................... 5
7.2.2 Register 1: Control and Auxiliary Control Registers .................................................... 7
7.2.3 Register 2: Translation Table Base Register ............................................................... 9
7.2.4 Register 3: Domain Access Control Register .............................................................. 9
7.2.5 Register 4: Reserved ................................................................................................... 9
7.2.6 Register 5: Fault Status Register............................................................................... 10
7.2.7 Register 6: Fault Address Register ............................................................................ 10
7.2.8 Register 7: Cache Functions .....................................................................................11
7.2.9 Register 8: TLB Operations ....................................................................................... 13
7.2.10 Register 9: Cache Lock Down ...................................................................................14
7.2.11 Register 10: TLB Lock Down ..................................................................................... 15
7.2.12 Register 11-12: Reserved.......................................................................................... 15
7.2.13 Register 13: Process ID .............................................................................................16
7.2.13.1 The PID Register Affect On Addresses .................................................. 16
7.2.14 Register 14: Breakpoint Registers .............................................................................17
7.2.15 Register 15: Coprocessor Access Register ............................................................... 18
7.3 CP14 Registers........................................................................................................................... 20
7.3.1 Registers 0-3: Performance Monitoring .....................................................................20
7.3.2 Register 4-5: Reserved ..............................................................................................20
7.3.3 Registers 6-7: Clock and Power Management .......................................................... 21
7.3.4 Registers 8-15: Software Debug................................................................................ 22
8 System Management ............................................................................... 1
8.1 Clocking ........................................................................................................................................1
8.2 Processor Reset ........................................................................................................................... 3
8.2.1 Reset Sequence .......................................................................................................... 3
8.2.2 Reset Effect on Outputs............................................................................................... 4
8.3 Power Management...................................................................................................................... 5
8.3.1 Invocation .................................................................................................................... 5
8.3.2 Signals Associated with Power Management .............................................................. 5
9 Interrupts .................................................................................................. 1
9.1 Introduction ...................................................................................................................................1
9.2 External Interrupts ........................................................................................................................ 1
9.3 Programmer Model ....................................................................................................................... 2
9.3.1 INTCTL ........................................................................................................................ 3
9.3.2 INTSRC ....................................................................................................................... 4
9.3.3 INTSTR........................................................................................................................ 5
10 External Bus ............................................................................................. 1
10.1 General Description ...................................................................................................................... 1
10.2 Signal Description......................................................................................................................... 3
10.2.1 Request Bus ................................................................................................................ 4
10.2.1.1 Intel® 80200 Processor Use of the Request Bus...................................... 4
10.2.2 Data Bus ...................................................................................................................... 6
10.2.3 Critical Word First ........................................................................................................ 7
10.2.4 Configuration Pins ....................................................................................................... 8
10.2.5 Multimaster Support..................................................................................................... 9
10.2.6 Abort .......................................................................................................................... 11
10.2.7 ECC ........................................................................................................................... 12
10.2.8 Big Endian System Configuration .............................................................................. 13
10.3 Examples.................................................................................................................................... 14
10.3.1 Simple Read Word..................................................................................................... 14
10.3.2 Read Burst, No Critical Word First............................................................................. 15
10.3.3 Read Burst, Critical Word First Data Return.............................................................. 16
10.3.4 Word Write................................................................................................................. 17
10.3.5 Two Word Coalesced Write ....................................................................................... 18
10.3.5.1 Write Burst .............................................................................................. 19
10.3.6 Write Burst, Coalesced.............................................................................................. 20
10.3.7 Pipelined Accesses.................................................................................................... 21
10.3.8 Locked Access........................................................................................................... 22
10.3.9 Aborted Access.......................................................................................................... 23
10.3.10 Hold ........................................................................................................................... 24
11 Bus Controller ..........................................................................................1
11.1 Introduction ................................................................................................................................... 1
11.2 ECC .............................................................................................................................................. 1
11.3 Error Handling .............................................................................................................................. 2
11.3.1 Bus Aborts ................................................................................................................... 2
11.3.2 ECC Errors ..................................................................................................................3
11.4 Programmer Model ....................................................................................................................... 5
11.4.1 BCU Control Registers ................................................................................................ 5
11.4.2 ECC Error Registers .................................................................................................... 9
12 Performance Monitoring.......................................................................... 1
12.1 Overview....................................................................................................................................... 1
12.2 Clock Counter (CCNT; CP14 - Register 1) ................................................................................... 2
12.3 Performance Count Registers (PMN0 - PMN1; CP14 - Register 2 and 3, Respectively)............. 3
12.3.1 Extending Count Duration Beyond 32 Bits .................................................................. 3
12.4 Performance Monitor Control Register (PMNC) ........................................................................... 4
12.4.1 Managing PMNC ......................................................................................................... 5
12.5 Performance Monitoring Events ................................................................................................... 6
12.5.1 Instruction Cache Efficiency Mode .............................................................................. 7
12.5.2 Data Cache Efficiency Mode ....................................................................................... 8
12.5.3 Instruction Fetch Latency Mode................................................................................... 8
12.5.4 Data/Bus Request Buffer Full Mode ............................................................................ 9
12.5.5 Stall/Writeback Statistics ............................................................................................. 9
12.5.6 Instruction TLB Efficiency Mode ................................................................................ 10
12.5.7 Data TLB Efficiency Mode .........................................................................................10
12.6 Multiple Performance Monitoring Run Statistics ......................................................................... 11
12.7 Examples .................................................................................................................................... 12
13 Software Debug........................................................................................ 1
13.1 Definitions .....................................................................................................................................1
13.2 Debug Registers ...........................................................................................................................1
13.3 Introduction ...................................................................................................................................2
13.3.1 Halt Mode ....................................................................................................................2
13.3.2 Monitor Mode ............................................................................................................... 2
13.4 Debug Control and Status Register (DCSR) ................................................................................3
13.4.1 Global Enable Bit (GE) ................................................................................................ 4
13.4.2 Halt Mode Bit (H) .........................................................................................................4
13.4.3 Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR) .................................................................. 5
13.4.4 Sticky Abort Bit (SA) .................................................................................................... 5
13.4.5 Method of Entry Bits (MOE)......................................................................................... 5
13.4.6 Trace Buffer Mode Bit (M) ...........................................................................................5
13.4.7 Trace Buffer Enable Bit (E).......................................................................................... 5
13.5 Debug Exceptions.........................................................................................................................6
13.5.1 Halt Mode ....................................................................................................................6
13.5.2 Monitor Mode ............................................................................................................... 8
13.6 HW Breakpoint Resources ........................................................................................................... 9
13.6.1 Instruction Breakpoints ................................................................................................ 9
13.6.2 Data Breakpoints .......................................................................................................10
13.7 Software Breakpoints.................................................................................................................. 11
13.8 Transmit/Receive Control Register (TXRXCTRL) ......................................................................12
13.8.1 RX Register Ready Bit (RR) ......................................................................................13
13.8.2 Overflow Flag (OV) .................................................................................................... 14
13.8.3 Download Flag (D) .....................................................................................................14
13.8.4 TX Register Ready Bit (TR) ....................................................................................... 15
13.8.5 Conditional Execution Using TXRXCTRL .................................................................. 15
13.9 Transmit Register (TX) ............................................................................................................... 16
13.10 Receive Register (RX) ................................................................................................................ 16
13.11 Debug JTAG Access ..................................................................................................................17
13.11.1 SELDCSR JTAG Command ......................................................................................17
13.11.2 SELDCSR JTAG Register .........................................................................................18
13.11.2.1 DBG.HLD_RST....................................................................................... 19
13.11.2.2 DBG.BRK................................................................................................ 20
13.11.2.3 DBG.DCSR .............................................................................................20
13.11.3 DBGTX JTAG Command........................................................................................... 20
13.11.4 DBGTX JTAG Register .............................................................................................. 21
13.11.5 DBGRX JTAG Command ..........................................................................................21
13.11.6 DBGRX JTAG Register ............................................................................................. 22
13.11.6.1 RX Write Logic........................................................................................ 23
13.11.6.2 DBGRX Data Register ............................................................................ 24
13.11.6.3 DBG.RR ..................................................................................................24
13.11.6.4 DBG.V .................................................................................................... 25
13.11.6.5 DBG.RX .................................................................................................. 25
13.11.6.6 DBG.D .................................................................................................... 25
13.11.6.7 DBG.FLUSH ........................................................................................... 25
13.11.7 Debug JTAG Data Register Reset Values................................................................. 25
13.12 Trace Buffer ................................................................................................................................ 26
13.12.1 Trace Buffer CP Registers......................................................................................... 26
13.12.1.1 Checkpoint Registers ............................................................................. 26
13.12.1.2 Trace Buffer Register (TBREG).............................................................. 27
13.13 Trace Buffer Entries.................................................................................................................... 28
13.13.1 Message Byte ............................................................................................................ 28
13.13.1.1 Exception Message Byte ........................................................................ 29
13.13.1.2 Non-exception Message Byte................................................................. 30
13.13.1.3 Address Bytes ........................................................................................ 31
13.13.2 Trace Buffer Usage.................................................................................................... 32
13.14 Downloading Code in the ICache ............................................................................................... 34
13.14.1 LDIC JTAG Command............................................................................................... 34
13.14.2 LDIC JTAG Data Register ......................................................................................... 35
13.14.3 LDIC Cache Functions............................................................................................... 36
13.14.4 Loading IC During Reset ........................................................................................... 38
13.14.4.1 Loading IC During Cold Reset for Debug ............................................... 39
13.14.4.2 Loading IC During a Warm Reset for Debug .......................................... 41
13.14.5 Dynamically Loading IC After Reset .......................................................................... 43
13.14.5.1 Dynamic Code Download Synchronization ............................................ 45
13.14.6 Mini Instruction Cache Overview ............................................................................... 46
13.15 Halt Mode Software Protocol...................................................................................................... 47
13.15.1 Starting a Debug Session .......................................................................................... 47
13.15.1.1 Setting up Override Vector Tables ......................................................... 47
13.15.1.2 Placing the Handler in Memory .............................................................. 48
13.15.2 Implementing a Debug Handler ................................................................................. 49
13.15.2.1 Debug Handler Entry .............................................................................. 49
13.15.2.2 Debug Handler Restrictions.................................................................... 49
13.15.2.3 Dynamic Debug Handler ........................................................................ 50
13.15.2.4 High-Speed Download............................................................................ 52
13.15.3 Ending a Debug Session ........................................................................................... 53
13.16 Software Debug Notes/Errata..................................................................................................... 54
14 Performance Considerations ..................................................................1
14.1 Interrupt Latency........................................................................................................................... 1
14.2 Branch Prediction ......................................................................................................................... 2
14.3 Addressing Modes ........................................................................................................................ 2
14.4 Instruction Latencies..................................................................................................................... 3
14.4.1 Performance Terms ..................................................................................................... 3
14.4.2 Branch Instruction Timings .......................................................................................... 4
14.4.3 Data Processing Instruction Timings ........................................................................... 5
14.4.4 Multiply Instruction Timings ......................................................................................... 6
14.4.5 Saturated Arithmetic Instructions................................................................................. 8
14.4.6 Status Register Access Instructions ............................................................................ 8
14.4.7 Load/Store Instructions................................................................................................ 8
14.4.8 Semaphore Instructions............................................................................................... 9
14.4.9 Coprocessor Instructions ............................................................................................. 9
14.4.10 Miscellaneous Instruction Timing................................................................................. 9
14.4.11 Thumb* Instructions ..................................................................................................... 9
A Compatibility: Intel® 80200 Processor vs. SA-110................................ 1
A.1 Introduction ...................................................................................................................................1
A.2 Summary ......................................................................................................................................1
A.3 Architecture Deviations ................................................................................................................. 3
A.3.1 Read Buffer...................................................................................................................... 3
A.3.2 26-bit Mode...................................................................................................................... 3
A.3.3 Cacheable (C) and Bufferable (B) Encoding ................................................................... 3
A.3.4 Write Buffer Behavior....................................................................................................... 4
A.3.5 External Aborts ................................................................................................................ 4
A.3.6 Performance Differences ................................................................................................. 5
A.3.7 System Control Coprocessor........................................................................................... 5
A.3.8 New Instructions and Instruction Formats .......................................................................5
A.3.9 Augmented Page Table Descriptors................................................................................ 5
B Optimization Guide .................................................................................. 1
B.1 Introduction ...................................................................................................................................1
B.1.1 About This Guide ............................................................................................................. 1
B.2 Intel® 80200 Processor Pipeline ................................................................................................... 2
B.2.1 General Pipeline Characteristics .....................................................................................2
B.2.1.1. Number of Pipeline Stages ................................................................................. 2
B.2.1.2. Intel® 80200 Processor Pipeline Organization.................................................... 3
B.2.1.3. Out Of Order Completion ....................................................................................4
B.2.1.4. Register Scoreboarding ...................................................................................... 4
B.2.1.5. Use of Bypassing ................................................................................................ 4
B.2.2 Instruction Flow Through the Pipeline .............................................................................5
B.2.2.1. ARM* V5 Instruction Execution........................................................................... 5
B.2.2.2. Pipeline Stalls ..................................................................................................... 5
B.2.3 Main Execution Pipeline ..................................................................................................6
B.2.3.1. F1 / F2 (Instruction Fetch) Pipestages................................................................ 6
B.2.3.2. ID (Instruction Decode) Pipestage...................................................................... 6
B.2.3.3. RF (Register File / Shifter) Pipestage ................................................................. 7
B.2.3.4. X1 (Execute) Pipestage ...................................................................................... 7
B.2.3.5. X2 (Execute 2) Pipestage ................................................................................... 7
B.2.3.6. WB (write-back) ..................................................................................................7
B.2.4 Memory Pipeline ..............................................................................................................8
B.2.4.1. D1 and D2 Pipestage.......................................................................................... 8
B.2.5 Multiply/Multiply Accumulate (MAC) Pipeline .................................................................. 8
B.2.5.1. Behavioral Description ........................................................................................ 8
B.3 Basic Optimizations ...................................................................................................................... 9
B.3.1 Conditional Instructions ...................................................................................................9
B.3.1.1. Optimizing Condition Checks.............................................................................. 9
B.3.1.2. Optimizing Branches......................................................................................... 10
B.3.1.3. Optimizing Complex Expressions ..................................................................... 12
B.3.2 Bit Field Manipulation ....................................................................................................13
B.3.3 Optimizing the Use of Immediate Values.......................................................................14
B.3.4 Optimizing Integer Multiply and Divide .......................................................................... 15
B.3.5 Effective Use of Addressing Modes............................................................................... 16
B.4 Cache and Prefetch Optimizations .............................................................................................17
B.4.1 Instruction Cache........................................................................................................... 17
B.4.1.1. Cache Miss Cost............................................................................................... 17
B.4.1.2. Round Robin Replacement Cache Policy......................................................... 17
B.4.1.3. Code Placement to Reduce Cache Misses ...................................................... 17
B.4.1.4. Locking Code into the Instruction Cache .......................................................... 18
B.4.2 Data and Mini Cache ..................................................................................................... 19
B.4.2.1. Non Cacheable Regions ................................................................................... 19
B.4.2.2. Write-through and Write-back Cached Memory Regions ................................. 19
B.4.2.3. Read Allocate and Read-write Allocate Memory Regions ................................ 20
B.4.2.4. Creating On-chip RAM...................................................................................... 20
B.4.2.5. Mini-data Cache................................................................................................ 21
B.4.2.6. Data Alignment ................................................................................................. 22
B.4.2.7. Literal Pools ...................................................................................................... 23
B.4.3 Cache Considerations ................................................................................................... 24
B.4.3.1. Cache Conflicts, Pollution and Pressure .......................................................... 24
B.4.3.2. Memory Page Thrashing .................................................................................. 24
B.4.4 Prefetch Considerations ................................................................................................ 25
B.4.4.1. Prefetch Distances in the Intel® 80200 Processor............................................ 25
B.4.4.2. Prefetch Loop Scheduling................................................................................. 27
B.4.4.3. Prefetch Loop Limitations ................................................................................. 27
B.4.4.4. Compute vs. Data Bus Bound .......................................................................... 27
B.4.4.5. Low Number of Iterations.................................................................................. 27
B.4.4.6. Bandwidth Limitations ....................................................................................... 28
B.4.4.7. Cache Memory Considerations ........................................................................ 29
B.4.4.8. Cache Blocking................................................................................................. 31
B.4.4.9. Prefetch Unrolling ............................................................................................. 31
B.4.4.10. Pointer Prefetch .............................................................................................. 32
B.4.4.11. Loop Interchange ............................................................................................ 33
B.4.4.12. Loop Fusion .................................................................................................... 33
B.4.4.13. Prefetch to Reduce Register Pressure............................................................ 34
B.5 Instruction Scheduling ................................................................................................................ 35
B.5.1 Scheduling Loads .......................................................................................................... 35
B.5.1.1. Scheduling Load and Store Double (LDRD/STRD) .......................................... 37
B.5.1.2. Scheduling Load and Store Multiple (LDM/STM) ............................................. 38
B.5.2 Scheduling Data Processing Instructions ...................................................................... 39
B.5.3 Scheduling Multiply Instructions .................................................................................... 40
B.5.4 Scheduling SWP and SWPB Instructions...................................................................... 41
B.5.5 Scheduling the MRA and MAR Instructions (MRRC/MCRR)......................................... 42
B.5.6 Scheduling the MIA and MIAPH Instructions................................................................. 43
B.5.7 Scheduling MRS and MSR Instructions......................................................................... 44
B.5.8 Scheduling CP15 Coprocessor Instructions .................................................................. 44
B.6 Optimizing C Libraries ................................................................................................................ 45
B.7 Optimizations for Size................................................................................................................. 45
B.7.1 Space/Performance Trade Off....................................................................................... 45
B.7.1.1. Multiple Word Load and Store .......................................................................... 45
B.7.1.2. Use of Conditional Instructions ......................................................................... 45
B.7.1.3. Use of PLD Instructions .................................................................................... 45
C Test Features ............................................................................................ 1
C.1 Introduction................................................................................................................................... 1
C.2 JTAG - IEEE1149.1 ...................................................................................................................... 1
C.2.1 Boundary Scan Architecture ............................................................................................ 2
C.2.2 TAP Pins.......................................................................................................................... 3
C.2.3 Instruction Register (IR)................................................................................................... 4
C.2.3.1. Boundary-Scan Instruction Set ........................................................................... 4
C.2.4 TAP Test Data Registers .................................................................................................6
C.2.4.1. Device Identification Register ............................................................................. 6
C.2.4.2. Bypass Register.................................................................................................. 6
C.2.4.3. Boundary-Scan Register..................................................................................... 6
C.2.5 TAP Controller ................................................................................................................. 7
C.2.5.1. Test Logic Reset State ....................................................................................... 8
C.2.5.2. Run-Test/Idle State............................................................................................. 8
C.2.5.3. Select-DR-Scan State......................................................................................... 8
C.2.5.4. Capture-DR State ............................................................................................... 8
C.2.5.5. Shift-DR State.....................................................................................................9
C.2.5.6. Exit1-DR State .................................................................................................... 9
C.2.5.7. Pause-DR State..................................................................................................9
C.2.5.8. Exit2-DR State .................................................................................................... 9
C.2.5.9. Update-DR State .............................................................................................. 10
C.2.5.10. Select-IR Scan State....................................................................................... 10
C.2.5.11. Capture-IR State ............................................................................................. 10
C.2.5.12. Shift-IR State ................................................................................................... 10
C.2.5.13. Exit1-IR State .................................................................................................. 11
C.2.5.14. Pause-IR State................................................................................................ 11
C.2.5.15. Exit2-IR State .................................................................................................. 11
C.2.5.16. Update-IR State .............................................................................................. 11
C.2.5.17. Boundary-Scan Example ................................................................................12
Figures
1-1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Features ........................................... 2
3-1 Example of Locked Entries in TLB.............................................................................................................. 9
4-1 Instruction Cache Organization .................................................................................................................... 1
4-2 Locked Line Effect on Round Robin Replacement...................................................................................... 8
5-1 BTB Entry..................................................................................................................................................... 1
5-2 Branch History.............................................................................................................................................. 2
6-1 Data Cache Organization.............................................................................................................................. 2
6-2 Mini-Data Cache Organization..................................................................................................................... 3
6-3 Locked Line Effect on Round Robin Replacement.................................................................................... 15
8-1 Reset Sequence ............................................................................................................................................. 3
8-2 Pin State at Reset .......................................................................................................................................... 4
9-1 Interrupt Controller Block Diagram ............................................................................................................. 2
10-1 Typical System ............................................................................................................................................. 1
10-2 Alternate Configuration................................................................................................................................ 2
10-3 Big Endian Lane Swapping on a 64-bit Bus............................................................................................... 13
10-4 Basic Read Timing ..................................................................................................................................... 14
10-5 Read Burst, No CWF.................................................................................................................................. 15
10-6 Read Burst, CWF........................................................................................................................................ 16
10-7 Basic Word Write ....................................................................................................................................... 17
10-8 Two Word Coalesced Write ....................................................................................................................... 18
10-9 Four Word Eviction Write.......................................................................................................................... 19
10-10 Four Word Coalesced Write Burst ............................................................................................................. 20
10-11 Pipeline Example........................................................................................................................................ 21
10-12 Locked Access............................................................................................................................................ 22
10-13 Aborted Access........................................................................................................................................... 23
10-14 Hold Assertion............................................................................................................................................ 24
13-1 SELDCSR Hardware.................................................................................................................................. 18
13-2 SELDCSR Data Register............................................................................................................................ 19
13-3 DBGTX Hardware...................................................................................................................................... 21
13-4 DBGRX Hardware...................................................................................................................................... 22
13-5 RX Write Logic .......................................................................................................................................... 23
13-6 DBGRX Data Register ............................................................................................................................... 24
13-7 Message Byte Formats................................................................................................................................ 28
13-8 Indirect Branch Entry Address Byte Organization..................................................................................... 31
13-9 High Level View of Trace Buffer............................................................................................................... 32
13-10 LDIC JTAG Data Register Hardware......................................................................................................... 35
13-11 Format of LDIC Cache Functions .............................................................................................................. 37
13-12 Code Download During a Cold Reset For Debug ...................................................................................... 39
13-13 Code Download During a Warm Reset For Debug.................................................................................... 41
13-14 Downloading Code in IC During Program Execution................................................................................ 43
B-1 Intel® 80200 Processor RISC Superpipeline................................................................................................ 3
C-1 Test Access Port Block Diagram.................................................................................................................. 2
C-2 TAP Controller State Diagram ..................................................................................................................... 7
C-3 JTAG Example ........................................................................................................................................... 13
C-4 Timing Diagram Illustrating the Loading of Instruction Register..............................................................14
C-5 Timing Diagram Illustrating the Loading of Data Register........................................................................ 15
Tables
2-1 Multiply with Internal Accumulate Format...................................................................................................4
2-2 MIA{<cond>} acc0, Rm, Rs.........................................................................................................................4
2-3 MIAPH{<cond>} acc0, Rm, Rs....................................................................................................................5
2-4 MIAxy{<cond>} acc0, Rm, Rs.....................................................................................................................6
2-5 Internal Accumulator Access Format............................................................................................................7
2-6 MAR{<cond>} acc0, RdLo, RdHi................................................................................................................8
2-7 MRA{<cond>} RdLo, RdHi, acc0................................................................................................................8
2-8 First-level Descriptors .................................................................................................................................10
2-9 Second-level Descriptors for Coarse Page Table ........................................................................................10
2-10 Second-level Descriptors for Fine Page Table............................................................................................10
2-11 Exception Summary ....................................................................................................................................12
2-12 Event Priority...............................................................................................................................................12
2-13 Intel
2-14 Intel
3-1 Data Cache and Buffer Behavior when X = 0...............................................................................................3
3-2 Data Cache and Buffer Behavior when X = 1...............................................................................................3
3-3 Memory Operations that Impose a Fence......................................................................................................4
3-4 Valid MMU & Data/mini-data Cache Combinations....................................................................................5
7-1 MRC/MCR Format........................................................................................................................................2
7-2 LDC/STC Format ..........................................................................................................................................3
7-3 CP15 Registers ..............................................................................................................................................4
7-4 ID Register.....................................................................................................................................................5
7-5 Cache Type Register......................................................................................................................................5
7-6 ARM* Control Register ................................................................................................................................7
7-7 Auxiliary Control Register ............................................................................................................................8
7-8 Translation Table Base Register....................................................................................................................9
7-9 Domain Access Control Register ..................................................................................................................9
7-10 Fault Status Register....................................................................................................................................10
7-11 Fault Address Register ................................................................................................................................10
7-12 Cache Functions ..........................................................................................................................................11
7-13 TLB Functions.............................................................................................................................................13
7-14 Cache Lockdown Functions ........................................................................................................................14
7-15 Data Cache Lock Register...........................................................................................................................14
7-16 TLB Lockdown Functions...........................................................................................................................15
7-17 Accessing Process ID ..................................................................................................................................16
7-18 Process ID Register .....................................................................................................................................16
7-19 Accessing the Debug Registers ...................................................................................................................17
7-20 Coprocessor Access Register ......................................................................................................................19
7-21 CP14 Registers ............................................................................................................................................20
7-22 Accessing the Performance Monitoring Registers ......................................................................................20
7-23 PWRMODE Register ..................................................................................................................................21
7-24 Clock and Power Management....................................................................................................................21
7-25 CCLKCFG Register ....................................................................................................................................21
7-26 Accessing the Debug Registers ...................................................................................................................22
8-1 Reset CCLK Configuration ...........................................................................................................................1
8-2 Software CCLK Configuration......................................................................................................................2
8-3 Low Power Modes.........................................................................................................................................5
8-4 PWRSTATUS[1:0] Encoding .......................................................................................................................5
®
80200 Processor Encoding of Fault Status for Prefetch Aborts .......................................................13
®
80200 Processor Encoding of Fault Status for Data Aborts .............................................................14
Developer’s Manual March, 2003 xiii
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture
9-1 Interrupt Control Register (CP13 register 0) ................................................................................................ 3
9-2 Interrupt Source Register (CP13, register 4) ................................................................................................ 4
9-3 Interrupt Steer Register (CP13, register 8) ................................................................................................... 5
10-1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Bus Signals...................................... 3
10-2 Requests on a 64-bit Bus .............................................................................................................................. 4
10-3 Requests on a 32-bit Bus .............................................................................................................................. 5
10-4 Return Order for 8-Word Burst, 64-bit Data Bus......................................................................................... 7
10-5 Return Order for 8-Word Burst, 32-bit Data Bus......................................................................................... 7
11-1 BCU Response to ECC Errors...................................................................................................................... 3
11-2 BCUCTL (Register 0)................................................................................................................................... 5
11-3 BCUMOD (Register 1)................................................................................................................................. 7
11-4 ELOG0, ELOG1(Registers 4, 5) .................................................................................................................. 9
11-5 ECAR0, ECAR1(Registers 6, 7) .................................................................................................................. 9
11-6 ECTST (Register 8) .................................................................................................................................... 10
12-1 Clock Count Register (CCNT) ..................................................................................................................... 2
12-2 Performance Monitor Count Register (PMN0 and PMN1)..........................................................................3
12-3 Performance Monitor Control Register (CP14, register 0)........................................................................... 4
12-4 Performance Monitoring Events................................................................................................................... 6
12-5 Some Common Uses of the PMU................................................................................................................. 7
13-1 Debug Control and Status Register (DCSR) ................................................................................................ 3
13-2 Event Priority................................................................................................................................................ 6
13-3 Instruction Breakpoint Address and Control Register (IBCRx)................................................................... 9
13-4 Data Breakpoint Register (DBRx).............................................................................................................. 10
13-5 Data Breakpoint Controls Register (DBCON) ........................................................................................... 10
13-6 TX RX Control Register (TXRXCTRL).................................................................................................... 12
13-7 Normal RX Handshaking ........................................................................................................................... 13
13-8 High-Speed Download Handshaking States............................................................................................... 13
13-9 TX Handshaking......................................................................................................................................... 15
13-10 TXRXCTRL Mnemonic Extensions .......................................................................................................... 15
13-11 TX Register................................................................................................................................................. 16
13-12 RX Register ................................................................................................................................................ 16
13-13 DEBUG Data Register Reset Values.......................................................................................................... 25
13-14 CP 14 Trace Buffer Register Summary...................................................................................................... 26
13-15 Checkpoint Register (CHKPTx)................................................................................................................. 26
13-16 TBREG Format........................................................................................................................................... 27
13-17 Message Byte Formats................................................................................................................................ 28
13-18 LDIC Cache Functions ............................................................................................................................... 36
14-1 Minimum Interrupt Latency ......................................................................................................................... 1
14-2 Branch Latency Penalty................................................................................................................................ 2
14-3 Latency Example .......................................................................................................................................... 4
14-4 Branch Instruction Timings (Those predicted by the BTB) ......................................................................... 4
14-5 Branch Instruction Timings (Those not predicted by the BTB)................................................................... 5
14-6 Data Processing Instruction Timings............................................................................................................ 5
14-7 Multiply Instruction Timings........................................................................................................................ 6
14-8 Multiply Implicit Accumulate Instruction Timings...................................................................................... 7
14-9 Implicit Accumulator Access Instruction Timings....................................................................................... 7
14-10 Saturated Data Processing Instruction Timings............................................................................................ 8
14-11 Status Register Access Instruction Timings ................................................................................................. 8
14-12 Load and Store Instruction Timings ............................................................................................................. 8
14-13 Load and Store Multiple Instruction Timings .............................................................................................. 8
14-14 Semaphore Instruction Timings ....................................................................................................................9
14-15 CP15 Register Access Instruction Timings...................................................................................................9
14-16 CP14 Register Access Instruction Timings...................................................................................................9
14-17 SWI Instruction Timings...............................................................................................................................9
14-18 Count Leading Zeros Instruction Timings ....................................................................................................9
A-1 C and B encoding ..........................................................................................................................................3
B-1 Pipelines and Pipe stages...............................................................................................................................3
C-1 TAP Controller Pin Definitions.....................................................................................................................3
C-2 JTAG Instruction Set.....................................................................................................................................4
C-3 IEEE Instructions...........................................................................................................................................5
C-4 JTAG ID Register Value ...............................................................................................................................6
Introduction

1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview

The Intel® 80200 processor, based on Intel® XScale™ microarchitecture, is the next generation in the Intel® StrongARM* processor family (compliant with ARM* Architecture V5TE). It is designed for high performance and low power, leading the industry in mW/MIPS. The Intel® 80200 processor integrates a bus controller and an interrupt controller around a core processor, and is intended for embedded markets such as handheld devices, networking, and remote access servers. This technology is ideal for internet infrastructure products such as network and I/O processors, where ultimate performance is critical for moving and processing large amounts of data quickly.
The Intel® 80200 processor incorporates an extensive list of architecture features that allows it to achieve high performance. This rich feature set allows programmers to select the features that obtain the best performance for their application. Many of the architectural features added to the Intel® 80200 processor help hide memory latency, which is often a serious impediment to high performance processors. These include:
the ability to continue instruction execution even while the data cache is retrieving data from external memory.
a write buffer.
write-back caching.
various data cache allocation policies which can be configured differently for each application.
cache locking.
and a pipelined external bus.
All these features improve the efficiency of the external bus.
The Intel® 80200 processor has been equipped to efficiently handle audio processing through the support of 16-bit data types and 16-bit operations. These audio coding enhancements center around multiply and accumulate operations which accelerate many of the audio filter operations.

1.1.1 ARM* Architecture Compliance

ARM* Version 5 (V5) Architecture added floating point instructions to ARM* Version 4. The Intel® 80200 processor implements the integer instruction set architecture of ARM V5, but does not provide hardware support of the floating point instructions.
The Intel® 80200 processor provides the Thumb* instruction set (ARM* V5T) and the ARM* V5E DSP extensions.
Backward compatibility with the first generation of Intel® StrongARM* products is maintained for user-mode applications. Operating systems may require modifications to match the specific hardware features of the Intel® 80200 processor and to take advantage of the performance enhancements added to the Intel® 80200 processor.

1.1.2 Features

Figure 1-1 shows the major functional blocks of the Intel® 80200 processor. The following
sections give a brief, high-level overview of these blocks.
Figure 1-1. Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Features
Instruction Cache: 32 Kbytes, 32 ways, lockable by line
Branch Target Buffer: 2 Kbytes, 2 ways
Performance Monitoring
Debug: hardware breakpoint, branch history table
Interrupt Controller: interrupt masking, FIQ/IRQ steering, pend register
Data Cache: max 32 Kbytes, 32 ways, wr-back or wr-through, hit under miss
IMMU: 32 entry TLB, fully associative, lockable by entry
Power Management: idle, sleep
Data RAM: max 28 Kbytes, re-map of data cache
DMMU: 32 entry TLB, fully associative, lockable by entry
MAC: single cycle throughput (16*32), 16-bit SIMD, 40-bit accumulator
Bus Controller: 1 Gbyte/sec, pipelined, de-multiplexed, ECC protection
Mini-Data Cache: 2 Kbytes, 2 ways
Fill Buffer: 4 - 8 entries
Write Buffer: 8 entries, full coalescing
JTAG
1.1.2.1 Multiply/Accumulate (MAC)
The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a throughput of a MAC operation every cycle. Several architectural enhancements were made to the MAC to support audio coding algorithms, which include a 40-bit accumulator and support for 16-bit packed data.
See Section 2.3, “Extensions to ARM* Architecture” on page 2-3 for more details.
1.1.2.2 Memory Management
The Intel® 80200 processor implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. The MMU provides access protection and virtual to physical address translation.
The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:
identifying code as cacheable or non-cacheable
selecting between the mini-data cache or data cache
write-back or write-through data caching
enabling data write allocation policy
and enabling the write buffer to coalesce stores to external memory
Chapter 3, “Memory Management” discusses this in more detail.
1.1.2.3 Instruction Cache
The Intel® 80200 processor implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided.
Chapter 4, “Instruction Cache” discusses this in more detail.
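The cache parameters quoted above (32 Kbytes, 32 ways, 32-byte lines) determine how many sets the cache has. The following is a minimal sketch of that arithmetic, assuming power-of-two sizes, 32-bit addresses, and the conventional offset/index/tag split of an address; the function name and the split itself are illustrative and are not taken from this manual.

```python
def cache_geometry(size_bytes, ways, line_bytes, address_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    Assumes size, ways and line size are powers of two (an assumption of this sketch).
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size): byte within a line
    index_bits = sets.bit_length() - 1          # log2(sets): which set is selected
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


# Instruction cache figures from the text: 32 Kbytes, 32 ways, 32-byte lines.
ICACHE = cache_geometry(32 * 1024, ways=32, line_bytes=32)
```

With these inputs the sketch yields 32 sets, so under the assumed split 5 address bits select the byte within a line and 5 bits select the set.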
1.1.2.4 Branch Target Buffer
The Intel® 80200 processor provides a Branch Target Buffer (BTB) to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.
The BTB holds 128 entries. See Chapter 5, “Branch Target Buffer” for more details.
1.1.2.5 Data Cache
The Intel® 80200 processor implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache. Each cache has a line size of 32 bytes and supports write-through or write-back caching.
The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15.
Chapter 6, “Data Cache” discusses all this in more detail.
The Intel® 80200 processor allows applications to re-configure a portion of the data cache as data RAM. Software may place special tables or frequently used variables in this RAM. See Section 6.4, “Re-configuring the Data Cache as Data RAM” on page 6-12 for more information on this.
1.1.2.6 Power Management
The Intel® 80200 processor supports two low power modes: idle and sleep. These modes are discussed in Section 8.3, “Power Management” on page 8-5.
1.1.2.7 Interrupt Controller
An interrupt controller is implemented on the Intel® 80200 processor that provides masking of interrupts and the ability to steer interrupts to FIQ or IRQ. It is accessed through Coprocessor 13 registers. See Chapter 9, “Interrupts” for more detail.
1.1.2.8 Bus Controller
The Intel® 80200 processor supports a pipelined external bus that runs at 100 MHz. The data bus is 32/64 bits with ECC protection. The bus controller can be configured to provide critical word first on load operations, enhancing overall system performance. The bus controller has four request queues, where all four requests can be active on the pipelined external bus.
Chapter 10, “External Bus” describes the external bus protocol and Chapter 11, “Bus Controller”
covers the aspects of ECC protection. The bus controller registers are accessed via coprocessor 13.
1.1.2.9 Performance Monitoring
Two performance monitoring counters have been added to the Intel® 80200 processor that can be configured to monitor various events in the Intel® 80200 processor. These events allow a software developer to measure cache efficiency, detect system bottlenecks and reduce the overall latency of programs.
Chapter 12, “Performance Monitoring” discusses this in more detail.
1.1.2.10 Debug
The Intel® 80200 processor supports software debugging through two instruction address breakpoint registers, one data-address breakpoint register, one data-address/mask breakpoint register, and a trace buffer.
Chapter 13, “Software Debug” discusses this in more detail.
1.1.2.11 JTAG
Testability is supported on the Intel® 80200 processor through the Test Access Port (TAP) Controller implementation, which is based on IEEE 1149.1 (JTAG) Standard Test Access Port and Boundary-Scan Architecture. The purpose of the TAP controller is to support test logic internal and external to the Intel® 80200 processor such as built-in self-test, boundary-scan, and scan.
Appendix C.2 discusses this in more detail.

1.2 Terminology and Conventions

1.2.1 Number Representation

All numbers in this document can be assumed to be base 10 unless designated otherwise. In text and pseudo code descriptions, hexadecimal numbers have a prefix of 0x and binary numbers have a prefix of 0b. For example, 107 would be represented as 0x6B in hexadecimal and 0b1101011 in binary.
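As a quick check of the notation, the example value 107 can be written out in each base. This is only an illustration of the prefixes described above; the variable names are arbitrary.

```python
# Illustrative only: the value 107 from the text, written in the three notations.
value = 107

hex_text = f"0x{value:X}"   # hexadecimal with the 0x prefix
bin_text = f"0b{value:b}"   # binary with the 0b prefix

# Parsing the prefixed strings back recovers the same decimal value.
from_hex = int(hex_text, 16)
from_bin = int(bin_text, 2)
```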

1.2.2 Terminology and Acronyms

ASSP Application Specific Standard Product
Assert This term refers to the logically active value of a signal or bit.
BTB Branch Target Buffer
Clean A clean operation updates external memory with the contents of the specified line in the data/mini-data cache if any of the dirty bits are set and the line is valid. There are two dirty bits associated with each line in the cache so only the portion that is dirty gets written back to external memory. After this operation, the line is still valid and both dirty bits are deasserted.
Coalescing Coalescing means bringing together a new store operation with an existing store operation already resident in the write buffer. The new store is placed in the same write buffer entry as an existing store when the address of the new store falls in the 4 word aligned address of the existing entry. This includes, in PCI terminology, write merging, write collapsing, and write combining.
Deassert This term refers to the logically inactive value of a signal or bit.
Flush A flush operation invalidates the location(s) in the cache by deasserting the valid bit. Individual entries (lines) may be flushed or the entire cache may be flushed with one command. Once an entry is flushed in the cache it can no longer be used by the program.
Reserved A reserved field is a field that may be used by an implementation. If the initial value of a reserved field is supplied by software, this value must be zero. Software should not modify reserved fields or depend on any values in reserved fields.
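The coalescing condition defined in the list above can be expressed as an address comparison. The sketch below assumes a 4-byte word, so a 4-word-aligned region is 16 bytes; the function names are hypothetical and the model ignores everything else a write buffer tracks (byte enables, page attributes, ordering).

```python
WORD_BYTES = 4                           # assumption: one word is 4 bytes
COALESCE_WINDOW_BYTES = 4 * WORD_BYTES   # the "4 word aligned" region: 16 bytes


def coalesce_window(address):
    """Return the base address of the 4-word-aligned region containing address."""
    return address & ~(COALESCE_WINDOW_BYTES - 1)


def can_coalesce(new_store_address, existing_entry_address):
    """True when the new store falls in the same 4-word-aligned region as the entry."""
    return coalesce_window(new_store_address) == coalesce_window(existing_entry_address)
```

For example, stores to 0x1004 and 0x100C share the region starting at 0x1000, while a store to 0x1010 starts a new region.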

1.3 Other Relevant Documents

Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Datasheet, Intel Order # 273414
ARM Architecture Version 5TE Specification, Document Number: ARM DDI 0100E. This document describes Version 5TE of the ARM Architecture which includes Thumb ISA and ARM DSP-Enhanced ISA.
ARM Architecture Reference Manual, Document Number: ARM DDI 0100B. This document describes Version 4 of the ARM Architecture.
Intel® XScale™ Microarchitecture Programming Reference Manual, Intel Order # 273436
Intel® 80312 I/O Companion Chip Developer’s Manual, Intel Order # 273410
StrongARM SA-1100 Microprocessor Developer’s Manual, Intel Order # 278088
StrongARM SA-110 Microprocessor Technical Reference Manual, Intel Order # 278058
Programming Model
This chapter describes the programming model of the Intel® 80200 processor based on Intel® XScale™ microarchitecture, namely the implementation options and extensions to the ARM* Version 5 architecture.
The ARM* Architecture Version 5TE Specification (ARM DDI 0100E) describes Version 5TE of the ARM Architecture, including the Thumb* ISA and ARM DSP-Enhanced ISA.

2.1 ARM* Architecture Compliance

The Intel® 80200 processor implements the integer instruction set architecture specified in ARM* Version 5TE. T refers to the Thumb instruction set and E refers to the DSP-Enhanced instruction set.
ARM* Version 5 introduces a few more architecture features over Version 4, specifically the addition of tiny pages (1 Kbyte), a new instruction (CLZ) that counts the leading zeroes in a data value, enhanced ARM-Thumb transfer instructions and a modification of the system control coprocessor, CP15.

2.2 ARM* Architecture Implementation Options

2.2.1 Big Endian versus Little Endian

The Intel® 80200 processor supports both big and little endian data representation. The B-bit of the Control Register (Coprocessor 15, register 1, bit 7) selects between big and little endian mode. To run in big endian mode, the B bit must be set before attempting any sub-word accesses to memory; otherwise the results are undefined. Note that this bit takes effect even if the MMU is disabled.
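To make the two data representations concrete, the sketch below models the B bit as a mask on a Control Register value and shows the byte order of one aligned word in each mode. It is a host-side illustration only: the helper names are hypothetical, and actually reading or writing the Control Register requires coprocessor instructions that are not modelled here.

```python
B_BIT = 1 << 7  # Control Register (CP15 register 1), bit 7, as stated in the text


def select_big_endian(control_value):
    """Return the Control Register value with the B bit set."""
    return control_value | B_BIT


def is_big_endian(control_value):
    """True when the B bit is set in the given Control Register value."""
    return bool(control_value & B_BIT)


def word_bytes(word, big_endian):
    """Bytes of a 32-bit word in increasing address order for the chosen mode."""
    return list(word.to_bytes(4, "big" if big_endian else "little"))
```

For the word 0x11223344, the byte at the lowest address is 0x44 in little endian mode and 0x11 in big endian mode, which is why sub-word accesses made before the B bit is set would read the wrong bytes.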

2.2.2 26-Bit Code

The Intel® 80200 processor does not support 26-bit code.

2.2.3 Thumb*

The Intel® 80200 processor supports the Thumb instruction set.

2.2.4 ARM* DSP-Enhanced Instruction Set

The Intel® 80200 processor implements the ARM DSP-enhanced instruction set, which is a set of instructions that boost the performance of signal processing applications. There are new multiply instructions that operate on 16-bit data values and new saturation instructions. Some of the new instructions are:
SMLAxy 32<=16x16+32
SMLAWy 32<=32x16+32
SMLALxy 64<=16x16+64
SMULxy 32<=16x16
SMULWy 32<=32x16
QADD adds two registers and saturates the result if an overflow occurred
QDADD doubles and saturates one of the input registers, then adds and saturates
QSUB subtracts two registers and saturates the result if an overflow occurred
QDSUB doubles and saturates one of the input registers, then subtracts and saturates
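The saturating instructions listed above can be modelled with ordinary integers. This is a minimal sketch, assuming operands are already signed 32-bit values; it tracks only the result value, not any status flags, and it doubles its second argument, which is an assumption of the sketch because the text does not say which input register is doubled.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def saturate32(value):
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def qadd(a, b):
    """Add and saturate the result on overflow."""
    return saturate32(a + b)


def qsub(a, b):
    """Subtract and saturate the result on overflow."""
    return saturate32(a - b)


def qdadd(a, b):
    """Double and saturate b (assumed operand), then add and saturate."""
    return saturate32(a + saturate32(2 * b))


def qdsub(a, b):
    """Double and saturate b (assumed operand), then subtract and saturate."""
    return saturate32(a - saturate32(2 * b))
```

The point of the two saturation steps in qdadd and qdsub is that the doubling itself can overflow: doubling 0x40000000 already exceeds the signed 32-bit range and is clamped before the addition takes place.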
The Intel® 80200 processor also implements LDRD, STRD and PLD instructions with the following implementation notes:
PLD is interpreted as a read operation by the MMU and is ignored by the data breakpoint unit, i.e., PLD never generates data breakpoint events.
PLD to a non-cacheable page performs no action. Also, if the targeted cache line is already resident, this instruction has no effect.
Both LDRD and STRD instructions generate an alignment exception when the address bits [2:0] = 0b100.
MCRR and MRRC are only supported on the Intel® 80200 processor when directed to coprocessor 0 and are used to access the internal accumulator. See Section 2.3.1.2 for more information. Access to any other coprocessor besides 0x0 is undefined.

2.2.5 Base Register Update

If a data abort is signalled on a memory instruction that specifies writeback, the contents of the base register are not updated. This holds for all load and store instructions. This behavior matches that of the first generation Intel® StrongARM* processor and is referred to in the ARM V5 architecture as the Base Restored Abort Model.

2.3 Extensions to ARM* Architecture

The Intel® 80200 processor made a few extensions to the ARM Version 5 architecture to meet the needs of various markets and design requirements. The following is a list of the extensions which are discussed in the next sections.
A DSP coprocessor (CP0) has been added that contains a 40-bit accumulator and new instructions.
New page attributes were added to the page table descriptors. The C and B page attribute encoding was extended by one more bit to allow for more encodings: write allocate and mini-data cache. An attribute specifying ECC for 1 Mbyte regions was also added.
Additional functionality has been added to coprocessor 15. Coprocessor 14 was also created.
Enhancements were made to the Event Architecture, instruction cache and data cache parity error exceptions, breakpoint events, and imprecise external data aborts.

2.3.1 DSP Coprocessor 0 (CP0)

The Intel® 80200 processor adds a DSP coprocessor to the architecture for the purpose of increasing the performance and the precision of audio processing algorithms. This coprocessor contains a 40-bit accumulator and new instructions.
The 40-bit accumulator is referenced by several new instructions that were added to the architecture; MIA, MIAPH and MIAxy are multiply/accumulate instructions that reference the 40-bit accumulator instead of a register specified accumulator. MAR and MRA provide the ability to read and write the 40-bit accumulator.
Access to CP0 is always allowed in all processor modes when bit 0 of the Coprocessor Access Register is set. Any access to CP0 when this bit is clear causes an undefined exception. (See Section 7.2.15, “Register 15: Coprocessor Access Register” on page 7-18 for more details). Note that only privileged software can set this bit in the Coprocessor Access Register.
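The access rule above can be summarized as a single bit test. The sketch below is a behavioural model only: the exception class and function name are hypothetical stand-ins, and the register value is passed in as a plain integer rather than read from the processor.

```python
class UndefinedInstruction(Exception):
    """Stands in for the undefined exception taken by the processor."""


CP0_ACCESS_BIT = 1 << 0  # bit 0 of the Coprocessor Access Register, per the text


def check_cp0_access(coprocessor_access_register):
    """Return True if CP0 may be accessed; raise when bit 0 is clear."""
    if not coprocessor_access_register & CP0_ACCESS_BIT:
        raise UndefinedInstruction("CP0 access with bit 0 clear")
    return True
```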
The 40-bit accumulator needs to be saved on a context switch if multiple processes are using it.
Two new instruction formats were added for coprocessor 0: Multiply with Internal Accumulate Format and Internal Accumulate Access Format. The formats and instructions are described next.
2.3.1.1 Multiply With Internal Accumulate Format
A new multiply format has been created to define operations on 40-bit accumulators. Table 2-1, “Multiply with Internal Accumulate Format” on page 2-4 shows the layout of the new format. The opcode for this format lies within the coprocessor register transfer instruction type. These instructions have their own syntax.
Table 2-1. Multiply with Internal Accumulate Format
Bit layout (bit 31 down to bit 0):
31:28 cond | 27:20 1 1 1 0 0 0 1 0 | 19:16 opcode_3 | 15:12 Rs | 11:8 0 0 0 0 | 7:5 acc | 4 1 | 3:0 Rm
Bits    Description                                  Notes
31:28   cond - ARM condition codes                   -
19:16   opcode_3 - specifies the type of multiply    Intel® 80200 processor defines the following:
        with internal accumulate                     0b0000 = MIA
                                                     0b1000 = MIAPH
                                                     0b1100 = MIABB
                                                     0b1101 = MIABT
                                                     0b1110 = MIATB
                                                     0b1111 = MIATT
                                                     The effect of all other encodings is unpredictable.
15:12   Rs - Multiplier                              -
7:5     acc - select 1 of 8 accumulators             Intel® 80200 processor only implements acc0; access to any other acc has unpredictable effect.
3:0     Rm - Multiplicand                            -
Two new fields were created for this format, acc and opcode_3. The acc field specifies 1 of 8 internal accumulators to operate on and opcode_3 defines the operation for this format. The Intel 80200 processor defines a single 40-bit accumulator referred to as acc0; future implementations may define multiple internal accumulators.The Intel instructions, MIA, MIAPH, MIABB, MIABT, MIATB and MIATT.
Table 2-2. MIA{<cond>} acc0, Rm, Rs
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 111000100000 Rs 00000001 Rm
Operation: if ConditionPassed(<cond>) then
acc0 = (Rm[31:0] * Rs[31:0])[39:0] + acc0[39:0] Exceptions: none Qualifiers Condition Code
No condition code flags are updated
Notes: Early termination is supported. Instruction timings can be found
in Section 14.4.4, “Multiply Instruction Timings” on page 14-6. Specifying R15 for register Rs or Rm has unpredictable results. acc0 is defined to be 0b000 on 80200.
The MIA instruction operates similarly to MLA except that the 40-bit accumulator is used. MIA multiplies the signed value in register Rs (multiplier) by the signed value in register Rm (multiplicand) and then adds the result to the 40-bit accumulator (acc0).
®
80200 processor uses opcode_3 to define six
®
2-4 March, 2003 Developer’s Manual
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture
MIA does not support unsigned multiplication; all values in Rs and Rm are interpreted as signed data values. MIA is useful for operating on signed 16-bit data that was loaded into a general purpose register by LDRSH.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
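The MIA operation in Table 2-2 can be modeled in a few lines of Python. This is a behavioral sketch, not a description of the hardware. The helper names (`sign_extend`, `mia`, `ACC_MASK`) are hypothetical, registers and the accumulator are passed as raw unsigned bit patterns, and the sum is assumed to wrap modulo 2^40 because the operation keeps only bits [39:0] and no saturation is described. Condition code evaluation is left out.

```python
ACC_MASK = (1 << 40) - 1  # acc0 holds 40 bits


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mia(acc0, rm, rs):
    """Model MIA acc0, Rm, Rs: signed 32x32 multiply, accumulated into 40 bits.

    acc0 is the current 40-bit accumulator pattern; rm and rs are 32-bit
    register patterns. Returns the new 40-bit accumulator pattern.
    """
    product = sign_extend(rm, 32) * sign_extend(rs, 32)
    return (product + acc0) & ACC_MASK


# 3 * (-2) added to an empty accumulator gives -6 as a 40-bit pattern.
minus_six = mia(0, 3, 0xFFFFFFFE)  # 0xFFFFFFFFFA
```

Because both operands are always treated as signed, a register holding 0xFFFFFFFE contributes -2 to the product, never 4294967294.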
Table 2-3. MIAPH{<cond>} acc0, Rm, Rs
  Bits:   31:28  27:16         15:12  11:4      3:0
  Field:  cond   111000101000  Rs     00000001  Rm

  Operation:   if ConditionPassed(<cond>) then
                   acc0 = sign_extend(Rm[31:16] * Rs[31:16]) +
                          sign_extend(Rm[15:0] * Rs[15:0]) +
                          acc0[39:0]
  Exceptions:  none
  Qualifiers:  Condition Code
               S bit is always cleared; no condition code flags are updated
  Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
               Specifying R15 for register Rs or Rm has unpredictable results.
               acc0 is defined to be 0b000 on 80200.

The MIAPH instruction performs two 16-bit signed multiplies on packed half word data and accumulates both products into a single 40-bit accumulator. The first signed multiplication is performed on the lower 16 bits of the value in register Rs with the lower 16 bits of the value in register Rm. The second signed multiplication is performed on the upper 16 bits of the value in register Rs with the upper 16 bits of the value in register Rm. Both signed 32-bit products are sign extended and then added to the value in the 40-bit accumulator (acc0).
The instruction is only executed if the condition specified in the instruction matches the condition code status.
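A behavioral sketch of the MIAPH operation in Table 2-3 follows, written in Python. The function and helper names are hypothetical, values are raw unsigned bit patterns, and the 40-bit sum is assumed to wrap modulo 2^40 since no saturation is described. Condition code evaluation is left out.

```python
ACC_MASK = (1 << 40) - 1  # acc0 holds 40 bits


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def miaph(acc0, rm, rs):
    """Model MIAPH acc0, Rm, Rs: two signed 16x16 multiplies on packed halfwords.

    The upper halves are multiplied together, the lower halves are multiplied
    together, and both products are added to the 40-bit accumulator.
    """
    upper = sign_extend(rm >> 16, 16) * sign_extend(rs >> 16, 16)
    lower = sign_extend(rm, 16) * sign_extend(rs, 16)
    return (upper + lower + acc0) & ACC_MASK


# Rm packs (2, -1) and Rs packs (3, 4): 2*3 + (-1)*4 = 2.
small = miaph(0, 0x0002FFFF, 0x00030004)    # 2
# Two products of (-32768) * (-32768) sum to 2**31, which no longer fits a
# signed 32-bit value but is still a positive number in the 40-bit accumulator.
large = miaph(0, 0x80008000, 0x80008000)    # 0x80000000
```

The second case illustrates why the extra accumulator bits matter: the running sum can exceed the signed 32-bit range without being misread as negative.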
Table 2-4. MIAxy{<cond>} acc0, Rm, Rs
  Bits:   31:28  27:18       17  16  15:12  11:4      3:0
  Field:  cond   1110001011  x   y   Rs     00000001  Rm

  Operation:   if ConditionPassed(<cond>) then
                   if (bit[17] == 0)
                       <operand1> = Rm[15:0]
                   else
                       <operand1> = Rm[31:16]
                   if (bit[16] == 0)
                       <operand2> = Rs[15:0]
                   else
                       <operand2> = Rs[31:16]
                   acc0[39:0] = sign_extend(<operand1> * <operand2>) + acc0[39:0]
  Exceptions:  none
  Qualifiers:  Condition Code
               S bit is always cleared; no condition code flags are updated
  Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
               Specifying R15 for register Rs or Rm has unpredictable results.
               acc0 is defined to be 0b000 on 80200.

The MIAxy instruction performs one 16-bit signed multiply and accumulates the product into a single 40-bit accumulator. x refers to either the upper half or lower half of register Rm (multiplicand) and y refers to the upper or lower half of Rs (multiplier). A value of 0x1 selects bits [31:16] of the register, which is specified in the mnemonic as T (for top). A value of 0x0 selects bits [15:0] of the register, which is specified in the mnemonic as B (for bottom).
MIAxy does not support unsigned multiplication; all values in Rs and Rm are interpreted as signed data values.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
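The operand selection in Table 2-4 can be sketched in Python as follows. The function takes the x and y selectors as the mnemonic letters "T" or "B" rather than as instruction bits; that interface, along with the helper names, is an assumption made for readability. Values are raw unsigned bit patterns, the sum is assumed to wrap modulo 2^40, and condition code evaluation is left out.

```python
ACC_MASK = (1 << 40) - 1  # acc0 holds 40 bits


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def miaxy(acc0, rm, rs, x, y):
    """Model MIAxy acc0, Rm, Rs, where x selects the half of Rm and y the half of Rs.

    "T" selects bits [31:16] (instruction bit set), "B" selects bits [15:0].
    """
    if x not in ("T", "B") or y not in ("T", "B"):
        raise ValueError("x and y must each be 'T' or 'B'")
    operand1 = sign_extend(rm >> 16 if x == "T" else rm, 16)
    operand2 = sign_extend(rs >> 16 if y == "T" else rs, 16)
    return (operand1 * operand2 + acc0) & ACC_MASK


rm = 0x0005FFFD  # top = 5, bottom = -3
rs = 0x00070002  # top = 7, bottom = 2
results = {
    "MIABB": miaxy(0, rm, rs, "B", "B"),  # -3 * 2 = -6  -> 0xFFFFFFFFFA
    "MIABT": miaxy(0, rm, rs, "B", "T"),  # -3 * 7 = -21 -> 0xFFFFFFFFEB
    "MIATB": miaxy(0, rm, rs, "T", "B"),  #  5 * 2 = 10
    "MIATT": miaxy(0, rm, rs, "T", "T"),  #  5 * 7 = 35
}
```

The four entries correspond to the four opcode_3 encodings 0b1100 through 0b1111 listed in Table 2-1.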
2.3.1.2 Internal Accumulator Access Format
The Intel® 80200 processor defines a new instruction format for accessing internal accumulators in CP0. Table 2-5, “Internal Accumulator Access Format” on page 2-7 shows that the opcode falls into the coprocessor register transfer space.
The RdHi and RdLo fields allow up to 64 bits of data transfer between Intel® StrongARM* registers and an internal accumulator. The acc field specifies 1 of 8 internal accumulators to transfer data to/from. The Intel® 80200 processor implements a single 40-bit accumulator referred to as acc0; future implementations can specify multiple internal accumulators of varying sizes, up to 64 bits.
Access to the internal accumulator is allowed in all processor modes (user and privileged) as long as bit 0 of the Coprocessor Access Register is set. (See Section 7.2.15, “Register 15: Coprocessor Access Register” on page 7-18 for more details).
The Intel® 80200 processor implements two instructions, MAR and MRA, that move two Intel® StrongARM* registers to acc0 and move acc0 to two Intel® StrongARM* registers, respectively.
Table 2-5. Internal Accumulator Access Format
  Bits:   31:28  27:21    20  19:16  15:12  11:3       2:0
  Field:  cond   1100010  L   RdHi   RdLo   000000000  acc

  Bits   Description                                   Notes
  31:28  cond - ARM condition codes                    -
  20     L - move to/from internal accumulator         -
           0 = move to internal accumulator (MAR)
           1 = move from internal accumulator (MRA)
  19:16  RdHi - specifies the high order eight         On a read of the acc, this 8-bit high order field
         (39:32) bits of the internal accumulator.     is sign extended.
                                                       On a write to the acc, the lower 8 bits of this
                                                       register are written to acc[39:32].
  15:12  RdLo - specifies the low order 32 bits        -
         of the internal accumulator
  7:4    Should be zero                                This field could be used in future implementations
                                                       to specify the type of saturation to perform on the
                                                       read of an internal accumulator. (e.g., a signed
                                                       saturation to 16-bits may be useful for some filter
                                                       algorithms.)
  3      Should be zero                                -
  2:0    acc - specifies 1 of 8 internal               Intel® 80200 processor only implements acc0;
         accumulators                                  access to any other acc is unpredictable.

Note: MAR has the same encoding as MCRR (to coprocessor 0) and MRA has the same encoding as MRRC (to coprocessor 0). These instructions move 64 bits of data to/from ARM registers from/to coprocessor registers. MCRR and MRRC are defined in ARM’s DSP instruction set.
Disassemblers that are not aware of MAR and MRA produce the following syntax:
  MCRR{<cond>} p0, 0x0, RdLo, RdHi, c0
  MRRC{<cond>} p0, 0x0, RdLo, RdHi, c0
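The encoding equivalence described in the note can be checked with a small Python sketch that builds instruction words from the fields of Table 2-5. The function names are hypothetical. The MCRR/MRRC field layout used in `encode_two_reg_transfer` (cp_num in bits 11:8, opcode in bits 7:4, CRm in bits 3:0) is taken from ARM's definition of those instructions and is an assumption as far as this manual is concerned; 0b1110 is the ARM "always" condition code.

```python
COND_AL = 0b1110  # ARM "always" condition code


def encode_acc_access(move_from_acc, rdlo, rdhi, acc=0, cond=COND_AL):
    """Encode MAR (move_from_acc=0) or MRA (move_from_acc=1) using Table 2-5."""
    if 15 in (rdlo, rdhi):
        raise ValueError("R15 as RdHi or RdLo has unpredictable results")
    if move_from_acc and rdlo == rdhi:
        raise ValueError("MRA with the same register for RdHi and RdLo is unpredictable")
    return ((cond << 28) | (0b1100010 << 21) | (move_from_acc << 20)
            | (rdhi << 16) | (rdlo << 12) | acc)


def encode_two_reg_transfer(load, cp_num, opcode, rd, rn, crm, cond=COND_AL):
    """Encode MCRR (load=0) or MRRC (load=1); field layout assumed from ARM's definition."""
    return ((cond << 28) | (0b1100010 << 21) | (load << 20)
            | (rn << 16) | (rd << 12) | (cp_num << 8) | (opcode << 4) | crm)


mar_word = encode_acc_access(0, rdlo=0, rdhi=1)  # MAR acc0, R0, R1 -> 0xEC410000
mra_word = encode_acc_access(1, rdlo=0, rdhi=1)  # MRA R0, R1, acc0 -> 0xEC510000
```

Under these assumptions, `mar_word` equals `encode_two_reg_transfer(0, 0, 0, 0, 1, 0)`, which is the word a disassembler would print as MCRR p0, 0x0, R0, R1, c0.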
Table 2-6. MAR{<cond>} acc0, RdLo, RdHi
  Bits:   31:28  27:20     19:16  15:12  11:0
  Field:  cond   11000100  RdHi   RdLo   000000000000

  Operation:   if ConditionPassed(<cond>) then
                   acc0[39:32] = RdHi[7:0]
                   acc0[31:0] = RdLo[31:0]
  Exceptions:  none
  Qualifiers:  Condition Code
               No condition code flags are updated
  Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
               Specifying R15 as either RdHi or RdLo has unpredictable results.

The MAR instruction moves the value in register RdLo to bits[31:0] of the 40-bit accumulator (acc0) and moves bits[7:0] of the value in register RdHi into bits[39:32] of acc0.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
This instruction executes in any processor mode.
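The MAR data movement is simple enough to state as a one-line Python model. The function name `mar` is hypothetical; registers are passed as raw 32-bit patterns and the result is the 40-bit accumulator pattern. Condition code evaluation is left out.

```python
def mar(rdlo, rdhi):
    """Model MAR acc0, RdLo, RdHi: build the 40-bit accumulator from two registers.

    Only bits [7:0] of RdHi are used; bits [31:8] of RdHi are ignored.
    """
    return ((rdhi & 0xFF) << 32) | (rdlo & 0xFFFFFFFF)


# The upper 24 bits of RdHi (0xABCDEF here) do not reach the accumulator.
acc0 = mar(0x12345678, 0xABCDEF9A)  # 0x9A12345678
```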
Table 2-7. MRA{<cond>} RdLo, RdHi, acc0
  Bits:   31:28  27:20     19:16  15:12  11:0
  Field:  cond   11000101  RdHi   RdLo   000000000000

  Operation:   if ConditionPassed(<cond>) then
                   RdHi[31:0] = sign_extend(acc0[39:32])
                   RdLo[31:0] = acc0[31:0]
  Exceptions:  none
  Qualifiers:  Condition Code
               No condition code flags are updated
  Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
               Specifying the same register for RdHi and RdLo has unpredictable results.
               Specifying R15 as either RdHi or RdLo has unpredictable results.

The MRA instruction moves the 40-bit accumulator value (acc0) into two registers. Bits[31:0] of the value in acc0 are moved into the register RdLo. Bits[39:32] of the value in acc0 are sign extended to 32 bits and moved into the register RdHi.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
This instruction executes in any processor mode.
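A Python sketch of MRA follows, together with the save/restore round trip that a context switch would rely on when more than one process uses the accumulator. The function names are hypothetical, and the one-line `mar` model is repeated only so that the block is self-contained. Values are raw unsigned bit patterns.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mra(acc0):
    """Model MRA RdLo, RdHi, acc0: return (RdLo, RdHi) as 32-bit register patterns.

    RdLo receives acc0[31:0]; RdHi receives acc0[39:32] sign extended to 32 bits.
    """
    rdlo = acc0 & 0xFFFFFFFF
    rdhi = sign_extend(acc0 >> 32, 8) & 0xFFFFFFFF
    return rdlo, rdhi


def mar(rdlo, rdhi):
    """Model MAR acc0, RdLo, RdHi (only RdHi[7:0] reaches the accumulator)."""
    return ((rdhi & 0xFF) << 32) | (rdlo & 0xFFFFFFFF)


# Save acc0 into two registers, then restore it, as a context switch might.
original = 0x9A12345678
saved_lo, saved_hi = mra(original)   # (0x12345678, 0xFFFFFF9A)
restored = mar(saved_lo, saved_hi)   # 0x9A12345678 again
```

The sign extension performed by MRA does not disturb the round trip, because MAR discards everything above bit 7 of RdHi.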