Intel 80200 User Manual

Intel® 80200 Processor based on Intel® XScale™ Microarchitecture

Developer’s Manual

March, 2003

Order Number: 273411-003

Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel® products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice.

Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

The Intel® 80200 Processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.

*Other brands and names are the property of their respective owners.

ARM and StrongARM are registered trademarks of ARM, Ltd.

Contents

1 Introduction .............................................................................................. 1

1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview .........1

1.1.1 ARM* Architecture Compliance ................................................................................... 1

1.1.2 Features....................................................................................................................... 2

1.1.2.1 Multiply/Accumulate (MAC) ......................................................................2

1.1.2.2 Memory Management ............................................................................... 3

1.1.2.3 Instruction Cache...................................................................................... 3

1.1.2.4 Branch Target Buffer ................................................................................ 3

1.1.2.5 Data Cache............................................................................................... 3

1.1.2.6 Power Management..................................................................................4

1.1.2.7 Interrupt Controller ....................................................................................4

1.1.2.8 Bus Controller ...........................................................................................4

1.1.2.9 Performance Monitoring ........................................................................... 4

1.1.2.10 Debug ....................................................................................................... 4

1.1.2.11 JTAG.........................................................................................................4

1.2 Terminology and Conventions ...................................................................................................... 5

1.2.1 Number Representation............................................................................................... 5

1.2.2 Terminology and Acronyms .........................................................................................5

1.3 Other Relevant Documents ..........................................................................................................6

2 Programming Model ................................................................................ 1

2.1 ARM* Architecture Compliance .................................................................................................... 1

2.2 ARM* Architecture Implementation Options ................................................................................. 1

2.2.1 Big Endian versus Little Endian ................................................................................... 1

2.2.2 26-Bit Code .................................................................................................................. 1

2.2.3 Thumb* ........................................................................................................................ 1

2.2.4 ARM* DSP-Enhanced Instruction Set.......................................................................... 2

2.2.5 Base Register Update.................................................................................................. 2

2.3 Extensions to ARM* Architecture..................................................................................................3

2.3.1 DSP Coprocessor 0 (CP0)...........................................................................................3

2.3.1.1 Multiply With Internal Accumulate Format ................................................ 4

2.3.1.2 Internal Accumulator Access Format........................................................ 7

2.3.2 New Page Attributes .................................................................................................... 9

2.3.3 Additions to CP15 Functionality ................................................................................. 11

2.3.4 Event Architecture .....................................................................................................12

2.3.4.1 Exception Summary................................................................................12

2.3.4.2 Event Priority ..........................................................................................12

2.3.4.3 Prefetch Aborts .......................................................................................13

2.3.4.4 Data Aborts............................................................................................. 14

2.3.4.5 Events from Preload Instructions ............................................................ 16

2.3.4.6 Debug Events ......................................................................................... 16

3 Memory Management .............................................................................. 1

3.1 Overview....................................................................................................................................... 1

3.2 Architecture Model........................................................................................................................ 2

3.2.1 Version 4 vs. Version 5................................................................................................ 2

3.2.2 Memory Attributes........................................................................................................2

3.2.2.1 Page (P) Attribute Bit ................................................................................ 2

3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits ............................ 2

3.2.2.3 Instruction Cache...................................................................................... 2

3.2.2.4 Data Cache and Write Buffer.................................................................... 3

3.2.2.5 Details on Data Cache and Write Buffer Behavior ................................... 4

3.2.2.6 Memory Operation Ordering..................................................................... 4

3.2.3 Exceptions ................................................................................................................... 4

3.3 Interaction of the MMU, Instruction Cache, and Data Cache ....................................................... 5

3.4 Control .......................................................................................................................................... 6

3.4.1 Invalidate (Flush) Operation ........................................................................................ 6

3.4.2 Enabling/Disabling....................................................................................................... 6

3.4.3 Locking Entries ............................................................................................................7

3.4.4 Round-Robin Replacement Algorithm ......................................................................... 9

4 Instruction Cache ..................................................................................... 1

4.1 Overview....................................................................................................................................... 1

4.2 Operation...................................................................................................................................... 2

4.2.1 Operation When Instruction Cache is Enabled............................................................ 2

4.2.2 Operation When The Instruction Cache Is Disabled.................................................... 2

4.2.3 Fetch Policy ................................................................................................................. 3

4.2.4 Round-Robin Replacement Algorithm ......................................................................... 3

4.2.5 Parity Protection ..........................................................................................................4

4.2.6 Instruction Fetch Latency............................................................................................. 5

4.2.7 Instruction Cache Coherency ...................................................................................... 5

4.3 Instruction Cache Control ............................................................................................................. 6

4.3.1 Instruction Cache State at RESET .............................................................................. 6

4.3.2 Enabling/Disabling....................................................................................................... 6

4.3.3 Invalidating the Instruction Cache................................................................................ 7

4.3.4 Locking Instructions in the Instruction Cache ..............................................................8

4.3.5 Unlocking Instructions in the Instruction Cache........................................................... 9

5 Branch Target Buffer ............................................................................... 1

5.1 Branch Target Buffer (BTB) Operation .........................................................................................1

5.1.1 Reset ........................................................................................................................... 2

5.1.2 Update Policy............................................................................................................... 2

5.2 BTB Control .................................................................................................................................. 3

5.2.1 Disabling/Enabling....................................................................................................... 3

5.2.2 Invalidation................................................................................................................... 3

6 Data Cache................................................................................................ 1

6.1 Overviews ..................................................................................................................................... 1

6.1.1 Data Cache Overview.................................................................................................. 1

6.1.2 Mini-Data Cache Overview .......................................................................................... 3

6.1.3 Write Buffer and Fill Buffer Overview........................................................................... 4

6.2 Data Cache and Mini-Data Cache Operation ............................................................................... 5

6.2.1 Operation When Caching is Enabled........................................................................... 5

6.2.2 Operation When Data Caching is Disabled ................................................................. 5

6.2.3 Cache Policies .............................................................................................................5

6.2.3.1 Cacheability .............................................................................................. 5

6.2.3.2 Read Miss Policy ...................................................................................... 6

6.2.3.3 Write Miss Policy ...................................................................................... 7

6.2.3.4 Write-Back Versus Write-Through ............................................................ 7

6.2.4 Round-Robin Replacement Algorithm ......................................................................... 8

6.2.5 Parity Protection ..........................................................................................................8

6.2.6 Atomic Accesses .........................................................................................................8

6.3 Data Cache and Mini-Data Cache Control ...................................................................................9

6.3.1 Data Memory State After Reset................................................................................... 9

6.3.2 Enabling/Disabling ....................................................................................................... 9

6.3.3 Invalidate & Clean Operations .....................................................................................9

6.3.3.1 Global Clean and Invalidate Operation ...................................................10

6.4 Re-configuring the Data Cache as Data RAM ............................................................................12

6.5 Write Buffer/Fill Buffer Operation and Control ............................................................................ 16

7 Configuration ........................................................................................... 1

7.1 Overview....................................................................................................................................... 1

7.2 CP15 Registers............................................................................................................................. 4

7.2.1 Register 0: ID and Cache Type Registers ................................................................... 5

7.2.2 Register 1: Control and Auxiliary Control Registers .................................................... 7

7.2.3 Register 2: Translation Table Base Register ............................................................... 9

7.2.4 Register 3: Domain Access Control Register .............................................................. 9

7.2.5 Register 4: Reserved ................................................................................................... 9

7.2.6 Register 5: Fault Status Register............................................................................... 10

7.2.7 Register 6: Fault Address Register ............................................................................ 10

7.2.8 Register 7: Cache Functions .....................................................................................11

7.2.9 Register 8: TLB Operations ....................................................................................... 13

7.2.10 Register 9: Cache Lock Down ...................................................................................14

7.2.11 Register 10: TLB Lock Down ..................................................................................... 15

7.2.12 Register 11-12: Reserved.......................................................................................... 15

7.2.13 Register 13: Process ID .............................................................................................16

7.2.13.1 The PID Register Affect On Addresses .................................................. 16

7.2.14 Register 14: Breakpoint Registers .............................................................................17

7.2.15 Register 15: Coprocessor Access Register ............................................................... 18

7.3 CP14 Registers........................................................................................................................... 20

7.3.1 Registers 0-3: Performance Monitoring .....................................................................20

7.3.2 Register 4-5: Reserved ..............................................................................................20

7.3.3 Registers 6-7: Clock and Power Management .......................................................... 21

7.3.4 Registers 8-15: Software Debug................................................................................ 22

8 System Management ............................................................................... 1

8.1 Clocking ........................................................................................................................................1

8.2 Processor Reset ........................................................................................................................... 3

8.2.1 Reset Sequence .......................................................................................................... 3

8.2.2 Reset Effect on Outputs............................................................................................... 4

8.3 Power Management...................................................................................................................... 5

8.3.1 Invocation .................................................................................................................... 5

8.3.2 Signals Associated with Power Management .............................................................. 5

9 Interrupts .................................................................................................. 1

9.1 Introduction ...................................................................................................................................1

9.2 External Interrupts ........................................................................................................................ 1

9.3 Programmer Model ....................................................................................................................... 2

9.3.1 INTCTL ........................................................................................................................ 3

9.3.2 INTSRC ....................................................................................................................... 4

9.3.3 INTSTR........................................................................................................................ 5

10 External Bus ............................................................................................. 1

10.1 General Description ...................................................................................................................... 1

10.2 Signal Description......................................................................................................................... 3

10.2.1 Request Bus ................................................................................................................ 4

10.2.1.1 Intel® 80200 Processor Use of the Request Bus ...................................... 4

10.2.2 Data Bus ...................................................................................................................... 6

10.2.3 Critical Word First ........................................................................................................ 7

10.2.4 Configuration Pins ....................................................................................................... 8

10.2.5 Multimaster Support..................................................................................................... 9

10.2.6 Abort .......................................................................................................................... 11

10.2.7 ECC ........................................................................................................................... 12

10.2.8 Big Endian System Configuration .............................................................................. 13

10.3 Examples.................................................................................................................................... 14

10.3.1 Simple Read Word..................................................................................................... 14

10.3.2 Read Burst, No Critical Word First............................................................................. 15

10.3.3 Read Burst, Critical Word First Data Return.............................................................. 16

10.3.4 Word Write................................................................................................................. 17

10.3.5 Two Word Coalesced Write ....................................................................................... 18

10.3.5.1 Write Burst .............................................................................................. 19

10.3.6 Write Burst, Coalesced.............................................................................................. 20

10.3.7 Pipelined Accesses.................................................................................................... 21

10.3.8 Locked Access........................................................................................................... 22

10.3.9 Aborted Access.......................................................................................................... 23

10.3.10 Hold ........................................................................................................................... 24

11 Bus Controller ..........................................................................................1

11.1 Introduction ................................................................................................................................... 1

11.2 ECC .............................................................................................................................................. 1

11.3 Error Handling .............................................................................................................................. 2

11.3.1 Bus Aborts ................................................................................................................... 2

11.3.2 ECC Errors ..................................................................................................................3

11.4 Programmer Model ....................................................................................................................... 5

11.4.1 BCU Control Registers ................................................................................................ 5

11.4.2 ECC Error Registers .................................................................................................... 9

12 Performance Monitoring.......................................................................... 1

12.1 Overview....................................................................................................................................... 1

12.2 Clock Counter (CCNT; CP14 - Register 1) ................................................................................... 2

12.3 Performance Count Registers (PMN0 - PMN1; CP14 - Register 2 and 3, Respectively)............. 3

12.3.1 Extending Count Duration Beyond 32 Bits .................................................................. 3

12.4 Performance Monitor Control Register (PMNC) ........................................................................... 4

12.4.1 Managing PMNC ......................................................................................................... 5

12.5 Performance Monitoring Events ................................................................................................... 6

12.5.1 Instruction Cache Efficiency Mode .............................................................................. 7

12.5.2 Data Cache Efficiency Mode ....................................................................................... 8

12.5.3 Instruction Fetch Latency Mode................................................................................... 8

12.5.4 Data/Bus Request Buffer Full Mode ............................................................................ 9

12.5.5 Stall/Writeback Statistics ............................................................................................. 9

12.5.6 Instruction TLB Efficiency Mode ................................................................................ 10

12.5.7 Data TLB Efficiency Mode .........................................................................................10

12.6 Multiple Performance Monitoring Run Statistics ......................................................................... 11

12.7 Examples .................................................................................................................................... 12

13 Software Debug........................................................................................ 1

13.1 Definitions .....................................................................................................................................1

13.2 Debug Registers ...........................................................................................................................1

13.3 Introduction ...................................................................................................................................2

13.3.1 Halt Mode ....................................................................................................................2

13.3.2 Monitor Mode ............................................................................................................... 2

13.4 Debug Control and Status Register (DCSR) ................................................................................3

13.4.1 Global Enable Bit (GE) ................................................................................................ 4

13.4.2 Halt Mode Bit (H) .........................................................................................................4

13.4.3 Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR) .................................................................. 5

13.4.4 Sticky Abort Bit (SA) .................................................................................................... 5

13.4.5 Method of Entry Bits (MOE)......................................................................................... 5

13.4.6 Trace Buffer Mode Bit (M) ...........................................................................................5

13.4.7 Trace Buffer Enable Bit (E).......................................................................................... 5

13.5 Debug Exceptions.........................................................................................................................6

13.5.1 Halt Mode ....................................................................................................................6

13.5.2 Monitor Mode ............................................................................................................... 8

13.6 HW Breakpoint Resources ........................................................................................................... 9

13.6.1 Instruction Breakpoints ................................................................................................ 9

13.6.2 Data Breakpoints .......................................................................................................10

13.7 Software Breakpoints.................................................................................................................. 11

13.8 Transmit/Receive Control Register (TXRXCTRL) ......................................................................12

13.8.1 RX Register Ready Bit (RR) ......................................................................................13

13.8.2 Overflow Flag (OV) .................................................................................................... 14

13.8.3 Download Flag (D) .....................................................................................................14

13.8.4 TX Register Ready Bit (TR) ....................................................................................... 15

13.8.5 Conditional Execution Using TXRXCTRL .................................................................. 15

13.9 Transmit Register (TX) ............................................................................................................... 16

13.10 Receive Register (RX) ................................................................................................................ 16

13.11 Debug JTAG Access ..................................................................................................................17

13.11.1 SELDCSR JTAG Command ......................................................................................17

13.11.2 SELDCSR JTAG Register .........................................................................................18

13.11.2.1 DBG.HLD_RST....................................................................................... 19

13.11.2.2 DBG.BRK................................................................................................ 20

13.11.2.3 DBG.DCSR .............................................................................................20

13.11.3 DBGTX JTAG Command........................................................................................... 20

13.11.4 DBGTX JTAG Register .............................................................................................. 21

13.11.5 DBGRX JTAG Command ..........................................................................................21

13.11.6 DBGRX JTAG Register ............................................................................................. 22

13.11.6.1 RX Write Logic........................................................................................ 23

13.11.6.2 DBGRX Data Register ............................................................................ 24

13.11.6.3 DBG.RR ..................................................................................................24

13.11.6.4 DBG.V .................................................................................................... 25

13.11.6.5 DBG.RX .................................................................................................. 25

13.11.6.6 DBG.D .................................................................................................... 25

13.11.6.7 DBG.FLUSH ........................................................................................... 25

13.11.7 Debug JTAG Data Register Reset Values................................................................. 25

13.12 Trace Buffer ................................................................................................................................ 26

13.12.1 Trace Buffer CP Registers......................................................................................... 26

13.12.1.1 Checkpoint Registers ............................................................................. 26

13.12.1.2 Trace Buffer Register (TBREG).............................................................. 27

13.13 Trace Buffer Entries.................................................................................................................... 28

13.13.1 Message Byte ............................................................................................................ 28

13.13.1.1 Exception Message Byte ........................................................................ 29

13.13.1.2 Non-exception Message Byte................................................................. 30

13.13.1.3 Address Bytes ........................................................................................ 31

13.13.2 Trace Buffer Usage.................................................................................................... 32

13.14 Downloading Code in the ICache ............................................................................................... 34

13.14.1 LDIC JTAG Command............................................................................................... 34

13.14.2 LDIC JTAG Data Register ......................................................................................... 35

13.14.3 LDIC Cache Functions............................................................................................... 36

13.14.4 Loading IC During Reset ........................................................................................... 38

13.14.4.1 Loading IC During Cold Reset for Debug ............................................... 39

13.14.4.2 Loading IC During a Warm Reset for Debug .......................................... 41

13.14.5 Dynamically Loading IC After Reset .......................................................................... 43

13.14.5.1 Dynamic Code Download Synchronization ............................................ 45

13.14.6 Mini Instruction Cache Overview ............................................................................... 46

13.15 Halt Mode Software Protocol...................................................................................................... 47

13.15.1 Starting a Debug Session .......................................................................................... 47

13.15.1.1 Setting up Override Vector Tables ......................................................... 47

13.15.1.2 Placing the Handler in Memory .............................................................. 48

13.15.2 Implementing a Debug Handler ................................................................................. 49

13.15.2.1 Debug Handler Entry .............................................................................. 49

13.15.2.2 Debug Handler Restrictions.................................................................... 49

13.15.2.3 Dynamic Debug Handler ........................................................................ 50

13.15.2.4 High-Speed Download............................................................................ 52

13.15.3 Ending a Debug Session ........................................................................................... 53

13.16 Software Debug Notes/Errata..................................................................................................... 54

14 Performance Considerations ..................................................................1

14.1 Interrupt Latency........................................................................................................................... 1

14.2 Branch Prediction ......................................................................................................................... 2

14.3 Addressing Modes ........................................................................................................................ 2

14.4 Instruction Latencies..................................................................................................................... 3

14.4.1 Performance Terms ..................................................................................................... 3

14.4.2 Branch Instruction Timings .......................................................................................... 4

14.4.3 Data Processing Instruction Timings ........................................................................... 5

14.4.4 Multiply Instruction Timings ......................................................................................... 6

14.4.5 Saturated Arithmetic Instructions................................................................................. 8

14.4.6 Status Register Access Instructions ............................................................................ 8

14.4.7 Load/Store Instructions................................................................................................ 8

14.4.8 Semaphore Instructions............................................................................................... 9

14.4.9 Coprocessor Instructions ............................................................................................. 9

14.4.10 Miscellaneous Instruction Timing................................................................................. 9

14.4.11 Thumb* Instructions ..................................................................................................... 9

A Compatibility: Intel® 80200 Processor vs. SA-110................................ 1

A.1 Introduction ...................................................................................................................................1

A.2 Summary ......................................................................................................................................1

A.3 Architecture Deviations ................................................................................................................. 3

A.3.1 Read Buffer...................................................................................................................... 3

A.3.2 26-bit Mode...................................................................................................................... 3

A.3.3 Cacheable (C) and Bufferable (B) Encoding ................................................................... 3

A.3.4 Write Buffer Behavior....................................................................................................... 4

A.3.5 External Aborts ................................................................................................................ 4

A.3.6 Performance Differences ................................................................................................. 5

A.3.7 System Control Coprocessor........................................................................................... 5

A.3.8 New Instructions and Instruction Formats .......................................................................5

A.3.9 Augmented Page Table Descriptors................................................................................ 5

B Optimization Guide .................................................................................. 1

B.1 Introduction ...................................................................................................................................1

B.1.1 About This Guide ............................................................................................................. 1

B.2 Intel® 80200 Processor Pipeline.....................................................................................2

B.2.1 General Pipeline Characteristics .....................................................................................2

B.2.1.1. Number of Pipeline Stages ................................................................................. 2

B.2.1.2. Intel® 80200 Processor Pipeline Organization.................................................... 3

B.2.1.3. Out Of Order Completion ....................................................................................4

B.2.1.4. Register Scoreboarding ...................................................................................... 4

B.2.1.5. Use of Bypassing ................................................................................................ 4

B.2.2 Instruction Flow Through the Pipeline .............................................................................5

B.2.2.1. ARM* V5 Instruction Execution........................................................................... 5

B.2.2.2. Pipeline Stalls ..................................................................................................... 5

B.2.3 Main Execution Pipeline ..................................................................................................6

B.2.3.1. F1 / F2 (Instruction Fetch) Pipestages................................................................ 6

B.2.3.2. ID (Instruction Decode) Pipestage...................................................................... 6

B.2.3.3. RF (Register File / Shifter) Pipestage ................................................................. 7

B.2.3.4. X1 (Execute) Pipestage ...................................................................................... 7

B.2.3.5. X2 (Execute 2) Pipestage ................................................................................... 7

B.2.3.6. WB (write-back) ..................................................................................................7

B.2.4 Memory Pipeline ..............................................................................................................8

B.2.4.1. D1 and D2 Pipestage.......................................................................................... 8

B.2.5 Multiply/Multiply Accumulate (MAC) Pipeline .................................................................. 8

B.2.5.1. Behavioral Description ........................................................................................ 8

B.3 Basic Optimizations ...................................................................................................................... 9

B.3.1 Conditional Instructions ...................................................................................................9

B.3.1.1. Optimizing Condition Checks.............................................................................. 9

B.3.1.2. Optimizing Branches......................................................................................... 10

B.3.1.3. Optimizing Complex Expressions ..................................................................... 12

B.3.2 Bit Field Manipulation ....................................................................................................13

B.3.3 Optimizing the Use of Immediate Values.......................................................................14

B.3.4 Optimizing Integer Multiply and Divide .......................................................................... 15

B.3.5 Effective Use of Addressing Modes............................................................................... 16

B.4 Cache and Prefetch Optimizations .............................................................................................17

B.4.1 Instruction Cache........................................................................................................... 17

B.4.1.1. Cache Miss Cost............................................................................................... 17

B.4.1.2. Round Robin Replacement Cache Policy......................................................... 17

B.4.1.3. Code Placement to Reduce Cache Misses ...................................................... 17

B.4.1.4. Locking Code into the Instruction Cache .......................................................... 18

B.4.2 Data and Mini Cache ..................................................................................................... 19

B.4.2.1. Non Cacheable Regions ................................................................................... 19

B.4.2.2. Write-through and Write-back Cached Memory Regions ................................. 19

B.4.2.3. Read Allocate and Read-write Allocate Memory Regions ................................ 20

B.4.2.4. Creating On-chip RAM...................................................................................... 20

B.4.2.5. Mini-data Cache................................................................................................ 21

B.4.2.6. Data Alignment ................................................................................................. 22

B.4.2.7. Literal Pools ...................................................................................................... 23

B.4.3 Cache Considerations ................................................................................................... 24

B.4.3.1. Cache Conflicts, Pollution and Pressure .......................................................... 24

B.4.3.2. Memory Page Thrashing .................................................................................. 24

B.4.4 Prefetch Considerations ................................................................................................ 25

B.4.4.1. Prefetch Distances in the Intel® 80200 Processor............................................ 25

B.4.4.2. Prefetch Loop Scheduling................................................................................. 27

B.4.4.3. Prefetch Loop Limitations ................................................................................. 27

B.4.4.4. Compute vs. Data Bus Bound .......................................................................... 27

B.4.4.5. Low Number of Iterations.................................................................................. 27

B.4.4.6. Bandwidth Limitations ....................................................................................... 28

B.4.4.7. Cache Memory Considerations ........................................................................ 29

B.4.4.8. Cache Blocking................................................................................................. 31

B.4.4.9. Prefetch Unrolling ............................................................................................. 31

B.4.4.10.Pointer Prefetch .............................................................................................. 32

B.4.4.11.Loop Interchange ............................................................................................ 33

B.4.4.12.Loop Fusion .................................................................................................... 33

B.4.4.13.Prefetch to Reduce Register Pressure............................................................ 34

B.5 Instruction Scheduling ................................................................................................................ 35

B.5.1 Scheduling Loads .......................................................................................................... 35

B.5.1.1. Scheduling Load and Store Double (LDRD/STRD) .......................................... 37

B.5.1.2. Scheduling Load and Store Multiple (LDM/STM) ............................................. 38

B.5.2 Scheduling Data Processing Instructions ...................................................................... 39

B.5.3 Scheduling Multiply Instructions .................................................................................... 40

B.5.4 Scheduling SWP and SWPB Instructions...................................................................... 41

B.5.5 Scheduling the MRA and MAR Instructions (MRRC/MCRR)......................................... 42

B.5.6 Scheduling the MIA and MIAPH Instructions................................................................. 43

B.5.7 Scheduling MRS and MSR Instructions......................................................................... 44

B.5.8 Scheduling CP15 Coprocessor Instructions .................................................................. 44

B.6 Optimizing C Libraries ................................................................................................................ 45

B.7 Optimizations for Size................................................................................................................. 45

B.7.1 Space/Performance Trade Off....................................................................................... 45

B.7.1.1. Multiple Word Load and Store .......................................................................... 45

B.7.1.2. Use of Conditional Instructions ......................................................................... 45

B.7.1.3. Use of PLD Instructions .................................................................................... 45

C Test Features ............................................................................................ 1

C.1 Introduction................................................................................................................................... 1

C.2 JTAG - IEEE1149.1 ...................................................................................................................... 1

C.2.1 Boundary Scan Architecture ............................................................................................ 2

C.2.2 TAP Pins.......................................................................................................................... 3

C.2.3 Instruction Register (IR)................................................................................................... 4

C.2.3.1. Boundary-Scan Instruction Set ........................................................................... 4

C.2.4 TAP Test Data Registers .................................................................................................6

C.2.4.1. Device Identification Register ............................................................................. 6

C.2.4.2. Bypass Register.................................................................................................. 6

C.2.4.3. Boundary-Scan Register..................................................................................... 6

C.2.5 TAP Controller ................................................................................................................. 7

C.2.5.1. Test Logic Reset State ....................................................................................... 8

C.2.5.2. Run-Test/Idle State............................................................................................. 8

C.2.5.3. Select-DR-Scan State......................................................................................... 8

C.2.5.4. Capture-DR State ............................................................................................... 8

C.2.5.5. Shift-DR State.....................................................................................................9

C.2.5.6. Exit1-DR State .................................................................................................... 9

C.2.5.7. Pause-DR State..................................................................................................9

C.2.5.8. Exit2-DR State .................................................................................................... 9

C.2.5.9. Update-DR State .............................................................................................. 10

C.2.5.10.Select-IR Scan State....................................................................................... 10

C.2.5.11.Capture-IR State ............................................................................................. 10

C.2.5.12.Shift-IR State ................................................................................................... 10

C.2.5.13.Exit1-IR State .................................................................................................. 11

C.2.5.14.Pause-IR State................................................................................................ 11

C.2.5.15.Exit2-IR State .................................................................................................. 11

C.2.5.16.Update-IR State .............................................................................................. 11

C.2.5.17.Boundary-Scan Example ................................................................................12

Figures

1-1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Features ........................................... 2

3-1 Example of Locked Entries in TLB.............................................................................................................. 9

4-1 Instruction Cache Organization .................................................................................................................... 1

4-2 Locked Line Effect on Round Robin Replacement...................................................................................... 8

5-1 BTB Entry..................................................................................................................................................... 1

5-2 Branch History.............................................................................................................................................. 2

6-1 Data Cache Organization.............................................................................................................................. 2

6-2 Mini-Data Cache Organization..................................................................................................................... 3

6-3 Locked Line Effect on Round Robin Replacement.................................................................................... 15

8-1 Reset Sequence ............................................................................................................................................. 3

8-2 Pin State at Reset .......................................................................................................................................... 4

9-1 Interrupt Controller Block Diagram ............................................................................................................. 2

10-1 Typical System ............................................................................................................................................. 1

10-2 Alternate Configuration................................................................................................................................ 2

10-3 Big Endian Lane Swapping on a 64-bit Bus............................................................................................... 13

10-4 Basic Read Timing ..................................................................................................................................... 14

10-5 Read Burst, No CWF.................................................................................................................................. 15

10-6 Read Burst, CWF........................................................................................................................................ 16

10-7 Basic Word Write ....................................................................................................................................... 17

10-8 Two Word Coalesced Write ....................................................................................................................... 18

10-9 Four Word Eviction Write.......................................................................................................................... 19

10-10 Four Word Coalesced Write Burst ............................................................................................................. 20

10-11 Pipeline Example........................................................................................................................................ 21

10-12 Locked Access............................................................................................................................................ 22

10-13 Aborted Access........................................................................................................................................... 23

10-14 Hold Assertion............................................................................................................................................ 24

13-1 SELDCSR Hardware.................................................................................................................................. 18

13-2 SELDCSR Data Register............................................................................................................................ 19

13-3 DBGTX Hardware...................................................................................................................................... 21

13-4 DBGRX Hardware...................................................................................................................................... 22

13-5 RX Write Logic .......................................................................................................................................... 23

13-6 DBGRX Data Register ............................................................................................................................... 24

13-7 Message Byte Formats................................................................................................................................ 28

13-8 Indirect Branch Entry Address Byte Organization..................................................................................... 31

13-9 High Level View of Trace Buffer............................................................................................................... 32

13-10 LDIC JTAG Data Register Hardware......................................................................................................... 35

13-11 Format of LDIC Cache Functions .............................................................................................................. 37

13-12 Code Download During a Cold Reset For Debug ...................................................................................... 39

13-13 Code Download During a Warm Reset For Debug.................................................................................... 41

13-14 Downloading Code in IC During Program Execution................................................................................ 43

B-1 Intel® 80200 Processor RISC Superpipeline................................................................................................ 3

C-1 Test Access Port Block Diagram.................................................................................................................. 2

C-2 TAP Controller State Diagram ..................................................................................................................... 7

C-3 JTAG Example ........................................................................................................................................... 13

C-4 Timing Diagram Illustrating the Loading of Instruction Register..............................................................14

C-5 Timing Diagram Illustrating the Loading of Data Register........................................................................ 15

Tables

2-1 Multiply with Internal Accumulate Format...................................................................................................4

2-2 MIA{<cond>} acc0, Rm, Rs.........................................................................................................................4

2-3 MIAPH{<cond>} acc0, Rm, Rs....................................................................................................................5

2-4 MIAxy{<cond>} acc0, Rm, Rs.....................................................................................................................6

2-5 Internal Accumulator Access Format............................................................................................................7

2-6 MAR{<cond>} acc0, RdLo, RdHi................................................................................................................8

2-7 MRA{<cond>} RdLo, RdHi, acc0................................................................................................................8

2-8 First-level Descriptors .................................................................................................................................10

2-9 Second-level Descriptors for Coarse Page Table ........................................................................................10

2-10 Second-level Descriptors for Fine Page Table............................................................................................10

2-11 Exception Summary ....................................................................................................................................12

2-12 Event Priority...............................................................................................................................................12

2-13 Intel

2-14 Intel

3-1 Data Cache and Buffer Behavior when X = 0...............................................................................................3

3-2 Data Cache and Buffer Behavior when X = 1...............................................................................................3

3-3 Memory Operations that Impose a Fence......................................................................................................4

3-4 Valid MMU & Data/mini-data Cache Combinations....................................................................................5

7-1 MRC/MCR Format........................................................................................................................................2

7-2 LDC/STC Format ..........................................................................................................................................3

7-3 CP15 Registers ..............................................................................................................................................4

7-4 ID Register.....................................................................................................................................................5

7-5 Cache Type Register......................................................................................................................................5

7-6 ARM* Control Register ................................................................................................................................7

7-7 Auxiliary Control Register ............................................................................................................................8

7-8 Translation Table Base Register....................................................................................................................9

7-9 Domain Access Control Register ..................................................................................................................9

7-10 Fault Status Register....................................................................................................................................10

7-11 Fault Address Register ................................................................................................................................10

7-12 Cache Functions ..........................................................................................................................................11

7-13 TLB Functions.............................................................................................................................................13

7-14 Cache Lockdown Functions ........................................................................................................................14

7-15 Data Cache Lock Register...........................................................................................................................14

7-16 TLB Lockdown Functions...........................................................................................................................15

7-17 Accessing Process ID ..................................................................................................................................16

7-18 Process ID Register .....................................................................................................................................16

7-19 Accessing the Debug Registers ...................................................................................................................17

7-20 Coprocessor Access Register ......................................................................................................................19

7-21 CP14 Registers ............................................................................................................................................20

7-22 Accessing the Performance Monitoring Registers ......................................................................................20

7-23 PWRMODE Register ..................................................................................................................................21

7-24 Clock and Power Management....................................................................................................................21

7-25 CCLKCFG Register ....................................................................................................................................21

7-26 Accessing the Debug Registers ...................................................................................................................22

8-1 Reset CCLK Configuration ...........................................................................................................................1

8-2 Software CCLK Configuration......................................................................................................................2

8-3 Low Power Modes.........................................................................................................................................5

8-4 PWRSTATUS[1:0] Encoding .......................................................................................................................5

80200 Processor Encoding of Fault Status for Prefetch Aborts .......................................................13

80200 Processor Encoding of Fault Status for Data Aborts .............................................................14


9-1 Interrupt Control Register (CP13, register 0)

9-2 Interrupt Source Register (CP13, register 4)

9-3 Interrupt Steer Register (CP13, register 8)

10-1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Bus Signals

10-2 Requests on a 64-bit Bus

10-3 Requests on a 32-bit Bus

10-4 Return Order for 8-Word Burst, 64-bit Data Bus

10-5 Return Order for 8-Word Burst, 32-bit Data Bus

11-1 BCU Response to ECC Errors

11-2 BCUCTL (Register 0)

11-3 BCUMOD (Register 1)

11-4 ELOG0, ELOG1 (Registers 4, 5)

11-5 ECAR0, ECAR1 (Registers 6, 7)

11-6 ECTST (Register 8)

12-1 Clock Count Register (CCNT)

12-2 Performance Monitor Count Register (PMN0 and PMN1)

12-3 Performance Monitor Control Register (CP14, register 0)

12-4 Performance Monitoring Events

12-5 Some Common Uses of the PMU

13-1 Debug Control and Status Register (DCSR)

13-2 Event Priority

13-3 Instruction Breakpoint Address and Control Register (IBCRx)

13-4 Data Breakpoint Register (DBRx)

13-5 Data Breakpoint Controls Register (DBCON)

13-6 TX RX Control Register (TXRXCTRL)

13-7 Normal RX Handshaking

13-8 High-Speed Download Handshaking States

13-9 TX Handshaking

13-10 TXRXCTRL Mnemonic Extensions

13-11 TX Register

13-12 RX Register

13-13 DEBUG Data Register Reset Values

13-14 CP 14 Trace Buffer Register Summary

13-15 Checkpoint Register (CHKPTx)

13-16 TBREG Format

13-17 Message Byte Formats

13-18 LDIC Cache Functions

14-1 Minimum Interrupt Latency

14-2 Branch Latency Penalty

14-3 Latency Example

14-4 Branch Instruction Timings (Those predicted by the BTB)

14-5 Branch Instruction Timings (Those not predicted by the BTB)

14-6 Data Processing Instruction Timings

14-7 Multiply Instruction Timings

14-8 Multiply Implicit Accumulate Instruction Timings

14-9 Implicit Accumulator Access Instruction Timings

14-10 Saturated Data Processing Instruction Timings

14-11 Status Register Access Instruction Timings

14-12 Load and Store Instruction Timings

14-13 Load and Store Multiple Instruction Timings


14-14 Semaphore Instruction Timings

14-15 CP15 Register Access Instruction Timings

14-16 CP14 Register Access Instruction Timings

14-17 SWI Instruction Timings

14-18 Count Leading Zeros Instruction Timings

A-1 C and B encoding

B-1 Pipelines and Pipe stages

C-1 TAP Controller Pin Definitions

C-2 JTAG Instruction Set

C-3 IEEE Instructions

C-4 JTAG ID Register Value


Introduction

1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview

The Intel® 80200 processor based on Intel® XScale™ microarchitecture is the next generation in the Intel® StrongARM* processor family (compliant with ARM* Architecture V5TE). It is designed for high performance and low power, leading the industry in mW/MIPs. The Intel® 80200 processor integrates a bus controller and an interrupt controller around a core processor, with intended embedded markets such as: handheld devices, networking, remote access servers, etc. This technology is ideal for internet infrastructure products such as network and I/O processors, where ultimate performance is critical for moving and processing large amounts of data quickly.

The Intel® 80200 processor incorporates an extensive list of architecture features that allow it to achieve high performance. This rich feature set allows programmers to select the appropriate features that obtain the best performance for their application. Many of the architectural features added to the Intel® 80200 processor help hide memory latency, which often is a serious impediment to high performance processors. This includes:

• the ability to continue instruction execution even while the data cache is retrieving data from external memory.

• a write buffer.

• write-back caching.

• various data cache allocation policies which can be configured differently for each application.

• cache locking.

• and a pipelined external bus.

All these features improve the efficiency of the external bus.

The Intel® 80200 processor has been equipped to efficiently handle audio processing through the support of 16-bit data types and 16-bit operations. These audio coding enhancements center around multiply and accumulate operations which accelerate many of the audio filter operations.

1.1.1 ARM* Architecture Compliance

ARM* Version 5 (V5) Architecture added floating point instructions to ARM* Version 4. The Intel® 80200 processor implements the integer instruction set architecture of ARM V5, but does not provide hardware support of the floating point instructions.

The Intel® 80200 processor provides the Thumb* instruction set (ARM* V5T) and the ARM* V5E DSP extensions.

Backward compatibility with the first generation of Intel® StrongARM* products is maintained for user-mode applications. Operating systems may require modifications to match the specific hardware features of the Intel® 80200 processor and to take advantage of the performance enhancements added to the Intel® 80200 processor.


1.1.2 Features

Figure 1-1 shows the major functional blocks of the Intel® 80200 processor. The following sections give a brief, high-level overview of these blocks.

Figure 1-1. Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Features

• Instruction Cache: 32 Kbytes, 32 ways, lockable by line

• Branch Target Buffer: 2 Kbytes, 2 ways

• Performance Monitoring

• Debug: hardware breakpoint, branch history table

• Interrupt Controller: interrupt masking, FIQ/IRQ steering, pend register

• Data Cache: max 32 Kbytes, 32 ways, write-back or write-through, hit under miss

• IMMU: 32 entry TLB, fully associative, lockable by entry

• Power Management: idle, sleep

• Data RAM: max 28 Kbytes, re-map of data cache

• DMMU: 32 entry TLB, fully associative, lockable by entry

• MAC: single cycle throughput (16*32), 16-bit SIMD, 40-bit accumulator

• Bus Controller: 1 Gbyte/sec, pipelined, de-multiplexed, ECC protection

• Mini-Data Cache: 2 Kbytes, 2 ways

• Fill Buffer: 4 - 8 entries

• Write Buffer: 8 entries, full coalescing

• JTAG

1.1.2.1 Multiply/Accumulate (MAC)

The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a throughput of a MAC operation every cycle. Several architectural enhancements were made to the MAC to support audio coding algorithms, which include a 40-bit accumulator and support for 16-bit packed data.

See Section 2.3, “Extensions to ARM* Architecture” on page 2-3 for more details.


1.1.2.2 Memory Management

The Intel® 80200 processor implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. The MMU provides access protection and virtual to physical address translation.

The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:

• identifying code as cacheable or non-cacheable

• selecting between the mini-data cache or data cache

• write-back or write-through data caching

• enabling data write allocation policy

• and enabling the write buffer to coalesce stores to external memory

Chapter 3, “Memory Management” discusses this in more detail.

1.1.2.3 Instruction Cache

The Intel® 80200 processor implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided.

Chapter 4, “Instruction Cache” discusses this in more detail.

1.1.2.4 Branch Target Buffer

The Intel® 80200 processor provides a Branch Target Buffer (BTB) to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.

The BTB holds 128 entries. See Chapter 5, “Branch Target Buffer” for more details.

1.1.2.5 Data Cache

The Intel® 80200 processor implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache. Each cache has a line size of 32 bytes and supports write-through or write-back caching.

The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15.

Chapter 6, “Data Cache” discusses all this in more detail.

The Intel® 80200 processor allows applications to re-configure a portion of the data cache as data RAM. Software may place special tables or frequently used variables in this RAM. See Section 6.4, “Re-configuring the Data Cache as Data RAM” on page 6-12 for more information on this.


1.1.2.6 Power Management

The Intel® 80200 processor supports two low power modes: idle and sleep. These modes are discussed in Section 8.3, “Power Management” on page 8-5.

1.1.2.7 Interrupt Controller

An interrupt controller is implemented on the Intel® 80200 processor that provides masking of interrupts and the ability to steer interrupts to FIQ or IRQ. It is accessed through Coprocessor 13 registers. See Chapter 9, “Interrupts” for more detail.

1.1.2.8 Bus Controller

The Intel® 80200 processor supports a pipelined external bus that runs at 100 MHz. The data bus is 32/64 bits with ECC protection. The bus controller can be configured to provide critical word first on load operations, enhancing overall system performance. The bus controller has four request queues, where all four requests can be active on the pipelined external bus.

Chapter 10, “External Bus” describes the external bus protocol and Chapter 11, “Bus Controller” covers the aspects of ECC protection. The bus controller registers are accessed via coprocessor 13.

1.1.2.9 Performance Monitoring

Two performance monitoring counters have been added to the Intel® 80200 processor that can be configured to monitor various events in the Intel® 80200 processor. These events allow a software developer to measure cache efficiency, detect system bottlenecks and reduce the overall latency of programs.

Chapter 12, “Performance Monitoring” discusses this in more detail.

1.1.2.10 Debug

The Intel® 80200 processor supports software debugging through two instruction address breakpoint registers, one data-address breakpoint register, one data-address/mask breakpoint register, and a trace buffer.

Chapter 13, “Software Debug” discusses this in more detail.

1.1.2.11 JTAG

Testability is supported on the Intel® 80200 processor through the Test Access Port (TAP) Controller implementation, which is based on IEEE 1149.1 (JTAG) Standard Test Access Port and Boundary-Scan Architecture. The purpose of the TAP controller is to support test logic internal and external to the Intel® 80200 processor such as built-in self-test, boundary-scan, and scan.

Appendix C.2 discusses this in more detail.


1.2 Terminology and Conventions

1.2.1 Number Representation

All numbers in this document can be assumed to be base 10 unless designated otherwise. In text and pseudo code descriptions, hexadecimal numbers have a prefix of 0x and binary numbers have a prefix of 0b. For example, 107 would be represented as 0x6B in hexadecimal and 0b1101011 in binary.

1.2.2 Terminology and Acronyms

ASSP Application Specific Standard Product

Assert This term refers to the logically active value of a signal or bit.

BTB Branch Target Buffer

Clean A clean operation updates external memory with the contents of the specified line in the data/mini-data cache if any of the dirty bits are set and the line is valid. There are two dirty bits associated with each line in the cache so only the portion that is dirty gets written back to external memory. After this operation, the line is still valid and both dirty bits are deasserted.

Coalescing Coalescing means bringing together a new store operation with an existing store operation already resident in the write buffer. The new store is placed in the same write buffer entry as an existing store when the address of the new store falls in the 4 word aligned address of the existing entry. This includes, in PCI terminology, write merging, write collapsing, and write combining.

Deassert This term refers to the logically inactive value of a signal or bit.

Flush A flush operation invalidates the location(s) in the cache by deasserting the valid bit. Individual entries (lines) may be flushed or the entire cache may be flushed with one command. Once an entry is flushed in the cache it can no longer be used by the program.

Reserved A reserved field is a field that may be used by an implementation. If the initial value of a reserved field is supplied by software, this value must be zero. Software should not modify reserved fields or depend on any values in reserved fields.
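The coalescing rule in the terminology above can be sketched as an address comparison. This is only an illustration: it assumes that "4 word aligned address" means a naturally aligned 16-byte block (four 32-bit words), and the name same_coalescing_block is hypothetical, not part of any Intel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one write buffer entry covers a 4-word (16-byte) aligned block. */
#define COALESCE_BLOCK_BYTES 16u

static_assert((COALESCE_BLOCK_BYTES & (COALESCE_BLOCK_BYTES - 1u)) == 0,
              "block size must be a power of two for the mask below");

/* Hypothetical helper: returns true when a new store address falls in the
 * same 16-byte aligned block as an existing write buffer entry, which is
 * the condition under which the two stores may share one entry. */
bool same_coalescing_block(uint32_t existing_addr, uint32_t new_addr)
{
    uint32_t mask = ~(COALESCE_BLOCK_BYTES - 1u);
    return (existing_addr & mask) == (new_addr & mask);
}
```

Under these assumptions, stores to 0x1000 and 0x100C would be candidates for the same entry, while a store to 0x1010 would not.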


1.3 Other Relevant Documents

• Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Datasheet, Intel Order # 273414

• ARM Architecture Version 5TE Specification, Document Number: ARM DDI 0100E. This document describes Version 5TE of the ARM Architecture which includes Thumb ISA and ARM DSP-Enhanced ISA.

• ARM Architecture Reference Manual, Document Number: ARM DDI 0100B. This document describes Version 4 of the ARM Architecture.

• Intel® XScale™ Microarchitecture Programming Reference Manual, Intel Order # 273436

• Intel® 80312 I/O Companion Chip Developer’s Manual, Intel Order # 273410

• StrongARM SA-1100 Microprocessor Developer’s Manual, Intel Order # 278088

• StrongARM SA-110 Microprocessor Technical Reference Manual, Intel Order # 278058


Programming Model

This chapter describes the programming model of the Intel® 80200 processor based on Intel® XScale™ microarchitecture, namely the implementation options and extensions to the ARM* Version 5 architecture.

The ARM* Architecture Version 5TE Specification (ARM DDI 0100E) describes Version 5TE of the ARM Architecture, including the Thumb* ISA and ARM DSP-Enhanced ISA.

2.1 ARM* Architecture Compliance

The Intel® 80200 processor implements the integer instruction set architecture specified in ARM* Version 5TE. T refers to the Thumb instruction set and E refers to the DSP-Enhanced instruction set.

ARM* Version 5 introduces a few more architecture features over Version 4, specifically the addition of tiny pages (1 Kbyte), a new instruction (CLZ) that counts the leading zeroes in a data value, enhanced ARM-Thumb transfer instructions and a modification of the system control coprocessor, CP15.

2.2 ARM* Architecture Implementation Options

2.2.1 Big Endian versus Little Endian

The Intel® 80200 processor supports both big and little endian data representation. The B-bit of the Control Register (Coprocessor 15, register 1, bit 7) selects between big and little endian mode. To run in big endian mode, the B bit must be set before attempting any sub-word accesses to memory; otherwise, undefined results occur. Note that this bit takes effect even if the MMU is disabled.
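As an illustration of why the B bit matters for sub-word accesses, the following portable C sketch models which byte of an aligned 32-bit word is seen at a given byte offset in each mode. It uses the conventional definitions of big and little endian byte order; the function byte_at_offset and the macro name are hypothetical, and the sketch does not touch real coprocessor state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* B bit position taken from the text: Control Register (CP15 register 1), bit 7. */
#define CP15_CTRL_B_BIT (1u << 7)

/* Hypothetical model: return the byte observed at byte offset 0..3 within an
 * aligned 32-bit word, given a copy of the Control Register value.
 * Little endian: offset 0 is the least significant byte.
 * Big endian:    offset 0 is the most significant byte. */
uint8_t byte_at_offset(uint32_t word, unsigned offset, uint32_t control_reg)
{
    assert(offset < 4);
    bool big_endian = (control_reg & CP15_CTRL_B_BIT) != 0;
    unsigned shift = big_endian ? 8u * (3u - offset) : 8u * offset;
    return (uint8_t)(word >> shift);
}
```

For the word 0x11223344, this model returns 0x44 at offset 0 with the B bit clear and 0x11 at offset 0 with the B bit set, which is why the mode needs to be selected before the first sub-word access.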

2.2.2 26-Bit Code

The Intel® 80200 processor does not support 26-bit code.

2.2.3 Thumb*

The Intel® 80200 processor supports the Thumb instruction set.


2.2.4 ARM* DSP-Enhanced Instruction Set

The Intel® 80200 processor implements the ARM DSP-enhanced instruction set, which is a set of instructions that boost the performance of signal processing applications. There are new multiply instructions that operate on 16-bit data values and new saturation instructions. Some of the new instructions are:

• SMLAxy 32<=16x16+32

• SMLAWy 32<=32x16+32

• SMLALxy 64<=16x16+64

• SMULxy 32<=16x16

• SMULWy 32<=32x16

• QADD adds two registers and saturates the result if an overflow occurred

• QDADD doubles and saturates one of the input registers, then adds and saturates

• QSUB subtracts two registers and saturates the result if an overflow occurred

• QDSUB doubles and saturates one of the input registers, then subtracts and saturates
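The saturating behavior described in the list above can be modeled in portable C. This is a minimal sketch that assumes signed 32-bit saturation and ignores any status flag side effects; the function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit intermediate result into the signed 32-bit range. */
int32_t saturate32(int64_t value)
{
    if (value > INT32_MAX) return INT32_MAX;
    if (value < INT32_MIN) return INT32_MIN;
    return (int32_t)value;
}

/* QADD model: add two registers and saturate the result on overflow. */
int32_t qadd_model(int32_t a, int32_t b)
{
    return saturate32((int64_t)a + (int64_t)b);
}

/* QSUB model: subtract two registers and saturate the result on overflow. */
int32_t qsub_model(int32_t a, int32_t b)
{
    return saturate32((int64_t)a - (int64_t)b);
}

/* QDADD model: double and saturate one input, then add and saturate. */
int32_t qdadd_model(int32_t a, int32_t doubled)
{
    return qadd_model(a, saturate32(2 * (int64_t)doubled));
}

/* QDSUB model: double and saturate one input, then subtract and saturate. */
int32_t qdsub_model(int32_t a, int32_t doubled)
{
    return qsub_model(a, saturate32(2 * (int64_t)doubled));
}
```

Computing in 64 bits and clamping afterwards avoids signed overflow in C itself, so the model stays well defined for every pair of 32-bit inputs.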

The Intel® 80200 processor also implements LDRD, STRD and PLD instructions with the following implementation notes:

• PLD is interpreted as a read operation by the MMU and is ignored by the data breakpoint unit, i.e., PLD never generates data breakpoint events.

• PLD to a non-cacheable page performs no action. Also, if the targeted cache line is already resident, this instruction has no effect.

• Both LDRD and STRD instructions generate an alignment exception when the address bits [2:0] = 0b100.

MCRR and MRRC are only supported on the Intel® 80200 processor when directed to coprocessor 0 and are used to access the internal accumulator. See Section 2.3.1.2 for more information. Access to any coprocessor other than 0x0 is undefined.

2.2.5 Base Register Update

If a data abort is signalled on a memory instruction that specifies writeback, the contents of the base register are not updated. This holds for all load and store instructions. This behavior matches that of the first generation Intel® StrongARM* processor and is referred to in the ARM V5 architecture as the Base Restored Abort Model.


2.3 Extensions to ARM* Architecture

The Intel® 80200 processor made a few extensions to the ARM Version 5 architecture to meet the needs of various markets and design requirements. The following is a list of the extensions which are discussed in the next sections.

• A DSP coprocessor (CP0) has been added that contains a 40-bit accumulator and new instructions.

• New page attributes were added to the page table descriptors. The C and B page attribute encoding was extended by one more bit to allow for more encodings: write allocate and mini-data cache. An attribute specifying ECC for 1Meg regions was also added.

• Additional functionality has been added to coprocessor 15. Coprocessor 14 was also created.

• Enhancements were made to the Event Architecture, instruction cache and data cache parity error exceptions, breakpoint events, and imprecise external data aborts.

2.3.1 DSP Coprocessor 0 (CP0)

The Intel® 80200 processor adds a DSP coprocessor to the architecture for the purpose of increasing the performance and the precision of audio processing algorithms. This coprocessor contains a 40-bit accumulator and new instructions.


The 40-bit accumulator is referenced by several new instructions that were added to the architecture; MIA, MIAPH and MIAxy are multiply/accumulate instructions that reference the 40-bit accumulator instead of a register specified accumulator. MAR and MRA provide the ability to read and write the 40-bit accumulator.

Access to CP0 is always allowed in all processor modes when bit 0 of the Coprocessor Access Register is set. Any access to CP0 when this bit is clear causes an undefined exception. (See Section 7.2.15, “Register 15: Coprocessor Access Register” on page 7-18 for more details.) Note that only privileged software can set this bit in the Coprocessor Access Register.

The 40-bit accumulator needs to be saved on a context switch if multiple processes are using it.

Two new instruction formats were added for coprocessor 0: Multiply with Internal Accumulate Format and Internal Accumulate Access Format. The formats and instructions are described next.


2.3.1.1 Multiply With Internal Accumulate Format

A new multiply format has been created to define operations on 40-bit accumulators. Table 2-1, “Multiply with Internal Accumulate Format” on page 2-4 shows the layout of the new format. The opcode for this format lies within the coprocessor register transfer instruction type. These instructions have their own syntax.

Table 2-1. Multiply with Internal Accumulate Format

Bits    31:28  27:20     19:16     15:12  11:8  7:5  4  3:0
Field   cond   11100010  opcode_3  Rs     0000  acc  1  Rm

Bits    Description                                 Notes
31:28   cond - ARM condition codes                  -
19:16   opcode_3 - specifies the type of multiply   The Intel® 80200 processor defines the following:
        with internal accumulate                    0b0000 = MIA
                                                    0b1000 = MIAPH
                                                    0b1100 = MIABB
                                                    0b1101 = MIABT
                                                    0b1110 = MIATB
                                                    0b1111 = MIATT
                                                    The effect of all other encodings is unpredictable.
15:12   Rs - Multiplier
7:5     acc - select 1 of 8 accumulators            The Intel® 80200 processor only implements acc0;
                                                    access to any other acc has unpredictable effect.
3:0     Rm - Multiplicand                           -

Two new fields were created for this format, acc and opcode_3. The acc field specifies 1 of 8 internal accumulators to operate on and opcode_3 defines the operation for this format. The Intel® 80200 processor defines a single 40-bit accumulator referred to as acc0; future implementations may define multiple internal accumulators. The Intel® 80200 processor uses opcode_3 to define six instructions: MIA, MIAPH, MIABB, MIABT, MIATB and MIATT.

Table 2-2. MIA{<cond>} acc0, Rm, Rs

Bits    31:28  27:16         15:12  11:4      3:0
Field   cond   111000100000  Rs     00000001  Rm

Operation:   if ConditionPassed(<cond>) then
                 acc0 = (Rm[31:0] * Rs[31:0])[39:0] + acc0[39:0]
Exceptions:  none
Qualifiers:  Condition Code
             No condition code flags are updated
Notes:       Early termination is supported.
             Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 for register Rs or Rm has unpredictable results.
             acc0 is defined to be 0b000 on 80200.

The MIA instruction operates similarly to MLA except that the 40-bit accumulator is used. MIA multiplies the signed value in register Rs (multiplier) by the signed value in register Rm (multiplicand) and then adds the result to the 40-bit accumulator (acc0).

MIA does not support unsigned multiplication; all values in Rs and Rm are interpreted as signed data values. MIA is useful for operating on signed 16-bit data that was loaded into a general purpose register by LDRSH.

The instruction is only executed if the condition specified in the instruction matches the condition code status.
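The MIA operation described above can be modelled in a few lines of Python. This is a minimal sketch rather than a description of the hardware: the helper names `to_signed` and `mia` are hypothetical, Python integers stand in for registers, and the sum is assumed to wrap at 40 bits, which the text implies but does not state outright.

```python
MASK40 = (1 << 40) - 1  # acc0 is 40 bits wide

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mia(acc0, rm, rs):
    """Model of MIA: acc0 = (Rm * Rs)[39:0] + acc0[39:0].

    Rm and Rs are treated as signed 32-bit values; the result is
    truncated to 40 bits (assumed wraparound on overflow).
    """
    product = to_signed(rm, 32) * to_signed(rs, 32)
    return (acc0 + product) & MASK40

acc = mia(0, 3, 0xFFFFFFFE)      # 3 * (-2)
print(hex(acc))                  # 0xfffffffffa, i.e. -6 as a 40-bit value
print(to_signed(acc, 40))        # -6
```

Reading the result back through `to_signed(acc, 40)` shows why the accumulator is useful for long sums of signed products: the extra eight bits give headroom beyond a 32-bit register.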

Table 2-3. MIAPH{<cond>} acc0, Rm, Rs

Bits    31:28  27:16         15:12  11:4      3:0
Field   cond   111000101000  Rs     00000001  Rm

Operation:   if ConditionPassed(<cond>) then
                 acc0 = sign_extend(Rm[31:16] * Rs[31:16]) +
                        sign_extend(Rm[15:0] * Rs[15:0]) +
                        acc0[39:0]
Exceptions:  none
Qualifiers:  Condition Code
             S bit is always cleared; no condition code flags are updated
Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 for register Rs or Rm has unpredictable results.
             acc0 is defined to be 0b000 on 80200.

The MIAPH instruction performs two 16-bit signed multiplies on packed half word data and accumulates these to a single 40-bit accumulator. The first signed multiplication is performed on the lower 16 bits of the value in register Rs with the lower 16 bits of the value in register Rm. The second signed multiplication is performed on the upper 16 bits of the value in register Rs with the upper 16 bits of the value in register Rm. Both signed 32-bit products are sign extended and then added to the value in the 40-bit accumulator (acc0).

The instruction is only executed if the condition specified in the instruction matches the condition code status.
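As an illustration of the packed half word behaviour, the following Python sketch models MIAPH. The names `s16` and `miaph` are hypothetical, and wraparound of the 40-bit sum is an assumption rather than something the text states.

```python
MASK40 = (1 << 40) - 1  # acc0 is 40 bits wide

def s16(value):
    """Interpret the low 16 bits of value as a signed half word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def miaph(acc0, rm, rs):
    """Model of MIAPH: two signed 16x16 products added to acc0."""
    top = s16(rm >> 16) * s16(rs >> 16)   # Rm[31:16] * Rs[31:16]
    bottom = s16(rm) * s16(rs)            # Rm[15:0]  * Rs[15:0]
    return (acc0 + top + bottom) & MASK40

rm = (2 << 16) | 3          # top = 2, bottom = 3
rs = (4 << 16) | 0xFFFF     # top = 4, bottom = -1
print(miaph(0, rm, rs))     # 2*4 + 3*(-1) = 5
```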


Table 2-4. MIAxy{<cond>} acc0, Rm, Rs

Bits    31:28  27:18       17  16  15:12  11:4      3:0
Field   cond   1110001011  x   y   Rs     00000001  Rm

Operation:   if ConditionPassed(<cond>) then
                 if (bit[17] == 0)
                     <operand1> = Rm[15:0]
                 else
                     <operand1> = Rm[31:16]
                 if (bit[16] == 0)
                     <operand2> = Rs[15:0]
                 else
                     <operand2> = Rs[31:16]
                 acc0[39:0] = sign_extend(<operand1> * <operand2>) + acc0[39:0]
Exceptions:  none
Qualifiers:  Condition Code
             S bit is always cleared; no condition code flags are updated
Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 for register Rs or Rm has unpredictable results.
             acc0 is defined to be 0b000 on 80200.

The MIAxy instruction performs one 16-bit signed multiply and accumulates the result into a single 40-bit accumulator. x refers to either the upper half or lower half of register Rm (multiplicand) and y refers to the upper or lower half of Rs (multiplier). A value of 0x1 selects bits [31:16] of the register, which is specified in the mnemonic as T (for top). A value of 0x0 selects bits [15:0] of the register, which is specified in the mnemonic as B (for bottom).

MIAxy does not support unsigned multiplication; all values in Rs and Rm are interpreted as signed data values.

The instruction is only executed if the condition specified in the instruction matches the condition code status.
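The half word selection performed by x and y can be sketched as follows. This is an illustrative Python model only; `s16`, `miaxy` and `mnemonic` are hypothetical names, and the 40-bit wraparound is an assumption.

```python
MASK40 = (1 << 40) - 1  # acc0 is 40 bits wide

def s16(value):
    """Interpret the low 16 bits of value as a signed half word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def miaxy(acc0, rm, rs, x, y):
    """Model of MIAxy: x (bit 17) picks the Rm half, y (bit 16) the Rs half."""
    operand1 = s16(rm >> 16) if x else s16(rm)
    operand2 = s16(rs >> 16) if y else s16(rs)
    return (acc0 + operand1 * operand2) & MASK40

def mnemonic(x, y):
    """Build the mnemonic: T selects bits [31:16], B selects bits [15:0]."""
    return "MIA" + ("T" if x else "B") + ("T" if y else "B")

rm = (7 << 16) | 2      # top = 7, bottom = 2
rs = (3 << 16) | 5      # top = 3, bottom = 5
print(mnemonic(1, 0), miaxy(0, rm, rs, 1, 0))   # MIATB 35
print(mnemonic(0, 1), miaxy(0, rm, rs, 0, 1))   # MIABT 6
```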


2.3.1.2 Internal Accumulator Access Format

The Intel® 80200 processor defines a new instruction format for accessing internal accumulators in CP0. Table 2-5, “Internal Accumulator Access Format” on page 2-7 shows that the opcode falls into the coprocessor register transfer space.

The RdHi and RdLo fields allow up to 64 bits of data transfer between Intel® StrongARM* registers and an internal accumulator. The acc field specifies 1 of 8 internal accumulators to transfer data to/from. The Intel® 80200 processor implements a single 40-bit accumulator referred to as acc0; future implementations can specify multiple internal accumulators of varying sizes, up to 64 bits.

Access to the internal accumulator is allowed in all processor modes (user and privileged) as long as bit 0 of the Coprocessor Access Register is set. (See Section 7.2.15, “Register 15: Coprocessor Access Register” on page 7-18 for more details.)

The Intel® 80200 processor implements two instructions, MAR and MRA, that move two Intel® StrongARM* registers to acc0 and move acc0 to two Intel® StrongARM* registers, respectively.

Table 2-5. Internal Accumulator Access Format

Bits    31:28  27:20     19:16  15:12  11:3       2:0
Field   cond   1100010L  RdHi   RdLo   000000000  acc

Bits    Description                                 Notes
31:28   cond - ARM condition codes                  -
20      L - move to/from internal accumulator       -
        0 = move to internal accumulator (MAR)
        1 = move from internal accumulator (MRA)
19:16   RdHi - specifies the high order eight       On a read of the acc, this 8-bit high order field is
        (39:32) bits of the internal accumulator    sign extended.
                                                    On a write to the acc, the lower 8 bits of this
                                                    register are written to acc[39:32].
15:12   RdLo - specifies the low order 32 bits      -
        of the internal accumulator
7:4     Should be zero                              This field could be used in future implementations
                                                    to specify the type of saturation to perform on the
                                                    read of an internal accumulator (e.g., a signed
                                                    saturation to 16 bits may be useful for some filter
                                                    algorithms).
3       Should be zero                              -
2:0     acc - specifies 1 of 8 internal             The Intel® 80200 processor only implements acc0;
        accumulators                                access to any other acc is unpredictable.

Note: MAR has the same encoding as MCRR (to coprocessor 0) and MRA has the same encoding as MRRC (to coprocessor 0). These instructions move 64 bits of data to/from ARM registers from/to coprocessor registers. MCRR and MRRC are defined in ARM’s DSP instruction set.

Disassemblers not aware of MAR and MRA produce the following syntax:

    MCRR{<cond>} p0, 0x0, RdLo, RdHi, c0
    MRRC{<cond>} p0, 0x0, RdLo, RdHi, c0


Table 2-6. MAR{<cond>} acc0, RdLo, RdHi

Bits    31:28  27:20     19:16  15:12  11:0
Field   cond   11000100  RdHi   RdLo   000000000000

Operation:   if ConditionPassed(<cond>) then
                 acc0[39:32] = RdHi[7:0]
                 acc0[31:0] = RdLo[31:0]
Exceptions:  none
Qualifiers:  Condition Code
             No condition code flags are updated
Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 as either RdHi or RdLo has unpredictable results.

The MAR instruction moves the value in register RdLo to bits[31:0] of the 40-bit accumulator (acc0) and moves bits[7:0] of the value in register RdHi into bits[39:32] of acc0.

The instruction is only executed if the condition specified in the instruction matches the condition code status.

This instruction executes in any processor mode.

Table 2-7. MRA{<cond>} RdLo, RdHi, acc0

Bits    31:28  27:20     19:16  15:12  11:0
Field   cond   11000101  RdHi   RdLo   000000000000

Operation:   if ConditionPassed(<cond>) then
                 RdHi[31:0] = sign_extend(acc0[39:32])
                 RdLo[31:0] = acc0[31:0]
Exceptions:  none
Qualifiers:  Condition Code
             No condition code flags are updated
Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying the same register for RdHi and RdLo has unpredictable results.
             Specifying R15 as either RdHi or RdLo has unpredictable results.

The MRA instruction moves the 40-bit accumulator value (acc0) into two registers. Bits[31:0] of the value in acc0 are moved into the register RdLo. Bits[39:32] of the value in acc0 are sign extended to 32 bits and moved into the register RdHi.

The instruction is only executed if the condition specified in the instruction matches the condition code status.

This instruction executes in any processor mode.
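The register-to-accumulator mapping of MAR and MRA can be illustrated with a short Python model. The function names `mar` and `mra` are hypothetical stand-ins for the instructions; the sketch only mirrors the Operation rows of Tables 2-6 and 2-7.

```python
MASK32 = 0xFFFFFFFF

def mar(rdlo, rdhi):
    """Model of MAR: acc0[39:32] = RdHi[7:0], acc0[31:0] = RdLo[31:0]."""
    return ((rdhi & 0xFF) << 32) | (rdlo & MASK32)

def mra(acc0):
    """Model of MRA: returns (RdLo, RdHi), with RdHi sign extended from 8 bits."""
    rdlo = acc0 & MASK32
    high = (acc0 >> 32) & 0xFF
    rdhi = (high | 0xFFFFFF00) if high & 0x80 else high
    return rdlo, rdhi

acc = mar(0x12345678, 0x000000F0)
rdlo, rdhi = mra(acc)
print(hex(acc))               # 0xf012345678
print(hex(rdlo), hex(rdhi))   # 0x12345678 0xfffffff0

# Saving with MRA and restoring with MAR reproduces the accumulator.
print(mar(*mra(acc)) == acc)  # True
```

The last line shows why an MRA/MAR pair is sufficient for saving and restoring acc0 on a context switch: although MRA sign extends RdHi, MAR uses only its low eight bits, so no information is lost.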


2.3.2 New Page Attributes

The Intel® 80200 processor extends the page attributes defined by the C and B bits in the page descriptors with an additional X bit. This bit allows four more attributes to be encoded when X=1. These new encodings include allocating data for the mini-data cache and write-allocate caching. A full description of the encodings can be found in Section 3.2.2, “Memory Attributes” on page 3-2.

The Intel® 80200 processor retains ARM definitions of the C and B encoding when X = 0, which is different than the first generation Intel® StrongARM* products. The memory attribute for the mini-data cache has been moved and replaced with the write-through caching attribute.

When write-allocate is enabled, a store operation that misses the data cache (cacheable data only) generates a line fill. If disabled, a line fill only occurs when a load operation misses the data cache (cacheable data only).

Write-through caching causes all store operations to be written to memory, whether they are cacheable or not cacheable. This feature is useful for maintaining data cache coherency.

The Intel® 80200 processor also added a P bit in the first level descriptors to identify which pages of memory are protected with ECC.

A descriptor with the P bit set indicates the corresponding page in memory is ECC protected. If the BCU's ECC mode is enabled (see Chapter 11, “Bus Controller”) then writes to such a page are accompanied with an ECC and reads are validated by an ECC.

Bit 1 in the Auxiliary Control Register (coprocessor 15, register 1, opcode_2=1) enables ECC protection for memory accesses made during page table walks.

These attributes are programmed in the translation table descriptors, which are highlighted in Table 2-8, “First-level Descriptors” on page 2-10, Table 2-9, “Second-level Descriptors for Coarse Page Table” on page 2-10 and Table 2-10, “Second-level Descriptors for Fine Page Table” on page 2-10. Two second-level descriptor formats have been defined for the Intel® 80200 processor: one is used for the coarse page table and the other is used for the fine page table.


Table 2-8. First-level Descriptors

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

SBZ 0 0

Coarse page table base address P Domain SBZ 0 1

Section base address SBZ TEX AP P Domain 0 C B 1 0

Fine page table base address SBZ P Domain SBZ 1 1

Table 2-9. Second-level Descriptors for Coarse Page Table

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

SBZ 0 0

Large page base address TEX AP3 AP2 AP1 AP0 C B 0 1

Small page base address AP3 AP2 AP1 AP0 C B 1 0

Extended small page base address SBZ TEX AP C B 1 1

Table 2-10. Second-level Descriptors for Fine Page Table

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

SBZ 0 0

Large page base address TEX AP3 AP2 AP1 AP0 C B 0 1

Small page base address AP3 AP2 AP1 AP0 C B 1 0

Tiny Page Base Address TEX AP C B 1 1

The P bit controls ECC.

The TEX (Type Extension) field is present in several of the descriptor types. In the Intel® 80200 processor, only the LSB of this field is used; this is called the X bit.

A Small Page descriptor does not have a TEX field. For these descriptors, TEX is implicitly zero; that is, they operate as if the X bit had a ‘0’ value.

The X bit, when set, modifies the meaning of the C and B bits. Description of page attributes and their encoding can be found in Chapter 3, “Memory Management”.
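To make the roles of the P, X, C and B bits concrete, the following Python sketch assembles a first-level section descriptor. It is illustrative only. The function name `section_descriptor` is hypothetical, and the bit positions used (X as the LSB of TEX at bit 12, AP at bits 11:10, P at bit 9, Domain at bits 8:5, C at bit 3, B at bit 2, type 0b10 in bits 1:0) are assumptions based on the field order in Table 2-8 and the usual ARM section layout; Table 2-8 remains the authoritative reference.

```python
def section_descriptor(base, x, c, b, ap=0b11, domain=0, ecc=False):
    """Build a first-level section descriptor (bit positions are assumed)."""
    if base & 0x000FFFFF:
        raise ValueError("section base address must be 1 MByte aligned")
    return ((base & 0xFFF00000)
            | ((x & 1) << 12)         # X bit: LSB of the TEX field
            | ((ap & 0b11) << 10)     # AP: access permissions
            | (int(ecc) << 9)         # P bit: region is ECC protected
            | ((domain & 0xF) << 5)   # Domain
            | ((c & 1) << 3)          # C bit
            | ((b & 1) << 2)          # B bit
            | 0b10)                   # descriptor type: section

# X=1, C=1, B=1 with the P bit set for a section at 0xA0000000
print(hex(section_descriptor(0xA0000000, x=1, c=1, b=1, ecc=True)))  # 0xa0001e0e
```

Because the P bit exists only in first-level descriptors, a single call like this marks a whole 1 MByte section as ECC protected.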


2.3.3 Additions to CP15 Functionality

To accommodate the functionality in the Intel® 80200 processor, registers in CP15 and CP14 have been added or augmented. See Chapter 7, “Configuration” for details.

At times it is necessary to be able to guarantee exactly when a CP15 update takes effect. For example, when enabling memory address translation (turning on the MMU), it is vital to know when the MMU is actually guaranteed to be in operation. To address this need, a processor-specific code sequence is defined for each Intel® StrongARM* processor. For the Intel® 80200 processor, the sequence, called CPWAIT, is shown in Example 2-1 on page 2-11.

Example 2-1. CPWAIT: Canonical method to wait for CP15 update

;; The following macro should be used when software needs to be
;; assured that a CP15 update has taken effect.
;; It may only be used while in a privileged mode, because it
;; accesses CP15.

    MACRO CPWAIT

    MRC P15, 0, R0, C2, C0, 0   ; arbitrary read of CP15
    MOV R0, R0                  ; wait for it
    SUB PC, PC, #4              ; branch to next instruction

    ; At this point, any previous CP15 writes are
    ; guaranteed to have taken effect.

    ENDM

When setting multiple CP15 registers, system software may opt to delay the assurance of their update. This is accomplished by emitting CPWAIT only after the sequence of MCR instructions.

The CPWAIT sequence guarantees that CP15 side-effects are complete by the time the CPWAIT is complete. It is possible, however, that the CP15 side-effect takes place before CPWAIT completes or is issued. Programmers should take care that this does not affect the correctness of their code.


2.3.4 Event Architecture

2.3.4.1 Exception Summary

Table 2-11 shows all the exceptions that the Intel® 80200 processor may generate, and the attributes of each. Subsequent sections give details on each exception.

Table 2-11. Exception Summary

Exception Description      Exception Type (a)      Precise?  Updates FAR?
Reset                      Reset                   N         N
FIQ                        FIQ                     N         N
IRQ                        IRQ                     N         N
External Instruction       Prefetch                Y         N
Instruction MMU            Prefetch                Y         N
Instruction Cache Parity   Prefetch                Y         N
Lock Abort                 Data                    Y         N
MMU Data                   Data                    Y         Y
External Data              Data                    N         N
Data Cache Parity          Data                    N         N
Software Interrupt         Software Interrupt      Y         N
Undefined Instruction      Undefined Instruction   Y         N
Debug Events (b)           varies                  varies    N

a. Exception types are those described in the ARM Architecture Reference Manual, section 2.5.
b. Refer to Chapter 13, “Software Debug” for more details.

2.3.4.2 Event Priority

The Intel® 80200 processor follows the exception priority specified in the ARM Architecture Reference Manual. The processor has additional exceptions that might be generated while debugging. For information on these debug exceptions, see Chapter 13, “Software Debug”.

Table 2-12. Event Priority

Exception Priority

Reset 1 (Highest)

Data Abort (Precise & Imprecise) 2

FIQ 3

IRQ 4

Prefetch Abort 5

Undefined Instruction, SWI 6 (Lowest)


2.3.4.3 Prefetch Aborts

The Intel® 80200 processor detects three types of prefetch aborts: Instruction MMU abort, external abort on an instruction access, and an instruction cache parity error. These aborts are described in Table 2-13.

When a prefetch abort occurs, hardware reports the highest priority one in the extended Status field of the Fault Status Register. The value placed in R14_ABORT (the link register in abort mode) is the address of the aborted instruction + 4.

Table 2-13. Intel® 80200 Processor Encoding of Fault Status for Prefetch Aborts

Priority  Sources                                         FS[10,3:0] (a)  Domain   FAR
Highest   Instruction MMU Exception                       0b10000         invalid  invalid
          Several exceptions can generate this encoding:
          - translation faults
          - domain faults, and
          - permission faults
          It is up to software to figure out which one
          occurred.
          External Instruction Error Exception            0b10110         invalid  invalid
          This exception occurs when the external memory
          system reports an error on an instruction
          cache fetch.
Lowest    Instruction Cache Parity Error Exception        0b11000         invalid  invalid

a. All other encodings not listed in the table are reserved.


2.3.4.4 Data Aborts

Two types of data aborts exist in the Intel® 80200 processor: precise and imprecise. A precise data abort is defined as one where R14_ABORT always contains the PC (+8) of the instruction that caused the exception. An imprecise abort is one where R14_ABORT contains the PC (+4) of the next instruction to execute and not the address of the instruction that caused the abort. In other words, instruction execution has advanced beyond the instruction that caused the data abort.

On the Intel® 80200 processor, precise data aborts are recoverable and imprecise data aborts are not recoverable.

Precise Data Aborts

• A lock abort is a precise data abort; the extended Status field of the Fault Status Register is set to 0b10100. This abort occurs when a lock operation directed to the MMU (instruction or data) or instruction cache causes an exception, due to either a translation fault, access permission fault or external bus fault.

The Fault Address Register is undefined and R14_ABORT is the address of the aborted instruction + 8.

• A data MMU abort is precise. These are due to an alignment fault, translation fault, domain fault, permission fault or external data abort on an MMU translation. The status field is set to a predetermined ARM definition which is shown in Table 2-14, “Intel® 80200 Processor Encoding of Fault Status for Data Aborts” on page 2-14.

The Fault Address Register is set to the effective data address of the instruction and R14_ABORT is the address of the aborted instruction + 8.

Table 2-14. Intel® 80200 Processor Encoding of Fault Status for Data Aborts

Priority  Sources                                       FS[10,3:0] (a)  Domain   FAR
Highest   Alignment                                     0b000x1         invalid  valid
          External Abort on Translation  First level    0b01100         invalid  valid
                                         Second level   0b01110         valid    valid
          Translation                    Section        0b00101         invalid  valid
                                         Page           0b00111         valid    valid
          Domain                         Section        0b01001         valid    valid
                                         Page           0b01011         valid    valid
          Permission                     Section        0b01101         valid    valid
                                         Page           0b01111         valid    valid
          Lock Abort                                    0b10100         invalid  invalid
          This data abort occurs on an MMU lock
          operation (data or instruction TLB) or on
          an Instruction Cache lock operation.
          Imprecise External Data Abort                 0b10110         invalid  invalid
Lowest    Data Cache Parity Error Exception             0b11000         invalid  invalid

a. All other encodings not listed in the table are reserved.
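An abort handler typically starts by decoding the status field. The Python sketch below shows one way to do that for the data abort encodings of Table 2-14. It is a model for illustration, not handler code: the names `fault_status` and `decode_data_abort` are hypothetical, and it assumes that FS[10,3:0] denotes bit 10 of the Fault Status Register placed above bits 3:0.

```python
# Fault status encodings transcribed from Table 2-14.
DATA_ABORT_SOURCES = {
    0b01100: "external abort on translation, first level",
    0b01110: "external abort on translation, second level",
    0b00101: "translation fault, section",
    0b00111: "translation fault, page",
    0b01001: "domain fault, section",
    0b01011: "domain fault, page",
    0b01101: "permission fault, section",
    0b01111: "permission fault, page",
    0b10100: "lock abort",
    0b10110: "imprecise external data abort",
    0b11000: "data cache parity error",
}

def fault_status(fsr):
    """Assemble FS[10,3:0]: FSR bit 10 becomes the top bit above FSR bits 3:0."""
    return (((fsr >> 10) & 1) << 4) | (fsr & 0xF)

def decode_data_abort(fsr):
    """Name the abort source for a Fault Status Register value."""
    fs = fault_status(fsr)
    if (fs & 0b11101) == 0b00001:    # 0b000x1: alignment, x is a don't-care
        return "alignment fault"
    return DATA_ABORT_SOURCES.get(fs, "reserved encoding")

print(decode_data_abort(0x00D))   # permission fault, section
print(decode_data_abort(0x406))   # imprecise external data abort
print(decode_data_abort(0x003))   # alignment fault
```

Only the highest priority abort is reported, so a handler built this way sees one source per invocation.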


Imprecise data aborts

• A data cache parity error is imprecise; the extended Status field of the Fault Status Register is set to 0b11000.

• All external data aborts except for those generated on a data MMU translation are imprecise.

The Fault Address Register for all imprecise data aborts is undefined and R14_ABORT is the address of the next instruction to execute + 4, which is the same for both ARM and Thumb mode.

The Intel® 80200 processor generates external data aborts on multi-bit ECC errors and when the Abort pin is asserted on memory transactions. (See Chapter 11, “Bus Controller” for more details.) An external data abort can occur on non-cacheable loads, reads into the cache, cache evictions, or stores to external memory.

Although the Intel® 80200 processor guarantees the Base Restored Abort Model for precise aborts, it cannot do so in the case of imprecise aborts. A Data Abort handler may encounter an updated base register if it is invoked because of an imprecise abort.

Imprecise data aborts may create scenarios that are difficult for an abort handler to recover from. Both external data aborts and data cache parity errors may result in corrupted data in the targeted registers. Because these faults are imprecise, it is possible that the corrupted data has been used before the Data Abort fault handler is invoked. Because of this, software should treat imprecise data aborts as unrecoverable.

Note that even memory accesses marked as “stall until complete” (see Section 3.2.2.4) can result in imprecise data aborts. For these types of accesses, the fault is somewhat less imprecise than the general case: it is guaranteed to be raised within three instructions of the instruction that caused it. In other words, if a “stall until complete” LD or ST instruction triggers an imprecise fault, then that fault is seen by the program within three instructions.

With this knowledge, it is possible to write code that accesses “stall until complete” memory with impunity. Simply place several NOP instructions after such an access. If an imprecise fault occurs, it happens during the NOPs; the data abort handler sees identical register and memory state as it would with a precise exception, and so should be able to recover. An example of this is shown in Example 2-2 on page 2-15.

Example 2-2. Shielding Code from Potential Imprecise Aborts

;; Example of code that maintains architectural state through the
;; window where an imprecise fault might occur.

    LDR R0, [R1]    ; R1 points to stall-until-complete
                    ; region of memory
    NOP
    NOP
    NOP
    ; Code beyond this point is guaranteed not to see any aborts
    ; from the load.

Of course, if a system design precludes events that could cause external aborts, then such precautions are not necessary.


Multiple Data Aborts

Multiple data aborts may be detected by hardware, but only the highest priority one is reported. If the reported data abort is precise, software can correct the cause of the abort and re-execute the aborted instruction. If the lower priority abort still exists, it is reported. Software can handle each abort separately until the instruction successfully executes.

If the reported data abort is imprecise, software needs to check the SPSR to see if the previous context was executing in abort mode. If this is the case, the link back to the current process has been lost and the data abort is unrecoverable.

2.3.4.5 Events from Preload Instructions

A PLD instruction never causes the Data MMU to fault for any of the following reasons:

• Domain Fault

• Permission Fault

• Translation Fault

If execution of the PLD would cause one of the above faults, then the PLD causes no effect.

This feature allows software to issue PLDs speculatively. For example, Example 2-3 on page 2-16 places a PLD instruction early in the loop. This PLD is used to fetch data for the next loop iteration. In this example, the list is terminated with a node that has a null pointer. When execution reaches the end of the list, the PLD on address 0x0 does not cause a fault. Rather, it is ignored and the loop terminates normally.

Example 2-3. Speculatively issuing PLD

;; R0 points to a node in a linked list. A node has the following layout:
;; Offset  Contents
;;---------------------------------
;; 0       data
;; 4       pointer to next node
;; This code computes the sum of all nodes in a list. The sum is placed into R9.
;;

    MOV  R9, #0          ; Clear accumulator

sumList:

    LDR  R1, [R0, #4]    ; R1 gets pointer to next node
    LDR  R3, [R0]        ; R3 gets data from current node
    PLD  [R1]            ; Speculatively start load of next node
    ADD  R9, R9, R3      ; Add into accumulator
    MOVS R0, R1          ; Advance to next node. At end of list?
    BNE  sumList         ; If not then loop

2.3.4.6 Debug Events

Debug events are covered in Section 13.5, “Debug Exceptions” on page 13-6.


Memory Management

This chapter describes the memory management unit implemented in the Intel® 80200 processor based on Intel® XScale™ microarchitecture, which is compliant with the ARM* Architecture V5TE.

3.1 Overview

The Intel® 80200 processor implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. To accelerate virtual to physical address translation, the Intel® 80200 processor uses both an instruction Translation Look-aside Buffer (TLB) and a data TLB to cache the latest translations. Each TLB holds 32 entries and is fully-associative. Not only do the TLBs contain the translated addresses, but also the access rights for memory references.

If an instruction or data TLB miss occurs, a hardware translation-table-walking mechanism is invoked to translate the virtual address to a physical address. Once translated, the physical address is placed in the TLB along with the access rights and attributes of the page or section. These translations can also be locked down in either TLB to guarantee the performance of critical routines.

The Intel® 80200 processor allows system software to associate various attributes with regions of memory:

• cacheable

• bufferable

• line allocate policy

• write policy

• I/O

• mini Data Cache

• Coalescing

• ECC-Protected

See Section 3.2.2, “Memory Attributes” on page 3-2 for a description of page attributes and Section 2.3.2, “New Page Attributes” on page 2-9 to find out where these attributes have been mapped in the MMU descriptors.

Note: The virtual address with which the TLBs are accessed may be remapped by the PID register. See Section 7.2.13, “Register 13: Process ID” on page 7-16 for a description of the PID register.


3.2 Architecture Model

3.2.1 Version 4 vs. Version 5

ARM* MMU Version 5 Architecture introduces the support of tiny pages, which are 1 KByte in size. The reserved field in the first-level descriptor (encoding 0b11) is used as the fine page table base address. The exact bit fields and the format of the first and second-level descriptors can be found in Section 2.3.2, “New Page Attributes” on page 2-9.

3.2.2 Memory Attributes

The attributes associated with a particular region of memory are configured in the memory management page table and control the behavior of accesses to the instruction cache, data cache, mini-data cache and the write buffer. These attributes are ignored when the MMU is disabled.

To allow compatibility with older system software, the new Intel® 80200 processor attributes take advantage of encoding space in the descriptors that was formerly reserved.

3.2.2.1 Page (P) Attribute Bit

The P bit specifies that the associated memory should be protected with ECC. The P bit is only present in the first level descriptors. Thus, ECC memory is specified with a 1 megabyte granularity.

If the MMU is disabled, ECC is disabled for all memory accesses. If the MMU is enabled, ECC is enabled for a region of memory if:

• the P bit in the first level descriptor for that virtual memory is set, and

• the BCU has ECC enabled (see Chapter 11, “Bus Controller”)

Accesses to memory for page walks do not use the MMU. For these accesses, ECC is enabled if:

• the CP15 Auxiliary Control Register enables it (see Section 7.2.2, “Register 1: Control and Auxiliary Control Registers” on page 7-7), and

• the BCU has ECC enabled (see Chapter 11, “Bus Controller”)

3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits

3.2.2.3 Instruction Cache

When examining these bits in a descriptor, the Instruction Cache only utilizes the C bit. If the C bit is clear, the Instruction Cache considers a code fetch from that memory to be non-cacheable, and does not fill a cache entry. If the C bit is set, then fetches from the associated memory region are cached.


3.2.2.4 Data Cache and Write Buffer

All of these descriptor bits affect the behavior of the Data Cache and the Write Buffer.

If the X bit for a descriptor is zero, the C and B bits operate as mandated by the ARM architecture. This behavior is detailed in Table 3-1.

If the X bit for a descriptor is one, the C and B bits’ meaning is extended, as detailed in Table 3-2.

Table 3-1. Data Cache and Buffer Behavior when X = 0

C  B  Cacheable?  Bufferable?  Write Policy   Line Allocation Policy  Notes
0  0  N           N            -              -                       Stall until complete (a)
0  1  N           Y            -              -
1  0  Y           Y            Write Through  Read Allocate
1  1  Y           Y            Write Back     Read Allocate

a. Normally, the processor continues executing after a data access if no dependency on that access is encountered. With this setting, the processor stalls execution until the data access completes. This guarantees to software that the data access has taken effect by the time execution of the data access instruction completes. External data aborts from such accesses are imprecise (but see Section 2.3.4.4 for a method to shield code from this imprecision).

Table 3-2. Data Cache and Buffer Behavior when X = 1

C  B  Cacheable?         Bufferable?  Write Policy  Line Allocation Policy  Notes
0  0  -                  -            -             -                       Unpredictable -- do not use
0  1  N                  Y            -             -                       Writes do not coalesce into buffers (a)
1  0  (Mini Data Cache)  -            -             -                       Cache policy is determined by MD field of Auxiliary Control register (b)
1  1  Y                  Y            Write Back    Read/Write Allocate

a. Normally, bufferable writes can coalesce with previously buffered data in the same address range.
b. See Section 7.2.2 for a description of this register.


3.2.2.5 Details on Data Cache and Write Buffer Behavior

If the MMU is disabled all data accesses are non-cacheable and non-bufferable. This is the same behavior as when the MMU is enabled, and a data access uses a descriptor with X, C, and B all set to 0.

The X, C, and B bits determine when the processor should place new data into the Data Cache. Data is placed into the cache in lines (also called blocks). Thus, the basis for making a decision about placing new data into the cache is called a “Line Allocation Policy”.

If the Line Allocation Policy is read-allocate, all load operations that miss the cache request a 32-byte cache line from external memory and allocate it into either the data cache or mini-data cache (this is assuming the cache is enabled). Store operations that miss the cache do not cause a line to be allocated.

If read/write-allocate is in effect, load or store operations that miss the cache request a 32-byte cache line from external memory if the cache is enabled.

The other policy determined by the X, C, and B bits is the Write Policy. A write-through policy instructs the Data Cache to keep external memory coherent by performing stores to both external memory and the cache. A write-back policy only updates external memory when a line in the cache is cleaned or needs to be replaced with a new line. Generally, write-back provides higher performance because it generates less data traffic to external memory.

More details on cache policies may be gleaned from Section 6.2.3, “Cache Policies” on page 6-5.
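The mapping from the X, C, and B descriptor bits to data cache and write buffer behavior can be sketched as a lookup. The following is a minimal C model, not code from this manual: the type and function names (`dcache_attr_t`, `decode_xcb`, and the enumerators) are hypothetical, and the values are transcribed from Tables 3-1 and 3-2. For the mini-data cache encoding (X=1, C=1, B=0) the model only reports the kind, because the policy comes from the MD field of the Auxiliary Control register.

```c
#include <assert.h>

/* Hypothetical names; the values mirror Tables 3-1 and 3-2. */
typedef enum { ATTR_NORMAL, ATTR_UNPREDICTABLE, ATTR_MINI_DCACHE } attr_kind_t;
typedef enum { WP_NONE, WP_WRITE_THROUGH, WP_WRITE_BACK } write_policy_t;
typedef enum { LA_NONE, LA_READ, LA_READ_WRITE } line_alloc_t;

typedef struct {
    attr_kind_t    kind;          /* NORMAL unless X=1 with B=0 */
    int            cacheable;     /* meaningful only when kind == ATTR_NORMAL */
    int            bufferable;    /* meaningful only when kind == ATTR_NORMAL */
    write_policy_t write_policy;
    line_alloc_t   line_alloc;
} dcache_attr_t;

dcache_attr_t decode_xcb(unsigned x, unsigned c, unsigned b)
{
    dcache_attr_t a = { ATTR_NORMAL, 0, 0, WP_NONE, LA_NONE };

    assert(x <= 1u && c <= 1u && b <= 1u);  /* each argument is one descriptor bit */

    switch ((x << 2) | (c << 1) | b) {
    case 0: /* X=0 C=0 B=0: uncached, unbuffered, stall until complete */
        break;
    case 1: /* X=0 C=0 B=1: uncached, buffered */
        a.bufferable = 1;
        break;
    case 2: /* X=0 C=1 B=0: write-through, read-allocate */
        a.cacheable = 1; a.bufferable = 1;
        a.write_policy = WP_WRITE_THROUGH; a.line_alloc = LA_READ;
        break;
    case 3: /* X=0 C=1 B=1: write-back, read-allocate */
        a.cacheable = 1; a.bufferable = 1;
        a.write_policy = WP_WRITE_BACK; a.line_alloc = LA_READ;
        break;
    case 4: /* X=1 C=0 B=0: unpredictable, do not use */
        a.kind = ATTR_UNPREDICTABLE;
        break;
    case 5: /* X=1 C=0 B=1: uncached, buffered, writes do not coalesce */
        a.bufferable = 1;
        break;
    case 6: /* X=1 C=1 B=0: mini-data cache, policy set by the MD field */
        a.kind = ATTR_MINI_DCACHE;
        break;
    case 7: /* X=1 C=1 B=1: write-back, read/write-allocate */
        a.cacheable = 1; a.bufferable = 1;
        a.write_policy = WP_WRITE_BACK; a.line_alloc = LA_READ_WRITE;
        break;
    default:
        break;
    }
    return a;
}
```

A caller would typically check `kind` first and only then read the remaining fields, for example `decode_xcb(1, 1, 1).line_alloc` to confirm that a page is read/write-allocate.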

3.2.2.6 Memory Operation Ordering

A fence memory operation (memop) is one that guarantees that all memops issued prior to the fence execute before any memop issued after the fence. Thus software may issue a fence to impose a partial ordering on memory accesses.

Table 3-3 on page 3-4 shows the circumstances in which memops act as fences.

Any swap (SWP or SWPB) to a page that would create a fence on a load or store is a fence.

Table 3-3. Memory Operations that Impose a Fence

operation      X  C  B
load           -  0  -
store          1  0  1
load or store  0  0  0
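The fence conditions in Table 3-3, together with the swap rule stated before it, can be expressed as a small predicate. This is a sketch under stated assumptions: the names `memop_t` and `memop_is_fence` are hypothetical, a dash in the table is read as "don't care", and the store row is read as X=1, C=0, B=1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; conditions transcribed from Table 3-3. */
typedef enum { MEMOP_LOAD, MEMOP_STORE, MEMOP_SWAP } memop_t;

bool memop_is_fence(memop_t op, unsigned x, unsigned c, unsigned b)
{
    bool load_fence;
    bool store_fence;

    assert(x <= 1u && c <= 1u && b <= 1u);  /* single descriptor bits */

    /* Row "load - 0 -" (any load with C=0) also covers the load half of
     * row "load or store 0 0 0". */
    load_fence = (c == 0u);

    /* Row "store 1 0 1" plus the store half of row "load or store 0 0 0". */
    store_fence = (x == 1u && c == 0u && b == 1u) ||
                  (x == 0u && c == 0u && b == 0u);

    switch (op) {
    case MEMOP_LOAD:  return load_fence;
    case MEMOP_STORE: return store_fence;
    case MEMOP_SWAP:  return load_fence || store_fence; /* SWP or SWPB */
    }
    return false;
}
```

Under this reading, a store to an X=0, C=0, B=1 page is not a fence, while a load from the same page is.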

3.2.3 Exceptions

The MMU may generate prefetch aborts for instruction accesses and data aborts for data memory accesses. The types and priorities of these exceptions are described in Section 2.3.4, “Event Architecture” on page 2-12.

Data address alignment checking is enabled by setting bit 1 of the Control Register (CP15, register 1). Alignment faults are still reported even if the MMU is disabled. All other MMU exceptions are disabled when the MMU is disabled.


3.3 Interaction of the MMU, Instruction Cache, and Data Cache

The MMU, instruction cache, and data/mini-data cache may be enabled/disabled independently. The instruction cache can be enabled with the MMU enabled or disabled. However, the data cache can only be enabled when the MMU is enabled. Therefore only three of the four combinations of the MMU and data/mini-data cache enables are valid. The invalid combination causes undefined results.

Table 3-4. Valid MMU & Data/mini-data Cache Combinations

MMU Data/mini-data Cache

Off Off

On Off

On On


3.4 Control

3.4.1 Invalidate (Flush) Operation

The entire instruction and data TLB can be invalidated at the same time with one command or they can be invalidated separately. An individual entry in the data or instruction TLB can also be invalidated. See Table 7-13, “TLB Functions” on page 7-13 for a listing of commands supported by the Intel® 80200 processor.

Globally invalidating a TLB does not affect locked TLB entries. However, the invalidate-entry operations can invalidate individual locked entries. In this case, the locked entry remains in the TLB, but never “hits” on an address translation. Effectively, a hole is in the TLB. This situation may be rectified by unlocking the TLB.

3.4.2 Enabling/Disabling

The MMU is enabled by setting bit 0 in coprocessor 15, register 1 (Control Register).

When the MMU is disabled, accesses to the instruction cache default to cacheable and all accesses to data memory are made non-cacheable.

A recommended code sequence for enabling the MMU is shown in Example 3-1 on page 3-6.

Example 3-1. Enabling the MMU

; This routine provides software with a predictable way of enabling the MMU.
; After the CPWAIT, the MMU is guaranteed to be enabled. Be aware
; that the MMU will be enabled sometime after MCR and before the instruction
; that executes after the CPWAIT.

; Programming Note: This code sequence requires a one-to-one virtual to
; physical address mapping on this code since
; the MMU may be enabled part way through. This would allow the instructions
; after MCR to execute properly regardless of the state of the MMU.

MRC P15,0,R0,C1,C0,0    ; Read CP15, register 1
ORR R0, R0, #0x1        ; Turn on the MMU
MCR P15,0,R0,C1,C0,0    ; Write to CP15, register 1

; For a description of CPWAIT, see
; Section 2.3.3, “Additions to CP15 Functionality” on page 2-11
CPWAIT

; The MMU is guaranteed to be enabled at this point; the next instruction or
; data address will be translated.


3.4.3 Locking Entries

Individual entries can be locked into the instruction and data TLBs. See Table 7-14, “Cache Lockdown Functions” on page 7-14 for the exact commands. If a lock operation finds the virtual address translation already resident in the TLB, the results are unpredictable. An invalidate by entry command before the lock command ensures proper operation. Software can also accomplish this by invalidating all entries, as shown in Example 3-2 on page 3-7.

Locking entries into either the instruction TLB or data TLB reduces the available number of entries (by the number that was locked down) for hardware to cache other virtual to physical address translations.

A procedure for locking entries into the instruction TLB is shown in Example 3-2 on page 3-7.

If an MMU abort is generated during an instruction or data TLB lock operation, the Fault Status Register is updated to indicate a Lock Abort (see Section 2.3.4.4, “Data Aborts” on page 2-14), and the exception is reported as a data abort.

Example 3-2. Locking Entries into the Instruction TLB

; R1, R2 and R3 contain the virtual addresses to translate and lock into
; the instruction TLB.

; The value in R0 is ignored in the following instruction.
; Hardware guarantees that accesses to CP15 occur in program order

MCR P15,0,R0,C8,C5,0    ; Invalidate the entire instruction TLB
MCR P15,0,R1,C10,C4,0   ; Translate virtual address (R1) and lock into
                        ; instruction TLB
MCR P15,0,R2,C10,C4,0   ; Translate virtual address (R2) and lock into
                        ; instruction TLB
MCR P15,0,R3,C10,C4,0   ; Translate virtual address (R3) and lock into
                        ; instruction TLB
CPWAIT

; The MMU is guaranteed to be updated at this point; the next instruction will
; see the locked instruction TLB entries.

Note: If exceptions are allowed to occur in the middle of this routine, the TLB may end up caching a translation that is about to be locked. For example, if R1 is the virtual address of an interrupt service routine and that interrupt occurs immediately after the TLB has been invalidated, the lock operation is ignored when the interrupt service routine returns back to this code sequence. Software should disable interrupts (FIQ or IRQ) in this case.

As a general rule, software should avoid locking in all other exception types.

The proper procedure for locking entries into the data TLB is shown in Example 3-3 on page 3-8.


Example 3-3. Locking Entries into the Data TLB

; R1, and R2 contain the virtual addresses to translate and lock into the data TLB

MCR P15,0,R1,C8,C6,1    ; Invalidate the data TLB entry specified by the
                        ; virtual address in R1
MCR P15,0,R1,C10,C8,0   ; Translate virtual address (R1) and lock into
                        ; data TLB

; Repeat sequence for virtual address in R2
MCR P15,0,R2,C8,C6,1    ; Invalidate the data TLB entry specified by the
                        ; virtual address in R2
MCR P15,0,R2,C10,C8,0   ; Translate virtual address (R2) and lock into
                        ; data TLB

CPWAIT                  ; wait for locks to complete

; The MMU is guaranteed to be updated at this point; the next instruction will
; see the locked data TLB entries.

Note: Care must be exercised when allowing exceptions to occur during this routine if their handlers may have data that lies in a page that is being locked into the TLB.


3.4.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the TLBs is round-robin; there is a round-robin pointer that keeps track of the next entry to replace. The next entry to replace is the one sequentially after the last entry that was written. For example, if the last virtual to physical address translation was written into entry 5, the next entry to replace is entry 6.

At reset, the round-robin pointer is set to entry 31. Once a translation is written into entry 31, the round-robin pointer gets set to the next available entry, beginning with entry 0 if no entries have been locked down. Subsequent translations move the round-robin pointer to the next sequential entry until entry 31 is reached, where it wraps back to entry 0 upon the next translation.

A lock pointer is used for locking entries into the TLB and is set to entry 0 at reset. A TLB lock operation places the specified translation at the entry designated by the lock pointer, moves the lock pointer to the next sequential entry, and resets the round-robin pointer to entry 31. Locking entries into either TLB effectively reduces the available entries for updating. For example, if the first three entries were locked down, the round-robin pointer would be entry 3 after it rolled over from entry 31.

Only entries 0 through 30 can be locked in either TLB; entry 31 can never be locked. If the lock pointer is at entry 31, a lock operation updates the TLB entry with the translation and ignores the lock. In this case, the round-robin pointer stays at entry 31.
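The interaction between the round-robin pointer and the lock pointer described in this section can be traced with a short model. The following C sketch is illustrative only: `tlb_ptrs_t`, `tlb_reset`, `tlb_replace`, and `tlb_lock` are hypothetical names, the model tracks the two pointers and nothing else, and it assumes that a lock attempt with the lock pointer at entry 31 leaves the round-robin pointer at entry 31, which is one reading of "stays at entry 31".

```c
#include <assert.h>
#include <stdbool.h>

#define TLB_ENTRIES 32u
#define TLB_LAST    (TLB_ENTRIES - 1u)

/* Hypothetical model state: only the two pointers are tracked. */
typedef struct {
    unsigned rr;    /* round-robin pointer: next entry to replace */
    unsigned lock;  /* lock pointer: next entry to lock; equals the locked count */
} tlb_ptrs_t;

void tlb_reset(tlb_ptrs_t *t)
{
    t->rr = TLB_LAST;  /* round-robin pointer is set to entry 31 at reset */
    t->lock = 0u;      /* lock pointer is set to entry 0 at reset */
}

/* Ordinary translation fill: returns the entry written, then advances. */
unsigned tlb_replace(tlb_ptrs_t *t)
{
    unsigned victim = t->rr;

    assert(t->lock <= TLB_LAST);
    /* After entry 31, wrap to the first entry that is not locked. */
    t->rr = (victim == TLB_LAST) ? t->lock : victim + 1u;
    return victim;
}

/* Lock operation: stores the entry written in *entry and returns true if
 * the entry was actually locked. */
bool tlb_lock(tlb_ptrs_t *t, unsigned *entry)
{
    if (t->lock == TLB_LAST) {
        /* Entry 31 can never be locked: the translation is written,
         * the lock is ignored (modeling assumption: rr remains 31). */
        *entry = TLB_LAST;
        t->rr = TLB_LAST;
        return false;
    }
    *entry = t->lock++;
    t->rr = TLB_LAST;  /* a lock operation resets the round-robin pointer */
    return true;
}
```

With three entries locked after reset, the first `tlb_replace` call writes entry 31 and the second writes entry 3, matching the rollover example in the text.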

Figure 3-1. Example of Locked Entries in TLB

[Figure: a TLB with entry 0 through entry 31. Entries 0 through 7 are marked Locked; entries 8 through 31 are not. Eight entries locked, 24 entries available for round robin replacement.]


Instruction Cache

The Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) instruction cache enhances performance by reducing the number of instruction fetches from external memory. The cache provides fast execution of cached code. Code can also be locked down when guaranteed or fast access time is required.

4.1 Overview

Figure 4-1 shows the cache organization and how the instruction address is used to access the cache.

The instruction cache is a 32-Kbyte, 32-way set associative cache; this means there are 32 sets with each set containing 32 ways. Each way of a set contains eight 32-bit words and one valid bit, which is referred to as a line. The replacement policy is a round-robin algorithm and the cache also supports the ability to lock code in at a line granularity.

Figure 4-1. Instruction Cache Organization

[Figure: the instruction cache drawn as Set 0 through Set 31. Each set holds way 0 through way 31, and each way pairs a CAM tag entry with a DATA entry of 8 words (one cache line). The virtual instruction address is divided into Tag (bits 31:10), Set Index (bits 9:5), and Word Select (bits 4:2). The set index selects a set, the tag is matched in the CAM, and the word select picks the instruction word (4 bytes) from the line. This example shows Set 0 being selected by the set index.]

CAM: Content Addressable Memory

The instruction cache is virtually addressed and virtually tagged.
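The address split used to index the instruction cache can be sketched in C as follows. This is an illustration rather than part of the manual: `icache_addr_t` and `icache_split` are hypothetical names, and the field boundaries (tag in bits 31:10, set index in bits 9:5, word select in bits 4:2) are the ones read from Figure 4-1. The address passed in is assumed to be the virtual address as presented to the cache.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields of Figure 4-1. */
typedef struct {
    uint32_t tag;   /* bits 31:10, compared against the CAM */
    unsigned set;   /* bits 9:5, selects one of 32 sets */
    unsigned word;  /* bits 4:2, selects one of 8 words in the line */
} icache_addr_t;

icache_addr_t icache_split(uint32_t va)
{
    icache_addr_t f;

    f.tag  = va >> 10;
    f.set  = (unsigned)((va >> 5) & 0x1Fu);
    f.word = (unsigned)((va >> 2) & 0x7u);

    /* Sanity check: 5 set bits and 3 word bits. */
    assert(f.set < 32u && f.word < 8u);
    return f;
}
```

For example, address 0x00008044 splits into tag 0x20, set 2, and word 1. The field widths agree with the stated geometry: 32 sets of 32 ways of 32 bytes gives 32 Kbytes.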

Note: The virtual address presented to the instruction cache may be remapped by the PID register. See Section 7.2.13, “Register 13: Process ID” on page 7-16 for a description of the PID register.


4.2 Operation

4.2.1 Operation When Instruction Cache is Enabled

When the cache is enabled, it compares every instruction request address against the addresses of instructions that it is currently holding. If the cache contains the requested instruction, the access “hits” the cache, and the cache returns the requested instruction. If the cache does not contain the requested instruction, the access “misses” the cache, and the cache requests a fetch from external memory of the 8-word line (32 bytes) that contains the requested instruction using the fetch policy described in Section 4.2.3. As the fetch returns instructions to the cache, they are placed in one of two fetch buffers and the requested instruction is delivered to the instruction decoder.

A fetched line is written into the cache if it is cacheable. Code is designated as cacheable when the Memory Management Unit (MMU) is disabled or when the MMU is enabled and the cacheable (C) bit is set to 1 in its corresponding page. See Chapter 3, “Memory Management” for a discussion on page attributes.

Note that an instruction fetch may “miss” the cache but “hit” one of the fetch buffers. When this happens, the requested instruction is delivered to the instruction decoder in the same manner as a cache “hit.”

4.2.2 Operation When The Instruction Cache Is Disabled

Disabling the cache prevents any lines from being written into the instruction cache. Although the cache is disabled, it is still accessed and may generate a “hit” if the data is already in the cache.

Disabling the instruction cache does not disable instruction buffering that may occur within the instruction fetch buffers. Two 8-word instruction fetch buffers are always enabled in the cache disabled mode. So long as instruction fetches continue to “hit” within either buffer (even in the presence of forward and backward branches), no external fetches for instructions are generated. A miss causes one or the other buffer to be filled from external memory using the fill policy described in Section 4.2.3.


4.2.3 Fetch Policy

An instruction-cache “miss” occurs when the requested instruction is not found in the instruction fetch buffers or instruction cache; a fetch request is then made to external memory. The instruction cache can handle up to two “misses.” Each external fetch request uses a fetch buffer that holds 32-bytes and eight valid bits, one for each word.

A miss causes the following:

1. A fetch buffer is allocated

2. The instruction cache sends a fetch request to the external bus. This request is for a 32-byte line.

3. Instruction words are returned back from the external bus, at a maximum rate of 1 word per core cycle. As each word returns, the corresponding valid bit is set for the word in the fetch buffer.

4. As soon as the fetch buffer receives the requested instruction, it forwards the instruction to the instruction decoder for execution.

5. When all words have returned, the fetched line is written into the instruction cache if cacheable and if the instruction cache is enabled. The line chosen for update in the cache is controlled by the round-robin replacement algorithm. This update may evict a valid line at that location.


6. Once the cache is updated, the eight valid bits of the fetch buffer are invalidated.

4.2.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the instruction cache is round-robin. Each set in the instruction cache has a round-robin pointer that keeps track of the next line (in that set) to replace. The next line to replace in a set is the one after the last line that was written. For example, if the line for the last external instruction fetch was written into way 5-set 2, the next line to replace for that set would be way 6. None of the other round-robin pointers for the other sets are affected in this case.

After reset, way 31 is pointed to by the round-robin pointer for all the sets. Once a line is written into way 31, the round-robin pointer points to the first available way of a set, beginning with way 0 if no lines have been locked into that particular set. Locking lines into the instruction cache effectively reduces the available lines for cache updating. For example, if the first three lines of a set were locked down, the round-robin pointer would point to the line at way 3 after it rolled over from way 31. Refer to Section 4.3.4, “Locking Instructions in the Instruction Cache” on page 4-8 for more details on cache locking.


4.2.5 Parity Protection

The instruction cache is protected by parity to ensure data integrity. Each instruction cache word has 1 parity bit. (The instruction cache tag is NOT parity protected.) When a parity error is detected on an instruction cache access, a prefetch abort exception occurs if the Intel® 80200 processor attempts to execute the instruction. Before servicing the exception, hardware will place a notification of the error in the Fault Status Register (Coprocessor 15, register 5).

A software exception handler can recover from an instruction cache parity error. This can be accomplished by invalidating the instruction cache and the branch target buffer and then returning to the instruction that caused the prefetch abort exception. A simplified code example is shown in Example 4-1 on page 4-4. A more complex handler might choose to invalidate the specific line that caused the exception and then invalidate the BTB.

Example 4-1. Recovering from an Instruction Cache Parity Error

; Prefetch abort handler
MCR P15,0,R0,C7,C5,0    ; Invalidate the instruction cache and branch target
                        ; buffer
CPWAIT                  ; wait for effect (see Section 2.3.3 for a
                        ; description of CPWAIT)

; The Instruction Cache is guaranteed to be invalidated at this point

SUBS PC,R14,#4          ; Returns to the instruction that generated the
                        ; parity error

If a parity error occurs on an instruction that is locked in the cache, the software exception handler needs to unlock the instruction cache, invalidate the cache and then re-lock the code in before it returns to the faulting instruction.


4.2.6 Instruction Fetch Latency

Because the Intel® 80200 processor core is clocked at a multiple of the external bus clock, and the two clocks are truly asynchronous, an exact fetch latency is difficult to derive. In general, if a fetch can be directly issued (no other memory accesses are intervening), then the delay to the first instruction is approximately (8 + W) bus clocks, where W is the number of memory wait states.

As an example: in a system with 2-wait-state memory (W = 2), an unoccluded fetch would require about 10 bus clocks to get the first instruction. If this system were running with a core/bus clock ratio of 6, then the core would perceive this as a latency of about 60 cycles.

These numbers are best case and assume that no other active memory transactions exist. Refer to Chapter 10, “External Bus” for more information on External Bus signal definitions and request timings.
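The best-case estimate above reduces to simple arithmetic, sketched here in C. The function names are hypothetical, and the result is only the approximation given in the text: (8 + W) bus clocks, scaled by the core/bus clock ratio to obtain core cycles.

```c
#include <assert.h>

/* Approximate best-case delay to the first instruction, in bus clocks. */
unsigned fetch_latency_bus_clocks(unsigned wait_states)
{
    return 8u + wait_states;
}

/* The same delay as perceived by the core, in core cycles. */
unsigned fetch_latency_core_cycles(unsigned wait_states, unsigned core_bus_ratio)
{
    assert(core_bus_ratio > 0u);  /* the core clock is a multiple of the bus clock */
    return fetch_latency_bus_clocks(wait_states) * core_bus_ratio;
}
```

With W = 2 and a core/bus ratio of 6, this reproduces the 10 bus clocks and roughly 60 core cycles of the example.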

4.2.7 Instruction Cache Coherency

The instruction cache does not detect modification to program memory by loads, stores or actions of other bus masters. Several situations may require program memory modification, such as uploading code from disk.


The application program is responsible for synchronizing code modification and invalidating the cache. In general, software must ensure that modified code space is not accessed until modification and invalidating are completed.

To achieve cache coherence, instruction cache contents can be invalidated after code modification in external memory is complete. Refer to Section 4.3.3, “Invalidating the Instruction Cache” on page 4-7 for the proper procedure in invalidating the instruction cache.

If the instruction cache is not enabled, or code is being written to a non-cacheable region, software must still invalidate the instruction cache before using the newly-written code. This precaution ensures that state associated with the new code is not buffered elsewhere in the processor, such as the fetch buffers or the BTB.

Naturally, when writing code as data, care must be taken to force it completely out of the processor into external memory before attempting to execute it. If writing into a non-cacheable region, flushing the write buffers is sufficient precaution (see Section 7.2.8 for a description of this operation). If writing to a cacheable region, then the data cache should be submitted to a Clean/Invalidate operation (see Section 6.3.3.1) to ensure coherency.


4.3 Instruction Cache Control

4.3.1 Instruction Cache State at RESET

After reset, the instruction cache is always disabled, unlocked, and invalidated (flushed).

4.3.2 Enabling/Disabling

The instruction cache is enabled by setting bit 12 in coprocessor 15, register 1 (Control Register). This process is illustrated in Example 4-2, Enabling the Instruction Cache.

Example 4-2. Enabling the Instruction Cache

; Enable the ICache
MRC P15, 0, R0, C1, C0, 0   ; Get the control register
ORR R0, R0, #0x1000         ; set bit 12 -- the I bit
MCR P15, 0, R0, C1, C0, 0   ; Set the control register
CPWAIT


4.3.3 Invalidating the Instruction Cache

The entire instruction cache, along with the fetch buffers, is invalidated by writing to coprocessor 15, register 7. (See Table 7-12, “Cache Functions” on page 7-11 for the exact command.) This command does not unlock any lines that were locked in the instruction cache nor does it invalidate those locked lines. To invalidate the entire cache including locked lines, the unlock instruction cache command needs to be executed before the invalidate command. This unlock command can also be found in Table 7-14, “Cache Lockdown Functions” on page 7-14.

There is an inherent delay from the execution of the instruction cache invalidate command to where the next instruction sees the result of the invalidate. The following routine can be used to guarantee proper synchronization.

Example 4-3. Invalidating the Instruction Cache

MCR P15,0,R1,C7,C5,0    ; Invalidate the instruction cache and branch
                        ; target buffer
CPWAIT

; The instruction cache is guaranteed to be invalidated at this point; the next
; instruction sees the result of the invalidate command.


The Intel® 80200 processor also supports invalidating an individual line from the instruction cache. See Table 7-12, “Cache Functions” on page 7-11 for the exact command.


4.3.4 Locking Instructions in the Instruction Cache

Software has the ability to lock performance critical routines into the instruction cache. Up to 28 lines in each set can be locked; hardware ignores the lock command if software is trying to lock all the lines in a particular set (i.e., ways 28-31 can never be locked). When this happens, the line is still allocated into the cache, but the lock is ignored. The round-robin pointer stays at way 31 for that set.

Lines can be locked into the instruction cache by initiating a write to coprocessor 15. (See Table 7-14, “Cache Lockdown Functions” on page 7-14 for the exact command.) Register Rd contains the virtual address of the line to be locked into the cache.

There are several requirements for locking down code:

1. The routine used to lock lines down in the cache must be placed in non-cacheable memory, which means the MMU is enabled. As a corollary: no fetches of cacheable code should occur while locking instructions into the cache.

2. The code being locked into the cache must be cacheable.

3. The instruction cache must be enabled and invalidated prior to locking down lines.

Failure to follow these requirements produces unpredictable results when accessing the instruction cache.

System programmers should ensure that the code to lock instructions into the cache does not reside closer than 128 bytes to a non-cacheable/cacheable page boundary. If the processor fetches ahead into a cacheable page, then the first requirement noted above could be violated.

Lines are locked into a set starting at way 0 and may progress up to way 27; which set a line gets locked into depends on the set index of the virtual address. Figure 4-2 is an example of where lines of code may be locked into the cache along with how the round-robin pointer is affected.

Figure 4-2. Locked Line Effect on Round Robin Replacement

[Figure: sets 0 through 31, each drawn with way 0 through way 31; the ways marked Locked start at way 0 in each set.]

set 0: 8 ways locked, 24 ways available for round robin replacement
set 1: 23 ways locked, 9 ways available for round robin replacement
set 2: 28 ways locked, only ways 28-31 available for replacement
set 31: all 32 ways available for round robin replacement


Software can lock down several different routines located at different memory locations. This may cause some sets to have more locked lines than others as shown in Figure 4-2.
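Because each set accepts at most 28 locked lines, it can be useful to work out in advance how a set of routines distributes across the sets. The following C sketch is one way to do that bookkeeping; it is not a procedure from this manual. The names `icache_plan_lock` and `ICACHE_MAX_LOCKED_WAYS` are hypothetical, the set index is taken from address bits 9:5, and the model assumes every line in the range is distinct from lines already counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ICACHE_SETS            32u
#define ICACHE_LINE_BYTES      32u
#define ICACHE_MAX_LOCKED_WAYS 28u  /* ways 0 through 27 */

/* Adds the lines covering [start, end] (end inclusive) to the per-set
 * counters in locked[]. Returns false if any line would land in a set that
 * already holds 28 locked lines; such lines are not counted. */
bool icache_plan_lock(uint32_t start, uint32_t end, unsigned locked[ICACHE_SETS])
{
    uint32_t line = start & ~(ICACHE_LINE_BYTES - 1u);
    uint32_t last = end & ~(ICACHE_LINE_BYTES - 1u);
    bool fits = true;

    assert(start <= end);

    for (;;) {
        unsigned set = (unsigned)((line >> 5) & (ICACHE_SETS - 1u));

        if (locked[set] < ICACHE_MAX_LOCKED_WAYS) {
            locked[set]++;
        } else {
            fits = false;  /* hardware would ignore the lock for this line */
        }
        if (line == last) {
            break;
        }
        line += ICACHE_LINE_BYTES;
    }
    return fits;
}
```

A contiguous 1-Kbyte routine touches each of the 32 sets exactly once, so 28 such routines fill every lockable way; a further request would return false.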

Example 4-4 on page 4-9 shows how a routine, called “lockMe” in this example, might be locked into the instruction cache. Note that it is possible to receive an exception while locking code (see Section 2.3.4, “Event Architecture” on page 2-12).

Example 4-4. Locking Code into the Cache

lockMe:                 ; This is the code that will be locked into the cache
    mov r0, #5
    add r5, r1, r2
    . . .
lockMeEnd:
    . . .
codeLock:               ; here is the code to lock the “lockMe” routine
    ldr r0, =(lockMe AND NOT 31)    ; r0 gets a pointer to the first line we should lock
    ldr r1, =(lockMeEnd AND NOT 31) ; r1 contains a pointer to the last line we should lock
lockLoop:
    mcr p15, 0, r0, c9, c1, 0   ; lock next line of code into ICache
    cmp r0, r1                  ; are we done yet?
    add r0, r0, #32             ; advance pointer to next line
    bne lockLoop                ; if not done, do the next line

4.3.5 Unlocking Instructions in the Instruction Cache

The Intel® 80200 processor provides a global unlock command for the instruction cache. Writing to coprocessor 15, register 9 unlocks all the locked lines in the instruction cache and leaves them valid. These lines then become available for the round-robin replacement algorithm. (See Table 7-14, “Cache Lockdown Functions” on page 7-14 for the exact command.)


Branch Target Buffer

Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) uses dynamic branch prediction to reduce the penalties associated with changing the flow of program execution. The Intel® 80200 processor features a branch target buffer that provides the instruction cache with the target address of branch type instructions. The branch target buffer is implemented as a 128-entry, direct mapped cache.

This chapter is primarily for those optimizing their code for performance. An understanding of the branch target buffer is needed in this case so that code can be scheduled to best utilize the performance benefits of the branch target buffer.

5.1 Branch Target Buffer (BTB) Operation

The BTB stores the history of branches that have executed along with their targets. Figure 5-1 shows an entry in the BTB, where the tag is the instruction address of a previously executed branch and the data contains the target address of the previously executed branch along with two bits of history information.

Figure 5-1. BTB Entry

TAG: Branch Address[31:9,1]
DATA: Target Address[31:1], History Bits[1:0]

The BTB takes the current instruction address and checks to see if this address is a branch that was previously seen. It uses bits [8:2] of the current address to read out the tag and then compares this tag to bits [31:9,1] of the current instruction address. If the current instruction address matches the tag in the cache and the history bits indicate that this branch has usually been taken in the past, the BTB uses the data (target address) as the next instruction address to send to the instruction cache.

Bit[1] of the instruction address is included in the tag comparison in order to support Thumb* execution. This organization means that two consecutive Thumb branch (B) instructions, with instruction address bits[8:2] the same, contend for the same BTB entry. Thumb also requires 31 bits for the branch target address. In ARM* mode, bit[1] is zero.

The history bits represent four possible prediction states for a branch entry in the BTB. Figure 5-2, “Branch History” on page 5-2 shows these states along with the possible transitions. The initial state for branches stored in the BTB is Weakly-Taken (WT). Every time a branch that exists in the BTB is executed, the history bits are updated to reflect the latest outcome of the branch, either taken or not-taken.

Chapter 14, “Performance Considerations” describes which instructions are dynamically predicted by the BTB and the performance penalty for mispredicting a branch.

The BTB does not have to be managed explicitly by software; it is disabled by default after reset and is invalidated when the instruction cache is invalidated.


Figure 5-2. Branch History

[Figure: state diagram of the four history states with transitions labeled Taken and Not Taken.]

SN: Strongly Not Taken
WN: Weakly Not Taken
WT: Weakly Taken
ST: Strongly Taken

5.1.1 Reset

After Processor Reset, the BTB is disabled and all entries are invalidated.

5.1.2 Update Policy

A new entry is stored into the BTB when the following conditions are met:

• the branch instruction has executed,

• the branch was taken

• the branch is not currently in the BTB

The entry is then marked valid and the history bits are set to WT. If another valid branch exists at the same entry in the BTB, it is evicted by the new branch.

Once a branch is stored in the BTB, the history bits are updated upon every execution of the branch as shown in Figure 5-2.
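The lookup fields and history update can be sketched in C. This is a model under stated assumptions, not a description of the hardware: the function names and the numeric encoding of the four states are hypothetical, the transitions are assumed to follow a conventional two-bit saturating counter (Taken moves one state toward ST, Not Taken moves one state toward SN), and a branch is assumed to be predicted taken in the WT and ST states. Only the index bits [8:2], the tag bits [31:9,1], and the initial WT state come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: ordered from Strongly Not Taken to Strongly Taken. */
typedef enum { BTB_SN = 0, BTB_WN = 1, BTB_WT = 2, BTB_ST = 3 } btb_hist_t;

/* Entry index: instruction address bits [8:2] (128 entries, direct mapped). */
unsigned btb_index(uint32_t pc)
{
    return (unsigned)((pc >> 2) & 0x7Fu);
}

/* Tag: instruction address bits [31:9] with bit [1] appended. */
uint32_t btb_tag(uint32_t pc)
{
    return ((pc >> 9) << 1) | ((pc >> 1) & 1u);
}

/* Assumed saturating-counter update on each execution of the branch. */
btb_hist_t btb_update(btb_hist_t h, bool taken)
{
    assert(h <= BTB_ST);
    if (taken) {
        return (h == BTB_ST) ? BTB_ST : (btb_hist_t)(h + 1);
    }
    return (h == BTB_SN) ? BTB_SN : (btb_hist_t)(h - 1);
}

/* Assumed prediction rule: taken in the two "Taken" states. */
bool btb_predict_taken(btb_hist_t h)
{
    return h >= BTB_WT;
}
```

Two Thumb branches at 0x1004 and 0x1006 produce the same index but different tags, which illustrates why consecutive Thumb branches contend for one entry.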


5.2 BTB Control

5.2.1 Disabling/Enabling

The BTB is always disabled out of Reset. Software can enable the BTB through a bit in a coprocessor register (see Section 7.2.2).

Before enabling or disabling the BTB, software must invalidate it (described in the following section). This action ensures correct operation in case stale data is in the BTB. Software should not place any branch instruction between the code that invalidates the BTB and the code that enables/disables it.

5.2.2 Invalidation

There are four ways the contents of the BTB can be invalidated.

1. Reset

2. Software can directly invalidate the BTB via a CP15, register 7 function. Refer to Section 7.2.8, “Register 7: Cache Functions” on page 7-11.

3. The BTB is invalidated when the Process ID Register is written.

4. The BTB is invalidated when the instruction cache is invalidated via CP15, register 7 functions.


Data Cache

The Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) data cache enhances performance by reducing the number of data accesses to and from external memory. There are two data cache structures in the Intel® 80200 processor, a 32 Kbyte data cache and a 2 Kbyte mini-data cache. An eight entry write buffer and a four entry fill buffer are also implemented to decouple the Intel® 80200 processor instruction execution from external memory accesses, which increases overall system performance.

6.1 Overviews

6.1.1 Data Cache Overview

The data cache is a 32-Kbyte, 32-way set associative cache; this means there are 32 sets with each set containing 32 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist two dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache the dirty bit associated with it is set. The replacement policy is a round-robin algorithm and the cache also supports the ability to reconfigure each line as data RAM.

Figure 6-1, “Data Cache Organization” on page 6-2 shows the cache organization and how the data address is used to access the cache.

Cache policies may be adjusted for particular regions of memory by altering page attribute bits in the MMU descriptor that controls that memory. See Section 3.2.2 for a description of these bits.

The data cache is virtually addressed and virtually tagged. It supports write-back and write-through caching policies. The data cache always allocates a line in the cache when a cacheable read miss occurs and allocates a line into the cache on a cacheable write miss when write allocate is specified by its page attribute. Page attribute bits determine whether a line gets allocated into the data cache or mini-data cache.
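As an illustration of how the virtual address selects a line, the sketch below splits an address into the fields labeled in Figure 6-1 (tag in bits 31:10, set index in bits 9:5, word select in bits 4:2, byte select in bits 1:0). The function name is hypothetical and the code is a model for reasoning about addresses, not a description of the hardware implementation.

```python
LINE_BYTES = 32  # bytes per cache line
NUM_SETS = 32    # sets in the data cache


def split_address(va):
    """Split a 32-bit virtual address into (tag, set_index, word, byte)."""
    va &= 0xFFFFFFFF
    byte = va & 0x3                 # bits 1:0, byte within the word
    word = (va >> 2) & 0x7          # bits 4:2, word within the 32-byte line
    set_index = (va >> 5) & 0x1F    # bits 9:5, one of 32 sets
    tag = va >> 10                  # bits 31:10, compared by the CAM
    return tag, set_index, word, byte
```

Two addresses that differ only in bits 31:10 map to the same set and compete for its 32 ways.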

Figure 6-1. Data Cache Organization

[Figure: 32 sets (Set 0 through Set 31), each with 32 ways (way 0 through way 31); every way holds a CAM (Content Addressable Memory) tag entry and 32 bytes of DATA (one cache line). The virtual data address is divided into Tag (bits 31:10), Set Index (bits 9:5), Word Select (bits 4:2), and Byte Select (bits 1:0). The example shows Set 0 being selected by the set index; the selected data word passes through byte alignment and sign extension, and 4 bytes go to the destination register.]

6.1.2 Mini-Data Cache Overview

The mini-data cache is a 2-Kbyte, 2-way set associative cache; this means there are 32 sets with each set containing 2 ways. Each way of a set contains 32 bytes (one cache line) and one valid bit. There also exist 2 dirty bits for every line, one for the lower 16 bytes and the other one for the upper 16 bytes. When a store hits the cache the dirty bit associated with it is set. The replacement policy is a round-robin algorithm.

Figure 6-2, “Mini-Data Cache Organization” on page 6-3 shows the cache organization and how the data address is used to access the cache.

The mini-data cache is virtually addressed and virtually tagged and supports the same caching policies as the data cache. However, lines can’t be locked into the mini-data cache.

Figure 6-2. Mini-Data Cache Organization

[Figure: 32 sets (Set 0 through Set 31), each with 2 ways (way 0 and way 1) of 32 bytes (one cache line). The virtual data address is divided into Tag (bits 31:10), Set Index (bits 9:5), Word Select (bits 4:2), and Byte Select (bits 1:0). The example shows Set 0 being selected by the set index; the selected data word passes through byte alignment and sign extension, and 4 bytes go to the destination register.]

6.1.3 Write Buffer and Fill Buffer Overview

The Intel® 80200 processor employs an eight entry write buffer, each entry containing 16 bytes. Stores to external memory are first placed in the write buffer and subsequently taken out when the bus is available.

The write buffer supports the coalescing of multiple store requests to external memory. An incoming store may coalesce with any of the eight entries.

The fill buffer holds the external memory request information for a data cache or mini-data cache fill or non-cacheable read request. Up to four 32-byte read request operations can be outstanding in the fill buffer before the Intel® 80200 processor needs to stall.

The fill buffer has been augmented with a four-entry pend buffer that captures data memory requests to outstanding fill operations. Each entry in the pend buffer contains enough data storage to hold one 32-bit word, specifically for store operations. Cacheable load or store operations that hit an entry in the fill buffer get placed in the pend buffer and are completed when the associated fill completes. Any entry in the pend buffer can be pended against any of the entries in the fill buffer; multiple entries in the pend buffer can be pended against a single entry in the fill buffer.

Pended operations complete in program order.

6.2 Data Cache and Mini-Data Cache Operation

The following discussions refer to the data cache and mini-data cache as one cache (data/mini-data) since their behavior is the same when accessed.

6.2.1 Operation When Caching is Enabled

When the data/mini-data cache is enabled for an access, the data/mini-data cache compares the address of the request against the addresses of data that it is currently holding. If the line containing the address of the request is resident in the cache, the access “hits” the cache. For a load operation the cache returns the requested data to the destination register, and for a store operation the data is stored into the cache. The data associated with the store may also be written to external memory if write-through caching is specified for that area of memory. If the cache does not contain the requested data, the access “misses” the cache, and the sequence of events that follows depends on the configuration of the cache, the configuration of the MMU, and the page attributes. These sequences are described in Section 6.2.3.2, “Read Miss Policy” on page 6-6 for a load “miss” and in Section 6.2.3.3, “Write Miss Policy” on page 6-7 for a store “miss”.

6.2.2 Operation When Data Caching is Disabled

The data/mini-data cache is still accessed even though it is disabled. If a load hits the cache it returns the requested data to the destination register. If a store hits the cache, the data is written into the cache. Any access that misses the cache does not allocate a line in the cache when it’s disabled, even if the MMU is enabled and the memory region’s cacheability attribute is set.

6.2.3 Cache Policies

6.2.3.1 Cacheability

Data at a specified address is cacheable given the following:

• the MMU is enabled

• the cacheable attribute is set in the descriptor for the accessed address

• and the data/mini-data cache is enabled

6.2.3.2 Read Miss Policy

The following sequence of events occurs when a cacheable (see Section 6.2.3.1, “Cacheability” on page 6-5) load operation misses the cache:

1. The fill buffer is checked to see if an outstanding fill request already exists for that line.

If so, the current request is placed in the pending buffer and waits until the previously requested fill completes, after which it accesses the cache again to obtain the requested data and return it to the destination register.

If there is no outstanding fill request for that line, the current load request is placed in the fill buffer and a 32-byte external memory read request is made. If the pending buffer or fill buffer is full, the Intel® 80200 processor stalls until an entry is available.

2. A line is allocated in the cache to receive the 32 bytes of fill data. The line selected is determined by the round-robin pointer (see Section 6.2.4, “Round-Robin Replacement Algorithm” on page 6-8). The line chosen may contain a valid line previously allocated in the cache. In this case both dirty bits are examined and, if any are set, the four words associated with each asserted dirty bit are written back to external memory as a four-word burst operation.

3. When the data requested by the load is returned from external memory, it is immediately sent to the destination register specified by the load. A system that returns the requested data first, with respect to the other bytes of the line, obtains the best performance.

4. As data returns from external memory, it is written into the cache in the previously allocated line.

A load operation that misses the cache and is NOT cacheable makes a request from external memory for the exact data size of the original load request. For example, LDRH requests exactly two bytes from external memory, LDR requests 4 bytes from external memory, etc. This request is placed in the fill buffer until the data is returned from external memory, which is then forwarded back to the destination register(s).
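The decision made in step 1, together with the non-cacheable case, can be sketched as a small classifier. This is a simplified model under stated assumptions: the function and argument names are hypothetical, the fill buffer is represented only by the set of line addresses with an outstanding fill, and a full pend buffer is treated as a stall only when the request actually needs a pend entry.

```python
FILL_ENTRIES = 4   # fill buffer entries (Section 6.1.3)
PEND_ENTRIES = 4   # pend buffer entries (Section 6.1.3)
LINE_BYTES = 32


def load_miss(va, size, cacheable, fills_in_flight, pends_in_use):
    """Classify a load that missed the cache.

    fills_in_flight: set of line base addresses with an outstanding fill.
    pends_in_use: number of occupied pend buffer entries.
    Returns (action, bytes_requested_from_external_memory).
    """
    line = va & ~(LINE_BYTES - 1)
    if cacheable and line in fills_in_flight:
        if pends_in_use >= PEND_ENTRIES:
            return ("stall", 0)
        return ("pend", 0)             # wait for the fill already in flight
    if len(fills_in_flight) >= FILL_ENTRIES:
        return ("stall", 0)            # no free fill buffer entry
    if cacheable:
        return ("line_fill", LINE_BYTES)
    return ("exact_read", size)        # e.g. 2 for LDRH, 4 for LDR
```

A second load to a line that is already being filled therefore generates no additional external memory request.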

6.2.3.3 Write Miss Policy

A write operation that misses the cache requests a 32-byte cache line from external memory if the access is cacheable and write allocation is specified in the page. In this case the following sequence of events occurs:

1. The fill buffer is checked to see if an outstanding fill request already exists for that line.

If so, the current request is placed in the pending buffer and waits until the previously requested fill completes, after which it writes its data into the recently allocated cache line.

If there is no outstanding fill request for that line, the current store request is placed in the fill buffer and a 32-byte external memory read request is made. If the pending buffer or fill buffer is full, the Intel® 80200 processor stalls until an entry is available.

2. The 32 bytes of data can be returned to the Intel® 80200 processor in any word order, i.e., the eight words in the line can be returned in any order. Note that the order in which the data is returned to the Intel® 80200 processor does not matter for performance, since the store operation has to wait until the entire line is written into the cache before it can complete.

3. When the entire 32-byte line has returned from external memory, a line is allocated in the cache, selected by the round-robin pointer (see Section 6.2.4, “Round-Robin Replacement Algorithm” on page 6-8). The line to be written into the cache may replace a valid line previously allocated in the cache. In this case both dirty bits are examined and, if any are set, the four words associated with each asserted dirty bit are written back to external memory as a four-word burst operation. This write operation is placed in the write buffer.

4. The line is written into the cache along with the data associated with the store operation.

If the above condition for requesting a 32-byte cache line is not met, a write miss causes a write request to external memory for the exact data size specified by the store operation, assuming the write request doesn’t coalesce with another write operation in the write buffer.

6.2.3.4 Write-Back Versus Write-Through

The Intel® 80200 processor supports write-back caching or write-through caching, controlled through the MMU page attributes. When write-through caching is specified, all store operations are written to external memory even if the access hits the cache. This feature keeps the external memory coherent with the cache, i.e., no dirty bits are set for this region of memory in the data/mini-data cache. This however does not guarantee that the data/mini-data cache is coherent with external memory, which is dependent on the system level configuration, specifically if the external memory is shared by another master.

When write-back caching is specified, a store operation that hits the cache does not generate a write to external memory, thus reducing external memory traffic.
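The store behavior described in Sections 6.2.3.3 and 6.2.3.4 can be summarized in a short sketch. The function names and return values are hypothetical and chosen only for illustration; buffer occupancy and write coalescing are deliberately left out.

```python
def store_hit(byte_offset, write_through):
    """Store that hits the cache.

    byte_offset: offset of the store within the 32-byte line (0..31).
    Returns (dirty_half_set, written_to_external_memory).
    """
    if write_through:
        # External memory is updated and no dirty bit is set.
        return (None, True)
    # Write-back: only the dirty bit covering the touched 16-byte half is set.
    return ("upper" if byte_offset >= 16 else "lower", False)


def store_miss(cacheable, write_allocate):
    """Store that misses the cache."""
    if cacheable and write_allocate:
        return "line_fill"     # fetch the 32-byte line, then merge the store
    return "exact_write"       # write exactly the stored size to memory
```

For example, a word store at offset 20 of a write-back line sets only the upper dirty bit, so a later eviction writes back at most that 16-byte half unless the lower half is also modified.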

6.2.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the data cache is round-robin. Each set in the data cache has a round-robin pointer that keeps track of the next line (in that set) to replace. The next line to replace in a set is the next sequential line after the last one that was just filled. For example, if the line for the last fill was written into way 5-set 2, the next line to replace for that set would be way 6. None of the other round-robin pointers for the other sets are affected in this case.

After reset, way 31 is pointed to by the round-robin pointer for all the sets. Once a line is written into way 31, the round-robin pointer points to the first available way of a set, beginning with way 0 if no lines have been re-configured as data RAM in that particular set. Re-configuring lines as data RAM effectively reduces the available lines for cache updating. For example, if the first three lines of a set were re-configured, the round-robin pointer would point to the line at way 3 after it rolled over from way 31. Refer to Section 6.4, “Re-configuring the Data Cache as Data RAM” on page 6-12 for more details on data RAM.

The mini-data cache follows the same round-robin replacement algorithm as the data cache, except that there are only two lines the round-robin pointer can point to, so the round-robin pointer always points to the least recently filled line. A least recently used replacement algorithm is not supported because the purpose of the mini-data cache is to cache data that exhibits low temporal locality, i.e., data that is placed into the mini-data cache is typically modified once and then written back out to external memory.
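The pointer behavior for one set of the data cache can be modeled in a few lines. This is a sketch under simplifying assumptions: the class name is hypothetical, and the number of locked ways is treated as fixed for the lifetime of the object rather than changing as lines are locked.

```python
NUM_WAYS = 32


class RoundRobinSet:
    """Round-robin pointer of a single data cache set."""

    def __init__(self, locked_ways=0):
        # Ways 0 .. locked_ways-1 are locked or re-configured as data RAM.
        self.locked_ways = locked_ways
        # After reset the pointer selects way 31.
        self.pointer = NUM_WAYS - 1

    def fill(self):
        """Return the way that receives the fill, then advance the pointer."""
        way = self.pointer
        next_way = way + 1
        if next_way >= NUM_WAYS:
            # Roll over to the first way still available for replacement.
            next_way = self.locked_ways
        self.pointer = next_way
        return way
```

With no locked lines, three fills after reset land in ways 31, 0, and 1; with the first three lines re-configured, they land in ways 31, 3, and 4, matching the example in the text.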

6.2.5 Parity Protection

The data cache and mini-data cache are protected by parity to ensure data integrity; there is one parity bit per byte of data. (The tags are NOT parity protected.) When a parity error is detected on a data/mini-data cache access, a data abort exception occurs. Before servicing the exception, hardware sets bit 10 of the Fault Status Register.

A data/mini-data cache parity error is an imprecise data abort, meaning R14_ABORT (+8) may not point to the instruction that caused the parity error. If the parity error occurred during a load, the targeted register may be updated with incorrect data.

A data abort due to a data/mini-data cache parity error may not be recoverable if the data address that caused the abort occurred on a line in the cache that has a write-back caching policy. Prior updates to this line may be lost; in this case the software exception handler should perform a “clean and clear” operation on the data cache, ignoring subsequent parity errors, and restart the offending process. This operation is shown in Section 6.3.3.1.

6.2.6 Atomic Accesses

The SWP and SWPB instructions generate an atomic load and store operation allowing a memory semaphore to be loaded and altered without interruption. These accesses may hit or miss the data/mini-data cache depending on configuration of the cache, configuration of the MMU, and the page attributes. If the swap operation is directed to external memory the BCU performs a locked set of memory operations (see Chapter 11, “Bus Controller”).

6.3 Data Cache and Mini-Data Cache Control

6.3.1 Data Memory State After Reset

After processor reset, both the data cache and mini-data cache are disabled, all valid bits are set to zero (invalid), and the round-robin bit points to way 31. Any lines in the data cache that were configured as data RAM before reset are changed back to cacheable lines after reset, i.e., there are 32 KBytes of data cache and zero bytes of data RAM.

6.3.2 Enabling/Disabling

The data cache and mini-data cache are enabled by setting bit 2 in coprocessor 15, register 1 (Control Register). See Chapter 7, “Configuration”, for a description of this register and others.

Example 6-1 shows code that enables the data and mini-data caches. Note that the MMU must be enabled to use the data cache.

Example 6-1. Enabling the Data Cache

enableDCache:
    MCR p15, 0, r0, c7, c10, 4 ; Drain pending data operations...
                               ; (see Section 7.2.8, Register 7: Cache Functions)
    MRC p15, 0, r0, c1, c0, 0  ; Get current control register
    ORR r0, r0, #4             ; Enable DCache by setting ‘C’ (bit 2)
    MCR p15, 0, r0, c1, c0, 0  ; And update the Control register

6.3.3 Invalidate & Clean Operations

Individual entries can be invalidated and cleaned in the data cache and mini-data cache via coprocessor 15, register 7. Note that a line locked into the data cache remains locked even after it has been subjected to an invalidate-entry operation. This will leave an unusable line in the cache until a global unlock has occurred. For this reason, do not use these commands on locked lines.

This same register also provides the command to invalidate the entire data cache and mini-data cache. Refer to Table 7-12, “Cache Functions” on page 7-11 for a listing of the commands. These global invalidate commands have no effect on lines locked in the data cache. Locked lines must be unlocked before they can be invalidated. This is accomplished by the Unlock Data Cache command found in Table 7-14, “Cache Lockdown Functions” on page 7-14.

6.3.3.1 Global Clean and Invalidate Operation

A simple software routine is used to globally clean the data cache. It takes advantage of the line-allocate data cache operation, which allocates a line into the data cache. This allocation will evict any dirty data in the cache back to external memory. Example 6-2 shows how the data cache can be cleaned.

Example 6-2. Global Clean Operation

; Global Clean/Invalidate THE DATA CACHE
; R1 contains the virtual address of a region of cacheable memory reserved for
; this clean operation.
; R0 is the loop count; Iterate 1024 times which is the number of lines in the
; data cache

;; Macro ALLOCATE performs the line-allocation cache operation on the
;; address specified in register Rx.
;;
MACRO ALLOCATE Rx
    MCR P15, 0, Rx, C7, C2, 5
ENDM

    MOV R0, #1024
LOOP1:
    ALLOCATE R1          ; Allocate a line at the virtual address
                         ; specified by R1.
    ADD R1, R1, #32      ; Increment the address in R1 to the next cache line
    SUBS R0, R0, #1      ; Decrement loop count
    BNE LOOP1
;
; Clean the Mini-data Cache
; Can’t use line-allocate command, so cycle 2KB of unused data through.
; R2 contains the virtual address of a region of cacheable memory reserved for
; cleaning the Mini-data cache
; R0 is the loop count; Iterate 64 times which is the number of lines in the
; Mini-data Cache.

    MOV R0, #64
LOOP2:
    LDR R3,[R2],#32      ; Load and increment to next cache line
    SUBS R0,R0,#1        ; Decrement loop count
    BNE LOOP2
;
; Invalidate the data cache and mini-data cache
    MCR P15, 0, R0, C7, C6, 0
;

The line-allocate operation does not require physical memory to exist at the virtual address specified by the instruction, since it does not generate a load/fill request to external memory. Also, the line-allocate operation does not set the 32 bytes of data associated with the line to any known value. Reading this data produces unpredictable results.

The line-allocate command will not operate on the mini Data Cache, so system software must clean this cache by reading 2KByte of contiguous unused data into it. This data must be unused and reserved for this purpose so that it will not already be in the cache. It must reside in a page that is marked as mini Data Cache cacheable (see Section 2.3.2).

The time it takes to execute the global clean operation depends on the number of dirty lines in the cache.
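The loop counts used in Example 6-2 follow directly from the cache sizes and the 32-byte line size. The short calculation below makes that arithmetic explicit; the constant and function names are illustrative only.

```python
LINE_BYTES = 32
DATA_CACHE_BYTES = 32 * 1024       # 32-Kbyte data cache
MINI_DATA_CACHE_BYTES = 2 * 1024   # 2-Kbyte mini-data cache


def clean_loop_count(cache_bytes, line_bytes=LINE_BYTES):
    """Number of lines that must be cycled through to clean a cache."""
    return cache_bytes // line_bytes


def reserved_region_bytes(cache_bytes, line_bytes=LINE_BYTES):
    """Size of the reserved address range walked by the clean loop."""
    return clean_loop_count(cache_bytes, line_bytes) * line_bytes
```

The data cache loop therefore runs 1024 times over a 32-Kbyte reserved range, and the mini-data cache loop runs 64 times over a 2-Kbyte reserved range.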

6.4 Re-configuring the Data Cache as Data RAM

Software has the ability to lock tags associated with 32-byte lines in the data cache, thus creating the appearance of data RAM. Any subsequent access to this line always hits the cache unless it is invalidated. Once a line is locked into the data cache it is no longer available for cache allocation on a line fill. Up to 28 lines in each set can be reconfigured as data RAM, such that the maximum data RAM size is 28 Kbytes.

Hardware does not support locking lines into the mini-data cache; any attempt to do this produces unpredictable results.

There are two methods for locking tags into the data cache; the method of choice depends on the application. One method is used to lock data that resides in external memory into the data cache and the other method is used to re-configure lines in the data cache as data RAM. Locking data from external memory into the data cache is useful for lookup tables, constants, and any other data that is frequently accessed. Re-configuring a portion of the data cache as data RAM is useful when an application needs scratch memory (bigger than the register file can provide) for frequently used variables. These variables may be strewn across memory, making it advantageous for software to pack them into data RAM memory.

Code examples for these two applications are shown in Example 6-3 on page 6-13 and Example 6-4 on page 6-14. The difference between these two routines is that Example 6-3 on page 6-13 actually requests the entire line of data from external memory and Example 6-4 on page 6-14 uses the line-allocate operation to lock the tag into the cache. No external memory request is made, which means software can map any unallocated area of memory as data RAM. However, the line-allocate operation does validate the target address with the MMU, so system software must ensure that the memory has a valid descriptor in the page table.

Another item to note in Example 6-4 on page 6-14 is that the 32 bytes of data located in a newly allocated line in the cache must be initialized by software before it can be read. The line allocate operation does not initialize the 32 bytes and therefore reading from that line produces unpredictable results.

In both examples, the code drains the pending loads before and after locking data. This step ensures that outstanding loads do not end up in the wrong place—either unintentionally locked into the cache or not locked at all. Note also that a drain operation has been placed after the operation that locks the tag into the cache. This drain ensures predictable results if a programmer tries to lock more than 28 lines in a set; the tag gets allocated in this case but not locked into the cache.

Example 6-3. Locking Data into the Data Cache

; configured with C=1 and B=1
; R0 is the number of 32-byte lines to lock into the data cache. In this
; example 16 lines of data are locked into the cache.
; MMU and data cache are enabled prior to this code.

.macro CPWAIT
    MRC P15, 0, R0, C2, C0, 0
    MOV R0, R0
    SUB PC, PC, #4
.endm

.macro DRAIN
    MCR P15, 0, R0, C7, C10, 4 ; drain pending loads and stores
.endm

.macro LOCKLINE, Rx, Ry
    ; Write back the line if it's dirty in the cache
    MCR P15, 0, \Rx, C7, C10, 1
    ; Flush/Invalidate the line from the cache
    MCR P15, 0, \Rx, C7, C6, 1
    ; Load and lock 32 bytes of data located at [R1]
    ; into the data cache. Post-increment the address
    ; in R1 to the next cache line.
    LDR \Ry, [\Rx], #32
.endm

; LockLines(int cache_lines, void *start_address)
.global LockLines
LockLines:
    STMFD SP!, {R4-R6, LR}
    MOV R6, R0
    DRAIN
    MOV R2, #0x1
    MCR P15, 0, R2, C9, C2, 0 ; Put the data cache in lock mode
    CPWAIT
LOOP1:
    LOCKLINE R1, R2
    SUBS R6, R6, #1 ; Decrement loop count
    BEQ DONE
    LOCKLINE R1, R3
    SUBS R6, R6, #1 ; Decrement loop count
    BEQ DONE
    LOCKLINE R1, R4
    SUBS R6, R6, #1 ; Decrement loop count
    BEQ DONE
    LOCKLINE R1, R5
    SUBS R6, R6, #1 ; Decrement loop count
    BNE LOOP1
; Turn off data cache locking
DONE:
    DRAIN
    MOV R2, #0x0
    MCR P15, 0, R2, C9, C2, 0 ; Take the data cache out of lock mode.
    CPWAIT
    LDMFD SP!, {R4-R6, PC}

Example 6-4. Creating Data RAM

; R1 contains the virtual address of a region of memory to configure as data RAM,
; which is aligned on a 32-byte boundary.
; MMU is configured so that the memory region is cacheable.
; R0 is the number of 32-byte lines to designate as data RAM. In this example 16
; lines of the data cache are re-configured as data RAM.
; The inner loop is used to initialize the newly allocated lines
; MMU and data cache are enabled prior to this code.

MACRO ALLOCATE Rx
    MCR P15, 0, Rx, C7, C2, 5
ENDM

MACRO DRAIN
    MCR P15, 0, R0, C7, C10, 4 ; drain pending loads and stores
ENDM

    DRAIN
    MOV R4, #0x0
    MOV R5, #0x0
    MOV R2, #0x1
    MCR P15,0,R2,C9,C2,0 ; Put the data cache in lock mode
    CPWAIT

    MOV R0, #16
LOOP1:
    ALLOCATE R1          ; Allocate and lock a tag into the data cache at
                         ; address [R1].
    ; initialize 32 bytes of newly allocated line
    DRAIN
    STRD R4, [R1],#8     ;
    STRD R4, [R1],#8     ;
    STRD R4, [R1],#8     ;
    STRD R4, [R1],#8     ;
    SUBS R0, R0, #1      ; Decrement loop count
    BNE LOOP1

; Turn off data cache locking
    DRAIN                ; Finish all pending operations
    MOV R2, #0x0
    MCR P15,0,R2,C9,C2,0 ; Take the data cache out of lock mode.
    CPWAIT

Tags can be locked into the data cache by enabling the data cache lock mode bit located in coprocessor 15, register 9. (See Table 7-14, “Cache Lockdown Functions” on page 7-14 for the exact command.) Once enabled, any new lines allocated into the data cache are locked down.

Note that the PLD instruction does not affect the cache contents if it encounters an error while executing. For this reason, system software should ensure the memory address used in the PLD is correct. If this cannot be ascertained, replace the PLD with a LDR instruction that targets a scratch register.

Lines are locked into a set starting at way 0 and may progress up to way 27; which set a line gets locked into depends on the set index of the virtual address of the request. Figure 6-3, “Locked Line Effect on Round Robin Replacement” on page 6-15 is an example of where lines of code may be locked into the cache along with how the round-robin pointer is affected.

Figure 6-3. Locked Line Effect on Round Robin Replacement

set 0: 8 ways locked, 24 ways available for round robin replacement
set 1: 23 ways locked, 9 ways available for round robin replacement
set 2: 28 ways locked, only ways 28-31 available for replacement
set 31: all 32 ways available for round robin replacement

[Figure: sets 0 through 31 drawn side by side, each listing way 0 through way 31; the ways marked Locked start at way 0 in each set (ways 0-7 in set 0, ways 0-22 in set 1, ways 0-27 in set 2, none in set 31).]

Software can lock down data located at different memory locations. This may cause some sets to have more locked lines than others as shown in Figure 6-3.
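Before locking a region, it can help to tally how many lines would land in each set and to check the 28-way limit. The sketch below assumes the set index is taken from bits 9:5 of the virtual address, as in Figure 6-1; the function names are hypothetical.

```python
LINE_BYTES = 32
NUM_SETS = 32
MAX_LOCKED_WAYS = 28   # ways 0 through 27 may be locked in each set
MAX_DATA_RAM_BYTES = MAX_LOCKED_WAYS * NUM_SETS * LINE_BYTES


def locked_lines_per_set(start_va, num_lines):
    """Count how many of num_lines consecutive lines fall into each set."""
    counts = {}
    for i in range(num_lines):
        va = start_va + i * LINE_BYTES
        set_index = (va >> 5) & 0x1F      # bits 9:5 of the virtual address
        counts[set_index] = counts.get(set_index, 0) + 1
    return counts


def fits_lock_limit(counts):
    """True if no set would need more than 28 locked ways."""
    return all(n <= MAX_LOCKED_WAYS for n in counts.values())
```

A contiguous 28-Kbyte region (896 lines) fills exactly 28 ways in every set; one more line would exceed the limit in one set, in which case, per the text above, the tag is allocated but not locked.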

Lines are unlocked in the data cache by performing an unlock operation. See Section 7.2.10, “Register 9: Cache Lock Down” on page 7-14 for more information about locking and unlocking the data cache.

Before locking, the programmer must ensure that no part of the target data range is already resident in the cache. The Intel® 80200 processor does not refetch such data, which results in it not being locked into the cache. If there is any doubt as to the location of the targeted memory data, the cache should be cleaned and invalidated to prevent this scenario. If the cache contains a locked region which the programmer wishes to lock again, then the cache must be unlocked before being cleaned and invalidated.

6.5 Write Buffer/Fill Buffer Operation and Control

See Section 1.2.2, “Terminology and Acronyms” on page 1-5 for a definition of coalescing.

The write buffer is always enabled, which means that stores to external memory are buffered. The K bit in the Auxiliary Control Register (CP15, register 1) is a global enable/disable for allowing coalescing in the write buffer. When this bit disables coalescing, no coalescing occurs regardless of the value of the page attributes. If this bit enables coalescing, the page attributes X, C, and B are examined to see if coalescing is enabled for each region of memory.

All reads and writes to external memory occur in program order when coalescing is disabled in the write buffer. If coalescing is enabled in the write buffer, writes may occur out of program order to external memory. Program correctness is maintained in this case by comparing all store requests with all the valid entries in the fill buffer.

The write buffer and fill buffer support a drain operation, such that before the next instruction executes, all Intel® 80200 processor data requests to external memory, including the write operations in the bus controller, have completed. See Table 7-12, “Cache Functions” on page 7-11 for the exact command.

Writes to a region marked non-cacheable/non-bufferable (page attributes C, B, and X all 0) cause execution to stall until the write completes.

If software is running in a privileged mode, it can explicitly drain all buffered writes. For details on this operation, see the description of Drain Write Buffer in Section 7.2.8, “Register 7: Cache Functions” on page 7-11.

Configuration

This chapter describes the System Control Coprocessor (CP15) and coprocessor 14 (CP14). CP15 configures the MMU, caches, buffers and other system attributes. Where possible, the definition of CP15 follows the definition in the first generation Intel® StrongARM* products. CP14 contains the performance monitor registers and the trace buffer registers.

7.1 Overview

CP15 is accessed through MRC and MCR coprocessor instructions, and access is allowed only in privileged mode. Any access to CP15 in user mode or with LDC or STC coprocessor instructions causes an undefined instruction exception.

CP14 registers can be accessed through MRC, MCR, LDC, and STC coprocessor instructions, and access is allowed only in privileged mode. Any access to CP14 in user mode causes an undefined instruction exception.

Coprocessors on the Intel® 80200 processor based on Intel® XScale™ microarchitecture (compliant with the ARM* Architecture V5TE) do not support access via CDP, MRRC, or MCRR instructions. An attempt to execute these instructions results in an Undefined Instruction exception.

Many of the MCR commands available in CP15 modify hardware state sometime after execution. A software sequence is available for those wishing to determine when this update occurs and can be found in Section 2.3.3, “Additions to CP15 Functionality” on page 2-11.

Like certain other ARM* architecture products, the Intel® 80200 processor includes an extra level of virtual address translation in the form of a PID (Process ID) register and associated logic. For a detailed description of this facility, see Section 7.2.13, “Register 13: Process ID” on page 7-16. Privileged code needs to be aware of this facility because, when interacting with CP15, some addresses are modified by the PID and others are not.

An address that has yet to be modified by the PID (“PIDified”) is known as a virtual address (VA). An address that has been through the PID logic, but not translated into a physical address, is a modified virtual address (MVA). Non-privileged code always deals with VAs, while privileged code that programs CP15 occasionally needs to use MVAs.

The format of MRC and MCR is shown in Table 7-1.

cp_num is defined for CP15, CP14, CP13 and CP0. CP13 contains the interrupt controller and bus controller registers and is described in Chapter 9, “Interrupts” and Chapter 11, “Bus Controller,” respectively. CP0 supports instructions specific for DSP and is described in Chapter 2, “Programming Model.” Access to all other coprocessors on the Intel® 80200 processor causes an undefined exception.

Unless otherwise noted, unused bits in coprocessor registers have unpredictable values when read. For compatibility with future implementations, software should not rely on the values in those bits.

Table 7-1. MRC/MCR Format

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

cond 1 1 1 0 opcode_1 n CRn Rd cp_num opcode_2 1 CRm

Bits Description Notes

31:28 cond - ARM* condition codes -

23:21 opcode_1 - Reserved. Should be programmed to zero for future compatibility.

20 n - Read or write coprocessor register: 0 = MCR, 1 = MRC

19:16 CRn - specifies which coprocessor register -

15:12 Rd - General Purpose Register, R0..R15 -

11:8 cp_num - coprocessor number: 0b1111 = CP15, 0b1110 = CP14, 0b1101 = CP13, 0b0000 = CP0

7:5 opcode_2 - Function bits. This field should be programmed to zero for future compatibility unless a value has been specified in the command.

3:0 CRm - Function bits. This field should be programmed to zero for future compatibility unless a value has been specified in the command.
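The field positions in Table 7-1 can be checked with a small host-side sketch. The Python below is illustrative only: the function name `encode_mrc_mcr`, the `CP_NUMS` dictionary, and the default condition code (0b1110, "always") are assumptions made for this example and are not defined by this manual.

```python
# Hypothetical helper: packs the fields of Table 7-1 into a 32-bit word.
CP_NUMS = {"CP15": 0b1111, "CP14": 0b1110, "CP13": 0b1101, "CP0": 0b0000}
COND_ALWAYS = 0b1110  # assumed default condition code


def encode_mrc_mcr(read, cp, crn, rd, crm=0, opcode_2=0, opcode_1=0,
                   cond=COND_ALWAYS):
    """Return the instruction word for MRC (read=True) or MCR (read=False)."""
    if cp not in CP_NUMS:
        raise ValueError("cp_num is only defined for CP15, CP14, CP13 and CP0")
    widths = {"cond": (cond, 4), "opcode_1": (opcode_1, 3), "CRn": (crn, 4),
              "Rd": (rd, 4), "opcode_2": (opcode_2, 3), "CRm": (crm, 4)}
    for name, (value, width) in widths.items():
        if not 0 <= value < (1 << width):
            raise ValueError(name + " does not fit in its field")
    word = cond << 28                  # bits 31:28
    word |= 0b1110 << 24               # bits 27:24, fixed pattern
    word |= opcode_1 << 21             # bits 23:21
    word |= (1 if read else 0) << 20   # bit 20: 0 = MCR, 1 = MRC
    word |= crn << 16                  # bits 19:16
    word |= rd << 12                   # bits 15:12
    word |= CP_NUMS[cp] << 8           # bits 11:8
    word |= opcode_2 << 5              # bits 7:5
    word |= 1 << 4                     # bit 4, fixed at 1
    word |= crm                        # bits 3:0
    return word


# MCR p15, 0, R0, c7, c10, 4 and MRC p15, 0, R0, c0, c0, 0
print(hex(encode_mrc_mcr(False, "CP15", crn=7, rd=0, crm=10, opcode_2=4)))
print(hex(encode_mrc_mcr(True, "CP15", crn=0, rd=0)))
```

Leaving `opcode_1`, `opcode_2` and `CRm` at their zero defaults matches the notes in the table for commands that do not specify a value.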


The format of LDC and STC is shown in Table 7-2. LDC and STC follow the programming notes in the ARM Architecture Reference Manual.

LDC and STC transfer a single 32-bit word between a coprocessor register and memory. These instructions do not allow the programmer to specify values for opcode_1, opcode_2, or CRm; those fields implicitly contain zero.

Table 7-2. LDC/STC Format

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

cond 1 1 0 P U N W L Rn CRd cp_num 8_bit_word_offset

Bits Description Notes

31:28 cond - ARM* condition codes -

24:23,21 P, U, W - specifies 1 of 3 addressing modes identified by addressing mode 5 in the ARM Architecture Reference Manual -

22 N - should be 0 for Intel® 80200 processors. Setting this bit to 1 has an undefined effect.

20 L - Load or Store: 0 = STC, 1 = LDC

19:16 Rn - specifies the base register -

15:12 CRd - specifies the coprocessor register -

11:8 cp_num - coprocessor number: 0b1111 = Undefined Exception, 0b1110 = CP14, 0b1101 = CP13

7:0 8-bit word offset -
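As with MRC/MCR, the LDC/STC layout in Table 7-2 can be sketched on a host machine. The function name `encode_ldc_stc` and its defaults are hypothetical. The default P=1, U=1, W=0 is assumed here to select the plain immediate-offset form of addressing mode 5; consult the ARM Architecture Reference Manual before relying on that reading.

```python
# Coprocessors reachable through LDC/STC according to Table 7-2.
LDC_STC_CP_NUMS = {"CP14": 0b1110, "CP13": 0b1101}


def encode_ldc_stc(load, cp, crd, rn, word_offset=0, p=1, u=1, w=0,
                   cond=0b1110):
    """Return the instruction word for LDC (load=True) or STC (load=False)."""
    if cp not in LDC_STC_CP_NUMS:
        # cp_num 0b1111 (CP15) is listed as an Undefined Exception.
        raise ValueError("LDC/STC is only defined for CP14 and CP13")
    widths = {"cond": (cond, 4), "CRd": (crd, 4), "Rn": (rn, 4),
              "offset": (word_offset, 8), "P": (p, 1), "U": (u, 1), "W": (w, 1)}
    for name, (value, width) in widths.items():
        if not 0 <= value < (1 << width):
            raise ValueError(name + " does not fit in its field")
    word = cond << 28                  # bits 31:28
    word |= 0b110 << 25                # bits 27:25, fixed pattern
    word |= p << 24                    # bit 24
    word |= u << 23                    # bit 23
    # bit 22 (N) is left at 0, as the table requires
    word |= w << 21                    # bit 21
    word |= (1 if load else 0) << 20   # bit 20: 0 = STC, 1 = LDC
    word |= rn << 16                   # bits 19:16
    word |= crd << 12                  # bits 15:12
    word |= LDC_STC_CP_NUMS[cp] << 8   # bits 11:8
    word |= word_offset                # bits 7:0
    return word


print(hex(encode_ldc_stc(True, "CP14", crd=8, rn=0)))
```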


7.2 CP15 Registers

Table 7-3 lists the CP15 registers implemented in the Intel® 80200 processor.

Table 7-3. CP15 Registers

Register (CRn) opcode_2 Access Description

0 0 Read / Write-Ignored ID

0 1 Read / Write-Ignored Cache Type

1 0 Read / Write Control

1 1 Read / Write Auxiliary Control

2 0 Read / Write Translation Table Base

3 0 Read / Write Domain Access Control

4 - Unpredictable Reserved

5 0 Read / Write Fault Status

6 0 Read / Write Fault Address

7 0 Read-unpredictable / Write Cache Operations

8 0 Read-unpredictable / Write TLB Operations

9 0 Read-unpredictable / Write Cache Lock Down

10 0 Read / Write TLB Lock Down

11 - 12 - Unpredictable Reserved

13 0 Read / Write Process ID (PID)

14 0 Read / Write Breakpoint Registers

15 0 Read / Write (CRm = 1) CP Access


7.2.1 Register 0: ID and Cache Type Registers

Register 0 houses two read-only registers that are used for part identification: an ID register and a cache type register.

The ID Register is selected when opcode_2=0. This register returns the code for the Intel® 80200 processor: 0x69052000 for A0 stepping/revision. The low order four bits of the register are the chip revision number and will be incremented for future steppings.

Table 7-4. ID Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0110 1001 0000 0101 0010 0000 0000 Revision

reset value: As Shown

Bits Access Description

31:24 Read / Write Ignored Implementation trademark (0x69 = ‘i’ = Intel Corporation)

23:16 Read / Write Ignored Architecture version = ARM* Version 5

15:4 Read / Write Ignored Part Number (Implementation Specified). Intel® 80200 processor: 0x200. Bits[15:12] refer to the processor generation. Bits[11:8] refer to the implementation. Bits[7:4] are used for implementation derivatives.

3:0 Read / Write Ignored Revision number for the processor (Implementation Specified): A0 stepping = 0b0000, A1 stepping = 0b0001, B0 stepping = 0b0010, C0 stepping = 0b0011, D0 stepping = 0b0100

The Cache Type Register is selected when opcode_2=1 and describes the caches present in the Intel® 80200 processor.

Table 7-5. Cache Type Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

000 0101 1 000 110 101 0 10 000 110 101 0 10

reset value: As Shown

Bits Access Description

31:29 Read-as-Zero / Write Ignored Reserved

28:25 Read / Write Ignored Cache class = 0b0101. The caches support locking, write back and round-robin replacement. They do not support address by index.

24 Read / Write Ignored Harvard Cache

23:21 Read-as-Zero / Write Ignored Reserved

20:18 Read / Write Ignored Data cache size = 0b110 = 32 kB

17:15 Read / Write Ignored Data cache associativity = 0b101 = 32-way

14 Read-as-Zero / Write Ignored Reserved

13:12 Read / Write Ignored Data cache line length = 0b10 = 8 words/line

11:9 Read-as-Zero / Write Ignored Reserved

8:6 Read / Write Ignored Instruction cache size = 0b110 = 32 kB

5:3 Read / Write Ignored Instruction cache associativity = 0b101 = 32-way

2 Read-as-Zero / Write Ignored Reserved

1:0 Read / Write Ignored Instruction cache line length = 0b10 = 8 words/line
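The two Register 0 layouts can be decoded on a host as a quick sanity check. In the sketch below every name (`bits`, `decode_id`, `decode_cache_type`) is hypothetical. The shift formulas for size, associativity and line length are an assumption that happens to reproduce the single encodings listed in Table 7-5 (0b110 = 32 kB, 0b101 = 32, 0b10 = 8 words); they are not confirmed by this manual for any other field value.

```python
STEPPINGS = {0b0000: "A0", 0b0001: "A1", 0b0010: "B0", 0b0011: "C0", 0b0100: "D0"}


def bits(value, hi, lo):
    """Extract bits hi:lo (inclusive) from a 32-bit register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_id(reg):
    """Split the ID register into the fields of Table 7-4."""
    return {
        "implementer": bits(reg, 31, 24),
        "architecture": bits(reg, 23, 16),
        "part_number": bits(reg, 15, 4),
        "stepping": STEPPINGS.get(bits(reg, 3, 0), "unknown"),
    }


def _geometry(size_field, assoc_field, length_field):
    # Assumed generalization of the encodings shown in Table 7-5.
    return {
        "size_bytes": 512 << size_field,
        "ways": 1 << assoc_field,
        "words_per_line": 2 << length_field,
    }


def decode_cache_type(reg):
    """Split the Cache Type register into the fields of Table 7-5."""
    return {
        "cache_class": bits(reg, 28, 25),
        "harvard": bool(bits(reg, 24, 24)),
        "dcache": _geometry(bits(reg, 20, 18), bits(reg, 17, 15), bits(reg, 13, 12)),
        "icache": _geometry(bits(reg, 8, 6), bits(reg, 5, 3), bits(reg, 1, 0)),
    }


print(decode_id(0x69052000))
print(decode_cache_type(0x0B1AA1AA))
```

The value 0x0B1AA1AA is simply the reset bit pattern of Table 7-5 written in hexadecimal.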


7.2.2 Register 1: Control and Auxiliary Control Registers

Register 1 is made up of two registers, one that is compliant with ARM Version 5 and is referenced by opcode_2 = 0x0, and the other which is specific to Intel® StrongARM* and is referenced by opcode_2 = 0x1.

The Exception Vector Relocation bit (bit 13 of the ARM control register) allows the vectors to be mapped into high memory rather than their default location at address 0. This bit is readable and writable by software. If the MMU is enabled, the exception vectors are accessed via the usual translation method involving the PID register (see Section 7.2.13, “Register 13: Process ID” on page 7-16) and the TLBs. To avoid automatic application of the PID to exception vector accesses, software may relocate the exceptions to high memory.

Table 7-6. ARM* Control Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Reserved V I Z 0 R S B 1 1 1 1 C A M

reset value: writeable bits set to 0

Bits Access Description

31:14 Read-Unpredictable / Write-as-Zero Reserved

13 Read / Write Exception Vector Relocation (V): 0 = Base address of exception vectors is 0x0000,0000; 1 = Base address of exception vectors is 0xFFFF,0000

12 Read / Write Instruction Cache Enable/Disable (I): 0 = Disabled, 1 = Enabled

11 Read / Write Branch Target Buffer Enable (Z): 0 = Disabled, 1 = Enabled

10 Read-as-Zero / Write-as-Zero Reserved

9 Read / Write ROM Protection (R). This selects the access checks performed by the memory management unit. See the ARM Architecture Reference Manual for more information.

8 Read / Write System Protection (S). This selects the access checks performed by the memory management unit. See the ARM Architecture Reference Manual for more information.

7 Read / Write Big/Little Endian (B): 0 = Little-endian operation, 1 = Big-endian operation

6:3 Read-as-One / Write-as-One = 0b1111

2 Read / Write Data cache enable/disable (C): 0 = Disabled, 1 = Enabled

1 Read / Write Alignment fault enable/disable (A): 0 = Disabled, 1 = Enabled

0 Read / Write Memory management unit enable/disable (M): 0 = Disabled, 1 = Enabled
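When composing a value for the ARM control register, the write-as-one field (bits 6:3) and the reserved write-as-zero bits of Table 7-6 are easy to overlook. The following host-side sketch shows one way to build and inspect such a value; `control_value` and `enabled_flags` are hypothetical names chosen for this example, not part of any Intel or ARM interface.

```python
# Bit positions of the named flags in Table 7-6.
CONTROL_BITS = {"V": 13, "I": 12, "Z": 11, "R": 9, "S": 8, "B": 7,
                "C": 2, "A": 1, "M": 0}
WRITE_AS_ONE = 0b1111 << 3  # bits 6:3 must be written as ones


def control_value(*flags):
    """Return a control register value with the named flags set.

    Reserved bits (31:14 and 10) are left at zero.
    """
    value = WRITE_AS_ONE
    for name in flags:
        value |= 1 << CONTROL_BITS[name]
    return value


def enabled_flags(value):
    """List the flag letters that are set in a control register value."""
    return sorted(name for name, bit in CONTROL_BITS.items()
                  if (value >> bit) & 1)


# MMU, data cache and instruction cache enabled.
print(hex(control_value("M", "C", "I")))
```

On the target, the resulting value would be written with MCR p15, 0, Rd, c1, c0, 0 (register 1, opcode_2 = 0).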


The mini-data cache attribute bits (MD), in the Auxiliary Control Register, are used to control the allocation policy for the mini-data cache and whether it uses write-back caching or write-through caching.

The configuration of the mini-data cache should be set up before any data access is made that may be cached in the mini-data cache. Once data is cached, software must ensure that the mini-data cache has been cleaned and invalidated before the mini-data cache attributes can be changed.

Table 7-7. Auxiliary Control Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Reserved MD Reserved P K

reset value: writeable bits set to 0

Bits Access Description

31:6 Read-Unpredictable / Write-as-Zero Reserved

5:4 Read / Write Mini Data Cache Attributes (MD). All configurations of the mini-data cache are cacheable, stores are buffered in the write buffer and stores are coalesced in the write buffer as long as coalescing is globally enabled (bit 0 of this register). 0b00 = Write back, Read allocate; 0b01 = Write back, Read/Write allocate; 0b10 = Write through, Read allocate; 0b11 = Unpredictable

3:2 Read-Unpredictable / Write-as-Zero Reserved

1 Read / Write Page Table Memory Attribute (P). If set, page table accesses are protected by ECC. See Chapter 11, “Bus Controller” for more information.

0 Read / Write Write Buffer Coalescing Disable (K). This bit globally disables the coalescing of all stores in the write buffer no matter what the value of the Cacheable and Bufferable bits are in the page table descriptors. 0 = Enabled, 1 = Disabled


7.2.3 Register 2: Translation Table Base Register

Table 7-8. Translation Table Base Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Translation Table Base Reserved

reset value: unpredictable

Bits Access Description

31:14 Read / Write Translation Table Base - Physical address of the base of the first-level table

13:0 Read-unpredictable / Write-as-Zero Reserved
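Because bits 13:0 of the Translation Table Base Register are reserved and written as zero, the first-level table base must sit on a 16 KB boundary. A minimal host-side check, with the hypothetical names `ttb_register_value` and `align_up_for_ttb`, could look like this:

```python
TTB_RESERVED_MASK = (1 << 14) - 1  # bits 13:0 are reserved, write as zero


def ttb_register_value(table_phys_addr):
    """Validate a first-level table address and return the register value."""
    if not 0 <= table_phys_addr < (1 << 32):
        raise ValueError("address must fit in 32 bits")
    if table_phys_addr & TTB_RESERVED_MASK:
        raise ValueError("first-level table must be aligned to 16 KB")
    return table_phys_addr


def align_up_for_ttb(addr):
    """Round an address up to the next 16 KB boundary."""
    return (addr + TTB_RESERVED_MASK) & ~TTB_RESERVED_MASK


print(hex(ttb_register_value(align_up_for_ttb(0xA0004001))))
```

The example addresses are arbitrary and chosen only to exercise the alignment rule.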

7.2.4 Register 3: Domain Access Control Register

Table 7-9. Domain Access Control Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

D15 D14 D13 D12 D11 D10 D9 D8 D7 D6 D5 D4 D3 D2 D1 D0

reset value: unpredictable

Bits Access Description

31:0 Read / Write Access permissions for all 16 domains - The meaning of each field can be found in the ARM Architecture Reference Manual

7.2.5 Register 4: Reserved


7.2.6 Register 5: Fault Status Register

The Fault Status Register (FSR) indicates which fault has occurred, which could be either a prefetch abort or a data abort. Bit 10 extends the encoding of the status field for prefetch aborts and data aborts. The definition of the extended status field is found in Section 2.3.4, “Event Architecture” on page 2-12. Bit 9 indicates that a debug event occurred; the exact source of the event is found in the debug control and status register (CP14, register 10). When bit 9 is set, the domain and extended status field are undefined.

Upon entry into the prefetch abort or data abort handler, hardware updates this register with the source of the exception. Software is not required to clear these fields.

Table 7-10. Fault Status Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Reserved X D 0 Domain Status

reset value: unpredictable

Bits Access Description

31:11 Read-unpredictable / Write-as-Zero Reserved

10 Read / Write Status Field Extension (X). This bit is used to extend the encoding of the Status field, when there is a prefetch abort and when there is a data abort. The definition of this field can be found in Section 2.3.4, “Event Architecture” on page 2-12.

9 Read / Write Debug Event (D). This flag indicates a debug event has occurred and that the cause of the debug event is found in the MOE field of the debug control register (CP14, register 10).

8 Read-as-zero / Write-as-Zero = 0

7:4 Read / Write Domain - Specifies which of the 16 domains was being accessed when a data abort occurred

3:0 Read / Write Status - Type of data access being attempted
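An abort handler typically starts by splitting the FSR into its fields. The host-side sketch below follows Table 7-10; the name `decode_fsr` is hypothetical, and it reads "extended status field" as the 5-bit value formed by X (bit 10) above Status (bits 3:0), which is an interpretation rather than something the table states outright.

```python
def decode_fsr(fsr):
    """Split a Fault Status Register value into the fields of Table 7-10.

    When the debug event flag (bit 9) is set, the domain and the extended
    status are undefined, so they are reported as None.
    """
    debug_event = bool((fsr >> 9) & 1)
    if debug_event:
        return {"debug_event": True, "domain": None, "status": None}
    extension = (fsr >> 10) & 1        # X, bit 10
    status = fsr & 0xF                 # Status, bits 3:0
    return {
        "debug_event": False,
        "domain": (fsr >> 4) & 0xF,    # Domain, bits 7:4
        "status": (extension << 4) | status,
    }


print(decode_fsr(0x436))
print(decode_fsr(0x200))
```

The meaning of each status encoding is defined in Section 2.3.4 and is not reproduced here.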

7.2.7 Register 6: Fault Address Register

Table 7-11. Fault Address Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Fault Virtual Address

reset value: unpredictable

Bits Access Description

31:0 Read / Write Fault Virtual Address - Contains the MVA of the data access that caused the memory abort


7.2.8 Register 7: Cache Functions

All the functions defined in the first generation of Intel® StrongARM* appear here. The Intel® 80200 processor adds other functions as well. This register should be accessed as write-only. Reads from this register, as with an MRC, have an undefined effect.

The Drain Write Buffer function not only drains the write buffer but also drains the fill buffer.

The Intel® 80200 processor does not check permissions on addresses supplied for cache or TLB functions. Because only privileged software may execute these functions, full accessibility is assumed. Cache functions do not generate any of the following:

• translation faults

• domain faults

• permission faults

The invalidate instruction cache line command does not invalidate the BTB. If software invalidates a line from the instruction cache and modifies the same location in external memory, it needs to invalidate the BTB also. Not invalidating the BTB in this case may cause unpredictable results.

Disabling/enabling a cache has no effect on contents of the cache: valid data stays valid, locked items remain locked. All operations defined in Table 7-12 work regardless of whether the cache is enabled or disabled.

Since the Clean D Cache Line function reads from the data cache, it is capable of generating a parity fault. The other operations do not generate parity faults.

Table 7-12. Cache Functions

Function opcode_2 CRm Data Instruction

Invalidate I&D cache & BTB 0b000 0b0111 Ignored MCR p15, 0, Rd, c7, c7, 0

Invalidate I cache & BTB 0b000 0b0101 Ignored MCR p15, 0, Rd, c7, c5, 0

Invalidate I cache line 0b001 0b0101 MVA MCR p15, 0, Rd, c7, c5, 1

Invalidate D cache 0b000 0b0110 Ignored MCR p15, 0, Rd, c7, c6, 0

Invalidate D cache line 0b001 0b0110 MVA MCR p15, 0, Rd, c7, c6, 1

Clean D cache line 0b001 0b1010 MVA MCR p15, 0, Rd, c7, c10, 1

Drain Write (& Fill) Buffer 0b100 0b1010 Ignored MCR p15, 0, Rd, c7, c10, 4

Invalidate Branch Target Buffer 0b110 0b0101 Ignored MCR p15, 0, Rd, c7, c5, 6

Allocate Line in the Data Cache 0b101 0b0010 MVA MCR p15, 0, Rd, c7, c2, 5

The line-allocate command allocates a tag into the data cache specified by bits [31:5] of Rd. If a valid dirty line (with a different MVA) already exists at this location, it will be evicted. The 32 bytes of data associated with the newly allocated line are not initialized and therefore generate unpredictable results if read.

This command may be used for cleaning the entire data cache on a context switch and also when re-configuring portions of the data cache as data RAM. In both cases, Rd is a virtual address that maps to some non-existent physical memory. When creating data RAM, software must initialize the data RAM before read accesses can occur. Specific uses of these commands can be found in Chapter 6, “Data Cache”.


Other items to note about the line-allocate command are:

• It forces all pending memory operations to complete.

• Bits [31:5] of Rd are used to specify the virtual address of the line to be allocated into the data cache.

• If the targeted cache line is already resident, this command has no effect.

• This command cannot be used to allocate a line in the mini Data Cache.

• The newly allocated line is not marked as “dirty” so it never gets evicted. However, if a valid store is made to that line it is marked as “dirty” and gets written back to external memory if another line is allocated to the same cache location. This eviction produces unpredictable results if the line-allocate command used a virtual address that mapped to non-existent memory.

To avoid this situation, the line-allocate operation should only be used if one of the following can be guaranteed:

— The virtual address associated with this command is not one that is generated during normal program execution. This is the case when line-allocate is used to clean/invalidate the entire cache.

— The line-allocate operation is used only on a cache region destined to be locked. When the region is unlocked, it must be invalidated before making another data access.
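Since bits [31:5] of Rd select the line, the line-oriented commands of Table 7-12 operate on 32-byte granules. Software that cleans, invalidates or allocates lines over a buffer therefore issues one command per 32-byte line touched by that buffer. The host-side sketch below enumerates those line addresses; `cache_line_mvas` is a hypothetical name, and the inputs are assumed to be MVAs already.

```python
LINE_BYTES = 32                                # one cache line
LINE_MASK = ~(LINE_BYTES - 1) & 0xFFFFFFFF     # keeps bits 31:5


def cache_line_mvas(start, length):
    """Return the line-aligned addresses covering [start, start + length)."""
    if length <= 0:
        return []
    first = start & LINE_MASK
    last = (start + length - 1) & LINE_MASK
    return list(range(first, last + 1, LINE_BYTES))


# A 0x30-byte buffer starting at 0x1010 touches two lines.
print([hex(a) for a in cache_line_mvas(0x1010, 0x30)])
```

Each returned address would be placed in Rd for a single line command, for example Clean D cache line (MCR p15, 0, Rd, c7, c10, 1).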


7.2.9 Register 8: TLB Operations

Disabling/enabling the MMU has no effect on the contents of either TLB: valid entries stay valid, locked items remain locked. All operations defined in Table 7-13 work regardless of whether the TLB is enabled or disabled.

This register should be accessed as write-only. Reads from this register, as with an MRC, have an undefined effect.

Table 7-13. TLB Functions

Function opcode_2 CRm Data Instruction

Invalidate I&D TLB 0b000 0b0111 Ignored MCR p15, 0, Rd, c8, c7, 0

Invalidate I TLB 0b000 0b0101 Ignored MCR p15, 0, Rd, c8, c5, 0

Invalidate I TLB entry 0b001 0b0101 MVA MCR p15, 0, Rd, c8, c5, 1

Invalidate D TLB 0b000 0b0110 Ignored MCR p15, 0, Rd, c8, c6, 0

Invalidate D TLB entry 0b001 0b0110 MVA MCR p15, 0, Rd, c8, c6, 1


7.2.10 Register 9: Cache Lock Down

Register 9 is used for locking down entries into the instruction cache and data cache. (The protocol for locking down entries can be found in Chapter 6, “Data Cache”.)

Table 7-14 shows the command for locking down entries in the instruction cache, instruction TLB, and data TLB. The entry to lock is specified by the virtual address in Rd. The data cache locking mechanism follows a different procedure than the others. The data cache is placed in lock down mode such that all subsequent fills to the data cache result in that line being locked in, as controlled by Table 7-15.

Lock/unlock operations on a disabled cache have an undefined effect. This register should be accessed as write-only. Reads from this register, as with an MRC, have an undefined effect.

Table 7-14. Cache Lockdown Functions

Function opcode_2 CRm Data Instruction

Fetch and Lock I cache line 0b000 0b0001 MVA MCR p15, 0, Rd, c9, c1, 0

Unlock Instruction cache 0b001 0b0001 Ignored MCR p15, 0, Rd, c9, c1, 1

Read data cache lock register 0b000 0b0010 Read lock mode value MRC p15, 0, Rd, c9, c2, 0

Write data cache lock register 0b000 0b0010 Set/Clear lock mode MCR p15, 0, Rd, c9, c2, 0

Unlock Data Cache 0b001 0b0010 Ignored MCR p15, 0, Rd, c9, c2, 1

Table 7-15. Data Cache Lock Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

reset value: writeable bits set to 0

Bits Access Description

31:1 Read-unpredictable / Write-as-Zero Reserved

0 Read-unpredictable / Write Data Cache Lock Mode (L): 0 = No locking occurs; 1 = Any fill into the data cache while this bit is set gets locked in


7.2.11 Register 10: TLB Lock Down

Register 10 is used for locking down entries into the instruction TLB, and data TLB. (The protocol for locking down entries can be found in Chapter 3, “Memory Management”.) Lock/unlock operations on a TLB when the MMU is disabled have an undefined effect.

This register should be accessed as write-only. Reads from this register, as with an MRC, have an undefined effect.

Table 7-16 shows the commands for locking down entries in the instruction TLB and data TLB. The entry to lock is specified by the virtual address in Rd.

Table 7-16. TLB Lockdown Functions

Function opcode_2 CRm Data Instruction

Translate and Lock I TLB entry 0b000 0b0100 MVA MCR p15, 0, Rd, c10, c4, 0

Translate and Lock D TLB entry 0b000 0b1000 MVA MCR p15, 0, Rd, c10, c8, 0

Unlock I TLB 0b001 0b0100 Ignored MCR p15, 0, Rd, c10, c4, 1

Unlock D TLB 0b001 0b1000 Ignored MCR p15, 0, Rd, c10, c8, 1


7.2.12 Register 11-12: Reserved

These registers are reserved. Reading and writing them yields unpredictable results.


7.2.13 Register 13: Process ID

The Intel® 80200 processor supports the remapping of virtual addresses through a Process ID (PID) register. This remapping occurs before the instruction cache, instruction TLB, data cache and data TLB are accessed. The PID register controls when virtual addresses are remapped and to what value.

The PID register is a 7-bit value that is ORed with bits 31:25 of the virtual address when they are zero. This effectively remaps the address to one of 128 “slots” in the 4 Gbytes of address space. If bits 31:25 are not zero, no remapping occurs. This feature is useful for operating system management of processes that may map to the same virtual address space. In those cases, the virtually mapped caches on the Intel® 80200 processor would not require invalidating on a process switch.

Table 7-17. Accessing Process ID

Function opcode_2 CRm Instruction

Read Process ID Register 0b000 0b0000 MRC p15, 0, Rd, c13, c0, 0

Write Process ID Register 0b000 0b0000 MCR p15, 0, Rd, c13, c0, 0

Table 7-18. Process ID Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Process ID Reserved

reset value: 0x0000,0000

Bits Access Description

31:25 Read / Write Process ID - This field is used for remapping the virtual address when bits 31-25 of the virtual address are zero.

24:0 Read-as-Zero / Write-as-Zero Reserved - Should be programmed to zero for future compatibility

7.2.13.1 The PID Register Effect On Addresses

All addresses generated and used by User Mode code are eligible for being “PIDified” as described in the previous section. Privileged code, however, must be aware of certain special cases in which address generation does not follow the usual flow.

The PID register is not used to remap the virtual address when accessing the Branch Target Buffer (BTB). Any writes to the PID register invalidate the BTB, which prevents any virtual addresses from being double mapped between two processes.

A breakpoint address (see Section 7.2.14, “Register 14: Breakpoint Registers” on page 7-17) must be expressed as an MVA when written to the breakpoint register. This means the value of the PID must be combined appropriately with the address before it is written to the breakpoint register. All virtual addresses in translation descriptors (see Chapter 3, “Memory Management”) are MVAs.
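Privileged code that must supply an MVA, for example when programming a breakpoint register, can apply the remapping rule from Section 7.2.13 itself. The sketch below models that rule on a host; `to_mva` is a hypothetical name, and `pid_register` is assumed to hold the full register value with the Process ID in bits 31:25.

```python
PID_SHIFT = 25
PID_FIELD_MASK = 0x7F << PID_SHIFT   # bits 31:25


def to_mva(va, pid_register):
    """Apply the PID remapping rule to a 32-bit virtual address."""
    if pid_register & ~PID_FIELD_MASK & 0xFFFFFFFF:
        raise ValueError("bits 24:0 of the PID register must be zero")
    if va & PID_FIELD_MASK:
        return va                     # bits 31:25 non-zero: no remapping
    return va | pid_register          # bits 31:25 zero: OR in the PID


pid = 3 << PID_SHIFT
print(hex(to_mva(0x00008000, pid)))   # remapped into slot 3
print(hex(to_mva(0x40000000, pid)))   # left unchanged
```

Each of the 128 slots spans 32 Mbytes, which is why only addresses below 0x02000000 are ever remapped.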


7.2.14 Register 14: Breakpoint Registers

The Intel® 80200 processor contains two instruction breakpoint address registers (IBCR0 and IBCR1), one data breakpoint address register (DBR0), one configurable data mask/address register (DBR1), and one data breakpoint control register (DBCON). The Intel® 80200 processor also supports a 256-entry trace buffer that records program execution information. The registers to control the trace buffer are located in CP14.

Refer to Chapter 13, “Software Debug” for more information on these features of the Intel® 80200 processor.

Table 7-19. Accessing the Debug Registers

Function opcode_2 CRm Instruction

Access Instruction Breakpoint Control Register 0 (IBCR0) 0b000 0b1000 MRC p15, 0, Rd, c14, c8, 0 (read); MCR p15, 0, Rd, c14, c8, 0 (write)

Access Instruction Breakpoint Control Register 1 (IBCR1) 0b000 0b1001 MRC p15, 0, Rd, c14, c9, 0 (read); MCR p15, 0, Rd, c14, c9, 0 (write)

Access Data Breakpoint Address Register (DBR0) 0b000 0b0000 MRC p15, 0, Rd, c14, c0, 0 (read); MCR p15, 0, Rd, c14, c0, 0 (write)

Access Data Mask/Address Register (DBR1) 0b000 0b0011 MRC p15, 0, Rd, c14, c3, 0 (read); MCR p15, 0, Rd, c14, c3, 0 (write)

Access Data Breakpoint Control Register (DBCON) 0b000 0b0100 MRC p15, 0, Rd, c14, c4, 0 (read); MCR p15, 0, Rd, c14, c4, 0 (write)


7.2.15 Register 15: Coprocessor Access Register

This register is selected when opcode_2 = 0 and CRm = 1.

This register controls access rights to all the coprocessors in the system except for CP15 and CP14. Both CP15 and CP14 can only be accessed in privileged mode. This register is accessed with an MCR or MRC with the CRm field set to 1.

This register controls access to CP0 and CP13 for the Intel® 80200 processor. A typical use for this register is for an operating system to control resource sharing among applications. Initially, all applications are denied access to shared resources by clearing the appropriate coprocessor bit in the Coprocessor Access Register. An application may request the use of a shared resource (e.g., the accumulator in CP0) by issuing an access to the resource, which results in an undefined exception. The operating system may grant access to this coprocessor by setting the appropriate bit in the Coprocessor Access Register and returning to the application, where the access is retried.

Sharing resources among different applications requires a state saving mechanism. Two possibilities are:

• The operating system, during a context switch, could save the state of the coprocessor if the last executing process had access rights to the coprocessor.

• The operating system, during a request for access, saves off the old coprocessor state and saves it with the last process to have access to it.

Under both scenarios, the OS needs to restore state when a request for access is made. This means the OS has to maintain a list of what processes are modifying CP0 and their associated state.

Example 7-1. Disallowing access to CP0

;; The following code clears bit 0 of the CPAR.
;; This will cause the processor to fault if software
;; attempts to access CP0.

LDR R0, =0x0000 ; bit 0 is clear
MCR P15, 0, R0, C15, C1, 0 ; move to CPAR
CPWAIT ; wait for effect
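The second state-saving possibility listed above (save the old coprocessor state only when another process requests access) can be modeled on a host to make the bookkeeping concrete. Everything in this sketch is hypothetical: the class `Cp0Arbiter`, its method names, and the single integer standing in for the CP0 accumulator state. It is a toy model of the policy, not operating system code.

```python
CPAR_CP0 = 1 << 0  # bit 0 of the Coprocessor Access Register


class Cp0Arbiter:
    """Toy model of an OS that grants CP0 on demand."""

    def __init__(self):
        self.cpar = 0          # all access denied initially
        self.owner = None      # process whose state is live in CP0
        self.live_state = 0    # stand-in for the CP0 contents
        self.saved = {}        # saved CP0 state per process

    def context_switch(self, next_pid):
        # Keep access enabled only if the incoming process already owns CP0.
        self.cpar = CPAR_CP0 if next_pid == self.owner else 0

    def on_undefined_exception(self, pid):
        # The faulting process asked for CP0 while its access bit was clear.
        if self.owner != pid:
            if self.owner is not None:
                self.saved[self.owner] = self.live_state   # save old state
            self.live_state = self.saved.get(pid, 0)       # restore new state
            self.owner = pid
        self.cpar |= CPAR_CP0                               # grant access

    def access_cp0(self, pid, write=None):
        if not (self.cpar & CPAR_CP0):
            self.on_undefined_exception(pid)   # fault, grant, then retry
        if write is not None:
            self.live_state = write
        return self.live_state


arb = Cp0Arbiter()
arb.context_switch("A")
arb.access_cp0("A", write=11)
arb.context_switch("B")
arb.access_cp0("B", write=22)
arb.context_switch("A")
print(arb.access_cp0("A"))   # A sees its own saved value again
```

The design choice illustrated here is that a process which never touches CP0 never pays for a save or restore; the cost is one undefined exception on the first access after each switch.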


Table 7-20. Coprocessor Access Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Reserved 0 0 Coprocessor Access Rights (bit 13 = CP13, bit 0 = CP0)

reset value: 0x0000,0000

Bits Access Description

31:16 Read-unpredictable / Write-as-Zero Reserved - Program to zero for future compatibility

15:14 Read-as-Zero/Write-as-Zero Reserved - Program to zero for future compatibility

13:0 Read / Write Coprocessor Access Rights - Each bit in this field corresponds to the access rights for each coprocessor. For each bit: 0 = Access denied. Attempts to access the corresponding coprocessor generate an undefined exception. 1 = Access allowed. Includes read and write accesses. Setting any of bits 12:1 has an undefined effect.


7.3 CP14 Registers

Table 7-21 lists the CP14 registers implemented in the Intel® 80200 processor.

Table 7-21. CP14 Registers

Register (CRn) Access Description

0-3 Read / Write Performance Monitoring Registers

4-5 Unpredictable Reserved

6-7 Read / Write Clock and Power Management

8-15 Read / Write Software Debug

7.3.1 Registers 0-3: Performance Monitoring

The performance monitoring unit contains a control register (PMNC), a clock counter (CCNT), and two event counters (PMN0 and PMN1). The format of these registers can be found in Chapter 12, “Performance Monitoring”, along with a description of how to use the performance monitoring facility.

Opcode_2 and CRm should be zero.

Table 7-22. Accessing the Performance Monitoring Registers

Function CRn (Register #) Instruction

Read PMNC 0b0000 MRC p14, 0, Rd, c0, c0, 0

Write PMNC 0b0000 MCR p14, 0, Rd, c0, c0, 0

Read CCNT 0b0001 MRC p14, 0, Rd, c1, c0, 0

Write CCNT 0b0001 MCR p14, 0, Rd, c1, c0, 0

Read PMN0 0b0010 MRC p14, 0, Rd, c2, c0, 0

Write PMN0 0b0010 MCR p14, 0, Rd, c2, c0, 0

Read PMN1 0b0011 MRC p14, 0, Rd, c3, c0, 0

Write PMN1 0b0011 MCR p14, 0, Rd, c3, c0, 0

7.3.2 Register 4-5: Reserved

These registers are reserved. Reading and writing them yields unpredictable results.


7.3.3 Registers 6-7: Clock and Power Management

These registers contain functions for managing the core clock and power.

Three low power modes are supported that are entered upon executing the functions listed in Table 7-24. To enter any of these modes, write the appropriate data to CP14, register 7 (PWRMODE). Software may read this register, but since software only runs during ACTIVE mode, it always reads zeroes from the M field.

Table 7-23. PWRMODE Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

reset value: writeable bits set to 0

Bits Access Description

31:2 Read-unpredictable / Write-as-Zero Reserved

1:0 Read / Write Mode (M): 0 = ACTIVE, 1 = IDLE, 2 = RESERVED, 3 = SLEEP

Software can change the core clock frequency by writing to CP14 register 6, CCLKCFG. This function waits for all memory requests initiated by the Intel® 80200 processor to complete and informs the PLL to change the core clock frequency. This function completes when the PLL is re-locked. Software can read CCLKCFG to determine the current operating frequency.

Table 7-24. Clock and Power Management

Function Data Instruction

Go to IDLE 1 MCR p14, 0, Rd, c7, c0, 0

Go to SLEEP 3 MCR p14, 0, Rd, c7, c0, 0

Read CCLKCFG ignored MRC p14, 0, Rd, c6, c0, 0

Write CCLKCFG CCLKCFG value MCR p14, 0, Rd, c6, c0, 0

Table 7-25. CCLKCFG Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

reset value: unpredictable

Bits Access Description

31:4 Read-unpredictable / Write-as-Zero Reserved

3:0 Read / Write Core Clock Configuration (CCLKCFG). This field is used to configure the core clock frequency. The value in this field is multiplied by REFCLK to obtain core clock. See Table 8-2.
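The relationship between CCLKCFG, REFCLK and the core clock, together with the PWRMODE encodings of Table 7-23, can be captured in a few lines. This is a host-side sketch with hypothetical names (`core_clock_hz`, `pwrmode_value`); the REFCLK frequency and multiplier used in the example are placeholders, and the set of multipliers actually supported is given in Table 8-2, which is not reproduced here.

```python
# Mode field encodings from Table 7-23 (value 2 is reserved).
PWRMODE = {"ACTIVE": 0, "IDLE": 1, "SLEEP": 3}


def pwrmode_value(mode):
    """Return the value to write to CP14 register 7 for a named mode."""
    return PWRMODE[mode]


def core_clock_hz(cclkcfg, refclk_hz):
    """Core clock = CCLKCFG field value multiplied by REFCLK."""
    if not 0 <= cclkcfg <= 0xF:
        raise ValueError("CCLKCFG is a 4-bit field")
    return cclkcfg * refclk_hz


# Placeholder numbers, for illustration only.
print(core_clock_hz(6, 100_000_000))
print(pwrmode_value("SLEEP"))
```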


7.3.4 Registers 8-15: Software Debug

Software debug is supported by address breakpoint registers (Coprocessor 15, register 14), serial communication over the JTAG interface and a trace buffer. Registers 8 and 9 are used for the serial interface and registers 10 through 13 support a 256 entry trace buffer. Register 14 and 15 are the debug link register and debug SPSR (saved program status register). These registers are explained in more detail in Chapter 13, “Software Debug”.

Opcode_2 and CRm should be zero.

Table 7-26. Accessing the Debug Registers

Function                                              CRn (Register #)    Instruction
Access Transmit Debug Register (TX)                   0b1000              MRC p14, 0, Rd, c8, c0, 0
                                                                          MCR p14, 0, Rd, c8, c0, 0
Access Receive Debug Register (RX)                    0b1001              MCR p14, 0, Rd, c9, c0, 0
                                                                          MRC p14, 0, Rd, c9, c0, 0
Access Debug Control and Status Register (DBGCSR)     0b1010              MCR p14, 0, Rd, c10, c0, 0
                                                                          MRC p14, 0, Rd, c10, c0, 0
Access Trace Buffer Register (TBREG)                  0b1011              MCR p14, 0, Rd, c11, c0, 0
                                                                          MRC p14, 0, Rd, c11, c0, 0
Access Checkpoint 0 Register (CHKPT0)                 0b1100              MCR p14, 0, Rd, c12, c0, 0
                                                                          MRC p14, 0, Rd, c12, c0, 0
Access Checkpoint 1 Register (CHKPT1)                 0b1101              MCR p14, 0, Rd, c13, c0, 0
                                                                          MRC p14, 0, Rd, c13, c0, 0
Access Transmit and Receive Debug Control Register    0b1110              MCR p14, 0, Rd, c14, c0, 0
                                                                          MRC p14, 0, Rd, c14, c0, 0
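To show how the CRn column relates to the instructions in Table 7-26, the sketch below builds the 32-bit machine word for "MRC/MCR p14, 0, Rd, CRn, c0, 0" using the standard ARM coprocessor register transfer encoding. It assumes the always (AL) condition code. The function name cp14_xfer_encoding and the DBG_* identifiers are hypothetical and are not defined by this manual.

```c
#include <assert.h>
#include <stdint.h>

/* CRn values from Table 7-26. Identifier names are illustrative only. */
enum dbg_reg {
    DBG_TX        = 0x8,  /* 0b1000 */
    DBG_RX        = 0x9,  /* 0b1001 */
    DBG_DBGCSR    = 0xA,  /* 0b1010 */
    DBG_TBREG     = 0xB,  /* 0b1011 */
    DBG_CHKPT0    = 0xC,  /* 0b1100 */
    DBG_CHKPT1    = 0xD,  /* 0b1101 */
    DBG_TXRX_CTRL = 0xE   /* 0b1110 */
};

/* Encode "MRC p14, 0, Rd, CRn, c0, 0" (is_read != 0) or the matching
 * MCR (is_read == 0), with condition AL. */
static uint32_t cp14_xfer_encoding(int is_read, unsigned rd, unsigned crn)
{
    assert(rd < 16);
    assert(crn < 16);

    uint32_t insn = 0xEE000010u;         /* cond = AL, bits 27:24 = 1110, bit 4 = 1 */
    insn |= (is_read ? 1u : 0u) << 20;   /* L bit: 1 = MRC, 0 = MCR */
    insn |= (uint32_t)crn << 16;         /* CRn selects the debug register */
    insn |= (uint32_t)rd << 12;          /* Rd is the ARM source/destination register */
    insn |= 14u << 8;                    /* coprocessor number: p14 */
    /* opcode_1 (bits 23:21), Opcode_2 (bits 7:5) and CRm (bits 3:0)
     * are left at zero, as the text above requires. */
    return insn;
}
```

With these assumptions, cp14_xfer_encoding(1, 0, DBG_TX) yields 0xEE180E10 (MRC p14, 0, r0, c8, c0, 0) and cp14_xfer_encoding(0, 1, DBG_DBGCSR) yields 0xEE0A1E10 (MCR p14, 0, r1, c10, c0, 0).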



1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview

1.1.1 ARM* Architecture Compliance

1.1.2 Features

1.1.2.1 Multiply/Accumulate (MAC)

1.1.2.2 Memory Management

1.1.2.3 Instruction Cache

1.1.2.4 Branch Target Buffer

1.1.2.5 Data Cache

1.1.2.6 Power Management

1.1.2.7 Interrupt Controller

1.1.2.8 Bus Controller

1.1.2.9 Performance Monitoring

1.1.2.10 Debug

1.1.2.11 JTAG

1.2 Terminology and Conventions

1.2.1 Number Representation

1.2.2 Terminology and Acronyms

1.3 Other Relevant Documents

2.1 ARM* Architecture Compliance

2.2 ARM* Architecture Implementation Options

2.2.1 Big Endian versus Little Endian

2.2.2 26-Bit Code

2.2.3 Thumb*

2.2.4 ARM* DSP-Enhanced Instruction Set

2.2.5 Base Register Update

2.3 Extensions to ARM* Architecture

2.3.1 DSP Coprocessor 0 (CP0)

2.3.1.1 Multiply With Internal Accumulate Format

2.3.1.2 Internal Accumulator Access Format

2.3.2 New Page Attributes

2.3.3 Additions to CP15 Functionality

2.3.4 Event Architecture

2.3.4.1 Exception Summary

2.3.4.2 Event Priority

2.3.4.3 Prefetch Aborts

2.3.4.4 Data Aborts

2.3.4.5 Events from Preload Instructions

2.3.4.6 Debug Events

3.1 Overview

3.2 Architecture Model

3.2.1 Version 4 vs. Version 5

3.2.2 Memory Attributes

3.2.2.1 Page (P) Attribute Bit

3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits

3.2.2.3 Instruction Cache

3.2.2.4 Data Cache and Write Buffer

3.2.2.5 Details on Data Cache and Write Buffer Behavior

3.2.2.6 Memory Operation Ordering

3.2.3 Exceptions

3.3 Interaction of the MMU, Instruction Cache, and Data Cache

3.4 Control

3.4.1 Invalidate (Flush) Operation

3.4.2 Enabling/Disabling

3.4.3 Locking Entries

3.4.4 Round-Robin Replacement Algorithm

4.1 Overview

4.2 Operation

4.2.1 Operation When Instruction Cache is Enabled

4.2.2 Operation When The Instruction Cache Is Disabled

4.2.3 Fetch Policy

4.2.4 Round-Robin Replacement Algorithm

4.2.5 Parity Protection

4.2.6 Instruction Fetch Latency

4.2.7 Instruction Cache Coherency

4.3 Instruction Cache Control

4.3.1 Instruction Cache State at RESET

4.3.2 Enabling/Disabling

4.3.3 Invalidating the Instruction Cache

4.3.4 Locking Instructions in the Instruction Cache

4.3.5 Unlocking Instructions in the Instruction Cache

5.1 Branch Target Buffer (BTB) Operation

5.1.1 Reset

5.1.2 Update Policy

5.2 BTB Control

5.2.1 Disabling/Enabling

5.2.2 Invalidation