Intel 80200 User Manual


Intel® 80200 Processor based on Intel® XScale™ Microarchitecture

Developer’s Manual

March, 2003

Order Number: 273411-003

Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel® products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice.

Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

The Intel® 80200 Processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.

Copyright © Intel Corporation, 2003

*Other brands and names are the property of their respective owners.

ARM and StrongARM are registered trademarks of ARM, Ltd.

Contents

1 Introduction
1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview
1.1.1 ARM* Architecture Compliance
1.1.2 Features
1.1.2.1 Multiply/Accumulate (MAC)
1.1.2.2 Memory Management
1.1.2.3 Instruction Cache
1.1.2.4 Branch Target Buffer
1.1.2.5 Data Cache
1.1.2.6 Power Management
1.1.2.7 Interrupt Controller
1.1.2.8 Bus Controller
1.1.2.9 Performance Monitoring
1.1.2.10 Debug
1.1.2.11 JTAG
1.2 Terminology and Conventions
1.2.1 Number Representation
1.2.2 Terminology and Acronyms
1.3 Other Relevant Documents

2 Programming Model
2.1 ARM* Architecture Compliance
2.2 ARM* Architecture Implementation Options
2.2.1 Big Endian versus Little Endian
2.2.2 26-Bit Code
2.2.3 Thumb*
2.2.4 ARM* DSP-Enhanced Instruction Set
2.2.5 Base Register Update
2.3 Extensions to ARM* Architecture
2.3.1 DSP Coprocessor 0 (CP0)
2.3.1.1 Multiply With Internal Accumulate Format
2.3.1.2 Internal Accumulator Access Format
2.3.2 New Page Attributes
2.3.3 Additions to CP15 Functionality
2.3.4 Event Architecture
2.3.4.1 Exception Summary
2.3.4.2 Event Priority
2.3.4.3 Prefetch Aborts
2.3.4.4 Data Aborts
2.3.4.5 Events from Preload Instructions
2.3.4.6 Debug Events

3 Memory Management
3.1 Overview
3.2 Architecture Model
3.2.1 Version 4 vs. Version 5
3.2.2 Memory Attributes
3.2.2.1 Page (P) Attribute Bit
3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits
3.2.2.3 Instruction Cache
3.2.2.4 Data Cache and Write Buffer
3.2.2.5 Details on Data Cache and Write Buffer Behavior
3.2.2.6 Memory Operation Ordering
3.2.3 Exceptions
3.3 Interaction of the MMU, Instruction Cache, and Data Cache
3.4 Control
3.4.1 Invalidate (Flush) Operation
3.4.2 Enabling/Disabling
3.4.3 Locking Entries
3.4.4 Round-Robin Replacement Algorithm

4 Instruction Cache
4.1 Overview
4.2 Operation
4.2.1 Operation When Instruction Cache is Enabled
4.2.2 Operation When The Instruction Cache Is Disabled
4.2.3 Fetch Policy
4.2.4 Round-Robin Replacement Algorithm
4.2.5 Parity Protection
4.2.6 Instruction Fetch Latency
4.2.7 Instruction Cache Coherency
4.3 Instruction Cache Control
4.3.1 Instruction Cache State at RESET
4.3.2 Enabling/Disabling
4.3.3 Invalidating the Instruction Cache
4.3.4 Locking Instructions in the Instruction Cache
4.3.5 Unlocking Instructions in the Instruction Cache

5 Branch Target Buffer
5.1 Branch Target Buffer (BTB) Operation
5.1.1 Reset
5.1.2 Update Policy
5.2 BTB Control
5.2.1 Disabling/Enabling
5.2.2 Invalidation

6 Data Cache
6.1 Overviews
6.1.1 Data Cache Overview
6.1.2 Mini-Data Cache Overview
6.1.3 Write Buffer and Fill Buffer Overview
6.2 Data Cache and Mini-Data Cache Operation
6.2.1 Operation When Caching is Enabled
6.2.2 Operation When Data Caching is Disabled
6.2.3 Cache Policies
6.2.3.1 Cacheability
6.2.3.2 Read Miss Policy
6.2.3.3 Write Miss Policy
6.2.3.4 Write-Back Versus Write-Through
6.2.4 Round-Robin Replacement Algorithm
6.2.5 Parity Protection
6.2.6 Atomic Accesses
6.3 Data Cache and Mini-Data Cache Control
6.3.1 Data Memory State After Reset
6.3.2 Enabling/Disabling
6.3.3 Invalidate & Clean Operations
6.3.3.1 Global Clean and Invalidate Operation
6.4 Re-configuring the Data Cache as Data RAM
6.5 Write Buffer/Fill Buffer Operation and Control

7 Configuration
7.1 Overview
7.2 CP15 Registers
7.2.1 Register 0: ID and Cache Type Registers
7.2.2 Register 1: Control and Auxiliary Control Registers
7.2.3 Register 2: Translation Table Base Register
7.2.4 Register 3: Domain Access Control Register
7.2.5 Register 4: Reserved
7.2.6 Register 5: Fault Status Register
7.2.7 Register 6: Fault Address Register
7.2.8 Register 7: Cache Functions
7.2.9 Register 8: TLB Operations
7.2.10 Register 9: Cache Lock Down
7.2.11 Register 10: TLB Lock Down
7.2.12 Register 11-12: Reserved
7.2.13 Register 13: Process ID
7.2.13.1 The PID Register Affect On Addresses
7.2.14 Register 14: Breakpoint Registers
7.2.15 Register 15: Coprocessor Access Register
7.3 CP14 Registers
7.3.1 Registers 0-3: Performance Monitoring
7.3.2 Register 4-5: Reserved
7.3.3 Registers 6-7: Clock and Power Management
7.3.4 Registers 8-15: Software Debug

8 System Management
8.1 Clocking
8.2 Processor Reset
8.2.1 Reset Sequence
8.2.2 Reset Effect on Outputs
8.3 Power Management
8.3.1 Invocation
8.3.2 Signals Associated with Power Management

9 Interrupts
9.1 Introduction
9.2 External Interrupts
9.3 Programmer Model
9.3.1 INTCTL
9.3.2 INTSRC
9.3.3 INTSTR

10 External Bus
10.1 General Description
10.2 Signal Description
10.2.1 Request Bus
10.2.1.1 Intel® 80200 Processor Use of the Request Bus
10.2.2 Data Bus
10.2.3 Critical Word First
10.2.4 Configuration Pins
10.2.5 Multimaster Support
10.2.6 Abort
10.2.7 ECC
10.2.8 Big Endian System Configuration
10.3 Examples
10.3.1 Simple Read Word
10.3.2 Read Burst, No Critical Word First
10.3.3 Read Burst, Critical Word First Data Return
10.3.4 Word Write
10.3.5 Two Word Coalesced Write
10.3.5.1 Write Burst
10.3.6 Write Burst, Coalesced
10.3.7 Pipelined Accesses
10.3.8 Locked Access
10.3.9 Aborted Access
10.3.10 Hold

11 Bus Controller
11.1 Introduction
11.2 ECC
11.3 Error Handling
11.3.1 Bus Aborts
11.3.2 ECC Errors
11.4 Programmer Model
11.4.1 BCU Control Registers
11.4.2 ECC Error Registers

12 Performance Monitoring
12.1 Overview
12.2 Clock Counter (CCNT; CP14 - Register 1)
12.3 Performance Count Registers (PMN0 - PMN1; CP14 - Register 2 and 3, Respectively)
12.3.1 Extending Count Duration Beyond 32 Bits
12.4 Performance Monitor Control Register (PMNC)
12.4.1 Managing PMNC
12.5 Performance Monitoring Events
12.5.1 Instruction Cache Efficiency Mode
12.5.2 Data Cache Efficiency Mode
12.5.3 Instruction Fetch Latency Mode
12.5.4 Data/Bus Request Buffer Full Mode
12.5.5 Stall/Writeback Statistics
12.5.6 Instruction TLB Efficiency Mode
12.5.7 Data TLB Efficiency Mode
12.6 Multiple Performance Monitoring Run Statistics
12.7 Examples

13

Software Debug........................................................................................

1

13.1

Definitions .....................................................................................................................................

 

1

13.2

Debug Registers ...........................................................................................................................

 

1

13.3

Introduction ...................................................................................................................................

 

2

 

13.3.1

Halt Mode ....................................................................................................................

 

2

 

13.3.2

Monitor Mode...............................................................................................................

2

13.4

Debug Control and Status Register (DCSR) ................................................................................

3

 

13.4.1

Global Enable Bit (GE) ................................................................................................

4

 

13.4.2

Halt Mode Bit (H) .........................................................................................................

4

 

13.4.3

Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR) ..................................................................

5

 

13.4.4

Sticky Abort Bit (SA) ....................................................................................................

5

 

13.4.5

Method of Entry Bits (MOE).........................................................................................

5

 

13.4.6

Trace Buffer Mode Bit (M) ...........................................................................................

5

 

13.4.7

Trace Buffer Enable Bit (E)..........................................................................................

5

13.5

Debug Exceptions.........................................................................................................................

 

6

 

13.5.1

Halt Mode ........ 6
    13.5.2 Monitor Mode ........ 8
13.6 HW Breakpoint Resources ........ 9
    13.6.1 Instruction Breakpoints ........ 9
    13.6.2 Data Breakpoints ........ 10
13.7 Software Breakpoints ........ 11
13.8 Transmit/Receive Control Register (TXRXCTRL) ........ 12
    13.8.1 RX Register Ready Bit (RR) ........ 13
    13.8.2 Overflow Flag (OV) ........ 14
    13.8.3 Download Flag (D) ........ 14
    13.8.4 TX Register Ready Bit (TR) ........ 15
    13.8.5 Conditional Execution Using TXRXCTRL ........ 15
13.9 Transmit Register (TX) ........ 16
13.10 Receive Register (RX) ........ 16
13.11 Debug JTAG Access ........ 17
    13.11.1 SELDCSR JTAG Command ........ 17
    13.11.2 SELDCSR JTAG Register ........ 18
        13.11.2.1 DBG.HLD_RST ........ 19
        13.11.2.2 DBG.BRK ........ 20
        13.11.2.3 DBG.DCSR ........ 20
    13.11.3 DBGTX JTAG Command ........ 20
    13.11.4 DBGTX JTAG Register ........ 21
    13.11.5 DBGRX JTAG Command ........ 21
    13.11.6 DBGRX JTAG Register ........ 22
        13.11.6.1 RX Write Logic ........ 23
        13.11.6.2 DBGRX Data Register ........ 24
        13.11.6.3 DBG.RR ........ 24


 

 

        13.11.6.4 DBG.V ........ 25
        13.11.6.5 DBG.RX ........ 25
        13.11.6.6 DBG.D ........ 25
        13.11.6.7 DBG.FLUSH ........ 25
    13.11.7 Debug JTAG Data Register Reset Values ........ 25
13.12 Trace Buffer ........ 26
    13.12.1 Trace Buffer CP Registers ........ 26
        13.12.1.1 Checkpoint Registers ........ 26
        13.12.1.2 Trace Buffer Register (TBREG) ........ 27
13.13 Trace Buffer Entries ........ 28
    13.13.1 Message Byte ........ 28
        13.13.1.1 Exception Message Byte ........ 29
        13.13.1.2 Non-exception Message Byte ........ 30
        13.13.1.3 Address Bytes ........ 31
    13.13.2 Trace Buffer Usage ........ 32
13.14 Downloading Code in the ICache ........ 34
    13.14.1 LDIC JTAG Command ........ 34
    13.14.2 LDIC JTAG Data Register ........ 35
    13.14.3 LDIC Cache Functions ........ 36
    13.14.4 Loading IC During Reset ........ 38
        13.14.4.1 Loading IC During Cold Reset for Debug ........ 39
        13.14.4.2 Loading IC During a Warm Reset for Debug ........ 41
    13.14.5 Dynamically Loading IC After Reset ........ 43
        13.14.5.1 Dynamic Code Download Synchronization ........ 45
    13.14.6 Mini Instruction Cache Overview ........ 46
13.15 Halt Mode Software Protocol ........ 47
    13.15.1 Starting a Debug Session ........ 47
        13.15.1.1 Setting up Override Vector Tables ........ 47
        13.15.1.2 Placing the Handler in Memory ........ 48
    13.15.2 Implementing a Debug Handler ........ 49
        13.15.2.1 Debug Handler Entry ........ 49
        13.15.2.2 Debug Handler Restrictions ........ 49
        13.15.2.3 Dynamic Debug Handler ........ 50
        13.15.2.4 High-Speed Download ........ 52
    13.15.3 Ending a Debug Session ........ 53
13.16 Software Debug Notes/Errata ........ 54
14 Performance Considerations ........ 1
14.1 Interrupt Latency ........ 1
14.2 Branch Prediction ........ 2
14.3 Addressing Modes ........ 2
14.4 Instruction Latencies ........ 3
    14.4.1 Performance Terms ........ 3
    14.4.2 Branch Instruction Timings ........ 4
    14.4.3 Data Processing Instruction Timings ........ 5
    14.4.4 Multiply Instruction Timings ........ 6
    14.4.5 Saturated Arithmetic Instructions ........ 8
    14.4.6 Status Register Access Instructions ........ 8
    14.4.7 Load/Store Instructions ........ 8
    14.4.8 Semaphore Instructions ........ 9
    14.4.9 Coprocessor Instructions ........ 9


 

    14.4.10 Miscellaneous Instruction Timing ........ 9
    14.4.11 Thumb* Instructions ........ 9
A Compatibility: Intel® 80200 Processor vs. SA-110 ........ 1
A.1 Introduction ........ 1
A.2 Summary ........ 1
A.3 Architecture Deviations ........ 3
    A.3.1 Read Buffer ........ 3
    A.3.2 26-bit Mode ........ 3
    A.3.3 Cacheable (C) and Bufferable (B) Encoding ........ 3
    A.3.4 Write Buffer Behavior ........ 4
    A.3.5 External Aborts ........ 4
    A.3.6 Performance Differences ........ 5
    A.3.7 System Control Coprocessor ........ 5
    A.3.8 New Instructions and Instruction Formats ........ 5
    A.3.9 Augmented Page Table Descriptors ........ 5
B Optimization Guide ........ 1
B.1 Introduction ........ 1
    B.1.1 About This Guide ........ 1
B.2 Intel® 80200 Processor Pipeline ........ 2
    B.2.1 General Pipeline Characteristics ........ 2
        B.2.1.1 Number of Pipeline Stages ........ 2
        B.2.1.2 Intel® 80200 Processor Pipeline Organization ........ 3
        B.2.1.3 Out Of Order Completion ........ 4
        B.2.1.4 Register Scoreboarding ........ 4
        B.2.1.5 Use of Bypassing ........ 4
    B.2.2 Instruction Flow Through the Pipeline ........ 5
        B.2.2.1 ARM* V5 Instruction Execution ........ 5
        B.2.2.2 Pipeline Stalls ........ 5
    B.2.3 Main Execution Pipeline ........ 6
        B.2.3.1 F1 / F2 (Instruction Fetch) Pipestages ........ 6
        B.2.3.2 ID (Instruction Decode) Pipestage ........ 6
        B.2.3.3 RF (Register File / Shifter) Pipestage ........ 7
        B.2.3.4 X1 (Execute) Pipestage ........ 7
        B.2.3.5 X2 (Execute 2) Pipestage ........ 7
        B.2.3.6 WB (write-back) ........ 7
    B.2.4 Memory Pipeline ........ 8
        B.2.4.1 D1 and D2 Pipestage ........ 8
    B.2.5 Multiply/Multiply Accumulate (MAC) Pipeline ........ 8
        B.2.5.1 Behavioral Description ........ 8
B.3 Basic Optimizations ........ 9
    B.3.1 Conditional Instructions ........ 9
        B.3.1.1 Optimizing Condition Checks ........ 9
        B.3.1.2 Optimizing Branches ........ 10
        B.3.1.3 Optimizing Complex Expressions ........ 12
    B.3.2 Bit Field Manipulation ........ 13
    B.3.3 Optimizing the Use of Immediate Values ........ 14
    B.3.4 Optimizing Integer Multiply and Divide ........ 15
    B.3.5 Effective Use of Addressing Modes ........ 16
B.4 Cache and Prefetch Optimizations ........ 17


 

    B.4.1 Instruction Cache ........ 17
        B.4.1.1 Cache Miss Cost ........ 17
        B.4.1.2 Round Robin Replacement Cache Policy ........ 17
        B.4.1.3 Code Placement to Reduce Cache Misses ........ 17
        B.4.1.4 Locking Code into the Instruction Cache ........ 18
    B.4.2 Data and Mini Cache ........ 19
        B.4.2.1 Non Cacheable Regions ........ 19
        B.4.2.2 Write-through and Write-back Cached Memory Regions ........ 19
        B.4.2.3 Read Allocate and Read-write Allocate Memory Regions ........ 20
        B.4.2.4 Creating On-chip RAM ........ 20
        B.4.2.5 Mini-data Cache ........ 21
        B.4.2.6 Data Alignment ........ 22
        B.4.2.7 Literal Pools ........ 23
    B.4.3 Cache Considerations ........ 24
        B.4.3.1 Cache Conflicts, Pollution and Pressure ........ 24
        B.4.3.2 Memory Page Thrashing ........ 24
    B.4.4 Prefetch Considerations ........ 25
        B.4.4.1 Prefetch Distances in the Intel® 80200 Processor ........ 25
        B.4.4.2 Prefetch Loop Scheduling ........ 27
        B.4.4.3 Prefetch Loop Limitations ........ 27
        B.4.4.4 Compute vs. Data Bus Bound ........ 27
        B.4.4.5 Low Number of Iterations ........ 27
        B.4.4.6 Bandwidth Limitations ........ 28
        B.4.4.7 Cache Memory Considerations ........ 29
        B.4.4.8 Cache Blocking ........ 31
        B.4.4.9 Prefetch Unrolling ........ 31
        B.4.4.10 Pointer Prefetch ........ 32
        B.4.4.11 Loop Interchange ........ 33
        B.4.4.12 Loop Fusion ........ 33
        B.4.4.13 Prefetch to Reduce Register Pressure ........ 34
B.5 Instruction Scheduling ........ 35
    B.5.1 Scheduling Loads ........ 35
        B.5.1.1 Scheduling Load and Store Double (LDRD/STRD) ........ 37
        B.5.1.2 Scheduling Load and Store Multiple (LDM/STM) ........ 38
    B.5.2 Scheduling Data Processing Instructions ........ 39
    B.5.3 Scheduling Multiply Instructions ........ 40
    B.5.4 Scheduling SWP and SWPB Instructions ........ 41
    B.5.5 Scheduling the MRA and MAR Instructions (MRRC/MCRR) ........ 42
    B.5.6 Scheduling the MIA and MIAPH Instructions ........ 43
    B.5.7 Scheduling MRS and MSR Instructions ........ 44
    B.5.8 Scheduling CP15 Coprocessor Instructions ........ 44
B.6 Optimizing C Libraries ........ 45
B.7 Optimizations for Size ........ 45
    B.7.1 Space/Performance Trade Off ........ 45
        B.7.1.1 Multiple Word Load and Store ........ 45
        B.7.1.2 Use of Conditional Instructions ........ 45
        B.7.1.3 Use of PLD Instructions ........ 45
C Test Features ........ 1
C.1 Introduction ........ 1
C.2 JTAG - IEEE1149.1 ........ 1
    C.2.1 Boundary Scan Architecture ........ 2


    C.2.2 TAP Pins ........ 3
    C.2.3 Instruction Register (IR) ........ 4
        C.2.3.1 Boundary-Scan Instruction Set ........ 4
    C.2.4 TAP Test Data Registers ........ 6
        C.2.4.1 Device Identification Register ........ 6
        C.2.4.2 Bypass Register ........ 6
        C.2.4.3 Boundary-Scan Register ........ 6
    C.2.5 TAP Controller ........ 7
        C.2.5.1 Test Logic Reset State ........ 8
        C.2.5.2 Run-Test/Idle State ........ 8
        C.2.5.3 Select-DR-Scan State ........ 8
        C.2.5.4 Capture-DR State ........ 8
        C.2.5.5 Shift-DR State ........ 9
        C.2.5.6 Exit1-DR State ........ 9
        C.2.5.7 Pause-DR State ........ 9
        C.2.5.8 Exit2-DR State ........ 9
        C.2.5.9 Update-DR State ........ 10
        C.2.5.10 Select-IR Scan State ........ 10
        C.2.5.11 Capture-IR State ........ 10
        C.2.5.12 Shift-IR State ........ 10
        C.2.5.13 Exit1-IR State ........ 11
        C.2.5.14 Pause-IR State ........ 11
        C.2.5.15 Exit2-IR State ........ 11
        C.2.5.16 Update-IR State ........ 11
        C.2.5.17 Boundary-Scan Example ........ 12


Figures

1-1 Intel® 80200 Processor based on Intel® XScale Microarchitecture Features ........ 2
3-1 Example of Locked Entries in TLB ........ 9
4-1 Instruction Cache Organization ........ 1
4-2 Locked Line Effect on Round Robin Replacement ........ 8
5-1 BTB Entry ........ 1
5-2 Branch History ........ 2
6-1 Data Cache Organization ........ 2
6-2 Mini-Data Cache Organization ........ 3
6-3 Locked Line Effect on Round Robin Replacement ........ 15
8-1 Reset Sequence ........ 3
8-2 Pin State at Reset ........ 4
9-1 Interrupt Controller Block Diagram ........ 2
10-1 Typical System ........ 1
10-2 Alternate Configuration ........ 2
10-3 Big Endian Lane Swapping on a 64-bit Bus ........ 13
10-4 Basic Read Timing ........ 14
10-5 Read Burst, No CWF ........ 15
10-6 Read Burst, CWF ........ 16
10-7 Basic Word Write ........ 17
10-8 Two Word Coalesced Write ........ 18
10-9 Four Word Eviction Write ........ 19
10-10 Four Word Coalesced Write Burst ........ 20
10-11 Pipeline Example ........ 21
10-12 Locked Access ........ 22
10-13 Aborted Access ........ 23
10-14 Hold Assertion ........ 24
13-1 SELDCSR Hardware ........ 18
13-2 SELDCSR Data Register ........ 19
13-3 DBGTX Hardware ........ 21
13-4 DBGRX Hardware ........ 22
13-5 RX Write Logic ........ 23
13-6 DBGRX Data Register ........ 24
13-7 Message Byte Formats ........ 28
13-8 Indirect Branch Entry Address Byte Organization ........ 31
13-9 High Level View of Trace Buffer ........ 32
13-10 LDIC JTAG Data Register Hardware ........ 35
13-11 Format of LDIC Cache Functions ........ 37
13-12 Code Download During a Cold Reset For Debug ........ 39
13-13 Code Download During a Warm Reset For Debug ........ 41
13-14 Downloading Code in IC During Program Execution ........ 43
B-1 Intel® 80200 Processor RISC Superpipeline ........ 3
C-1 Test Access Port Block Diagram ........ 2
C-2 TAP Controller State Diagram ........ 7
C-3 JTAG Example ........ 13
C-4 Timing Diagram Illustrating the Loading of Instruction Register ........ 14
C-5 Timing Diagram Illustrating the Loading of Data Register ........ 15


Tables

2-1 Multiply with Internal Accumulate Format ........ 4
2-2 MIA{<cond>} acc0, Rm, Rs ........ 4
2-3 MIAPH{<cond>} acc0, Rm, Rs ........ 5
2-4 MIAxy{<cond>} acc0, Rm, Rs ........ 6
2-5 Internal Accumulator Access Format ........ 7
2-6 MAR{<cond>} acc0, RdLo, RdHi ........ 8
2-7 MRA{<cond>} RdLo, RdHi, acc0 ........ 8
2-9 Second-level Descriptors for Coarse Page Table ........ 10
2-10 Second-level Descriptors for Fine Page Table ........ 10
2-8 First-level Descriptors ........ 10
2-11 Exception Summary ........ 12
2-12 Event Priority ........ 12
2-13 Intel® 80200 Processor Encoding of Fault Status for Prefetch Aborts ........ 13
2-14 Intel® 80200 Processor Encoding of Fault Status for Data Aborts ........ 14

3-1

Data Cache and Buffer Behavior when X = 0...............................................................................................

3

3-2

Data Cache and Buffer Behavior when X = 1...............................................................................................

3

3-3

Memory Operations that Impose a Fence......................................................................................................

4

3-4

Valid MMU & Data/mini-data Cache Combinations....................................................................................

5

7-1

MRC/MCR Format........................................................................................................................................

2

7-2

LDC/STC Format ..........................................................................................................................................

3

7-3

CP15 Registers ..............................................................................................................................................

4

7-4

ID Register.....................................................................................................................................................

5

7-5

Cache Type Register......................................................................................................................................

5

7-6

ARM* Control Register ................................................................................................................................

7

7-7

Auxiliary Control Register ............................................................................................................................

8

7-8

Translation Table Base Register....................................................................................................................

9

7-9

Domain Access Control Register ..................................................................................................................

9

7-10

Fault Status Register....................................................................................................................................

10

7-11

Fault Address Register ................................................................................................................................

10

7-12

Cache Functions ..........................................................................................................................................

11

7-13

TLB Functions.............................................................................................................................................

13

7-14

Cache Lockdown Functions ........................................................................................................................

14

7-15

Data Cache Lock Register ...........................................................................................................................

14

7-16

TLB Lockdown Functions...........................................................................................................................

15

7-17

Accessing Process ID ..................................................................................................................................

16

7-18

Process ID Register .....................................................................................................................................

16

7-19

Accessing the Debug Registers ...................................................................................................................

17

7-20

Coprocessor Access Register ......................................................................................................................

19

7-21

CP14 Registers ............................................................................................................................................

20

7-22

Accessing the Performance Monitoring Registers ......................................................................................

20

7-23

PWRMODE Register ..................................................................................................................................

21

7-24

Clock and Power Management....................................................................................................................

21

7-25

CCLKCFG Register ....................................................................................................................................

21

7-26

Accessing the Debug Registers ...................................................................................................................

22

8-1

Reset CCLK Configuration ...........................................................................................................................

1

8-2

Software CCLK Configuration......................................................................................................................

2

8-3

Low Power Modes.........................................................................................................................................

5

8-4

PWRSTATUS[1:0] Encoding .......................................................................................................................

5


9-1 Interrupt Control Register (CP13, register 0)
9-2 Interrupt Source Register (CP13, register 4)
9-3 Interrupt Steer Register (CP13, register 8)
10-1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Bus Signals
10-2 Requests on a 64-bit Bus
10-3 Requests on a 32-bit Bus
10-4 Return Order for 8-Word Burst, 64-bit Data Bus
10-5 Return Order for 8-Word Burst, 32-bit Data Bus
11-1 BCU Response to ECC Errors
11-2 BCUCTL (Register 0)
11-3 BCUMOD (Register 1)
11-4 ELOG0, ELOG1 (Registers 4, 5)
11-5 ECAR0, ECAR1 (Registers 6, 7)
11-6 ECTST (Register 8)
12-1 Clock Count Register (CCNT)
12-2 Performance Monitor Count Register (PMN0 and PMN1)
12-3 Performance Monitor Control Register (CP14, register 0)
12-4 Performance Monitoring Events
12-5 Some Common Uses of the PMU
13-1 Debug Control and Status Register (DCSR)
13-2 Event Priority
13-3 Instruction Breakpoint Address and Control Register (IBCRx)
13-4 Data Breakpoint Register (DBRx)
13-5 Data Breakpoint Controls Register (DBCON)
13-6 TX RX Control Register (TXRXCTRL)
13-7 Normal RX Handshaking
13-8 High-Speed Download Handshaking States
13-9 TX Handshaking
13-10 TXRXCTRL Mnemonic Extensions
13-11 TX Register
13-12 RX Register
13-13 DEBUG Data Register Reset Values
13-14 CP 14 Trace Buffer Register Summary
13-15 Checkpoint Register (CHKPTx)
13-16 TBREG Format
13-17 Message Byte Formats
13-18 LDIC Cache Functions
14-1 Minimum Interrupt Latency
14-2 Branch Latency Penalty
14-3 Latency Example
14-4 Branch Instruction Timings (Those predicted by the BTB)
14-5 Branch Instruction Timings (Those not predicted by the BTB)
14-6 Data Processing Instruction Timings
14-7 Multiply Instruction Timings
14-8 Multiply Implicit Accumulate Instruction Timings
14-9 Implicit Accumulator Access Instruction Timings
14-10 Saturated Data Processing Instruction Timings
14-11 Status Register Access Instruction Timings
14-12 Load and Store Instruction Timings
14-13 Load and Store Multiple Instruction Timings


14-14 Semaphore Instruction Timings
14-15 CP15 Register Access Instruction Timings
14-16 CP14 Register Access Instruction Timings
14-17 SWI Instruction Timings
14-18 Count Leading Zeros Instruction Timings
A-1 C and B encoding
B-1 Pipelines and Pipe stages
C-1 TAP Controller Pin Definitions
C-2 JTAG Instruction Set
C-3 IEEE Instructions
C-4 JTAG ID Register Value


1 Introduction

1.1 Intel® 80200 Processor based on Intel® XScale™ Microarchitecture High-Level Overview

The Intel® 80200 processor based on Intel® XScale™ microarchitecture is the next generation in the Intel® StrongARM* processor family and is compliant with ARM* Architecture V5TE. It is designed for high performance and low power, leading the industry in mW/MIPS. The Intel® 80200 processor integrates a bus controller and an interrupt controller around a core processor and targets embedded markets such as handheld devices, networking, and remote access servers. This technology is ideal for internet infrastructure products such as network and I/O processors, where ultimate performance is critical for moving and processing large amounts of data quickly.

The Intel® 80200 processor incorporates an extensive list of architecture features that allow it to achieve high performance. This rich feature set lets programmers select the features that give the best performance for their application. Many of the architectural features added to the Intel® 80200 processor help hide memory latency, which is often a serious impediment to high-performance processors. These include:

the ability to continue instruction execution even while the data cache is retrieving data from external memory.

a write buffer.

write-back caching.

various data cache allocation policies, which can be configured differently for each application.

cache locking.

and a pipelined external bus.

All these features improve the efficiency of the external bus.

The Intel® 80200 processor has been equipped to efficiently handle audio processing through the support of 16-bit data types and 16-bit operations. These audio coding enhancements center around multiply and accumulate operations which accelerate many of the audio filter operations.

1.1.1 ARM* Architecture Compliance

ARM* Version 5 (V5) Architecture added floating point instructions to ARM* Version 4. The Intel® 80200 processor implements the integer instruction set architecture of ARM* V5, but does not provide hardware support for the floating point instructions.

The Intel® 80200 processor provides the Thumb* instruction set (ARM* V5T) and the ARM* V5E DSP extensions.

Backward compatibility with the first generation of Intel® StrongARM* products is maintained for user-mode applications. Operating systems may require modifications to match the specific hardware features of the Intel® 80200 processor and to take advantage of the performance enhancements added to the Intel® 80200 processor.


1.1.2 Features

Figure 1-1 shows the major functional blocks of the Intel® 80200 processor. The following sections give a brief, high-level overview of these blocks.

Figure 1-1. Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Features

[Block diagram. Blocks and labels shown:]
Instruction Cache: 32 Kbytes, 32 ways, lockable by line
Data Cache: max 32 Kbytes, 32 ways, write-back or write-through, hit under miss
Mini-Data Cache: 2 Kbytes, 2 ways
Data RAM: max 28 Kbytes, re-map of data cache
Branch Target Buffer
IMMU: 32 entry TLB, fully associative, lockable by entry
DMMU: 32 entry TLB, fully associative, lockable by entry
Fill Buffer: 4 - 8 entries
Performance Monitoring
Debug: hardware breakpoint, branch history table
Power Management: idle, sleep
MAC: single cycle throughput (16*32), 16-bit SIMD, 40-bit accumulator
Write Buffer: 8 entries, full coalescing
JTAG
Interrupt Controller: interrupt masking, FIQ/IRQ steering, pend register
Bus Controller: 1 Gbyte/sec, pipelined, de-multiplexed, ECC protection

1.1.2.1 Multiply/Accumulate (MAC)

The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a throughput of a MAC operation every cycle. Several architectural enhancements were made to the MAC to support audio coding algorithms, which include a 40-bit accumulator and support for 16-bit packed data.

See Section 2.3, “Extensions to ARM* Architecture” on page 2-3 for more details.


1.1.2.2 Memory Management

The Intel® 80200 processor implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. The MMU provides access protection and virtual to physical address translation.

The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:

identifying code as cacheable or non-cacheable

selecting between the mini-data cache or data cache

write-back or write-through data caching

enabling data write allocation policy

and enabling the write buffer to coalesce stores to external memory

Chapter 3, “Memory Management” discusses this in more detail.

1.1.2.3 Instruction Cache

The Intel® 80200 processor implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external memory. A mechanism to lock critical code within the cache is also provided.

Chapter 4, “Instruction Cache” discusses this in more detail.
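The cache dimensions quoted above imply a particular number of sets and a particular split of a 32-bit address. The following is a minimal sketch of that arithmetic, not a description of the hardware: the function names are hypothetical, and the assumption that the set index occupies the address bits directly above the line offset is made for illustration only.

```python
# Illustrative arithmetic only. Function names are hypothetical, and the
# placement of the set index directly above the line offset is an assumption.

def cache_geometry(size_bytes, ways, line_bytes, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the number of sets
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set_index, byte_offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 32-Kbyte, 32-way set associative cache with 32-byte lines
sets, off_bits, idx_bits, tag_bits = cache_geometry(32 * 1024, 32, 32)
print(sets, off_bits, idx_bits, tag_bits)
print(split_address(0x8124, off_bits, idx_bits))
```

Under these assumptions the 32-Kbyte, 32-way cache has 32 sets, so a 32-bit address divides into a 5-bit line offset, a 5-bit set index, and a 22-bit tag.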

1.1.2.4 Branch Target Buffer

The Intel® 80200 processor provides a Branch Target Buffer (BTB) to predict the outcome of branch type instructions. It provides storage for the target address of branch type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.

The BTB holds 128 entries. See Chapter 5, “Branch Target Buffer” for more details.

1.1.2.5 Data Cache

The Intel® 80200 processor implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache. Each cache has a line size of 32 bytes and supports write-through or write-back caching.

The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15.

Chapter 6, “Data Cache” discusses all this in more detail.

The Intel® 80200 processor allows applications to re-configure a portion of the data cache as data RAM. Software may place special tables or frequently used variables in this RAM. See Section 6.4, “Re-configuring the Data Cache as Data RAM” on page 6-12 for more information on this.


1.1.2.6 Power Management

The Intel® 80200 processor supports two low power modes: idle and sleep. These modes are discussed in Section 8.3, “Power Management” on page 8-5.

1.1.2.7 Interrupt Controller

An interrupt controller is implemented on the Intel® 80200 processor that provides masking of interrupts and the ability to steer interrupts to FIQ or IRQ. It is accessed through Coprocessor 13 registers. See Chapter 9, “Interrupts” for more detail.

1.1.2.8 Bus Controller

The Intel® 80200 processor supports a pipelined external bus that runs at 100 MHz. The data bus is 32/64 bits with ECC protection. The bus controller can be configured to provide critical word first on load operations, enhancing overall system performance. The bus controller has four request queues, where all four requests can be active on the pipelined external bus.

Chapter 10, “External Bus” describes the external bus protocol and Chapter 11, “Bus Controller” covers the aspects of ECC protection. The bus controller registers are accessed via coprocessor 13.

1.1.2.9 Performance Monitoring

Two performance monitoring counters have been added to the Intel® 80200 processor that can be configured to monitor various events in the Intel® 80200 processor. These events allow a software developer to measure cache efficiency, detect system bottlenecks and reduce the overall latency of programs.

Chapter 12, “Performance Monitoring” discusses this in more detail.

1.1.2.10 Debug

The Intel® 80200 processor supports software debugging through two instruction address breakpoint registers, one data-address breakpoint register, one data-address/mask breakpoint register, and a trace buffer.

Chapter 13, “Software Debug” discusses this in more detail.

1.1.2.11 JTAG

Testability is supported on the Intel® 80200 processor through the Test Access Port (TAP) Controller implementation, which is based on IEEE 1149.1 (JTAG) Standard Test Access Port and Boundary-Scan Architecture. The purpose of the TAP controller is to support test logic internal and external to the Intel® 80200 processor such as built-in self-test, boundary-scan, and scan.

Appendix C.2 discusses this in more detail.


1.2 Terminology and Conventions

1.2.1 Number Representation

All numbers in this document can be assumed to be base 10 unless designated otherwise. In text and pseudo code descriptions, hexadecimal numbers have a prefix of 0x and binary numbers have a prefix of 0b. For example, 107 would be represented as 0x6B in hexadecimal and 0b1101011 in binary.
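The same prefixes are accepted by many programming languages, which makes the example easy to check. The short sketch below reproduces the conversion of 107; the variable names are illustrative only.

```python
# Reproduce the example from the text: decimal 107 in hexadecimal and binary.
value = 107
as_hex = f"0x{value:X}"   # hexadecimal with the 0x prefix
as_bin = f"0b{value:b}"   # binary with the 0b prefix
print(as_hex, as_bin)
```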

1.2.2 Terminology and Acronyms

ASSP
Application Specific Standard Product

Assert
This term refers to the logically active value of a signal or bit.

BTB
Branch Target Buffer

Clean
A clean operation updates external memory with the contents of the specified line in the data/mini-data cache if any of the dirty bits are set and the line is valid. There are two dirty bits associated with each line in the cache so only the portion that is dirty gets written back to external memory. After this operation, the line is still valid and both dirty bits are deasserted.

Coalescing
Coalescing means bringing together a new store operation with an existing store operation already resident in the write buffer. The new store is placed in the same write buffer entry as an existing store when the address of the new store falls in the 4 word aligned address of the existing entry. This includes, in PCI terminology, write merging, write collapsing, and write combining.

Deassert
This term refers to the logically inactive value of a signal or bit.

Flush
A flush operation invalidates the location(s) in the cache by deasserting the valid bit. Individual entries (lines) may be flushed or the entire cache may be flushed with one command. Once an entry is flushed in the cache it can no longer be used by the program.

Reserved
A reserved field is a field that may be used by an implementation. If the initial value of a reserved field is supplied by software, this value must be zero. Software should not modify reserved fields or depend on any values in reserved fields.
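As an illustration of the Coalescing entry above, the address test can be sketched as a comparison of 4-word-aligned base addresses. This is only a model of the stated condition, assuming 4-byte words; the names are hypothetical, and it ignores any other conditions the write buffer may apply (for example page attributes).

```python
# Model of the coalescing address test only. Assumes 4-byte words; names are
# hypothetical and other hardware conditions are deliberately not modeled.
WORD_BYTES = 4
ENTRY_BYTES = 4 * WORD_BYTES   # one write buffer entry covers 4 aligned words


def entry_base(addr):
    """Return the 4-word-aligned base address that contains addr."""
    return addr & ~(ENTRY_BYTES - 1)


def can_coalesce(existing_addr, new_addr):
    """True when the new store falls in the aligned region of the existing entry."""
    return entry_base(existing_addr) == entry_base(new_addr)


print(can_coalesce(0x1000, 0x100C))   # same 16-byte region
print(can_coalesce(0x100C, 0x1010))   # neighboring regions
```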


1.3 Other Relevant Documents

Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Datasheet, Intel Order # 273414

ARM Architecture Version 5TE Specification Document Number: ARM DDI 0100E

This document describes Version 5TE of the ARM Architecture which includes Thumb ISA and ARM DSP-Enhanced ISA.

ARM Architecture Reference Manual Document Number: ARM DDI 0100B

This document describes Version 4 of the ARM Architecture.

Intel® XScale™ Microarchitecture Programming Reference Manual, Intel Order # 273436

Intel® 80312 I/O Companion Chip Developer’s Manual, Intel Order # 273410

StrongARM SA-1100 Microprocessor Developer’s Manual, Intel Order # 278088

StrongARM SA-110 Microprocessor Technical Reference Manual, Intel Order #278058


2 Programming Model

This chapter describes the programming model of the Intel® 80200 processor based on Intel® XScale™ microarchitecture, namely the implementation options and extensions to the ARM* Version 5 architecture.

The ARM* Architecture Version 5TE Specification (ARM DDI 0100E) describes Version 5TE of the ARM Architecture, including the Thumb* ISA and ARM DSP-Enhanced ISA.

2.1 ARM* Architecture Compliance

The Intel® 80200 processor implements the integer instruction set architecture specified in ARM* Version 5TE. T refers to the Thumb instruction set and E refers to the DSP-Enhanced instruction set.

ARM* Version 5 introduces a few more architecture features over Version 4, specifically the addition of tiny pages (1 Kbyte), a new instruction (CLZ) that counts the leading zeroes in a data value, enhanced ARM-Thumb transfer instructions and a modification of the system control coprocessor, CP15.

2.2 ARM* Architecture Implementation Options

2.2.1 Big Endian versus Little Endian

The Intel® 80200 processor supports both big and little endian data representation. The B-bit of the Control Register (Coprocessor 15, register 1, bit 7) selects between big and little endian mode. To run in big endian mode, the B bit must be set before attempting any sub-word accesses to memory; otherwise, undefined results occur. Note that this bit takes effect even if the MMU is disabled.
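The effect of byte order on sub-word accesses can be sketched in a few lines. The example below is a generic model of big and little endian layout, not of the processor's bus behavior; the only detail taken from the text is that the B bit is bit 7 of the Control Register, and all function names are hypothetical.

```python
import struct

# Generic byte-order model. Only the bit position (bit 7) comes from the text;
# the helper names are hypothetical.
B_BIT = 1 << 7


def select_big_endian(control_value):
    """Return the control value with the B bit set."""
    return control_value | B_BIT


def is_big_endian(control_value):
    """True when the B bit is set in the control value."""
    return bool(control_value & B_BIT)


def byte_at(word, byte_offset, big_endian):
    """Byte found at byte_offset inside an aligned 32-bit word in memory."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, word)[byte_offset]


control = select_big_endian(0)
print(hex(byte_at(0x11223344, 0, is_big_endian(control))))  # big endian view
print(hex(byte_at(0x11223344, 0, False)))                   # little endian view
```

The same word yields a different byte at offset 0 in each mode, which is why the B bit has to be set before the first sub-word access.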

2.2.2 26-Bit Code

The Intel® 80200 processor does not support 26-bit code.

2.2.3 Thumb*

The Intel® 80200 processor supports the Thumb instruction set.


2.2.4 ARM* DSP-Enhanced Instruction Set

The Intel® 80200 processor implements the ARM DSP-enhanced instruction set, which is a set of instructions that boost the performance of signal processing applications. There are new multiply instructions that operate on 16-bit data values and new saturation instructions. Some of the new instructions are:

SMLAxy     32<=16x16+32
SMLAWy     32<=32x16+32
SMLALxy    64<=16x16+64
SMULxy     32<=16x16
SMULWy     32<=32x16
QADD       adds two registers and saturates the result if an overflow occurred
QDADD      doubles and saturates one of the input registers, then adds and saturates
QSUB       subtracts two registers and saturates the result if an overflow occurred
QDSUB      doubles and saturates one of the input registers, then subtracts and saturates
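The saturating operations listed above can be modeled on signed 32-bit values as follows. This is a sketch of the arithmetic only: the function names mirror the mnemonics for readability, the choice of the second argument as the doubled operand is an assumption, and status flag updates are not modeled.

```python
# Arithmetic model of the saturating instructions. Which operand is doubled
# (here the second argument) is an assumption; flags are not modeled.
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def sat32(x):
    """Clamp x to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def qadd(a, b):
    return sat32(a + b)


def qsub(a, b):
    return sat32(a - b)


def qdadd(a, b):
    return sat32(a + sat32(2 * b))   # double and saturate, then add and saturate


def qdsub(a, b):
    return sat32(a - sat32(2 * b))   # double and saturate, then subtract and saturate


print(qadd(INT32_MAX, 1) == INT32_MAX)
print(qdadd(0, 0x40000000) == INT32_MAX)
```

Without saturation, both results would wrap around to negative values.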

The Intel® 80200 processor also implements LDRD, STRD and PLD instructions with the following implementation notes:

PLD is interpreted as a read operation by the MMU and is ignored by the data breakpoint unit, i.e., PLD never generates data breakpoint events.

PLD to a non-cacheable page performs no action. Also, if the targeted cache line is already resident, this instruction has no effect.

Both LDRD and STRD instructions generate an alignment exception when the address bits [2:0] = 0b100.

MCRR and MRRC are only supported on the Intel® 80200 processor when directed to coprocessor 0 and are used to access the internal accumulator. See Section 2.3.1.2 for more information. Access to any coprocessor other than 0x0 is undefined.

2.2.5 Base Register Update

If a data abort is signalled on a memory instruction that specifies writeback, the base register is not updated. This holds for all load and store instructions. This behavior matches that of the first generation Intel® StrongARM* processor and is referred to in the ARM V5 architecture as the Base Restored Abort Model.


2.3 Extensions to ARM* Architecture

The Intel® 80200 processor made a few extensions to the ARM Version 5 architecture to meet the needs of various markets and design requirements. The following is a list of the extensions which are discussed in the next sections.

A DSP coprocessor (CP0) has been added that contains a 40-bit accumulator and new instructions.

New page attributes were added to the page table descriptors. The C and B page attribute encoding was extended by one more bit to allow for more encodings: write allocate and mini-data cache. An attribute specifying ECC for 1 MB regions was also added.

Additional functionality has been added to coprocessor 15. Coprocessor 14 was also created.

Enhancements were made to the Event Architecture, instruction cache and data cache parity error exceptions, breakpoint events, and imprecise external data aborts.

2.3.1 DSP Coprocessor 0 (CP0)

The Intel® 80200 processor adds a DSP coprocessor to the architecture for the purpose of increasing the performance and the precision of audio processing algorithms. This coprocessor contains a 40-bit accumulator and new instructions.

The 40-bit accumulator is referenced by several new instructions that were added to the architecture. MIA, MIAPH and MIAxy are multiply/accumulate instructions that reference the 40-bit accumulator instead of a register-specified accumulator. MAR and MRA provide the ability to write and read the 40-bit accumulator, respectively.

Access to CP0 is always allowed in all processor modes when bit 0 of the Coprocessor Access Register is set. Any access to CP0 when this bit is clear causes an undefined exception. (See Section 7.2.15, “Register 15: Coprocessor Access Register” on page 7-18 for more details). Note that only privileged software can set this bit in the Coprocessor Access Register.

The 40-bit accumulator needs to be saved on a context switch if multiple processes are using it.

Two new instruction formats were added for coprocessor 0: Multiply with Internal Accumulate Format and Internal Accumulate Access Format. The formats and instructions are described next.


2.3.1.1 Multiply With Internal Accumulate Format

A new multiply format has been created to define operations on 40-bit accumulators. Table 2-1, “Multiply with Internal Accumulate Format” on page 2-4 shows the layout of the new format. The opcode for this format lies within the coprocessor register transfer instruction type. These instructions have their own syntax.

Table 2-1. Multiply with Internal Accumulate Format

Bits:   31:28  27:24  23:21  20  19:16     15:12  11:8  7:5  4  3:0
Value:  cond   1110   001    0   opcode_3  Rs     0000  acc  1  Rm

Bits    Description                              Notes
31:28   cond - ARM condition codes               -
19:16   opcode_3 - specifies the type of         Intel® 80200 processor defines the following:
        multiply with internal accumulate          0b0000 = MIA
                                                   0b1000 = MIAPH
                                                   0b1100 = MIABB
                                                   0b1101 = MIABT
                                                   0b1110 = MIATB
                                                   0b1111 = MIATT
                                                 The effect of all other encodings is unpredictable.
15:12   Rs - Multiplier                          -
7:5     acc - select 1 of 8 accumulators         Intel® 80200 processor only implements acc0; access to any other acc has unpredictable effect.
3:0     Rm - Multiplicand                        -

Two new fields were created for this format, acc and opcode_3. The acc field specifies 1 of 8 internal accumulators to operate on and opcode_3 defines the operation for this format. The Intel® 80200 processor defines a single 40-bit accumulator referred to as acc0; future implementations may define multiple internal accumulators. The Intel® 80200 processor uses opcode_3 to define six instructions, MIA, MIAPH, MIABB, MIABT, MIATB and MIATT.

Table 2-2. MIA{<cond>} acc0, Rm, Rs

Bits:   31:28  27:24  23:21  20  19:16  15:12  11:8  7:5  4  3:0
Value:  cond   1110   001    0   0000   Rs     0000  000  1  Rm

Operation:   if ConditionPassed(<cond>) then
                 acc0 = (Rm[31:0] * Rs[31:0])[39:0] + acc0[39:0]

Exceptions:  none

Qualifiers:  Condition Code
                 No condition code flags are updated

Notes:       Early termination is supported. Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 for register Rs or Rm has unpredictable results.
             acc0 is defined to be 0b000 on 80200.

The MIA instruction operates similarly to MLA except that the 40-bit accumulator is used. MIA multiplies the signed value in register Rs (multiplier) by the signed value in register Rm (multiplicand) and then adds the result to the 40-bit accumulator (acc0).


MIA does not support unsigned multiplication; all values in Rs and Rm are interpreted as signed data values. MIA is useful for operating on signed 16-bit data that was loaded into a general purpose register by LDRSH.

The instruction is only executed if the condition specified in the instruction matches the condition code status.
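The MIA operation in Table 2-2 can be modeled on a host machine with a short C sketch. The names mia and acc_to_signed are hypothetical, the accumulator is held in the low 40 bits of a 64-bit integer as an assumption of this model, and condition-code evaluation is left out.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFull   /* low 40 bits */

/* Model of MIA: acc0 = (Rm * Rs)[39:0] + acc0[39:0], wrapped to 40 bits.
 * Both operands are treated as signed 32-bit values. */
uint64_t mia(uint64_t acc0, int32_t rm, int32_t rs)
{
    int64_t product = (int64_t)rm * (int64_t)rs;   /* signed 32x32 multiply */
    return (acc0 + (uint64_t)product) & ACC_MASK;  /* keep 40 bits */
}

/* Interpret a 40-bit accumulator value as a signed number. */
int64_t acc_to_signed(uint64_t acc0)
{
    acc0 &= ACC_MASK;
    if (acc0 & 0x8000000000ull)                    /* bit 39 is the sign bit */
        return (int64_t)acc0 - ((int64_t)1 << 40);
    return (int64_t)acc0;
}
```

In this model, mia(0, -2, 3) leaves 0xFFFFFFFFFA in the accumulator, which acc_to_signed reads back as -6.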

Table 2-3. MIAPH{<cond>} acc0, Rm, Rs

Bits:   31:28  27:24  23:21  20  19:16  15:12  11:8  7:5  4  3:0
Value:  cond   1110   001    0   1000   Rs     0000  000  1  Rm

Operation:   if ConditionPassed(<cond>) then
                 acc0 = sign_extend(Rm[31:16] * Rs[31:16]) +
                        sign_extend(Rm[15:0] * Rs[15:0]) +
                        acc0[39:0]

Exceptions:  none

Qualifiers:  Condition Code
                 S bit is always cleared; no condition code flags are updated

Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 for register Rs or Rm has unpredictable results.
             acc0 is defined to be 0b000 on 80200.

The MIAPH instruction performs two 16-bit signed multiplies on packed half word data and accumulates these to a single 40-bit accumulator. The first signed multiplication is performed on the lower 16 bits of the value in register Rs with the lower 16 bits of the value in register Rm. The second signed multiplication is performed on the upper 16 bits of the value in register Rs with the upper 16 bits of the value in register Rm. Both signed 32-bit products are sign extended and then added to the value in the 40-bit accumulator (acc0).

The instruction is only executed if the condition specified in the instruction matches the condition code status.
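A host-side C sketch of the MIAPH operation may help when checking packed half word arithmetic by hand. The names lo16, hi16 and miaph are hypothetical, and the accumulator is again assumed to live in the low 40 bits of a 64-bit integer.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFull   /* low 40 bits */

/* Signed view of the lower and upper half words of a register. */
int16_t lo16(uint32_t r) { return (int16_t)(uint16_t)(r & 0xFFFFu); }
int16_t hi16(uint32_t r) { return (int16_t)(uint16_t)(r >> 16); }

/* Model of MIAPH: two signed 16x16 products, each sign extended,
 * added to the 40-bit accumulator. */
uint64_t miaph(uint64_t acc0, uint32_t rm, uint32_t rs)
{
    int64_t p_hi = (int64_t)hi16(rm) * (int64_t)hi16(rs);
    int64_t p_lo = (int64_t)lo16(rm) * (int64_t)lo16(rs);
    return (acc0 + (uint64_t)p_hi + (uint64_t)p_lo) & ACC_MASK;
}
```

With Rm = 0x0002FFFF (upper half 2, lower half -1) and Rs = 0x00030004 (upper half 3, lower half 4), the two products are 6 and -4, so an accumulator that starts at zero ends at 2.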


Table 2-4. MIAxy{<cond>} acc0, Rm, Rs

Bits:   31:28  27:24  23:21  20  19:16    15:12  11:8  7:5  4  3:0
Value:  cond   1110   001    0   1 1 x y  Rs     0000  000  1  Rm

Operation:   if ConditionPassed(<cond>) then
                 if (bit[17] == 0)
                     <operand1> = Rm[15:0]
                 else
                     <operand1> = Rm[31:16]
                 if (bit[16] == 0)
                     <operand2> = Rs[15:0]
                 else
                     <operand2> = Rs[31:16]
                 acc0[39:0] = sign_extend(<operand1> * <operand2>) + acc0[39:0]

Exceptions:  none

Qualifiers:  Condition Code
                 S bit is always cleared; no condition code flags are updated

Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 for register Rs or Rm has unpredictable results.
             acc0 is defined to be 0b000 on 80200.

The MIAxy instruction performs one 16-bit signed multiply and accumulates the product into the 40-bit accumulator. x refers to either the upper half or lower half of register Rm (multiplicand) and y refers to the upper or lower half of Rs (multiplier). A value of 0x1 selects bits [31:16] of the register, which is specified in the mnemonic as T (for top). A value of 0x0 selects bits [15:0] of the register, which is specified in the mnemonic as B (for bottom).

MIAxy does not support unsigned multiplication; all values in Rs and Rm are interpreted as signed data values.

The instruction is only executed if the condition specified in the instruction matches the condition code status.
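The half word selection performed by MIAxy can be sketched in C as below. The function names half and miaxy are hypothetical; x and y stand for opcode bits 17 and 16 from Table 2-4, and the 40-bit accumulator is modeled in the low bits of a 64-bit integer.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFull   /* low 40 bits */

/* Select the top (sel = 1) or bottom (sel = 0) half word as a signed value. */
int16_t half(uint32_t r, int sel)
{
    return (int16_t)(uint16_t)(sel ? (r >> 16) : (r & 0xFFFFu));
}

/* Model of MIAxy: x selects the half of Rm, y selects the half of Rs.
 * The signed 16x16 product is added to the 40-bit accumulator. */
uint64_t miaxy(uint64_t acc0, uint32_t rm, uint32_t rs, int x, int y)
{
    int64_t product = (int64_t)half(rm, x) * (int64_t)half(rs, y);
    return (acc0 + (uint64_t)product) & ACC_MASK;
}
```

In this model MIATT corresponds to x = 1, y = 1 and MIABB to x = 0, y = 0. For Rm = 0x00058000 and Rs = 0x0007FFFE, MIATT adds 5 * 7 = 35 and MIABB adds (-32768) * (-2) = 65536.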


2.3.1.2 Internal Accumulator Access Format

The Intel® 80200 processor defines a new instruction format for accessing internal accumulators in CP0. Table 2-5, “Internal Accumulator Access Format” on page 2-7 shows that the opcode falls into the coprocessor register transfer space.

The RdHi and RdLo fields allow up to 64 bits of data transfer between Intel® StrongARM* registers and an internal accumulator. The acc field specifies 1 of 8 internal accumulators to transfer data to/from. The Intel® 80200 processor implements a single 40-bit accumulator referred to as acc0; future implementations can specify multiple internal accumulators of varying sizes, up to 64 bits.

Access to the internal accumulator is allowed in all processor modes (user and privileged) as long as bit 0 of the Coprocessor Access Register is set. (See Section 7.2.15, “Register 15: Coprocessor Access Register” on page 7-18 for more details.)

The Intel® 80200 processor implements two instructions, MAR and MRA, that move two Intel® StrongARM* registers to acc0 and move acc0 to two Intel® StrongARM* registers, respectively.

Table 2-5. Internal Accumulator Access Format

Bits:   31:28  27:21    20  19:16  15:12  11:3       2:0
Value:  cond   1100010  L   RdHi   RdLo   000000000  acc

Bits    Description                              Notes
31:28   cond - ARM condition codes               -
20      L - move to/from internal accumulator    -
          0 = move to internal accumulator (MAR)
          1 = move from internal accumulator (MRA)
19:16   RdHi - specifies the high order eight    On a read of the acc, this 8-bit high order field is sign extended.
        (39:32) bits of the internal             On a write to the acc, the lower 8 bits of this register are written to acc[39:32].
        accumulator.
15:12   RdLo - specifies the low order 32 bits   -
        of the internal accumulator
7:4     Should be zero                           This field could be used in future implementations to specify the type of saturation to perform on the read of an internal accumulator (e.g., a signed saturation to 16 bits may be useful for some filter algorithms).
3       Should be zero                           -
2:0     acc - specifies 1 of 8 internal          Intel® 80200 processor only implements acc0; access to any other acc is unpredictable.
        accumulators

Note: MAR has the same encoding as MCRR (to coprocessor 0) and MRA has the same encoding as MRRC (to coprocessor 0). These instructions move 64 bits of data to/from ARM registers from/to coprocessor registers. MCRR and MRRC are defined in ARM’s DSP instruction set.

Disassemblers not aware of MAR and MRA produce the following syntax:

MCRR{<cond>} p0, 0x0, RdLo, RdHi, c0

MRRC{<cond>} p0, 0x0, RdLo, RdHi, c0
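The bit layout in Table 2-5 can be turned into instruction words with a few shifts. The following C sketch is only an illustration: the function names are hypothetical, acc is fixed at acc0 (0b000), and the condition value 0xE used in the note below is the standard ARM "always" condition.

```c
#include <stdint.h>

/* Build an Internal Accumulator Access word for acc0 from Table 2-5.
 * load = 0 gives MAR (move to accumulator), load = 1 gives MRA. */
uint32_t encode_acc_access(uint32_t cond, int load, uint32_t rdhi, uint32_t rdlo)
{
    return ((cond & 0xFu) << 28)        /* bits 31:28: condition        */
         | (0xC4u << 20)                /* bits 27:20: 1100 0100, L = 0 */
         | ((load ? 1u : 0u) << 20)     /* bit 20: L                    */
         | ((rdhi & 0xFu) << 16)        /* bits 19:16: RdHi             */
         | ((rdlo & 0xFu) << 12);       /* bits 15:12: RdLo; 11:0 zero  */
}

/* MAR{<cond>} acc0, RdLo, RdHi */
uint32_t encode_mar(uint32_t cond, uint32_t rdlo, uint32_t rdhi)
{
    return encode_acc_access(cond, 0, rdhi, rdlo);
}

/* MRA{<cond>} RdLo, RdHi, acc0 */
uint32_t encode_mra(uint32_t cond, uint32_t rdlo, uint32_t rdhi)
{
    return encode_acc_access(cond, 1, rdhi, rdlo);
}
```

Under these assumptions, encode_mar(0xE, 0, 1) yields 0xEC410000 and encode_mra(0xE, 0, 1) yields 0xEC510000, which follows directly from the field positions in the table.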


Table 2-6. MAR{<cond>} acc0, RdLo, RdHi

Bits:   31:28  27:20     19:16  15:12  11:0
Value:  cond   11000100  RdHi   RdLo   000000000000

Operation:   if ConditionPassed(<cond>) then
                 acc0[39:32] = RdHi[7:0]
                 acc0[31:0] = RdLo[31:0]

Exceptions:  none

Qualifiers:  Condition Code
                 No condition code flags are updated

Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying R15 as either RdHi or RdLo has unpredictable results.

The MAR instruction moves the value in register RdLo to bits[31:0] of the 40-bit accumulator (acc0) and moves bits[7:0] of the value in register RdHi into bits[39:32] of acc0.

The instruction is only executed if the condition specified in the instruction matches the condition code status.

This instruction executes in any processor mode.

Table 2-7. MRA{<cond>} RdLo, RdHi, acc0

Bits:   31:28  27:20     19:16  15:12  11:0
Value:  cond   11000101  RdHi   RdLo   000000000000

Operation:   if ConditionPassed(<cond>) then
                 RdHi[31:0] = sign_extend(acc0[39:32])
                 RdLo[31:0] = acc0[31:0]

Exceptions:  none

Qualifiers:  Condition Code
                 No condition code flags are updated

Notes:       Instruction timings can be found in Section 14.4.4, “Multiply Instruction Timings” on page 14-6.
             Specifying the same register for RdHi and RdLo has unpredictable results.
             Specifying R15 as either RdHi or RdLo has unpredictable results.

The MRA instruction moves the 40-bit accumulator value (acc0) into two registers. Bits[31:0] of the value in acc0 are moved into the register RdLo. Bits[39:32] of the value in acc0 are sign extended to 32 bits and moved into the register RdHi.

The instruction is only executed if the condition specified in the instruction matches the condition code status.

This instruction executes in any processor mode.
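The MRA and MAR operations are what system software would use to save and restore acc0 across a context switch. The C sketch below models that round trip on a host machine. The names mra and mar mirror the mnemonics but are ordinary hypothetical functions, and the accumulator is assumed to occupy the low 40 bits of a 64-bit integer.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFull   /* low 40 bits */

/* Model of MRA: RdLo = acc0[31:0], RdHi = sign_extend(acc0[39:32]). */
void mra(uint64_t acc0, uint32_t *rdlo, uint32_t *rdhi)
{
    uint32_t top = (uint32_t)((acc0 >> 32) & 0xFFu);
    *rdlo = (uint32_t)(acc0 & 0xFFFFFFFFu);
    *rdhi = (top & 0x80u) ? (top | 0xFFFFFF00u) : top;   /* 8 -> 32 bits */
}

/* Model of MAR: acc0[39:32] = RdHi[7:0], acc0[31:0] = RdLo[31:0]. */
uint64_t mar(uint32_t rdlo, uint32_t rdhi)
{
    return (((uint64_t)(rdhi & 0xFFu)) << 32) | (uint64_t)rdlo;
}
```

Because MAR only consumes the low 8 bits of RdHi, a value saved with MRA and later written back with MAR reproduces the original 40-bit accumulator, even though RdHi held a sign-extended copy in between.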


2.3.2 New Page Attributes

The Intel® 80200 processor extends the page attributes defined by the C and B bits in the page descriptors with an additional X bit. This bit allows four more attributes to be encoded when X=1. These new encodings include allocating data for the mini-data cache and write-allocate caching. A full description of the encodings can be found in Section 3.2.2, “Memory Attributes” on page 3-2.

The Intel® 80200 processor retains the ARM definitions of the C and B encoding when X = 0, which differs from the first generation Intel® StrongARM* products. The memory attribute for the mini-data cache has been moved and replaced with the write-through caching attribute.

When write-allocate is enabled, a store operation that misses the data cache (cacheable data only) generates a line fill. If disabled, a line fill only occurs when a load operation misses the data cache (cacheable data only).
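The line fill rule in the preceding paragraph reduces to a small predicate. The C sketch below is a simplified model for reasoning about it; the function name and parameters are hypothetical, and it ignores other cache state such as locked lines or the mini-data cache.

```c
#include <stdbool.h>

/* Returns true if an access generates a data cache line fill under the
 * rule stated in the text: load misses to cacheable data always fill,
 * store misses to cacheable data fill only when write-allocate is enabled. */
bool causes_line_fill(bool is_store, bool cache_hit, bool cacheable,
                      bool write_allocate)
{
    if (cache_hit || !cacheable)
        return false;               /* only misses to cacheable data fill */
    return is_store ? write_allocate : true;
}
```

For example, a store miss to cacheable data fills a line only when write_allocate is true, while a load miss to cacheable data fills a line in either case.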

Write-through caching causes all store operations to be written to memory, whether they are cacheable or not cacheable. This feature is useful for maintaining data cache coherency.

The Intel® 80200 processor also added a P bit in the first level descriptors to identify which pages of memory are protected with ECC.

A descriptor with the P bit set indicates the corresponding page in memory is ECC protected. If the BCU’s ECC mode is enabled (see Chapter 11, “Bus Controller”), then writes to such a page are accompanied by an ECC and reads are validated by an ECC.

Bit 1 in the Control Register (coprocessor 15, register 1, opcode=1) enables ECC protection for memory accesses made during page table walks.

These attributes are programmed in the translation table descriptors, which are highlighted in Table 2-8, “First-level Descriptors” on page 2-10, Table 2-9, “Second-level Descriptors for Coarse Page Table” on page 2-10 and Table 2-10, “Second-level Descriptors for Fine Page Table” on page 2-10. Two second-level descriptor formats have been defined for the Intel® 80200 processor: one is used for the coarse page table and the other is used for the fine page table.


Table 2-8. First-level Descriptors

Each row lists the descriptor fields from bit 31 (left) to bit 0 (right).

SBZ | 0 0
Coarse page table base address | P | Domain | SBZ | 0 1
Section base address | SBZ | TEX | AP | P | Domain | 0 | C | B | 1 0
Fine page table base address | SBZ | P | Domain | SBZ | 1 1

Table 2-9. Second-level Descriptors for Coarse Page Table

Each row lists the descriptor fields from bit 31 (left) to bit 0 (right).

SBZ | 0 0
Large page base address | TEX | AP3 | AP2 | AP1 | AP0 | C | B | 0 1
Small page base address | AP3 | AP2 | AP1 | AP0 | C | B | 1 0
Extended small page base address | SBZ | TEX | AP | C | B | 1 1

Table 2-10. Second-level Descriptors for Fine Page Table

Each row lists the descriptor fields from bit 31 (left) to bit 0 (right).

SBZ | 0 0
Large page base address | TEX | AP3 | AP2 | AP1 | AP0 | C | B | 0 1
Small page base address | AP3 | AP2 | AP1 | AP0 | C | B | 1 0
Tiny Page Base Address | TEX | AP | C | B | 1 1

The P bit controls ECC.

The TEX (Type Extension) field is present in several of the descriptor types. In the Intel® 80200 processor, only the LSB of this field is used; this is called the X bit.

A Small Page descriptor does not have a TEX field. For these descriptors, TEX is implicitly zero; that is, they operate as if the X bit had a ‘0’ value.

The X bit, when set, modifies the meaning of the C and B bits. Description of page attributes and their encoding can be found in Chapter 3, “Memory Management”.


2.3.3 Additions to CP15 Functionality

To accommodate the functionality in the Intel® 80200 processor, registers in CP15 and CP14 have been added or augmented. See Chapter 7, “Configuration” for details.

At times it is necessary to be able to guarantee exactly when a CP15 update takes effect. For example, when enabling memory address translation (turning on the MMU), it is vital to know when the MMU is actually guaranteed to be in operation. To address this need, a processor-specific code sequence is defined for each Intel® StrongARM* processor. For the Intel® 80200 processor, the sequence -- called CPWAIT -- is shown in Example 2-1 on page 2-11.

Example 2-1. CPWAIT: Canonical method to wait for CP15 update

;; The following macro should be used when software needs to be
;; assured that a CP15 update has taken effect.
;; It may only be used while in a privileged mode, because it
;; accesses CP15.

MACRO CPWAIT
    MRC  P15, 0, R0, C2, C0, 0   ; arbitrary read of CP15
    MOV  R0, R0                  ; wait for it
    SUB  PC, PC, #4              ; branch to next instruction

    ; At this point, any previous CP15 writes are
    ; guaranteed to have taken effect.
ENDM

When setting multiple CP15 registers, system software may opt to delay the assurance of their update. This is accomplished by emitting CPWAIT only after the sequence of MCR instructions.

The CPWAIT sequence guarantees that CP15 side-effects are complete by the time the CPWAIT is complete. It is possible, however, that the CP15 side-effect takes place before CPWAIT completes or is issued. Programmers should take care that this does not affect the correctness of their code.


2.3.4 Event Architecture

2.3.4.1 Exception Summary

Table 2-11 shows all the exceptions that the Intel® 80200 processor may generate, and the attributes of each. Subsequent sections give details on each exception.

Table 2-11. Exception Summary

Exception Description      Exception Type (a)      Precise?  Updates FAR?
Reset                      Reset                   N         N
FIQ                        FIQ                     N         N
IRQ                        IRQ                     N         N
External Instruction       Prefetch                Y         N
Instruction MMU            Prefetch                Y         N
Instruction Cache Parity   Prefetch                Y         N
Lock Abort                 Data                    Y         N
MMU Data                   Data                    Y         Y
External Data              Data                    N         N
Data Cache Parity          Data                    N         N
Software Interrupt         Software Interrupt      Y         N
Undefined Instruction      Undefined Instruction   Y         N
Debug Events (b)           varies                  varies    N

a. Exception types are those described in the ARM, section 2.5.
b. Refer to Chapter 13, “Software Debug” for more details.

2.3.4.2 Event Priority

The Intel® 80200 processor follows the exception priority specified in the ARM Architecture Reference Manual. The processor has additional exceptions that might be generated while debugging. For information on these debug exceptions, see Chapter 13, “Software Debug”.

Table 2-12. Event Priority

Exception                         Priority
--------------------------------  -----------
Reset                             1 (Highest)
Data Abort (Precise & Imprecise)  2
FIQ                               3
IRQ                               4
Prefetch Abort                    5
Undefined Instruction, SWI        6 (Lowest)


2.3.4.3 Prefetch Aborts

The Intel® 80200 processor detects three types of prefetch aborts: Instruction MMU abort, external abort on an instruction access, and an instruction cache parity error. These aborts are described in Table 2-13.

When a prefetch abort occurs, hardware reports the highest priority one in the extended Status field of the Fault Status Register. The value placed in R14_ABORT (the link register in abort mode) is the address of the aborted instruction + 4.

Table 2-13. Intel® 80200 Processor Encoding of Fault Status for Prefetch Aborts

Priority  Sources                                   FS[10,3:0] (a)  Domain   FAR
--------  ----------------------------------------  --------------  -------  -------
Highest   Instruction MMU Exception                 0b10000         invalid  invalid
          Several exceptions can generate this
          encoding:
          - translation faults
          - domain faults, and
          - permission faults
          It is up to software to figure out
          which one occurred.
          External Instruction Error Exception      0b10110         invalid  invalid
          This exception occurs when the external
          memory system reports an error on an
          instruction cache fetch.
Lowest    Instruction Cache Parity Error Exception  0b11000         invalid  invalid

a. All other encodings not listed in the table are reserved.
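The encodings in Table 2-13 could be checked by a prefetch abort handler along the following lines. This is a minimal C sketch, not code from this manual: the enum, the function names, and the way the raw Fault Status Register value reaches the function are assumptions made for illustration. It assumes that FS[10,3:0] is formed from bit 10 and bits 3:0 of the register value, as the column heading suggests.

```c
#include <stdint.h>

/* Hypothetical names; only the encodings come from Table 2-13. */
enum prefetch_fault {
    PF_INSTRUCTION_MMU,      /* FS[10,3:0] = 0b10000 */
    PF_EXTERNAL_INSTRUCTION, /* FS[10,3:0] = 0b10110 */
    PF_ICACHE_PARITY,        /* FS[10,3:0] = 0b11000 */
    PF_RESERVED              /* any other encoding */
};

/* Assemble the 5-bit status value from bit 10 and bits 3:0 (assumed layout). */
static unsigned prefetch_fsr_status(uint32_t fsr)
{
    return (unsigned)((((fsr >> 10) & 1u) << 4) | (fsr & 0xFu));
}

static enum prefetch_fault decode_prefetch_fault(uint32_t fsr)
{
    switch (prefetch_fsr_status(fsr)) {
    case 0x10: return PF_INSTRUCTION_MMU;
    case 0x16: return PF_EXTERNAL_INSTRUCTION;
    case 0x18: return PF_ICACHE_PARITY;
    default:   return PF_RESERVED;
    }
}

/* R14_ABORT holds the address of the aborted instruction + 4. */
static uint32_t prefetch_abort_address(uint32_t r14_abort)
{
    return r14_abort - 4u;
}
```

Because an Instruction MMU Exception covers translation, domain, and permission faults with one encoding, a handler built on this sketch would still have to examine the page tables to tell them apart.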


2.3.4.4 Data Aborts

Two types of data aborts exist in the Intel® 80200 processor: precise and imprecise. A precise data abort is defined as one where R14_ABORT always contains the PC (+8) of the instruction that caused the exception. An imprecise abort is one where R14_ABORT contains the PC (+4) of the next instruction to execute and not the address of the instruction that caused the abort. In other words, instruction execution has advanced beyond the instruction that caused the data abort.

On the Intel® 80200 processor precise data aborts are recoverable and imprecise data aborts are not recoverable.

Precise Data Aborts

A lock abort is a precise data abort; the extended Status field of the Fault Status Register is set to 0b10100. This abort occurs when a lock operation directed to the MMU (instruction or data) or instruction cache causes an exception, due to either a translation fault, access permission fault or external bus fault.

The Fault Address Register is undefined and R14_ABORT is the address of the aborted instruction + 8.

A data MMU abort is precise. These are due to an alignment fault, translation fault, domain fault, permission fault or external data abort on an MMU translation. The status field is set to a predetermined ARM definition which is shown in Table 2-14, “Intel® 80200 Processor Encoding of Fault Status for Data Aborts” on page 2-14.

The Fault Address Register is set to the effective data address of the instruction and R14_ABORT is the address of the aborted instruction + 8.

Table 2-14. Intel® 80200 Processor Encoding of Fault Status for Data Aborts

Priority  Sources                                      FS[10,3:0] (a)  Domain   FAR
--------  -------------------------------------------  --------------  -------  -------
Highest   Alignment                                    0b000x1         invalid  valid
          External Abort on Translation  First level   0b01100         invalid  valid
                                         Second level  0b01110         valid    valid
          Translation                    Section       0b00101         invalid  valid
                                         Page          0b00111         valid    valid
          Domain                         Section       0b01001         valid    valid
                                         Page          0b01011         valid    valid
          Permission                     Section       0b01101         valid    valid
                                         Page          0b01111         valid    valid
          Lock Abort                                   0b10100         invalid  invalid
          This data abort occurs on an MMU lock
          operation (data or instruction TLB) or on
          an Instruction Cache lock operation.
          Imprecise External Data Abort                0b10110         invalid  invalid
Lowest    Data Cache Parity Error Exception            0b11000         invalid  invalid

a. All other encodings not listed in the table are reserved.
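A data abort handler needs to know, for each encoding in Table 2-14, whether the Domain field and the Fault Address Register can be trusted. The following C sketch shows one way to tabulate that. The structure, the function names, and the assumption that FS[10,3:0] is bit 10 followed by bits 3:0 of the raw register value are illustrative and do not come from this manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; field meanings follow the columns of Table 2-14. */
struct data_fault_info {
    const char *source;   /* text of the Sources column */
    bool domain_valid;    /* Domain column */
    bool far_valid;       /* FAR column */
};

/* Assemble the 5-bit status value from bit 10 and bits 3:0 (assumed layout). */
static unsigned data_fsr_status(uint32_t fsr)
{
    return (unsigned)((((fsr >> 10) & 1u) << 4) | (fsr & 0xFu));
}

static struct data_fault_info decode_data_fault(uint32_t fsr)
{
    unsigned fs = data_fsr_status(fsr);

    /* Alignment is 0b000x1: bit 1 is a don't-care. */
    if ((fs & 0x1Du) == 0x01u)
        return (struct data_fault_info){ "Alignment", false, true };

    switch (fs) {
    case 0x0C: return (struct data_fault_info){ "External Abort on Translation, first level", false, true };
    case 0x0E: return (struct data_fault_info){ "External Abort on Translation, second level", true, true };
    case 0x05: return (struct data_fault_info){ "Translation, section", false, true };
    case 0x07: return (struct data_fault_info){ "Translation, page", true, true };
    case 0x09: return (struct data_fault_info){ "Domain, section", true, true };
    case 0x0B: return (struct data_fault_info){ "Domain, page", true, true };
    case 0x0D: return (struct data_fault_info){ "Permission, section", true, true };
    case 0x0F: return (struct data_fault_info){ "Permission, page", true, true };
    case 0x14: return (struct data_fault_info){ "Lock Abort", false, false };
    case 0x16: return (struct data_fault_info){ "Imprecise External Data Abort", false, false };
    case 0x18: return (struct data_fault_info){ "Data Cache Parity Error", false, false };
    default:   return (struct data_fault_info){ "Reserved", false, false };
    }
}
```

A handler using this sketch would read the Fault Address Register only when `far_valid` is true, since the register is undefined for lock aborts and for imprecise aborts.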


Imprecise Data Aborts

A data cache parity error is imprecise; the extended Status field of the Fault Status Register is set to 0b11000.

All external data aborts except for those generated on a data MMU translation are imprecise.

The Fault Address Register for all imprecise data aborts is undefined and R14_ABORT is the address of the next instruction to execute + 4, which is the same for both ARM and Thumb mode.

The Intel® 80200 processor generates external data aborts on multi-bit ECC errors and when the Abort pin is asserted on memory transactions. (See Chapter 11, “Bus Controller” for more details.) An external data abort can occur on non-cacheable loads, reads into the cache, cache evictions, or stores to external memory.

Although the Intel® 80200 processor guarantees the Base Restored Abort Model for precise aborts, it cannot do so in the case of imprecise aborts. A Data Abort handler may encounter an updated base register if it is invoked because of an imprecise abort.

Imprecise data aborts may create scenarios that are difficult for an abort handler to recover. Both external data aborts and data cache parity errors may result in corrupted data in the targeted registers. Because these faults are imprecise, it is possible that the corrupted data has been used before the Data Abort fault handler is invoked. Because of this, software should treat imprecise data aborts as unrecoverable.

Note that even memory accesses marked as “stall until complete” (see Section 3.2.2.4) can result in imprecise data aborts. For these types of accesses, the fault is somewhat less imprecise than the general case: it is guaranteed to be raised within three instructions of the instruction that caused it. In other words, if a “stall until complete” LD or ST instruction triggers an imprecise fault, then that fault is seen by the program within three instructions.

With this knowledge, it is possible to write code that accesses “stall until complete” memory with impunity. Simply place several NOP instructions after such an access. If an imprecise fault occurs, it happens during the NOPs; the data abort handler sees identical register and memory state as it would with a precise exception, and so should be able to recover. An example of this is shown in Example 2-2 on page 2-15.

Example 2-2. Shielding Code from Potential Imprecise Aborts

;; Example of code that maintains architectural state through the
;; window where an imprecise fault might occur.
LDR R0, [R1]   ; R1 points to stall-until-complete
               ; region of memory
NOP
NOP
NOP
; Code beyond this point is guaranteed not to see any aborts
; from the load.

Of course, if a system design precludes events that could cause external aborts, then such precautions are not necessary.


Multiple Data Aborts

Multiple data aborts may be detected by hardware, but only the highest priority one is reported. If the reported data abort is precise, software can correct the cause of the abort and re-execute the aborted instruction. If the lower priority abort still exists, it is reported. Software can handle each abort separately until the instruction successfully executes.

If the reported data abort is imprecise, software needs to check the SPSR to see if the previous context was executing in abort mode. If this is the case, the link back to the current process has been lost and the data abort is unrecoverable.
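The SPSR check described for imprecise aborts amounts to comparing the saved mode field with the abort-mode encoding. The C sketch below is an illustration under stated assumptions: the macro and function names are hypothetical, and the SPSR value is assumed to have been read by the handler already. The mode field in bits 4:0 and the abort-mode value 0b10111 are the standard ARM program status register encodings.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_MODE_MASK  0x1Fu  /* mode field, bits 4:0 */
#define PSR_MODE_ABORT 0x17u  /* 0b10111: abort mode */

/*
 * Returns true when the interrupted context was itself running in abort
 * mode, in which case the link back to the current process has been lost
 * and the imprecise data abort is unrecoverable.
 */
static bool imprecise_abort_lost_link(uint32_t spsr)
{
    return (spsr & PSR_MODE_MASK) == PSR_MODE_ABORT;
}
```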

2.3.4.5 Events from Preload Instructions

A PLD instruction never causes the Data MMU to fault for any of the following reasons:

Domain Fault

Permission Fault

Translation Fault

If execution of the PLD would cause one of the above faults, then the PLD causes no effect.

This feature allows software to issue PLDs speculatively. For example, Example 2-3 on page 2-16 places a PLD instruction early in the loop. This PLD is used to fetch data for the next loop iteration. In this example, the list is terminated with a node that has a null pointer. When execution reaches the end of the list, the PLD on address 0x0 does not cause a fault. Rather, it is ignored and the loop terminates normally.

Example 2-3. Speculatively issuing PLD

;; R0 points to a node in a linked list. A node has the following layout:
;; Offset  Contents
;; ----------------------------------
;; 0       data
;; 4       pointer to next node
;; This code computes the sum of all nodes in a list. The sum is placed into R9.
        MOV  R9, #0         ; Clear accumulator
sumList:
        LDR  R1, [R0, #4]   ; R1 gets pointer to next node
        LDR  R3, [R0]       ; R3 gets data from current node
        PLD  [R1]           ; Speculatively start load of next node
        ADD  R9, R9, R3     ; Add into accumulator
        MOVS R0, R1         ; Advance to next node. At end of list?
        BNE  sumList        ; If not then loop
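The same speculative-preload pattern can be written in C. The sketch below is not from this manual; it assumes a GCC or Clang toolchain, whose `__builtin_prefetch` is a hint that does not fault on an invalid address. Whether the compiler actually emits a PLD instruction depends on the target options, so treat that as an assumption. The structure and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Node layout mirrors the assembly example: data first, then the next pointer. */
struct node {
    uint32_t data;
    struct node *next;
};

/* Sum all nodes, prefetching the next node before using the current one. */
static uint32_t sum_list(const struct node *n)
{
    uint32_t sum = 0;

    while (n != NULL) {
        const struct node *next = n->next;

        /* Hint only: at the end of the list this prefetches address 0,
         * which is ignored rather than faulting. */
        __builtin_prefetch(next);

        sum += n->data;
        n = next;
    }
    return sum;
}
```

Unlike the assembly loop, this version also tolerates an empty list, because the pointer is tested before the first load.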

2.3.4.6 Debug Events

Debug events are covered in Section 13.5, “Debug Exceptions” on page 13-6.


3 Memory Management

This chapter describes the memory management unit implemented in the Intel® 80200 processor based on Intel® XScale microarchitecture (compliant with the ARM* Architecture V5TE).

3.1 Overview

The Intel® 80200 processor implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. To accelerate virtual to physical address translation, the Intel® 80200 processor uses both an instruction Translation Look-aside Buffer (TLB) and a data TLB to cache the latest translations. Each TLB holds 32 entries and is fully-associative. Not only do the TLBs contain the translated addresses, but also the access rights for memory references.

If an instruction or data TLB miss occurs, a hardware translation-table-walking mechanism is invoked to translate the virtual address to a physical address. Once translated, the physical address is placed in the TLB along with the access rights and attributes of the page or section. These translations can also be locked down in either TLB to guarantee the performance of critical routines.

The Intel® 80200 processor allows system software to associate various attributes with regions of memory:

cacheable

bufferable

line allocate policy

write policy

I/O

mini Data Cache

Coalescing

ECC-Protected

See Section 3.2.2, “Memory Attributes” on page 3-2 for a description of page attributes and Section 2.3.2, “New Page Attributes” on page 2-9 to find out where these attributes have been mapped in the MMU descriptors.

Note: The virtual address with which the TLBs are accessed may be remapped by the PID register. See Section 7.2.13, “Register 13: Process ID” on page 7-16 for a description of the PID register.


3.2 Architecture Model

3.2.1 Version 4 vs. Version 5

ARM* MMU Version 5 Architecture introduces the support of tiny pages, which are 1 KByte in size. The reserved field in the first-level descriptor (encoding 0b11) is used as the fine page table base address. The exact bit fields and the format of the first and second-level descriptors can be found in Section 2.3.2, “New Page Attributes” on page 2-9.

3.2.2 Memory Attributes

The attributes associated with a particular region of memory are configured in the memory management page table and control the behavior of accesses to the instruction cache, data cache, mini-data cache and the write buffer. These attributes are ignored when the MMU is disabled.

To allow compatibility with older system software, the new Intel® 80200 processor attributes take advantage of encoding space in the descriptors that was formerly reserved.

3.2.2.1 Page (P) Attribute Bit

The P bit specifies that the associated memory should be protected with ECC. The P bit is only present in the first level descriptors. Thus, ECC memory is specified with a 1 megabyte granularity.

If the MMU is disabled, ECC is disabled for all memory accesses. If the MMU is enabled, ECC is enabled for a region of memory if:

its P bit in the first level descriptor for that virtual memory is set and

the BCU has ECC enabled (see Chapter 11, “Bus Controller”)

Accesses to memory for page walks do not use the MMU. For these accesses, ECC is enabled if:

the CP15 Auxiliary Control Register enables it (see Section 7.2.2, “Register 1: Control and Auxiliary Control Registers” on page 7-7) and

the BCU has ECC enabled (see Chapter 11, “Bus Controller”)
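The two sets of conditions for ECC can be summarized as boolean predicates. This C sketch is illustrative only: the function and parameter names are hypothetical, and each argument stands for a piece of state (MMU enable, descriptor P bit, BCU setting, Auxiliary Control Register setting) that real software would read from the corresponding register or descriptor.

```c
#include <stdbool.h>

/* ECC for an ordinary memory access: requires the MMU to be enabled,
 * the P bit set in the first-level descriptor, and ECC enabled in the BCU. */
static bool ecc_enabled_for_access(bool mmu_enabled, bool p_bit,
                                   bool bcu_ecc_enabled)
{
    return mmu_enabled && p_bit && bcu_ecc_enabled;
}

/* ECC for a page-table walk: these accesses do not use the MMU, so the
 * Auxiliary Control Register setting takes the place of the P bit. */
static bool ecc_enabled_for_page_walk(bool aux_ctrl_enables_ecc,
                                      bool bcu_ecc_enabled)
{
    return aux_ctrl_enables_ecc && bcu_ecc_enabled;
}
```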

3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits

3.2.2.3 Instruction Cache

When examining these bits in a descriptor, the Instruction Cache only utilizes the C bit. If the C bit is clear, the Instruction Cache considers a code fetch from that memory to be non-cacheable, and does not fill a cache entry. If the C bit is set, then fetches from the associated memory region are cached.


3.2.2.4 Data Cache and Write Buffer

All of these descriptor bits affect the behavior of the Data Cache and the Write Buffer.

If the X bit for a descriptor is zero, the C and B bits operate as mandated by the ARM architecture. This behavior is detailed in Table 3-1.

If the X bit for a descriptor is one, the C and B bits’ meaning is extended, as detailed in Table 3-2.

Table 3-1. Data Cache and Buffer Behavior when X = 0

C B  Cacheable?  Bufferable?  Write Policy   Line Allocation Policy  Notes
---  ----------  -----------  -------------  ----------------------  ------------------------
0 0  N           N            -              -                       Stall until complete (a)
0 1  N           Y            -              -
1 0  Y           Y            Write Through  Read Allocate
1 1  Y           Y            Write Back     Read Allocate

a. Normally, the processor continues executing after a data access if no dependency on that access is encountered. With this setting, the processor stalls execution until the data access completes. This guarantees to software that the data access has taken effect by the time execution of the data access instruction completes. External data aborts from such accesses are imprecise (but see Section 2.3.4.4 for a method to shield code from this imprecision).

Table 3-2. Data Cache and Buffer Behavior when X = 1

C B  Cacheable?         Bufferable?  Write Policy  Line Allocation Policy  Notes
---  -----------------  -----------  ------------  ----------------------  ---------------------------------------
0 0  -                  -            -             -                       Unpredictable -- do not use
0 1  N                  Y            -             -                       Writes do not coalesce into buffers (a)
1 0  (Mini Data Cache)  -            -             -                       Cache policy is determined by MD field
                                                                           of Auxiliary Control register (b)
1 1  Y                  Y            Write Back    Read/Write Allocate

a. Normally, bufferable writes can coalesce with previously buffered data in the same address range.
b. See Section 7.2.2 for a description of this register.
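Tables 3-1 and 3-2 together define eight combinations of the X, C, and B bits. The C sketch below encodes them as a lookup for use in, for example, a page-table checking tool. It is a sketch under stated assumptions: every type and function name is hypothetical, and for the mini-data cache row the cacheable and bufferable fields are left false because the tables give no value for them.

```c
#include <stdbool.h>

enum write_policy { WP_NONE, WP_WRITE_THROUGH, WP_WRITE_BACK };
enum alloc_policy { AP_NONE, AP_READ_ALLOCATE, AP_READ_WRITE_ALLOCATE };

/* Hypothetical summary of one row of Table 3-1 or Table 3-2. */
struct mem_attr {
    bool valid;                /* false for X=1, C=0, B=0 (unpredictable) */
    bool mini_dcache;          /* X=1, C=1, B=0: policy set by the MD field */
    bool cacheable;
    bool bufferable;
    bool stall_until_complete; /* X=0, C=0, B=0 */
    bool no_coalesce;          /* X=1, C=0, B=1 */
    enum write_policy write;
    enum alloc_policy alloc;
};

static struct mem_attr decode_xcb(bool x, bool c, bool b)
{
    struct mem_attr a = { 0 };
    unsigned key = ((unsigned)x << 2) | ((unsigned)c << 1) | (unsigned)b;

    a.valid = true;
    switch (key) {
    case 0: /* X=0 C=0 B=0 */
        a.stall_until_complete = true;
        break;
    case 1: /* X=0 C=0 B=1 */
        a.bufferable = true;
        break;
    case 2: /* X=0 C=1 B=0 */
        a.cacheable = a.bufferable = true;
        a.write = WP_WRITE_THROUGH;
        a.alloc = AP_READ_ALLOCATE;
        break;
    case 3: /* X=0 C=1 B=1 */
        a.cacheable = a.bufferable = true;
        a.write = WP_WRITE_BACK;
        a.alloc = AP_READ_ALLOCATE;
        break;
    case 4: /* X=1 C=0 B=0: unpredictable, do not use */
        a.valid = false;
        break;
    case 5: /* X=1 C=0 B=1 */
        a.bufferable = true;
        a.no_coalesce = true;
        break;
    case 6: /* X=1 C=1 B=0: mini-data cache */
        a.mini_dcache = true;
        break;
    case 7: /* X=1 C=1 B=1 */
        a.cacheable = a.bufferable = true;
        a.write = WP_WRITE_BACK;
        a.alloc = AP_READ_WRITE_ALLOCATE;
        break;
    }
    return a;
}
```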


3.2.2.5 Details on Data Cache and Write Buffer Behavior

If the MMU is disabled, all data accesses are non-cacheable and non-bufferable. This is the same behavior as when the MMU is enabled, and a data access uses a descriptor with X, C, and B all set to 0.

The X, C, and B bits determine when the processor should place new data into the Data Cache. The cache places data into the cache in lines (also called blocks). Thus, the basis for making a decision about placing new data into the cache is called a “Line Allocation Policy”.

If the Line Allocation Policy is read-allocate, all load operations that miss the cache request a 32-byte cache line from external memory and allocate it into either the data cache or mini-data cache (this is assuming the cache is enabled). Store operations that miss the cache do not cause a line to be allocated.

If read/write-allocate is in effect, load or store operations that miss the cache request a 32-byte cache line from external memory if the cache is enabled.

The other policy determined by the X, C, and B bits is the Write Policy. A write-through policy instructs the Data Cache to keep external memory coherent by performing stores to both external memory and the cache. A write-back policy only updates external memory when a line in the cache is cleaned or needs to be replaced with a new line. Generally, write-back provides higher performance because it generates less data traffic to external memory.

More details on cache policies may be gleaned from Section 6.2.3, “Cache Policies” on page 6-5.

3.2.2.6 Memory Operation Ordering

A fence memory operation (memop) is one that guarantees all memops issued prior to the fence execute before any memop issued after the fence. Thus software may issue a fence to impose a partial ordering on memory accesses.

Table 3-3 on page 3-4 shows the circumstances in which memops act as fences.

Any swap (SWP or SWPB) to a page that would create a fence on a load or store is a fence.

Table 3-3. Memory Operations that Impose a Fence

operation      X    C    B
-------------  ---  ---  ---
load           -    0    -
store          1    0    1
load or store  0    0    0
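The rows of Table 3-3, together with the rule for swaps, can be expressed as a single predicate. The C sketch below is an illustration and not part of this manual: the enum and function names are hypothetical, and a dash in the table is treated as a don't-care. The swap case follows the statement that a swap is a fence whenever a load or a store to the same page would be one.

```c
#include <stdbool.h>

enum memop { MEMOP_LOAD, MEMOP_STORE, MEMOP_SWAP };

/* Returns true when the operation acts as a fence for a page whose
 * descriptor carries the given X, C, and B bits. */
static bool memop_is_fence(enum memop op, bool x, bool c, bool b)
{
    /* load: X and B are don't-cares, C must be 0 */
    bool load_fence = !c;
    /* store: X=1 C=0 B=1, or the "load or store" row X=0 C=0 B=0 */
    bool store_fence = (x && !c && b) || (!x && !c && !b);

    switch (op) {
    case MEMOP_LOAD:  return load_fence;
    case MEMOP_STORE: return store_fence;
    case MEMOP_SWAP:  return load_fence || store_fence;
    }
    return false;
}
```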

 

 

 

 

 

3.2.3 Exceptions

The MMU may generate prefetch aborts for instruction accesses and data aborts for data memory accesses. The types and priorities of these exceptions are described in Section 2.3.4, “Event Architecture” on page 2-12.

Data address alignment checking is enabled by setting bit 1 of the Control Register (CP15, register 1). Alignment faults are still reported even if the MMU is disabled. All other MMU exceptions are disabled when the MMU is disabled.


3.3 Interaction of the MMU, Instruction Cache, and Data Cache

The MMU, instruction cache, and data/mini-data cache may be enabled/disabled independently. The instruction cache can be enabled with the MMU enabled or disabled. However, the data cache can only be enabled when the MMU is enabled. Therefore only three of the four combinations of the MMU and data/mini-data cache enables are valid. The invalid combination causes undefined results.

Table 3-4. Valid MMU & Data/mini-data Cache Combinations

MMU  Data/mini-data Cache
---  --------------------
Off  Off
On   Off
On   On
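Configuration code may want to reject the one invalid combination before writing the Control Register. A minimal C sketch of that check follows; the function name is hypothetical, and the two arguments stand for the enable settings software intends to apply.

```c
#include <stdbool.h>

/* The data/mini-data cache may be enabled only when the MMU is enabled.
 * The remaining combination (MMU off, cache on) causes undefined results. */
static bool mmu_dcache_combination_valid(bool mmu_on, bool dcache_on)
{
    return mmu_on || !dcache_on;
}
```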

 

 

 

 


3.4 Control

3.4.1 Invalidate (Flush) Operation

The entire instruction and data TLB can be invalidated at the same time with one command or they can be invalidated separately. An individual entry in the data or instruction TLB can also be invalidated. See Table 7-13, “TLB Functions” on page 7-13 for a listing of commands supported by the Intel® 80200 processor.

Globally invalidating a TLB does not affect locked TLB entries. However, the invalidate-entry operations can invalidate individual locked entries. In this case, the locked entry remains in the TLB, but never “hits” on an address translation. Effectively, a hole is in the TLB. This situation may be rectified by unlocking the TLB.

3.4.2 Enabling/Disabling

The MMU is enabled by setting bit 0 in coprocessor 15, register 1 (Control Register).

When the MMU is disabled, accesses to the instruction cache default to cacheable and all accesses to data memory are made non-cacheable.

A recommended code sequence for enabling the MMU is shown in Example 3-1 on page 3-6.

Example 3-1. Enabling the MMU

; This routine provides software with a predictable way of enabling the MMU.
; After the CPWAIT, the MMU is guaranteed to be enabled. Be aware
; that the MMU will be enabled sometime after MCR and before the instruction
; that executes after the CPWAIT.
; Programming Note: This code sequence requires a one-to-one virtual to
; physical address mapping on this code since
; the MMU may be enabled part way through. This would allow the instructions
; after MCR to execute properly regardless of the state of the MMU.
MRC P15,0,R0,C1,C0,0   ; Read CP15, register 1
ORR R0, R0, #0x1       ; Turn on the MMU
MCR P15,0,R0,C1,C0,0   ; Write to CP15, register 1
; For a description of CPWAIT, see
; Section 2.3.3, “Additions to CP15 Functionality” on page 2-11
CPWAIT
; The MMU is guaranteed to be enabled at this point; the next instruction or
; data address will be translated.
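The read-modify-write step of this sequence can be modelled in C when the Control Register value is handled by higher-level code. The sketch below is illustrative: the macro and function names are hypothetical, and the actual register access still requires the MRC and MCR instructions followed by CPWAIT. Bit 0 is the MMU enable, and bit 1 is the alignment-check enable described in Section 3.2.3.

```c
#include <stdbool.h>
#include <stdint.h>

#define CP15_CTRL_MMU_ENABLE  (1u << 0)  /* bit 0: MMU enable */
#define CP15_CTRL_ALIGN_CHECK (1u << 1)  /* bit 1: alignment fault checking */

/* Value to write back to CP15 register 1 to turn on the MMU,
 * leaving every other bit as it was read. */
static uint32_t ctrl_with_mmu_enabled(uint32_t ctrl)
{
    return ctrl | CP15_CTRL_MMU_ENABLE;
}

static bool ctrl_mmu_is_enabled(uint32_t ctrl)
{
    return (ctrl & CP15_CTRL_MMU_ENABLE) != 0u;
}

static bool ctrl_alignment_check_is_enabled(uint32_t ctrl)
{
    return (ctrl & CP15_CTRL_ALIGN_CHECK) != 0u;
}
```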


3.4.3 Locking Entries

Individual entries can be locked into the instruction and data TLBs. See Table 7-14, “Cache Lockdown Functions” on page 7-14 for the exact commands. If a lock operation finds the virtual address translation already resident in the TLB, the results are unpredictable. An invalidate by entry command before the lock command ensures proper operation. Software can also accomplish this by invalidating all entries, as shown in Example 3-2 on page 3-7.

Locking entries into either the instruction TLB or data TLB reduces the available number of entries (by the number that was locked down) for hardware to cache other virtual to physical address translations.

A procedure for locking entries into the instruction TLB is shown in Example 3-2 on page 3-7.

If an MMU abort is generated during an instruction or data TLB lock operation, the Fault Status Register is updated to indicate a Lock Abort (see Section 2.3.4.4, “Data Aborts” on page 2-14), and the exception is reported as a data abort.

Example 3-2. Locking Entries into the Instruction TLB

; R1, R2 and R3 contain the virtual addresses to translate and lock into
; the instruction TLB.
; The value in R0 is ignored in the following instruction.
; Hardware guarantees that accesses to CP15 occur in program order
MCR P15,0,R0,C8,C5,0    ; Invalidate the entire instruction TLB
MCR P15,0,R1,C10,C4,0   ; Translate virtual address (R1) and lock into
                        ; instruction TLB
MCR P15,0,R2,C10,C4,0   ; Translate virtual address (R2) and lock into
                        ; instruction TLB
MCR P15,0,R3,C10,C4,0   ; Translate virtual address (R3) and lock into
                        ; instruction TLB
CPWAIT
; The MMU is guaranteed to be updated at this point; the next instruction will
; see the locked instruction TLB entries.

Note: If exceptions are allowed to occur in the middle of this routine, the TLB may end up caching a translation that is about to be locked. For example, if R1 is the virtual address of an interrupt service routine and that interrupt occurs immediately after the TLB has been invalidated, the lock operation is ignored when the interrupt service routine returns back to this code sequence. Software should disable interrupts (FIQ or IRQ) in this case.

As a general rule, software should avoid locking in all other exception types.

The proper procedure for locking entries into the data TLB is shown in Example 3-3 on page 3-8.


Example 3-3. Locking Entries into the Data TLB

; R1, and R2 contain the virtual addresses to translate and lock into the data TLB
MCR P15,0,R1,C8,C6,1    ; Invalidate the data TLB entry specified by the
                        ; virtual address in R1
MCR P15,0,R1,C10,C8,0   ; Translate virtual address (R1) and lock into
                        ; data TLB
; Repeat sequence for virtual address in R2
MCR P15,0,R2,C8,C6,1    ; Invalidate the data TLB entry specified by the
                        ; virtual address in R2
MCR P15,0,R2,C10,C8,0   ; Translate virtual address (R2) and lock into
                        ; data TLB
CPWAIT                  ; wait for locks to complete
; The MMU is guaranteed to be updated at this point; the next instruction will
; see the locked data TLB entries.

Note: Take care when allowing exceptions to occur during this routine if their handlers may access data that lies in a page being locked into the TLB.


3.4.4 Round-Robin Replacement Algorithm

The line replacement algorithm for the TLBs is round-robin; there is a round-robin pointer that keeps track of the next entry to replace. The next entry to replace is the one sequentially after the last entry that was written. For example, if the last virtual to physical address translation was written into entry 5, the next entry to replace is entry 6.

At reset, the round-robin pointer is set to entry 31. Once a translation is written into entry 31, the round-robin pointer gets set to the next available entry, beginning with entry 0 if no entries have been locked down. Subsequent translations move the round-robin pointer to the next sequential entry until entry 31 is reached, where it wraps back to entry 0 upon the next translation.

A lock pointer is used for locking entries into the TLB and is set to entry 0 at reset. A TLB lock operation places the specified translation at the entry designated by the lock pointer, moves the lock pointer to the next sequential entry, and resets the round-robin pointer to entry 31. Locking entries into either TLB effectively reduces the available entries for updating. For example, if the first three entries were locked down, the round-robin pointer would be entry 3 after it rolled over from entry 31.

Only entries 0 through 30 can be locked in either TLB; entry 31 can never be locked. If the lock pointer is at entry 31, a lock operation updates the TLB entry with the translation and ignores the lock. In this case, the round-robin pointer stays at entry 31.

Figure 3-1. Example of Locked Entries in TLB

[Figure: TLB entries 0 through 31, with entries 0 through 7 marked Locked. Eight entries locked, 24 entries available for round robin replacement.]
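The interaction of the round-robin pointer and the lock pointer can be followed with a small software model. The C sketch below is a reading of the description in this section, not a statement of how the hardware is built: the structure and function names are hypothetical, only the virtual address of each entry is stored, and the wrap target after entry 31 is taken to be the first unlocked entry, which matches the example of three locked entries.

```c
#include <stdint.h>

#define TLB_ENTRIES 32u

/* Hypothetical model of one TLB's replacement state. */
struct tlb_model {
    uint32_t va[TLB_ENTRIES]; /* virtual address held by each entry */
    unsigned rr;              /* round-robin pointer: next entry to replace */
    unsigned lock;            /* lock pointer: next entry to lock */
};

static void tlb_reset(struct tlb_model *t)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        t->va[i] = 0;
    t->rr = TLB_ENTRIES - 1u; /* entry 31 at reset */
    t->lock = 0u;             /* entry 0 at reset */
}

/* A table walk writes the translation at the round-robin pointer.
 * Returns the entry that was written. */
static unsigned tlb_fill(struct tlb_model *t, uint32_t va)
{
    unsigned written = t->rr;

    t->va[written] = va;
    /* After entry 31, wrap to the first entry that is not locked. */
    t->rr = (written == TLB_ENTRIES - 1u) ? t->lock : written + 1u;
    return written;
}

/* A lock operation writes at the lock pointer and resets the round-robin
 * pointer to entry 31. Returns the entry that was written. */
static unsigned tlb_lock(struct tlb_model *t, uint32_t va)
{
    unsigned written = t->lock;

    t->va[written] = va;
    if (t->lock < TLB_ENTRIES - 1u)
        t->lock++; /* entries 0 through 30 can be locked */
    /* With the lock pointer at entry 31 the translation is still written,
     * but the lock is ignored and the pointer does not advance. */
    t->rr = TLB_ENTRIES - 1u;
    return written;
}
```

Running three `tlb_lock` calls followed by one `tlb_fill` in this model leaves the round-robin pointer at entry 3, as in the example given in the text.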


4 Instruction Cache

The instruction cache of the Intel® 80200 processor based on Intel® XScale microarchitecture (compliant with the ARM* Architecture V5TE) enhances performance by reducing the number of instruction fetches from external memory. The cache provides fast execution of cached code. Code can also be locked down when guaranteed or fast access time is required.

4.1 Overview

Figure 4-1 shows the cache organization and how the instruction address is used to access the cache.

The instruction cache is a 32-Kbyte, 32-way set associative cache; this means there are 32 sets with each set containing 32 ways. Each way of a set contains eight 32-bit words and one valid bit, which is referred to as a line. The replacement policy is a round-robin algorithm and the cache also supports the ability to lock code in at a line granularity.

Figure 4-1. Instruction Cache Organization

[Figure: 32 sets (Set 0 through Set 31), each containing 32 ways (way 0 through way 31). Each way holds a CAM tag and a DATA array of 8 words (one cache line). The example shows Set 0 being selected by the set index. CAM: Content Addressable Memory. The virtual instruction address is divided into Tag (bits 31:10), Set Index (bits 9:5), and Word (bits 4:2); the word select delivers one instruction word (4 bytes).]

The instruction cache is virtually addressed and virtually tagged.

Note: The virtual address presented to the instruction cache may be remapped by the PID register. See Section 7.2.13, “Register 13: Process ID” on page 7-16 for a description of the PID register.
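The address fields used to access the cache can be extracted with shifts and masks. The C sketch below is illustrative and uses hypothetical names; it assumes the field boundaries of Figure 4-1 (tag in bits 31:10, set index in bits 9:5, word select in bits 4:2) and that the address passed in is the virtual address as presented to the cache, after any remapping by the PID register.

```c
#include <stdint.h>

/* Hypothetical container for the three address fields. */
struct icache_addr {
    uint32_t tag;  /* bits 31:10, compared in the CAM */
    unsigned set;  /* bits 9:5, selects one of 32 sets */
    unsigned word; /* bits 4:2, selects one of 8 words in the line */
};

static struct icache_addr icache_split(uint32_t va)
{
    struct icache_addr a;

    a.tag  = va >> 10;
    a.set  = (unsigned)((va >> 5) & 0x1Fu);
    a.word = (unsigned)((va >> 2) & 0x7u);
    return a;
}
```

The field widths agree with the stated geometry: 32 sets of 32 ways with 32-byte lines give 32 Kbytes.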


4.2 Operation

4.2.1 Operation When Instruction Cache is Enabled

When the cache is enabled, it compares every instruction request address against the addresses of instructions that it is currently holding. If the cache contains the requested instruction, the access “hits” the cache, and the cache returns the requested instruction. If the cache does not contain the requested instruction, the access “misses” the cache, and the cache requests a fetch from external memory of the 8-word line (32 bytes) that contains the requested instruction using the fetch policy described in Section 4.2.3. As the fetch returns instructions to the cache, they are placed in one of two fetch buffers and the requested instruction is delivered to the instruction decoder.

A fetched line is written into the cache if it is cacheable. Code is designated as cacheable when the Memory Management Unit (MMU) is disabled or when the MMU is enabled and the cacheable (C) bit is set to 1 in its corresponding page. See Chapter 3, “Memory Management” for a discussion on page attributes.

Note that an instruction fetch may “miss” the cache but “hit” one of the fetch buffers. When this happens, the requested instruction is delivered to the instruction decoder in the same manner as a cache “hit.”

4.2.2 Operation When The Instruction Cache Is Disabled

Disabling the cache prevents any lines from being written into the instruction cache. Although the cache is disabled, it is still accessed and may generate a “hit” if the data is already in the cache.

Disabling the instruction cache does not disable instruction buffering that may occur within the instruction fetch buffers. Two 8-word instruction fetch buffers are always enabled in the cache disabled mode. So long as instruction fetches continue to “hit” within either buffer (even in the presence of forward and backward branches), no external fetches for instructions are generated. A miss causes one or the other buffer to be filled from external memory using the fill policy described in Section 4.2.3.
