Intel XScale Core Developer's Manual

Intel XScale® Core
Develop er ’s Ma nu al
January, 2004
Order Number: 273473-002
2 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no li ability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. Intel
internal code names are subject to change.
THIS SPECIFICATION, THE Intel XScale® Core Developer’s Manual IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.
Intel disclaims all liability, including liability for infringement of any proprietary rights, relating to use of information in this specification. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein.
Copyright © Intel Corporation, 2004 AlertVIEW, i960, AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, Commerce Cart, CT Connect, CT Media, Dialogic,
DM3, EtherExpress, ETOX, FlashFile, GatherRound, i386, i486, iCat, iCOMP, Insight960, InstantIP, Intel, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel ChatPad, Intel Create&Share, Intel Dot.Station, Intel GigaBlade, Intel InBusiness, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetStructure, Intel Play, Intel Play logo, Intel Pocket Concert, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel TeamStation, Intel WebOutfitter, Intel Xeon, Intel XScale, Itanium, JobAnalyst, LANDesk, LanRover, MCS, MMX, MMX logo, NetPort, NetportExpress, Optimizer logo, OverDrive, Paragon, PC Dads, PC Parents, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, ProShare, RemoteExpress, Screamline, Shiva, SmartDie, Solutions960, Sound Mark, StorageExpress, The Computer Inside, The Journey Inside, This Way In, TokenExpress, Trillium, Vivonic, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
The ARM* and ARM Powered logo marks (the ARM marks) are trademarks of ARM, Ltd., and Intel uses these marks under license from ARM, Ltd. *Other names and brands may be claimed as the property of others.
Developer’s Manual January, 2004 3
Intel XScale® Core Developer’s Manual
Contents
Contents
1 Introduction....................................................................................................................................13
1.1 About This Document................................ ....... ..... .. ....... ..... ....... ..... ....... .. ....... ..... ..... ....... ..13
1.1.1 How to Read This Document.................................................................................13
1.1.2 Other Relevant Documents ...................................................................................14
1.2 High-Level Overview of the Intel XScale® Core..................................................................15
1.2.1 ARM Compatibility .................................................................................................15
1.2.2 Features.................................................................................................................16
1.2.2.1 Multiply/Accumulate (MAC)....................................................................16
1.2.2.2 Memory Management............................................................................17
1.2.2.3 Instruction Cache...................................................................................17
1.2.2.4 Branch Target Buffer..............................................................................17
1.2.2.5 Data Cache............................................................................................17
1.2.2.6 Performance Monitoring .........................................................................18
1.2.2.7 P ower Ma nagem ent...............................................................................18
1.2.2.8 Debug ....................................................................................................18
1.2.2.9 JTAG......................................................................................................18
1.3 Termi n o log y and Conve n tions............................. ................ ................. ................ ..............19
1.3.1 Number Representation..................... ................ ................. ................ ................. ..19
1.3.2 Terminology and Acronyms........................ ................. ................ ................. .........19
2 Programming Model ............................................................................................ ....... ...................21
2.1 ARM Architecture Compatibility..........................................................................................21
2.2 ARM Architecture Implementation Options......................................................................... 21
2.2.1 Big Endian versus Little Endian.............................................................................21
2.2.2 26-Bit Architectur e............... ................ ................. ................ ................. ................21
2.2.3 Thumb....................................................................................................................21
2.2.4 ARM DSP-Enhanced Instructi o n Set........ ................ ................. ................. ...........22
2.2.5 Base Register Update..................................................................................... ..... ..22
2.3 Exten sions to ARM Architect ur e....... ......... ................. ................ ................. ................ .......23
2.3.1 DSP Coprocessor 0 (CP0).....................................................................................23
2.3.1.1 Multiply With Internal Accumulate Format .............................................24
2.3.1.2 Internal Accumulator Access Format..................................................... 27
2.3.2 New Page Attributes..............................................................................................29
2.3.3 Additions to CP15 Functionality.............................................................................31
2.3.4 Event Architecture .................................................................................................32
2.3.4.1 Exception Summary...............................................................................32
2.3.4.2 Event Priori ty..... .......... ................ ................. ................ ................. .........32
2.3.4.3 Prefetch Aborts......................................................................................33
2.3.4.4 Data Aborts............................................................................................34
2.3.4.5 E v ents from Preload Ins tructions ...........................................................35
2.3.4.6 Debug Events ................. ................. ................. ................ ................. ....36
3 Memory Management....................................................................................................................37
3.1 Overview.............................................................................................................................37
3.2 Architecture Model..............................................................................................................38
3.2.1 Version 4 vs. Version 5..........................................................................................38
3.2.2 Memory Attributes..................................................................................................38
3.2.2.1 P age (P ) Attribute Bit .............................................................................38
4 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Contents
3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits..........................38
3.2.2.3 Instruction Cache....................... ................ .......... ................ ................. .38
3.2.2.4 Data Cache and Write Buff e r............ ................ .......... ................ ...........39
3.2.2.5 D etails on Data Cache and Write Buffer Behavior.................................40
3.2.2.6 Memory Operation Ordering.................................................................. 40
3.2.3 Exceptions.............................................................................................................40
3.3 Interaction of the MMU, Instruction Cache, and Data Cache.............................................41
3.4 Control................................................................................................................................42
3.4.1 Invalidate (Flush) Operation ................... ................. ................ ................. .............42
3.4.2 Enabling/Disabling................................................................................................. 42
3.4.3 L ocking Entries......................................................................................................43
3.4.4 Round-Robin Replacement Algorithm................................................... .......... ......45
4 Instruction Cache... ................. ................ ................. ................. ......... ................. ...........................47
4.1 Overview.............................................................................................................................47
4.2 Operation............................................................................................................................48
4.2.1 O peration When Instruction Cache is Enabled ......................................................48
4.2.2 Operatio n When The Instr u ction Cache Is Disabled......... ................. ......... ...........48
4.2.3 Fetch Policy.................... ................. ................. ................ ................. ................ ....49
4.2.4 Round-Robin Replacement Algorithm................................................... .......... ......49
4.2.5 Parity Protection...... ................. ................. ................ ................. ................ ...........50
4.2.6 Instruction Fetch Latency.......... ................. ................ ................. ................ ...........51
4.2.7 Instruction Cache Coherency............ ................. ................. ......... ................. ........51
4.3 Instr uc tion Cache Control........................ ................. ................ .......... ................ ................52
4.3.1 Instruction Cache State at RESET........................................................................52
4.3.2 Enabling/Disabling................................................................................................. 52
4.3.3 Invalidating the Instruction Cache.......................................................................... 53
4.3.4 L ocking Instructions in the Instruction Cache........................................................54
4.3.5 Unlockin g Ins tr u ctions in the Instruction Cache......... .......... ................ ................. .55
5 Branch Target Buffer .....................................................................................................................57
5.1 Branch Target Buffer (BTB) Operation...............................................................................57
5.1.1 R eset .....................................................................................................................58
5.1.2 Update Policy..................................... ................... ................... ....... ................... ....58
5.2 BTB Control........................................................................................................................59
5.2.1 D isabling/Enabling.................................................................................................59
5.2.2 Invalidation.............................................................................................................59
6 Data Cache....................................................................................................................................61
6.1 Overviews...........................................................................................................................61
6.1.1 D ata Cache Overview............................................................................................ 61
6.1.2 Mini-Data Cache Overview.............. ................. ................ ................. ......... ...........63
6.1.3 Write Buffer and Fill Buffer Overview.....................................................................64
6.2 Data Cache and Mini-Data Cache Operation.....................................................................65
6.2.1 Operation When Caching is Enabled.....................................................................65
6.2.2 O peration When Data Caching is Disabled........................................................... 65
6.2.3 Cache Policies.............. ................. ................ ................. ................ ................. ......65
6.2.3.1 C acheability ...........................................................................................65
6.2.3.2 R ead M iss Policy ...................................................................................66
6.2.3.3 Write Miss Policy....................................................................................67
6.2.3.4 Write-Bac k Versus Write-Through .........................................................67
Developer’s Manual January, 2004 5
Intel XScale® Core Developer’s Manual
Contents
6.2.4 Round-Robin Replacement Algorithm ...................................................................68
6.2.5 Parity Protection ....................................................................................................68
6.2.6 Atomic Accesses ...................................................................................................68
6.3 Data Cache and Mi ni-Data Cache Control ............ ................ .......... ................ ................. ..69
6.3.1 Data Memory State After Reset..................... ................ ................. ................ .......69
6.3.2 Enabling/Disabling.................................................................................................69
6.3.3 Invalidate and Clean Operations ...........................................................................69
6.3.3.1 Global Clean and Invalidate Operation........................................... .......70
6.4 Re-configuring the Data Cache as Data RAM....................................................................71
6.5 Write Buf fer/Fill Buffer Op er at io n and Control ........ ... .. .... . .. .... . .. ... .. .... . .. .... . .. .... . .. ... .. .... . .. ...75
7 Configuration.................................................................................................................................77
7.1 Overview.............................................................................................................................77
7.2 CP15 Registers...................................................................................................................80
7.2.1 Register 0: ID & Cache Type Registers.... ......... ................. ................ .......... .........81
7.2.2 Reg is t er 1: C on t r ol & A u x iliary Control R eg is t er s .. .. ............... .. .... . .. .. ............... .. ...83
7.2.3 Register 2: Translati on Table Base Register................. ................. ................ .......85
7.2.4 Register 3: Domain Access Control Register.........................................................85
7.2.5 Register 4: Reserved........... ......... ................. ................ ................. ................ .......85
7.2.6 Register 5: Fault Status Register ...........................................................................86
7.2.7 Register 6: Fault address Register ........................................................................ 86
7.2.8 Register 7: Cache Functions .................................................................................87
7.2.9 Register 8: TLB Operations...................................................................................89
7.2.10 Register 9: Cache Lock Down ...............................................................................90
7.2.11 Register 10: TLB Lock Down................................................................................. 91
7.2.12 Register 11-12 : Reserved...... ......... ................. ................. ................ ................. ....91
7.2.13 Register 13: Process ID.........................................................................................91
7.2.13.1 The PID Regi ster Affect On Addresse s................ .......... ................ .......92
7.2.14 Register 14: Breakpoint Registers.........................................................................93
7.2.15 Register 15: Copro ce sso r Acce ss Regi ster.............. .......... ................ .......... .........94
7.3 CP14 Registers...................................................................................................................96
7.3.1 Performance Monitoring Registers ........................................................................96
7.3.1.1 XSC1 Performance Mon itoring Registers............. ................. ................96
7.3.1.2 XSC2 Performance Mon itoring Registers............. ................. ................97
7.3.2 Clock and Power Management Registers..............................................................98
7.3.3 Software Debug Registers.. ...................................................................................99
8 Performance Monitoring..............................................................................................................101
8.1 Overview...........................................................................................................................1 01
8.2 XSC1 Register Description (2 counter variant).................................................................102
8.2.1 Clock Counter (CCNT; CP14 - Register 1).......................................................... 102
8.2.2 Performance Count Registers (PMN0 - PMN1;
CP14 - Register 2 and 3, Respectively)............................................................... 1 03
8.2.3 Extending Count Duration Beyond 32 Bits ..........................................................103
8.2.4 Performance Monitor Control Register (PMNC)..................................................103
8.2.4.1 M anagi ng PM NC..................................................................................105
8.3 XSC2 Register Description (4 counter variant).................................................................106
8.3.1 Clock Counter (CCNT).................. .. ... .. ..... .. ..... .. ... .. .. ..... ... .. .. ..... .. ..... ...................106
8.3.2 Performance Count Registers (PMN0 - PMN3)...................................................1 07
8.3.3 Performance Monitor Control Register (PMNC)..................................................108
8.3.4 Interrupt Enable Register ( INT EN)....... .......... ................ ................. ................ .....109
6 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Contents
8.3.5 O verflow Flag Status Register (FLAG)................................................................110
8.3.6 Event Select Register (EVTSEL) .........................................................................111
8.3.7 Managing the Performance Monit o r.............. ................. ................ ................. ....112
8.4 Performan ce Mon itoring Events..................... ................. ................. ................ ................113
8.4.1 Instruction Cache Efficiency Mode ......................................................................115
8.4.2 D ata Cache Efficiency Mode...............................................................................115
8.4.3 Instruction Fetch Latency Mode...........................................................................115
8.4.4 Data/Bus Request Buffer Full Mode .................................................................... 116
8.4.5 Stall/Writeback Statistics.....................................................................................116
8.4.6 Instruction TLB Effici e nc y Mode................ ................ ................. ................ .........117
8.4.7 D ata TLB Efficiency Mode...................................................................................117
8.5 Multiple Pe r fo rma n ce Mon itoring Run Statist ics....... ................ .......... ................ ..............118
8.6 Examples..........................................................................................................................119
8.6.1 XSC1 Example (2 counter variant)......................................................................119
8.6.2 XSC2 Example (4 counter variant)......................................................................120
9 Software Debug............ ................. ................. ................ ................. ................ .......... ..................121
9.1 Definitions ......................................................................................................................... 121
9.2 Debug Registers...............................................................................................................121
9.3 Introduction.......................................................................................................................122
9.3.1 H alt Mode............................................................................................................122
9.3.2 Monitor Mode....................................................................................................... 122
9.4 Debug Control and Status Register (DCSR) . ...................................................................123
9.4.1 G lobal Enable Bit (GE)........................................................................................124
9.4.2 Halt Mode Bit (H).................................................................................................124
9.4.3 SOC Break (B)..................... ................. ................ ................. ................ ..............124
9.4.4 Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR)..........................................................125
9.4.5 Sticky Abort Bit (SA)...................... ................ ................. ................ ................. ....125
9.4.6 Method of Entry Bits (MOE).................................................................................125
9.4.7 Trace Buffer Mode Bit (M) ............................................................ ....... .. ..... ....... ..125
9.4.8 Trace Buffer Enable Bit (E)............................ ....... ....... .......... ....... ....... ....... ....... ..125
9.5 Debug Exceptions.............................................................................................................126
9.5.1 H alt Mode............................................................................................................127
9.5.2 Monitor Mode....................................................................................................... 129
9.6 HW Breakpoint Resources...................... ................. ................ ................. ................. ......130
9.6.1 Instruction Breakpoi nt s...................... ................. ................. ................ ................130
9.6.2 D ata Breakpoints.................................................................................................131
9.7 Software Bre a k p o ints............ ................. ......... ................. ................. ................ ................133
9.8 Transmit/Receive Control Register (TXRXCTRL)............................................................ 134
9.8.1 RX Register Ready Bit (RR)......................................................... .. ....... .......... .. ..135
9.8.2 Overflow Fl a g (OV).......................... .......... ................ ................. ................ .........136
9.8.3 Downloa d Fla g (D)...... ................ ................. ................ ........................ ................136
9.8.4 TX Register Ready Bit (TR).................................................................................137
9.8.5 Conditional Execution Using TXRXCTRL............................................................137
9.9 Transmit Register (TX)..................................................................................................... 138
9.10 Receive Regist er (RX)............... ................. ................ ................. ................ ................. ....138
9.11 D ebug JTAG A ccess ........................................................................................................ 139
9.11.1 SELDCSR JTAG Register...................................................................................139
9.11.1.1 hold_reset............................................................................................140
9.11.1.2 ext_dbg_break .. ...................................................................................140
Developer’s Manual January, 2004 7
Intel XScale® Core Developer’s Manual
Contents
9.11.1.3 DCSR (DBG_SR[34:3])........................................................................140
9.11.2 DBGTX JTAG Register........................................................................................ 1 41
9.11.2.1 DBG_SR[0].......................................................................................... 1 41
9.11.2.2 TX (DBG_SR[34:3]) ............................................................................. 1 41
9.11.3 DBGRX JTAG Register .......................... ................ ................. ................ ............142
9.11.3.1 RX Write Logic.....................................................................................143
9.11.3.2 DBG_SR[0].......................................................................................... 1 43
9.11.3.3 flush_rr.................................................................................................143
9.11.3.4 hs_download ........................................................................................1 43
9.11.3.5 RX (DBG_SR[34:3]).............................................................................1 43
9.11.3.6 rx_valid.................................................................................................144
9.12 Trace Buffer......................................................................................................................145
9.12.1 Trace Buffer Registers.........................................................................................145
9.12.1.1 Checkpoint Registers ...........................................................................146
9.12.1.2 Trace Buffer Registe r ( TBREG)..................... ................. ................ .....147
9.13 Trace Buffer Entries..........................................................................................................148
9.13.1 Message Byte.......................... ................. ................ ................. ................. .........148
9.13.1.1 Exception Message Byte .....................................................................149
9.13.1.2 Non-exception Message Byte..............................................................150
9.13.1.3 Ad d re ss Byte s......................... ................ ................. ................ ............151
9.13.2 Trace Buffer Usage..............................................................................................152
9.14 Downloading Code in the In stru ction Cache.......... ......... ................. ................ .......... .......154
9.14.1 Mini Instruction Cache Overview..................... ................. ................ ................. ..154
9.14.2 LDIC JTAG Command............. ................. ......... ................. ................ .................155
9.14.3 LDIC JTAG Data Register ............ .......... ................ .......... ................ ................. ..155
9.14.4 LDIC Cache Functions.................... ................. ......... ................. ................. .........156
9.14.5 Loading Instruction Cache During Reset........................................................ .....158
9.14.6 Dynam ically Loading Instruc tion Cache After Reset............................................160
9.14.6.1 Dyn a mi c Downl oad Syn ch r o n ization Code.............. ................ ............162
10 Performance Considerations.......................................................................................................1 63
10.1 Interrup t Latency..... ................ ................. ................ ................. ................ ................. .......1 63
10.2 Branch Prediction ....... ................. ................ ................. ................. ................ ...................164
10.3 Addressing Modes............................................................................................................164
10.4 Instruction Latencies................. ................. ................. ................ ................. ................ .....165
10.4.1 Performance Terms............................................................................................. 1 65
10.4.2 Branch Inst r u cti o n Tim ing s..................... ................ ................. ................ ............167
10.4.3 Data Processing Instruction Timings ...................................................................167
10.4.4 Multiply In struction Timin g s................ ................ ................. ................ .......... .......168
10.4.5 Saturated Arithmetic Instructions.........................................................................170
10.4.6 Status Register Access Instructions ....................................................................170
10.4.7 Load/Store Instructions........................................................................................171
10.4.8 Sema phore Instruc tions.......................................................................................171
10.4.9 Coprocessor In structions................ ................. ......... ................. ................. .........172
10.4.10 Miscellaneous Instruction Timing........................................................ ............ .....172
10.4.11 Thumb Instructions..............................................................................................1 73
A Optimization Guide......................................................................................................................175
A.1 Introduction.......................................................................................................................1 75
A.1.1 About This Guide .................................................................................................175
A.2 The Int el XSca l e® Core Pipeline....... ................ ................. ................ ................. ..............176
8 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Contents
A.2.1 General Pipeline Characteristics .........................................................................176
A.2.1.1. Number of Pipeline Stages..................................................................176
A.2.1.2. The Intel XScale® Core Pipeline Organi za tion .................... ................177
A.2.1.3. Out Of Order Completion.....................................................................178
A.2.1.4. Register Scoreboarding .......................................................................178
A.2.1.5. Use of Bypassing.................... ................. ................ ................. ...........178
A.2.2 Instruction Flow Thr o u gh th e Pipe line......... ................ ................. ................. ......179
A.2.2.1. ARM* V5TE Instruction Execution.......................................................179
A.2.2.2. Pipeline Sta l ls ................................... ................ ................. ................ ..179
A.2.3 Main Exe cu tion Pipeline.... ................ ................. ................. ................ ................180
A.2.3.1. F1 / F2 (Instruction Fetch) Pipest a g es....... ................. ................ .........180
A.2.3.2. ID (Instruction Decode) Pipestage.......................................................180
A.2.3.3. RF (Register File / Shifter) Pipestage..................................................181
A.2.3.4. X1 (Execute) Pipestages .....................................................................181
A.2.3.5. X2 (Execute 2) Pipe sta g e.......... ......... ................. ................ ................181
A.2.3.6. WB (write-ba ck)............. ................ ................. ................ ................. ....181
A.2.4 Memor y Pi peline...... ................. ................. ................ ................. ................ .........182
A.2.4.1. D1 and D2 Pipestage. ..........................................................................182
A.2.5 Multiply/Multiply Accumulate (MAC) Pipeline......................................................182
A.2.5.1. Behavioral Description..................................................................... ....182
A.3 Basic Optimizations..........................................................................................................183
A.3.1 Conditional Instructions .......................................................................................183
A.3.1.1. Optimizing Condition Checks...............................................................183
A.3.1.2. Optimizing Branches............................................................................ 184
A.3.1.3. Optimizing Complex Expressions........................................................186
A.3.2 Bit Field Manipulation.......................................................................................... 187
A.3.3 Opti mi zing the Use of Immediate Values........................ ................ ................. ....188
A.3.4 Optimizing Integer Multiply and Divide ................................................................189
A.3.5 Effective Use of Addressing Modes.....................................................................190
A.4 Cac he and Pre fe tch Optimizati o ns........ ................. ................ ................. ................ .........191
A.4.1 Instruction Cache..... .......... ................ ................. ................. ................ ................191
A.4.1.1. Cache Miss Cost.................................................................................. 191
A.4.1.2. Round-Robin Replacement Cache Policy............................................191
A.4.1.3. Code Placement to Reduce Cache Misses .........................................191
A.4.1.4. Locking Code into the Instruction Cache.............................................192
A.4.2 Data and Mini Cache...........................................................................................193
A.4.2.1. Non Cacheable Regions......................................................................193
A.4.2.2. Write-through and Write-back Cached Memory Regions ....................193
A.4.2.3. Read Allocate and Read-write Allocate Memory Regions...................194
A.4.2.4. Creating On-chip RAM.........................................................................194
A.4.2.5. Mini-data Cache...................................................................................195
A.4.2.6. Data Alignment..................... ................ ................. ......... ................. ....196
A.4.2.7. Literal Pools......................................................................................... 197
A.4.3 Cac he Consi d e ra tions....................... ................. ......... ................. ................. ......198
A.4.3.1. Cache Conflicts, Pollution and Pressure..............................................198
A.4.3.2. Memory Page Thrashing......................................................................198
A.4.4 Pre fe tch Consideratio n s.................... ................. ................. ................ ................199
A.4.4.1. Prefetch Distances.................. ................. ................ ................. ...........199
A.4.4.2. Prefetch Loop Scheduling....................................................................199
A.4.4.3. Prefetch Loop Limitations .................................................... .............. ..199
A.4.4.4. Compute vs. Data Bus Bound..............................................................199
A.4.4.5. Low Number of Iterations.....................................................................200
Developer’s Manual January, 2004 9
Intel XScale® Core Developer’s Manual
Contents
A.4.4.6. Bandwidth Limitations ..........................................................................200
A.4.4.7. Cache Memory Considerations............................................................201
A.4.4.8. Cache Blocking....................................................................................203
A.4.4.9. Prefetch Unrolling ................................................................................2 03
A.4.4.10. Pointer Prefetch...................................................................................204
A.4.4.11. Loop Interchange .................................................................................205
A.4.4.12. Loop Fusion ........... ................. ................ ................. ................ ............205
A.4.4.13. Prefetch to Reduc e Register Pressure .......... .......... ................ .......... ..206
A.5 Instruction Schedulin g........................ ................. ................ ................. ................ ............207
A.5.1 Scheduling Loads ................................................................................................207
A.5.1.1. Scheduling Load and Store Double (LDRD/STRD) .............................2 10
A.5.1.2. Scheduling Load and Store Multiple (LDM/STM)................................. 211
A.5.2 Scheduling Data Processing Instruc tions ......................... ................ ................. ..212
A.5.3 Scheduling Multiply Instructi o n s .......... ................. ................ ................. ..............213
A.5.4 Scheduling SWP and SWPB Instructions............................................................214
A.5.5 Scheduling the MRA and MAR Instructions (MRRC/MCRR)...............................2 15
A.5.6 Scheduling the MIA and MIAPH Instructions.......................................................216
A.5.7 Scheduling MRS and MSR Instructions......................... ................. ................ .....217
A.5.8 Scheduling CP15 Coproc e sso r In str u ctions................ ......... ................. ..............217
A.6 Optimizing C Libraries ......................................................................................................218
A.7 Optimizations for Size.......................................................................................................218
A.7.1 Space/Performance Trade Off.............................................................................218
A.7.1.1. Multiple Word L oad and Sto r e................ ................. ................ ............218
A.7.1.2. Use of Conditional Instructions ............................................................ 218
A.7.1.3. Use of PLD Instructions....................................................................... 2 18
B Test Features............... ................ ................. ................. ................ ................. ............................219
B.1 Overview...........................................................................................................................219
10 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Contents
Figures
1-1 Architecture Features .................................................................................................................16
3-1 Example of Locked Entries in TLB .............................................................................................45
4-1 Instruction Cache Organization ..................................................................................................47
4-2 Locked Line Effect on Round Robin Replacement.....................................................................54
5-1 BTB Entry.... .......... ................ ................. ................ ................. ................. ................ ..................57
5-2 Branch Histor y.............. ................ ................. ................. ................ ................. ...........................58
6-1 Data Cache Organization ................................................................. .......... .. ....... ....... .......... ......62
6-2 Mini-Data Cache Organization ...................................................................................................63
6-3 Locked Line Effect on Round Robin Replacement.....................................................................74
9-1 SELDCSR.................................................................................................................................139
9-2 DBGTX.....................................................................................................................................141
9-3 DBGRX .....................................................................................................................................142
9-4 Message Byte Form at s.................... ................ ................. ......... ................. ................ ......... .....148
9-5 Indirect Branch Entry Address Byte Organization ....................................................................151
9-6 High Level Vie w of Trace Bu ffer........................... ................. ................ ................. ................ ..152
9-7 LDIC JTAG Data Register Har dware.................... ................. ......... ................. ......... ................155
9-8 Format of LDIC Cache Functions.............................................................................................157
9-9 Code Download During a Cold Reset For Debug .....................................................................158
9-10Downloading Code in IC During Pro gram Execution............... ................. ......... ................. ......160
A-1 The Intel XScal e® Core RISC Superpipeline.................... ................ ................. ................. ......177
Developer’s Manual January, 2004 11
Intel XScale® Core Developer’s Manual
Contents
Tables
2-1 Multiply with Internal Accumulate Format...................................................................................24
2-2 MIA{<co nd> } ac c0 , Rm, Rs................. ................ ................. ......... ................. ................. ........... 25
2-3 MIAPH{<co n d>} a cc0 , Rm, Rs....... ................. ......... ................. ................ ................. ......... .......25
2-4 MIAxy{<cond>} acc0, Rm, Rs............................ .. ..... ....... ..... .. ....... ..... ....... ..... .. ....... ..... ..... .........26
2-5 Internal Accumulator Access Format..........................................................................................27
2-6 MAR{<cond>} acc0, RdLo, RdHi ................................................................................................28
2-7 MRA{<cond>} RdLo, RdHi, acc0 ................................................................................................28
2-9 Second-level Descriptors for Coarse Page Table.......................................................................30
2-10Second-level Descriptors for Fine Page Table ................................................. ....... ..... ..... ....... ..30
2-8 First- level Descriptor s..................... ................. ................ ................. ................ ..........................30
2-11Exception Summary....................................................................................................................32
2-12Event Priority .................. ................ ................. ................ ................. ................ ..........................32
2-13Encoding of Fault Status for Prefetch Aborts..............................................................................33
2-14Encoding of Fault Status for Data Aborts...................................................................................34
3-1 Data Cache and Buffer Behavior when X = 0.............................................................................39
3-2 Data Cache and Buffer Behavior when X = 1.............................................................................39
3-3 Memory Operations that Impose a Fence..................... ................ ................. ................. ...........40
3-4 Valid MMU & Data/mini-data Cache Combinations ....................................................................41
7-1 MRC/MCR Format...................................................................................................................... 78
7-2 LDC/STC Format when Acce ssi n g CP14..... ................. ......... ................. ................ .......... .........79
7-3 CP15 Registers...........................................................................................................................80
7-4 ID Register.......... ................ ................. ................ .......... ................ ................. ............................81
7-5 Cache Type Register ..................................................................................................................82
7-6 ARM* Control Register ..................... ................. ................. ................ .......... ................ ..............83
7-7 Auxiliary Control Register...........................................................................................................84
7-8 Translation Table Base Register ................................................................................................85
7-9 Domain Access Control Register................................................................................................85
7-10Fault Status Register.......... ................. ................ ................. ................. ................ ..................... 86
7-11Fault Add r e ss Regi ste r.......................... ................. ................ ................. ................ ......... .......... 86
7-12Cache Functions.........................................................................................................................87
7-13T LB Functions.............................................................................................................................89
7-14Cache Lockdown Functions........................................................................................................90
7-15Data Cache Lock Register .......................................................................................................... 90
7-16T LB Lockdown Functions ...........................................................................................................91
7-17Accessing Process ID............................ ................. ................ ................. ................ ......... ..........91
7-18Process ID Register............... ................ ................. ................ .......... ................ .......................... 91
7-19Accessing the Debug Registers............................................................... .. ....... ....... .......... .. .......93
7-20Coprocessor Access Register ....................................................................................................95
7-21Accessing the XSC1 Performance Moni to r ing Registers ................... ................. ......... ..............96
7-22Accessing the XSC2 Performance Moni to r ing Registers ................... ................. ......... ..............97
7-23PWRMODE Register..................................................................................................................98
7-24Clock and Power Management...................................................................................................98
7-25CCLKCFG Register....................................................................................................................98
7-26Accessing the Debug Registers............................................................... .. ....... ....... .......... .. .......99
8-1 XSC1 Performan c e Mon itoring Register s........ ......... ................. ................ ................. ..............102
8-2 Clock Count Register (CCNT) ............................. ..... ..... .. ... .. .. ..... .. ..... ... .. .. ..... .. ..... .. ... .. ..... .. .....102
8-3 Performance Monitor Count Register (PMN0 and PMN1)........................................................103
8-4 Perform a nc e Moni to r Control Register (CP14 , r e g ister 0).... ......... ................. ................. .........104
8-5 Performance Monitoring Registers........................................................................................... 1 06
12 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Contents
8-6 Clock Count Register (CCNT) .................. ... .. ..................... ... .. .. ... .. .. ... .. ... .. .. ... .. .. ... .. .. ... .. ... .. .. ..106
8-7 Performance Monitor Count Register (PMN0 - PMN3) ............................................................107
8-8 Performance Monitor Control Register.....................................................................................108
8-9 Interrup t En able Register................. ................ ................. ................ ........................ ................109
8-10Overflow Flag Status Register..................................................................................................110
8-11Event Select Register...............................................................................................................111
8-12Performance Monitoring Events...............................................................................................113
8-13Some Common Uses of the PMU ............................................................................................ 114
9-1 Debug Control and Status Register (DCSR) ............................................................................123
9-2 Event Priority............................................................................................................................126
9-3 Halt Mode R14_DBG Updating .................................................................................. ............ ..127
9-4 Monitor Mode R14_DBG Updating.................................................................... ....... .. .......... ....129
9-5 Instruct ion Br e a kp oin t Ad d r ess and Control Register (IBCRx)......... .......... ................ .......... ....130
9-6 Data Breakpoint Register (DBRx).............................................................................................131
9-7 Data Breakpoint Controls Register (DBCON)...........................................................................131
9-8 TX RX Control Register (TXRXCTRL)......................................................................................134
9-9 Normal RX Handshaking..........................................................................................................135
9-10High-Speed Download Handsha king Stat es ............................................................................135
9-11TX Handshaking....................................................................................................................... 137
9-12TXRXCTRL Mnemonic Exte n sions ............... ................. ................ ................. ......... ................137
9-13TX Register............................................................................................................................... 138
9-14RX Register.................... ................. ................ ................. ................ ................. ......... ..............138
9-15CP 14 Trace Buffer Regi ste r Su mmar y.............. ................ ................. ................ ................. ....145
9-16Checkpoint Register (CHKPTx)................................................................................................146
9-17TBREG Format.........................................................................................................................147
9-18Message Byte Formats................. ................. ................. ................ ................. ................ .........148
9-19LDIC Cache Function s........ ................ ................. ................. ................ .......... ................ .........156
9-20Steps For Loading Mini Instruction Cache During Reset ..........................................................159
9-21Steps For Dynamically Loading the Mini Instruction Cache .....................................................161
10-1Branch Late n cy Penalty......... ................. ................ ................. ................. ................ ................164
10-2Latency Exampl e........................ ................. ................ ................. ................ ................. ...........166
10-3Branch Inst r u cti o n Timin g s ( Tho se pred icted by the BTB).................. ................ ................. ....167
10-4Branch Inst r u cti o n Tim ing s ( Tho se not predicted by the BTB)....... ................. ................ .........167
10-5Data Processing Instruction Timings........................................................................................167
10-6Multipl y In struction Timing s........................... ................. ................ ................. ................ .........168
10-7Multiply Implicit Accumulate Instruction Timings......................................................................169
10-8Implicit Accumulator Access Instruction Timings......................................................................169
10-9Saturated Data Processing Instruction Timings .......................................................................170
10-10Status Register Access Instruction Timings............................................................................170
10-11Load and Store Instruction Timings ........................................................................................171
10-12Load and Store Multiple Instruction Timings...........................................................................171
10-13Semaphore Instruction Timings ..............................................................................................171
10-14CP15 Registe r Acce ss Instruction Ti mi n gs......... ................. ................ ................. ................ ..172
10-15CP14 Registe r Acce ss Instruction Ti mi n gs......... ................. ................ ................. ................ ..172
10-16Exception-Generating Instruction Timings.............................................................. ............ ....172
10-17Count Leading Zeros Instruction Timings .......................................................... ............ ....... ..172
A-1 Pipelines and Pipe stages ............................................................................ ....... .....................177
Developer’s Manual January, 2004 13
Intel XScale® Core Developer’s Manual
Introduction
Introduction 1
1.1 About This Document
This document is the author itative and definitive referenc e f o r the external archite cture of the In tel XScale
®
core1.
This documen t describes two variants of the Intel XScale® core that differ only in the performance monitoring and the size of the JTAG instruction register. Software can detect which variant it is running on by examining the CoreGen field of Coprocesso r 15, ID Register (bits 15:13). (See
Table 7-4, “ID Register” on page 7-81 for more details.) A CoreGen value of 0x1 is referred to as
XSC1 and a value of 0x2 is referred to as XSC2. Intel Corporation assumes no responsibility for any errors which may appear in this document nor
does it make a commitment to update the information contained herein. Intel reta ins the right to make changes to these specifications at any time, without notice. In
particular, descriptions of features, timi ngs , and pin-outs does not imply a comm itment to implement them.
1.1.1 How to Read This Document
It is necessary to be familiar with the ARM Version 5TE Architecture in order to unde rs tand some aspect s of th i s do cu ment.
Each chapter in this document f ocuses on a specifi c architec tur al feature of the Intel XScale® core.
Ch ap te r 2 , “P r o g r am m i n g M o de l ”
Chapter 3, “Memory Management”
Chapter 4, “Instruction Cache”
Chapter 5, “Branch Target Buffer”
Ch ap te r 6 , “D ata Cache”
Chapter 7, “Configuration”
Ch ap ter 8, “Per fo rmance Mon i to r i ng ”
Ch ap te r 9 , “S o f t w ar e De bu g ”
Ch ap te r 1 0 , “P e r f ormance Co n si de r at io n s”
Severa l ap pendices are also p resent:
Appendix A, “Optimization Guide” covers instruction scheduling techniques.
Appendix B, “Test Features” describes the JTAG unit.
Note: All the “buzz words” and acronyms found throughout this document are captured in Section 1.3.2,
“Terminology and Acronym s” on page 1-19, located at the end of this chap ter.
1. ARM* archi tectu r e co mplia nt .
14 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Introduction
1.1.2 Other Relevant Documents
ARM Architecture Version 5TE Specification Docume nt Num ber: ARM DDI 0100E
This document describes Version 5TE of the ARM Architecture which includes Thumb ISA and ARM DSP-Enhanced ISA. (ISBN 0 201 737191)
StrongARM SA-1100 Microprocessor Developer’s Manual, Intel Order # 278105
StrongARM SA-110 Microprocessor Technical Reference Manual, Intel Order #278104
Developer’s Manual January, 2004 15
Intel XScale® Core Developer’s Manual
Introduction
1.2 High-Level Overview of the Intel XScale® Core
The Intel XScale® core is an ARM V5TE compliant mic roprocessor. It has been designed for high performance and low-power; leading the industry in mW/MIPs. The core is not int ended to be delivered as a stand alone product but as a building block for an ASSP (Application Specific Standard Product) with embedded markets such as handheld devices, networking, storage, remote access s erv ers, etc.
The Intel XScale® core inco r porates an extens iv e list of architecture features that allows it to achiev e hi g h per f o r ma nce. This r ic h feature set al lo w s p rog r ammers to select the app r opr i at e features that obtains the b est performance for their app l ication . Many of the architectural features adde d to th e Intel XS ca le
®
core help hide memory latency which often is a serious impedi me nt to
high performance proces sors. This includes:
the ability to continue ins truction executio n even while the data cache is retrieving data from
external memory.
a wr it e b uffe r.
write-back cachi n g .
various data cache allocation policies which can be configured different for each application.
and cache locking.
All these featur es improve the eff ic ie n cy of the mem o r y bus ex te r na l to th e core. The Intel XScale® core has been equipped to efficiently handle audio processing through the
support of 16-bit data t ypes a nd 16-b it opera tions . The se audio c odin g enhanc ements cente r ar ound multiply and accu mul ate operations which acce lerate many of the audio filter operations.
1.2.1 ARM Compatibility
ARM Version 5 (V5) Architecture added floating point instructions to ARM Version 4. The Intel XScale
®
core imp lements the int eger instruction s et architec ture of ARM V5, but does not provide
hardware support of the floating point instructions. The Intel XScale® core provides the Thumb ins truction set (ARM V5T) and the ARM V5E DSP
extensions. Backward compati bility with StrongARM* products is maintained for user-mode applications.
Operating systems may require modifications to match the specific hardware feat ures of the Intel XScale
®
core and to take advantage of the performance enhancements added.
16 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Introduction
1.2.2 Features
Figure 1-1 shows the major functional blocks of the Intel XSca le® core. The following sections
give a brief, high-level overview of these blocks.
1.2.2.1 Multiply /Accu mu late (MAC)
The MAC unit s upports early ter minat ion o f multi pli es/ac cumula tes in tw o cycle s and can sust ain a throughput of a MAC operation every cycle. Several architectural enhancements were made to the MAC to support audio coding algorithms, which include a 40-bit accumulator and support for 16-b i t packed data.
See Section 2.3, “Extensions to ARM Architecture” on page 2-23 for more details.
Figure 1-1. Architecture Features
Write Buffer
• 8 entries
• Full coalescing
Fill Buffer
• 4 - 8 entries
Instruction Cache
• 32K or 16K bytes
• 32 ways
• Lockable by line
IMMU
• 32 entry TLB
• Fully associative
• Lockable by entry
DMMU
• 32 en try TLB
• Fully Associative
• Lockable by entry
JTAG
Debug
• Hardware Breakpoints
• Branch History Table
Branch Target Buffer
• 128 entries
MAC
• Single Cycle Throughput (16*32)
• 16-bit SIMD
• 40 bit Accumulator
Data Cache
• 32K or 16K bytes
• 32 ways
• wr-back or wr-through
• Hit under miss
Data RAM
• 28K or 12K bytes
• Re-map of data cache
Power Mgnt Ctrl
Mini­Data Cache
• 2K or 1K bytes
• 2 ways
Performance Monitoring
Developer’s Manual January, 2004 17
Intel XScale® Core Developer’s Manual
Introduction
1.2.2.2 Memory Manage men t
The Intel XScal e® core implements the Memory Management Unit (MMU) Architecture specified in the ARM Arch itecture Referenc e Manual. The MMU provides access protection and virtual to physical address tra n slation.
The MMU Architectur e also speci fies the caching policies for the in struct io n cache and d ata memory. These policies are specified as page attributes and include:
identifying code as cacheab le or non-cacheable
selecting between the mini-da ta c ache or data cache
write-back or write-through data caching
enabling data write allocation policy
and en ab l in g th e wr it e bu ffer to coales ce s tor es to ex ternal memory
Chapter 3, “Memory Management” discusses this in more detail.
1.2.2.3 Instruction Cache
The Intel XScale® core comes with either a 16 K or 32 K byte instruction cache. The size is determined by the ASSP. The instruction cache is 32-way set associative and has a line size of 32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to ext ernal memory. A mechanism to lock critical code within the cache is also provided.
Chapte r 4 , “I n s t r uc t io n Cac h e” discusses this in more detail.
1.2.2.4 Branch Target Buffer
The Intel XScale® core provides a Branch Target Buffer (BTB) to predict the outcome of branch type instr u ctions. It provides storage f o r the target address of b r an ch type ins tructions and predicts the next addre ss to present to the in struction cache when the current instruction address is tha t of a branch.
The BTB holds 128 entries. See Chapter 5, “Branch Target Buffe r ” for more details.
1.2.2.5 Data Cache
The Intel XScale® core comes with either a 16 K or 32 K byte data cache. Th e size is determined by the ASSP. Besides the main data cache, a mini-data cache is provid ed whos e size is 1/16
th
the main data cache . So a 32K, 16 K byte main data cache would have a 2 K, 1 K byte mini-d ata ca che respectively. The main data cach e is 32-way se t associative and th e mini-data cache is 2-way set associative. Each cache has a line size of 32 bytes, supports write - through or write-back cachin g.
The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by coprocessor 15.
Chapte r 6 , “D ata Cache” discusses all this in more detail.
The Intel XScale® core allows applications to re-configure a portion of the data cache as data RAM. Software may place special tables or frequently used variables in this RAM. See
Section 6.4, “Re-config uring the Data Cache as Data RAM” on page 6-71 for more information on
this.
18 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Introduction
1.2.2.6 Performance Monitoring
Perfor man ce mon it oring co u nter s hav e b een add ed to th e In te l XSca le® core th at ca n be con f igu red to monitor variou s events in th e core. Th ese events allow a software deve lo per to measure cach e efficiency, detect system bottlenecks and reduce the overall latency of programs.
Chapter8, “Performance Monitoring” discusses this in more det ail.
1.2.2.7 Power Managemen t
The Inte l X Scale® core incorporates a power and clock management unit that can assist ASSPs in controlling their clocking and managing their power . T hes e features are described in Section 7.3,
“CP14 Register s” on page 7-96.
1.2.2.8 Debug
The Inte l X Scale® core supports software debugging through two instruction address break point registers, one da ta-address breakpoint register , one data-address/mask br ea kpoint register , and a trace b u ffer.
Chapter 9, “Software Debug” discusses thi s in m o r e de tail.
1.2.2.9 JTAG
Testabili ty is supported on the Intel XScale® core through the Test Access Port (TAP) Controller implementation, which is based on IEEE 1149.1 (JTAG) Standard Test Acc es s Po rt and Boundary-Scan Arc hitec ture. The purp ose of t he TAP contr oll er i s to suppo rt te st l ogic int er nal and external to th e core such as built-in self-test and boundary-scan.
Appendix B discusses this in more detail.
Developer’s Manual January, 2004 19
Intel XScale® Core Developer’s Manual
Introduction
1.3 Terminology and Convent ions
1.3.1 Number Representation
All numbers in this document can be assumed to be base 10 unless designated otherwise. In text and pseudo code descriptions, hexadecimal numbers have a prefix of 0x and binary numbers have a prefix of 0b. For example, 107 would be represented as 0x6B in hexadecimal and 0b1101011 in binary.
1.3.2 Ter minology and Acronyms
ASSP Application Specific Standard Product Assert This term refers to the logically active value of a signal or bit. BTB Branch Target Buffer Clean A clean operation upda tes ext ern al memory with the con tents of t he spe cif ied li ne in
the data /min i- da ta ca ch e i f an y o f t he dirt y b its a re s et a nd t he l in e is v al id. Th ere ar e two dirty bits ass o ciated with each line in the cache so only the portion that is dirty will get written back to external memory.
After this operation, the line is still valid and both dirty bi ts are deasserted.
Coalescing Coalescing means bringing together a new store operation with an existing store
operation already resident in the write buffer. The new store is placed in the same write buffer entr y as an existing store when the address of the new store falls in the 4 word aligned add ress of the exis ting entry. This includes, in PCI termin ology , writ e
merging, write collapsing, and write combining. Deassert This term refers to the logica lly inactive value of a signal or bit. Flush A flush oper ati on inva li dates the l ocati on(s) i n t he c ache by d easse rting t he vali d bi t.
Individual e ntries ( lin es) may be flush ed or t he ent ire c ache may be fl ushed wit h one
command. Once an entry is flushed in the cache it can no longer be used by the
program. XSC1 XSC1 r efers to a variant of the Intel XScale® core denoted by a CoreGen
(Coprocessor 15 , ID Regis ter) value of 0x1. This varia nt has a 2 counter pe rformance
monitor and a 5-bit JTAG instruction register. See Table 7-4, “ID Register” on
page 7-81 for more details.
XSC2 XSC2 r efers to a variant of the Intel XScale® core denoted by a CoreGen
(Coprocessor 15 , ID Regis ter) value of 0x2. This varia nt has a 4 counter pe rformance
monitor and a 7-bit JTAG instruction register. See Table 7-4, “ID Register” on
page 7-81 for more details.
Reserved A reserved field is a field th at may be us ed by an im pl ement at ion . I f th e in iti al val ue
of a reserved field is supplied by software, this value must be zero. Software should
not modify reserved fields or depend on any values in reserved fields.
20 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Introduction
This Page Intentionally Left Blank
Developer’s Manual January, 2004 21
Intel XScale® Core Developer’s Man u al
Programming Model
Programming Model 2
This chapter desc ribes the programming model of the Intel XScale® core, namely the implementation options and extensions to the ARM Version 5TE architecture.
2.1 ARM Architecture Compatibility
The Intel XScale® core implements the in teger instruction set arc hitecture specified in ARM V5TE. T refers to the Thumb instr uction set and E refers to the DSP-Enhanced ins truction set.
ARM V5TE in tro duce s a f e w more arc hi tec tu re fe at ures ov er ARM V4, sp ecif ica ll y the add iti on of tiny pages (1 Kbyte), a new instruction (CLZ) that counts the leading zeroes in a data value, enhanced ARM-Thumb transfer instructions and a modification of the sys tem control coprocessor , CP15.
2.2 ARM Architecture Implementation Options
2.2.1 Big Endian versus Little Endian
The Intel XScale® core supports both big and little endian data representation. The B-bit of the Control Register (Coproces sor 15, register 1, bit 7) selects big and lit tl e endia n mode. To run in big endian mode, the B bit must be se t before attempting any sub-word acce sses to memory, or undefined results will occur. Note that this bit takes effect even if the MMU is disabled.
2.2.2 26-Bit Architecture
The Intel XScale® core does not support 26-bit architecture.
2.2.3 Thumb
The Intel XScale® core supports the Thumb instruction set.
22 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Programm i ng M odel
2.2.4 ARM DSP-Enh an ce d In stru ct ion Set
The Inte l X Scale® core implements the ARM DSP-enhanced instruction set which is a set of instructi ons that boost the performanc e of s ignal processing appl ications. There are new multiply instructions that operate on 16-bit data values and new saturation instructions. Some of the new instructions are:
SMLAxy 32<=16x16+32
SMLAWy 32<=32x16+32
SMLALxy 64<=16x16+64
SMULxy 32<=16x16
SMULWy 32<=32x16
QADD adds two registers and saturates the result if an overflow occurred
QDADD doubles and satura tes one of the input registers then add and saturate
QSUB subtracts two registers and saturates the result if an overflow occurred
QDSUB doubles and saturates one of the input re gis ters then subtract and saturate
The Intel XScale® core also implements LD RD, S TRD and PLD instructions with the following implementation notes:
PLD is interpreted as a read operat ion by the MMU and is ignored by the data bre akpoint unit
(i.e., PLD will neve r generate data breakpoi nt events).
PLD to a non-cacheable page performs no action. Also, if the targeted cache line is already
resident, this instruction has no af fect.
Both LDRD and STRD instructions will generate an alignment exception when the address
bits [2:0] = 0b100.
MCRR and MRRC are only supported on the Intel XScale® core when di r ected to coprocessor 0 and are used to access the internal accumulator. See Section 2.3.1.2 for more information. Access to coprocesso rs 15 and 14 generate an undefined ins truction exception. Refer to the Intel XScale
®
core implementa tion option section of the ASSP architecture specifi ca tion for the behavior when accessi ng al l ot h er co p r oc es s o r s.
2.2.5 Base Register Update
If a data abort is signalled on a memory instruction that specifies writeback, the contents of the base register will not be updated. This holds for all load and store instructions. This behavior matches that of the first generation StrongARM processor and is referred to in the ARM V5TE archite ct u r e as th e Base Restored Abort Model.
Developer’s Manual January, 2004 23
Intel XScale® Core Developer’s Man u al
Programming Model
2.3 Extensions to ARM Architecture
The Intel XScale® core made a few extensions to the ARM Version 5TE architecture to meet the needs of various marke ts and design requirements. The following is a list of the extensions which are discussed in the next sections.
A DSP coprocessor (CP0) has been added that contains a 40-bit accumulator and eight new
instructions.
New page attributes were added to the page table descriptors. The C and B page attribute
encoding was extended by one more bit to allow for more encodings: write allocate and mini-data cache. An ASSP definable attribute (P bit) was also added.
Additional functionality has been added to coprocessor 15. Coprocessor 14 was also created.
Enhancements were made to the Eve n t Ar chitecture , which include instruction cache and data
cache parity error ex ceptions, breakpoint events, and imprecise external data aborts.
2.3.1 DSP Coprocessor 0 (CP0)
The Intel XScale® core adds a DSP coprocessor to the architecture for the purpose of increasing the performance and the pre cision of audio processing algorithms. This coproc essor contains a 40-bit accumulator and 8 new instructions.
Note: Products using the Intel XScale® core may extend the definition of CP0. If this is the case, a
complete definition can be found in the Intel XScale
®
core implementation option section of the ASSP architecture specification. For this very reason, software should not rely on behavior that is specific to the 40-bit length of the accumulator, since the length may be extended.
The 40-bit accumulat or is refe renced by several new instructi ons that were added to the architecture; MIA, MIAPH and MIAxy are multiply/accumulate instruc tions that ref erence the 40-bit accumulator instead of a register specified accumulator. MAR a nd MRA pr ovide the a b ility to read and write the 40-bit accumulator.
Access to CP0 is always allowed in all processor modes when bit 0 of the Coprocessor Access Register is set. Any access to CP0 when this bit is clear will cause an undefined exception. (See
Section 7.2.15, “Register 15: Coprocessor Access Register” on page 7-94 for more details).
Note: Only pr ivilege d softwar e can set thi s bi t in th e Coproce s so r Ac cess Reg is ter.
The 40-bit a ccumulator will need to be sa ved on a context switch if mu ltiple proces ses are using it. Two new instruction formats were added for coproc ess or 0: Multiply with Internal Accumulate
Format and Internal Accumulate Access Format . The formats and instructions are described next .
24 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Programm i ng M odel
2.3.1.1 Multiply With Internal Accumulate Format
A new multiply format has been created to define operations on 40-bit accumulators. Table 2-1,
“Mul tiply wi t h I nterna l A ccumulate For mat” on page 2-2 4 shows the layout of the new format.
The opcode for this forma t lies within the coprocessor register transfer instruction type. These instructions have their own syntax.
Two new fields were created for this format, acc and opcode_3. The acc field specifies 1 of 8 intern al accumu lators to op erate on an d opcode_3 defines the operation for this format. The Intel XScale
®
core defines a single 40-bit accum ulator referred to as acc0; future implementations may
define multiple internal accumulators. The Intel XScale
®
core uses opcode_3 to de fine six
instructions, MIA, MIAPH, MIAB B, MIABT, MIATB and MIATT.
Table 2-1. Multiply with Internal Accumulate Format
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 1 1 1 0 0 0 1 0 opcode_3 Rs 0 0 0 0 acc 1 Rm
Bits Description Notes
31:28 cond - ARM condition codes -
19:16
opcode_3 - specifies the type of multiply with
internal ac c umulate
The Intel XScale® core defines the following: 0b0000 =
MIA
0b1000 = MIAPH 0b1100 = MIABB 0b1101 = MIABT 0b1110 = MIATB 0b1111 = MIATT The effect of all other encodings are unpredictable.
15:12 Rs - M u ltiplier
7:5 acc - select 1 of 8 ac cumulators
The Intel XScale® core only implements acc0; access to any other acc has an unpredictable effect.
3:0 Rm - Multiplicand -
Developer’s Manual January, 2004 25
Intel XScale® Core Developer’s Man u al
Programming Model
The MIA instruction operates similarly to MLA except t h at the 40-bit accumulator is used. MIA multiplie s the signed value in register Rs (multiplier) by the signed value in register Rm (multiplicand) and then adds the result to the 40-bit accumulator ( acc0).
MIA does not support unsigned multiplication; all values in Rs and Rm will be interpreted as signed data val ues. MIA is useful f or operat ing on sig ned 16-bit dat a that was lo aded int o a general purpose register by LDRSH.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
The MIAPH instruction performs two16-bit si gned multiplies on packed half word data and accumulates these to a single 40-bit accumulator. The first signed multiplication is performed on the lower 16 bits of the value in register Rs with the lower 16 bits of the value in register Rm. The second signed multiplication is perfo rme d on the upper 16 bits of the value in register Rs with the upper 16 bits of the value in register Rm. Both si gned 32-bit products are si gn extended and then added to the value in the 40-bit accumulator (acc0).
The instruction is only executed if the condition specified in the instruction matches the condition code status.
Table 2-2. MIA{<cond>} acc0, Rm, Rs
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 1 1 1 0 0 0 1 0 0 0 0 0 Rs 0 0 0 0 0 0 0 1 Rm
Operation: if ConditionPassed(<cond>) then
acc0 = (Rm[31:0] * Rs[31:0])[39:0] + acc0[39:0] Exceptions:none Qualifiers Condition Code
No condition code flags are updated
Notes: Early termination is supported. Instruction timings can be found
in Section 10.4.4, “Multiply Instruction Timings” on page 10-168. Specifying R15 for register Rs or Rm has unpredictable results. acc0 is defined to be 0b000 on the core.
Table 2-3. MIAPH{<cond>} acc0, Rm, Rs
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 1 1 1 0 0 0 1 0 1 0 0 0 Rs 0 0 0 0 0 0 0 1 Rm
Operation: if ConditionPassed(<cond>) then
acc0 = sign_extend(Rm[31:16] * Rs[31:16]) +
sign_extend(Rm[15:0] * Rs[15:0]) +
acc0[39:0] Exceptions:none Qualifiers Condition Code
S bit is always cleared; no condition code flags are updated
Notes: Instruction timings can be found
in Section 10.4.4, “Multiply Instruction Timings” on page 10-168. Specifying R15 for register Rs or Rm has unpredictable results. acc0 is defined to be 0b000 on the core
26 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Programm i ng M odel
The MIAxy instruction performs one16-bit signed multiply and accumulates these to a single 40-bit accumulator. x refers to either the upper half or lower half of regist er Rm (multiplica nd) and y refers to the upper or lower half of Rs (multiplier). A value of 0x1 will sele ct bits [31:16] of the register which is specified in the mnemonic as T (for top ) . A val ue of 0x0 will select bits [15:0] of the register which is speci f ied in the mnemonic as B (for bottom).
MIAxy does not support unsigned multiplication; all values in Rs and Rm will be interpreted as signed data values.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
Table 2-4. MIAxy{<cond>} acc0, Rm, Rs
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 1 1 1 0 0 0 1 0 1 1 x y Rs 0 0 0 0 0 0 0 1 Rm
Operation: if ConditionPassed(<cond>) then
if (bit[17] == 0)
<operand1> = Rm[15:0]
else
<operand1> = Rm[31:16]
if (bit[16] == 0)
<operand2> = Rs[15:0]
else
<operand2> = Rs[31:16]
acc0[39:0] = sign_extend(<operand1> * <operand2>) + acc0[39:0]
Exceptions:none Qualifiers Condition Code
S bit is always cleared; no condition code flags are updated
Notes: Instruction timings can be found
in Section 10.4.4, “Multiply Instruction Timings” on page 10-168. Specifying R15 for register Rs or Rm has unpredictable results. acc0 is defined to be 0b000 on the core.
Developer’s Manual January, 2004 27
Intel XScale® Core Developer’s Man u al
Programming Model
2.3.1.2 Internal Accumulator Access Format
The Intel XScale® core defines a new instruction format for acces sing internal accumulators in CP0. Table 2-5, “Internal Accumulator Access Format” on pa ge 2-27 shows that the opcode falls into the coprocessor register transfer space.
The RdHi and RdLo fields allow up t o 64 bits of data transfer betwee n StrongARM re gist ers an d an internal accumulator. The acc field specifies 1 of 8 internal accumulators to transfer data to/from. The core implements a sin gle 40-bit accum u lator referred to as acc0; future implementations can specify multiple internal accumulators of varying sizes, up to 64 bits.
Access to the internal accumulator is allowed in all processor modes (user and privileged) as long bit 0 of the Coproces sor Access Register is set. (See Section 7.2.15, “Register 15: Coprocessor
Access Register” on page 7-94 for more details).
The Intel XScale® core imp le m e nts two ins t r uc tions MAR and MRA that move two ARM registers to acc0 and move acc0 to two ARM regis ters, respectively.
Note: MAR has t h e same encoding as MCRR (to coproc essor 0) and MRA has the s a m e en c od i n g as
MRRC (to coprocessor 0). These instructions move 64-bits of data to/from ARM registers from/t o
coprocessor registers. MCRR and MRRC are defined in ARM’s DSP ins truction set. Disassemblers not aware of MAR and MRA will produce the following syntax:
MCRR{<cond>} p0, 0x0, RdLo, RdHi, c0 MRRC{<cond>} p0, 0x0, RdLo, RdHi, c0
Table 2-5. Internal Accumulator Access For mat
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 1 1 0 0 0 1 0 L RdHi RdLo 0 0 0 0 0 0 0 0 0 acc
Bits Description Notes
31:28 cond - ARM condition codes -
20
L - move to/from internal accumulator
0= move to internal accumulator (MAR) 1= move from internal accumulator (MRA)
-
19:16
RdHi - specifies the high order eight (39:32)
bits of the internal accumulator.
On a read of the acc, this 8-bit high order field will be sign extended.
On a write to the acc, the lower 8 bits of this register will be written to acc[39:32]
15:12
RdLo - specifies t he low order 32 bits of t he
internal accumulator
-
7:4 Should be zero
This field could be used in future implementatio ns to specify the type of saturati on to pe rf orm o n t he r ead o f an in ter na l accumulator. (e.g., a signed saturation to 16-bits m ay be usef ul for some filter algorithms.)
3 Should be zero
-
2:0 acc - specifies 1 of 8 internal accumulators
The core only imple ments acc0; access to
any other acc is unpredictable
28 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Programm i ng M odel
The MAR instruction moves the value in register RdLo to bits[31:0] of the 40-bit accumulator (acc0) and moves bits [7:0] of the value in registe r RdHi into bits[39:32] of acc 0.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
This instruction executes in any pr ocessor mode.
The MRA instruction moves the 40-bit accumulator value (acc0) into two regi sters. Bits[31:0] of the value in acc0 are moved into the register RdLo. Bits[39:32] of the value in acc0 are sign extended to 32 bits and moved into the regi st er RdHi.
The instruction is only executed if the condition specified in the instruction matches the condition code status.
This instruction executes in any pr ocessor mode.
Table 2-6. MAR{<cond>} acc0, RdLo, RdHi
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 1 1 0 0 0 1 0 0 RdHi RdLo 0 0 0 0 0 0 0 0 0 0 0 0
Operation: if ConditionPassed(<cond>) then
acc0[39:32] = RdHi[7:0]
acc0[31:0] = RdLo[31:0] Exceptions:none Qualifiers Condition Code
No condition code flags are updated
Notes: Instruction timings can be found in
Section 10.4.4, “Multiply Instruction Timings” on page 10-168
Specifying R15 as either RdHi or RdLo has unpredictable results.
Table 2-7. MRA{<cond>} RdLo, RdHi, acc0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
cond 1 1 0 0 0 1 0 1 RdHi RdLo 0 0 0 0 0 0 0 0 0 0 0 0
Operation: if ConditionPassed(<cond>) then
RdHi[31:0] = sign_extend(acc0[39:32])
RdLo[31:0] = acc0[31:0] Exceptions:none Qualifiers Condition Code
No condition code flags are updated
Notes: Instruction timings can be found in
Section 10.4.4, “Multiply Instruction Timings” on page 10-168
Specifying the same register for RdHi and RdLo has unpredictable results.
Specifying R15 as either RdHi or RdLo has unpredictable results.
Developer’s Manual January, 2004 29
Intel XScale® Core Developer’s Man u al
Programming Model
2.3.2 New Page Attributes
The Intel XScale® core extends the page attributes defined by the C and B bits in the page descriptor s wit h an add itional X bit. This bit al lows four more attributes to be encoded when X=1. These new encodings include allocating data for the mini-data cache and write-allocate caching. A full descripti on of th e encodi ngs can be found in Section 3.2.2, “Memory Attri bute s” on page 3-38.
The Intel XScale® core retains ARM definitions of the C and B encoding when X = 0, which is differe nt than the StrongARM prod ucts. The memory attribu te for the mini-data cach e has bee n moved and replaced with the write-through caching attribute.
When write-alloc ate is enabled, a store operatio n th at misses the data cache (cacheable data only) will generate a line fill. If disabled, a lin e fill only occurs when a load operation misses the data cache (cacheable data only).
Write-through caching causes all store operat ions to be written to memory, whether the y are cacheable or not cacheable. This feature is use ful for maintaining data ca che coherency.
The Intel XSca le® core also adds a P bit in the first level descriptors to allow an ASSP to identify a new memory attribute. Refer to the Intel XScal e
®
core implementation option section of the ASSP architecture s pecification to find out how the P bit has been defined. Bit 1 in the Control Register (coprocessor 15, register 1, opcode=1) is used to assigned the P bit memory attribute for memory accesses made during page table walks.
These attributes are programmed in the translation table descriptors, which are highlighted in
Table 2-8, “First-level Descriptors” on page 2-30, Table 2-9, “Second-level Descriptors for Coarse Page Table” on page 2-30 and Tab le 2-10, “Second-leve l Descriptors for Fine Page Table” on page 2-30. Two second-level descriptor formats have been defined for the core, one is us ed for the
coarse page table and the other is used for the fine page table.
30 January, 2004 Developer’s Manual
Intel XScale® Cor e Developer’s Manual
Programm i ng M odel
The TEX (T ype Extension) fiel d is pres ent in several of the descriptor types. In the core, only the LSB of this field is defined; this is called the X bit. The remaining bits s hould be programmed as zero (SBZ).
A Small Page descriptor does not have a TEX field. For these descriptors, TEX is implicitly zero; that is, they operate as if the X bit had a ‘0’ valu e.
The X bit, when set, modi fies the meaning of the C and B bits. Des cription of page attributes and their encoding can be found in Chapter 3, “Memory Management”.
Table 2-8. First-level Descriptors
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SBZ 0 0
Coarse page table base address P Domain SBZ 0 1
Section base addre ss SBZ TEX AP P Domain 0 C B 1 0
Fine page table base address SBZ P Domain SBZ 1 1
Table 2-9. Second-level Descriptors for Coarse Page Table
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SBZ 0 0
Large page base address TEX AP3 AP2 AP1 AP0 C B 0 1
Small page base address AP3 AP2 AP1 AP0 C B 1 0
Extended small page base address SBZ TEX AP C B 1 1
Table 2-10. Second-level Descriptors for Fine Page Table
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SBZ 0 0
Large page base ad dres s TEX AP3 AP2 AP1 AP0 C B 0 1
Small page base address AP3 AP2 AP1 AP0 C B 1 0
Tiny Page Base Address TEX AP C B 1 1
Loading...
+ 190 hidden pages