IBM A2 User Manual

A2 Processor
User’s Manual
for Blue Gene/Q
Note: This document and the information it contains are provided on an as-is basis. There is no plan for providing for future updates and corrections to this document.

October 23, 2012 Version 1.3

Copyright and Disclaimer

© Copyright International Business Machines Corporation 2010, 2012
Printed in the United States of America October 2012
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Other company, product, and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS IS” BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.
IBM Systems and Technology Group 2070 Route 52, Bldg. 330 Hopewell Junction, NY 12533-6351
The IBM home page can be found at ibm.com®. The IBM semiconductor solutions home page can be found at ibm.com/chips.

Contents

List of Figures ............................................................................................................... 21
List of Tables ................................................................................................................. 23
Revision Log ................................................................................................................. 29
About This Book .......................................................................................................... 31
Who Should Use This Book .................................................................................................................. 31
How to Use This Book ........................................................................................................................... 31
Notation ................................................................................................................................................. 32
Related Publications ............................................................................................................................. 33
List of Acronyms and Abbreviations .......................................................................... 35
1. Overview .................................................................................................................... 45
1.1 A2 Core Key Design Fundamentals ................................................................................................ 45
1.2 A2 Core Features ............................................................................................................................ 46
1.3 The A2 Core as a Power ISA Implementation ................................................................................ 49
1.3.1 Embedded Hypervisor ........................................................................................................... 49
1.4 A2 Core Organization ...................................................................................................................... 49
1.4.1 Instruction Unit ....................................................................................................................... 50
1.4.2 Execution Unit ....................................................................................................................... 51
1.4.3 Instruction and Data Cache Controllers ................................................................................. 51
1.4.3.1 Instruction Cache Controller ........................................................................................... 51
1.4.3.2 Data Cache Controller .................................................................................................... 51
1.4.4 Memory Management Unit (MMU) ........................................................................................ 52
1.4.5 Timers .................................................................................................................................... 54
1.4.6 Debug Facilities ..................................................................................................................... 54
1.4.6.1 Debug Modes ................................................................................................................. 54
1.4.6.2 Development Tool Support ............................................................................................. 55
1.4.7 Floating-Point Unit Organization ............................................................................................ 55
1.4.7.1 Arithmetic and Load/Store Pipelines .............................................................................. 56
1.4.8 IEEE 754 and Architectural Compliance ............................................................................... 56
1.4.8.1 IEEE 754 Compliance .................................................................................................... 57
1.4.9 Floating-Point Unit Implementation ....................................................................................... 57
1.4.9.1 Reciprocal Estimates ...................................................................................................... 57
1.4.9.2 Denormalized B Operands ............................................................................................. 57
1.4.9.3 Non-IEEE mode ............................................................................................................. 57
1.4.10 Floating-Point Unit Interfaces .............................................................................................. 57
1.4.10.1 A2 Processor Core Interface ........................................................................................ 57
1.4.10.2 Clock and Power Management Interface ..................................................................... 58
1.5 Core Interfaces ................................................................................................................................ 58
1.5.1 System Interface .................................................................................................................... 58
1.5.2 Auxiliary Execution Unit (AXU) Port ...................................................................................... 59
1.5.3 JTAG Port .............................................................................................................................. 59
2. CPU Programming Model ......................................................................................... 61
2.1 Logical Partitioning .......................................................................................................................... 61
2.1.1 Overview ................................................................................................................................ 61
2.2 Storage Addressing ......................................................................................................................... 62
2.2.1 Storage Operands .................................................................................................................. 62
2.2.2 Effective Address Calculation ................................................................................................ 64
2.2.2.1 Data Storage Addressing Modes .................................................................................... 65
2.2.2.2 Instruction Storage Addressing Modes ........................................................................... 65
2.2.3 Byte Ordering ......................................................................................................................... 66
2.2.3.1 Structure Mapping Examples ......................................................................................... 66
2.2.3.2 Instruction Byte Ordering ................................................................................................ 67
2.2.3.3 Data Byte Ordering ......................................................................................................... 68
2.2.3.4 Byte-Reverse Instructions .............................................................................................. 69
2.3 Multithreading .................................................................................................................................. 70
2.3.1 Thread Identification .............................................................................................................. 70
2.3.1.1 Thread Identification Register (TIR) ............................................................................... 70
2.3.1.2 Processor Identification Register (PIR) .......................................................................... 70
2.3.1.3 Guest Processor Identification Register (GPIR) ............................................................. 71
2.3.2 Thread Run State ................................................................................................................... 71
2.3.2.1 Thread Stop I/O Pin ........................................................................................................ 71
2.3.2.2 Thread Control and Status Register (THRCTL) ............................................................. 71
2.3.2.3 Core Configuration Register 0 (CCR0) ........................................................................... 72
2.3.2.4 Thread Enable Register (TENS, TENC) ......................................................................... 72
2.3.2.5 Thread Enable Status Register (TENSR) ....................................................................... 73
2.3.3 Wake On Interrupt .................................................................................................................. 74
2.3.3.1 Core Configuration Register 1 (CCR1) ........................................................................... 74
2.3.4 Thread Priority ....................................................................................................................... 75
2.3.4.1 Program Priority Register (PPR32) ................................................................................ 75
2.3.4.2 Instruction Unit Configuration Register 1 (IUCR1) .......................................................... 77
2.3.5 Resources Shared between Threads .................................................................................... 77
2.3.6 Shared Resources ................................................................................................................. 77
2.3.6.1 Accessing Shared Resources ........................................................................................ 78
2.3.7 Duplicated Resources ............................................................................................................ 78
2.3.8 Pipeline Sharing ..................................................................................................................... 79
2.3.8.1 Instruction Cache ............................................................................................................ 80
2.3.8.2 Instruction Buffer and Decode Dependency ................................................................... 80
2.3.8.3 Instruction Issue ............................................................................................................. 80
2.3.8.4 Ram Unit ......................................................................................................................... 81
2.3.8.5 Microcode Unit ................................................................................................................ 82
2.3.8.6 Integer Unit ..................................................................................................................... 82
2.4 Registers ......................................................................................................................................... 82
2.4.1 Register Mapping ................................................................................................................... 84
2.4.2 Register Types ....................................................................................................................... 84
2.4.2.1 General Purpose Registers ............................................................................................ 84
2.4.2.2 Special Purpose Registers ............................................................................................. 84
2.4.2.3 Condition Register .......................................................................................................... 85
2.4.2.4 Machine State Register .................................................................................................. 85
2.5 32-Bit Mode ..................................................................................................................................... 85
2.5.1 64-Bit Specific Instructions ..................................................................................................... 85
2.5.2 32-Bit Instruction Selection .................................................................................................... 85
2.6 Instruction Categories ..................................................................................................................... 86
2.7 Instruction Classes .......................................................................................................................... 87
2.7.1 Defined Instruction Class ....................................................................................................... 87
2.7.2 Illegal Instruction Class .......................................................................................................... 88
2.7.3 Reserved Instruction Class .................................................................................................... 88
2.8 Implemented Instruction Set Summary ........................................................................................... 88
2.8.1 Integer Instructions ................................................................................................................ 89
2.8.1.1 Integer Storage Access Instructions ............................................................................... 89
2.8.1.2 Integer Arithmetic Instructions ........................................................................................ 91
2.8.1.3 Integer Logical Instructions ............................................................................................ 92
2.8.1.4 Integer Compare Instructions ......................................................................................... 92
2.8.1.5 Integer Trap Instructions ................................................................................................ 92
2.8.1.6 Integer Rotate Instructions ............................................................................................. 92
2.8.1.7 Integer Shift Instructions ................................................................................................. 93
2.8.1.8 Integer Population Count Instructions ............................................................................ 93
2.8.1.9 Integer Select Instruction ................................................................................................ 93
2.8.2 Branch Instructions ................................................................................................................ 94
2.8.3 Processor Control Instructions .............................................................................................. 94
2.8.3.1 Condition Register Logical Instructions .......................................................................... 94
2.8.3.2 Register Management Instructions ................................................................................. 95
2.8.3.3 System Linkage Instructions .......................................................................................... 95
2.8.3.4 Processor Control Instructions ....................................................................................... 95
2.8.4 Storage Control Instructions .................................................................................................. 95
2.8.4.1 Cache Management Instructions .................................................................................... 96
2.8.4.2 TLB Management Instructions ....................................................................................... 96
2.8.4.3 Processor Synchronization Instruction ........................................................................... 97
2.8.4.4 Load and Reserve and Store Conditional Instructions ................................................... 97
2.8.4.5 Storage Synchronization Instructions ............................................................................. 97
2.8.4.6 Wait Instruction ............................................................................................................... 98
2.8.5 Initiate Coprocessor Instructions ........................................................................................... 98
2.8.5.1 Cache Initialization Instructions ...................................................................................... 98
2.9 Branch Processing .......................................................................................................................... 99
2.9.1 Branch Addressing ................................................................................................................ 99
2.9.2 Branch Instruction BI Field .................................................................................................... 99
2.9.3 Branch Instruction BO Field ................................................................................................... 99
2.9.4 Branch Prediction ................................................................................................................ 100
2.9.4.1 Branch Decoder ........................................................................................................... 100
2.9.4.2 Branch Direction Prediction .......................................................................................... 101
2.9.4.3 Branch Prioritization ..................................................................................................... 104
2.9.4.4 Branch Target Prediction .............................................................................................. 104
2.9.4.5 Redirection ................................................................................................................... 105
2.9.5 Branch Control Registers .................................................................................................... 105
2.9.5.1 Link Register (LR) ........................................................................................................ 105
2.9.5.2 Count Register (CTR) ................................................................................................... 106
2.9.5.3 Condition Register (CR) ............................................................................................... 107
2.10 Integer Processing ...................................................................................................................... 110
2.10.1 General Purpose Registers (GPRs) .................................................................................. 110
2.10.2 Integer Exception Register (XER) ..................................................................................... 110
2.10.2.1 Summary Overflow (SO) Field ................................................................................... 112
2.10.2.2 Overflow (OV) Field .................................................................................................... 112
2.10.2.3 Carry (CA) Field .......................................................................................................... 112
2.10.2.4 Transfer Byte Count (TBC) Field ................................................................................ 113
2.11 Processor Control ........................................................................................................................ 113
2.11.1 Special Purpose Registers General (SPRG0–SPRG8) ..................................................... 114
2.11.2 External Process ID Load Context (EPLC) Register .......................................................... 119
2.11.3 External Process ID Store Context (EPSC) Register ......................................................... 119
2.12 Privileged Modes ......................................................................................................................... 120
2.12.1 Privileged Instructions ........................................................................................................ 121
2.12.1.1 Cache Locking Instructions ........................................................................................ 121
2.12.2 Privileged SPRs ................................................................................................................. 122
2.13 Speculative Accesses ................................................................................................................. 122
2.14 Synchronization ........................................................................................................................... 122
2.14.1 Context Synchronization .................................................................................................... 122
2.14.2 Execution Synchronization ................................................................................................. 124
2.14.3 Storage Ordering and Synchronization .............................................................................. 124
2.15 Software Transactional Memory Acceleration ............................................................................. 125
2.15.1 Summary ............................................................................................................................ 125
2.15.2 Implementation .................................................................................................................. 125
2.15.2.1 L1 D-Cache ................................................................................................................ 126
2.15.3 Watch Operation Ordering Requirements .......................................................................... 126
2.15.4 Impact on Existing Software .............................................................................................. 126
3. FU Programming Model .......................................................................................... 127
3.1 Storage Addressing ....................................................................................................................... 127
3.1.1 Storage Operands ................................................................................................................ 127
3.1.2 Effective Address Calculation .............................................................................................. 128
3.1.3 Data Storage Addressing Modes ......................................................................................... 128
3.2 Floating-Point Exceptions .............................................................................................................. 129
3.3 Floating-Point Registers ................................................................................................................ 129
3.3.1 Register Types ..................................................................................................................... 130
3.3.1.1 Floating-Point Registers (FPR0–FPR31) ..................................................................... 130
3.3.1.2 Floating-Point Status and Control Register (FPSCR) .................................................. 131
3.4 Floating-Point Data Formats ......................................................................................................... 133
3.4.1 Value Representation .......................................................................................................... 134
3.4.2 Binary Floating-Point Numbers ............................................................................................ 135
3.4.2.1 Normalized Numbers .................................................................................................... 135
3.4.2.2 Denormalized Numbers ................................................................................................ 136
3.4.2.3 Zero Values .................................................................................................................. 136
3.4.3 Infinities ................................................................................................................................ 136
3.4.3.1 Not a Numbers ............................................................................................................. 136
3.4.4 Sign of Result ....................................................................................................................... 137
3.4.5 Normalization and Denormalization ..................................................................................... 138
3.4.6 Data Handling and Precision ............................................................................................... 138
3.4.7 Rounding .............................................................................................................................. 139
3.5 Floating-Point Execution Models ................................................................................................... 140
3.5.1 Execution Model for IEEE Operations ................................................................................. 141
3.5.2 Execution Model for Multiply-Add Type Instructions ............................................................ 143
3.6 Floating-Point Instructions ............................................................................................................. 143
3.6.1 Instructions by Category ...................................................................................................... 144
3.6.2 Load and Store Instructions ................................................................................................. 145
3.6.3 Floating-Point Store Instructions ......................................................................................... 146
3.6.4 Floating-Point Move Instructions ......................................................................................... 148
3.6.5 Floating-Point Arithmetic Instructions .................................................................................. 148
3.6.5.1 Floating-Point Multiply-Add Instructions ....................................................................... 149
3.6.6 Floating-Point Rounding and Conversion Instructions ........................................................ 149
3.6.7 Floating-Point Compare Instructions ................................................................................... 150
3.6.8 Floating-Point Status and Control Register Instructions ...................................................... 151
4. Initialization ............................................................................................................. 153
4.1 Core Reset .................................................................................................................................... 153
4.2 A2 Core State After Reset ............................................................................................................. 154
4.3 Software Initiated Reset Requests ................................................................................................ 160
4.3.1 Software Reset Requests .................................................................................................... 160
4.3.1.1 From Debug ................................................................................................................. 161
4.3.1.2 From Watchdog Timer .................................................................................................. 161
4.3.2 Reset Request Status .......................................................................................................... 161
4.3.2.1 Debug Facility Reset Status ......................................................................................... 162
4.3.2.2 Timer Facility Reset Status .......................................................................................... 162
4.4 Initialization Software Requirements ............................................................................................. 163
5. Instruction and Data Caches ................................................................................. 169
5.1 Data Cache Array Organization and Operation ............................................................................ 169
5.2 Instruction Cache Array Organization and Operation ................................................................... 170
5.3 Cache Line Replacement Policy ................................................................................................... 170
5.4 Instruction Cache Controller .......................................................................................................... 170
5.4.1 ICC Operations .................................................................................................................... 171
5.4.2 Instruction Cache Coherency .............................................................................................. 171
5.4.2.1 Self-Modifying Code ..................................................................................................... 172
5.4.2.2 Instruction Cache Synonyms ........................................................................................ 172
5.4.3 Instruction Cache Control and Debug ................................................................................. 172
5.4.3.1 Instruction Cache Management and Debug Instruction Summary ............................... 172
5.4.3.2 Instruction Cache Parity Operations ............................................................................. 173
5.4.3.3 Simulating Instruction Cache Parity Errors for Software Testing ................................. 173
5.5 Data Cache Controller ................................................................................................................... 173
5.5.1 DCC Operations .................................................................................................................. 174
5.5.1.1 Load and Store Alignment ............................................................................................ 175
5.5.1.2 Load Operations ........................................................................................................... 175
5.5.1.3 Store Operations .......................................................................................................... 176
5.5.1.4 Data Read and Instruction Fetch Interface Requests .................................................. 176
5.5.1.5 Data Write Interface Requests ..................................................................................... 176
5.5.1.6 Storage Access Ordering ............................................................................................. 177
5.5.2 Data Cache Coherency ....................................................................................................... 177
5.5.3 Data Cache Control ............................................................................................................. 177
5.5.3.1 Data Cache Management Instruction Summary .......................................................... 177
5.5.3.2 dcbt and dcbtst Operation ............................................................................................ 178
5.5.3.3 Cache Locking Mechanisms ........................................................................................ 179
5.5.3.4 Data Cache Parity Operations ...................................................................................... 183
5.5.3.5 Simulating Data Cache Parity Errors for Software Testing .......................................... 183
5.5.3.6 Data Cache Disable ...................................................................................................... 183
6. Memory Management .............................................................................................. 185
6.1 MMU Overview .............................................................................................................................. 185
6.1.1 Support for Power ISA MMU Architecture ........................................................................... 186
6.2 Page Identification ......................................................................................................................... 186
6.2.1 Virtual Address Formation ................................................................................................... 187
6.2.2 Address Space Identifier Convention ................................................................................... 187
6.2.3 Exclusion Range (X-bit) Operation ...................................................................................... 188
6.2.4 TLB Match Process .............................................................................................................. 189
6.3 Address Translation ...................................................................................................................... 191
6.4 Access Control .............................................................................................................................. 193
6.4.1 Execute Access ................................................................................................................... 193
6.4.2 Write Access ........................................................................................................................ 193
6.4.3 Read Access ........................................................................................................................ 194
6.4.4 Access Control Applied to Cache Management Instructions ............................................... 194
6.5 Storage Attributes .......................................................................................................................... 195
6.5.1 Write-Through (W) ............................................................................................................... 196
6.5.2 Caching Inhibited (I) ............................................................................................................. 196
6.5.3 Memory Coherence Required (M) ....................................................................................... 196
6.5.4 Guarded (G) ......................................................................................................................... 196
6.5.5 Endian (E) ............................................................................................................................ 197
6.5.6 User-Definable (U0–U3) ...................................................................................................... 197
6.5.7 Supported Storage Attribute Combinations ......................................................................... 197
6.5.8 Aliasing ................................................................................................................................ 197
6.6 Translation Lookaside Buffer ......................................................................................................... 198
6.7 Effective to Real Address Translation Arrays ................................................................................ 203
6.7.1 ERAT Context Synchronization ........................................................................................... 204
6.7.2 ERAT Reset Behavior .......................................................................................................... 205
6.7.3 Atomic Update of ERAT Entries ........................................................................................... 205
6.7.4 ERAT LRU Round-Robin Replacement Mode ..................................................................... 205
6.7.5 ERAT LRU Replacement Watermark .................................................................................. 206
6.7.6 ERAT (TLB Lookaside Information) Coherency and Back-Invalidation ............................... 206
6.7.7 ERAT External PID (EPID) Context and Instruction Dependencies .................................... 208
6.8 Logical to Real Address Translation Array (Category E.HV.LRAT) .............................................. 209
6.9 TLB Management Instructions (Architected) ................................................................................. 212
6.9.1 TLB Read and Write Instructions (tlbre and tlbwe) ............................................................. 213
6.9.2 TLB Search Instruction (tlbsx[.]) ........................................................................................ 215
6.9.3 TLB Search and Reserve Instruction (tlbsrx.) .................................................................... 215
6.9.4 TLB Invalidate Virtual Address (Indexed) Instruction (tlbivax) ............................................ 216
6.9.5 TLB Invalidate Local (Indexed) Instruction (tlbilx) ............................................................... 218
6.9.6 TLB Sync Instruction (tlbsync) ............................................................................................ 218
6.10 ERAT Management Instructions (Non-Architected) .................................................................... 219
6.10.1 ERAT Read and Write Instructions (eratre and eratwe) ................................................... 219
6.10.2 ERAT Search Instruction (eratsx[.]) ................................................................................. 220
6.10.3 ERAT Invalidate Virtual Address (Indexed) Instruction (erativax) ..................................... 221
6.10.4 ERAT Invalidate Local (Indexed) Instruction (eratilx) ........................................................ 224
6.11 32-Bit Mode Memory Management Behavior .............................................................................. 224
6.11.1 32-Bit Mode TLB Read and Write Instructions (tlbre and tlbwe) ...................................... 225
6.11.2 32-Bit Mode TLB Search Instruction (tlbsx[.]) ................................................................. 225
6.11.3 32-Bit Mode TLB Search and Reserve Instruction (tlbsrx.) ............................................. 225
6.11.4 32-Bit Mode TLB Invalidate Virtual Address (Indexed) Instruction (tlbivax) ..................... 226
6.11.5 32-Bit Mode TLB Invalidate Local (Indexed) Instruction (tlbilx) ........................................ 226
6.11.6 32-Bit Mode TLB Sync Instruction (tlbsync) ..................................................................... 226
6.11.7 32-Bit Mode ERAT Read and Write Instructions (eratre and eratwe) .............................. 226
6.11.8 32-Bit Mode ERAT Search Instruction (eratsx[.]) ............................................................ 227
6.11.9 32-Bit Mode ERAT Invalidate Virtual Address (Indexed) Instruction (erativax) ................ 227
6.11.10 32-Bit Mode ERAT Invalidate Local (Indexed) Instruction (eratilx) ................................. 228
6.12 Page Reference and Change Status Management .................................................................... 228
6.13 TLB and ERAT Parity Operations ............................................................................................... 229
6.13.1 Parity Errors Generated from tlbre or eratre .................................................................... 230
6.13.2 Simulating TLB and ERAT Parity Errors for Software Testing .......................................... 231
6.14 ERAT-Only Mode Operation ....................................................................................................... 232
6.15 TLB Reservations and TLB Write Conditional (Category E.TWC) .............................................. 232
6.16 Hardware Page Table Walking (Category E.PT) ........................................................................ 237
6.16.1 Searching the TLB for Direct and Indirect Entries ............................................................. 237
6.16.2 Indirect TLB Entry Page and Sub-Page Sizes ................................................................... 238
6.16.3 Hardware Page Table Entry Format .................................................................................. 239
6.16.4 Calculation of Hardware Page Table Entry Real Address ................................................. 240
6.16.5 Hardware Page Table Errors and Exceptions ................................................................... 241
6.16.6 Hardware Page Table Storage Control Attributes ............................................................. 241
6.16.7 TLB Update After Hardware Page Table Translation ........................................................ 242
6.17 Storage Control Registers (Architected) ..................................................................................... 244
6.17.1 Process ID Register (PID) ................................................................................................. 244
6.17.2 Logical Partition ID Register (LPIDR) ................................................................................ 245
6.17.3 External PID Load Context (EPLC) Register ..................................................................... 246
6.17.4 External PID Store Context (EPSC) Register .................................................................... 247
6.17.5 MMU Assist Register 0 (MAS0) ......................................................................................... 248
6.17.6 MMU Assist Register 1 (MAS1) ......................................................................................... 249
6.17.7 MMU Assist Register 2 (MAS2) ......................................................................................... 251
6.17.8 MMU Assist Register 2 Upper (MAS2U) ........................................................................... 252
6.17.9 MMU Assist Register 3 (MAS3) ......................................................................................... 253
6.17.10 MMU Assist Register 4 (MAS4) ....................................................................................... 255
6.17.11 MMU Assist Register 5 (MAS5) ....................................................................................... 256
6.17.12 MMU Assist Register 6 (MAS6) ....................................................................................... 257
6.17.13 MMU Assist Register 7 (MAS7) ....................................................................................... 258
6.17.14 MMU Assist Register 8 (MAS8) ....................................................................................... 259
6.17.15 MAS0_MAS1 Register ..................................................................................................... 260
6.17.16 MAS5_MAS6 Register ..................................................................................................... 261
6.17.17 MAS7_MAS3 Register ..................................................................................................... 262
6.17.18 MAS8_MAS1 Register ..................................................................................................... 263
6.17.19 MMU Configuration Register (MMUCFG) ........................................................................ 264
6.17.20 MMU Control and Status Register 0 (MMUCSR0) .......................................................... 265
6.17.21 TLB 0 Configuration Register (TLB0CFG) ....................................................................... 266
6.17.22 TLB 0 Page Size Register (TLB0PS) .............................................................................. 268
6.17.23 LRAT Configuration Register (LRATCFG) ...................................................................... 269
6.17.24 LRAT Page Size Register (LRATPS) .............................................................................. 270
6.17.25 Embedded Page Table Configuration Register (EPTCFG) ............................................. 272
6.17.26 Logical Page Exception Register (LPER) ........................................................................ 273
6.17.27 Logical Page Exception Register Upper (LPERU) ........................................................... 274
6.17.28 MAS Register Update Summary ...................................................................................... 275
6.18 Storage Control Registers (Non-Architected) .............................................................................. 277
6.18.1 Memory Management Unit Control Register 0 (MMUCR0) ............................................... 277
6.18.2 Memory Management Unit Control Register 1 (MMUCR1) ............................................... 280
6.18.3 Memory Management Unit Control Register 2 (MMUCR2) ............................................... 287
6.18.4 Memory Management Unit Control Register 3 (MMUCR3) ............................................... 290
7. CPU Interrupts and Exceptions .............................................................................. 293
7.1 Overview ....................................................................................................................................... 293
7.2 Directed Interrupts ......................................................................................................................... 293
7.3 Interrupt Classes ........................................................................................................................... 294
7.3.1 Asynchronous Interrupts ...................................................................................................... 294
7.3.2 Synchronous Interrupts ........................................................................................................ 294
7.3.2.1 Synchronous, Precise Interrupts .................................................................................. 294
7.3.2.2 Synchronous, Imprecise Interrupts ............................................................................... 295
7.3.3 Critical and Noncritical Interrupts ......................................................................................... 296
7.3.4 Machine Check Interrupts .................................................................................................... 296
7.4 Interrupt Processing ...................................................................................................................... 297
7.4.1 Partially Executed Instructions ............................................................................................. 299
7.5 Interrupt Processing Registers ...................................................................................................... 300
7.5.1 Register Mapping ................................................................................................................. 301
7.5.2 Machine State Register (MSR) ............................................................................................ 301
7.5.3 Machine State Register Protect (MSRP) ............................................................................. 303
7.5.4 Embedded Processor Control Register (EPCR) .................................................................. 304
7.5.5 Save/Restore Register 0 (SRR0) ......................................................................................... 305
7.5.6 Save/Restore Register 1 (SRR1) ......................................................................................... 306
7.5.7 Guest Save/Restore Register 0 (GSRR0) ........................................................................... 308
7.5.8 Guest Save/Restore Register 1 (GSRR1) ........................................................................... 308
7.5.9 Critical Save/Restore Register 0 (CSRR0) .......................................................................... 310
7.5.10 Critical Save/Restore Register 1 (CSRR1) ........................................................................ 311
7.5.11 Machine Check Save/Restore Register 0 (MCSRR0) ....................................................... 313
7.5.12 Machine Check Save/Restore Register 1 (MCSRR1) ....................................................... 313
7.5.13 Data Exception Address Register (DEAR) ......................................................................... 315
7.5.14 Guest Data Exception Address Register (GDEAR) ........................................................... 316
7.5.15 Interrupt Vector Prefix Register (IVPR) .............................................................................. 318
7.5.16 Guest Interrupt Vector Prefix Register (GIVPR) ................................................................ 318
7.5.17 Exception Syndrome Register (ESR) ................................................................................. 318
7.5.18 Guest Exception Syndrome Register (GESR) ................................................................... 320
7.5.19 Machine Check Status Register (MCSR) ........................................................................... 322
7.6 Interrupt Definitions ....................................................................................................................... 323
7.6.1 Critical Input Interrupt ........................................................................................................... 326
7.6.2 Machine Check Interrupt ...................................................................................................... 327
7.6.2.1 Machine Check Status Register (MCSR) ..................................................................... 329
7.6.3 Data Storage Interrupt ......................................................................................................... 330
7.6.4 Instruction Storage Interrupt ................................................................................................ 334
7.6.5 External Input Interrupt ........................................................................................................ 336
7.6.6 Alignment Interrupt ............................................................................................................... 337
7.6.7 Program Interrupt ................................................................................................................. 338
7.6.8 Floating-Point Unavailable Interrupt .................................................................................... 342
7.6.9 System Call Interrupt ........................................................................................................... 342
7.6.10 Auxiliary Processor Unavailable Interrupt .......................................................................... 343
7.6.11 Decrementer Interrupt ....................................................................................................... 343
7.6.12 Fixed-Interval Timer Interrupt ............................................................................................ 344
7.6.13 Watchdog Timer Interrupt .................................................................................................. 344
7.6.14 Data TLB Error Interrupt .................................................................................................... 345
7.6.15 Instruction TLB Error Interrupt ........................................................................................... 346
7.6.16 Vector Unavailable Interrupt .............................................................................................. 347
7.6.17 Debug Interrupt .................................................................................................................. 347
7.6.18 Processor Doorbell Interrupt .............................................................................................. 351
7.6.19 Processor Doorbell Critical Interrupt .................................................................................. 352
7.6.20 Guest Processor Doorbell Interrupt ................................................................................... 352
7.6.21 Guest Processor Doorbell Critical Interrupt ....................................................................... 353
7.6.22 Guest Processor Doorbell Machine Check Interrupt ......................................................... 353
7.6.23 Embedded Hypervisor System Call Interrupt .................................................................... 354
7.6.24 Embedded Hypervisor Privilege Interrupt .......................................................................... 354
7.6.25 LRAT Error Interrupt .......................................................................................................... 355
7.6.26 User Decrementer Interrupt ............................................................................................... 356
7.6.27 Performance Monitor Interrupt ........................................................................................... 356
7.7 Processor Messages ..................................................................................................................... 357
7.7.1 Processor Message Handling and Filtering ......................................................................... 357
7.7.2 Doorbell Message Filtering .................................................................................................. 358
7.7.3 Doorbell Critical Message Filtering ...................................................................................... 359
7.7.4 Guest Doorbell Message Filtering ....................................................................................... 360
7.7.5 Guest Doorbell Critical Message Filtering ........................................................................... 360
7.7.6 Guest Doorbell Machine Check Message Filtering ............................................................. 361
7.8 Interrupt Ordering and Masking .................................................................................................... 362
7.8.1 Interrupt Ordering Software Requirements .......................................................................... 363
7.8.2 Interrupt Order ..................................................................................................................... 364
7.9 Exception Priorities ....................................................................................................................... 365
7.9.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions ............ 366
7.9.2 Exception Priorities for Floating-Point Load and Store Instructions .................................... 367
7.9.3 Exception Priorities for Floating-Point Instructions (Other) .................................................. 367
7.9.4 Exception Priorities for Privileged Instructions .................................................................... 368
7.9.5 Exception Priorities for Trap Instructions ............................................................................. 368
7.9.6 Exception Priorities for System Call Instruction ................................................................... 368
7.9.7 Exception Priorities for Branch Instructions ......................................................................... 369
7.9.8 Exception Priorities for Return From Interrupt Instructions .................................................. 369
7.9.9 Exception Priorities for Reserved Instructions ..................................................................... 369
7.9.10 Exception Priorities for All Other Instructions .................................................................... 370
8. FU Interrupts and Exceptions ................................................................................ 371
8.1 Floating-Point Exceptions ............................................................................................................. 371
8.2 Exceptions List .............................................................................................................................. 372
8.3 Floating-Point Interrupts ................................................................................................................ 375
8.3.1 Floating-Point Unavailable Interrupt .................................................................................... 375
8.3.2 Floating-Point Assist Interrupt ............................................................................................. 375
8.4 Floating-Point Exception Behavior ................................................................................................ 375
8.4.1 Invalid Operation Exception ................................................................................................. 375
8.4.1.1 Action ............................................................................................................................ 376
8.4.2 Zero Divide Exception .......................................................................................................... 377
8.4.2.1 Action ............................................................................................................................ 377
8.4.3 Overflow Exception .............................................................................................................. 378
8.4.3.1 Action ............................................................................................................................ 378
8.4.4 Underflow Exception ............................................................................................................ 379
8.4.4.1 Action ............................................................................................................................ 379
8.4.5 Inexact Exception ................................................................................................................. 380
8.4.5.1 Action ............................................................................................................................ 380
8.5 Exception Priorities for Floating-Point Load and Store Instructions .............................................. 380
8.6 Exception Priorities for Other Floating-Point Instructions .............................................................. 381
8.7 QNaN ............................................................................................................................................ 381
8.8 Updating FPRs on Exceptions ...................................................................................................... 382
8.9 Floating-Point Status and Control Register (FPSCR) ................................................................... 382
8.10 Updating the Condition Register ................................................................................................. 385
8.10.1 Condition Register (CR) ..................................................................................................... 385
8.10.2 Updating CR Fields ............................................................................................................ 386
8.10.3 Generation of QNaN Results ............................................................................................. 386
9. Timer Facilities ........................................................................................................ 387
9.1 Time Base ..................................................................................................................................... 388
9.1.1 Reading the Time Base ....................................................................................................... 389
9.1.2 Writing the Time Base .......................................................................................................... 389
9.2 Decrementer (DEC) ....................................................................................................................... 389
9.3 User Decrementer (UDEC) ........................................................................................................... 391
9.4 Fixed Interval Timer (FIT) .............................................................................................................. 392
9.5 Watchdog Timer ............................................................................................................................ 393
9.6 Timer Control Register (TCR) ....................................................................................................... 395
9.7 Timer Status Register (TSR) ......................................................................................................... 397
9.8 Freezing the Timer Facilities ......................................................................................................... 397
9.9 Selection of the Timer Clock Source ............................................................................................. 398
9.10 Synchronizing Timers Across Multiple Cores .............................................................................. 398
10. Debug Facilities ..................................................................................................... 399
10.1 Implications of Hypervisor on Debug Controls ............................................................................ 399
10.2 Support for Development Tools ................................................................................................... 399
10.3 Debug Modes .............................................................................................................................. 399
10.3.1 Internal Debug Mode ......................................................................................................... 400
10.3.2 External Debug Mode ........................................................................................................ 400
10.3.3 Trace Debug Mode ............................................................................................................ 401
10.4 Debug Events .............................................................................................................................. 402
10.4.1 Instruction Address Compare (IAC) Debug Event ............................................................. 402
10.4.1.1 IAC Debug Event Fields ............................................................................................. 403
10.4.1.2 IAC Debug Event Processing ..................................................................................... 404
10.4.2 Data Address Compare (DAC) Debug Event ..................................................................... 405
10.4.2.1 DAC Debug Event Fields ............................................................................................ 405
10.4.2.2 DAC Debug Event Processing ................................................................................... 407
10.4.2.3 DAC Debug Events Applied to Instructions that Result in Multiple Storage Accesses ...... 407
10.4.2.4 DAC Debug Events Applied to Various Instruction Types ......................................... 408
10.4.3 Data Value Compare (DVC) Debug Event ........................................................................ 409
10.4.3.1 DVC Debug Event Fields ........................................................................................... 409
10.4.3.2 DVC Debug Event Processing ................................................................................... 410
10.4.3.3 DVC Debug Events Applied to Instructions that Result in Multiple Storage Accesses ...... 410
10.4.3.4 DVC Debug Events Applied to Various Instruction Types ......................................... 411
10.4.3.5 DVC Debug Events Applied to Floating-Point Loads and Stores ............................... 411
10.4.4 Instruction Complete (ICMP) Debug Event ....................................................................... 411
10.4.5 Branch Taken (BRT) Debug Event .................................................................................... 412
10.4.6 Trap (TRAP) Debug Event ................................................................................................ 412
10.4.7 Return (RET) Debug Event ............................................................................................... 412
10.4.8 Interrupt (IRPT) Debug Event ............................................................................................ 413
10.4.9 Unconditional Debug Event (UDE) .................................................................................... 414
10.4.10 Instruction Value Compare (IVC) Debug Event ............................................................... 414
10.4.11 Debug Event Summary ................................................................................................... 415
10.5 Debug Reset ............................................................................................................................... 415
10.6 Debug Timer Freeze ................................................................................................................... 415
10.7 Debug Registers ......................................................................................................................... 415
10.7.1 Debug Control Register 0 (DBCR0) .................................................................................. 416
10.7.2 Debug Control Register 1 (DBCR1) .................................................................................. 418
10.7.3 Debug Control Register 2 (DBCR2) .................................................................................. 419
10.7.4 Debug Control Register 3 (DBCR3) .................................................................................. 421
10.7.5 Debug Status Register (DBSR) ........................................................................................ 422
10.7.6 Debug Status Register Write Register (DBSRWR) ........................................................... 423
10.7.7 Instruction Address Compare Registers (IAC1–IAC4) ...................................................... 425
10.7.8 Data Address Compare Registers (DAC1–DAC2) ............................................................ 426
10.7.9 Data Value Compare Registers (DVC1–DVC2) ................................................................ 427
10.7.10 Instruction Address Register (IAR) .................................................................................. 428
10.7.11 Instruction Match Mask Registers (IMMR) ...................................................................... 429
10.7.12 Instruction Match Registers (IMR) ................................................................................... 429
10.8 Instruction Stuffing ...................................................................................................................... 429
10.8.1 Ram Mode Overview ......................................................................................................... 430
10.8.2 Ram Register Descriptions ................................................................................................ 431
10.8.3 Example Ram Mode Procedures ....................................................................................... 434
10.8.3.1 SPR Read/Write Using GPR as Temporary Storage ................................................. 434
10.8.3.2 Using Microcode Scratch Registers as Temporary Storage ...................................... 435
10.8.4 Supported Ram Instructions .............................................................................................. 436
10.9 Direct Access to I-Cache and D-Cache Directories .................................................................... 437
10.9.1 General Read D-Cache Directory Sequence for L1 D-Cache ........................................... 437
10.9.2 Instruction Unit Debug Register 0 (IUDBG0) ..................................................................... 438
10.9.3 Instruction Unit Debug Register 1 (IUDBG1) ..................................................................... 439
10.9.4 Instruction Unit Debug Register 2 (IUDBG2) ..................................................................... 439
10.9.5 Execution Unit Debug Register 0 (XUDBG0) .................................................................... 440
10.9.6 Execution Unit Debug Register 1 (XUDBG1) .................................................................... 440
10.9.7 Execution Unit Debug Register 2 (XUDBG2) .................................................................... 441
10.10 Thread Control and Status ........................................................................................................ 441
10.10.1 Using THRCTL Register to Stop Thread 0 ...................................................................... 443
10.10.2 Using THRCTL Register to Start Thread 0 ...................................................................... 443
10.10.3 Using THRCTL Register to Instruction Step Thread 0 .................................................... 443
10.11 PC Configuration Register 0 (PCCR0) ...................................................................................... 444
10.12 Trace and Trigger Bus ............................................................................................................... 445
10.12.1 Trace and Trigger Bus Overview ..................................................................................... 445
10.12.2 Unit Level Trace and Trigger Bus Implementation ........................................................... 446
10.12.3 Debug Select Registers ................................................................................................... 447
11. Performance Events and Event Selection ........................................................... 449
11.1 Event Bus Overview .................................................................................................................... 449
11.2 A2 Core Event Bus and PC Unit Controls ................................................................................... 450
11.2.1 Enabling Performance Event and Trace Bus Latches ....................................................... 450
11.2.2 Performance Analysis Operating Modes ........................................................................... 450
11.2.3 Core Performance Event Selection to External Event Bus ................................................ 450
11.2.4 Core Event Select Register (CESR) .................................................................................. 452
11.3 Unit Level Performance Event Selection ..................................................................................... 454
11.3.1 Unit Event Multiplexer Component .................................................................................... 454
11.3.2 Performance Monitor Event Tags and Count Modes ......................................................... 456
11.3.3 Unit Performance Event Tables ......................................................................................... 457
11.4 Unit Performance Event Tables .................................................................................................. 458
11.4.1 FU Performance Events Table ........................................................................................... 458
11.4.2 IU Performance Events Table ............................................................................................ 458
11.4.3 XU Performance Events Table .......................................................................................... 460
11.4.4 LSU Performance Events Table ........................................................................................ 462
11.4.5 MMU Performance Events Table ....................................................................................... 465
11.5 Unit Event Select Registers ......................................................................................................... 466
11.5.1 FU Event Select Register (AESR) ..................................................................................... 466
11.5.2 IU Event Select Registers .................................................................................................. 468
11.5.3 XU Event Select Registers ................................................................................................. 470
11.5.4 LSU Event Select Registers ............................................................................................... 472
11.5.5 MMU Event Select Registers ............................................................................................. 474
11.6 A2 Support for Core Instruction Trace ......................................................................................... 476
11.6.1 Instruction Trace Mode Setup ............................................................................................ 476
11.6.2 Instruction Trace Record Data ........................................................................................... 476
11.6.3 Instruction Trace Record Formats and Ordering ............................................................... 477
11.6.4 Debug Bus Control When in Instruction Trace Mode ......................................................... 478
11.6.4.1 FU Trace Records ...................................................................................................... 479
11.6.4.2 XU Debug Bus Control ............................................................................................... 479
11.7 A2 Support for Instruction Sampling ............................................................................................ 479
12. Implementation Dependent Instructions ............................................................. 481
12.1 Miscellaneous .............................................................................................................................. 481
12.1.1 Attention (attn) ................................................................................................................... 481
12.2 TLB Management Instructions .................................................................................................... 482
12.2.1 TLB Read Entry (tlbre) ...................................................................................................... 482
12.2.2 TLB Write Entry (tlbwe) ..................................................................................................... 484
12.2.3 TLB Search Indexed (tlbsx[.]) ........................................................................................... 486
12.2.4 TLB Search and Reserve Indexed (tlbsrx.) ....................................................................... 488
12.2.5 TLB Invalidate Virtual Address Indexed (tlbivax) .............................................................. 490
12.2.6 TLB Invalidate Local Indexed (tlbilx) ................................................................................. 493
12.3 ERAT Management Instructions ................................................................................................. 496
12.3.1 ERAT Read Entry (eratre) ................................................................................................. 496
12.3.2 ERAT Write Entry (eratwe) ............................................................................................... 499
12.3.3 ERAT Search Indexed (eratsx[.]) ..................................................................................... 502
12.3.4 ERAT Invalidate Virtual Address Indexed (erativax) ........................................................ 504
12.3.5 ERAT Invalidate Local Indexed (eratilx) ........................................................................... 507
12.4 Software Transactional Memory Instructions .............................................................................. 509
12.4.1 Load Doubleword and Watch Indexed X-Form (ldawx.) ................................................... 510
12.4.2 Watch Check All X-Form (wchkall) ................................................................................... 511
12.4.3 Watch Clear X-Form (wclr) ............................................................................................... 512
12.5 Coprocessor Instructions ............................................................................................................ 513
12.5.1 Initiate Coprocessor Store Word Indexed (icswx[.]) ......................................................... 515
12.5.1.1 General Registers ...................................................................................................... 516
12.5.1.2 Initial Execution .......................................................................................................... 517
12.5.2 Initiate Coprocessor Store Word External Process ID Indexed (icswepx[.]) .................... 518
12.5.3 Execution ........................................................................................................................... 518
12.5.3.1 Condition Register 0 ................................................................................................... 519
12.5.4 Coprocessor-Request Block .............................................................................................. 520
12.5.4.1 Available Coprocessor Register (ACOP) ................................................................... 520
12.5.4.2 Hypervisor Available Coprocessor Register (HACOP) ............................................... 521
12.6 Data Cache Block Flush .............................................................................................................. 523
12.6.1 Data Cache Block Flush (dcbf) ......................................................................................... 523
12.7 Data Cache Block Flush by External PID .................................................................................... 524
12.7.1 Data Cache Block Flush by External PID (dcbfep) ........................................................... 524
13. Power Management Methods .............................................................................. 525
13.1 Chip Power Management Controls ............................................................................................. 525
13.2 Power-Saving Instructions .......................................................................................................... 525
13.2.1 Power-Saving Instruction Sequence ................................................................................. 526
14. Register Summary ................................................................................................ 529
14.1 Register Categories .................................................................................................................... 529
14.2 Reserved Fields .......................................................................................................................... 535
14.3 Unimplemented SPRs ................................................................................................................. 535
14.4 Device Control Registers ............................................................................................................ 535
14.5 Alphabetical Register Listing ....................................................................................................... 537
14.5.1 ACOP - Available Coprocessor ......................................................................................... 538
14.5.2 AESR - AXU Event Select Register ................................................................................... 539
14.5.3 CCR0 - Core Configuration Register 0 .............................................................................. 541
14.5.4 CCR1 - Core Configuration Register 1 .............................................................................. 542
14.5.5 CCR2 - Core Configuration Register 2 .............................................................................. 543
14.5.6 CCR3 - Core Configuration Register 3 .............................................................................. 545
14.5.7 CESR - Core Event Select Register .................................................................................. 546
14.5.8 CR - Condition Register ..................................................................................................... 549
14.5.9 CSRR0 - Critical Save/Restore Register 0 ........................................................................ 550
14.5.10 CSRR1 - Critical Save/Restore Register 1 ...................................................................... 551
14.5.11 CTR - Count Register ...................................................................................................... 553
14.5.12 DAC1 - Data Address Compare 1 ................................................................................... 554
14.5.13 DAC2 - Data Address Compare 2 ................................................................................... 555
14.5.14 DAC3 - Data Address Compare 3 ................................................................................... 556
14.5.15 DAC4 - Data Address Compare 4 .................................................................................... 557
14.5.16 DBCR0 - Debug Control Register 0 ................................................................................. 558
14.5.17 DBCR1 - Debug Control Register 1 ................................................................................. 560
14.5.18 DBCR2 - Debug Control Register 2 ................................................................................. 562
14.5.19 DBCR3 - Debug Control Register 3 ................................................................................. 564
14.5.20 DBSR - Debug Status Register ........................................................................................ 565
14.5.21 DBSRWR - Debug Status Register Write Register .......................................................... 567
14.5.22 DEAR - Data Exception Address Register ....................................................................... 569
14.5.23 DEC - Decrementer ......................................................................................................... 570
14.5.24 DECAR - Decrementer Auto-Reload ............................................................................... 571
14.5.25 DVC1 - Data Value Compare 1 ........................................................................................ 572
14.5.26 DVC2 - Data Value Compare 2 ........................................................................................ 573
14.5.27 EPCR - Embedded Processor Control Register .............................................................. 574
14.5.28 EPLC - External Process ID Load Context ...................................................................... 576
14.5.29 EPSC - External Process ID Store Context ..................................................................... 577
14.5.30 EPTCFG - Embedded Page Table Configuration Register .............................................. 578
14.5.31 ESR - Exception Syndrome Register ............................................................................... 579
14.5.32 GDEAR - Guest Data Exception Address Register ......................................................... 581
14.5.33 GESR - Guest Exception Syndrome Register ................................................................. 582
14.5.34 GIVPR - Guest Interrupt Vector Prefix Register ............................................................... 584
14.5.35 GPIR - Guest Processor ID Register ............................................................................... 585
14.5.36 GSPRG0 - Guest Software Special Purpose Register 0 ................................................. 586
14.5.37 GSPRG1 - Guest Software Special Purpose Register 1 ................................................. 587
14.5.38 GSPRG2 - Guest Software Special Purpose Register 2 ................................................. 588
14.5.39 GSPRG3 - Guest Software Special Purpose Register 3 ................................................. 589
14.5.40 GSRR0 - Guest Save/Restore Register 0 ........................................................................ 590
14.5.41 GSRR1 - Guest Save/Restore Register 1 ........................................................................ 591
14.5.42 HACOP - Hypervisor Available Coprocessor ................................................................. 593
14.5.43 IAC1 - Instruction Address Compare 1 ............................................................................ 594
14.5.44 IAC2 - Instruction Address Compare 2 ............................................................................ 595
14.5.45 IAC3 - Instruction Address Compare 3 ............................................................................ 596
14.5.46 IAC4 - Instruction Address Compare 4 ............................................................................ 597
14.5.47 IAR - Instruction Address Register ................................................................................... 598
14.5.48 IESR1 - IU Event Select Register 1 ................................................................................. 599
14.5.49 IESR2 - IU Event Select Register 2 ................................................................................. 600
14.5.50 IMMR - Instruction Match Mask Register ......................................................................... 601
14.5.51 IMPDEP0 - Implementation Dependent Region 0 ........................................................... 602
14.5.52 IMPDEP1 - Implementation Dependent Region 1 ........................................................... 603
14.5.53 IMR - Instruction Match Register ..................................................................................... 604
14.5.54 IUCR0 - Instruction Unit Configuration Register 0 ........................................................... 605
14.5.55 IUCR1 - Instruction Unit Configuration Register 1 ........................................................... 606
14.5.56 IUCR2 - Instruction Unit Configuration Register 2 ........................................................... 607
14.5.57 IUDBG0 - Instruction Unit Debug Register 0 ................................................................... 608
14.5.58 IUDBG1 - Instruction Unit Debug Register 1 ................................................................... 609
14.5.59 IUDBG2 - Instruction Unit Debug Register 2 ................................................................... 610
14.5.60 IULFSR - Instruction Unit LFSR ....................................................................................... 611
14.5.61 IULLCR - Instruction Unit Live Lock Control Register ...................................................... 612
14.5.62 IVPR - Interrupt Vector Prefix Register ............................................................................ 613
14.5.63 LPER - Logical Page Exception Register ........................................................................ 614
14.5.64 LPERU - Logical Page Exception Register (Upper) ......................................................... 615
14.5.65 LPIDR - Logical Partition ID Register .............................................................................. 616
14.5.66 LR - Link Register ............................................................................................................ 617
14.5.67 LRATCFG - LRAT Configuration Register ....................................................................... 618
14.5.68 LRATPS - LRAT Page Size Register .............................................................................. 619
14.5.69 MAS0 - MMU Assist Register 0 ....................................................................................... 620
14.5.70 MAS0_MAS1 - MMU Assist Registers 0 and 1 ............................................................... 621
14.5.71 MAS1 - MMU Assist Register 1 ....................................................................................... 622
14.5.72 MAS2 - MMU Assist Register 2 ....................................................................................... 624
14.5.73 MAS2U - MMU Assist Register 2 (Upper) ....................................................................... 625
14.5.74 MAS3 - MMU Assist Register 3 ....................................................................................... 626
14.5.75 MAS4 - MMU Assist Register 4 ....................................................................................... 628
14.5.76 MAS5 - MMU Assist Register 5 ....................................................................................... 629
14.5.77 MAS5_MAS6 - MMU Assist Registers 5 and 6 ............................................................... 630
14.5.78 MAS6 - MMU Assist Register 6 ....................................................................................... 631
14.5.79 MAS7 - MMU Assist Register 7 ....................................................................................... 632
14.5.80 MAS7_MAS3 - MMU Assist Registers 7 and 3 ............................................................... 633
14.5.81 MAS8 - MMU Assist Register 8 ....................................................................................... 634
14.5.82 MAS8_MAS1 - MMU Assist Registers 8 and 1 ............................................................... 635
14.5.83 MCSR - Machine Check Syndrome Register .................................................................. 636
14.5.84 MCSRR0 - Machine Check Save/Restore Register 0 ..................................................... 638
14.5.85 MCSRR1 - Machine Check Save/Restore Register 1 ..................................................... 639
14.5.86 MESR1 - MMU Event Select Register 1 .......................................................................... 641
14.5.87 MESR2 - MMU Event Select Register 2 .......................................................................... 642
14.5.88 MMUCFG - MMU Configuration Register ........................................................................ 643
14.5.89 MMUCR0 - Memory Management Unit Control Register 0 ............................................. 644
14.5.90 MMUCR1 - Memory Management Unit Control Register 1 ............................................. 645
14.5.91 MMUCR2 - Memory Management Unit Control Register 2 ............................................. 647
14.5.92 MMUCR3 - Memory Management Unit Control Register 3 ............................................. 649
14.5.93 MMUCSR0 - MMU Control and Status Register 0 .......................................................... 650
14.5.94 MSR - Machine State Register ........................................................................................ 651
14.5.95 MSRP - Machine State Register Protect ......................................................................... 653
14.5.96 PID - Process ID .............................................................................................................. 654
14.5.97 PIR - Processor ID Register ............................................................................................ 655
14.5.98 PPR32 - Program Priority Register .................................................................................. 656
14.5.99 PVR - Processor Version Register .................................................................................. 657
14.5.100 SPRG0 - Software Special Purpose Register 0 ............................................................ 658
14.5.101 SPRG1 - Software Special Purpose Register 1 ............................................................ 659
14.5.102 SPRG2 - Software Special Purpose Register 2 ............................................................ 660
14.5.103 SPRG3 - Software Special Purpose Register 3 ............................................................ 661
14.5.104 SPRG4 - Software Special Purpose Register 4 ............................................................ 662
14.5.105 SPRG5 - Software Special Purpose Register 5 ............................................................ 663
14.5.106 SPRG6 - Software Special Purpose Register 6 ............................................................ 664
14.5.107 SPRG7 - Software Special Purpose Register 7 ............................................................ 665
14.5.108 SPRG8 - Software Special Purpose Register 8 ............................................................ 666
14.5.109 SRR0 - Save/Restore Register 0 ................................................................................... 667
14.5.110 SRR1 - Save/Restore Register 1 ................................................................................... 668
14.5.111 TB - Timebase ............................................................................................................... 670
14.5.112 TBL - Timebase Lower .................................................................................................. 671
14.5.113 TBU - Timebase Upper .................................................................................................. 672
14.5.114 TCR - Timer Control Register ........................................................................................ 673
14.5.115 TENC - Thread Enable Clear Register .......................................................................... 675
14.5.116 TENS - Thread Enable Set Register .............................................................................. 676
14.5.117 TENSR - Thread Enable Status Register ...................................................................... 677
14.5.118 TIR - Thread Identification Register ............................................................................... 678
14.5.119 TLB0CFG - TLB 0 Configuration Register ..................................................................... 679
14.5.120 TLB0PS - TLB 0 Page Size Register ............................................................................. 680
14.5.121 TRACE - Hardware Trace Macro Control Register ........................................................ 681
14.5.122 TSR - Timer Status Register .......................................................................................... 682
14.5.123 UDEC - User Decrementer ............................................................................................ 683
14.5.124 VRSAVE - Vector Register Save ................................................................................... 684
14.5.125 XER - Fixed Point Exception Register ........................................................................... 685
14.5.126 XESR1 - XU Event Select Register 1 ............................................................................ 686
14.5.127 XESR2 - XU Event Select Register 2 ............................................................................ 687
14.5.128 XESR3 - XU Event Select Register 3 ............................................................................ 688
14.5.129 XESR4 - XU Event Select Register 4 ............................................................................ 689
14.5.130 XUCR0 - Execution Unit Configuration Register 0 ......................................................... 690
14.5.131 XUCR1 - Execution Unit Configuration Register 1 ......................................................... 693
14.5.132 XUCR2 - Execution Unit Configuration Register 2 ......................................................... 694
14.5.133 XUCR3 - Execution Unit Configuration Register 3 ......................................................... 695
14.5.134 XUCR4 - Execution Unit Configuration Register 4 ......................................................... 696
14.5.135 XUDBG0 - Execution Unit Debug Register 0 ................................................................. 697
14.5.136 XUDBG1 - Execution Unit Debug Register 1 ................................................................. 698
14.5.137 XUDBG2 - Execution Unit Debug Register 2 ................................................................. 699
15. SCOM Accessible Registers ................................................................................. 701
15.1 Serial Communications (SCOM) Description .............................................................................. 701
15.2 SCOM Register Summary ........................................................................................................... 703
15.2.1 Read and Write Access Methods ....................................................................................... 703
15.2.1.1 Reset with AND Mask ................................................................................................. 703
15.2.1.2 Set with OR Mask ....................................................................................................... 703
15.2.2 SCOM Register Summary Table ....................................................................................... 703
15.3 Alphabetical Register Listing ....................................................................................................... 705
15.3.1 AXU Debug Select Register (ABDSR) ............................................................................... 705
15.3.2 Error Injection Register (ERRINJ) ...................................................................................... 706
15.3.3 Fault Isolation Register 0 and Associated Registers ......................................................... 707
15.3.4 Fault Isolation Register 1 and Associated Registers ......................................................... 711
15.3.5 Fault Isolation Register 2 and Associated Registers ......................................................... 716
15.3.6 IU Debug Select Register (IDSR) ...................................................................................... 720
15.3.7 MMU/PC Debug Select Register (MPDSR) ....................................................................... 723
15.3.8 PC Configuration Register 0 (PCCR0) ............................................................................... 725
15.3.9 Ram Data Registers (RAMD, RAMDH, RAMDL) ............................................................... 726
15.3.10 Ram Instruction and Command Registers (RAMC, RAMI, RAMIC) ................................ 727
15.3.11 Special Attention Register (SPATTN) .............................................................................. 729
15.3.12 Thread Control and Status Register (THRCTL) ............................................................... 730
15.3.13 XU Debug Select Register1 (XDSR1) .............................................................................. 731
15.3.14 XU Debug Select Register2 (XDSR2) .............................................................................. 734
Appendix A. Processor Instruction Summary ......................................................... 737
A.1 Instruction Formats ....................................................................................................................... 737
A.2 Implemented Instructions Sorted by Mnemonic ............................................................................ 737
Appendix B. FU Instruction Summary ...................................................................... 756
B.1 FU Instructions Sorted by Opcode ................................................................................................ 756
Appendix C. Debug and Trigger Groups .................................................................. 761
C.1 Unit Debug Multiplexer Component .............................................................................................. 761
C.2 Debug Multiplexer Component Ordering on the Ramp Bus ......................................................... 761
C.3 Example Debug Multiplexer Configuration Settings ..................................................................... 762
C.3.1 Multiplexer Configuration for Trace/Trigger Signals from a Single Unit .............................. 762
C.3.2 Multiplexer Configuration for Trace/Trigger Signals from Multiple Units ............................. 762
C.4 AXU Debug Select Register and Debug Group Tables ................................................................ 763
C.5 IU Debug Select Register and Debug Group Tables .................................................................... 766
C.6 MMU and PC Debug Select Register and Debug Group Tables .................................................. 778
C.7 XU Debug Select Register1 and Debug Group Tables ................................................................ 798
C.8 XU Debug Select Register2 and Debug Group Tables ................................................................ 817
Appendix D. Instruction Execution Performance and Code Optimizations .......... 833
D.1 A2 Pipeline Overview ................................................................................................................... 833
D.1.1 Arbitration Stages ............................................................................................................... 834
D.1.2 Stall Stages ......................................................................................................................... 835
D.1.3 Flush Stages ....................................................................................................................... 835
D.2 Fetch ............................................................................................................................................. 835
D.2.1 Fetch Arbitration .................................................................................................................. 837
D.2.2 Next Instruction Fetch Address Computation ..................................................................... 837
D.2.3 Instruction Cache Access and Alignment ........................................................................... 837
D.2.4 Instruction Cache Misses .................................................................................................... 837
D.2.5 I-ERAT Misses .................................................................................................................... 838
D.2.6 Instruction Buffer Operation ................................................................................................ 838
D.2.7 Branches and Branch Prediction ........................................................................................ 838
D.2.7.1 Branch Direction Prediction and the Branch History Table (BHT) ............................... 840
D.2.7.2 Taken-Branch Redirection ........................................................................................... 840
D.2.7.3 Branch Target Prediction ............................................................................................. 840
D.2.7.4 Branch Resolution and Mispredictions ........................................................................ 841
D.3 Instruction Issue Operation ........................................................................................................... 841
D.4 Instruction Pair Execution Performance Rules ............................................................................. 841
D.4.1 Defining Latency, Penalty, and Execution Time ................................................................. 841
D.4.2 Unified CR Dependency ..................................................................................................... 842
D.4.3 General CR Operand Dependency ..................................................................................... 842
D.4.4 Move To Condition Register Fields (mtcrf) Instruction Dependency ................................... 843
D.4.5 Move From Condition Register (mfcr) Instruction Dependency .......................................... 843
D.4.6 Move From and Move To Special Purpose Register (mfspr) Dependency ......................... 843
D.4.7 Move From Machine State Register (mfmsr) Dependency ................................................. 843
D.4.8 Multiply Dependency ........................................................................................................... 843
D.4.9 Divide Dependency ............................................................................................................. 844
D.4.10 Store Word Conditional Indexed (stwcx.) Instruction Dependency ................................... 844
D.4.11 TLB Management Instruction Dependencies .................................................................... 845
D.4.12 Processor Control Instruction Operation ........................................................................... 845
D.4.13 Load Instruction Dependency ............................................................................................ 846
D.4.14 String/Multiple Operations ................................................................................................. 846
D.4.15 Load-and-Reserve and Store-Conditional Instructions ..................................................... 846
D.4.16 Storage Synchronization Operations ................................................................................. 847
D.5 Loads, Stores, and Data Cache Organization .............................................................................. 847
D.5.1 Overview ............................................................................................................................. 847
D.5.2 Loads ................................................................................................................................... 848
D.5.3 Stores .................................................................................................................................. 848
D.5.4 Load Miss Queue ................................................................................................................ 849
D.5.5 L2 Command Arbitration ..................................................................................................... 849
D.5.6 D-ERAT Misses ................................................................................................................... 849
D.5.7 Back Invalidations ............................................................................................................... 849
D.5.8 Address Alignment .............................................................................................................. 849
D.6 Interrupt Effects ............................................................................................................................ 850
D.7 Floating-Point Instruction Handling ............................................................................................... 850
D.7.1 General FPR Operand Dependency ................................................................................... 852
D.7.2 Denormalized Results ......................................................................................................... 852
D.7.3 Denormalized Operands ..................................................................................................... 852
D.7.4 Not a Number (NaN) Cases ................................................................................................ 852
D.7.5 Floating-Point Load Dependency ........................................................................................ 852
D.7.6 Floating-Point Store Data Dependency ............................................................................... 852
D.7.7 General CR Operand Dependency ..................................................................................... 853
D.7.8 Floating-Point Divide Dependency ...................................................................................... 853
D.7.9 Floating-Point Square Root Dependency ............................................................................ 853
D.7.10 Move to Condition Register from Floating-Point Status and Control Register Dependency .... 853
D.7.11 Move to FPSCR Fields and FPSCR Dependencies .......................................................... 854
D.7.12 Floating-Point Record Forms ............................................................................................ 854
D.8 Interrupt Conditions ...................................................................................................................... 854
D.9 Flush Conditions ........................................................................................................................... 858
Appendix E. Programming Examples ........................................................................ 861
E.1 Wait Instruction with Fast Wakeup for Power Savings ................................................................. 861
E.2 Floating-Point Conversions ........................................................................................................... 861
E.2.1 Conversion from Floating-Point Number to Signed Integer Word ....................................... 861
E.2.2 Conversion from Floating-Point Number to Unsigned Integer Word ................................... 862
E.3 Floating-Point Selection ................................................................................................................ 862
E.3.1 Comparison to Zero ............................................................................................................. 863
E.3.2 Minimum and Maximum ...................................................................................................... 863
E.3.3 Simple If-Then-Else Constructions ...................................................................................... 863
E.4 Notes ............................................................................................................................................. 863

List of Figures

Figure 1-1. A2 Core Organization ............................................................................................................. 50
Figure 1-2. A2 Processor Block Diagram ................................................................................................. 56
Figure 2-1. A2 Core Instruction Unit ......................................................................................................... 79
Figure 2-2. Instruction Issue Timing Diagram 1 ........................................................................................ 80
Figure 2-3. Instruction Issue Timing Diagram 2 ........................................................................................ 81
Figure 2-4. Instruction Issue Timing Diagram 3 ........................................................................................ 81
Figure 2-5. User Programming Model Registers ...................................................................................... 83
Figure 3-1. Approximation to Real Numbers .......................................................................................... 135
Figure 3-2. Selection of z1 and z2 .......................................................................................................... 140
Figure 4-1. Software-Initiated Reset Request Overview ........................................................................ 163
Figure 6-1. Virtual Address to TLB Entry Match Process ....................................................................... 190
Figure 6-2. Effective-to-Real Address Translation Flow ......................................................................... 192
Figure 6-3. ERAT Entry Word Definitions ............................................................................................... 220
Figure 6-4. ERAT Entry Word Definitions for 32-Bit Mode ..................................................................... 227
Figure 6-5. Indirect Entry to Page Table Size Calculation ...................................................................... 238
Figure 6-6. Page Table Entry Format ..................................................................................................... 239
Figure 9-1. Relationship of Timer Facilities to the Time Base ................................................................ 387
Figure 9-2. Watchdog State Machine ..................................................................................................... 395
Figure 10-1. Pass-Through Trace and Trigger Bus Overview .................................................................. 446
Figure 10-2. Trace and Trigger Bus Unit Description ............................................................................... 447
Figure 11-1. Performance Event Selection Overview ............................................................................... 449
Figure 11-2. Core Event Multiplexer Description ...................................................................................... 451
Figure 11-3. A2 Common Unit Event Multiplexer Component .................................................................. 456
Figure 12-1. ICSWX (RS32:63) Coprocessor-Command Word ................................................................. 517
Figure 12-2. Coprocessor Command Word (CCW) .................................................................................. 518
Figure 12-3. Generic Coprocessor-Request Block ................................................................................... 520
Figure 15-1. Chip Level Infrastructure Example to Access SCOM Registers in the A2 Core .................. 702
Figure 15-2. Principle Timing of Information Carried on CCH and DCH .................................................. 702
Figure C-1. Debug Multiplexer Component ............................................................................................. 761
Figure D-1. A2 Pipeline Structure ........................................................................................................... 833
Figure D-2. Instruction Cache ................................................................................................................. 836
Figure D-3. Branch Prediction ................................................................................................................. 839
Figure D-4. FU Dataflow ......................................................................................................................... 851

List of Tables

Table 2-1. Data Operand Definitions ....................................................................................................... 63
Table 2-2. Alignment Effects for Storage Access Instructions ................................................................ 63
Table 2-3. Priority Levels ......................................................................................................................... 76
Table 2-4. Other “or” Instruction Hints ..................................................................................................... 76
Table 2-5. Program Priority Register (PPR32) ........................................................................................ 76
Table 2-6. Register Mapping ................................................................................................................... 84
Table 2-7. Category Listing ..................................................................................................................... 86
Table 2-8. Instruction Categories ............................................................................................................ 89
Table 2-9. Integer Storage Access Instructions ...................................................................................... 90
Table 2-10. Integer Storage Access Instructions by External Process ID ................................................. 90
Table 2-11. Operand Handling Dependent on Alignment ......................................................................... 90
Table 2-12. Integer Arithmetic Instructions ................................................................................................ 91
Table 2-13. Integer Logical Instructions .................................................................................................... 92
Table 2-14. Integer Compare Instructions ................................................................................................. 92
Table 2-15. Integer Trap Instructions ........................................................................................................ 92
Table 2-16. Integer Rotate Instructions ..................................................................................................... 93
Table 2-17. Integer Shift Instructions ........................................................................................................ 93
Table 2-18. Integer Population Count Instructions .................................................................................... 93
Table 2-19. Integer Select Instruction ....................................................................................................... 93
Table 2-20. Branch Instructions ................................................................................................................ 94
Table 2-21. Condition Register Logical Instructions .................................................................................. 94
Table 2-22. Register Management Instructions ........................................................................................ 95
Table 2-23. System Linkage Instructions .................................................................................................. 95
Table 2-24. Processor Control Instruction ................................................................................................. 95
Table 2-25. Cache Management Instructions ........................................................................................... 96
Table 2-26. Cache Management Instructions by External Process ID ...................................................... 96
Table 2-27. TLB Management Instructions ............................................................................................... 96
Table 2-28. Processor Synchronization Instruction ................................................................................... 97
Table 2-29. Load and Reserve and Store Conditional Instructions ........................................................... 97
Table 2-30. Storage Synchronization Instructions ..................................................................................... 97
Table 2-31. Wait Instruction ...................................................................................................................... 98
Table 2-32. Initiate Coprocessor Instructions ............................................................................................ 98
Table 2-33. Cache Initialization Instructions .............................................................................................. 98
Table 2-34. BO Field Encodings ............................................................................................................. 100
Table 2-35. ‘at’ Bit Encodings .................................................................................................................. 100
Table 2-36. CR Updating Instructions ..................................................................................................... 108
Table 2-37. GPR Registers ..................................................................................................................... 110
Table 2-38. XER[SO,OV] Updating Instructions ...................................................................................... 111
Table 2-39. XER[CA] Updating Instructions ............................................................................................ 111
Table 2-40. SPRG0 Register ................................................................................................................... 114
Table 2-41. SPRG1 Register ................................................................................................................... 114
Table 2-42. SPRG2 Register ................................................................................................................... 115
Table 2-43. SPRG3 Register ................................................................................................................... 115
Table 2-44. SPRG4 Register ................................................................................................................... 115
Table 2-45. SPRG5 Register ................................................................................................................... 116
Table 2-46. SPRG6 Register ................................................................................................................... 116
Table 2-47. SPRG7 Register ................................................................................................................... 116
Table 2-48. SPRG8 Register ................................................................................................................... 117
Table 2-49. GSPRG0 Register ................................................................................................................ 117
Table 2-50. GSPRG1 Register ................................................................................................................ 117
Table 2-51. GSPRG2 Register ................................................................................................................ 118
Table 2-52. GSPRG3 Register ................................................................................................................ 118
Table 2-53. Privileged Instructions .......................................................................................................... 121
Table 3-1. Data Operand Definitions ..................................................................................................... 128
Table 3-2. Invalid Operation Exception Categories ............................................................................... 129
Table 3-3. Floating-Point Registers (FPR0–FPR31) ............................................................................. 130
Table 3-4. Floating-Point Status and Control Register (FPSCR) ........................................................... 131
Table 3-5. Floating-Point Single Format ................................................................................................ 134
Table 3-6. Floating-Point Double Format ............................................................................................... 134
Table 3-7. Format Fields ........................................................................................................................ 134
Table 3-8. IEEE 754 Floating-Point Fields ............................................................................................. 134
Table 3-9. Rounding Modes .................................................................................................................. 140
Table 3-10. IEEE 64-Bit Execution Model ............................................................................................... 141
Table 3-11. Interpretation of the G, R, and X Bits .................................................................................... 141
Table 3-12. Location of the Guard, Round, and Sticky Bits in the IEEE Execution Model ...................... 142
Table 3-13. Multiply-Add 64-Bit Execution Model .................................................................................... 143
Table 3-14. Location of Guard, Round, and Sticky Bits in the Multiply-Add Execution Model ................. 143
Table 3-15. Floating-Point Load Instructions ........................................................................................... 146
Table 3-16. Floating-Point Store Instructions .......................................................................................... 147
Table 3-17. Floating-Point Move Instructions .......................................................................................... 148
Table 3-18. Floating-Point Elementary Arithmetic Instructions ................................................................ 148
Table 3-19. Floating-Point Multiply-Add Instructions ............................................................................... 149
Table 3-20. Floating-Point Rounding and Conversion Instructions ......................................................... 150
Table 3-21. Comparison Sets .................................................................................................................. 150
Table 3-22. Floating-Point Compare and Select Instructions .................................................................. 151
Table 3-23. Floating-Point Status and Control Register Instructions ....................................................... 151
Table 4-1. Register Reset Values .......................................................................................................... 155
Table 4-2. Shadow TLB Array Entry Initialization
Table 5-1. Data Cache Array Organization
Table 5-2. Cache Size and Parameters
Table 5-3. Instruction Cache Array Organization
Table 5-4. Cache Size and Parameters
Table 5-5. XUCR Bits
Table 6-1. Page Size and Effective Address to EPN Comparison
Table 6-2. Page Size and Real Address Formation
Table 6-3. Access Control Applied to Cache Management Instructions
Table 6-4. TLB Entry Fields
Table 6-5. ERAT Class Field Reload Value For UTLB Hits
Table 6-6. LRAT Entry Fields
Table 6-7. TLB Management Instruction Privilege Levels
Table 6-8. TLB Congruence Class Hashing Function (of EPN Address Bits)
Table 6-9. Supported EPN[27:51] Field Values in Downbound TLBIVAX Request
Table 6-10. ERAT Management Instruction Privilege Levels
Table 6-11. Summary of Supported IS Field Values in ERATIVAX
Table 6-12. Supported EPN[27:51] Field Values in Downbound erativax Request
Table 6-13. TLB Reservation Fields
Table 6-14. TLB Update After Page Table Translation
Table 6-15. MAS Register Update Summary
Table 7-1. Register Mapping in Guest State
Table 7-2. Interrupt Types and Associated Offsets
Table 7-3. Interrupt and Exception Types
Table 8-1. Invalid Operation Exception Categories
Table 8-2. MSR[FE0, FE1] Modes
Table 8-3. Invalid Operation Exceptions
Table 8-4. QNaN Result
Table 8-5. FPSCR[FPRF] Result Flags
Table 8-6. Floating-Point Status and Control Register (FPSCR)
Table 8-7. Bit Encodings for a CR Field
Table 9-1. Timebase Register (TB)
Table 9-2. Timebase Lower Register (TBL)
Table 9-3. Timebase Upper Register (TBU)
Table 9-4. Decrementer Register (DEC)
Table 9-5. Decrementer Auto-Reload Register (DECAR)
Table 9-6. Fixed Interval Timer Period Selection
Table 9-7. Watchdog Timer Period Selection
Table 9-8. Watchdog Timer Exception Behavior
Table 10-1. PCCR0[DBA] (Debug Action) Definition per Thread
Table 10-2. Debug Events
Table 10-3. Debug Event Summary
Table 10-4. Ram Instruction and Command Register (RAMIC)
Table 10-5. Ram Instruction Register (RAMI)
Table 10-6. Ram Command Register (RAMC)
Table 10-7. Ram Data Register (RAMD)
Table 10-8. Ram Data Register High (RAMDH)
Table 10-9. Ram Data Register Low (RAMDL)
Table 10-10. Thread Control and Status Register (THRCTL)
Table 10-11. PC Configuration Register 0 (PCCR0)
Table 11-1. Core Event Multiplexer to External Event Bus
Table 11-2. Performance Monitor Event Tags
Table 11-3. FU Performance Events Table
Table 11-4. IU Performance Events Table
Table 11-5. XU Performance Events Table
Table 11-6. LSU Performance Events Table
Table 11-7. MMU Performance Events Table
Table 11-8. Core Instruction Trace Data and Control Signals
Table 11-9. First Instruction Trace Record Format
Table 11-10. Format of Subsequent Instruction Trace Records
Table 11-11. Trace Record Type Decode and Instruction Trace Record Ordering
Table 14-1. Register Summary
Table 15-1. SCOM Register Summary
Table 15-2. Error Injection Register
Table 15-3. Fault Isolation Register 0 (FIR0)
Table 15-4. FIR0 Action1 Register (FIR0A1)
Table 15-5. FIR0 Mask Register (FIR0M)
Table 15-6. FIR0 and FIR1 Registers (Read Only)
Table 15-7. Fault Isolation Register 1
Table 15-8. FIR1 Action0 Register (FIR1A0)
Table 15-9. FIR1 Action1 Register (FIR1A1)
Table 15-10. FIR1 Mask Register (FIR1M)
Table 15-11. Fault Isolation Register 2 (FIR2)
Table 15-12. FIR2 Action0 Register (FIR2A0)
Table 15-13. FIR2 Action1 Register (FIR2A1)
Table 15-14. FIR2 Mask Register (FIR2M)
Table 15-15. PC Configuration Register 0 (PCCR0)
Table 15-16. Ram Data Register (RAMD)
Table 15-17. Ram Data Register High (RAMDH)
Table 15-18. Ram Data Register Low (RAMDL)
Table 15-19. Ram Command Register (RAMC)
Table 15-20. Ram Instruction Register (RAMI)
Table 15-21. Ram Instruction and Command Register (RAMIC)
Table 15-22. Special Attention Register
Table 15-23. Thread Control and Status Register (THRCTL)
Table A-1. A2 Core Instructions by Mnemonic
Table B-1. FU Instructions by Opcode
Table C-1. AXU Debug Select Register (ADBSR)
Table C-2. AXU Debug Multiplexer Debug and Trigger Groups
Table C-3. IU Debug Select Register (IDSR)
Table C-4. IU Debug Mux1 Debug and Trigger Groups
Table C-5. IU Debug Mux2 Debug and Trigger Groups
Table C-6. MMU and PC Debug Select Register (MPDSR)
Table C-7. MMU Debug Multiplexer Debug and Trigger Groups
Table C-8. PC Debug Multiplexer Debug and Trigger Groups
Table C-9. XU Debug Select Register1 (XDSR1)
Table C-10. XU Debug Mux1 Debug and Trigger Groups
Table C-11. XU Debug Mux2 Debug and Trigger Groups
Table C-12. XU Debug Select Register2 (XDSR2)
Table C-13. XU Debug Mux3 Debug and Trigger Groups
Table C-14. XU Debug Mux4 Debug and Trigger Groups
Table D-1. Multiply Instructions and Their Associated Latency
Table D-2. Divide Instructions and Their Associated Latency
Table D-3. SRAM Operations
Table D-4. Interrupt Conditions
Table D-5. Flush Conditions

Revision Log

Each release of this document supersedes all previously released versions. The revision log lists all significant changes made to the document since its initial release. In the rest of the document, change bars in the margin indicate that the adjacent text was modified from the previous release of this document.
Revision Date        Pages   Description
October 23, 2012             Version 1.3.
                     657     Updated Section 14.5.99 PVR - Processor Version Register.
                             Removed “IBM Confidential.”
May 25, 2011                 Version 1.2.
April 1, 2011                Version 1.1.
                     518     Added a programming note to Section 12.5.3 Execution.
                     90      Revised Table 2-11 Operand Handling Dependent on Alignment.
December 15, 2010            Version 1.0. Initial release.

About This Book

This user’s manual provides the architectural overview, programming model, and detailed information about the instruction set, registers, and other facilities of the IBM® Power ISA A2 64-bit embedded processor core.
The A2 embedded controller core features:
• Power ISA Architecture
• Concurrent-issue pipeline with dynamic branch prediction
• Separate instruction and data caches, 16 KB each
• Memory management unit (MMU) with a 512-entry translation lookaside buffer (TLB)
• 4 TB (42-bit) physical address capability
• 128-bit reload interface and 128-bit store interface
• ANSI/IEEE 754-1985 compliant floating-point (see Note 1)
• Single-precision and double-precision operation in hardware
• Auxiliary execution unit (AXU) that executes the Power ISA floating-point instruction set
• Super-pipelined: Single cycle throughput for most instructions
• In-order execution and completion
Note 1: Power ISA FUs require software support for IEEE compliance.

Who Should Use This Book

This book is for system hardware and software developers and for application developers who need to understand the A2 core. The audience should understand embedded system design, operating systems, RISC microprocessing, and computer organization and architecture.

How to Use This Book

This book describes the A2 core device architecture, programming model, registers, and instruction set. This book contains the following chapters:
Overview
CPU Programming Model
FU Programming Model
Initialization
Instruction and Data Caches
Memory Management
CPU Interrupts and Exceptions
FU Interrupts and Exceptions
Timer Facilities
Debug Facilities
Performance Events and Event Selection
Implementation Dependent Instructions
Power Management Methods
Register Summary
SCOM Accessible Registers
This book contains the following appendixes:
Processor Instruction Summary
FU Instruction Summary
Debug and Trigger Groups
Instruction Execution Performance and Code Optimizations
Programming Examples

Notation

The manual uses the following notational conventions:
• Active low signals are shown with an overbar (Active_Low).
• All numbers are decimal unless specified in some special way.
• 0bnnnn means a number expressed in binary format.
• 0xnnnn means a number expressed in hexadecimal format.
Underscores might be used between digits.
• RA refers to General Purpose Register (GPR) RA.
• (RA) refers to the contents of GPR RA.
• (RA|0) refers to the contents of GPR RA or to the value 0 if the RA field is 0.
• Bits in registers, instructions, and fields are specified as follows.
• Bits are numbered most-significant bit to least-significant bit, starting with bit 0.
• $X_p$ means bit p of register, instruction, or field X.
• $X_{p:q}$ means bits p through q of a register, instruction, or field X.
• $X_{p,q,...}$ means bits p, q,... of a register, instruction, or field X.
• X[p] means a named field p of register X.
• X[p:q] means named fields p through q of register X.
• X[p,q,...] means named fields p, q,... of register X.
• ¬X means the ones complement of the contents of X.
• A period (.) as the last character of an instruction mnemonic means that the instruction records status information in certain fields of the Condition Register as a side effect of execution, as described in Section 12 Implementation Dependent Instructions on page 481.
• The symbol || is used to describe the concatenation of two values. For example, 0b010 || 0b111 is the same as 0b010111.
• $x^n$ means x raised to the n power.
• ${}^n x$ means the replication of x, n times (that is, x concatenated to itself n – 1 times). ${}^n 0$ and ${}^n 1$ are special cases:
  • ${}^n 0$ means a field of n bits with each bit equal to 0. Thus ${}^5 0$ is equivalent to 0b00000.
  • ${}^n 1$ means a field of n bits with each bit equal to 1. Thus ${}^5 1$ is equivalent to 0b11111.
• /, //, ///,... denotes a reserved field in an instruction or in a register.
• ? denotes an allocated bit in a register.
• A shaded field denotes a field that is reserved or allocated in an instruction or in a register.
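The bit-numbering, concatenation, and replication conventions above can be illustrated with a short sketch. This is not part of the A2 specification; the helper names (bit, bits, concat, replicate) and the default 64-bit width are assumptions chosen for illustration. The one point taken from the text is that bit 0 is the most-significant bit.

```python
# Illustrative helpers for the manual's notation (hypothetical names).
# Bit 0 is the MOST-significant bit, as stated in the Notation list.

def bit(x, p, width=64):
    """Return bit p of a width-bit value X (MSB is bit 0)."""
    return (x >> (width - 1 - p)) & 1

def bits(x, p, q, width=64):
    """Return bits p through q (inclusive) of a width-bit value X."""
    nbits = q - p + 1
    return (x >> (width - 1 - q)) & ((1 << nbits) - 1)

def concat(a, a_width, b, b_width):
    """Concatenate field a (a_width bits) with field b (b_width bits).
    Returns (value, total_width)."""
    return ((a << b_width) | b, a_width + b_width)

def replicate(x, x_width, n):
    """Replicate field x (x_width bits) n times.
    Returns (value, total_width)."""
    value = 0
    for _ in range(n):
        value = (value << x_width) | x
    return (value, x_width * n)

if __name__ == "__main__":
    # Bit 0 of a 64-bit register is the top bit; bit 63 is the bottom bit.
    print(bit(0x8000_0000_0000_0000, 0), bit(1, 63))
    # Concatenating 0b010 with 0b111 gives the 6-bit field 0b010111.
    print(concat(0b010, 3, 0b111, 3))
    # Replicating a single 1 bit five times gives 0b11111.
    print(replicate(1, 1, 5))
```

Widths are carried explicitly because leading zeros matter in a bit field: five replicated zero bits and a single zero bit are the same integer but different fields.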

Related Publications

Power ISA User Set Architecture (Book I, Version 2.06)
Power ISA Virtual Environment Architecture (Book II, Version 2.06)
Power ISA Operating Environment Architecture (Book III-E, Version 2.06)
The Power ISA specifications are available at www.power.org.

List of Acronyms and Abbreviations

ABIST automatic built-in self test
ALU arithmetic logic unit
ANSI American National Standards Institute
ARE auto-reload enable
AS address space
ATB alternate time base category
attn attention
AXU auxiliary execution unit
B base category
BCLR branch conditional to Link Register
BE big endian
BHT branch history table
BP branch prediction
BRDCAST broadcast
BRT branch taken
BTA branch target address
CA carry
CAM content addressable memory
CC congruence class
CCH control channel
CCW coprocessor command word
CD coprocessor directive
CEE change exception enable
CI coprocessor instance
CIA current instruction address
CPU central processing unit
CRB coprocessor-request block
CS cache specification category
CSB control status block
CSI context synchronizing instruction
DAC data address compare
DBA debug action
DBELL doorbell interrupt
DCC data cache controller
DCH data channel
DCI data cache invalidate instruction
DCR device control register
DEA data effective address
DEC decrementer
D-ERAT data ERAT
DERRDET D-ERAT error detect
DFP decimal floating-point category
DL downlink
DRA data real address
DSI data storage interrupt
DSP digital signal processor
DVC data value compare
E endian or embedded category
E.CD embedded.cache debug category
E.CI embedded.cache initialization category
E.DC embedded.device control category
E.ED embedded.enhanced debug category
E.HV embedded.hypervisor category
E.LE embedded.little-endian category
E.PC embedded.processor control category
E.PD embedded.external PID category
E.PM embedded.performance monitor category
E.PT embedded.page table category
E.TWC embedded.tlb write conditional category
EA effective address
ECC error-correcting code
ECL embedded cache locking category
EDM external debug mode
EEN error entry number
EH exclusive access hint
EM embedded multithreading category
EM.TM embedded multithreading.thread management category
EPID external PID
EPLC external process ID load context
EPN effective page number
EPR external problem state bit
EPSC external process ID store context
ERAT effective to real address translation
ESID effective segment ID
EVPR Exception Vector Prefix Register
EXC external control category
EXP external proxy category
FE floating-point equal
FG floating-point greater than
FIFO first-in, first-out
FIR fault isolation register
FIT fixed interval timer
FL floating-point less than
FP floating-point category
FP.R floating-point.record category
FPR floating-point register
FU floating-point unit
FXU fixed-point unit
G guarded
GB gigabyte
GB/sec gigabytes per second
GHz gigahertz
GPR general purpose register
GS guest state
HTM hardware trace macro
HWT hardware table walker
I caching inhibited
I/O input/output
IAC instruction address compare
IBUFF instruction buffer
ICC instruction cache controller
ICI instruction cache invalidate instruction
ICMP instruction complete
IDE imprecise debug event
IEA instruction effective address
IEEE Institute of Electrical and Electronics Engineers
I-ERAT instruction ERAT
IERRDET I-ERAT error detect
IFAR Instruction Fetch Address Register
IND indirect
INSTTRACE instruction trace mode
IR intermediate result
IRPT interrupt
IS instruction fetch address space OR invalidation select
ISA instruction set architecture
ISI instruction storage interrupt
IU instruction unit
IU0 - IU6 instruction unit pipeline stage
IVC instruction value compare
JTAG Joint Test Action Group
KB kilobyte
L1 level 1
L2 level 2
LA logical address
LBIST logic built-in self-test
LE little endian
LIFO last-in, first-out
LMA legacy integer multiply-accumulate category
LMQ load miss queue
LMV legacy move assist category
LPID logical partition identifier
LPIDTAG LPID tag
LPN logical page number
LRAT logical to real address translation
LRU least recently used
LSb least significant bit
LSB least significant byte
LSQ load/store quadword category
LSU load/store unit
M memory coherence required
MA move assist category
MAS MMU assist
MAV MMU Architecture version
MB megabyte
MESI modified, exclusive, shared, invalid
MHz megahertz
MMC memory coherence category
MMU memory management unit
MSB most significant byte
MSRP Machine State Register protect
MT multithread
NaN Not a Number
NAND not AND
NH next higher in magnitude
NIA next instruction address
NL next lower in magnitude
NOR not OR
OV overflow
OX overflow exception
PC processor control
PCB pervasive control bus
PCR processor compatibility category
PIB pervasive interconnect bus
PID processor ID
PIRTAG PIR tag
PME power-management
PMU performance monitor unit
POR power-on reset
PS page size specified by PTE
PTE page table entry
QNaN quiet NaN
RA real address
RAW read-after-write
REE reference exception enable
RET return
RISC reduced instruction set computing
RMT replacement management table
RO read only
ROM read-only memory
RPN real page number
S server category
S.PM server.performance monitor category
S.RPTA server.relaxed page table alignment category
SAO strong access order category
SCOM serial communications
SCPM store conditional page mobility category
SEM sequential execution model
SER soft error rate
SIMD single instruction, multiple data
SLB segment lookaside buffer
SNaN signalling NaN
SO summary overflow
SOC system-on-a-chip
SP signal processing engine category
SPE signal processing engine
SP.FD SPE.embedded float scalar double category
SP.FS SPE.embedded float scalar single category
SP.FV SPE.embedded float vector category
SPR Special Purpose Register
SPRN special purpose register number
SPRG Special Purpose Registers General
SR supervisor mode read access
SRAM static random access memory
STM stream category
SW supervisor mode write access
SX supervisor mode execution access
TB terabyte
TBC transfer byte count
TBL time base lower
TBU time base upper
TERRDET TLB error detect
TGS translation guest space identifier
TID translation ID
TLB translation lookaside buffer
TLPID translation logical partition identifier
TRC trace category
TS translation space identifier
UC microcode unit or uncorrectable error
uCode microcode
UCT unavailable coprocessor type
UDE unconditional debug event
UDEC user decrementer
UE underflow exception
UL uplink
UND undefined
UR user mode read access
UTLB unified translation lookaside buffer
UW user mode write access
UX underflow exception or user mode execution access
V vector category
V.LE little-endian category
VA virtual addresses
VF virtualization fault
VHDL very-high-speed integrated circuit (VHSIC) hardware description language
VLE variable length encoding category
VLPT virtual linear page table
VPN virtual page number
VSID virtual segment ID
VSX vector-scalar extension category
VX invalid operation exception
W write-through
WAW write-after-write
WC wake control or write to clear
WDT watchdog timer
WIMGE write-through, caching-inhibited, memory coherency required, guarded, and endianness attributes
WP watchdog timer period
WS write to set
WT wait category
XOR exclusive OR
XU execution unit
ZX zero divide exception

1. Overview

The IBM Power ISA A2 64-bit embedded processor core is an implementation of the scalable and flexible Power ISA architecture. The A2 core implements four simultaneous threads of execution within the core. Each thread of execution can be viewed as a processor within a 4-way multiprocessor with shared dataflow. This gives the effective appearance of four independent processing units to software. The performance of the four threads is limited because they share some resources such as the L1 and L2 caches.
The floating-point unit interfaces to the A2 processor core and incorporates a 6-stage arithmetic pipeline. The pipeline enables one arithmetic instruction to be issued during each cycle. Floating-point instructions execute with 6-cycle latency and 1-cycle throughput, except for operations on denormalized operands, division, and square root.
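The latency and throughput figures above can be turned into a rough cycle estimate. The sketch below is an idealized model, not a description of the actual A2 issue logic: it assumes a single thread, no stalls, and none of the exceptional cases named in the text (denormalized operands, division, square root). The function and constant names are hypothetical.

```python
# Idealized floating-point pipeline timing model (illustrative only).
FP_LATENCY_CYCLES = 6   # from the text: 6-cycle latency
FP_ISSUE_INTERVAL = 1   # from the text: 1-cycle throughput

def cycles_independent(n):
    """Cycles to complete n independent FP operations.

    The pipeline fills once (6 cycles for the first result); after
    that, one result completes per cycle.
    """
    if n <= 0:
        return 0
    return FP_LATENCY_CYCLES + (n - 1) * FP_ISSUE_INTERVAL

def cycles_dependent_chain(n):
    """Cycles to complete n FP operations where each one consumes the
    previous result, so the full latency is exposed every time
    (assumption: no other work is available to fill the gaps)."""
    if n <= 0:
        return 0
    return n * FP_LATENCY_CYCLES

if __name__ == "__main__":
    for n in (1, 6, 100):
        print(n, cycles_independent(n), cycles_dependent_chain(n))
```

Under these assumptions, 100 independent operations take 105 cycles while a 100-long dependent chain takes 600, which is why interleaving independent work (from the same thread or from the other hardware threads) matters for floating-point code.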

1.1 A2 Core Key Design Fundamentals

The key design fundamentals of the A2 core are the following:
• 64-bit implementation of the Power ISA Version 2.06 Book III-E - Embedded Platform Environment.
  – The A2 core provides binary compatibility for IBM PowerPC® application level code (problem state).
  – The A2 core implements the Embedded Hypervisor Architecture to provide secure compute domains and operating system virtualization.
• The A2 core is optimized for aggregate throughput.
  – 4-way, fine-grained simultaneous multithreaded.
  – 2-way concurrent issue. One branch/integer/load/store + one AXU (FP/vector).
  – In-order dispatch and execution.
  – 27 FO4 design.
• The A2 core is a modular design to support reuse.
  – The A2 core provides a general purpose coprocessor (AXU) port to attach unique AXUs.
    • AXUs have full ISA flexibility.
    • AXUs currently include:
      – FU - Power ISA V2.06 scalar double-precision floating-point unit.
    • The AXU is an optional unit.
  – The A2 core provides for an optional MMU.
    • The MMU supports Power ISA V2.06 Book III-E Memory Management (MAV 2.0).
    • Without the MMU, the A2 core supports the software-managed ERATs defined in this document.
  – The A2 core provides for an optional microcode engine and ROM.
    • Power ISA V2.06 Book I and II instructions are supported with a combination of microcoded instructions and hardware implemented instructions.

1.2 A2 Core Features

The A2 core is a high-performance, low-power engine that implements the flexible and powerful Power ISA Architecture.
The A2 core contains a single-issue, in-order, pipelined processing unit, along with other functional elements required by embedded product specifications. These other functions include memory management, cache control, timers, and debug facilities. Interfaces for custom coprocessors and floating-point functions are provided. The processor interface is 128 bits for reads and 128 bits for writes (256 bits in an optional version of the A2) and provides the framework to efficiently support system-on-a-chip (SOC) designs.
A2 core features include:
• High-performance, concurrent-issue, 64-bit RISC CPU
• 4-way, fine-grained simultaneous multithreaded implementation of the full 64-bit Power ISA Architecture
– One outstanding I-fetch request to the L2 cache per thread
– One 8-entry instruction fetch buffer per thread
– Up to four instructions can be placed in the instruction buffer per cycle
– Up to one instruction can be taken out of the instruction buffer per cycle per thread
– Instruction decode and dependency per thread
• Two-way concurrent instruction decode and issue
• In-order dispatch, execution, and completion
• High-accuracy dynamic branch prediction
–81024 entry branch history table with 2 bits of history
– Four-entry link stack per thread
• Highly-pipelined microarchitecture
– Full GPR bypass
– Full CR bypass
– Link Register bypass
• Single unified pipeline
• Complex integer, system, branch, simple integer, and load/store pipelines
• 5-port (3-read, 2-write) 32 × 4 × 64-bit General Purpose Register (GPR) file
• Hardware support for all CPU misaligned accesses (except for lmw and stmw)
• Full support for both big- and little-endian byte ordering
• Primary caches
• Separate instruction and data cache arrays
• Array size offerings: 16 KB
• Single-cycle access
• 64-byte line size
• 8-way set-associative D-cache, 4-way set-associative I-cache
• Write-through operation
• Unified (for all threads) nonblocking with up to eight outstanding load misses
• Cache line locking supported
• Caches can be partitioned to provide separate regions for transient instructions and data
• Critical-word-first data access and forwarding
• Pseudo LRU replacement policy
• Cache tags and data are parity protected. Errors are recoverable.
• Memory Management Unit (MMU)
• Support for Power ISA categories Embedded.Hypervisor (E.HV), Embedded.Hypervisor.LRAT (E.HV.LRAT), Embedded.TLB Write Conditional (E.TWC), and Embedded.Page Table (E.PT)
• Support for Power ISA Book III-E MMU Architecture Version 2.0 (MAV 2.0)
• Separate instruction and data ERATs
– Fully associative 16-entry I-ERAT shared by all threads
– Fully associative 32-entry D-ERAT shared by all threads
– Entries can be shared by two or more threads via 4-bit thread ID mask field
– Exclusion range function to allow address “holes” at base of page entries
– ERATs operate in one of two modes: MMU mode or ERAT-only mode
  1. MMU mode; ERAT with backing MMU
    – Software-managed page tables and indirect (IND = 1) TLB entries
    – Hardware handles ERAT miss with TLB hit
    – Hardware handles direct (IND = 0) TLB miss via hardware page table walking
    – Software handles indirect (IND = 1) TLB miss via instruction and data TLB miss exceptions
    – Software can also install direct (IND = 0) TLB entries as required
  2. ERAT-only mode; effective-to-real address translation with ERATs only
    – MMU removed, no backing TLB
    – Software-managed ERAT entries
    – I/D TLB miss exceptions
• 512-entry, 4-way set-associative unified TLB array
• Variable page sizes for direct (IND = 0) entries (4 KB, 64 KB, 1 MB, 16 MB, 1 GB), simultaneously resident in TLB and/or ERAT, and indirect (IND = 1) entries (1 MB and 256 MB) in TLB
• 88-bit virtual address (contains 64-bit effective address)
• 42-bit (4 TB) real addressability
• Flexible TLB management via software management, or via hardware page table search
• Flexible storage attribute controls for write-through, caching inhibited, coherent, guarded, and byte order (endianness)
• Four user-definable storage attribute controls
• TLB tags and data are parity protected against soft errors.
• Debug facilities
• Extensive hardware debug facilities
• Multiple instruction and data address breakpoints
• Data value compare
• Instruction value compare
• Single-step, branch, trap, and other debug events
• Noninvasive real-time software trace interface
• Timer facilities
• 64-bit time base
• Decrementer with auto-reload capability
• Fixed interval timer (FIT)
• Watchdog timer with critical interrupt and/or auto-reset
• Multiple core interfaces operating at core frequency
• System interface
• A command interface for instruction reads, data reads, and data writes
• A 256-bit interface for data writes (XUCR0[L2SIW] selects 128-bit mode)
• A 128-bit interface for instruction reads and data reads
• An invalidate interface to the core for the system to maintain L1 cache coherency
• Auxiliary execution unit (AXU) port
– Allows full ISA flexibility
• AXU includes support for separate decode and dependency
• Full support to stall and flush the processor
– A2 core pipeline exposed to allow high-performance, tightly coupled coprocessors
– Four-thread issue selection
– Provides functional extensions to the processor pipelines
– 256-bit load/store interface (direct access between AXU and the primary data cache)
– Interface can support AXU execution of all Power ISA floating-point instructions
– Attachment capability for DSP coprocessing such as accumulators and SIMD computation
– Enables customer-specific instruction enhancements for unique applications
• Clock and power management interface
• Debug interface
• Performance monitor event interface
Floating-point unit features include:
• IEEE 754-1985 compliance (see note 1)
• Single-precision and double-precision operation in hardware
• Executes Power ISA floating-point instruction set
• Masked exceptions handled in hardware
• Super-pipelined; single-cycle throughput for most instructions
• In-order dispatch, execution, and completion
• Single instruction decode and issue
• Thirty-two 64-bit Floating-Point Registers (FPRs)
• 64-bit load/store interface
Note 1. The A2 FPU requires software support for IEEE 754 compliance. See IEEE 754 and Architectural Compliance on page 56 for details.

1.3 The A2 Core as a Power ISA Implementation

The A2 core implements the full, 64-bit fixed-point Power ISA Architecture. The A2 core fully complies with these architectural specifications. The core does not implement the floating-point operations, although a floating-point unit (FU) can be attached (using the AXU interface).

1.3.1 Embedded Hypervisor

The A2 core implements the Embedded Hypervisor Architecture to provide secure compute domains and operating system virtualization. The Embedded Hypervisor Architecture introduces the concept of partitions by two main architectural changes. The first is by extending the virtual address with a logical partition identifier (LPID). The identifier serves an analogous purpose to the process ID (PID) and is used to distinguish partitions. The second change is introducing a new privilege level above supervisor and reallocating ownership of resources between the two levels. Moving the ownership of certain resources beyond the supervisor helps software to provide secure compute domains.
In addition to providing logical partitions, the following requirements are set forth:
• Ensure a secure environment. An operating system in one logical partition is not allowed to affect the resources of an operating system in another partition.
• Maintain compatibility with the existing programming model. An existing operating system today should require only minor initialization changes to run.
• An operating system running in a logical partition should not be able to deny service to any shared resources.
• Clean and secure communication channels between supervisor and embedded hypervisor states (in both directions).
• The ability to run guest operating systems efficiently and provide real-time response to interrupts.

1.4 A2 Core Organization

The A2 core includes a concurrent-issue instruction fetch and decode unit with an attached branch unit, together with a pipeline for complex integer, simple integer, and load/store operations. The A2 core also includes a memory management unit (MMU); separate instruction and data cache units; pervasive and debug logic; and timer facilities.
Figure 1-1. A2 Core Organization

1.4.1 Instruction Unit

The instruction unit of the A2 core fetches, decodes, and issues two instructions from different threads per cycle to any combination of the one execution pipeline and the AXU interface (see Section 1.4.2 Execution Unit on page 51 and Section 1.5.2 Auxiliary Execution Unit (AXU) Port on page 59). The instruction unit includes a branch unit that provides dynamic branch prediction using a branch history table (BHT). This mechanism greatly improves the branch prediction accuracy and reduces the latency of taken branches, such that the target of a branch can usually be executed immediately after the branch itself with no penalty.

1.4.2 Execution Unit

The A2 core contains a single execution pipeline. The pipeline consists of seven stages and can access the 5-ported (three read, two write) GPR file.
The pipeline handles all arithmetic, logical, branch, and system management instructions (such as interrupt and TLB management, move to/from system registers, and so on), as well as all loads, stores, and cache management operations. The pipelined multiply unit can perform 32-bit × 32-bit multiply operations with single-cycle throughput and single-cycle latency. The width of the divider is 64 bits. Divide instructions dealing with 64-bit operands recirculate for 65 cycles, and operations with 32-bit operands recirculate for 32 cycles. No divide instructions are pipelined; they all require some recirculation.
All misaligned operations are handled in hardware with no penalty on any operation that is contained within an aligned 32-byte region. The load/store pipeline supports all operations to both big-endian and little-endian data regions.
Appendix D Instruction Execution Performance and Code Optimizations on page 833 provides detailed information about instruction timings and performance implications in the A2 core.

1.4.3 Instruction and Data Cache Controllers

The A2 core provides separate instruction and data cache controllers and arrays, which allow concurrent access and minimize pipeline stalls. The storage capacity of the cache arrays is 16 KB each. Both caches have 64-byte lines; the I-cache is 4-way set-associative and the D-cache is 8-way set-associative. Both caches support parity checking on the tags and data in the memory arrays to protect against soft errors. If a parity error is detected, the CPU forces an L1 miss and reloads from the system bus. The A2 core can be configured to cause a machine check exception on a D-cache parity error.
The Power ISA instruction set provides a rich set of cache management instructions for software-enforced coherency. See Instruction and Data Caches on page 169 for detailed information about the instruction and data cache controllers.
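As a worked check of the cache geometry stated above, the number of sets in each L1 array follows from capacity / (line size × ways). The Python sketch below is illustrative arithmetic only; the function name `cache_sets` is hypothetical, and the sketch makes no claim about how the hardware actually indexes the arrays.

```python
KB = 1024

def cache_sets(capacity_bytes: int, line_bytes: int, ways: int) -> int:
    """Number of sets in a set-associative cache of the given geometry."""
    lines = capacity_bytes // line_bytes   # total cache lines in the array
    return lines // ways                   # lines are grouped into sets of `ways`

# Figures taken from the text: 16 KB arrays, 64-byte lines,
# 4-way I-cache and 8-way D-cache.
icache_sets = cache_sets(16 * KB, 64, 4)
dcache_sets = cache_sets(16 * KB, 64, 8)
```

With these figures each array holds 256 lines, which gives 64 sets for the I-cache and 32 sets for the D-cache.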
1.4.3.1 Instruction Cache Controller
The instruction cache controller (ICC) delivers up to four instructions per cycle to the instruction unit of the A2 core. The ICC also handles the execution of the Power ISA instruction cache management instructions for coherency.
1.4.3.2 Data Cache Controller
The data cache controller (DCC) handles all load and store data accesses, as well as the Power ISA data cache management instructions. All misaligned accesses are handled in hardware. Cacheable load accesses that are contained within a double quadword (32 bytes) are handled as a single request. Cacheable store accesses, and caching-inhibited load or store accesses, that are contained within a quadword (16 bytes) are handled as a single request. Load and store accesses that cross these boundaries are broken into separate byte accesses in hardware by the microcode engine. In 32-byte store mode (XUCR0[L2SIW] = 1), all misaligned store or load accesses contained within a double quadword (32 bytes) are handled as a single request. This includes cacheable and caching-inhibited stores and loads.
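The single-request rules above can be sketched as a small predicate. This is a minimal Python illustration, not a description of the DCC logic: the function name and parameters are hypothetical, and it assumes that "contained within" a quadword or double quadword means contained within a naturally aligned 16-byte or 32-byte block.

```python
def is_single_request(addr: int, size: int, is_load: bool,
                      cacheable: bool, store32_mode: bool = False) -> bool:
    """Return True if the access would be handled as a single request.

    store32_mode stands in for XUCR0[L2SIW] = 1 (32-byte store mode).
    Assumption: the 16-byte and 32-byte windows are naturally aligned.
    """
    if store32_mode or (is_load and cacheable):
        window = 32   # double quadword
    else:
        window = 16   # quadword: cacheable stores, caching-inhibited accesses
    first_block = addr // window
    last_block = (addr + size - 1) // window
    return first_block == last_block
```

For example, a 4-byte cacheable load at address 0x1C stays within one 32-byte block and is a single request, while the same load at 0x1E crosses a 32-byte boundary and would be split.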
The DCC interfaces to the AXU port to provide direct load/store access to the data cache for AXU load and store operations. Such AXU load and store instructions can access up to 32 bytes (a double quadword) in a single cycle for cacheable accesses and can access up to 16 bytes (a quadword) in a single cycle for caching inhibited accesses.
The data cache always operates in a write-through manner.
The DCC also supports cache line locking and “transient” data via way locking.
The DCC provides for up to eight outstanding load misses, and the DCC can continue servicing subsequent load and store hits in an out-of-order fashion. Store-gathering is not performed within the A2 core.

1.4.4 Memory Management Unit (MMU)

The A2 core supports a flat, 42-bit (4 TB) real (physical) address space. This 42-bit real address is generated by the MMU as part of the translation process from the 64-bit effective address, which is calculated by the processor core as an instruction fetch or load/store address.
Note: In 32-bit mode, the A2 core forces bits 0:31 of the calculated 64-bit effective address to zeros. Therefore, to have a translation hit in 32-bit mode, software needs to set the effective address upper bits to zero in the ERATs and TLB.
The MMU provides address translation, access protection, and storage attribute control for embedded applications. The MMU supports demand paged virtual memory and other management schemes that require precise control of logical to physical address mapping and flexible memory protection. Working with appropriate system level software, the MMU provides the following functions:
• Translation of the 88-bit virtual address, which comprises the 1-bit guest state (GS), 8-bit logical partition ID (LPID), 1-bit address space (AS) identifier, 14-bit process ID (PID), and 64-bit effective address, into the 42-bit real address (note that the 1-bit indirect entry (IND) bit is not considered part of the virtual address)
• Page-level read, write, and execute access control
• Storage attributes for cache policy, byte order (endianness), and speculative memory access
• Software control of page replacement strategy
The translation lookaside buffer (TLB) is the primary hardware resource involved in the control of translation, protection, and storage attributes. It consists of 512 entries, each specifying the various attributes of a given page of the address space. The TLB is 4-way set associative. The TLB entries can be of type direct (IND = 0), in which case the virtual address is translated immediately by a matching entry, or of type indirect (IND = 1), in which case the hardware page table walker is invoked to fetch and install an entry from the hardware page table.
The TLB tag and data memory arrays are parity protected against soft errors; if a parity error is detected during an address translation, the TLB and ERAT caches treat the parity error like a miss and proceed to either reload the entry with correct parity (in the case of an ERAT miss, TLB hit) and set the parity error bit in the appropriate fault isolation register (FIR), or generate a TLB exception where software can take appropriate action (in the case of a TLB miss).
An operating system can choose to implement hardware page tables in memory that contain virtual to logical translation page table entries (PTEs) per Category E.PT. These PTEs are loaded into the TLB by the hardware page table walker logic after the logical address is converted to a real address via the logical to real address translation (LRAT) per Category E.HV.LRAT. Software must install indirect (IND = 1) type TLB entries for each page table that is to be traversed by the hardware walker. Alternatively, software can manage the establishment and replacement of TLB entries by simply not using indirect entries (that is, by using only direct IND = 0 entries). This gives system software significant flexibility in implementing a custom page replacement strategy. For example, to reduce TLB thrashing or translation delays, software can reserve several TLB entries for globally accessible static mappings. The instruction set provides several instructions for managing TLB entries. These instructions are privileged, and the processor must be in supervisor state for them to be executed.
The first step in the address translation process is to expand the effective address into a virtual address. This is done by taking the 64-bit effective address and prepending to it a 1-bit guest state (GS) identifier, an 8-bit logical partition ID (LPID), a 1-bit address space (AS) identifier, and the 14-bit process identifier (PID). The 1-bit indirect entry (IND) identifier is not considered part of the virtual address. The LPID value is provided by the LPIDR register, and the PID value is provided by the PID register (see Memory Management on page 185). The GS and AS identifiers are provided by the Machine State Register (MSR, see CPU Interrupts and Exceptions on page 293), which contains separate bits for the instruction fetch address space (MSR[IS]) and the data access address space (MSR[DS]). Together, the 64-bit effective address and the other identifiers form an 88-bit virtual address. This 88-bit virtual address is then translated into the 42-bit real address using the TLB.
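The field widths above add up as 1 + 8 + 1 + 14 + 64 = 88 bits. The following Python sketch packs the fields into a single integer to make that arithmetic concrete. The function name is hypothetical, and the packing order (GS in the most significant position, followed by LPID, AS, PID, and the effective address) is an assumption taken from the order in which the text lists the fields; it is not a statement about the hardware tag layout.

```python
GS_BITS, LPID_BITS, AS_BITS, PID_BITS, EA_BITS = 1, 8, 1, 14, 64
VA_BITS = GS_BITS + LPID_BITS + AS_BITS + PID_BITS + EA_BITS  # 88

def make_virtual_address(gs: int, lpid: int, as_bit: int, pid: int, ea: int) -> int:
    """Concatenate GS | LPID | AS | PID | EA into one 88-bit integer.

    Assumed field order: as listed in the text, most significant first.
    """
    fields = (("gs", gs, GS_BITS), ("lpid", lpid, LPID_BITS),
              ("as", as_bit, AS_BITS), ("pid", pid, PID_BITS),
              ("ea", ea, EA_BITS))
    va = 0
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        va = (va << width) | value   # append the field below the bits so far
    return va
```

In this sketch, two accesses with the same effective address but different PID or LPID values produce different virtual addresses, which is what allows entries for several processes and partitions to coexist in the TLB.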
The MMU divides the address space (whether effective, virtual, or real) into pages. Five direct (IND = 0) page sizes (4 KB, 64 KB, 1 MB, 16 MB, 1 GB) are simultaneously supported, such that at any given time the TLB can contain entries for any combination of page sizes. The MMU also supports two indirect (IND = 1) page sizes (1 MB and 256 MB) with associated sub-page sizes (see Section 6.16 Hardware Page Table Walking (Category E.PT)). For an address translation to occur, a valid direct entry for the page containing the virtual address must be in the TLB. An attempt to access an address for which no direct TLB entry exists results in a search for an indirect TLB entry to be used by the hardware page table walker. If neither a direct nor an indirect entry exists, an instruction (for fetches) or data (for load/store accesses) TLB miss exception occurs.
To improve performance, both the instruction cache and the data cache maintain separate shadow TLBs called ERATs. The ERATs contain only direct (IND = 0) type entries. The instruction ERAT (I-ERAT) contains 16 entries, while the data ERAT (D-ERAT) contains 32 entries. These ERAT arrays minimize TLB contention between instruction fetch and data load/store operations. The instruction fetch and data access mechanisms only access the main unified TLB when a miss occurs in the respective ERAT. Hardware manages the replacement and invalidation of both the I-ERAT and D-ERAT; no system software action is required in MMU mode. In ERAT-only mode, an attempt to access an address for which no ERAT entry exists causes an instruction (for fetches) or data (for load/store accesses) TLB miss exception.
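The lookup order described in the two preceding paragraphs can be summarized as a decision function. The Python sketch below is a simplified illustration under stated assumptions: all names are hypothetical, the hit/miss inputs are supplied by the caller rather than looked up, and the outcome of the page table walk itself (which can also fail) is not modeled.

```python
from enum import Enum

class Outcome(Enum):
    ERAT_HIT = "translated by the ERAT"
    ERAT_RELOAD = "ERAT reloaded from a direct TLB entry"
    PAGE_TABLE_WALK = "hardware page table walk using an indirect TLB entry"
    TLB_MISS_EXCEPTION = "instruction or data TLB miss exception"

def translate_outcome(erat_hit: bool, mmu_mode: bool,
                      tlb_direct_hit: bool = False,
                      tlb_indirect_hit: bool = False) -> Outcome:
    """Classify a translation attempt following the order given in the text."""
    if erat_hit:
        return Outcome.ERAT_HIT
    if not mmu_mode:
        # ERAT-only mode: there is no backing TLB to consult.
        return Outcome.TLB_MISS_EXCEPTION
    if tlb_direct_hit:
        return Outcome.ERAT_RELOAD
    if tlb_indirect_hit:
        return Outcome.PAGE_TABLE_WALK
    return Outcome.TLB_MISS_EXCEPTION
```

The TLB arguments are ignored in ERAT-only mode, mirroring the statement that an ERAT miss in that mode goes straight to a TLB miss exception.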
Each TLB entry provides separate user state and supervisor state read, write, and execute permission controls for the memory page associated with the entry. If software attempts to access a page for which it does not have the necessary permission, an instruction (for fetches) or data (for load/store accesses) storage exception occurs.
Each TLB entry also provides a collection of storage attributes for the associated page. These attributes control cache policy (such as cacheability and write-through as opposed to copy-back behavior), byte order (big-endian as opposed to little-endian), and enabling of speculative access for the page. In addition, a set of four user-definable storage attributes is provided. These attributes can be used to control various system-level behaviors.
Section 6 Memory Management describes the A2 core MMU functions in greater detail.

1.4.5 Timers

The A2 core contains a time base and three timers: a decrementer (DEC), a fixed interval timer (FIT), and a watchdog timer. The time base is a 64-bit counter that gets incremented at a frequency either equal to the processor core clock rate or as controlled by a separate asynchronous timer clock input to the core. No interrupt is generated as a result of the time base wrapping back to zero.
The DEC is a 32-bit register that is decremented at the same rate at which the time base is incremented. The user loads the DEC register with a value to create the desired interval. When the register is decremented to zero, a number of actions occur: the DEC stops decrementing, a status bit is set in the Timer Status Register (TSR), and a decrementer exception is reported to the interrupt mechanism of the A2 core. Optionally, the DEC can be programmed to automatically reload the value contained in the Decrementer Auto-Reload Register (DECAR), after which the DEC resumes decrementing. The Timer Control Register (TCR) contains the interrupt enable for the decrementer interrupt.
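The decrementer behavior described above can be modeled in a few lines. This Python sketch is a toy model for illustration only: the class and attribute names are hypothetical, the interrupt enable in the TCR is not modeled, and reloading from DECAR on the same tick that the DEC reaches zero is an assumption rather than a documented timing.

```python
class Decrementer:
    """Toy model of the 32-bit DEC; call tick() once per time base increment."""

    MASK = 0xFFFFFFFF

    def __init__(self, value: int, auto_reload: bool = False, decar: int = 0):
        self.dec = value & self.MASK
        self.auto_reload = auto_reload
        self.decar = decar & self.MASK
        self.status = False   # stands in for the decrementer status bit in the TSR

    def tick(self) -> bool:
        """Advance one tick; return True if the DEC reaches zero on this tick."""
        if self.dec == 0:
            return False              # DEC has stopped decrementing
        self.dec -= 1
        if self.dec != 0:
            return False
        self.status = True            # an exception would be reported here
        if self.auto_reload:
            self.dec = self.decar     # assumed: reload happens on the same tick
        return True
```

With auto-reload disabled, a DEC loaded with 3 reports once on the third tick and then stays at zero. With auto-reload enabled and DECAR set to 2, it reports on every second tick.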
The FIT generates periodic interrupts based on the transition of a selected bit from the time base. Users can select one of four intervals for the FIT period by setting a control field in the TCR to select the appropriate bit from the time base. When the selected time base bit transitions from 0 to 1, a status bit is set in the TSR and a fixed interval timer exception is reported to the interrupt mechanism of the A2 core. The FIT interrupt enable is contained in the TCR.
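The FIT trigger condition is a 0-to-1 transition of one selected time base bit, so the period is fixed by the position of that bit. The Python sketch below illustrates this. The function names are hypothetical, bit positions are counted from the least significant bit (bit 0 = LSB) purely for convenience, and the example bit position is arbitrary because this section does not list the four selectable bits.

```python
def fit_event(prev_tb: int, new_tb: int, bit: int) -> bool:
    """True if time base bit `bit` (0 = least significant) went from 0 to 1."""
    was_set = (prev_tb >> bit) & 1
    now_set = (new_tb >> bit) & 1
    return was_set == 0 and now_set == 1

def fit_period_ticks(bit: int) -> int:
    """Time base ticks between successive 0-to-1 transitions of `bit`."""
    return 1 << (bit + 1)

# Count FIT events while a time base counts from 0 to 64, watching bit 3.
events = sum(fit_event(tb, tb + 1, 3) for tb in range(64))
```

Selecting a more significant bit doubles the period for each bit position, which is why a small set of selectable bits can cover a wide range of intervals.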
Similar to the FIT, the watchdog timer also generates a periodic interrupt based on the transition of a selected bit from the time base. Users can select one of four intervals for the watchdog period, again by setting a control field in the TCR to select the appropriate bit from the time base. Upon the first transition from 0 to 1 of the selected time base bit, a status bit is set in the TSR and a watchdog timer exception is reported to the interrupt mechanism of the A2 core. The watchdog timer can also be configured to initiate a hardware reset if a second transition of the selected time base bit occurs before the first watchdog exception has been serviced. This capability provides an extra measure of recoverability from potential system lock-ups.
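The two-step watchdog behavior summarized above can be sketched as a small state machine. This Python model follows only the overview given here: the class, method names, and return strings are hypothetical, and the detailed register-level behavior is left to the Timer Facilities chapter.

```python
class Watchdog:
    """Toy model of the watchdog behavior summarized in this overview."""

    def __init__(self, reset_enabled: bool):
        self.reset_enabled = reset_enabled
        self.status = False            # stands in for the watchdog status bit in the TSR
        self.reset_requested = False

    def on_selected_bit_rise(self) -> str:
        """Call on each 0-to-1 transition of the selected time base bit."""
        if not self.status:
            self.status = True         # first transition: exception is reported
            return "exception"
        if self.reset_enabled:
            self.reset_requested = True  # second transition before servicing
            return "reset"
        return "pending"               # exception still outstanding, no reset configured

    def service(self) -> None:
        """Software services the watchdog exception, clearing the status."""
        self.status = False
```

As long as software calls `service()` between transitions, the model keeps returning "exception" and never requests a reset. A missed service followed by another transition is what triggers the reset path.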
The timer functions of the A2 core are more fully described in Timer Facilities on page 387.

1.4.6 Debug Facilities

The A2 core debug facilities include debug modes for the various types of debugging used during hardware and software development. Also included are debug events that allow developers to control the debug process. Debug modes and debug events are controlled using debug registers in the chip. The debug registers are accessed either through software running on the processor, or through the serial communications (SCOM) port.
The debug modes, events, controls, and interfaces provide a powerful combination of debug facilities for hardware development tools such as the RISCWatch debugger from IBM.
A brief overview of the debug modes and development tool support is provided below. Debug Facilities on page 399 provides detailed information about each debug mode and other debug resources.
1.4.6.1 Debug Modes
The A2 core supports two debug modes: internal and external. Each mode supports a different type of debug tool used in embedded systems development. Internal debug mode supports software-based ROM monitors, and external debug mode supports a hardware emulator type of debug. The debug modes are controlled by Debug Control Register 0 (DBCR0) and the setting of bits in the Machine State Register (MSR).
Internal debug mode supports accessing architected processor resources, setting hardware and software breakpoints, and monitoring processor status. In internal debug mode, debug events can generate debug exceptions, which can interrupt normal program flow so that monitor software can collect processor status and alter processor resources.
Internal debug mode relies on exception-handling software—running on the processor—along with an external communications path to debug software problems. This mode is used while the processor continues executing instructions and enables debugging of problems in application or operating system code. Access to debugger software executing in the processor while in internal debug mode is through a communications port on the processor board, such as a serial port or Ethernet connection.
External debug mode supports stopping, starting, and single-stepping the processor, accessing architected processor resources, setting hardware and software breakpoints, and monitoring processor status. In external debug mode, debug events can architecturally “freeze” the processor. While the processor is frozen, normal instruction execution stops, and the architected processor resources can be accessed and altered using a debug tool (such as RISCWatch) attached through the SCOM port. This mode is useful for debugging hardware and low-level control software problems.
1.4.6.2 Development Tool Support
The A2 core provides powerful debug support for a wide range of hardware and software development tools.
RISCWatch is an example of a development tool that uses the external debug mode, debug events, and the SCOM port to support hardware and software development and debugging.

1.4.7 Floating-Point Unit Organization

The floating-point unit incorporates a single-issue instruction decode and issue unit and a 6-stage arithmetic pipeline working in parallel with a 4-stage load/store pipeline. The floating-point unit contains a Floating-Point Register (FPR) file that interfaces to both pipelines. There are thirty-two 64-bit FPRs.
Figure 1-2 illustrates the logical organization of the floating-point unit and its relationship to the A2 processor core.
Figure 1-2. A2 Processor Block Diagram
[Figure not reproduced. Labeled blocks: Instruction Decode/Issue Unit; AXU Interface; Data Cache; Arithmetic Pipe; Load/Store Pipe; CR; FPSCR; FPR0 through FPR31; Thread 0 through Thread 3; Floating-Point AXU; A2 Core.]
1.4.7.1 Arithmetic and Load/Store Pipelines
The A2 core has a single execution pipeline. The pipeline handles all computational instructions and reads from and writes to the FPRs, Floating-Point Status and Control Register (FPSCR), and the Condition Register (CR).

1.4.8 IEEE 754 and Architectural Compliance

The A2 core is IEEE 754 and Power ISA compliant and implements single-precision and double-precision instructions.
1.4.8.1 IEEE 754 Compliance
IEEE 754 requires a certain set of operations to be included in any implementation that claims to be compliant. Such operations can be implemented in hardware, software, or a combination of the two. The Power ISA floating-point architecture includes most of the required operations, but some are missing. The missing operations are: floating-point remainder, format conversion between binary and decimal, and format conversion from integer to floating-point. It is necessary to provide a software library to support these missing functions. In other words, the Power ISA Architecture requires software support to be fully compliant with the IEEE standard.

1.4.9 Floating-Point Unit Implementation

Certain aspects of the behavior of the floating-point unit are implementation-specific.
1.4.9.1 Reciprocal Estimates
While the Power ISA Architecture defines single-precision reciprocal estimates and reciprocal square root estimates to have relative errors of $2^{-5}$ and $2^{-8}$ respectively, both are implemented in the A2 core to have a relative error of $2^{-14}$.
Programmers are encouraged to take advantage of this increased accuracy, but must be aware that code that relies on this increased accuracy might not work on any other Power ISA FU.
1.4.9.2 Denormalized B Operands
The floating-point unit supports all denormal numbers in the dataflow with no additional latency except the following cases:
1. B is a double-precision denorm AND NOT (move{fabs/fnabs/fneg} OR fsel OR fcfid OR mv_to_fpscr).
2. B is a single-precision denorm AND NOT (move{fabs/fnabs/fneg} OR fsel)
If any of the above cases are detected, the A2 core flushes to the microcode engine, which in turn issues a prenormalization instruction, followed by the original instruction. The latency for these operations increases by 20 cycles when this occurs.
1.4.9.3 Non-IEEE mode
Non-IEEE mode, controlled by the NI bit in the FPSCR, is intended to eliminate data-dependent overhead cycles caused by exceptional operands or results. The result is faster, deterministic performance with reasonable results. This mode is not supported by the A2 core. The value of the NI bit is ignored.

1.4.10 Floating-Point Unit Interfaces

The floating-point unit interfaces to the A2 processor core.
1.4.10.1 A2 Processor Core Interface
This interface enables the floating-point unit to interact with the A2 processor core. Interactions include resets and updating the CR.
1.4.10.2 Clock and Power Management Interface
The CPM interface supports clock distribution and power management to reduce power consumption below the normal operational level. External logic is necessary for the sleep mode to function.

1.5 Core Interfaces

The core includes the following interfaces:
• System interface
• Auxiliary execution unit (AXU) port
• SCOM, debug, trace, and performance monitor event ports
• Interrupt interface
• Clock and power management interface
Several of these interfaces are described briefly in the sections below.

1.5.1 System Interface

The A2 core interface has one command interface for instruction reads, data reads, and data writes, and uses a 42-bit address bus. A full 64-byte cache line is implied for cacheable data reads and cacheable instruction fetches. The transfer length is used to indicate 1 byte, 2 byte, 4 byte, 8 byte, 16 byte, and 32 byte for noncacheable reads and 16 bytes for noncacheable instruction fetches. There is a 256-bit data interface for data writes with 32 byte enables indicating which bytes should be written.
Data writes can be 1 byte, 2 byte, 4 byte, 8 byte, or 16 byte for noncacheable or cacheable writes. There is a 128-bit data reload interface for instruction reads and data reads. When the reload data is less than 16 bytes (due to the transfer length indicating 1 byte, 2 byte, 4 byte or 8 byte), the data should be aligned within the 16 byte reload bus based on the associated command interface address. There is a back invalidate interface for systems with an entity outside the A2 core (such as an L2 cache controller) that provides hardware cache coherency.
A2 supports a mode that enables a 32-byte write bus to the A2 core/L2 interface. Only the AXU can produce 32-byte writes.
The command interface is a credit-based interface. The A2 core can handle up to eight load-type credits. The actual number of load-type credits (L) that it will handle is initialized in the A2 core configuration ring. In the A2 core, there is a 12-entry load command queue that includes eight entries for data loads and four entries for instruction fetches. An entity outside the A2 core is expected to have a near queue of L entries for load-type operations and to give a pop indication to the A2 core as each is sent to the far queue that contains 8 to 12 entries. The specific command is indicated in the transaction type.
Examples of transaction types that expect data to be returned on the reload bus are instruction fetch, load, and dcbt. Examples of transaction types that do not expect data to be returned on the reload bus are store, dcbz and dcbf. The A2 core can handle up to 32 store-type credits. The actual number of credits (S) that it will handle is initialized in the A2 core configuration ring.
An entity outside the A2 core is expected to be able to queue the S store-type operations and give a pop indication to the A2 core for each as it is processed and the queue entry becomes available. An entity outside the A2 core that also supports store gathering should give a gather indication to the A2 core when a store is gathered with an existing queue entry, to let the A2 core know that an additional queue entry is available.
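The credit accounting described above can be pictured with a small software model. The following is only a sketch under stated assumptions: the type `credit_pool` and the functions `credit_try_send` and `credit_return` are hypothetical names introduced for illustration, and the model ignores all signal-level detail of the real interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one credit pool (load-type or store-type). */
typedef struct {
    uint8_t max_credits; /* L or S, as initialized in the configuration ring */
    uint8_t in_flight;   /* commands sent whose credit has not yet come back */
} credit_pool;

/* Consume one credit if available; returns false when the sender must wait. */
static bool credit_try_send(credit_pool *p) {
    if (p->in_flight >= p->max_credits)
        return false;
    p->in_flight++;
    return true;
}

/* Model a pop indication (or, for stores, a gather indication):
   one queue entry outside the core is available again. */
static void credit_return(credit_pool *p) {
    if (p->in_flight > 0)
        p->in_flight--;
}
```

With `max_credits` set to 2, for example, a third `credit_try_send` call fails until `credit_return` models a pop or gather indication.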

1.5.2 Auxiliary Execution Unit (AXU) Port

This interface provides the A2 core with the flexibility to attach a tightly-coupled coprocessor-type macro incorporating instructions that go beyond those provided within the processor core itself. The AXU port provides sufficient functionality for attachment of various coprocessor functions such as a fully-compliant Power ISA floating-point unit (single- or double-precision), multimedia engine, DSP, or other custom function implementing algorithms appropriate for specific system applications. The AXU interface can be used with macros that contain their own register files. AXU load and store instructions can directly access the A2 core data cache, with operands of up to a double quadword (32 bytes) in length.
The AXU interface provides the capability for a coprocessor to execute instructions that are not part of the Power ISA instruction set at the same time that the A2 core is executing Power ISA instructions. Areas within the architected instruction space allow for these customer-specific or application-specific AXU instruction set extensions. Further description is beyond the scope of this document.

1.5.3 JTAG Port

The A2 core SCOM port supports the indirect attachment of a debug tool such as the RISCWatch product from IBM. A logic block outside the A2 core must provide JTAG to SCOM port translation. Through the SCOM port, and using the debug facilities designed into the A2 core, a debug workstation can single-step the processor and interrogate the internal processor state to facilitate hardware and software debugging.

2. CPU Programming Model

The programming model of the A2 core describes how the following features and operations of the core appear to programmers:
• Logical Partitioning on page 61
• Storage Addressing on page 62
• Multithreading on page 70
• Registers on page 82
• 32-Bit Mode on page 85
• Instruction Categories on page 86
• Instruction Classes on page 87
• Implemented Instruction Set Summary on page 88
• Wait Instruction on page 98
• Branch Processing on page 99
• Integer Processing on page 110
• Processor Control on page 113
• Privileged Modes on page 120
• Speculative Accesses on page 122
• Synchronization on page 122
• Software Transactional Memory Acceleration on page 125

2.1 Logical Partitioning

2.1.1 Overview

Logical partitioning defines instructions, resources, and methods for establishing an additional attribute of processor privilege called a guest state.
The Embedded.Hypervisor category permits processors and portions of real storage to be assigned to logical collections called partitions such that a program executing on a processor in one partition cannot interfere with any program executing on a processor in a different partition. This isolation can be provided for both problem state and privileged state programs by using a layer of trusted software called a hypervisor program (or simply a “hypervisor”) and the resources provided by this category to manage system resources. The collection of software that runs in a given partition and its associated resources is called a guest. The guest normally includes an operating system (or other system software) running in privileged state and its associated processes running in the problem state under the management of the hypervisor. The processor is in the guest state when a guest is executing, and it is in the hypervisor state when the hypervisor is executing. The processor is executing in the guest state when MSR[GS] = 1.
A2 implements 2^8 (256) partitions. See Section 6.17.2 Logical Partition ID Register (LPIDR) on page 245. All threads of a single A2 core must be assigned to the same logical partition.
A processor is assigned to one partition at any given time. A processor can be assigned to any given partition without consideration of the physical configuration of the system (for example, shared registers, caches, organization of the storage hierarchy), except that processors that share certain hypervisor resources might need to be assigned to the same partition. Additionally, certain resources can be used by the guest at the discretion of the hypervisor. Such usage might cause interference between partitions, and the hypervisor should allocate those resources accordingly. The primary registers and facilities used to control logical partitioning are described in the following subsections. Other facilities associated with logical partitioning are described within the appropriate sections within this book.
Category Embedded.Hypervisor changes the operating system programming model to allow for easier virtualization, while retaining a default backward-compatible mode where an operating system written for processors not containing this category will still operate as before without using the logical partitioning facilities.

2.2 Storage Addressing

As a 64-bit implementation of the Power ISA Architecture, the A2 core implements a uniform 64-bit effective address (EA) space. Effective addresses are expanded into virtual addresses and then translated to 42-bit (4 TB) real addresses by the memory management unit (see Memory Management on page 185 for more information about the translation process). The organization of the real address space into a physical address space is system-dependent, and is described in the user’s manuals for chip-level products that incorporate an A2 core.
The A2 core generates an effective address whenever it executes a storage access, branch, cache management, or translation lookaside buffer (TLB) management instruction, or when it fetches the next sequential instruction.

2.2.1 Storage Operands

Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.
Data storage operands accessed by the integer load/store instructions can be bytes, halfwords, words, doublewords, or (for load/store multiple and string instructions) a sequence of words or bytes, respectively. Data storage operands accessed by auxiliary execution unit (AXU) load/store instructions can be bytes, halfwords, words, doublewords, quadwords, or double quadwords. The address of a storage operand is the address of its first byte (that is, of its lowest-numbered byte). Byte ordering can be either big endian or little endian, as controlled by the endian storage attribute (see Byte Ordering on page 66; also see Endian (E) on page 197 for more information about the endian storage attribute).
Operand length is implicit for each scalar storage access instruction type (that is, each storage access instruction type other than the load/store multiple and string instructions). The operand of such a scalar storage access instruction has a “natural” alignment boundary equal to the operand length. In other words, the natural address of an operand is an integral multiple of the operand length. A storage operand is said to be aligned if it is aligned at its natural boundary; otherwise, it is said to be unaligned.
Data storage operands for storage access instructions have the characteristics shown in Table 2-1 on page 63.
Table 2-1. Data Operand Definitions
Storage Access Instruction Type | Operand Length | Addr[59:63] if Aligned
Byte (or String) | 8 bits | 0bxxxxx
Halfword | 2 bytes | 0bxxxx0
Word (or Multiple) | 4 bytes | 0bxxx00
Doubleword | 8 bytes | 0bxx000
Quadword (AXU only) | 16 bytes | 0bx0000
Double Quadword (AXU only) | 32 bytes | 0b00000
Note: An “x” in an address bit position indicates that the bit can be 0 or 1 independent of the state of other bits in the address.
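The natural-alignment rule in Table 2-1 reduces to a mask test on the low-order address bits. The following is a minimal sketch; the function name `is_naturally_aligned` is hypothetical and the operand length is assumed to be one of the power-of-two sizes listed in the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if ea is a multiple of len.
   len is assumed to be a power of two: 1, 2, 4, 8, 16, or 32 bytes. */
static bool is_naturally_aligned(uint64_t ea, unsigned len) {
    return (ea & (uint64_t)(len - 1)) == 0;
}
```

For a double quadword (32 bytes), the test checks that Addr[59:63] = 0b00000, which corresponds to the last row of the table.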
The alignment of the operand effective address of some storage access instructions might affect performance; in some cases, it might cause an alignment exception to occur. For such storage access instructions, the best performance is obtained when the storage operands are aligned. Table 2-2 summarizes the effects of alignment on those storage access instruction types for which such effects exist. If an instruction type is not shown in the table, there are no alignment effects for that instruction type.
Table 2-2. Alignment Effects for Storage Access Instructions
Storage Access Instruction Type | Alignment Effects
Integer cacheable load halfword | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] = 0b11111); otherwise no effect. (See notes.)
Integer cacheable store or caching inhibited load/store halfword | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] = 0b1111); otherwise no effect. (See notes.)
Integer cacheable load word | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11100); otherwise no effect. (See notes.)
Integer cacheable store or caching inhibited load/store word | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1100); otherwise no effect. (See notes.)
Integer cacheable load doubleword | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11000); otherwise no effect. (See notes.)
Integer cacheable store or caching inhibited load/store doubleword | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1000); otherwise no effect. (See notes.)
Integer load/store multiple | Broken into a series of word (4-byte) accesses until the last word is accessed. The load/store multiple address must be word aligned. (See notes.)
Integer load/store string | Broken into a series of byte accesses until the last byte is accessed. (See notes.)
AXU cacheable load halfword | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] = 0b11111); otherwise no effect. (See notes.)
AXU cacheable store or caching inhibited load/store halfword | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] = 0b1111); otherwise no effect. (See notes.)
AXU cacheable load word | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11100); otherwise no effect. (See notes.)
AXU cacheable store or caching inhibited load/store word | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1100); otherwise no effect. (See notes.)
AXU cacheable load doubleword | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11000); otherwise no effect. (See notes.)
AXU cacheable store or caching inhibited load/store doubleword | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1000); otherwise no effect. (See notes.)
AXU cacheable load quadword | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b10000); otherwise no effect. (See notes.)
AXU cacheable store or caching inhibited load/store quadword | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b0000); otherwise no effect. (See notes.)
AXU cacheable load double quadword | Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b00000); otherwise no effect. (See notes.)
AXU cacheable store or caching inhibited load/store double quadword | Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b0000); otherwise no effect. (See notes.)
Notes:
• Any unaligned access that also crosses a 4 K page boundary causes an alignment exception.
• An auxiliary processor can specify that the EA for a given AXU load/store instruction must be aligned at the operand-size boundary or, alternatively, at a word boundary. If the AXU so indicates this requirement and the calculated EA fails to meet it, the A2 core generates an alignment exception. Alternatively, an auxiliary processor can specify that the EA for a given AXU load/store instruction should be “forced” to be aligned by ignoring the appropriate number of low-order EA bits and processing the AXU load/store as if those bits were 0. Byte, halfword, word, doubleword, and quadword AXU load/store instructions ignore 0, 1, 2, 3, and 4 low-order EA bits, respectively.
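The thresholds in Table 2-2 all follow one pattern: an access is broken up when its offset within the 16-byte or 32-byte window leaves too little room for the operand. The sketch below is one reading of those thresholds, not a description of the hardware logic; `crosses_boundary` is a hypothetical name, and the clamp on `span` is an assumption chosen so that the double quadword row for the 16-byte boundary (EA[60:63] > 0b0000) is reproduced.

```c
#include <stdbool.h>
#include <stdint.h>

/* boundary is 16 or 32; len is the operand length in bytes (2..32). */
static bool crosses_boundary(uint64_t ea, unsigned len, unsigned boundary) {
    unsigned offset = (unsigned)(ea & (uint64_t)(boundary - 1));
    /* Clamp so that an operand wider than the window is flagged
       whenever its offset within the window is nonzero. */
    unsigned span = len < boundary ? len : boundary;
    return offset > boundary - span;
}
```

For example, a word load at an address with EA[59:63] = 0b11100 fits within the 32-byte window, while one at 0b11101 does not.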
Cache management instructions access cache block operands; for the A2 core, the cache block size is 64 bytes. However, the effective addresses calculated by cache management instructions are not required to be aligned on cache block boundaries. Instead, the architecture specifies that the associated low-order effective address bits (bits 58:63 for the A2 core) are ignored during the execution of these instructions.
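Ignoring bits 58:63 of the effective address is equivalent to rounding the address down to a 64-byte boundary. A minimal sketch, with the hypothetical names `A2_CACHE_BLOCK_BYTES` and `cache_block_base` introduced only for illustration:

```c
#include <stdint.h>

#define A2_CACHE_BLOCK_BYTES 64u /* cache block size stated in the text */

/* Clear EA bits 58:63 (the six low-order bits) to obtain the address
   of the cache block that a cache management instruction operates on. */
static uint64_t cache_block_base(uint64_t ea) {
    return ea & ~(uint64_t)(A2_CACHE_BLOCK_BYTES - 1);
}
```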
Similarly, the TLB management instructions access page operands, and, as determined by the page size, the associated low-order effective address bits are ignored during the execution of these instructions.
Instruction storage operands, on the other hand, are always 4 bytes long, and the effective addresses calculated by branch instructions are therefore always word-aligned.

2.2.2 Effective Address Calculation

For a storage access instruction, if the sum of the effective address and the operand length exceeds the maximum effective address of 2^64 - 1 in 64-bit mode or 2^32 - 1 in 32-bit mode (that is, the storage operand itself crosses the maximum address boundary), the result of the operation is undefined, as specified by the architecture. The A2 core performs the operation as if the storage operand wrapped around from the maximum effective address to effective address 0. Software, however, should not depend upon this behavior, so that it can be ported to other implementations that do not handle this scenario in the same fashion. Accordingly, software should ensure that no data storage operands cross the maximum address boundary.
Note: Because instructions are words and because the effective addresses of instructions are always implicitly on word boundaries, it is not possible for an instruction storage operand to cross any word boundary, including the maximum address boundary.
Effective address arithmetic, which calculates the starting address for storage operands, wraps around from the maximum address to address 0 for all effective address computations except next sequential instruction fetching. See Instruction Storage Addressing Modes on page 65 for more information about next sequential instruction fetching at the maximum address boundary.
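Software that wants to avoid the undefined case can test whether an operand would run past the maximum effective address. The following is a sketch only: `operand_crosses_max` is a hypothetical helper, it follows the parenthetical reading above (the last byte of the operand lies beyond the maximum address), and it assumes that `ea` does not already exceed the maximum address for the selected mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an operand of len bytes starting at ea would extend
   past the maximum effective address. Assumes ea <= max for the mode. */
static bool operand_crosses_max(uint64_t ea, unsigned len, bool mode32) {
    uint64_t max_ea = mode32 ? 0xFFFFFFFFull : UINT64_MAX;
    /* Compare without overflow: last byte = ea + (len - 1). */
    return len > 0 && (uint64_t)(len - 1) > max_ea - ea;
}
```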
2.2.2.1 Data Storage Addressing Modes
There are two data storage addressing modes supported by the A2 core:
• Base + displacement (D-mode) addressing mode:
The 16-bit D field is sign-extended and added to the contents of the GPR designated by RA or to zero if RA = 0.
• Base + index (X-mode) addressing mode:
The contents of the GPR designated by RB (or the value 0 for lswi and stswi) are added to the contents of the GPR designated by RA or to 0 if RA = 0.
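The two data addressing modes can be written out as ordinary 64-bit arithmetic. This is an illustrative sketch, not the core's implementation: `ea_d_mode` and `ea_x_mode` are hypothetical names, the register file is modeled as a plain array, and the lswi/stswi case (where 0 is used in place of the RB contents) is left out.

```c
#include <stdint.h>

/* D-mode: EA = (RA == 0 ? 0 : GPR[RA]) + sign-extended 16-bit D field. */
static uint64_t ea_d_mode(const uint64_t gpr[32], unsigned ra, int16_t d) {
    uint64_t base = (ra == 0) ? 0 : gpr[ra];
    return base + (uint64_t)(int64_t)d; /* unsigned add wraps modulo 2^64 */
}

/* X-mode: EA = (RA == 0 ? 0 : GPR[RA]) + GPR[RB]. */
static uint64_t ea_x_mode(const uint64_t gpr[32], unsigned ra, unsigned rb) {
    uint64_t base = (ra == 0) ? 0 : gpr[ra];
    return base + gpr[rb];
}
```

Note that RA = 0 selects the value zero rather than the contents of GPR 0, so the contents of GPR 0 never contribute to the base.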
2.2.2.2 Instruction Storage Addressing Modes
There are four instruction storage addressing modes supported by the A2 core:
• I-form branch instructions (unconditional):
The 24-bit LI field is concatenated on the right with 0b00, sign-extended, and then added to either the address of the branch instruction if AA = 0 or to 0 if AA = 1.
• Taken B-form branch instructions:
The 14-bit BD field is concatenated on the right with 0b00, sign-extended, and then added to either the address of the branch instruction if AA = 0 or to 0 if AA = 1.
• Taken XL-form branch instructions:
The contents of bits 0:61 of the Link Register (LR) or the Count Register (CTR) are concatenated on the right with 0b00 to form the 64-bit effective address of the next instruction.
Note: In 32-bit mode, the A2 core forces bits 0:31 of the calculated 64-bit effective address to zeros.
• Next sequential instruction fetching (including nontaken branch instructions):
The value 4 is added to the address of the current instruction to form the 64-bit effective address of the next instruction. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC in 64-bit mode or 0x0000_0000_FFFF_FFFC in 32-bit mode, the A2 core wraps the next sequential instruction address back to address 0. This behavior is not required by the architecture, which specifies that the next sequential instruction address is undefined under these circumstances. Therefore, software should not depend upon this behavior, so that it can be ported to other implementations that do not handle this scenario in the same fashion. Accordingly, if software wants to execute across this maximum address boundary and wrap back to address 0, it should place an unconditional branch at the boundary with a displacement of 4.
In addition to the above four instruction storage addressing modes, the following behavior applies to branch instructions:
• Any branch instruction with LK = 1:
The value 4 is added to the address of the current instruction and the low-order 64 bits of the result are placed into the LR. As for the similar scenario for next sequential instruction fetching, if the address of the branch instruction is 0xFFFF_FFFF_FFFF_FFFC in 64-bit mode or 0x0000_0000_FFFF_FFFC in 32-bit mode, the result placed into the LR is architecturally undefined, although once again the A2 core wraps the LR update value back to address 0. Again, however, software should not depend on this behavior so that it can be ported to implementations that do not handle this scenario in the same fashion.
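The branch target calculations above can be sketched in C as follows. All function names are hypothetical. The sketch applies the 32-bit-mode masking to every form, which is an assumption about the scope of the note above, and `next_sequential` models the A2-specific wrap, which the architecture leaves undefined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to 64 bits. */
static uint64_t sext(uint64_t v, unsigned bits) {
    uint64_t sign = 1ull << (bits - 1);
    v &= (sign << 1) - 1;
    return (v ^ sign) - sign;
}

static uint64_t apply_mode(uint64_t ea, bool mode32) {
    return mode32 ? (ea & 0xFFFFFFFFull) : ea; /* force bits 0:31 to zero */
}

/* I-form: LI (24 bits) || 0b00, sign-extended, relative unless AA = 1. */
static uint64_t target_i_form(uint64_t cia, uint32_t li, bool aa, bool mode32) {
    uint64_t disp = sext((uint64_t)(li & 0xFFFFFFu) << 2, 26);
    return apply_mode((aa ? 0 : cia) + disp, mode32);
}

/* B-form (taken): BD (14 bits) || 0b00, sign-extended. */
static uint64_t target_b_form(uint64_t cia, uint32_t bd, bool aa, bool mode32) {
    uint64_t disp = sext((uint64_t)(bd & 0x3FFFu) << 2, 16);
    return apply_mode((aa ? 0 : cia) + disp, mode32);
}

/* XL-form (taken): bits 0:61 of LR or CTR || 0b00. */
static uint64_t target_xl_form(uint64_t lr_or_ctr, bool mode32) {
    return apply_mode(lr_or_ctr & ~3ull, mode32);
}

/* Next sequential address as the A2 core computes it (wraps to 0). */
static uint64_t next_sequential(uint64_t cia, bool mode32) {
    return apply_mode(cia + 4, mode32);
}
```

The recommended portable way to cross the maximum address boundary, an unconditional branch with a displacement of 4, corresponds to `target_i_form(0xFFFFFFFFFFFFFFFC, 1, false, false)`, which yields address 0 by ordinary effective address wraparound.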

2.2.3 Byte Ordering

If scalars (individual data items and instructions) were indivisible, there would be no such concept as “byte ordering.” It is meaningless to consider the order of bits or groups of bits within the smallest addressable unit of storage, because nothing can be observed about such order. Only when scalars, which the programmer and processor regard as indivisible quantities, can comprise more than one addressable unit of storage does the question of order arise.
For a machine in which the smallest addressable unit of storage is the 64-bit doubleword, there is no question of the ordering of bytes within doublewords. All transfers of individual scalars between registers and storage are of doublewords, and the address of the byte containing the high-order 8 bits of a scalar is no different from the address of a byte containing any other part of the scalar.
For the Power ISA Architecture, as for most current computer architectures, the smallest addressable unit of storage is the 8-bit byte. Many scalars are halfwords, words, or doublewords that consist of groups of bytes. When a word-length scalar is moved from a register to storage, the scalar occupies 4 consecutive byte addresses. It thus becomes meaningful to discuss the order of the byte addresses with respect to the value of the scalar: which byte contains the highest-order 8 bits of the scalar, which byte contains the next-highest-order 8 bits, and so on.
Given a scalar that contains multiple bytes, the choice of byte ordering is essentially arbitrary. There are 24 ways to specify the ordering of 4 bytes within a word, but only two of these orderings are sensible:
• The ordering that assigns the lowest address to the highest-order (left-most) 8 bits of the scalar, the next sequential address to the next-highest-order 8 bits, and so on.
This ordering is called big endian because the “big end” (most-significant end) of the scalar, considered as a binary number, comes first in storage. IBM RISC System/6000, IBM System/390®, and Motorola 680x0 are examples of computer architectures using this byte ordering.
• The ordering that assigns the lowest address to the lowest-order (“right-most”) 8 bits of the scalar, the next sequential address to the next-lowest-order 8 bits, and so on.
This ordering is called little endian because the “little end” (least-significant end) of the scalar, considered as a binary number, comes first in storage. The Intel x86 is an example of a processor architecture using this byte ordering.
Power ISA supports both big-endian and little-endian byte ordering, for both instruction and data storage accesses. Which byte ordering is used is controlled on a memory page basis by the endian (E) storage attribute, which is a field within the TLB entry for the page. The endian storage attribute is set to 0 for a big-endian page and is set to 1 for a little-endian page. See Memory Management on page 185 for more information about memory pages, the TLB, and storage attributes, including the endian storage attribute.
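The two byte orderings can be made concrete by storing one word into a byte array under each convention. This is a minimal sketch; `store32` is a hypothetical helper and the `little_endian` flag stands in for the E storage attribute of the page (0 = big endian, 1 = little endian).

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a 32-bit scalar into 4 consecutive bytes starting at p. */
static void store32(uint8_t *p, uint32_t v, bool little_endian) {
    for (unsigned i = 0; i < 4; i++) {
        /* Big endian: highest-order byte at the lowest address.
           Little endian: lowest-order byte at the lowest address. */
        unsigned shift = little_endian ? 8 * i : 8 * (3 - i);
        p[i] = (uint8_t)(v >> shift);
    }
}
```

Storing 0x11121314 produces the byte sequence 11 12 13 14 in big-endian order and 14 13 12 11 in little-endian order.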
2.2.3.1 Structure Mapping Examples
The following C language structure, s, contains an assortment of scalars and a character string. The comments show the value assumed to be in each structure element; these values show how the bytes comprising each structure element are mapped into storage.
struct {
    int a;        /* 0x1112_1314 word */
    long long b;  /* 0x2122_2324_2526_2728 doubleword */
    int c;        /* 0x3132_3334 word */
    char d[7];    /* 'A','B','C','D','E','F','G' array of bytes */
    short e;      /* 0x5152 halfword */
    int f;        /* 0x6162_6364 word */
} s;
C structure mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries. The following structure mapping examples show each scalar aligned at its natural boundary. This alignment introduces padding of 4 bytes between a and b, one byte between d and e, and two bytes between e and f. The same amount of padding is present in both big-endian and little-endian mappings.
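The padding amounts quoted above follow mechanically from placing each member at its natural boundary. The sketch below computes the member offsets from sizes and alignments rather than relying on a particular compiler, since actual C structure layout is ABI-dependent; `align_up` and `layout_members` are hypothetical helpers.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Place n members in order, each at its natural boundary.
   Writes each member's offset and returns the offset just past
   the last member (tail padding is not modeled). */
static size_t layout_members(const size_t size[], const size_t align[],
                             size_t n, size_t offsets[]) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        off = align_up(off, align[i]);
        offsets[i] = off;
        off += size[i];
    }
    return off;
}
```

For the members a, b, c, d, e, and f, with sizes 4, 8, 4, 7, 2, 4 and alignments 4, 8, 4, 1, 2, 4, this yields offsets 0x00, 0x08, 0x10, 0x14, 0x1C, and 0x20.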
Big-Endian Mapping
The big-endian mapping of structure s follows. Each address, in hexadecimal, is shown below the data stored at that address. The contents of each byte, as defined in structure s, are shown as a hexadecimal number or, for the string elements, as a character. Cells marked -- are padding bytes.
11   12   13   14   --   --   --   --
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
21   22   23   24   25   26   27   28
0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F
31   32   33   34   'A'  'B'  'C'  'D'
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
'E'  'F'  'G'  --   51   52   --   --
0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F
61   62   63   64
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27
Little-Endian Mapping
Structure s is shown mapped little endian. Cells marked -- are padding bytes.
14   13   12   11   --   --   --   --
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
28   27   26   25   24   23   22   21
0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F
34   33   32   31   'A'  'B'  'C'  'D'
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
'E'  'F'  'G'  --   52   51   --   --
0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F
64   63   62   61
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27
2.2.3.2 Instruction Byte Ordering
Power ISA defines instructions as aligned words (4 bytes) in memory. As such, instructions in a big-endian program image are arranged with the most-significant byte (MSB) of the instruction word at the lowest-numbered address.
Consider the big-endian mapping of instruction p at address 0x00, where, for example, p = add r7, r7, r4:
MSB LSB
0x00 0x01 0x02 0x03
On the other hand, in a little-endian mapping the same instruction is arranged with the least-significant byte (LSB) of the instruction word at the lowest-numbered address:
LSB MSB
0x00 0x01 0x02 0x03
By the definition of Power ISA bit numbering, the most-significant byte of an instruction is the byte containing bits 0:7 of the instruction. As depicted in the instruction format diagrams (see Instruction Formats in the Power ISA specification), this most-significant byte is the one that contains the primary opcode field (bits 0:5). Due to this difference in byte orderings, the processor must perform whatever byte reversal is required (depending on the particular byte ordering in use) to correctly deliver the opcode field to the instruction decoder. In the A2 core, this reversal is performed between the memory interface and the instruction cache, according to the value of the endian storage attribute for each memory page, such that the bytes in the instruction cache are always correctly arranged for delivery directly to the instruction decoder.
If the endian storage attribute for a memory page is reprogrammed from one byte ordering to the other, the contents of the memory page must be reloaded with program and data structures that are in the appropriate byte ordering. Furthermore, anytime the contents of instruction memory change, the instruction cache must be made coherent with the updates by invalidating the instruction cache and refetching the updated memory contents with the new byte ordering.
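The effect of the fetch-time byte reversal can be illustrated with the example instruction add r7, r7, r4. The sketch assumes the standard Power ISA XO-form encoding of add (primary opcode 31, extended opcode 266), which gives the instruction word 0x7CE72214; the function names `fetch_insn` and `primary_opcode` are hypothetical, and the code models only the byte arrangement, not the cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assemble the instruction word that the decoder sees from the 4-byte
   memory image, according to the page's endian storage attribute. */
static uint32_t fetch_insn(const uint8_t mem[4], bool little_endian_page) {
    if (little_endian_page)
        return (uint32_t)mem[0] | (uint32_t)mem[1] << 8 |
               (uint32_t)mem[2] << 16 | (uint32_t)mem[3] << 24;
    return (uint32_t)mem[0] << 24 | (uint32_t)mem[1] << 16 |
           (uint32_t)mem[2] << 8 | (uint32_t)mem[3];
}

/* Primary opcode: instruction bits 0:5, the six highest-order bits. */
static unsigned primary_opcode(uint32_t insn) {
    return insn >> 26;
}
```

The big-endian image 7C E7 22 14 and the little-endian image 14 22 E7 7C both decode to the same instruction word, so the opcode field is delivered correctly in either case.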
2.2.3.3 Data Byte Ordering
Unlike instruction fetches, data accesses cannot be byte-reversed between memory and the data cache. Data byte ordering in memory depends upon the data type (byte, halfword, word, and so on) of a specific data item. It is only when moving a data item of a specific type from or to an architected register (as directed by the execution of a particular storage access instruction) that it becomes known what kind of byte reversal might be required due to the byte ordering of the memory page containing the data item. Therefore, byte reversal during load or store accesses is performed between the data cache (or memory, on a data cache miss, for example) and the load register target or store register source, depending on the specific type of load or store instruction (that is, byte, halfword, word, and so on).
Comparing the big-endian and little-endian mappings of structure s, as shown in Structure Mapping Examples on page 66, the difference between the byte locations of any data item in the structure depends upon the size of the particular data item. For example (again referring to the big-endian and little-endian mappings of structure s):
• The word a has its 4 bytes reversed within the word spanning addresses 0x00 through 0x03.
• The halfword e has its 2 bytes reversed within the halfword spanning addresses 0x1C through 0x1D.
Note: The array of bytes d, where each data item is a byte, is not reversed when the big-endian and little-endian mappings are compared. For example, the character 'A' is located at address 0x14 in both the big-endian and little-endian mappings.
The size of the data item being loaded or stored must be known before the processor can decide whether, and if so, how, to reorder the bytes when moving them between a register and the data cache (or memory).
• For byte loads and stores, including strings, no reordering of bytes occurs regardless of byte ordering.
• For halfword loads and stores, bytes are reversed within the halfword for one byte order with respect to the other.
• For word loads and stores (including load/store multiple), bytes are reversed within the word for one byte order with respect to the other.
• For doubleword loads and stores, bytes are reversed within the doubleword for one byte order with respect to the other.
• For quadword loads and stores (AXU loads/stores only), bytes are reversed within the quadword for one byte order with respect to the other.
Note: This mechanism applies independent of the alignment of data. In other words, when loading a multibyte data operand with a scalar load instruction, bytes are accessed from the data cache (or memory) starting with the byte at the calculated effective address and continuing with consecutively higher-numbered bytes until the required number of bytes have been retrieved. Then, the bytes are arranged such that either the byte from the highest-numbered address (for big-endian storage regions) or the lowest-numbered address (for little-endian storage regions) is placed into the least-significant byte of the register. The rest of the register is filled in corresponding order with the rest of the accessed bytes. An analogous procedure is followed for scalar store instructions.
For load/store multiple instructions, each group of 4 bytes is transferred between memory and the register according to the procedure for a scalar load word instruction.
For load/store string instructions, the most-significant byte of the first register is transferred to or from memory at the starting (lowest-numbered) effective address, regardless of byte ordering. Subsequent register bytes (from most-significant to least-significant, and then moving into the next register, starting with the most-significant byte, and so on) are transferred to or from memory at sequentially higher-numbered addresses. This behavior for byte strings ensures that if two strings are loaded into registers and then compared, the first bytes of the strings are treated as most significant with respect to the comparison.
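The scalar load procedure described in the note above can be sketched as a loop over consecutive bytes. This is an illustrative model only: `load_scalar` is a hypothetical function, memory is modeled as a flat byte array, the operand length is limited to 8 bytes so that the result fits in a 64-bit register, and string and multiple instructions are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Load a len-byte scalar (len <= 8) starting at ea, independent of
   alignment, and arrange the bytes for the given byte ordering. */
static uint64_t load_scalar(const uint8_t *mem, uint64_t ea, unsigned len,
                            bool little_endian) {
    uint64_t reg = 0;
    for (unsigned i = 0; i < len; i++) {
        uint64_t byte = mem[ea + i];
        if (little_endian)
            reg |= byte << (8 * i);   /* lowest address -> least-significant */
        else
            reg = (reg << 8) | byte;  /* highest address -> least-significant */
    }
    return reg;
}
```

Applied to the two mappings of structure s, a word load of a returns 0x11121314 from either page type, provided the ordering flag matches the page.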
2.2.3.4 Byte-Reverse Instructions
The Power ISA defines load/store byte-reverse instructions, which can access storage that is specified as being of one byte ordering in the same manner that a regular (that is, non-byte-reverse) load/store instruction would access storage that is specified as being of the opposite byte ordering. In other words, a load/store byte-reverse instruction to a big-endian memory page transfers data between the data cache (or memory) and the register in the same manner that a normal load/store would transfer the data to or from a little-endian memory page. Similarly, a load/store byte-reverse instruction to a little-endian memory page transfers data between the data cache (or memory) and the register in the same manner that a normal load/store would transfer the data to or from a big-endian memory page.
The function of the load/store byte-reverse instructions is useful when a particular memory page contains a combination of data with both big-endian and little-endian byte ordering. In such an environment, the endian storage attribute for the memory page would be set according to the predominant byte ordering for the page, and the normal load/store instructions would be used to access data operands that used this predominant byte ordering. Conversely, the load/store byte-reverse instructions would be used to access the data operands that were of the other (less prevalent) byte ordering.
Software compilers cannot typically make general use of the load/store byte-reverse instructions, so they are ordinarily used only in special, hand-coded device drivers.
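The equivalence described above can be illustrated away from A2 hardware. The following is a minimal Python sketch, assuming memory is modeled as a bytes object and a 4-byte access; the function names load_word and load_word_byte_reversed are hypothetical and are not A2 or Power ISA mnemonics.

```python
def load_word(memory: bytes, addr: int, page_little_endian: bool) -> int:
    """Model a normal 4-byte load from a page with the given endian attribute."""
    order = "little" if page_little_endian else "big"
    return int.from_bytes(memory[addr:addr + 4], order)


def load_word_byte_reversed(memory: bytes, addr: int, page_little_endian: bool) -> int:
    """Model a byte-reverse load: it behaves like a normal load from a page
    that has the opposite endian attribute."""
    return load_word(memory, addr, not page_little_endian)


memory = bytes([0x11, 0x22, 0x33, 0x44])
big_normal = load_word(memory, 0, page_little_endian=False)
big_reversed = load_word_byte_reversed(memory, 0, page_little_endian=False)
```

In this model, a byte-reverse load from a big-endian page returns the same value that a normal load would return from a little-endian page holding the same bytes.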

2.3 Multithreading

The A2 core has four threads that execute simultaneously within the processor, so the core can be viewed as a 4-way multiprocessor with shared dataflow. To software, this gives the appearance of four independent processing units. The performance of each thread can be limited because the threads share resources.

2.3.1 Thread Identification

2.3.1.1 Thread Identification Register (TIR)
The TIR is a read-only register that can be used to distinguish a thread from other threads on the A2 core. The TIR returns a value n, where n is referred to as “thread n.”
Register Short Name:       TIR                  Read Access:                 Hypv
Decimal SPR Number:        446                  Write Access:                None
Initial Value:             0x0000000000000000   Duplicated for Multithread:  N
Slow SPR:                  N                    Notes:
Guest Supervisor Mapping:                       Scan Ring:                   func

Bits   Field Name  Initial Value  Description
0:31   ///         0x0            Reserved
32:61  ///         0x0            Reserved
62:63  TID         0b00           Processor Thread ID
                                  This field can be used to distinguish the thread from other threads on the processor. Threads are numbered sequentially, with valid values ranging from 0 to 3.
2.3.1.2 Processor Identification Register (PIR)
The PIR is a read-only register that uniquely identifies a specific instance of a processor thread, within a multiprocessor configuration, enabling software to determine exactly which thread it is running on. This capability is important for operating system software within multiprocessor configurations.
Register Short Name:       PIR                  Read Access:                 Priv
Decimal SPR Number:        286                  Write Access:                None
Initial Value:             0x0000000000000000   Duplicated for Multithread:  N
Slow SPR:                  N                    Notes:                       IO
Guest Supervisor Mapping:  GPIR                 Scan Ring:                   func

Bits   Field Name  Initial Value  Description
32:53  ///         0x0            Reserved
54:61  CID         0x0            Processor Core ID
                                  Returns the value of the I/O pin an_ac_coreid. This can be used to distinguish a processor core from other processor cores in the system.
62:63  TID         0b00           Processor Thread ID
                                  This field can be used to distinguish the thread from other threads on the processor. Threads are numbered sequentially, with valid values ranging from 0 to 3.
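As an illustration of the bit numbering used in these register tables (bit 0 is the most-significant bit of the 64-bit register and bit 63 the least-significant), the following Python sketch extracts the CID and TID fields from a PIR value. The helper names and the example value are hypothetical; on hardware the value would come from an mfspr of the PIR.

```python
def field(value: int, first_bit: int, last_bit: int) -> int:
    """Extract bits first_bit:last_bit using the manual's numbering
    (bit 0 = most significant, bit 63 = least significant)."""
    width = last_bit - first_bit + 1
    return (value >> (63 - last_bit)) & ((1 << width) - 1)


def decode_pir(pir: int) -> dict:
    """Split a PIR value into its core ID (bits 54:61) and thread ID (bits 62:63)."""
    return {"core_id": field(pir, 54, 61), "thread_id": field(pir, 62, 63)}


# Hypothetical PIR value: core 5, thread 2.
example = decode_pir(0x16)
```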
2.3.1.3 Guest Processor Identification Register (GPIR)
The GPIR is a register that identifies a specific instance of a processor thread for the guest operating system. The GPIR is used to filter incoming processor messages. See Processor Messages on page 357.
Register Short Name:       GPIR                 Read Access:                 Priv
Decimal SPR Number:        382                  Write Access:                Hypv
Initial Value:             0x0000000000000000   Duplicated for Multithread:  Y
Slow SPR:                  N                    Notes:                       HM
Guest Supervisor Mapping:  Y                    Scan Ring:                   func

Bits   Field Name  Initial Value  Description
32:49  VPTAG       0x0            Virtual Processor Tag
                                  Storage used by the guest operating system to identify the virtual processor on which the operating system is running.
50:63  DBTAG       0x0            Doorbell Tag
                                  Used to match guest doorbell messages that are sent to all the processors and virtual processors in a coherence domain. If a sent guest doorbell message tag matches the DBTAG field, a guest doorbell is said to be accepted on the (virtual) processor.

2.3.2 Thread Run State

The A2 core provides several methods for controlling a thread’s run state. For a thread to fetch instructions, all methods outlined below must be properly configured. If any one I/O or register is configured to stop a thread, the affected thread will not fetch instructions.
2.3.2.1 Thread Stop I/O Pin
The I/O pin, an_ac_pm_thread_stop, can be used to stop the A2 core from fetching instructions. Stopping a thread causes all instructions that have begun executing to be completed and all prefetched instructions to be discarded.
2.3.2.2 Thread Control and Status Register (THRCTL)
The SCOM-accessible THRCTL register can control the thread run state to allow an external debugger control of the processor. See Direct Access to I-Cache and D-Cache Directories on page 437. Stopping a thread via THRCTL causes all instructions that have begun executing to be completed and all prefetched instructions to be discarded.
2.3.2.3 Core Configuration Register 0 (CCR0)
The CCR0 is used to disable or enable threads. When a thread is disabled by setting the CCR0[WE] bit corresponding to the thread to 1, all instructions that have begun executing are completed and all prefetched instructions are discarded. Subsequent instructions are not prefetched or initiated. Asynchronous interrupts or other conditions that are unmasked and enabled in CCR1 for the thread will cause the thread to be re-enabled. Executing a wait instruction on a thread will cause that thread’s CCR0[WE] to be set to 1. CCR0 also contains controls for allowing the processor to enter a power managed state. See Section 13 Power Management Methods on page 525 for information about power savings modes.
Programming Note: When using mtccr0 to put other threads to sleep, using an external interrupt or any asynchronous interrupt as the wake-up method is not reliable. The thread being put to sleep might have just taken an interrupt, so that MSR[EE] is zero, which prevents wake-up. In this case, mtccr0 should be used to wake up the sleeping threads. A thread can put itself to sleep using mtccr0 or the wait instruction and wake up using an external interrupt or any asynchronous interrupt reliably.
Register Short Name:       CCR0                 Read Access:                 Hypv
Decimal SPR Number:        1008                 Write Access:                Hypv
Initial Value:             0x0000000000000000   Duplicated for Multithread:  N
Slow SPR:                  N                    Notes:
Guest Supervisor Mapping:                       Scan Ring:                   bcfg

Bits   Field Name  Initial Value  Description
32:33  PME         0b00           Power Management Enable
                                  00 Disabled: No power savings mode entered.
                                  01 PM_Sleep_enable: PM_Sleep state entered when all threads are stopped.
                                  10 PM_RVW_enable: PM_RVW state entered when all threads are stopped.
                                  11 Disabled2: No power savings mode entered.
                                  Note: See the A2 User Manual, Power Management Methods section.
34:51  ///         0x0            Reserved
52:55  WEM         0b0000         Wait Enable Mask
                                  0 No effect to CCR0[WE].
                                  1 Allows writing of the corresponding bit in the CCR0[WE] field.
                                  These bits are non-persistent. A read always returns zeros.
56:59  ///         0b0000         Reserved
60:63  WE          0b0000         Wait Enable
                                  For t < 4, bit 63-t corresponds to thread t:
                                  0 Indicates that the thread is enabled.
                                  1 Indicates that the thread is disabled.
                                  Note: This field can also be set by a wait instruction.
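The interaction between the WEM and WE fields can be sketched as a small software model. This is an illustrative Python model only, not a description of the hardware implementation. It assumes that WEM bit 55-t is the mask bit that corresponds to WE bit 63-t (thread t), and it models only the WE field; all function names are hypothetical.

```python
WE_MASK = 0xF    # CCR0[WE], bits 60:63 (thread t is bit 63-t)
WEM_SHIFT = 8    # CCR0[WEM], bits 52:55, sits 8 bit positions above WE


def thread_bits(threads) -> int:
    """Build a 4-bit value with one bit set per listed thread number."""
    bits = 0
    for t in threads:
        bits |= 1 << t
    return bits


def apply_ccr0_write(current_we: int, written: int) -> int:
    """Return the new WE field after a write: only WE bits whose WEM bit
    is 1 in the written value are updated (assumed bit correspondence)."""
    wem = (written >> WEM_SHIFT) & WE_MASK
    return (current_we & ~wem & WE_MASK) | (written & wem)


def sleep_value(threads) -> int:
    """Value that sets WE for the listed threads (WEM = 1, WE = 1)."""
    bits = thread_bits(threads)
    return (bits << WEM_SHIFT) | bits


def wake_value(threads) -> int:
    """Value that clears WE for the listed threads (WEM = 1, WE = 0)."""
    return thread_bits(threads) << WEM_SHIFT


we = apply_ccr0_write(0b0000, sleep_value([1, 3]))
we_after_wake = apply_ccr0_write(we, wake_value([3]))
```

Because WEM selects which WE bits a write can change, one thread can change another thread’s WE bit without a read-modify-write of the whole field.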
2.3.2.4 Thread Enable Register (TENS, TENC)
The Thread Enable Register is used to disable or enable threads and is provided as a means to access shared resources (see Accessing Shared Resources on page 78). When a thread is disabled by setting the TEN bit corresponding to the thread to 0, all instructions that have begun executing are completed and all prefetched instructions are discarded. Subsequent instructions are not prefetched or initiated. All asynchronous interrupts for the thread are delayed until the thread is re-enabled.
The TEN is accessed by using two registers: TENS and TENC. When TENS is written, threads for which the corresponding bit in TENS is 1 are enabled; threads for which the corresponding bit in TENS is 0 are unaffected. When TENC is written, threads for which the corresponding bit in TENC is 1 are disabled; threads for which the corresponding bit in TENC is 0 are unaffected. When either SPR is read, the current value of the TEN is returned.
Register Short Name:       TENS                 Read Access:                 Hypv
Decimal SPR Number:        438                  Write Access:                Hypv
Initial Value:             0x0000000000000001   Duplicated for Multithread:  N
Slow SPR:                  N                    Notes:                       WS
Guest Supervisor Mapping:                       Scan Ring:                   bcfg

Bits   Field Name  Initial Value  Description
0:31   ///         0x0            Reserved
32:59  ///         0x0            Reserved
60:63  TEN         0b0001         Thread Enable Set
                                  For t < 4, bit 63-t corresponds to thread t. When bit 63-t is set to 1, thread t is enabled, if it is not already. When bit 63-t is set to 0, thread t is unaffected.
                                  When bit 63-t is read, the current value of the thread enable is returned.

Register Short Name:       TENC                 Read Access:                 Hypv
Decimal SPR Number:        439                  Write Access:                Hypv
Initial Value:             0x0000000000000001   Duplicated for Multithread:  N
Slow SPR:                  N                    Notes:                       WC
Guest Supervisor Mapping:                       Scan Ring:                   bcfg

Bits   Field Name  Initial Value  Description
0:31   ///         0x0            Reserved
32:59  ///         0x0            Reserved
60:63  TEN         0b0001         Thread Enable Clear
                                  For t < 4, bit 63-t corresponds to thread t. When bit 63-t is set to 1, thread t is disabled, if it is not already. When bit 63-t is set to 0, thread t is unaffected.
                                  When bit 63-t is read, the current value of the thread enable is returned.
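The write-one-to-set and write-one-to-clear behavior of TENS and TENC can be summarized with a short model. The Python class below is a sketch for illustration only; the class and method names are hypothetical, and the model covers only the 4-bit TEN field.

```python
class ThreadEnable:
    """Software model of the 4-bit TEN field (thread t is bit 63-t)."""

    def __init__(self, ten: int = 0b0001):
        # Reset value from the register tables: only thread 0 is enabled.
        self.ten = ten

    def write_tens(self, value: int) -> None:
        """Bits written as 1 enable the thread; bits written as 0 have no effect."""
        self.ten |= value & 0xF

    def write_tenc(self, value: int) -> None:
        """Bits written as 1 disable the thread; bits written as 0 have no effect."""
        self.ten &= ~value & 0xF

    def read(self) -> int:
        """Reading either TENS or TENC returns the current TEN value."""
        return self.ten


ten = ThreadEnable()
ten.write_tens(0b1110)   # enable threads 1, 2, and 3
ten.write_tenc(0b0110)   # disable threads 1 and 2
final_ten = ten.read()
```

With separate set and clear registers, a thread can change the enable bits of selected threads without first reading the current TEN value.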
2.3.2.5 Thread Enable Status Register (TENSR)
The TENSR indicates which threads are quiesced.
Programming Note: The TENSR is only valid after a context synchronizing instruction or an event that precisely stops a thread, such as a write to TEN.
Programming Note: When thread T1 disables other threads, Tn, it sets the TEN bits corresponding to Tn to zeros. To ensure that all operations being performed by threads Tn have been performed with respect to all threads on the processor, thread T1 reads the TENSR until all the bits corresponding to the disabled threads, Tn, are zeros.
Register Short Name:       TENSR                Read Access:                 Hypv
Decimal SPR Number:        437                  Write Access:                None
Initial Value:             0x0000000000000000   Duplicated for Multithread:  N
Slow SPR:                  N                    Notes:
Guest Supervisor Mapping:                       Scan Ring:                   func

Bits   Field Name  Initial Value  Description
0:31   ///         0x0            Reserved
32:59  ///         0x0            Reserved
60:63  TENSR       0b0000         Thread Enable Status Register
                                  Bit 63-t of the TENSR corresponds to thread t.
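The polling step in the programming note above (read the TENSR until the bits for the disabled threads are zeros) can be sketched as follows. This Python fragment is a model only: read_tensr stands in for an mfspr of the TENSR and is a hypothetical callable, and the max_polls bound is an assumption added so that the sketch cannot loop forever.

```python
def wait_until_stopped(read_tensr, disabled_mask: int, max_polls: int = 1000) -> bool:
    """Poll the TENSR until every bit selected by disabled_mask reads as 0.

    Returns True once the selected threads report as stopped, or False if
    the (assumed) polling budget runs out first.
    """
    for _ in range(max_polls):
        if (read_tensr() & disabled_mask) == 0:
            return True
    return False


# Simulated sequence of TENSR readings while threads 1 to 3 drain.
readings = iter([0b1111, 0b0101, 0b0001])
stopped = wait_until_stopped(lambda: next(readings), disabled_mask=0b1110)
```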

2.3.3 Wake On Interrupt

The A2 core can be configured to wake on interrupts or other conditions, if the thread was disabled by a write to CCR0 or by executing a wait instruction.
2.3.3.1 Core Configuration Register 1 (CCR1)
CCR1 provides additional masking on what conditions can cause the processor to resume execution. The conditions or interrupts specified must be appropriately unmasked and must also be enabled in CCR1 to exit the stopped state.
Register Short Name:       CCR1                 Read Access:                 Hypv
Decimal SPR Number:        1009                 Write Access:                Hypv
Initial Value:             0x000000000F0F0F0F   Duplicated for Multithread:  N
Slow SPR:                  N                    Notes:
Guest Supervisor Mapping:                       Scan Ring:                   func

Bits   Field Name  Initial Value  Description
32:33  ///         0b00           Reserved
34:39  WC3         0xF            Thread 3 Wake Control
                                  (0) 1 Disables sleep on waitrsv.
                                  (1) 1 Disables sleep on waitimpl.
                                  (2) 1 Enables wake on critical input, watchdog, critical doorbell, guest critical doorbell, or guest machine check doorbell interrupts.
                                  (3) 1 Enables wake on external input, performance monitor, doorbell, or guest doorbell interrupts.
                                  (4) 1 Enables wake on decrementer or user decrementer interrupts.
                                  (5) 1 Enables wake on fixed interval timer interrupts.
40:41  ///         0b00           Reserved
42:47  WC2         0xF            Thread 2 Wake Control
                                  (0) 1 Disables sleep on waitrsv.
                                  (1) 1 Disables sleep on waitimpl.
                                  (2) 1 Enables wake on critical input, watchdog, critical doorbell, guest critical doorbell, or guest machine check doorbell interrupts.
                                  (3) 1 Enables wake on external input, performance monitor, doorbell, or guest doorbell interrupts.
                                  (4) 1 Enables wake on decrementer or user decrementer interrupts.
                                  (5) 1 Enables wake on fixed interval timer interrupts.
48:49  ///         0b00           Reserved
50:55  WC1         0xF            Thread 1 Wake Control
                                  (0) 1 Disables sleep on waitrsv.
                                  (1) 1 Disables sleep on waitimpl.
                                  (2) 1 Enables wake on critical input, watchdog, critical doorbell, guest critical doorbell, or guest machine check doorbell interrupts.
                                  (3) 1 Enables wake on external input, performance monitor, doorbell, or guest doorbell interrupts.
                                  (4) 1 Enables wake on decrementer or user decrementer interrupts.
                                  (5) 1 Enables wake on fixed interval timer interrupts.
56:57  ///         0b00           Reserved
58:63  WC0         0xF            Thread 0 Wake Control
                                  (0) 1 Disables sleep on waitrsv.
                                  (1) 1 Disables sleep on waitimpl.
                                  (2) 1 Enables wake on critical input, watchdog, critical doorbell, guest critical doorbell, or guest machine check doorbell interrupts.
                                  (3) 1 Enables wake on external input, performance monitor, doorbell, or guest doorbell interrupts.
                                  (4) 1 Enables wake on decrementer or user decrementer interrupts.
                                  (5) 1 Enables wake on fixed interval timer interrupts.
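As a worked example of the CCR1 layout, the following Python sketch decodes the 6-bit wake control field for one thread. It assumes that sub-bit (0) is the most-significant bit of each WCn field, which matches the bit numbering used elsewhere in this manual; the dictionary key names are hypothetical labels chosen for readability.

```python
WC_FLAGS = (
    "disable_sleep_on_waitrsv",    # sub-bit (0)
    "disable_sleep_on_waitimpl",   # sub-bit (1)
    "wake_on_critical",            # sub-bit (2)
    "wake_on_external",            # sub-bit (3)
    "wake_on_decrementer",         # sub-bit (4)
    "wake_on_fixed_interval",      # sub-bit (5)
)


def wake_control(ccr1: int, thread: int) -> dict:
    """Decode CCR1[WCn] for thread n. WC0 is bits 58:63, WC1 is bits 50:55,
    and so on, so each field sits 8 bit positions above the previous one."""
    wc = (ccr1 >> (8 * thread)) & 0x3F
    return {name: bool(wc & (1 << (5 - i))) for i, name in enumerate(WC_FLAGS)}


# Decode the documented initial value for thread 2.
reset_wc2 = wake_control(0x000000000F0F0F0F, 2)
```

Under this reading, the initial value enables all four wake sources for every thread and leaves sleep on waitrsv and waitimpl enabled.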

2.3.4 Thread Priority

Thread priority can be changed by writing the PPR32 register, executing an or Rx,Rx,Rx instruction, or by causing an interrupt.
2.3.4.1 Program Priority Register (PPR32)
The program priority register controls thread priority. A2 hardware supports three physical priorities. In A2’s lowest hardware priority, the number of cycles between two instructions being issued is determined by IUCR1[THRES]. See Instruction Unit Configuration Register 1 (IUCR1) on page 77.
The mapping of the three hardware priorities to the architected priorities in the PPR32 register is shown in Table 2-3. An or Rx,Rx,Rx is used to set PPR32[PRI]; these are also shown in Table 2-3. Other defined or Rx,Rx,Rx hints shown in Table 2-4 are ignored. PPR32[PRI] remains unchanged if the privilege state of the processor executing the instruction is lower than the privilege indicated in Table 2-3. PPR32[PRI] also remains unchanged if “000” is written to the field.
If MSR[EE] is 0 and PPR32 = low then thread priority is increased to medium; PPR32 is unchanged. When MSR[EE] is 1, thread priority is determined by PPR32[PRI]. This function is provided to reduce delay in the processing of interrupts.
Table 2-3. Priority Levels

                                A2 Hardware Priority with IUCR1[HIPRI] Setting
Rx  PPR32[PRI]  ISA Priority    00        01        10        11          Privileged
31  001         very low        a2low     a2low     a2low     a2low       yes
1   010         low             a2low     a2low     a2low     a2low       no
6   011         medium low      a2medium  a2medium  a2medium  a2medium    no
2   100         medium          a2high    a2medium  a2medium  a2medium    no
5   101         medium high     a2high    a2high    a2medium  a2medium    yes
3   110         high            a2high    a2high    a2high    a2medium    yes
7   111         very high       a2high    a2high    a2high    a2high      hypv

Table 2-4. Other “or” Instruction Hints

Rx  Mnemonic  Reserved
27  yield     Yes
29  mdoio     Yes
30  mdoom     Yes

Table 2-5. Program Priority Register (PPR32)

Register Short Name:       PPR32                Read Access:                 Any
Decimal SPR Number:        898                  Write Access:                Any
Initial Value:             0x00000000000C0000   Duplicated for Multithread:  Y
Slow SPR:                  Y                    Notes:
Guest Supervisor Mapping:                       Scan Ring:                   ccfg

Bits   Field Name  Initial Value  Description
32:42  ///         0x0            Reserved
43:45  PRI         0b011          Thread Priority
                                  001 Very low (privileged).
                                  010 Low.
                                  011 Medium low.
                                  100 Medium.
                                  101 Medium high (privileged).
                                  110 High (privileged).
                                  111 Very high (hypervisor).
                                  Access violations or writing a value of zero will result in a nop.
46:63  ///         0x0            Reserved
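The rule that an insufficiently privileged write, or a write of zero, leaves PPR32[PRI] unchanged can be sketched as a small model. This Python fragment is illustrative only. It assumes three privilege states ordered user, supervisor, hypervisor, and it reads “privileged” in the PRI description as supervisor state; the names used are hypothetical.

```python
PRI_SHIFT = 18  # PPR32[PRI] is bits 43:45, so its low bit is 63 - 45 = 18 from the LSB

# Minimum state needed to set each PRI value (from the PRI field description).
REQUIRED_STATE = {
    0b001: "supervisor", 0b010: "user", 0b011: "user", 0b100: "user",
    0b101: "supervisor", 0b110: "supervisor", 0b111: "hypervisor",
}
STATE_LEVEL = {"user": 0, "supervisor": 1, "hypervisor": 2}  # assumed ordering


def write_pri(ppr32: int, new_pri: int, state: str) -> int:
    """Return the PPR32 value after an attempt to set PRI from the given state.
    A zero value or an access violation behaves as a nop."""
    if new_pri == 0 or STATE_LEVEL[state] < STATE_LEVEL[REQUIRED_STATE[new_pri]]:
        return ppr32
    return (ppr32 & ~(0b111 << PRI_SHIFT)) | (new_pri << PRI_SHIFT)


reset = 0x00000000000C0000            # documented initial value, PRI = 0b011
user_low = write_pri(reset, 0b010, "user")
user_very_high = write_pri(reset, 0b111, "user")
```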
2.3.4.2 Instruction Unit Configuration Register 1 (IUCR1)
Register Short Name:       IUCR1                Read Access:                 Hypv
Decimal SPR Number:        883                  Write Access:                Hypv
Initial Value:             0x0000000000001000   Duplicated for Multithread:  Y
Slow SPR:                  Y                    Notes:
Guest Supervisor Mapping:                       Scan Ring:                   ccfg

Bits   Field Name  Initial Value  Description
32:49  ///         0x0            Reserved
50:51  HIPRI       0b01           High Priority Privilege Level
                                  The A2 core has three priority values implemented in hardware. This field configures which value in PPR32[PRI] corresponds to the implementation’s highest priority.
                                  00 Medium normal.
                                  01 Medium high.
                                  10 High.
                                  11 Very high.
52:57  ///         0x0            Reserved
58:63  THRES       0x0            Low Priority Minimum Issue Count
                                  Sets the number of cycles between low priority issues, which is set by PPR32[PRI]. The number of cycles is equal to THRES 4. This field is not used when a thread is set to high or medium priority.
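One reading of Table 2-3 together with the IUCR1[HIPRI] description is that PRI values 001 and 010 map to the lowest hardware priority, that PRI values at or above the level selected by HIPRI map to the highest, and that everything in between maps to the middle one. The Python sketch below encodes that reading. It is an interpretation, not a hardware specification; the function name is hypothetical, and the MSR[EE] adjustment is modeled as raising the lowest hardware priority to the middle one.

```python
def hardware_priority(pri: int, hipri: int, msr_ee: bool = True) -> str:
    """Map PPR32[PRI] (1 to 7) and IUCR1[HIPRI] (0 to 3) to an A2 hardware priority."""
    if not 1 <= pri <= 7:
        raise ValueError("PRI must be 1 to 7; a write of 0 leaves PRI unchanged")
    if not 0 <= hipri <= 3:
        raise ValueError("HIPRI is a 2-bit field")
    if pri <= 0b010:
        level = "a2low"
    elif pri >= 0b100 + hipri:   # HIPRI 00 selects medium, 11 selects very high
        level = "a2high"
    else:
        level = "a2medium"
    # With MSR[EE] = 0, a low-priority thread runs at medium priority
    # (PPR32 itself is unchanged) to reduce interrupt processing delay.
    if level == "a2low" and not msr_ee:
        level = "a2medium"
    return level


default_medium = hardware_priority(pri=0b100, hipri=0b01)
```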

2.3.5 Resources Shared between Threads

All architected states are duplicated for each thread except for logical partitioning and memory. This allows each thread to look independent from a software standpoint. Some nonarchitected resources are shared between threads to save on the overall area for the core. Section 2.3.6 provides more information about shared resources. Section 2.3.7 on page 78 provides more information about duplicated resources.

2.3.6 Shared Resources

• Instruction ERAT array: Entries can be used as shared or thread specific.
• L1 instruction cache array
• Data ERAT array: Entries can be used as shared or thread specific.
• L1 data cache array
• Load miss queue
• Store queue
• Microcode ROM array
• Branch history table: This is a configurable resource and can be set up to be shared or duplicated.
• SPR registers: Not all SPRs are shared. See Table 14-1 Register Summary on page 530 for more information.
• Instruction fetch pipeline
• Instruction issue
• Integer execution pipeline
• TLB
• LRAT
2.3.6.1 Accessing Shared Resources
When software executing in thread Tn writes a new value in an SPR (mtspr) that is shared with other threads, either of the following sequences of operations can be performed to ensure that the write operation has been performed with respect to other threads.
Sequence 1
• Disable all other threads (see Thread Enable Register (TENS, TENC) on page 72).
• Write to the shared SPR (mtspr).
• Perform a context synchronizing operation.
• Enable the previously disabled threads.
In the above sequence, the context synchronizing operation ensures that the write operation has been performed with respect to all other threads that share the SPR. The enabling of other threads ensures that subsequent instructions of the enabled threads use the new SPR value because enabling a thread is a context synchronizing operation.
Sequence 2
• All threads are put in hypervisor state and begin polling a storage flag.
• The thread updating the SPR does the following:
   • Writes to the SPR (mtspr).
   • Sets a storage flag indicating that the write operation was done.
   • Performs a context synchronizing operation.
• When other threads see the updated storage flag, they perform context synchronizing operations.
In the above sequence, the context synchronizing operation by the thread that writes to the SPR ensures that the write operation has been performed with respect to all other threads that share the SPR; the context synchronizing operation by the other threads ensures that subsequent instructions for these threads use the updated value.

2.3.7 Duplicated Resources

• Link stack queue
• Instruction buffer
• Thread dependency
• GPR register file: This includes extra registers for microcode instruction use.
• SPR registers: Not all SPRs are duplicated. See Table 14-1 Register Summary on page 530 for more information.
• Branch history table: This is a configurable resource and can be set up to be shared or duplicated.

2.3.8 Pipeline Sharing

Figure 2-1 shows the instruction flow for the A2 core.
Figure 2-1. A2 Core Instruction Unit
2.3.8.1 Instruction Cache
The instruction cache is shared among all threads. A single thread is selected each cycle, depending on the number of instructions currently held in that thread’s instruction buffer. Two watermarks within the instruction buffer, empty and half-empty, determine a thread’s priority level for fetches. The empty watermark gives the corresponding thread a high-priority fetch request, and the half-empty watermark gives the thread a low-priority fetch request. The high-priority and low-priority fetches are two separate round-robin queues to give each thread an even chance at getting the next command. A low-priority fetch is only issued when none of the high-priority watermarks are active. The instruction cache and instruction directories are 4-way associative and are shared among all threads. The branch prediction unit that is part of the instruction cache in Figure 2-1 on page 79 contains a branch history table and link stack to allow proper branch resolution. The link stack is a 4-deep queue per thread, whereas the branch history table is a 2-bit history that can be configured as either 1 k per thread or a 4 k history shared between all four threads.
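The two-level fetch arbitration described above can be sketched as a pair of round-robin selectors. The Python model below is an illustration under stated assumptions, not the hardware algorithm: the buffer depth is passed in by the caller, a buffer counts as half-empty when it holds at most half its depth, and each round-robin pointer advances past the thread that was just selected. The class and method names are hypothetical.

```python
class FetchArbiter:
    """Model of high-priority and low-priority round-robin fetch selection."""

    def __init__(self, threads: int = 4):
        self.threads = threads
        self.next_high = 0   # round-robin pointer for high-priority requests
        self.next_low = 0    # round-robin pointer for low-priority requests

    def _pick(self, requesters: set, start: int):
        """Return the first requesting thread at or after start, wrapping around."""
        for offset in range(self.threads):
            t = (start + offset) % self.threads
            if t in requesters:
                return t
        return None

    def select(self, occupancy, depth: int):
        """Choose the thread to fetch for, given each thread's buffer occupancy."""
        high = {t for t, n in enumerate(occupancy) if n == 0}
        low = {t for t, n in enumerate(occupancy) if 0 < n <= depth // 2}
        t = self._pick(high, self.next_high)
        if t is not None:
            self.next_high = (t + 1) % self.threads
            return t
        # Low-priority fetches are considered only when no thread is empty.
        t = self._pick(low, self.next_low)
        if t is not None:
            self.next_low = (t + 1) % self.threads
        return t


arbiter = FetchArbiter()
picks = [arbiter.select([0, 4, 0, 8], depth=8) for _ in range(3)]
```

In this example threads 0 and 2 have empty buffers, so the selection alternates between them while thread 1, which is only half-empty, waits.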
2.3.8.2 Instruction Buffer and Decode Dependency
The colored portion of Figure 1-1 on page 50 contains all of the instruction buffer, decode, and dependency logic for each of the threads. This logic is duplicated for each thread to allow other threads with nondependent commands to be issued to maximize usage for the integer and floating-point pipelines.
2.3.8.3 Instruction Issue
Instruction issue is a shared resource within the core, and the logic is a 1+1 concurrent issue machine. This allows two commands to be issued per cycle; however, each of the commands issued must be from separate threads with one to the XU and another to the AXU units. The selection logic for the issue logic is a simple round-robin scheme with three levels of priority to allow software more flexibility.
See Figure 2-2, Figure 2-3, and Figure 2-4 for examples of round-robin logic.
Figure 2-2. Instruction Issue Timing Diagram 1
(Thread 0, high priority; threads 1, 2, 3 low priority; timeout set to 3.)
Figure 2-3. Instruction Issue Timing Diagram 2
(All threads set to high priority; timeout set to 3.)
Figure 2-4. Instruction Issue Timing Diagram 3
(Threads 0 and 1, high priority; threads 2 and 3, medium priority; timeout set to 3.)
2.3.8.4 Ram Unit
The Ram unit allows an external command to be issued within a given thread’s instruction stream. This unit is a shared resource within a core in that only one thread can issue a Ram command at a time. It is software’s responsibility to only allow one outstanding command per core, and it is necessary to poll the core until this command has completed before issuing any new commands.
2.3.8.5 Microcode Unit
The microcode unit (uCode) is partially shared and partially duplicated logic. The ROM that contains the actual stream of instructions to be issued is a shared unit; however, each thread contains its own microcode engine so that all four threads can be within a uCode stream at the same time. One of the engines will read a single command from the ROM each cycle based upon a fair round-robin scheme (not based upon the thread priority level for the issue logic), and issue that command to the appropriate thread’s instruction buffer. If the instruction buffer is over halfway filled, the uCode will stop issuing new commands. In addition, it will not include this thread for ROM reads until the instruction buffer has drained below this point.
2.3.8.6 Integer Unit
The integer execution unit is shared between threads because there is a unified execution, load/store, and branch pipeline. Exceptions and flushes from one thread usually will not affect another thread.
However, a flush that will affect all threads when encountered by one of the threads is caused by a data cache invalidate (DCI) or instruction cache invalidate (ICI) that reaches completion. A DCI or ICI will flush all threads for one cycle to allow the L1 caches to be invalidated. Software is required to guarantee that the load miss queue is empty for all threads before execution of a DCI.
Another flush condition caused by one thread that can affect another thread occurs when reload data returning for an outstanding load collides with a load or store at the data cache array pins.
For a comprehensive list of flush conditions, see Interrupt Conditions on page 854.
Some multiply operations and all divide operations require recirculation within the multiply/divide unit, therefore blocking all other threads from executing multiplies and divides. This does not prevent other threads from executing any instructions other than multiplies and divides. If any multiply or divide instructions are issued and collide with a recirculating multiply or divide, the younger instructions are flushed. In the case of the multiplier, the size of the operands determines how many cycles are needed for recirculation. The width of the multiplier is 32 bits by 32 bits, so any operations that require multiplying 64-bit operands will require recirculation. If both operands are 32 bits, no recirculation is needed (in other words, the instruction is pipelined as normal). The width of the divider is 64 bits. Divide instructions dealing with 64-bit operands recirculate for 65 cycles, and operations with 32-bit operands recirculate for 32 cycles. No divide instructions are pipelined; they all require some recirculation.
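The recirculation rules above can be restated as a small lookup. The Python sketch below simply encodes the figures given in the text; the function names are hypothetical, and the multiply case returns only whether recirculation occurs because the text does not give a cycle count for it.

```python
def divide_recirculation_cycles(operand_bits: int) -> int:
    """Cycles a divide recirculates, for 32-bit or 64-bit operands."""
    cycles = {64: 65, 32: 32}
    if operand_bits not in cycles:
        raise ValueError("operand width must be 32 or 64 bits")
    return cycles[operand_bits]


def multiply_recirculates(a_bits: int, b_bits: int) -> bool:
    """True if a multiply needs recirculation through the 32 x 32 multiplier."""
    return a_bits > 32 or b_bits > 32


blocks_other_threads = multiply_recirculates(64, 64)
```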
A forward progress timer monitors that each thread is making forward progress. If the thread appears to be hung, thread priorities are adjusted to break out of a potential live-lock condition.

2.4 Registers

This section provides an overview of the register categories and types provided by the A2 core. Detailed descriptions of each of the registers are provided within the chapters covering the functions with which they are associated (for example, the cache control and cache debug registers are described in Instruction and Data Caches on page 169). An alphabetical summary of all registers, including bit definitions, is provided in Register Summary on page 529.
All registers in the A2 core are architected as 64 bits wide, although certain bits in some registers are reserved and thus not necessarily implemented. For all registers with fields marked as reserved, these reserved fields should be written as 0 and read as undefined. The recommended coding practice is to perform the initial write to a register with reserved fields set to 0, and to perform all subsequent writes to the register using a read-modify-write strategy: read the register; use logical instructions to alter defined fields, leaving reserved fields unmodified; and write the register.
All of the registers are grouped into categories according to the processor functions with which they are associated. In addition, each register is classified as being of a particular type, as characterized by the specific instructions that are used to read and write registers of that type. Finally, most of the registers contained within the A2 core are defined by the Power ISA Architecture, although some registers are implementation-specific and unique to the A2 core.
Figure 2-5 illustrates the A2 core registers contained in the user programming model; that is, those registers to which access is nonprivileged and that are available to both user and supervisor programs.
Figure 2-5. User Programming Model Registers
(Figure not reproduced. Group labels shown: Integer Processing, Branch Control, Timer, and Processor Control. Registers shown: General Purpose (GPR0 through GPR31), Condition Register (CR), Integer Exception Register (XER), Link Register (LR), Count Register (CTR), Time Base (TB, TBU), User Decrementer Register (UDEC), SPR General 3–7 (SPRG3 through SPRG7), and VR Save Register (VRSAVE). The figure also carries the annotation “Replicated per Thread.”)
Table 14-1 on page 530 lists the A2 core registers contained in the supervisor or hypervisor programming model, to which access is privileged.
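The read-modify-write practice recommended above for registers with reserved fields can be sketched as follows. This Python fragment is a model of the bit manipulation only; on hardware the read and write would be mfspr and mtspr. The helper names are hypothetical, and the IUCR1 field positions in the example are taken from the IUCR1 register description.

```python
MASK64 = (1 << 64) - 1


def field_mask(first_bit: int, last_bit: int) -> int:
    """Mask for bits first_bit:last_bit (bit 0 = most significant of 64)."""
    width = last_bit - first_bit + 1
    return ((1 << width) - 1) << (63 - last_bit)


def update_field(reg: int, first_bit: int, last_bit: int, value: int) -> int:
    """Return reg with one defined field replaced and all other bits,
    including reserved ones, left unmodified."""
    mask = field_mask(first_bit, last_bit)
    return (reg & ~mask & MASK64) | ((value << (63 - last_bit)) & mask)


iucr1 = 0x0000000000001000                   # value read from the register
iucr1 = update_field(iucr1, 58, 63, 0x3F)    # alter THRES only
iucr1 = update_field(iucr1, 50, 51, 0b10)    # alter HIPRI only
```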

2.4.1 Register Mapping

Some special purpose register (SPR) accesses in guest state are mapped to analogous registers for the guest state. This removes the requirement for the hypervisor software to handle embedded hypervisor privilege interrupts for these accesses and to emulate the required changes for these high-use registers.
Accesses to the registers listed in Table 2-6 are changed by the processor to the registers given in the table when the processor is in guest state (MSR[GS] = 1). Accesses to these registers are not mapped when not in guest state.
Table 2-6. Register Mapping
SPR Accessed SPR Mapped to Type of Access
SRR0 GSRR0 mtspr, mfspr
SRR1 GSRR1 mtspr, mfspr
ESR GESR mtspr, mfspr
DEAR GDEAR mtspr, mfspr
PIR GPIR mtspr, mfspr
SPRG0 GSPRG0 mtspr, mfspr
SPRG1 GSPRG1 mtspr, mfspr
SPRG2 GSPRG2 mtspr, mfspr
SPRG3 GSPRG3 mtspr, mfspr
USPRG3 GSPRG3 mtspr, mfspr
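The mapping in Table 2-6 amounts to a lookup that applies only when MSR[GS] = 1. The Python sketch below models it with a dictionary; the function name effective_spr is hypothetical.

```python
# SPR accessed -> SPR actually used when the processor is in guest state.
GUEST_SPR_MAP = {
    "SRR0": "GSRR0", "SRR1": "GSRR1", "ESR": "GESR", "DEAR": "GDEAR",
    "PIR": "GPIR", "SPRG0": "GSPRG0", "SPRG1": "GSPRG1",
    "SPRG2": "GSPRG2", "SPRG3": "GSPRG3", "USPRG3": "GSPRG3",
}


def effective_spr(name: str, msr_gs: int) -> str:
    """Return the register that an mtspr or mfspr of `name` actually accesses."""
    if msr_gs == 1:
        return GUEST_SPR_MAP.get(name, name)
    return name


guest_pir = effective_spr("PIR", msr_gs=1)
```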

2.4.2 Register Types

There are five register types contained within and/or supported by the A2 core. Each register type is characterized by the instructions that are used to read and write the registers of that type. The following subsections provide an overview of each of the register types and the instructions associated with them.
2.4.2.1 General Purpose Registers
The A2 core contains 32 integer general purpose registers (GPRs); each contains 64 bits. All instructions that operate on GPRs produce the same GPR results in 32-bit mode as in 64-bit mode.
Integer Processing on page 110 provides more information about integer operations and the use of GPRs.
2.4.2.2 Special Purpose Registers
Special Purpose Registers (SPRs) are directly accessed using the mtspr and mfspr instructions. In addition, certain SPRs might be updated as a side-effect of the execution of various instructions. For example, the Integer Exception Register (XER) (see Integer Exception Register (XER) on page 110) is an SPR that is updated with arithmetic status (such as carry and overflow) upon execution of certain forms of integer arithmetic instructions.
SPRs control the use of the debug facilities, timers, interrupts, memory management, caches, and other architected processor resources. Table 14-1 on page 530 shows the mnemonic, name, and number for each SPR, in alphabetical order. Each of the SPRs is described in more detail within the section or chapter covering the function with which it is associated.
2.4.2.3 Condition Register
The Condition Register (CR) is a 32-bit register of its own unique type and is divided up into eight independent 4-bit fields (CR0–CR7). The CR can be used to record certain conditional results of various arithmetic and logical operations. Subsequently, conditional branch instructions can designate a bit of the CR as one of the branch conditions (see Wait Instruction on page 98). Instructions are also provided for performing logical bit operations and for moving fields within the CR.
See Condition Register (CR) on page 107 for more information about the various instructions that can update the CR.
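As a small illustration of the eight 4-bit fields, the following Python sketch extracts field CRn from a 32-bit CR value. It assumes the Power ISA convention that CR0 occupies the most-significant four bits of the CR; the function name and the example value are hypothetical.

```python
def cr_field(cr: int, n: int) -> int:
    """Return the 4-bit field CRn (n = 0 to 7) from a 32-bit CR value,
    with CR0 in the most-significant four bits."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0 to 7")
    return (cr >> (4 * (7 - n))) & 0xF


example_cr = 0x20000008
cr0 = cr_field(example_cr, 0)
cr7 = cr_field(example_cr, 7)
```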
2.4.2.4 Machine State Register
The Machine State Register (MSR) is a register of its own unique type that controls important chip functions, such as the enabling or disabling of various interrupt types.
The MSR can be written from a GPR using the mtmsr instruction. The contents of the MSR can be read into a GPR using the mfmsr instruction. The MSR[EE] bit can be set or cleared atomically using the wrtee or wrteei instructions. The MSR contents are also automatically saved, altered, and restored by the interrupt-handling mechanism. See Machine State Register (MSR) on page 301 for more detailed information about the MSR and the function of each of its bits.

2.5 32-Bit Mode

2.5.1 64-Bit Specific Instructions

Instructions or registers that are categorized as 64-bit are only available in 64-bit implementations of the A2 core. In a 64-bit implementation, all instructions that operate on GPRs produce the same GPR results in 32-bit mode as in 64-bit mode. In 32-bit mode, instructions that set condition bits do so based on the 32-bit result computed. Effective addresses and all SPRs operate on the low-order 32 bits only unless otherwise stated.
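The difference between the two modes can be shown with a short worked example. The Python sketch below models only the signed comparison with zero that sets the LT, GT, or EQ condition, and the truncation of an effective address; it assumes the 32-bit result is the low-order 32 bits of the GPR result, and the function names are hypothetical.

```python
def compare_with_zero(result: int, mode64: bool) -> str:
    """Return 'LT', 'GT', or 'EQ' for a GPR result, using all 64 bits in
    64-bit mode and only the low-order 32 bits in 32-bit mode."""
    bits = 64 if mode64 else 32
    value = result & ((1 << bits) - 1)
    if value >> (bits - 1):          # sign bit set: interpret as negative
        value -= 1 << bits
    return "LT" if value < 0 else "GT" if value > 0 else "EQ"


def effective_address(ea: int, mode64: bool) -> int:
    """In 32-bit mode only the low-order 32 bits of the address are used."""
    return ea if mode64 else ea & 0xFFFFFFFF


gpr_result = 0x00000000FFFFFFFF      # same GPR contents in both modes
in_64_bit_mode = compare_with_zero(gpr_result, mode64=True)
in_32_bit_mode = compare_with_zero(gpr_result, mode64=False)
```

The GPR holds the same 64-bit value in both modes; only the condition derived from it differs.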

2.5.2 32-Bit Instruction Selection

Any software that uses any of the instructions listed in the 64-bit category is considered 64-bit software. Generally speaking, 32-bit software should avoid using any instruction or instructions that depend on any particular setting of bits 0:31 of any 64-bit application-accessible system register, including General Purpose Registers, for producing the correct 32-bit results. Context switching might or might not preserve the upper 32 bits of application-accessible 64-bit system registers, and insertion of arbitrary settings of those upper 32 bits at arbitrary times during the execution of the 32-bit application must not affect the final result.
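The requirement that 32-bit software must not depend on bits 0:31 of a 64-bit register can be illustrated with a small model. This is a sketch only: it represents a GPR as a Python integer, models a 64-bit add, and derives the comparison result from the low-order 32 bits, as described for 32-bit mode. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def add64(a: int, b: int) -> int:
    """Model a 64-bit GPR add (wraps modulo 2**64)."""
    return (a + b) & MASK64


def low32(x: int) -> int:
    """Return the low-order 32 bits (bits 32:63) of a 64-bit value."""
    return x & MASK32


def compare_to_zero_32(result: int) -> tuple:
    """Return (LT, GT, EQ) computed from the signed low-order 32 bits."""
    r = low32(result)
    signed = r - (1 << 32) if r & 0x8000_0000 else r
    return (signed < 0, signed > 0, signed == 0)


clean = 0x0000_0000_7FFF_FFFF   # upper 32 bits zero
dirty = 0xDEAD_BEEF_7FFF_FFFF   # same low word, arbitrary upper word

# The 32-bit result and the condition derived from it are identical.
print(hex(low32(add64(clean, 1))), hex(low32(add64(dirty, 1))))
print(compare_to_zero_32(add64(clean, 1)) == compare_to_zero_32(add64(dirty, 1)))
```

Software that compared the full 64-bit results instead would see different values for `clean` and `dirty`, which is exactly the dependency that 32-bit software should avoid.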

2.6 Instruction Categories

The Power ISA defines that each facility (including registers and fields therein) and instruction is in exactly one category. Table 2-7 indicates the categories that are implemented by the A2 processor core.
Table 2-7. Category Listing

| Implemented by A2 Core | Category | Abbreviation | Notes |
|---|---|---|---|
| Yes | Base | B | Required for all implementations. |
| No | Server | S | Required for server implementations. |
| Yes | Embedded | E | Required for embedded implementations. |
| No | Alternate Time Base | ATB | An additional time base; see Book II. |
| Yes | Cache Specification | CS | Specify a specific cache for some instructions; see Book II. |
| No | Decimal Floating-Point | DFP | Decimal floating-point facilities. |
| No | Decorated Storage | DS | Decorated storage facilities. |
| No | Embedded.Cache Debug | E.CD | Provides direct access to cache data and directory content. |
| Yes | Embedded.Cache Initialization | E.CI | Instructions that invalidate the entire cache. |
| No | Embedded.Device Control | E.DC | Embedded device control bus support. |
| No | Embedded.Enhanced Debug | E.ED | Embedded enhanced debug facility; see Book III-E. |
| Yes | Embedded.External PID | E.PD | Embedded external PID facility; see Book III-E. |
| Yes | Embedded.Hypervisor; Embedded.Hypervisor.LRAT | E.HV; E.HV.LRAT | Embedded logical partitioning and hypervisor facilities. Embedded hypervisor logical to real address translation. |
| Yes | Embedded.Little-Endian | E.LE | Embedded little-endian page attribute; see Book III-E. |
| Yes | Embedded.Page Table | E.PT | Embedded page table facility; see Book III-E. |
| Yes | Embedded.TLB Write Conditional | E.TWC | Embedded TLB write conditional facility; see Book III-E. |
| No | Embedded.Performance Monitor | E.PM | Embedded performance monitor example; see Book III-E. |
| Yes | Embedded.Processor Control | E.PC | Processor control facility; see Book III-E. |
| Yes | Embedded Cache Locking | ECL | Embedded cache locking facility; see Book III-E. |
| Yes | Embedded Multithreading; Embedded Multithreading.Thread Management | EM; EM.TM | Embedded multithreading; see Book III-E. Embedded multithreading thread management facility. |
| No | External Control | EXC | External control facility; see Book II. |
| No | External Proxy | EXP | External proxy facility; see Book III-E. |
| Yes | Floating-Point; Floating-Point.Record | FP; FP.R | Floating-point facilities. Floating-point instructions with Rc = 1. |
| No | Legacy Move Assist | LMV | Determine left most zero byte instruction. |
| No | Legacy Integer Multiply-Accumulate (1) | LMA | Legacy integer multiply-accumulate instructions. |
| No | Load/Store Quadword | LSQ | Load/store quadword instructions; see Book III-S. |
| Yes | Memory Coherence | MMC | Requirement for memory coherence; see Book II. |
| No | Move Assist | MA | Move assist instructions. |
| No | Processor Compatibility | PCR | Processor compatibility register. |
| No | Server.Performance Monitor | S.PM | Performance monitor example for servers; see Book III-S. |
| No | Server.Relaxed Page Table Alignment | S.RPTA | HTAB alignment on a 256 KB boundary; see Book III-S. |
| No | Signal Processing Engine; SPE.Embedded Float Scalar Double; SPE.Embedded Float Scalar Single; SPE.Embedded Float Vector | SP; SP.FD; SP.FS; SP.FV | Facility for signal processing. GPR-based floating-point double-precision instruction set. GPR-based floating-point single-precision instruction set. GPR-based floating-point vector instruction set. |
| Yes | Store Conditional Page Mobility | SCPM | Store conditional accounting for page movement; see Book II. |
| No | Stream | STM | Stream variant of dcbt instruction; see Book II. |
| No | Strong Access Order | SAO | Assist for X86 emulation; see Book II. |
| No | Trace | TRC | Trace facility example; see Book III-S. |
| No | Variable Length Encoding | VLE | Variable length encoding facility; see Book VLE. |
| Determined by AXU | Vector; Vector.Little-Endian | V; V.LE | Vector facilities. Little-endian support for vector storage operations. |
| Determined by AXU | Vector-Scalar Extension | VSX | Vector-scalar extension. |
| Yes | Wait | WT | Wait instruction; see Book II. |
| Yes | 64-Bit | 64 | Required for 64-bit implementations; not defined for 32-bit implementations. |

2.7 Instruction Classes

Power ISA architecture defines all instructions as falling into exactly one of the following three classes, as determined by the primary opcode (and the extended opcode, if any):
1. Defined
2. Illegal
3. Reserved

2.7.1 Defined Instruction Class

This class of instructions consists of all the instructions defined in Power ISA. In general, defined instructions are guaranteed to be supported within a Power ISA system as specified by the architecture, either within the processor implementation itself or within emulation software supported by the system operating software.
As defined by Power ISA, any attempt to execute a defined instruction will:
• Cause an illegal instruction exception type of program interrupt, if the instruction is not recognized by the implementation; or
• Cause a floating-point unavailable interrupt if the instruction is recognized as a floating-point instruction, but floating-point processing is disabled; or
• Perform the actions described in the rest of this document, if the instruction is recognized and supported by the implementation. The architected behavior might cause other exceptions.
The A2 core recognizes and fully supports all of the instructions in the defined class and in the categories supported, with a few exceptions. First, instructions that are defined for floating-point processing are not supported within the A2 core, but can be implemented within an auxiliary processor and attached to the core using the AXU interface. If no such auxiliary processor is attached, attempting to execute any floating-point instructions causes an illegal instruction exception type of program interrupt. If an auxiliary processor that supports the floating-point instructions is attached, the behavior of these instructions is as defined above and as determined by the implementation details of the floating-point auxiliary processor.

2.7.2 Illegal Instruction Class

This class of instructions contains the set of instructions described in Power ISA Appendix D of Book Appendices. Illegal instructions are available for future extensions of the Power ISA; that is, some future version of the Power ISA might define any of these instructions to perform new functions.
Any attempt to execute an illegal instruction causes the system illegal instruction error handler to be invoked and will have no other effect.
An instruction consisting entirely of binary zeros is guaranteed always to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized storage will result in the invocation of the system illegal instruction error handler.

2.7.3 Reserved Instruction Class

This class of instructions contains the set of instructions described in Power ISA Appendix E of Book Appendices.
Reserved instructions are allocated to specific purposes that are outside the scope of the Power ISA.
Any attempt to execute a reserved instruction causes the system illegal instruction error handler to be invoked if the instruction is not implemented.
Because implementations are typically expected to treat reserved-nop instructions as true no-ops, these instruction opcodes are available for future extensions to Power ISA that have no effect on the architected state. Such extensions might include performance-enhancing hints, such as new forms of cache touch instructions. Software would be able to take advantage of the functionality offered by the new instructions and still remain backwards-compatible with implementations of previous versions of Power ISA.
The A2 core implements all of the reserved-nop instruction opcodes as true no-ops. The specific reserved-nop opcodes are the following extended opcodes under primary opcode 31: 530, 562, 594, 626, 658, 690, 722, and 754.
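A decoder or simulator could recognize the reserved-nop opcodes listed above as in the following sketch. It assumes the usual Power ISA X-form layout, which is not restated in this section: the primary opcode occupies instruction bits 0:5 and the 10-bit extended opcode occupies bits 21:30. The function names, including the `encode_x_form` helper, are hypothetical.

```python
# Extended opcodes under primary opcode 31 that the A2 core treats as no-ops.
RESERVED_NOP_XO = frozenset({530, 562, 594, 626, 658, 690, 722, 754})


def primary_opcode(insn: int) -> int:
    """Instruction bits 0:5 (the six most-significant bits of the word)."""
    return (insn >> 26) & 0x3F


def extended_opcode(insn: int) -> int:
    """Instruction bits 21:30 (assumed X-form extended opcode field)."""
    return (insn >> 1) & 0x3FF


def is_reserved_nop(insn: int) -> bool:
    """True if the 32-bit instruction word is one of the reserved-nop opcodes."""
    return primary_opcode(insn) == 31 and extended_opcode(insn) in RESERVED_NOP_XO


def encode_x_form(xo: int) -> int:
    """Illustration only: build a word with primary opcode 31 and the given XO."""
    return (31 << 26) | (xo << 1)


print(is_reserved_nop(encode_x_form(530)))   # a listed reserved-nop opcode
print(is_reserved_nop(0x0000_0000))          # all zeros is an illegal instruction
```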

2.8 Implemented Instruction Set Summary

This section provides an overview of the various types and categories of instructions implemented within the A2 core. Appendix A Processor Instruction Summary on page 737 lists each implemented instruction alphabetically (and by opcode) along with a short-form description and its extended mnemonics.
Table 2-8 summarizes the A2 core instruction set by category. Instructions within each category are described in subsequent sections.
Table 2-8. Instruction Categories

| Category | Subcategory | Instruction Types |
|---|---|---|
| Integer | Integer Storage Access | load, store |
| | Integer Arithmetic | add, subtract, multiply, divide, negate |
| | Integer Logical | and, andc, or, orc, xor, nand, nor, xnor, extend sign, count leading zeros |
| | Integer Compare | compare, compare logical |
| | Integer Select | select operand |
| | Integer Trap | trap |
| | Integer Rotate | rotate and insert, rotate and mask |
| | Integer Shift | shift left, shift right, shift right algebraic |
| Branch | | branch, branch conditional, branch to link, branch to count |
| Processor Control | Condition Register Logical | crand, crandc, cror, crorc, crnand, crnor, crxor, crxnor |
| | Register Management | move to/from SPR, move to/from MSR, write to external interrupt enable bit, move to/from CR |
| | System Linkage | system call, return from interrupt, return from critical interrupt, return from machine check interrupt |
| | Processor Synchronization | instruction synchronize |
| Storage Control | Cache Management | data allocate, data invalidate, data touch, data zero, data flush, data store, instruction invalidate, instruction touch |
| | TLB Management | read, write, search, synchronize |
| | Storage Synchronization | memory synchronize, memory barrier |

Note: The A2 core does not implement any device control registers (DCRs). Move to and move from DCR instructions are dropped silently. They are no-ops and do not cause an exception.

2.8.1 Integer Instructions

Integer instructions transfer data between memory and the GPRs and perform various operations on the GPRs. This category of instructions is further divided into subcategories, described in the following sections.
2.8.1.1 Integer Storage Access Instructions
Integer storage access instructions load and store data between memory and the GPRs. These instructions operate on bytes, halfwords, words, and doublewords. Integer storage access instructions also support loading and storing multiple registers, character strings, and byte-reversed data, and loading data with sign-extension.
Table 2-9 shows the integer storage access instructions in the A2 core. In the table, the syntax “[u]” indicates that the instruction has both an “update” form (in which the RA addressing register is updated with the calculated address) and a “nonupdate” form. Similarly, the syntax “[x]” indicates that the instruction has both an “indexed” form (in which the address is formed by adding the contents of the RA and RB GPRs) and a “base + displacement” form (in which the address is formed by adding a 16-bit signed immediate value, specified as part of the instruction, to the contents of GPR RA).
Table 2-9. Integer Storage Access Instructions

| Operand | Loads | Stores |
|---|---|---|
| Byte | lbz[u][x] | stb[u][x] |
| Halfword | lha[u][x], lhbrx, lhz[u][x] | sth[u][x], sthbrx |
| Word | lwa[u][x], lwbrx, lwz[u][x] | stw[u][x], stwbrx |
| Double | ld[u][x], ldbrx | std[u][x], stdbrx |
| Multiple/String | lmw, lswi, lswx | stmw, stswi, stswx |
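The addressing forms of the integer storage access instructions (“base + displacement”, “indexed”, and “update”) can be sketched as follows. This is an illustrative model, not a description of the hardware: GPRs are a Python list, addresses wrap modulo 2^64, and an RA field of 0 is treated as the value 0 in the nonupdate forms, which is the general Power ISA convention and is an assumption here because this section does not state it. All function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(d: int) -> int:
    """Interpret the low 16 bits of d as a signed immediate."""
    d &= 0xFFFF
    return d - 0x1_0000 if d & 0x8000 else d


def base_value(gpr: list, ra: int) -> int:
    """Assumed convention: RA field 0 means the value 0, not GPR 0."""
    return 0 if ra == 0 else gpr[ra]


def ea_displacement(gpr: list, ra: int, d: int) -> int:
    """'Base + displacement' form: (RA) + sign-extended 16-bit immediate."""
    return (base_value(gpr, ra) + sign_extend16(d)) & MASK64


def ea_indexed(gpr: list, ra: int, rb: int) -> int:
    """'Indexed' form: (RA) + (RB)."""
    return (base_value(gpr, ra) + gpr[rb]) & MASK64


def apply_update(gpr: list, ra: int, ea: int) -> None:
    """'Update' form: write the calculated address back to GPR RA."""
    gpr[ra] = ea


gpr = [0] * 32
gpr[3], gpr[4] = 0x1000, 0x8
ea = ea_displacement(gpr, 3, 0xFFFC)    # displacement of -4
apply_update(gpr, 3, ea)                # as an update-form access would do
print(hex(ea), hex(ea_indexed(gpr, 3, 4)))
```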
Table 2-10. Integer Storage Access Instructions by External Process ID

| Operand | Loads | Stores |
|---|---|---|
| Byte | lbepx | stbepx |
| Halfword | lhepx | sthepx |
| Word | lwepx | stwepx |
| Double | ldepx | stdepx |
Table 2-11 shows how operands are handled depending on alignment. Optimal performance and configuration is achieved when operands are aligned.
Table 2-11. Operand Handling Dependent on Alignment

In the column headings, BE and LE denote the Big Endian and Little Endian boundary-crossing cases.

| Operand | Size | Byte Align | BE None | BE 32B Block | BE 16B Block (2) | BE Virtual Page | LE None | LE 32B Block | LE 16B Block (2) | LE Virtual Page |
|---|---|---|---|---|---|---|---|---|---|---|
| Integer | 8 Byte | 8 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <8 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 4 Byte | 4 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <4 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 2 Byte | 2 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <2 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 1 Byte | 1 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | lmw, stmw | 4 | uCode | uCode | uCode | uCode | uCode | uCode | uCode | uCode |
| | | <4 | Alignment Exception | Alignment Exception | Alignment Exception | Alignment Exception | Alignment Exception | Alignment Exception | Alignment Exception | Alignment Exception |
| | string | | uCode | uCode | uCode | uCode | uCode | uCode | uCode | uCode |
| Float | 8 Byte | 8 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <8 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 4 Byte | 4 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <4 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| Any General Purpose AXU | 32 Byte | 32 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <32 | uCode | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 16 Byte | 16 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <16 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 8 Byte | 8 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <8 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 4 Byte | 4 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <4 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 2 Byte | 2 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |
| | | <2 | Pipeline | uCode | uCode | uCode | Pipeline | uCode | uCode | uCode |
| | 1 Byte | 1 | Pipeline | N/A | N/A | N/A | Pipeline | N/A | N/A | N/A |

Notes:
1. If the storage operand spans two virtual pages that have different storage control attributes, an alignment exception occurs.
2. Only valid if the request is a cache-inhibited load or a store request with the L2 interface in 16-byte mode.
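To decide which boundary-crossing column of Table 2-11 applies to a given access, software can test whether the operand spans a 16-byte block, a 32-byte block, or a virtual page. The following is a minimal sketch of that test only; it does not model how the core then handles the access. The 4 KB default page size is an assumption made for the example, and the function names are hypothetical.

```python
def is_aligned(addr: int, size: int) -> bool:
    """True if addr is a multiple of the operand size."""
    return addr % size == 0


def crosses_boundary(addr: int, size: int, boundary: int) -> bool:
    """True if the bytes [addr, addr + size) span a multiple of `boundary`."""
    return (addr // boundary) != ((addr + size - 1) // boundary)


def crossed_boundaries(addr: int, size: int, page_size: int = 4096) -> list:
    """List the Table 2-11 boundary types crossed; page_size is an assumption."""
    names = []
    for name, boundary in (("16B Block", 16), ("32B Block", 32),
                           ("Virtual Page", page_size)):
        if crosses_boundary(addr, size, boundary):
            names.append(name)
    return names or ["None"]


print(crossed_boundaries(0x1000, 8))   # aligned doubleword
print(crossed_boundaries(0x000E, 4))   # word straddling a 16-byte block
print(crossed_boundaries(0x0FFE, 4))   # word straddling a page
```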
2.8.1.2 Integer Arithmetic Instructions
Arithmetic operations are performed on integer or ordinal operands stored in registers. Instructions that perform operations on two operands are defined in a 3-operand format; an operation is performed on the operands, which are stored in two registers. The result is placed in a third register. Instructions that perform operations on one operand are defined in a 2-operand format; the operation is performed on the operand in a register, and the result is placed in another register. Several instructions also have immediate formats in which one of the source operands is a field in the instruction.
Most integer arithmetic instructions have versions that can update CR[CR0] and/or XER[SO, OV] (Summary Overflow, Overflow), based on the result of the instruction. Some integer arithmetic instructions also update XER[CA] (Carry) implicitly. See Integer Processing on page 110 for more information about how these instructions update the CR and/or the XER.
Table 2-12 lists the integer arithmetic instructions in the A2 core. In the table, the syntax “[o]” indicates that the instruction has both an “o” form (which updates the XER[SO,OV] fields) and a “non-o” form. Similarly, the syntax “[.]” indicates that the instruction has both a “record” form (which updates CR[CR0]) and a “nonrecord” form.
Table 2-12. Integer Arithmetic Instructions

| Operation | Instructions |
|---|---|
| Add | add[o][.], addc[o][.], adde[o][.], addi, addic[.], addis, addme[o][.], addze[o][.] |
| Subtract | subf[o][.], subfc[o][.], subfe[o][.], subfic, subfme[o][.], subfze[o][.] |
| Multiply | mulhw[.], mulhwu[.], mulli, mullw[o][.], mulhd[.], mulhdu[.], mulld[o][.] |
| Divide | divw[o][.], divwu[o][.], divwe[o][.], divweu[o][.], divd[o][.], divdu[o][.], divde[o][.], divdeu[o][.] |
| Negate | neg[o][.] |
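The effect of the “o” and “record” forms can be illustrated with a model of an add that updates both XER[SO,OV] and CR[CR0]. This sketch assumes 64-bit mode and the standard Power ISA CR0 layout (LT, GT, EQ, SO from most to least significant bit), which this section does not spell out. The function name `add_o_record` is hypothetical.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a signed integer."""
    return x - (1 << 64) if x & SIGN64 else x


def add_o_record(ra: int, rb: int, so: bool = False):
    """Model of an add in its 'o' and record form (64-bit mode assumed).

    Returns (result, ov, so, cr0), where cr0 packs LT, GT, EQ, SO.
    """
    result = (ra + rb) & MASK64
    # Signed overflow: both operands share a sign that the result lacks.
    ov = bool(~(ra ^ rb) & (ra ^ result) & SIGN64)
    so = so or ov                      # SO is sticky once set
    s = to_signed64(result)
    cr0 = (int(s < 0) << 3) | (int(s > 0) << 2) | (int(s == 0) << 1) | int(so)
    return result, ov, so, cr0


result, ov, so, cr0 = add_o_record(0x7FFF_FFFF_FFFF_FFFF, 1)
print(hex(result), ov, so, bin(cr0))
```

The non-o, nonrecord form would compute the same `result` and leave both XER[SO,OV] and CR[CR0] unchanged.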
2.8.1.3 Integer Logical Instructions
Table 2-13 lists the integer logical instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.
Table 2-13. Integer Logical Instructions

| Operation | Instructions |
|---|---|
| And | and[.], andi., andis. |
| And with Complement | andc[.] |
| Nand | nand[.] |
| Or | or[.], ori, oris |
| Or with Complement | orc[.] |
| Nor | nor[.] |
| Xor | xor[.], xori, xoris |
| Equivalence | eqv[.] |
| Extend Sign | extsb[.], extsh[.], extsw[.] |
| Count Leading Zeros | cntlzw[.], cntlzd[.] |
| Permute | bpermd |
| Parity | prtyw, prtyd |
2.8.1.4 Integer Compare Instructions
These instructions perform arithmetic or logical comparisons between two operands and update the CR with the result of the comparison.
Table 2-14 lists the integer compare instructions in the A2 core.
Table 2-14. Integer Compare Instructions

| Arithmetic | Logical |
|---|---|
| cmp, cmpi, cmpb | cmpl, cmpli |
2.8.1.5 Integer Trap Instructions
Table 2-15 lists the integer trap instructions in the A2 core.
Table 2-15. Integer Trap Instructions

| Trap |
|---|
| tw, twi, td, tdi |
2.8.1.6 Integer Rotate Instructions
These instructions rotate operands stored in the GPRs. Rotate instructions can also mask rotated operands.
Table 2-16 lists the rotate instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.
Table 2-16. Integer Rotate Instructions

| Rotate and Insert | Rotate and Mask | Rotate and Clear |
|---|---|---|
| rlwimi[.], rldimi[.] | rlwinm[.], rlwnm[.] | rldcl[.], rldcr[.], rldic[.], rldicl[.], rldicr[.] |
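As an illustration of a rotate-and-mask operation, the following sketch models the low-order 32 bits of the result of rlwinm: the source word is rotated left by SH bits and then ANDed with a mask of ones from bit MB through bit ME. The model follows the general Power ISA definition rather than anything stated in this section, numbers word bits 0 (most significant) to 31, and ignores the upper 32 bits of a 64-bit GPR. Function names are hypothetical.

```python
M32 = 0xFFFF_FFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bit positions."""
    n &= 31
    x &= M32
    return ((x << n) | (x >> (32 - n))) & M32


def mask32(mb: int, me: int) -> int:
    """Ones from bit mb through bit me (bit 0 = MSB); wraps if mb > me."""
    head = M32 >> mb                    # ones from bit mb to bit 31
    tail = (M32 << (31 - me)) & M32     # ones from bit 0 to bit me
    return (head & tail) if mb <= me else (head | tail)


def rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    """Low 32 bits of the rotate-left-word-immediate-then-AND-with-mask result."""
    return rotl32(rs, sh) & mask32(mb, me)


# Extract the most-significant byte of a word into the low-order byte.
print(hex(rlwinm(0x1234_5678, 8, 24, 31)))
```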
2.8.1.7 Integer Shift Instructions
Table 2-17 lists the integer shift instructions in the A2 core. Note that the shift right algebraic instructions implicitly update the XER[CA] field. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.
Table 2-17. Integer Shift Instructions

| Shift Left | Shift Right | Shift Right Algebraic |
|---|---|---|
| slw[.], sld[.] | srw[.], srd[.] | sraw[.], srawi[.], srad[.], sradi[.] |
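The implicit XER[CA] update of the shift right algebraic instructions can be sketched with a 32-bit model of srawi. Under the general Power ISA definition (an assumption here, because this section only states that CA is updated), CA is set when the source is negative and at least one 1-bit is shifted out. The sketch ignores the sign extension of the result into the upper 32 bits of a 64-bit GPR, and the function name is hypothetical.

```python
M32 = 0xFFFF_FFFF


def srawi(rs: int, sh: int):
    """32-bit model of shift right algebraic word immediate.

    Returns (result, ca) for a shift amount sh in 0..31.
    """
    if not 0 <= sh <= 31:
        raise ValueError("shift amount must be 0..31")
    rs &= M32
    signed = rs - (1 << 32) if rs & 0x8000_0000 else rs
    result = (signed >> sh) & M32            # Python's >> is arithmetic
    shifted_out = rs & ((1 << sh) - 1)       # bits that fall off the right
    ca = int(signed < 0 and shifted_out != 0)
    return result, ca


print(srawi(0xFFFF_FFFF, 1))   # -1 >> 1: a 1-bit is lost from a negative value
print(srawi(0xFFFF_FFFE, 1))   # -2 >> 1: exact, no 1-bits lost
print(srawi(5, 1))             # positive source never sets CA
```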
2.8.1.8 Integer Population Count Instructions
Table 2-18 lists the integer population count instructions in the A2 core.
Table 2-18. Integer Population Count Instructions
Pop Count
popcntb popcntw popcntd
2.8.1.9 Integer Select Instruction
Table 2-19 lists the integer select instruction in the A2 core. The RA operand is 0 if the RA field of the instruction is 0; it is the contents of GPR(RA) otherwise.
Table 2-19. Integer Select Instruction
Integer Select
isel
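The operand selection performed by isel, including the special treatment of an RA field of 0, can be sketched as follows. The model follows the general Power ISA definition (select the RA operand if the designated CR bit is 1, otherwise the contents of GPR(RB)); the CR is a 32-bit image with bits numbered 32 to 63, and the function and parameter names are hypothetical.

```python
def isel(gpr: list, cr: int, rt: int, ra: int, rb: int, cr_bit: int) -> None:
    """Model of integer select.

    cr_bit is the CR bit number to test, in the range 32..63.
    """
    if not 32 <= cr_bit <= 63:
        raise ValueError("CR bit number must be 32..63")
    a_operand = 0 if ra == 0 else gpr[ra]     # RA field 0 selects the value 0
    selected = (cr >> (63 - cr_bit)) & 1
    gpr[rt] = a_operand if selected else gpr[rb]


gpr = [0] * 32
gpr[0], gpr[4], gpr[5] = 0x99, 0xAAAA, 0xBBBB
cr = 1 << (63 - 34)                  # only CR bit 34 is set

isel(gpr, cr, 3, 4, 5, 34)           # bit set: take GPR(RA)
print(hex(gpr[3]))
isel(gpr, cr, 3, 4, 5, 35)           # bit clear: take GPR(RB)
print(hex(gpr[3]))
isel(gpr, cr, 3, 0, 5, 34)           # RA field 0: operand is 0, not GPR 0
print(hex(gpr[3]))
```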

2.8.2 Branch Instructions

These instructions unconditionally or conditionally branch to an address. Conditional branch instructions can test condition codes set in the CR by a previous instruction and branch accordingly. Conditional branch instructions can also decrement and test the Count Register (CTR) as part of branch determination and can save the return address in the Link Register (LR). The target address for a branch can be a displacement from the current instruction address or an absolute address or contained in the LR or CTR.
See Branch Processing on page 99 for more information about branch operations.
Table 2-20 lists the branch instructions in the A2 core. In the table, the syntax “[l]” indicates that the instruction has both a “link update” form (which updates LR with the address of the instruction after the branch) and a “nonlink update” form. Similarly, the syntax “[a]” indicates that the instruction has both an “absolute address” form (in which the target address is formed directly using the immediate field specified as part of the instruction) and a “relative” form (in which the target address is formed by adding the specified immediate field to the address of the branch instruction).
Table 2-20. Branch Instructions
Branch
b[l][a] bc[l][a] bcctr[l] bclr[l]

2.8.3 Processor Control Instructions

Processor control instructions manipulate system registers, perform system software linkage, and synchronize processor operations. The instructions in the subcategories of processor control instructions are described below.
2.8.3.1 Condition Register Logical Instructions
These instructions perform logical operations on a specified pair of bits in the CR, placing the result in another specified bit. The benefit of these instructions is that they can logically combine the results of several comparison operations without incurring the overhead of conditional branching between each one. Software performance can significantly improve if multiple conditions are tested at once as part of a branch decision.
Table 2-21 lists the condition register logical instructions in the A2 core.
Table 2-21. Condition Register Logical Instructions
crand crandc creqv crnand
crnor cror crorc crxor
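Combining two comparison results with a condition register logical instruction can be sketched as follows. The model treats the CR as a 32-bit image with bits numbered 32 (most significant) through 63, and shows crand and cror only. The helper names are hypothetical, and the example bit assignments (bit 34 and bit 38 as the EQ bits of CR0 and CR1) assume the standard LT, GT, EQ, SO ordering within each field.

```python
M32 = 0xFFFF_FFFF


def get_cr_bit(cr: int, n: int) -> int:
    """Read CR bit n, where n is 32..63."""
    return (cr >> (63 - n)) & 1


def set_cr_bit(cr: int, n: int, value: int) -> int:
    """Return a copy of the CR image with bit n set to value."""
    mask = 1 << (63 - n)
    return (cr | mask) if value else (cr & ~mask & M32)


def crand(cr: int, bt: int, ba: int, bb: int) -> int:
    """CR[bt] = CR[ba] AND CR[bb]."""
    return set_cr_bit(cr, bt, get_cr_bit(cr, ba) & get_cr_bit(cr, bb))


def cror(cr: int, bt: int, ba: int, bb: int) -> int:
    """CR[bt] = CR[ba] OR CR[bb]."""
    return set_cr_bit(cr, bt, get_cr_bit(cr, ba) | get_cr_bit(cr, bb))


# Two earlier compares left "equal" results in CR bits 34 and 38.
cr = set_cr_bit(set_cr_bit(0, 34, 1), 38, 1)
cr = crand(cr, 62, 34, 38)        # bit 62 now means "both were equal"
print(get_cr_bit(cr, 62))
```

A single conditional branch on bit 62 can then replace two separate branches, one per comparison.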
2.8.3.2 Register Management Instructions
These instructions move data between the GPRs and control registers in the A2 core.
Table 2-22 lists the register management instructions in the A2 core.
Table 2-22. Register Management Instructions

| Register | Instructions |
|---|---|
| CR | mcrf, mcrxr, mfcr, mfocrf, mtcrf, mtocrf |
| DCR (1) | mfdcr, mfdcrx, mfdcrux, mtdcr, mtdcrx, mtdcrux |
| MSR | mfmsr, mtmsr, wrtee, wrteei |
| SPR | mfspr, mtspr |
| TB | mttb |

1. When CCR2(EN_DCR) is zero, DCR instructions are dropped silently. They are no-ops and do not cause an exception.
2.8.3.3 System Linkage Instructions
These instructions invoke supervisor software level for system services and return from interrupts.
When executing in the guest state (MSR[GS,PR] = 0b10), execution of an rfi instruction is mapped to rfgi and the rfgi instruction is executed in place of the rfi.
Table 2-23 lists the system linkage instructions in the A2 core.
Table 2-23. System Linkage Instructions
ehpriv rfi rfci rfgi rfmci sc
2.8.3.4 Processor Control Instructions
The msgsnd and msgclr instructions are provided for sending and clearing messages to processors and other devices in the coherence domain. These instructions are hypervisor privileged.
Table 2-24 shows the processor control instructions in the A2 core.
Table 2-24. Processor Control Instruction
msgsnd msgclr

2.8.4 Storage Control Instructions

These instructions manage the instruction and data caches and the TLB of the A2 core. Instructions are also provided to synchronize and order storage accesses. The instructions in the subcategories of storage control instructions are described in the following sections.
2.8.4.1 Cache Management Instructions
These instructions control the operation of the data and instruction caches. Instructions are provided to fill, flush, invalidate, or zero data cache blocks, where a block is defined as a 64-byte cache line. Instructions are also provided to fill or invalidate instruction cache blocks.
Table 2-25 lists the cache management instructions in the A2 core.
Table 2-25. Cache Management Instructions

| Data Cache | Instruction Cache |
|---|---|
| dcba, dcbf, dcbi, dcbst, dcbt, dcbtst, dcbz, dcbtls, dcbtstls, dcblc | icbi, icbt, icbtls, icblc |
Table 2-26. Cache Management Instructions by External Process ID

| Data Cache | Instruction Cache |
|---|---|
| dcbstep, dcbtep, dcbfep, dcbtstep, dcbzep | icbiep |
2.8.4.2 TLB Management Instructions
The TLB management instructions read and write entries of the TLB array and search the TLB array for an entry that will translate a given virtual address.
Table 2-27 lists the TLB management instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.
Table 2-27. TLB Management Instructions
tlbre tlbsx[.] tlbsync tlbwe tlbivax
2.8.4.3 Processor Synchronization Instruction
The processor synchronization instruction, isync, forces the processor to complete all instructions preceding the isync before allowing any context changes as a result of any instructions that follow the isync. Additionally, all instructions that follow the isync will execute within the context established by the completion of all the instructions that precede the isync. See Synchronization on page 122 for more information about the synchronizing effect of isync.
Table 2-28 shows the processor synchronization instructions in the A2 core.
Table 2-28. Processor Synchronization Instruction
isync sync
2.8.4.4 Load and Reserve and Store Conditional Instructions
The load and reserve and store conditional instructions can be used to construct a sequence of instructions that appears to perform an atomic update operation on an aligned storage location.
The A2 core implements the exclusive access hint (EH) included in load and reserve instructions.
Table 2-29. Load and Reserve and Store Conditional Instructions

| Operand | Loads | Stores |
|---|---|---|
| Word | lwarx | stwcx. |
| Double | ldarx | stdcx. |
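The way a load and reserve and store conditional pair is typically used to build an atomic update can be sketched with a toy model. This is a deliberately simplified, single-threaded simulation: the reservation is tracked per exact address, any intervening store to that address clears it, and the store conditional succeeds only if the reservation is still held. Real reservation granules and loss conditions are implementation details not covered here, and the `ToyMemory` class and `atomic_add` function are hypothetical.

```python
M32 = 0xFFFF_FFFF


class ToyMemory:
    """Word-addressed memory with a single reservation (illustration only)."""

    def __init__(self):
        self.words = {}
        self.reservation = None

    def lwarx(self, addr: int) -> int:
        if addr % 4:
            raise ValueError("operand must be word aligned")
        self.reservation = addr
        return self.words.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        """An ordinary store, for example by another thread."""
        if self.reservation == addr:
            self.reservation = None
        self.words[addr] = value & M32

    def stwcx(self, addr: int, value: int) -> bool:
        """Store only if the reservation is still held; report success."""
        ok = self.reservation == addr
        self.reservation = None
        if ok:
            self.words[addr] = value & M32
        return ok


def atomic_add(mem: ToyMemory, addr: int, delta: int, interfere=None):
    """Retry loop: returns (old_value, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.lwarx(addr)
        if interfere is not None and attempts == 1:
            interfere()                    # simulate a competing store
        if mem.stwcx(addr, old + delta):
            return old, attempts


mem = ToyMemory()
mem.words[0x100] = 5
print(atomic_add(mem, 0x100, 1), mem.words[0x100])
```

The loop retries whenever the store conditional fails, so the update appears atomic even when another store intervenes between the load and the store.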
2.8.4.5 Storage Synchronization Instructions
The storage synchronization instructions allow software to enforce ordering amongst the storage accesses caused by load and store instructions, which by default are weakly-ordered by the processor. “Weakly-ordered” means that the processor is architecturally permitted to perform loads and stores generally out-of-order with respect to their sequence within the instruction stream, with some exceptions. However, if a storage synchronization instruction is executed, then all storage accesses prompted by instructions preceding the synchronizing instruction must be performed before any storage accesses prompted by instructions that come after the synchronizing instruction. See Synchronization on page 122 for more information about storage synchronization.
msync is an extended mnemonic for the synchronize instruction so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand.
Table 2-30 shows the storage synchronization instructions in the A2 core.
Table 2-30. Storage Synchronization Instructions
msync mbar
2.8.4.6 Wait Instruction
The wait instruction allows instruction fetching and execution to be suspended under certain conditions, depending on the value of the WC field. WC = 11 is treated as a no-op instruction. WC = 10 specifies a wake condition determined by an A2 input signal called an_ac_sleep_en.
Table 2-31 shows the wait instructions in the A2 core.
Table 2-31. Wait Instruction
wait

2.8.5 Initiate Coprocessor Instructions

Initiation of a coprocessor is requested by issuing the Initiate Coprocessor Store Word Indexed (icswx) instruction. A coprocessor is not a standard processor, but instead is a specialized processor that is capable of one or more particular tasks with the intent to provide acceleration of each task that might have otherwise been done by the program. See Section 12.5 Coprocessor Instructions on page 513.
Table 2-32 shows the icswx instructions in the A2 core.
Table 2-32. Initiate Coprocessor Instructions
icswx[.] icswepx[.]
2.8.5.1 Cache Initialization Instructions
The dci and ici instructions are privileged instructions, and if executed in supervisor mode they will flash invalidate the entire associated cache. They do not generate an address, nor are they affected by the access control mechanism.
Table 2-33 shows the cache initialization instructions in the A2 core.
Table 2-33. Cache Initialization Instructions
dci ici
The dci and ici instructions have a CT field. The following describes the effects of the CT field.
• CT = 0 indicates L1 only. The L1 cache is invalidated and the request is not sent to the L2.
• CT = 2 indicates L1 and L2. The L1 cache is invalidated and the request is sent to the L2.
• Any other CT value indicates a no-op. No L1 caches are invalidated and the request is not sent to the L2.
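The CT field behavior listed above can be captured in a small lookup, for example in a simulator or test bench. This is a sketch of the three cases only; the function name and the returned dictionary keys are hypothetical.

```python
def cache_init_effect(ct: int) -> dict:
    """Map the CT field of dci/ici to the effects described above."""
    if ct == 0:
        # L1 only.
        return {"invalidate_l1": True, "send_to_l2": False}
    if ct == 2:
        # L1 and L2.
        return {"invalidate_l1": True, "send_to_l2": True}
    # Any other CT value is treated as a no-op.
    return {"invalidate_l1": False, "send_to_l2": False}


for ct in (0, 1, 2, 3):
    print(ct, cache_init_effect(ct))
```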

2.9 Branch Processing

The four branch instructions provided by the A2 core are summarized in Section 2.8.2 Branch Instructions on page 94. The following sections provide additional information about branch addressing, instruction fields, prediction, and registers.

2.9.1 Branch Addressing

The branch instruction (b[l][a]) specifies the displacement of the branch target address as a 26-bit value (the 24-bit LI field right-extended with 0b00). This displacement is regarded as a signed 26-bit number covering an address range of ±32 MB. Similarly, the branch conditional instruction (bc[l][a]) specifies the displacement as a 16-bit value (the 14-bit BD field right-extended with 0b00). This displacement covers an address range of ±32 KB.
For the relative form of the branch and branch conditional instructions (b[l] and bc[l], with instruction field AA = 0), the target address is the address of the branch instruction itself (the current instruction address, or CIA) plus the signed displacement. This address calculation is defined to “wrap around” from the maximum effective address (0xFFFF_FFFF_FFFF_FFFF) to 0x0000_0000_0000_0000 and vice-versa.
For the absolute form of the branch and branch conditional instructions (ba[l] and bca[l], with instruction field AA = 1), the target address is the sign-extended displacement. This means that with absolute forms of the branch and branch conditional instructions, the branch target can be within the first or last 32 MB or 32 KB of the address space, respectively.
The other two branch instructions, bclr (branch conditional to LR) and bcctr (branch conditional to CTR), do not use absolute or relative addressing. Instead, they use indirect addressing, in which the target of the branch is specified indirectly as the contents of the LR or CTR.
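The target address calculation for the relative and absolute forms can be sketched as follows. The model assumes 64-bit mode, takes the raw LI or BD field value and the AA bit as inputs, and wraps addresses modulo 2^64 as described above. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def branch_target(cia: int, li: int, aa: int) -> int:
    """Target of b[l][a]: 24-bit LI field right-extended with 0b00."""
    disp = sign_extend((li & 0xFF_FFFF) << 2, 26)
    return (disp if aa else cia + disp) & MASK64


def bc_target(cia: int, bd: int, aa: int) -> int:
    """Target of bc[l][a]: 14-bit BD field right-extended with 0b00."""
    disp = sign_extend((bd & 0x3FFF) << 2, 16)
    return (disp if aa else cia + disp) & MASK64


print(hex(branch_target(0x1000, 0xFF_FFFF, 0)))   # relative, displacement -4
print(hex(branch_target(0x0, 0xFF_FFFF, 0)))      # wraps below address 0
print(hex(branch_target(0x1000, 0x80_0000, 1)))   # absolute, last 32 MB
print(hex(bc_target(0x1000, 0x1, 0)))             # relative, displacement +4
```

The third call shows why an absolute branch can reach only the first or last 32 MB of the address space: the sign-extended displacement is the whole target address.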

2.9.2 Branch Instruction BI Field

Conditional branch instructions can optionally test one bit of the CR, as indicated by instruction field BO[0] (see Section 2.9.3). The value of instruction field BI specifies the CR bit to be tested (32-63). The BI field is ignored if BO[0] = 1. The branch (b[l][a]) instruction is by definition unconditional; hence, it does not have a BI instruction field. Instead, the position of this field is part of the LI displacement field.

2.9.3 Branch Instruction BO Field

The BO field specifies the condition under which a conditional branch is taken and whether the branch decrements the CTR as shown in Table 2-34. In the table, M = 0 in 64-bit mode and M = 32 in 32-bit mode. The branch (b[l][a]) instruction is by definition unconditional; hence, it does not have a BO instruction field. Instead, the position of this field is part of the LI displacement field.
Conditional branch instructions can optionally test one bit in the CR. This option is selected when BO[0] = 0. If BO[0] = 1, the CR does not participate in the branch condition test. If the CR condition option is selected, the condition is satisfied (branch can occur) if the CR bit selected by the BI instruction field matches BO[1].
Conditional branch instructions can also optionally decrement the CTR by one and test whether the decremented value is 0. This option is selected when BO[2] = 0. If BO[2] = 1, the CTR is not decremented and does not participate in the branch condition test. If the CTR decrement option is selected, BO[3] specifies the condition that must be satisfied to allow the branch to be taken. If BO[3] = 0, CTR ≠ 0 is required for the branch to occur. If BO[3] = 1, CTR = 0 is required for the branch to occur.
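The two optional tests combine as in the following sketch. It is a minimal model under stated assumptions, not the hardware algorithm: the function names are hypothetical, BO[0] is taken to be the most-significant bit of the 5-bit field, the CR is passed as a 32-bit value whose most-significant bit is CR bit 32, and a BI value of n is assumed to select CR bit 32 + n.

```python
MASK64 = (1 << 64) - 1


def bo_bit(bo, i):
    """Return BO[i], where BO[0] is the most-significant bit of the 5-bit field."""
    return (bo >> (4 - i)) & 1


def cr_bit(cr, bi):
    """Return the CR bit selected by BI (0..31); assumes BI = 0 is CR bit 32."""
    return (cr >> (31 - bi)) & 1


def evaluate_bc(bo, bi, cr, ctr, mode64=True):
    """Return (taken, new_ctr) for a conditional branch."""
    ctr_ok = True
    if bo_bit(bo, 2) == 0:
        # CTR option selected: decrement, then test CTR[M:63]
        ctr = (ctr - 1) & MASK64
        tested = ctr if mode64 else ctr & 0xFFFF_FFFF
        ctr_ok = (tested == 0) if bo_bit(bo, 3) else (tested != 0)
    cr_ok = True
    if bo_bit(bo, 0) == 0:
        # CR option selected: the chosen CR bit must match BO[1]
        cr_ok = cr_bit(cr, bi) == bo_bit(bo, 1)
    return ctr_ok and cr_ok, ctr
```

With BO = 0b10000 and CTR = 2, the CTR becomes 1 and the branch is taken; with CTR = 1, the CTR becomes 0 and the branch falls through. The `mode64` flag only changes which CTR bits are tested, not how the CTR is decremented.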
Table 2-34. BO Field Encodings
BO      Description
0000z   Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0 and CR[BI] = 0.
0001z   Decrement the CTR, then branch if the decremented CTR[M:63] = 0 and CR[BI] = 0.
001at   Branch if CR[BI] = 0.
0100z   Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0 and CR[BI] = 1.
0101z   Decrement the CTR, then branch if the decremented CTR[M:63] = 0 and CR[BI] = 1.
011at   Branch if CR[BI] = 1.
1a00t   Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0.
1a01t   Decrement the CTR, then branch if the decremented CTR[M:63] = 0.
1z1zz   Branch always.
Notes:
1. ‘z’ denotes a bit that is ignored.
2. The ‘a’ and ‘t’ bits are used as described in Table 2-35 on page 100.
The “a” and “t” bits of the BO field can be used by software to provide a hint about whether the branch is likely to be taken or is likely not to be taken, as shown in Table 2-35.
Table 2-35. ‘at’ Bit Encodings
at      Hint
00      No hint is given.
01      Reserved.
10      The branch is very likely not to be taken.
11      The branch is very likely to be taken.
This implementation has dynamic mechanisms for predicting whether a branch will be taken. Because the dynamic prediction is likely to be very accurate and is likely to be overridden by any hint provided by the “at” bits, the “at” bits should be set to 0b00 unless the static prediction implied by at = 0b10 or at = 0b11 is highly likely to be correct.
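Because the "a" and "t" bits occupy different positions depending on the BO encoding, extracting the hint takes a small amount of decoding. The sketch below follows the bit positions shown in Table 2-34 (001at, 011at, 1a00t, 1a01t); the function name and the returned strings are hypothetical, and BO[0] is assumed to be the most-significant bit of the field.

```python
HINTS = {
    0b00: "no hint",
    0b01: "reserved",
    0b10: "very likely not taken",
    0b11: "very likely taken",
}


def decode_hint(bo):
    """Return the 'at' hint for a 5-bit BO value, or None if the encoding has no 'at' bits."""
    b = [(bo >> (4 - i)) & 1 for i in range(5)]  # b[i] == BO[i]
    if b[0] == 0 and b[2] == 1:
        # 001at / 011at: a = BO[3], t = BO[4]
        at = (b[3] << 1) | b[4]
    elif b[0] == 1 and b[2] == 0:
        # 1a00t / 1a01t: a = BO[1], t = BO[4]
        at = (b[1] << 1) | b[4]
    else:
        # 0000z, 0001z, 0100z, 0101z, and 1z1zz carry no hint bits
        return None
    return HINTS[at]
```

For instance, BO = 0b01110 (branch if the CR bit is 1, at = 0b10) decodes as "very likely not taken", while BO = 0b10100 (branch always) has no hint bits at all.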

2.9.4 Branch Prediction

The following sections detail the methods by which the branch predictor decodes incoming branches, generates predictions for both the direction and target of these branches, and guides instruction flow based on these predictions.
2.9.4.1 Branch Decoder
Before the branch predictor itself, every instruction cache line is passed through the branch decoder. The primary purpose of the branch decoder is to identify any valid branch instructions contained within the cache line. Valid branches include b, bc, bclr, bcctr, and their derivatives.
The branch decoder also decodes any hints contained within the branch instructions. Hints can be specified for any branch conditional instruction (bc, bclr, bcctr, and their derivatives). Hints are encoded in the branch instruction's BO field.
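As a rough software analogy for the identification step, the following sketch scans a list of 32-bit instruction words and reports which ones are branches. It is not a description of the A2 decoder logic; it assumes the standard Power ISA encodings (primary opcode 18 for b, 16 for bc, and primary opcode 19 with extended opcode 16 for bclr or 528 for bcctr), and the function names are hypothetical.

```python
def classify_branch(word):
    """Return the branch mnemonic family for a 32-bit instruction word, or None."""
    opcd = (word >> 26) & 0x3F           # primary opcode, instruction bits 0:5
    if opcd == 18:
        return "b"
    if opcd == 16:
        return "bc"
    if opcd == 19:
        xo = (word >> 1) & 0x3FF         # extended opcode, instruction bits 21:30
        if xo == 16:
            return "bclr"
        if xo == 528:
            return "bcctr"
    return None


def find_branches(words):
    """Return (index, kind) for every branch found in a sequence of instruction words."""
    found = []
    for index, word in enumerate(words):
        kind = classify_branch(word)
        if kind is not None:
            found.append((index, kind))
    return found
```

Because the AA and LK bits lie outside the fields examined here, the derivative forms (for example bl, bca, or bclrl) classify the same way as their base instructions.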